
Why expect Z in Adapter? #8

Open
niedakh opened this issue Jun 25, 2024 · 1 comment
niedakh commented Jun 25, 2024

The Adapter class expects Z in its constructor:

import torch
import transformers

class Adapter(transformers.PreTrainedModel):
    config_class = transformers.PretrainedConfig

    def __init__(self, config, classifiers=None, Z=None, labels_list=[]):
        super().__init__(config)
        # Shared task-embedding matrix: one row per task
        self.Z = torch.nn.Embedding(
            len(config.classifiers_size), config.hidden_size, max_norm=1.0
        ).weight if Z is None else Z
        # One classification head per task
        self.classifiers = torch.nn.ModuleList(
            [torch.nn.Linear(config.hidden_size, size) for size in config.classifiers_size]
        ) if classifiers is None else classifiers
        self.config = self.config.from_dict(
            {**self.config.to_dict(), 'labels_list': labels_list}
        )

    def adapt_model_to_task(self, model, task_name):
        task_index = self.config.tasks.index(task_name)
        # setattr(model, search_module(model, 'linear', mode='class')[-1], self.classifiers[task_index])
        model.classifier = self.classifiers[task_index]
        return model

    def _init_weights(*args):
        pass

but doesn't use it at all when adapting the model to a task?

sileod (Owner) commented Jun 25, 2024

Hi, great question

It is used here:

if adapt_task_embedding:

That said, it would be cleaner to handle it in adapt_model_to_task; I'll try to do that for the next release.

The general idea is to have a shared encoder, one classifier per task (unless some tasks share all their labels), and one task embedding per task.
The task embedding is randomly dropped at a 10% rate so that the model also learns to work without it, but it lets the model "see" which task it should perform, and that improves results, so it is best to add it alongside the classifier. It is actually the core of the Adapter.
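For illustration, here is a minimal sketch of what moving the task-embedding step into adapt_model_to_task might look like. The injection point (adding the task's row of Z to the input embeddings via a forward hook) and the add_task_embedding helper are assumptions for this sketch, not the library's actual implementation:

    import torch

    def adapt_model_to_task(self, model, task_name):
        # Hypothetical variant: attach both the task classifier and
        # the task embedding (one row of self.Z) to the model.
        task_index = self.config.tasks.index(task_name)
        model.classifier = self.classifiers[task_index]

        task_embedding = self.Z[task_index]
        embeddings = model.get_input_embeddings()

        def add_task_embedding(module, inputs, output):
            # Randomly drop the task embedding ~10% of the time during
            # training so the model also learns to work without it.
            if module.training and torch.rand(1).item() < 0.1:
                return output
            # Broadcast the (hidden_size,) embedding over (batch, seq, hidden)
            return output + task_embedding

        embeddings.register_forward_hook(add_task_embedding)
        return model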
