That is the token index in the output sequence. Your two examples come from different models. If I remember correctly, the first line is used when the model is supposed to output "answer: positive/negative", so index 2 picks the vocabulary logits at position 2, with vocab IDs 1465 and 2841 corresponding to "positive" and "negative" respectively.
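For concreteness, here is a minimal sketch of what that first slice computes, assuming `lm_logits_start` has shape `(batch, output_len, vocab_size)` (e.g. decoder logits of a T5-style model); the position 2 and the vocab IDs 1465/2841 come from the snippet quoted below, everything else (shapes, random logits, the `POS_ID`/`NEG_ID` names) is made up for illustration:

```python
import torch

# Sketch only: lm_logits_start is assumed to be (batch, output_len, vocab_size).
batch, output_len, vocab_size = 4, 8, 32128
lm_logits_start = torch.randn(batch, output_len, vocab_size)

POS_ID, NEG_ID = 1465, 2841  # vocab ids assumed to map to "positive" / "negative"

# Logit margin between "positive" and "negative" at output position 2,
# i.e. the position where the model is expected to emit the sentiment word.
margin = (lm_logits_start[:, 2, POS_ID].view(-1, 1)
          - lm_logits_start[:, 2, NEG_ID].view(-1, 1))
print(margin.shape)  # torch.Size([4, 1])
```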
Thank you. I have one more question. Why is "answer: positive <extra_id_2>" always the target of input_start? I mean, why can't it be "answer: positive <extra_id_3>" or "answer: positive <extra_id_4>"? And in the pre-training data, does <extra_id_n> really serve as supervision?
Not sure what you mean here, where do you see the target always being <extra_id_2>? These IDs are actually used during the pre-training stage, so there is a semantic associated with each of the different extra ids.
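To illustrate what "used during the pre-training stage" means: in T5-style span corruption, each <extra_id_n> sentinel marks a different masked span, and the target repeats the same sentinels before the recovered spans, so the ids are not interchangeable. A minimal sketch with the Hugging Face tokenizer (the checkpoint name and the sentences are just examples, not from this repo):

```python
from transformers import T5Tokenizer

# Example only: any T5 checkpoint with sentinel tokens would do.
tok = T5Tokenizer.from_pretrained("t5-base")

# Original sentence: "The cute dog walks in the park".
# Each <extra_id_n> in the input marks a distinct masked span; the target
# reuses the same sentinels, so each id carries its own meaning.
source = "The <extra_id_0> walks in <extra_id_1> park"
target = "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>"

print(tok(source).input_ids)
print(tok(target).input_ids)
```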
lm_logits_start[:, 2, 1465].view(-1, 1) - lm_logits_start[:, 2, 2841].view(-1, 1)
lm_logits_start[:, 3, self.discrete_value_ids[0]].view(-1, 1)
What do indices 2 and 3 map to?