Dear author, thank you very much for your excellent work. I have a question that I would like to ask you. Is the classifier designed to calculate the cosine similarity between images and text in the same way as CLIP, or is it designed differently? I don't seem to have found detailed information on this part.
Hi there, thank you for your interest in our work. Yes, the classifier works the same way as CLIP, i.e., the classifier weights are essentially composed of text embeddings.
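To make the answer concrete, here is a minimal sketch of a CLIP-style classifier in plain Python. It is not taken from the repository's code. The function names, the toy 3-dimensional embeddings, and the `logit_scale` value of 100.0 are all assumptions for illustration. In practice the embeddings would come from the model's text and image encoders.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_logits(image_feature, text_embeddings, logit_scale=100.0):
    """Scaled cosine similarity between one image feature and each class text embedding.

    The normalized text embeddings act as the classifier weights.
    logit_scale is an assumed temperature, not a value from the repository.
    """
    img = l2_normalize(image_feature)
    weights = [l2_normalize(t) for t in text_embeddings]
    return [logit_scale * sum(a * b for a, b in zip(img, w)) for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings standing in for text-encoder outputs of two class prompts.
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Toy image feature standing in for an image-encoder output.
image_feature = [0.9, 0.1, 0.0]

logits = zero_shot_logits(image_feature, text_embeddings)
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Under these assumptions, adding a class only requires encoding one more text prompt and appending its embedding to `text_embeddings`. No classifier weights are trained separately.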