Replies: 1 comment
-
@WoosukKwon I believe the comment was written by you. Can you give me any hints about where to look in the code to understand the multiple-of-16 requirement?
-
Referencing this comment: https://github.com/vllm-project/vllm/blob/98cf2ed678580326ffc39c987304c61cb0ce4981/csrc/attention/attention_kernels.cu#L739C1-L741C59
I would like to use a head size that is a multiple of 8. So far I have not found any issues with doing this.
So I'm wondering, am I missing something?
Thanks!
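For context, one common reason attention kernels constrain the head size is vectorized memory access: threads issue fixed-width (e.g. 16-byte) loads, so the head size must divide evenly into whole vectors. The sketch below is only an illustration of that general alignment rule, not vLLM's actual logic; the vector width and the dtype sizes are assumptions.

```python
# Illustrative sketch (NOT vLLM's actual code): why kernels often require
# head_size to be a multiple of some power of two. If each thread issues
# fixed-width vectorized loads, head_size must be a whole number of vectors
# so that no load straddles a head boundary.

VEC_BYTES = 16  # assumed vector-load width (e.g. one uint4 / float4 load)

def elems_per_vec(dtype_bytes: int) -> int:
    """Number of elements covered by one vectorized load."""
    return VEC_BYTES // dtype_bytes

def head_size_ok(head_size: int, dtype_bytes: int) -> bool:
    """head_size must be an exact multiple of the vector width in elements."""
    return head_size % elems_per_vec(dtype_bytes) == 0

# fp16 (2 bytes): 8 elements per 16-byte load -> head_size % 8 == 0
# fp32 (4 bytes): 4 elements per 16-byte load -> head_size % 4 == 0
```

Under these assumptions, a head size that is a multiple of 8 would already be aligned for 16-byte fp16 loads, which might explain why no issues were observed; whether the kernel relies on a stricter multiple-of-16 assumption elsewhere is exactly the question being asked here.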