flash attention version #21
What is the minimum required flash-attention version?
Comments
I haven't tested this carefully, but anything from 2.4.x onward should definitely run.
If you want to use the llama3 variant, you need at least 2.6.0 (which supports unpadded lse); see this commit: Dao-AILab/flash-attention@f816dee
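For reference, a minimal version-guard sketch (not part of the repo; flash_attn does expose `__version__`, and the thresholds are the ones stated in this thread):

```python
# Sketch only: check the installed flash-attn version against the thresholds
# mentioned in this thread (2.4.x for the regular paths, 2.6.0 for the llama3
# path before the later compatibility fix). Not part of the project's code.
from packaging.version import parse

import flash_attn

LLAMA3_MIN = "2.6.0"   # unpadded lse, see Dao-AILab/flash-attention@f816dee
BASE_MIN = "2.4.0"     # "2.4.x and above should run" per the comment above

installed = parse(flash_attn.__version__)
if installed < parse(BASE_MIN):
    raise RuntimeError(f"flash_attn {flash_attn.__version__} is older than {BASE_MIN}")
if installed < parse(LLAMA3_MIN):
    print(f"flash_attn {flash_attn.__version__}: the llama3 path may need >= {LLAMA3_MIN}")
```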
Updated a couple of days ago; unpadded lse is now supported~
@zhuzilin After unpadded lse support was added, older versions are no longer compatible: a flash-attn build without unpadded lse support slices softmax_lse incorrectly, leading to an illegal memory access.
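To illustrate the layout difference being discussed, here is a small sketch under assumptions (not code from either repo): the varlen kernels used to return softmax_lse padded per batch entry, while the unpadded layout packs all tokens together, so slicing written for one layout misreads the other.

```python
# Sketch of the two softmax_lse layouts discussed above (shapes are my
# assumptions about the pre/post f816dee varlen output, not repo code):
#   padded   (old): (batch, nheads, max_seqlen)
#   unpadded (new): (nheads, total_tokens)
import torch

def flatten_padded_lse(lse_padded: torch.Tensor, cu_seqlens: torch.Tensor) -> torch.Tensor:
    """Convert a padded lse into the unpadded layout by dropping pad positions."""
    batch, nheads, _ = lse_padded.shape
    chunks = []
    for b in range(batch):
        seqlen = int(cu_seqlens[b + 1] - cu_seqlens[b])
        chunks.append(lse_padded[b, :, :seqlen])   # (nheads, seqlen_b)
    return torch.cat(chunks, dim=-1)               # (nheads, total_tokens)

cu_seqlens = torch.tensor([0, 3, 7])               # two sequences, lengths 3 and 4
lse_padded = torch.randn(2, 8, 4)                  # (batch=2, nheads=8, max_seqlen=4)
print(flatten_padded_lse(lse_padded, cu_seqlens).shape)   # torch.Size([8, 7])

# Indexing one layout with offsets computed for the other reads the wrong
# elements or runs past a dimension, consistent with the reported crash.
```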
@void-main I just tested: flash_attn 2.5.9 (i.e. the old lse layout) and the latest flash_attn both run correctly.
@zhuzilin Ah, I may not have been clear; what I'm running is the llama3 variant,
and in a GQA setting, e.g. num_heads=64, num_kv_heads=8; in that case it crashes when head_k_stride==1.
Here is the test code:
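For context, a minimal, repo-agnostic sketch of the GQA configuration described above; the chunking over KV heads is only an illustration of what head_k_stride means here, not the project's actual API or the original test:

```python
# Illustrative only: the GQA shapes from the report (num_heads=64,
# num_kv_heads=8) with KV heads walked in chunks of head_k_stride.
# Names and the chunking loop are assumptions, not the project's API.
import torch

num_heads, num_kv_heads, head_dim = 64, 8, 128
total_tokens = 1024                                # assumed total varlen length
group = num_heads // num_kv_heads                  # 8 query heads per KV head

q = torch.randn(total_tokens, num_heads, head_dim)
k = torch.randn(total_tokens, num_kv_heads, head_dim)
v = torch.randn(total_tokens, num_kv_heads, head_dim)

head_k_stride = 1                                  # the failing case reported above
for start in range(0, num_kv_heads, head_k_stride):
    k_chunk = k[:, start : start + head_k_stride]  # (total_tokens, 1, head_dim)
    v_chunk = v[:, start : start + head_k_stride]
    q_chunk = q[:, start * group : (start + head_k_stride) * group]
    # ... attention for this query-head group would run here ...
```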
Ah, got it~ Since the llama3 variant was newly written, it indeed doesn't have compatibility with old flash attn versions yet... I'll go add it~
@void-main It's fixed now; please pull the latest code and give it a try~
@zhuzilin Tested on FA v2.4.2 and it passes now, thanks a lot 👍