Support for CUDA 12.4 and above? URGENT PERHAPS? #1292

Open
BBC-Esq opened this issue Oct 22, 2024 · 7 comments

Comments

BBC-Esq commented Oct 22, 2024

Currently, the latest prebuilt wheels for FA2 only support up to CUDA 12.3. This is problematic since torch versions 2.3.1 through 2.5.0 only support CUDA 12.1 or CUDA 12.4, i.e. not CUDA 12.3.

Further, recent models like MiniCPM 2.6, Phi 3.5 mini, and DeepSeek Coder Lite either prefer Flash Attention 2 or will not work without it (e.g. they fail when falling back to SDPA).

On top of that, Triton wheels 3.1.0 and above require torch 2.4.0+.

In short: why haven't there been any FA2 releases built for CUDA versions above 12.3?
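(For context, here is a minimal sketch of how to check which CUDA toolkit your installed torch build targets, so it can be compared against the "cuXXX" tag in a wheel name; the printed values are just examples.)

```python
# Minimal sketch: check which CUDA toolkit the installed PyTorch build targets,
# to compare against the "cuXXX" tag in a flash-attn wheel name.
import torch

print(torch.__version__)          # e.g. "2.4.0+cu124"
print(torch.version.cuda)         # e.g. "12.4" -- the toolkit torch was built against
print(torch.cuda.is_available())  # True if a usable GPU/driver is present
```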

tridao (Contributor) commented Oct 22, 2024

Because wheels compiled with CUDA 12.3 will work with 12.4, as long as the PyTorch versions are the same.
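(As a quick sanity check after installing a cu123 wheel on a CUDA 12.4 system, something like the following should work; this is an illustrative snippet, assuming the installed torch matches the torch version in the wheel name as noted above.)

```python
# Illustrative check: a wheel built against CUDA 12.3 importing fine under CUDA 12.4,
# provided the installed torch matches the torch version baked into the wheel name.
import torch
import flash_attn

print(flash_attn.__version__)  # e.g. "2.6.3"
print(torch.version.cuda)      # e.g. "12.4", even though the wheel was built with 12.3
```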

BBC-Esq (Author) commented Oct 22, 2024

Thanks for clarifying. That said, will you please make this clear in the release notes? For example, when a wheel's name contains "cu118" I assume it will only work with CUDA 11.8, not CUDA 12, and so on. And when it says "cu122" (e.g. release 2.6.0.post1), it's natural to interpret that as working ONLY with CUDA 12.2, nothing higher or lower.

Basically, can the release notes clarify whether a wheel named "flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl", for example, will work with CUDA 12.4, 12.5, and 12.6 (we're all the way up to 12.6 now)?
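(To make the question concrete, here is an illustrative breakdown of what the filename tags appear to encode; the regex below is an assumption based on the release naming, not an official spec.)

```python
# Illustrative only: splitting a flash-attn wheel filename into its tags.
# The pattern is a guess based on the release naming convention, not anything official.
import re

name = "flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl"
m = re.match(
    r"flash_attn-(?P<pkg>[\d.]+(?:\.post\d+)?)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)cxx11abi(?P<abi>TRUE|FALSE)"
    r"-(?P<py>cp\d+)-cp\d+-(?P<platform>.+)\.whl",
    name,
)
print(m.groupdict())
# {'pkg': '2.6.3', 'cuda': '123', 'torch': '2.4', 'abi': 'FALSE',
#  'py': 'cp311', 'platform': 'linux_x86_64'}
```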

tridao (Contributor) commented Oct 22, 2024

setup.py downloads the right wheel automatically
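(Rough sketch of the kind of environment check an install script can do to pick a matching wheel tag; this is illustrative, not the actual setup.py logic.)

```python
# Rough sketch (not the actual setup.py logic): deriving the wheel tags that
# match the local environment. Per the maintainer's comment above, only the
# CUDA major version needs to match, so any 12.x toolkit pairs with a cu12 wheel.
import sys
import torch

torch_tag = ".".join(torch.__version__.split("+")[0].split(".")[:2])  # e.g. "2.4"
cuda_major = torch.version.cuda.split(".")[0]                         # e.g. "12" (None on CPU-only builds)
python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"    # e.g. "cp311"

print(f"Need a wheel built for torch {torch_tag}, cu{cuda_major}, {python_tag}")
```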

BBC-Esq (Author) commented Oct 22, 2024

I understand that, but I'm on Windows and am therefore using the wheels here:

https://github.com/bdashore3/flash-attention/releases/

Can you please just clarify which CUDA versions (even 12.6?) release 2.6.3 supports, even if you'd rather not update the release notes?

tridao (Contributor) commented Oct 22, 2024

All CUDA 12 minor versions are compatible

BBC-Esq (Author) commented Oct 22, 2024

Would you care to update the release notes by adding a sentence, which should take less than 5 minutes, for me and the thousands of others who use this great library? Whether you do or don't, you might leave this issue open for others with the same question, or close it; my feelings won't be hurt. Thanks for clarifying, at any rate.

tridao (Contributor) commented Oct 22, 2024

I'll probably change the wheel names to cu11 and cu12 to avoid confusion. Thanks for the feedback.
