Issues: Dao-AILab/flash-attention
In unit tests, how is the dropout_fraction diff tolerance selected? (#1286, opened Oct 18, 2024 by muoshuosha)
FlashAttention installation error: "CUDA 11.6 and above" requirement issue (#1282, opened Oct 17, 2024 by 21X5122)
Unable to import my new kernel function after compilation success. (#1278, opened Oct 15, 2024 by jpli02)
Why does the flash_attn_varlen_func method increase GPU memory usage? (#1277, opened Oct 15, 2024 by shaonan1993)
Is there a way to install flash-attention without a specific CUDA version? (#1276, opened Oct 14, 2024 by HuangChiEn)
Concurrent Warp Group Execution in FA3: Tensor Core Resource Limitation? (#1275, opened Oct 13, 2024 by ziyuhuang123)
How to use the function of flash-attn-1 to mimic the behavior of flash_attn_func in flash-attn-2? (#1270, opened Oct 11, 2024 by jpWang)
Unable to compile for MI300X (gfx942) with ROCm 6.2.2 due to getCurrentHIPStream().stream(); (#1269, opened Oct 10, 2024 by lhl)