Got error in ZigZagRingFlashAttnVarlenFunc #46

Open
ThisisBillhe opened this issue Sep 3, 2024 · 4 comments

ThisisBillhe commented Sep 3, 2024

  1. It seems the batch dimension disappears after the `_upad_input` function (this function is usually copied from `transformers.models.mistral.modeling_mistral.MistralFlashAttention2._upad_input`). The `block_lse` obtained at L118 in `zigzag_ring_flash_attn_varlen.py` then only has 2 dimensions (num_heads and seq_len), which causes an error in the `flatten_varlen_lse` function (L120 in `zigzag_ring_flash_attn_varlen.py`), where `block_lse` is required to have three dimensions (see the shape sketch after the traceback below).
  2. An illegal memory access error is reported in the `else` branch at L135 of `zigzag_ring_flash_attn_varlen.py`. I cannot even print the `half_cu_seqlens` or `cu_seqlens` tensor before the `flatten_varlen_lse` call:
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/ring_flash_attn/zigzag_ring_flash_attn_varlen.py", line 140, in zigzag_ring_flash_attn_varlen_forward
    print(cu_seqlens)
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/torch/_tensor.py", line 431, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/torch/_tensor_str.py", line 664, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/torch/_tensor_str.py", line 595, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/torch/_tensor_str.py", line 347, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/torch/_tensor_str.py", line 133, in __init__
    value_str = f"{value}"
  File "/mnt/workspace/anaconda3/envs/longva/lib/python3.10/site-packages/torch/_tensor.py", line 933, in __format__
    return self.item().__format__(format_spec)
RuntimeError: CUDA error: an illegal memory access was encountered
'''
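
For reference, here is a minimal sketch of the shape mismatch described in point 1. The exact layout expected by `flatten_varlen_lse` and the `unsqueeze` workaround are assumptions for illustration, not the repo's actual fix:

```python
import torch

# After _upad_input, all tokens are packed into one "varlen" sequence, so the
# softmax LSE returned by the varlen flash-attention kernel has no batch dim.
num_heads, total_tokens = 8, 1024
block_lse = torch.randn(num_heads, total_tokens)   # 2-D: (num_heads, total_tokens)

# flatten_varlen_lse expects a 3-D tensor (assumed here to be
# (batch, num_heads, seq_len)), so passing the 2-D tensor fails.
# A naive workaround (an assumption, not the repo's fix) would be to restore
# a dummy batch dimension first:
block_lse_3d = block_lse.unsqueeze(0)              # 3-D: (1, num_heads, total_tokens)
print(block_lse.shape, block_lse_3d.shape)
```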

zhuzilin commented Sep 5, 2024

Fixed in #47. The reason for the bug is explained in #44 (comment).


ThisisBillhe commented Sep 10, 2024

Thanks for your reply! I have tried your latest commit, but sadly it does not run well in my case: the program gets stuck. I think the reason is that the attention mask passed to _flash_attn_varlen_forward differs across ranks. Do you know how to address this?

ThisisBillhe commented

Perhaps we should send cu_seqlens_k and max_seqlen_in_batch_k along with k and v to other ranks.
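
A minimal sketch of that idea, assuming a plain torch.distributed ring; the function name, the neighbor ordering, and the packing of max_seqlen into a tensor are illustrative assumptions, not the repo's actual communication code:

```python
import torch
import torch.distributed as dist

def ring_send_recv_kv(k, v, cu_seqlens_k, max_seqlen_k):
    """Illustrative sketch (not the repo's API): pass k/v *and* their varlen
    metadata to the next rank in the ring, so the receiver can call
    _flash_attn_varlen_forward with the cu_seqlens that match those k/v."""
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    send_rank = (rank + 1) % world_size
    recv_rank = (rank - 1) % world_size

    # Buffers for incoming tensors; assumes all ranks hold equally sized shards.
    recv_k = torch.empty_like(k)
    recv_v = torch.empty_like(v)
    recv_cu = torch.empty_like(cu_seqlens_k)
    send_meta = torch.tensor([max_seqlen_k], device=k.device, dtype=torch.int32)
    recv_meta = torch.empty_like(send_meta)

    ops = [
        dist.P2POp(dist.isend, k, send_rank),
        dist.P2POp(dist.irecv, recv_k, recv_rank),
        dist.P2POp(dist.isend, v, send_rank),
        dist.P2POp(dist.irecv, recv_v, recv_rank),
        dist.P2POp(dist.isend, cu_seqlens_k, send_rank),
        dist.P2POp(dist.irecv, recv_cu, recv_rank),
        dist.P2POp(dist.isend, send_meta, send_rank),
        dist.P2POp(dist.irecv, recv_meta, recv_rank),
    ]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv_k, recv_v, recv_cu, int(recv_meta.item())
```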

zhuzilin commented

Hmm... are you using the latest main branch of the repo? I've just given it another try; it works with:

```
torchrun --nproc_per_node 8 test/test_zigzag_ring_flash_attn_varlen_func.py
```

And as for the attention mask being different across ranks, that is by design.
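
For context, here is a toy sketch of the zigzag partitioning that makes the local masks differ; this is a simplified illustration of the scheme, not the repo's actual code:

```python
import torch

def zigzag_split(seq, world_size, rank):
    """Illustrative zigzag split: cut the sequence into 2 * world_size chunks
    and let rank r keep chunks r and (2 * world_size - 1 - r), which balances
    the causal-attention workload. Because each rank ends up holding different
    parts of each document, the local cu_seqlens differ across ranks by design."""
    chunks = seq.chunk(2 * world_size, dim=0)
    return torch.cat([chunks[rank], chunks[2 * world_size - 1 - rank]], dim=0)

# Example: a 16-token sequence split across 4 ranks
seq = torch.arange(16)
for r in range(4):
    print(r, zigzag_split(seq, world_size=4, rank=r).tolist())
# rank 0 -> [0, 1, 14, 15], rank 1 -> [2, 3, 12, 13], ...
```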
