Does ring-attn not support dropout? #36

Open
chinapanda opened this issue May 20, 2024 · 3 comments

Comments

@chinapanda

In ring-attn's backward function, `rng_state` is not taken from the forward pass; `None` is passed in directly instead.
Does this mean ring-attn does not support dropout?
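A minimal plain-Python sketch of why a `None` rng_state breaks dropout. The gradient of dropout is the upstream gradient times the same mask (and scale) used in forward, so backward must regenerate exactly that mask. The helpers `dropout_mask`, `dropout_forward`, and `dropout_backward` are hypothetical. A seed stands in for the saved rng_state, and none of this is ring-attn or flash-attn code.

```python
import random

def dropout_mask(n, p, seed):
    """Hypothetical helper: keep/drop mask of length n drawn from an RNG seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() >= p for _ in range(n)]

def dropout_forward(x, p, seed):
    """Apply inverted dropout and return the seed as the saved rng_state."""
    scale = 1.0 / (1.0 - p)
    mask = dropout_mask(len(x), p, seed)
    y = [xi * scale if keep else 0.0 for xi, keep in zip(x, mask)]
    return y, seed

def dropout_backward(grad_y, p, rng_state):
    """dL/dx = mask * scale * dL/dy, so the mask must be the one used in forward.

    Passing rng_state=None would seed from fresh OS entropy and almost
    certainly regenerate a different mask than the forward pass used.
    """
    scale = 1.0 / (1.0 - p)
    mask = dropout_mask(len(grad_y), p, rng_state)
    return [g * scale if keep else 0.0 for g, keep in zip(grad_y, mask)]

# With x and grad_y all ones, the output and the gradient both equal mask * scale,
# so they match exactly when (and only when) the same mask is regenerated.
x = [1.0] * 8
y, saved_state = dropout_forward(x, 0.5, seed=0)
good = dropout_backward([1.0] * 8, 0.5, saved_state)  # replays the forward mask
bad = dropout_backward([1.0] * 8, 0.5, 1)             # a different state: a different mask
```

Passing `None` to `random.Random` behaves like the `bad` case: the state the forward pass used is lost, so the gradient is masked at the wrong positions.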

@zhuzilin
Owner

Good catch! Yeah, I think it's hard to support dropout with the current implementation...

@chinapanda
Author

chinapanda commented May 21, 2024

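Have we looked into the CP (context parallelism) implementation in TE (TransformerEngine)? It supports dropout. Could its logic be borrowed here?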
https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/attention.py#L1018-L1025

@zhuzilin
Owner

Hmm... it can certainly be done... it's just that we'd need to store ring_size separate copies of rng_state, and I'm not quite sure how much GPU memory rng_state takes up...
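One way to read the proposal above, sketched in plain Python under stated assumptions. The forward loop records one rng_state per ring step (ring_size states in total), and the backward replays step i with `rng_states[i]` instead of `None`. The counter-style `(seed, offset)` state, `step_mask`, `ring_forward`, `ring_backward`, and `rng_state_bytes` are all hypothetical. The memory estimate assumes each state is a pair of int64 values; this is an assumption, not a measured figure for ring-attn.

```python
import random

def step_mask(n, p, rng_state):
    # Hypothetical counter-style RNG: the (seed, offset) pair fully determines
    # the mask, loosely mimicking a Philox-style seed/offset state.
    seed, offset = rng_state
    rng = random.Random(seed * 2**32 + offset)
    return [rng.random() >= p for _ in range(n)]

def ring_forward(blocks, p, seed):
    """Run one dropout step per ring step and save each step's rng_state."""
    rng_states, masks = [], []
    offset = 0
    for blk in blocks:
        state = (seed, offset)
        rng_states.append(state)   # ring_size states in total
        masks.append(step_mask(len(blk), p, state))
        offset += len(blk)         # advance the counter so each step gets a distinct state
    return masks, rng_states

def ring_backward(blocks, p, rng_states):
    """Replay step i with rng_states[i] instead of passing None."""
    return [step_mask(len(blk), p, st) for blk, st in zip(blocks, rng_states)]

def rng_state_bytes(ring_size, ints_per_state=2, bytes_per_int=8):
    # Assumption: each saved rng_state is a (seed, offset) pair of int64 values.
    return ring_size * ints_per_state * bytes_per_int

# Four ring steps; each block stands in for one step's attention tile.
blocks = [[0.0] * 16 for _ in range(4)]
masks, states = ring_forward(blocks, 0.1, seed=1234)
replayed = ring_backward(blocks, 0.1, states)
```

Under that assumption, `rng_state_bytes(8)` gives 8 × 2 × 8 = 128 bytes for a ring of 8 ranks. A real implementation would still need to check the actual size of the state object its attention kernel returns.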
