Question about recomputation #1740
-
Why can recomputation be implemented by concatenating the prompt tokens and the already-generated tokens into a new prompt and running a single prefill-stage computation? I am confused about this and would appreciate an explanation.
Replies: 1 comment
-
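The prompt stage uses BlockDiagonalCausalMask.from_seqlens() as the attention bias. Because this mask is causal, each token attends only to itself and earlier tokens in its own sequence, so a single prefill pass over the prompt plus the generated tokens should compute the same keys and values that token-by-token decoding produced.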
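To make the point about the causal mask concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names (attend, prefill, incremental, block_diagonal_causal_mask) are hypothetical, and it assumes a toy single-head attention in which each token vector serves as its own query, key, and value. It only illustrates that one causally masked pass over all tokens gives the same per-position outputs as prefilling the prompt and then decoding one token at a time, and what a block-diagonal causal mask built from sequence lengths looks like.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Toy single-head attention of one query over the visible keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def prefill(tokens):
    """One pass with a causal mask: position i sees positions 0..i only."""
    return [attend(tokens[i], tokens[: i + 1], tokens[: i + 1])
            for i in range(len(tokens))]


def incremental(tokens, prompt_len):
    """Prefill the prompt, then handle later tokens one by one with a cache."""
    cache = list(tokens[:prompt_len])  # stands in for the KV cache (assumption)
    outputs = prefill(cache)
    for tok in tokens[prompt_len:]:
        cache.append(tok)
        outputs.append(attend(tok, cache, cache))
    return outputs


def block_diagonal_causal_mask(seqlens):
    """Boolean mask for concatenated sequences: True where attention is allowed.

    Each sequence gets its own lower-triangular block, so tokens never
    attend across sequence boundaries or to later positions.
    """
    total = sum(seqlens)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for n in seqlens:
        for i in range(n):
            for j in range(i + 1):
                mask[start + i][start + j] = True
        start += n
    return mask


if __name__ == "__main__":
    random.seed(0)
    # 3 "prompt" tokens followed by 3 "generated" tokens, each a 4-dim vector.
    toks = [[random.random() for _ in range(4)] for _ in range(6)]
    one_pass = prefill(toks)
    step_by_step = incremental(toks, prompt_len=3)
    same = all(math.isclose(x, y, rel_tol=1e-9)
               for a, b in zip(one_pass, step_by_step)
               for x, y in zip(a, b))
    print("single prefill matches incremental decoding:", same)
```

In a real model the same argument applies layer by layer, since every layer's output at a position depends only on that position and earlier ones; small floating-point differences between the two paths are still possible in practice.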