Increase the K tile size in L1 for matmul ops #846
Currently 5 bf16 CI tests are failing because of
@Abhishek-Varma Did you see a similar error? How did you solve it? |
Yes, I'm seeing this issue for larger-shape Matmul + Truncf bf16. My speculation yesterday was that, in the Matmul + Truncf case, it seemed to be due to vector unrolling of |
Since both of us are facing the same issue, I tried looking into the CI failure (as the current speculation was that
I was trying to compare the e2e IR for the case with the PM (Program Memory) issue against the one that doesn't use this patch, to see what the delta is. The only difference (besides the new tile size) that I could see was in the
To
But I don't think this is the cause of the PM exhaustion, based on trying to understand the same issue for Truncf as well. I tried changing the Peano flags too:
but none of them seem to work for either this or the Truncf PM issue. :/ I even re-examined the changes made in #822 (in case any of them was a culprit common to both of us), but those changes don't seem to touch this PR's delta, at least. |
@Abhishek-Varma Thanks for looking into the issue! So what's the way to move forward? Do we disable that flag in Peano, or add some optimizations? |
Maybe we can check whether it has any performance impact? At some point this might matter a lot, but I'm not sure it does yet. We could make it an optional flag for now? And potentially, we can unroll or partially unroll on our side? |
Unfortunately, it does have a performance impact. For a large size such as 4096x4096x2048, the total execution time doubles when loop unrolling is disabled in Peano :( (The comparison is without any of the tile size changes in this PR.) |
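For context on the trade-off discussed above, here is a minimal sketch of what partial unrolling means in general. It is plain Python, not the IR or the Peano pass from this thread, and the function names and the unroll factor of 4 are assumptions chosen only for illustration. The unrolled loop body is four times larger (more program memory) but runs a quarter of the iterations (less loop overhead).

```python
def dot_rolled(a, b):
    """Baseline: one multiply-accumulate per loop iteration."""
    acc = 0.0
    for k in range(len(a)):
        acc += a[k] * b[k]
    return acc


def dot_unrolled_by_4(a, b):
    """Partially unrolled by a fixed factor of 4 (hypothetical choice)."""
    n = len(a)
    main_end = n - (n % 4)  # largest multiple of 4 that fits in n
    acc = 0.0
    # Main loop: four multiply-accumulates per iteration.
    for k in range(0, main_end, 4):
        acc += a[k] * b[k]
        acc += a[k + 1] * b[k + 1]
        acc += a[k + 2] * b[k + 2]
        acc += a[k + 3] * b[k + 3]
    # Remainder loop: handles the leftover 0 to 3 elements.
    for k in range(main_end, n):
        acc += a[k] * b[k]
    return acc
```

Both functions accumulate in the same order, so they return the same value. A full unroll would remove the loop entirely at the cost of a body proportional to the trip count, which is the kind of code growth that can exhaust a small program memory.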
-- This commit makes the following updates to the insert-loops-for-vectorization pass:
-- It makes the pass work on bufferized inputs.
-- It also includes an update pertaining to collapsing unit dimensions of a candidate generic op.
-- It also coalesces the loops generated for tiles.
-- This is the first logically grouped PR needed to make Matmul + Truncf work for larger shapes, and it also unblocks other outstanding/dependent PRs like #846.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
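As a rough illustration of two ideas named in the commit message above (collapsing unit dimensions and coalescing loops), here is a small Python sketch. It is not the pass's implementation; the function names and the 2-D iteration space are assumptions made only for the example.

```python
def collapse_unit_dims(shape):
    """Drop dimensions of extent 1, e.g. (1, 1, 4, 8) -> (4, 8)."""
    return tuple(d for d in shape if d != 1)


def nested_indices(shape):
    """Visit a 2-D iteration space with two nested loops."""
    rows, cols = shape
    out = []
    for i in range(rows):
        for j in range(cols):
            out.append((i, j))
    return out


def coalesced_indices(shape):
    """Visit the same space with one loop, recovering (i, j) via divmod."""
    rows, cols = shape
    out = []
    for linear in range(rows * cols):
        # linear = i * cols + j, so divmod by cols gives back (i, j).
        out.append(divmod(linear, cols))
    return out
```

The coalesced version visits the same index pairs in the same order as the nested version, with a single loop counter instead of two.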