Performance regression in version 0.3.28 on Graviton4 #4939
Hi @dnoan, That is unfortunate, what's interesting is that many of these are Lines 501 to 545 in 8483a71
Whilst the patch you're indicating is triggered later, after checking the small GEMM permit: Lines 551 to 571 in 8483a71
The This would indicate the last line ( That leaves the Could you also let me know what compiler you're using?
This could indeed be an unintentional downgrade caused by switching from SVE GEMM to NEON GEMV (the SVE implementation of the latter, from #4803, only being available on A64FX right now).
It might be worthwhile to copy kernel/arm64/KERNEL.A64FX to kernel/arm64/KERNEL.NEOVERSEV2 and rebuild/retest.
Copying KERNEL.A64FX to KERNEL.NEOVERSEV2 didn't improve performance. Actually, the workload ran marginally slower. @Mousius, can you provide a diff so that I don't mess things up?
guess that would be something like
Recently I reported a performance regression at a4e56e0. There is some glitch in GitHub which prevents me from posting a follow-up, so I am opening this ticket.
In my case the app makes calls to DGEMM with 86100 different inputs. I picked some of what appears to be the most common calls: