Compiling Grid for AMD GPUS #343
Yes - HIP is believed to be working, but it is not yet efficient on AMD GPUs |
The status of multi-GPU and the "nvlink" equivalent is untested. --enable-shm=none and MPI between GPUs is probably safer. |
BTW, I have benchmarked AMD MI50 and MI100, but want to revisit with the new explicit Nc=3 kernel. I have also compiled under HIP on Summit for Nvidia, and got the same performance as the CUDA compile. |
I was able to compile Grid and to run the benchmark you suggested. However, some of the tests are failing, e.g.
Or Test_nersc_io fails because the plaquette is not correctly reproduced:
Other tests, e.g. Test_wilson_even_odd, seem to work fine. The configure command I used is:
|
Thanks - I haven't tried WilsonClover on GPU, to be honest, so I am not absolutely sure it works on Nvidia either. Re. the plaquette - this does work on CUDA, so it is something interesting to look at on HIP. |
HIP is definitely in the "experimental" category for now, but getting everything to work would be good. |
I am running on a machine at JLab |
I should have asked what specific hardware you are running on, rather than where it is physically located. |
It's a machine equipped with 4 Vega 20 cards and an AMD Epyc CPU |
Can you tell me the performance you get with benchmarks/Benchmark_dwf_fp32 --grid 16.16.16.16 and with benchmarks/Benchmark_dwf_fp32 --grid 16.16.16.16 --dslash-unroll? Thanks |
Here are the results for Benchmark_dwf_fp32 --grid 16.16.16.16:
and here for Benchmark_dwf_fp32 --grid 16.16.16.16 --dslash-unroll:
|
I just ran the Test_wilson_clover on summit and the test ran without any errors |
Thanks. My hypothesis that the --dslash-unroll might fix the performance issues is not correct then. Glad to hear it re. Clover - it's a HIP / CUDA difference, and not general breakage of Clover. More joy to look forward to.... if you were able to track down which accelerator_for/line of code fails with Clover, that would help. |
The error occurs in the constructor of the WilsonCloverFermion. To be more precise it is happening in the ImportGauge in WilsonCloverFermion.h on line 109 |
Could you either A) run it under a debugger (gdb) and trap the fault and ask it for a back trace with "bt". OR B) go to: Grid/qcd/action/fermion/implementation/WilsonCloverFermionImplementation.h
A) is not guaranteed to work because I don't know how the GPU runtime is operating, but significantly less effort if |
though on the AMD node I had access to, the ROCm debugger didn't work for me. |
I tried both option A and B.
The output of bt was not very enlightening.
Approach B) tells me the error is between line 81 and 91 in Grid/qcd/action/fermion/implementation/WilsonCloverFermionImplementation.h.
|
that was enough to go on for me to eyeball at least one error. |
More later - I'll try and patch develop. |
Sorry - reviewed again and the code looks right. Darn it... |
Hi guys, I just saw this. I have been working on Grid for some weeks now, and it seems that the Wilson clover implementation exceeds the maximum limit of local memory per thread (128k for now). That could explain the runtime error. (More recent ROCm releases have an assertion against that, which makes the code fail to compile.) |
Hi, I just tried to compile Grid on a new AMD GPU (MI100) machine at JLab. Unfortunately, I get errors during compilation:
My configure command: Any ideas how to solve this? |
For whatever it's worth, I'm seeing the same error on OLCF spock, with
rocm 4.2.0.
Chulwoo
On 2021-08-18 11:12, philomat wrote:
Hi,
I just tried to compile Grid on a new AMD GPU (MI100) machine at JLab.
Unfortunately, I get errors during compilation:
error: stack size limit exceeded (131088) in
_ZN4Grid11LambdaApplyIZNS_3adjINS_7iScalarINS_7iMatrixINS3_INS_9Grid_simdIN6thrust7complexIdEENS_9GpuVectorILi4ENS_10GpuComplexI15HIP_vector_typeIdLj2EEEEEEEELi8EEELi4EEEEEEENS_7LatticeIT_EERKSK_EUlmmmE_EEvmmmSJ_
error: stack size limit exceeded (131552) in
_ZN4Grid11LambdaApplyIZNS_12outerProductINS_7iScalarINS_7iVectorINS3_INS_9Grid_simdIN6thrust7complexIdEENS_9GpuVectorILi4ENS_10GpuComplexI15HIP_vector_typeIdLj2EEEEEEEELi8EEELi4EEEEESH_EENS_7LatticeIDTcl12outerProductcvT__EcvT0__EEEEERKNSI_ISJ_EERKNSI_ISK_EEEUlmmmE_EEvmmmSJ_
2 errors generated when compiling for gfx906.
My configure command:
../configure --enable-unified=no --enable-shm=no
--enable-accelerator=hip --enable-comms=mpi3-auto --enable-simd=GPU
--enable-gen-simd-width=64 CXX=/opt/rocm-4.3.0/bin/hipcc MPICXX=mpicxx
CXXFLAGS="-fPIC -I/opt/rocm-4.3.0/ -std=c++14"
Any ideas how to solve this?
|
Can you give the complete call tree that is failing?
|
Hi Peter, this is the complete output of make:
|
Any progress on this issue? I pulled Grid a couple of days ago and still get the same error. |
@philomat For now I'm avoiding hitting this problem by conditionally compiling the problematic operators, which seems to be fine to build the main benchmark binary (Benchmark_ITT), but I still need to take a look at the code and see if we can reduce the amount of local data allocated per thread and place it somewhere else to avoid hitting this issue. |
I've run on Spock and it is doing well on Benchmark_ITT and Benchmark_dwf_fp32. I added the systems/Spock directory with compile and run scripts. |
Also get 4TF/s on a whole Spock node, 4x MI-100. |
I'm also hitting the "stack frame size exceeds limit" error. |
There are some hardware-related limitations on the stack frame size on AMD GPUs. You need to reduce the usage of private memory in the kernels. Note that gfx10 GPUs can use twice as much private memory as gfx9 because of the narrower wave size (32 vs 64). Details can be found here: llvm/llvm-project@1ed4caf |
I know the wiki says there is currently no support for AMD GPUs. But I saw commits concerning HIP. Is there a way one could try experimenting with Grid on AMD GPUs?