Failures when using ROCM builds that have particular type of debug info in them (both in JLL-mixed-mode and in system-ROCM mode), e.g. on Arch Linux
#620
Closed
Krastanov opened this issue on Apr 13, 2024 · 3 comments
I am filing this issue because this library:
- does not work in Arch Linux with the Arch-provided ROCM,
- but works fine in the official Ubuntu Docker container provided by AMD, running on the same Arch host.
This is not a problem of driver installation, user permissions, etc., but rather a problem with the particularly stringent standards Arch follows for builds, combined with the fact that ROCM has some potentially broken asserts. I am creating this issue to track this specific question. Please excuse me if this is not appropriate for this issue tracker, and feel free to close it if so.
This issue overlaps with other reports. In particular, #371 already states that many other ROCM-using packages work fine with Arch's build of ROCM.
There are three responsible parties here:
- Arch, for not wanting more lenient builds (but all other ROCM-using tools work fine with their build, so they will probably not make exceptions).
- AMD, for having broken debug statements (but we cannot expect that to be resolved soon).
- Julia's AMDGPU.jl, for being pickier than other tools (which is probably good engineering, but it would be nice if an Arch+Julia+AMDGPU hacker had the time to fix this and contribute a fix here; regrettably I do not have these skills yet, but I am happy to work through the debugging if someone can hold my hand).
It is probably reasonable to close this issue as "will not fix" if the JLL ROCM artifacts become the established way to use AMDGPU.jl (in pure-JLL mode rather than mixed mode).
All of this was tested on my end with ROCM 6, a 7900 XTX, and Julia 1.11.
To run the official AMD Ubuntu ROCM container under Arch Linux so that you can use AMDGPU.jl (in the container) you can do:
sudo pacman -S hsa-rocr rocm-hip-runtime rocm-device-libs rocm-llvm rocminfo # usually not needed because the docker image will have its own, but useful if you do testing on the host
sudo usermod -a -G render YOUR_USERNAME # maybe not needed
sudo usermod -a -G video YOUR_USERNAME # maybe not needed
docker run -it --rm --device=/dev/kfd --device=/dev/dri --ipc=host --group-add=video --shm-size=16G --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/rocm-terminal /bin/bash
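Before starting the container, it can help to confirm that the host exposes the device nodes passed via `--device` and that your user is in the `render` and `video` groups. This is a hypothetical sketch: the function names `in_group_list` and `preflight` and the `PREFLIGHT_DEVICES` override are illustrative and not part of AMDGPU.jl or ROCm. It only checks that these prerequisites are present; it cannot tell whether the GPU is actually usable.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of AMDGPU.jl or ROCm): confirms the
# device nodes and group memberships that the docker command above relies on.

# Return 0 if the group name $1 appears in the space-separated list $2.
in_group_list() {
  local want="$1" group_list="$2" g
  for g in $group_list; do
    [ "$g" = "$want" ] && return 0
  done
  return 1
}

# Warn about each missing prerequisite on stderr and return the number found.
# $1: space-separated group list (defaults to the current user's groups).
# PREFLIGHT_DEVICES: optional override of the device nodes to check.
preflight() {
  local group_list="${1:-$(id -nG)}"
  local devices="${PREFLIGHT_DEVICES:-/dev/kfd /dev/dri}"
  local problems=0 dev grp
  for dev in $devices; do
    if [ ! -e "$dev" ]; then
      echo "missing device node: $dev" >&2
      problems=$((problems + 1))
    fi
  done
  for grp in render video; do
    if ! in_group_list "$grp" "$group_list"; then
      echo "user is not in group '$grp'" >&2
      problems=$((problems + 1))
    fi
  done
  return "$problems"
}
```

Assuming the script is saved as `preflight.sh`, run `source preflight.sh && preflight && docker run ...`. Note that `id -nG` reports the groups of the current session, so after the `usermod` lines you need to log out and back in before the new groups show up.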
I confirm that I do not have this issue anymore either. While this contains a useful reference on how to use Docker / Podman containers with AMDGPU.jl, it does not seem to be necessary anymore.
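Since Podman mirrors the Docker CLI, the `docker run` invocation from this issue should work with either engine. A hypothetical helper (`build_rocm_cmd` and the `ROCM_CMD` array are illustrative names, not an existing tool) that assembles the same command for Docker or Podman:

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble the `run` command from the issue for Docker or
# Podman. Assumes Podman accepts the same flags, as it mirrors the Docker CLI.

# Fill the global array ROCM_CMD for ENGINE ($1: docker or podman, default docker).
build_rocm_cmd() {
  local engine="${1:-docker}"
  case "$engine" in
    docker|podman) ;;
    *) echo "unsupported engine: $engine" >&2; return 1 ;;
  esac
  # Same flags as the docker command in the issue: GPU device nodes, host IPC
  # namespace, video group, a 16G /dev/shm, and relaxed ptrace/seccomp limits.
  ROCM_CMD=(
    "$engine" run -it --rm
    --device=/dev/kfd --device=/dev/dri
    --ipc=host --group-add=video --shm-size=16G
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined
    rocm/rocm-terminal /bin/bash
  )
}
```

Usage: `build_rocm_cmd podman && "${ROCM_CMD[@]}"`. With rootless Podman, `--group-add=video` may not carry the host group into the container; Podman documents a `keep-groups` value for `--group-add` for that case, which is worth checking against your Podman version and OCI runtime.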