AMDGPU fails test and crashes when initialized #570
Comments
Navi 3 is supported only on Julia 1.10+, but I'm not sure that will fix your error... |
@pxl-th I'm recompiling it now. I think it may be a permissions or directory-search issue: when I run Julia as a superuser with sudo, the test reports the error saying that Navi 3 is supported only on Julia 1.10+, and it doesn't crash immediately. |
Make sure your user is in the same group as |
Also, Navi 3 may hang during tests; I'm not sure why. That only happens on Linux and may be a Linux kernel issue. |
However, outside of AMDGPU.jl tests it works fine |
@pxl-th Upgrading to 1.10.0-rc2 got it working, although the tests still fail. I'll close this because it works now. |
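For anyone landing here with the same symptom, the version requirement mentioned above can be checked before loading AMDGPU.jl. This is only a minimal sketch: `navi3_julia_ok` is a hypothetical helper name, not part of AMDGPU.jl, and the 1.10 threshold is taken from the comments in this thread rather than from the package itself.

```julia
# Hypothetical pre-flight helper; `navi3_julia_ok` is an illustrative name,
# not an AMDGPU.jl function.
# Assumption (from this thread): Navi 3 GPUs need Julia 1.10 or newer,
# and release candidates such as 1.10.0-rc2 already qualify.
function navi3_julia_ok(version::VersionNumber = VERSION)
    # v"1.10.0-" sorts below every 1.10.0 prerelease, so rc builds pass.
    return version >= v"1.10.0-"
end

if navi3_julia_ok()
    println("Julia $VERSION meets the 1.10+ requirement for Navi 3.")
else
    println("Julia $VERSION is older than 1.10; Navi 3 is not supported on it.")
end
```

Passing this check does not rule out the other causes discussed in this issue; it only tells you whether the Julia version itself is the blocker.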
I have the same problem with
julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon 780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8
julia> ENV["HSA_OVERRIDE_GFX_VERSION"]
"11.0.0"
Full error:
rocminfo:
|
What OS are you on? |
It's Ubuntu 22.04 LTS with HWE, with an official ROCm 6.2 installation following the instructions from the AMD docs. |
Hm... I'm on ROCm 6.1.2 (as well as our CI machines), let me try 6.2, maybe something has changed. |
I just installed ROCm 6.2 on Ubuntu 22.04 and it works without issues. |
Do you have AMDGPU artifacts enabled? I see references to them in your stacktrace. If so, disable them (with code below or removing
julia> AMDGPU.ROCmDiscovery.use_artifacts!(false)
We probably should remove them for now, to not confuse users, since they are quite old. |
@pxl-th Which options did you use for amdgpu-install?
I can't even disable the artifacts, as simple |
You could add a
and then try |
that worked! thanks @luraess |
Same, except for these flags: |
I wonder why it defaulted to artifacts, since they are disabled by default. |
But we should just remove them for now. |
It is possible that I placed the file there when I experimented with AMDGPU on my old laptop. However, this seems rather unlikely, as that was at least 2 years ago (and Julia 1.10 was not out yet?). |
OS: Ubuntu 22.04.3
GPU: 7900 XTX
ROCm Version: 5.7.1 (installed with amdgpu-installer).
Julia Version: Julia v1.9.4
Both the test and the import fail with
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.
clinfo shows
This is what the crash looks like