AMDGPU fails test and crashes when initialized #570

Closed
jw2249a opened this issue Dec 17, 2023 · 20 comments

Comments

@jw2249a

jw2249a commented Dec 17, 2023

OS: Ubuntu 22.04.3
GPU: 7900 XTX
ROCM Version: 5.7.1 (installed with amdgpu-installer).
Julia Version: Julia v1.9.4

Both the test and the import fail with
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.

clinfo shows

  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3590.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 Radeon RX 7900 XTX
  Device Topology:				 PCI[ B#12, D#0, F#0 ]
  Max compute units:				 48
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 2371Mhz
  Address bits:					 64
  Max memory allocation:			 21890072576
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 16384
  Max image 3D height:				 16384
  Max image 3D depth:				 8192
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 32768
  Global memory size:				 25753026560
  Constant buffer size:				 21890072576
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 415236096
  Max global variable size:			 21890072576
  Max global variable preferred total size:	 25753026560
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:				 
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 32
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7f510bbf0f90
  Name:						 gfx1100
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 3590.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0 
  Extensions:					 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 

This is what the crash looks like

julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.

[30800] signal (6.-6): Aborted
in expression starting at REPL[6]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f396222871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZNK4rocr3AMD8GpuAgent14AssembleShaderEPKcNS1_14AssembleTargetERPvRm at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent15BindTrapHandlerEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent13PostToolsInitEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime4LoadEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime7AcquireEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3HSA8hsa_initEv at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/jrw/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/jrw/.julia/packages/AMDGPU/bzHD4/src/hsa/LibHSARuntime.jl:71 [inlined]
__init__ at /home/jrw/.julia/packages/AMDGPU/bzHD4/src/AMDGPU.jl:245
jl_sysimg_fvars_base at /home/jrw/.julia/compiled/v1.9/AMDGPU/arpZD_ObjvJ.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
jl_module_run_initializer at /home/jrw/julia/src/toplevel.c:75
ijl_init_restored_modules at /home/jrw/julia/src/module.c:982
register_restored_modules at ./loading.jl:1115
_include_from_serialized at ./loading.jl:1061
_require_search_from_serialized at ./loading.jl:1506
_require at ./loading.jl:1783
_require_prelocked at ./loading.jl:1660
macro expansion at ./loading.jl:1648 [inlined]
macro expansion at ./lock.jl:267 [inlined]
require at ./loading.jl:1611
jfptr_require_48600 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
call_require at /home/jrw/julia/src/toplevel.c:466 [inlined]
eval_import_path at /home/jrw/julia/src/toplevel.c:503
jl_toplevel_eval_flex at /home/jrw/julia/src/toplevel.c:731
jl_toplevel_eval_flex at /home/jrw/julia/src/toplevel.c:856
ijl_toplevel_eval_in at /home/jrw/julia/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
eval_user_input at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:153
repl_backend_loop at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:249
#start_repl_backend#46 at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:234
start_repl_backend at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:231
#run_repl#59 at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:379
run_repl at /home/jrw/julia/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:365
jfptr_run_repl_61323 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
#1018 at ./client.jl:421
jfptr_YY.1018_45017 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
jl_f__call_latest at /home/jrw/julia/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:819 [inlined]
invokelatest at ./essentials.jl:816 [inlined]
run_main_repl at ./client.jl:405
exec_options at ./client.jl:322
_start at ./client.jl:522
jfptr__start_52365 at /home/jrw/julia/julia-1.9.4/lib/julia/sys.so (unknown line)
jl_apply at /home/jrw/julia/src/julia.h:1880 [inlined]
true_main at /home/jrw/julia/src/jlapi.c:573
jl_repl_entrypoint at /home/jrw/julia/src/jlapi.c:717
main at julia (unknown line)
unknown function (ip: 0x7f3962229d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at julia (unknown line)
Allocations: 14667144 (Pool: 14651509; Big: 15635); GC: 25
Aborted (core dumped)
@pxl-th
Collaborator

pxl-th commented Dec 17, 2023

Navi 3 is supported only on Julia 1.10+, but I'm not sure that will fix your error...
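
The 1.10+ requirement is easy to get wrong with a plain string comparison, since "1.9.4" sorts after "1.10" lexically. A minimal shell sketch of a version check, assuming GNU `sort -V` is available; `version_at_least` is a hypothetical helper name, not part of Julia or AMDGPU.jl:

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeeds if version HAVE >= WANT.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_at_least() {
    # The smaller of the two versions sorts first; HAVE >= WANT iff that is WANT.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v julia >/dev/null 2>&1; then
    # `julia --version` prints e.g. "julia version 1.9.4".
    have=$(julia --version | awk '{print $3}')
    if version_at_least "$have" 1.10; then
        echo "Julia $have: new enough for Navi 3"
    else
        echo "Julia $have: too old for Navi 3, need 1.10+"
    fi
else
    echo "julia not found on PATH"
fi
```

This only checks the Julia version; as noted above, it may not be the only cause of the crash.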

@jw2249a
Author

jw2249a commented Dec 17, 2023

> Navi 3 is supported only on Julia 1.10+, but I'm not sure that will fix your error...

@pxl-th I'm recompiling it now. I think it may be a permissions or directory-search issue: when I run julia as a superuser with sudo, it doesn't crash immediately, and the test instead reports that Navi 3 is only supported on Julia 1.10+.

@pxl-th
Collaborator

pxl-th commented Dec 17, 2023

Make sure your user is in the same group as /dev/kfd: docs
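
A minimal shell sketch of that check, assuming GNU `stat` and a standard `id`; `user_in_group` is a hypothetical helper name, and the `usermod` hint is only a suggestion (a new login is needed before a group change takes effect):

```shell
#!/bin/sh
# user_in_group USER GROUP: succeeds if USER is a member of GROUP.
# Hypothetical helper, not part of ROCm or AMDGPU.jl.
user_in_group() {
    # id -nG prints the user's group names separated by spaces.
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

dev=/dev/kfd
if [ -e "$dev" ]; then
    group=$(stat -c '%G' "$dev")   # GNU stat: name of the owning group
    if user_in_group "$(id -un)" "$group"; then
        echo "OK: $(id -un) is in group '$group'"
    else
        echo "Not in group '$group'; try: sudo usermod -aG $group $(id -un)"
    fi
else
    echo "$dev not found (is the amdgpu kernel driver loaded?)"
fi
```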

@pxl-th
Collaborator

pxl-th commented Dec 17, 2023

Also, Navi 3 may hang during tests; I'm not sure why. That only happens on Linux and may be a Linux kernel issue.

@pxl-th
Collaborator

pxl-th commented Dec 17, 2023

However, outside of AMDGPU.jl tests it works fine

@jw2249a
Author

jw2249a commented Dec 17, 2023

@pxl-th Upgrading to Julia 1.10.0-rc2 got it working, although the tests still fail. I'll close this since it works now.

@jw2249a jw2249a closed this as completed Dec 17, 2023
@kalmarek

I have the same problem with

julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 8 default, 0 interactive, 4 GC (on 16 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

julia> ENV["HSA_OVERRIDE_GFX_VERSION"]
"11.0.0"

Full error:

julia> using AMDGPU
julia: /workspace/srcdir/ROCR-Runtime/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != __null && "Code buffer allocation failed"' failed.

[171545] signal (6.-6): Aborted
in expression starting at REPL[3]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x78a94d62871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZNK4rocr3AMD8GpuAgent14AssembleShaderEPKcNS1_14AssembleTargetERPvRm at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent15BindTrapHandlerEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3AMD8GpuAgent13PostToolsInitEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime4LoadEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr4core7Runtime7AcquireEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
_ZN4rocr3HSA8hsa_initEv at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
hsa_init at /home/kalmar/.julia/artifacts/4df816456579cea2c03cac08a6b82fa87abe2b38/lib/libhsa-runtime64.so (unknown line)
unknown function (ip: 0x78a8bc86b4de)
unknown function (ip: 0x78a8bc7caebb)
unknown function (ip: 0x78a8bc862c95)
unknown function (ip: 0x78a8bc52fca9)
hipRuntimeGetVersion at /home/kalmar/.julia/artifacts/3e4a5c18581a48180ab1525d3d490a2e2552616f/hip/lib/libamdhip64.so (unknown line)
_hip_runtime_version at /home/kalmar/.julia/packages/AMDGPU/a1v0k/src/discovery/discovery.jl:87
__init__ at /home/kalmar/.julia/packages/AMDGPU/a1v0k/src/discovery/discovery.jl:144
jfptr___init___5505 at /home/kalmar/.julia/compiled/v1.10/AMDGPU/arpZD_llESY.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_module_run_initializer at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:76
run_module_init at ./loading.jl:1134
register_restored_modules at ./loading.jl:1122
_include_from_serialized at ./loading.jl:1067
_require_search_from_serialized at ./loading.jl:1581
_require at ./loading.jl:1938
__require_prelocked at ./loading.jl:1812
jfptr___require_prelocked_80833.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
_require_prelocked at ./loading.jl:1803
macro expansion at ./loading.jl:1790 [inlined]
macro expansion at ./lock.jl:267 [inlined]
__require at ./loading.jl:1753
jfptr___require_80798.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
require at ./loading.jl:1746
jfptr_require_80795.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
call_require at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:481 [inlined]
eval_import_path at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:518
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:752
jl_toplevel_eval_flex at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91805.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82772.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82798.1 at /home/kalmar/.julia/juliaup/julia-1.10.5+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x78a94d629d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 26510462 (Pool: 26475799; Big: 34663); GC: 38
[1]    171545 IOT instruction (core dumped)  julia

rocminfo:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5289                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    28505104(0x1b2f410) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    28505104(0x1b2f410) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    28505104(0x1b2f410) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      2048(0x800) KB                     
  Chip ID:                 5567(0x15bf)                       
  ASIC Revision:           9(0x9)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2700                               
  BDFID:                   49920                              
  Internal Node ID:        1                                  
  Compute Unit:            12                                 
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       APU
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 39                                 
  SDMA engine uCode::      18                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

@pxl-th
Collaborator

pxl-th commented Sep 11, 2024

What OS are you on?
Also you don't need to specify ENV["HSA_OVERRIDE_GFX_VERSION"]

@kalmarek

It's Ubuntu 22.04 LTS with the HWE kernel:

$ uname -a
Linux hp-845-g10 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

This is an official ROCm 6.2 installation, following the instructions from the AMD docs.

@pxl-th
Collaborator

pxl-th commented Sep 13, 2024

Hm... I'm on ROCm 6.1.2 (as well as our CI machines), let me try 6.2, maybe something has changed.

@pxl-th
Collaborator

pxl-th commented Sep 14, 2024

I just installed ROCm 6.2 on Ubuntu 22.04 and it works without issues.
I used the amdgpu-install script:
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html

@pxl-th
Collaborator

pxl-th commented Sep 14, 2024

Do you have AMDGPU artifacts enabled? I see references to them in your stacktrace. If so, disable them (with the code below, or by removing the LocalPreferences.toml file) and try using your system-wide ROCm installation:

julia> AMDGPU.ROCmDiscovery.use_artifacts!(false)

We should probably remove them for now so they don't confuse users, since they are quite old.
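
One way to confirm that a crash goes through the bundled artifacts rather than the system-wide ROCm is to search the saved Julia output for library paths under `.julia/artifacts`. A rough sketch, assuming the output was saved to a file (`crash.log` is an assumed name) and using a path pattern taken from the traces in this thread; `uses_artifact_rocm` is a hypothetical helper:

```shell
#!/bin/sh
# uses_artifact_rocm LOGFILE: succeeds if the saved crash log contains frames
# from ROCm libraries located under a Julia artifacts directory.
# Hypothetical helper; the pattern mirrors the paths seen in the traces above.
uses_artifact_rocm() {
    grep -Eq '/\.julia/artifacts/[0-9a-f]+/.*lib(hsa-runtime64|amdhip64)\.so' "$1"
}

log=${1:-crash.log}   # assumed file name for the saved Julia output
if [ ! -f "$log" ]; then
    echo "no log file at $log"
elif uses_artifact_rocm "$log"; then
    echo "trace goes through artifact ROCm libraries"
else
    echo "no artifact ROCm libraries in trace"
fi
```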

@kalmarek

@pxl-th Which options did you use for amdgpu-install?
I used this:

$ amdgpu-install --usecase=graphics,opencl,hip,rocm --opencl=rocr --no-32 

I can't even disable the artifacts, since a plain using AMDGPU already crashes the whole Julia session.

@luraess
Collaborator

luraess commented Sep 17, 2024

> I can't even disable the artifacts, as simple using AMDGPU segfaults the whole julia session.

You could add a LocalPreferences.toml file in your working dir or project that includes the artifact info:

$ cat LocalPreferences.toml
[AMDGPU]
use_artifacts = false

and then try using AMDGPU with this.
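
Since the file has to exist before Julia loads AMDGPU, it can be created from the shell. A minimal sketch, assuming the file does not exist yet; `write_amdgpu_prefs` is a hypothetical helper name and it deliberately refuses to overwrite an existing file:

```shell
#!/bin/sh
# write_amdgpu_prefs DIR: create DIR/LocalPreferences.toml with artifacts disabled.
# Hypothetical helper; an existing file is left untouched.
write_amdgpu_prefs() {
    prefs="$1/LocalPreferences.toml"
    if [ -e "$prefs" ]; then
        echo "$prefs already exists; edit its [AMDGPU] section by hand"
    else
        printf '[AMDGPU]\nuse_artifacts = false\n' > "$prefs"
        echo "wrote $prefs"
    fi
}
```

Calling `write_amdgpu_prefs .` from the project directory produces the same two-line file as shown above.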

@kalmarek

that worked! thanks @luraess

@pxl-th
Collaborator

pxl-th commented Sep 17, 2024

> @pxl-th Which options did you use for amdgpu-install? I used this:
>
> $ amdgpu-install --usecase=graphics,opencl,hip,rocm --opencl=rocr --no-32
>
> I can't even disable the artifacts, as simple using AMDGPU segfaults the whole julia session.

Same, except for these flags: --opencl=rocr --no-32

@pxl-th
Collaborator

pxl-th commented Sep 17, 2024

I wonder why it defaulted to artifacts, since they are disabled by default.

@pxl-th
Collaborator

pxl-th commented Sep 17, 2024

But we should just remove them for now.

@luraess
Collaborator

luraess commented Sep 17, 2024

Would #673 help here, @pxl-th?

@kalmarek

> Wonder why it did default to artifacts, since by default they are disabled.

It is possible that I placed the file there when I experimented with AMDGPU on my old laptop. However, this seems rather unlikely, as that was at least 2 years ago (and Julia 1.10 was not out yet?).
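
A leftover preferences file like that can be searched for from the shell. A rough sketch, assuming the file sits in the current project or under the default shared environments in `~/.julia/environments` (both locations are assumptions, as is the search depth); `find_amdgpu_prefs` is a hypothetical helper:

```shell
#!/bin/sh
# find_amdgpu_prefs DIR...: list LocalPreferences.toml files under the given
# directories that mention use_artifacts. Hypothetical helper name.
find_amdgpu_prefs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -maxdepth is supported by GNU and BSD find; grep -l prints matching file names.
        find "$dir" -maxdepth 3 -name LocalPreferences.toml \
            -exec grep -l 'use_artifacts' {} + 2>/dev/null || true
    done
}

# Assumed locations: the current project and the default shared environments.
find_amdgpu_prefs "$PWD" "${HOME:-/nonexistent}/.julia/environments"
```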


4 participants