Is ROCm no longer supported by 0.9.x? #374

Open
Lookforworld opened this issue Dec 7, 2023 · 16 comments

Comments

@Lookforworld

Lookforworld commented Dec 7, 2023

@ghostplant
I've been trying to run ROCm on WSL without success, and finding this project gave me some hope. I want to try version 0.9.x but can't find a wheel that supports ROCm. With version 0.3.x installed, BACKEND=c-rocm reports that the GPU cannot be found, and BACKEND=c-rocm_win64 fails with this error:
/home/root001/miniconda3/lib/python3.11/site-packages/antares_core/backends/c-rocm_win64/../../graph_evaluator/run_graph.cpp:14:29: error: ‘memalign’ was not declared in this scope
14 | void data_ptr = (void)memalign(256, length);
I don't know where to start fixing this. Is there an official guide that describes the correct steps? 😒

@ghostplant
Contributor

0.9.x is a new implementation that works on Windows even without WSL, while the old features are all kept. So you can still follow "Path 1" to run ROCm custom kernels. The problem you hit looks like a C++ compatibility issue; let me fix it.
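
For context on the compatibility issue: `memalign` is a non-standard function that glibc declares in `<malloc.h>`, so code that calls it without that header can fail with "not declared in this scope". A minimal sketch of one portable alternative on POSIX systems is `posix_memalign`. The helper name `aligned_alloc_bytes` is hypothetical, and this is not necessarily the change that was committed to Antares.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdlib.h>  // posix_memalign is declared here on POSIX systems

// Hypothetical helper (not from Antares): allocate `length` bytes whose
// address is a multiple of `alignment`. The alignment must be a power of
// two and a multiple of sizeof(void*). Returns nullptr on failure.
inline void* aligned_alloc_bytes(std::size_t alignment, std::size_t length) {
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, length) != 0) {
    return nullptr;
  }
  return ptr;  // release with std::free
}
```

A call such as `aligned_alloc_bytes(256, length)` would stand in for `memalign(256, length)`; the returned pointer is released with `std::free`.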

@ghostplant
Contributor

BTW, would you consider switching to DirectX on Windows instead of ROCm on Windows?

@Lookforworld
Author

Lookforworld commented Dec 8, 2023

@ghostplant
Thanks for your reply.
I have successfully installed ROCm and the AMD SDK on Windows, and I have compiled llama.cpp with them, but I have not been able to use ROCm in WSL. I need WSL because many of the libraries I want to use do not support Windows or DirectX. The Antares I installed via "Path 1" has always been 0.3.x rather than 0.9.x. I want to see whether 0.9.x can work. What should I do?
My device is gfx1100 (7900 XTX).

@ghostplant
Contributor

ghostplant commented Dec 8, 2023

I just pushed a commit to fix the error you reported.
Did you install the AMD SDK released for Win32, or HIP released for Linux inside WSL?

@Lookforworld
Author

@ghostplant
Thanks!
My HIP in WSL:

HIP version: 5.7.31921-d1770ee1b
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.0 23352 d1e13c532a947d0cbfc94759c00dcf152294aa13)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.7.0/llvm/bin

The Antares I installed via "Path 1" has always been 0.3.x rather than 0.9.x. How do I install 0.9.x if I want to use it with ROCm?

@ghostplant
Contributor

v0.9.x is a re-implementation with no WSL dependency. It may take time to merge many of the features from v0.3.x. I suggest using DirectX with v0.9.x for now, since the two are similar in performance.

@Lookforworld
Author

@ghostplant
Thanks very much!
How can I install the fixed version now?

@ghostplant
Contributor

The fix has just been published to PyPI. You can install antares >= 0.3.24.0 with pip to pick up the change: https://pypi.org/project/antares/#files

@Lookforworld
Author

Thanks for your kind reply! I'll try it!

@Lookforworld
Author

Lookforworld commented Dec 8, 2023

@ghostplant
There's a new error 😢:

 >> Backend = c-rocm_win64, Python PID = 1251, Task = lang.generic;

// ---------------------------------------------------------------------------
// GLOBALS: input0:float32[524288], input1:float32[524288] -> output0:float32[524288]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})


// ---------------------------------------------------------------------------
// LOCAL: template_op_kernel0 -- input0:float32[524288], input1:float32[524288] -> output0:float32[524288]

#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

#ifndef __ROCM_COMMON_MACRO__
#define __ROCM_COMMON_MACRO__

#define __ITEM_0_OF__(v) (v).x
#define __ITEM_1_OF__(v) (v).y
#define __ITEM_2_OF__(v) (v).z
#define __ITEM_3_OF__(v) (v).w

#define __STORE_ITEM_0__(t, out, ido, in, idi) *(t*)(out + ido) = *(t*)(in + idi)
#define __STORE_ITEM_1__(t, out, ido, in, idi)
#define __STORE_ITEM_2__(t, out, ido, in, idi)
#define __STORE_ITEM_3__(t, out, ido, in, idi)

#define __AMDGFX__ gfx1100

__forceinline__ __device__ __half hmax(const __half &a, const __half &b) { return a > b ? a : b; }
__forceinline__ __device__ __half hmin(const __half &a, const __half &b) { return a < b ? a : b; }

#endif


extern "C" __global__ __launch_bounds__(1) void template_op_kernel0(float* __restrict__ input0, float* __restrict__ input1, float* __restrict__ output0) {
  // [thread_extent] blockIdx.x = 524288
  // [thread_extent] threadIdx.x = 1
  output0[(((int)blockIdx.x))] = (input0[(((int)blockIdx.x))] + input1[(((int)blockIdx.x))]);
}

// ---------------------------------------------------------------------------

[EvalAgent] Evaluating Modules .. (for backend = c-rocm_win64)
+ /opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1100 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
clang++: error: no such file or directory: '/tmp/.antares-module-tempfile.cu'
clang++: error: no input files
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to execute command: sh -c 'wsl.exe sh -cx "/opt/rocm/bin/hipcc /tmp/.antares-module-tempfile.cu --amdgpu-target=gfx1100 --genco -Wno-ignored-attributes -O2 -o /tmp/.antares-module-tempfile.cu.out 1>&2"'


[EvalAgent] Results = {}

[Antares] Incorrect compute kernel from evaluator.

But the file is in the right place. Is that because of permissions? Or is there something else wrong?
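
Separately from the missing-file error, the log above warns that `--amdgpu-target` is deprecated in favor of `--offload-arch`. A minimal sketch of how the same compile command could be assembled with the newer flag follows. The helper `build_hipcc_command` is hypothetical and not part of Antares; the remaining flags are copied from the log, and switching the flag is not expected to resolve the "no such file or directory" failure.

```cpp
#include <string>

// Hypothetical helper (not part of Antares): build the hipcc command line
// seen in the log, using --offload-arch instead of the deprecated
// --amdgpu-target option.
inline std::string build_hipcc_command(const std::string& source,
                                       const std::string& arch,
                                       const std::string& output) {
  return "/opt/rocm/bin/hipcc " + source + " --offload-arch=" + arch +
         " --genco -Wno-ignored-attributes -O2 -o " + output;
}
```

For the case in this thread the arguments would be the temporary `.cu` file, `gfx1100`, and the `.cu.out` output path.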

@Lookforworld
Author

Lookforworld commented Dec 10, 2023

> Can you try this version: https://files.pythonhosted.org/packages/cb/fe/5fef007100d8beaa64113d1da466a057db656ef5e0731140883bfc0ca05e/antares-0.3.24.1-py3-none-manylinux1_x86_64.whl

@ghostplant
That didn't fix the error. This time there is no .cu file at the path. 😢

@ghostplant
Contributor

ghostplant commented Dec 10, 2023

Can you attach the new error logs?

@Lookforworld
Author

@ghostplant
Okay, here are the logs:

 >> Backend = c-rocm_win64, Python PID = 450, Task = lang.generic;

// ---------------------------------------------------------------------------
// GLOBALS: input0:float32[524288], input1:float32[524288] -> output0:float32[524288]
// BACKEND: c-rocm_win64 (default)
// CONFIG: null
// COMPUTE_V1: - einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})


// ---------------------------------------------------------------------------
// LOCAL: template_op_kernel0 -- input0:float32[524288], input1:float32[524288] -> output0:float32[524288]

#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

#ifndef __ROCM_COMMON_MACRO__
#define __ROCM_COMMON_MACRO__

#define __ITEM_0_OF__(v) (v).x
#define __ITEM_1_OF__(v) (v).y
#define __ITEM_2_OF__(v) (v).z
#define __ITEM_3_OF__(v) (v).w

#define __STORE_ITEM_0__(t, out, ido, in, idi) *(t*)(out + ido) = *(t*)(in + idi)
#define __STORE_ITEM_1__(t, out, ido, in, idi)
#define __STORE_ITEM_2__(t, out, ido, in, idi)
#define __STORE_ITEM_3__(t, out, ido, in, idi)

#define __AMDGFX__ gfx1100

__forceinline__ __device__ __half hmax(const __half &a, const __half &b) { return a > b ? a : b; }
__forceinline__ __device__ __half hmin(const __half &a, const __half &b) { return a < b ? a : b; }

#endif


extern "C" __global__ __launch_bounds__(1) void template_op_kernel0(float* __restrict__ input0, float* __restrict__ input1, float* __restrict__ output0) {
  // [thread_extent] blockIdx.x = 524288
  // [thread_extent] threadIdx.x = 1
  output0[(((int)blockIdx.x))] = (input0[(((int)blockIdx.x))] + input1[(((int)blockIdx.x))]);
}

// ---------------------------------------------------------------------------

[EvalAgent] Evaluating Modules .. (for backend = c-rocm_win64)
+ /opt/rocm/bin/hipcc /mnt/c/Users/Modys/AppData/Local/Temp/.antares-module-tempfile.cu --amdgpu-target=gfx1100 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
clang++: error: no such file or directory: '/mnt/c/Users/Modys/AppData/Local/Temp/.antares-module-tempfile.cu'
clang++: error: no input files
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to execute command: sh -c 'wsl.exe sh -cx "/opt/rocm/bin/hipcc $TMP/.antares-module-tempfile.cu --amdgpu-target=gfx1100 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out 1>&2"'


[EvalAgent] Results = {}

[Antares] Incorrect compute kernel from evaluator.

@ghostplant
Contributor

Unfortunately, I cannot reproduce this. Could you help debug why these two lines fail to create the file at C:\Users\Modys\AppData\Local\Temp\.antares-module-tempfile.cu? If the file is created, then /mnt/c/Users/Modys/AppData/Local/Temp/.antares-module-tempfile.cu should be visible inside WSL rather than reported as not found.

You can run vi $(antares pwd)/../backends/c-rocm_win64/include/backend.hpp to edit the file in place. Once you save the changes, it will be recompiled automatically on the next antares compilation request.
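
As a debugging aid for the path mapping described above, here is a minimal sketch that translates a Windows drive path to its WSL equivalent and checks whether the file can be opened from inside WSL. Both helper names are hypothetical and not part of Antares, and the sketch assumes the default `/mnt` automount root; inside WSL, the built-in `wslpath` utility performs the same translation.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: map a Windows drive path such as C:\Temp\a.cu to
// /mnt/c/Temp/a.cu. Assumes the default /mnt automount root. Returns an
// empty string when the input does not start with a drive prefix.
inline std::string windows_to_wsl_path(const std::string& win_path) {
  if (win_path.size() < 3 ||
      !std::isalpha(static_cast<unsigned char>(win_path[0])) ||
      win_path[1] != ':' || (win_path[2] != '\\' && win_path[2] != '/')) {
    return "";
  }
  std::string out = "/mnt/";
  out += static_cast<char>(std::tolower(static_cast<unsigned char>(win_path[0])));
  for (std::size_t i = 2; i < win_path.size(); ++i) {
    out += (win_path[i] == '\\') ? '/' : win_path[i];
  }
  return out;
}

// Hypothetical helper: report whether the translated path can be opened
// for reading from the current (WSL) environment.
inline bool visible_in_wsl(const std::string& win_path) {
  const std::string path = windows_to_wsl_path(win_path);
  if (path.empty()) return false;
  std::ifstream file(path);
  return file.good();
}
```

Calling `visible_in_wsl` on the temp-file path right after the write step in backend.hpp would show whether the file was never created or was created somewhere WSL cannot see.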

@Lookforworld
Author

@ghostplant
OK!
Thank you very much! If I manage to fix it, I will let you know.
