
H2O-GPT on AMD GPUs (ROCm) #1812

Open

rohitnanda1443 opened this issue Aug 24, 2024 · 4 comments

Comments

@rohitnanda1443 commented Aug 24, 2024

Hi, how can we run h2oGPT on AMD GPUs using the AMD ROCm libraries?

One can easily run an inference server with Ollama using ROCm, so h2oGPT only needs to talk to this Ollama server for inference.

Problem: the h2oGPT install fails because it keeps finding CUDA during install. Some guidance on editing the install script for ROCm would be helpful.

Method:

  1. LLM runs on an inference server using ROCm
  2. h2oGPT sends LLM requests to the inference server
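As a sanity check for step 1, the Ollama server exposes an OpenAI-compatible endpoint that a client like h2oGPT can point at. The sketch below assumes Ollama's default port 11434 and a hypothetical model name `llama3.1`; it only sends the request if the server is actually reachable:

```shell
# Sketch: verify the Ollama OpenAI-compatible endpoint an h2oGPT client would use.
# Assumptions: Ollama default port 11434, hypothetical model name "llama3.1".
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434/v1}"
PAYLOAD='{"model": "llama3.1", "messages": [{"role": "user", "content": "Hello"}]}'

# Only hit the server if it is reachable; otherwise just show the request we would send.
if curl -fsS "${OLLAMA_URL%/v1}/api/tags" >/dev/null 2>&1; then
  curl -s "$OLLAMA_URL/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
else
  echo "Ollama not reachable; would POST to $OLLAMA_URL/chat/completions with: $PAYLOAD"
fi
```

If this returns a chat completion, the ROCm side is done and only the h2oGPT client configuration remains.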
pseudotensor added a commit that referenced this issue Aug 25, 2024
@pseudotensor
Collaborator

Can you share what you mean by it finding CUDA during install and failing? Logs etc. would help.

I adjusted one block in docs/linux_install.sh where CUDA is mentioned.
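For reference, a ROCm-friendly variant of such a CUDA block might look like the sketch below. The `rocm6.1` index URL on the official PyTorch wheel server is real; everything else (the `run` helper, `APPLY` guard) is illustrative scaffolding so the sketch prints commands instead of executing them by default:

```shell
# Sketch of a ROCm-friendly install block (untested outline, not the shipped script).
PIP="pip"
ROCM_INDEX="https://download.pytorch.org/whl/rocm6.1"

run() {
  # Print each command; execute only when APPLY=1, so the sketch is safe to source.
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

run $PIP install torch torchvision --index-url "$ROCM_INDEX"
# Do NOT install flash-attn on ROCm: its setup.py calls CUDAExtension and
# aborts when CUDA_HOME is unset, which is exactly the failure in this issue.
```

Run with `APPLY=1` to actually install; without it, the block only echoes what it would do.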

@rohitnanda1443
Author

rohitnanda1443 commented Aug 31, 2024

It should not be uninstalling the ROCm torch build:

```
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
/tmp/pip-install-jav98t1i/flash-attn_c0c8ed92b3c147bfa04d7e6ab7c98f49/setup.py:95: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
  warnings.warn(
Traceback (most recent call last):
  File "", line 2, in
  File "", line 34, in
  File "/tmp/pip-install-jav98t1i/flash-attn_c0c8ed92b3c147bfa04d7e6ab7c98f49/setup.py", line 179, in
    CUDAExtension(
  File "/home/rohit/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1074, in CUDAExtension
    library_dirs += library_paths(cuda=True)
  File "/home/rohit/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1201, in library_paths
    if (not os.path.exists(_join_cuda_home(lib_dir)) and
  File "/home/rohit/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2407, in _join_cuda_home
    raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

torch.__version__ = 2.2.1+cu121

[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Attempting uninstall: torch
  Found existing installation: torch 2.5.0.dev20240822+rocm6.1
  Uninstalling torch-2.5.0.dev20240822+rocm6.1:
    Successfully uninstalled torch-2.5.0.dev20240822+rocm6.1
Attempting uninstall: sse_starlette
  Found existing installation: sse-starlette 0.10.3
  Uninstalling sse-starlette-0.10.3:
    Successfully uninstalled sse-starlette-0.10.3
Attempting uninstall: torchvision
  Found existing installation: torchvision 0.20.0.dev20240823+rocm6.1
  Uninstalling torchvision-0.20.0.dev20240823+rocm6.1:
    Successfully uninstalled torchvision-0.20.0.dev20240823+rocm6.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tts 0.22.0 requires numpy==1.22.0; python_version <= "3.10", but you have numpy 1.26.4 which is incompatible.
tts 0.22.0 requires pandas<2.0,>=1.4, but you have pandas 2.2.2 which is incompatible.
awscli 1.34.5 requires docutils<0.17,>=0.10, but you have docutils 0.21.2 which is incompatible.
fiftyone 0.25.0 requires sse-starlette<1,>=0.10.3, but you have sse-starlette 2.1.3 which is incompatible.
torchaudio 2.4.0.dev20240823+rocm6.1 requires torch==2.5.0.dev20240822, but you have torch 2.2.1 which is incompatible.
vllm 0.5.5+rocm614 requires pydantic>=2.8, but you have pydantic 2.7.0 which is incompatible.
Successfully installed docutils-0.21.2 pandas-2.2.2 pydantic-2.7.0 pydantic-core-2.18.1 pypandoc_binary-1.13 sse_starlette-2.1.3 torch-2.2.1 torchvision-0.17.1
```
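One way to stop pip from replacing the ROCm wheels during later install steps is a constraints file exported through pip's `PIP_CONSTRAINT` environment variable (a standard pip feature). The version strings below are the ones from the log in this issue; adjust to whatever `python -c 'import torch; print(torch.__version__)'` reports on your machine:

```shell
# Pin the already-installed ROCm wheels so subsequent 'pip install' steps cannot
# silently swap in CUDA builds. Versions taken from the log in this issue.
CONSTRAINTS=/tmp/rocm-constraints.txt
cat > "$CONSTRAINTS" <<'EOF'
torch==2.5.0.dev20240822+rocm6.1
torchvision==0.20.0.dev20240823+rocm6.1
EOF

# pip honors PIP_CONSTRAINT for every install in this shell session.
export PIP_CONSTRAINT="$CONSTRAINTS"
cat "$CONSTRAINTS"
```

With the pin in place, an install step that needs a conflicting torch version fails loudly instead of uninstalling the ROCm build.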

@rohitnanda1443
Author

Do we have a ROCm Docker image?

@pseudotensor
Collaborator

We don't build one, but you can build your own.
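A self-built image could start from AMD's official `rocm/pytorch` base image on Docker Hub. The Dockerfile below is an untested outline, not a supported recipe; the repo URL is the real h2ogpt repository, but the install step will likely need ROCm-specific pins so pip does not pull CUDA torch wheels:

```shell
# Sketch: write a minimal ROCm Dockerfile for h2oGPT and (optionally) build it.
mkdir -p /tmp/h2ogpt-rocm
cat > /tmp/h2ogpt-rocm/Dockerfile <<'EOF'
# rocm/pytorch is AMD's official PyTorch-on-ROCm base image.
FROM rocm/pytorch:latest
WORKDIR /workspace
RUN git clone https://github.com/h2oai/h2ogpt.git
WORKDIR /workspace/h2ogpt
# Untested: this step likely needs ROCm-specific constraints so pip
# does not replace the base image's ROCm torch with CUDA wheels.
RUN pip install -r requirements.txt
EOF

cat /tmp/h2ogpt-rocm/Dockerfile
# Build only when explicitly requested, since the build is long and untested:
if [ "${BUILD:-0}" = "1" ]; then docker build -t h2ogpt-rocm /tmp/h2ogpt-rocm; fi
```

Run with `BUILD=1` to attempt the actual `docker build`; otherwise the sketch just writes and prints the Dockerfile.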
