current workflow failed on AMD MI300X with rocm 6.2 and pytorch 2.4 #48

Closed
gfursin opened this issue Oct 16, 2024 · 2 comments

Comments

@gfursin

gfursin commented Oct 16, 2024

Hi,
I finally found some time to try the SCC'24 tutorial to run SDXL on AMD MI300X. The workflow resolved all dependencies but failed in LoadGen. I attached the CM logs and deps.
Have you seen this error? Any idea what is happening?
Which PyTorch version did you try?
Thank you,
Grigori
error-rocm.txt
error-rocm-deps.txt

Extra ref: mlcommons/cm4mlops#300

@arjunsuresh
Contributor

I'm not sure of the exact torch version used, as the runs were done by the AMD team. However, the error below looks like a HIP driver installation issue. CM does not install or detect drivers for AMD GPUs the way it does for Nvidia GPUs, because we don't have an AMD test system.

```
  File "/persistent_storage/gfursin/cm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available
```
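The `RuntimeError` above is raised when a ROCm build of PyTorch cannot see any HIP device. As a rough first diagnostic (a sketch, not part of CM), the script below collects a few facts that commonly explain it: whether the ROCm device nodes `/dev/kfd` and `/dev/dri` exist and are accessible (inside a container they must be passed through explicitly), whether `HIP_VISIBLE_DEVICES` or `ROCR_VISIBLE_DEVICES` are set, and what PyTorch itself reports. The function name `hip_diagnostics` is hypothetical, and the set of checks is an assumption rather than a complete driver health check.

```python
import importlib.util
import os


def hip_diagnostics() -> dict:
    """Hypothetical helper: gather facts that often explain 'No HIP GPUs are available'."""
    report = {
        # The ROCm kernel driver exposes /dev/kfd; GPU render nodes live under /dev/dri.
        "dev_kfd_exists": os.path.exists("/dev/kfd"),
        "dev_dri_exists": os.path.isdir("/dev/dri"),
        # The process needs read/write access to /dev/kfd (returns False if it is missing).
        "dev_kfd_rw": os.access("/dev/kfd", os.R_OK | os.W_OK),
        # Visibility masks can hide every GPU from the process if set incorrectly.
        "HIP_VISIBLE_DEVICES": os.environ.get("HIP_VISIBLE_DEVICES"),
        "ROCR_VISIBLE_DEVICES": os.environ.get("ROCR_VISIBLE_DEVICES"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        # A version string on ROCm builds; None on CUDA or CPU-only builds.
        report["torch_hip_version"] = torch.version.hip
        # On ROCm builds, the torch.cuda namespace is backed by HIP.
        report["gpu_available"] = torch.cuda.is_available()
        report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in hip_diagnostics().items():
        print(f"{key}: {value}")
```

On a working ROCm setup, `torch_hip_version` should be a version string and `gpu_count` should be greater than zero. A `None` HIP version suggests a CUDA or CPU-only PyTorch wheel was installed instead of the ROCm one, while a missing or inaccessible `/dev/kfd` points at the driver or container configuration.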

@gfursin changed the title "current workflow failed on AMD MI300X with rocm 6.2 and pytorch 2.6" to "current workflow failed on AMD MI300X with rocm 6.2 and pytorch 2.4" on Oct 24, 2024
@gfursin
Author

gfursin commented Oct 24, 2024

Sure. We will investigate it further internally. Thanks!

@gfursin gfursin closed this as completed Oct 24, 2024