Triton: use CUDA 12.3 tools from the base image #656

Merged
olupton merged 4 commits into main from olupton/triton-should-not-use-12.4-yet
Mar 25, 2024

Conversation

olupton
Collaborator

@olupton olupton commented Mar 22, 2024

Previously, Triton would download its own copies of ptxas, cuobjdump and nvdisasm:
https://github.com/openxla/triton/blob/cl617459344/python/setup.py#L373-L393

This began to cause problems when those versions were bumped to CUDA 12.4, which made Triton generate PTX with version number 8.4. Compiling that PTX inside XLA with the ptxas from the base container (CUDA 12.3) then produced errors:

CustomCall failed: ptxas exited with non-zero error code 65280, output: ptxas /tmp/tempfile-aac66f5d464c-1e8add55-32-61414d9c202e5, line 5;
fatal   : Unsupported .version 8.4; current version is '8.3'
ptxas fatal   : Ptx assembly aborted due to errors

in the nightly tests, which are taken from JAX-Triton.
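
The mismatch in the error above can be detected before ptxas runs by reading the `.version` directive from the generated PTX and comparing it with the newest PTX version the installed ptxas accepts (8.3 in the log above). The following is a minimal sketch only: the function names are hypothetical and are not part of Triton or XLA, and the supported version is assumed to be known by the caller.

```shell
# Hypothetical helpers, not part of Triton or XLA.

# Print the ISA version declared by a PTX file, e.g. "8.4".
ptx_declared_version() {
  awk '$1 == ".version" { print $2; exit }' "$1"
}

# Succeed if version $1 <= version $2, both given as "major.minor".
# The body runs in a subshell so the temporaries stay local.
version_le() (
  a_major=${1%%.*}; a_minor=${1#*.}
  b_major=${2%%.*}; b_minor=${2#*.}
  [ "$a_major" -lt "$b_major" ] ||
    { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -le "$b_minor" ]; }
)
```

A caller could then run something like `version_le "$(ptx_declared_version kernel.ptx)" 8.3 || echo "PTX is newer than ptxas supports" >&2`, where `kernel.ptx` is a placeholder file name.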

Setting environment variables like TRITON_PTXAS_PATH has two effects:

  • it blocks downloading other versions during setup.py
  • at runtime, it is the highest precedence search location
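
As a rough illustration of the two effects listed above, the variables could be exported once so that both the build and later runtime lookups see them. This is a sketch under assumptions: the helper name is hypothetical, `/usr/local/cuda/bin` in the usage note is an assumed location for the base image's tools, and the `TRITON_CUOBJDUMP_PATH` and `TRITON_NVDISASM_PATH` names are assumed by analogy with `TRITON_PTXAS_PATH`, which is the only name given in this thread.

```shell
# Hypothetical helper: point Triton at the CUDA tools found in one directory.
use_cuda_tools_from() {
  export TRITON_PTXAS_PATH="$1/ptxas"
  # The two names below are assumed by analogy with TRITON_PTXAS_PATH.
  export TRITON_CUOBJDUMP_PATH="$1/cuobjdump"
  export TRITON_NVDISASM_PATH="$1/nvdisasm"
}
```

For example, `use_cuda_tools_from /usr/local/cuda/bin` before building Triton would leave the same variables set for any process started from that shell.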

If Triton starts depending on CUDA 12.4 features before the base container is updated to CUDA 12.4, problems may resurface.

Thanks to @andportnoy for help debugging.

Collaborator

@yhtang yhtang left a comment

Given that the paths are hard-coded, can we add some test so that we get notified if the binaries change locations? e.g.

RUN if [[ ! -x ${TRITON_PTXAS_PATH} ]]; then <THROW-ERROR>; fi
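
One possible way to fill in the `<THROW-ERROR>` placeholder above is a small shell check that reports every path that is not an executable file and returns a non-zero status, which would make a Dockerfile `RUN` step fail. This is only a sketch; the function name is hypothetical and it is not necessarily the check that was merged.

```shell
# Hypothetical helper: fail if any given path is not an executable file.
require_executables() {
  status=0
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -x "$path" ]; then
      echo "error: '$path' is not an executable file" >&2
      status=1
    fi
  done
  return "$status"
}
```

It could be called as `require_executables "${TRITON_PTXAS_PATH}"`, listing one argument per tool path that is hard-coded in the image.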

@olupton olupton requested a review from yhtang March 22, 2024 13:25
andportnoy
andportnoy previously approved these changes Mar 22, 2024
Contributor

@andportnoy andportnoy left a comment

Thank you for the fix! Left a minor suggestion adding some context.

.github/container/Dockerfile.triton (outdated, resolved)
@nouiz
Collaborator

nouiz commented Mar 22, 2024 via email

andportnoy
andportnoy previously approved these changes Mar 22, 2024
Contributor

@andportnoy andportnoy left a comment

Great suggestion, Andrey. Good job.

.github/container/Dockerfile.triton (outdated, resolved)
@olupton olupton force-pushed the olupton/triton-should-not-use-12.4-yet branch from 4289497 to fe2f64f Compare March 25, 2024 08:13
@olupton
Collaborator Author

olupton commented Mar 25, 2024

@nouiz

Should we fix this upstream?

I'm not sure what that would look like? What did you have in mind?

@olupton olupton merged commit 52b2c10 into main Mar 25, 2024
167 of 171 checks passed
@olupton olupton deleted the olupton/triton-should-not-use-12.4-yet branch March 25, 2024 17:21
@nouiz
Collaborator

nouiz commented Mar 25, 2024

@nouiz

Should we fix this upstream?

I'm not sure what that would look like? What did you have in mind?

I thought it was for Triton via Pallas. All is good.

4 participants