Merge main into protected branch 24.10-devel
#1062
Closed
Conversation
1. `xla_gpu_enable_triton_gemm` is still needed.
2. Removed other deprecated XLA flags, e.g. `xla_gpu_enable_triton_softmax_fusion`.
3. Removed XLA flags that are now turned on by default, e.g. `xla_enable_async_all_gather`.
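The effect of the cleanup above can be sketched as follows. This is illustrative only: the authoritative flag list lives in the repo's CI scripts, and the exact set kept here is an assumption.

```shell
# Kept: xla_gpu_enable_triton_gemm (still needed).
# Dropped: xla_gpu_enable_triton_softmax_fusion (deprecated) and flags that
# are now on by default, e.g. xla_enable_async_all_gather.
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=true"
echo "XLA_FLAGS=${XLA_FLAGS}"
```

Dropping default-on flags keeps `XLA_FLAGS` short and avoids warnings when a deprecated flag is eventually removed from XLA entirely.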
Fixed the TensorBoard directory path after a recent change in MaxText: AI-Hypercomputer/maxtext#863
Example as of 2024-08-28:
```
$ docker run --entrypoint='' --rm -it ghcr.io/nvidia/jax:pax-2024-08-28 ls -lah /opt/jaxlibs
total 20K
drwxr-xr-x 1 root root 4.0K Aug 28 09:43 .
drwxr-xr-x 1 root root 4.0K Aug 28 10:04 ..
drwx------ 1 root root 4.0K Aug 28 09:43 jax_gpu_pjrt
drwx------ 1 root root 4.0K Aug 28 09:43 jax_gpu_plugin
drwx------ 1 root root 4.0K Aug 28 09:43 jaxlib
```
Signed-off-by: Terry Kong <terryk@nvidia.com>
Provide an option to run XLA cuDNN flash attention as an alternative to TE cuDNN flash attention.
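A hedged sketch of how such a switch between the two flash-attention backends might look. The `USE_XLA_FLASH_ATTN` variable is purely hypothetical; `NVTE_FUSED_ATTN` (Transformer Engine) and `--xla_gpu_enable_cudnn_fmha` (XLA) are real knobs, but the PR's actual mechanism is not shown here and may differ.

```shell
if [ "${USE_XLA_FLASH_ATTN:-0}" = "1" ]; then
  # Assumed: disable TE fused attention and let XLA's cuDNN fMHA take over.
  export NVTE_FUSED_ATTN=0
  export XLA_FLAGS="--xla_gpu_enable_cudnn_fmha=true"
else
  # Default: TE cuDNN flash attention.
  export NVTE_FUSED_ATTN=1
fi
echo "NVTE_FUSED_ATTN=${NVTE_FUSED_ATTN}"
```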
Forced by this change in the JAX build system: jax-ml/jax#23787
Co-authored-by: Olli Lupton <olupton@nvidia.com>
olupton previously approved these changes on Sep 26, 2024
Moves XLA flags from model CI into their own files that can be sourced. Each file prints what it sets when sourced. Some files source other files; this was intentional, to avoid introducing symlinks into the repo, which can cause platform issues (e.g. on Windows).
Signed-off-by: Terry Kong <terryk@nvidia.com>
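The source-instead-of-symlink pattern described above can be sketched like this. File names and paths are illustrative, not taken from the repo.

```shell
mkdir -p /tmp/xla-flags

# A common flags file: sourcing it both sets and prints the flags.
cat > /tmp/xla-flags/common.env <<'EOF'
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=true"
echo "common.env set XLA_FLAGS=${XLA_FLAGS}"
EOF

# A model-specific file sources the common file rather than symlinking to it,
# sidestepping symlink portability issues (e.g. on Windows checkouts).
cat > /tmp/xla-flags/model.env <<'EOF'
. /tmp/xla-flags/common.env
echo "model.env done"
EOF

. /tmp/xla-flags/model.env
```

Because every file echoes what it sets, a CI log shows the full effective flag set even when files are chained through several levels of sourcing.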
The latest MaxText uses `pathwayutils`, which it adds as a dependency. We need to add it to our manifest.yaml to resolve a reference issue during final installation.
…tives (#1073) This adds logic to treat `dynamic[-update]-slice` operations that have a source/destination operand in the host memory space as communication operations, labelling them as single-device "collectives". The goal is to improve support for analysing profiles of executions that include offloading to host memory. Also fixes support for nsys 2024.6 by applying the same patch as for 2024.5, which adds the thread ID.
To stay up to date before the code freeze for the NGC 24.10 release.