Merge main into protected branch 24.10-devel
#1062
Closed
Conversation
1. `xla_gpu_enable_triton_gemm` is still needed.
2. Removed other deprecated XLA flags, e.g. `xla_gpu_enable_triton_softmax_fusion`.
3. Removed XLA flags that are now turned on by default, e.g. `xla_enable_async_all_gather`.
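The effect of the cleanup above can be sketched as follows. This is illustrative only: the authoritative flag list lives in the repo's CI scripts, and the exact set kept here is an assumption.

```shell
# Kept: xla_gpu_enable_triton_gemm (still needed).
# Dropped: xla_gpu_enable_triton_softmax_fusion (deprecated) and flags that
# are now on by default, e.g. xla_enable_async_all_gather.
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=true"
echo "XLA_FLAGS=${XLA_FLAGS}"
```

Dropping default-on flags keeps `XLA_FLAGS` short and avoids warnings when a deprecated flag is eventually removed from XLA entirely.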
Fixed the TensorBoard directory path after a recent change in MaxText: AI-Hypercomputer/maxtext#863
Example as of 2024-08-28:
```
$ docker run --entrypoint='' --rm -it ghcr.io/nvidia/jax:pax-2024-08-28 ls -lah /opt/jaxlibs
total 20K
drwxr-xr-x 1 root root 4.0K Aug 28 09:43 .
drwxr-xr-x 1 root root 4.0K Aug 28 10:04 ..
drwx------ 1 root root 4.0K Aug 28 09:43 jax_gpu_pjrt
drwx------ 1 root root 4.0K Aug 28 09:43 jax_gpu_plugin
drwx------ 1 root root 4.0K Aug 28 09:43 jaxlib
```
Signed-off-by: Terry Kong <terryk@nvidia.com>
Provide an option to run XLA cuDNN flash attention as an alternative to TE cuDNN flash attention.
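A hedged sketch of how such a switch between the two flash-attention backends might look. The `USE_XLA_FLASH_ATTN` variable is purely hypothetical; `NVTE_FUSED_ATTN` (Transformer Engine) and `--xla_gpu_enable_cudnn_fmha` (XLA) are real knobs, but the PR's actual mechanism is not shown here and may differ.

```shell
if [ "${USE_XLA_FLASH_ATTN:-0}" = "1" ]; then
  # Assumed: disable TE fused attention and let XLA's cuDNN fMHA take over.
  export NVTE_FUSED_ATTN=0
  export XLA_FLAGS="--xla_gpu_enable_cudnn_fmha=true"
else
  # Default: TE cuDNN flash attention.
  export NVTE_FUSED_ATTN=1
fi
echo "NVTE_FUSED_ATTN=${NVTE_FUSED_ATTN}"
```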
Forced by this change in the JAX build system: jax-ml/jax#23787
Co-authored-by: Olli Lupton <olupton@nvidia.com>
olupton previously approved these changes on Sep 26, 2024
Moves XLA flags from model CI into their own files that can be sourced. Each file prints what it sets when sourced. Some files source other files; this was intentional, to avoid introducing symlinks into the repo, which can cause platform issues (e.g. on Windows).
Signed-off-by: Terry Kong <terryk@nvidia.com>
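The source-instead-of-symlink pattern described above can be sketched like this. File names and paths are illustrative, not taken from the repo.

```shell
mkdir -p /tmp/xla-flags

# A common flags file: sourcing it both sets and prints the flags.
cat > /tmp/xla-flags/common.env <<'EOF'
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=true"
echo "common.env set XLA_FLAGS=${XLA_FLAGS}"
EOF

# A model-specific file sources the common file rather than symlinking to it,
# sidestepping symlink portability issues (e.g. on Windows checkouts).
cat > /tmp/xla-flags/model.env <<'EOF'
. /tmp/xla-flags/common.env
echo "model.env done"
EOF

. /tmp/xla-flags/model.env
```

Because every file echoes what it sets, a CI log shows the full effective flag set even when files are chained through several levels of sourcing.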
The latest MaxText uses `pathwayutils`, which it adds as a dependency. We need to add it to our manifest.yaml to resolve a reference issue during final installation.
…tives (#1073) This adds logic to treat `dynamic[-update]-slice` operations that have a source/destination operand in the host memory space as communication operations, labelling them as single-device "collectives". The goal is to improve support for analysing profiles of executions that include offloading to host memory. Also fixes support for nsys 2024.6 by applying the same patch as for 2024.5, which adds the thread ID.
To stay up to date before the code freeze for the NGC 24.10 release.