Merge main into protected branch 24.10-devel
#1062
Commits on Sep 5, 2024
-
remove deprecated XLA flag (#1010)
1. `xla_gpu_enable_triton_gemm` is still needed.
2. Removed some other deprecated XLA flags: `xla_gpu_enable_triton_softmax_fusion`.
3. Also removed some XLA flags that are now turned on by default, such as `xla_enable_async_all_gather`.
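As a rough illustration of the cleanup described above, the sketch below drops one deprecated flag from an `XLA_FLAGS` string while keeping `xla_gpu_enable_triton_gemm`. The helper name `strip_xla_flag` and the `=false` values are assumptions made for the example; they are not taken from the commit.

```shell
# Hypothetical helper: remove one flag from a space-separated flag string.
# $1: flag name without leading dashes; $2: the flag string to filter.
strip_xla_flag() {
  name="$1"
  result=""
  for flag in $2; do
    case "$flag" in
      --"$name"|--"$name"=*) ;;                 # drop the deprecated flag
      *) result="${result:+$result }$flag" ;;   # keep everything else
    esac
  done
  printf '%s\n' "$result"
}

# Assumed starting value, for illustration only.
XLA_FLAGS="--xla_gpu_enable_triton_gemm=false --xla_gpu_enable_triton_softmax_fusion=false"
XLA_FLAGS="$(strip_xla_flag xla_gpu_enable_triton_softmax_fusion "$XLA_FLAGS")"
export XLA_FLAGS
echo "XLA_FLAGS=$XLA_FLAGS"
```

Filtering by flag name rather than by exact string means the flag is removed whatever value it was given.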
Commit ecacd5b
Commits on Sep 6, 2024
-
fix tensorboard events dir path (#1032)
Fixed the tensorboard dir path after a recent change in MaxText software: AI-Hypercomputer/maxtext#863
Commit 44b4dfe
-
Makes jaxlib wheel dirs readable for non-root users (#1023)
Example as of 8-28-2024, where the wheel directories under `/opt/jaxlibs` are accessible only to root:
```
$ docker run --entrypoint='' --rm -it ghcr.io/nvidia/jax:pax-2024-08-28 ls -lah /opt/jaxlibs
total 20K
drwxr-xr-x 1 root root 4.0K Aug 28 09:43 .
drwxr-xr-x 1 root root 4.0K Aug 28 10:04 ..
drwx------ 1 root root 4.0K Aug 28 09:43 jax_gpu_pjrt
drwx------ 1 root root 4.0K Aug 28 09:43 jax_gpu_plugin
drwx------ 1 root root 4.0K Aug 28 09:43 jaxlib
```
Signed-off-by: Terry Kong <terryk@nvidia.com>
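The commit message does not say how the permissions were changed. One common way to get this effect is `chmod -R a+rX`, sketched here on a temporary directory standing in for `/opt/jaxlibs`. The file name `example.whl` is hypothetical.

```shell
# Stand-in for /opt/jaxlibs; the real path is not touched in this sketch.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/jaxlib"
touch "$demo_root/jaxlib/example.whl"      # hypothetical wheel file name
chmod 600 "$demo_root/jaxlib/example.whl"
chmod 700 "$demo_root/jaxlib"              # reproduce the root-only mode from the listing

# a+r grants read to everyone; capital X adds execute (traversal) only to
# directories and to files that already have an execute bit set.
chmod -R a+rX "$demo_root"

dir_mode="$(ls -ld "$demo_root/jaxlib" | cut -c1-10)"
file_mode="$(ls -l "$demo_root/jaxlib/example.whl" | cut -c1-10)"
echo "$dir_mode $file_mode"
rm -rf "$demo_root"
```

With these starting modes the directory ends up as `drwxr-xr-x` and the file as `-rw-r--r--`, so non-root users can list the directory and read the wheel without the wheel becoming executable.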
Commit f808df5
Commits on Sep 9, 2024
-
Commit f116054
Commits on Sep 18, 2024
-
Add an option to test-pax.sh to enable XLA cuDNN flash attention (#1045)
Provide an option to run XLA cuDNN flash attention as an alternative to TE cuDNN flash attention.
Commit 056a3b0
Commits on Sep 25, 2024
-
Commit 57919e0
-
Forced by this change in JAX build system: jax-ml/jax#23787
Commit 3a2e8c8
-
Add CI argument for user-defined CUDA base image (#1013)
Co-authored-by: Olli Lupton <olupton@nvidia.com>
Commit ccededf
Commits on Sep 26, 2024
-
Moves XLA flags from model CI into their own files that can be sourced. Each file prints what it sets when sourced. Some files source other files; this is intentional, to avoid introducing symlinks into the repo, which can cause platform issues (for example on Windows).
Signed-off-by: Terry Kong <terryk@nvidia.com>
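The layout described above might look roughly like the following sketch, in which one flag file reuses another by sourcing it instead of being a symlink to it. The file names `base.env` and `model.env`, the `FLAGS_DIR` variable, and the flag value are all assumptions for illustration; the actual files in the repository may differ.

```shell
# Temporary directory standing in for the repo's flag-file directory.
FLAGS_DIR="$(mktemp -d)"

# A base flag file: sets XLA_FLAGS and prints what it set.
cat > "$FLAGS_DIR/base.env" <<'EOF'
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
echo "base.env: XLA_FLAGS=$XLA_FLAGS"
EOF

# A model-specific file that reuses the base flags by sourcing base.env,
# rather than being a symlink to it.
cat > "$FLAGS_DIR/model.env" <<'EOF'
. "$FLAGS_DIR/base.env"
echo "model.env: sourced base.env"
EOF

# Source the model file in the current shell so the export takes effect here.
. "$FLAGS_DIR/model.env"
rm -rf "$FLAGS_DIR"
```

Because the files are sourced and not executed, the exported `XLA_FLAGS` stays set in the calling shell after the temporary files are removed.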
Commit 1a3febb
Commits on Sep 27, 2024
-
Add pathwaysutils for MaxText to manifest file (#1065)
The latest MaxText depends on `pathwaysutils`. It needs to be added to our manifest.yaml file so that the reference resolves during final installation.
Commit 3638a66
Commits on Oct 1, 2024
-
nsys-jax post-processing: treat host-device copies as 1-device collectives (#1073)
This adds logic to treat `dynamic[-update]-slice` operations that have a source/destination operand in the host memory space as communication operations, labelling them as single-device "collectives". The goal is to improve support for analysing profiles of executions that include offloading to host memory. It also fixes use with nsys 2024.6 by applying the same patch as for 2024.5, which adds the thread ID.
Commit ef3fd66