Use pip-compile to help with consistent Python dependency resolution #371

Merged
merged 151 commits into from
Nov 21, 2023
Changes from 143 commits
abb6f97
wip
yhtang Oct 23, 2023
068aab9
fix typo
yhtang Oct 23, 2023
fb7cf0b
wip
yhtang Oct 24, 2023
4b0c406
wip
yhtang Oct 24, 2023
4044450
wip
yhtang Oct 24, 2023
38da8a1
wip
yhtang Oct 24, 2023
9ad94c8
wip
yhtang Oct 24, 2023
1aac03a
use full clone
yhtang Oct 25, 2023
42ec9bb
update pip-tools script
yhtang Oct 25, 2023
2f02023
update pip-tools script
yhtang Oct 25, 2023
ad12e78
update pip-tools script
yhtang Oct 25, 2023
bdc34c8
fix t5x dockerfile
yhtang Oct 25, 2023
a5c478e
fix t5x dockerfile
yhtang Oct 25, 2023
b4fd2d8
test flax hack
yhtang Oct 25, 2023
fe6e3d8
flax hack
yhtang Oct 25, 2023
ad83f21
hack for git top of tree Flax dependency
yhtang Nov 2, 2023
b7e1a6c
update URL req
yhtang Nov 3, 2023
0dd9617
update
yhtang Nov 3, 2023
dc44fe4
editability
yhtang Nov 3, 2023
acf2acf
editability
yhtang Nov 3, 2023
027cd63
editability
yhtang Nov 3, 2023
7c10ffa
editability
yhtang Nov 3, 2023
f1046d2
editability
yhtang Nov 3, 2023
bbf2c21
wip
yhtang Nov 3, 2023
cd0d5d1
wip
yhtang Nov 6, 2023
e973c9a
wip
yhtang Nov 6, 2023
1f6d4f1
fix shell
yhtang Nov 6, 2023
6d9ae00
fix arg order
yhtang Nov 6, 2023
cc66ce6
remove standalone TE build
yhtang Nov 6, 2023
fe8708a
build TE wheel in JAX
yhtang Nov 6, 2023
b1e332e
pax wip
yhtang Nov 6, 2023
62ca85b
add pax build
yhtang Nov 6, 2023
bc6c702
add pax build
yhtang Nov 6, 2023
ff6ec2a
fix CI
yhtang Nov 6, 2023
67df9b8
debug pax build
yhtang Nov 6, 2023
d67966a
debug pax build
yhtang Nov 6, 2023
1ba1bff
debug pax build
yhtang Nov 6, 2023
e8f87d2
fix EOF
yhtang Nov 6, 2023
24b4931
redesign workflow
yhtang Nov 7, 2023
e2c34b4
fix job step id
yhtang Nov 7, 2023
7ee441b
arm64 build
yhtang Nov 7, 2023
4f4d909
arm64 build
yhtang Nov 7, 2023
b777244
arm64 build
yhtang Nov 8, 2023
c4c22af
arm64 build
yhtang Nov 8, 2023
736246a
add sitrep to base build
yhtang Nov 8, 2023
bc4b6db
lingvo
yhtang Nov 8, 2023
ce1cf94
lingvo
yhtang Nov 8, 2023
9ac7367
lingvo
yhtang Nov 8, 2023
44d3026
refactor pax arm64 build
yhtang Nov 8, 2023
5c47fcb
refactor pax arm64 build wip
yhtang Nov 8, 2023
139e539
refactor pax arm64 build wip
yhtang Nov 8, 2023
5e2a5ad
refactor pax arm64 build wip
yhtang Nov 8, 2023
cb99d71
refactor pax arm64 build wip
yhtang Nov 9, 2023
0959264
pax arm64
yhtang Nov 9, 2023
314db99
redesign CI
yhtang Nov 9, 2023
8943e9f
redesign CI
yhtang Nov 9, 2023
94115ba
refactor CI
yhtang Nov 9, 2023
ee11851
refactor CI
yhtang Nov 9, 2023
6b6fc92
refactor CI
yhtang Nov 9, 2023
cf66cfd
refactor CI
yhtang Nov 9, 2023
9fd4503
refactor CI
yhtang Nov 9, 2023
f2e80a1
file permission
yhtang Nov 9, 2023
7db2f84
refactor CI
yhtang Nov 9, 2023
7090349
refactor CI
yhtang Nov 9, 2023
6900d86
refactor CI
yhtang Nov 9, 2023
06327c1
refactor CI
yhtang Nov 9, 2023
618a3f5
refactor CI
yhtang Nov 9, 2023
c659d3c
refactor CI
yhtang Nov 9, 2023
8395604
refactor CI
yhtang Nov 9, 2023
2ab0cc9
refactor CI
yhtang Nov 9, 2023
69c17fc
refactor CI
yhtang Nov 9, 2023
d9400d3
refactor CI
yhtang Nov 9, 2023
33dc9ac
refactor CI
yhtang Nov 9, 2023
4fc18dc
fix output tag order
yhtang Nov 9, 2023
b806a5a
t5x arm64 build not ready yet
yhtang Nov 9, 2023
8391f74
nightly T5X build
yhtang Nov 9, 2023
984a19a
nightly T5X build
yhtang Nov 9, 2023
715f62c
fix TE/T5X bug
yhtang Nov 9, 2023
7f06cb8
add TE examples and tests to wheel
yhtang Nov 9, 2023
92b6d0a
allow TE parallel build
yhtang Nov 9, 2023
c4f5b84
jax publish
yhtang Nov 9, 2023
bcfd0e4
rename staging to mealkit
yhtang Nov 9, 2023
12a2fd6
fix nightly
yhtang Nov 9, 2023
1925b14
fix nightly
yhtang Nov 9, 2023
6e71c64
bug fix
yhtang Nov 10, 2023
adb10da
bug fix
yhtang Nov 10, 2023
b956b30
bug fix
yhtang Nov 10, 2023
c727607
fix
yhtang Nov 10, 2023
d450ceb
fix TE test
yhtang Nov 10, 2023
fc2c6e6
fix pax test
yhtang Nov 10, 2023
f9c6cd3
fix TE test
yhtang Nov 10, 2023
7230589
merge CI yaml
yhtang Nov 10, 2023
429ae4d
fix arg
yhtang Nov 10, 2023
de32e4f
rerun TE/PAX test
yhtang Nov 10, 2023
43a57c6
Merge branch 'main' of github.com:NVIDIA/JAX-Toolbox into add-pip-com…
yhtang Nov 10, 2023
ab73f6b
fix TE multi-device test
yhtang Nov 10, 2023
9eb97e8
fix lzma build issue
yhtang Nov 10, 2023
772f606
edit TE test name
yhtang Nov 11, 2023
fcb29b4
fix TE arm64 test install error
yhtang Nov 13, 2023
22d400b
remove --install option from get-source.sh
yhtang Nov 13, 2023
e9f074f
fix TE arm64 test install error
yhtang Nov 13, 2023
602002f
disable sandbox
yhtang Nov 13, 2023
12a57eb
i'm jet-lagged
yhtang Nov 13, 2023
dbaba5b
use Pax image for TE testing
yhtang Nov 13, 2023
ccafb52
Fix job dependency
yhtang Nov 13, 2023
6974a3a
Add nightly rosetta build and test
DwarKapex Nov 16, 2023
c5f8f23
wip: fix typo
DwarKapex Nov 16, 2023
1d9d282
wip: build sandbox on dispatch
DwarKapex Nov 16, 2023
e6bf405
wip: fix build_rosetta
DwarKapex Nov 16, 2023
4c58356
wip: update yaml structure for nightly build of rosseta y5x
DwarKapex Nov 16, 2023
1fe5f69
wip: update yaml structure for nightly build of rosseta t5x
DwarKapex Nov 16, 2023
95f5729
wip: fix yaml structure for nightly build of rosseta t5x
DwarKapex Nov 16, 2023
d89f93f
wip: fix yaml structure for nightly build of rosseta pax
DwarKapex Nov 16, 2023
08fb7f9
wip: fix typo in yaml structure for nightly build of rosseta pax
DwarKapex Nov 16, 2023
a7f95f1
wip: fix typo in yaml structure for nightly build of rosseta pax 2
DwarKapex Nov 16, 2023
2d35714
wip
DwarKapex Nov 16, 2023
5ae2e79
wip:
DwarKapex Nov 16, 2023
ce8ba2d
use the _publish_container reusable workflow for base container weekl…
yhtang Nov 16, 2023
6501ae9
fix base build output arg name error
yhtang Nov 16, 2023
aae5865
wip: add base build
DwarKapex Nov 16, 2023
a98bf46
Merge branch 'add-pip-compile' of github.com:NVIDIA/JAX-Toolbox into …
DwarKapex Nov 16, 2023
94d2db7
wip: add base build
DwarKapex Nov 16, 2023
9bb80b4
wip: add base build
DwarKapex Nov 16, 2023
3442369
wip: build t5x and rosetta/pax
DwarKapex Nov 16, 2023
f567e2c
wip: build all in question
DwarKapex Nov 16, 2023
efb339c
wip: build all in question 2
DwarKapex Nov 16, 2023
dca817c
Build whole pipeline
DwarKapex Nov 16, 2023
356adbf
Merge origin/main
DwarKapex Nov 16, 2023
44d7897
Build rosetta in CI pipeline
DwarKapex Nov 16, 2023
420162f
Build rosetta in CI pipeline
DwarKapex Nov 16, 2023
2a12b05
Debug rosetta build
DwarKapex Nov 16, 2023
507c6c1
Debug rosetta build: correct container
DwarKapex Nov 16, 2023
81d03ef
Debug rosetta build: fix odd issue from docker file
DwarKapex Nov 17, 2023
f546c9b
Full pipelilne 2
DwarKapex Nov 17, 2023
85a8817
Addressed Ann's comments
DwarKapex Nov 17, 2023
630d1df
Full pipelilne: revert _sandbox.yaml
DwarKapex Nov 17, 2023
dc37804
Merge branch 'add-pip-compile' of github.com:NVIDIA/JAX-Toolbox into …
DwarKapex Nov 17, 2023
1cf80c2
Rename nightly rosseta to be able to sun nigthly rosseta pax build ma…
DwarKapex Nov 17, 2023
0885302
Add arch to rosetta-pax build
DwarKapex Nov 17, 2023
a97659e
Add arch to rosetta-pax build
DwarKapex Nov 17, 2023
2744e67
Add arch to rosetta-pax build: fix typo
DwarKapex Nov 17, 2023
858881b
Add arch to rosetta-pax build: fix typo
DwarKapex Nov 17, 2023
2d6a11c
Add arch to rosetta-pax build: fix typo
DwarKapex Nov 17, 2023
332bf98
Addressed Terry's comments
DwarKapex Nov 20, 2023
2bd1999
Pass git username and email thru params
DwarKapex Nov 20, 2023
873b5a5
Address Terry's final comments
DwarKapex Nov 20, 2023
770990d
Address Terry's LGTM comments
DwarKapex Nov 21, 2023
4ae4189
Rosetta dockerfiles to have 2 stages: mealkit and final
DwarKapex Nov 21, 2023
85182ef
Rosetta dockerfiles: remore __pychache__ deletion
DwarKapex Nov 21, 2023
6e7b418
Merge origin/main
DwarKapex Nov 21, 2023
4ea1742
Reverte test_destribution
DwarKapex Nov 21, 2023
18 changes: 17 additions & 1 deletion .github/container/Dockerfile.base
@@ -1,4 +1,5 @@
ARG BASE_IMAGE=nvidia/cuda:12.2.0-devel-ubuntu22.04

FROM ${BASE_IMAGE}

###############################################################################
@@ -17,13 +18,28 @@ RUN apt-get update && \
git \
lld \
vim \
bat \
curl \
git \
gnupg \
rsync \
python-is-python3 \
python3-pip \
liblzma-dev \
wget \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade --no-cache-dir pip
RUN <<"EOF" bash -ex
git config --global user.name "JAX Toolbox"
git config --global user.email "jax@nvidia.com"
EOF
RUN pip install --upgrade --no-cache-dir pip pip-tools && rm -rf ~/.cache/*
RUN mkdir -p /opt/pip-tools.d
ADD --chmod=777 \
get-source.sh \
pip-finalize.sh \
/usr/local/bin/

###############################################################################
## Install cuDNN
66 changes: 42 additions & 24 deletions .github/container/Dockerfile.jax
@@ -1,18 +1,25 @@
ARG BASE_IMAGE=ghcr.io/nvidia/jax-toolbox:base
ARG REPO_JAX="https://github.com/google/jax.git"
ARG REPO_XLA="https://github.com/openxla/xla.git"
ARG REPO_FLAX="https://github.com/google/flax.git"
ARG REPO_TE="https://github.com/NVIDIA/TransformerEngine.git"
ARG REF_JAX=main
ARG REF_XLA=main
ARG SRC_PATH_JAX=/opt/jax-source
ARG REF_FLAX=main
ARG REF_TE=main
ARG SRC_PATH_JAX=/opt/jax
ARG SRC_PATH_XLA=/opt/xla-source
ARG SRC_PATH_FLAX=/opt/flax
ARG SRC_PATH_TE=/opt/transformer-engine-source

ARG BAZEL_CACHE=/tmp
ARG BUILD_DATE

###############################################################################
## Build JAX
###############################################################################

FROM ${BASE_IMAGE} as jax-builder
FROM ${BASE_IMAGE} as builder
ARG REPO_JAX
ARG REPO_XLA
ARG REF_JAX
@@ -47,15 +54,12 @@ RUN build-jax.sh \
--xla-arm64-patch /opt/xla-arm64-neon.patch \
--clean

RUN cp -r ${SRC_PATH_JAX} ${SRC_PATH_JAX}-no-git && rm -rf ${SRC_PATH_JAX}-no-git/.git
RUN cp -r ${SRC_PATH_XLA} ${SRC_PATH_XLA}-no-git && rm -rf ${SRC_PATH_XLA}-no-git/.git

###############################################################################
## Build 'runtime' flavor without the git metadata
## Pack jaxlib wheel and various source dirs into a pre-installation image
###############################################################################

ARG BASE_IMAGE
FROM ${BASE_IMAGE} as runtime-image
FROM ${BASE_IMAGE} as mealkit
ARG SRC_PATH_JAX
ARG SRC_PATH_XLA
ARG BUILD_DATE
@@ -66,29 +70,43 @@ ENV CUDA_DEVICE_MAX_CONNECTIONS=1
ENV NCCL_IB_SL=1
ENV CUDA_MODULE_LOADING=EAGER

COPY --from=jax-builder ${SRC_PATH_JAX}-no-git ${SRC_PATH_JAX}
COPY --from=jax-builder ${SRC_PATH_XLA}-no-git ${SRC_PATH_XLA}
COPY --from=builder ${SRC_PATH_JAX} ${SRC_PATH_JAX}
COPY --from=builder ${SRC_PATH_XLA} ${SRC_PATH_XLA}
ADD build-jax.sh local_cuda_arch test-jax.sh /usr/local/bin/

RUN pip --disable-pip-version-check install ${SRC_PATH_JAX}/dist/*.whl && \
pip --disable-pip-version-check install -e ${SRC_PATH_JAX} && \
rm -rf ~/.cache/pip/
RUN mkdir -p /opt/pip-tools.d
RUN <<"EOF" bash -ex
echo "-e file://${SRC_PATH_JAX}" >> /opt/pip-tools.d/manifest.jax
echo "jaxlib @ file://$(ls ${SRC_PATH_JAX}/dist/*.whl)" >> /opt/pip-tools.d/manifest.jax
EOF

# Install software stack in JAX ecosystem
# Made this optional since tensorstore cannot build on Ubuntu 20.04 + ARM
RUN { pip install flax || true; } && rm -rf ~/.cache/*
## Flax
ARG REPO_FLAX
ARG REF_FLAX
ARG SRC_PATH_FLAX
RUN get-source.sh -f ${REPO_FLAX} -r ${REF_FLAX} -d ${SRC_PATH_FLAX} -m /opt/pip-tools.d/manifest.flax

## Transformer engine: check out source and build wheel
ARG REPO_TE
ARG REF_TE
ARG SRC_PATH_TE
ENV NVTE_FRAMEWORK=jax
ENV SRC_PATH_TE=${SRC_PATH_TE}
RUN <<"EOF" bash -ex
set -o pipefail
pip install ninja && rm -rf ~/.cache/pip
get-source.sh -f ${REPO_TE} -r ${REF_TE} -d ${SRC_PATH_TE}
pushd ${SRC_PATH_TE}
python setup.py bdist_wheel && rm -rf build
echo "transformer-engine @ file://$(ls ${SRC_PATH_TE}/dist/*.whl)" >> /opt/pip-tools.d/manifest.te
EOF

# TODO: properly configure entrypoint
# COPY entrypoint.d/ /opt/nvidia/entrypoint.d/

###############################################################################
## Build 'devel' image with build scripts and git metadata
## Install primary packages and transitive dependencies
###############################################################################

FROM runtime-image as devel-image
ARG SRC_PATH_JAX
ARG SRC_PATH_XLA

ADD build-jax.sh local_cuda_arch test-jax.sh /usr/local/bin/
FROM mealkit as final

COPY --from=jax-builder ${SRC_PATH_JAX}/.git ${SRC_PATH_JAX}/.git
COPY --from=jax-builder ${SRC_PATH_XLA}/.git ${SRC_PATH_XLA}/.git
RUN pip-finalize.sh
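
The `mealkit` stage above only records what should be installed, as `manifest.*` files under `/opt/pip-tools.d`, and the `final` stage hands them to `pip-finalize.sh`. That script is not part of this diff, so the following is only a minimal sketch of one way such manifests could be merged into a single input file for `pip-compile`. The temporary directory, the `merge_manifests` helper, and the content of `manifest.flax` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: merge per-package manifests into one requirements.in.
# This is NOT the pip-finalize.sh from the PR, which this diff does not show.
set -euo pipefail

# Stand-in for /opt/pip-tools.d so the sketch can run outside the container.
manifest_dir="$(mktemp -d)"

# Mirrors the editable JAX entry written in the mealkit stage (SRC_PATH_JAX=/opt/jax).
echo "-e file:///opt/jax" > "${manifest_dir}/manifest.jax"
# Assumed content: the real file is written by get-source.sh -m.
echo "-e file:///opt/flax" > "${manifest_dir}/manifest.flax"

merge_manifests() {
  # Concatenate every manifest.* file in the given directory.
  # The shell expands the glob in sorted order, so the result is stable.
  local dir="$1"
  local f
  for f in "${dir}"/manifest.*; do
    cat "${f}"
  done
}

merge_manifests "${manifest_dir}" > "${manifest_dir}/requirements.in"
cat "${manifest_dir}/requirements.in"
```

With pip-tools installed, the merged file could then be resolved with `pip-compile requirements.in --output-file requirements.txt`. Whether `pip-finalize.sh` takes exactly this route is not visible here.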
65 changes: 43 additions & 22 deletions .github/container/Dockerfile.pax.amd64
@@ -1,33 +1,54 @@
# syntax=docker/dockerfile:1-labs
###############################################################################
## Pax
###############################################################################

ARG BASE_IMAGE=ghcr.io/nvidia/jax:latest
FROM ${BASE_IMAGE}

ADD install-pax.sh /usr/local/bin
ADD install-flax.sh /usr/local/bin
ADD install-te.sh /usr/local/bin

ENV NVTE_FRAMEWORK=jax
ARG REPO_PAXML=https://github.com/google/paxml.git
ARG REPO_PRAXIS=https://github.com/google/praxis.git
ARG REF_PAXML=main
ARG REF_PRAXIS=main
ARG SRC_PATH_PAXML=/opt/paxml
ARG SRC_PATH_PRAXIS=/opt/praxis

###############################################################################
## Download source and add auxiliary scripts
###############################################################################

FROM ${BASE_IMAGE} as mealkit
ARG REPO_PAXML
ARG REPO_PRAXIS
ARG REF_PAXML
ARG REF_PRAXIS
ARG SRC_PATH_PAXML
ARG SRC_PATH_PRAXIS

# update TE manifest file to install the [test] extras
RUN sed -i "s/transformer-engine @/transformer-engine[test] @/g" /opt/pip-tools.d/manifest.te

RUN <<"EOF" bash -ex
install-pax.sh --defer --from_paxml ${REPO_PAXML} --from_praxis ${REPO_PRAXIS} --ref_paxml ${REF_PAXML} --ref_praxis ${REF_PRAXIS}
install-flax.sh --defer
install-te.sh --defer

if [[ -f /opt/requirements-defer.txt ]]; then
# SKIP_HEAD_INSTALLS avoids having to install jax from Github source so that
# we do not overwrite the jax that was already installed.
SKIP_HEAD_INSTALLS=true pip install -r /opt/requirements-defer.txt
fi
if [[ -f /opt/cleanup.sh ]]; then
bash -ex /opt/cleanup.sh
fi
get-source.sh -f ${REPO_PAXML} -r ${REF_PAXML} -d ${SRC_PATH_PAXML}
get-source.sh -f ${REPO_PRAXIS} -r ${REF_PRAXIS} -d ${SRC_PATH_PRAXIS}
echo "-e file://${SRC_PATH_PAXML}[gpu]" >> /opt/pip-tools.d/manifest.pax
echo "-e file://${SRC_PATH_PRAXIS}" >> /opt/pip-tools.d/manifest.pax

for src in ${SRC_PATH_PAXML} ${SRC_PATH_PRAXIS}; do
pushd ${src}
sed -i "s| @ git+https://github.com/google/flax||g" requirements.in
sed -i "s| @ git+https://github.com/google/jax||g" requirements.in
if git diff --quiet; then
echo "URL specs no longer present in select dependencies for ${src}"
exit 1
else
git commit -a -m "remove URL specs from select dependencies for ${src}"
fi
popd
done
EOF

ADD test-pax.sh /usr/local/bin

###############################################################################
## Install accumulated packages from the base image and the previous stage
###############################################################################

FROM mealkit as final

RUN pip-finalize.sh
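
The loop in the `mealkit` stage strips the `@ git+https://...` URL specs from `requirements.in` and fails the build if the substitution changed nothing, so an upstream change to those lines is noticed instead of silently ignored. A minimal sketch of that guard follows, run on a throwaway file. The file contents are made up for illustration, and `cmp -s` against a saved copy stands in for `git diff --quiet` so the sketch does not need a repository.

```shell
#!/usr/bin/env bash
# Sketch of the "strip URL specs, fail if nothing changed" guard.
# Assumption: cmp -s replaces git diff --quiet so no git checkout is required.
set -euo pipefail

workdir="$(mktemp -d)"
req="${workdir}/requirements.in"

# Hypothetical requirements.in contents, for illustration only.
cat > "${req}.orig" <<'REQ'
flax @ git+https://github.com/google/flax
jax @ git+https://github.com/google/jax
numpy
REQ

# Same substitutions as in the Dockerfile: drop the URL part, keep the name.
# Writing to a new file avoids the GNU/BSD differences of sed -i.
sed -e "s| @ git+https://github.com/google/flax||g" \
    -e "s| @ git+https://github.com/google/jax||g" \
    "${req}.orig" > "${req}"

# cmp -s exits 0 when the files are identical, i.e. sed matched nothing.
if cmp -s "${req}.orig" "${req}"; then
  echo "URL specs no longer present in select dependencies" >&2
  exit 1
fi

cat "${req}"
```

Failing loudly when the pattern no longer matches keeps the workaround from going stale: once upstream drops the URL specs, the build breaks at this step and the `sed` lines can be removed.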