Merge branch 'skypilot-org:master' into master
asaiacai authored Nov 4, 2024
2 parents f622129 + 81532d0 commit 5fd4371
Showing 35 changed files with 932 additions and 349 deletions.
24 changes: 9 additions & 15 deletions Dockerfile_k8s
@@ -1,4 +1,4 @@
-FROM continuumio/miniconda3:23.3.1-0
+FROM --platform=linux/amd64 continuumio/miniconda3:23.3.1-0

 # TODO(romilb): Investigate if this image can be consolidated with the skypilot
 # client image (`Dockerfile`)
@@ -33,21 +33,15 @@ ENV HOME /home/sky
 # Set current working directory
 WORKDIR /home/sky

-# Install SkyPilot pip dependencies preemptively to speed up provisioning time
-RUN conda init && \
-pip install wheel Click colorama cryptography jinja2 jsonschema networkx \
-oauth2client pandas pendulum PrettyTable rich tabulate filelock packaging \
-'protobuf<4.0.0' pulp pycryptodome==3.12.0 docker kubernetes==28.1.0 \
-grpcio==1.51.3 python-dotenv==1.0.1 ray[default]==2.9.3 && \
+# Install skypilot dependencies
+RUN conda init && export PIP_DISABLE_PIP_VERSION_CHECK=1 && \
+python3 -m venv ~/skypilot-runtime && \
+PYTHON_EXEC=$(echo ~/skypilot-runtime)/bin/python && \
+$PYTHON_EXEC -m pip install 'skypilot-nightly[remote,kubernetes]' 'ray[default]==2.9.3' 'pycryptodome==3.12.0' && \
+$PYTHON_EXEC -m pip uninstall skypilot-nightly -y && \
 curl -LO "https://dl.k8s.io/release/v1.28.11/bin/linux/amd64/kubectl" && \
-sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
-
-# Add /home/sky/.local/bin/ to PATH
-RUN echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
-
-# Copy SkyPilot code base. This is required for the ssh jump pod to find the
-# lifecycle management scripts
-COPY --chown=sky . /skypilot/sky/
+sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl && \
+echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc

 # Set PYTHONUNBUFFERED=1 to have Python print to stdout/stderr immediately
 ENV PYTHONUNBUFFERED=1
19 changes: 7 additions & 12 deletions Dockerfile_k8s_gpu
@@ -41,19 +41,14 @@ RUN curl https://repo.anaconda.com/miniconda/Miniconda3-py310_23.11.0-2-Linux-x8
 eval "$(~/miniconda3/bin/conda shell.bash hook)" && conda init && conda config --set auto_activate_base true && conda activate base && \
 grep "# >>> conda initialize >>>" ~/.bashrc || { conda init && source ~/.bashrc; } && \
 rm Miniconda3-Linux-x86_64.sh && \
-pip install wheel Click colorama cryptography jinja2 jsonschema networkx \
-oauth2client pandas pendulum PrettyTable rich tabulate filelock packaging \
-'protobuf<4.0.0' pulp pycryptodome==3.12.0 docker kubernetes==28.1.0 \
-grpcio==1.51.3 python-dotenv==1.0.1 ray[default]==2.9.3 && \
+export PIP_DISABLE_PIP_VERSION_CHECK=1 && \
+python3 -m venv ~/skypilot-runtime && \
+PYTHON_EXEC=$(echo ~/skypilot-runtime)/bin/python && \
+$PYTHON_EXEC -m pip install 'skypilot-nightly[remote,kubernetes]' 'ray[default]==2.9.3' 'pycryptodome==3.12.0' && \
+$PYTHON_EXEC -m pip uninstall skypilot-nightly -y && \
 curl -LO "https://dl.k8s.io/release/v1.28.11/bin/linux/amd64/kubectl" && \
-sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
-
-# Add /home/sky/.local/bin/ to PATH
-RUN echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
-
-# Copy SkyPilot code base. This is required for the ssh jump pod to find the
-# lifecycle management scripts
-COPY --chown=sky . /skypilot/sky/
+sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl && \
+echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc

 # Set PYTHONUNBUFFERED=1 to have Python print to stdout/stderr immediately
 ENV PYTHONUNBUFFERED=1
16 changes: 16 additions & 0 deletions docs/source/reference/faq.rst
@@ -192,6 +192,22 @@ For example, if you have access to special regions of GCP, add the data to ``~/.
Also, you can update the catalog for a specific cloud by deleting the CSV file (e.g., ``rm ~/.sky/catalogs/<schema-version>/gcp.csv``).
SkyPilot will automatically download the latest catalog in the next run.

Package Installation
---------------------

Unable to import PyTorch in a SkyPilot task.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are using the default SkyPilot images (i.e., not passing ``--image-id``), ``pip install torch`` should be enough to install `PyTorch <https://pytorch.org/>`_.

However, if you use your own image with an older NVIDIA driver (535.161.08 or lower) and install the default PyTorch build, you may encounter the following error:

.. code-block:: bash

    ImportError: /home/azureuser/miniconda3/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

You will need to install a PyTorch version that is compatible with your NVIDIA driver, e.g., ``pip install torch --index-url https://download.pytorch.org/whl/cu121``.
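
The rule of thumb above can be sketched as a small helper that compares a driver version string against the 535.161.08 cutoff mentioned in this entry and suggests which install command to try. This is only an illustration: the function names are hypothetical and not part of SkyPilot, and the cutoff and the ``cu121`` index URL are taken from the text above rather than from an authoritative compatibility table.

```python
from typing import Tuple

# Cutoff quoted in the FAQ entry above; treat it as an assumption,
# not as an official NVIDIA/PyTorch compatibility boundary.
OLD_DRIVER_CUTOFF = "535.161.08"

DEFAULT_INSTALL = "pip install torch"
CU121_INSTALL = (
    "pip install torch --index-url https://download.pytorch.org/whl/cu121"
)


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '535.161.08' into (535, 161, 8)."""
    return tuple(int(part) for part in version.strip().split("."))


def suggested_torch_install(driver_version: str) -> str:
    """Suggest an install command for the given NVIDIA driver version.

    Hypothetical helper: drivers at or below the cutoff get the CUDA 12.1
    wheel index from the FAQ entry; newer drivers get the default command.
    """
    cutoff = parse_driver_version(OLD_DRIVER_CUTOFF)
    if parse_driver_version(driver_version) <= cutoff:
        return CU121_INSTALL
    return DEFAULT_INSTALL
```

The driver version string can typically be read with ``nvidia-smi --query-gpu=driver_version --format=csv,noheader`` on the machine running the task; passing that value to ``suggested_torch_install`` gives a starting point, which should still be verified by importing ``torch`` in the target image.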


Miscellaneous
-------------
