Update packer scripts (#4203)
* Update custom image packer script to exclude .sky and include python sys packages

* add comments
yika-luo authored Oct 29, 2024
1 parent ce8d2df commit f267893
Showing 10 changed files with 32 additions and 87 deletions.
14 changes: 8 additions & 6 deletions sky/clouds/service_catalog/images/README.md
@@ -8,6 +8,7 @@ You only need to do this once.
 packer init plugins.pkr.hcl
 ```
 3. Setup cloud credentials
+4. `cd sky/clouds/service_catalog/images`

 ## Generate Images
 FYI time to packer build images:
@@ -30,7 +31,7 @@ packer build ${IMAGE}.pkr.hcl
 2. Make the image public
 ```bash
 # Make image public
-export IMAGE_NAME=skypilot-gcp-cpu-ubuntu-xxx # Update this
+export IMAGE_NAME=skypilot-gcp-cpu-ubuntu-20241029144600 # Update this
 export IMAGE_ID=projects/sky-dev-465/global/images/${IMAGE_NAME}
 gcloud compute images add-iam-policy-binding ${IMAGE_NAME} --member='allAuthenticatedUsers' --role='roles/compute.imageUser'
 ```
@@ -44,7 +45,8 @@ packer build ${IMAGE}.pkr.hcl
 ```
 2. Copy images to all regions
 ```bash
-export IMAGE_ID=ami-0b31b24524afa8e47 # Update this
+export TYPE=gpu # Update this
+export IMAGE_ID=ami-05e9f5efd844f1a4f # Update this
 python aws_utils/image_gen.py --image-id ${IMAGE_ID} --processor ${TYPE}
 ```
 3. Add fallback images if any region failed \
@@ -55,11 +57,11 @@ Look for "NEED_FALLBACK" in the output `images.csv` and edit. (You can use publi
 ```bash
 export SECRET=xxxxxx # Update this
 ```
-2. Build and copy images for all regions and both VM generations (1 and 2).
+2. Build and copy images for all regions for GPU (gen 1 & 2) and CPU (gen 2 only).
 ```bash
-export VM_GENERATION=2 # Update this
-packer build -force --var vm_generation=${VM_GENERATION} --var client_secret=${SECRET} skypilot-azure-cpu-ubuntu.pkr.hcl
-packer build --var client_secret=${SECRET} skypilot-azure-gpu-ubuntu.pkr.hcl
+export TYPE=gpu # Update this
+export VM_GENERATION=1 # Update this
+packer build --var vm_generation=${VM_GENERATION} --var client_secret=${SECRET} skypilot-azure-${TYPE}-ubuntu.pkr.hcl
 ```

 ## Test Images
2 changes: 1 addition & 1 deletion sky/clouds/service_catalog/images/aws_utils/image_gen.py
@@ -133,7 +133,7 @@ def process_region(copy_to_region):
     except Exception as e:
         print(f"Error generating image to {copy_to_region}: {str(e)}")
         new_image_id = 'NEED_FALLBACK'
-        image_cache.append((new_image_id, copy_to_region))
+    image_cache.append((new_image_id, copy_to_region))

 with concurrent.futures.ThreadPoolExecutor() as executor:
     executor.map(process_region, ALL_REGIONS)
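
A minimal sketch of the pattern in this hunk, assuming the append is meant to run for every region, successful or not, so that failed regions show up as `NEED_FALLBACK` rows. The region list, `FAILING_REGIONS`, and `copy_image` are hypothetical stand-ins for illustration and are not taken from `image_gen.py`.

```python
import concurrent.futures

# Hypothetical stand-ins for the real script's region list and copy step.
ALL_REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]
FAILING_REGIONS = {"eu-west-1"}


def copy_image(region: str) -> str:
    """Pretend to copy an image; raise for regions configured to fail."""
    if region in FAILING_REGIONS:
        raise RuntimeError(f"copy to {region} failed")
    return f"ami-{region}"


image_cache = []


def process_region(copy_to_region: str) -> None:
    try:
        new_image_id = copy_image(copy_to_region)
    except Exception as e:
        print(f"Error generating image to {copy_to_region}: {e}")
        new_image_id = "NEED_FALLBACK"
    # Outside the except block: runs on both paths, one row per region.
    image_cache.append((new_image_id, copy_to_region))


with concurrent.futures.ThreadPoolExecutor() as executor:
    # Consume the iterator so any unexpected worker exception surfaces here.
    list(executor.map(process_region, ALL_REGIONS))

# Regions that still need a manually chosen fallback image.
needs_fallback = sorted(
    region for image_id, region in image_cache if image_id == "NEED_FALLBACK"
)
print(needs_fallback)
```

If the append sat inside the `except` block instead, only failed regions would be recorded and the successful copies would be missing from the output.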
50 changes: 0 additions & 50 deletions sky/clouds/service_catalog/images/provisioners/cloud.sh

This file was deleted.

23 changes: 17 additions & 6 deletions sky/clouds/service_catalog/images/provisioners/skypilot.sh
@@ -36,21 +36,31 @@ echo PATH=$PATH
 python3 -m venv ~/skypilot-runtime
 PYTHON_EXEC=$(echo ~/skypilot-runtime)/bin/python

-# Pip installs
-$PYTHON_EXEC -m pip install "setuptools<70"
-$PYTHON_EXEC -m pip install "grpcio!=1.48.0,<=1.51.3,>=1.42.0"
-$PYTHON_EXEC -m pip install "skypilot-nightly"
+# Install SkyPilot
+$PYTHON_EXEC -m pip install "skypilot-nightly[remote]"

-# Install ray
+# Install Ray
 RAY_ADDRESS=127.0.0.1:6380
-$PYTHON_EXEC -m pip install --exists-action w -U ray[default]==2.9.3
+$PYTHON_EXEC -m pip install --exists-action w -U "ray[default]==2.9.3"
 export PATH=$PATH:$HOME/.local/bin
 source ~/skypilot-runtime/bin/activate
 which ray > ~/.sky/ray_path || exit 1
 $PYTHON_EXEC -m pip list | grep "ray " | grep 2.9.3 2>&1 > /dev/null && {
     $PYTHON_EXEC -c "from sky.skylet.ray_patches import patch; patch()" || exit 1
 }

+# Install cloud dependencies
+if [ "$CLOUD" = "azure" ]; then
+    $PYTHON_EXEC -m pip install "skypilot-nightly[azure]"
+elif [ "$CLOUD" = "gcp" ]; then
+    # We don't have to install the google-cloud-sdk since it is installed by default in GCP machines.
+    $PYTHON_EXEC -m pip install "skypilot-nightly[gcp]"
+elif [ "$CLOUD" = "aws" ]; then
+    $PYTHON_EXEC -m pip install "skypilot-nightly[aws]"
+else
+    echo "Error: Unknown cloud $CLOUD so not installing any cloud dependencies."
+fi
+
 # System configurations
 sudo bash -c 'rm -rf /etc/security/limits.d; echo "* soft nofile 1048576" >> /etc/security/limits.conf; echo "* hard nofile 1048576" >> /etc/security/limits.conf'
 sudo grep -e '^DefaultTasksMax' /etc/systemd/system.conf || sudo bash -c 'echo "DefaultTasksMax=infinity" >> /etc/systemd/system.conf'
@@ -67,3 +77,4 @@ sudo systemctl disable jupyter > /dev/null 2>&1 || true
 # Cleanup
 # Remove SkyPilot in OS image because when user sky launch we will install whatever version of SkyPilot user has on their local machine.
 $PYTHON_EXEC -m pip uninstall "skypilot-nightly" -y
+rm -rf ~/.sky
@@ -35,13 +35,10 @@ build {
   provisioner "shell" {
     script = "./provisioners/docker.sh"
   }
-  provisioner "shell" {
-    script = "./provisioners/skypilot.sh"
-  }
   provisioner "shell" {
     environment_vars = [
       "CLOUD=aws",
     ]
-    script = "./provisioners/cloud.sh"
+    script = "./provisioners/skypilot.sh"
   }
 }
@@ -43,13 +43,10 @@ build {
   provisioner "shell" {
     script = "./provisioners/nvidia-container-toolkit.sh"
   }
-  provisioner "shell" {
-    script = "./provisioners/skypilot.sh"
-  }
   provisioner "shell" {
     environment_vars = [
       "CLOUD=aws",
     ]
-    script = "./provisioners/cloud.sh"
+    script = "./provisioners/skypilot.sh"
   }
 }
@@ -60,13 +60,10 @@ build {
   provisioner "shell" {
     script = "./provisioners/docker.sh"
   }
-  provisioner "shell" {
-    script = "./provisioners/skypilot.sh"
-  }
   provisioner "shell" {
     environment_vars = [
       "CLOUD=azure",
     ]
-    script = "./provisioners/cloud.sh"
+    script = "./provisioners/skypilot.sh"
   }
 }
@@ -66,13 +66,10 @@ build {
   provisioner "shell" {
     script = "./provisioners/nvidia-container-toolkit.sh"
   }
-  provisioner "shell" {
-    script = "./provisioners/skypilot.sh"
-  }
   provisioner "shell" {
     environment_vars = [
       "CLOUD=azure",
     ]
-    script = "./provisioners/cloud.sh"
+    script = "./provisioners/skypilot.sh"
   }
 }
@@ -21,13 +21,10 @@ build {
   provisioner "shell" {
     script = "./provisioners/docker.sh"
   }
-  provisioner "shell" {
-    script = "./provisioners/skypilot.sh"
-  }
   provisioner "shell" {
     environment_vars = [
       "CLOUD=gcp",
     ]
-    script = "./provisioners/cloud.sh"
+    script = "./provisioners/skypilot.sh"
   }
 }
@@ -34,13 +34,10 @@ build {
   provisioner "shell" {
     script = "./provisioners/nvidia-container-toolkit.sh"
   }
-  provisioner "shell" {
-    script = "./provisioners/skypilot.sh"
-  }
   provisioner "shell" {
     environment_vars = [
       "CLOUD=gcp",
     ]
-    script = "./provisioners/cloud.sh"
+    script = "./provisioners/skypilot.sh"
   }
 }
