feat: add checkpoint uds-core slim package #818

Open
wants to merge 39 commits into
base: main
80a9892
feat: add frozen uds-core slim package
Racer159 Sep 25, 2024
59d8999
lint
Racer159 Sep 25, 2024
2b74bd5
tune this for speed
Racer159 Sep 26, 2024
895d53a
swap to checkpoint
Racer159 Sep 27, 2024
68a03ae
add release workflow
Racer159 Sep 27, 2024
7a09a33
Merge branch 'main' into gotta-go-fast
Racer159 Sep 27, 2024
7d86107
add testing
Racer159 Sep 27, 2024
a853265
add id token write back
Racer159 Sep 27, 2024
6af295e
install uds wo brew
Racer159 Sep 27, 2024
c1e3e54
install uds wo brew
Racer159 Sep 27, 2024
c1a457e
fix oci
Racer159 Sep 27, 2024
88e0aa6
fixup version var
Racer159 Sep 27, 2024
6e24a4a
fix version
Racer159 Sep 27, 2024
db1aeef
slim istio validate
Racer159 Sep 28, 2024
7cff415
add npm ci
Racer159 Sep 28, 2024
9296753
make a slim dev test
Racer159 Sep 28, 2024
98bd274
fix save logs
Racer159 Sep 28, 2024
964786b
lint
Racer159 Sep 28, 2024
4788325
swap back checkpoint workflow
Racer159 Sep 28, 2024
a05b23c
Merge branch 'main' into gotta-go-fast
Racer159 Sep 30, 2024
34235c9
Merge branch 'main' into gotta-go-fast
Racer159 Sep 30, 2024
d8a12b2
Merge branch 'main' into gotta-go-fast
Racer159 Oct 1, 2024
e34b0de
initial feedback
Racer159 Oct 1, 2024
ca35214
Merge branch 'main' into gotta-go-fast
Racer159 Oct 1, 2024
2211d71
add docs
Racer159 Oct 1, 2024
b3cb482
refine README
Racer159 Oct 1, 2024
d1abeee
fix lil string
Racer159 Oct 2, 2024
d84c408
fix last bits
Racer159 Oct 2, 2024
4046f6f
revert checkpoint workflow
Racer159 Oct 2, 2024
cb9db50
Update packages/checkpoint-dev/zarf.yaml
Racer159 Oct 2, 2024
3adc01d
produce a downloadable artifact
Racer159 Oct 2, 2024
2b0c083
fix permissions
Racer159 Oct 2, 2024
43a4ec2
fix docker load
Racer159 Oct 2, 2024
830b978
Merge branch 'main' into gotta-go-fast
Racer159 Oct 2, 2024
e72901a
Merge branch 'main' into gotta-go-fast
Racer159 Oct 4, 2024
401d88c
Update packages/checkpoint-dev/zarf.yaml
Racer159 Oct 4, 2024
aaea091
Merge branch 'main' into gotta-go-fast
Racer159 Oct 4, 2024
5cb166b
Merge branch 'main' into gotta-go-fast
Racer159 Oct 8, 2024
2336dc7
Merge branch 'main' into gotta-go-fast
Racer159 Oct 11, 2024
65 changes: 65 additions & 0 deletions .github/workflows/checkpoint.yaml
@@ -0,0 +1,65 @@
name: Checkpoint UDS Core

on:
pull_request: # TODO: TEMP @WSTARR
# milestoned is added here as a workaround for release-please not triggering PR workflows (PRs should be added to a milestone to trigger the workflow).
types: [milestoned, opened, reopened, synchronize]
# triggered by tag-and-release.yaml
workflow_call:

jobs:
publish-uds-core:
strategy:
matrix:
architecture: [amd64, arm64]
runs-on: ${{ matrix.architecture == 'arm64' && 'uds-ubuntu-arm64-4-core' || 'uds-ubuntu-big-boy-4-core' }}
name: Publish checkpoint

permissions:
contents: read
packages: write
id-token: write # This is needed for OIDC federation.

steps:
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7

- name: Environment setup
uses: ./.github/actions/setup
with:
registry1Username: ${{ secrets.IRON_BANK_ROBOT_USERNAME }}
registry1Password: ${{ secrets.IRON_BANK_ROBOT_PASSWORD }}
ghToken: ${{ secrets.GITHUB_TOKEN }}
chainguardIdentity: ${{ secrets.CHAINGUARD_IDENTITY }}

- name: Deploy K3d + UDS Core Slim Bundle
run: |
uds run -f tasks/deploy.yaml latest-slim-bundle-release --no-progress

- name: Create Checkpoint Package
run: |
uds run -f tasks/create.yaml checkpoint-dev-package --no-progress

- name: Test Checkpoint Package
run: |
uds run -f tasks/deploy.yaml checkpoint-package --no-progress
npm ci
uds run test:slim-dev --no-progress

- name: Debug Output
if: always()
uses: ./.github/actions/debug-output

# - name: Publish Checkpoint Package
# run: uds run -f tasks/publish.yaml checkpoint-package --no-progress

- name: Save logs
if: always()
uses: ./.github/actions/save-logs
with:
suffix: -${{ matrix.architecture }}

- uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0
with:
name: checkpoint-pkg-${{ matrix.architecture }}
path: |
build/zarf-package-k3d-core-slim-dev-${{ matrix.architecture }}-0.28.0.tar.zst
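The upload step above hard-codes `0.28.0` in the artifact path, while `tasks/deploy.yaml` builds the same file name from `${UDS_ARCH}` and `${VERSION}`. A minimal shell sketch of deriving the path instead, assuming the version stays on a quoted `version:` line in `packages/checkpoint-dev/zarf.yaml`. The function names `package_version` and `checkpoint_pkg_path` are hypothetical and not part of this PR:

```shell
#!/bin/sh
# Hypothetical helpers (not part of this PR).

# Print the first quoted `version:` value found in the given zarf.yaml.
package_version() {
  sed -n 's/^[[:space:]]*version:[[:space:]]*"\([^"]*\)".*$/\1/p' "$1" | head -n 1
}

# Build the checkpoint package path for an architecture and a version.
checkpoint_pkg_path() {
  printf 'build/zarf-package-k3d-core-slim-dev-%s-%s.tar.zst\n' "$1" "$2"
}

# Example usage (commented out so that sourcing this file has no side effects):
# version="$(package_version packages/checkpoint-dev/zarf.yaml)"
# checkpoint_pkg_path amd64 "$version"
```

How the resulting path would be passed to the upload step (for example through a step output) is left open here.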
2 changes: 2 additions & 0 deletions .github/workflows/slim-dev-test.yaml
@@ -52,6 +52,8 @@ jobs:
uses: ./.github/actions/setup
- name: Deploy Slim Dev Bundle
run: uds run slim-dev --no-progress
- name: Test Slim Dev Bundle
run: uds run test:slim-dev --no-progress
- name: Debug Output
if: ${{ always() }}
uses: ./.github/actions/debug-output
9 changes: 9 additions & 0 deletions .github/workflows/tag-and-release.yaml
@@ -32,3 +32,12 @@ jobs:
with:
snapshot: false
secrets: inherit

checkpoint-uds-core-release:
needs: publish-uds-core-release
permissions:
contents: read
packages: write
id-token: write
uses: ./.github/workflows/checkpoint.yaml
secrets: inherit
3 changes: 3 additions & 0 deletions .gitignore
@@ -18,4 +18,7 @@ extract-terraform.sh
**/.terraform*
cluster-config.yaml
**.tfstate

packages/checkpoint-dev/uds-checkpoint.tar

**.backup
31 changes: 31 additions & 0 deletions packages/checkpoint-dev/README.md
@@ -0,0 +1,31 @@
# K3d + UDS Core Slim Dev Checkpoint

This is a special Zarf package that captures a running K3d cluster (named `uds`) by committing its server container to an image and bundling that image with the cluster's volume data.

## Creating this package

To create this package:

1. Set up a K3d cluster (named `uds`) containing the contents you'd like to checkpoint.

> [!NOTE]
> This package is intended to checkpoint the `uds dev stack`, `zarf init`, and `uds core slim`.

2. Run `uds zarf package create <path-to-zarf-yaml> --confirm` on the Zarf package in this directory.

> [!IMPORTANT]
> This package currently requires `sudo` to create and deploy. If the command seems stalled, it is likely waiting for password input (the prompt can be hidden by the spinner).

## Deploying this package

Once you have created a package with the contents you want, deploy it with:

```
uds zarf package deploy <path-to-zarf-tarball> --confirm
```

> [!IMPORTANT]
> This package currently requires `sudo` to create and deploy. If the command seems stalled, it is likely waiting for password input (the prompt can be hidden by the spinner).

> [!NOTE]
> The prerequisites for this package are the same as for `uds-k3d`. You do not need a running cluster before deploying it.
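
The README leaves the prerequisites implicit. A minimal preflight sketch, assuming the required tools are the ones referenced by `checkpoint.sh` and the deploy action in this PR (`docker`, `k3d`, `jq`, `tar`, `base64`, `sudo`). The function names `missing_tools` and `preflight` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers (not part of this PR).

# Print each command from the argument list that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Fail with a readable message when any tool used by this package is absent.
preflight() {
  missing="$(missing_tools docker k3d jq tar base64 sudo)"
  if [ -n "$missing" ]; then
    # $missing is intentionally unquoted so the names print on one line.
    echo "Missing required tools:" $missing >&2
    return 1
  fi
}
```

Running `preflight` before `uds zarf package create` or `uds zarf package deploy` would surface a missing tool before the `sudo` prompt appears.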
74 changes: 74 additions & 0 deletions packages/checkpoint-dev/checkpoint.sh
@@ -0,0 +1,74 @@
#!/bin/bash

# Name of the running k3d container
K3S_CONTAINER="k3d-uds-server-0"

if [ -z "$TMPDIR" ]; then
# Fall back to /tmp when TMPDIR is unset (macOS sets it to a per-user temp directory; Linux users can set it to choose another location)
TMPDIR="/tmp"
fi
DATA_DIR="${TMPDIR}/uds-checkpoint-data"

# Step 0: Ensure we can get sudo
echo "This package requires elevated permissions to create - requesting sudo (if paused enter password)"
sudo echo "got sudo! success!"

# Step 1: Get the container ID of the running k3d container
CONTAINER_ID=$(docker ps -qf "name=$K3S_CONTAINER")

if [ -z "$CONTAINER_ID" ]; then
echo "No running container found for $K3S_CONTAINER"
exit 1
fi

# Step 2: Get the mounted volumes of the running container
echo "Inspecting container volumes for $CONTAINER_ID..."
VOLUMES=$(docker inspect -f '{{ json .Mounts }}' "$CONTAINER_ID" | jq '.')

# Step 3: Prepare directories to save the volume data
sudo rm -rf "$DATA_DIR"
mkdir -p "${DATA_DIR}/kubelet_data" "${DATA_DIR}/k3s_data"

# Step 4: Loop through volumes and copy data to corresponding directories
echo "Copying volumes to local directories..."

for row in $(echo "$VOLUMES" | jq -r '.[] | @base64'); do
_jq() {
echo "${row}" | base64 --decode | jq -r "${1}"
}

SOURCE=$(_jq '.Source')
DESTINATION=$(_jq '.Destination')

case "$DESTINATION" in
"/var/lib/kubelet")
echo "Copying $SOURCE to ${DATA_DIR}/kubelet_data/"
sudo cp -a "$SOURCE"/. "${DATA_DIR}/kubelet_data/"
;;
"/var/lib/rancher/k3s")
echo "Copying $SOURCE to ${DATA_DIR}/k3s_data/"
sudo cp -a "$SOURCE"/. "${DATA_DIR}/k3s_data/"
;;
Comment on lines +44 to +51
During creation I see these errors (which cause the deploy to fail later):

     cp: /var/lib/docker/volumes/c0d8ea4ead46f3c6649218be409e19d1cd63bfcc68f32d548a116c7924d7a793/_data/.: No such file or directory
     cp: /var/lib/docker/volumes/822e843b8cf644f9c4c9118671f6014d32ad84a062d690e69b07d5c6fdfcfbe2/_data/.: No such file or directory

On macOS, Docker almost universally runs inside a VM, so the volume paths under `/var/lib/docker/volumes` do not exist on the host. In my case the VM can be accessed with `colima ssh`; Docker Desktop, Rancher Desktop, etc. would likely have similar issues and their own ways to access the VM.

I was able to rewrite a portion of this script to use `docker cp` instead and got closer (at least I didn't get errors with the volumes). I think this is probably a better, more agnostic option, and it simplifies a lot of this logic: no looping through volumes, just copy the two paths we need explicitly. I was hoping it might also remove the need for `sudo`, but in my case one of the paths still gave permission errors until I added `sudo`. I'm sure there's some efficiency loss here, but since it only happens at create time I think it's worth it to make this work across distros? My local run still took less than a minute, which seems decently performant (granted, I couldn't get the original to run successfully, so I'm unsure of the real comparison).

I'd be curious to hear your thoughts on this. I dropped the script changes into a gist since there were a handful of changes across the whole file: https://gist.github.com/mjnagel/6d681678df83067169c4e652466f704f

I also had to add `--no-xattrs` to the final `tar` command; without it I got warnings/errors (I suspect that's some macOS <> Linux stuff). This got me much closer, but I hit an issue with the token:

time="2024-10-02T15:19:18Z" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"

I tried to tweak the startup commands (using the k3d `--token` option rather than the k3s arg) and validated that the token exists after extraction, but couldn't figure this one out. I'd be curious whether you hit the same issue with my modified script and can figure out what's wrong.

*)
echo "$DESTINATION is not needed. Skipping..."
;;
esac
done

# Step 5: Commit and save the current container as a new image
IMAGE_NAME="ghcr.io/defenseunicorns/uds-core/checkpoint:latest"
echo "Committing container $CONTAINER_ID to image $IMAGE_NAME..."
docker commit -p "$CONTAINER_ID" "$IMAGE_NAME"

echo "Saving image to ${DATA_DIR}/uds-k3d-checkpoint-latest.tar..."
sudo docker save -o "${DATA_DIR}/uds-k3d-checkpoint-latest.tar" "$IMAGE_NAME"

echo "Container image saved to ${DATA_DIR}/uds-k3d-checkpoint-latest.tar"

# Step 6: Create a tarball from the data contents
echo "Creating a final tarball to include in the package"
sudo tar --blocking-factor=64 -cpf uds-checkpoint.tar -C "$DATA_DIR" .

echo "Successfully checkpointed the cluster!"

exit 0
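
The review comment in this file suggests replacing the host-side volume copy with `docker cp`, which reads the two paths through the Docker daemon and so does not depend on `/var/lib/docker/volumes` being visible on the host. A minimal sketch of that idea follows. It is not the contents of the linked gist, and the `DOCKER` override variable and both function names are hypothetical. As the comment notes, `sudo` may still be needed for some paths.

```shell
#!/bin/sh
# Hypothetical: DOCKER can be overridden, e.g. DOCKER="sudo docker".
DOCKER="${DOCKER:-docker}"

# Resolve the staging directory, falling back to /tmp when TMPDIR is unset or empty.
# A trailing slash on TMPDIR (common on macOS) is trimmed.
resolve_data_dir() {
  base="${TMPDIR:-/tmp}"
  printf '%s/uds-checkpoint-data\n' "${base%/}"
}

# Copy the two paths the checkpoint needs straight out of the container.
# $DOCKER is intentionally unquoted so that "sudo docker" splits into two words.
copy_cluster_paths() {
  container="$1"
  data_dir="$2"
  mkdir -p "${data_dir}/kubelet_data" "${data_dir}/k3s_data" &&
    $DOCKER cp -a "${container}:/var/lib/kubelet/." "${data_dir}/kubelet_data/" &&
    $DOCKER cp -a "${container}:/var/lib/rancher/k3s/." "${data_dir}/k3s_data/"
}

# Example usage (commented out so that sourcing this file has no side effects):
# copy_cluster_paths k3d-uds-server-0 "$(resolve_data_dir)"
```

This drops the `jq`/`base64` loop over mounts entirely, at the cost of streaming the data through the Docker daemon rather than copying it directly on the host.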
96 changes: 96 additions & 0 deletions packages/checkpoint-dev/zarf.yaml
@@ -0,0 +1,96 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/defenseunicorns/zarf/main/zarf.schema.json

kind: ZarfPackageConfig
metadata:
name: k3d-core-slim-dev
description: "Rehydratable UDS K3d + UDS Core Slim (Istio, UDS Operator and Keycloak) Checkpoint"
authors: "Defense Unicorns - Product"
# x-release-please-start-version
version: "0.28.0"
# x-release-please-end

variables:
- name: CLUSTER_NAME
description: "Name of the cluster"
default: "uds"

- name: K3D_EXTRA_ARGS
description: "Optionally pass k3d arguments to the default"
default: ""

- name: NGINX_EXTRA_PORTS
description: "Optionally allow more ports through Nginx (combine with K3D_EXTRA_ARGS '-p <port>:<port>@server:*')"
default: "[]"

components:
- name: destroy-cluster
required: true
description: "Destroy any existing cluster before creating it"
actions:
onDeploy:
before:
- cmd: |
echo "This package requires elevated permissions to deploy - requesting sudo (if paused enter password)"
sudo echo "got sudo! success!"
- cmd: k3d cluster delete ${ZARF_VAR_CLUSTER_NAME}
description: "Destroy the cluster"
- cmd: |
sudo rm -rf data

- name: create-cluster
required: true
description: "Create the K3d cluster w/UDS Core pre-installed"
files:
- source: uds-checkpoint.tar
target: uds-checkpoint.tar
actions:
onCreate:
before:
- cmd: ./checkpoint.sh
onSuccess:
- cmd: |
if [ -z "$TMPDIR" ]; then
# Fall back to /tmp when TMPDIR is unset (macOS sets it to a per-user temp directory; Linux users can set it to choose another location)
TMPDIR="/tmp"
fi
DATA_DIR="${TMPDIR}/uds-checkpoint-data"
sudo rm -rf "$DATA_DIR" uds-checkpoint.tar
onDeploy:
after:
- cmd: |
if [ -z "$TMPDIR" ]; then
# Fall back to /tmp when TMPDIR is unset (macOS sets it to a per-user temp directory; Linux users can set it to choose another location)
TMPDIR="/tmp"
fi
DATA_DIR="${TMPDIR}/uds-checkpoint-data"
mkdir -p "$DATA_DIR"

sudo tar --blocking-factor=64 -xpf uds-checkpoint.tar -C "$DATA_DIR"
K8S_TOKEN="$(sudo cat "${DATA_DIR}/k3s_data/server/token")"
echo $K8S_TOKEN
sudo docker load -i "${DATA_DIR}/uds-k3d-checkpoint-latest.tar"

k3d cluster create \
-p "80:80@server:*" \
-p "443:443@server:*" \
--api-port 6550 \
--k3s-arg "--disable=traefik@server:*" \
--k3s-arg "--disable=metrics-server@server:*" \
--k3s-arg "--disable=servicelb@server:*" \
--k3s-arg "--disable=local-storage@server:*" \
--k3s-arg "--token=${K8S_TOKEN}@server:*" \
-v "${DATA_DIR}/kubelet_data:/var/lib/kubelet@server:*" \
-v "${DATA_DIR}/k3s_data:/var/lib/rancher/k3s@server:*" \
--image ghcr.io/defenseunicorns/uds-core/checkpoint:latest ${ZARF_VAR_K3D_EXTRA_ARGS} \
${ZARF_VAR_CLUSTER_NAME}
description: "Create the cluster"
# This action waits on Keycloak since it is the slowest pod to start after cluster creation. By waiting on it, we guarantee the cluster is healthy and usable after deployment.
- description: Keycloak to be Healthy
wait:
cluster:
kind: Pod
name: app.kubernetes.io/name=keycloak
namespace: keycloak
condition: Ready
onSuccess:
- cmd: rm -f uds-checkpoint.tar
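
The deploy action reads the k3s server token from the extracted data and passes it to `k3d cluster create`. The review discussion reports a `bootstrap data already found and encrypted with different token` failure at this step. A small guard that fails early when the token file is missing or empty can make that class of problem easier to spot. This is a sketch only: `read_checkpoint_token` is a hypothetical helper, it assumes the file is readable by the current user (the package itself uses `sudo cat`), and it does not by itself resolve the mismatch reported in the review.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR).

# Print the k3s server token from an extracted checkpoint directory,
# or return non-zero with a message when the file is missing or empty.
read_checkpoint_token() {
  token_file="$1/k3s_data/server/token"
  if [ ! -s "$token_file" ]; then
    echo "missing or empty token file: $token_file" >&2
    return 1
  fi
  cat "$token_file"
}

# Example usage (commented out so that sourcing this file has no side effects):
# K8S_TOKEN="$(read_checkpoint_token "$DATA_DIR")" || exit 1
```

Capturing the token this way also avoids echoing it to the deploy log.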
16 changes: 16 additions & 0 deletions src/istio/tasks.yaml
@@ -24,3 +24,19 @@ tasks:
kind: Gateway
name: tenant-gateway
namespace: istio-tenant-gateway

- name: validate-slim
actions:
- description: Validate the Istio Admin Gateway
wait:
cluster:
kind: Gateway
name: admin-gateway
namespace: istio-admin-gateway

- description: Validate the Istio Tenant Gateway
wait:
cluster:
kind: Gateway
name: tenant-gateway
namespace: istio-tenant-gateway
6 changes: 6 additions & 0 deletions tasks/create.yaml
@@ -46,6 +46,12 @@ tasks:
- description: "Create the slim dev bundle (Base and Identity)"
cmd: "uds create bundles/k3d-slim-dev --confirm --no-progress --architecture=${ZARF_ARCHITECTURE}"

- name: checkpoint-dev-package
description: "Create the K3d + UDS Core Checkpoint Zarf Package"
actions:
- description: "Create the UDS Core Checkpoint Zarf Package"
cmd: "uds zarf package create packages/checkpoint-dev --confirm --no-progress --skip-sbom"

# This task is a wrapper to support --set LAYER=identity-authorization
- name: single-layer-callable
actions:
10 changes: 10 additions & 0 deletions tasks/deploy.yaml
@@ -59,7 +59,17 @@ tasks:
- description: "Deploy the latest UDS Core package release"
cmd: uds zarf package deploy oci://${TARGET_REPO}/core:${LATEST_VERSION} --confirm --no-progress --components '*'

- name: latest-slim-bundle-release
actions:
- description: "Deploy the latest UDS Core slim dev bundle release"
cmd: uds deploy oci://ghcr.io/defenseunicorns/packages/uds/bundles/k3d-core-slim-dev:latest --set INSECURE_ADMIN_PASSWORD_GENERATION=true --confirm --no-progress

- name: standard-package
actions:
- description: "Deploy the standard UDS Core zarf package"
cmd: uds zarf package deploy build/zarf-package-core-${UDS_ARCH}-${VERSION}.tar.zst --confirm --no-progress --components '*'

- name: checkpoint-package
actions:
- description: "Deploy the checkpoint K3d + UDS Core Slim zarf package"
cmd: uds zarf package deploy build/zarf-package-k3d-core-slim-dev-${UDS_ARCH}-${VERSION}.tar.zst --confirm --no-progress
7 changes: 7 additions & 0 deletions tasks/publish.yaml
@@ -41,6 +41,13 @@ tasks:
uds zarf tools registry copy ${pkgPath}:${VERSION}-${FLAVOR} ${pkgPath}:latest-${FLAVOR}
fi

- name: checkpoint-package
description: "Publish the UDS checkpoint package"
actions:
- description: "Publish the checkpoint package for the current UDS_ARCH"
cmd: |
uds zarf package publish build/zarf-package-k3d-core-slim-dev-${UDS_ARCH}-${VERSION}.tar.zst oci://ghcr.io/defenseunicorns/dev/uds/checkpoints/k3d-core-slim-dev

- name: bundles
description: "Publish UDS Bundles"
actions:
10 changes: 10 additions & 0 deletions tasks/test.yaml
@@ -7,6 +7,9 @@ includes:
- setup: ./setup.yaml
- deploy: ./deploy.yaml
- compliance: https://raw.githubusercontent.com/defenseunicorns/uds-common/v1.1.0/tasks/compliance.yaml
- istio: ../src/istio/tasks.yaml
- keycloak: ../src/keycloak/tasks.yaml
- pepr: ../src/pepr/tasks.yaml
- base-layer: ../packages/base/tasks.yaml

tasks:
@@ -93,3 +96,10 @@ tasks:
with:
assessment_results: ./compliance/oscal-assessment-results.yaml
options: -t il4

- name: slim-dev
description: "Run validate for the components contained in the slim dev bundle"
actions:
- task: istio:validate-slim
- task: keycloak:validate
- task: pepr:validate