feat: add checkpoint uds-core slim package #818
base: main
Checkpoint task passed in this PR (except for the actual publish task)
I'm not an approver, but the code looks good to me. I would like to see more information on how to use this package, though, so it's clearer how, why, and when someone would want to use it.
"/var/lib/kubelet")
echo "Copying $SOURCE to ${DATA_DIR}/kubelet_data/"
sudo cp -a "$SOURCE"/. "${DATA_DIR}/kubelet_data/"
;;
"/var/lib/rancher/k3s")
echo "Copying $SOURCE to ${DATA_DIR}/k3s_data/"
sudo cp -a "$SOURCE"/. "${DATA_DIR}/k3s_data/"
;;
During creation I see these errors (which cause the deploy to fail later):
cp: /var/lib/docker/volumes/c0d8ea4ead46f3c6649218be409e19d1cd63bfcc68f32d548a116c7924d7a793/_data/.: No such file or directory
cp: /var/lib/docker/volumes/822e843b8cf644f9c4c9118671f6014d32ad84a062d690e69b07d5c6fdfcfbe2/_data/.: No such file or directory
I think docker on macOS pretty much universally runs inside of a VM; in my case the VM can be accessed with `colima ssh`, but Docker Desktop, Rancher Desktop, etc. would likely have similar issues and their own ways to access the VM.
I was able to rewrite a portion of this script to use `docker cp` instead and got closer (at least I didn't get errors with the volumes). I think this is probably a better, more agnostic option here, and it simplifies a lot of this logic (no looping through volumes, just copy the two paths we need explicitly). I was hoping it might also remove the need for `sudo`, but in my case one of the paths still gave permission errors until I added sudo. I'm sure there's some efficiency loss here, but since it's create time I think it's worth it to make this work across distros? In my local run it still took less than a minute, which seems decently performant (granted, I couldn't get it to run successfully previously, so I'm unsure of the real comparison).
I'd be curious to hear your thoughts on this - I dropped the script changes into a gist since there were a handful of changes across the entire file: https://gist.github.com/mjnagel/6d681678df83067169c4e652466f704f
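For readers following along, here is a rough sketch of the `docker cp` idea described above. It is not the contents of the gist. The node container name, the `DATA_DIR` default, and the `run`/`copy_node_path`/`copy_all` helpers are assumptions made for illustration; only the two source paths and the `kubelet_data`/`k3s_data` destinations come from the script under review.

```shell
#!/usr/bin/env sh
# Sketch: copy node state out of the container with `docker cp` rather than
# reading /var/lib/docker/volumes on the host (which lives inside a VM on macOS).
set -eu

# Assumed defaults (hypothetical): adjust to the real node name and data dir.
NODE="${NODE:-k3d-k3s-default-server-0}"
DATA_DIR="${DATA_DIR:-./data}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the contents of one path inside the node container to a host directory.
# `-a` asks docker cp to preserve uid/gid information (archive mode).
copy_node_path() {
  run mkdir -p "$2"
  run docker cp -a "${NODE}:$1/." "$2/"
}

# The two paths the reviewed script cares about, copied explicitly.
copy_all() {
  copy_node_path /var/lib/kubelet "${DATA_DIR}/kubelet_data"
  copy_node_path /var/lib/rancher/k3s "${DATA_DIR}/k3s_data"
}
```

Calling `copy_all` performs the copy; setting `DRY_RUN=1` first only prints the commands. As noted above, some paths may still need `sudo` on the host side for permissions.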
I also had to add `--no-xattrs` to the final tar command; I got warnings/errors without it (I suspect that's some macOS <> Linux stuff). This got me much closer, but I hit some issues with the token:
time="2024-10-02T15:19:18Z" level=fatal msg="starting kubernetes: preparing server: bootstrap data already found and encrypted with different token"
I tried to tweak the commands around startup (using the k3d `--token` option rather than the k3s arg) and validated that the token exists after extraction, but couldn't figure this one out. I'd be curious whether you hit the same issue with my modified script and can figure out what's wrong.
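As an illustration of the `--no-xattrs` change mentioned above, a minimal sketch of a packing step might look like the following. The `pack_checkpoint` helper and its arguments are hypothetical, and the PR's actual tar invocation may differ; the flag itself is accepted by both GNU tar and bsdtar.

```shell
#!/usr/bin/env sh
set -eu

# Hypothetical helper: pack a data directory into a gzipped tarball while
# skipping extended attributes, which can trigger warnings when an archive
# moves between macOS and Linux.
pack_checkpoint() {
  src_dir="$1"
  out_file="$2"
  # -C keeps the archived paths relative to the data directory.
  tar --no-xattrs -czf "$out_file" -C "$src_dir" .
}
```

For example, `pack_checkpoint "$DATA_DIR" checkpoint.tar.gz` would archive the copied kubelet and k3s data. This addresses only the xattr warnings, not the token error above.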
Co-authored-by: Micah Nagel <micah.nagel@defenseunicorns.com>
Probably ignore all of the following, I tried testing CRIU ( Did you try If you use
docker rm -f count
sudo rm -rf /tmp/checkpoint
docker run -d --name=count busybox /bin/sh -c 'for i in $(seq 9999999); do echo "$i" && sleep 1; done'
docker checkpoint create --checkpoint-dir=/tmp/checkpoint count checkpoint1
docker rm count
docker create --name count busybox
# Apparently `docker start --checkpoint-dir` is broken, use workaround: https://github.com/moby/moby/issues/37344#issuecomment-450782189
# docker start --checkpoint-dir /tmp/checkpoint --checkpoint checkpoint1 count
sudo mv /tmp/checkpoint/checkpoint1 "/var/lib/docker/containers/$(docker ps -aq --no-trunc --filter name=count)/checkpoints/"
docker start --checkpoint=checkpoint1 count
docker ps
docker logs -f count
The biggest downside would be that this is near impossible to use with Docker Desktop. A big advantage is that the cluster never actually "stops"; it's magically paused and resumed elsewhere.
Podman seems to support this too, and seems to be a bit more fully supported. k3d (somewhat) supports Podman too. Unlike Docker, Podman's CRIU support includes volumes and capturing multiple containers at once. It can apparently pack the checkpoint into an OCI image too (useful for publishing to GHCR?).
Except... this whole idea may be useless, because I don't think CRIU supports checkpointing nested namespaces (which is how k3d embeds sub-containers inside its parent docker container for the k8s node).
limactl start template://podman-rootful
export DOCKER_HOST=unix://$HOME/.lima/podman-rootful/sock/podman.sock
k3d cluster create
limactl shell podman-rootful sudo podman container checkpoint --export=/tmp/lima/checkpoint.tgz k3d-k3s-default-server-0 k3d-k3s-default-serverlb
# Error:
# Can't dump nested pid namespace for 4663
Description
This adds a ~75% faster way to deploy or reset a full uds-core cluster (theoretically would work for other preloaded things like testing GitLab Runner w/GitLab too).
Normal:
Checkpoint:
Tradeoffs:
`sudo` - not sure of a great way around this without mangling volume permissions for containerd
Related Issue
Fixes #N/A
Type of change
Checklist before merging