not possible to run enroot start when operating system is running on rootfs (stateless server boot) #200

Open
rafalop opened this issue Jul 18, 2024 · 2 comments

Comments


rafalop commented Jul 18, 2024

If you run a stateless cluster (such as one deployed by Warewulf) with the root filesystem in RAM, for example:

root@node1:/tmp# df -h /
Filesystem      Size  Used Avail Use% Mounted on
rootfs         1001G   16G  985G   2% /
root@node1:/tmp# mount | grep rootfs
rootfs on / type rootfs (rw,size=1048948424k,nr_inodes=262237106,inode64)
root@node1:/tmp# 

You cannot start enroot containers. This happens:

root@node1:/tmp# enroot start raf-ssd
enroot-switchroot: failed to switch root: /raid/enroot/raf-ssd: Invalid argument
root@node1:/tmp# 

strace snippet:

pivot_root(".", ".")                    = -1 EINVAL (Invalid argument)

There seems to be a hard requirement for enroot to do a pivot_root syscall:

if ((int)syscall(SYS_pivot_root, ".", ".") < 0)

Unfortunately, pivot_root is not supported when the current root is the in-memory rootfs (initial ramfs), as it is on a stateless node; the kernel rejects the call with EINVAL.
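One way to recognise this situation before attempting the switch is to look at the mount table. The following is only a sketch: `root_is_rootfs` is a hypothetical helper (it is not part of enroot), and it assumes its input is text in `/proc/mounts` format. It treats the last entry mounted at `/` as the effective root, which is a heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of enroot.
 * Input: text in /proc/mounts format, one mount per line:
 *   device mountpoint fstype options dump pass
 * Returns true if the last entry mounted at "/" has type "rootfs". */
bool root_is_rootfs(const char *mounts_text)
{
    assert(mounts_text != NULL);

    bool result = false;
    const char *line = mounts_text;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        char buf[4096];

        /* Lines longer than the buffer are skipped in this sketch. */
        if (len < sizeof buf) {
            char dev[256], dir[256], type[64];

            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%255s %255s %63s", dev, dir, type) == 3 &&
                strcmp(dir, "/") == 0) {
                /* Later mounts sit on top of earlier ones: last match wins. */
                result = strcmp(type, "rootfs") == 0;
            }
        }

        line += len;
        if (*line == '\n')
            line++;
    }
    return result;
}
```

A caller would read `/proc/mounts` into a buffer and pass it in. On the node shown above, where rootfs is the only mount at `/`, the helper should return true.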

The nvidia-container-cli binary provides a --no-pivot flag, which presumably works for Docker, but there is no equivalent for enroot.

root@node1:/tmp# nvidia-container-cli --help | grep pivot
  -n, --no-pivot             Do not use pivot_root
root@node1:/tmp#
Member

3XX0 commented Jul 19, 2024

Yeah we don't support doing this for now. It should be fairly straightforward to change though.
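For illustration only, such a change could take roughly the following shape. This is a sketch and not enroot's actual code: both function names are hypothetical. It assumes the caller runs in its own mount namespace with CAP_SYS_ADMIN and CAP_SYS_CHROOT, has already changed directory into the new root, that the new root is a mount point, and that its parent mount is not shared. The fallback is the MS_MOVE plus chroot sequence commonly used to leave an initramfs, swapped in for pivot_root when pivot_root fails with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: decide whether a failed pivot_root is worth retrying
 * with MS_MOVE + chroot. pivot_root(2) returns EINVAL when the current root
 * is rootfs, but EINVAL has other causes too (for example, the new root is
 * not a mount point), so this is a heuristic. */
int should_fall_back(int pivot_errno)
{
    assert(pivot_errno != 0); /* only meaningful after a failed call */
    return pivot_errno == EINVAL;
}

/* Hypothetical replacement for the pivot_root step.
 * Precondition (assumed): the current directory is the new root. */
int switch_root_with_fallback(void)
{
    if (syscall(SYS_pivot_root, ".", ".") == 0) {
        /* The old root is now stacked at the same place: detach it. */
        if (umount2(".", MNT_DETACH) < 0)
            return -1;
        return chdir("/");
    }

    if (!should_fall_back(errno))
        return -1;

    /* Fallback: move the new root on top of "/" and chroot into it. */
    if (mount(".", "/", NULL, MS_MOVE, NULL) < 0)
        return -1;
    if (chroot(".") < 0)
        return -1;
    return chdir("/");
}
```

Unlike pivot_root, the fallback leaves the old rootfs mounted underneath the new root, so it gives weaker isolation. That is a trade-off to weigh, not something this sketch resolves.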


agimenog commented Jul 29, 2024

Hi, same error here. In my case I modified enroot-switchroot.c, replacing the pivot_root call with a chroot. That makes enroot work, but it also makes pyxis fail with the following:

$ srun --container-image=ubuntu grep PRETTY /etc/os-release
srun: job 14791 queued and waiting for resources
srun: job 14791 has been allocated resources
pyxis: importing docker image: ubuntu
pyxis: imported docker image: ubuntu
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"

It imports the container image but does not chroot into it: the command prints a Rocky Linux release (presumably the host's) instead of Ubuntu's.
