Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[backport] CVE-2024-21626 to 1.0.0-rc95 #104

Open
wants to merge 14 commits into
base: release-1.0.0-rc95
Choose a base branch
from

Commits on Feb 6, 2024

  1. Fix File to Close

    [@kolyshkin: backport to rc95]
    
    Signed-off-by: hang.jiang <hang.jiang@daocloud.io>
    Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    (cherry picked from commit 7362cd5afe9d40131fb62cb075092025c7c71064)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    hang.jiang authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    9d23fd8 View commit details
    Browse the repository at this point in the history
  2. init: verify after chdir that cwd is inside the container

    If a file descriptor of a directory in the host's mount namespace is
    leaked to runc init, a malicious config.json could use /proc/self/fd/...
    as a working directory to allow for host filesystem access after the
    container runs. This can also be exploited by a container process if it
    knows that an administrator will use "runc exec --cwd" and the target
    --cwd (the attacker can change that cwd to be a symlink pointing to
    /proc/self/fd/... and wait for the process to exec and then snoop on
    /proc/$pid/cwd to get access to the host). The former issue can lead to
    a critical vulnerability in Docker and Kubernetes, while the latter is a
    container breakout.
    
    We can (ab)use the fact that getcwd(2) on Linux detects this exact case,
    and getcwd(3) and Go's Getwd() return an error as a result. Thus, if we
    just do os.Getwd() after chdir we can easily detect this case and error
    out.
    
    In runc 1.1, a /sys/fs/cgroup handle happens to be leaked to "runc
    init", making this exploitable. On runc main it just so happens that the
    leaked /sys/fs/cgroup gets clobbered and thus this is only consistently
    exploitable for runc 1.1.
    
    Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
    Co-developed-by: lifubang <lifubang@acmcoder.com>
    Signed-off-by: lifubang <lifubang@acmcoder.com>
    [refactored the implementation and added more comments]
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    (cherry picked from commit 6b6b6e9a656b407de5d372d57ee3f1709b18a2af)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    cyphar authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    bd0afb0 View commit details
    Browse the repository at this point in the history
  3. [rc95] setns init: do explicit lookup of execve argument early

    (This is a partial backport of a minor change included in commit
    dac4171.)
    
    This mirrors the logic in standard_init_linux.go, and also ensures that
    we do not call exec.LookPath in the final execve step.
    
    While this is okay for regular binaries, it seems exec.LookPath calls
    os.Getenv which tries to emit a log entry to the test harness when
    running in "go test" mode. In a future patch (in order to fix
    CVE-2024-21626), we will close all of the file descriptors immediately
    before execve, which would mean the file descriptor for test harness
    logging would be closed at execve time. So, moving exec.LookPath earlier
    is necessary.
    
    [kolyshkin: removed eaccess part, fix imports conflict]
    
    Ref: dac4171 ("runc-dmz: reduce memfd binary cloning cost with small C binary")
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    (cherry picked from commit 9bbeb73b7371262792a811f5a8e59d393fb4d1dd)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    cyphar authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    a5053a8 View commit details
    Browse the repository at this point in the history
  4. [rc95] retry unix.EINTR for container init process

    When running a script from an azure file share interrupted syscall
    occurs quite frequently, to remedy this add retries around execve
    syscall, when EINTR is returned.
    
    Signed-off-by: Maksim An <maksiman@microsoft.com>
    (cherry picked from commit e39ad65)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    anmaxvl authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    056568b View commit details
    Browse the repository at this point in the history
  5. [rc95] init: close internal fds before execve

    If we leak a file descriptor referencing the host filesystem, an
    attacker could use a /proc/self/fd magic-link as the source for execve
    to execute a host binary in the container. This would allow the binary
    itself (or a process inside the container in the 'runc exec' case) to
    write to a host binary, leading to a container escape.
    
    The simple solution is to make sure we close all file descriptors
    immediately before the execve(2) step. Doing this earlier can lead to very
    serious issues in Go (as file descriptors can be reused, any (*os.File)
    reference could start silently operating on a different file) so we have
    to do it as late as possible.
    
    Unfortunately, there are some Go runtime file descriptors that we must
    not close (otherwise the Go scheduler panics randomly). The only way of
    being sure which file descriptors cannot be closed is to sneakily
    go:linkname the runtime internal "internal/poll.IsPollDescriptor"
    function. This is almost certainly not recommended but there isn't any
    other way to be absolutely sure, while also closing any other possible
    files.
    
    In addition, we can keep the logrus forwarding logfd open because you
    cannot execve a pipe and the contents of the pipe are so restricted
    (JSON-encoded in a format we pick) that it seems unlikely you could even
    construct shellcode. Closing the logfd causes issues if there is an
    error returned from execve.
    
    In mainline runc, runc-dmz protects us against this attack because the
    intermediate execve(2) closes all of the O_CLOEXEC internal runc file
    descriptors and thus runc-dmz cannot access them to attack the host.
    
    [kolyshkin: backport, merge in commit 25862d58dabbad8f928d5137]
    
    Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    (cherry picked from commit 3b3e65306167133c6d97b9801d05b8fea50f30f7)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    cyphar authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    5886a8f View commit details
    Browse the repository at this point in the history
  6. [rc95] cgroup: plug leaks of /sys/fs/cgroup handle

    We auto-close this file descriptor in the final exec step, but it's
    probably a good idea to not possibly leak the file descriptor to "runc
    init" (we've had issues like this in the past) especially since it is a
    directory handle from the host mount namespace.
    
    In practice, on runc 1.1 this does leak to "runc init" but on main the
    handle has a low enough file descriptor that it gets clobbered by the
    ForkExec of "runc init".
    
    OPEN_TREE_CLONE would let us protect this handle even further, but the
    performance impact of creating an anonymous mount namespace is probably
    not worth it.
    
    Also, switch to using an *os.File for the handle so if it goes out of
    scope during setup (i.e. an error occurs during setup) it will get
    cleaned up by the GC.
    
    Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    (cherry picked from commit 362aebef2bbd48099958c29337900b88db151abf)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    cyphar authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    70c47ad View commit details
    Browse the repository at this point in the history
  7. libcontainer: mark all non-stdio fds O_CLOEXEC before spawning init

    Given the core issue in GHSA-xr7r-f8xq-vfvv was that we were unknowingly
    leaking file descriptors to "runc init", it seems prudent to make sure
    we proactively prevent this in the future. The solution is to simply
    mark all non-stdio file descriptors as O_CLOEXEC before we spawn "runc
    init".
    
    For libcontainer library users, this could result in unrelated files
    being marked as O_CLOEXEC -- however (for the same reason we are doing
    this for runc), for security reasons those files should've been marked
    as O_CLOEXEC anyway.
    
    Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    (cherry picked from commit 9a72fd4be1fdc22148642ee8cd2adfee0bf575e5)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    cyphar authored and kolyshkin committed Feb 6, 2024
    Configuration menu
    Copy the full SHA
    b655b62 View commit details
    Browse the repository at this point in the history

Commits on Feb 7, 2024

  1. [CI ONLY][rc95] use Cirrus CI for CentOS 7 tests

    1. Add .cirrus.yml to run tests on CentOS 7.
    
    2. Drop fedora/centos7 from .github config (not interested in Fedora,
       and we already test CentOS 7 on Cirrus CI).
    
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    f400684 View commit details
    Browse the repository at this point in the history
  2. [CI ONLY][rc95] tests: replace local hello world bundle with busybox …

    …bundle
    
    Currently only amd64 and arm64v8 tarball have been checked in testdata,
    while busybox bundle is downloaded on fly, and supports multiple architectures.
    
    To enable integration tests for more architectures, the hello world
    bundle is replaced by busybox one.
    
    Signed-off-by: Shengjing Zhu <zhsj@debian.org>
    (cherry picked from commit 66bf371)
    (cherry picked from commit b8ebeec)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    (cherry picked from commit 790886cd4b1b64a21404731111d3999f3fd2390b)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    zhsj authored and kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    351985b View commit details
    Browse the repository at this point in the history
  3. [CI ONLY][rc95] tests/integration/get-images.sh: fix busybox.tar.xz URL

    Fix issue 3699
    
    Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
    (cherry picked from commit 3b6625c)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    (cherry picked from commit 60440c3e7beee6dbd3252ac8aa8458124270bd04)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    AkihiroSuda authored and kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    37212d8 View commit details
    Browse the repository at this point in the history
  4. [CI ONLY][rc95] Explicitly pin busybox and debian downloads

    Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
    (cherry picked from commit 6d28928)
    (cherry picked from commit 53ceeea)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    (cherry picked from commit f1e461e03cd65659395e1ba4a34e53985bd25a9e)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    tianon authored and kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    dab2672 View commit details
    Browse the repository at this point in the history
  5. [CI ONLY][rc95] disable criu in tests

    Most probably this old version of runc no longer works correctly with
    the latest criu. Not installing criu means criu tests will be skipped.
    
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    7b94a93 View commit details
    Browse the repository at this point in the history
  6. [CI ONLY] tests/int/dev: add CAP_SYSLOG to /dev/kmsg tests

    Add CAP_SYSLOG to ensure that /dev/kmsg can be accesses on systems where
    the sysctl kernel.dmesg_restrict = 1.
    
    Signed-off-by: Odin Ugedal <odin@uged.al>
    (cherry picked from commit 6be088d)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    odinuge authored and kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    44d1070 View commit details
    Browse the repository at this point in the history
  7. [CI ONLY] tests/int/no_pivot: fix for new kernels

    The test is failing like this:
    
    	not ok 70 runc run --no-pivot must not expose bare /proc
    	# (in test file tests/integration/no_pivot.bats, line 20)
    	#   `[[ "$output" == *"mount: permission denied"* ]]' failed
    	# runc spec (status=0):
    	#
    	# runc run --no-pivot test_no_pivot (status=1):
    	# unshare: write error: Operation not permitted
    
    Apparently, a recent kernel commit db2e718a47984b9d prevents
    root from doing unshare -r unless it has CAP_SETFPCAP.
    
    Add the capability for this specific test.
    
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    (cherry picked from commit 1bbeada)
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    kolyshkin committed Feb 7, 2024
    Configuration menu
    Copy the full SHA
    29a6896 View commit details
    Browse the repository at this point in the history