[SELinux] RKE2 provisioning #1362

Open · Tracked by #251
fgiudici opened this issue Apr 16, 2024 · 6 comments
@fgiudici
Member

fgiudici commented Apr 16, 2024

Check the requirements for successful RKE2 provisioning and operation. Verify whether the rke2-selinux module for the targeted policy is enough.
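As a starting point, a minimal check on a provisioned node could look like this (a hedged sketch using standard SELinux tooling; not part of the original issue, and the package query assumes an RPM-based image):

getenforce                            # expect "Enforcing" with the targeted policy
sestatus | grep "Loaded policy"       # confirm the targeted policy is the one loaded
semodule -l | grep rke2               # is the rke2-selinux policy module installed?
rpm -q rke2-selinux container-selinux # are the policy packages present?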

@fgiudici fgiudici mentioned this issue Apr 16, 2024
@kkaempf kkaempf added this to the Micro6.1 milestone Apr 23, 2024
@kkaempf kkaempf added the kind/enhancement (new feature or request) label Apr 23, 2024
@anmazzotti
Contributor

anmazzotti commented May 1, 2024

I gave this a try, building a dev image (Tumbleweed based) with all the latest goodies.

I additionally set the selinux flag in the Cluster config.
This should tell containerd to run in SELinux mode:

kind: Cluster
apiVersion: provisioning.cattle.io/v1
metadata:
  name: volcano
  namespace: fleet-default
spec:
  rkeConfig:
    machineGlobalConfig:
      selinux: true
      debug: true
  kubernetesVersion: v1.27.12+rke2r1

I installed the rke2-selinux package as well.

Also note that rke2 ships kubectl, ctr, and crictl in /var/lib/rancher/rke2/bin. See the docs for usage.
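For example, per the RKE2 docs (a sketch; the paths below are the RKE2 defaults and may differ on a customized image):

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml                       # kubeconfig written by the rke2 server
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml  # crictl config pointing at rke2's containerd socket
/var/lib/rancher/rke2/bin/kubectl get nodes
/var/lib/rancher/rke2/bin/crictl ps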

Elemental installation and provisioning of the rke2 node work fine; however, rke2 fails to start because the etcd container won't start.
When inspecting it, I found the following:

test-loopdev-c00616c7-e1e9-422c-a786-9bcd826cd8fd:~ # /var/lib/rancher/rke2/bin/crictl logs ecaad625d4c86
{"level":"info","ts":"2024-05-01T09:48:40.917504Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--config-file=/var/lib/rancher/rke2/server/db/etcd/config"]}
{"level":"warn","ts":"2024-05-01T09:48:40.917934Z","caller":"etcdmain/etcd.go:75","msg":"failed to verify flags","error":"open /var/lib/rancher/rke2/server/db/etcd/config: permission denied"}
test-loopdev-c00616c7-e1e9-422c-a786-9bcd826cd8fd:~ # ls -alZ /var/lib/rancher/rke2/server/db/etcd/config
-rw-------. 1 root root system_u:object_r:var_lib_t:s0 1313 May  1 09:48 /var/lib/rancher/rke2/server/db/etcd/config

This seems to be the same issue reported in rancher/rke2#1494, which was also reopened as rancher/rke2-selinux#20.

@anmazzotti
Contributor

So maybe I hit the same error, but for a different reason.
What I noticed is that, right after provisioning, the labels do not seem to be applied correctly:

elemental:~ # ls -alZ /var/lib/rancher/rke2/
total 24
drwxr-xr-x. 6 root root system_u:object_r:var_lib_t:s0 4096 May  2 14:01 .
drwxr-xr-x. 5 root root system_u:object_r:var_lib_t:s0 4096 May  2 14:01 ..
drwxr-xr-x. 7 root root system_u:object_r:var_lib_t:s0 4096 May  2 14:01 agent
lrwxrwxrwx. 1 root root system_u:object_r:var_lib_t:s0   59 May  2 14:01 bin -> /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin
drwxr-xr-x. 3 root root system_u:object_r:var_lib_t:s0 4096 May  2 14:01 data
drwxr-xr-x. 7 root root system_u:object_r:var_lib_t:s0 4096 May  2 14:01 server
drwxr-xr-x. 2 root root system_u:object_r:var_lib_t:s0 4096 May  2 14:01 system-agent-installer

But after a reboot (which relabels, thanks to our small SELinux support config) or after a manual restorecon -Frv invocation, they look fine:

elemental:~ # ls -alZ /var/lib/rancher/rke2
total 24
drwxr-xr-x. 6 root root system_u:object_r:container_var_lib_t:s0      4096 May  2 14:04 .
drwxr-xr-x. 5 root root system_u:object_r:var_lib_t:s0                4096 May  2 14:01 ..
drwxr-xr-x. 7 root root system_u:object_r:container_var_lib_t:s0      4096 May  2 14:05 agent
lrwxrwxrwx. 1 root root system_u:object_r:container_var_lib_t:s0        59 May  2 14:04 bin -> /var/lib/rancher/rke2/data/v1.27.12-rke2r1-3e47c5e13be8/bin
drwxr-xr-x. 3 root root system_u:object_r:container_runtime_exec_t:s0 4096 May  2 14:01 data
drwxr-xr-x. 7 root root system_u:object_r:container_var_lib_t:s0      4096 May  2 14:01 server
drwxr-xr-x. 2 root root system_u:object_r:container_var_lib_t:s0      4096 May  2 14:04 system-agent-installer

The node is not ready yet, so there are most likely other issues (with the CNI installation?), but at least etcd is running:

elemental:~ # /var/lib/rancher/rke2/bin/crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                       ATTEMPT             POD ID              POD
fb24ed16d5b09       ef5391197931d       2 minutes ago       Running             kube-proxy                 0                   d11f870b9e345       kube-proxy-test-loopdev-c00616c7-e1e9-422c-a786-9bcd826cd8fd
4476863c0bca3       00bad7e3c2263       3 minutes ago       Running             cloud-controller-manager   1                   c7ab87198ecec       cloud-controller-manager-test-loopdev-c00616c7-e1e9-422c-a786-9bcd826cd8fd
42fc5ef4c52b3       c6b7a4f2f79b2       3 minutes ago       Running             etcd                       4                   1c17ec258c2d5       etcd-test-loopdev-c00616c7-e1e9-422c-a786-9bcd826cd8fd
e5c3342c35b15       ef5391197931d       3 minutes ago       Running             kube-apiserver             0                   963dcad7207f7       kube-apiserver-test-loopdev-c00616c7-e1e9-422c-a786-9bcd826cd8fd

The rke2 provisioning plan uses system-agent-installer-rke2, which unpacks a tarball. I guess this is where the labeling goes wrong.
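One way to verify that guess (a hedged sketch with standard SELinux tooling, not part of the original comment) is to compare the labels the loaded policy expects with what is actually on disk right after the plan runs:

matchpathcon /var/lib/rancher/rke2/bin        # label the loaded policy expects for this path
ls -dZ /var/lib/rancher/rke2/bin              # label actually on disk after the tarball is unpacked
restorecon -n -R -v /var/lib/rancher/rke2     # dry run: report mismatches without changing anything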

@davidcassany
Contributor

I believe all these are known issues related to rancher/rke2#3888 (comment); in fact we are installing RKE2 from a tarball. We do not support RPM installation at runtime, as this would imply having a mutable /usr. In addition, the tarball is not installed into /usr/local because that is a mountpoint (I can't understand this check 😕 and setup of the install script), so it falls back to /opt, and I am not sure whether installing into /opt has any impact on SELinux. I guess we could check whether the policy covers the paths under /opt and/or /usr/local.
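A hedged way to check that (not in the original comment; standard SELinux userspace tools, and the /opt/rke2 path is an assumption about where the tarball lands):

semanage fcontext -l | grep rke2                      # file context rules shipped by the rke2-selinux policy
matchpathcon /opt/rke2/bin/rke2 /usr/local/bin/rke2   # expected labels for the two candidate install prefixes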

While provisioning we get the error message:

SELinux is enabled for rke2 but process is not running in 'context_runtime_t', rke2-selinux policy may need to be applied

However, the policy is already installed and active, and rebooting does actually fix the problem. At boot we relabel every RW path in the system, which includes the installation in /opt. So it looks like rke2 is started before its files get the appropriate labels defined in the policy.
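A quick way to confirm that ordering problem (not from the comment; the binary path is an assumption) is to look at the domain of the running process and the label of the binary it was started from:

ps -eZ | grep rke2         # SELinux domain the rke2 process is actually running in
ls -Z /opt/rke2/bin/rke2   # label of the binary; a generic label here prevents the expected domain transition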

@davidcassany davidcassany added the status/blocked (issue depends on another one) label May 14, 2024
@davidcassany
Contributor

davidcassany commented May 14, 2024

Marking as blocked, as there is currently no support for RKE2 when it is installed directly from the tarball, which is the procedure used within Elemental. In order to support RKE2 we could either build RKE2 images (with RKE2 preinstalled) or add SELinux support for RKE2 installation from a tarball in Rancher provisioning.

From the tests I did, what's missing are the correct labels under /var/lib/rancher; more specifically, I saw that, at least, the executables in /var/lib/rancher/rke2/bin are not properly labeled. Couldn't it be a bug in the rke2-selinux policy? It feels like the system should be able to simply untar and run with the appropriate labels, without the need for special hacks. In my tests, simply running restorecon -F -R /var/lib/rancher and systemctl restart rke2-server fixes the issue. It looks like whatever populates /var/lib/rancher does it in a way that SELinux labels are not set appropriately.
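For reference, the workaround mentioned above as it would be run on an affected server node (agent nodes would restart rke2-agent instead):

restorecon -F -R /var/lib/rancher   # re-apply the labels defined by the installed policy
systemctl restart rke2-server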

@mbologna
Member

=> Create issues upstream for fixing the tarball-driven install.

@anmazzotti
Contributor

The issue already exists upstream: rancher/rke2-selinux#64
This should fix the system-agent-installer-rke2 tarball-driven install.
