Improve rthooks docs #2932

Merged on Sep 25, 2024. File: `docs/content/en/docs/installation/runtime-hooks.md`

See [Tetragon Runtime Hooks]({{< ref "/docs/concepts/runtime-hooks" >}}) for an introduction to
the topic.

## Install Tetragon with Runtime Hooks

We use `minikube` as the example platform because it supports both `cri-o` and `containerd`, but the
same steps can be applied to other platforms.

### Setup Helm

```shell
helm repo add cilium https://helm.cilium.io
helm repo update
```

### Setup cluster

{{< tabpane text=true >}}

{{% tab "minikube with CRI-O" %}}

```shell
minikube start --driver=kvm2 --container-runtime=cri-o
```
{{% /tab %}}

{{% tab "minikube with Containerd" %}}

```shell
minikube start --driver=kvm2 --container-runtime=containerd
```

Tetragon Runtime Hooks use [NRI](https://github.com/containerd/nri). NRI is [enabled by
default](https://github.com/containerd/containerd/blob/main/docs/NRI.md#disabling-nri-support-in-containerd)
starting from containerd version 2.0. For version 1.7, however, it needs to be enabled in the
configuration.

This requires a section such as the following in containerd's configuration file (e.g., `/etc/containerd/config.toml`):
```toml
[plugins."io.containerd.nri.v1.nri"]
  disable = false
  disable_connections = false
  plugin_config_path = "/etc/nri/conf.d"
  plugin_path = "/opt/nri/plugins"
  plugin_registration_timeout = "5s"
  plugin_request_timeout = "2s"
  socket_path = "/var/run/nri/nri.sock"
```

You can use the `tetragon-oci-hook-setup` tool to patch the configuration file:
```shell
minikube ssh cat /etc/containerd/config.toml > /tmp/old-config.toml
./contrib/tetragon-rthooks/tetragon-oci-hook-setup patch-containerd-conf enable-nri --config-file=/tmp/old-config.toml --output=/tmp/new-config.toml
minikube cp /tmp/new-config.toml /etc/containerd/config.toml
minikube ssh sudo systemctl restart containerd
```

{{% /tab %}}

{{< /tabpane >}}
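With containerd 1.7, you may want to confirm whether NRI is already enabled before (or after) patching. The following is a rough, minimal sketch that assumes the table layout shown above. `nri_enabled` is a hypothetical helper, not part of Tetragon. It only looks for `disable = false` inside the `[plugins."io.containerd.nri.v1.nri"]` table rather than parsing TOML. It is meant for 1.7: on containerd 2.0, where NRI is enabled by default, a missing section does not mean NRI is off.

```shell
# nri_enabled FILE: succeed if FILE contains a [plugins."io.containerd.nri.v1.nri"]
# table with `disable = false`. Rough line-based check, not a real TOML parser.
nri_enabled() {
  awk '
    # Any table header starts a new table; remember whether it is the NRI one.
    /^[ \t]*\[/ { in_nri = ($0 ~ /\[plugins\."io\.containerd\.nri\.v1\.nri"\]/) }
    # Inside the NRI table, look for "disable = false" (not "disable_connections").
    in_nri && /^[ \t]*disable[ \t]*=[ \t]*false/ { found = 1 }
    END { exit !found }
  ' "$1"
}

# Example usage with minikube (not run here):
#   minikube ssh cat /etc/containerd/config.toml > /tmp/current-config.toml
#   nri_enabled /tmp/current-config.toml && echo "NRI enabled" || echo "NRI not enabled"
```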

### Install Tetragon

{{< tabpane lang=shell >}}
{{< tab "CRI-O (oci-hooks)" >}}
helm install \
  --namespace kube-system \
  --set rthooks.enabled=true \
  --set rthooks.interface=oci-hooks \
  tetragon cilium/tetragon
{{< /tab >}}
{{< tab "Containerd (nri-hook)" >}}
helm install \
  --namespace kube-system \
  --set rthooks.enabled=true \
  --set rthooks.interface=nri-hook \
  tetragon cilium/tetragon
{{< /tab >}}
{{< /tabpane >}}
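If you are unsure which container runtime your nodes use, you can derive the `rthooks.interface` value from the runtime each node reports in `status.nodeInfo.containerRuntimeVersion`. This field has the form `<runtime>://<version>`, for example `cri-o://...` or `containerd://...`. The following is a minimal sketch that assumes all nodes use the same runtime. `rthooks_interface_for` is a hypothetical helper and only handles the two runtimes covered on this page.

```shell
# rthooks_interface_for RUNTIME_VERSION: map a node's containerRuntimeVersion string
# ("<runtime>://<version>") to the rthooks.interface value used above.
# Hypothetical helper; only CRI-O and containerd are handled.
rthooks_interface_for() {
  case "$1" in
    cri-o://*) echo oci-hooks ;;
    containerd://*) echo nri-hook ;;
    *)
      echo "unsupported container runtime: $1" >&2
      return 1
      ;;
  esac
}

# Example usage (not run here; reads the runtime of the first node only):
#   runtime=$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.containerRuntimeVersion}')
#   iface=$(rthooks_interface_for "$runtime") && helm install \
#     --namespace kube-system \
#     --set rthooks.enabled=true \
#     --set rthooks.interface="$iface" \
#     tetragon cilium/tetragon
```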

```shell
kubectl -n kube-system get pods | grep tetragon
```

The output should be similar to:
```
tetragon-hpjwq 2/2 Running 0 2m42s
tetragon-operator-664ddc8957-9lmd2 1/1 Running 0 2m42s
tetragon-rthooks-m24xr 1/1 Running 0 2m42s
```
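To script this check, you can use the following minimal sketch. It assumes the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE), and `all_pods_ready` is a hypothetical helper. It succeeds only when every listed pod is `Running` with all of its containers ready.

```shell
# all_pods_ready: read `kubectl get pods` lines (without the header) on stdin and
# succeed only if there is at least one line and every line has STATUS "Running"
# and a READY column of the form n/n.
all_pods_ready() {
  awk '
    {
      n++
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad++
    }
    END { exit (n == 0 || bad > 0) }
  '
}

# Example usage (not run here):
#   kubectl -n kube-system get pods --no-headers | grep tetragon | all_pods_ready && echo "Tetragon pods ready"
```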

### Test Runtime Hooks

Start a pod:

```shell
kubectl run test --image=debian --rm -it -- /bin/bash
```

Check the hook log on the minikube node:
```shell
minikube ssh 'tail -1 /opt/tetragon/tetragon-oci-hook.log'
```

With CRI-O, the output should be similar to:
```json
{"time":"2024-07-01T10:57:21.435689144Z","level":"INFO","msg":"hook request to agent succeeded","hook":"create-container","start-time":"2024-07-01T10:57:21.433755984Z","req-cgroups":"/kubepods/besteffort/podd4e74de2-0db8-4143-ae55-695b2489c727/crio-828977b42e3149b502b31708778d0c057efbce038af80d0882ed3e0cb0ff8796","req-rootdir":"/run/containers/storage/overlay-containers/828977b42e3149b502b31708778d0c057efbce038af80d0882ed3e0cb0ff8796/userdata","req-containerName":"test"}
```
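To check the result from a script instead of reading the JSON by eye, you can use the following minimal sketch. `check_hook_log` is a hypothetical helper. It relies only on the two `msg` strings that appear in the hook logs on this page, and it reports whether the most recent hook request succeeded.

```shell
# check_hook_log [FILE]: print "ok" if the most recent hook request in FILE succeeded,
# "failed" if it failed, or "unknown" (exit 1) if no hook request is found.
check_hook_log() {
  local log="${1:-/opt/tetragon/tetragon-oci-hook.log}"
  local last
  # Keep only hook request lines, ignoring e.g. the failCheck follow-up messages.
  last=$(grep -E '"msg":"hook request to (the )?agent (succeeded|failed)"' "$log" | tail -n 1)
  case "$last" in
    *'"msg":"hook request to agent succeeded"'*) echo ok ;;
    *'"msg":"hook request to the agent failed"'*) echo failed ;;
    *) echo unknown; return 1 ;;
  esac
}

# Example usage with minikube (not run here):
#   minikube ssh 'cat /opt/tetragon/tetragon-oci-hook.log' > /tmp/tetragon-oci-hook.log
#   check_hook_log /tmp/tetragon-oci-hook.log
```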

## Configuring Runtime Hooks installation

### Installation directory (`installDir`)

For Tetragon runtime hooks to work, a binary (`tetragon-oci-hook`) must be installed on the
host. The `tetragon-rthooks` DaemonSet performs the installation and, by default, places the binary in
`/opt/tetragon`.

On some systems, however, `/opt` is mounted read-only, which results in
errors such as:

```
Warning FailedMount 8s (x5 over 15s) kubelet MountVolume.SetUp failed for volume "oci-hook-install-path" : mkdir /opt/tetragon: read-only file system
```

You can use the `rthooks.installDir` Helm value to select a different location. For example:

```
--set rthooks.installDir=/run/tetragon
```
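To decide whether you need this setting, you can inspect the mount options of the filesystem that holds `/opt` on the node. The following is a minimal sketch that assumes `findmnt` (from util-linux) is available on the node. `pick_install_dir` is a hypothetical helper, and `/run/tetragon` is simply the example location used above.

```shell
# pick_install_dir OPTIONS: given the mount options of the filesystem holding /opt
# (e.g. "rw,relatime" or "ro,relatime"), print a directory for rthooks.installDir.
# Hypothetical helper; /run/tetragon is only the example location from this page.
pick_install_dir() {
  case ",$1," in
    *,ro,*) echo /run/tetragon ;;  # read-only: /opt/tetragon cannot be created
    *) echo /opt/tetragon ;;       # writable: keep the default
  esac
}

# Example usage with minikube (not run here):
#   opts=$(minikube ssh 'findmnt -no OPTIONS --target /opt' | tr -d '\r')
#   helm upgrade tetragon cilium/tetragon --namespace kube-system --reuse-values \
#     --set rthooks.installDir="$(pick_install_dir "$opts")"
```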


### Failure check (`failAllowNamespaces`)

By default, `tetragon-oci-hook` logs to `/opt/tetragon/tetragon-oci-hook.log`.
When the hook cannot reach the Tetragon agent, this file contains messages such as the following:

```json
{"time":"2024-03-05T15:18:52.669044463Z","level":"WARN","msg":"hook request to the agent failed","hook":"create-container","start-time":"2024-03-05T15:18:42.667916779Z","req-cgroups":"/kubepods/besteffort/pod43ec7f32-3c9f-429f-a01c-fbaafff9f8e1/crio-1d18fd58f0879f6152a1c421f8f1e0987845394ee17001a16bee2df441c112f3","req-rootdir":"/run/containers/storage/overlay-containers/1d18fd58f0879f6152a1c421f8f1e0987845394ee17001a16bee2df441c112f3/userdata","err":"connecting to agent (context deadline exceeded) failed: unix:///var/run/cilium/tetragon/tetragon.sock"}
{"time":"2024-03-05T15:18:52.66912411Z","level":"INFO","msg":"failCheck determined that we should not fail this container, even if there was an error","hook":"create-container","start-time":"2024-03-05T15:18:42.667916779Z"}
{"time":"2024-03-05T15:18:53.01093915Z","level":"WARN","msg":"hook request to the agent failed","hook":"create-container","start-time":"2024-03-05T15:18:43.01005032Z","req-cgroups":"/kubepods/burstable/pod60f971e6-ac38-4aa0-b2d3-549333b2c803/crio-c0bf4e38bfa4ed5c58dd314d505f8b6a0f513d2f2de4dc4aa86a55c7c3e963ab","req-rootdir":"/run/containers/storage/overlay-containers/c0bf4e38bfa4ed5c58dd314d505f8b6a0f513d2f2de4dc4aa86a55c7c3e963ab/userdata","err":"connecting to agent (context deadline exceeded) failed: unix:///var/run/cilium/tetragon/tetragon.sock"}
{"time":"2024-03-05T15:18:53.010999098Z","level":"INFO","msg":"failCheck determined that we should not fail this container, even if there was an error","hook":"create-container","start-time":"2024-03-05T15:18:43.01005032Z"}
{"time":"2024-03-05T15:19:04.034580703Z","level":"WARN","msg":"hook request to the agent failed","hook":"create-container","start-time":"2024-03-05T15:18:54.033449685Z","req-cgroups":"/kubepods/besteffort/pod43ec7f32-3c9f-429f-a01c-fbaafff9f8e1/crio-d95e61f118557afdf3713362b9034231fee9bd7033fc8e7cc17d1efccac6f54f","req-rootdir":"/run/containers/storage/overlay-containers/d95e61f118557afdf3713362b9034231fee9bd7033fc8e7cc17d1efccac6f54f/userdata","err":"connecting to agent (context deadline exceeded) failed: unix:///var/run/cilium/tetragon/tetragon.sock"}
{"time":"2024-03-05T15:19:04.03463995Z","level":"INFO","msg":"failCheck determined that we should not fail this container, even if there was an error","hook":"create-container","start-time":"2024-03-05T15:18:54.033449685Z"}
```

To understand these messages, consider what `tetragon-oci-hook` should do if it
cannot contact the Tetragon agent.

You may want to prevent certain workloads from running in that case. For other workloads (for example, the
Tetragon pod itself) you probably want the opposite: to let them start. To this end,
`tetragon-oci-hook` checks the container annotations and, by default, does not fail a container if it
belongs to the same namespace as Tetragon. The previous messages concern the Tetragon containers
(`tetragon-operator` and `tetragon`), and they indicate that the hook chose not to prevent these
containers from starting.

You can also use the `rthooks.failAllowNamespaces` option to specify additional namespaces in which
containers are not failed if the Tetragon agent cannot be contacted.

For example:
```yaml
rthooks:
  enabled: true
  failAllowNamespaces: namespace1,namespace2
```
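The decision described above can be summarized as a small function. This is a hypothetical model of the documented behavior, not the hook's actual code. It assumes the container's namespace is already known (the hook derives it from container annotations, which are not modeled here). It also assumes that containers outside the allowed namespaces are failed when the agent is unreachable. `fail_check` is a hypothetical helper, and its third argument mirrors the comma-separated `rthooks.failAllowNamespaces` value.

```shell
# fail_check NAMESPACE TETRAGON_NAMESPACE ALLOW_LIST
# Hypothetical model of the documented policy when the agent cannot be contacted:
# print "allow" if the container's namespace is Tetragon's own namespace or appears
# in the comma-separated ALLOW_LIST (cf. rthooks.failAllowNamespaces), else "fail".
fail_check() {
  local ns="$1" tetragon_ns="$2" allow_list="$3" n
  if [ "$ns" = "$tetragon_ns" ]; then
    echo allow
    return
  fi
  local IFS=,  # split ALLOW_LIST on commas only
  for n in $allow_list; do
    if [ "$ns" = "$n" ]; then
      echo allow
      return
    fi
  done
  echo fail
}

# Examples:
#   fail_check kube-system kube-system ""                   # allow (Tetragon's namespace)
#   fail_check namespace1 kube-system namespace1,namespace2 # allow (listed namespace)
#   fail_check default kube-system namespace1,namespace2    # fail
```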