Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMA hashes in LSM events #2818

Merged
merged 7 commits into from
Oct 4, 2024
Merged

IMA hashes in LSM events #2818

merged 7 commits into from
Oct 4, 2024

Conversation

anfedotoff
Copy link
Contributor

@anfedotoff anfedotoff commented Aug 16, 2024

Adding support for collecting file hashes calculated by Linux integrity subsystem (IMA). IMA hashes will be contained in LSM events. To make it work you need to:

Limitations of bpf_ima helpers (from #2409):

  1. bpf_ima_inode_hash/bpf_ima_file_hash can be called only from BPF_F_SLEEPABLE lsm programs: lsm.s.
  2. lsm.s programs are strictly limited for maps usage. Only arrays, hashes and ringbuffer are allowed. We can't use BPF_MAP_TYPE_PROG_ARRAY and BPF_MAP_TYPE_PERF_EVENT_ARRAY. It means for us that we can't use bpf-to-bpf calls from lsm.s programs and send events from lsm.s programs with perfbuffer.
  3. only trusted pointers are allowed to call bpf_ima helpers. Some verifier error example:
    38: (85) call bpf_ima_file_hash#193 R1 type=scalar expected=ptr_, trusted_ptr.

Approach from #2409 is meant to overcome limitations above has a race condition. User mode handler can receive LSM event before IMA hash is stored to the separate map.

This approach was redesigned to get rid of race condition. The idea is follows. We split LSM generic sensor into three parts, which are loaded using bpf trampolines feature.

  1. bpf_generic_lsm_core does everything but sending LSM event. It also initializes an empty ima_hash, which is used in later trampoline calls.
  2. bpf_generic_lsm_ima_* (optional) is triggered after bpf_generic_lsm_core, calculates IMA hash and stores it in separate hash map.
  3. bpf_generic_lsm_output just sends event using perf buffer. Also at this stage IMA hash is collected from the map (stored in p. 2).

Using this scheme we can avoid race condition.
Note: if we don't need to collect IMA hash, step 2 is skipped.

@anfedotoff anfedotoff requested a review from a team as a code owner August 16, 2024 13:52
@anfedotoff anfedotoff requested a review from tixxdz August 16, 2024 13:52
@anfedotoff anfedotoff marked this pull request as draft August 16, 2024 13:52
@anfedotoff anfedotoff force-pushed the ima-hash-action branch 2 times, most recently from 15d918b to 758ec7e Compare August 16, 2024 15:08
Copy link

netlify bot commented Aug 16, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit d5c5ec2
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/66feb22fa7347900086e433f
😎 Deploy Preview https://deploy-preview-2818--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@anfedotoff anfedotoff force-pushed the ima-hash-action branch 2 times, most recently from 3ec1c3d to 7525121 Compare August 20, 2024 11:17
@anfedotoff anfedotoff force-pushed the ima-hash-action branch 6 times, most recently from f9e5ac0 to 20a3c09 Compare August 29, 2024 10:23
@anfedotoff anfedotoff force-pushed the ima-hash-action branch 2 times, most recently from b84e2bd to ef33031 Compare September 3, 2024 12:29
@anfedotoff anfedotoff changed the title WIP: IMA hashes in LSM events IMA hashes in LSM events Sep 3, 2024
@anfedotoff anfedotoff marked this pull request as ready for review September 3, 2024 12:30
@anfedotoff
Copy link
Contributor Author

anfedotoff commented Sep 4, 2024

@kkourt , hi! May I ask you for a code review? Next Monday, on Tetragon community meeting I'll show a demo. I want to talk about some design decisions I made. Also I'm going to speak about problems that I overcame and those are still exists.

@kkourt kkourt self-requested a review September 4, 2024 12:51
@kkourt
Copy link
Contributor

kkourt commented Sep 4, 2024

@kkourt , hi! May I ask you for a code review? Next Monday, on Tetragon community meeting I'll show a demo. I want to talk about some design decisions I made. Also I'm going to speak about problems that I overcame and those are still exists.

Hello :)

Thanks for the heads up! Will try and have a look before the community meeting. Looking forward to the demo!

@anfedotoff anfedotoff force-pushed the ima-hash-action branch 2 times, most recently from 2ef75a3 to 58c2771 Compare September 6, 2024 13:56
@kkourt kkourt added the release-note/major This PR introduces major new functionality label Sep 9, 2024
Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anfedotoff! I had a first look. Please find some comments below.

bpf/process/bpf_generic_lsm.c Outdated Show resolved Hide resolved
pkg/sensors/program/loader.go Outdated Show resolved Hide resolved
@olsajiri
Copy link
Contributor

I refreshed on LSM programs and I think we can do this differently

so LSM programs are attached through trampolines on specific bpf_lsm_XXX
function which is invoked as part of the lsm modules call chain on particular
lsm call back

and trampoline allows to attach and invoke multiple bpf programs to
specific function (or lsm hook)

so I think we could have the sleepable lsm hooks invoked first to get the
hash and update the map and then we will have generic lsm invoked that
would run the hash action, which would get the hash from the map and
send the event

  bpf_lsm_XXX
    -> trampoline
         run lsm.s/file_open
               map_update_elem(&ima_hash_map, &e->current, &hash, BPF_ANY);
         run lsm/generic_lsm
               map_lookup_elem(&ima_hash_map, &e->current, ...)

to ensure the generic lsm is invoked AFTER lsm.s/XXX hook we can rely
on kernel using standard lists when queuing programs for trampoline,
so you should just attach lsm.s hook first and then generic lsm

I guess there might be some hiccups when you'll implement that
(there always are) but so far it seems straightforward

WDYT?

@anfedotoff
Copy link
Contributor Author

I refreshed on LSM programs and I think we can do this differently

so LSM programs are attached through trampolines on specific bpf_lsm_XXX function which is invoked as part of the lsm modules call chain on particular lsm call back

and trampoline allows to attach and invoke multiple bpf programs to specific function (or lsm hook)

so I think we could have the sleepable lsm hooks invoked first to get the hash and update the map and then we will have generic lsm invoked that would run the hash action, which would get the hash from the map and send the event

  bpf_lsm_XXX
    -> trampoline
         run lsm.s/file_open
               map_update_elem(&ima_hash_map, &e->current, &hash, BPF_ANY);
         run lsm/generic_lsm
               map_lookup_elem(&ima_hash_map, &e->current, ...)

to ensure the generic lsm is invoked AFTER lsm.s/XXX hook we can rely on kernel using standard lists when queuing programs for trampoline, so you should just attach lsm.s hook first and then generic lsm

I guess there might be some hiccups when you'll implement that (there always are) but so far it seems straightforward

WDYT?

Thanks for making it clear!
IIUC, we will calculate hash on every triggered hook. It might cause some performance issues. Now our calls of bpf_ima_file_hash are filtered by tracing policy. When we move lsm.s/ima_* programe to the top, we will call helper no matter if tracing policy is satisfied or not. I think, it will be OK, if ima_policy is satisfied. Let's consider the following example:

We have a default ima_policy=tcb:

measure func=MMAP_CHECK mask=MAY_EXEC
measure func=BPRM_CHECK mask=MAY_EXEC # binary executed
measure func=FILE_CHECK mask=^MAY_READ euid=0
measure func=FILE_CHECK mask=^MAY_READ uid=0 # root opened r/o, r/w
measure func=MODULE_CHECK
measure func=FIRMWARE_CHECK
measure func=POLICY_CHECK

In particular, this ima_policy will measure hashes when files owned by root are opened for reading. So, in solution that you proposed will call bpf_ima_file_hash on every hook. The question is what helper will return, if ima_policy is not satisfied? If error, than it is OK, I think. But if it calculates hash despite the ima_policy it is not OK. I think, I can check this.

To sum up, I think, you solution is a nice compromise if ima_policy is taking to account during bpf_ima_file_hash call. What do you think about this?

@anfedotoff
Copy link
Contributor Author

It works! No race condition! ✨

Copy link
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! left some comment, thanks

bpf/process/bpf_generic_lsm_core.c Outdated Show resolved Hide resolved
bpf/process/types/basic.h Outdated Show resolved Hide resolved
pkg/sensors/tracing/genericlsm.go Show resolved Hide resolved
bpf/process/types/basic.h Show resolved Hide resolved
pkg/sensors/tracing/lsm_test.go Show resolved Hide resolved
@anfedotoff anfedotoff force-pushed the ima-hash-action branch 2 times, most recently from 5f9d458 to 3e75f2f Compare September 24, 2024 13:22
bpf/process/types/basic.h Outdated Show resolved Hide resolved
bpf/process/bpf_generic_lsm_ima_bprm.c Outdated Show resolved Hide resolved
bpf/process/bpf_generic_lsm_core.c Outdated Show resolved Hide resolved
bpf/process/bpf_generic_lsm_output.c Show resolved Hide resolved
cmd/tetra/getevents/getevents.go Show resolved Hide resolved
Copy link
Contributor

@olsajiri olsajiri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leeks great, thanks

Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me overall!

I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful!

Thanks!

bpf/process/bpf_generic_kprobe.c Show resolved Hide resolved
bpf/lib/generic.h Outdated Show resolved Hide resolved
bpf/process/bpf_generic_lsm_ima_bprm.c Outdated Show resolved Hide resolved
api/v1/tetragon/tetragon.proto Show resolved Hide resolved
@anfedotoff
Copy link
Contributor Author

Thanks, this looks good to me overall!

I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful!

Thanks!

Sure! I updated the PR description. I hope, now it became more clear about what we've done to avoid the race condition.

@kkourt
Copy link
Contributor

kkourt commented Oct 3, 2024

Thanks, this looks good to me overall!
I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful!
Thanks!

Sure! I updated the PR description. I hope, now it became more clear about what we've done to avoid the race condition.

Great Thanks! Can you please split the change that changes the return value into a separate commit with a short explanation of why it OK to do as discussed in another comment? Other than that, LGTM!

Before:
if event is to be posted tail call from generic_action occurs.
Otherwise function returns 0.
Now:
Previous logic is saved. But now the value (true/false) is returned
from generic_action in case tail call is failed.

Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
Due to restrictions of bpf sleepable programs (no tailcalls,
no perf buffer and per_cpu maps, etc.), we need to split
generic LSM sensor into three parts (collections)
and load them in this order:

- bpf_generic_output sends event using perf buffer
- bpf_generic_lsm_ima_*  calculates hash using IMA helpers
- bpf_generic_lsm_core does everything else

Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
Adding support for IMA hash collection in Post Action.
Adding IMA hashes in LSM events. Hash is represented by
a string algorithm:value. Support loading lsm.s/generic_lsm_ima_* programs.

Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
The output looks similar to this:

🔒 LSM     user-nix /usr/bin/zsh bprm_check_security
   /usr/bin/git sha256:29aa689f38158d2e8941fa54e436f0260890af31cecad1e8799e5c2df7bc1ecc
🔒 LSM     user-nix /usr/bin/zsh bprm_check_security
   /usr/bin/ls sha256:8696974df4fc39af88ee23e307139afc533064f976da82172de823c3ad66f444
🔒 LSM     user-nix /usr/bin/zsh bprm_check_security
   /usr/bin/wc sha256:5a91d203948e44a538e8e5179e712f37fa3264593748e5ce0f888b600447d004

Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
Adding test for ImaHash Post action.

Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
Add imaHash post action to lsm_bprm_check.yaml

Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
Signed-off-by: Andrei Fedotov <anfedotoff@yandex-team.ru>
@anfedotoff
Copy link
Contributor Author

Thanks, this looks good to me overall!
I have some requests for changes. Please let me know if they do not make sense to you. I have not followed the discussion with Jiri. Can you please add a summary of the discussion in the PR header. Having a discussion of what the previous approach was, why it did not work, and what this PR does that works would be very helpful!
Thanks!

Sure! I updated the PR description. I hope, now it became more clear about what we've done to avoid the race condition.

Great Thanks! Can you please split the change that changes the return value into a separate commit with a short explanation of why it OK to do as discussed in another comment? Other than that, LGTM!

@kkourt, can you look one more time plz)?

Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks!

@kkourt
Copy link
Contributor

kkourt commented Oct 4, 2024

All tests ✅, merging! Thanks!

@kkourt kkourt merged commit e472088 into cilium:main Oct 4, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/major This PR introduces major new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants