PVF: add landlock sandboxing #7303

mrcnski · 2023-05-28T17:37:47Z

Pull Request

Overview

Landlock is a new sandboxing mechanism in Linux. Unfortunately, it only sandboxes filesystem access right now. It also requires kernel 5.13+, so not all kernel versions support it. However, if support is not enabled, calling landlock is simply a no-op. It was easy to add and is nice to have in the interim until we have full sandboxing.

TODO

- Implement check_enabled
  - (Waiting on "Get prospective restriction status, without applying restrictions", landlock-lsm/rust-landlock#36.)
  - In lieu of that ticket we can just spawn a dummy thread and try to restrict it.
  - Update: Went ahead and implemented with a dummy thread for now.
- ~~Enable telemetry (to see how many validators have landlock enabled).~~
  - The telemetry data would ideally be private so that attackers can't exploit this data. I don't think we currently support private telemetry and it shouldn't really be a blocker for this PR.
- ~~Add section to validator's guide about upgrading the kernel with landlock support.~~
  - Decided not to do this because the validator guide recommends using a VPS, so it wouldn't be very easy to upgrade: "The most common way for a beginner to run a validator is on a cloud server running Linux. You may choose whatever VPS provider that your prefer."
  - [Jun 6 2023] On further research, most distros should have landlock enabled.
  - [Jun 6 2023] Added note to the guide (w3f/polkadot-wiki#4872).
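For illustration, the dummy-thread approach to check_enabled could look roughly like the sketch below. This is not the code in this PR: `RestrictOutcome` and the closure parameter are hypothetical stand-ins for whatever the real restriction call returns, and the sketch assumes the restriction only affects the thread that applies it, which is why it runs on a throwaway thread.

```rust
use std::thread;

/// Hypothetical result of a restriction attempt. The real code would map the
/// sandboxing library's status onto something like this.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestrictOutcome {
    FullyEnforced,
    NotEnforced,
}

/// Runs `try_restrict` on a short-lived dummy thread and reports whether the
/// restriction was fully enforced there.
///
/// Assumption: the restriction applies to the calling thread only, so the
/// thread that calls `check_enabled` stays unrestricted.
pub fn check_enabled<F>(try_restrict: F) -> bool
where
    F: FnOnce() -> Result<RestrictOutcome, String> + Send + 'static,
{
    let handle = thread::spawn(try_restrict);
    // An error from the restriction call, or a panic in the dummy thread,
    // both count as "not enabled".
    matches!(handle.join(), Ok(Ok(RestrictOutcome::FullyEnforced)))
}
```

A caller would pass the real restriction function as the closure, for example `check_enabled(|| Ok(RestrictOutcome::FullyEnforced))` in a test double.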

I can emit a warning (once on host startup, I think), but the way the validator guide is written right now, I'm not sure how much control a validator has over their kernel version/configuration:

"The most common way for a beginner to run a validator is on a cloud server running Linux. You may choose whatever VPS provider that your prefer."

I guess we could recommend "try to find a VPS with landlock enabled", but unless we give specific recommendations, it might be hard to find this info.

For people not running on a VPS, I found a guide for enabling landlock, but I haven't tested it, and I'm not a Linux expert. I'm not sure we should be officially recommending random guides from the internet...

Advice would be really appreciated. Let me know what you think!

I can emit a warning (once on host startup I think)

Hmm, so given the above, I'm not sure how actionable a warning would be. It could just be noise for most operators.

I found out that landlock is enabled by default on most distros, including Ubuntu 22.04 LTS (which is what we recommend). And I don't see any reason why a VPS would explicitly opt out of a security feature. So, we can simply emit a warning like this:

WARNING: Could not enable landlock, a Linux kernel security feature. Running validation of PVF code may compromise this machine. Consider upgrading the kernel version for maximum security.

So, if we consider this the recommended way of running the validator then we should refuse to start if the feature is not present and build in an explicit cmdline argument to bypass this check.

That's a good idea, I've added it to #7073 (see description). I don't think we can mandate it right now because like half of validators are on too-old kernel versions (according to the visible telemetry data). We already need to announce that "secure mode" change in advance, maybe we can also mention that validators should upgrade to kernel 5.13+ to avoid running in "insecure mode". I will bug Will again to see if we can get that announcement going.

Alright, makes sense then. So, keep the warning for now and make it mandatory at a later date.

Indeed this makes sense, I would nevertheless suggest a soft launch:

1. Enable the feature and print a warning on startup if it is not available, hinting that this feature will be mandatory in some future release XY and that the operator should upgrade the machine. In addition, this should obviously be part of the release notes, as they are less likely to be ignored.

2. Release version XY, which makes it mandatory, exiting with a warning that the check can be bypassed with that command line flag.

Updating the wiki would be good as well, but is least likely to be read by already operating operators.
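The two phases above can be sketched as a small decision function. This is only an illustration under assumptions: the names `StartupDecision`, `startup_decision` and `operator_message`, and the message wording, are hypothetical and not taken from the codebase.

```rust
/// What the node does at startup, given the result of the landlock check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupDecision {
    /// Landlock is available: start normally.
    Start,
    /// Soft-launch phase: start, but warn that the check will become mandatory.
    StartWithWarning,
    /// Mandatory phase with the bypass flag set: start in insecure mode.
    StartInsecure,
    /// Mandatory phase without the bypass flag: refuse to start.
    Refuse,
}

/// `mandatory` is false during the soft launch and true from release XY on.
/// `bypass_flag` models the explicit command line argument discussed above.
pub fn startup_decision(landlock_enabled: bool, mandatory: bool, bypass_flag: bool) -> StartupDecision {
    match (landlock_enabled, mandatory, bypass_flag) {
        (true, _, _) => StartupDecision::Start,
        (false, false, _) => StartupDecision::StartWithWarning,
        (false, true, true) => StartupDecision::StartInsecure,
        (false, true, false) => StartupDecision::Refuse,
    }
}

/// Hypothetical operator-facing text for each decision.
pub fn operator_message(decision: StartupDecision) -> Option<&'static str> {
    match decision {
        StartupDecision::Start => None,
        StartupDecision::StartWithWarning => {
            Some("landlock not available; it will be mandatory in a future release")
        }
        StartupDecision::StartInsecure => Some("landlock not available; running in insecure mode"),
        StartupDecision::Refuse => {
            Some("landlock not available; refusing to start (can be bypassed with a flag)")
        }
    }
}
```

Keeping the decision separate from the logging makes it easy to flip `mandatory` in release XY without touching the check itself.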

@eskimor I missed your comment before merging. Here's what we have now, let me know what needs a follow-up:

1. Landlock sandboxing enabled, warning printed if not available
   - Hint that feature will be mandatory in a future release (maybe can do this at the same time as Secure Mode Announcement w3f/polkadot-wiki#4881? We should have version XY at that point.)
   - Include in release notes (this PR has the proper label)

2. Make it mandatory -- will do in CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

node/core/pvf/src/host.rs

node/core/pvf/common/src/worker/mod.rs

mrcnski · 2023-06-01T19:51:12Z

I've been pondering whether we should enable ABI V2 instead of V1, even though
only the latter is supported by our reference kernel version. If we were on V2,
then landlock would use that if it was supported on the current machine, giving it
stronger security, and just fall back to V1 if not supported.¹

The issue I see with that is indeterminacy. If half of validators were on V2 and
half were on V1, they may have different semantics on some PVFs. So a malicious
PVF now has a new attack vector: they can exploit this indeterminism between
landlock ABIs! But this is exactly the kind of thing we want to prevent!

So, we have to stick only to the latest ABI supported by our reference kernel
version (right now ABI V1). If a validator's machine does not fully support it,
we can't let them run as a secure validator.

The only caveat with that is that we want to make running securely actually
realistic for users, so we don't have a significant proportion of validators
running as insecure-i-know-what-i-do. If there were enough of those, it would
again provide some indeterminism that could be exploited - e.g. if 33% weren't on
a kernel that supported landlock, and found it easier to just pass the insecure
flag than upgrade.

So, takeaways:

- Always use a reasonable ABI that most validators can fully support, and require full
  landlock enablement to run securely.
- Too many validators in insecure mode can be a source of indeterminism.
- We should monitor with telemetry how many validators are secure vs.
  insecure. This shouldn't use the public telemetry server.
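To make the indeterminacy argument concrete, here is a minimal model of a best-effort ABI fallback. It is a sketch under assumptions, not the landlock crate's API: `Abi`, `Enforcement`, `effective_abi` and `can_run_securely` are hypothetical names, and the model simply assumes the kernel enforces the lower of the requested and supported ABI.

```rust
/// Landlock ABI versions, ordered from oldest to newest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Abi {
    V1,
    V2,
    V3,
}

/// How completely a requested ABI is enforced on a given machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Enforcement {
    FullyEnforced,
    PartiallyEnforced,
    NotEnforced,
}

/// The ABI the node pins to: the one supported by the reference kernel.
pub const REFERENCE_ABI: Abi = Abi::V1;

/// ABI actually in effect under best-effort fallback. `kernel` is the highest
/// ABI the machine supports, or `None` if landlock is unavailable.
pub fn effective_abi(requested: Abi, kernel: Option<Abi>) -> Option<Abi> {
    kernel.map(|supported| supported.min(requested))
}

pub fn enforcement(requested: Abi, kernel: Option<Abi>) -> Enforcement {
    match kernel {
        None => Enforcement::NotEnforced,
        Some(supported) if supported >= requested => Enforcement::FullyEnforced,
        Some(_) => Enforcement::PartiallyEnforced,
    }
}

/// Only full enforcement of the pinned ABI counts as running securely.
pub fn can_run_securely(kernel: Option<Abi>) -> bool {
    enforcement(REFERENCE_ABI, kernel) == Enforcement::FullyEnforced
}
```

In this model, requesting V2 gives `effective_abi(Abi::V2, Some(Abi::V1)) != effective_abi(Abi::V2, Some(Abi::V2))`, so two validators run under different rules, whereas pinning to `REFERENCE_ABI` gives both the same effective ABI.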

I've documented the determinism concerns in a new commit.

Also, I think this PR should be burned-in on Versi before merging.

¹ The current versions are: reference kernel: 5.16+ | V1: 5.13 | V2: 5.19
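As a rough illustration of the version table in the footnote, the sketch below maps a kernel release string to the landlock ABI it implies. The function names are hypothetical, only V1 and V2 are modeled because those are the versions listed here, and a matching kernel version is an assumption rather than a guarantee: landlock can still be disabled in the kernel build or boot configuration, so a runtime check remains necessary.

```rust
/// Extracts `(major, minor)` from a release string such as "5.15.0-72-generic".
pub fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split('.');
    let major = parts.next()?.parse().ok()?;
    // The minor component may carry a suffix (e.g. "19-rc1"), so keep only
    // its leading digits.
    let minor_digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor = minor_digits.parse().ok()?;
    Some((major, minor))
}

/// Highest landlock ABI among V1 and V2 implied by the kernel version alone:
/// 5.13+ gives 1, 5.19+ gives 2, anything older or unparsable gives `None`.
pub fn landlock_abi_from_release(release: &str) -> Option<u8> {
    let version = parse_kernel_version(release)?;
    if version >= (5, 19) {
        Some(2)
    } else if version >= (5, 13) {
        Some(1)
    } else {
        None
    }
}
```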

mrcnski · 2023-06-01T22:25:36Z

Could use a re-review to sanity check the last two commits. I'll do a burn-in meanwhile.

alexggh · 2023-06-02T06:10:01Z

node/core/pvf/common/src/worker/security.rs

+	/// we were on V2, then landlock would use V2 if it was supported on the current machine, and
+	/// just fall back to V1 if not.
+	///
+	/// The issue with this is indeterminacy. If half of validators were on V2 and half were on V1,


This logic doesn't parse well for me. If V1 is less restrictive and makes validators vulnerable, then I would guess that having just a part of them vulnerable is better than having all of them vulnerable, isn't it?

Good question and I wondered the same. The issue is that it opens up a possibility of different behavior on different validators, and this itself can be exploited to attack consensus. Say that in the future we enable ABI V3 and executing some PVF tries to truncate a file (which will be banned on ABI V3)¹ - some validators will error and some won't. If the split in voting among validators is less than 33%, there will be a dispute and all the losing validators will get slashed. If the split is more than 33%, it violates our assumptions about consensus and finality will stall.
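A small sketch of the two outcomes described above, for illustration only. `SplitOutcome` and `classify_split` are hypothetical names; the sketch takes the 33% figure as one third of all validators, and treating a split of exactly one third as a stall is my assumption, since the comment does not cover that edge case.

```rust
/// Possible result of validators disagreeing on a candidate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SplitOutcome {
    /// Everyone voted the same way.
    Unanimous,
    /// The smaller side is under one third: a dispute is raised and the
    /// losing side is slashed.
    Dispute,
    /// The smaller side is one third or more: finality is expected to stall.
    FinalityStall,
}

/// Classifies a vote where `dissenting` out of `total` validators disagree
/// with the rest.
pub fn classify_split(total: u32, dissenting: u32) -> SplitOutcome {
    assert!(dissenting <= total, "dissenting validators cannot exceed total");
    // Which side is "dissenting" is arbitrary, so look at the smaller side.
    let minority = dissenting.min(total - dissenting);
    if minority == 0 {
        SplitOutcome::Unanimous
    } else if 3 * u64::from(minority) < u64::from(total) {
        SplitOutcome::Dispute
    } else {
        SplitOutcome::FinalityStall
    }
}
```

With 300 validators, 30 erroring under a stricter ABI would lead to a dispute, while 100 or more would put finality at risk.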

So indeed there is a very interesting trade-off between security for the validator and security for the chain, and I think we have to prioritize the latter while providing as much validator security as possible. If a small number of validators are behind in security and vote wrongly, then some slashing is okay, and it can be reverted with governance, but I think we really don't want finality stalls.

Although, I guess it would also be really bad if a bunch of validator keys got stolen and an attacker started impersonating them... And anyway there are other sources of indeterminacy to attack the chain with... Fortunately ABI V1 already fully protects against reading unauthorized data, so in this case it is enough to protect validators' keys and it is still the correct decision. (The only other thing I would want to feel safe is to block all network access. Maybe it's possible to set up a firewall per-thread?)

There are similar considerations that made the seccomp sandboxing harder than anticipated. Maybe @eskimor can double-check my analysis.

Footnotes

¹ Using V3 in this example because V2 doesn't actually provide additional restrictions on top of V1.

Thank you for the explanation, makes sense now! I guess V1 is the best we can do.

It should be the default for validators to have these security measures in place; ideally we would have no validators without them. Anyhow, the risk of disputes should be very low, as this is already a second-level defense mechanism. I would rather have a dispute than some PVF being able to read the validator's disk. We should make damn sure that there are no legitimate disk accesses of course, but checking that should be independent of PVF or candidate, so also rather easy. At least at the moment, I can't think of a legitimate disk access that would only happen on some PVFs.

Thanks @eskimor. Determinism is still a goal, and given that ABI v2 and v3 don't add to the security I would stick to v1 here.¹ I will update the docs as the determinacy is still relevant but not the only factor. And if in the future a version is released with meaningful new features like network blocking (which is an eventual goal of landlock) we can enable it immediately. We should keep in mind that attackers can exploit any difference in operation to attack the chain, but the risk is low and there are other indeterminacy vectors anyway.

Footnotes

¹ v2 adds a new config option which we don't use, and v3 additionally blocks file truncation - which might be annoying, but not really critical to security, and validators should have backups, right?

node/core/pvf/execute-worker/src/lib.rs

sandreim

Nice work @mrcnski ! Before merging we should make sure we run on CI machines which support landlock and also without such support, just to make sure things work fine in both cases.

node/core/pvf/src/host.rs

node/core/pvf/common/src/worker/security.rs

mrcnski · 2023-06-02T16:25:10Z

I raised w3f/polkadot-wiki#4872, who do I annoy to get it reviewed? 🙃

mrcnski · 2023-06-05T15:48:34Z

Required before merging:

- New CI pipeline for pre-5.13 kernel: https://github.com/paritytech/ci_cd/issues/811
- Versi burn-in to go well.

Job introduced in #7371.

mrcnski added 4 commits May 26, 2023 18:11

Begin adding landlock + test

146e6e2

Move PVF implementer's guide section to own page, document security

950add0

Implement test

e48605a

Add some docs

d1af7ee

mrcnski requested review from s0me0ne-unkn0wn and sandreim May 28, 2023 17:37

mrcnski self-assigned this May 28, 2023

mrcnski marked this pull request as draft May 28, 2023 17:44

Do some cleanup

3e5b6cd

mrcnski marked this pull request as ready for review May 31, 2023 14:26

Fix typo

39f2495

mrcnski requested a review from eskimor May 31, 2023 14:35

eskimor approved these changes May 31, 2023

eskimor reviewed May 31, 2023

mrcnski added 3 commits May 31, 2023 13:21

Warn on host startup if landlock is not supported

e555165

Clarify docs a bit

30178c9

Minor improvements

c096284

alexggh reviewed Jun 1, 2023

node/core/pvf/src/host.rs (outdated)

node/core/pvf/common/src/worker/mod.rs (outdated)

mrcnski mentioned this pull request Jun 1, 2023

CLI: Restrict os/arch for secure validators, add flag for insecure mode #7073

Draft

4 tasks

alexggh approved these changes Jun 1, 2023

Add some docs about determinism

41d8d1a

Address review comments (mainly add warning on landlock error)

6524c81

mrcnski requested a review from eskimor June 1, 2023 22:25

alexggh reviewed Jun 2, 2023

sandreim approved these changes Jun 2, 2023

node/core/pvf/src/host.rs (outdated)

node/core/pvf/src/host.rs (outdated)

node/core/pvf/common/src/worker/security.rs

mrcnski and others added 4 commits June 2, 2023 08:06

Update node/core/pvf/src/host.rs

a9b2dfd

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

Update node/core/pvf/src/host.rs

b9d8fc1

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

Merge branch 'master' into mrcnski/pvf-landlock

609a82a

Fix unused fn

e9b5c17

mrcnski mentioned this pull request Jun 2, 2023

Validator Guide: add note about landlock w3f/polkadot-wiki#4872

Merged

Update ABI docs to reflect latest discussions

08d98e9

mrcnski mentioned this pull request Jun 2, 2023

PVF worker: restrict network access paritytech/polkadot-sdk#619

Closed

Remove outdated notes

2d72e31

mrcnski mentioned this pull request Jun 4, 2023

PVF worker: Prevent access to env vars #7330

Merged

1 task

mrcnski mentioned this pull request Jun 5, 2023

PVF worker: restrict networking #7334

Draft

mrcnski added 2 commits July 5, 2023 17:41

Try to trigger new test-linux-oldkernel-stable job

e079ce5

Job introduced in #7371.

Merge branch 'master' into mrcnski/pvf-landlock

3765203

mrcnski merged commit b2bf9cd into master Jul 5, 2023

mrcnski deleted the mrcnski/pvf-landlock branch July 5, 2023 16:57

mrcnski mentioned this pull request Jul 5, 2023

Secure Mode Announcement w3f/polkadot-wiki#4881

Closed

mrcnski mentioned this pull request Aug 8, 2023

PVF host: Specialize on Linux, support macOS paritytech/polkadot-sdk#881

Closed

crystalin mentioned this pull request Oct 20, 2023

Update substrate/polkadot/cumulus from v0.9.43 to v1.1.0 moonbeam-foundation/moonbeam#2535

Closed

JesseAbram mentioned this pull request Oct 26, 2023

Update substrate and subxt entropyxyz/entropy-core#435

Merged

+              			cond_notify_on_done(
+              				|| {
+              					#[cfg(target_os = "linux")]
+              					let _ = crate::worker::security::landlock::try_restrict_thread();
