I've either implemented or crossed out all TODO items. Should be ready for review.
Nice! Thanks @mrcnski !
cond_notify_on_done(
|| {
#[cfg(target_os = "linux")]
let _ = crate::worker::security::landlock::try_restrict_thread();
Actually, instead of silently doing nothing, I think we should issue a warning that landlock is not available and, for maximum security, encourage the operator to upgrade their kernel or make sure landlock is available.
Same for non-Linux: a warning that landlock sandboxing is not available would be a good idea, I think.
I can emit a warning (once on host startup, I think), but given how the validator guide is currently written, I'm not sure how much control a validator has over their kernel version/configuration:
"The most common way for a beginner to run a validator is on a cloud server running Linux. You may choose whatever VPS provider that your prefer."
I guess we could recommend "try to find a VPS with landlock enabled", but unless we give specific recommendations, it might be hard to find this info.
For people not running on a VPS, I found a guide for enabling landlock, but I haven't tested it, and I'm not a Linux expert. I'm not sure we should be officially recommending random guides from the internet...
Advice would be really appreciated. Let me know what you think!
> I can emit a warning (once on host startup I think)
Hmm, so given the above, I'm not sure how actionable a warning would be. It could just be noise for most operators.
I found out that landlock is enabled by default on most distros, including Ubuntu 22.04 LTS (what we recommend). And I don't see any reason why a VPS would explicitly opt out of a security feature. So, we can simply emit a warning like this:
WARNING: Could not enable landlock, a Linux kernel security feature. Running validation of PVF code may compromise this machine. Consider upgrading the kernel version for maximum security.
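A minimal std-only sketch of how a warning like the one above could be emitted once per process, covering both the Linux case and the non-Linux case raised earlier. `LandlockStatus`, `landlock_warning`, and `warn_once_on_startup` are hypothetical names, the probe that produces the status is assumed rather than shown, and the non-Linux wording is illustrative only.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical result of probing for landlock support (the probe itself is not shown).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LandlockStatus {
    /// Landlock is supported and the restriction was applied.
    Enabled,
    /// Landlock is unavailable: kernel too old, disabled in config, or not Linux at all.
    Unavailable,
}

const LINUX_WARNING: &str = "WARNING: Could not enable landlock, a Linux kernel security feature. \
    Running validation of PVF code may compromise this machine. \
    Consider upgrading the kernel version for maximum security.";

// Illustrative wording for non-Linux hosts, where landlock can never be used.
const NON_LINUX_WARNING: &str = "WARNING: landlock sandboxing is not available on this OS. \
    Running validation of PVF code may compromise this machine.";

/// Returns the warning to show for `status`, or `None` if landlock is active.
pub fn landlock_warning(status: LandlockStatus) -> Option<&'static str> {
    match status {
        LandlockStatus::Enabled => None,
        LandlockStatus::Unavailable if cfg!(target_os = "linux") => Some(LINUX_WARNING),
        LandlockStatus::Unavailable => Some(NON_LINUX_WARNING),
    }
}

static WARNED: AtomicBool = AtomicBool::new(false);

/// Prints the warning at most once per process (e.g. on host startup).
/// Returns `true` only for the call that actually printed it.
pub fn warn_once_on_startup(status: LandlockStatus) -> bool {
    let Some(msg) = landlock_warning(status) else {
        return false;
    };
    // `swap` returns the previous value, so only the first caller observes `false`.
    if WARNED.swap(true, Ordering::Relaxed) {
        return false;
    }
    eprintln!("{msg}");
    true
}
```

Keeping the message selection in a pure function makes the wording easy to test separately from the once-only side effect.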
So, if we consider this the recommended way of running the validator, then we should refuse to start if the feature is not present and build in an explicit command-line argument to bypass this check.
That's a good idea; I've added it to #7073 (see description). I don't think we can mandate it right now, because roughly half of validators are on kernel versions that are too old (according to the visible telemetry data). We already need to announce the "secure mode" change in advance, so maybe we can also mention that validators should upgrade to kernel 5.13+ to avoid running in "insecure mode". I will bug Will again to see if we can get that announcement going.
Alright, makes sense then. So, keep the warning for now and make it mandatory at a later date.
Indeed, this makes sense. I would nevertheless suggest a soft launch:
- Enable the feature and print a warning on startup if it is not available, hinting that the feature will be mandatory in some future release XY and that the operator should upgrade the machine. In addition, this should obviously be part of the release notes, as they are less likely to be ignored.
- Release version XY, which makes it mandatory, exiting with a warning that the check can be bypassed with that command-line flag.
Updating the wiki would be good as well, but it is the least likely to be read by operators who are already running validators.
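The two-phase rollout above could be modeled roughly as a pure startup check. This is a hypothetical sketch, not the code in #7073: `Rollout`, `StartupDecision`, and `check_landlock_at_startup` are illustrative names, and the insecure-mode bypass flag is represented only as a boolean because its actual name is not given here.

```rust
/// Which phase of the soft launch a release is in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rollout {
    /// Phase 1: use landlock when available; otherwise warn and keep running.
    WarnOnly,
    /// Phase 2 (release "XY"): refuse to start without landlock unless explicitly bypassed.
    Mandatory,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StartupDecision {
    Start,
    StartWithWarning(String),
    Refuse(String),
}

/// `insecure_mode_flag` stands in for the (hypothetical here) command-line bypass flag.
pub fn check_landlock_at_startup(
    landlock_available: bool,
    rollout: Rollout,
    insecure_mode_flag: bool,
) -> StartupDecision {
    if landlock_available {
        return StartupDecision::Start;
    }
    match (rollout, insecure_mode_flag) {
        // Phase 1: the flag is irrelevant; always start, but announce the upcoming change.
        (Rollout::WarnOnly, _) => StartupDecision::StartWithWarning(
            "landlock is unavailable; it will become mandatory in a future release, \
             so please upgrade this machine's kernel"
                .to_string(),
        ),
        // Phase 2 with explicit opt-out: start, but make the insecure mode visible.
        (Rollout::Mandatory, true) => StartupDecision::StartWithWarning(
            "landlock is unavailable; running in insecure mode because the bypass flag was set"
                .to_string(),
        ),
        // Phase 2 without opt-out: exit and tell the operator how to proceed.
        (Rollout::Mandatory, false) => StartupDecision::Refuse(
            "landlock is unavailable; refusing to start. Upgrade the kernel, or pass the \
             insecure-mode flag to bypass this check"
                .to_string(),
        ),
    }
}
```

The caller would log the message and exit with a non-zero status on `Refuse`; keeping the decision a pure function means both phases can be unit-tested without a landlock-capable kernel.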
@eskimor I missed your comment before merging. Here's what we have now, let me know what needs a follow-up:
- 1. Landlock sandboxing enabled, warning printed if not available
- 1. Hint that feature will be mandatory in a future release (maybe can do this at the same time as Secure Mode Announcement w3f/polkadot-wiki#4881? We should have version XY at that point.)
- 1. Include in release notes (this PR has the proper label)
- 2. Make it mandatory: will do in #7073 (CLI: Restrict os/arch for secure validators, add flag for insecure mode)
I've been pondering whether we should enable ABI V2 instead of V1, even though … The issue I see with that is indeterminacy. If half of validators were on V2 and … So, we have to stick only to the latest ABI supported by our reference kernel … The only caveat with that is that we want to make running securely actually … So, takeaways: …
I've documented the determinism concerns in a new commit. Also, I think this PR should be burned in on Versi before merging.
Could use a re-review to sanity-check the last two commits. I'll do a burn-in in the meantime.
/// we were on V2, then landlock would use V2 if it was supported on the current machine, and
/// just fall back to V1 if not.
///
/// The issue with this is indeterminacy. If half of validators were on V2 and half were on V1,
This logic doesn't parse well for me. If V1 is less restrictive and makes validators vulnerable, then I would guess having only some of them vulnerable is better than all of them, isn't it?
Good question, and I wondered the same. The issue is that it opens up the possibility of different behavior on different validators, and this itself can be exploited to attack consensus. Say that in the future we enable ABI V3 and executing some PVF tries to truncate a file (which will be banned on ABI V3)[1]: some validators will error and some won't. If the split in voting among validators is less than 33%, there will be a dispute and all the losing validators will get slashed. If the split is more than 33%, it violates our assumptions about consensus and finality will stall.
So indeed there is a very interesting trade-off between security for the validator and security for the chain, and I think we have to prioritize the latter while providing as much validator security as possible. If a small number of validators are behind in security and vote wrongly, then some slashing is okay, and it can be reverted with governance, but I think we really don't want finality stalls.
Although, I guess it would also be really bad if a bunch of validator keys got stolen and an attacker started impersonating them... And anyway there are other sources of indeterminacy to attack the chain with... Fortunately ABI V1 already fully protects against reading unauthorized data, so in this case it is enough to protect validators' keys and it is still the correct decision. (The only other thing I would want to feel safe is to block all network access. Maybe it's possible to set up a firewall per-thread?)
There are similar considerations that made the seccomp sandboxing harder than anticipated. Maybe @eskimor can double-check my analysis.
Footnotes
[1] Using V3 in this example because V2 doesn't actually provide additional restrictions on top of V1.
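To make the thresholds in this consensus argument concrete, here is a hedged sketch that classifies a split vote. It assumes the usual BFT bound of f = floor((n - 1) / 3) tolerated faulty validators, so a side needs at least n - f votes to win; `SplitOutcome` and `classify_split` are hypothetical names, and the model ignores everything else about how disputes are actually resolved.

```rust
/// Outcome of a split vote on a candidate under a simplified BFT model.
#[derive(Debug, PartialEq, Eq)]
pub enum SplitOutcome {
    /// Every validator voted the same way.
    Unanimous,
    /// One side reached the n - f supermajority; the other side (of this size) loses the dispute.
    Dispute { losing_side: usize },
    /// Neither side reached n - f, so no supermajority can form and finality stalls.
    FinalityStall,
}

/// `n` validators, of which `dissenting` voted differently from the rest
/// (e.g. because their sandbox blocked an operation that others allowed).
pub fn classify_split(n: usize, dissenting: usize) -> SplitOutcome {
    assert!(n > 0 && dissenting <= n, "need n > 0 and dissenting <= n");
    let agreeing = n - dissenting;
    if dissenting == 0 || agreeing == 0 {
        return SplitOutcome::Unanimous;
    }
    // Maximum number of faulty validators tolerated: f = floor((n - 1) / 3).
    let f = (n - 1) / 3;
    let supermajority = n - f;
    if agreeing >= supermajority {
        SplitOutcome::Dispute { losing_side: dissenting }
    } else if dissenting >= supermajority {
        SplitOutcome::Dispute { losing_side: agreeing }
    } else {
        SplitOutcome::FinalityStall
    }
}
```

For example, with n = 10 the winning side needs 7 votes: 3 dissenting validators lose a dispute, while 4 dissenters leave neither side with a supermajority.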
Thank you for the explanation, makes sense now! I guess V1 is the best we can do.
It should be the default for validators to have these security measures in place; ideally, no validators would run without them. Anyhow, the risk of disputes should be very low, as this is already a second-level defense mechanism. I would rather have a dispute than some PVF being able to read the validator's disk. We should make damn sure that there are no legitimate disk accesses, of course, but checking that should be independent of the PVF or candidate, so it is also rather easy. At least at the moment, I can't think of a legitimate disk access that would only happen for some PVFs.
Thanks @eskimor. Determinism is still a goal, and given that ABI v2 and v3 don't add to the security, I would stick to v1 here.[1] I will update the docs, since determinacy is still relevant but not the only factor. And if a future version is released with meaningful new features like network blocking (which is an eventual goal of landlock), we can enable it immediately. We should keep in mind that attackers can exploit any difference in operation to attack the chain, but the risk is low and there are other indeterminacy vectors anyway.
Footnotes
[1] v2 adds a new config option which we don't use, and v3 additionally blocks file truncation, which might be annoying but is not really critical to security (and validators should have backups, right?).
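The determinism argument for pinning ABI V1 can be illustrated with a small std-only model. It assumes best-effort fallback means enforcing the lower of the requested ABI and the newest ABI the kernel supports; `Abi`, `enforced_abi`, and `fleet_is_uniform` are hypothetical names, and the landlock crate's real compatibility handling is not modeled. Validators without landlock at all are left out of the comparison, which is the residual indeterminacy acknowledged above.

```rust
/// Landlock ABI versions discussed in this thread, declared oldest to newest
/// so the derived `Ord` gives V1 < V2 < V3.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Abi {
    V1,
    V2,
    V3,
}

/// Hypothetical model of best-effort fallback: we request `requested`, and the
/// kernel enforces the lower of that and the newest ABI it supports (`kernel_max`).
/// `None` means the kernel has no landlock support, so nothing is enforced.
pub fn enforced_abi(requested: Abi, kernel_max: Option<Abi>) -> Option<Abi> {
    kernel_max.map(|max| requested.min(max))
}

/// True if every landlock-capable validator in `fleet` ends up enforcing the same ABI.
/// Validators without landlock (`None`) are excluded from the comparison.
pub fn fleet_is_uniform(requested: Abi, fleet: &[Option<Abi>]) -> bool {
    let mut enforced = fleet
        .iter()
        .filter_map(|&kernel_max| enforced_abi(requested, kernel_max));
    match enforced.next() {
        None => true,
        Some(first) => enforced.all(|abi| abi == first),
    }
}
```

With a fleet mixing V1-only and V3-capable kernels, requesting V1 yields V1 everywhere, while requesting V2 yields a mix of V1 and V2, which is exactly the kind of split the thread is worried about.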
Nice work @mrcnski! Before merging, we should make sure we run on CI machines that support landlock and also on machines without such support, just to make sure things work fine in both cases.
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
I raised w3f/polkadot-wiki#4872. Who do I annoy to get it reviewed? 🙃
Required before merging:
Pull Request
Overview
Landlock is a new sandboxing mechanism in Linux. Unfortunately, it currently only sandboxes filesystem access. Also, it requires kernel 5.13+, so not all kernel versions support it. However, if support is not enabled, calling landlock is simply a no-op. It was easy to add and is nice to have in the interim until we have full sandboxing.
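Since support depends on the kernel version, a rough heuristic is to parse the kernel release string and compare it against 5.13. This is only an assumption-laden sketch: a 5.13+ kernel can still have landlock disabled in its configuration or boot parameters, so actually probing landlock (as the `check_enabled` item suggests) is the authoritative check. `parse_kernel_version` and `kernel_may_support_landlock` are hypothetical names.

```rust
/// Parses `(major, minor)` from a kernel release string such as "5.15.0-76-generic".
/// Returns `None` if the first two dot-separated components don't start with digits.
pub fn parse_kernel_version(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split('.');
    let major = leading_number(parts.next()?)?;
    let minor = leading_number(parts.next()?)?;
    Some((major, minor))
}

/// Parses the leading ASCII digits of `s` (e.g. "13-rc1" -> 13).
fn leading_number(s: &str) -> Option<u32> {
    let digits: String = s.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Heuristic only: true if the kernel is at least 5.13, the first release with landlock.
/// It does NOT prove landlock is enabled in this kernel's configuration.
pub fn kernel_may_support_landlock(release: &str) -> bool {
    matches!(parse_kernel_version(release), Some(v) if v >= (5, 13))
}
```

On Linux, the release string can be read from `/proc/sys/kernel/osrelease` with `std::fs::read_to_string`; a trailing newline does not affect this parser, since only the first two components are used.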
TODO
- check_enabled
- Enable telemetry (to see how many validators have landlock enabled).
- Add section to validator's guide about upgrading the kernel with landlock support.
Related
Closes #7243