Security Issues of the Mithril Software Architecture #1586

onyxstakepool · 2024-02-05T18:18:11Z

onyxstakepool
Feb 5, 2024

Why

The current Mithril network architecture is susceptible to various types of denial-of-service attacks on the block producer nodes that run the Mithril signer. The goal of the Mithril network project to interact with the stake-weighted majority of block producers makes this an even more pressing issue. The close interaction of the Mithril signer with the cardano-node database and the access to secret keys and certificates as outlined in the Architecture overview creates a myriad of security issues.

The design decision that a single server, the Mithril aggregator, is interacting with the most important Cardano nodes on the network, the stake-weighted majority of the block producers, is a huge security risk. Even an inadvertent bug in the Mithril software could disrupt the majority of the block producers and lead to a catastrophic failure of the Cardano network.

Having such a single point of failure risk on the Cardano network is not acceptable.

It also puts an enormous trust burden on a piece of software that is maintained outside of the main cardano-node repository.

What

A reasonable approach to mitigate the main security issues above is to integrate the Mithril signer code into the cardano-node software and use the established networking infrastructure to relay the required Mithril information over the Cardano P2P network. The Mithril aggregator can then listen for specific Mithril messages on the Cardano P2P network without the need to penetrate the firewalls and infrastructure of the stake pool operators.

ghost · 2024-02-05T21:38:46Z

ghost
Feb 5, 2024

@onyxstakepool Thanks for raising this issue.

We think the actual picture is less bleak than the one you are painting here, but it's clear we are lacking good resources on the Mithril network threat model and an updated description of the architecture. We are planning to address your concerns promptly, e.g. in the next few days (work on issue #1350 has stalled but needs to be moved forward).

In order to help us provide the best possible answer, could you please provide some details on the "myriad of security issues" you are seeing here (taking into consideration our short discussion on the #moria channel)?

0 replies

onyxstakepool · 2024-02-06T00:49:58Z

onyxstakepool
Feb 6, 2024
Author

@abailly-iohk Thanks for your response.

After learning in the Discord Moria channel that it is not technically necessary to run the Mithril signer on a block producer, I am much less concerned now about the security implications. Since a simple relay server is also sufficient for the Mithril signer, only the KES key and the node certificate are at risk. Even if these two pieces are leaked from the relay, an attacker would not be able to replicate the block producer since the VRF key is still missing. The KES key and the node certificate can be quickly rotated without any harm to the real block producer.

With this understanding, I suggest that the Mithril documentation is changed to recommend the naive deployment also for mainnet. See: Mithril signer deployment model

My original security concern was that an attacker would spend considerable effort to compromise the Mithril repository and thus gain access to the block producers and wreak havoc on the whole Cardano network.

0 replies

ghost · 2024-02-06T07:01:12Z

ghost
Feb 6, 2024

IIRC we updated the suggested deployment model from relay to BP because some SPOs participating in Mithril were concerned about the potential of KES keys leaking when deployed on a relay which is inherently less protected 🤔

Your concern about supply-chain-based attacks is definitely valid and something we are also concerned with and should mitigate, but to be fair this is also something the cardano-node itself is subject to. We will take that into account in the threat model.

0 replies

jpraynaud · 2024-02-06T09:53:58Z

jpraynaud
Feb 6, 2024
Maintainer

@onyxstakepool Thanks for this issue.

Indeed, the new deployment model has been jointly designed with input and feedback from pioneer SPOs, who raised concerns about having the KES keys exposed on a relay. Prior to the mainnet launch, in late June 2023, we started rolling out this new deployment (as announced in this dev blog post: https://mithril.network/doc/dev-blog/2023/06/28/signer-deployment-models).

As @abailly-iohk already mentioned, we are currently working on a threat model for the Mithril network. This will help us get a better picture of the security related issues, and it will also be an opportunity for the community to contribute and give some feedback.

In the meantime, feel free to share any specific attack scenario that you already have in mind 👍

Issue #1488 has also been created to make the architecture diagrams easier to understand.

0 replies

reqlez · 2024-02-10T08:39:19Z

reqlez
Feb 10, 2024

Indeed, it was decided that keeping keys on world-facing cardano-node relays was not very acceptable. However, I don't see how an outgoing connection from a mithril-signer to an aggregator is a huge risk, personally.

In my mind, it's no different than you trusting the NTP client app to connect to an NTP server and not "get hacked" in the process.

"creates a myriad of security issues" I would like to see some example attack scenarios if I'm to be convinced. The way I see it, an attack would have to involve the mithril signer software having a serious bug PLUS the aggregator getting hacked at the same time and crafting a malicious response to the signer that could potentially execute some code through a buffer overflow or similar. I guess theoretically possible? I think the engineers would have a better idea how possible it is to execute something like this.

RE supply chain attacks, this is no different than people using any of the add-on software that SPOs use, like Koios Tools, scripts, cncli, etc. Not to mention that you are also trusting the operating system you run your software on, and the huge number of libraries and packages it depends on, not to suffer a supply chain attack.

Don't get me wrong, I do support the idea of Mithril just being integrated into the core protocol, but maybe it's a bit too early for this?

0 replies

onyxstakepool · 2024-03-21T01:57:17Z

onyxstakepool
Mar 21, 2024
Author

@disassembler just pointed out security issues with squid in the Mithril channel.
https://www.cvedetails.com/vulnerability-list/vendor_id-9950/product_id-17766/Squid-cache-Squid.html

0 replies

ghost · 2024-03-21T06:42:20Z

ghost
Mar 21, 2024

@onyxstakepool For the sake of completeness, you should also have posted @disassembler's answer

Looks like squid had a release recently patching a number of vulnerabilities: https://www.squid-cache.org/Versions/v6/squid-6.8-RELEASENOTES.html
This wasn't the case when we made the decision to use traffic. Also traffic has that vulnerability mentioned above patched in 9.2.3 released back in October.
Good news is both products when using the latest release have zero CVEs at the moment but I encourage anyone running infrastructure to keep up on tracking CVEs for anything they're running. The ossec mailing list is a good way to get early notifications of vulnerabilities.

and also @jpraynaud's answer later on:

Actually, the Squid vulnerabilities (majority of which are DoS attacks) are not directly applicable to the Mithril usage for the following reasons:
Squid is used only by the Mithril signer to forward proxy its HTTPS calls to the Mithril aggregator (containing only public data)
Squid is not used to relay any traffic coming from outside of the SPO infrastructure, and it is not caching any data
The firewall and Squid configurations that we recommend enforce that only the Mithril signer can have its calls relayed

Squid is a piece of software and like all pieces of software can have vulnerabilities. Deploying and using any software for business-critical missions requires constant monitoring and attention to security and performance issues which, fortunately for the case of Squid, are public and promptly fixed.
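
To make the allowlist idea in the reasons above concrete, here is a minimal sketch in Python of the decision a correctly configured forward proxy plus firewall would enforce: only the signer may use the relay, and only to reach the aggregator over HTTPS. It is an illustration of the rule, not Squid configuration and not taken from the Mithril documentation. The IP address, the hostname and the function name are hypothetical.

```python
import ipaddress

# Hypothetical values, for illustration only.
SIGNER_IP = ipaddress.ip_address("192.168.1.10")   # host running the Mithril signer
AGGREGATOR_HOST = "aggregator.example.org"          # aggregator endpoint
HTTPS_PORT = 443


def relay_allows(src_ip: str, dest_host: str, dest_port: int) -> bool:
    """Return True only for signer -> aggregator HTTPS traffic."""
    return (
        ipaddress.ip_address(src_ip) == SIGNER_IP
        and dest_host.lower() == AGGREGATOR_HOST
        and dest_port == HTTPS_PORT
    )


# The signer reaching the aggregator is relayed.
print(relay_allows("192.168.1.10", "aggregator.example.org", 443))
# Any other source, destination or port is refused.
print(relay_allows("203.0.113.7", "aggregator.example.org", 443))
print(relay_allows("192.168.1.10", "evil.example.net", 443))
```

With a rule of this shape, a vulnerability in the proxy is only reachable from the signer host itself, which is the point made in the answer above.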

0 replies

ghost · 2024-03-21T07:53:26Z

ghost
Mar 21, 2024

BTW, seems like Traffic is also subject to security exploits: https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-19990/Apache-Traffic-Server.html

0 replies

onyxstakepool · 2024-03-21T17:25:03Z

onyxstakepool
Mar 21, 2024
Author

@abailly-iohk Thanks for addressing all the concerns above.

Let me explain here why I keep voicing concerns about how Mithril is set up.

First, as soon as you open a door (port) to your server, you are in the business of defending this door. So, for critical infrastructure you just do not open these doors unless absolutely necessary.

For the cardano-node I am quite confident that every bit that flows in and out of the server is meticulously processed and checked.

For the Mithril signer, relay, and aggregator the whole security concept looks much more ad hoc.

I must run the signer with the same user as the cardano-node.
The signer must be granted unrestricted access to the node database.
The signer needs access to secret keys.
The relay (squid) adds more complexity to keep everything secure.
The signer and aggregator communicate over their own channel (port).
The aggregator is connected and controls the signers on the stake majority of the block producers.

So, I must put a lot of trust into all these additional software pieces. What happens if the aggregator gets compromised or gets seized by the authorities? Then the aggregator software can be manipulated and potentially corrupt the signers and block producers of the stake majority or corrupt the databases of relay servers or even leak keys and so forth.

You might still think this is all moot. Here is a story:
In the early days of Bitcoin there was an IRC client built in for bootstrapping the network. It took only 4 lines of obfuscated source code to subvert the IRC client to execute arbitrary system commands on the node server. Security analysis showed that this was the downfall of one of the largest crypto exchanges at the time. All wallet keys were leaked. Thereafter all the IRC code was removed from the Bitcoin client.

Now we have the same pattern with Mithril: an unchecked HTTP channel deep into the core SPO infrastructure that sends data back and forth, waiting to be exploited.

This is why I am concerned. Also, in the future there might be more than one aggregator with more delegated trust. The risks are just increasing.

2 replies

TrevorBenson Apr 5, 2024

While the described security issues imply certain requirements as necessities rather than configurable options, they actually reflect choices available to the SPO during installation and configuration. However, this would require the SPO to customize and thoroughly test the setup process, deviating from the steps provided in the quick start guides.

I will address each bullet point individually to clarify.

I must run the signer with the same user as the cardano-node.

The same user is actually not a hard requirement. This simply provides "ease of use/setup" by using the same user. What this really provides is the required file permissions for all Mithril components without step by step instructions on the alternative configurations.

Running the Cardano Node and Mithril Signer processes under different users requires the user of the Mithril Signer process to have:

Read access to the Cardano DB for the Mithril Signer
Read access to the KES secret key for the Mithril Signer
Read/Write access to the Cardano Node socket for the Mithril Signer

These requirements can be achieved without using the same user account. One way to handle this is by using setfacl to apply directory and file ACLs. The user running the Mithril Signer process is granted the required permissions to read the parent folders and files for the DB and KES secret key.
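
As a rough illustration of the ACL approach, the sketch below builds (but does not run) the setfacl commands that would grant a dedicated signer user the three kinds of access listed above. The user name and all paths are hypothetical, and the parent directories of each path would additionally need traverse (x) permission for that user. It uses only the standard setfacl options -m (modify) and -R (recursive).

```python
import shlex


def signer_acl_commands(user, db_dir, kes_key, socket_path):
    """Build setfacl argument lists granting least-privilege access to `user`."""
    return [
        # Read the whole DB; capital X adds traverse permission on directories.
        ["setfacl", "-R", "-m", f"u:{user}:rX", db_dir],
        # Read-only access to the KES secret key, and to no other key.
        ["setfacl", "-m", f"u:{user}:r", kes_key],
        # Read/write access to the node socket.
        ["setfacl", "-m", f"u:{user}:rw", socket_path],
    ]


# Hypothetical user and paths, for illustration only.
commands = signer_acl_commands(
    "mithril",
    "/opt/cardano/db",
    "/opt/cardano/keys/kes.skey",
    "/opt/cardano/node.socket",
)
for cmd in commands:
    print(shlex.join(cmd))
```

The commands are only printed so they can be reviewed before being run as root. Note that ACLs set this way do not automatically apply to files created later unless default ACLs (setfacl -d) are also set on the directories.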

The signer must be granted unrestricted access to the node database.

The signer does not require unrestricted access; it simply needs to be able to read the entire database to create signatures. The only write-level access required is actually for the Mithril Client, which could be run under its own user if the ACLs are set up to ensure any data written to the DB automatically inherits full read/write permissions for the user of the Cardano Node process.

The signer needs access to secret keys.

Yes, but only the KES secret key. As mentioned earlier, an ACL could be applied that limits this access by user (UID) to only the KES secret key, and prevents reading any other keys, whether public or private.

The relay (squid) adds more complexity to keep everything secure.

When set up correctly, the Squid (or any) relay blocks access from anything except the block producer and potentially a backup block producer (hereafter referred to as BPs). I dare say "set up correctly" means not only that the proxy rules limit access to the BPs, but also that a firewall limits any socket connection to the BPs. With this setup, regardless of which relay software is actually used, there are fewer concerns about vulnerabilities in the proxy, as only the SPO's own infrastructure can reach the relay process. This also mitigates the potential for DoS/DDoS attacks taking down the SPO's Mithril relay.

This really goes for any relay software, whether it's Squid, Nginx or Apache Traffic: they should all be secured, and if more than one Cardano Node relay exists I would suggest having a Mithril Relay (of whatever relay software) running on each. However, this does require setting up a "sidecar" load balancer (i.e. a load balancer running on the same host as the Mithril Signer) to handle the redirection from the signer to your pool of Mithril Relays. Yes, this makes the architecture a bit more complex, but it provides the redundancy/failover the current architecture would require for multiple relays.

The signer and aggregator communicate over their own channel (port).

Yes. However, from what I recall this is an HTTP protocol which uses a client / server model.

The Aggregator does not require inbound access to the Signer and does not initiate communications.
The Signer sends POST requests to the Aggregator to submit signed certificates.

I don't believe it's PUT or PATCH requests. I didn't tcpdump the traffic before writing this response to confirm, but as long as it is still HTTP client/server communication the difference between them is mostly moot for this discussion.

The standard parts of an HTTP response are:

Status Line
Headers
Response body

The aggregator is connected and controls the signers on the stake majority of the block producers.

My understanding of the aggregator is admittedly somewhat limited, as I only ran an aggregator during the initial devnet testing. However, given that the Signer:

Uses HTTP Protocol
Uses HTTP proxy/relays between it and the Aggregator
Has no inbound open ports for communications.

There is limited chance the Aggregator is controlling the Signer. Here is the description of the Aggregator:

INFO
Mithril aggregator is responsible for collecting individual signatures from the Mithril signers and aggregating them into a multi-signature. With this capability, the Mithril aggregator can provide certified snapshots of the Cardano blockchain.

While the description mentions collecting there is no actual inbound port opened for the Mithril Signer, as mentioned above. So the Aggregator has no intrinsic method to send traffic to the Signer. The client / server model of the HTTP protocol generally limits any type of "instructions" from the server (Aggregator) being sent to a client (Signer). Unless the client has some unique, I'd even call it "very odd", logic where it parses the HTTP Response Body for commands. Thus, I would not consider the Aggregator to be "controlling the signer".

In reality the Aggregator should be "accepting" individual signatures from Mithril Signers to aggregate into a multi-signature. Therefore the aggregator could either accept or reject a signature from a Signer, but should have no other role in instructing the Signer to take actions on its behalf.
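
The client/server argument above can be sketched in a few lines of Python. This is not the Mithril signer's actual code; the type, the function name and the mapping of status codes to outcomes are all assumptions made for illustration. The point is that a well-behaved client derives its next step from the status line only and treats the body as inert data, so even a hostile body does not become an instruction.

```python
from dataclasses import dataclass


@dataclass
class HttpResponse:
    """The three parts of an HTTP response, reduced to plain data."""
    status: int
    headers: dict
    body: bytes


def handle_submission_response(response: HttpResponse) -> str:
    """Map the response to an outcome; the body is never parsed or executed."""
    if 200 <= response.status < 300:
        return "accepted"
    if 400 <= response.status < 500:
        return "rejected"
    return "retry-later"


# A compromised server can put anything in the body, but the client ignores it.
hostile = HttpResponse(status=201, headers={}, body=b"rm -rf /")
print(handle_submission_response(hostile))
```

A client written this way can still be attacked through bugs in its HTTP or TLS parsing, which is the kind of scenario a threat model would need to cover, but the aggregator has no designed-in channel for issuing commands.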

@jpraynaud @abailly-iohk Please feel free to correct anything I've included in this post that you think I may have gotten incorrect about the Mithril Signer, Aggregator or the Client.

ghost Apr 5, 2024

Thanks a lot @TrevorBenson for the thorough analysis! I did not take the time to address the various points in detail and I think you're really doing it in a better way than I could.

ghost · 2024-03-21T18:01:15Z

ghost
Mar 21, 2024

@onyxstakepool Thanks a lot for voicing your concerns clearly. It's a bit late for most of us right now, so I won't be able to respond in detail, but let's keep the conversation going and make sure we address those in the clearest possible way.

0 replies

ghost · 2024-03-22T15:39:00Z

ghost
Mar 22, 2024

That makes sense @ch1bo but it seems to me we are here discussing general principles and architecture rather than specific vulnerabilities, so perhaps doing it in the open would be better. Perhaps it would be even better to turn this into a Discussion?

1 reply

ch1bo Mar 22, 2024
Maintainer

Very good idea. Discussions have much better threading capabilities :)

brouwerQ · 2024-03-31T11:39:20Z

brouwerQ
Mar 31, 2024

What about running the mithril signer on your backup BP if you have one? It already has the KES key and node cert and in the event of a problem with the signer disrupting the node, your normal BP stays unaffected.

4 replies

ch1bo Apr 2, 2024
Maintainer

Is it common to have a backup block producer?

brouwerQ Apr 2, 2024

For bigger pools that mint lots of blocks per epoch it's essential imho. For smaller pools the extra cost most likely isn't justifiable...

jpraynaud Apr 2, 2024
Maintainer

It sounds like a possible option if you already run a backup BP and don't want to run on your main BP 👍

ghost Apr 6, 2024

IIUC there are already SPOs doing that.

onyxstakepool · 2024-04-19T18:34:37Z

onyxstakepool
Apr 19, 2024
Author

@TrevorBenson @brouwerQ @abailly-iohk @jpraynaud @ch1bo
Thank you all for the extensive discussion!

I can report now that the multi-pool signer setup outlined in detail in issue #1605 works as intended after 5 epochs of operation on mainnet. This setup is completely isolated from the stake pool operation, so many of the security concerns discussed above no longer apply.

Please review!

0 replies

jpraynaud · 2024-06-20T08:13:25Z

jpraynaud
Jun 20, 2024
Maintainer

We have released a Mithril Threat Model page on the documentation website which supersedes this discussion.

Thank you to all the participants!

0 replies


Replies: 15 comments 7 replies
