Is it possible to integrate smoldot further with Chopsticks to improve the performance of querying remote state? #1473

Closed
xlc opened this issue Dec 15, 2023 · 8 comments

Comments

xlc commented Dec 15, 2023

I spent a bit of time navigating the repo trying to find an answer, but I figured it may be easier to just ask.

Right now, one of the biggest pain points in maintaining a Chopsticks-based e2e test suite is dealing with all the broken RPC providers.

Chopsticks currently uses smoldot for wasm execution but that's it. The storages are read using pjs from a remote RPC endpoint (code)

This means if the upstream endpoint is slow, unresponsive, not working, etc, the tests will fail. Also currently Chopsticks is making a lot of sequential requests and that's not helping. It can easily hit a rate limit, or just be slow due to latency.

Obviously smoldot, as a light client, is also capable of querying storage. How can I use smoldot to replace all the existing uses of the remote RPC node (abstracted in this file)?

Specifically, I am looking for some suggestions on the best way to approach this.

  • Should I use smoldot directly or use substrate connect or something else (e.g. polkadot-api)?
  • Should I use the old substrate API or the new JSON RPC API?
  • What's the best way to use smoldot to access the p2p network and perform queries using the p2p protocol?

I know that, since smoldot is a light client, querying historical state is somewhat out of scope and may not be supported. However, Chopsticks needs the ability to query historical state so it can fork off an old block. With that in mind, is there any way to use smoldot to achieve my goal? For example, by making p2p requests directly to remote peers to query the information. Since Chopsticks is a dev and testing tool, security isn't the top priority, so it doesn't need to verify data integrity, which may make things easier.

I want to avoid running a light client if possible because I don't really need it.

Here is a list of the API Chopsticks is currently using:

  • system_name
  • system_properties
  • system_chain
  • chain_getBlockHash
  • chain_getHeader
  • chain_getBlock
  • childstate_getStorage
  • state_getStorage
  • childstate_getKeysPaged
  • state_getKeysPaged
  • chain_subscribeNewHeads (not used by core code)
  • chain_subscribeFinalizedHeads (not used by core code)
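
For reference, the storage methods in the list above are plain JSON-RPC 2.0 calls. The sketch below only builds the request payloads for reading storage at a fixed block, grouped into a single JSON-RPC batch array instead of one request per key. It is an illustration, not how Chopsticks does it: the helper names, keys, and block hash are placeholders, and whether a given provider accepts batch arrays is an assumption that would need checking.

```python
import json
from itertools import count

# Monotonic request ids so responses can be matched back to requests.
_ids = count(1)

def rpc_request(method, params):
    """Build one JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}

def storage_batch(keys, block_hash):
    """One state_getStorage(key, at) request per key, all pinned to one block."""
    return [rpc_request("state_getStorage", [key, block_hash]) for key in keys]

def keys_page(prefix, page_size, start_key, block_hash):
    """One page of state_getKeysPaged(prefix, count, startKey, at)."""
    return rpc_request("state_getKeysPaged", [prefix, page_size, start_key, block_hash])

# Placeholder values for illustration only; not real chain data.
BLOCK_HASH = "0x" + "00" * 32
KEYS = ["0x" + "11" * 16, "0x" + "22" * 16]

batch = storage_batch(KEYS, BLOCK_HASH)
page = keys_page("0x" + "11" * 16, 100, None, BLOCK_HASH)

# The whole list would be sent as one HTTP body or WebSocket message.
print(json.dumps(batch, indent=2))
```

Grouping reads this way trades many round trips for one, which addresses latency but not rate limits that count individual calls.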

tomaka commented Dec 15, 2023

I've read your post but overall I'm not really sure of what to answer.

The storages are read using pjs from a remote RPC endpoint (code)
This means if the upstream endpoint is slow, unresponsive, not working, etc, the tests will fail.

If you make requests on the peer-to-peer level, then you're facing the same problems. The endpoint can be slow, unresponsive, etc.
The smoldot light client "solves" this problem by distributing the requests it needs to send amongst its multiple peers, and by trying again automatically with a different peer if one fails to answer.
The peer-to-peer level is in fact even more restrictive than the JSON-RPC API, because you are restricted to only one request in-flight per peer at a time (this avoids spam attacks).
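
The retry behaviour described above can be sketched generically. This is not smoldot's code: `send`, `RequestFailed`, and the peer names are hypothetical stand-ins, and the transport is faked so the example runs on its own. Trying peers one after another also trivially respects the limit of one in-flight request per peer.

```python
class RequestFailed(Exception):
    """Raised by `send` when a peer times out, refuses, or lacks the data."""

def request_with_fallback(peers, send, request):
    """Try each peer in turn and return (peer, response) from the first success."""
    errors = {}
    for peer in peers:
        try:
            return peer, send(peer, request)
        except RequestFailed as err:
            # Remember why this peer failed, then move on to the next one.
            errors[peer] = str(err)
    raise RequestFailed(f"all peers failed: {errors}")

def fake_send(peer, request):
    """Fake transport for the demo: only 'peer-c' answers."""
    if peer != "peer-c":
        raise RequestFailed(f"{peer} did not answer")
    return f"response to {request!r}"

peer, response = request_with_fallback(
    ["peer-a", "peer-b", "peer-c"], fake_send, "storage-proof"
)
print(peer, response)
```

The same wrapper works for a fixed list of JSON-RPC endpoints, which is why moving to the peer-to-peer level does not by itself remove the failure modes.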

The advantage I see in doing requests at the peer-to-peer level is that some JSON-RPC providers might have a reverse proxy with a sketchy configuration in front of their endpoint, and you'd be bypassing that. But that's hypothetical, and I'm not sure if it would be worth it.

Also currently Chopsticks is making a lot of sequential requests and that's not helping.

If you're sending one storage request every time the runtime execution reads something, one thing I'd suggest is to ask for a call proof at the beginning. Since you're hijacking the storage the call proof will probably not cover everything, but it will probably cover a lot of accesses.

Of course you'd need to decode the call proof, which is a bit complicated.
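
A rough sketch of how such a prefetch could sit in front of the existing per-key reads, assuming the call proof has already been decoded into a key-to-value map (the decoding is the complicated part and is not shown). The class and function names are hypothetical, and the remote endpoint is faked with a dictionary.

```python
class PrefetchedStorage:
    """Serve reads from proof-derived entries first, then fall back to remote."""

    def __init__(self, prefetched, fetch_remote):
        # `prefetched` maps key -> value; a value of None means the proof
        # showed that the key is absent, which is also worth caching.
        self._cache = dict(prefetched)
        self._fetch_remote = fetch_remote
        self.remote_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Not covered by the proof (e.g. because storage was overridden
        # locally and execution took a different path): ask the remote.
        self.remote_reads += 1
        value = self._fetch_remote(key)
        self._cache[key] = value
        return value

# Placeholder data standing in for a decoded proof and a remote node.
decoded_proof = {"0x01": "0xaa", "0x02": None}
remote_state = {"0x01": "0xaa", "0x03": "0xcc"}

storage = PrefetchedStorage(decoded_proof, remote_state.get)
print(storage.get("0x01"), storage.get("0x02"), storage.get("0x03"))
print("remote reads:", storage.remote_reads)
```

Only the keys that the proof misses cost a round trip, so the benefit depends on how closely the forked execution follows the original one.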

Should I use the old substrate API or the new JSON RPC API?

The new JSON-RPC API isn't ready for querying historical data. Right now it's been focused on following the head of the chain.

xlc commented Dec 15, 2023

The benefit I see in using the p2p protocol is that it has peer discovery, so I can easily connect to more nodes rather than a fixed set of RPC endpoints. Yeah, I can see that some additional work is needed to deal with all the peers, but I am hoping smoldot already has the logic implemented and I can simply use it.

xlc commented Dec 15, 2023

Thanks for the call proof suggestion. That hadn't occurred to me and could make a big difference in some scenarios.

tomaka commented Dec 15, 2023

The benefit I see in using the p2p protocol is that it has peer discovery, so I can easily connect to more nodes rather than a fixed set of RPC endpoints.

The problem is you can't know which nodes are archive nodes or not. You can't know which peers have a certain block available.

tomaka commented Dec 15, 2023

If you want to give it a try anyway, what I'd suggest is to create a NetworkService.

It's not public, but I think it should be OK to make the light client services public.

The way you do it is:

  • Call NetworkService::new. For the platform, pass a DefaultPlatform.
  • Call add_chain to add your chain. Each network service supports multiple chains.
  • Call NetworkServiceChain::discover and pass the bootnodes. There's no need to call NetworkServiceChain::discover afterwards; discovery is done automatically. discover is for manual operations.
  • Call NetworkServiceChain::subscribe in order to know which peers you're connected to through Event::Connected. Everything is automatic, you just have to listen for events. Note that you don't need to worry about race conditions: if you subscribe when the service is already connected to peers, it will emit dummy Connected events.
  • Call NetworkServiceChain::storage_proof_request. The target has to be a peer to which you're connected (i.e. an Event::Connected has happened).

NetworkServiceChain::storage_proof_request returns Merkle proofs, so you'd have to decode them in order to know the storage value.

NetworkServiceChain::storage_proof_request is very straightforward and sends one request to the given target. It doesn't do anything magic such as trying again if it fails or falling back to a different peer, as that's done by the sync_service. The sync_service of the light client tracks the best block of each peer in order to avoid sending requests to peers that don't have the block. If you're asking for old blocks, I guess you don't need to do that.

There's unfortunately no way yet to influence which peers the network service connects to. For example it's not possible to ban a peer in case it doesn't know a block. It's something I want to add (#1442).
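
Until something like that exists, the caller can keep its own bookkeeping. The sketch below is language-agnostic pseudologic written in Python, not the smoldot Rust API: it tracks connected peers from events and keeps a local skip-list of peers that turned out not to have a given block. The "disconnected" event kind is an assumption (only Event::Connected is mentioned above), and all names are hypothetical.

```python
class PeerTracker:
    """Track connected peers and remember which peers lack which block."""

    def __init__(self):
        self.connected = set()
        self.missing = {}  # block hash -> peers known not to have that block

    def on_event(self, kind, peer):
        # "connected" mirrors Event::Connected; "disconnected" is assumed.
        if kind == "connected":
            self.connected.add(peer)
        elif kind == "disconnected":
            self.connected.discard(peer)

    def mark_missing(self, peer, block_hash):
        """Record that `peer` could not serve `block_hash` (a local skip, not a ban)."""
        self.missing.setdefault(block_hash, set()).add(peer)

    def candidates(self, block_hash):
        """Connected peers not yet known to lack the block, in a stable order."""
        skipped = self.missing.get(block_hash, set())
        return sorted(self.connected - skipped)

OLD_BLOCK = "0x" + "ab" * 32  # placeholder hash

tracker = PeerTracker()
for kind, peer in [("connected", "peer-a"), ("connected", "peer-b"),
                   ("connected", "peer-c"), ("disconnected", "peer-b")]:
    tracker.on_event(kind, peer)

tracker.mark_missing("peer-a", OLD_BLOCK)
print(tracker.candidates(OLD_BLOCK))
```

This only avoids re-asking peers that already failed; it cannot make the network service connect to archive nodes in the first place.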

xlc commented Dec 15, 2023

Thanks for the pointers. I will give it a play.

tomaka commented Jan 10, 2024

Closing this. Feel free to open discussions or issues for questions or whatnot.

tomaka closed this as not planned Jan 10, 2024

xlc commented Jan 10, 2024

We already have a working PoC: AcalaNetwork/chopsticks#609
