Add use of zstd compression on compute services for pushing to API service, storing at-rest, and fast decompression on retrieval via clients #220

Open
dotsdl opened this issue Jan 9, 2024 · 2 comments

Comments

@dotsdl
Member

dotsdl commented Jan 9, 2024

Currently, we perform no at-rest compression of ProtocolDAGResults produced by compute services. This yields larger-than-necessary JSON files on the object store, and either larger-than-necessary transmission payloads or unnecessary in-transit compression/decompression of files that never change on the object store.

Instead, we should compress each ProtocolDAGResult once, at creation, on the compute service itself, then push this compressed artifact to the compute API for delivery to the object store. Client requests for this artifact would then pull the compressed artifact as-is, with no in-transit compression/decompression required. If using zstd by way of e.g. python-zstandard, client-side decompression can be very fast, reducing the time users wait to retrieve results.

Because the compression is done on the compute services themselves, this scheme puts minimal load on the API services receiving or retrieving these artifacts. Relatively slow compression happens only once on creation, and very fast decompression can happen many times over the lifetime of the artifact.
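As a rough sketch of the compress-once, decompress-many flow described above: the helper names `compress_result` and `decompress_result` are hypothetical, the plain `dict` stands in for a serialized ProtocolDAGResult, and the default level of 3 is an assumption rather than a tuned choice. It assumes python-zstandard is installed; the `zlib` fallback is only there so the sketch still runs without that dependency.

```python
import json

try:
    import zstandard  # python-zstandard, the library suggested in this issue
except ImportError:
    # Stand-in only, so the sketch runs where zstandard is not installed.
    zstandard = None
    import zlib


def compress_result(result_dict: dict, level: int = 3) -> bytes:
    """Serialize to JSON and compress once, on the compute service."""
    raw = json.dumps(result_dict).encode("utf-8")
    if zstandard is not None:
        return zstandard.ZstdCompressor(level=level).compress(raw)
    return zlib.compress(raw, level)


def decompress_result(blob: bytes) -> dict:
    """Decompress and deserialize on the client; cheap to repeat many times."""
    if zstandard is not None:
        raw = zstandard.ZstdDecompressor().decompress(blob)
    else:
        raw = zlib.decompress(blob)
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical payload standing in for a serialized ProtocolDAGResult.
    payload = {"outputs": [{"unit_estimate": -1.25, "frames": list(range(1000))}]}
    blob = compress_result(payload)  # done once, at creation
    print(len(json.dumps(payload)), "bytes of JSON ->", len(blob), "bytes stored")
    assert decompress_result(blob) == payload  # done on every retrieval
```

In this arrangement the API service and object store only ever handle the opaque `bytes` blob; neither needs to compress or decompress anything.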

Importantly, for all cases of compressed ProtocolDAGResult retrieval, compression in-transit should be disabled to avoid wasteful double-compression/decompression.
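One possible way to express this on the client side, assuming retrieval happens over HTTP: send `Accept-Encoding: identity` so the server is asked not to apply gzip or similar on top of the stored zstd bytes, and check that the response was not re-encoded. The function names and the URL are hypothetical, and the standard-library `urllib` is used here only to keep the sketch dependency-free; the actual client may use a different HTTP library.

```python
import urllib.request
from typing import Mapping


def build_result_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the stored bytes with no transport coding."""
    return urllib.request.Request(
        url,
        headers={
            # "identity" means: do not apply any content-coding in transit.
            "Accept-Encoding": "identity",
            "Accept": "application/octet-stream",
        },
    )


def transport_encoding_is_identity(response_headers: Mapping[str, str]) -> bool:
    """Return True if the response body was not re-compressed in transit.

    Note: a plain dict lookup is case-sensitive; real HTTP header objects
    are usually case-insensitive.
    """
    encoding = response_headers.get("Content-Encoding", "identity")
    return encoding.strip().lower() in ("", "identity")


if __name__ == "__main__":
    # Hypothetical endpoint; no network call is made in this sketch.
    req = build_result_request("https://api.example.org/protocoldagresults/abc123")
    print(req.header_items())
    print(transport_encoding_is_identity({"Content-Encoding": "gzip"}))
```

The server side would also need any compression middleware turned off (or bypassed) for these routes; that part depends on the API framework and is not shown here.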

@dotsdl
Member Author

dotsdl commented Apr 11, 2024

In addition, we should ensure we are using KeyedChain-based serialization and deserialization for ProtocolDAGResults, with compression on top of this once serialized into JSON.

@ianmkenney
Collaborator

If we go the KeyedChain route, I'd want to wait until OpenFreeEnergy/gufe#286 is resolved. The KeyedChain as it exists in alchemiscale might be removed, and I don't want to bank on its structure not fundamentally changing in order to get merged into gufe.
