Add use of zstd compression on compute services for pushing to API service, storing at-rest, and fast decompression on retrieval via clients #220

dotsdl · 2024-01-09T18:42:31Z

Currently, we perform no compression at rest for ProtocolDAGResults produced by compute services. This yields larger-than-necessary JSON files on the object store, and either larger-than-necessary transmission payloads or unnecessary compression/decompression in transit for files that never change on the object store.

Instead, we should compress each ProtocolDAGResult once, on creation, on the compute service itself, then push this compressed artifact to the compute API for delivery to the object store. Client requests for this artifact would then pull the compressed artifact as-is, with no in-transit compression/decompression required. If we use zstd, e.g. by way of python-zstandard, client-side decompression can be very fast, mitigating wait times for users retrieving results.

Because the compression is done on the compute services themselves, this scheme puts minimal load on the API services receiving or retrieving these artifacts. Relatively slow compression happens only once on creation, and very fast decompression can happen many times over the lifetime of the artifact.
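A minimal sketch of the compress-once, decompress-many round trip, assuming the result has already been serialized to a JSON-compatible dict. The helper names `pack_result` and `unpack_result` and the sample payload are hypothetical, not existing alchemiscale functions, and the compression level is an arbitrary choice. The `zlib` fallback is only there so the sketch runs where python-zstandard is not installed.

```python
import json

try:
    import zstandard  # python-zstandard, the library suggested above

    def compress(raw: bytes) -> bytes:
        # One-time, relatively slow step; would run on the compute service.
        return zstandard.ZstdCompressor(level=3).compress(raw)

    def decompress(blob: bytes) -> bytes:
        # Fast step; would run on the client at every retrieval.
        return zstandard.ZstdDecompressor().decompress(blob)

except ImportError:
    import zlib  # stand-in codec, NOT zstd; keeps the sketch runnable

    compress, decompress = zlib.compress, zlib.decompress


def pack_result(result: dict) -> bytes:
    """Serialize to JSON, then compress (hypothetical compute-side helper)."""
    return compress(json.dumps(result).encode("utf-8"))


def unpack_result(blob: bytes) -> dict:
    """Decompress, then parse JSON (hypothetical client-side helper)."""
    return json.loads(decompress(blob).decode("utf-8"))


# Hypothetical stand-in for a serialized ProtocolDAGResult.
result = {"name": "example", "estimates": [0.1] * 1000}
blob = pack_result(result)

# The blob is what would be pushed to the API and stored unchanged.
assert unpack_result(blob) == result
print(len(json.dumps(result)), "->", len(blob), "bytes")
```

The API service and object store only ever handle `blob` as opaque bytes in this sketch, which is what keeps their load minimal.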

Importantly, for all cases of compressed ProtocolDAGResult retrieval, compression in-transit should be disabled to avoid wasteful double-compression/decompression.
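One way to guard against double compression is to check whether a payload is already a zstd frame before choosing a transport encoding. The sketch below shows only that decision logic. The function names and the `"gzip"` default are assumptions for illustration, and how in-transit compression is actually switched off depends on the API framework, which is not specified here.

```python
# zstd frames start with the magic number 0xFD2FB528, stored little-endian.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def is_zstd_frame(payload: bytes) -> bool:
    """Return True if the payload begins with the zstd frame magic number."""
    return payload[:4] == ZSTD_MAGIC


def transport_encoding(payload: bytes, default: str = "gzip") -> str:
    """Pick a transport encoding (hypothetical helper).

    Precompressed artifacts are sent as-is ("identity"); anything else
    falls back to the assumed default in-transit encoding.
    """
    return "identity" if is_zstd_frame(payload) else default


# A precompressed artifact skips in-transit compression entirely.
print(transport_encoding(ZSTD_MAGIC + b"...compressed bytes..."))
# Plain JSON would still get the default transport encoding.
print(transport_encoding(b'{"key": "value"}'))
```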


dotsdl · 2024-04-11T03:15:34Z

In addition, we should ensure we are using KeyedChain-based serialization and deserialization for ProtocolDAGResults, with compression on top of this once serialized into JSON.

ianmkenney · 2024-04-24T16:57:32Z

If we go the KeyedChain route, I'd want to wait until OpenFreeEnergy/gufe#286 is resolved. The KeyedChain as it exists in alchemiscale might be removed, and I don't want to bank on its structure not fundamentally changing in order to get merged into gufe.

dotsdl added component-user-api component-compute-api component-compute-service component-objectstore component-compute-client component-user-client performance labels Jan 9, 2024

dotsdl added this to the Release 0.4.0 - "living networks" and automated strategies enablement milestone Jan 9, 2024

dotsdl added the priority-high label Apr 11, 2024

dotsdl assigned ianmkenney Apr 23, 2024

dotsdl modified the milestones: Release 0.7.0 - "living networks" and automated strategies enablement, Release 0.6.0 - result retrieval optimizations, server-side task restart policies Sep 17, 2024
