Currently, we perform no at-rest compression for ProtocolDAGResults produced by compute services. This yields larger-than-necessary JSON files on the object store, and either larger-than-necessary transmission payloads or unnecessary in-transit compression/decompression for files that never change on the object store.
Instead, we should compress each ProtocolDAGResult once, on creation, on the compute service itself, then push this compressed artifact to the compute API for delivery to the object store. Requests by clients for this artifact would then pull the compressed artifact as-is, with no in-transit compression/decompression required. If using zstd by way of e.g. python-zstandard, client-side decompression can be very fast, mitigating wait times for users retrieving results.
Because the compression is done on the compute services themselves, this scheme puts minimal load on the API services receiving or retrieving these artifacts. Relatively slow compression happens only once on creation, and very fast decompression can happen many times over the lifetime of the artifact.
Importantly, for all cases of compressed ProtocolDAGResult retrieval, compression in-transit should be disabled to avoid wasteful double-compression/decompression.
In addition, we should ensure we are using KeyedChain-based serialization and deserialization for ProtocolDAGResults, with compression on top of this once serialized into JSON.
If we go the KeyedChain route, I'd want to wait until OpenFreeEnergy/gufe#286 is resolved. The KeyedChain as it exists in alchemiscale might be removed, and I don't want to bank on its structure not fundamentally changing in order to get merged into gufe.