This is a very 'out there' idea :)
The idea is that if you upload a large compressed file, e.g. an mbz backup file with lots of binary content (i.e. a Moodle backup which deliberately includes files), then objectfs under the hood will disassemble the file and, for each binary file inside it, check whether that file already exists in object storage with an exact hash match. If it does, it replaces it with a reference and then stores the reduced-size compressed file. It reverses this operation transparently on the way out. These files would not be compatible with CloudFront serving, which relies on the file simply sitting there ready to go in its final form. This would need to be bulletproof, and the regenerated file must be an exact binary match of the original, or it could create problems elsewhere.
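To make the shape of this concrete, here is a minimal Python sketch of the round trip, assuming a zip-based .mbz and a hypothetical `object_store` interface with `exists(hash)` and `get(hash)`. It only illustrates the concept; a real implementation would have to preserve entry order, compression settings and all metadata so the reassembled archive is bit-for-bit identical.

```python
# Sketch only: dedupe-on-upload for a zip-based .mbz archive.
# `object_store` is a hypothetical interface (exists(sha1), get(sha1)).
import hashlib
import json
import zipfile

def disassemble(mbz_path, skeleton_path, object_store):
    """Strip out members already in object storage, keeping references."""
    manifest = {}
    with zipfile.ZipFile(mbz_path) as src, \
         zipfile.ZipFile(skeleton_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info)
            sha1 = hashlib.sha1(data).hexdigest()
            if object_store.exists(sha1):
                # Exact hash match: record a reference instead of the bytes.
                manifest[info.filename] = sha1
            else:
                dst.writestr(info, data)
        dst.writestr(".objectfs_manifest.json", json.dumps(manifest))

def reassemble(skeleton_path, mbz_path, object_store):
    """Reverse the operation on the way out."""
    with zipfile.ZipFile(skeleton_path) as src, \
         zipfile.ZipFile(mbz_path, "w") as dst:
        manifest = json.loads(src.read(".objectfs_manifest.json"))
        for info in src.infolist():
            if info.filename == ".objectfs_manifest.json":
                continue
            dst.writestr(info, src.read(info))
        for name, sha1 in manifest.items():
            dst.writestr(name, object_store.get(sha1))
```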
This way, if automated backups are constantly churning out backup files, the backup files themselves stay quite minimal. It still wastes a lot of time grinding through these files, though, when file-less backups would be better.
Most of the time a binary file inside a compressed archive is not actually compressed at all, just stored as-is, so pulling a single file apart shouldn't really change the overall space it uses. But there will be a massive space reduction when files are never deleted (for PITR recovery, for example).
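As a rough sanity check of that claim, a small helper like this (Python, zip archives only, name hypothetical) could report what fraction of an archive's payload is stored uncompressed:

```python
# Rough check of how much of a zip archive's payload is stored rather than
# deflated, to gauge whether pulling members out would inflate total space.
import zipfile

def stored_fraction(path):
    with zipfile.ZipFile(path) as zf:
        infos = [i for i in zf.infolist() if not i.is_dir()]
        stored = sum(i.file_size for i in infos
                     if i.compress_type == zipfile.ZIP_STORED)
        total = sum(i.file_size for i in infos)
        return stored / total if total else 0.0
```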
There would need to be specific code written for each type of compression format.