docs: Why GeoJSONs are not compressed TDE-1263 (#306)
### Motivation

Explain to end users why GeoJSON files are not compressed in our public
buckets.

### Modifications

Add a documentation file.

### Verification

N/A
l0b0 authored Oct 17, 2024 · 1 parent 5ca03d9 · commit b3d85a8
docs/GeoJSON-compression.md (23 additions, 0 deletions)
# GeoJSON compression

## Summary

Toitū Te Whenua has decided to store all metadata _uncompressed_, including GeoJSON.

## Arguments

Pro compression:

- Saves money on storage
- Saves time and money on data transfer
- Metadata files are highly compressible, since they consist largely of repetitive text strings

Contra compression:

- Some tools do not seamlessly decompress files (see the first sketch after this list):
- [AWS CLI issue](https://github.com/aws/aws-cli/issues/6765)
- [boto3 issue](https://github.com/boto/botocore/issues/1255)
- Any file on S3 "[smaller than 128 KB](https://aws.amazon.com/s3/pricing/)" (presumably actually 128 KiB) is treated as 128 KB for pricing purposes, so compressing files smaller than this threshold would not reduce the storage price
- The extra development time to deal with compressing and decompressing JSON files larger than 128 KB would not offset the savings:
  - We can get the sizes of all JSON files by running `aws s3api list-objects-v2 --bucket=nz-elevation --no-sign-request --query="Contents[?ends_with(Key, 'json')].Size"` and `aws s3api list-objects-v2 --bucket=nz-imagery --no-sign-request --query="Contents[?ends_with(Key, 'json')].Size"` (see the second sketch after this list)
  - Summing the sizes of the files larger than 128 KB gives a total of only _33 MB_ at the time of writing
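
As an illustration of the first point above, here is a minimal sketch of the decompression pitfall, assuming a hypothetical `example-bucket` whose `collection.json` was uploaded gzip-compressed with `Content-Encoding: gzip` set. This is the behaviour described in the linked AWS CLI and boto3 issues: the stored bytes are downloaded as-is, and decompression is left to the caller.

```bash
# Hypothetical object: uploaded gzip-compressed with Content-Encoding: gzip.
aws s3 cp s3://example-bucket/collection.json .

# The CLI does not decompress on download, so the local file is still gzip data.
file collection.json # reports "gzip compressed data", not JSON text

# Manual decompression is needed before the file can be parsed as JSON.
gzip -cd collection.json | jq .type
```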

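To reproduce the size figure, here is a sketch of summing the sizes of the large JSON files in one bucket, assuming `jq` is installed (128 KiB = 131072 bytes, matching the presumed threshold; the AWS CLI paginates `list-objects-v2` automatically):

```bash
# Sum the sizes of JSON objects larger than 128 KiB in the nz-imagery bucket.
aws s3api list-objects-v2 --bucket=nz-imagery --no-sign-request \
  --query="Contents[?ends_with(Key, 'json')].Size" --output=json |
  jq 'map(select(. > 131072)) | add'
```

Repeating the command with `--bucket=nz-elevation` and adding the two results gives the combined total quoted above.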