Rework decompression algorithm to operate over 4-byte chunks #45

NickCondron · 2024-10-10T06:00:18Z

I tried implementing the chunk decompression approach (used in the original library) in Rust. This is very WIP, but I was curious what you thought about this idea. It's my first time writing Rust with this much unsafe, so I'm open to ideas to improve how it's organized.

I used the benchmark code in the my other PR. The performance results are mixed but promising. Maybe my code could be further optimized. Maybe it's worth testing other data sets.

dbtext/wikipedia/decompress
                        time:   [1.3307 ms 1.3312 ms 1.3319 ms]
                        thrpt:  [1.1472 GiB/s 1.1478 GiB/s 1.1482 GiB/s]
                 change:
                        time:   [+11.683% +11.794% +11.883%] (p = 0.00 < 0.05)
                        thrpt:  [-10.621% -10.550% -10.461%]
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

dbtext/l_comment/decompress
                        time:   [606.27 ┬╡s 606.73 ┬╡s 607.24 ┬╡s]
                        thrpt:  [1.5616 GiB/s 1.5629 GiB/s 1.5641 GiB/s]
                 change:
                        time:   [-12.526% -12.458% -12.392%] (p = 0.00 < 0.05)
                        thrpt:  [+14.145% +14.230% +14.320%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

dbtext/urls/decompress  time:   [1.7961 ms 1.7986 ms 1.8027 ms]
                        thrpt:  [1.4759 GiB/s 1.4792 GiB/s 1.4812 GiB/s]
                 change:
                        time:   [-8.4988% -8.3049% -8.0170%] (p = 0.00 < 0.05)
                        thrpt:  [+8.7158% +9.0571% +9.2882%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  7 (7.00%) high severe

a10y · 2024-10-11T21:33:32Z

Hey! Thanks for opening, this is definitely something we want to add. Apologies for being slow here, will find some time in next few days to get you some feedback

a10y · 2024-10-29T02:38:54Z

@NickCondron thanks for your patience here.

The code looks fine, and it passes all tests, have not run it through the fuzzer yet but should do that as well.

Here are my results on an AMD EPYC processor from a random Hetzner box:

dbtext/wikipedia/decompress
                        time:   [3.1569 ms 3.1890 ms 3.2261 ms]
                        thrpt:  [846.30 MiB/s 856.14 MiB/s 864.85 MiB/s]
                 change:
                        time:   [+4.1172% +5.7481% +7.4295%] (p = 0.00 < 0.05)
                        thrpt:  [-6.9157% -5.4357% -3.9544%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

compressed dbtext/wikipedia 2862830 => 1640581B (compression factor 1.75:1)
dbtext/l_comment/decompress
                        time:   [872.50 µs 888.90 µs 910.30 µs]
                        thrpt:  [2.8094 GiB/s 2.8770 GiB/s 2.9311 GiB/s]
                 change:
                        time:   [-12.195% -10.144% -7.6115%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2385% +11.289% +13.888%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

compressed dbtext/l_comment 2745949 => 1018169B (compression factor 2.70:1)
dbtext/urls/decompress  time:   [2.5986 ms 2.6386 ms 2.6835 ms]
                        thrpt:  [2.1962 GiB/s 2.2335 GiB/s 2.2678 GiB/s]
                 change:
                        time:   [-5.9530% -3.7961% -1.5762%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6014% +3.9459% +6.3298%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

And on my M2 Max Apple Silicon

aduffy@DuffyProBook ~/c/fsst (nc/pr-45)> critcmp develop decompress4x
group                          decompress4x                           develop
-----                          ------------                           -------
dbtext/l_comment/decompress    1.00   503.3±19.01µs     5.1 GB/sec    1.06   535.5±20.86µs     4.8 GB/sec
dbtext/urls/decompress         1.00  1456.0±51.74µs     4.0 GB/sec    1.04  1514.0±41.35µs     3.9 GB/sec
dbtext/wikipedia/decompress    1.06  1120.6±35.05µs     2.4 GB/sec    1.00  1054.3±47.10µs     2.5 GB/sec

I very quickly added some counters to when the escape_mask was 0 VS non-zero, and it seems like that is the biggest predictor of how the benchmarks were affected by this change

wikipedia
HIT COUNT: 367454  MISS_COUNT: 51297 (7.16 HIT:MISS ratio)

l_comment
HIT COUNT: 252153  MISS_COUNT: 2705 (93:1 HIT:MISS ratio)

urls
HIT COUNT: 687203  MISS_COUNT: 40639 (16.9 HIT:MISS ratio)

Hence, given that l_comment seems to have on average 93 non-escape branches for every 1 escape branch, it's no wonder why it gains the most, while wikipedia, (which only takes ~7 non-escape branches for every escaping branch) suffers a bit.

a10y · 2024-10-29T02:40:23Z

In the morning I'll try running Vortex's TPC-H benchmarks with this version of FSST and see if it has any effect

a10y · 2024-10-29T12:44:43Z

I triggered a benchmark run in spiraldb/vortex#1158 👀

Rework decompression algorithm to operate over 4-byte chunks

603e142

a10y mentioned this pull request Oct 29, 2024

use NickCondron fsst changes spiraldb/vortex#1158

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework decompression algorithm to operate over 4-byte chunks #45

Rework decompression algorithm to operate over 4-byte chunks #45

NickCondron commented Oct 10, 2024

a10y commented Oct 11, 2024

a10y commented Oct 29, 2024

a10y commented Oct 29, 2024

a10y commented Oct 29, 2024

Rework decompression algorithm to operate over 4-byte chunks #45

Are you sure you want to change the base?

Rework decompression algorithm to operate over 4-byte chunks #45

Conversation

NickCondron commented Oct 10, 2024

a10y commented Oct 11, 2024

a10y commented Oct 29, 2024

a10y commented Oct 29, 2024

a10y commented Oct 29, 2024