Rework decompression algorithm to operate over 4-byte chunks #45

Draft · wants to merge 1 commit into base: develop
Conversation

NickCondron (Contributor)

I tried implementing the chunk decompression approach (used in the original library) in Rust. This is very WIP, but I was curious what you thought about this idea. It's my first time writing Rust with this much unsafe, so I'm open to ideas to improve how it's organized.

I used the benchmark code from my other PR. The performance results are mixed but promising. Maybe my code could be optimized further, or maybe it's worth testing other data sets.
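
To make the idea concrete, here is a simplified sketch of the chunked control flow. This is not the actual PR code: the real implementation uses unsafe pointer writes and a packed symbol table, and the `symbols` layout and `decode_one` helper below are placeholders for illustration only.

```rust
const ESCAPE: u8 = 255;

/// Simplified sketch of chunked decompression. `symbols[code]` holds the
/// expansion bytes for codes 0..=254; code 255 escapes the next input byte.
fn decompress_chunked(input: &[u8], symbols: &[Vec<u8>; 255], out: &mut Vec<u8>) {
    let mut i = 0;
    // Fast path: look at 4 code bytes at a time.
    while i + 4 <= input.len() {
        let chunk = [input[i], input[i + 1], input[i + 2], input[i + 3]];
        // escape_mask is non-zero iff any of the 4 codes is the escape byte.
        let escape_mask = chunk
            .iter()
            .fold(0u32, |m, &c| (m << 8) | u32::from(c == ESCAPE));
        if escape_mask == 0 {
            // No escapes in this chunk: expand all four codes without
            // per-byte escape branching.
            for &code in &chunk {
                out.extend_from_slice(&symbols[code as usize]);
            }
            i += 4;
        } else {
            // At least one escape: fall back to byte-at-a-time for one step.
            i = decode_one(input, i, symbols, out);
        }
    }
    // Tail: fewer than 4 code bytes left.
    while i < input.len() {
        i = decode_one(input, i, symbols, out);
    }
}

fn decode_one(input: &[u8], i: usize, symbols: &[Vec<u8>; 255], out: &mut Vec<u8>) -> usize {
    let code = input[i];
    if code == ESCAPE {
        // Escaped literal: the next input byte is copied through verbatim.
        out.push(input[i + 1]);
        i + 2
    } else {
        out.extend_from_slice(&symbols[code as usize]);
        i + 1
    }
}
```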

dbtext/wikipedia/decompress
                        time:   [1.3307 ms 1.3312 ms 1.3319 ms]
                        thrpt:  [1.1472 GiB/s 1.1478 GiB/s 1.1482 GiB/s]
                 change:
                        time:   [+11.683% +11.794% +11.883%] (p = 0.00 < 0.05)
                        thrpt:  [-10.621% -10.550% -10.461%]
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

dbtext/l_comment/decompress
                        time:   [606.27 µs 606.73 µs 607.24 µs]
                        thrpt:  [1.5616 GiB/s 1.5629 GiB/s 1.5641 GiB/s]
                 change:
                        time:   [-12.526% -12.458% -12.392%] (p = 0.00 < 0.05)
                        thrpt:  [+14.145% +14.230% +14.320%]
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

dbtext/urls/decompress  time:   [1.7961 ms 1.7986 ms 1.8027 ms]
                        thrpt:  [1.4759 GiB/s 1.4792 GiB/s 1.4812 GiB/s]
                 change:
                        time:   [-8.4988% -8.3049% -8.0170%] (p = 0.00 < 0.05)
                        thrpt:  [+8.7158% +9.0571% +9.2882%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  7 (7.00%) high severe

@a10y (Contributor) commented Oct 11, 2024

Hey! Thanks for opening, this is definitely something we want to add. Apologies for being slow here; I'll find some time in the next few days to get you some feedback.

@a10y (Contributor) commented Oct 29, 2024

@NickCondron thanks for your patience here.

The code looks fine and it passes all tests. I haven't run it through the fuzzer yet, but should do that as well.

Here are my results on an AMD EPYC processor from a random Hetzner box:

dbtext/wikipedia/decompress
                        time:   [3.1569 ms 3.1890 ms 3.2261 ms]
                        thrpt:  [846.30 MiB/s 856.14 MiB/s 864.85 MiB/s]
                 change:
                        time:   [+4.1172% +5.7481% +7.4295%] (p = 0.00 < 0.05)
                        thrpt:  [-6.9157% -5.4357% -3.9544%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

compressed dbtext/wikipedia 2862830 => 1640581B (compression factor 1.75:1)
dbtext/l_comment/decompress
                        time:   [872.50 µs 888.90 µs 910.30 µs]
                        thrpt:  [2.8094 GiB/s 2.8770 GiB/s 2.9311 GiB/s]
                 change:
                        time:   [-12.195% -10.144% -7.6115%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2385% +11.289% +13.888%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

compressed dbtext/l_comment 2745949 => 1018169B (compression factor 2.70:1)
dbtext/urls/decompress  time:   [2.5986 ms 2.6386 ms 2.6835 ms]
                        thrpt:  [2.1962 GiB/s 2.2335 GiB/s 2.2678 GiB/s]
                 change:
                        time:   [-5.9530% -3.7961% -1.5762%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6014% +3.9459% +6.3298%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

And on my M2 Max Apple Silicon:

aduffy@DuffyProBook ~/c/fsst (nc/pr-45)> critcmp develop decompress4x
group                          decompress4x                           develop
-----                          ------------                           -------
dbtext/l_comment/decompress    1.00   503.3±19.01µs     5.1 GB/sec    1.06   535.5±20.86µs     4.8 GB/sec
dbtext/urls/decompress         1.00  1456.0±51.74µs     4.0 GB/sec    1.04  1514.0±41.35µs     3.9 GB/sec
dbtext/wikipedia/decompress    1.06  1120.6±35.05µs     2.4 GB/sec    1.00  1054.3±47.10µs     2.5 GB/sec

I very quickly added some counters for when the escape_mask was 0 vs. non-zero, and it seems like that is the biggest predictor of how the benchmarks were affected by this change.
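
The counters were just quick-and-dirty instrumentation, roughly along these lines (a hypothetical sketch: the names, atomics, and output format here are assumptions, not what's actually in the branch):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// HIT = chunk with escape_mask == 0 (fast path), MISS = chunk containing an escape.
static HIT_COUNT: AtomicU64 = AtomicU64::new(0);
static MISS_COUNT: AtomicU64 = AtomicU64::new(0);

fn record_chunk(escape_mask: u32) {
    if escape_mask == 0 {
        HIT_COUNT.fetch_add(1, Ordering::Relaxed);
    } else {
        MISS_COUNT.fetch_add(1, Ordering::Relaxed);
    }
}

fn report() {
    let hits = HIT_COUNT.load(Ordering::Relaxed);
    let misses = MISS_COUNT.load(Ordering::Relaxed);
    let ratio = hits as f64 / misses as f64;
    println!("HIT COUNT: {hits}  MISS_COUNT: {misses} ({ratio:.2}:1 HIT:MISS ratio)");
}
```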

wikipedia
HIT COUNT: 367454  MISS_COUNT: 51297 (7.16:1 HIT:MISS ratio)

l_comment
HIT COUNT: 252153  MISS_COUNT: 2705 (93:1 HIT:MISS ratio)

urls
HIT COUNT: 687203  MISS_COUNT: 40639 (16.9:1 HIT:MISS ratio)

Hence, given that l_comment seems to take on average 93 non-escape branches for every escape branch, it's no wonder it gains the most, while wikipedia (which only takes ~7 non-escape branches for every escaping branch) suffers a bit.

@a10y (Contributor) commented Oct 29, 2024

In the morning I'll try running Vortex's TPC-H benchmarks with this version of FSST and see if it has any effect.

@a10y (Contributor) commented Oct 29, 2024

I triggered a benchmark run in spiraldb/vortex#1158 👀
