Support SPECIES_128 #41

Squiry · 2024-02-29T14:26:45Z

Should help with #9. The performance is still fairly low, though (half of what jsoniter shows).

piotrrzysko · 2024-03-03T15:31:51Z

Thanks for the contribution!

I'm a bit busy right now, working on a feature for the parser that I'll hopefully finish this month, so I can't promise when I'll be able to look at your PR, but I'll definitely do so. I believe that the most important thing is to make sure that this change doesn't affect the most common cases (256-bit and 512-bit registers).

piotrrzysko · 2024-04-30T05:43:42Z

I've run the benchmarks on a machine with Neoverse-N1 CPU:

Architecture:             aarch64
  CPU op-mode(s):         32-bit, 64-bit
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0,1
Vendor ID:                ARM
  Model name:             Neoverse-N1
    Model:                1
    Thread(s) per core:   1
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             r3p1
    BogoMIPS:             243.75
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid

and the results are indeed unsatisfactory:

Benchmark                                                                              Mode  Cnt    Score   Error  Units
ParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_fastjson                   thrpt    5  436.897 ± 1.512  ops/s
ParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_jackson                    thrpt    5  380.908 ± 0.816  ops/s
ParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_simdjson                   thrpt    5  197.846 ± 0.894  ops/s
ParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_simdjsonPadded             thrpt    5  199.902 ± 0.545  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_fastjson        thrpt    5  626.115 ± 1.175  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_jackson         thrpt    5  463.471 ± 0.881  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_jsoniter_scala  thrpt    5  871.302 ± 4.688  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_simdjson        thrpt    5  213.725 ± 0.452  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_simdjsonPadded  thrpt    5  216.995 ± 0.329  ops/s

I'd like to understand where the disparity between 256/512-bit and 128-bit vectors comes from (see the results in the README for Intel CPUs). Currently, I don't have the capacity to investigate this. Would you like to do it, or would you like me to come back to it when I have time?

Squiry · 2024-05-02T15:37:24Z

> I'd like to understand where the disparity between 256/512-bit and 128-bit vectors comes from

The way I've implemented this feature for 128-bit is not the same as the arm64 implementation in the original repo. They take a slightly different approach there, but I don't think we need that level of detail here anyway.

piotrrzysko · 2024-05-05T16:49:20Z

I think your code looks good. By the disparity between 256/512-bit and 128-bit vectors I meant the difference in performance. As you can see in the README, for the (SchemaBased)ParseAndSelectBenchmark simdjson-java is typically 3-4 times faster than other libraries. However, based on the results I shared in my previous comment, it appears that for 128-bit vectors the performance doesn't even match that of other libraries. I'm curious about the root cause of this difference. Could it simply be due to narrower registers? Or perhaps there's something else we're missing?
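To put a rough number on the "narrower registers" hypothesis, here is a minimal sketch of how many vector loads are needed to cover one block of input at each register width. It assumes the parser consumes input in fixed 64-byte blocks, as the original simdjson does; the class name `ChunksPerBlock`, the constant `BLOCK_BYTES`, and the method `loadsPerBlock` are hypothetical and are not taken from this PR.

```java
// Hypothetical helper, not part of simdjson-java.
class ChunksPerBlock {
    // Assumption: input is processed in 64-byte blocks.
    static final int BLOCK_BYTES = 64;

    // Number of vector loads needed to cover one block
    // for a register that is vectorBits wide.
    static int loadsPerBlock(int vectorBits) {
        int vectorBytes = vectorBits / 8;
        return BLOCK_BYTES / vectorBytes;
    }

    public static void main(String[] args) {
        for (int bits : new int[] {128, 256, 512}) {
            System.out.println(bits + "-bit: " + loadsPerBlock(bits)
                    + " loads per " + BLOCK_BYTES + "-byte block");
        }
    }
}
```

Under that assumption, a 128-bit species needs four loads per block where a 256-bit species needs two and a 512-bit species needs one. That alone would suggest a slowdown of a small constant factor, so it does not obviously explain simdjson-java falling behind the non-SIMD libraries.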

Squiry · 2024-05-05T22:12:01Z

That's interesting. My MacBook with an M1 Max gives me a different result:

Benchmark                                                                              Mode  Cnt     Score    Error  Units
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_fastjson        thrpt    5  1874.904 ±  8.548  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_jackson         thrpt    5  1044.073 ± 39.591  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_jsoniter_scala  thrpt    5  2153.209 ± 22.102  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_simdjson        thrpt    5  1120.909 ± 16.372  ops/s
SchemaBasedParseAndSelectBenchmark.countUniqueUsersWithDefaultProfile_simdjsonPadded  thrpt    5  1131.995 ± 42.193  ops/s

It's still bad, but nowhere near that bad.
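For a side-by-side comparison of the two machines, the following sketch divides the `simdjson` score by the `jsoniter_scala` score from the SchemaBasedParseAndSelectBenchmark rows posted in this thread. The scores are copied from the tables above; the class name `RelativeThroughput` and the method `ratio` are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for comparing the benchmark scores quoted in this thread.
class RelativeThroughput {
    // Throughput of simdjson relative to a baseline library (both in ops/s).
    static double ratio(double simdjsonOps, double baselineOps) {
        return simdjsonOps / baselineOps;
    }

    public static void main(String[] args) {
        // SchemaBasedParseAndSelectBenchmark: simdjson vs jsoniter_scala.
        double neoverseN1 = ratio(213.725, 871.302);
        double m1Max = ratio(1120.909, 2153.209);

        System.out.println(String.format(Locale.ROOT, "Neoverse-N1: %.2f", neoverseN1));
        System.out.println(String.format(Locale.ROOT, "M1 Max:      %.2f", m1Max));
    }
}
```

On these numbers simdjson reaches roughly a quarter of jsoniter_scala's throughput on the Neoverse-N1 and roughly half on the M1 Max.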

Support SPECIES_128

7a3cdfd

piotrrzysko added 2 commits April 29, 2024 06:40

Merge branch 'main' into species-128-support

1ee499c

Merge branch 'main' into species-128-support

6d59046
