Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xxHash algorithm family #110

Closed
wants to merge 2 commits into from
Closed

Conversation

hzhuang1
Copy link

@hzhuang1 hzhuang1 commented Nov 3, 2022

Add XXH32 and XXH64 of algorithm family. Implement them with aarch64 SVE instructions.

@gbtucker
Copy link
Contributor

gbtucker commented Nov 5, 2022

This is interesting but xxhash is very parallelable in the single buffer version also. Are you comparing with the NEON single buffer version? You may have to add XXH_VECTOR == XXH_NEON when building xxhash.

@hzhuang1
Copy link
Author

hzhuang1 commented Nov 7, 2022

This is interesting but xxhash is very parallelable in the single buffer version also. Are you comparing with the NEON single buffer version? You may have to add XXH_VECTOR == XXH_NEON when building xxhash.

Screen Shot 2022-11-07 at 10 43 59

Screen Shot 2022-11-07 at 10 44 15

In cellphone SoC, maybe NEON could gain better performance than Scalar on single buffer version. In server SoC, I couldn't get the same result. I attached two pictures on the same test platform. I even got the similar result on another vendor's platform.

I think that you're talking about XXH3 algorithm. There's even performance downgrade on XXH3 with NEON. I'm in the process of upstreaming it in xxHash repository (Cyan4973/xxHash#751). With the optimization on SVE, I could gain 4x improvement for the best case on the same test platform.

XXH32 and XXH64 algorithms are less parallelable. Since there's no multiple buffer framework in xxHash, I submit the related patches to ISA-L_crypto instead. And I only need to compare SVE and Scalar since NEON can't bring any benefit in server SoC.

@hzhuang1
Copy link
Author

Any comment on this pull request?

@hzhuang1
Copy link
Author

Could you help to share any comment to me? Thanks

@gbtucker
Copy link
Contributor

So the SVE multi-buffer version is 3x-4x the single buffer version?

@hzhuang1
Copy link
Author

hzhuang1 commented Dec 7, 2022

So the SVE multi-buffer version is 3x-4x the single buffer version?

Excuse me to response late.

Not exactly. On different machine, the improvement is also different. I think that it's caused by different micro-architecture. The first one is ARM-v8.2, and the latest one is ARM-v8.4. It evolves quickly.

On the SVE-512 test machine, it improved the performance in the range of 1.x-3.x. It's caused by different block size.

@pablodelara
Copy link
Contributor

HI @hzhuang1. Apologies for this long wait time, we have been transitioning in our company and we are maintaining actively this library again. Are you still interested in pushing this? There are currently some code formatting issues (you can run tools/check_format.sh). Could you take a look at them? Thanks!

@hzhuang1
Copy link
Author

Thanks. I've updated the pull request.

@pablodelara
Copy link
Contributor

Hi @hzhuang1, apologies for asking about format again, but given that we have had issues with the indent tool, we have transitioned to clang-format, so this PR will require a change. Can you run ./tools/format.sh (you will need clang-format-18)?

Add the multi-buffer interface of xxhash. And create the test cases.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Implement XXH32 and XXH64 algorithm with aarch64 SVE instructions.

Test data on SVE-512bit platform is in below.

The standard XXH32 could reach to 1.2GB/s. The MB version could reach
to 1.4-4.8GB/s at capacity if data buffer length is equal or larger
than 1KB.

The standard XXH64 could reach to 2.3GB/s. The MB version could reach
to 2.3-7.1GB/s at capacity if data buffer length is equal or larger
than 1KB.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
@pablodelara
Copy link
Contributor

Hi @hzhuang1. Apologies for this late response. After thinking about this, I don't think this is the right place for this algorithm, given this is a non-cryptographic hash. I also saw that this is already in xxHash repo, so no need to include it here. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants