boost/math defaulting to slow 128-bit float on aarch64 #1211
Comments
I think the issue and the poor default come from here. IMO, promotion to long double is not appropriate on any architecture that is relevant today (anyone running on one where it is can still override it in several ways).
You are correct that we should not be promoting to an emulated type, that's just silly. Changing to not promoting double on x86 systems has been on my list of things I should probably look at for a while. However, it's a major breaking change that would also break every one of our tests, so unfortunately it is not to be taken lightly.
Hm, macOS for example does not even provide a distinct long double type (at least on M1 and onwards, not sure about Intel). How do the tests work there?
We have an expected error rate for the "largest real type", and all the other reals are then assumed to have zero error. The largest real depends on the compiler/platform, so as you say, for macOS and MSVC that's double, and long double for most GCC configurations. Unfortunately, these are set per-test, not centrally, and if we change over we would need to go through and double-check all the error rates we get to make sure there's nothing buggy there. It's all doable, there have just always been more important things to do...
Most tests use an epsilon relative to the type of the test, not the most precise type.
Not quite... it's a really bad design fault in the now rather old Boost.Test: the tolerance is expressed as a percentage, but the printout is the actual relative difference (so they differ by a factor of 100). So it's 8.39188e-30 found vs 8e-28 / 100, i.e. 8e-30 expected. So a "trivial" fail, because the tolerance is slightly tight for that platform.
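To make the unit mismatch concrete, here is a minimal sketch of the arithmetic using the two numbers quoted above. The helper names `percent_to_fraction`, `within_tolerance` and `report_example` are made up for illustration and are not part of Boost.Test; the sketch only assumes that the tolerance is given in percent and the logged value is a plain fraction, as described in the comment.

```cpp
#include <cassert>
#include <cstdio>

// Convert a tolerance given as a percentage into a plain fraction,
// which is the unit the relative difference is printed in.
constexpr double percent_to_fraction(double tolerance_percent) {
    return tolerance_percent / 100.0;
}

// Hypothetical helper (not a Boost.Test function): true when a relative
// difference, given as a fraction, is within a tolerance given in percent.
constexpr bool within_tolerance(double rel_diff, double tolerance_percent) {
    return rel_diff <= percent_to_fraction(tolerance_percent);
}

// Puts the two numbers from the failing test into the same unit and prints them.
void report_example() {
    const double found = 8.39188e-30;        // relative difference from the log
    const double tolerance_percent = 8e-28;  // tolerance as passed to the test
    assert(!within_tolerance(found, tolerance_percent));  // slightly too tight
    std::printf("found %g, limit %g\n", found,
                percent_to_fraction(tolerance_percent));
}
```

Calling `report_example()` shows both values as fractions, which makes it clear that the found error exceeds the limit by only about 5%.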
Ah thanks! Now that I have a pass on the baseline, I will look into some of the failures with the promotion policy changed and see if I can help out with some fixes. No promises, and I'm unlikely to get all the way, but maybe it helps enough to speed things up a bit.
I sent #1214 |
It seems this is kind of a duplicate of #241, which is over 4 years old. I think it's really time to address it.
Whilst running a workload which uses boost::math::digamma, I discovered unexpectedly slow performance on aarch64 platforms.
The default seen in the distros (Ubuntu 22.04, Rocky9) on this platform (Linux aarch64) is to build with 128-bit floats. This causes software emulation and, with it, a large slowdown.
This seems a poor default. Whilst you can resolve it at application compile time by passing the compiler flag -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS, it's unlikely people will know to do so.
On aarch64, adding the define gives a 100x speedup compared to the current default. On x86, it gives around a 6x speedup.
g++ workload-datasets/boost/btest.cpp -Ofast -mcpu=native
g++ workload-datasets/boost/btest.cpp -Ofast -mcpu=native -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS
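For anyone wanting to check what long double actually is on their platform before deciding whether the define is worth adding, a small sketch along these lines may help. It only uses `std::numeric_limits`; the function names `long_double_format` and `long_double_is_wider` are illustrative and not part of Boost.Math. The mapping from significand width to format reflects the common cases (113 bits for the IEEE binary128 used on Linux aarch64, 64 bits for x87 extended precision, 53 bits where long double is the same as double).

```cpp
#include <cassert>
#include <limits>
#include <string>

// Classify long double by the width of its significand in bits.
std::string long_double_format() {
    constexpr int digits = std::numeric_limits<long double>::digits;
    if (digits == std::numeric_limits<double>::digits) return "same as double";
    if (digits == 64)  return "x87 80-bit extended";
    if (digits == 106) return "double-double";
    if (digits == 113) return "IEEE binary128";
    return "unknown";
}

// True when long double carries more significand bits than double,
// i.e. when promoting double to long double changes the arithmetic used.
bool long_double_is_wider() {
    constexpr int ld = std::numeric_limits<long double>::digits;
    constexpr int d  = std::numeric_limits<double>::digits;
    assert(ld >= d);  // long double is never less precise than double
    return ld > d;
}
```

If `long_double_format()` reports "IEEE binary128" on a CPU without hardware support for that format, long double arithmetic is done in software, which matches the slowdown described above.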