More SPU patterns #15211

Draft
wants to merge 8 commits into
base: master

Conversation

RipleyTom
Contributor

  • Simplified re_accurate pattern and fixed an edge case in the intrinsic
  • Fixed and simplified sqrt pattern (still need to test accuracy)
  • Added an alternative re_accurate pattern used by GOW3 that should give the same result

@Megamouse added the CPU and LLVM (Related to LLVM instruction decoders) labels Feb 15, 2024
@Ordinary205
Contributor

This PR breaks NFS Most Wanted audio on Approximate XFloat. This doesn't happen on the latest build.
RPCS3.log.gz

const auto bitcast_float = bitcast<u32[4]>(cursed_float);
set_vr(op.rt4, select(bitcast_float == splat<u32[4]>(0x3F800000) | bitcast_float == splat<u32[4]>(0x3F800001), fsqrt(fabs(x)), fma(fnms(spu_rsqrte(x), c, cursed_float), fm(fsplat<f32[4]>(0.5f), fm(x, spu_rsqrte(x))), fm(x, spu_rsqrte(x)))));
const auto bitcast_float = bitcast<u32[4]>(fnms_float);
set_vr(op.rt4, select(fcmp_uno(fm_float == fsplat<f32[4]>(0.5f)) & (bitcast_float == splat<u32[4]>(0x3F800000) | bitcast_float == splat<u32[4]>(0x3F800001)), fsqrt(fabs(x)), fma(fnms(spu_rsqrte(x), c, fnms_float), fm(fm_float, fm(x, spu_rsqrte(x))), fm(x, spu_rsqrte(x)))));
Contributor


select evaluates all of its arguments; that is why it exists (so the code stays branchless).
So this would perform worse.
You can use conditional jumps instead.
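
The difference the comment points at can be illustrated with a minimal scalar sketch. This is not RPCS3 code: `Counters`, `fast_path`, `slow_path`, `select_style` and `branch_style` are hypothetical names invented for the illustration, and the refinement step in `slow_path` is only a stand-in for the more expensive pattern. The counters make visible which paths were actually computed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical counters, used only to show which paths were evaluated.
struct Counters {
    int fast = 0;
    int slow = 0;
};

// Stand-in for the cheap path (plain square root of the absolute value).
float fast_path(float x, Counters& c) {
    ++c.fast;
    return std::sqrt(std::fabs(x));
}

// Stand-in for the expensive path: a reciprocal square root estimate
// followed by one refinement step. Assumes x > 0.
float slow_path(float x, Counters& c) {
    assert(x > 0.0f);
    ++c.slow;
    const float est = 1.0f / std::sqrt(x);
    const float y = x * est;
    return y + 0.5f * y * (1.0f - est * y);
}

// select-style: both operands are computed first, then one is picked.
float select_style(bool cond, float x, Counters& c) {
    const float a = fast_path(x, c);
    const float b = slow_path(x, c);
    return cond ? a : b;
}

// branch-style: only the operand that is needed is computed.
float branch_style(bool cond, float x, Counters& c) {
    if (cond) {
        return fast_path(x, c);
    }
    return slow_path(x, c);
}
```

Calling both functions with the same condition returns the same value, but `select_style` increments both counters while `branch_style` increments only one. With vector operands the condition is per lane, so a jump can only skip work when it is taken for the whole vector; that is a caveat of this sketch, not something stated in the thread.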

@digant73
Contributor

@RipleyTom is there a reason why this PR is still in draft and not yet merged? Is there something we can test so that the PR can be merged?

Labels
CPU LLVM Related to LLVM instruction decoders

5 participants