Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix and update SYCL targets #2390

Merged
merged 1 commit into from
Sep 20, 2024

Conversation

AuroraPerego
Copy link
Contributor

Now all device targets are defined as 0, while the one(s) we are compiling for are defined as 1. Therefore # if defined cannot be used anymore.

I've added some new targets as well.

Note that there is probably a bug with NVIDIA targets, but we are not using them for now.

Copy link
Contributor

@fwyzard fwyzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check some of the updates.

@fwyzard
Copy link
Contributor

fwyzard commented Sep 19, 2024

@psychocoderHPC the error in mathTest (link, attached log) should be unrelated to this PR:

Randomness seeded to: 2485158815
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mathTest is a Catch2 v3.5.2 host application.
Run with -? for options
-------------------------------------------------------------------------------
mathOpsComplexFloat - TestAccFunctorTuplesComplex - 49
-------------------------------------------------------------------------------
/builds/hzdr/crp/alpaka/test/unit/math/src/mathComplexFloat.cpp:26
...............................................................................
/builds/hzdr/crp/alpaka/test/unit/math/src/TestTemplate.hpp:145: FAILED:
  REQUIRE( isApproxEqual(results(i), std_result) )
with expansion:
  false
with messages:
  testing acc:alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::
  integral_constant<unsigned long, 1ul>, unsigned long> data type:alpaka::
  internal::Complex<float> functor:mathtest::OpPowComplex seed:4140714306
  Operator: OpPowComplex
  Type: alpaka::internal::Complex<float>
  The args buffer: 
  capacity: 1000
  0: [ (1e+01,1e+01), (0,0) ]
  ...
  999: [ (2,2), (-0.8,-0.3) ]
  
  Idx i: 1 computed : (506.109,-1.62104e+06) vs expected: (509.202,-1.62104e+
  06)
===============================================================================
test cases:    278 |    277 passed | 1 failed
assertions: 267086 | 267085 passed | 1 failed

Any ideas what may have caused it or how to reproduce it ?

@psychocoderHPC
Copy link
Member

I retriggered the failing test

Copy link
Contributor

@fwyzard fwyzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the behaviour of HIP with @AuroraPerego, we think that all AMD GPUs starting from gfx 10.0 should support only a subgroup size of 32.

@psychocoderHPC
Copy link
Member

psychocoderHPC commented Sep 19, 2024

After reviewing the behaviour of HIP with @AuroraPerego, we think that all AMD GPUs starting from gfx 10.0 should support only a subgroup size of 32.

Yes I checked it against https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html#accelerator-and-gpu-hardware-specifications (the table has different tabs)

and AMD is writing https://rocm.docs.amd.com/projects/HIP/en/latest/understand/hardware_implementation.html#rdna-architecture

RDNA makes a fundamental change to CU design, by changing the size of a warp to 32 threads. 
This is done by effectively combining two GCN5 SIMDs, creating a VALU of width 32, 
so that a whole warp can be issued in one cycle. 
The CU is also replaced by the work group processor (WGP), 
which encompasses two CUs. For backwards compatibility the WGP can also run in wave64 mode, 
in which it issues a warp of size 64 in two cycles.

@fwyzard
Copy link
Contributor

fwyzard commented Sep 19, 2024

Yes... the RDNA1, RDNA2 and RDNA3 architecture whitepapers suggest that both 32 and 64 can be used.

And clang++ does have an -mwavefrontsize64 option to enable wavefronts (warps) with 64 elements.

But HIP does not like it:

/opt/rocm/include/hip/hip_runtime.h
#if __HIP_DEVICE_COMPILE__ && !__GFX7__ && !__GFX8__ && !__GFX9__ && __AMDGCN_WAVEFRONT_SIZE == 64
#error HIP is not supported on the specified GPU ARCH with wavefront size 64
#endif

I assume that the ROCm backend for SYCL/oneAPI is built with HIP, so on gfx 10 and later we should use only a wavefront size of 32.

now all targets are defined as 0 (the one we are compiling for as 1),
therefore `if defined` cannot be used.

Co-authored-by: Andrea Bocci <fwyzard@gmail.com>
@psychocoderHPC psychocoderHPC merged commit 5c5a690 into alpaka-group:develop Sep 20, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants