Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations #1845
Commits on Jul 26, 2023
Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 1)

Initial work to support the SYCL 2020 standard, using USM allocations instead of SYCL buffers and accessors:
- bring the SYCL interface in line with the other backends, and remove the last uses of the alpaka::experimental namespace;
- reimplement the alpaka memory buffers, memset and memcpy tasks for the USM SYCL backend;
- make the SYCL native handles more consistent with the other backends;
- use the oneAPI printf extension, and implement a workaround for the OpenCL limitation on variadic functions and the conflict with AMD HIP/ROCm device code;
- add more debug print messages;
- various fixes for kernel names, memory_scope Grid and atomics;
- update copyright information.

Initial work on the SYCL random number generators (not fully working yet).
Commit be6421e

Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 2)

More changes to the SYCL backend:
- move printf to alpaka/core and use it in ALPAKA_CHECK;
- remove IsView -> false in mem/buf/sycl/Accessor;
- remove wrong attribute in mem/buf/sycl/Copy;
- remove the SYCL experimental BuildAccessor<BufGenericSycl>, use the default implementation from alpaka/mem/view.

Fix the examples to work with the SYCL backend:
- fix the accelerator in the vectorAdd example;
- move AccCpuSerial at the end in the ExampleDefaultAcc, as it was preventing the SYCL accelerators from being selected.

Complete the work on the SYCL random number generators.
Commit 5b1b2b7

Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 3)

Update the documentation.

Implement various fixes to the SYCL math functions:
- add missing "if constexpr" to rsqrt();
- do not call math functions with mixed arguments; this fixes errors due to the implicit conversion between floating point types of different sizes in sycl::atan2() and sycl::pow();
- add explicit type casts to silence warnings;
- cast the result of isfinite/isinf/isnan to bool.

Implement various fixes to the SYCL atomic functions:
- fix the cas/compare_exchange loops;
- clarify which atomic types are supported.

Implement various fixes to the SYCL warp-level functions:
- fix compilation warnings;
- extract bits from sub_group_mask.

Mark the use of global device variables and constants as unsupported: the SYCL backend does not support global device variables and constants, yet.

Add explicit checks on the dimensionality of the SYCL accelerator and work division.

Silence warnings about the use of GNU extensions, and those coming from the Intel oneMKL and oneDPL headers.

Update more tests for the SYCL backend:
- add a special case for 0-dimensional tests;
- disable the use of STL rand;
- disable the test of global device variables and constants.
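As an illustration of the "cas/compare_exchange loops" item, here is a minimal sketch of a compare-and-swap retry loop that implements an atomic maximum. It is not the code from this pull request: std::atomic stands in for sycl::atomic_ref so that the example runs on the host, and the helper name atomicMaxViaCas is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper (not part of alpaka): atomically store max(current, value)
// in `target` and return the value that was there before the operation.
// std::atomic is used here as a host-side stand-in for sycl::atomic_ref.
template<typename T>
T atomicMaxViaCas(std::atomic<T>& target, T value)
{
    T observed = target.load(std::memory_order_relaxed);
    // When compare_exchange_weak fails, it overwrites `observed` with the value
    // currently stored in `target`, so the condition `observed < value` is
    // re-evaluated against fresh data on every retry.
    while(observed < value
          && !target.compare_exchange_weak(observed, value, std::memory_order_relaxed))
    {
        // Nothing to do: `observed` was refreshed by the failed exchange.
    }
    return observed;
}
```

A typical mistake in such loops is to compare against a stale copy of the old value instead of the one refreshed by the failed exchange; whether that was the specific issue fixed in this commit is not stated in the message.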
Commit 97ebacd

Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations (part 4)

Update the documentation related to FPGAs.

Various fixes and updates to the SYCL backend and tests, the copyright information and code formatting.
Commit 085365a

Commit 0fadd2a

Rewrite the SYCL memcpy and memset operations
Rewrite the N-dimensional Copy and Set memory operations to support pitched memory buffers, based on the Cpu implementation. This may require more than one memset or memcpy call per operation, which is not supported by command group handlers. Rewrite the Copy and Set memory operations to use queues instead.
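To show why a pitched copy may need more than one memcpy call, here is a minimal two-dimensional sketch. It is not the implementation from this commit: std::memcpy stands in for the per-row copy that a SYCL queue would enqueue, and the function name copyPitched2D and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of alpaka): copy `rows` rows of `rowBytes` bytes
// each between two pitched buffers. A pitch is the distance in bytes between the
// start of two consecutive rows, and may be larger than `rowBytes` due to padding.
// Returns the number of memcpy calls that were issued.
std::size_t copyPitched2D(
    void* dst,
    std::size_t dstPitch,
    void const* src,
    std::size_t srcPitch,
    std::size_t rowBytes,
    std::size_t rows)
{
    assert(dstPitch >= rowBytes && srcPitch >= rowBytes);
    if(rows == 0 || rowBytes == 0)
        return 0;

    auto* d = static_cast<unsigned char*>(dst);
    auto const* s = static_cast<unsigned char const*>(src);

    // Without padding on either side the rows are contiguous: one call is enough.
    if(dstPitch == rowBytes && srcPitch == rowBytes)
    {
        std::memcpy(d, s, rowBytes * rows);
        return 1;
    }

    // With padding, each row is copied separately and the padding bytes are skipped.
    for(std::size_t r = 0; r < rows; ++r)
        std::memcpy(d + r * dstPitch, s + r * srcPitch, rowBytes);
    return rows;
}
```

The padded case issues one call per row, which is the situation the commit message describes as not fitting into a single command group handler.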
Commit 93c607d

Commit 01afbee

Support compile-time warp size in SYCL kernels

Introduce a new optional trait to describe at compile time the warp size that a kernel should use. The default behaviour is to let the back-end compiler pick the preferred size.

Before launching a kernel with a compile-time sub-group size the user should query the sizes supported by the device, and choose accordingly. If the device does not support the requested size, the SYCL runtime will throw a synchronous exception.

During just-in-time (JIT) compilation this guarantees that a kernel is compiled only for the sizes supported by the device.

During ahead-of-time (AOT) compilation this is not enough, because the device is not known at compile time. The SYCL specification mandates that the back-end compilers should not fail if a kernel uses unsupported features, like unsupported sub-group sizes. Unfortunately the Intel OpenCL CPU and GPU compilers currently fail with a hard error. To work around this limitation, use the preprocessor macros defined when compiling AOT for the new SYCL targets to enable the compilation only for the sub-group sizes supported by each device.

Note: while the CPU OpenCL back-end does support a sub-group size of 64, the SYCL code currently does not. To avoid issues with the sub-group primitives always consider the sub-group size of 64 as not supported by the device.

Other changes:
- remove the use of SYCL streams in favour of the printf() extension;
- remove the ALPAKA_FN_HOST attribute;
- fix the GetSize test for the different sub-group sizes;
- fix the use of sycl::exceptions;
- use different member names for nd_item in different classes, to avoid ambiguous name lookup error when accessing the nd_item in the accelerator object.
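The idea of an optional compile-time warp size trait, combined with a host-side check against the sizes supported by the device, can be sketched as follows. This is an illustration under stated assumptions, not the trait added by this commit: the names WarpSizeTrait, DefaultKernel, Warp32Kernel, Warp64Kernel and canLaunch are hypothetical, and the list of supported sizes is passed in as a plain vector. In SYCL 2020 such a list would come from the sycl::info::device::sub_group_sizes device query.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical trait (not alpaka's actual trait): the sub-group size requested
// by a kernel at compile time. The default of 0 means "let the back-end
// compiler pick the preferred size".
template<typename TKernel>
struct WarpSizeTrait
{
    static constexpr std::size_t value = 0;
};

struct DefaultKernel {}; // no specialization: uses the compiler's choice
struct Warp32Kernel {};  // requests a sub-group size of 32
struct Warp64Kernel {};  // requests a sub-group size of 64

template<>
struct WarpSizeTrait<Warp32Kernel>
{
    static constexpr std::size_t value = 32;
};

template<>
struct WarpSizeTrait<Warp64Kernel>
{
    static constexpr std::size_t value = 64;
};

// Hypothetical host-side check, to be run before launching a kernel:
// returns true if the requested size can be used on a device that reports
// `supportedSizes`.
template<typename TKernel>
bool canLaunch(std::vector<std::size_t> const& supportedSizes)
{
    constexpr std::size_t requested = WarpSizeTrait<TKernel>::value;
    if(requested == 0)
        return true; // no compile-time requirement
    if(requested == 64)
        return false; // per the commit message, 64 is always treated as unsupported
    return std::find(supportedSizes.begin(), supportedSizes.end(), requested) != supportedSizes.end();
}
```

Checking on the host before the launch avoids relying on the synchronous exception that the SYCL runtime throws for an unsupported size.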
Commit b26b05a

Various fixes related to the SYCL back-end

- add the missing specialization of CreateViewPlainPtr for SYCL devices
- improve the comments on the ALPAKA_FN_INLINE macro
- remove unnecessary ALPAKA_FN_HOST attributes
- rename QueueGenericSyclBase::m_impl to m_spQueueImpl, to align with the other back-ends
Commit 218ab6e

Commit c66b7a5