update SYCL README

AuroraPerego · Jul 27, 2023 · 172b27b · 172b27b
1 parent b0fbddf
commit 172b27b
Showing 1 changed file with 20 additions and 14 deletions.
diff --git a/README_SYCL.md b/README_SYCL.md
@@ -13,14 +13,12 @@ At the moment alpaka's SYCL back-end can only be used together with Intel oneAPI
 ### General CMake options
 
 * `alpaka_ACC_SYCL_ENABLE`: set to `ON` to enable the SYCL back-end. Requires the activation of at least one oneAPI hardware target (see below).
-* `alpaka_SYCL_ENABLE_IOSTREAM`: set to `ON` to enable device-side printing. Force-enabled if `BUILD_TESTING` is enabled.
-* `alpaka_SYCL_IOSTREAM_KIB`: Kibibytes per block reserved as output buffer for device-side printing. This cannot exceed the amount of shared memory per block. Only takes effect if `alpaka_SYCL_ENABLE_IOSTREAM` is enabled. Default: `64`.
 * `MKL_ROOT`: always required. Set to the root of your oneMKL installation. Example: `/opt/intel/oneapi/mkl/2023.1.0`.
 * `oneDPL_DIR`: always required. Set to the CMake path of your oneDPL installation. Example: `/opt/intel/oneapi/dpl/2022.1.0/lib/cmake/oneDPL`.
 
-### Building for Intel CPUs
+### Building for CPUs
 
-The following CMake flags can be set for Intel CPUs:
+The following CMake flags can be set for CPUs:
 
 * `alpaka_SYCL_ONEAPI_CPU`: set to `ON` to enable compilation for Intel CPUs.
 * `alpaka_SYCL_ONEAPI_CPU_ISA`: the Intel ISA to compile for. Look at the possible `--march` options listed in the output of `opencl-aot --help`. Default: `avx2`.
@@ -37,7 +35,8 @@ Note: Intel FPGAs cannot be targeted together with other Intel hardware. This is
 ### Building for Intel GPUs
 
 * `alpaka_SYCL_ONEAPI_GPU`: set to `ON` to enable compilation for Intel GPUs.
-* `alpaka_SYCL_ONEAPI_GPU_DEVICES`: semicolon-separated list of one or more Intel GPUs to compile for. Check the output of `ocloc compile --help` and look at the possible values for the `-device` argument for valid values to supply here. Default: `bdw`.
+* `alpaka_SYCL_ONEAPI_GPU_DEVICES`: semicolon-separated list of one or more Intel GPUs to compile for. The possible values for the devices are listed in the [UsersManual](https://intel.github.io/llvm-docs/UsersManual.html#generic-options) under the flag `-fsycl-targets`. Default: `intel_gpu_pvc`.
+  NOTE: currently only one target at a time can be specified (limitation of the Intel Compiler)
 
 ## Standalone mode
 
@@ -48,10 +47,7 @@ Using the SYCL back-end always requires the following flags:
 * `-fsycl` (compiler and linker)
 * `-fsycl-standard=2020` (compiler)
 
-To enable device-side printing add the following compiler flags:
-
-* `-DALPAKA_SYCL_IOSTREAM_ENABLED`: to enable device-side printing.
-* `-DALPAKA_SYCL_IOSTREAM_KIB=<value>`: `<value>` (without the brackets) defines the kibibytes per block to be reserved for device-side printing. `<value>` cannot exceed the amount of shared memory per block.
+Device-side printing is possible with `printf`, it calls `sycl::ext::oneapi::experimental::printf` that emulates the standard one. This is an extension of the SYCL standard, still in an experimental state, therefore may not always work correctly.
 
 ### Building for x86 64-bit CPUs
 
@@ -75,18 +71,17 @@ To enable device-side printing add the following compiler flags:
 
 1. `#include <alpaka/standalone/GpuSyclIntel.hpp>` in your C++ code.
 2. Add the following flags:
-  * `-fsycl-targets=spir64_gen` (compiler and linker): to enable GPU compilation. Note: If you are using multiple SYCL hardware targets (like CPU and GPU) separate them by comma here.
-  * `-Xsycl-target-backend=spir64_gen "-device <list>"` (linker): to choose the Intel GPU(s) to compile for. Multiple devices can either be separated by comma or by supplying a range of devices. Refer to the output of `ocloc compile --help` and look for the `-device` flag for the possible values.
+  * `-fsycl-targets=intel_gpu_pvc` (compiler and linker): to enable GPU compilation. Note: If you are using multiple SYCL hardware targets (like CPU and GPU) separate them by comma here.
 
 ## Using the SYCL back-end
 
 ### Choosing the Accelerator
 
 In contrast to the other back-ends the SYCL back-end comes with multiple different accelerators which should be chosen according to your requirements:
 
-* `alpaka::experimental::AccCpuSycl` for targeting Intel and AMD CPUs. In contrast to the other CPU back-ends this will use Intel's OpenCL implementation for CPUs under the hood.
-* `alpaka::experimental::AccFpgaSyclIntel` for targeting Intel FPGAs.
-* `alpaka::experimental::AccGpuSyclIntel` for targeting Intel GPUs. 
+* `alpaka::AccCpuSycl` for targeting Intel and AMD CPUs. In contrast to the other CPU back-ends this will use Intel's OpenCL implementation for CPUs under the hood.
+* `alpaka::AccFpgaSyclIntel` for targeting Intel FPGAs.
+* `alpaka::AccGpuSyclIntel` for targeting Intel GPUs.
 
 These can be used interchangeably (some restrictions apply - see below) with the non-experimental alpaka accelerators to compile an existing alpaka code for SYCL-capable hardware.
 
@@ -102,3 +97,14 @@ These can be used interchangeably (some restrictions apply - see below) with the
   ```
   See [Intel's FAQ](https://github.com/intel/compute-runtime/blob/master/opencl/doc/FAQ.md#feature-double-precision-emulation-fp64) for more information.
 * The FPGA back-end does not support atomics. alpaka will not check this.
+* device global variables are not supported in SYCL yet.
+
+### Choosing the sub-group size (warp size)
+
+There is a trait to specify at compile time the sub-group size to use for a kernel. The default behaviour, when no sub-group size is specified, is to let the back-end compiler pick the preferred size.
+
+Before launching a kernel with a compile-time sub-group size the user should query the sizes supported by the device, and choose accordingly. If the device does not support the requested size, the SYCL runtime will throw a synchronous exception.
+
+During just-in-time (JIT) compilation this guarantees that a kernel is compiled only for the sizes supported by the device. During ahead-of-time (AOT) compilation this is not enough, because the device is not known at compile time. The SYCL specification mandates that the back-end compilers should not fail if a kernel uses unsupported features, like unsupported sub-group sizes. Unfortunately the Intel OpenCL CPU and GPU compilers currently fail with a hard error. To work around this limitation, use the preprocessor macros defined when compiling AOT for the new SYCL targets to enable the compilation only for the sub-group sizes supported by each device.
+
+NOTE: while the CPU OpenCL back-end does support a sub-group size of 64, the SYCL code currently does not. To avoid issues with the sub-group primitives always consider the sub-group size of 64 as not supported by the device.