Skip to content

Commit

Permalink
update SYCL README
Browse files Browse the repository at this point in the history
  • Loading branch information
AuroraPerego committed Jul 27, 2023
1 parent b0fbddf commit 172b27b
Showing 1 changed file with 20 additions and 14 deletions.
34 changes: 20 additions & 14 deletions README_SYCL.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,14 +13,12 @@ At the moment alpaka's SYCL back-end can only be used together with Intel oneAPI
### General CMake options

* `alpaka_ACC_SYCL_ENABLE`: set to `ON` to enable the SYCL back-end. Requires the activation of at least one oneAPI hardware target (see below).
* `alpaka_SYCL_ENABLE_IOSTREAM`: set to `ON` to enable device-side printing. Force-enabled if `BUILD_TESTING` is enabled.
* `alpaka_SYCL_IOSTREAM_KIB`: Kibibytes per block reserved as output buffer for device-side printing. This cannot exceed the amount of shared memory per block. Only takes effect if `alpaka_SYCL_ENABLE_IOSTREAM` is enabled. Default: `64`.
* `MKL_ROOT`: always required. Set to the root of your oneMKL installation. Example: `/opt/intel/oneapi/mkl/2023.1.0`.
* `oneDPL_DIR`: always required. Set to the CMake path of your oneDPL installation. Example: `/opt/intel/oneapi/dpl/2022.1.0/lib/cmake/oneDPL`.

### Building for Intel CPUs
### Building for CPUs

The following CMake flags can be set for Intel CPUs:
The following CMake flags can be set for CPUs:

* `alpaka_SYCL_ONEAPI_CPU`: set to `ON` to enable compilation for Intel CPUs.
* `alpaka_SYCL_ONEAPI_CPU_ISA`: the Intel ISA to compile for. Look at the possible `--march` options listed in the output of `opencl-aot --help`. Default: `avx2`.
Expand All @@ -37,7 +35,8 @@ Note: Intel FPGAs cannot be targeted together with other Intel hardware. This is
### Building for Intel GPUs

* `alpaka_SYCL_ONEAPI_GPU`: set to `ON` to enable compilation for Intel GPUs.
* `alpaka_SYCL_ONEAPI_GPU_DEVICES`: semicolon-separated list of one or more Intel GPUs to compile for. Check the output of `ocloc compile --help` and look at the possible values for the `-device` argument for valid values to supply here. Default: `bdw`.
* `alpaka_SYCL_ONEAPI_GPU_DEVICES`: semicolon-separated list of one or more Intel GPUs to compile for. The possible values for the devices are listed in the [UsersManual](https://intel.github.io/llvm-docs/UsersManual.html#generic-options) under the flag `-fsycl-targets`. Default: `intel_gpu_pvc`.
NOTE: currently only one target at a time can be specified (limitation of the Intel Compiler)

## Standalone mode

Expand All @@ -48,10 +47,7 @@ Using the SYCL back-end always requires the following flags:
* `-fsycl` (compiler and linker)
* `-fsycl-standard=2020` (compiler)

To enable device-side printing add the following compiler flags:

* `-DALPAKA_SYCL_IOSTREAM_ENABLED`: to enable device-side printing.
* `-DALPAKA_SYCL_IOSTREAM_KIB=<value>`: `<value>` (without the brackets) defines the kibibytes per block to be reserved for device-side printing. `<value>` cannot exceed the amount of shared memory per block.
Device-side printing is possible with `printf`, it calls `sycl::ext::oneapi::experimental::printf` that emulates the standard one. This is an extension of the SYCL standard, still in an experimental state, therefore may not always work correctly.

### Building for x86 64-bit CPUs

Expand All @@ -75,18 +71,17 @@ To enable device-side printing add the following compiler flags:

1. `#include <alpaka/standalone/GpuSyclIntel.hpp>` in your C++ code.
2. Add the following flags:
* `-fsycl-targets=spir64_gen` (compiler and linker): to enable GPU compilation. Note: If you are using multiple SYCL hardware targets (like CPU and GPU) separate them by comma here.
* `-Xsycl-target-backend=spir64_gen "-device <list>"` (linker): to choose the Intel GPU(s) to compile for. Multiple devices can either be separated by comma or by supplying a range of devices. Refer to the output of `ocloc compile --help` and look for the `-device` flag for the possible values.
* `-fsycl-targets=intel_gpu_pvc` (compiler and linker): to enable GPU compilation. Note: If you are using multiple SYCL hardware targets (like CPU and GPU) separate them by comma here.

## Using the SYCL back-end

### Choosing the Accelerator

In contrast to the other back-ends the SYCL back-end comes with multiple different accelerators which should be chosen according to your requirements:

* `alpaka::experimental::AccCpuSycl` for targeting Intel and AMD CPUs. In contrast to the other CPU back-ends this will use Intel's OpenCL implementation for CPUs under the hood.
* `alpaka::experimental::AccFpgaSyclIntel` for targeting Intel FPGAs.
* `alpaka::experimental::AccGpuSyclIntel` for targeting Intel GPUs.
* `alpaka::AccCpuSycl` for targeting Intel and AMD CPUs. In contrast to the other CPU back-ends this will use Intel's OpenCL implementation for CPUs under the hood.
* `alpaka::AccFpgaSyclIntel` for targeting Intel FPGAs.
* `alpaka::AccGpuSyclIntel` for targeting Intel GPUs.

These can be used interchangeably (some restrictions apply - see below) with the non-experimental alpaka accelerators to compile an existing alpaka code for SYCL-capable hardware.

Expand All @@ -102,3 +97,14 @@ These can be used interchangeably (some restrictions apply - see below) with the
```
See [Intel's FAQ](https://github.com/intel/compute-runtime/blob/master/opencl/doc/FAQ.md#feature-double-precision-emulation-fp64) for more information.
* The FPGA back-end does not support atomics. alpaka will not check this.
* device global variables are not supported in SYCL yet.
### Choosing the sub-group size (warp size)
There is a trait to specify at compile time the sub-group size to use for a kernel. The default behaviour, when no sub-group size is specified, is to let the back-end compiler pick the preferred size.
Before launching a kernel with a compile-time sub-group size the user should query the sizes supported by the device, and choose accordingly. If the device does not support the requested size, the SYCL runtime will throw a synchronous exception.
During just-in-time (JIT) compilation this guarantees that a kernel is compiled only for the sizes supported by the device. During ahead-of-time (AOT) compilation this is not enough, because the device is not known at compile time. The SYCL specification mandates that the back-end compilers should not fail if a kernel uses unsupported features, like unsupported sub-group sizes. Unfortunately the Intel OpenCL CPU and GPU compilers currently fail with a hard error. To work around this limitation, use the preprocessor macros defined when compiling AOT for the new SYCL targets to enable the compilation only for the sub-group sizes supported by each device.
NOTE: while the CPU OpenCL back-end does support a sub-group size of 64, the SYCL code currently does not. To avoid issues with the sub-group primitives always consider the sub-group size of 64 as not supported by the device.

0 comments on commit 172b27b

Please sign in to comment.