Rewrite the SYCL backend for the SYCL 2020 standard and USM allocations #1845
Conversation
I started reviewing the changeset, but it is cluttered with too many namespace removals to stay focused. Can we keep the removal of the experimental namespace out of this PR? We can either remove the namespace after your work is integrated, or even schedule it before your PR. @j-stephan and @psychocoderHPC, what would you prefer?
auto get_device() const -> sycl::device
{
Should we maybe call those get_native_device? We tend to use the word native when we expose platform specifics. See also getNativeHandle, which btw already offers this functionality. Should we have your function at all then?
The two functions get_device and get_context were implemented as commodities for other pieces of code more than anything. In particular, sometimes we only need one of the two (for example, to allocate pinned memory on the host we just need the device's context) and I find it clearer to ask for either the device or the context with a specific function instead of relying on the pair. I might reimplement them using getNativeHandle and maybe rename them, following the same convention, as getNativeDevice and getNativeContext. Otherwise, removing them entirely shouldn't be much of an issue.
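To illustrate that refactoring idea, here is a minimal plain-C++ sketch (not the actual alpaka API; `Device`, `Context` and `DevSketch` are hypothetical stand-ins for `sycl::device`, `sycl::context` and the device class) showing the two convenience accessors layered on top of a `getNativeHandle()` that returns a pair:

```cpp
#include <string>
#include <utility>

// Stand-ins for sycl::device and sycl::context.
struct Device  { std::string name; };
struct Context { std::string name; };

struct DevSketch
{
    Device  m_device{"gpu0"};
    Context m_context{"ctx0"};

    // Existing accessor: returns both handles as a pair.
    auto getNativeHandle() const -> std::pair<Device, Context>
    {
        return {m_device, m_context};
    }

    // Convenience accessors layered on top of the pair, following the
    // camelCase naming convention discussed above.
    auto getNativeDevice() const -> Device   { return getNativeHandle().first; }
    auto getNativeContext() const -> Context { return getNativeHandle().second; }
};
```

With this layering, call sites that only need the context (e.g. pinned host allocations) can ask for just that piece, while `getNativeHandle()` remains the single source of truth.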
Yes, these should be in camelCase. However, I'm not sure I'm a fan. Are there use cases where you would need a sycl::device without its sycl::context? Even in this PR they are used together. This is why we decided to return a std::pair<sycl::device, sycl::context> in getNativeHandle().
There are a few cases where we need the context but not the device: the allocation of pinned / mapped host memory. I'll look into all the debug prints (also, in the sycl::free method we only need the context).
    return m_impl->get_device();
}

auto get_context() const -> sycl::context
Same as previous comment.
The difference is a bit more nuanced. Currently, a mapped buffer is allocated as:

```cpp
auto buf = alpaka::allocMappedBuf<PltfAcc, Val, Idx>(host, extent);
```

The approach used in this PR is to pass the accelerator device along:

```cpp
auto buf = alpaka::allocMappedBuf<Val, Idx>(host, device, extent);
```

This works, but IMHO is not the preferred solution, because the mapped memory buffer is associated to all the devices in the same SYCL context, not only to the given one.

One of the options discussed this morning is to have the SYCL context as a data member of the platform object:

```cpp
auto buf = alpaka::allocMappedBuf<Val, Idx>(host, platform, extent);
```

The other option mentioned this morning (possibly what @j-stephan is looking into) is to have the SYCL context as a static member, keeping the current interface:

```cpp
auto buf = alpaka::allocMappedBuf<PltfAcc, Val, Idx>(host, extent);
```

Personally, I'm neither in favour of nor against the approach based on stateful platforms.

The first requirement that comes to mind is some form of lazy initialisation: the SYCL context should be initialised only if and when a SYCL device on the given platform is used, not at global object construction time. This is highly desirable because some SYCL platforms take a long time to initialise (e.g. SYCL's CUDA backend with more than one NVIDIA GPU present). I think it is also needed to support debugging (I've seen cuda-gdb fail if CUDA was initialised before the call to …).

The other obvious requirement is thread safety: sharing the same SYCL context across threads should be safe both at construction time and during the rest of the program execution. This should be easy, assuming the underlying SYCL implementation is itself thread safe.

The last point is whether it should be possible to explicitly destroy the Alpaka platform and the associated SYCL context, or if it should only be destroyed automatically (e.g. by ref counting), or never (only implicitly at the end of the process, which might have consequences on debugging and profiling).

By the way, if the platforms gain a state (…)
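The lazy-initialisation requirement can be sketched in plain C++ with a function-local static (a hypothetical stand-in, not alpaka code): the object is constructed on first use rather than at program start, and since C++11 that construction is guaranteed to happen exactly once even under concurrent first calls.

```cpp
#include <string>

// Stand-in for sycl::context; assume its constructor is expensive
// (platform initialisation).
struct Context
{
    std::string name{"ctx"};
};

// The context is created the first time any code asks for it; all callers
// (and all threads) share the same instance. C++11 guarantees that the
// initialisation of a function-local static is thread safe.
inline auto platformContext() -> Context&
{
    static Context ctx;  // constructed lazily, on first call
    return ctx;
}
```

This covers both requirements above: nothing is initialised at global construction time, and concurrent first use is safe without explicit locking.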
That removal was mostly motivated by the testing ease that we gained without it. Adding it back in, although certainly possible, would require a bit more work to reimplement all of the latest changes and would also make testing way more difficult on our side, since the generic alpaka interface wouldn't work anymore.
Thanks! This looks like a promising effort. A few comments:
struct Accessor<
    detail::SyclAccessor<TElem, DimInt<TDim>::value, TAccessModes>,
    TElem,
    TIdx,
    TDim,
    TAccessModes>
Did clang-format do this?
Yes, but it might be due to some other changes; it will probably be reverted.
# if ALPAKA_DEBUG >= ALPAKA_DEBUG_FULL
auto const widthBytes = width * static_cast<TIdx>(sizeof(TElem));
std::cout << __func__ << " ew: " << width << " ewb: " << widthBytes << '\n';
# endif
These shouldn't be entirely removed as we require that info for debugging purposes.
In the new version of this method we don't have different cases to deal with the different dimensions, but a way to keep these debug prints could be this, lines 171 to 199. Does it work for you?
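A hedged sketch of what such a dimension-generic debug print could look like (this is an illustration only, not the code in the linked lines; the labels mirror the prints shown in the diffs):

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Build a debug line like "ew: <w> eh: <h> ed: <d> bytes: <total>" for an
// extent of rank 1 to 3, replacing the per-dimension special cases with a
// single loop over the extent.
template<std::size_t Dim>
auto debugLine(std::array<std::size_t, Dim> const& extent, std::size_t elemBytes)
    -> std::string
{
    static constexpr char const* labels[] = {"ew", "eh", "ed"};
    static_assert(Dim >= 1 && Dim <= 3, "only 1D..3D extents are labelled");

    std::ostringstream os;
    std::size_t elems = 1;
    for(std::size_t i = 0; i < Dim; ++i)
    {
        os << labels[i] << ": " << extent[i] << ' ';
        elems *= extent[i];
    }
    os << "bytes: " << elems * elemBytes;
    return os.str();
}
```

One such helper can serve the 1D, 2D and 3D code paths, so the debug output survives the merge of the per-dimension specializations.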
# if ALPAKA_DEBUG >= ALPAKA_DEBUG_FULL
auto const widthBytes = width * static_cast<TIdx>(sizeof(TElem));
std::cout << __func__ << " ew: " << width << " eh: " << height << " ed: " << depth
          << " ewb: " << widthBytes << " pitch: " << widthBytes << '\n';
# endif
And here.
include/alpaka/mem/buf/sycl/Copy.hpp
Outdated
if(static_cast<std::size_t>(this->m_extent.prod()) != 0u)
{
    meta::ndLoopIncIdx(
        extentWithoutInnermost,
        [&](Vec<DimMin1, ExtentSize> const& idx)
        {
            queue.getNativeHandle().memcpy(
                reinterpret_cast<void*>(
                    this->m_dstMemNative
                    + (castVec<DstSize>(idx) * dstPitchBytesWithoutOutmost)
                          .foldrAll(std::plus<DstSize>())),
                reinterpret_cast<void const*>(
                    this->m_srcMemNative
                    + (castVec<SrcSize>(idx) * srcPitchBytesWithoutOutmost)
                          .foldrAll(std::plus<SrcSize>())),
                static_cast<std::size_t>(this->m_extentWidthBytes));
        });
}
This would launch many memcpy operations if we are dealing with 2D or 3D buffers. I assume you are doing this in order to deal with offsets / slices in one of the source / destination views. Wouldn't it be more effective to write a specialized copy kernel for the 2D / 3D case?
Does Alpaka support the cases where
- the source and destination buffers have different pitch
- only a subset of the source and destination buffers is copied
?
At least the CUDA implementation seems to support both points, so I'd argue for SYCL to do the same. CC @psychocoderHPC, please correct me if I'm wrong.
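Those two cases (different pitches, partial copies) are exactly what forces a per-row copy. A plain-C++ sketch of the pattern (a hypothetical helper, not the alpaka implementation):

```cpp
#include <cstddef>
#include <cstring>

// Copy a widthBytes x height region between two pitched 2D allocations.
// When srcPitch != dstPitch, or when only a sub-region is copied, the rows
// are not contiguous in memory, so one memcpy per row is required.
inline void copy2D(
    unsigned char* dst, std::size_t dstPitch,
    unsigned char const* src, std::size_t srcPitch,
    std::size_t widthBytes, std::size_t height)
{
    for(std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
}
```

This is the same structure as the ndLoopIncIdx loop in the diff above: the outer dimensions are iterated on the host, and only the innermost, contiguous run of widthBytes is handed to memcpy.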
include/alpaka/mem/buf/sycl/Copy.hpp
Outdated
//! The SYCL non-blocking device queue scalar copy enqueue trait specialization.
template<typename TPltf, typename TExtent, typename TViewSrc, typename TViewDst>
struct Enqueue<
    alpaka::QueueGenericSyclNonBlocking<TPltf>,
    alpaka::detail::TaskCopySycl<DimInt<0u>, TViewDst, TViewSrc, TExtent>>
{
Why are you adding specializations for the Enqueue trait? Shouldn't the existing design (using objects that can be used as SYCL command groups) also fit for this use case?
I don't think the existing design without the specialization would work, because we are using the memcpy method of the sycl::queue, not of the sycl::handler as it was before.
And is there a reason for not calling cgh.memcpy and using the queue version instead?
Not really, we were using the queue version in our application. We've just tried with cgh.memcpy and it seems to work; we'll do some more tests tomorrow and then change it.
In both cases, moving from the sycl::queue's methods to the sycl::handler's methods means we lose the ability to copy/set in N dimensions as we are doing right now. We would need a specialised method for that, since one handler task can call a single memory operation / kernel.
include/alpaka/mem/buf/sycl/Set.hpp
Outdated
meta::ndLoopIncIdx(
    extentWithoutInnermost,
    [&](Vec<DimMin1, ExtentSize> const& idx)
    {
        queue.getNativeHandle().memset(
            reinterpret_cast<void*>(
                this->m_dstMemNative
                + (castVec<DstSize>(idx) * dstPitchBytesWithoutOutmost)
                      .foldrAll(std::plus<DstSize>())),
            this->m_byte,
            static_cast<std::size_t>(this->m_extentWidthBytes));
    });
The same questions as for the copy operations also apply to this file.
Some other points: …
This is indeed what I'm currently investigating. The challenge here is that we would need to have the context not as part of … However, this requires a refactoring of the existing (alpaka) SYCL platform design. In the current design, having separate contexts in …
You cannot share memory objects between the Intel OpenCL CPU and GPU platforms/devices.
What I understood from @Parsifal-2045 is that the …
The removal of the …
From our point of view, it means that those backends are simply unusable.
Wow, really? I actually never tested that because I assumed it would just work, everything being oneAPI and all. Is that documented somewhere? A quick Google search didn't bring up anything useful.
I tend to agree and I'm not a big fan of having separate code paths for them. However, our FPGA setup at HZDR is Xilinx-based, so we need to keep it.
SYCL memory objects are associated to the SYCL context.
Maybe we can introduce an intermediate layer: … and keep as much as possible of the code common?
Yes, that is what I'm trying to attempt in my version of this PR. So far I basically have buffer / memcpy / memset specializations for Xilinx and everything else is shared.
The latest push implements static members inside the Alpaka platform for SYCL, as suggested in #1865. This allows us to use …
@Parsifal-2045 now that #1865 has been merged, would you have time to rebase and update this PR?
I can take a look in the next few days
Force-pushed from 15eb7aa to 4894703
@Parsifal-2045 @j-stephan I see that with these changes the tests fail pretty soon. Would it help if I prepare a separate PR with only the removal of the experimental namespace?
Hi @fwyzard, I've been on vacation since last Wednesday. I'll be back on 06 February and will look into this.
I heard this morning - congratulations :-)
I have opened a supporting PR (#1910) with just the removal of the experimental namespace, if that makes it easier to review |
@Parsifal-2045 We merged #1910, so this PR can be rebased. I would strongly recommend squashing all changes into one commit before rebasing. If you need help, I can do that for you.
I am sorry to hear that. For the sake of getting this PR done, it's fine for me if you merge this PR with a workaround and add a FIXME comment in the code + a GitHub issue to finish the transformation back to platform objects for the SYCL backend.
Thank you. I will test it when I start developing the SYCL CI.
@fwyzard: Do you still need to set those paths when you execute …?
At least some of them, yes, otherwise CMake may pick some of the libraries from …
…ns (part 1) Initial work to support the SYCL 2020 standard, using USM allocations instead of SYCL buffers and accessors: - bring the SYCL interface in line with the other backends, and remove the last uses of the alpaka::experimental namespace; - reimplement the alpaka memory buffers, memset and memcpy tasks for the USM SYCL backend; - make the SYCL native handles more consistent with the other backends; - use the oneAPI printf extension, and implement a workaround for the OpenCL limitation on variadic functions and the conflict with AMD HIP/ROCm device code; - add more debug print messages; - various fixes for kernel names, memory_scope Grid and atomics; - update copyright information. Initial work on the SYCL random number generators (not fully working yet).
…ns (part 2) More changes to the SYCL backend: - move printf to alpaka/core and use it in ALPAKA_CHECK; - remove IsView -> false in mem/buf/sycl/Accessor; - remove wrong attribute in mem/buf/sycl/Copy; - remove the SYCL experimental BuildAccessor<BufGenericSycl>, use the default implementation from alpaka/mem/view. Fix the examples to work with the SYCL backend: - fix the accelerator in the vectorAdd example; - move AccCpuSerial at the end in the ExampleDefaultAcc, as it was preventing the SYCL accelerators from being selected. Complete the work on the SYCL random number generators.
…ns (part 3) Update the documentation. Implement various fixes to the SYCL math functions: - add missing "if constexpr" to rsqrt(); - do not call math functions with mixed arguments; this fixes errors due to the implicit conversion between floating point types of different sizes in sycl::atan2() and sycl::pow(); - add explicit type casts to silence warnings; - cast the result of isfinite/isinf/isnan to bool. Implement various fixes to the SYCL atomic functions: - fix the cas/compare_exchange loops; - clarify which atomic types are supported. Implement various fixes to the SYCL warp-level functions: - fix compilation warnings; - extract bits from sub_group_mask. Mark the use of global device variables and constants as unsupported: the SYCL backend does not support global device variables and constants, yet. Add explicit checks on the dimensionality of the SYCL accelerator and work division. Silence warnings about the use of GNU extensions, and those coming from the Intel oneMKL and oneDPL headers. Update more tests for the SYCL backend: - add a special case for 0-dimensional tests; - disable the use of STL rand; - disable the test of global device variables and constants.
…ns (part 4) Update the documentation related to FPGAs. Various fixes and updates to the SYCL backend and tests, the copyright information and code formatting.
Rewrite the N-dimensional Copy and Set memory operations to support pitched memory buffers, based on the Cpu implementation. This may require more than one memset or memcpy call per operation, which is not supported by command group handlers. Rewrite the Copy and Set memory operations to use queues instead.
Introduce a new optional trait to describe at compile time the warp size that a kernel should use. The default behaviour is to let the back-end compiler pick the preferred size. Before launching a kernel with a compile-time sub-group size the user should query the sizes supported by the device, and choose accordingly. If the device does not support the requested size, the SYCL runtime will throw a synchronous exception. During just-in-time (JIT) compilation this guarantees that a kernel is compiled only for the sizes supported by the device. During ahead-of-time (AOT) compilation this is not enough, because the device is not known at compile time. The SYCL specification mandates that the back-end compilers should not fail if a kernel uses unsupported features, like unsupported sub-group sizes. Unfortunately the Intel OpenCL CPU and GPU compilers currently fail with a hard error. To work around this limitation, use the preprocessor macros defined when compiling AOT for the new SYCL targets to enable the compilation only for the sub-group sizes supported by each device. Note: while the CPU OpenCL back-end does support a sub-group size of 64, the SYCL code currently does not. To avoid issues with the sub-group primitives always consider the sub-group size of 64 as not supported by the device. Other changes: - remove the use of SYCL streams in favour of the printf() extension; - remove the ALPAKA_FN_HOST attribute; - fix the GetSize test for the different sub-group sizes; - fix the use of sycl::exceptions; - use different member names for nd_item in different classes, to avoid ambiguous name lookup error when accessing the nd_item in the accelerator object.
- add the missing specialization of CreateViewPlainPtr for SYCL devices - improve the comments on the ALPAKA_FN_INLINE macro - remove unnecessary ALPAKA_FN_HOST attributes - rename QueueGenericSyclBase::m_impl to m_spQueueImpl, to align with the other back-ends
@bernhardmgruber it turns out that the changes to the platforms were good. A fix is in #2021.
You are amazing! Thank you so much :) The proposed PR also LGTM. Great work!
Agreed to keep the full set of changes in this PR
🎉
Well, thanks for the good work! This was our largest PR so far (in terms of comments and reviews). Glad to see it accepted!
Rewrite the SYCL backend to support the SYCL 2020 standard, using USM allocations instead of SYCL buffers and accessors.
Few highlights:
Kernel trait for compile-time sub-group size
Introduce a new optional trait to describe at compile time the warp size that a kernel should use. The default behaviour is to let the back-end compiler pick the preferred size.
Before launching a kernel with a compile-time sub-group size the user should query the sizes supported by the device, and choose accordingly. If the device does not support the requested size, the SYCL runtime will throw a synchronous exception.
During just-in-time (JIT) compilation this guarantees that a kernel is compiled only for the sizes supported by the device. During ahead-of-time (AOT) compilation this is not enough, because the device is not known at compile time. The SYCL specification mandates that the back-end compilers should not fail if a kernel uses unsupported features, like unsupported sub-group sizes. Unfortunately the Intel OpenCL CPU and GPU compilers currently fail with a hard error. To work around this limitation, use the preprocessor macros defined when compiling AOT for the new SYCL targets to enable the compilation only for the sub-group sizes supported by each device.
Note: while the CPU OpenCL back-end does support a sub-group size of 64, the SYCL code currently does not. To avoid issues with the sub-group primitives always consider the sub-group size of 64 as not supported by the device.
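The AOT work-around described above can be illustrated with plain preprocessor guards (the macro names below are made up for this sketch; the real ones are the target macros defined by the compiler when building AOT for each SYCL device):

```cpp
#include <vector>

// Pretend the compiler defined these for a target that supports sub-group
// sizes 8 and 16 (hypothetical names, for illustration only).
#define SKETCH_TARGET_HAS_SG8 1
#define SKETCH_TARGET_HAS_SG16 1
// SKETCH_TARGET_HAS_SG64 intentionally not defined: as noted above, size 64
// is always treated as unsupported by the SYCL code.

// Each branch would wrap the instantiation of the kernel for that
// compile-time sub-group size; here it just records which sizes compile.
inline std::vector<int> compiledSubGroupSizes()
{
    std::vector<int> sizes;
#ifdef SKETCH_TARGET_HAS_SG8
    sizes.push_back(8);
#endif
#ifdef SKETCH_TARGET_HAS_SG16
    sizes.push_back(16);
#endif
#ifdef SKETCH_TARGET_HAS_SG64
    sizes.push_back(64);
#endif
    return sizes;
}
```

Because the guards are resolved by the preprocessor, kernels for unsupported sub-group sizes are never handed to the AOT back-end compiler, sidestepping the hard error from the Intel OpenCL CPU and GPU compilers.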