CMakeLists.txt Improvements for CUDA #1337

kylosus · 2023-07-28T15:04:07Z

This PR bumps cmake version to 3.17 and replaces the deprecated `find_package(CUDA)` with `FindCUDAToolkit`, with a number of improvements to the compilation process:

- CUDA include and library directories are now handled automatically by cmake.
- CUDA architecture handling is reworked: no more regex in CMakeLists.txt or manual `-gencode` string generation in Python code.
- CUDA source files are now included directly in the targets: cmake handles proper compilation and linking of device code automatically.
- Similar modifications to the OpenMP and Threads targets.

- Bumped cmake minimum version to 3.17
- Changed `DACE_LIBS` element to `CUDA::cudart`
- Removed unneeded `include_directories` and `link_directories` calls
- Removed `compile_cuda`. CUDA files are now passed directly to targets
- Removed `-fPIC` and `-std` args from nvcc as they are handled automatically now
- Renamed `CUDA_NVCC_FLAGS` to `CMAKE_CUDA_FLAGS`

- Moved `-gencode` handling to cmake: the cmake variable `DACE_CUDA_ARCHITECTURES_DEFAULT` is set in Python code to be used by `CMAKE_CUDA_ARCHITECTURES` instead of manually creating the compiler arg string.
- The default CUDA arch is no longer forcefully included in the presence of a native architecture.
- `get_cuda_arch.cpp` now returns a properly formatted architecture string compatible with cmake.
- `get_cuda_arch.cpp` now fails if no architectures are found.
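To illustrate the difference between the two ways of passing architectures, here is a minimal sketch. The helper names `gencode_flags` and `cmake_arch_list` are hypothetical and are not DaCe's actual code; the sketch only assumes that architectures are given as strings such as `"60"`, that the old path built one `-gencode` flag per architecture, and that cmake expects a semicolon-separated list.

```python
def gencode_flags(archs):
    """Old style (hypothetical helper): one explicit nvcc -gencode flag per architecture."""
    return " ".join(f"-gencode arch=compute_{a},code=sm_{a}" for a in archs)


def cmake_arch_list(archs):
    """New style (hypothetical helper): a semicolon-separated cmake list of architectures."""
    return ";".join(str(a) for a in archs)


archs = ["60", "70"]
print(gencode_flags(archs))    # -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70
print(cmake_arch_list(archs))  # 60;70
```

With the second form, the build system rather than the Python code is responsible for turning the list into compiler arguments.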


kylosus · 2023-08-03T22:00:22Z

`tests/cuda_highdim_kernel_test.py` also errors on the master branch, but the test passes there because the DaCe program exits with status 0. In this branch the same error causes a segfault in pytest, which is why the test fails. I don't know how this is possible or how the changes in this PR unearthed it.

```
$ # git checkout master
$ pytest --tb=short -m "gpu" tests/cuda_highdim_kernel_test.py
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.10.12, pytest-7.4.0, pluggy-1.2.0
rootdir: /home/user/nvidia/dace-test
configfile: pytest.ini
collected 9 items / 6 deselected / 3 selected

tests/cuda_highdim_kernel_test.py ...                                                                                                                                                                     [100%]

=============================================================================================== warnings summary ================================================================================================
tests/cuda_highdim_kernel_test.py::test_gpu
  /home/user/nvidia/dace-test/venv/lib/python3.10/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function CompiledSDFG.__del__ at 0x7f53aff3f1c0>

  Traceback (most recent call last):
    File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 402, in __del__
      self.finalize()
    File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 354, in finalize
      raise RuntimeError(
  RuntimeError: An error was detected after running "tests_cuda_highdim_kernel_test_highdim": invalid configuration argument. Consider enabling synchronous debugging mode (environment variable: DACE_compiler_cuda_syncdebug=1) to see where the issue originates from.

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== 3 passed, 6 deselected, 1 warning in 19.23s ==================================================================================
ERROR launching kernel kernel_0_0_0: invalid configuration argument (9). Grid dimensions: (15, 6, 6); Block dimensions: (5, 0, 4).

$ python tests/cuda_highdim_kernel_test.py
High-dimensional GPU kernel test (12, 3, 14, 15, 1, 2, 3, 4, 5)
Difference: 0.0
High-dimensional GPU kernel test (12, 3, 14, 15, 1, 2, 3, 4, 5)
ERROR launching kernel kernel_0_0_0: invalid configuration argument (9). Grid dimensions: (15, 6, 6); Block dimensions: (5, 0, 4).
Exception ignored in: <function CompiledSDFG.__del__ at 0x7f259e54fd00>
Traceback (most recent call last):
  File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 402, in __del__
    self.finalize()
  File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 354, in finalize
    raise RuntimeError(
RuntimeError: An error was detected after running "highdim": invalid configuration argument. Consider enabling synchronous debugging mode (environment variable: DACE_compiler_cuda_syncdebug=1) to see where the issue originates from.
Difference: 0.0
WARNING: New access a[i:i + 2, j] already covered by a[i:i + 2, j:j + 2]

$ echo $?
0
```
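The exit status of 0 in the log above is consistent with how Python treats exceptions raised inside `__del__`: they are reported on stderr as ignored and do not propagate. The sketch below reproduces only that behaviour in a child interpreter. The `Handle` class is a hypothetical stand-in, not DaCe's `CompiledSDFG`.

```python
import subprocess
import sys
import textwrap

# Script run in a child interpreter. Handle is a hypothetical stand-in
# for an object whose finalizer detects an error.
CHILD = textwrap.dedent("""
    class Handle:
        def __del__(self):
            raise RuntimeError("error detected during finalization")

    h = Handle()
    del h  # CPython runs __del__ here; the exception is reported, not raised
    print("still running")
""")


def run_child():
    """Run the script above and capture its exit code and output."""
    return subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)


result = run_child()
print("exit code:", result.returncode)
print(result.stderr)
```

The child keeps running after the failed finalizer and exits normally, so a test runner that only looks at the exit code or at raised exceptions would not notice the error.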

tbennun · 2023-08-07T14:00:06Z

@kylosus thank you, I'll take a look

kylosus · 2023-08-07T16:15:37Z

I think it's because the old cmake setup links statically against `libcudart_static.a`, so the shared library loaded here never sees the error. Handling CUDA errors this way seems a little hacky; maybe the `__dace_runkernel_*` and `__program_{name}_internal` functions should return the errors instead.

tbennun · 2023-08-07T16:58:03Z

> maybe `__dace_runkernel_*` and `__program_{name}_internal` functions should return the errors instead.

This is something we considered, but cannot reliably implement in the code generator without significant effort.


BenWeber42 · 2023-10-27T13:00:26Z

Hi, what's the status of this PR? Looks like this would be a nice change for DaCe?

kylosus · 2023-10-30T16:22:42Z

> Hi, what's the status of this PR? Looks like this would be a nice change for DaCe?

@BenWeber42 It should be ready to merge, but the cmake can probably be improved further. I didn't touch parts unrelated to CUDA.

BenWeber42 · 2023-11-01T13:21:11Z

Ok, thanks for the clarification. I think it's easiest to have this PR only be about CUDA related improvements to the CMakeLists.txt (I edited the title accordingly).

Could you maybe merge the latest master into this branch? It hasn't been synced in quite a while.

kylosus · 2023-11-08T16:33:23Z

Seems fine

BenWeber42 · 2023-11-13T17:31:31Z

There was a segfault in the gpu test. I restarted it...

kylosus · 2023-11-20T13:15:53Z

> There was a segfault in the gpu test. I restarted it...

The failing test had been silently ignored until this PR. See the discussion above.

tbennun · 2023-11-22T16:16:24Z

@BenWeber42 @kylosus I fixed this test in #1441. It turns out there were some invalid (empty) ranges in the maps of one of the tests.
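As a rough illustration of how an empty map range can lead to the `Block dimensions: (5, 0, 4)` launch error in the log, here is a minimal sketch. Both helpers are hypothetical, not DaCe code, and the sketch assumes inclusive `begin..end` ranges with a positive step. It checks only one necessary condition for a valid launch (every block dimension is at least 1), not the full set of CUDA limits.

```python
def map_range_size(begin, end, step=1):
    """Iteration count of an inclusive range begin..end (positive step); 0 if empty."""
    return max(0, (end - begin) // step + 1)


def empty_dims(block_dims):
    """Indices of block dimensions that are smaller than 1."""
    return [i for i, d in enumerate(block_dims) if d < 1]


# Hypothetical ranges: the middle one is empty because end < begin.
block = (map_range_size(0, 4), map_range_size(0, -1), map_range_size(0, 3))
print(block)              # (5, 0, 4)
print(empty_dims(block))  # [1]
```

A check like this, run before launching, would turn the deferred runtime error into an immediate and attributable one.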

BenWeber42 · 2023-11-27T17:20:41Z

Just approved the fix. Thanks again. That's great, then it looks like we should continue here as follows:

- Wait for "Fix CUDA high-dimensional test" (#1441) to be merged
- Re-run tests (they should pass with #1441 merged)
- Find reviewer & merge


tbennun · 2023-11-30T15:27:55Z

@kylosus could you please update this PR to the latest master? I fixed the issue you were observing.

kylosus added 5 commits July 28, 2023 17:17

Changed OpenMP and Threads to use the actual cmake targets

e057856

Added --no-undefined linker option as a failsafe

abdfc88

Fixed MPI target

02f0c76

tbennun reviewed Jul 31, 2023

Reverting for portability issues

9b3946a

This reverts commit abdfc88.

kylosus added a commit to ParCoreLab/CPU-Free-Model-Compiler that referenced this pull request Aug 28, 2023

Removing unnecessary cmake flags

6cb43ec

Handled automatically since spcl#1337

BenWeber42 changed the title ~~CMakeLists.txt Improvements~~ CMakeLists.txt Improvements for CUDA Nov 1, 2023

kylosus mentioned this pull request Nov 6, 2023

NVHPC support #1424

Closed

Merge branch 'spcl:master' into cmake-cuda-update

1347655

tbennun mentioned this pull request Nov 22, 2023

Fix CUDA high-dimensional test #1441

Merged

tbennun added a commit that referenced this pull request Nov 28, 2023

Fix CUDA high-dimensional test (#1441)

cfa0871

Fixes invalid ranges used in a test. Opened following #1337

Merge branch 'spcl:master' into cmake-cuda-update

de4ef6f

tbennun approved these changes Jan 1, 2024

tbennun added this pull request to the merge queue Jan 1, 2024

Merged via the queue into spcl:master with commit 1393cb0 Jan 1, 2024
11 checks passed
