Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMakeLists.txt Improvements for CUDA #1337

Merged
merged 8 commits into from
Jan 1, 2024

Conversation

kylosus
Copy link
Contributor

@kylosus kylosus commented Jul 28, 2023

This PR bumps cmake version to 3.17 and replaces the deprecated find_package(CUDA) with FindCUDAToolkit, with a number of improvements to the compilation process:

  • CUDA Include and Library directories are now handled automatically by cmake
  • CUDA architecture handling is reworked: No more regex in CMakeLists.txt or manual -gencode string generation in python code.
  • CUDA source files are now included directly in the targets: cmake handles proper compilation and linking of device code automatically.
  • Similar modifications to OpenMP and Threads targets

- Bumped cmake minimim version to 3.17
- Changed `DACE_LIBS` element to `CUDA::cudart`
- Removed unneeded `include_directories` and link_directories calls
- Removed `compile_cuda`. CUDA files are now passed directly to targets
- Removed `-fPIC` and `-std` args from nvcc as they are handled
  automatically now
- Renamed `CUDA_NVCC_FLAGS` to `CMAKE_CUDA_FLAGS`
- Moved `-gencode` handling to `cmake`: cmake variable
  `DACE_CUDA_ARCHITECTURES_DEFAULT` is set in python code to be used by
  `CMAKE_CUDA_ARCHITECTURES` instead of manually creating the compiler arg string.
- Default cuda arch is no longer included forcefully in presence of a native architecture
- `get_cuda_arch.cpp` now returns a properly formatted architecture
  string compatible with cmake.
- `get_cuda_arch.cpp` now fails if no architectures are found
@kylosus
Copy link
Contributor Author

kylosus commented Aug 3, 2023

tests/cuda_highdim_kernel_test.py is erroring on master branch too, but the test passes because the dace program exits with 0. This code in this branch errors and causes a segfault in pytest, which is why the test is failing. I don't know how this is possible or how it was unearthed by the changes in this PR.

$ # git checkout master
$ pytest --tb=short -m "gpu" tests/cuda_highdim_kernel_test.py
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.10.12, pytest-7.4.0, pluggy-1.2.0
rootdir: /home/user/nvidia/dace-test
configfile: pytest.ini
collected 9 items / 6 deselected / 3 selected                                                         

tests/cuda_highdim_kernel_test.py ...                                                                                                                                                                     [100%]

=============================================================================================== warnings summary ================================================================================================
tests/cuda_highdim_kernel_test.py::test_gpu
  /home/user/nvidia/dace-test/venv/lib/python3.10/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function CompiledSDFG.__del__ at 0x7f53aff3f1c0>
  
  Traceback (most recent call last):
    File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 402, in __del__
      self.finalize()
    File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 354, in finalize
      raise RuntimeError(
  RuntimeError: An error was detected after running "tests_cuda_highdim_kernel_test_highdim": invalid configuration argument. Consider enabling synchronous debugging mode (environment variable: DACE_compiler_cuda_syncdebug=1) to see where the issue originates from.
  
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================== 3 passed, 6 deselected, 1 warning in 19.23s ==================================================================================
ERROR launching kernel kernel_0_0_0: invalid configuration argument (9). Grid dimensions: (15, 6, 6); Block dimensions: (5, 0, 4).
$ python tests/cuda_highdim_kernel_test.py
High-dimensional GPU kernel test (12, 3, 14, 15, 1, 2, 3, 4, 5)
Difference: 0.0
High-dimensional GPU kernel test (12, 3, 14, 15, 1, 2, 3, 4, 5)
ERROR launching kernel kernel_0_0_0: invalid configuration argument (9). Grid dimensions: (15, 6, 6); Block dimensions: (5, 0, 4).
Exception ignored in: <function CompiledSDFG.__del__ at 0x7f259e54fd00>
Traceback (most recent call last):
  File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 402, in __del__
    self.finalize()
  File "/home/user/nvidia/dace-test/dace/codegen/compiled_sdfg.py", line 354, in finalize
    raise RuntimeError(
RuntimeError: An error was detected after running "highdim": invalid configuration argument. Consider enabling synchronous debugging mode (environment variable: DACE_compiler_cuda_syncdebug=1) to see where the issue originates from.
Difference: 0.0
WARNING: New access a[i:i + 2, j] already covered by a[i:i + 2, j:j + 2]

$ echo $?
0

@tbennun
Copy link
Collaborator

tbennun commented Aug 7, 2023

@kylosus thank you, I'll take a look

@kylosus
Copy link
Contributor Author

kylosus commented Aug 7, 2023

I think it's because the old cmake links to static libcudart_static.a so the shared library loaded here never sees the error. This way handling cuda errors seems a little hacky, maybe __dace_runkernel_* and __program_{name}_internal functions should return the errors instead.

@tbennun
Copy link
Collaborator

tbennun commented Aug 7, 2023

maybe __dace_runkernel_* and __program_{name}_internal functions should return the errors instead.

This is something we considered, but cannot reliably implement in the code generator without significant effort.

kylosus added a commit to ParCoreLab/CPU-Free-Model-Compiler that referenced this pull request Aug 28, 2023
Handled automatically since spcl#1337
@BenWeber42
Copy link
Contributor

Hi, what's the status of this PR? Looks like this would be a nice change for DaCe?

@kylosus
Copy link
Contributor Author

kylosus commented Oct 30, 2023

Hi, what's the status of this PR? Looks like this would be a nice change for DaCe?

@BenWeber42 It shoud be ready to merge, but the cmake can probably be improved further. I didn't touch parts unrelated to CUDA.

@BenWeber42 BenWeber42 changed the title CMakeLists.txt Improvements CMakeLists.txt Improvements for CUDA Nov 1, 2023
@BenWeber42
Copy link
Contributor

Ok, thanks for the clarification. I think it's easiest to have this PR only be about CUDA related improvements to the CMakeLists.txt (I edited the title accordingly).

Could you maybe merge latest master into this branch? Since it hasn't been synced in quite a while.

@kylosus kylosus mentioned this pull request Nov 6, 2023
@kylosus
Copy link
Contributor Author

kylosus commented Nov 8, 2023

Seems fine

@BenWeber42
Copy link
Contributor

There was a segfault in the gpu test. I restarted it...

@kylosus
Copy link
Contributor Author

kylosus commented Nov 20, 2023

There was a segfault in the gpu test. I restarted it...

The failing test had been silently ignored until this PR. See the discussion above

@tbennun
Copy link
Collaborator

tbennun commented Nov 22, 2023

@BenWeber42 @kylosus I fixed this test in #1441. Turns out there were some invalid (empty) ranges in the maps of one of the tests

@BenWeber42
Copy link
Contributor

BenWeber42 commented Nov 27, 2023

Just approved the fix. Thanks again. That's great, then it looks like we should continue here as follows:

tbennun added a commit that referenced this pull request Nov 28, 2023
Fixes invalid ranges used in a test.

Opened following #1337
@tbennun
Copy link
Collaborator

tbennun commented Nov 30, 2023

@kylosus could you please update this PR to the latest master? I fixed the issue you were observing.

@tbennun tbennun added this pull request to the merge queue Jan 1, 2024
Merged via the queue into spcl:master with commit 1393cb0 Jan 1, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants