Use CUPTI_API_VERSION instead of CUDA_VERSION (#792)
Summary: The CUPTI version and the CUDA version can mismatch; in that case we should check CUPTI_API_VERSION, because CUPTI is where this enum is defined.

Verified on an H100 machine with this script:

```python
import torch

def fn(x, y, z):
    return torch.addmm(z, x, y)

x, y, z = [torch.rand((16, 16), device='cuda') for _ in range(3)]

with torch.profiler.profile() as prof:
    for i in range(4):
        fn(x, y, z)

prof.export_chrome_trace("profile_addmm.json")
```

I verified (on H100):

* Checking out the commit before the cudaLaunchKernelExC changes in kineto, I can find "INVALID" in profile_addmm.json.
* On the current main branch, I cannot find "INVALID" in profile_addmm.json.
* On this branch, I cannot find "INVALID" in profile_addmm.json.

This confirms that (a) the test reproduces the behavior, and (b) this PR doesn't break support for cudaLaunchKernelExC.

Pull Request resolved: #792

Reviewed By: aaronenyeshi

Differential Revision: D47823117

Pulled By: davidberard98

fbshipit-source-id: 102d23a23345327c229a7d4664a1781d7c259855