From b2b99d424a6911c85d57ff9f52ca1d7b1548cfd4 Mon Sep 17 00:00:00 2001 From: Vladimir Indic <139573562+vlaindic@users.noreply.github.com> Date: Fri, 12 Jul 2024 20:20:33 +0200 Subject: [PATCH] Update release/rocm-rel-6.2 for RC3 (#968) (#972) * Small doc update to remove restrictions no longer present (#917) * Small doc update to remove restrictions no longer present * Add calls to stop and wait for a debugger (#916) * Small change to sample for clarity (#913) * Added error log for query counter info (#903) * Added error log for query counter info * Add dimension query to counter collection sample (#918) * Disable PC sampling service if counter collection service is configured (#899) * The NULL value of an internal correlation ID defined (#901) * Remove duplicate table code from tests (#922) * Remove duplicate table code from tests Remove duplicate HSA table code from tests. Cleanup includes (and remove unnecessary ones). * SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT (#910) * SWDEV-465322: Adding support for r Perfcounter SIMD Mask in ATT * Apply suggestions from code review * Adding unit tests * Adding counters check for gfx9 and SQ block only * Addressing review comments * changing the struct size * fixing header includes --------- * Fix for SLES/RHEL compilers (#925) * Fix for SLES/RHEL compilers --------- * Fix agent profiling for SQ counters (#919) * Fix agent profiling for SQ counters --------- * Disable counter collection if PC sampling is enabled (#924) * docs and tests format (#927) * ATT API changes - add user_data field and separation of dispatch vs agent profiling (#893) * DRM Issue Fix for SLES 15 (#897) * DRM Issue Fix * Formatting Fix * PC sampling: CID manager unit test (#898) * Adding per-dispatch userdata field to ATT * Clang tidy * Formatting * Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp * Adding dispatch_id, fixing user_data and update aql_profile_v2 * Formatting * Tidy fixes * Second fix for userdata * removing assert 
for union * Adding serialization. Created agent profiling-like thread trace * Implemented agent thread trace * Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp * Restructured thread trace packets * Added agent API tests * Fixing multigpu for agent test * Formatting * Formatting * Improving header locations * Fixing merge conflicts * Tidy * Tidy * Tidy --------- * Allow multiple agents in a single context for agent profiling (#908) Allow multiple profiles for agent profiling * Remove unnecessary AgentCache argument from profile construction (#931) This argument is not necessary. Removed. * Update controller.cpp (#932) * Update controller.cpp * Update controller.cpp * Formatting * Pumping down the ioctl version for CI only (#928) * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Replicate global counters across all derived counters (#936) Fix derived counters to have globals replicated across all architectures (that support them). --------- * Incremental Counter Profile Creation (#933) * Incremental Counter Profile Creation Adds support for incremental counter creation. This works by changing the behavior of rocprofiler_create_profile_config. rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id, rocprofiler_counter_id_t* counters_list, size_t counters_count, rocprofiler_profile_config_id_t* config_id) The behavior of this function now allows an existing config_id to be supplied via config_id. The counters contained in this config will be copied over and used as a base for a new config along with any counters supplied in counters_list. The new config id is returned via config_id and can be used in future dispatch/agent counting sessions. A new config is created rather than modifying an existing config since there is no guarantee that the existing config isn't already in use. 
While we could add locks (or other mutual exclusion properties) to check if it's in use and reject an update, the benefit from doing so is minor in comparison to just creating a new config. This also sidesteps a common pattern a tool may use to add additional counters at some point later on during execution. Now they can do that without destroying the existing config. --------- * PC Sampling IOCTL version check introduced (#944) * doc update for 6.2 release (#938) * doc update for 6.2 release * Adding warning for gerrit->github nightly sync * PC sampling IOCTL versioning refactored (#945) The following changes are introduced: - Use functions instead of macros. - Verify the error code when querying KFD IOCTL version. - Skip tests and samples if KFD IOCTL < 1.16 or PC Sampling IOCTL < 0.1. * Add HSA tracing support for `hsa_amd_vmem_address_reserve_align` (#946) * Add support for hsa_amd_vmem_address_reserve_align * Update lib/rocprofiler-sdk/hsa/types.hpp - support HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x2 for HSA v1.14.0 --------- * readthedocs updates (#877) * readthedocs updates * Adding License * correcting table of contents path * Move doc requirements to sphinx dir * Compile requirements.txt * Update path to reqs * Adding missing python module * changing sphinx version * changing docutils version * enabling sphinx extensions * trying sphinx-rtd-theme * Remove unused doc configs * Remove unused html theme options * Add files to toc * temp commit to test * updating environment.yml for CI build * Update doc requirements To include rocprofiler-sdk in projects.yaml * Set external_projects_current_project as rocprofiler-sdk * Exclude external projects * Fix warning for missing static path * updating conf.py * Removing reST syntax * Use rocm-docs-core doxygen integration * Remove RST syntax from Markdown files * Generate doxyfile post checkout on RTD * Use custom RTD env * Specify mambaforge * Put conda before post checkout cmd * Add doxyfile for RTD * Run cmake from 
conf.py * Update environment.yml * Use mambaforge * Fix path to environment.yml * Call build doxyfile * Add Developer API title to Doxyfile * Config version header * Fix typo in conf.py * Format fix for conf.py * Increasing timeout for build-docs-from-source * Remove README as mainpage for doxyfile * Fix formatting in conf.py --------- * Fixing OpenSuse build (#947) * Fix documentation (#949) * Sync queue and async copy on client finalizer (#950) * Add `logical_node_type_id` field to `rocprofiler_agent_t` (#948) * Add logical_node_type_id field to rocprofiler_agent_t * Patch queue_controller * Remove fatal error when callback and buffer tracing API in one context (#952) - one context for callback and buffer tracing of same API produces erroneous fatal error -- this is a valid use case * Adding wrappers on HSA for executable load/unload and allowing multiple agents per context on ATT (#951) * Codeobj wrappers around HSA calls for ATT * Formatting * Bookeeping * Tidy * Tidy * Update source/lib/rocprofiler-sdk/thread_trace/code_object.hpp * Update source/lib/rocprofiler-sdk/thread_trace/att_core.hpp * Variable naming --------- * Removing cache of decoded lines and returning shared_ptr (#953) * Update continuous_integration.yml (#926) * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml * Update continuous_integration.yml --------- * Accumulation metrics support and update counter collection API to aqlprofile_v2 (#915) * Updating to v3 API * General fixes * Extending dimension bits to 54 * Disabling agent profiling tests * Fixed unit test * Adding accumulate metric support for parsing counters (#609) * Adding accumulate metric 
support for parsing counters * Adding metric flag * Updating tests * source formatting (clang-format v11) (#610) * source formatting (clang-format v11) (#614) * Adding evaluate ast test * source formatting (clang-format v11) (#633) * Update scanner generated file * Adding flags to events for aqlprofile * Fix Mi200 failing test --------- * Revert "Extending dimension bits to 54" This reverts commit 3cd6628452484044a93e129f27974f996a0e4c08. * Removing CU dimension * Fixing merge conflicts * Revert "Disabling agent profiling tests" This reverts commit 7e01518ed8c51fbb0c3b2575e1e0b8f9ddfa8237. * Fixing merge conflicts * Fix parser tests * Adding accumulate metric documentation * Update counter_collection_services.md * Update index.md * fix nested expression use * Update source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp * Doc update --------- * Fix kernel trace gaps (#961) - source/lib/rocprofiler-sdk/hsa/queue.cpp - Optimize WriteInterceptor to eliminate extra barrier packets causing gaps between kernels in kernel tracing - increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue - misc logging improvements - source/lib/rocprofiler-sdk/counters/agent_profiling.cpp - increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue - tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt - add TIMEOUT for rocprofv3-test-hsa-multiqueue-execute * PC sampling: integration test with instruction decoding (#929) * PC sampling: integration test with instruction decoding * PC sampling: verifying internal and external CIDs The PC sampling integration test has been extended to verify internal and external correlation IDs. * tmp solution of using Instructions as keys * wrapper for HIP call * PCS integration test: ld_addr as instruction id For the sake of the integration test, use as the instruction identifier. To support code object unloading and relocations, use as the identifier (the change in the decoder is required). 
* PCS integration test: removing shared_ptr Completely removing usage of shared pointers. * PCS integration test: removing decoder When a code object has been unloaded, ensure all PC samples corresponding to that object are decoded, prior to removing the decoder. * PCS integration test: fixing build flags and imports * PCS integration test: fixing labels * PCS integration test: cmake flags fix * PC sampling cmake labels renamed * PCS integration test refactoring * PCS integration test: minimize usage of raw pointers * PCS integration test: at least one sample should be delivered. * PC sampling lables: pc-sampling * General fixes to ATT, packets and event ID retrieval (#960) * General fixes to ATT, packets and event ID retrieval * Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp --------- * Returning code object id information in code_printing.cpp:Instruction (#965) * Returning code object id information in code_printing.cpp:Instruction * Adding assertions * Simplifying decoder library * Miscellaneous updates (#959) - missing-new-line CI job: ensures all source files end with new line - logging updates - add new line to the end of many files - fix header include ordering is misc places - transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies * Update HIP API tracing (#958) - support HipDispatchTable additions for HIP_RUNTIME_API_TABLE_STEP_VERSION 1 thru 4 * Fix agent shutdown destructor errors (#969) * Update lib/rocprofiler-sdk/agent.cpp - use static_object wrapper for vector of agent_pair (rocp agent <-> hsa agent) * Fix get_aql_handles() shutdown error - use `static_object` wrapper for vector of `aqlprofile_agent_handle_t` --------- Co-authored-by: Jonathan R. 
Madsen Co-authored-by: Benjamin Welton Co-authored-by: Benjamin Welton Co-authored-by: Manjunath P Jakaraddi Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Gopesh Bhardwaj Co-authored-by: Giovanni Lenzi Baraldi Co-authored-by: Ammar ELWazir Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com> Co-authored-by: Manjunath-Jakaraddi <21177428+Manjunath-Jakaraddi@users.noreply.github.com> Co-authored-by: jrmadsen <6001865+jrmadsen@users.noreply.github.com> Co-authored-by: Manjunath-Jakaraddi --- .github/workflows/continuous_integration.yml | 36 +- .github/workflows/docs.yml | 2 +- .github/workflows/formatting.yml | 18 + .readthedocs.yaml | 21 + CHANGELOG.md | 19 +- README.md | 13 +- samples/advanced_thread_trace/client.cpp | 167 +++--- samples/code_object_isa_decode/client.cpp | 10 +- samples/common/defines.hpp | 34 +- samples/counter_collection/client.cpp | 161 ++++-- .../print_functional_counters.cpp | 14 +- samples/pc_sampling/pcs.cpp | 5 +- source/docs/.gitignore | 2 + source/docs/CMakeLists.txt | 2 +- source/docs/_toc.yml.in | 24 + source/docs/about.md | 34 -- source/docs/buffered_services.md | 6 - source/docs/callback_services.md | 6 - source/docs/conf.py | 146 ++--- source/docs/counter_collection_services.md | 14 + source/docs/developer_api.md | 12 - source/docs/environment.yml | 14 +- source/docs/features.md | 6 - source/docs/index.md | 42 +- source/docs/installation.md | 6 - source/docs/intercept_table.md | 6 - source/docs/license.rst | 5 + source/docs/requirements.txt | 2 - source/docs/rocprofiler-sdk.dox.in | 20 +- source/docs/rocprofv3.md | 2 +- source/docs/samples.md | 6 - source/docs/sphinx/requirements.in | 1 + source/docs/sphinx/requirements.txt | 169 ++++++ source/docs/tool_library_overview.md | 6 - source/include/rocprofiler-sdk/agent.h | 16 + .../include/rocprofiler-sdk/agent_profile.h | 1 - .../rocprofiler-sdk/amd_detail/CMakeLists.txt | 3 +- 
.../rocprofiler-sdk-codeobj/code_printing.hpp | 251 ++++----- .../rocprofiler-sdk-codeobj/disassembly.hpp | 6 +- .../rocprofiler-sdk-codeobj/segment.hpp | 146 ++--- .../rocprofiler-sdk/amd_detail/thread_trace.h | 175 +----- .../amd_detail/thread_trace_agent.h | 64 +++ .../amd_detail/thread_trace_core.h | 172 ++++++ .../amd_detail/thread_trace_dispatch.h | 82 +++ source/include/rocprofiler-sdk/counters.h | 6 - source/include/rocprofiler-sdk/defines.h | 8 + source/include/rocprofiler-sdk/fwd.h | 5 +- source/include/rocprofiler-sdk/hip/api_args.h | 226 +++++++- .../rocprofiler-sdk/hip/compiler_api_id.h | 4 + .../rocprofiler-sdk/hip/runtime_api_id.h | 44 +- .../rocprofiler-sdk/hsa/amd_ext_api_id.h | 3 + source/include/rocprofiler-sdk/hsa/api_args.h | 10 + source/include/rocprofiler-sdk/pc_sampling.h | 28 +- .../include/rocprofiler-sdk/profile_config.h | 10 +- source/lib/common/CMakeLists.txt | 1 + source/lib/common/abi.hpp | 74 +++ source/lib/common/utility.cpp | 26 + source/lib/common/utility.hpp | 7 + .../tests/CMakeLists.txt | 2 +- .../tests/codeobj_library_test.cpp | 63 ++- .../tests/{smallkernel.b => smallkernel.bin} | Bin source/lib/rocprofiler-sdk-tool/tool.cpp | 13 +- source/lib/rocprofiler-sdk/agent.cpp | 68 ++- source/lib/rocprofiler-sdk/agent_profile.cpp | 2 +- .../lib/rocprofiler-sdk/aql/aql_profile_v2.h | 21 +- source/lib/rocprofiler-sdk/aql/helpers.cpp | 38 +- source/lib/rocprofiler-sdk/aql/helpers.hpp | 25 +- .../rocprofiler-sdk/aql/packet_construct.cpp | 184 +++--- .../rocprofiler-sdk/aql/packet_construct.hpp | 58 +- .../rocprofiler-sdk/aql/tests/CMakeLists.txt | 11 +- .../rocprofiler-sdk/aql/tests/aql_test.cpp | 43 +- .../lib/rocprofiler-sdk/aql/tests/helpers.cpp | 50 +- .../lib/rocprofiler-sdk/context/context.hpp | 17 +- source/lib/rocprofiler-sdk/counters.cpp | 2 + .../counters/agent_profiling.cpp | 486 +++++++++------- .../counters/agent_profiling.hpp | 43 +- .../rocprofiler-sdk/counters/controller.cpp | 41 +- 
.../rocprofiler-sdk/counters/controller.hpp | 4 +- source/lib/rocprofiler-sdk/counters/core.cpp | 12 +- source/lib/rocprofiler-sdk/counters/core.hpp | 4 +- .../rocprofiler-sdk/counters/dimensions.hpp | 2 +- .../counters/dispatch_handlers.cpp | 13 +- .../counters/dispatch_handlers.hpp | 3 +- .../rocprofiler-sdk/counters/evaluate_ast.cpp | 46 +- .../rocprofiler-sdk/counters/evaluate_ast.hpp | 3 +- .../rocprofiler-sdk/counters/id_decode.cpp | 4 +- .../rocprofiler-sdk/counters/id_decode.hpp | 2 +- .../lib/rocprofiler-sdk/counters/metrics.cpp | 25 +- .../lib/rocprofiler-sdk/counters/metrics.hpp | 11 + .../counters/parser/parser.cpp | 186 ++++--- .../rocprofiler-sdk/counters/parser/parser.h | 13 +- .../rocprofiler-sdk/counters/parser/parser.y | 6 + .../counters/parser/raw_ast.hpp | 45 +- .../counters/parser/scanner.cpp | 119 ++-- .../rocprofiler-sdk/counters/parser/scanner.l | 1 + .../counters/parser/tests/parser_test.cpp | 258 ++++++--- .../counters/tests/CMakeLists.txt | 23 +- .../counters/tests/agent_profiling.cpp | 115 ++-- .../counters/tests/agent_profiling.hpp | 2 +- .../counters/tests/code_object_loader.cpp | 2 +- .../rocprofiler-sdk/counters/tests/core.cpp | 139 +++-- .../counters/tests/dimension.cpp | 40 +- .../counters/tests/evaluate_ast_test.cpp | 68 +++ .../counters/tests/hsa_tables.cpp | 266 +++++++++ .../counters/tests/hsa_tables.hpp | 41 ++ .../counters/tests/metrics_test.cpp | 2 +- .../counters/tests/metrics_test.h | 50 +- .../counters/xml/derived_counters.xml | 127 +++-- .../lib/rocprofiler-sdk/details/kfd_ioctl.h | 7 +- .../rocprofiler-sdk/external_correlation.cpp | 6 +- source/lib/rocprofiler-sdk/hip/CMakeLists.txt | 4 +- source/lib/rocprofiler-sdk/hip/abi.cpp | 526 ++++++++++++++++++ .../hip/details/CMakeLists.txt | 2 +- .../rocprofiler-sdk/hip/details/format.hpp | 324 +++++++++++ .../rocprofiler-sdk/hip/details/ostream.hpp | 9 +- source/lib/rocprofiler-sdk/hip/hip.cpp | 5 - source/lib/rocprofiler-sdk/hip/hip.def.cpp | 44 +- 
source/lib/rocprofiler-sdk/hip/hip.hpp | 7 +- source/lib/rocprofiler-sdk/hip/types.hpp | 182 ------ source/lib/rocprofiler-sdk/hip/utils.hpp | 26 +- source/lib/rocprofiler-sdk/hsa/aql_packet.cpp | 143 +++-- source/lib/rocprofiler-sdk/hsa/aql_packet.hpp | 127 +++-- source/lib/rocprofiler-sdk/hsa/async_copy.cpp | 14 +- source/lib/rocprofiler-sdk/hsa/async_copy.hpp | 3 + source/lib/rocprofiler-sdk/hsa/hsa.cpp | 50 +- source/lib/rocprofiler-sdk/hsa/hsa.def.cpp | 11 + source/lib/rocprofiler-sdk/hsa/hsa.hpp | 3 + source/lib/rocprofiler-sdk/hsa/queue.cpp | 114 ++-- .../rocprofiler-sdk/hsa/queue_controller.cpp | 54 +- .../rocprofiler-sdk/hsa/queue_controller.hpp | 7 +- source/lib/rocprofiler-sdk/hsa/types.hpp | 21 +- source/lib/rocprofiler-sdk/marker/marker.cpp | 3 - .../pc_sampling/ioctl/ioctl_adapter.cpp | 175 ++++-- .../pc_sampling/parser/correlation.hpp | 4 +- .../parser/pc_record_interface.cpp | 2 +- .../rocprofiler-sdk/pc_sampling/service.cpp | 14 + .../pc_sampling/tests/CMakeLists.txt | 2 +- .../tests/pc_sampling_internals.hpp | 2 +- .../pc_sampling_vs_counter_collection.cpp | 514 +++++++++++++++++ source/lib/rocprofiler-sdk/profile_config.cpp | 39 +- source/lib/rocprofiler-sdk/registration.cpp | 4 + source/lib/rocprofiler-sdk/rocprofiler.cpp | 7 +- .../lib/rocprofiler-sdk/tests/CMakeLists.txt | 5 +- source/lib/rocprofiler-sdk/tests/agent.cpp | 4 +- .../lib/rocprofiler-sdk/tests/hsa_barrier.cpp | 77 ++- .../rocprofiler-sdk/tests/intercept_table.cpp | 37 +- .../thread_trace/CMakeLists.txt | 5 +- .../rocprofiler-sdk/thread_trace/att_core.cpp | 376 +++++++------ .../rocprofiler-sdk/thread_trace/att_core.hpp | 168 +++--- .../thread_trace/att_service.cpp | 120 ++-- .../thread_trace/code_object.cpp | 147 +++++ .../thread_trace/code_object.hpp | 63 +++ .../thread_trace/tests/CMakeLists.txt | 11 +- .../thread_trace/tests/att_packet_test.cpp | 174 ++++-- .../lib/rocprofiler-sdk/tracing/tracing.hpp | 88 --- tests/CMakeLists.txt | 1 + tests/pc_sampling/CMakeLists.txt | 142 
+++++ tests/pc_sampling/address_translation.cpp | 197 +++++++ tests/pc_sampling/address_translation.hpp | 273 +++++++++ tests/pc_sampling/cid_retirement.cpp | 129 +++++ tests/pc_sampling/cid_retirement.hpp | 38 ++ tests/pc_sampling/client.cpp | 225 ++++++++ tests/pc_sampling/client.hpp | 44 ++ tests/pc_sampling/codeobj.cpp | 261 +++++++++ tests/pc_sampling/codeobj.hpp | 38 ++ tests/pc_sampling/external_cid.cpp | 110 ++++ tests/pc_sampling/external_cid.hpp | 42 ++ tests/pc_sampling/kernel_tracing.cpp | 78 +++ tests/pc_sampling/kernel_tracing.hpp | 41 ++ tests/pc_sampling/main.cpp | 224 ++++++++ tests/pc_sampling/pcs.cpp | 504 +++++++++++++++++ tests/pc_sampling/pcs.hpp | 55 ++ tests/pc_sampling/utils.cpp | 37 ++ tests/pc_sampling/utils.hpp | 65 +++ tests/rocprofv3/CMakeLists.txt | 2 +- .../counter-collection/input1/CMakeLists.txt | 12 +- .../counter-collection/input2/CMakeLists.txt | 8 +- .../counter-collection/input3/CMakeLists.txt | 19 +- .../counter-collection/input3/validate.py | 2 +- .../hsa-queue-dependency/CMakeLists.txt | 2 +- .../CMakeLists.txt | 8 +- .../conftest.py | 0 .../input.txt | 0 .../pytest.ini | 0 .../validate.py | 0 tests/thread-trace/CMakeLists.txt | 23 + tests/thread-trace/agent_test.cpp | 168 ++++++ tests/thread-trace/common.hpp | 44 +- tests/thread-trace/main.cpp | 2 +- tests/thread-trace/multi_dispatch.cpp | 25 +- tests/thread-trace/single_dispatch.cpp | 12 +- tests/thread-trace/trace_callbacks.cpp | 19 +- 192 files changed, 8867 insertions(+), 2911 deletions(-) create mode 100644 .readthedocs.yaml create mode 100644 source/docs/_toc.yml.in delete mode 100644 source/docs/about.md create mode 100644 source/docs/counter_collection_services.md delete mode 100644 source/docs/developer_api.md create mode 100644 source/docs/license.rst delete mode 100644 source/docs/requirements.txt create mode 100644 source/docs/sphinx/requirements.in create mode 100644 source/docs/sphinx/requirements.txt create mode 100644 
source/include/rocprofiler-sdk/amd_detail/thread_trace_agent.h create mode 100644 source/include/rocprofiler-sdk/amd_detail/thread_trace_core.h create mode 100644 source/include/rocprofiler-sdk/amd_detail/thread_trace_dispatch.h create mode 100644 source/lib/common/abi.hpp rename source/lib/rocprofiler-sdk-codeobj/tests/{smallkernel.b => smallkernel.bin} (100%) create mode 100644 source/lib/rocprofiler-sdk/counters/tests/hsa_tables.cpp create mode 100644 source/lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp create mode 100644 source/lib/rocprofiler-sdk/hip/abi.cpp create mode 100644 source/lib/rocprofiler-sdk/hip/details/format.hpp delete mode 100644 source/lib/rocprofiler-sdk/hip/types.hpp create mode 100644 source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_vs_counter_collection.cpp create mode 100644 source/lib/rocprofiler-sdk/thread_trace/code_object.cpp create mode 100644 source/lib/rocprofiler-sdk/thread_trace/code_object.hpp create mode 100644 tests/pc_sampling/CMakeLists.txt create mode 100644 tests/pc_sampling/address_translation.cpp create mode 100644 tests/pc_sampling/address_translation.hpp create mode 100644 tests/pc_sampling/cid_retirement.cpp create mode 100644 tests/pc_sampling/cid_retirement.hpp create mode 100644 tests/pc_sampling/client.cpp create mode 100644 tests/pc_sampling/client.hpp create mode 100644 tests/pc_sampling/codeobj.cpp create mode 100644 tests/pc_sampling/codeobj.hpp create mode 100644 tests/pc_sampling/external_cid.cpp create mode 100644 tests/pc_sampling/external_cid.hpp create mode 100644 tests/pc_sampling/kernel_tracing.cpp create mode 100644 tests/pc_sampling/kernel_tracing.hpp create mode 100644 tests/pc_sampling/main.cpp create mode 100644 tests/pc_sampling/pcs.cpp create mode 100644 tests/pc_sampling/pcs.hpp create mode 100644 tests/pc_sampling/utils.cpp create mode 100644 tests/pc_sampling/utils.hpp rename tests/rocprofv3/{tracing-plus-cc => tracing-plus-counter-collection}/CMakeLists.txt (87%) rename 
tests/rocprofv3/{tracing-plus-cc => tracing-plus-counter-collection}/conftest.py (100%) rename tests/rocprofv3/{tracing-plus-cc => tracing-plus-counter-collection}/input.txt (100%) rename tests/rocprofv3/{tracing-plus-cc => tracing-plus-counter-collection}/pytest.ini (100%) rename tests/rocprofv3/{tracing-plus-cc => tracing-plus-counter-collection}/validate.py (100%) create mode 100644 tests/thread-trace/agent_test.cpp diff --git a/.github/workflows/continuous_integration.yml b/.github/workflows/continuous_integration.yml index 418ea7a8..0ec06a9d 100644 --- a/.github/workflows/continuous_integration.yml +++ b/.github/workflows/continuous_integration.yml @@ -21,7 +21,7 @@ env: ROCM_PATH: "/opt/rocm" GPU_TARGETS: "gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942 gfx1030 gfx1100 gfx1101 gfx1102" PATH: "/usr/bin:$PATH" - PC_SAMPLING_TESTS_REGEX: ".*pc_sampling.*" + PC_SAMPLING_TESTS_REGEX: ".*pc-sampling.*" jobs: core: @@ -29,7 +29,7 @@ jobs: strategy: fail-fast: false matrix: - runner: ['navi3', 'vega20', 'mi200', 'mi300'] + runner: ['navi3', 'vega20', 'mi200', 'mi300', 'rhel', 'sles'] os: ['ubuntu-22.04'] build-type: ['RelWithDebInfo'] ci-flags: ['--linter clang-tidy'] @@ -45,6 +45,7 @@ jobs: - uses: actions/checkout@v4 - name: Install requirements + if: ${{ !contains(matrix.runner, 'rhel') && !contains(matrix.runner, 'sles') }} timeout-minutes: 10 shell: bash run: | @@ -55,6 +56,13 @@ jobs: update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 20 --slave /usr/bin/g++ g++ /usr/bin/g++-12 --slave /usr/bin/gcov gcov /usr/bin/gcov-12 python3 -m pip install -r requirements.txt + - name: Install requirements For RHEL & SLES + if: ${{ contains(matrix.runner, 'rhel') || contains(matrix.runner, 'sles') }} + timeout-minutes: 10 + shell: bash + run: | + python3 -m pip install -r requirements.txt + - name: List Files shell: bash run: | @@ -77,6 +85,7 @@ jobs: echo 'ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1' >> $GITHUB_ENV - name: Configure, Build, and Test + if: ${{ 
!contains(matrix.runner, 'rhel') && !contains(matrix.runner, 'sles') }} timeout-minutes: 30 shell: bash run: @@ -98,6 +107,29 @@ jobs: -- -LE "${EXCLUDED_TESTS}" + - name: Configure, Build, and Test + if: ${{ contains(matrix.runner, 'rhel') || contains(matrix.runner, 'sles') }} + timeout-minutes: 30 + shell: bash + run: + sudo LD_LIBRARY_PATH=./build/lib:$LD_LIBRARY_PATH python3 ./source/scripts/run-ci.py -B build + --name ${{ github.repository }}-${{ github.ref_name }}-${{ matrix.runner }}-mi300-core + --build-jobs 16 + --site $(echo $RUNNER_HOSTNAME)-$(/opt/rocm/bin/rocm_agent_enumerator | sed -n '2 p') + --gpu-targets ${{ env.GPU_TARGETS }} + --run-attempt ${{ github.run_attempt }} + -- + -DROCPROFILER_DEP_ROCMCORE=ON + -DROCPROFILER_BUILD_DOCS=OFF + -DROCPROFILER_BUILD_CI=OFF + -DCMAKE_BUILD_TYPE=${{ matrix.build-type }} + -DCMAKE_INSTALL_PREFIX=/opt/rocprofiler-sdk + -DCPACK_GENERATOR='DEB;RPM;TGZ' + -DCPACK_PACKAGING_INSTALL_PREFIX="$(realpath /opt/rocm)" + -DPython3_EXECUTABLE=$(which python3) + -- + -LE "${EXCLUDED_TESTS}" + - name: Install if: ${{ contains(matrix.runner, env.CORE_EXT_RUNNER) }} timeout-minutes: 10 diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index fff516b9..2884d5ea 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -101,7 +101,7 @@ jobs: python3 -m pip install -r requirements.txt - name: Configure, Build, Install, and Package - timeout-minutes: 30 + timeout-minutes: 60 shell: bash run: export CMAKE_PREFIX_PATH=/opt/rocm:${CMAKE_PREFIX_PATH}; diff --git a/.github/workflows/formatting.yml b/.github/workflows/formatting.yml index e1891a20..caf06211 100644 --- a/.github/workflows/formatting.yml +++ b/.github/workflows/formatting.yml @@ -143,3 +143,21 @@ jobs: command: review pull_number: ${{ github.event.pull_request.number }} git_dir: '.' 
+ + missing-new-line: + runs-on: ubuntu-22.04 + + steps: + - uses: actions/checkout@v4 + + - name: Find missing new line + shell: bash + run: | + OUTFILE=missing_newline.txt + for i in $(find source/lib source/include tests samples cmake -type f | egrep -v '\.bin$'); do VAL=$(tail -c 1 ${i}); if [ -n "${VAL}" ]; then echo "- ${i}" >> ${OUTFILE}; fi; done + if [[ -f ${OUTFILE} && $(cat ${OUTFILE} | wc -l) -gt 0 ]]; then + echo -e "\nError! Source code missing new line at end of file...\n" + echo -e "\nFiles:\n" + cat ${OUTFILE} + exit 1 + fi diff --git a/.readthedocs.yaml b/.readthedocs.yaml new file mode 100644 index 00000000..2f78e0c0 --- /dev/null +++ b/.readthedocs.yaml @@ -0,0 +1,21 @@ +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +version: 2 + +sphinx: + configuration: source/docs/conf.py + +formats: [htmlzip, pdf, epub] + +python: + install: + - requirements: source/docs/sphinx/requirements.txt + +build: + os: ubuntu-22.04 + tools: + python: "mambaforge-22.9" + +conda: + environment: source/docs/environment.yml diff --git a/CHANGELOG.md b/CHANGELOG.md index a67f985a..ebf5ebab 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,7 +4,7 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/ ## ROCprofiler-SDK for AFAR I -## Added +## Additions - HSA API Tracing - Kernel Dispatch Tracing @@ -14,7 +14,7 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/ ## ROCprofiler-SDK for AFAR II -## Added +## Additions - HIP API Tracing - ROCTx Tracing @@ -23,10 +23,9 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/ - ROCTx start/stop - Memory Copy Tracing - ## ROCprofiler-SDK for AFAR III -## Added +## Additions - Kernel Dispatch Counter Collection – (includes serialization and multidimensional instances) - Kernel serialization @@ -44,7 +43,7 @@ Full documentation for ROCprofiler-SDK is available at [Click 
Here](source/docs/ ## ROCprofiler-SDK for AFAR IV -## Added +## Additions - Page Migration Reporting (API) - Scratch Memory Reporting (API) @@ -56,22 +55,22 @@ Full documentation for ROCprofiler-SDK is available at [Click Here](source/docs/ ## ROCprofiler-SDK for AFAR V -## Added +## Additions - Agent/Device Counter Collection (API) -- JSON output format support (tool) +- Single JSON output format support (tool) - Perfetto output format support(.pftrace) (tool) - Input YAML support for counter collection (tool) - Input JSON support for counter collection (tool) +- Application Replay (Counter collection) - PC Sampling (Beta)(API) - ROCProf V3 Multi-GPU Support: - - Merged files - Multi-process (multiple files) -## Fixed +## Fixes - SQ_ACCUM_PREV and SQ_ACCUM_PREV_HIRE overwriting issue -## Changed +## Changes - rocprofv3 tool now needs `--` in front of application. For detailed uses, please [Click Here](source/docs/rocprofv3.md) diff --git a/README.md b/README.md index 9e5090ca..6cab558c 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,11 @@ # ROCprofiler-SDK: Application Profiling, Tracing, and Performance Analysis -*** -Note: rocprofiler-sdk is currently `not` supported as part of the public ROCm software stack and is only distributed as a beta -release to customers. -*** +> [!NOTE] +Note: rocprofiler-sdk is currently considered a beta version and is subject to change in future releases ## Overview -ROCProfiler-SDK is AMD’s new and improved tooling infrastructure, providing a hardware-specific low-level performance analysis interface for profiling and tracing GPU compute applications. To see what's changed [Click Here](source/docs/about.md) +ROCProfiler-SDK is AMD’s new and improved tooling infrastructure, providing a hardware-specific low-level performance analysis interface for profiling and tracing GPU compute applications. 
To see what's changed [Click Here](source/docs/index.md) ## GPU Metrics @@ -57,7 +55,7 @@ To install ROCprofiler, run: cmake --build rocprofiler-sdk-build --target install ``` -Please see the detailed section on build and installation here: [Click Here](/source/docs/installation.md) +Please see the detailed section on build and installation here: [Click Here](source/docs/installation.md) ## Support @@ -80,3 +78,6 @@ Please report in the Github Issues. - Timestamps in PC sampling records might not be 100% accurate. - Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`.Furthermore, if three or more threads launch operations to the same agent, and if PC sampling is enabled, the `HSA_STATUS_ERROR_EXCEPTION` might appear. + +> [!WARNING] +> The latest mainline version of AQLprofile can be found at [https://repo.radeon.com/rocm/misc/aqlprofile/](https://repo.radeon.com/rocm/misc/aqlprofile/). However, it's important to note that updates to the public AQLProfile may not occur as frequently as updates to the rocprofiler-sdk. This discrepancy could lead to a potential mismatch between the AQLprofile binary and the rocprofiler-sdk source. 
diff --git a/samples/advanced_thread_trace/client.cpp b/samples/advanced_thread_trace/client.cpp index de3ed947..ee4b787b 100644 --- a/samples/advanced_thread_trace/client.cpp +++ b/samples/advanced_thread_trace/client.cpp @@ -117,7 +117,7 @@ struct isa_map_elem_t { std::atomic hitcount{0}; std::atomic latency{0}; - std::shared_ptr code_line{nullptr}; + std::unique_ptr code_line{nullptr}; }; struct ToolData @@ -183,58 +183,53 @@ struct source_location struct trace_data_t { - int64_t id; - uint8_t* data; - uint64_t size; - ToolData* tool; + int64_t id; + uint8_t* data; + uint64_t size; }; +auto* tool = new ToolData{}; + void tool_codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, - rocprofiler_user_data_t* user_data, - void* callback_data) + rocprofiler_user_data_t* /* user_data */, + void* /* callback_data */) { C_API_BEGIN if(record.kind != ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT) return; if(record.phase != ROCPROFILER_CALLBACK_PHASE_LOAD) return; - assert(callback_data && "Shader callback passed null!"); - ToolData& tool = *reinterpret_cast(callback_data); - if(record.operation == ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) { - std::unique_lock lg(tool.isa_map_mut); + std::unique_lock lg(tool->isa_map_mut); auto* data = static_cast(record.payload); - tool.kernel_id_to_kernel_name.emplace(data->kernel_id, data->kernel_name); + tool->kernel_id_to_kernel_name.emplace(data->kernel_id, data->kernel_name); } if(record.operation != ROCPROFILER_CODE_OBJECT_LOAD) return; - std::unique_lock lg(tool.isa_map_mut); + std::unique_lock lg(tool->isa_map_mut); auto* data = static_cast(record.payload); if(std::string_view(data->uri).find("file:///") == 0) { - tool.codeobjTranslate.addDecoder( + tool->codeobjTranslate.addDecoder( data->uri, data->code_object_id, data->load_delta, data->load_size); - auto symbolmap = tool.codeobjTranslate.getSymbolMap(data->code_object_id); + auto symbolmap = tool->codeobjTranslate.getSymbolMap(data->code_object_id); 
for(auto& [vaddr, symbol] : symbolmap) - tool.kernels_in_codeobj[vaddr] = symbol; + tool->kernels_in_codeobj[vaddr] = symbol; } else if(COPY_MEMORY_CODEOBJ) { - tool.codeobjTranslate.addDecoder(reinterpret_cast(data->memory_base), - data->memory_size, - data->code_object_id, - data->load_delta, - data->load_size); - auto symbolmap = tool.codeobjTranslate.getSymbolMap(data->code_object_id); + tool->codeobjTranslate.addDecoder(reinterpret_cast(data->memory_base), + data->memory_size, + data->code_object_id, + data->load_delta, + data->load_size); + auto symbolmap = tool->codeobjTranslate.getSymbolMap(data->code_object_id); for(auto& [vaddr, symbol] : symbolmap) - tool.kernels_in_codeobj[vaddr] = symbol; + tool->kernels_in_codeobj[vaddr] = symbol; } - - (void) user_data; - (void) callback_data; C_API_END } @@ -243,20 +238,20 @@ dispatch_callback(rocprofiler_queue_id_t /* queue_id */, const rocprofiler_agent_t* /* agent */, rocprofiler_correlation_id_t /* correlation_id */, rocprofiler_kernel_id_t kernel_id, - void* userdata) + rocprofiler_dispatch_id_t /* dispatch_id */, + rocprofiler_user_data_t* /* userdata */, + void* /* userdata */) { C_API_BEGIN - assert(userdata && "Dispatch callback passed null!"); - ToolData& tool = *reinterpret_cast(userdata); - std::shared_lock lg(tool.isa_map_mut); + std::shared_lock lg(tool->isa_map_mut); static std::atomic call_id{0}; static std::string_view desired_func_name = "transposeLds"; try { - auto& kernel_name = tool.kernel_id_to_kernel_name.at(kernel_id); + auto& kernel_name = tool->kernel_id_to_kernel_name.at(kernel_id); if(kernel_name.find(desired_func_name) == std::string::npos) return ROCPROFILER_ATT_CONTROL_NONE; @@ -276,29 +271,26 @@ get_trace_data(rocprofiler_att_parser_data_type_t type, void* att_data, void* us { C_API_BEGIN assert(userdata && "ISA callback passed null!"); - trace_data_t& trace_data = *reinterpret_cast(userdata); - assert(trace_data.tool && "ISA callback passed null!"); - ToolData& tool = 
*reinterpret_cast(trace_data.tool); - std::shared_lock shared_lock(tool.isa_map_mut); + std::shared_lock shared_lock(tool->isa_map_mut); - if(type == ROCPROFILER_ATT_PARSER_DATA_TYPE_OCCUPANCY) tool.num_waves++; + if(type == ROCPROFILER_ATT_PARSER_DATA_TYPE_OCCUPANCY) tool->num_waves++; if(type != ROCPROFILER_ATT_PARSER_DATA_TYPE_ISA) return; auto& event = *reinterpret_cast(att_data); pcinfo_t pc{event.marker_id, event.offset}; - auto it = tool.isa_map.find(pc); - if(it == tool.isa_map.end()) + auto it = tool->isa_map.find(pc); + if(it == tool->isa_map.end()) { shared_lock.unlock(); { - std::unique_lock unique_lock(tool.isa_map_mut); + std::unique_lock unique_lock(tool->isa_map_mut); auto ptr = std::make_unique(); try { - ptr->code_line = tool.codeobjTranslate.get(pc.marker_id, pc.addr); + ptr->code_line = tool->codeobjTranslate.get(pc.marker_id, pc.addr); } catch(std::exception& e) { std::cerr << pc.marker_id << ":" << pc.addr << ' ' << e.what() << std::endl; @@ -308,7 +300,7 @@ get_trace_data(rocprofiler_att_parser_data_type_t type, void* att_data, void* us std::cerr << "Could not fetch: " << pc.marker_id << ':' << pc.addr << std::endl; return; } - it = tool.isa_map.emplace(pc, std::move(ptr)).first; + it = tool->isa_map.emplace(pc, std::move(ptr)).first; } shared_lock.lock(); } @@ -339,15 +331,12 @@ isa_callback(char* isa_instruction, { C_API_BEGIN assert(userdata && "ISA callback passed null!"); - trace_data_t& trace_data = *reinterpret_cast(userdata); - assert(trace_data.tool && "ISA callback passed null!"); - ToolData& tool = *reinterpret_cast(trace_data.tool); - std::shared_ptr instruction; + std::unique_ptr instruction; { - std::unique_lock unique_lock(tool.isa_map_mut); - instruction = tool.codeobjTranslate.get(marker_id, offset); + std::unique_lock unique_lock(tool->isa_map_mut); + instruction = tool->codeobjTranslate.get(marker_id, offset); } if(!instruction.get()) return ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT; @@ -364,23 +353,25 @@ isa_callback(char* 
isa_instruction, auto ptr = std::make_unique(); ptr->code_line = std::move(instruction); - tool.isa_map.emplace(pcinfo_t{marker_id, offset}, std::move(ptr)); - C_API_END + tool->isa_map.emplace(pcinfo_t{marker_id, offset}, std::move(ptr)); return ROCPROFILER_STATUS_SUCCESS; + C_API_END + return ROCPROFILER_STATUS_ERROR; } void -shader_data_callback(int64_t se_id, void* se_data, size_t data_size, void* userdata) +shader_data_callback(int64_t se_id, + void* se_data, + size_t data_size, + rocprofiler_user_data_t /* userdata */) { C_API_BEGIN - assert(userdata && "Shader callback passed null!"); - ToolData& tool = *reinterpret_cast(userdata); { - std::unique_lock lk(tool.output_mut); - tool.output() << "SE ID: " << se_id << " with size " << data_size << std::hex << '\n'; + std::unique_lock lk(tool->output_mut); + tool->output() << "SE ID: " << se_id << " with size " << data_size << std::hex << '\n'; } - trace_data_t data{.id = se_id, .data = (uint8_t*) se_data, .size = data_size, .tool = &tool}; + trace_data_t data{.id = se_id, .data = (uint8_t*) se_data, .size = data_size}; auto status = rocprofiler_att_parse_data(copy_trace_data, get_trace_data, isa_callback, &data); if(status != ROCPROFILER_STATUS_SUCCESS) std::cerr << "shader_data_callback failed with status " << status << std::endl; @@ -402,18 +393,18 @@ tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data) tool_data), "code object tracing service configure"); - std::vector parameters; - parameters.push_back({ROCPROFILER_ATT_PARAMETER_TARGET_CU, TARGET_CU}); - parameters.push_back({ROCPROFILER_ATT_PARAMETER_SIMD_SELECT, SIMD_SELECT}); - parameters.push_back({ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE, BUFFER_SIZE}); - parameters.push_back({ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK, SE_MASK}); - - ROCPROFILER_CALL(rocprofiler_configure_thread_trace_service(client_ctx, - parameters.data(), - parameters.size(), - dispatch_callback, - shader_data_callback, - tool_data), + std::vector parameters = { + 
{ROCPROFILER_ATT_PARAMETER_TARGET_CU, {TARGET_CU}}, + {ROCPROFILER_ATT_PARAMETER_SIMD_SELECT, {SIMD_SELECT}}, + {ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE, {BUFFER_SIZE}}, + {ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK, {SE_MASK}}}; + + ROCPROFILER_CALL(rocprofiler_configure_dispatch_thread_trace_service(client_ctx, + parameters.data(), + parameters.size(), + dispatch_callback, + shader_data_callback, + tool_data), "thread trace service configure"); int valid_ctx = 0; @@ -434,17 +425,14 @@ tool_init(rocprofiler_client_finalize_t fini_func, void* tool_data) } void -tool_fini(void* tool_data) +tool_fini(void* /* data */) { - assert(tool_data && "tool_fini callback passed null!"); - ToolData& tool = *reinterpret_cast(tool_data); - - std::unique_lock isa_lk(tool.isa_map_mut); - std::unique_lock out_lk(tool.output_mut); + std::unique_lock isa_lk(client::tool->isa_map_mut); + std::unique_lock out_lk(client::tool->output_mut); // Find largest instruction size_t max_inst_size = 0; - for(auto& [addr, lines] : tool.isa_map) + for(auto& [addr, lines] : client::tool->isa_map) if(lines.get()) max_inst_size = std::max(max_inst_size, lines->code_line->inst.size()); std::string empty_space; @@ -460,16 +448,16 @@ tool_fini(void* tool_data) size_t vector_exec = 0; size_t other_exec = 0; - for(auto& [addr, line] : tool.isa_map) + for(auto& [addr, line] : client::tool->isa_map) if(line.get()) { size_t hitcount = line->hitcount.load(std::memory_order_relaxed); size_t latency = line->latency.load(std::memory_order_relaxed); auto& code_line = line->code_line->inst; - tool.output() << std::hex << "0x" << addr.addr << std::dec << ' ' << code_line - << empty_space.substr(0, max_inst_size - code_line.size()) - << " Hit: " << hitcount << " - Latency: " << latency << '\n'; + client::tool->output() << std::hex << "0x" << addr.addr << std::dec << ' ' << code_line + << empty_space.substr(0, max_inst_size - code_line.size()) + << " Hit: " << hitcount << " - Latency: " << latency << '\n'; 
if(code_line.find("s_waitcnt") == 0) { @@ -502,14 +490,17 @@ tool_fini(void* tool_data) float vmc_fraction = 100 * vmc_latency / float(total_latency); float lgk_fraction = 100 * lgk_latency / float(total_latency); - tool.output() << "Total executed instructions: " << total_exec << '\n' - << "Total executed vector instructions: " << vector_exec << " with average " - << vector_latency / float(vector_exec) << " cycles.\n" - << "Total executed scalar instructions: " << scalar_exec << " with average " - << scalar_latency / float(scalar_exec) << " cycles.\n" - << "Vector memory ops occupied: " << vmc_fraction << "% of cycles.\n" - << "Scalar and LDS memory ops occupied: " << lgk_fraction << "% of cycles.\n" - << "Num waves created: " << (tool.num_waves / 2) << std::endl; + client::tool->output() << "Total executed instructions: " << total_exec << '\n' + << "Total executed vector instructions: " << vector_exec + << " with average " << vector_latency / float(vector_exec) + << " cycles.\n" + << "Total executed scalar instructions: " << scalar_exec + << " with average " << scalar_latency / float(scalar_exec) + << " cycles.\n" + << "Vector memory ops occupied: " << vmc_fraction << "% of cycles.\n" + << "Scalar and LDS memory ops occupied: " << lgk_fraction + << "% of cycles.\n" + << "Num waves created: " << (client::tool->num_waves / 2) << std::endl; } } // namespace client @@ -538,14 +529,12 @@ rocprofiler_configure(uint32_t version, std::clog << info.str() << std::endl; - auto* data = new client::ToolData{}; - // create configure data static auto cfg = rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), &client::tool_init, &client::tool_fini, - reinterpret_cast(data)}; + nullptr}; // return pointer to configure data return &cfg; diff --git a/samples/code_object_isa_decode/client.cpp b/samples/code_object_isa_decode/client.cpp index 663f9eee..255bbd85 100644 --- a/samples/code_object_isa_decode/client.cpp +++ 
b/samples/code_object_isa_decode/client.cpp @@ -152,15 +152,17 @@ tool_codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, << std::dec << ". Printing first 64 bytes:" << std::endl; std::unordered_set references{}; - int num_waitcnts = 0; - int num_scalar = 0; - int num_vector = 0; - int num_other = 0; + + int num_waitcnts = 0; + int num_scalar = 0; + int num_vector = 0; + int num_other = 0; size_t vaddr = begin_end.first; while(vaddr < begin_end.second) { auto inst = codeobjTranslate.get(vaddr); + assert(inst != nullptr); if(inst->comment.size()) { std::string_view source = inst->comment; diff --git a/samples/common/defines.hpp b/samples/common/defines.hpp index 279a6374..0137e597 100644 --- a/samples/common/defines.hpp +++ b/samples/common/defines.hpp @@ -21,41 +21,47 @@ // SOFTWARE. #pragma once +#define ROCPROFILER_VAR_NAME_COMBINE(X, Y) X##Y +#define ROCPROFILER_VARIABLE(X, Y) ROCPROFILER_VAR_NAME_COMBINE(X, Y) #define ROCPROFILER_WARN(result) \ { \ - rocprofiler_status_t CHECKSTATUS = result; \ - if(CHECKSTATUS != ROCPROFILER_STATUS_SUCCESS) \ + rocprofiler_status_t ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) = result; \ + if(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) != ROCPROFILER_STATUS_SUCCESS) \ { \ - std::string status_msg = rocprofiler_get_status_string(CHECKSTATUS); \ + std::string status_msg = \ + rocprofiler_get_status_string(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__)); \ std::cerr << "[" << __FILE__ << ":" << __LINE__ << "] " << #result \ - << " returned error code " << CHECKSTATUS << ": " << status_msg \ - << ". This is just a warning!" << std::endl; \ + << " returned error code " << ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) \ + << ": " << status_msg << ". This is just a warning!" 
<< std::endl; \ } \ } #define ROCPROFILER_CHECK(result) \ { \ - rocprofiler_status_t CHECKSTATUS = result; \ - if(CHECKSTATUS != ROCPROFILER_STATUS_SUCCESS) \ + rocprofiler_status_t ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) = result; \ + if(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) != ROCPROFILER_STATUS_SUCCESS) \ { \ - std::string status_msg = rocprofiler_get_status_string(CHECKSTATUS); \ + std::string status_msg = \ + rocprofiler_get_status_string(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__)); \ std::stringstream errmsg{}; \ errmsg << "[" << __FILE__ << ":" << __LINE__ << "] " << #result \ - << " failed with error code " << CHECKSTATUS << " :: " << status_msg; \ + << " failed with error code " << ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) \ + << " :: " << status_msg; \ throw std::runtime_error(errmsg.str()); \ } \ } #define ROCPROFILER_CALL(result, msg) \ { \ - rocprofiler_status_t CHECKSTATUS = result; \ - if(CHECKSTATUS != ROCPROFILER_STATUS_SUCCESS) \ + rocprofiler_status_t ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) = result; \ + if(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) != ROCPROFILER_STATUS_SUCCESS) \ { \ - std::string status_msg = rocprofiler_get_status_string(CHECKSTATUS); \ + std::string status_msg = \ + rocprofiler_get_status_string(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__)); \ std::cerr << "[" #result "][" << __FILE__ << ":" << __LINE__ << "] " << msg \ - << " failed with error code " << CHECKSTATUS << ": " << status_msg \ - << std::endl; \ + << " failed with error code " << ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) \ + << ": " << status_msg << std::endl; \ std::stringstream errmsg{}; \ errmsg << "[" #result "][" << __FILE__ << ":" << __LINE__ << "] " << msg " failure (" \ << status_msg << ")"; \ diff --git a/samples/counter_collection/client.cpp b/samples/counter_collection/client.cpp index 9b115135..aa4b5b38 100644 --- a/samples/counter_collection/client.cpp +++ b/samples/counter_collection/client.cpp @@ -75,9 +75,38 @@ get_buffer() } 
/** - * Buffer callback called when the buffer is full. rocprofiler_record_header_t + * For a given counter, query the dimensions that it has. Typically you will + * want to call this function once to get the dimensions and cache them. + */ +std::vector +counter_dimensions(rocprofiler_counter_id_t counter) +{ + std::vector dims; + rocprofiler_available_dimensions_cb_t cb = + [](rocprofiler_counter_id_t, + const rocprofiler_record_dimension_info_t* dim_info, + size_t num_dims, + void* user_data) { + std::vector* vec = + static_cast*>(user_data); + for(size_t i = 0; i < num_dims; i++) + { + vec->push_back(dim_info[i]); + } + return ROCPROFILER_STATUS_SUCCESS; + }; + ROCPROFILER_CALL(rocprofiler_iterate_counter_dimensions(counter, cb, &dims), + "Could not iterate counter dimensions"); + return dims; +} + +/** + * buffered_callback (set in rocprofiler_create_buffer in tool_init) is called when the + * buffer is full (or when the buffer is flushed). The callback is responsible for processing + * the records in the buffer. The records are returned in the headers array. The headers * can contain counter records as well as other records (such as tracing). These - * records need to be filtered based on the category type. + * records need to be filtered based on the category type. For counter collection, + * they should be filtered by category == ROCPROFILER_BUFFER_CATEGORY_COUNTERS. */ void buffered_callback(rocprofiler_context_id_t, @@ -87,9 +116,6 @@ buffered_callback(rocprofiler_context_id_t, void* user_data, uint64_t) { - static int enter_count = 0; - enter_count++; - if(enter_count % 100 != 0) return; std::stringstream ss; // Iterate through the returned records for(size_t i = 0; i < num_headers; ++i) @@ -110,8 +136,20 @@ buffered_callback(rocprofiler_context_id_t, { // Print the returned counter data. 
auto* record = static_cast(header->payload); - ss << " (Dispatch_Id: " << record->dispatch_id << " Id: " << record->id - << " Value [D]: " << record->counter_value << "),"; + rocprofiler_counter_id_t counter_id = {.handle = 0}; + + rocprofiler_query_record_counter_id(record->id, &counter_id); + + ss << " (Dispatch_Id: " << record->dispatch_id << " Counter_Id: " << counter_id.handle + << " Record_Id: " << record->id << " Dimensions: ["; + + for(auto& dim : counter_dimensions(counter_id)) + { + size_t pos = 0; + rocprofiler_query_record_dimension_position(record->id, dim.id, &pos); + ss << "{" << dim.name << ": " << pos << "},"; + } + ss << "] Value [D]: " << record->counter_value << "),"; } } @@ -121,12 +159,19 @@ buffered_callback(rocprofiler_context_id_t, *output_stream << "[" << __FUNCTION__ << "] " << ss.str() << "\n"; } +/** + * Cache to store the profile configs for each agent. This is used to prevent + * constructing the same profile config multiple times. Used by dispatch_callback + * to select the profile config (and in turn counters) to use when a kernel dispatch + * is received. + */ std::unordered_map& get_profile_cache() { static std::unordered_map profile_cache; return profile_cache; } + /** * Callback from rocprofiler when an kernel dispatch is enqueued into the HSA queue. * rocprofiler_profile_config_id_t* is a return to specify what counters to collect @@ -142,9 +187,7 @@ dispatch_callback(rocprofiler_profile_counting_dispatch_data_t dispatch_data, /** * This simple example uses the same profile counter set for all agents. * We store this in a cache to prevent constructing many identical profile counter - * sets. We first check the cache to see if we have already constructed a counter" - * set for the agent. If we have, return it. Otherwise, construct a new profile counter - * set. + * sets. 
*/ auto search_cache = [&]() { if(auto pos = get_profile_cache().find(dispatch_data.dispatch_info.agent_id.handle); @@ -163,12 +206,21 @@ dispatch_callback(rocprofiler_profile_counting_dispatch_data_t dispatch_data, } } +/** + * Construct a profile config for an agent. This function takes an agent (obtained from + * get_gpu_device_agents()) and a set of counter names to collect. It returns a profile + * that can be used when a dispatch is received for the agent to collect the specified + * counters. Note: while you can dynamically create these profiles, it is more efficient + * to construct them once in advance (i.e. in tool_init()) since there are non-trivial + * costs associated with constructing the profile. + */ rocprofiler_profile_config_id_t -build_profile_for_agent(rocprofiler_agent_id_t agent) +build_profile_for_agent(rocprofiler_agent_id_t agent, + const std::set& counters_to_collect) { - std::set counters_to_collect = {"SQ_WAVES"}; std::vector gpu_counters; + // Iterate all the counters on the agent and store them in gpu_counters. ROCPROFILER_CALL(rocprofiler_iterate_agent_supported_counters( agent, [](rocprofiler_agent_id_t, @@ -186,6 +238,7 @@ build_profile_for_agent(rocprofiler_agent_id_t agent) static_cast(&gpu_counters)), "Could not fetch supported counters"); + // Find the counters we actually want to collect (i.e. 
those in counters_to_collect) std::vector collect_counters; for(auto& counter : gpu_counters) { @@ -201,6 +254,7 @@ build_profile_for_agent(rocprofiler_agent_id_t agent) } } + // Create and return the profile rocprofiler_profile_config_id_t profile; ROCPROFILER_CALL(rocprofiler_create_profile_config( agent, collect_counters.data(), collect_counters.size(), &profile), @@ -209,21 +263,17 @@ build_profile_for_agent(rocprofiler_agent_id_t agent) return profile; } -int -tool_init(rocprofiler_client_finalize_t, void* user_data) +/** + * Returns all GPU agents visible to rocprofiler on the system + */ +std::vector +get_gpu_device_agents() { - ROCPROFILER_CALL(rocprofiler_create_context(&get_client_ctx()), "context creation failed"); + std::vector agents; - ROCPROFILER_CALL(rocprofiler_create_buffer(get_client_ctx(), - 4096, - 2048, - ROCPROFILER_BUFFER_POLICY_LOSSLESS, - buffered_callback, - user_data, - &get_buffer()), - "buffer creation failed"); - - std::vector agents; + // Callback used by rocprofiler_query_available_agents to return + // agents on the device. This can include CPU agents as well. We + // select GPU agents only (i.e. type == ROCPROFILER_AGENT_TYPE_GPU) rocprofiler_query_available_agents_cb_t iterate_cb = [](rocprofiler_agent_version_t agents_ver, const void** agents_arr, size_t num_agents, @@ -232,25 +282,45 @@ tool_init(rocprofiler_client_finalize_t, void* user_data) throw std::runtime_error{"unexpected rocprofiler agent version"}; auto* agents_v = static_cast*>(udata); for(size_t i = 0; i < num_agents; ++i) - agents_v->emplace_back(*static_cast(agents_arr[i])); + { + const auto* agent = static_cast(agents_arr[i]); + if(agent->type == ROCPROFILER_AGENT_TYPE_GPU) agents_v->emplace_back(*agent); + } return ROCPROFILER_STATUS_SUCCESS; }; + // Query the agents, only a single callback is made that contains a vector + // of all agents. 
ROCPROFILER_CALL( rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0, iterate_cb, sizeof(rocprofiler_agent_t), const_cast(static_cast(&agents))), "query available agents"); + return agents; +} - // Construct the profiles in advance for each agent that is a GPU - for(const auto& agent : agents) - { - if(agent.type == ROCPROFILER_AGENT_TYPE_GPU) - { - get_profile_cache().emplace(agent.id.handle, build_profile_for_agent(agent.id)); - } - } +/** + * Initialize the tool. This function is called once when the tool is loaded. + * The function is responsible for creating the context, buffer, profile configs + * (details counters to collect on each agent), configuring the dispatch profile + * counting service, and starting the context. + */ +int +tool_init(rocprofiler_client_finalize_t, void* user_data) +{ + ROCPROFILER_CALL(rocprofiler_create_context(&get_client_ctx()), "context creation failed"); + ROCPROFILER_CALL(rocprofiler_create_buffer(get_client_ctx(), + 4096, + 2048, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + buffered_callback, + user_data, + &get_buffer()), + "buffer creation failed"); + + // Get a vector of all GPU devices on the system. + auto agents = get_gpu_device_agents(); if(agents.empty()) { @@ -258,14 +328,35 @@ tool_init(rocprofiler_client_finalize_t, void* user_data) return 1; } + // Construct the profiles in advance for each agent that is a GPU + for(const auto& agent : agents) + { + // get_profile_cache() is a map that can be accessed by dispatch_callback + // below to select the profile config to use when a kernel dispatch is + // received. 
+ get_profile_cache().emplace( + agent.id.handle, build_profile_for_agent(agent.id, std::set{"SQ_WAVES"})); + } + auto client_thread = rocprofiler_callback_thread_t{}; + // Create the callback thread ROCPROFILER_CALL(rocprofiler_create_callback_thread(&client_thread), "failure creating callback thread"); + // Assign the callback thread to the buffer. When the buffer is full, + // a callback will be issued (on client_thread) ROCPROFILER_CALL(rocprofiler_assign_callback_thread(get_buffer(), client_thread), "failed to assign thread for buffer"); + + // Set up the dispatch profile counting service. This service will trigger the dispatch_callback + // when a kernel dispatch is enqueued into the HSA queue. The callback will specify what + // counters to collect by returning a profile config id. In this example, we create the profile + // configs above and store them in the map get_profile_cache() so we can look them up at + // dispatch. ROCPROFILER_CALL(rocprofiler_configure_buffered_dispatch_profile_counting_service( get_client_ctx(), get_buffer(), dispatch_callback, nullptr), "Could not setup buffered service"); + + // Start the context (start intercepting kernel dispatches). 
ROCPROFILER_CALL(rocprofiler_start_context(get_client_ctx()), "start context"); // no errors @@ -276,6 +367,8 @@ void tool_fini(void* user_data) { std::clog << "In tool fini\n"; + + // Flush the buffer and stop the context ROCPROFILER_CALL(rocprofiler_flush_buffer(get_buffer()), "buffer flush"); rocprofiler_stop_context(get_client_ctx()); diff --git a/samples/counter_collection/print_functional_counters.cpp b/samples/counter_collection/print_functional_counters.cpp index 4c3da4ef..af2b5e6f 100644 --- a/samples/counter_collection/print_functional_counters.cpp +++ b/samples/counter_collection/print_functional_counters.cpp @@ -4,12 +4,11 @@ #include #include #include -#include #include #include -#include #include +#include #include #include @@ -308,13 +307,14 @@ dispatch_callback(rocprofiler_profile_counting_dispatch_data_t dispatch_data, rocprofiler_profile_config_id_t profile; // Select the next counter to collect. - ROCPROFILER_CALL( - rocprofiler_create_profile_config( - dispatch_data.dispatch_info.agent_id, &(cap.remaining.back()), 1, &profile), - "Could not construct profile cfg"); + if(rocprofiler_create_profile_config( + dispatch_data.dispatch_info.agent_id, &(cap.remaining.back()), 1, &profile) == + ROCPROFILER_STATUS_SUCCESS) + { + *config = profile; + } cap.remaining.pop_back(); - *config = profile; } int diff --git a/samples/pc_sampling/pcs.cpp b/samples/pc_sampling/pcs.cpp index 307f292d..24167715 100644 --- a/samples/pc_sampling/pcs.cpp +++ b/samples/pc_sampling/pcs.cpp @@ -160,11 +160,12 @@ query_avail_configs_for_agent(tool_agent_info* agent_info) { // The query operation failed, so consider the PC sampling is unsupported at the agent. // This can happen if the PC sampling service is invoked within the ROCgdb. 
- ss << "Querying PC sampling capabilities failed with status: " << status << std::endl; + ss << "Querying PC sampling capabilities failed with status=" << status + << " :: " << rocprofiler_get_status_string(status) << std::endl; *utils::get_output_stream() << ss.str() << std::endl; return false; } - else if(agent_info->avail_configs->size() == 0) + else if(agent_info->avail_configs->empty()) { // No available configuration at the moment, so mark the PC sampling as unsupported. return false; diff --git a/source/docs/.gitignore b/source/docs/.gitignore index 9db59a23..54b0d35c 100644 --- a/source/docs/.gitignore +++ b/source/docs/.gitignore @@ -3,3 +3,5 @@ /_doxygen /.gitinfo /*.dox +/.sass-cache +/_toc.yml diff --git a/source/docs/CMakeLists.txt b/source/docs/CMakeLists.txt index b4cd4dec..1e58f6eb 100644 --- a/source/docs/CMakeLists.txt +++ b/source/docs/CMakeLists.txt @@ -94,7 +94,7 @@ conda activate rocprofiler-docs which python -python -m pip install -r ${CMAKE_CURRENT_LIST_DIR}/requirements.txt +python -m pip install -r ${CMAKE_CURRENT_LIST_DIR}/sphinx/requirements.txt WORK_DIR=${PROJECT_SOURCE_DIR}/source/docs SOURCE_DIR=${PROJECT_SOURCE_DIR} diff --git a/source/docs/_toc.yml.in b/source/docs/_toc.yml.in new file mode 100644 index 00000000..6db2aeb1 --- /dev/null +++ b/source/docs/_toc.yml.in @@ -0,0 +1,24 @@ +# Anywhere {branch} is used, the branch name will be substituted. +# These comments will also be removed. 
+defaults: + numbered: True + maxdepth: 4 + +root: index +subtrees: + - caption: Table of Contents + entries: + - file: features + - file: installation + - file: tool_library_overview + - file: callback_services + - file: buffered_services + - file: pc_sampling + - file: intercept_table + - file: counter_collection_services + - file: _doxygen/html/index + - file: samples + - file: rocprofv3 + - caption: License + entries: + - file: license diff --git a/source/docs/about.md b/source/docs/about.md deleted file mode 100644 index 2de952a8..00000000 --- a/source/docs/about.md +++ /dev/null @@ -1,34 +0,0 @@ -# About - -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - -## Important Changes - -[Roctracer](https://github.com/ROCm/roctracer) and [rocprofiler (v1)](https://github.com/ROCm/rocprofiler) -have been combined into a single rocprofiler SDK and re-designed from scratch. The new rocprofiler API has been designed with some -new restrictions to avoid problems that plagued the former implementations. These restrictions enable more efficient implementations -and much better thread-safety. The most important restriction is the window for tools to inform rocprofiler about which services -the tool wishes to use (where "services" refers to the capabilities for API tracing, kernel tracing, etc.). - -In the former implementations, when one of the ROCm runtimes were initially loaded, a tool only had -to inform roctracer/rocprofiler that it wished to use its services at some point (e.g. calling `roctracer_init()`) -and were not required to specify which services it would eventually or potentially use. Thus, these libraries had to effectively prepare for -any service to be enable at any point in time -- which introduced unnecessary overhead when tools had no desire to use certain features and -made thread-safe data management difficult. 
For example, roctracer was required to _always_ install wrappers around _every_ runtime API function -and _always_ added extra overhead of indirection through the roctracer library and checks for the current service configuration (in a thread-safe manner). - -In the re-designed implementation, rocprofiler introduces the concept of a "context". Contexts are effectively -bundles of service configurations. Rocprofiler gives each tool _one_ opportunity to create as many contexts as necessary -- -for example, a tool can group all of the services into one context, create individual contexts for each service, or somewhere in between. -Due to this design choice change, rocprofiler now knows _exactly_ which services might be requested by the tool clients at any point in time. -This has several important implications: - -- rocprofiler does not have to unnecessarily prepare for services that are never used -- if no registered contexts requested tracing the HSA API, no wrappers need to be generated -- rocprofiler can perform more extensive checks during service specification and inform tools about potential issues very early on -- rocprofiler can allow multiple tools to use certain services simulatenously -- rocprofiler was able to improve thread-safety without introducing parallel bottlenecks -- rocprofiler can manage internal data and allocations more efficiently diff --git a/source/docs/buffered_services.md b/source/docs/buffered_services.md index 5bf893b6..dffea541 100644 --- a/source/docs/buffered_services.md +++ b/source/docs/buffered_services.md @@ -4,12 +4,6 @@ For the buffered approach, supported buffer record categories are enumerated in ## Buffered Tracing Services -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - ## Overview In buffered approach, callbacks are receieved for batches of records from an internal (background) thread. Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`. 
diff --git a/source/docs/callback_services.md b/source/docs/callback_services.md index dbdfe144..4cb2c43a 100644 --- a/source/docs/callback_services.md +++ b/source/docs/callback_services.md @@ -1,11 +1,5 @@ # Callback Tracing Services -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - ## Overview ## Code Object Tracing diff --git a/source/docs/conf.py b/source/docs/conf.py index 8f55aa99..f1c6a01c 100644 --- a/source/docs/conf.py +++ b/source/docs/conf.py @@ -9,9 +9,6 @@ # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. -# -# import os -# sys.path.insert(0, os.path.abspath('.')) import os import sys @@ -30,10 +27,36 @@ def install(package): # Check if we're running on Read the Docs' servers read_the_docs_build = os.environ.get("READTHEDOCS", None) == "True" +_srcdir = os.path.realpath(os.path.join(os.getcwd(), "../..")) + + +def build_doxyfile(): + sp.run( + [ + "cmake", + f"-DSOURCE_DIR={_srcdir}", + "-DPROJECT_NAME='Rocprofiler SDK'", + f"-P {_srcdir}/source/docs/generate-doxyfile.cmake", + ] + ) + + +def configure_version_header(): + sp.run( + [ + "cmake", + f"-S {_srcdir}/source/include/rocprofiler-sdk", + f"-B {_srcdir}/source/include/rocprofiler-sdk", + ] + ) + + +configure_version_header() +build_doxyfile() # -- Project information ----------------------------------------------------- project = "Rocprofiler SDK" -copyright = "2023, Advanced Micro Devices, Inc." +copyright = "2023-2024, Advanced Micro Devices, Inc." author = "Advanced Micro Devices, Inc." 
project_root = os.path.normpath(os.path.join(os.getcwd(), "..", "..")) @@ -41,123 +64,44 @@ def install(package): # The full version, including alpha/beta/rc tags release = version -_docdir = os.path.realpath(os.getcwd()) -_srcdir = os.path.realpath(os.path.join(os.getcwd(), "..")) -_sitedir = os.path.realpath(os.path.join(os.getcwd(), "..", "site")) -_staticdir = os.path.realpath(os.path.join(_docdir, "_static")) -_templatedir = os.path.realpath(os.path.join(_docdir, "_templates")) - -if not os.path.exists(_staticdir): - os.makedirs(_staticdir) - -if not os.path.exists(_templatedir): - os.makedirs(_templatedir) - - # -- General configuration --------------------------------------------------- -install("sphinx_rtd_theme") - # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. extensions = [ - "sphinx.ext.autodoc", - "sphinx.ext.doctest", - "sphinx.ext.todo", - "sphinx.ext.viewcode", - "sphinx.ext.githubpages", - "sphinx.ext.mathjax", - "sphinx.ext.autosummary", - "sphinx.ext.napoleon", - "sphinx_markdown_tables", - "recommonmark", - "sphinxcontrib.doxylink", + "rocm_docs", + "rocm_docs.doxygen", ] -source_suffix = { - ".rst": "restructuredtext", - ".md": "markdown", +doxygen_root = "." +doxysphinx_enabled = True +doxygen_project = { + "name": "rocprofiler-sdk", + "path": "_doxygen/xml", } +doxyfile = "rocprofiler-sdk.dox" -from recommonmark.parser import CommonMarkParser +external_projects_current_project = "rocprofiler-sdk" +external_projects = [] -source_parsers = {".md": CommonMarkParser} - -# Add any paths that contain templates here, relative to this directory. -templates_path = ["_templates"] - -# The master toctree document. master_doc = "index" - -# List of patterns, relative to source directory, that match files and -# directories to ignore when looking for source files. -# This pattern also affects html_static_path and html_extra_path. 
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "README.md"] +external_toc_path = "./_toc.yml" -default_role = None +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates"] +suppress_warnings = ["etoc.toctree"] # -- Options for HTML output ------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. -# -html_theme = "sphinx_rtd_theme" + +html_theme = "rocm_docs_theme" +html_theme_options = {"flavor": "rocm"} # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". -html_static_path = ["_static"] - -html_theme_options = { - # "analytics_id": "G-...", # Provided by Google in your dashboard - "logo_only": False, - "display_version": True, - "prev_next_buttons_location": "bottom", - "style_external_links": False, - # 'style_nav_header_background': 'white', - # Toc options - "collapse_navigation": True, - "sticky_navigation": True, - "navigation_depth": 4, - "includehidden": True, - "titles_only": False, -} - -doxygen_root = "_doxygen" # this is just a convenience variable -doxylink = { - "demo": ( # "demo" is the role name that you can later use in sphinx to reference this doxygen documentation (see below) - f"{doxygen_root}/tagfile.xml", # the first parameter of this tuple is the tagfile - f"{doxygen_root}/html", # the second parameter of this tuple is a relative path pointing from - # sphinx output directory to the doxygen output folder inside the output - # directory tree. - # Doxylink will use the tagfile to get the html file name of the symbol you want - # to link and then prefix it with this path to generate html links (-tags). - ), -} - -from pygments.styles import get_all_styles - -# The name of the Pygments (syntax highlighting) style to use. 
-styles = list(get_all_styles()) -preferences = ("emacs", "pastie", "colorful") -for pref in preferences: - if pref in styles: - pygments_style = pref - break - -from recommonmark.transform import AutoStructify - - -# app setup hook -def setup(app): - app.add_config_value( - "recommonmark_config", - { - "auto_toc_tree_section": "Contents", - "enable_eval_rst": True, - "enable_auto_doc_ref": False, - }, - True, - ) - app.add_transform(AutoStructify) +html_title = f"ROCprofiler-SDK {version} Documentation" diff --git a/source/docs/counter_collection_services.md b/source/docs/counter_collection_services.md new file mode 100644 index 00000000..a9605cbf --- /dev/null +++ b/source/docs/counter_collection_services.md @@ -0,0 +1,14 @@ +# Derived Metrics + +## Accumulate metric +### Expression + expr=accumulate(, ) +### Description +- The accumulate metric is used to sum the values of a basic level counter over a specified number of cycles. By setting the resolution parameter, you can control the frequency of the summing operation: + - HIGH_RES: Sums up the basic counter every clock cycle. Captures the value every single cycle for higher accuracy, suitable for fine-grained analysis. + - LOW_RES: Sums up the basic counter every four clock cycles. Reduces the data points and provides less detailed summing, useful for reducing data volume. + - NONE: Does nothing and is equivalent to collecting basic_level_counter. Outputs the value of the basic counter without any summing operation. + +### Usage (derived_counters.xml) + +- MeanOccupancyPerCU: This metric calculates the mean occupancy per compute unit. It uses the accumulate function with HIGH_RES to sum the SQ_LEVEL_WAVES counter at every clock cycle. This sum is then divided by GRBM_GUI_ACTIVE and the number of compute units (CU_NUM) to derive the mean occupancy. 
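The derived-metric description above can be illustrated with a small worked example. This is only a sketch under stated assumptions, not the SDK's implementation: `accumulate`, `Resolution`, and `mean_occupancy_per_cu` are hypothetical Python helpers, the per-cycle wave counts are made-up data, and `LOW_RES` is assumed to sample cycles 0, 4, 8, and so on. `NONE` is left out because its hardware behaviour is not modelled here.

```python
from enum import Enum


class Resolution(Enum):
    """Sampling stride in clock cycles (hypothetical model of the resolution parameter)."""

    HIGH_RES = 1  # sum the counter on every clock cycle
    LOW_RES = 4   # sum the counter on every fourth clock cycle (assumed: cycles 0, 4, 8, ...)


def accumulate(per_cycle_values, resolution):
    """Sum a per-cycle level counter at the given sampling stride."""
    return sum(per_cycle_values[:: resolution.value])


def mean_occupancy_per_cu(per_cycle_waves, active_cycles, cu_num):
    """Model of accumulate(SQ_LEVEL_WAVES, HIGH_RES) / GRBM_GUI_ACTIVE / CU_NUM."""
    total = accumulate(per_cycle_waves, Resolution.HIGH_RES)
    return total / active_cycles / cu_num


# Made-up trace: waves in flight on each of 8 active cycles, on a device with 2 CUs.
waves = [4, 4, 6, 6, 8, 8, 6, 6]
high = accumulate(waves, Resolution.HIGH_RES)
low = accumulate(waves, Resolution.LOW_RES)
occupancy = mean_occupancy_per_cu(waves, active_cycles=len(waves), cu_num=2)
print(high, low, occupancy)
```

In this model the `LOW_RES` sum is built from a quarter of the samples, which matches the trade-off described above: fewer data points in exchange for less detail.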
diff --git a/source/docs/developer_api.md b/source/docs/developer_api.md deleted file mode 100644 index 22d81369..00000000 --- a/source/docs/developer_api.md +++ /dev/null @@ -1,12 +0,0 @@ -# Developer API - -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 - - _doxygen/html/topics - _doxygen/html/files - _doxygen/html/annotated - _doxygen/html/classes -``` diff --git a/source/docs/environment.yml b/source/docs/environment.yml index 7668b49c..179ca354 100644 --- a/source/docs/environment.yml +++ b/source/docs/environment.yml @@ -127,16 +127,4 @@ dependencies: - zlib=1.2.13=hd590300_5 - zstd=1.5.5=hfc55251_0 - pip: - - click==8.1.7 - - click-log==0.4.0 - - doxysphinx==3.3.4 - - libsass==0.22.0 - - lxml==4.9.3 - - mpire==2.8.0 - - pyjson5==1.6.4 - - pyparsing==3.1.1 - - python-dateutil==2.8.2 - - six==1.16.0 - - sphinx-markdown==1.0.2 - - sphinxcontrib-doxylink==1.12.2 - - tqdm==4.66.1 + - -r ./sphinx/requirements.txt diff --git a/source/docs/features.md b/source/docs/features.md index dd4fbc43..ea85d4f5 100644 --- a/source/docs/features.md +++ b/source/docs/features.md @@ -1,11 +1,5 @@ # Features -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - ## Overview - Improved tool initialization diff --git a/source/docs/index.md b/source/docs/index.md index c973d757..4232318a 100644 --- a/source/docs/index.md +++ b/source/docs/index.md @@ -1,20 +1,28 @@ # Welcome to the [ROCprofiler](https://github.com/ROCm/rocprofiler-sdk) Documentation! -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 - :caption: Table of Contents +## Important Changes - about - features - installation - tool_library_overview - callback_services - buffered_services - pc_sampling - intercept_table - developer_api - samples - rocprofv3 -``` +[Roctracer](https://github.com/ROCm/roctracer) and [rocprofiler (v1)](https://github.com/ROCm/rocprofiler) +have been combined into a single rocprofiler SDK and re-designed from scratch. 
The new rocprofiler API has been designed with some +new restrictions to avoid problems that plagued the former implementations. These restrictions enable more efficient implementations +and much better thread-safety. The most important restriction is the window for tools to inform rocprofiler about which services +the tool wishes to use (where "services" refers to the capabilities for API tracing, kernel tracing, etc.). + +In the former implementations, when one of the ROCm runtimes was initially loaded, a tool only had +to inform roctracer/rocprofiler that it wished to use its services at some point (e.g. calling `roctracer_init()`) +and was not required to specify which services it would eventually or potentially use. Thus, these libraries had to effectively prepare for +any service to be enabled at any point in time -- which introduced unnecessary overhead when tools had no desire to use certain features and +made thread-safe data management difficult. For example, roctracer was required to _always_ install wrappers around _every_ runtime API function +and _always_ added extra overhead of indirection through the roctracer library and checks for the current service configuration (in a thread-safe manner). + +In the re-designed implementation, rocprofiler introduces the concept of a "context". Contexts are effectively +bundles of service configurations. Rocprofiler gives each tool _one_ opportunity to create as many contexts as necessary -- +for example, a tool can group all of the services into one context, create individual contexts for each service, or somewhere in between. +Due to this design change, rocprofiler now knows _exactly_ which services might be requested by the tool clients at any point in time.
+This has several important implications: + +- rocprofiler does not have to unnecessarily prepare for services that are never used -- if no registered contexts requested tracing the HSA API, no wrappers need to be generated +- rocprofiler can perform more extensive checks during service specification and inform tools about potential issues very early on +- rocprofiler can allow multiple tools to use certain services simultaneously +- rocprofiler was able to improve thread-safety without introducing parallel bottlenecks +- rocprofiler can manage internal data and allocations more efficiently diff --git a/source/docs/installation.md b/source/docs/installation.md index d094d33d..817724d9 100644 --- a/source/docs/installation.md +++ b/source/docs/installation.md @@ -1,11 +1,5 @@ # Installation -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - ## Operating System ROCprofiler is only supported on Linux. The following distributions are tested: diff --git a/source/docs/intercept_table.md b/source/docs/intercept_table.md index 5cd904ea..7a7c4049 100644 --- a/source/docs/intercept_table.md +++ b/source/docs/intercept_table.md @@ -1,9 +1,3 @@ # Runtime Intercept Tables -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - Discussion on how access the raw runtime intercept tables of HSA and HIP (i.e. ExaTracer requirements by LTTng). diff --git a/source/docs/license.rst b/source/docs/license.rst new file mode 100644 index 00000000..98f78498 --- /dev/null +++ b/source/docs/license.rst @@ -0,0 +1,5 @@ +======= +License +======= + +.. 
include:: ../../LICENSE diff --git a/source/docs/requirements.txt b/source/docs/requirements.txt deleted file mode 100644 index a932956c..00000000 --- a/source/docs/requirements.txt +++ /dev/null @@ -1,2 +0,0 @@ -doxysphinx -jsonschema2md diff --git a/source/docs/rocprofiler-sdk.dox.in b/source/docs/rocprofiler-sdk.dox.in index f505e83a..7188a97c 100644 --- a/source/docs/rocprofiler-sdk.dox.in +++ b/source/docs/rocprofiler-sdk.dox.in @@ -4,7 +4,7 @@ # Project related configuration options #--------------------------------------------------------------------------- DOXYFILE_ENCODING = UTF-8 -PROJECT_NAME = @PROJECT_NAME@ +PROJECT_NAME = @PROJECT_NAME@ Developer API PROJECT_NUMBER = @ROCPROFILER_VERSION@ PROJECT_BRIEF = "ROCm Profiling API and tools" PROJECT_LOGO = @@ -172,7 +172,7 @@ INPUT_FILTER = FILTER_PATTERNS = FILTER_SOURCE_FILES = NO FILTER_SOURCE_PATTERNS = -USE_MDFILE_AS_MAINPAGE = @SOURCE_DIR@/README.md +USE_MDFILE_AS_MAINPAGE = FORTRAN_COMMENT_AFTER = 72 #--------------------------------------------------------------------------- # Configuration options related to source browsing @@ -197,10 +197,10 @@ IGNORE_PREFIX = GENERATE_HTML = YES HTML_OUTPUT = html HTML_FILE_EXTENSION = .html -HTML_HEADER = -HTML_FOOTER = -HTML_STYLESHEET = -HTML_EXTRA_STYLESHEET = ../../external/doxygen-awesome-css/doxygen-awesome.css +HTML_HEADER = +HTML_FOOTER = +HTML_STYLESHEET = +HTML_EXTRA_STYLESHEET = HTML_EXTRA_FILES = HTML_COLORSTYLE = LIGHT HTML_COLORSTYLE_HUE = 220 @@ -235,7 +235,7 @@ QHG_LOCATION = GENERATE_ECLIPSEHELP = NO ECLIPSE_DOC_ID = org.doxygen.rocprofiler DISABLE_INDEX = NO -GENERATE_TREEVIEW = YES +GENERATE_TREEVIEW = NO FULL_SIDEBAR = NO ENUM_VALUES_PER_LINE = 1 TREEVIEW_WIDTH = 300 @@ -298,7 +298,7 @@ MAN_LINKS = YES #--------------------------------------------------------------------------- # Configuration options related to the XML output #--------------------------------------------------------------------------- -GENERATE_XML = NO +GENERATE_XML = YES 
XML_OUTPUT = xml XML_PROGRAMLISTING = YES XML_NS_MEMB_FILE_SCOPE = YES @@ -355,7 +355,7 @@ SKIP_FUNCTION_MACROS = NO # Configuration options related to external references #--------------------------------------------------------------------------- TAGFILES = -GENERATE_TAGFILE = _doxygen/tagfile.xml +GENERATE_TAGFILE = _doxygen/html/tagfile.xml ALLEXTERNALS = NO EXTERNAL_GROUPS = YES EXTERNAL_PAGES = YES @@ -385,7 +385,7 @@ GRAPHICAL_HIERARCHY = YES DIRECTORY_GRAPH = YES DIR_GRAPH_MAX_DEPTH = 1 DOT_IMAGE_FORMAT = svg -INTERACTIVE_SVG = NO +INTERACTIVE_SVG = YES DOT_PATH = @DOT_EXECUTABLE@ DOTFILE_DIRS = DIA_PATH = diff --git a/source/docs/rocprofv3.md b/source/docs/rocprofv3.md index 8dd9b1a1..e96b0150 100644 --- a/source/docs/rocprofv3.md +++ b/source/docs/rocprofv3.md @@ -178,7 +178,7 @@ To trace HIP runtime APIs, use: rocprofv3 --hip-trace -- < app_relative_path > ``` -**Note: The tracing and counter colleciton options generates an additional agent info file. See** [Agent Info](#agent-info) +**Note: The tracing and counter colleciton options generates an additional agent info file. The above command generates a `hip_api_trace.csv` file prefixed with the process ID. diff --git a/source/docs/samples.md b/source/docs/samples.md index 798bdd29..07c9167e 100644 --- a/source/docs/samples.md +++ b/source/docs/samples.md @@ -2,12 +2,6 @@ ## Running Samples -```eval_rst -.. toctree:: - :glob: - :maxdepth: 4 -``` - Samples and tool can be run in order to see the profiler in action. This section covers on how to build these samples and run the tool. 
Once the rocm build is installed, samples are installed under: diff --git a/source/docs/sphinx/requirements.in b/source/docs/sphinx/requirements.in new file mode 100644 index 00000000..221c9304 --- /dev/null +++ b/source/docs/sphinx/requirements.in @@ -0,0 +1 @@ +rocm-docs-core[api_reference]==1.4.0 diff --git a/source/docs/sphinx/requirements.txt b/source/docs/sphinx/requirements.txt new file mode 100644 index 00000000..9a5ca6b5 --- /dev/null +++ b/source/docs/sphinx/requirements.txt @@ -0,0 +1,169 @@ +# +# This file is autogenerated by pip-compile with Python 3.10 +# by the following command: +# +# pip-compile requirements.in +# +accessible-pygments==0.0.5 + # via pydata-sphinx-theme +alabaster==0.7.16 + # via sphinx +babel==2.15.0 + # via + # pydata-sphinx-theme + # sphinx +beautifulsoup4==4.12.3 + # via pydata-sphinx-theme +breathe==4.35.0 + # via rocm-docs-core +certifi==2024.6.2 + # via requests +cffi==1.16.0 + # via + # cryptography + # pynacl +charset-normalizer==3.3.2 + # via requests +click==8.1.7 + # via + # click-log + # doxysphinx + # sphinx-external-toc +click-log==0.4.0 + # via doxysphinx +cryptography==42.0.8 + # via pyjwt +deprecated==1.2.14 + # via pygithub +docutils==0.21.2 + # via + # breathe + # myst-parser + # pydata-sphinx-theme + # sphinx +doxysphinx==3.3.8 + # via rocm-docs-core +fastjsonschema==2.20.0 + # via rocm-docs-core +gitdb==4.0.11 + # via gitpython +gitpython==3.1.43 + # via rocm-docs-core +idna==3.7 + # via requests +imagesize==1.4.1 + # via sphinx +jinja2==3.1.4 + # via + # myst-parser + # sphinx +libsass==0.22.0 + # via doxysphinx +lxml==4.9.4 + # via doxysphinx +markdown-it-py==3.0.0 + # via + # mdit-py-plugins + # myst-parser +markupsafe==2.1.5 + # via jinja2 +mdit-py-plugins==0.4.1 + # via myst-parser +mdurl==0.1.2 + # via markdown-it-py +mpire==2.10.2 + # via doxysphinx +myst-parser==3.0.1 + # via rocm-docs-core +numpy==1.26.4 + # via doxysphinx +packaging==24.1 + # via + # pydata-sphinx-theme + # sphinx +pycparser==2.22 + # 
via cffi +pydata-sphinx-theme==0.15.3 + # via + # rocm-docs-core + # sphinx-book-theme +pygithub==2.3.0 + # via rocm-docs-core +pygments==2.18.0 + # via + # accessible-pygments + # mpire + # pydata-sphinx-theme + # sphinx +pyjson5==1.6.6 + # via doxysphinx +pyjwt[crypto]==2.8.0 + # via pygithub +pynacl==1.5.0 + # via pygithub +pyparsing==3.1.2 + # via doxysphinx +pyyaml==6.0.1 + # via + # myst-parser + # rocm-docs-core + # sphinx-external-toc +requests==2.32.3 + # via + # pygithub + # sphinx +rocm-docs-core[api-reference]==1.4.0 + # via -r requirements.in +smmap==5.0.1 + # via gitdb +snowballstemmer==2.2.0 + # via sphinx +soupsieve==2.5 + # via beautifulsoup4 +sphinx==7.3.7 + # via + # breathe + # myst-parser + # pydata-sphinx-theme + # rocm-docs-core + # sphinx-book-theme + # sphinx-copybutton + # sphinx-design + # sphinx-external-toc + # sphinx-notfound-page +sphinx-book-theme==1.1.3 + # via rocm-docs-core +sphinx-copybutton==0.5.2 + # via rocm-docs-core +sphinx-design==0.6.0 + # via rocm-docs-core +sphinx-external-toc==1.0.1 + # via rocm-docs-core +sphinx-notfound-page==1.0.2 + # via rocm-docs-core +sphinxcontrib-applehelp==1.0.8 + # via sphinx +sphinxcontrib-devhelp==1.0.6 + # via sphinx +sphinxcontrib-htmlhelp==2.0.5 + # via sphinx +sphinxcontrib-jsmath==1.0.1 + # via sphinx +sphinxcontrib-qthelp==1.0.7 + # via sphinx +sphinxcontrib-serializinghtml==1.1.10 + # via sphinx +tomli==2.0.1 + # via sphinx +tqdm==4.66.4 + # via mpire +typing-extensions==4.12.2 + # via + # pydata-sphinx-theme + # pygithub +urllib3==2.2.2 + # via + # pygithub + # requests +wrapt==1.16.0 + # via deprecated diff --git a/source/docs/tool_library_overview.md b/source/docs/tool_library_overview.md index 1625b071..e336063d 100644 --- a/source/docs/tool_library_overview.md +++ b/source/docs/tool_library_overview.md @@ -1,11 +1,5 @@ # Tool Library -```eval_rst -.. 
toctree:: - :glob: - :maxdepth: 4 -``` - ## Rocprofiler and ROCm Runtimes Design The ROCm runtimes are now designed to directly communicate with a new library called rocprofiler-register during their initialization. This library does cursory checks diff --git a/source/include/rocprofiler-sdk/agent.h b/source/include/rocprofiler-sdk/agent.h index 641ed793..473ce4b1 100644 --- a/source/include/rocprofiler-sdk/agent.h +++ b/source/include/rocprofiler-sdk/agent.h @@ -191,6 +191,22 @@ typedef struct rocprofiler_agent_v0_t ///< HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property int32_t logical_node_id; ///< Logical sequence number. This will always be [0..N) where N is ///< the total number of agents + int32_t logical_node_type_id; + int32_t reserved_padding0; ///< padding logical_node_id to 64 bytes + + /// @var logical_node_type_id + /// @brief Logical sequence number with respect to other agents of same type. This will always + /// be [0..N) where N is the total number of X agents (where X is a ::rocprofiler_agent_type_t + /// value). This field is intended to help with environment variable indexing used to mask GPUs + /// at runtime (i.e. HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES) which start at zero and only + /// apply to GPUs, e.g., logical_node_type_id value for first GPU will be 0, second GPU will + /// have value of 1, etc., regardless of however many agents of a different type preceded (and + /// thus increased the ::node_id or ::logical_node_id). + /// + /// Example: a system with 2 CPUs and 2 GPUs, where the node ids are 0=CPU, 1=GPU, 2=CPU, 3=GPU, + /// then the CPU node_ids 0 and 2 would have logical_node_type_id values of 0 and 1, + /// respectively, and GPU node_ids 1 and 3 would also have logical_node_type_id values of 0 + /// and 1. 
} rocprofiler_agent_v0_t; typedef rocprofiler_agent_v0_t rocprofiler_agent_t; diff --git a/source/include/rocprofiler-sdk/agent_profile.h b/source/include/rocprofiler-sdk/agent_profile.h index f339770e..caad421a 100644 --- a/source/include/rocprofiler-sdk/agent_profile.h +++ b/source/include/rocprofiler-sdk/agent_profile.h @@ -81,7 +81,6 @@ typedef void (*rocprofiler_agent_profile_callback_t)( * @param [in] cb Callback called when the context is started for the tool to specify what * counters to collect (rocprofiler_profile_config_id_t). * @param [in] user_data User supplied data to be passed to the callback cb when triggered - * @param [in] config_id Profile config detailing the counters to collect for this kernel * @return ::rocprofiler_status_t * @retval ::ROCPROFILER_STATUS_ERROR_CONTEXT_INVALID Returned if the context does not exist. * @retval ::ROCPROFILER_STATUS_ERROR_BUFFER_NOT_FOUND Returned if the buffer is not found. diff --git a/source/include/rocprofiler-sdk/amd_detail/CMakeLists.txt b/source/include/rocprofiler-sdk/amd_detail/CMakeLists.txt index 8bfe7fe5..1e0bdd70 100644 --- a/source/include/rocprofiler-sdk/amd_detail/CMakeLists.txt +++ b/source/include/rocprofiler-sdk/amd_detail/CMakeLists.txt @@ -3,7 +3,8 @@ # Installation of amd_detail headers # # -set(ROCPROFILER_AMD_DETAIL_HEADER_FILES thread_trace.h) +set(ROCPROFILER_AMD_DETAIL_HEADER_FILES thread_trace.h thread_trace_core.h + thread_trace_dispatch.h thread_trace_agent.h) install( FILES ${ROCPROFILER_AMD_DETAIL_HEADER_FILES} diff --git a/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/code_printing.hpp b/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/code_printing.hpp index 0ed53389..6beead3b 100644 --- a/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/code_printing.hpp +++ b/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/code_printing.hpp @@ -44,6 +44,8 @@ namespace codeobj { namespace disassembly { +using marker_id_t = 
segment::marker_id_t; + struct Instruction { Instruction() = default; @@ -51,53 +53,47 @@ struct Instruction : inst(std::move(_inst)) , size(_size) {} - std::string inst; - std::string comment; - uint64_t faddr; - uint64_t vaddr; - uint64_t ld_addr; - size_t size; -}; - -struct DSourceLine -{ - uint64_t vaddr; - uint64_t size; - std::string str; - uint64_t begin() const { return vaddr; } - bool inrange(uint64_t addr) const { return addr >= vaddr && addr < vaddr + size; } + std::string inst{}; + std::string comment{}; + uint64_t faddr{0}; + uint64_t vaddr{0}; + size_t size{0}; + uint64_t ld_addr{0}; // Instruction load address, if from loaded codeobj + marker_id_t codeobj_id{0}; // Instruction code object load id, if from loaded codeobj }; class CodeobjDecoderComponent { -public: - CodeobjDecoderComponent(const void* codeobj_data, uint64_t codeobj_size) + struct ProtectedFd { - m_fd = -1; + ProtectedFd(std::string_view uri) + { #if defined(_GNU_SOURCE) && defined(MFD_ALLOW_SEALING) && defined(MFD_CLOEXEC) - m_fd = ::memfd_create(m_uri.c_str(), MFD_ALLOW_SEALING | MFD_CLOEXEC); + m_fd = ::memfd_create(uri.data(), MFD_ALLOW_SEALING | MFD_CLOEXEC); #endif - if(m_fd == -1) // If fail, attempt under /tmp - m_fd = ::open("/tmp", O_TMPFILE | O_RDWR, 0666); - - if(m_fd == -1) - { - printf("could not create a temporary file for code object\n"); - return; + if(m_fd == -1) m_fd = ::open("/tmp", O_TMPFILE | O_RDWR, 0666); + if(m_fd == -1) throw std::runtime_error("Could not create a file for codeobj!"); } - - if(size_t size = ::write(m_fd, (const char*) codeobj_data, codeobj_size); - size != codeobj_size) + ~ProtectedFd() { - printf("could not write to the temporary file\n"); - return; + if(m_fd != -1) ::close(m_fd); } - ::lseek(m_fd, 0, SEEK_SET); - fsync(m_fd); + int m_fd{-1}; + }; + +public: + CodeobjDecoderComponent(const char* codeobj_data, uint64_t codeobj_size) + { + ProtectedFd prot(""); + if(::write(prot.m_fd, codeobj_data, codeobj_size) != static_cast(codeobj_size)) + 
throw std::runtime_error("Could not write to temporary file!"); + + ::lseek(prot.m_fd, 0, SEEK_SET); + fsync(prot.m_fd); m_line_number_map = {}; - std::unique_ptr dbg(dwarf_begin(m_fd, DWARF_C_READ), + std::unique_ptr dbg(dwarf_begin(prot.m_fd, DWARF_C_READ), [](Dwarf* _dbg) { dwarf_end(_dbg); }); if(dbg) @@ -105,7 +101,7 @@ class CodeobjDecoderComponent Dwarf_Off cu_offset{0}, next_offset; size_t header_size; - std::unordered_set used_addrs; + std::map line_addrs; while(!dwarf_nextcu( dbg.get(), cu_offset, &next_offset, &header_size, nullptr, nullptr, nullptr)) @@ -129,47 +125,42 @@ class CodeobjDecoderComponent std::string src = dwarf_linesrc(line, nullptr, nullptr); auto dwarf_line = src + ':' + std::to_string(line_number); - if(used_addrs.find(addr) != used_addrs.end()) + if(line_addrs.find(addr) != line_addrs.end()) { - size_t pos = m_line_number_map.lower_bound(addr); - m_line_number_map.data()[pos].str += ' ' + dwarf_line; + line_addrs.at(addr) += ' ' + dwarf_line; continue; } - used_addrs.insert(addr); - m_line_number_map.insert(DSourceLine{addr, 0, std::move(dwarf_line)}); + line_addrs.emplace(addr, std::move(dwarf_line)); } } cu_offset = next_offset; } - } - // Can throw - disassembly = - std::make_unique((const char*) codeobj_data, codeobj_size); - if(m_line_number_map.size()) - { - size_t total_size = 0; - for(size_t i = 0; i < m_line_number_map.size() - 1; i++) + auto it = line_addrs.begin(); + if(it != line_addrs.end()) { - size_t s = m_line_number_map.get(i + 1).vaddr - m_line_number_map.get(i).vaddr; - m_line_number_map.data()[i].size = s; - total_size += s; + while(std::next(it) != line_addrs.end()) + { + uint64_t delta = std::next(it)->first - it->first; + auto segment = segment::address_range_t{it->first, delta, 0}; + m_line_number_map.emplace(segment, std::move(it->second)); + it++; + } + auto segment = segment::address_range_t{it->first, codeobj_size - it->first, 0}; + m_line_number_map.emplace(segment, std::move(it->second)); } - 
m_line_number_map.back().size = std::max(total_size, codeobj_size) - total_size; } + + // Can throw + disassembly = std::make_unique(codeobj_data, codeobj_size); try { m_symbol_map = disassembly->GetKernelMap(); // Can throw } catch(...) {} - - // disassemble_kernels(); - } - ~CodeobjDecoderComponent() - { - if(m_fd) ::close(m_fd); } + ~CodeobjDecoderComponent() {} std::optional va2fo(uint64_t vaddr) { @@ -177,35 +168,26 @@ class CodeobjDecoderComponent return {}; }; - std::shared_ptr disassemble_instruction(uint64_t faddr, uint64_t vaddr) + std::unique_ptr disassemble_instruction(uint64_t faddr, uint64_t vaddr) { if(!disassembly) throw std::exception(); - const char* cpp_line = nullptr; - - try - { - const DSourceLine& it = m_line_number_map.find_obj(vaddr); - cpp_line = it.str.data(); - } catch(...) - {} - auto pair = disassembly->ReadInstruction(faddr); - auto inst = std::make_shared(std::move(pair.first), pair.second); + auto inst = std::make_unique(std::move(pair.first), pair.second); inst->faddr = faddr; inst->vaddr = vaddr; - if(cpp_line) inst->comment = cpp_line; + auto it = m_line_number_map.find({vaddr, 0, 0}); + if(it != m_line_number_map.end()) inst->comment = it->second; + return inst; } - int m_fd; - - cached_ordered_vector m_line_number_map; - std::map m_symbol_map{}; - std::string m_uri; + std::map m_symbol_map{}; std::vector> instructions{}; std::unique_ptr disassembly{}; + + std::map m_line_number_map{}; }; class LoadedCodeobjDecoder @@ -215,7 +197,7 @@ class LoadedCodeobjDecoder : load_addr(_load_addr) , load_end(_load_addr + _memsize) { - if(!filepath) throw "Empty filepath."; + if(!filepath) throw std::runtime_error("Empty filepath."); std::string_view fpath(filepath); @@ -223,7 +205,7 @@ class LoadedCodeobjDecoder { std::ifstream file(filepath, std::ios::in | std::ios::binary); - if(!file.is_open()) throw "Invalid filename " + std::string(filepath); + if(!file.is_open()) throw std::runtime_error("Invalid file " + std::string(filepath)); 
std::vector buffer; file.seekg(0, file.end); @@ -247,33 +229,20 @@ class LoadedCodeobjDecoder decoder = std::make_unique(reinterpret_cast(data), size); } - std::shared_ptr add_to_map(uint64_t ld_addr) + std::unique_ptr get(uint64_t ld_addr) { - if(!decoder || ld_addr < load_addr) throw std::out_of_range("Addr not in decoder"); + if(!decoder || ld_addr < load_addr) return nullptr; uint64_t voffset = ld_addr - load_addr; auto faddr = decoder->va2fo(voffset); - if(!faddr) throw std::out_of_range("Could not find file offset"); + if(!faddr) return nullptr; - auto shared = decoder->disassemble_instruction(*faddr, voffset); - shared->ld_addr = ld_addr; - decoded_map[ld_addr] = shared; - return shared; + auto unique = decoder->disassemble_instruction(*faddr, voffset); + if(unique == nullptr || unique->size == 0) return nullptr; + unique->ld_addr = ld_addr; + return unique; } - std::shared_ptr get(uint64_t addr) - { - if(decoded_map.find(addr) != decoded_map.end()) return decoded_map[addr]; - try - { - return add_to_map(addr); - } catch(std::exception& e) - { - std::cerr << e.what() << " at addr " << std::hex << addr << std::dec << std::endl; - } - throw std::out_of_range("Invalid address"); - return nullptr; - } uint64_t begin() const { return load_addr; }; uint64_t end() const { return load_end; } uint64_t size() const { return load_end - load_addr; } @@ -297,10 +266,9 @@ class LoadedCodeobjDecoder const uint64_t load_addr; private: - uint64_t load_end = 0; + uint64_t load_end{0}; - std::unordered_map> decoded_map; - std::unique_ptr decoder{nullptr}; + std::unique_ptr decoder{nullptr}; }; /** @@ -312,42 +280,53 @@ class CodeobjMap CodeobjMap() = default; virtual ~CodeobjMap() = default; - virtual void addDecoder(const char* filepath, - codeobj_marker_id_t id, - uint64_t load_addr, - uint64_t memsize) + virtual void addDecoder(const char* filepath, + marker_id_t id, + uint64_t load_addr, + uint64_t memsize) { decoders[id] = std::make_shared(filepath, load_addr, memsize); } 
- virtual void addDecoder(const void* data, - size_t memory_size, - codeobj_marker_id_t id, - uint64_t load_addr, - uint64_t memsize) + virtual void addDecoder(const void* data, + size_t memory_size, + marker_id_t id, + uint64_t load_addr, + uint64_t memsize) { decoders[id] = std::make_shared(data, memory_size, load_addr, memsize); } - virtual bool removeDecoderbyId(codeobj_marker_id_t id) { return decoders.erase(id) != 0; } + virtual bool removeDecoderbyId(marker_id_t id) { return decoders.erase(id) != 0; } - std::shared_ptr get(codeobj_marker_id_t id, uint64_t offset) + std::unique_ptr get(marker_id_t id, uint64_t offset) { - auto& decoder = decoders.at(id); - return decoder->get(decoder->begin() + offset); + try + { + auto& decoder = decoders.at(id); + auto inst = decoder->get(decoder->begin() + offset); + if(inst != nullptr) inst->codeobj_id = id; + return inst; + } catch(std::out_of_range&) + {} + return nullptr; } - const char* getSymbolName(codeobj_marker_id_t id, uint64_t offset) + const char* getSymbolName(marker_id_t id, uint64_t offset) { - auto& decoder = decoders.at(id); - uint64_t vaddr = decoder->begin() + offset; - if(decoder->inrange(vaddr)) return decoder->getSymbolName(vaddr); + try + { + auto& decoder = decoders.at(id); + uint64_t vaddr = decoder->begin() + offset; + if(decoder->inrange(vaddr)) return decoder->getSymbolName(vaddr); + } catch(std::out_of_range&) + {} return nullptr; } protected: - std::unordered_map> decoders{}; + std::unordered_map> decoders{}; }; /** @@ -361,39 +340,39 @@ class CodeobjAddressTranslate : public CodeobjMap CodeobjAddressTranslate() = default; ~CodeobjAddressTranslate() override = default; - virtual void addDecoder(const char* filepath, - codeobj_marker_id_t id, - uint64_t load_addr, - uint64_t memsize) override + virtual void addDecoder(const char* filepath, + marker_id_t id, + uint64_t load_addr, + uint64_t memsize) override { this->Super::addDecoder(filepath, id, load_addr, memsize); auto ptr = decoders.at(id); 
- table.insert({ptr->begin(), ptr->size(), id, 0}); + table.insert({ptr->begin(), ptr->size(), id}); } - virtual void addDecoder(const void* data, - size_t memory_size, - codeobj_marker_id_t id, - uint64_t load_addr, - uint64_t memsize) override + virtual void addDecoder(const void* data, + size_t memory_size, + marker_id_t id, + uint64_t load_addr, + uint64_t memsize) override { this->Super::addDecoder(data, memory_size, id, load_addr, memsize); auto ptr = decoders.at(id); - table.insert({ptr->begin(), ptr->size(), id, 0}); + table.insert({ptr->begin(), ptr->size(), id}); } - virtual bool removeDecoder(codeobj_marker_id_t id, uint64_t load_addr) + virtual bool removeDecoder(marker_id_t id, uint64_t load_addr) { return table.remove(load_addr) && this->Super::removeDecoderbyId(id); } - std::shared_ptr get(uint64_t vaddr) + std::unique_ptr get(uint64_t vaddr) { - auto& addr_range = table.find_codeobj_in_range(vaddr); - return this->Super::get(addr_range.id, vaddr - addr_range.vbegin); + auto addr_range = table.find_codeobj_in_range(vaddr); + return this->Super::get(addr_range.id, vaddr - addr_range.addr); } - std::shared_ptr get(codeobj_marker_id_t id, uint64_t offset) + std::unique_ptr get(marker_id_t id, uint64_t offset) { if(id == 0) return get(offset); @@ -425,7 +404,7 @@ class CodeobjAddressTranslate : public CodeobjMap return symbols; } - std::map getSymbolMap(codeobj_marker_id_t id) const + std::map getSymbolMap(marker_id_t id) const { if(decoders.find(id) == decoders.end()) return {}; @@ -439,7 +418,7 @@ class CodeobjAddressTranslate : public CodeobjMap } private: - CodeobjTableTranslator table; + segment::CodeobjTableTranslator table{}; }; } // namespace disassembly diff --git a/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/disassembly.hpp b/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/disassembly.hpp index 48b8c6d3..019cc7a7 100644 --- 
a/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/disassembly.hpp +++ b/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/disassembly.hpp @@ -154,10 +154,10 @@ class CodeObjectBinary if(!(size = std::stoul(size_it->second, nullptr, 0))) return; } - if(protocol == "memory") throw protocol + " protocol not supported!"; + if(protocol == "memory") throw std::runtime_error(protocol + " protocol not supported!"); std::ifstream file(decoded_path, std::ios::in | std::ios::binary); - if(!file || !file.is_open()) throw "could not open " + decoded_path; + if(!file || !file.is_open()) throw std::runtime_error("could not open " + decoded_path); if(!size) { @@ -165,7 +165,7 @@ class CodeObjectBinary size_t bytes = file.gcount(); file.clear(); - if(bytes < offset) throw "invalid uri " + decoded_path + " (file size < offset)"; + if(bytes < offset) throw std::runtime_error("invalid uri " + decoded_path); size = bytes - offset; } diff --git a/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/segment.hpp b/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/segment.hpp index f80f4ead..bcc188f6 100644 --- a/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/segment.hpp +++ b/source/include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj/segment.hpp @@ -24,139 +24,69 @@ #include #include #include +#include #include #include #include -using codeobj_marker_id_t = size_t; - -template -class ordered_vector : public std::vector +namespace rocprofiler { - using Super = std::vector; - -public: - void insert(const Type& elem) - { - size_t loc = lower_bound(elem.begin()); - if(this->size() && get(loc).begin() < elem.begin()) loc++; - this->Super::insert(this->begin() + loc, elem); - } - bool remove(const Type& elem) - { - if(!this->size()) return false; - size_t loc = lower_bound(elem.begin()); - if(get(loc) != elem) return false; +namespace codeobj +{ +namespace segment +{ +using marker_id_t = size_t; - 
this->Super::erase(this->begin() + loc); - return true; - } - bool remove(uint64_t elem_begin) - { - if(!this->size()) return false; - size_t loc = lower_bound(elem_begin); - if(get(loc).begin() != elem_begin) return false; +struct address_range_t +{ + uint64_t addr{0}; + uint64_t size{0}; + marker_id_t id{0}; - this->Super::erase(this->begin() + loc); - return true; - } - size_t lower_bound(size_t addr) const + bool operator==(const address_range_t& other) const { - if(!this->size()) return 0; - return binary_search(addr, 0, this->size() - 1); + return (addr >= other.addr && addr < other.addr + other.size) || + (other.addr >= addr && other.addr < addr + size); } - - size_t binary_search(size_t addr, size_t s, size_t e) const + bool operator<(const address_range_t& other) const { - if(s >= e) - return s; - else if(s + 1 == e) - return (get(e).begin() <= addr) ? e : s; - - size_t mid = (s + e) / 2; - if(get(mid).begin() <= addr) - return binary_search(addr, mid, e); - else - return binary_search(addr, s, mid); + if(*this == other) return false; + return addr < other.addr; } - const Type& get(size_t i) const { return this->operator[](i); } + bool inrange(uint64_t _addr) const { return addr <= _addr && addr + size > _addr; }; }; /** * @brief Finds a candidate codeobj for the given vaddr */ -template -class cached_ordered_vector : public ordered_vector +class CodeobjTableTranslator : public std::set { - using Super = ordered_vector; + using Super = std::set; public: - cached_ordered_vector() { reset(); } - - const Type& find_obj(uint64_t addr) - { - if(testCache(addr)) return get(cached_segment); - - size_t lb = this->lower_bound(addr); - if(lb >= this->size() || !get(lb).inrange(addr)) - throw std::string("segment addr out of range"); - - cached_segment = lb; - return get(cached_segment); - } - - uint64_t find_addr(uint64_t addr) { return find_obj(addr).begin(); } - - bool testCache(uint64_t addr) const + address_range_t find_codeobj_in_range(uint64_t addr) { - return 
this->cached_segment < this->size() && get(cached_segment).inrange(addr); + if(!cached_segment.inrange(addr)) + { + auto it = this->find(address_range_t{addr, 0, 0}); + if(it == this->end()) throw std::exception(); + cached_segment = *it; + } + return cached_segment; } - const Type& get(size_t index) const { return this->data()[index]; } - - void insert(const Type& elem) { this->Super::insert(elem); } - void insert_list(std::vector arange) - { - for(auto& elem : arange) - push_back(elem); - std::sort(this->begin(), this->end(), [](const Type& a, const Type& b) { - return a.begin() < b.begin(); - }); - }; - - void reset() { cached_segment = ~0; } - void clear() - { - reset(); - this->Super::clear(); - } - bool remove(uint64_t addr) + void clear_cache() { cached_segment = {}; } + bool remove(const address_range_t& range) { - reset(); - return this->Super::remove(addr); + clear_cache(); + return this->erase(range) != 0; } + bool remove(uint64_t addr) { return remove(address_range_t{addr, 0, 0}); } private: - size_t cached_segment = ~0; + address_range_t cached_segment{}; }; -struct address_range_t -{ - uint64_t vbegin; - uint64_t size; - codeobj_marker_id_t id; - uint64_t offset; - - bool operator<(const address_range_t& other) const { return vbegin < other.vbegin; } - bool inrange(uint64_t addr) const { return addr >= vbegin && addr < vbegin + size; }; - uint64_t begin() const { return vbegin; } -}; - -/** - * @brief Finds a candidate codeobj for the given vaddr - */ -class CodeobjTableTranslator : public cached_ordered_vector -{ -public: - const address_range_t& find_codeobj_in_range(uint64_t addr) { return this->find_obj(addr); } -}; +} // namespace segment +} // namespace codeobj +} // namespace rocprofiler diff --git a/source/include/rocprofiler-sdk/amd_detail/thread_trace.h b/source/include/rocprofiler-sdk/amd_detail/thread_trace.h index 5e769bb0..e089bc5e 100644 --- a/source/include/rocprofiler-sdk/amd_detail/thread_trace.h +++ 
b/source/include/rocprofiler-sdk/amd_detail/thread_trace.h @@ -22,175 +22,6 @@ #pragma once -#include -#include -#include -#include - -ROCPROFILER_EXTERN_C_INIT - -/** - * @defgroup THREAD_TRACE Thread Trace Service - * @brief Provides API calls to enable and handle thread trace data - * - * @{ - */ - -typedef enum -{ - ROCPROFILER_ATT_PARAMETER_TARGET_CU = 0, ///< Select the Target CU or WGP - ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK, ///< Bitmask of shader engines. - ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE, ///< Size of combined GPU buffer for ATT - ROCPROFILER_ATT_PARAMETER_SIMD_SELECT, ///< Bitmask (GFX9) or ID (Navi) of SIMDs - ROCPROFILER_ATT_PARAMETER_CODE_OBJECT_TRACE_ENABLE, ///< Enables Codeobj Markers IDs into ATT - ROCPROFILER_ATT_PARAMETER_LAST -} rocprofiler_att_parameter_type_t; - -typedef struct -{ - rocprofiler_att_parameter_type_t type; - uint64_t value; -} rocprofiler_att_parameter_t; - -typedef enum -{ - ROCPROFILER_ATT_CONTROL_NONE = 0, - ROCPROFILER_ATT_CONTROL_START = 1, - ROCPROFILER_ATT_CONTROL_STOP = 2, - ROCPROFILER_ATT_CONTROL_START_AND_STOP = 3 -} rocprofiler_att_control_flags_t; - -/** - * @brief Callback to be triggered every kernel dispatch, indicating to start and/or stop ATT - */ -typedef rocprofiler_att_control_flags_t (*rocprofiler_att_dispatch_callback_t)( - rocprofiler_queue_id_t queue_id, - const rocprofiler_agent_t* agent, - rocprofiler_correlation_id_t correlation_id, - rocprofiler_kernel_id_t kernel_id, - void* userdata); - -/** - * @brief Callback to be triggered every time some ATT data is generated by the device - * @param [in] shader_engine_id ID of shader engine, as enabled by SE_MASK - * @param [in] data Pointer to the buffer containing the ATT data - * @param [in] data_size Number of bytes in "data" - * @param [in] userdata Passed back to user - */ -typedef void (*rocprofiler_att_shader_data_callback_t)(int64_t shader_engine_id, - void* data, - size_t data_size, - void* userdata); - -/** - * @brief Enables the 
advanced thread trace service. - * @param [in] context_id context_id. - * @param [in] parameters List of ATT-specific parameters. - * @param [in] num_parameters Number of parameters. Zero is allowed. - * @param [in] dispatch_callback Control fn which decides when ATT starts/stop collecting. - * @param [in] shader_callback Callback fn where the collected data will be sent to. - * @param [in] callback_userdata Passed back to user. - */ -rocprofiler_status_t -rocprofiler_configure_thread_trace_service(rocprofiler_context_id_t context_id, - rocprofiler_att_parameter_t* parameters, - size_t num_parameters, - rocprofiler_att_dispatch_callback_t dispatch_callback, - rocprofiler_att_shader_data_callback_t shader_callback, - void* callback_userdata) ROCPROFILER_API; - -/** - * @brief Callback for rocprofiler to parsed ATT data. - * The caller must copy a desired instruction on isa_instruction and source_reference, - * while obeying the max length passed by the caller. - * If the caller's length is insufficient, then this function writes the minimum sizes to isa_size - * and source_size and returns ROCPROFILER_STATUS_ERROR_OUT_OF_RESOURCES. - * If call returns ROCPROFILER_STATUS_SUCCESS, isa_size and source_size are written with bytes used. - * @param[out] isa_instruction Where to copy the ISA line to. - * @param[out] isa_memory_size (Auto) The number of bytes to next instruction. 0 for custom ISA. - * @param[inout] isa_size Size of returned ISA string. - * @param[in] marker_id The generated ATT marker for given codeobject ID. - * @param[in] offset The offset from base vaddr for given codeobj ID. - * If marker_id == 0, this parameter is raw virtual address with no codeobj ID information. - * @param[in] userdata Arbitrary data pointer to be sent back to the user via callback. - * @retval ROCPROFILER_STATUS_SUCCESS on success. - * @retval ROCPROFILER_STATUS_ERROR on generic error. - * @retval ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT for invalid offset or invalid marker_id. 
- * @retval ROCPROFILER_STATUS_ERROR_OUT_OF_RESOURCES for insufficient isa_size or source_size. - */ -typedef rocprofiler_status_t (*rocprofiler_att_parser_isa_callback_t)(char* isa_instruction, - uint64_t* isa_memory_size, - uint64_t* isa_size, - uint64_t marker_id, - uint64_t offset, - void* userdata); - -/** - * @brief Callback for the ATT parser to retrieve Shader Engine data. - * Returns the amount of data filled. If no more data is available, then callback return 0 - * If the space available in the buffer is less than required for parsing the full data, - * the full data is transfered over multiple calls. - * When all data has been transfered from current shader_engine_id, the caller has the option to - * 1) Return -1 on shader_engine ID and parsing terminates - * 2) Move to the next shader engine. - * @param[out] shader_engine_id The ID of given shader engine. - * @param[out] buffer The buffer to fill up with SE data. - * @param[out] buffer_size The space available in the buffer. - * @param[in] userdata Arbitrary data pointer to be sent back to the user via callback. - * @returns Number of bytes remaining in shader engine. - * @retval 0 if no more SE data is available. Parsing will stop. - * @retval buffer_size if the buffer does not hold enough data for the current shader engine. - * @retval 0 > ret > buffer_size for partially filled buffer, and caller moves over to next SE. 
- */ -typedef uint64_t (*rocprofiler_att_parser_se_data_callback_t)(int* shader_engine_id, - uint8_t** buffer, - uint64_t* buffer_size, - void* userdata); - -typedef enum -{ - ROCPROFILER_ATT_PARSER_DATA_TYPE_ISA = 0, - ROCPROFILER_ATT_PARSER_DATA_TYPE_OCCUPANCY, -} rocprofiler_att_parser_data_type_t; - -typedef struct -{ - uint64_t marker_id; - uint64_t offset; - uint64_t hitcount; - uint64_t latency; -} rocprofiler_att_data_type_isa_t; - -typedef struct -{ - uint64_t marker_id; - uint64_t offset; - uint64_t timestamp : 63; - uint64_t enabled : 1; -} rocprofiler_att_data_type_occupancy_t; - -/** - * @brief Callback for rocprofiler to return traces back to rocprofiler. - * @param[in] att_data A datapoint retrieved from thread_trace - * @param[in] userdata Arbitrary data pointer to be sent back to the user via callback. - */ -typedef void (*rocprofiler_att_parser_trace_callback_t)(rocprofiler_att_parser_data_type_t type, - void* att_data, - void* userdata); - -/** - * @brief Iterate over all event coordinates for a given agent_t and event_t. - * @param[in] se_data_callback Callback to return shader engine data from. - * @param[in] trace_callback Callback where the trace data is returned to. - * @param[in] isa_callback Callback to return ISA lines. - * @param[in] userdata Userdata passed back to caller via callback. 
- */ -rocprofiler_status_t -rocprofiler_att_parse_data(rocprofiler_att_parser_se_data_callback_t se_data_callback, - rocprofiler_att_parser_trace_callback_t trace_callback, - rocprofiler_att_parser_isa_callback_t isa_callback, - void* userdata); - -/** @} */ - -ROCPROFILER_EXTERN_C_FINI +#include +#include +#include diff --git a/source/include/rocprofiler-sdk/amd_detail/thread_trace_agent.h b/source/include/rocprofiler-sdk/amd_detail/thread_trace_agent.h new file mode 100644 index 00000000..f1db19df --- /dev/null +++ b/source/include/rocprofiler-sdk/amd_detail/thread_trace_agent.h @@ -0,0 +1,64 @@ +// MIT License +// +// Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include +#include +#include +#include +#include + +ROCPROFILER_EXTERN_C_INIT + +/** + * @defgroup THREAD_TRACE Thread Trace Service + * @brief Provides API calls to enable and handle thread trace data + * + * @{ + */ + +/** + * @brief Configure Thread Trace Service for agent. Only one agent profile may be + * configured per context, and only one active context may profile a given agent + * at a time. Multiple agent contexts can be started at the same time if they are profiling + * different agents. + * + * @param [in] context_id context id + * @param [in] parameters List of ATT-specific parameters. + * @param [in] num_parameters Number of parameters. Zero is allowed. + * @param [in] agent_id agent to configure profiling on. + * @param [in] shader_callback Callback fn where the collected data will be sent to. + * @param [in] callback_userdata Passed back to user. + */ +rocprofiler_status_t +rocprofiler_configure_agent_thread_trace_service( + rocprofiler_context_id_t context_id, + rocprofiler_att_parameter_t* parameters, + size_t num_parameters, + rocprofiler_agent_id_t agent_id, + rocprofiler_att_shader_data_callback_t shader_callback, + void* callback_userdata) ROCPROFILER_API; + +/** @} */ + +ROCPROFILER_EXTERN_C_FINI diff --git a/source/include/rocprofiler-sdk/amd_detail/thread_trace_core.h b/source/include/rocprofiler-sdk/amd_detail/thread_trace_core.h new file mode 100644 index 00000000..78d1ca11 --- /dev/null +++ b/source/include/rocprofiler-sdk/amd_detail/thread_trace_core.h @@ -0,0 +1,172 @@ +// MIT License +// +// Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#pragma once + +#include +#include +#include +#include + +ROCPROFILER_EXTERN_C_INIT + +/** + * @defgroup THREAD_TRACE Thread Trace Service + * @brief Provides API calls to enable and handle thread trace data + * + * @{ + */ + +typedef enum +{ + ROCPROFILER_ATT_PARAMETER_TARGET_CU = 0, ///< Select the Target CU or WGP + ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK, ///< Bitmask of shader engines. 
+ ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE, ///< Size of combined GPU buffer for ATT + ROCPROFILER_ATT_PARAMETER_SIMD_SELECT, ///< Bitmask (GFX9) or ID (Navi) of SIMDs + ROCPROFILER_ATT_PARAMETER_PERFCOUNTERS_CTRL, + ROCPROFILER_ATT_PARAMETER_PERFCOUNTER, + ROCPROFILER_ATT_PARAMETER_LAST +} rocprofiler_att_parameter_type_t; + +typedef struct +{ + rocprofiler_att_parameter_type_t type; + union + { + uint64_t value; + struct + { + rocprofiler_counter_id_t counter_id; + uint64_t simd_mask : 4; + }; + }; +} rocprofiler_att_parameter_t; + +/** + * @brief Callback to be triggered every time some ATT data is generated by the device + * @param [in] shader_engine_id ID of shader engine, as enabled by SE_MASK + * @param [in] data Pointer to the buffer containing the ATT data + * @param [in] data_size Number of bytes in "data" + * @param [in] userdata_dispatch Passed back to user from rocprofiler_att_dispatch_callback_t() + * @param [in] userdata_config Passed back to user from configure_[...]_service() + */ +typedef void (*rocprofiler_att_shader_data_callback_t)(int64_t shader_engine_id, + void* data, + size_t data_size, + rocprofiler_user_data_t userdata); + +/** + * @brief Callback for rocprofiler to parse ATT data. + * The caller must copy the desired instruction into isa_instruction and source_reference, + * while obeying the max length passed by the caller. + * If the caller's length is insufficient, then this function writes the minimum sizes to isa_size + * and source_size and returns ROCPROFILER_STATUS_ERROR_OUT_OF_RESOURCES. + * If the call returns ROCPROFILER_STATUS_SUCCESS, isa_size and source_size are written with bytes used. + * @param[out] isa_instruction Where to copy the ISA line to. + * @param[out] isa_memory_size (Auto) The number of bytes to next instruction. 0 for custom ISA. + * @param[inout] isa_size Size of returned ISA string. + * @param[in] marker_id The generated ATT marker for given codeobject ID. 
+ * @param[in] offset The offset from base vaddr for given codeobj ID. + * If marker_id == 0, this parameter is raw virtual address with no codeobj ID information. + * @param[in] userdata Arbitrary data pointer to be sent back to the user via callback. + * @retval ROCPROFILER_STATUS_SUCCESS on success. + * @retval ROCPROFILER_STATUS_ERROR on generic error. + * @retval ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT for invalid offset or invalid marker_id. + * @retval ROCPROFILER_STATUS_ERROR_OUT_OF_RESOURCES for insufficient isa_size or source_size. + */ +typedef rocprofiler_status_t (*rocprofiler_att_parser_isa_callback_t)(char* isa_instruction, + uint64_t* isa_memory_size, + uint64_t* isa_size, + uint64_t marker_id, + uint64_t offset, + void* userdata); + +/** + * @brief Callback for the ATT parser to retrieve Shader Engine data. + * Returns the amount of data filled. If no more data is available, then the callback returns 0. + * If the space available in the buffer is less than required for parsing the full data, + * the full data is transferred over multiple calls. + * When all data has been transferred from current shader_engine_id, the caller has the option to + * 1) Return -1 on shader_engine ID and parsing terminates + * 2) Move to the next shader engine. + * @param[out] shader_engine_id The ID of given shader engine. + * @param[out] buffer The buffer to fill up with SE data. + * @param[out] buffer_size The space available in the buffer. + * @param[in] userdata Arbitrary data pointer to be sent back to the user via callback. + * @returns Number of bytes remaining in shader engine. + * @retval 0 if no more SE data is available. Parsing will stop. + * @retval ret Where 0 < ret < buffer_size for partially filled buffer, and caller moves over to + * next SE. + * @retval buffer_size if the buffer does not hold enough data for the current shader engine. 
+ */ +typedef uint64_t (*rocprofiler_att_parser_se_data_callback_t)(int* shader_engine_id, + uint8_t** buffer, + uint64_t* buffer_size, + void* userdata); + +typedef enum +{ + ROCPROFILER_ATT_PARSER_DATA_TYPE_ISA = 0, + ROCPROFILER_ATT_PARSER_DATA_TYPE_OCCUPANCY, +} rocprofiler_att_parser_data_type_t; + +typedef struct +{ + uint64_t marker_id; + uint64_t offset; + uint64_t hitcount; + uint64_t latency; +} rocprofiler_att_data_type_isa_t; + +typedef struct +{ + uint64_t marker_id; + uint64_t offset; + uint64_t timestamp : 63; + uint64_t enabled : 1; +} rocprofiler_att_data_type_occupancy_t; + +/** + * @brief Callback for rocprofiler to return traces back to rocprofiler. + * @param[in] att_data A datapoint retrieved from thread_trace + * @param[in] userdata Arbitrary data pointer to be sent back to the user via callback. + */ +typedef void (*rocprofiler_att_parser_trace_callback_t)(rocprofiler_att_parser_data_type_t type, + void* att_data, + void* userdata); + +/** + * @brief Iterate over all event coordinates for a given agent_t and event_t. + * @param[in] se_data_callback Callback to return shader engine data from. + * @param[in] trace_callback Callback where the trace data is returned to. + * @param[in] isa_callback Callback to return ISA lines. + * @param[in] userdata Userdata passed back to caller via callback. + */ +rocprofiler_status_t +rocprofiler_att_parse_data(rocprofiler_att_parser_se_data_callback_t se_data_callback, + rocprofiler_att_parser_trace_callback_t trace_callback, + rocprofiler_att_parser_isa_callback_t isa_callback, + void* userdata); + +/** @} */ + +ROCPROFILER_EXTERN_C_FINI diff --git a/source/include/rocprofiler-sdk/amd_detail/thread_trace_dispatch.h b/source/include/rocprofiler-sdk/amd_detail/thread_trace_dispatch.h new file mode 100644 index 00000000..686a77d1 --- /dev/null +++ b/source/include/rocprofiler-sdk/amd_detail/thread_trace_dispatch.h @@ -0,0 +1,82 @@ +// MIT License +// +// Copyright (c) 2024 Advanced Micro Devices, Inc. 
All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include +#include +#include +#include +#include + +ROCPROFILER_EXTERN_C_INIT + +/** + * @defgroup THREAD_TRACE Thread Trace Service + * @brief Provides API calls to enable and handle thread trace data + * + * @{ + */ + +typedef enum +{ + ROCPROFILER_ATT_CONTROL_NONE = 0, + ROCPROFILER_ATT_CONTROL_START = 1, + ROCPROFILER_ATT_CONTROL_STOP = 2, + ROCPROFILER_ATT_CONTROL_START_AND_STOP = 3 +} rocprofiler_att_control_flags_t; + +/** + * @brief Callback to be triggered every kernel dispatch, indicating to start and/or stop ATT + */ +typedef rocprofiler_att_control_flags_t (*rocprofiler_att_dispatch_callback_t)( + rocprofiler_queue_id_t queue_id, + const rocprofiler_agent_t* agent, + rocprofiler_correlation_id_t correlation_id, + rocprofiler_kernel_id_t kernel_id, + rocprofiler_dispatch_id_t dispatch_id, + rocprofiler_user_data_t* userdata_shader, + void* userdata_config); + +/** + * @brief Enables the advanced thread trace service for dispatch-based tracing. + * The tool has an option to enable/disable thread trace on every dispatch callback. + * This service enables kernel serialization. + * @param [in] context_id context_id. + * @param [in] parameters List of ATT-specific parameters. + * @param [in] num_parameters Number of parameters. Zero is allowed. + * @param [in] dispatch_callback Control fn which decides when ATT starts/stops collecting. + * @param [in] shader_callback Callback fn where the collected data will be sent to. + * @param [in] callback_userdata Passed back to user. 
+ */ +rocprofiler_status_t +rocprofiler_configure_dispatch_thread_trace_service( + rocprofiler_context_id_t context_id, + rocprofiler_att_parameter_t* parameters, + size_t num_parameters, + rocprofiler_att_dispatch_callback_t dispatch_callback, + rocprofiler_att_shader_data_callback_t shader_callback, + void* callback_userdata) ROCPROFILER_API; + +/** @} */ + +ROCPROFILER_EXTERN_C_FINI diff --git a/source/include/rocprofiler-sdk/counters.h b/source/include/rocprofiler-sdk/counters.h index 76afc2cc..83932c12 100644 --- a/source/include/rocprofiler-sdk/counters.h +++ b/source/include/rocprofiler-sdk/counters.h @@ -87,7 +87,6 @@ typedef rocprofiler_status_t (*rocprofiler_available_dimensions_cb_t)( * @param [in] user_data data to pass into the callback * @return ::rocprofiler_status_t * @retval ROCPROFILER_STATUS_SUCCESS if dimension exists - * @retval ROCPROFILER_STATUS_ERROR_HSA_NOT_LOADED if HSA is not loaded when this is called * @retval ROCPROFILER_STATUS_ERROR_COUNTER_NOT_FOUND if counter is not found * @retval ROCPROFILER_STATUS_ERROR_DIM_NOT_FOUND if counter does not have this dimension */ @@ -115,11 +114,6 @@ rocprofiler_query_counter_info(rocprofiler_counter_id_t counter_id, /** * @brief This call returns the number of instances specific counter contains. - * WARNING: There is a restriction on this call in the alpha/beta release - * of rocprof. This call will not return correct instance information in - * tool_init and must be called as part of the dispatch callback for accurate - * instance counting information. The reason for this restriction is that HSA - * is not yet loaded on tool_init. 
* * @param [in] agent_id rocprofiler agent identifier * @param [in] counter_id counter id (obtained from iterate_agent_supported_counters) diff --git a/source/include/rocprofiler-sdk/defines.h b/source/include/rocprofiler-sdk/defines.h index 01117ab1..00a3ffbc 100644 --- a/source/include/rocprofiler-sdk/defines.h +++ b/source/include/rocprofiler-sdk/defines.h @@ -22,6 +22,8 @@ #pragma once +#include + /** * @defgroup SYMBOL_VERSIONING_GROUP Symbol Versions * @@ -129,3 +131,9 @@ # define ROCPROFILER_EXTERN_C_FINI # define ROCPROFILER_CXX_CODE(...) #endif + +#if __cplusplus +# define ROCPROFILER_UINT64_C(value) uint64_t(value) +#else +# define ROCPROFILER_UINT64_C(value) UINT64_C(value) +#endif diff --git a/source/include/rocprofiler-sdk/fwd.h b/source/include/rocprofiler-sdk/fwd.h index 781d4ddd..d1e67203 100644 --- a/source/include/rocprofiler-sdk/fwd.h +++ b/source/include/rocprofiler-sdk/fwd.h @@ -105,6 +105,7 @@ typedef enum // NOLINT(performance-enum-size) ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE, ///< The service is not available. ///< Please refer to API functions that return this ///< status code for more information. + ROCPROFILER_STATUS_ERROR_EXCEEDS_HW_LIMIT, ///< Exceeds hardware limits for collection ROCPROFILER_STATUS_LAST, } rocprofiler_status_t; @@ -520,9 +521,9 @@ typedef struct } rocprofiler_correlation_id_t; /** - * @brief The NULL correlation ID value. + * @brief The NULL value of an internal correlation ID. 
*/ -#define ROCPROFILER_CORRELATION_ID_VALUE_NONE 0ULL +#define ROCPROFILER_CORRELATION_ID_INTERNAL_NONE ROCPROFILER_UINT64_C(0) /** * @struct rocprofiler_buffer_id_t diff --git a/source/include/rocprofiler-sdk/hip/api_args.h b/source/include/rocprofiler-sdk/hip/api_args.h index 6210c61b..31a70b7f 100644 --- a/source/include/rocprofiler-sdk/hip/api_args.h +++ b/source/include/rocprofiler-sdk/hip/api_args.h @@ -33,6 +33,7 @@ #include #include +#include ROCPROFILER_EXTERN_C_INIT @@ -2773,15 +2774,222 @@ typedef union rocprofiler_hip_api_args_t { hipStream_t stream; } hipGetStreamDeviceId; - // struct - // { - // hipGraphNode_t* phGraphNode; - // hipGraph_t hGraph; - // const hipGraphNode_t* dependencies; - // size_t numDependencies; - // const HIP_MEMSET_NODE_PARAMS* memsetParams; - // hipCtx_t ctx; - // } hipDrvGraphAddMemsetNode; + struct + { + hipGraphNode_t* phGraphNode; + hipGraph_t hGraph; + const hipGraphNode_t* dependencies; + size_t numDependencies; + const HIP_MEMSET_NODE_PARAMS* memsetParams; + hipCtx_t ctx; + } hipDrvGraphAddMemsetNode; + struct + { + hipGraphNode_t* pGraphNode; + hipGraph_t graph; + const hipGraphNode_t* pDependencies; + size_t numDependencies; + const hipExternalSemaphoreWaitNodeParams* nodeParams; + } hipGraphAddExternalSemaphoresWaitNode; + struct + { + hipGraphNode_t* pGraphNode; + hipGraph_t graph; + const hipGraphNode_t* pDependencies; + size_t numDependencies; + const hipExternalSemaphoreSignalNodeParams* nodeParams; + } hipGraphAddExternalSemaphoresSignalNode; + struct + { + hipGraphNode_t hNode; + const hipExternalSemaphoreSignalNodeParams* nodeParams; + } hipGraphExternalSemaphoresSignalNodeSetParams; + struct + { + hipGraphNode_t hNode; + const hipExternalSemaphoreWaitNodeParams* nodeParams; + } hipGraphExternalSemaphoresWaitNodeSetParams; + struct + { + hipGraphNode_t hNode; + hipExternalSemaphoreSignalNodeParams* params_out; + } hipGraphExternalSemaphoresSignalNodeGetParams; + struct + { + hipGraphNode_t hNode; + 
hipExternalSemaphoreWaitNodeParams* params_out; + } hipGraphExternalSemaphoresWaitNodeGetParams; + struct + { + hipGraphExec_t hGraphExec; + hipGraphNode_t hNode; + const hipExternalSemaphoreSignalNodeParams* nodeParams; + } hipGraphExecExternalSemaphoresSignalNodeSetParams; + struct + { + hipGraphExec_t hGraphExec; + hipGraphNode_t hNode; + const hipExternalSemaphoreWaitNodeParams* nodeParams; + } hipGraphExecExternalSemaphoresWaitNodeSetParams; + struct + { + hipGraphNode_t* pGraphNode; + hipGraph_t graph; + const hipGraphNode_t* pDependencies; + size_t numDependencies; + hipGraphNodeParams* nodeParams; + } hipGraphAddNode; + struct + { + hipGraphExec_t* pGraphExec; + hipGraph_t graph; + hipGraphInstantiateParams* instantiateParams; + } hipGraphInstantiateWithParams; + struct + { + // Empty struct has a size of 0 in C but size of 1 in C++. + // Add the rocprofiler_hip_api_no_args struct to fix this + rocprofiler_hip_api_no_args no_args; + } hipExtGetLastError; + struct + { + float* pBorderColor; + const textureReference* texRef; + } hipTexRefGetBorderColor; + struct + { + hipArray_t* pArray; + const textureReference* texRef; + } hipTexRefGetArray; +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 1 + struct + { + const char* symbol; + void** pfn; + int hipVersion; + uint64_t flags; + hipDriverProcAddressQueryResult* symbolStatus; + } hipGetProcAddress; +#endif +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 2 + struct + { + hipStream_t stream; + hipGraph_t graph; + const hipGraphNode_t* dependencies; + const hipGraphEdgeData* dependencyData; + size_t numDependencies; + hipStreamCaptureMode mode; + } hipStreamBeginCaptureToGraph; +#endif +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 3 + struct + { + hipFunction_t* functionPtr; + const void* symbolPtr; + } hipGetFuncBySymbol; +#endif +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 4 + struct + { + hipGraphNode_t* phGraphNode; + hipGraph_t hGraph; + const hipGraphNode_t* dependencies; + size_t numDependencies; + hipDeviceptr_t dptr; + 
} hipDrvGraphAddMemFreeNode; + struct + { + hipGraphExec_t hGraphExec; + hipGraphNode_t hNode; + const HIP_MEMCPY3D* copyParams; + hipCtx_t ctx; + } hipDrvGraphExecMemcpyNodeSetParams; + struct + { + hipGraphExec_t hGraphExec; + hipGraphNode_t hNode; + const HIP_MEMSET_NODE_PARAMS* memsetParams; + hipCtx_t ctx; + } hipDrvGraphExecMemsetNodeSetParams; + struct + { + int* device_arr; + int len; + } hipSetValidDevices; + struct + { + hipDeviceptr_t dstDevice; + hipArray_t srcArray; + size_t srcOffset; + size_t ByteCount; + } hipMemcpyAtoD; + struct + { + hipArray_t dstArray; + size_t dstOffset; + hipDeviceptr_t srcDevice; + size_t ByteCount; + } hipMemcpyDtoA; + struct + { + hipArray_t dstArray; + size_t dstOffset; + hipArray_t srcArray; + size_t srcOffset; + size_t ByteCount; + } hipMemcpyAtoA; + struct + { + void* dstHost; + hipArray_t srcArray; + size_t srcOffset; + size_t ByteCount; + hipStream_t stream; + } hipMemcpyAtoHAsync; + struct + { + hipArray_t dstArray; + size_t dstOffset; + const void* srcHost; + size_t ByteCount; + hipStream_t stream; + } hipMemcpyHtoAAsync; + struct + { + hipArray_t dst; + size_t wOffsetDst; + size_t hOffsetDst; + hipArray_const_t src; + size_t wOffsetSrc; + size_t hOffsetSrc; + size_t width; + size_t height; + hipMemcpyKind kind; + } hipMemcpy2DArrayToArray; + struct + { + hipGraphExec_t graphExec; + unsigned long long* flags; + } hipGraphExecGetFlags; + struct + { + hipGraphNode_t node; + hipGraphNodeParams* nodeParams; + } hipGraphNodeSetParams; + struct + { + hipGraphExec_t graphExec; + hipGraphNode_t node; + hipGraphNodeParams* nodeParams; + } hipGraphExecNodeSetParams; + struct + { + hipMipmappedArray_t* mipmap; + hipExternalMemory_t extMem; + const hipExternalMemoryMipmappedArrayDesc* mipmapDesc; + } hipExternalMemoryGetMappedMipmappedArray; +#endif } rocprofiler_hip_api_args_t; ROCPROFILER_EXTERN_C_FINI diff --git a/source/include/rocprofiler-sdk/hip/compiler_api_id.h b/source/include/rocprofiler-sdk/hip/compiler_api_id.h 
index e50ea467..01822494 100644 --- a/source/include/rocprofiler-sdk/hip/compiler_api_id.h +++ b/source/include/rocprofiler-sdk/hip/compiler_api_id.h @@ -22,6 +22,10 @@ #pragma once +#include + +#include + /** * @brief ROCProfiler enumeration of HIP Compiler API tracing operations */ diff --git a/source/include/rocprofiler-sdk/hip/runtime_api_id.h b/source/include/rocprofiler-sdk/hip/runtime_api_id.h index 281b9f42..e45ff8f0 100644 --- a/source/include/rocprofiler-sdk/hip/runtime_api_id.h +++ b/source/include/rocprofiler-sdk/hip/runtime_api_id.h @@ -22,6 +22,10 @@ #pragma once +#include + +#include + /** * @brief ROCProfiler enumeration of HIP runtime API tracing operations */ @@ -456,6 +460,44 @@ typedef enum // NOLINT(performance-enum-size) ROCPROFILER_HIP_RUNTIME_API_ID_hipStreamGetCaptureInfo_v2_spt, ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchHostFunc_spt, ROCPROFILER_HIP_RUNTIME_API_ID_hipGetStreamDeviceId, - // ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphAddMemsetNode, + ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphAddMemsetNode, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphAddExternalSemaphoresWaitNode, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphAddExternalSemaphoresSignalNode, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresSignalNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresWaitNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresSignalNodeGetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresWaitNodeGetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecExternalSemaphoresSignalNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecExternalSemaphoresWaitNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphAddNode, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphInstantiateWithParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipExtGetLastError, + ROCPROFILER_HIP_RUNTIME_API_ID_hipTexRefGetBorderColor, + ROCPROFILER_HIP_RUNTIME_API_ID_hipTexRefGetArray, +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 1 + 
ROCPROFILER_HIP_RUNTIME_API_ID_hipGetProcAddress, +#endif +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 2 + ROCPROFILER_HIP_RUNTIME_API_ID_hipStreamBeginCaptureToGraph, +#endif +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 3 + ROCPROFILER_HIP_RUNTIME_API_ID_hipGetFuncBySymbol, +#endif +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 4 + ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphAddMemFreeNode, + ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphExecMemcpyNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphExecMemsetNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipSetValidDevices, + ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAtoD, + ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyDtoA, + ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAtoA, + ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAtoHAsync, + ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyHtoAAsync, + ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpy2DArrayToArray, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecGetFlags, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecNodeSetParams, + ROCPROFILER_HIP_RUNTIME_API_ID_hipExternalMemoryGetMappedMipmappedArray, +#endif ROCPROFILER_HIP_RUNTIME_API_ID_LAST, } rocprofiler_hip_runtime_api_id_t; diff --git a/source/include/rocprofiler-sdk/hsa/amd_ext_api_id.h b/source/include/rocprofiler-sdk/hsa/amd_ext_api_id.h index a4a6e0ad..4301c2ed 100644 --- a/source/include/rocprofiler-sdk/hsa/amd_ext_api_id.h +++ b/source/include/rocprofiler-sdk/hsa/amd_ext_api_id.h @@ -106,6 +106,9 @@ typedef enum // NOLINT(performance-enum-size) # if HSA_AMD_EXT_API_TABLE_STEP_VERSION >= 0x02 ROCPROFILER_HSA_AMD_EXT_API_ID_hsa_amd_queue_get_info, # endif +# if HSA_AMD_EXT_API_TABLE_STEP_VERSION >= 0x03 + ROCPROFILER_HSA_AMD_EXT_API_ID_hsa_amd_vmem_address_reserve_align, +# endif #endif ROCPROFILER_HSA_AMD_EXT_API_ID_LAST, diff --git a/source/include/rocprofiler-sdk/hsa/api_args.h b/source/include/rocprofiler-sdk/hsa/api_args.h index 67a8c80e..158eed53 100644 --- 
a/source/include/rocprofiler-sdk/hsa/api_args.h +++ b/source/include/rocprofiler-sdk/hsa/api_args.h @@ -1397,6 +1397,16 @@ typedef union rocprofiler_hsa_api_args_t void* value; } hsa_amd_queue_get_info; # endif +# if HSA_AMD_EXT_API_TABLE_STEP_VERSION >= 0x03 + struct + { + void** ptr; + size_t size; + uint64_t address; + uint64_t alignment; + uint64_t flags; + } hsa_amd_vmem_address_reserve_align; +# endif #endif } rocprofiler_hsa_api_args_t; diff --git a/source/include/rocprofiler-sdk/pc_sampling.h b/source/include/rocprofiler-sdk/pc_sampling.h index e77721c4..f39438cf 100644 --- a/source/include/rocprofiler-sdk/pc_sampling.h +++ b/source/include/rocprofiler-sdk/pc_sampling.h @@ -90,6 +90,9 @@ ROCPROFILER_EXTERN_C_INIT * * Constraint4: PC sampling feature is not available within the ROCgdb. * + * Constraint5: PC sampling service cannot be used simultaneously with + * counter collection service. + * * @param [in] context_id - id of the context used for starting/stopping PC sampling service * @param [in] agent_id - id of the agent on which caller tries using PC sampling capability * @param [in] method - the type of PC sampling the caller tries to use on the agent. @@ -105,7 +108,8 @@ ROCPROFILER_EXTERN_C_INIT * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system * does not support the PC sampling feature * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver - * + * @retval ::ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT counter collection service already + * set up in the context */ rocprofiler_status_t ROCPROFILER_API rocprofiler_configure_pc_sampling_service(rocprofiler_context_id_t context_id, @@ -138,7 +142,7 @@ typedef struct /// @brief A unit used to specify the interval of the @ref method for samples generation. /// @var min_interval /// @brief the highest possible frequencey for generating samples using @ref method. 
- /// @var max_interva + /// @var max_interval /// @brief the lowest possible frequency for generating samples using @ref method } rocprofiler_pc_sampling_configuration_t; @@ -244,25 +248,25 @@ typedef struct { uint64_t size; ///< Size of this struct rocprofiler_pc_sampling_header_v1_t flags; - uint8_t chiplet; /// chiplet index - uint8_t wave_id; /// wave identifier within the workgroup + uint8_t chiplet; ///< chiplet index + uint8_t wave_id; ///< wave identifier within the workgroup uint8_t wave_issued : 1; - uint8_t reserved : 7; /// reserved 7 bits, must be zero - uint32_t hw_id; /// compute unit identifier - uint64_t pc; /// Program counter of the wave of the moment of interruption + uint8_t reserved : 7; ///< reserved 7 bits, must be zero + uint32_t hw_id; ///< compute unit identifier + uint64_t pc; ///< Program counter of the wave of the moment of interruption uint64_t exec_mask; - rocprofiler_dim3_t workgroup_id; /// wave coordinates within the workgroup + rocprofiler_dim3_t workgroup_id; ///< wave coordinates within the workgroup uint32_t wave_count; - uint64_t timestamp; /// timestamp when sample is generated + uint64_t timestamp; ///< timestamp when sample is generated rocprofiler_correlation_id_t correlation_id; rocprofiler_pc_sampling_snapshot_v1_t - snapshot; /// @see ::rocprofiler_pc_sampling_snapshot_v1_t - uint32_t reserved2; /// for future use + snapshot; ///< @see ::rocprofiler_pc_sampling_snapshot_v1_t + uint32_t reserved2; ///< for future use /// @var flags /// @brief indicates what fields of this struct are meaningful for the represented sample. /// The values depend on what the underlying GPU agent architecture supports. 
- /// @var wave_issue + /// @var wave_issued /// @brief indicates whether the wave is issueing the instruction represented by the @ref pc /// @var exec_mask /// @brief shows how many SIMD lanes of the wave were executing the instruction diff --git a/source/include/rocprofiler-sdk/profile_config.h b/source/include/rocprofiler-sdk/profile_config.h index e7c97094..322d6bd3 100644 --- a/source/include/rocprofiler-sdk/profile_config.h +++ b/source/include/rocprofiler-sdk/profile_config.h @@ -40,12 +40,18 @@ ROCPROFILER_EXTERN_C_INIT * be used across many contexts. The profile has a fixed set of counters * that are collected (and specified by counter_list). The available * counters for an agent can be queried using - * @ref rocprofiler_iterate_agent_supported_counters. + * @ref rocprofiler_iterate_agent_supported_counters. An existing profile + * may be supplied via config_id to use as a base for the new profile. + * All counters in the existing profile will be copied over to the new + * profile. The existing profile will remain unmodified and usable with + * the new profile id being returned in config_id. * * @param [in] agent_id Agent identifier * @param [in] counters_list List of GPU counters * @param [in] counters_count Size of counters list - * @param [out] config_id Identifier for GPU counters group + * @param [in,out] config_id Identifier for GPU counters group. 
If an existing + profile is supplied, that profile's counters will be copied + over to a new profile (returned via this id) * @return ::rocprofiler_status_t * @retval ROCPROFILER_STATUS_SUCCESS if profile created * @retval ROCPROFILER_STATUS_ERROR if profile could not be created diff --git a/source/lib/common/CMakeLists.txt b/source/lib/common/CMakeLists.txt index b10a6e3f..8e4ebe5c 100644 --- a/source/lib/common/CMakeLists.txt +++ b/source/lib/common/CMakeLists.txt @@ -6,6 +6,7 @@ rocprofiler_activate_clang_tidy() set(common_sources environment.cpp demangle.cpp logging.cpp static_object.cpp utility.cpp xml.cpp) set(common_headers + abi.hpp defines.hpp environment.hpp demangle.hpp diff --git a/source/lib/common/abi.hpp b/source/lib/common/abi.hpp new file mode 100644 index 00000000..9f9015c5 --- /dev/null +++ b/source/lib/common/abi.hpp @@ -0,0 +1,74 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#pragma once + +#include + +#include "lib/common/defines.hpp" + +#include + +namespace rocprofiler +{ +namespace common +{ +namespace abi +{ +constexpr auto +compute_table_offset(size_t num_funcs) +{ + return (num_funcs * sizeof(void*)) + sizeof(size_t); +} +} // namespace abi +} // namespace common +} // namespace rocprofiler + +// ROCP_SDK_ENFORCE_ABI_VERSIONING will cause a compiler error if the size of the API table +// changed (most likely due to addition of new dispatch table entry) to make sure the developer is +// reminded to update the table versioning value before changing the value in +// ROCP_SDK_ENFORCE_ABI_VERSIONING to make this static assert pass. +// +// ROCP_SDK_ENFORCE_ABI will cause a compiler error if the order of the members in the API table +// change. Do not reorder member variables and change existing ROCP_SDK_ENFORCE_ABI values -- +// always +// +// Please note: rocprofiler will do very strict compile time checks to make +// sure these versioning values are appropriately updated -- so commenting out this check, only +// updating the size field in ROCP_SDK_ENFORCE_ABI_VERSIONING, etc. will result in the +// rocprofiler-sdk failing to build and you will be forced to do the work anyway. +#if !defined(ROCPROFILER_UNSAFE_NO_VERSION_CHECK) && (defined(ROCPROFILER_CI) && ROCPROFILER_CI > 0) +# define ROCP_SDK_ENFORCE_ABI_VERSIONING(TABLE, NUM) \ + static_assert( \ + sizeof(TABLE) == ::rocprofiler::common::abi::compute_table_offset(NUM), \ + "size of the API table struct has changed. 
Update the STEP_VERSION number (or " \ + "in rare cases, the MAJOR_VERSION number)"); + +# define ROCP_SDK_ENFORCE_ABI(TABLE, ENTRY, NUM) \ + static_assert( \ + offsetof(TABLE, ENTRY) == ::rocprofiler::common::abi::compute_table_offset(NUM), \ + "ABI break for " #TABLE "." #ENTRY \ + ". Only add new function pointers to end of struct and do not rearrange them"); +#else +# define ROCP_SDK_ENFORCE_ABI_VERSIONING(TABLE, NUM) +# define ROCP_SDK_ENFORCE_ABI(TABLE, ENTRY, NUM) +#endif diff --git a/source/lib/common/utility.cpp b/source/lib/common/utility.cpp index a0457199..09c130cf 100644 --- a/source/lib/common/utility.cpp +++ b/source/lib/common/utility.cpp @@ -123,3 +123,29 @@ read_command_line(pid_t _pid) } } // namespace common } // namespace rocprofiler + +namespace +{ +std::atomic& +debugger_block() +{ + static std::atomic block = {true}; + return block; +} +} // namespace + +extern "C" { +void +rocprofiler_debugger_block() +{ + while(debugger_block().load() == true) + {}; + // debugger_block().exchange(true); +} + +void +rocprofiler_debugger_continue() +{ + debugger_block().exchange(false); +} +} diff --git a/source/lib/common/utility.hpp b/source/lib/common/utility.hpp index 5e84bd6e..b2e0eabf 100644 --- a/source/lib/common/utility.hpp +++ b/source/lib/common/utility.hpp @@ -265,3 +265,10 @@ yield(PredicateT&& predicate, } } // namespace common } // namespace rocprofiler + +extern "C" { +void +rocprofiler_debugger_block(); +void +rocprofiler_debugger_continue(); +} diff --git a/source/lib/rocprofiler-sdk-codeobj/tests/CMakeLists.txt b/source/lib/rocprofiler-sdk-codeobj/tests/CMakeLists.txt index 3181522c..a9a36a89 100644 --- a/source/lib/rocprofiler-sdk-codeobj/tests/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk-codeobj/tests/CMakeLists.txt @@ -29,5 +29,5 @@ set_tests_properties(${codeobj-library-test_TESTS} PROPERTIES TIMEOUT 10 LABELS target_compile_definitions(codeobj-library-test PRIVATE -DCODEOBJ_BINARY_DIR=\"${CMAKE_CURRENT_BINARY_DIR}/\") 
-configure_file(smallkernel.b smallkernel.b COPYONLY) +configure_file(smallkernel.bin smallkernel.bin COPYONLY) configure_file(hipcc_output.s hipcc_output.s COPYONLY) diff --git a/source/lib/rocprofiler-sdk-codeobj/tests/codeobj_library_test.cpp b/source/lib/rocprofiler-sdk-codeobj/tests/codeobj_library_test.cpp index 3ae1442f..fde0f6c8 100644 --- a/source/lib/rocprofiler-sdk-codeobj/tests/codeobj_library_test.cpp +++ b/source/lib/rocprofiler-sdk-codeobj/tests/codeobj_library_test.cpp @@ -69,7 +69,7 @@ static const std::vector& GetCodeobjContents() { static std::vector buffer = []() { - std::string filename = CODEOBJ_BINARY_DIR "smallkernel.b"; + std::string filename = CODEOBJ_BINARY_DIR "smallkernel.bin"; std::ifstream file(filename.data(), std::ios::binary); using iterator_t = std::istreambuf_iterator; @@ -84,6 +84,8 @@ GetCodeobjContents() TEST(codeobj_library, segment_test) { + using CodeobjTableTranslator = rocprofiler::codeobj::segment::CodeobjTableTranslator; + CodeobjTableTranslator table; std::unordered_set used_addr{}; @@ -92,21 +94,30 @@ TEST(codeobj_library, segment_test) for(int j = 0; j < 2500; j++) { size_t addr = rand() % 10000000; - size_t size = (rand() % 10) + 1; + size_t size = 1; if(used_addr.find(addr) != used_addr.end()) continue; used_addr.insert(addr); - table.insert({addr, addr + size, 0, 0}); + table.insert({addr, size, 0}); } - for(size_t i = 1; i < table.size(); i++) - ASSERT_LT(table[i - 1], table[i]); + ASSERT_NE(table.begin(), table.end()); + { + auto it = std::next(table.begin()); + while(it != table.end()) + { + ASSERT_LT(*std::prev(it), *it); + it++; + } + } + std::vector addr_leftover(used_addr.begin(), used_addr.end()); for(size_t i = 0; i < 2400; i++) { - size_t idx = rand() % table.size(); - auto rdelem = table[idx]; - used_addr.erase(rdelem.vbegin); - ASSERT_NE(table.remove(rdelem.vbegin), 0); + size_t idx = rand() % addr_leftover.size(); + auto addr = addr_leftover.at(idx); + ASSERT_EQ(table.remove(addr), true); + 
addr_leftover.erase(addr_leftover.begin() + idx); + used_addr.erase(addr); } } } @@ -130,7 +141,7 @@ TEST(codeobj_library, decoder_component) CodeobjDecoderComponent component(objdata.data(), objdata.size()); - std::string kernel_with_protocol = "file://" CODEOBJ_BINARY_DIR "smallkernel.b"; + std::string kernel_with_protocol = "file://" CODEOBJ_BINARY_DIR "smallkernel.bin"; LoadedCodeobjDecoder loadecomp(kernel_with_protocol.data(), loaded_offset, objdata.size()); ASSERT_EQ(component.m_symbol_map.size(), 1); @@ -181,12 +192,14 @@ TEST(codeobj_library, loaded_codeobj_component) TEST(codeobj_library, codeobj_map_test) { + using marker_id_t = rocprofiler::codeobj::segment::marker_id_t; + const std::vector& objdata = rocprofiler::testing::codeobjhelper::GetCodeobjContents(); constexpr size_t laddr1 = 0x1000; constexpr size_t laddr3 = 0x3000; uint64_t kaddr = [&objdata]() { - CodeobjDecoderComponent comp((const void*) objdata.data(), objdata.size()); + CodeobjDecoderComponent comp(objdata.data(), objdata.size()); for(auto& [addr, _] : comp.m_symbol_map) return addr; return 0ul; @@ -195,19 +208,11 @@ TEST(codeobj_library, codeobj_map_test) EXPECT_NE(kaddr, 0); disassembly::CodeobjMap map; - map.addDecoder((const void*) objdata.data(), - objdata.size(), - codeobj_marker_id_t{1}, - laddr1, - objdata.size()); - map.addDecoder((const void*) objdata.data(), - objdata.size(), - codeobj_marker_id_t{3}, - laddr3, - objdata.size()); - - EXPECT_EQ(map.get(codeobj_marker_id_t{1}, kaddr)->inst, - map.get(codeobj_marker_id_t{3}, kaddr)->inst); + const void* objdataptr = (const void*) objdata.data(); + map.addDecoder(objdataptr, objdata.size(), marker_id_t{1}, laddr1, objdata.size()); + map.addDecoder(objdataptr, objdata.size(), marker_id_t{3}, laddr3, objdata.size()); + + EXPECT_EQ(map.get(marker_id_t{1}, kaddr)->inst, map.get(marker_id_t{3}, kaddr)->inst); ASSERT_EQ(map.removeDecoderbyId(1), true); ASSERT_EQ(map.removeDecoderbyId(3), true); @@ -216,6 +221,8 @@ TEST(codeobj_library, 
codeobj_map_test) TEST(codeobj_library, codeobj_table_test) { + using marker_id_t = rocprofiler::codeobj::segment::marker_id_t; + const std::vector& hiplines = codeobjhelper::GetHipccOutput(); const std::vector& objdata = codeobjhelper::GetCodeobjContents(); constexpr size_t laddr1 = 0x1000; @@ -225,7 +232,7 @@ TEST(codeobj_library, codeobj_table_test) uint64_t kaddr = 0, memsize = 0; std::tie(kaddr, memsize) = [&objdata]() { - CodeobjDecoderComponent comp((const void*) objdata.data(), objdata.size()); + CodeobjDecoderComponent comp(objdata.data(), objdata.size()); for(auto& [addr, symbol] : comp.m_symbol_map) return std::pair(addr, symbol.mem_size); return std::pair(0, 0); @@ -233,10 +240,8 @@ TEST(codeobj_library, codeobj_table_test) ASSERT_NE(kaddr, 0); ASSERT_NE(memsize, 0); - map.addDecoder( - (const void*) objdata.data(), objdata.size(), codeobj_marker_id_t{1}, laddr1, 0x2000); - map.addDecoder( - (const void*) objdata.data(), objdata.size(), codeobj_marker_id_t{3}, laddr3, 0x2000); + map.addDecoder((const void*) objdata.data(), objdata.size(), marker_id_t{1}, laddr1, 0x2000); + map.addDecoder((const void*) objdata.data(), objdata.size(), marker_id_t{3}, laddr3, 0x2000); EXPECT_NE(map.get(laddr1 + kaddr).get(), nullptr); EXPECT_NE(map.get(laddr3 + kaddr).get(), nullptr); diff --git a/source/lib/rocprofiler-sdk-codeobj/tests/smallkernel.b b/source/lib/rocprofiler-sdk-codeobj/tests/smallkernel.bin similarity index 100% rename from source/lib/rocprofiler-sdk-codeobj/tests/smallkernel.b rename to source/lib/rocprofiler-sdk-codeobj/tests/smallkernel.bin diff --git a/source/lib/rocprofiler-sdk-tool/tool.cpp b/source/lib/rocprofiler-sdk-tool/tool.cpp index 70d54784..143e7f47 100644 --- a/source/lib/rocprofiler-sdk-tool/tool.cpp +++ b/source/lib/rocprofiler-sdk-tool/tool.cpp @@ -900,12 +900,13 @@ get_counter_info_name(uint64_t record_id) auto counter_id = rocprofiler_counter_id_t{}; ROCPROFILER_CALL(rocprofiler_query_record_counter_id(record_id, &counter_id), "query 
record counter id"); - ROCPROFILER_CALL(rocprofiler_query_counter_info(rocprofiler_counter_id_t{counter_id}, - ROCPROFILER_COUNTER_INFO_VERSION_0, - static_cast(&info)), - "query counter info"); - std::string counter_name = info.name; - return counter_name; + if(rocprofiler_query_counter_info(rocprofiler_counter_id_t{counter_id}, + ROCPROFILER_COUNTER_INFO_VERSION_0, + static_cast(&info)) != ROCPROFILER_STATUS_SUCCESS) + { + ROCP_FATAL << "Could not find name for record id: " << record_id; + } + return {info.name}; } void diff --git a/source/lib/rocprofiler-sdk/agent.cpp b/source/lib/rocprofiler-sdk/agent.cpp index ced22f8a..7a969c85 100644 --- a/source/lib/rocprofiler-sdk/agent.cpp +++ b/source/lib/rocprofiler-sdk/agent.cpp @@ -369,6 +369,9 @@ read_topology() auto data = std::vector{}; uint64_t idcount = 0; uint64_t nodecount = 0; + uint64_t cpucount = 0; + uint64_t gpucount = 0; + uint64_t unkcount = 0; while(true) { @@ -398,11 +401,12 @@ read_topology() // we may have been able to open the properties file but if it was empty, we ignore it if(properties.empty()) continue; - auto agent_info = common::init_public_api_struct(rocprofiler_agent_t{}); - agent_info.type = ROCPROFILER_AGENT_TYPE_NONE; - agent_info.logical_node_id = idcount++; - agent_info.node_id = node_id; - agent_info.id.handle = (agent_info.logical_node_id) + get_agent_offset(); + auto agent_info = common::init_public_api_struct(rocprofiler_agent_t{}); + agent_info.type = ROCPROFILER_AGENT_TYPE_NONE; + agent_info.logical_node_id = idcount++; + agent_info.node_id = node_id; + agent_info.id.handle = (agent_info.logical_node_id) + get_agent_offset(); + agent_info.logical_node_type_id = -1; if(!name_prop.empty()) agent_info.model_name = @@ -419,6 +423,15 @@ read_topology() agent_info.type = ROCPROFILER_AGENT_TYPE_CPU; else if(agent_info.simd_count > 0) agent_info.type = ROCPROFILER_AGENT_TYPE_GPU; + else + ROCP_WARNING << "agent " << agent_info.node_id << " is neither a CPU nor a GPU"; + + 
if(agent_info.type == ROCPROFILER_AGENT_TYPE_CPU) + agent_info.logical_node_type_id = cpucount++; + else if(agent_info.type == ROCPROFILER_AGENT_TYPE_GPU) + agent_info.logical_node_type_id = gpucount++; + else + agent_info.logical_node_type_id = unkcount++; read_property(properties, "mem_banks_count", agent_info.mem_banks_count); read_property(properties, "caches_count", agent_info.caches_count); @@ -631,7 +644,7 @@ auto& get_agent_caches() { static auto*& _v = common::static_object>::construct(); - return *_v; + return *CHECK_NOTNULL(_v); } struct agent_pair @@ -643,8 +656,8 @@ struct agent_pair auto& get_agent_mapping() { - static auto _v = std::vector{}; - return _v; + static auto*& _v = common::static_object>::construct(); + return *CHECK_NOTNULL(_v); } } // namespace @@ -674,27 +687,28 @@ get_agent(rocprofiler_agent_id_t id) const std::vector& get_aql_handles() { - static std::vector _v = []() { - std::vector agent_handles; - for(auto& agent : get_agents()) - { - aqlprofile_agent_info_t agent_info = { - .agent_gfxip = agent->name, - .xcc_num = agent->num_xcc, - .se_num = agent->num_shader_banks, - .cu_num = agent->cu_count, - .shader_arrays_per_se = agent->simd_arrays_per_engine}; - aqlprofile_agent_handle_t handle = {.handle = 0}; - if(aqlprofile_register_agent(&handle, &agent_info) != HSA_STATUS_SUCCESS) + static auto*& _v = + common::static_object>::construct([]() { + std::vector agent_handles; + for(auto& agent : get_agents()) { - ROCP_WARNING << "Failed to register agent " << agent->name; + aqlprofile_agent_info_t agent_info = { + .agent_gfxip = agent->name, + .xcc_num = agent->num_xcc, + .se_num = agent->num_shader_banks, + .cu_num = agent->cu_count, + .shader_arrays_per_se = agent->simd_arrays_per_engine}; + aqlprofile_agent_handle_t handle = {.handle = 0}; + if(aqlprofile_register_agent(&handle, &agent_info) != HSA_STATUS_SUCCESS) + { + ROCP_WARNING << "Failed to register agent " << agent->name; + } + agent_handles.push_back(handle); } - 
agent_handles.push_back(handle); - } - return agent_handles; - }(); + return agent_handles; + }()); - return _v; + return *CHECK_NOTNULL(_v); } const aqlprofile_agent_handle_t* @@ -782,6 +796,8 @@ construct_agent_cache(::HsaApiTable* table) "{}", fmt::join(rocp_hsa_agent_node_ids.begin(), rocp_hsa_agent_node_ids.end(), ", ")); + get_agent_caches().clear(); + get_agent_mapping().clear(); get_agent_mapping().reserve(get_agent_mapping().size() + rocp_agents.size()); auto hsa_agent_node_map = std::unordered_map{}; diff --git a/source/lib/rocprofiler-sdk/agent_profile.cpp b/source/lib/rocprofiler-sdk/agent_profile.cpp index 35470079..5b0105cf 100644 --- a/source/lib/rocprofiler-sdk/agent_profile.cpp +++ b/source/lib/rocprofiler-sdk/agent_profile.cpp @@ -47,4 +47,4 @@ rocprofiler_sample_agent_profile_counting_service(rocprofiler_context_id_t con return rocprofiler::counters::read_agent_ctx( rocprofiler::context::get_registered_context(context_id), user_data, flags); } -} \ No newline at end of file +} diff --git a/source/lib/rocprofiler-sdk/aql/aql_profile_v2.h b/source/lib/rocprofiler-sdk/aql/aql_profile_v2.h index 6de750be..4692072e 100644 --- a/source/lib/rocprofiler-sdk/aql/aql_profile_v2.h +++ b/source/lib/rocprofiler-sdk/aql/aql_profile_v2.h @@ -184,14 +184,29 @@ aqlprofile_get_pmc_info(const aqlprofile_pmc_profile_t* profile, aqlprofile_pmc_info_type_t attribute, void* value); +// Profile parameter object +typedef struct +{ + hsa_ven_amd_aqlprofile_parameter_name_t parameter_name; + union + { + uint32_t value; + struct + { + uint32_t counter_id : 28; + uint32_t simd_mask : 4; + }; + }; +} aqlprofile_att_parameter_t; + /** * @brief AQLprofile struct containing information for Advanced Thread Trace */ typedef struct { - hsa_agent_t agent; - const hsa_ven_amd_aqlprofile_parameter_t* parameters; - uint32_t parameter_count; + hsa_agent_t agent; + const aqlprofile_att_parameter_t* parameters; + uint32_t parameter_count; } aqlprofile_att_profile_t; /** diff --git 
a/source/lib/rocprofiler-sdk/aql/helpers.cpp b/source/lib/rocprofiler-sdk/aql/helpers.cpp index 0628f4af..c3d6101f 100644 --- a/source/lib/rocprofiler-sdk/aql/helpers.cpp +++ b/source/lib/rocprofiler-sdk/aql/helpers.cpp @@ -21,15 +21,15 @@ // SOFTWARE. #include "lib/rocprofiler-sdk/aql/helpers.hpp" - -#include - -#include - #include "lib/common/logging.hpp" #include "lib/common/synchronized.hpp" #include "lib/common/utility.hpp" #include "lib/rocprofiler-sdk/counters/id_decode.hpp" +#include "lib/rocprofiler-sdk/hsa/hsa.hpp" + +#include + +#include namespace rocprofiler { @@ -66,9 +66,9 @@ get_block_counters(rocprofiler_agent_id_t agent, const aqlprofile_pmc_event_t& e rocprofiler_status_t set_dim_id_from_sample(rocprofiler_counter_instance_id_t& id, - hsa_agent_t agent, - hsa_ven_amd_aqlprofile_event_t event, - uint32_t sample_id) + aqlprofile_agent_handle_t agent, + aqlprofile_pmc_event_t event, + size_t sample_id) { auto callback = [](int, int sid, int, int coordinate, const char*, void* userdata) -> hsa_status_t { @@ -82,8 +82,8 @@ set_dim_id_from_sample(rocprofiler_counter_instance_id_t& id, return HSA_STATUS_SUCCESS; }; - if(hsa_ven_amd_aqlprofile_iterate_event_coord( - agent, event, sample_id, callback, static_cast(&id)) != HSA_STATUS_SUCCESS) + if(aqlprofile_iterate_event_coord(agent, event, sample_id, callback, static_cast(&id)) != + HSA_STATUS_SUCCESS) { return ROCPROFILER_STATUS_ERROR_AQL_NO_EVENT_COORD; } @@ -114,11 +114,15 @@ get_dim_info(rocprofiler_agent_id_t agent, } rocprofiler_status_t -set_profiler_active_on_queue(const AmdExtTable& api, - hsa_amd_memory_pool_t pool, +set_profiler_active_on_queue(hsa_amd_memory_pool_t pool, hsa_agent_t hsa_agent, const rocprofiler_profile_pkt_cb& packet_submit) { + CHECK(hsa::get_amd_ext_table() != nullptr); + CHECK(hsa::get_amd_ext_table()->hsa_amd_memory_pool_allocate_fn != nullptr); + CHECK(hsa::get_amd_ext_table()->hsa_amd_agents_allow_access_fn != nullptr); + 
CHECK(hsa::get_amd_ext_table()->hsa_amd_memory_pool_free_fn != nullptr); + // Inject packet to enable profiling of other process queues on this queue hsa_ven_amd_aqlprofile_profile_t profile{}; profile.agent = hsa_agent; @@ -134,15 +138,15 @@ set_profiler_active_on_queue(const AmdExtTable& api, const size_t mask = 0x1000 - 1; auto size = (profile.command_buffer.size + mask) & ~mask; - if(api.hsa_amd_memory_pool_allocate_fn(pool, size, 0, &profile.command_buffer.ptr) != - HSA_STATUS_SUCCESS) + if(hsa::get_amd_ext_table()->hsa_amd_memory_pool_allocate_fn( + pool, size, 0, &profile.command_buffer.ptr) != HSA_STATUS_SUCCESS) { ROCP_WARNING << "Failed to allocate memory to enable profile command on agent, some " "counters will be unavailable"; return ROCPROFILER_STATUS_ERROR; } - if(api.hsa_amd_agents_allow_access_fn(1, &hsa_agent, nullptr, profile.command_buffer.ptr) != - HSA_STATUS_SUCCESS) + if(hsa::get_amd_ext_table()->hsa_amd_agents_allow_access_fn( + 1, &hsa_agent, nullptr, profile.command_buffer.ptr) != HSA_STATUS_SUCCESS) { ROCP_WARNING << "Agent cannot access memory, some counters will be unavailable"; return ROCPROFILER_STATUS_ERROR; @@ -157,7 +161,7 @@ set_profiler_active_on_queue(const AmdExtTable& api, } packet_submit(packet); - api.hsa_amd_memory_pool_free_fn(profile.command_buffer.ptr); + hsa::get_amd_ext_table()->hsa_amd_memory_pool_free_fn(profile.command_buffer.ptr); return ROCPROFILER_STATUS_SUCCESS; } diff --git a/source/lib/rocprofiler-sdk/aql/helpers.hpp b/source/lib/rocprofiler-sdk/aql/helpers.hpp index 80247987..371f35a0 100644 --- a/source/lib/rocprofiler-sdk/aql/helpers.hpp +++ b/source/lib/rocprofiler-sdk/aql/helpers.hpp @@ -22,18 +22,18 @@ #pragma once -#include -#include -#include - -#include - -#include - #include "lib/rocprofiler-sdk/agent.hpp" #include "lib/rocprofiler-sdk/counters/metrics.hpp" #include "lib/rocprofiler-sdk/hsa/rocprofiler_packet.hpp" +#include + +#include + +#include +#include +#include + namespace rocprofiler { 
namespace aql @@ -57,13 +57,12 @@ get_dim_info(rocprofiler_agent_id_t agent, // Set dimension ids into id for sample rocprofiler_status_t set_dim_id_from_sample(rocprofiler_counter_instance_id_t& id, - hsa_agent_t agent, - hsa_ven_amd_aqlprofile_event_t event, - uint32_t sample_id); + aqlprofile_agent_handle_t agent, + aqlprofile_pmc_event_t event, + size_t sample_id); rocprofiler_status_t -set_profiler_active_on_queue(const AmdExtTable& api, - hsa_amd_memory_pool_t pool, +set_profiler_active_on_queue(hsa_amd_memory_pool_t pool, hsa_agent_t hsa_agent, const rocprofiler_profile_pkt_cb& packet_submit); } // namespace aql diff --git a/source/lib/rocprofiler-sdk/aql/packet_construct.cpp b/source/lib/rocprofiler-sdk/aql/packet_construct.cpp index 47e53af4..d67bd56a 100644 --- a/source/lib/rocprofiler-sdk/aql/packet_construct.cpp +++ b/source/lib/rocprofiler-sdk/aql/packet_construct.cpp @@ -27,6 +27,7 @@ #include #include #include "glog/logging.h" +#include "rocprofiler-sdk/fwd.h" #define CHECK_HSA(fn, message) \ { \ @@ -65,14 +66,15 @@ CounterPacketConstruct::CounterPacketConstruct(rocprofiler_agent_id_t for(unsigned block_index = 0; block_index < query_info.instance_count; ++block_index) { _metrics.back().instances.push_back( - {static_cast(query_info.id), - block_index, - event_id}); + {.block_index = block_index, + .event_id = event_id, + .flags = aqlprofile_pmc_event_flags_t{x.flags()}, + .block_name = static_cast(query_info.id)}); _metrics.back().events.push_back( {.block_index = block_index, .event_id = event_id, - .flags = aqlprofile_pmc_event_flags_t{0}, + .flags = aqlprofile_pmc_event_flags_t{x.flags()}, .block_name = static_cast(query_info.id)}); bool validate_event_result; @@ -85,117 +87,45 @@ CounterPacketConstruct::CounterPacketConstruct(rocprofiler_agent_id_t &validate_event_result) != HSA_STATUS_SUCCESS); ROCP_FATAL_IF(!validate_event_result) << "Invalid Metric: " << block_index << " " << event_id; - _event_to_metric[std::make_tuple( - 
static_cast(query_info.id), - block_index, - event_id)] = x; + _event_to_metric[_metrics.back().events.back()] = x; } } - // Check that we can collect all of the metrics in a single execution - // with a single AQL packet - can_collect(); _events = get_all_events(); } std::unique_ptr -CounterPacketConstruct::construct_packet(const AmdExtTable& ext) +CounterPacketConstruct::construct_packet(const CoreApiTable& coreapi, const AmdExtTable& ext) { - auto pkt_ptr = std::make_unique(ext.hsa_amd_memory_pool_free_fn); - auto& pkt = *pkt_ptr; - if(_events.empty()) - { - ROCP_TRACE << "No events for pkt"; - return pkt_ptr; - } - pkt.empty = false; - - const auto* agent_cache = + const auto* agent = rocprofiler::agent::get_agent_cache(CHECK_NOTNULL(rocprofiler::agent::get_agent(_agent))); - if(!agent_cache) - { - ROCP_FATAL << "No agent cache for agent id: " << _agent.handle; - } - - pkt.profile = hsa_ven_amd_aqlprofile_profile_t{ - agent_cache->get_hsa_agent(), - HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC, // SPM? - _events.data(), - static_cast(_events.size()), - nullptr, - 0u, - hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0}, - hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0}}; - auto& profile = pkt.profile; + if(!agent) ROCP_FATAL << "No agent cache for agent id: " << _agent.handle; hsa_amd_memory_pool_access_t _access = HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED; - ext.hsa_amd_agent_memory_pool_get_info_fn(agent_cache->get_hsa_agent(), - agent_cache->kernarg_pool(), + ext.hsa_amd_agent_memory_pool_get_info_fn(agent->get_hsa_agent(), + agent->kernarg_pool(), HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS, static_cast(&_access)); - // Memory is accessable by both the GPU and CPU, unlock the command buffer for - // sharing. 
- if(_access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED) - { - throw std::runtime_error( - fmt::format("Agent {} does not allow memory pool access for counter collection", - agent_cache->get_hsa_agent().handle)); - } - CHECK_HSA(hsa_ven_amd_aqlprofile_start(&profile, nullptr), "could not generate packet sizes"); + hsa::CounterAQLPacket::CounterMemoryPool pool; - if(profile.command_buffer.size == 0 || profile.output_buffer.size == 0) - { - throw std::runtime_error( - fmt::format("No command or output buffer size set. CMD_BUF={} PROFILE_BUF={}", - profile.command_buffer.size, - profile.output_buffer.size)); - } + if(_access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED) pool.bIgnoreKernArg = true; - // Allocate buffers and check the results - auto alloc_and_check = [&](auto& pool, auto** mem_loc, auto size) -> bool { - bool malloced = false; - size_t page_aligned = getPageAligned(size); - if(ext.hsa_amd_memory_pool_allocate_fn( - pool, page_aligned, 0, static_cast(mem_loc)) != HSA_STATUS_SUCCESS) - { - *mem_loc = malloc(page_aligned); - malloced = true; - } - else - { - CHECK(*mem_loc); - hsa_agent_t agent = agent_cache->get_hsa_agent(); - // Memory is accessable by both the GPU and CPU, unlock the command buffer for - // sharing. 
- LOG_IF(FATAL, - ext.hsa_amd_agents_allow_access_fn(1, &agent, nullptr, *mem_loc) != - HSA_STATUS_SUCCESS) - << "Error: Allowing access to Command Buffer"; - } - return malloced; - }; - - // Build command and output buffers - pkt.command_buf_mallocd = alloc_and_check( - agent_cache->cpu_pool(), &profile.command_buffer.ptr, profile.command_buffer.size); - pkt.output_buffer_malloced = alloc_and_check( - agent_cache->kernarg_pool(), &profile.output_buffer.ptr, profile.output_buffer.size); - memset(profile.output_buffer.ptr, 0x0, profile.output_buffer.size); - - CHECK_HSA(hsa_ven_amd_aqlprofile_start(&profile, &pkt.start), "failed to create start packet"); - CHECK_HSA(hsa_ven_amd_aqlprofile_stop(&profile, &pkt.stop), "failed to create stop packet"); - CHECK_HSA(hsa_ven_amd_aqlprofile_read(&profile, &pkt.read), "failed to create read packet"); - pkt.start.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; - pkt.stop.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; - pkt.read.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; - ROCP_TRACE << fmt::format("Following Packets Generated (output_buffer={}, output_size={}). 
" - "Start Pkt: {}, Read Pkt: {}, Stop Pkt: {}", - profile.output_buffer.ptr, - profile.output_buffer.size, - pkt.start, - pkt.read, - pkt.stop); - return pkt_ptr; + pool.allocate_fn = ext.hsa_amd_memory_pool_allocate_fn; + pool.allow_access_fn = ext.hsa_amd_agents_allow_access_fn; + pool.free_fn = ext.hsa_amd_memory_pool_free_fn; + pool.api_copy_fn = coreapi.hsa_memory_copy_fn; + pool.fill_fn = ext.hsa_amd_memory_fill_fn; + + pool.gpu_agent = agent->get_hsa_agent(); + pool.cpu_pool_ = agent->cpu_pool(); + pool.kernarg_pool_ = agent->kernarg_pool(); + + const auto* aql_agent = rocprofiler::agent::get_aql_agent(agent->get_rocp_agent()->id); + if(aql_agent == nullptr) throw std::runtime_error("Could not get AQL agent!"); + + if(_events.empty()) ROCP_TRACE << "No events for pkt"; + + return std::make_unique(*aql_agent, pool, _events); } ThreadTraceAQLPacketFactory::ThreadTraceAQLPacketFactory(const hsa::AgentCache& agent, @@ -216,20 +146,39 @@ ThreadTraceAQLPacketFactory::ThreadTraceAQLPacketFactory(const hsa::AgentCache& uint32_t shader_engine_mask = static_cast(params.shader_engine_mask); uint32_t simd = static_cast(params.simd_select); uint32_t buffer_size = static_cast(params.buffer_size); + uint32_t perf_ctrl = static_cast(params.perfcounter_ctrl); aql_params.clear(); - aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_COMPUTE_UNIT_TARGET, cu}); - aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_SE_MASK, shader_engine_mask}); - aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_SIMD_SELECTION, simd}); - aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_ATT_BUFFER_SIZE, buffer_size}); + + aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_COMPUTE_UNIT_TARGET, {cu}}); + aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_SE_MASK, {shader_engine_mask}}); + aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_SIMD_SELECTION, {simd}}); + 
aql_params.push_back({HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_ATT_BUFFER_SIZE, {buffer_size}}); + + if(perf_ctrl != 0 && !params.perfcounters.empty()) + { + for(const auto& perf_counter : params.perfcounters) + { + aqlprofile_att_parameter_t param{}; + param.parameter_name = HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_PERFCOUNTER_NAME; + param.counter_id = perf_counter.first; + param.simd_mask = perf_counter.second; + aql_params.push_back(param); + } + + aqlprofile_att_parameter_t param{}; + param.parameter_name = HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_PERFCOUNTER_CTRL; + param.value = perf_ctrl - 1; + aql_params.push_back(param); + } } std::unique_ptr -ThreadTraceAQLPacketFactory::construct_packet() +ThreadTraceAQLPacketFactory::construct_control_packet() { - uint32_t num_params = static_cast(aql_params.size()); - auto profile = aqlprofile_att_profile_t{tracepool.gpu_agent, aql_params.data(), num_params}; - auto packet = std::make_unique(this->tracepool, profile); + auto num_params = static_cast(aql_params.size()); + auto profile = aqlprofile_att_profile_t{tracepool.gpu_agent, aql_params.data(), num_params}; + auto packet = std::make_unique(this->tracepool, profile); packet->clear(); return packet; } @@ -246,10 +195,10 @@ ThreadTraceAQLPacketFactory::construct_unload_marker_packet(uint64_t id) return std::make_unique(tracepool, id, 0, 0, false, true); } -std::vector +std::vector CounterPacketConstruct::get_all_events() const { - std::vector ret; + std::vector ret; for(const auto& metric : _metrics) { ret.insert(ret.end(), metric.instances.begin(), metric.instances.end()); @@ -258,11 +207,9 @@ CounterPacketConstruct::get_all_events() const } const counters::Metric* -CounterPacketConstruct::event_to_metric(const hsa_ven_amd_aqlprofile_event_t& event) const +CounterPacketConstruct::event_to_metric(const aqlprofile_pmc_event_t& event) const { - if(const auto* ptr = rocprofiler::common::get_val( - _event_to_metric, - std::make_tuple(event.block_name, event.block_index, 
event.counter_id))) + if(const auto* ptr = rocprofiler::common::get_val(_event_to_metric, event)) { return ptr; } @@ -282,7 +229,7 @@ CounterPacketConstruct::get_counter_events(const counters::Metric& metric) const throw std::runtime_error(fmt::format("Cannot Find Events for {}", metric)); } -void +rocprofiler_status_t CounterPacketConstruct::can_collect() { // Verify that the counters fit within harrdware limits @@ -307,13 +254,10 @@ CounterPacketConstruct::can_collect() { if(auto* max = CHECK_NOTNULL(common::get_val(max_allowed, block_name)); count > *max) { - throw std::runtime_error( - fmt::format("Block {} exceeds max number of hardware counters ({} > {})", - static_cast(block_name.first), - count, - *max)); + return ROCPROFILER_STATUS_ERROR_EXCEEDS_HW_LIMIT; } } + return ROCPROFILER_STATUS_SUCCESS; } } // namespace aql } // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/aql/packet_construct.hpp b/source/lib/rocprofiler-sdk/aql/packet_construct.hpp index 058e0bb6..9a78cd29 100644 --- a/source/lib/rocprofiler-sdk/aql/packet_construct.hpp +++ b/source/lib/rocprofiler-sdk/aql/packet_construct.hpp @@ -36,6 +36,25 @@ #include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" #include "lib/rocprofiler-sdk/hsa/queue.hpp" #include "lib/rocprofiler-sdk/thread_trace/att_core.hpp" +#include "rocprofiler-sdk/fwd.h" + +inline bool +operator==(aqlprofile_pmc_event_t lhs, aqlprofile_pmc_event_t rhs) +{ + if(lhs.block_name != rhs.block_name) return false; + if(lhs.block_index != rhs.block_index) return false; + if(lhs.event_id != rhs.event_id) return false; + return lhs.flags.raw == rhs.flags.raw; +} + +inline bool +operator<(aqlprofile_pmc_event_t lhs, aqlprofile_pmc_event_t rhs) +{ + if(lhs.block_name != rhs.block_name) return lhs.block_name < rhs.block_name; + if(lhs.block_index != rhs.block_index) return lhs.block_index < rhs.block_index; + if(lhs.event_id != rhs.event_id) return lhs.event_id < rhs.event_id; + return lhs.flags.raw < rhs.flags.raw; +} namespace 
rocprofiler { @@ -54,14 +73,17 @@ class CounterPacketConstruct public: CounterPacketConstruct(rocprofiler_agent_id_t agent, const std::vector& metrics); - std::unique_ptr construct_packet(const AmdExtTable&); + std::unique_ptr construct_packet(const CoreApiTable&, + const AmdExtTable&); - const counters::Metric* event_to_metric(const hsa_ven_amd_aqlprofile_event_t& event) const; - std::vector get_all_events() const; - const std::vector& get_counter_events(const counters::Metric&) const; + const counters::Metric* event_to_metric(const aqlprofile_pmc_event_t& event) const; + std::vector get_all_events() const; + const std::vector& get_counter_events(const counters::Metric&) const; rocprofiler_agent_id_t agent() const { return _agent; } + rocprofiler_status_t can_collect(); + private: static constexpr size_t MEM_PAGE_ALIGN = 0x1000; static constexpr size_t MEM_PAGE_MASK = MEM_PAGE_ALIGN - 1; @@ -70,36 +92,38 @@ class CounterPacketConstruct protected: struct AQLProfileMetric { - counters::Metric metric; - std::vector instances; - std::vector events; + counters::Metric metric; + std::vector instances; + std::vector events; }; - void can_collect(); - - rocprofiler_agent_id_t _agent; - std::vector _metrics; - std::vector _events; - std::map, counters::Metric> - _event_to_metric; + rocprofiler_agent_id_t _agent; + std::vector _metrics; + std::vector _events; + std::map _event_to_metric; }; class ThreadTraceAQLPacketFactory { + using thread_trace_parameter_pack = thread_trace::thread_trace_parameter_pack; + public: ThreadTraceAQLPacketFactory(const hsa::AgentCache& agent, const thread_trace_parameter_pack& params, const CoreApiTable& coreapi, const AmdExtTable& ext); - std::unique_ptr construct_packet(); + + std::unique_ptr construct_control_packet(); std::unique_ptr construct_load_marker_packet(uint64_t id, uint64_t addr, uint64_t size); + std::unique_ptr construct_unload_marker_packet(uint64_t id); + std::vector aql_params; + private: - hsa::TraceMemoryPool tracepool; - 
std::vector aql_params; + hsa::TraceMemoryPool tracepool; }; } // namespace aql diff --git a/source/lib/rocprofiler-sdk/aql/tests/CMakeLists.txt b/source/lib/rocprofiler-sdk/aql/tests/CMakeLists.txt index 5907cb77..99cfd326 100644 --- a/source/lib/rocprofiler-sdk/aql/tests/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk/aql/tests/CMakeLists.txt @@ -10,9 +10,14 @@ target_sources(aql-test PRIVATE ${ROCPROFILER_LIB_AQL_TEST_SOURCES}) target_link_libraries( aql-test - PRIVATE rocprofiler-sdk::rocprofiler-static-library rocprofiler-sdk::rocprofiler-glog - rocprofiler-sdk::rocprofiler-hsa-runtime rocprofiler-sdk::rocprofiler-hip - rocprofiler-sdk::rocprofiler-common-library GTest::gtest GTest::gtest_main) + PRIVATE rocprofiler-sdk::counter-test-constants + rocprofiler-sdk::rocprofiler-static-library + rocprofiler-sdk::rocprofiler-glog + rocprofiler-sdk::rocprofiler-hsa-runtime + rocprofiler-sdk::rocprofiler-hip + rocprofiler-sdk::rocprofiler-common-library + GTest::gtest + GTest::gtest_main) gtest_add_tests( TARGET aql-test diff --git a/source/lib/rocprofiler-sdk/aql/tests/aql_test.cpp b/source/lib/rocprofiler-sdk/aql/tests/aql_test.cpp index 3607cfa6..f633da2c 100644 --- a/source/lib/rocprofiler-sdk/aql/tests/aql_test.cpp +++ b/source/lib/rocprofiler-sdk/aql/tests/aql_test.cpp @@ -20,23 +20,22 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. 
-#include +#include "lib/rocprofiler-sdk/agent.hpp" +#include "lib/rocprofiler-sdk/aql/packet_construct.hpp" +#include "lib/rocprofiler-sdk/counters/metrics.hpp" +#include "lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" +#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" +#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" +#include "rocprofiler-sdk/fwd.h" -#include -#include #include +#include #include #include #include -#include "lib/rocprofiler-sdk/agent.hpp" -#include "lib/rocprofiler-sdk/aql/helpers.hpp" -#include "lib/rocprofiler-sdk/aql/packet_construct.hpp" -#include "lib/rocprofiler-sdk/counters/metrics.hpp" -#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" -#include "lib/rocprofiler-sdk/hsa/queue.hpp" -#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" +using namespace rocprofiler::counters::test_constants; namespace rocprofiler { @@ -51,6 +50,7 @@ get_ext_table() val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; + val.hsa_amd_memory_fill_fn = hsa_amd_memory_fill; return val; }(); return _v; @@ -137,18 +137,8 @@ TEST(aql_profile, too_many_counters) ROCP_INFO << fmt::format("Found Agent: {}", agent.get_hsa_agent().handle); auto metrics = rocprofiler::findDeviceMetrics(agent, {}); - EXPECT_THROW( - { - try - { - CounterPacketConstruct(agent.get_rocp_agent()->id, metrics); - } catch(const std::exception& e) - { - EXPECT_NE(e.what(), nullptr) << e.what(); - throw; - } - }, - std::runtime_error); + EXPECT_NE(CounterPacketConstruct(agent.get_rocp_agent()->id, metrics).can_collect(), + ROCPROFILER_STATUS_SUCCESS); } hsa_shut_down(); } @@ -164,7 +154,9 @@ TEST(aql_profile, packet_generation_single) { auto metrics = rocprofiler::findDeviceMetrics(agent, {"SQ_WAVES"}); CounterPacketConstruct pkt(agent.get_rocp_agent()->id, metrics); - auto test_pkt = 
pkt.construct_packet(rocprofiler::get_ext_table()); + auto test_pkt = + pkt.construct_packet(rocprofiler::get_api_table(), rocprofiler::get_ext_table()); + EXPECT_TRUE(test_pkt); } @@ -183,13 +175,15 @@ TEST(aql_profile, packet_generation_multi) auto metrics = rocprofiler::findDeviceMetrics(agent, {"SQ_WAVES", "TA_FLAT_READ_WAVEFRONTS"}); CounterPacketConstruct pkt(agent.get_rocp_agent()->id, metrics); - auto test_pkt = pkt.construct_packet(rocprofiler::get_ext_table()); + auto test_pkt = + pkt.construct_packet(rocprofiler::get_api_table(), rocprofiler::get_ext_table()); EXPECT_TRUE(test_pkt); } hsa_shut_down(); } +/* class TestAqlPacket : public rocprofiler::hsa::CounterAQLPacket { public: @@ -225,3 +219,4 @@ TEST(aql_profile, test_aql_packet) // Why is this valid? TestAqlPacket test_pkt2(false); } +*/ diff --git a/source/lib/rocprofiler-sdk/aql/tests/helpers.cpp b/source/lib/rocprofiler-sdk/aql/tests/helpers.cpp index 325912fc..1302457d 100644 --- a/source/lib/rocprofiler-sdk/aql/tests/helpers.cpp +++ b/source/lib/rocprofiler-sdk/aql/tests/helpers.cpp @@ -20,9 +20,14 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. 
-#include +#include "lib/rocprofiler-sdk/aql/helpers.hpp" +#include "lib/rocprofiler-sdk/agent.hpp" +#include "lib/rocprofiler-sdk/counters/metrics.hpp" +#include "lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" +#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" +#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" -#include +#include #include #include @@ -30,50 +35,11 @@ #include #include -#include "lib/rocprofiler-sdk/agent.hpp" -#include "lib/rocprofiler-sdk/aql/helpers.hpp" -#include "lib/rocprofiler-sdk/aql/packet_construct.hpp" -#include "lib/rocprofiler-sdk/counters/id_decode.hpp" -#include "lib/rocprofiler-sdk/counters/metrics.hpp" -#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" -#include "lib/rocprofiler-sdk/hsa/queue.hpp" -#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" - using namespace rocprofiler; +using namespace rocprofiler::counters::test_constants; namespace { -AmdExtTable& -get_ext_table() -{ - static auto _v = []() { - auto val = AmdExtTable{}; - val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; - val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; - val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; - val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; - val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; - val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; - return val; - }(); - return _v; -} - -CoreApiTable& -get_api_table() -{ - static auto _v = []() { - auto val = CoreApiTable{}; - val.hsa_iterate_agents_fn = hsa_iterate_agents; - val.hsa_agent_get_info_fn = hsa_agent_get_info; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_destroy_fn = hsa_queue_destroy; - val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; - return val; - }(); - return _v; -} - auto findDeviceMetrics(const rocprofiler_agent_t& agent, const std::unordered_set& metrics) { diff --git 
a/source/lib/rocprofiler-sdk/context/context.hpp b/source/lib/rocprofiler-sdk/context/context.hpp index a18749b9..895b6be4 100644 --- a/source/lib/rocprofiler-sdk/context/context.hpp +++ b/source/lib/rocprofiler-sdk/context/context.hpp @@ -91,18 +91,7 @@ struct dispatch_counter_collection_service struct agent_counter_collection_service { - rocprofiler::counters::agent_callback_data callback_data; - // Signal to manage the startup of the context. Allows us to ensure that - // the AQL packet we inject with start_context() completes before returning - hsa_signal_t start_signal; - std::shared_ptr profile; - rocprofiler_buffer_id_t buffer; - rocprofiler_agent_id_t agent_id; - rocprofiler_agent_profile_callback_t cb; - void* user_data; - // A flag to state wether or not the counter set is currently enabled. This is primarily - // to protect against multithreaded calls to enable a context (and enabling already - // enabled counters). + std::vector agent_data; enum class state { @@ -138,8 +127,8 @@ struct context std::unique_ptr counter_collection = {}; std::unique_ptr agent_counter_collection = {}; std::unique_ptr pc_sampler = {}; - // TODO: Make a unique pointer instead - std::shared_ptr thread_trace = {}; + + std::unique_ptr thread_trace = {}; }; // set the client index needs to be called before allocate_context() diff --git a/source/lib/rocprofiler-sdk/counters.cpp b/source/lib/rocprofiler-sdk/counters.cpp index ff9e8572..5cf63bfd 100644 --- a/source/lib/rocprofiler-sdk/counters.cpp +++ b/source/lib/rocprofiler-sdk/counters.cpp @@ -26,6 +26,7 @@ #include #include "lib/common/container/small_vector.hpp" +#include "lib/common/logging.hpp" #include "lib/common/static_object.hpp" #include "lib/common/synchronized.hpp" #include "lib/rocprofiler-sdk/agent.hpp" @@ -74,6 +75,7 @@ rocprofiler_query_counter_info(rocprofiler_counter_id_t counter_id, return ROCPROFILER_STATUS_SUCCESS; } + ROCP_ERROR << fmt::format("Could not find counter with id = {}", counter_id.handle); return 
ROCPROFILER_STATUS_ERROR_COUNTER_NOT_FOUND; } diff --git a/source/lib/rocprofiler-sdk/counters/agent_profiling.cpp b/source/lib/rocprofiler-sdk/counters/agent_profiling.cpp index f6c33916..2a4c9a0c 100644 --- a/source/lib/rocprofiler-sdk/counters/agent_profiling.cpp +++ b/source/lib/rocprofiler-sdk/counters/agent_profiling.cpp @@ -21,17 +21,21 @@ // SOFTWARE. #include "lib/rocprofiler-sdk/counters/agent_profiling.hpp" -#include - #include "lib/common/logging.hpp" #include "lib/rocprofiler-sdk/buffer.hpp" #include "lib/rocprofiler-sdk/context/context.hpp" #include "lib/rocprofiler-sdk/counters/controller.hpp" #include "lib/rocprofiler-sdk/counters/core.hpp" #include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" +#include "lib/rocprofiler-sdk/hsa/hsa.hpp" #include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" #include "lib/rocprofiler-sdk/hsa/rocprofiler_packet.hpp" -#include "rocprofiler-sdk/fwd.h" + +#include + +#include +#include +#include namespace rocprofiler { @@ -45,13 +49,15 @@ hsa_inited() } uint64_t -submitPacket(const CoreApiTable& table, hsa_queue_t* queue, const void* packet) +submitPacket(hsa_queue_t* queue, const void* packet) { const uint32_t pkt_size = 0x40; // advance command queue - const uint64_t write_idx = table.hsa_queue_add_write_index_scacq_screl_fn(queue, 1); - while((write_idx - table.hsa_queue_load_read_index_relaxed_fn(queue)) >= queue->size) + const uint64_t write_idx = + hsa::get_core_table()->hsa_queue_add_write_index_scacq_screl_fn(queue, 1); + while((write_idx - hsa::get_core_table()->hsa_queue_load_read_index_relaxed_fn(queue)) >= + queue->size) { sched_yield(); } @@ -73,7 +79,7 @@ submitPacket(const CoreApiTable& table, hsa_queue_t* queue, const void* packet) header_atomic_ptr->store(slot_data[0], std::memory_order_release); // ringdoor bell - table.hsa_signal_store_relaxed_fn(queue->doorbell_signal, write_idx); + hsa::get_core_table()->hsa_signal_store_relaxed_fn(queue->doorbell_signal, write_idx); return write_idx; } @@ -91,43 
+97,44 @@ header_pkt(hsa_packet_type_t type) } std::unique_ptr -construct_aql_pkt(const hsa::AgentCache& agent, std::shared_ptr& profile) +construct_aql_pkt(std::shared_ptr& profile) { - if(counter_callback_info::setup_profile_config(agent, profile) != ROCPROFILER_STATUS_SUCCESS) + if(counter_callback_info::setup_profile_config(profile) != ROCPROFILER_STATUS_SUCCESS) { return nullptr; } auto pkts = profile->pkt_generator->construct_packet( + CHECK_NOTNULL(hsa::get_queue_controller())->get_core_table(), CHECK_NOTNULL(hsa::get_queue_controller())->get_ext_table()); - pkts->start.header = header_pkt(HSA_PACKET_TYPE_VENDOR_SPECIFIC); - pkts->start.completion_signal.handle = 0; - pkts->stop.header = header_pkt(HSA_PACKET_TYPE_VENDOR_SPECIFIC); - pkts->read.header = header_pkt(HSA_PACKET_TYPE_VENDOR_SPECIFIC); + pkts->packets.start_packet.header = header_pkt(HSA_PACKET_TYPE_VENDOR_SPECIFIC); + pkts->packets.stop_packet.header = header_pkt(HSA_PACKET_TYPE_VENDOR_SPECIFIC); + pkts->packets.read_packet.header = header_pkt(HSA_PACKET_TYPE_VENDOR_SPECIFIC); + + pkts->packets.start_packet.completion_signal.handle = 0; return pkts; } bool agent_async_handler(hsa_signal_value_t /*signal_v*/, void* data) { - const auto* ctx = context::get_registered_context({.handle = (uint64_t) data}); - if(!ctx) return false; + if(!data) return false; + const auto& callback_data = *static_cast(data); - const auto& agent_ctx = *ctx->agent_counter_collection; - const auto& prof_config = agent_ctx.profile; + const auto& prof_config = callback_data.profile; // Decode the AQL packet data auto decoded_pkt = - EvaluateAST::read_pkt(prof_config->pkt_generator.get(), *agent_ctx.callback_data.packet); + EvaluateAST::read_pkt(prof_config->pkt_generator.get(), *callback_data.packet); EvaluateAST::read_special_counters( *prof_config->agent, prof_config->required_special_counters, decoded_pkt); - auto* buf = buffer::get_buffer(agent_ctx.buffer.handle); + auto* buf = 
buffer::get_buffer(callback_data.buffer.handle); if(!buf) { ROCP_FATAL << fmt::format("Buffer {} destroyed before record was written", - agent_ctx.buffer.handle); + callback_data.buffer.handle); return false; } @@ -139,119 +146,123 @@ agent_async_handler(hsa_signal_value_t /*signal_v*/, void* data) ast.set_out_id(*ret); for(auto& val : *ret) { - val.user_data = agent_ctx.callback_data.user_data; + val.user_data = callback_data.user_data; buf->emplace( ROCPROFILER_BUFFER_CATEGORY_COUNTERS, ROCPROFILER_COUNTER_RECORD_VALUE, val); } } // reset the signal to allow another sample to start - agent_ctx.callback_data.table.hsa_signal_store_relaxed_fn(agent_ctx.callback_data.completion, - 1); + hsa::get_core_table()->hsa_signal_store_relaxed_fn(callback_data.completion, 1); return true; } +/** + * Setup the agent for handling profiling. This includes setting up the AQL packet, + * setting up the async handler, and (if this is the first time profiling) setting + * the profiling register on the queue. This function should only be called when + * the context is in the LOCKED status. + */ void -init_callback_data(const rocprofiler::context::context& ctx, const hsa::AgentCache& agent) +init_callback_data(rocprofiler::counters::agent_callback_data& callback_data, + const hsa::AgentCache& agent) { - // Note: Calls to this function should be protected by agent_ctx.status being set - // to LOCKED by the caller. This is to prevent multiple threads from trying to - // setup the same agent at the same time. 
- auto& agent_ctx = *ctx.agent_counter_collection; - if(agent_ctx.callback_data.packet) return; + // we have already setup this ctx + if(callback_data.packet) return; - agent_ctx.callback_data.packet = construct_aql_pkt(agent, agent_ctx.profile); + callback_data.packet = construct_aql_pkt(callback_data.profile); + callback_data.queue = agent.profile_queue(); - if(agent_ctx.callback_data.completion.handle != 0) return; + if(callback_data.completion.handle != 0) return; - // If we do not have a completion handle, this is our first time profiling this agent. - // Setup our shared data structures. - agent_ctx.callback_data.queue = agent.profile_queue(); - - agent_ctx.callback_data.table = CHECK_NOTNULL(hsa::get_queue_controller())->get_core_table(); + CHECK(hsa::get_core_table() != nullptr); + CHECK(hsa::get_amd_ext_table() != nullptr); + CHECK(hsa::get_core_table()->hsa_signal_create_fn != nullptr); + CHECK(hsa::get_core_table()->hsa_signal_wait_relaxed_fn != nullptr); + CHECK(hsa::get_core_table()->hsa_signal_store_relaxed_fn != nullptr); + CHECK(hsa::get_amd_ext_table()->hsa_amd_signal_async_handler_fn != nullptr); // Tri-state signal // 1: allow next sample to start // 0: sample in progress // -1: sample complete - CHECK_EQ(agent_ctx.callback_data.table.hsa_signal_create_fn( - 1, 0, nullptr, &agent_ctx.callback_data.completion), + CHECK_EQ(hsa::get_core_table()->hsa_signal_create_fn(1, 0, nullptr, &callback_data.completion), HSA_STATUS_SUCCESS); // Signal to manage the startup of the context. 
Allows us to ensure that // the AQL packet we inject with start_context() completes before returning CHECK_EQ( - agent_ctx.callback_data.table.hsa_signal_create_fn(1, 0, nullptr, &agent_ctx.start_signal), + hsa::get_core_table()->hsa_signal_create_fn(1, 0, nullptr, &callback_data.start_signal), HSA_STATUS_SUCCESS); // Setup callback // NOLINTBEGIN(performance-no-int-to-ptr) - CHECK_EQ(CHECK_NOTNULL(hsa::get_queue_controller()) - ->get_ext_table() - .hsa_amd_signal_async_handler_fn(agent_ctx.callback_data.completion, - HSA_SIGNAL_CONDITION_LT, - 0, - agent_async_handler, - (void*) ctx.context_idx), + CHECK_EQ(hsa::get_amd_ext_table()->hsa_amd_signal_async_handler_fn(callback_data.completion, + HSA_SIGNAL_CONDITION_LT, + 0, + agent_async_handler, + &callback_data), HSA_STATUS_SUCCESS); // NOLINTEND(performance-no-int-to-ptr) + // If we do not have a completion handle, this is our first time profiling this agent. + // Setup our shared data structures. + static std::unordered_set queues_init; + if(queues_init.find(callback_data.queue) != queues_init.end()) return; + queues_init.insert(callback_data.queue); + // Set state of the queue to allow profiling (may not be needed since AQL // may do this in the future). 
+ CHECK(agent.cpu_pool().handle != 0); + CHECK(agent.get_hsa_agent().handle != 0); + aql::set_profiler_active_on_queue( - CHECK_NOTNULL(hsa::get_queue_controller())->get_ext_table(), - agent.cpu_pool(), - agent.get_hsa_agent(), - [&](hsa::rocprofiler_packet pkt) { - pkt.ext_amd_aql_pm4.completion_signal = agent_ctx.callback_data.completion; - submitPacket( - agent_ctx.callback_data.table, agent_ctx.callback_data.queue, (void*) &pkt); - if(agent_ctx.callback_data.table.hsa_signal_wait_relaxed_fn( - agent_ctx.callback_data.completion, - HSA_SIGNAL_CONDITION_EQ, - 0, - 20000000, - HSA_WAIT_STATE_ACTIVE) != 0) + agent.cpu_pool(), agent.get_hsa_agent(), [&](hsa::rocprofiler_packet pkt) { + pkt.ext_amd_aql_pm4.completion_signal = callback_data.completion; + submitPacket(callback_data.queue, (void*) &pkt); + constexpr auto timeout_hint = + std::chrono::duration_cast(std::chrono::seconds{1}); + if(hsa::get_core_table()->hsa_signal_wait_relaxed_fn(callback_data.completion, + HSA_SIGNAL_CONDITION_EQ, + 0, + timeout_hint.count(), + HSA_WAIT_STATE_ACTIVE) != 0) { ROCP_FATAL << "Could not set agent to be profiled"; } - agent_ctx.callback_data.table.hsa_signal_store_relaxed_fn( - agent_ctx.callback_data.completion, 1); + hsa::get_core_table()->hsa_signal_store_relaxed_fn(callback_data.completion, 1); }); } } // namespace +/** + * Read the previously started profiling registers for each agent. Injects both the read packet + * and the stop packet (a sidestep to the AQL issues) into the queue and optionally waits for the + * return. A small note here is that this function should avoid allocations to be signal safe. + * + * Special Case: If the counters the user requests are purely constants, skip packet injection + * and trigger the async handler manually. 
+ */ rocprofiler_status_t read_agent_ctx(const context::context* ctx, rocprofiler_user_data_t user_data, rocprofiler_counter_flag_t flags) { - if(!ctx->agent_counter_collection || !ctx->agent_counter_collection->profile) + rocprofiler_status_t status = ROCPROFILER_STATUS_SUCCESS; + if(!ctx->agent_counter_collection) { - if(!ctx->agent_counter_collection) - { - ROCP_ERROR << fmt::format("Context {} has no agent counter collection", - ctx->context_idx); - } - else - { - ROCP_ERROR << fmt::format("Context {} has no profile", ctx->context_idx); - } + ROCP_ERROR << fmt::format("Context {} has no agent counter collection", ctx->context_idx); return ROCPROFILER_STATUS_ERROR_CONTEXT_INVALID; } auto& agent_ctx = *ctx->agent_counter_collection; + // If we have not initialized HSA yet, nothing to read, return; if(hsa_inited().load() == false) { return ROCPROFILER_STATUS_ERROR; } - const auto* agent = agent::get_agent_cache(agent_ctx.profile->agent); - - // If the agent no longer exists or we don't have a profile queue, reading is an error - if(!agent || !agent->profile_queue()) return ROCPROFILER_STATUS_ERROR; - // Set the state to LOCKED to prevent other calls to start/stop/read. 
auto expected = rocprofiler::context::agent_counter_collection_service::state::ENABLED; if(!agent_ctx.status.compare_exchange_strong( @@ -260,53 +271,88 @@ read_agent_ctx(const context::context* ctx, return ROCPROFILER_STATUS_ERROR_CONTEXT_ERROR; } - CHECK(agent_ctx.callback_data.packet); - - ROCP_TRACE << fmt::format("Agent Infor for Running Counter: Name = {}, XCC = {}, " - "SE = {}, CU = {}, SIMD = {}", - agent->get_rocp_agent()->name, - agent->get_rocp_agent()->num_xcc, - agent->get_rocp_agent()->num_shader_banks, - agent->get_rocp_agent()->cu_count, - agent->get_rocp_agent()->simd_arrays_per_engine); - - // Remove when AQL is updated to not require stop to be called first - submitPacket(agent_ctx.callback_data.table, - agent->profile_queue(), - (void*) &agent_ctx.callback_data.packet->stop); - - // Submit the read packet to the queue - submitPacket(agent_ctx.callback_data.table, - agent->profile_queue(), - (void*) &agent_ctx.callback_data.packet->read); - - // Submit a barrier packet. This is needed to flush hardware caches. Without this - // the read packet may not have the correct data. 
- rocprofiler::hsa::rocprofiler_packet barrier{}; - barrier.barrier_and.header = header_pkt(HSA_PACKET_TYPE_BARRIER_AND); - barrier.barrier_and.completion_signal = agent_ctx.callback_data.completion; - agent_ctx.callback_data.table.hsa_signal_store_relaxed_fn(agent_ctx.callback_data.completion, - 0); - agent_ctx.callback_data.user_data = user_data; - submitPacket( - agent_ctx.callback_data.table, agent->profile_queue(), (void*) &barrier.barrier_and); - - // Wait for the barrier/read packet to complete - if(flags != ROCPROFILER_COUNTER_FLAG_ASYNC) + for(auto& callback_data : agent_ctx.agent_data) { - // Wait for any inprogress samples to complete before returning - agent_ctx.callback_data.table.hsa_signal_wait_relaxed_fn(agent_ctx.callback_data.completion, - HSA_SIGNAL_CONDITION_EQ, - 1, - UINT64_MAX, - HSA_WAIT_STATE_ACTIVE); + const auto* agent = agent::get_agent_cache(callback_data.profile->agent); + + // If the agent no longer exists or we don't have a profile queue, reading is an error + if(!agent || !agent->profile_queue()) + { + status = ROCPROFILER_STATUS_ERROR; + break; + } + + // No AQL packet, nothing to do here. + if(!callback_data.packet) continue; + + // If we have a packet but no hardware counters, the caller is expecting + // non-hardware based counter values to be returned. 
We can skip packet injection + // and trigger the async handler directly + if(callback_data.profile->reqired_hw_counters.empty()) + { + callback_data.user_data = user_data; + hsa::get_core_table()->hsa_signal_store_relaxed_fn(callback_data.completion, -1); + // Wait for the barrier/read packet to complete + if(flags != ROCPROFILER_COUNTER_FLAG_ASYNC) + { + // Wait for any inprogress samples to complete before returning + hsa::get_core_table()->hsa_signal_wait_relaxed_fn(callback_data.completion, + HSA_SIGNAL_CONDITION_EQ, + 1, + UINT64_MAX, + HSA_WAIT_STATE_ACTIVE); + } + continue; + } + + ROCP_TRACE << fmt::format("Agent Info for Running Counter: Name = {}, XCC = {}, " + "SE = {}, CU = {}, SIMD = {}", + agent->get_rocp_agent()->name, + agent->get_rocp_agent()->num_xcc, + agent->get_rocp_agent()->num_shader_banks, + agent->get_rocp_agent()->cu_count, + agent->get_rocp_agent()->simd_arrays_per_engine); + + // Submit the read packet to the queue + submitPacket(agent->profile_queue(), &callback_data.packet->packets.read_packet); + + // Submit a barrier packet. This is needed to flush hardware caches. Without this + // the read packet may not have the correct data. 
+ rocprofiler::hsa::rocprofiler_packet barrier{}; + barrier.barrier_and.header = header_pkt(HSA_PACKET_TYPE_BARRIER_AND); + barrier.barrier_and.completion_signal = callback_data.completion; + hsa::get_core_table()->hsa_signal_store_relaxed_fn(callback_data.completion, 0); + callback_data.user_data = user_data; + submitPacket(agent->profile_queue(), &barrier.barrier_and); + + // Wait for the barrier/read packet to complete + if(flags != ROCPROFILER_COUNTER_FLAG_ASYNC) + { + // Wait for any inprogress samples to complete before returning + hsa::get_core_table()->hsa_signal_wait_relaxed_fn(callback_data.completion, + HSA_SIGNAL_CONDITION_EQ, + 1, + UINT64_MAX, + HSA_WAIT_STATE_ACTIVE); + } } agent_ctx.status.exchange( rocprofiler::context::agent_counter_collection_service::state::ENABLED); - return ROCPROFILER_STATUS_SUCCESS; + return status; } +/** + * Start the agent profiling for the context. For each agent that this context is + * enabled for, we will call the tool to get the profile config. This config will + * then be used to generate the AQL packet (if it differs from the previous + * profile used). init_callback_data does this initialization. If a tool does not + * supply a profile, we skip this agent. We then submit the start packet to the + * profile queue. This call is synchronous. + * + * Special Case: if constants are the only counters being collected, we skip + * packet injection. + */ rocprofiler_status_t start_agent_ctx(const context::context* ctx) { @@ -323,17 +369,6 @@ start_agent_ctx(const context::context* ctx) return ROCPROFILER_STATUS_SUCCESS; } - const auto* agent = agent::get_agent_cache(agent::get_agent(agent_ctx.agent_id)); - // Note: we may not have an AgentCache yet if HSA is not started. - // This is not an error and the startup will happen on hsa registration. - if(!agent) return ROCPROFILER_STATUS_ERROR; - - // But if we have an agent cache, we need a profile queue. 
- if(!agent->profile_queue()) - { - return ROCPROFILER_STATUS_ERROR_NO_PROFILE_QUEUE; - } - // Set the state to LOCKED to prevent other calls to start/stop/read. auto expected = rocprofiler::context::agent_counter_collection_service::state::DISABLED; if(!agent_ctx.status.compare_exchange_strong( @@ -342,94 +377,120 @@ start_agent_ctx(const context::context* ctx) return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED; } - // Ask the tool what profile we should use for this agent - agent_ctx.cb( - {.handle = ctx->context_idx}, - agent_ctx.agent_id, - [](rocprofiler_context_id_t context_id, - rocprofiler_profile_config_id_t config_id) -> rocprofiler_status_t { - auto* cb_ctx = rocprofiler::context::get_mutable_registered_context(context_id); - if(!cb_ctx) return ROCPROFILER_STATUS_ERROR_CONTEXT_INVALID; + for(auto& callback_data : agent_ctx.agent_data) + { + const auto* agent = agent::get_agent_cache(agent::get_agent(callback_data.agent_id)); - auto config = rocprofiler::counters::get_profile_config(config_id); - if(!config) return ROCPROFILER_STATUS_ERROR_PROFILE_NOT_FOUND; + if(!agent) + { + ROCP_ERROR << "No agent found for context: " << ctx->context_idx; + status = ROCPROFILER_STATUS_ERROR; + break; + } - if(!cb_ctx->agent_counter_collection) - { - return ROCPROFILER_STATUS_ERROR_CONTEXT_INVALID; - } + // But if we have an agent cache, we need a profile queue. 
+ if(!agent->profile_queue()) + { + ROCP_ERROR << "No profile queue found for context: " << ctx->context_idx; + status = ROCPROFILER_STATUS_ERROR_NO_PROFILE_QUEUE; + break; + } - // Only allow profiles to be set in the locked state - if(cb_ctx->agent_counter_collection->status.load() != - rocprofiler::context::agent_counter_collection_service::state::LOCKED) - { - return ROCPROFILER_STATUS_ERROR_CONFIGURATION_LOCKED; - } + // Ask the tool what profile we should use for this agent + callback_data.cb( + {.handle = ctx->context_idx}, + callback_data.agent_id, + [](rocprofiler_context_id_t context_id, + rocprofiler_profile_config_id_t config_id) -> rocprofiler_status_t { + auto* cb_ctx = rocprofiler::context::get_mutable_registered_context(context_id); + if(!cb_ctx) return ROCPROFILER_STATUS_ERROR_CONTEXT_INVALID; - // Only update the profile if it has changed. Avoids packet regeneration. - if(!cb_ctx->agent_counter_collection->profile || - cb_ctx->agent_counter_collection->profile->id.handle != config_id.handle) - { - if(cb_ctx->agent_counter_collection->agent_id.handle != config->agent->id.handle) + auto config = rocprofiler::counters::get_profile_config(config_id); + if(!config) return ROCPROFILER_STATUS_ERROR_PROFILE_NOT_FOUND; + + if(!cb_ctx->agent_counter_collection) { - return ROCPROFILER_STATUS_ERROR_AGENT_MISMATCH; + return ROCPROFILER_STATUS_ERROR_CONTEXT_INVALID; } - cb_ctx->agent_counter_collection->profile = config; - cb_ctx->agent_counter_collection->callback_data.packet.reset(); - } - return ROCPROFILER_STATUS_SUCCESS; - }, - agent_ctx.user_data); + // Only allow profiles to be set in the locked state + if(cb_ctx->agent_counter_collection->status.load() != + rocprofiler::context::agent_counter_collection_service::state::LOCKED) + { + return ROCPROFILER_STATUS_ERROR_CONFIGURATION_LOCKED; + } - // User didn't set a profile - if(!agent_ctx.profile) - { - agent_ctx.status.exchange( - rocprofiler::context::agent_counter_collection_service::state::DISABLED); - 
return status; - } + for(auto& agent_data : cb_ctx->agent_counter_collection->agent_data) + { + // Find the agent that this profile is for and set it. + if(agent_data.agent_id.handle == config->agent->id.handle) + { + // If the profile config has changed, reset the packet + // and swap the profile. + if(agent_data.profile != config) + { + agent_data.profile = config; + agent_data.packet.reset(); + } + // A flag to state that we set a profile + agent_data.set_profile = true; + return ROCPROFILER_STATUS_SUCCESS; + } + } - // Generate necessary structures in the context (packet gen, etc) to process - // this packet. - init_callback_data(*ctx, *agent); + return ROCPROFILER_STATUS_ERROR_AGENT_MISMATCH; + }, + callback_data.callback_data.ptr); - // No hardware counters were actually asked for (i.e. all constants) - if(agent_ctx.profile->reqired_hw_counters.empty()) - { - agent_ctx.status.exchange( - rocprofiler::context::agent_counter_collection_service::state::DISABLED); - return ROCPROFILER_STATUS_ERROR_NO_HARDWARE_COUNTERS; - } + // If we did not set a profile, we have nothing to do. + if(!callback_data.set_profile) + { + callback_data.packet.reset(); + continue; + } - // We could not generate AQL packets for some reason - if(!agent_ctx.callback_data.packet) - { - agent_ctx.status.exchange( - rocprofiler::context::agent_counter_collection_service::state::DISABLED); - return ROCPROFILER_STATUS_ERROR_AST_GENERATION_FAILED; - } + callback_data.set_profile = false; + CHECK(callback_data.profile); - agent_ctx.callback_data.packet->start.completion_signal = agent_ctx.start_signal; - agent_ctx.callback_data.table.hsa_signal_store_relaxed_fn(agent_ctx.start_signal, 1); - submitPacket(agent_ctx.callback_data.table, - agent->profile_queue(), - (void*) &agent_ctx.callback_data.packet->start); + // Generate necessary structures in the context (packet gen, etc) to process + // this packet. 
+ init_callback_data(callback_data, *agent); - // Wait for startup to finish before continuing - agent_ctx.callback_data.table.hsa_signal_wait_relaxed_fn( - agent_ctx.start_signal, HSA_SIGNAL_CONDITION_EQ, 0, UINT64_MAX, HSA_WAIT_STATE_ACTIVE); + // No hardware counters were actually asked for (i.e. all constants) + if(callback_data.profile->reqired_hw_counters.empty()) + { + continue; + } + + callback_data.packet->packets.start_packet.completion_signal = callback_data.start_signal; + hsa::get_core_table()->hsa_signal_store_relaxed_fn(callback_data.start_signal, 1); + submitPacket(agent->profile_queue(), &callback_data.packet->packets.start_packet); + + // Wait for startup to finish before continuing + hsa::get_core_table()->hsa_signal_wait_relaxed_fn(callback_data.start_signal, + HSA_SIGNAL_CONDITION_EQ, + 0, + UINT64_MAX, + HSA_WAIT_STATE_ACTIVE); + } agent_ctx.status.exchange( rocprofiler::context::agent_counter_collection_service::state::ENABLED); - return ROCPROFILER_STATUS_SUCCESS; + return status; } +/** + * Issue the stop packet for all active agents in this context. This call is + * synchronous. + * + * Special Case: if no hardware counters are being collected, skip issuing the + * stop packet. 
+ */ rocprofiler_status_t stop_agent_ctx(const context::context* ctx) { auto status = ROCPROFILER_STATUS_SUCCESS; - if(!ctx->agent_counter_collection || !ctx->agent_counter_collection->profile) + if(!ctx->agent_counter_collection) { return status; } @@ -441,9 +502,6 @@ stop_agent_ctx(const context::context* ctx) return ROCPROFILER_STATUS_SUCCESS; } - const auto* agent = agent::get_agent_cache(agent_ctx.profile->agent); - if(!agent || !agent->profile_queue()) return status; - auto expected = rocprofiler::context::agent_counter_collection_service::state::ENABLED; if(!agent_ctx.status.compare_exchange_strong( expected, rocprofiler::context::agent_counter_collection_service::state::LOCKED)) @@ -452,24 +510,34 @@ stop_agent_ctx(const context::context* ctx) return ROCPROFILER_STATUS_SUCCESS; } - CHECK(agent_ctx.callback_data.packet); + for(auto& callback_data : agent_ctx.agent_data) + { + if(!callback_data.packet) continue; + + const auto* agent = agent::get_agent_cache(callback_data.profile->agent); + if(!agent || !agent->profile_queue()) continue; - submitPacket(agent_ctx.callback_data.table, - agent->profile_queue(), - (void*) &agent_ctx.callback_data.packet->stop); + if(!callback_data.profile->reqired_hw_counters.empty()) + { + // Remove when AQL is updated to not require stop to be called first + submitPacket(agent->profile_queue(), &callback_data.packet->packets.stop_packet); + } - // Wait for any inprogress samples to complete before returning - agent_ctx.callback_data.table.hsa_signal_wait_relaxed_fn(agent_ctx.callback_data.completion, - HSA_SIGNAL_CONDITION_EQ, - 1, - UINT64_MAX, - HSA_WAIT_STATE_ACTIVE); + // Wait for the stop packet to complete + hsa::get_core_table()->hsa_signal_wait_relaxed_fn(callback_data.completion, + HSA_SIGNAL_CONDITION_EQ, + 1, + UINT64_MAX, + HSA_WAIT_STATE_ACTIVE); + } + agent_ctx.status.exchange( + rocprofiler::context::agent_counter_collection_service::state::DISABLED); return status; } // If we have ctx's that were started before 
HSA was initialized, we need to -// actually start those contexts now. +// actually start those contexts now that we have an HSA instance. rocprofiler_status_t agent_profile_hsa_registration() { @@ -486,7 +554,7 @@ agent_profile_hsa_registration() agent_callback_data::~agent_callback_data() { - if(completion.handle != 0) table.hsa_signal_destroy_fn(completion); + if(completion.handle != 0) hsa::get_core_table()->hsa_signal_destroy_fn(completion); } } // namespace counters -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/agent_profiling.hpp b/source/lib/rocprofiler-sdk/counters/agent_profiling.hpp index 603828a3..be2bec04 100644 --- a/source/lib/rocprofiler-sdk/counters/agent_profiling.hpp +++ b/source/lib/rocprofiler-sdk/counters/agent_profiling.hpp @@ -21,12 +21,13 @@ // SOFTWARE. #pragma once +#include "lib/rocprofiler-sdk/counters/controller.hpp" +#include "lib/rocprofiler-sdk/hsa/aql_packet.hpp" + #include #include #include -#include "lib/rocprofiler-sdk/hsa/aql_packet.hpp" - namespace rocprofiler { namespace context @@ -38,17 +39,40 @@ namespace counters { struct agent_callback_data { - CoreApiTable table; - hsa_queue_t* queue{nullptr}; - std::unique_ptr packet; + uint64_t context_idx = 0; + hsa_queue_t* queue = nullptr; + std::unique_ptr packet = {}; // Tri-state signal used to know what the current state of processing // a sample is. The states are: // 1: allow next sample to start (i.e. no in progress work) // 0: sample in progress // -1: sample complete (i.e. 
signal for caller that sample is ready) - hsa_signal_t completion{.handle = 0}; - rocprofiler_user_data_t user_data{.value = 0}; + hsa_signal_t completion = {.handle = 0}; + hsa_signal_t start_signal = {.handle = 0}; + rocprofiler_user_data_t user_data = {.value = 0}; + rocprofiler_user_data_t callback_data = {.value = 0}; + + std::shared_ptr profile = {}; + rocprofiler_agent_id_t agent_id = {.handle = 0}; + rocprofiler_agent_profile_callback_t cb = nullptr; + rocprofiler_buffer_id_t buffer = {.handle = 0}; + bool set_profile = false; + + agent_callback_data() = default; + agent_callback_data(agent_callback_data&& rhs) noexcept + : queue(rhs.queue) + , packet(std::move(rhs.packet)) + , completion(rhs.completion) + , start_signal(rhs.start_signal) + , user_data(rhs.user_data) + , callback_data(rhs.callback_data) + , profile(rhs.profile) + , agent_id(rhs.agent_id) + , cb(rhs.cb) + , buffer(rhs.buffer) + {} + ~agent_callback_data(); }; @@ -82,5 +106,8 @@ read_agent_ctx(const context::context* ctx, rocprofiler_user_data_t user_data, rocprofiler_counter_flag_t flags); +uint64_t +submitPacket(hsa_queue_t* queue, const void* packet); + } // namespace counters -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/controller.cpp b/source/lib/rocprofiler-sdk/counters/controller.cpp index 3511483d..4ea09c33 100644 --- a/source/lib/rocprofiler-sdk/counters/controller.cpp +++ b/source/lib/rocprofiler-sdk/counters/controller.cpp @@ -76,6 +76,11 @@ CounterController::configure_agent_collection(rocprofiler_context_id_t auto& ctx = *ctx_p; if(ctx.counter_collection) return ROCPROFILER_STATUS_ERROR_AGENT_DISPATCH_CONFLICT; + + // FIXME: Due to the clock gating issue, counter collection and PC sampling service + // cannot coexist in the same context for now. 
+ if(ctx.pc_sampler) return ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT; + if(!rocprofiler::buffer::get_buffer(buffer_id.handle)) { return ROCPROFILER_STATUS_ERROR_BUFFER_NOT_FOUND; @@ -87,10 +92,12 @@ CounterController::configure_agent_collection(rocprofiler_context_id_t std::make_unique(); } - ctx.agent_counter_collection->agent_id = agent_id; - ctx.agent_counter_collection->cb = cb; - ctx.agent_counter_collection->user_data = user_data; - ctx.agent_counter_collection->buffer = buffer_id; + ctx.agent_counter_collection->agent_data.emplace_back(); + ctx.agent_counter_collection->agent_data.back().callback_data = + rocprofiler_user_data_t{.ptr = user_data}; + ctx.agent_counter_collection->agent_data.back().agent_id = agent_id; + ctx.agent_counter_collection->agent_data.back().cb = cb; + ctx.agent_counter_collection->agent_data.back().buffer = buffer_id; return ROCPROFILER_STATUS_SUCCESS; } @@ -115,6 +122,10 @@ CounterController::configure_dispatch( if(ctx.agent_counter_collection) return ROCPROFILER_STATUS_ERROR_AGENT_DISPATCH_CONFLICT; + // FIXME: Due to the clock gating issue, counter collection and PC sampling service + // cannot coexist in the same context for now. 
+ if(ctx.pc_sampler) return ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT; + if(!ctx.counter_collection) { ctx.counter_collection = @@ -153,10 +164,24 @@ get_controller() return controller; } -uint64_t -create_counter_profile(std::shared_ptr&& config) +rocprofiler_status_t +create_counter_profile(std::shared_ptr config) { - return get_controller().add_profile(std::move(config)); + auto status = ROCPROFILER_STATUS_SUCCESS; + if(status = counters::counter_callback_info::setup_profile_config(config); + status != ROCPROFILER_STATUS_SUCCESS) + { + return status; + } + + if(status = config->pkt_generator->can_collect(); status != ROCPROFILER_STATUS_SUCCESS) + { + return status; + } + + get_controller().add_profile(std::move(config)); + + return status; } void @@ -177,4 +202,4 @@ get_profile_config(rocprofiler_profile_config_id_t id) } } } // namespace counters -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/controller.hpp b/source/lib/rocprofiler-sdk/counters/controller.hpp index 5083810a..488abaab 100644 --- a/source/lib/rocprofiler-sdk/counters/controller.hpp +++ b/source/lib/rocprofiler-sdk/counters/controller.hpp @@ -101,8 +101,8 @@ class CounterController CounterController& get_controller(); -uint64_t -create_counter_profile(std::shared_ptr&& config); +rocprofiler_status_t +create_counter_profile(std::shared_ptr config); void destroy_counter_profile(uint64_t id); diff --git a/source/lib/rocprofiler-sdk/counters/core.cpp b/source/lib/rocprofiler-sdk/counters/core.cpp index 0150af79..fbf06c5d 100644 --- a/source/lib/rocprofiler-sdk/counters/core.cpp +++ b/source/lib/rocprofiler-sdk/counters/core.cpp @@ -38,8 +38,7 @@ namespace rocprofiler namespace counters { rocprofiler_status_t -counter_callback_info::setup_profile_config(const hsa::AgentCache& agent, - std::shared_ptr& profile) +counter_callback_info::setup_profile_config(std::shared_ptr& profile) { if(profile->pkt_generator || 
!profile->reqired_hw_counters.empty()) { @@ -104,7 +103,7 @@ counter_callback_info::setup_profile_config(const hsa::AgentCache& age } profile->pkt_generator = std::make_unique( - agent.get_rocp_agent()->id, + config.agent->id, std::vector{profile->reqired_hw_counters.begin(), profile->reqired_hw_counters.end()}); return ROCPROFILER_STATUS_SUCCESS; @@ -112,13 +111,12 @@ counter_callback_info::setup_profile_config(const hsa::AgentCache& age rocprofiler_status_t counter_callback_info::get_packet(std::unique_ptr& ret_pkt, - const hsa::AgentCache& agent, std::shared_ptr& profile) { rocprofiler_status_t status; // Check packet cache profile->packets.wlock([&](auto& pkt_vector) { - status = counter_callback_info::setup_profile_config(agent, profile); + status = counter_callback_info::setup_profile_config(profile); if(!pkt_vector.empty() && status == ROCPROFILER_STATUS_SUCCESS) { ret_pkt = std::move(pkt_vector.back()); @@ -131,11 +129,11 @@ counter_callback_info::get_packet(std::unique_ptr& { // If we do not have a packet in the cache, create one. 
ret_pkt = profile->pkt_generator->construct_packet( + CHECK_NOTNULL(hsa::get_queue_controller())->get_core_table(), CHECK_NOTNULL(hsa::get_queue_controller())->get_ext_table()); } - ret_pkt->before_krn_pkt.clear(); - ret_pkt->after_krn_pkt.clear(); + ret_pkt->clear(); packet_return_map.wlock([&](auto& data) { data.emplace(ret_pkt.get(), profile); }); return ROCPROFILER_STATUS_SUCCESS; diff --git a/source/lib/rocprofiler-sdk/counters/core.hpp b/source/lib/rocprofiler-sdk/counters/core.hpp index 6110c94c..04ead963 100644 --- a/source/lib/rocprofiler-sdk/counters/core.hpp +++ b/source/lib/rocprofiler-sdk/counters/core.hpp @@ -66,11 +66,9 @@ struct counter_callback_info std::unordered_map>> packet_return_map{}; - static rocprofiler_status_t setup_profile_config(const hsa::AgentCache&, - std::shared_ptr&); + static rocprofiler_status_t setup_profile_config(std::shared_ptr&); rocprofiler_status_t get_packet(std::unique_ptr&, - const hsa::AgentCache&, std::shared_ptr&); }; diff --git a/source/lib/rocprofiler-sdk/counters/dimensions.hpp b/source/lib/rocprofiler-sdk/counters/dimensions.hpp index ee88d032..6bbb996c 100644 --- a/source/lib/rocprofiler-sdk/counters/dimensions.hpp +++ b/source/lib/rocprofiler-sdk/counters/dimensions.hpp @@ -85,4 +85,4 @@ struct formatter return fmt::format_to(ctx.out(), "[{}, {}]", dims.name(), dims.size()); } }; -} // namespace fmt \ No newline at end of file +} // namespace fmt diff --git a/source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp b/source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp index 9e30fcfb..03ae4020 100644 --- a/source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp +++ b/source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp @@ -72,7 +72,7 @@ queue_cb(const context::context* ctx, // Packet generated when no instrumentation is performed. May contain serialization // packets/barrier packets (and can be empty). 
auto no_instrumentation = [&]() { - auto ret_pkt = std::make_unique(nullptr); + auto ret_pkt = std::make_unique(); // If we have a counter collection context but it is not enabled, we still might need // to add barrier packets to transition from serialized -> unserialized execution. This // transition is coordinated by the serializer. @@ -138,7 +138,7 @@ queue_cb(const context::context* ctx, CHECK(prof_config); std::unique_ptr ret_pkt; - auto status = info->get_packet(ret_pkt, queue.get_agent(), prof_config); + auto status = info->get_packet(ret_pkt, prof_config); CHECK_EQ(status, ROCPROFILER_STATUS_SUCCESS) << rocprofiler_get_status_string(status); maybe_add_serialization(ret_pkt); @@ -147,13 +147,10 @@ queue_cb(const context::context* ctx, return ret_pkt; } - ret_pkt->before_krn_pkt.push_back(ret_pkt->start); - ret_pkt->after_krn_pkt.push_back(ret_pkt->read); - ret_pkt->after_krn_pkt.push_back(ret_pkt->stop); + ret_pkt->populate_before(); + ret_pkt->populate_after(); for(auto& aql_pkt : ret_pkt->after_krn_pkt) - { aql_pkt.completion_signal.handle = 0; - } return ret_pkt; } @@ -280,4 +277,4 @@ completed_cb(const context::context* ctx, } } } // namespace counters -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/dispatch_handlers.hpp b/source/lib/rocprofiler-sdk/counters/dispatch_handlers.hpp index 894091dd..918ff98b 100644 --- a/source/lib/rocprofiler-sdk/counters/dispatch_handlers.hpp +++ b/source/lib/rocprofiler-sdk/counters/dispatch_handlers.hpp @@ -1,4 +1,3 @@ - // MIT License // // Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. 
@@ -54,4 +53,4 @@ queue_cb(const context::context* ctx, const context::correlation_id* correlation_id); } // namespace counters -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp b/source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp index 66b6082a..cf68cb42 100644 --- a/source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp +++ b/source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp @@ -207,11 +207,15 @@ EvaluateAST::EvaluateAST(rocprofiler_counter_id_t out_id, , _reduce_dimension_set(ast.reduce_dimension_set) , _out_id(out_id) { - if(_type == NodeType::REFERENCE_NODE) + if(_type == NodeType::REFERENCE_NODE || _type == NodeType::ACCUMULATE_NODE) { try { _metric = metrics.at(std::get(ast.value)); + if(_type == NodeType::ACCUMULATE_NODE) + { + _metric.setflags(static_cast(ast.accumulate_op)); + } } catch(std::exception& e) { throw std::runtime_error( @@ -277,6 +281,7 @@ EvaluateAST::set_dimensions() _dimension_types = first.size() > second.size() ? first : second; } break; + case ACCUMULATE_NODE: case REFERENCE_NODE: { _dimension_types = get_dim_types(_metric); @@ -377,6 +382,11 @@ EvaluateAST::validate_raw_ast(const std::unordered_map& met // Dimensionindex values should be within limits for this metric and GPU. 
} break; + case ACCUMULATE_NODE: + { + // Future todo only to be applied on sq metric + } + break; } } catch(std::exception& e) { @@ -466,39 +476,36 @@ EvaluateAST::read_pkt(const aql::CounterPacketConstruct* pkt_gen, hsa::AQLPacket { std::unordered_map>* data; const aql::CounterPacketConstruct* pkt_gen; - hsa_agent_t agent; + aqlprofile_agent_handle_t agent; }; - auto agent = CHECK_NOTNULL(rocprofiler::agent::get_agent_cache( - CHECK_NOTNULL(rocprofiler::agent::get_agent(pkt_gen->agent())))) - ->get_hsa_agent(); + auto aql_agent = *CHECK_NOTNULL(rocprofiler::agent::get_aql_agent(pkt_gen->agent())); + std::unordered_map> ret; if(pkt.empty) return ret; - it_data aql_data{.data = &ret, .pkt_gen = pkt_gen, .agent = agent}; - ; - hsa_status_t status = hsa_ven_amd_aqlprofile_iterate_data( - &pkt.profile, - [](hsa_ven_amd_aqlprofile_info_type_t info_type, - hsa_ven_amd_aqlprofile_info_data_t* info_data, - void* data) { + it_data aql_data{.data = &ret, .pkt_gen = pkt_gen, .agent = aql_agent}; + + hsa_status_t status = aqlprofile_pmc_iterate_data( + pkt.handle, + [](aqlprofile_pmc_event_t event, uint64_t counter_id, uint64_t counter_value, void* data) { CHECK(data); - auto& it = *static_cast(data); - if(info_type != HSA_VEN_AMD_AQLPROFILE_INFO_PMC_DATA) return HSA_STATUS_SUCCESS; - const auto* metric = it.pkt_gen->event_to_metric(info_data->pmc_data.event); + auto& it = *static_cast(data); + const auto* metric = it.pkt_gen->event_to_metric(event); + if(!metric) return HSA_STATUS_SUCCESS; + auto& vec = it.data->emplace(metric->id(), std::vector{}) .first->second; auto& next_rec = vec.emplace_back(); set_counter_in_rec(next_rec.id, {.handle = metric->id()}); // Actual dimension info needs to be used here in the future - auto aql_status = aql::set_dim_id_from_sample( - next_rec.id, it.agent, info_data->pmc_data.event, info_data->sample_id); + auto aql_status = aql::set_dim_id_from_sample(next_rec.id, it.agent, event, counter_id); CHECK_EQ(aql_status, 
ROCPROFILER_STATUS_SUCCESS) << rocprofiler_get_status_string(aql_status); // set_dim_in_rec(next_rec.id, ROCPROFILER_DIMENSION_NONE, vec.size() - 1); // Note: in the near future we need to use hw_counter here instead - next_rec.counter_value = info_data->pmc_data.result; + next_rec.counter_value = counter_value; return HSA_STATUS_SUCCESS; }, &aql_data); @@ -522,6 +529,7 @@ EvaluateAST::expand_derived(std::unordered_map& asts) _expanded = true; for(auto& child : _children) { + if(child._type == NodeType::ACCUMULATE_NODE) continue; if(auto* ptr = rocprofiler::common::get_val(asts, child.metric().name())) { ptr->expand_derived(asts); @@ -629,6 +637,8 @@ EvaluateAST::evaluate( .dispatch_id = a.dispatch_id, .user_data = {.value = 0}}; }); + case ACCUMULATE_NODE: + // todo update how to read the hybrid metric case REFERENCE_NODE: { auto* result = rocprofiler::common::get_val(results_map, _metric.id()); diff --git a/source/lib/rocprofiler-sdk/counters/evaluate_ast.hpp b/source/lib/rocprofiler-sdk/counters/evaluate_ast.hpp index dcc6e1d6..7f9b2625 100644 --- a/source/lib/rocprofiler-sdk/counters/evaluate_ast.hpp +++ b/source/lib/rocprofiler-sdk/counters/evaluate_ast.hpp @@ -48,8 +48,7 @@ enum DimensionTypes DIMENSION_SHADER_ENGINE = 1 << 2, DIMENSION_AGENT = 1 << 3, DIMENSION_PMC_CHANNEL = 1 << 4, - DIMENSION_CU = 1 << 5, - DIMENSION_LAST = 1 << 6, + DIMENSION_LAST = 1 << 5, }; enum ReduceOperation diff --git a/source/lib/rocprofiler-sdk/counters/id_decode.cpp b/source/lib/rocprofiler-sdk/counters/id_decode.cpp index 2d4e96d2..b9a6457d 100644 --- a/source/lib/rocprofiler-sdk/counters/id_decode.cpp +++ b/source/lib/rocprofiler-sdk/counters/id_decode.cpp @@ -43,7 +43,7 @@ dimension_map() {ROCPROFILER_DIMENSION_SHADER_ENGINE, std::string_view("DIMENSION_SHADER_ENGINE")}, {ROCPROFILER_DIMENSION_AGENT, std::string_view("DIMENSION_AGENT")}, {ROCPROFILER_DIMENSION_SHADER_ARRAY, std::string_view("DIMENSION_SHADER_ARRAY")}, - {ROCPROFILER_DIMENSION_CU, 
std::string_view("DIMENSION_CU")}, + {ROCPROFILER_DIMENSION_WGP, std::string_view("DIMENSION_WGP")}, {ROCPROFILER_DIMENSION_INSTANCE, std::string_view("DIMENSION_INSTANCE")}, }); return *_v; @@ -67,7 +67,7 @@ aqlprofile_id_to_rocprof_instance() {"AID", ROCPROFILER_DIMENSION_AID}, {"SE", ROCPROFILER_DIMENSION_SHADER_ENGINE}, {"SA", ROCPROFILER_DIMENSION_SHADER_ARRAY}, - {"CU", ROCPROFILER_DIMENSION_CU}, + {"WGP", ROCPROFILER_DIMENSION_WGP}, {"INSTANCE", ROCPROFILER_DIMENSION_INSTANCE}, }; diff --git a/source/lib/rocprofiler-sdk/counters/id_decode.hpp b/source/lib/rocprofiler-sdk/counters/id_decode.hpp index c6dbb014..0c779f22 100644 --- a/source/lib/rocprofiler-sdk/counters/id_decode.hpp +++ b/source/lib/rocprofiler-sdk/counters/id_decode.hpp @@ -45,7 +45,7 @@ enum rocprofiler_profile_counter_instance_types ROCPROFILER_DIMENSION_SHADER_ENGINE, ///< SE dimension of result ROCPROFILER_DIMENSION_AGENT, ///< Agent dimension ROCPROFILER_DIMENSION_SHADER_ARRAY, ///< Number of shader arrays - ROCPROFILER_DIMENSION_CU, ///< Number of compute units + ROCPROFILER_DIMENSION_WGP, ///< Number of workgroup processors ROCPROFILER_DIMENSION_INSTANCE, ///< Number of instances ROCPROFILER_DIMENSION_LAST }; diff --git a/source/lib/rocprofiler-sdk/counters/metrics.cpp b/source/lib/rocprofiler-sdk/counters/metrics.cpp index 991e1957..e9b60dfe 100644 --- a/source/lib/rocprofiler-sdk/counters/metrics.cpp +++ b/source/lib/rocprofiler-sdk/counters/metrics.cpp @@ -101,7 +101,7 @@ loadXml(const std::string& filename, bool load_constants = false) * respec the XML (which we should...). 
*/ if(gfx_name.find("metric") == std::string::npos || - gfx_name.find("top.") == std::string::npos) + gfx_name.find("top.") == std::string::npos || gfx_name.find("gfx") == std::string::npos) continue; auto& metricVec = @@ -195,6 +195,24 @@ getMetricIdMap() return id_map; } +std::unordered_map +getPerfCountersIdMap() +{ + std::unordered_map map; + + for(const auto& [agent, list] : *CHECK_NOTNULL(getMetricMap())) + { + if(agent.find("gfx9") == std::string::npos) continue; + for(const auto& metric : list) + { + if(metric.name().find("SQ_") == 0 && !metric.event().empty()) + map.emplace(metric.id(), std::stoi(metric.event())); + } + } + + return map; +} + const MetricMap* getMetricMap() { @@ -252,7 +270,7 @@ checkValidMetric(const std::string& agent, const Metric& metric) bool operator<(Metric const& lhs, Metric const& rhs) { - return lhs.id() < rhs.id(); + return std::tie(lhs.id_, lhs.flags_) < std::tie(rhs.id_, rhs.flags_); } bool @@ -266,7 +284,8 @@ operator==(Metric const& lhs, Metric const& rhs) x.expression_, x.special_, x.id_, - x.empty_); + x.empty_, + x.flags_); }; return get_tie(lhs) == get_tie(rhs); } diff --git a/source/lib/rocprofiler-sdk/counters/metrics.hpp b/source/lib/rocprofiler-sdk/counters/metrics.hpp index b6c1e7ad..cb9612a1 100644 --- a/source/lib/rocprofiler-sdk/counters/metrics.hpp +++ b/source/lib/rocprofiler-sdk/counters/metrics.hpp @@ -64,8 +64,11 @@ class Metric const std::string& expression() const { return expression_; } const std::string& special() const { return special_; } uint64_t id() const { return id_; } + uint32_t flags() const { return flags_; } bool empty() const { return empty_; } + void setflags(uint32_t flags) { this->flags_ = flags; } + friend bool operator<(Metric const& lhs, Metric const& rhs); friend bool operator==(Metric const& lhs, Metric const& rhs); @@ -78,6 +81,7 @@ class Metric std::string special_ = {}; int64_t id_ = -1; bool empty_ = false; + uint32_t flags_ = 0; }; using MetricMap = std::unordered_map>; @@ -114,6 
+118,13 @@ getMetricsForAgent(const std::string&); const MetricIdMap* getMetricIdMap(); +/** + * Get the metric event ids for perfcounters options in thread trace + * applicable only for GFX9 agents and SQ block counters + */ +std::unordered_map +getPerfCountersIdMap(); + /** * Checks if a metric is valid for a given agent **/ diff --git a/source/lib/rocprofiler-sdk/counters/parser/parser.cpp b/source/lib/rocprofiler-sdk/counters/parser/parser.cpp index 153330bd..37c53c99 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/parser.cpp +++ b/source/lib/rocprofiler-sdk/counters/parser/parser.cpp @@ -133,13 +133,14 @@ enum yysymbol_kind_t YYSYMBOL_NAME = 20, /* NAME */ YYSYMBOL_REDUCE = 21, /* REDUCE */ YYSYMBOL_SELECT = 22, /* SELECT */ - YYSYMBOL_LOWER_THAN_ELSE = 23, /* LOWER_THAN_ELSE */ - YYSYMBOL_ELSE = 24, /* ELSE */ - YYSYMBOL_YYACCEPT = 25, /* $accept */ - YYSYMBOL_top = 26, /* top */ - YYSYMBOL_exp = 27, /* exp */ - YYSYMBOL_reduce_dim_args = 28, /* reduce_dim_args */ - YYSYMBOL_select_dim_args = 29 /* select_dim_args */ + YYSYMBOL_ACCUMULATE = 23, /* ACCUMULATE */ + YYSYMBOL_LOWER_THAN_ELSE = 24, /* LOWER_THAN_ELSE */ + YYSYMBOL_ELSE = 25, /* ELSE */ + YYSYMBOL_YYACCEPT = 26, /* $accept */ + YYSYMBOL_top = 27, /* top */ + YYSYMBOL_exp = 28, /* exp */ + YYSYMBOL_reduce_dim_args = 29, /* reduce_dim_args */ + YYSYMBOL_select_dim_args = 30 /* select_dim_args */ }; typedef enum yysymbol_kind_t yysymbol_kind_t; @@ -451,21 +452,21 @@ union yyalloc #endif /* !YYCOPY_NEEDED */ /* YYFINAL -- State number of the termination state. */ -#define YYFINAL 11 +#define YYFINAL 13 /* YYLAST -- Last index in YYTABLE. */ -#define YYLAST 54 +#define YYLAST 60 /* YYNTOKENS -- Number of terminals. */ -#define YYNTOKENS 25 +#define YYNTOKENS 26 /* YYNNTS -- Number of nonterminals. */ #define YYNNTS 5 /* YYNRULES -- Number of rules. */ -#define YYNRULES 16 +#define YYNRULES 17 /* YYNSTATES -- Number of states. 
*/ -#define YYNSTATES 44 +#define YYNSTATES 50 /* YYMAXUTOK -- Last valid token kind. */ -#define YYMAXUTOK 278 +#define YYMAXUTOK 279 /* YYTRANSLATE(TOKEN-NUM) -- Symbol number corresponding to TOKEN-NUM as returned by yylex, with out-of-bounds checking. */ @@ -476,22 +477,22 @@ union yyalloc /* YYTRANSLATE[TOKEN-NUM] -- Symbol number corresponding to TOKEN-NUM as returned by yylex. */ static const yytype_int8 yytranslate[] = { - 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 15, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, - 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 3, 4, - 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24}; + 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 15, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, + 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 3, 4, + 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25}; #if YYDEBUG /* YYRLINE[YYN] -- Source line where rule number YYN was defined. */ static const yytype_int8 yyrline[] = - {0, 57, 57, 60, 61, 62, 63, 64, 65, 66, 69, 73, 77, 84, 87, 94, 97}; + {0, 58, 58, 61, 62, 63, 64, 65, 66, 67, 70, 75, 79, 83, 90, 93, 100, 103}; #endif /** Accessing symbol of state STATE. */ @@ -528,6 +529,7 @@ static const char* const yytname[] = {"\"end of file\"", "NAME", "REDUCE", "SELECT", + "ACCUMULATE", "LOWER_THAN_ELSE", "ELSE", "$accept", @@ -544,7 +546,7 @@ yysymbol_name(yysymbol_kind_t yysymbol) } #endif -#define YYPACT_NINF (-3) +#define YYPACT_NINF (-10) #define yypact_value_is_default(Yyn) ((Yyn) == YYPACT_NINF) @@ -554,48 +556,50 @@ yysymbol_name(yysymbol_kind_t yysymbol) /* YYPACT[STATE-NUM] -- Index in YYTABLE of the portion describing STATE-NUM. */ -static const yytype_int8 yypact[] = {11, 11, -3, -3, 1, 16, 7, 32, 18, 11, 11, -3, 11, 11, 11, - 11, -3, -2, 13, 0, 0, -3, -3, 6, 28, 17, 20, -3, 30, 34, - 31, 24, 27, 36, 33, 35, 37, -3, 24, 38, 20, -3, -3, -3}; +static const yytype_int8 yypact[] = {2, 2, -10, -10, -7, -2, 3, 21, 38, 27, 2, 2, 14, + -10, 2, 2, 2, 2, -10, 0, 23, 18, 13, 13, -10, -10, + 16, 28, 25, -9, 26, 37, -10, 39, 30, 36, -10, 29, 33, + 42, 40, 41, 43, -10, 29, 44, 26, -10, -10, -10}; /* YYDEFACT[STATE-NUM] -- Default reduction number in state STATE-NUM. Performed when YYTABLE does not specify something else to do. Zero means the default is an error. 
*/ -static const yytype_int8 yydefact[] = {0, 0, 3, 9, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, - 0, 8, 0, 0, 4, 5, 6, 7, 0, 0, 0, 0, 10, 0, 0, - 0, 0, 0, 0, 13, 0, 15, 12, 0, 0, 0, 14, 11, 16}; +static const yytype_int8 yydefact[] = {0, 0, 3, 9, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, + 0, 8, 0, 0, 0, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 11, 0, + 0, 0, 10, 0, 0, 0, 14, 0, 16, 13, 0, 0, 0, 15, 12, 17}; /* YYPGOTO[NTERM-NUM]. */ -static const yytype_int8 yypgoto[] = {-3, -3, -1, 14, 9}; +static const yytype_int8 yypgoto[] = {-10, -10, -1, 11, 10}; /* YYDEFGOTO[NTERM-NUM]. */ -static const yytype_int8 yydefgoto[] = {0, 6, 7, 35, 30}; +static const yytype_int8 yydefgoto[] = {0, 7, 8, 41, 35}; /* YYTABLE[YYPACT[STATE-NUM]] -- What to do in state STATE-NUM. If positive, shift that token. If negative, reduce the rule whose number is the opposite. If YYTABLE_NINF, syntax error. */ -static const yytype_int8 yytable[] = {8, 12, 13, 14, 15, 14, 15, 11, 17, 18, 9, 19, 20, 21, - 22, 23, 12, 13, 14, 15, 1, 12, 13, 14, 15, 10, 25, 27, - 16, 2, 24, 3, 4, 5, 28, 12, 13, 14, 15, 26, 29, 31, - 32, 33, 34, 36, 37, 39, 42, 43, 38, 0, 41, 0, 40}; +static const yytype_int8 yytable[] = { + 9, 32, 10, 14, 15, 16, 17, 11, 33, 19, 20, 1, 12, 22, 23, 24, 25, 26, 16, 17, 2, + 13, 3, 4, 5, 6, 14, 15, 16, 17, 14, 15, 16, 17, 21, 28, 29, 18, 38, 30, 27, 14, + 15, 16, 17, 31, 34, 36, 39, 40, 37, 42, 43, 45, 48, 47, 49, 44, 0, 0, 46}; -static const yytype_int8 yycheck[] = {1, 3, 4, 5, 6, 5, 6, 0, 9, 10, 9, 12, 13, 14, - 15, 17, 3, 4, 5, 6, 9, 3, 4, 5, 6, 9, 20, 10, - 10, 18, 17, 20, 21, 22, 17, 3, 4, 5, 6, 11, 20, 11, - 8, 12, 20, 18, 10, 12, 10, 40, 17, -1, 38, -1, 17}; +static const yytype_int8 yycheck[] = {1, 10, 9, 3, 4, 5, 6, 9, 17, 10, 11, 9, 9, 14, 15, 16, + 17, 17, 5, 6, 18, 0, 20, 21, 22, 23, 3, 4, 5, 6, 3, 4, + 5, 6, 20, 17, 20, 10, 8, 11, 17, 3, 4, 5, 6, 20, 20, 10, + 12, 20, 11, 18, 10, 12, 10, 44, 46, 17, -1, -1, 17}; /* YYSTOS[STATE-NUM] -- The symbol kind of the accessing symbol of state STATE-NUM. 
*/ -static const yytype_int8 yystos[] = {0, 9, 18, 20, 21, 22, 26, 27, 27, 9, 9, 0, 3, 4, 5, - 6, 10, 27, 27, 27, 27, 27, 27, 17, 17, 20, 11, 10, 17, 20, - 29, 11, 8, 12, 20, 28, 18, 10, 17, 12, 17, 28, 10, 29}; +static const yytype_int8 yystos[] = {0, 9, 18, 20, 21, 22, 23, 27, 28, 28, 9, 9, 9, + 0, 3, 4, 5, 6, 10, 28, 28, 20, 28, 28, 28, 28, + 17, 17, 17, 20, 11, 20, 10, 17, 20, 30, 10, 11, 8, + 12, 20, 29, 18, 10, 17, 12, 17, 29, 10, 30}; /* YYR1[RULE-NUM] -- Symbol kind of the left-hand side of rule RULE-NUM. */ static const yytype_int8 yyr1[] = - {0, 25, 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 29, 29}; + {0, 26, 27, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 30, 30}; /* YYR2[RULE-NUM] -- Number of symbols on the right-hand side of rule RULE-NUM. */ -static const yytype_int8 yyr2[] = {0, 2, 1, 1, 3, 3, 3, 3, 3, 1, 6, 10, 8, 1, 3, 3, 5}; +static const yytype_int8 yyr2[] = {0, 2, 1, 1, 3, 3, 3, 3, 3, 1, 6, 6, 10, 8, 1, 3, 3, 5}; enum { @@ -1020,133 +1024,143 @@ yyparse(RawAST** result) switch(yyn) { case 2: /* top: exp */ -#line 57 "parser.y" +#line 58 "parser.y" { *result = (yyvsp[0].a); } -#line 1119 "parser.cpp" +#line 1122 "parser.cpp" break; case 3: /* exp: NUMBER */ -#line 60 "parser.y" +#line 61 "parser.y" { (yyval.a) = new RawAST(NUMBER_NODE, (yyvsp[0].d)); } -#line 1125 "parser.cpp" +#line 1128 "parser.cpp" break; case 4: /* exp: exp ADD exp */ -#line 61 "parser.y" +#line 62 "parser.y" { (yyval.a) = new RawAST(ADDITION_NODE, {(yyvsp[-2].a), (yyvsp[0].a)}); } -#line 1131 "parser.cpp" +#line 1134 "parser.cpp" break; case 5: /* exp: exp SUB exp */ -#line 62 "parser.y" +#line 63 "parser.y" { (yyval.a) = new RawAST(SUBTRACTION_NODE, {(yyvsp[-2].a), (yyvsp[0].a)}); } -#line 1137 "parser.cpp" +#line 1140 "parser.cpp" break; case 6: /* exp: exp MUL exp */ -#line 63 "parser.y" +#line 64 "parser.y" { (yyval.a) = new RawAST(MULTIPLY_NODE, {(yyvsp[-2].a), (yyvsp[0].a)}); } -#line 1143 "parser.cpp" +#line 1146 "parser.cpp" break; case 7: /* exp: 
exp DIV exp */ -#line 64 "parser.y" +#line 65 "parser.y" { (yyval.a) = new RawAST(DIVIDE_NODE, {(yyvsp[-2].a), (yyvsp[0].a)}); } -#line 1149 "parser.cpp" +#line 1152 "parser.cpp" break; case 8: /* exp: OP exp CP */ -#line 65 "parser.y" +#line 66 "parser.y" { (yyval.a) = (yyvsp[-1].a); } -#line 1155 "parser.cpp" +#line 1158 "parser.cpp" break; case 9: /* exp: NAME */ -#line 66 "parser.y" +#line 67 "parser.y" { (yyval.a) = new RawAST(REFERENCE_NODE, (yyvsp[0].s)); free((yyvsp[0].s)); } -#line 1163 "parser.cpp" +#line 1166 "parser.cpp" break; - case 10: /* exp: REDUCE OP exp CM NAME CP */ -#line 69 "parser.y" + case 10: /* exp: ACCUMULATE OP NAME CM NAME CP */ +#line 70 "parser.y" + { + (yyval.a) = new RawAST(ACCUMULATE_NODE, (yyvsp[-3].s), (yyvsp[-1].s)); + free((yyvsp[-3].s)); + free((yyvsp[-1].s)); + } +#line 1176 "parser.cpp" + break; + + case 11: /* exp: REDUCE OP exp CM NAME CP */ +#line 75 "parser.y" { (yyval.a) = new RawAST(REDUCE_NODE, (yyvsp[-3].a), (yyvsp[-1].s), NULL); free((yyvsp[-1].s)); } -#line 1172 "parser.cpp" +#line 1185 "parser.cpp" break; - case 11: /* exp: REDUCE OP exp CM NAME CM O_SQ reduce_dim_args C_SQ CP */ -#line 73 "parser.y" + case 12: /* exp: REDUCE OP exp CM NAME CM O_SQ reduce_dim_args C_SQ CP */ +#line 79 "parser.y" { (yyval.a) = new RawAST(REDUCE_NODE, (yyvsp[-7].a), (yyvsp[-5].s), (yyvsp[-2].ll)); free((yyvsp[-5].s)); } -#line 1181 "parser.cpp" +#line 1194 "parser.cpp" break; - case 12: /* exp: SELECT OP exp CM O_SQ select_dim_args C_SQ CP */ -#line 77 "parser.y" + case 13: /* exp: SELECT OP exp CM O_SQ select_dim_args C_SQ CP */ +#line 83 "parser.y" { (yyval.a) = new RawAST(SELECT_NODE, (yyvsp[-5].a), (yyvsp[-2].ll)); } -#line 1189 "parser.cpp" +#line 1202 "parser.cpp" break; - case 13: /* reduce_dim_args: NAME */ -#line 84 "parser.y" + case 14: /* reduce_dim_args: NAME */ +#line 90 "parser.y" { (yyval.ll) = new LinkedList((yyvsp[0].s), NULL); free((yyvsp[0].s)); } -#line 1197 "parser.cpp" +#line 1210 "parser.cpp" break; - case 14: 
/* reduce_dim_args: NAME CM reduce_dim_args */ -#line 87 "parser.y" + case 15: /* reduce_dim_args: NAME CM reduce_dim_args */ +#line 93 "parser.y" { (yyval.ll) = new LinkedList((yyvsp[-2].s), (yyvsp[0].ll)); free((yyvsp[-2].s)); } -#line 1205 "parser.cpp" +#line 1218 "parser.cpp" break; - case 15: /* select_dim_args: NAME EQUALS NUMBER */ -#line 94 "parser.y" + case 16: /* select_dim_args: NAME EQUALS NUMBER */ +#line 100 "parser.y" { (yyval.ll) = new LinkedList((yyvsp[-2].s), (yyvsp[0].d), NULL); free((yyvsp[-2].s)); } -#line 1213 "parser.cpp" +#line 1226 "parser.cpp" break; - case 16: /* select_dim_args: NAME EQUALS NUMBER CM select_dim_args */ -#line 97 "parser.y" + case 17: /* select_dim_args: NAME EQUALS NUMBER CM select_dim_args */ +#line 103 "parser.y" { (yyval.ll) = new LinkedList((yyvsp[-4].s), (yyvsp[-2].d), (yyvsp[0].ll)); free((yyvsp[-4].s)); } -#line 1221 "parser.cpp" +#line 1234 "parser.cpp" break; -#line 1225 "parser.cpp" +#line 1238 "parser.cpp" default: break; } @@ -1320,4 +1334,4 @@ yyparse(RawAST** result) return yyresult; } -#line 103 "parser.y" +#line 109 "parser.y" diff --git a/source/lib/rocprofiler-sdk/counters/parser/parser.h b/source/lib/rocprofiler-sdk/counters/parser/parser.h index 355f02ff..7c3abb26 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/parser.h +++ b/source/lib/rocprofiler-sdk/counters/parser/parser.h @@ -35,8 +35,8 @@ especially those whose name start with YY_ or yy_. They are private implementation details that can be changed or removed. */ -#ifndef YY_YY_ROCPROFILER_SOURCE_LIB_ROCPROFILER_COUNTERS_PARSER_PARSER_H_INCLUDED -#define YY_YY_ROCPROFILER_SOURCE_LIB_ROCPROFILER_COUNTERS_PARSER_PARSER_H_INCLUDED +#ifndef YY_YY_ROCPROFILER_SOURCE_LIB_ROCPROFILER_SDK_COUNTERS_PARSER_PARSER_H_INCLUDED +#define YY_YY_ROCPROFILER_SOURCE_LIB_ROCPROFILER_SDK_COUNTERS_PARSER_PARSER_H_INCLUDED /* Debug traces. 
*/ #ifndef YYDEBUG # define YYDEBUG 1 @@ -81,8 +81,9 @@ enum yytokentype NAME = 274, /* NAME */ REDUCE = 275, /* REDUCE */ SELECT = 276, /* SELECT */ - LOWER_THAN_ELSE = 277, /* LOWER_THAN_ELSE */ - ELSE = 278 /* ELSE */ + ACCUMULATE = 277, /* ACCUMULATE */ + LOWER_THAN_ELSE = 278, /* LOWER_THAN_ELSE */ + ELSE = 279 /* ELSE */ }; typedef enum yytokentype yytoken_kind_t; #endif @@ -98,7 +99,7 @@ union YYSTYPE int64_t d; char* s; -# line 102 "parser.h" +# line 103 "parser.h" }; typedef union YYSTYPE YYSTYPE; # define YYSTYPE_IS_TRIVIAL 1 @@ -110,4 +111,4 @@ extern YYSTYPE yylval; int yyparse(RawAST** result); -#endif /* !YY_YY_ROCPROFILER_SOURCE_LIB_ROCPROFILER_COUNTERS_PARSER_PARSER_H_INCLUDED */ +#endif /* !YY_YY_ROCPROFILER_SOURCE_LIB_ROCPROFILER_SDK_COUNTERS_PARSER_PARSER_H_INCLUDED */ diff --git a/source/lib/rocprofiler-sdk/counters/parser/parser.y b/source/lib/rocprofiler-sdk/counters/parser/parser.y index f0caca79..3e7cb80a 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/parser.y +++ b/source/lib/rocprofiler-sdk/counters/parser/parser.y @@ -39,6 +39,7 @@ void yyerror(rocprofiler::counters::RawAST**, const char *s) { ROCP_ERROR << s; %token NUMBER RANGE /* set data type for numbers */ %token NAME /* set data type for variables and user-defined functions */ %token REDUCE SELECT /* set data type for special functions */ +%token ACCUMULATE %type exp /* set data type for expressions */ %type NAME %type NUMBER @@ -64,6 +65,11 @@ exp: NUMBER { $$ = new RawAST(NUMBER_NODE, $1); } | NAME { $$ = new RawAST(REFERENCE_NODE, $1); free($1); } + | ACCUMULATE OP NAME CM NAME CP { + $$ = new RawAST(ACCUMULATE_NODE, $3, $5); + free($3); + free($5); + } | REDUCE OP exp CM NAME CP { $$ = new RawAST(REDUCE_NODE, $3, $5, NULL); free($5); diff --git a/source/lib/rocprofiler-sdk/counters/parser/raw_ast.hpp b/source/lib/rocprofiler-sdk/counters/parser/raw_ast.hpp index 84f7ca20..b35d245f 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/raw_ast.hpp +++ 
b/source/lib/rocprofiler-sdk/counters/parser/raw_ast.hpp @@ -54,6 +54,14 @@ enum NodeType SELECT_NODE, SUBTRACTION_NODE, CONSTANT_NODE, + ACCUMULATE_NODE +}; + +enum class ACCUMULATE_OP_TYPE +{ + NONE = 0, + LOW_RESOLUTION, + HIGH_RESOLUTION }; struct LinkedList @@ -75,8 +83,9 @@ struct LinkedList struct RawAST { // Node type - NodeType type{NONE}; // Operation to perform on the counter set - std::string reduce_op{}; + NodeType type{NONE}; // Operation to perform on the counter set + std::string reduce_op{}; + ACCUMULATE_OP_TYPE accumulate_op{ACCUMULATE_OP_TYPE::NONE}; // Stores either the name or digit dependening on whether this // is a name or number @@ -164,6 +173,20 @@ struct RawAST } } + RawAST(NodeType t, const char* v, const char* op) + : type(t) + , value(std::string{CHECK_NOTNULL(v)}) + { + CHECK_NOTNULL(op); + static std::unordered_map map = { + {"NONE", ACCUMULATE_OP_TYPE::NONE}, + {"LOW_RES", ACCUMULATE_OP_TYPE::LOW_RESOLUTION}, + {"HIGH_RES", ACCUMULATE_OP_TYPE::HIGH_RESOLUTION}, + }; + accumulate_op = map.at(static_cast(op)); + CHECK_EQ(t, ACCUMULATE_NODE); + } + // Select operation constructor. Counter is the counter AST // to use for the reduce op, refs is the reference set AST. 
// dimensions contains the mapping for selecting dimensions @@ -227,16 +250,26 @@ struct formatter {rocprofiler::counters::MULTIPLY_NODE, "MULTIPLY_NODE"}, {rocprofiler::counters::NUMBER_NODE, "NUMBER_NODE"}, {rocprofiler::counters::RANGE_NODE, "RANGE_NODE"}, + {rocprofiler::counters::ACCUMULATE_NODE, "ACCUMULATE_NODE"}, {rocprofiler::counters::REDUCE_NODE, "REDUCE_NODE"}, {rocprofiler::counters::REFERENCE_NODE, "REFERENCE_NODE"}, {rocprofiler::counters::SELECT_NODE, "SELECT_NODE"}, {rocprofiler::counters::SUBTRACTION_NODE, "SUBTRACTION_NODE"}, }; - auto out = fmt::format_to(ctx.out(), - "{{\"Type\":\"{}\", \"REDUCE_OP\":\"{}\",", - NodeTypeToString.at(ast.type), - ast.reduce_op); + static std::unordered_map + AccumulateTypeToString = { + {rocprofiler::counters::ACCUMULATE_OP_TYPE::NONE, "NONE"}, + {rocprofiler::counters::ACCUMULATE_OP_TYPE::HIGH_RESOLUTION, "HIGH_RES"}, + {rocprofiler::counters::ACCUMULATE_OP_TYPE::LOW_RESOLUTION, "LOW_RES"}, + }; + + auto out = + fmt::format_to(ctx.out(), + "{{\"Type\":\"{}\", \"REDUCE_OP\":\"{}\", \"ACCUMULATE_OP\":\"{}\",", + NodeTypeToString.at(ast.type), + ast.reduce_op, + AccumulateTypeToString.at(ast.accumulate_op)); if(const auto* string_val = std::get_if(&ast.value)) { diff --git a/source/lib/rocprofiler-sdk/counters/parser/scanner.cpp b/source/lib/rocprofiler-sdk/counters/parser/scanner.cpp index e773ca90..aa161b6d 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/scanner.cpp +++ b/source/lib/rocprofiler-sdk/counters/parser/scanner.cpp @@ -388,8 +388,8 @@ yy_fatal_error(const char* msg); (yy_hold_char) = *yy_cp; \ *yy_cp = '\0'; \ (yy_c_buf_p) = yy_cp; -#define YY_NUM_RULES 22 -#define YY_END_OF_BUFFER 23 +#define YY_NUM_RULES 23 +#define YY_END_OF_BUFFER 24 /* This struct is not used in this scanner, but its presence is necessary. 
*/ struct yy_trans_info @@ -397,19 +397,20 @@ struct yy_trans_info flex_int32_t yy_verify; flex_int32_t yy_nxt; }; -static const flex_int16_t yy_accept[48] = { - 0, 0, 0, 23, 21, 20, 18, 6, 7, 3, 1, 9, 2, 21, 4, 14, 10, 8, 17, 11, 12, 17, 17, 5, - 14, 19, 13, 14, 0, 17, 17, 17, 19, 13, 0, 0, 14, 17, 17, 0, 13, 17, 17, 17, 17, 15, 16, 0}; +static const flex_int16_t yy_accept[58] = { + 0, 0, 0, 24, 22, 21, 19, 6, 7, 3, 1, 9, 2, 22, 4, 14, 10, 8, 18, 11, + 12, 18, 18, 18, 5, 14, 20, 13, 14, 0, 18, 18, 18, 18, 20, 13, 0, 0, 14, 18, + 18, 18, 0, 13, 18, 18, 18, 18, 18, 18, 18, 15, 16, 18, 18, 18, 17, 0}; static const YY_CHAR yy_ec[256] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 1, 1, 14, 1, 1, 1, 15, 15, 15, 15, 16, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, - 15, 15, 15, 15, 15, 15, 15, 17, 1, 18, 1, 15, 1, 15, 15, 19, 20, + 15, 15, 15, 15, 15, 15, 15, 17, 1, 18, 1, 15, 1, 19, 15, 20, 21, - 21, 15, 15, 15, 15, 15, 15, 22, 15, 15, 15, 15, 15, 23, 24, 25, 26, 15, 15, 15, 15, - 15, 1, 27, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 22, 15, 15, 15, 15, 15, 15, 23, 24, 15, 15, 15, 15, 25, 26, 27, 28, 15, 15, 15, 15, + 15, 1, 29, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, @@ -418,40 +419,40 @@ static const YY_CHAR yy_ec[256] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; -static const YY_CHAR yy_meta[28] = {0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, - 1, 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 1}; +static const YY_CHAR yy_meta[30] = {0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, + 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1}; -static const flex_int16_t yy_base[50] = 
{0, 0, 0, 70, 71, 71, 71, 71, 71, 71, 71, 71, 71, - 57, 57, 18, 71, 71, 0, 71, 71, 46, 45, 71, 17, 0, - 19, 31, 37, 0, 45, 42, 0, 38, 44, 51, 49, 32, 36, - 43, 36, 26, 23, 16, 11, 0, 0, 71, 29, 59}; +static const flex_int16_t yy_base[60] = { + 0, 0, 0, 81, 82, 82, 82, 82, 82, 82, 82, 82, 82, 68, 68, 20, 82, 82, 0, 82, + 82, 58, 55, 54, 82, 19, 0, 21, 28, 39, 0, 55, 53, 50, 0, 33, 45, 60, 59, 42, + 41, 46, 55, 54, 41, 44, 43, 34, 39, 32, 33, 0, 0, 34, 20, 17, 0, 82, 31, 57}; -static const flex_int16_t yy_def[50] = {0, 47, 1, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, - 47, 47, 47, 47, 47, 48, 47, 47, 48, 48, 47, 47, 49, - 47, 47, 47, 48, 48, 48, 49, 47, 47, 47, 47, 48, 48, - 47, 47, 48, 48, 48, 48, 48, 48, 0, 47, 47}; +static const flex_int16_t yy_def[60] = {0, 57, 1, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, + 57, 57, 57, 58, 57, 57, 58, 58, 58, 57, 57, 59, 57, 57, 57, + 58, 58, 58, 58, 59, 57, 57, 57, 57, 58, 58, 58, 57, 57, 58, + 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 0, 57, 57}; -static const flex_int16_t yy_nxt[99] = { - 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 19, 20, 18, - 18, 18, 18, 21, 22, 18, 18, 23, 26, 24, 27, 33, 29, 28, 28, 34, 46, 45, 28, 28, - 34, 26, 44, 27, 35, 43, 35, 28, 40, 36, 33, 39, 28, 39, 34, 40, 40, 42, 41, 34, - 32, 36, 32, 36, 38, 37, 31, 30, 25, 24, 47, 3, 47, 47, 47, 47, 47, 47, 47, 47, - 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47 +static const flex_int16_t yy_nxt[112] = { + 0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 19, 20, 21, 18, + 18, 18, 18, 18, 22, 23, 18, 18, 24, 27, 25, 28, 35, 30, 29, 29, 36, 27, 56, 28, 29, + 29, 36, 29, 35, 37, 55, 37, 36, 29, 38, 42, 54, 42, 36, 53, 43, 34, 52, 34, 51, 50, + 49, 48, 47, 43, 43, 46, 45, 44, 38, 38, 41, 40, 39, 33, 32, 31, 26, 25, 57, 3, 57, + 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, -}; + 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57}; -static const flex_int16_t yy_chk[99] = { - 0, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, - 1, 1, 1, 1, 1, 1, 1, 1, 15, 24, 15, 26, 48, 24, 15, 26, 44, 43, 24, 15, - 26, 27, 42, 27, 28, 41, 28, 27, 40, 28, 33, 34, 27, 34, 33, 39, 34, 38, 37, 33, - 49, 36, 49, 35, 31, 30, 22, 21, 14, 13, 3, 47, 47, 47, 47, 47, 47, 47, 47, 47, - 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47 +static const flex_int16_t yy_chk[112] = { + 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, 1, 15, 25, 15, 27, 58, 25, 15, 27, 28, 55, 28, 25, + 15, 27, 28, 35, 29, 54, 29, 35, 28, 29, 36, 53, 36, 35, 50, 36, 59, 49, 59, 48, 47, + 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 33, 32, 31, 23, 22, 21, 14, 13, 3, 57, 57, + 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, -}; + 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57}; /* Table of booleans, true if rule could match eol. */ -static const flex_int32_t yy_rule_can_match_eol[23] = { - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, +static const flex_int32_t yy_rule_can_match_eol[24] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, }; static yy_state_type yy_last_accepting_state; @@ -476,9 +477,9 @@ char* yytext; #include "raw_ast.hpp" using namespace std; #define YYDEBUG 1 -#line 511 "scanner.cpp" +#line 518 "scanner.cpp" /* float exponent */ -#line 513 "scanner.cpp" +#line 520 "scanner.cpp" #define INITIAL 0 @@ -713,7 +714,7 @@ YY_DECL { #line 15 "scanner.l" -#line 730 "scanner.cpp" +#line 737 "scanner.cpp" while(/*CONSTCOND*/ 1) /* loops until end-of-file is reached */ { @@ -740,11 +741,11 @@ YY_DECL while(yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state) { yy_current_state = (int) yy_def[yy_current_state]; - if(yy_current_state >= 48) yy_c = yy_meta[yy_c]; + if(yy_current_state >= 58) yy_c = yy_meta[yy_c]; } yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c]; ++yy_cp; - } while(yy_base[yy_current_state] != 71); + } 
while(yy_base[yy_current_state] != 82); yy_find_action: yy_act = yy_accept[yy_current_state]; @@ -870,40 +871,46 @@ YY_DECL } YY_BREAK case 17: YY_RULE_SETUP -#line 37 "scanner.l" +#line 36 "scanner.l" + { + return ACCUMULATE; + } + YY_BREAK + case 18: YY_RULE_SETUP +#line 38 "scanner.l" { yylval.s = strdup(yytext); return NAME; } YY_BREAK - case 18: - /* rule 18 can match eol */ + case 19: + /* rule 19 can match eol */ YY_RULE_SETUP -#line 42 "scanner.l" +#line 43 "scanner.l" { return EOL; } - YY_BREAK - case 19: YY_RULE_SETUP -#line 43 "scanner.l" - YY_BREAK case 20: YY_RULE_SETUP #line 44 "scanner.l" - { /* ignore white space */ - } + YY_BREAK case 21: YY_RULE_SETUP #line 45 "scanner.l" - { - throw std::runtime_error(fmt::format("Mystery character {}", *yytext)); + { /* ignore white space */ } YY_BREAK case 22: YY_RULE_SETUP #line 46 "scanner.l" + { + throw std::runtime_error(fmt::format("Mystery character {}", *yytext)); + } + YY_BREAK + case 23: YY_RULE_SETUP +#line 47 "scanner.l" YY_FATAL_ERROR("flex scanner jammed"); YY_BREAK -#line 909 "scanner.cpp" +#line 921 "scanner.cpp" case YY_STATE_EOF(INITIAL): yyterminate(); case YY_END_OF_BUFFER: @@ -1187,7 +1194,7 @@ yy_get_previous_state(void) while(yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state) { yy_current_state = (int) yy_def[yy_current_state]; - if(yy_current_state >= 48) yy_c = yy_meta[yy_c]; + if(yy_current_state >= 58) yy_c = yy_meta[yy_c]; } yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c]; } @@ -1215,10 +1222,10 @@ yy_try_NUL_trans(yy_state_type yy_current_state) while(yy_chk[yy_base[yy_current_state] + yy_c] != yy_current_state) { yy_current_state = (int) yy_def[yy_current_state]; - if(yy_current_state >= 48) yy_c = yy_meta[yy_c]; + if(yy_current_state >= 58) yy_c = yy_meta[yy_c]; } yy_current_state = yy_nxt[yy_base[yy_current_state] + yy_c]; - yy_is_jam = (yy_current_state == 47); + yy_is_jam = (yy_current_state == 57); return yy_is_jam ? 
0 : yy_current_state; } @@ -1864,4 +1871,4 @@ yyfree(void* ptr) #define YYTABLES_NAME "yytables" -#line 46 "scanner.l" +#line 47 "scanner.l" diff --git a/source/lib/rocprofiler-sdk/counters/parser/scanner.l b/source/lib/rocprofiler-sdk/counters/parser/scanner.l index 96cff467..411eeaa0 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/scanner.l +++ b/source/lib/rocprofiler-sdk/counters/parser/scanner.l @@ -33,6 +33,7 @@ EXP ([Ee][-+]?[0-9]+) "reduce" { return REDUCE; } "select" { return SELECT; } +"accumulate" { return ACCUMULATE; } [a-z_A-Z][a-z_A-Z0-9]* { yylval.s = strdup(yytext); diff --git a/source/lib/rocprofiler-sdk/counters/parser/tests/parser_test.cpp b/source/lib/rocprofiler-sdk/counters/parser/tests/parser_test.cpp index 8b3d5f88..2d863cff 100644 --- a/source/lib/rocprofiler-sdk/counters/parser/tests/parser_test.cpp +++ b/source/lib/rocprofiler-sdk/counters/parser/tests/parser_test.cpp @@ -33,36 +33,48 @@ TEST(parser, base_ops) { std::map expressionToExpected = { {"AB * BA", - "{\"Type\":\"MULTIPLY_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"BA\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "{\"Type\":\"MULTIPLY_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"AB\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"BA\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}"}, {"AB + BA", - 
"{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"BA\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"AB\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"BA\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}"}, {"CD - ZX", - "{\"Type\":\"SUBTRACTION_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"CD\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"ZX\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "{\"Type\":\"SUBTRACTION_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"CD\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"ZX\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}"}, {"NM / DB", - "{\"Type\":\"DIVIDE_NODE\", 
\"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"NM\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"DB\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"NM\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"DB\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}"}}; for(auto [op, expected] : expressionToExpected) @@ -81,51 +93,70 @@ TEST(parser, order_of_ops) { std::map expressionToExpected = { {"(AB + BA) / CD", - "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", " + "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " "\"Counter_Set\":[{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"BA\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"AB\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + 
"{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"BA\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[]," + " \"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " "\"Value\":\"CD\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}"}, + "\"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}"}, {"(AB / BA) - BN", - "{\"Type\":\"SUBTRACTION_NODE\", \"REDUCE_OP\":\"\", " + "{\"Type\":\"SUBTRACTION_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " "\"Counter_Set\":[{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"BA\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"AB\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"BA\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " "\"Value\":\"BN\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}"}, + "\"Select_Dimension_Set\":[]}], " + 
"\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}"}, {"AD / (CD - ZX)", - "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AD\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"SUBTRACTION_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"CD\", " + "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"AD\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"SUBTRACTION_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"CD\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"ZX\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", " + "\"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", \"Value\":\"ZX\", \"Counter_Set\":[], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}"}, + "\"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}"}, {"MN * (NM / DB)", - "{\"Type\":\"MULTIPLY_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"MN\", " - "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"NM\", " 
- "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " - "\"Value\":\"DB\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " - "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "{\"Type\":\"MULTIPLY_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"MN\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"NM\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}," + "{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Value\":\"DB\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}"}}; @@ -145,29 +176,37 @@ TEST(parser, reduction) { std::vector> expressionToExpected = { {"reduce(AB, SUM, [DIMENSION_XCC,DIMENSION_SHADER_ENGINE])", - "{\"Type\":\"REDUCE_NODE\", \"REDUCE_OP\":\"SUM\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " + "{\"Type\":\"REDUCE_NODE\", \"REDUCE_OP\":\"SUM\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"AB\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[\"3\",\"1\"], \"Select_Dimension_Set\":[]}"}, {"reduce(AB+CD, SUM, [DIMENSION_XCC,DIMENSION_SHADER_ENGINE])", - "{\"Type\":\"REDUCE_NODE\", \"REDUCE_OP\":\"SUM\", " + "{\"Type\":\"REDUCE_NODE\", 
\"REDUCE_OP\":\"SUM\", \"ACCUMULATE_OP\":\"NONE\", " "\"Counter_Set\":[{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"AB\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", " "\"Value\":\"CD\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " "\"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[\"3\",\"1\"], " "\"Select_Dimension_Set\":[]}"}, {"reduce(AB,DIV, [DIMENSION_XCC,DIMENSION_SHADER_ENGINE])+reduce(DC,SUM, " "[DIMENSION_XCC,DIMENSION_SHADER_ENGINE])", - "{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", " + "{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " "\"Counter_Set\":[{\"Type\":\"REDUCE_NODE\", \"REDUCE_OP\":\"DIV\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"AB\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[\"3\",\"1\"], " "\"Select_Dimension_Set\":[]},{\"Type\":\"REDUCE_NODE\", \"REDUCE_OP\":\"SUM\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"DC\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"DC\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[\"3\",\"1\"], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[], 
\"Select_Dimension_Set\":[]}"}}; @@ -188,25 +227,30 @@ TEST(parser, DISABLED_selection) { std::map expressionToExpected = { {"select(AB, [SE=1,XCC=0])+select(DC,[SE=2])", - "{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", " + "{\"Type\":\"ADDITION_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " "\"Counter_Set\":[{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " + "\"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"AB\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[\"(\"XCC\", 0)\",\"(\"SE\", " - "1)\"]},{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"DC\", " + "1)\"]},{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"DC\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[\"(\"SE\", 2)\"]}], " "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}"}, {"select(AB, [SE=2,XCC=1,WGP=3])", - "{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " + "{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"AB\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[\"(\"WGP\", 3)\",\"(\"XCC\", " "1)\",\"(\"SE\", 2)\"]}"}, {"select(AB, [XCC=0])", - 
"{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", " - "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", \"Value\":\"AB\", " + "{\"Type\":\"SELECT_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"AB\", " "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[\"(\"XCC\", 0)\"]}"}}; @@ -216,6 +260,7 @@ TEST(parser, DISABLED_selection) auto* buf = yy_scan_string(op.c_str()); yyparse(&ast); ASSERT_TRUE(ast); + auto exp = fmt::format("{}", *ast); EXPECT_EQ(fmt::format("{}", *ast), expected); yy_delete_buffer(buf); delete ast; @@ -241,6 +286,71 @@ TEST(parser, parse_derived_counters) } } +TEST(parser, parse_accum_counter) +{ + std::map expressionToExpected = { + {"accumulate(SQ_WAVES,NONE)", + "{\"Type\":\"ACCUMULATE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", \"Value\"" + ":\"SQ_WAVES\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}"}, + {"accumulate(SQ_WAVES,HIGH_RES)", + "{\"Type\":\"ACCUMULATE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"HIGH_RES\", " + "\"Value" + "\":\"SQ_WAVES\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}"}, + {"accumulate(SQ_WAVES,LOW_RES)", + "{\"Type\":\"ACCUMULATE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"LOW_RES\", " + "\"Value\"" + ":\"SQ_WAVES\", \"Counter_Set\":[], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}"}}; + + for(auto [op, expected] : expressionToExpected) + { + RawAST* ast = nullptr; + auto* buf = yy_scan_string(op.c_str()); + yyparse(&ast); + ASSERT_TRUE(ast); + auto exp = fmt::format("{}", *ast); + EXPECT_EQ(fmt::format("{}", *ast), expected); + yy_delete_buffer(buf); + delete ast; + } +} + +TEST(parser, parse_nested_accum_counter) +{ + std::map expressionToExpected = { + 
{"reduce(accumulate(SQ_LEVEL_WAVES,HIGH_RES),sum)/reduce(GRBM_GUI_ACTIVE,max)/CU_NUM", + "{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"DIVIDE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Counter_Set\":[{\"Type\":\"REDUCE_NODE\", " + "\"REDUCE_OP\":\"sum\", \"ACCUMULATE_OP\":\"NONE\", " + "\"Counter_Set\":[{\"Type\":\"ACCUMULATE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"HIGH_RES\", \"Value\":\"SQ_LEVEL_WAVES\", \"Counter_Set\":[], " + "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]},{\"Type\":\"REDUCE_NODE\", \"REDUCE_OP\":\"max\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Counter_Set\":[{\"Type\":\"REFERENCE_NODE\", " + "\"REDUCE_OP\":\"\", \"ACCUMULATE_OP\":\"NONE\", \"Value\":\"GRBM_GUI_ACTIVE\", " + "\"Counter_Set\":[], \"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], " + "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]},{\"Type\":\"REFERENCE_NODE\", \"REDUCE_OP\":\"\", " + "\"ACCUMULATE_OP\":\"NONE\", \"Value\":\"CU_NUM\", \"Counter_Set\":[], " + "\"Reduce_Dimension_Set\":[], \"Select_Dimension_Set\":[]}], \"Reduce_Dimension_Set\":[], " + "\"Select_Dimension_Set\":[]}"}}; + + for(auto [op, expected] : expressionToExpected) + { + RawAST* ast = nullptr; + auto* buf = yy_scan_string(op.c_str()); + yyparse(&ast); + ASSERT_TRUE(ast); + auto exp = fmt::format("{}", *ast); + EXPECT_EQ(fmt::format("{}", *ast), expected); + yy_delete_buffer(buf); + delete ast; + } +} + // TEST(parser, parse_complex_counters) // { // std::map expressionToExpected = { diff --git a/source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt b/source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt index 7a2954fb..e012dba2 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt +++ 
b/source/lib/rocprofiler-sdk/counters/tests/CMakeLists.txt @@ -39,6 +39,21 @@ endforeach() add_custom_target(agent_hasco_targets DEPENDS ${HSACO_TARGET_LIST}) +add_library(counter_test_constants OBJECT) +add_library(rocprofiler-sdk::counter-test-constants ALIAS counter_test_constants) +set(ROCPROFILER_LIB_COUNTER_TEST_CONSTANTS_SOURCES hsa_tables.cpp) +set(ROCPROFILER_LIB_COUNTER_TEST_CONSTANTS_HEADERS hsa_tables.hpp) +target_sources( + counter_test_constants + PUBLIC ${ROCPROFILER_LIB_COUNTER_TEST_CONSTANTS_HEADERS} + PRIVATE ${ROCPROFILER_LIB_COUNTER_TEST_CONSTANTS_SOURCES}) + +target_link_libraries( + counter_test_constants + PRIVATE rocprofiler-sdk::rocprofiler-common-library + rocprofiler-sdk::rocprofiler-static-library rocprofiler-sdk::rocprofiler-hip + rocprofiler-sdk::rocprofiler-hsa-runtime) + set(ROCPROFILER_LIB_COUNTER_TEST_SOURCES metrics_test.cpp evaluate_ast_test.cpp dimension.cpp init_order.cpp core.cpp code_object_loader.cpp agent_profiling.cpp) @@ -53,9 +68,13 @@ add_dependencies(counter-test agent_hasco_targets) target_link_libraries( counter-test - PRIVATE rocprofiler-sdk::rocprofiler-hsa-runtime rocprofiler-sdk::rocprofiler-hip + PRIVATE rocprofiler-sdk::counter-test-constants + rocprofiler-sdk::rocprofiler-hsa-runtime + rocprofiler-sdk::rocprofiler-hip rocprofiler-sdk::rocprofiler-common-library - rocprofiler-sdk::rocprofiler-static-library GTest::gtest GTest::gtest_main) + rocprofiler-sdk::rocprofiler-static-library + GTest::gtest + GTest::gtest_main) gtest_add_tests( TARGET counter-test diff --git a/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.cpp b/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.cpp index d20690c3..8af74cfa 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.cpp +++ b/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.cpp @@ -20,23 +20,19 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. 
-#include "lib/rocprofiler-sdk/counters/tests/agent_profiling.hpp" -#include "lib/common/logging.hpp" -#include "lib/rocprofiler-sdk/counters/tests/code_object_loader.hpp" - #include "lib/common/filesystem.hpp" +#include "lib/common/logging.hpp" #include "lib/common/utility.hpp" #include "lib/rocprofiler-sdk/agent.hpp" #include "lib/rocprofiler-sdk/context/context.hpp" -#include "lib/rocprofiler-sdk/counters/core.hpp" -#include "lib/rocprofiler-sdk/counters/dispatch_handlers.hpp" #include "lib/rocprofiler-sdk/counters/metrics.hpp" +#include "lib/rocprofiler-sdk/counters/tests/code_object_loader.hpp" +#include "lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" #include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" -#include "lib/rocprofiler-sdk/hsa/queue.hpp" #include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" #include "lib/rocprofiler-sdk/registration.hpp" -#include "rocprofiler-sdk/buffer.h" +#include #include #include #include @@ -53,8 +49,8 @@ #include #include +using namespace rocprofiler::counters::test_constants; using namespace rocprofiler::counters::testing; -using namespace rocprofiler::counters; using namespace rocprofiler; #define ROCPROFILER_CALL(result, msg) \ @@ -75,73 +71,27 @@ using namespace rocprofiler; namespace { -AmdExtTable& -get_ext_table() -{ - static auto _v = []() { - auto val = AmdExtTable{}; - val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; - val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; - val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; - val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; - val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; - val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; - val.hsa_amd_queue_set_priority_fn = hsa_amd_queue_set_priority; - val.hsa_amd_signal_async_handler_fn = hsa_amd_signal_async_handler; - return val; - }(); - return _v; -} - -CoreApiTable& -get_api_table() -{ - static auto 
_v = []() { - auto val = CoreApiTable{}; - val.hsa_iterate_agents_fn = hsa_iterate_agents; - val.hsa_agent_get_info_fn = hsa_agent_get_info; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_destroy_fn = hsa_queue_destroy; - val.hsa_signal_create_fn = hsa_signal_create; - val.hsa_signal_destroy_fn = hsa_signal_destroy; - val.hsa_signal_store_screlease_fn = hsa_signal_store_screlease; - val.hsa_signal_load_scacquire_fn = hsa_signal_load_scacquire; - val.hsa_signal_add_relaxed_fn = hsa_signal_add_relaxed; - val.hsa_signal_subtract_relaxed_fn = hsa_signal_subtract_relaxed; - val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_add_write_index_scacq_screl_fn = hsa_queue_add_write_index_scacq_screl; - val.hsa_queue_load_read_index_relaxed_fn = hsa_queue_load_read_index_relaxed; - val.hsa_signal_store_relaxed_fn = hsa_signal_store_relaxed; - val.hsa_signal_load_relaxed_fn = hsa_signal_load_relaxed; - - return val; - }(); - return _v; -} - auto findDeviceMetrics(const hsa::AgentCache& agent, const std::unordered_set& metrics) { std::vector ret; - auto all_counters = counters::getMetricMap(); + const auto* all_counters = counters::getMetricMap(); ROCP_ERROR << "Looking up counters for " << std::string(agent.name()); - auto gfx_metrics = common::get_val(*all_counters, std::string(agent.name())); + const auto* gfx_metrics = common::get_val(*all_counters, std::string(agent.name())); if(!gfx_metrics) { ROCP_ERROR << "No counters found for " << std::string(agent.name()); return ret; } - for(auto& counter : *gfx_metrics) + for(const auto& counter : *gfx_metrics) { if(metrics.count(counter.name()) > 0 || metrics.empty()) { ret.push_back(counter); } } - ROCP_ERROR << "No counters found for " << std::string(agent.name()); return ret; } @@ -151,6 +101,8 @@ test_init() HsaApiTable table; table.amd_ext_ = &get_ext_table(); table.core_ = &get_api_table(); + rocprofiler::hsa::copy_table(table.core_, 0); + 
rocprofiler::hsa::copy_table(table.amd_ext_, 0); agent::construct_agent_cache(&table); ASSERT_TRUE(hsa::get_queue_controller() != nullptr); hsa::get_queue_controller()->init(get_api_table(), get_ext_table()); @@ -303,6 +255,7 @@ class agent_profile_test : public ::testing::Test registration::set_init_status(-1); context::push_client(1); test_init(); + // rocprofiler_debugger_block(); counters::agent_profile_hsa_registration(); std::string kernel_name = "null_kernel"; @@ -329,8 +282,34 @@ class agent_profile_test : public ::testing::Test &queue), HSA_STATUS_SUCCESS); + // We don't use the queue interceptor, need to enabling profiling manually + hsa_amd_profiling_set_profiler_enabled(queue, 1); + + hsa_signal_t completion_signal; + hsa_signal_create(1, 0, nullptr, &completion_signal); + + CHECK(agent.cpu_pool().handle != 0); + CHECK(agent.get_hsa_agent().handle != 0); + // Set state of the queue to allow profiling (may not be needed since AQL + // may do this in the future). + aql::set_profiler_active_on_queue( + agent.cpu_pool(), agent.get_hsa_agent(), [&](hsa::rocprofiler_packet pkt) { + pkt.ext_amd_aql_pm4.completion_signal = completion_signal; + submitPacket(queue, (void*) &pkt); + + if(hsa_signal_wait_relaxed(completion_signal, + HSA_SIGNAL_CONDITION_EQ, + 0, + 20000000, + HSA_WAIT_STATE_BLOCKED) != 0) + { + ROCP_FATAL << "Failed to set profiling mode on queue"; + } + hsa_signal_store_relaxed(completion_signal, 1); + }); + rocprofiler::hsa::rocprofiler_packet barrier{}; - hsa_signal_t completion_signal; + hsa_signal_create(1, 0, nullptr, &completion_signal); barrier.barrier_and.header = packet_header(HSA_PACKET_TYPE_BARRIER_AND); barrier.barrier_and.completion_signal = completion_signal; @@ -480,4 +459,20 @@ TEST_F(agent_profile_test, sync_gpu_util_verify) ROCP_ERROR << fmt::format("Name: {} Counter value: {}", info.name, val.counter_value); EXPECT_GT(val.counter_value, 0.0); } -} \ No newline at end of file +} + +TEST_F(agent_profile_test, sync_sq_waves_verify) 
+{ + test_run(ROCPROFILER_COUNTER_FLAG_NONE, {"SQ_WAVES_sum"}, 50000); + ROCP_ERROR << global_recs().size(); + + for(const auto& val : global_recs()) + { + rocprofiler_counter_id_t id; + rocprofiler_query_record_counter_id(val.id, &id); + rocprofiler_counter_info_v0_t info; + rocprofiler_query_counter_info(id, ROCPROFILER_COUNTER_INFO_VERSION_0, &info); + ROCP_ERROR << fmt::format("Name: {} Counter value: {}", info.name, val.counter_value); + EXPECT_GT(val.counter_value, 0.0); + } +} diff --git a/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.hpp b/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.hpp index 7c3d524c..5cf388ee 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.hpp +++ b/source/lib/rocprofiler-sdk/counters/tests/agent_profiling.hpp @@ -20,4 +20,4 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. -#pragma once \ No newline at end of file +#pragma once diff --git a/source/lib/rocprofiler-sdk/counters/tests/code_object_loader.cpp b/source/lib/rocprofiler-sdk/counters/tests/code_object_loader.cpp index 27162a82..6d9eb13a 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/code_object_loader.cpp +++ b/source/lib/rocprofiler-sdk/counters/tests/code_object_loader.cpp @@ -103,4 +103,4 @@ search_hasco(const common::filesystem::path& directory, std::string& filename) } } // namespace testing } // namespace counters -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/tests/core.cpp b/source/lib/rocprofiler-sdk/counters/tests/core.cpp index 339efa66..7ba514ba 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/core.cpp +++ b/source/lib/rocprofiler-sdk/counters/tests/core.cpp @@ -26,6 +26,7 @@ #include "lib/rocprofiler-sdk/context/context.hpp" #include "lib/rocprofiler-sdk/counters/dispatch_handlers.hpp" #include "lib/rocprofiler-sdk/counters/metrics.hpp" +#include 
"lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" #include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" #include "lib/rocprofiler-sdk/hsa/queue.hpp" #include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" @@ -47,6 +48,7 @@ #include #include +using namespace rocprofiler::counters::test_constants; using namespace rocprofiler::counters; using namespace rocprofiler; @@ -68,43 +70,6 @@ using namespace rocprofiler; namespace { -AmdExtTable& -get_ext_table() -{ - static auto _v = []() { - auto val = AmdExtTable{}; - val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; - val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; - val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; - val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; - val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; - val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; - return val; - }(); - return _v; -} - -CoreApiTable& -get_api_table() -{ - static auto _v = []() { - auto val = CoreApiTable{}; - val.hsa_iterate_agents_fn = hsa_iterate_agents; - val.hsa_agent_get_info_fn = hsa_agent_get_info; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_destroy_fn = hsa_queue_destroy; - val.hsa_signal_create_fn = hsa_signal_create; - val.hsa_signal_destroy_fn = hsa_signal_destroy; - val.hsa_signal_store_screlease_fn = hsa_signal_store_screlease; - val.hsa_signal_load_scacquire_fn = hsa_signal_load_scacquire; - val.hsa_signal_add_relaxed_fn = hsa_signal_add_relaxed; - val.hsa_signal_subtract_relaxed_fn = hsa_signal_subtract_relaxed; - val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; - return val; - }(); - return _v; -} - auto findDeviceMetrics(const hsa::AgentCache& agent, const std::unordered_set& metrics) { @@ -151,28 +116,14 @@ get_client_ctx() return ctx; } -struct buf_check -{ - size_t expected_size{0}; - bool is_special{false}; - double special_val{0.0}; -}; - void 
buffered_callback(rocprofiler_context_id_t, rocprofiler_buffer_id_t, rocprofiler_record_header_t** headers, size_t num_headers, - void* user_data, + void* /* user_data */, uint64_t) { - buf_check& expected = *static_cast(user_data); - if(expected.is_special) - { - // Special values are single value constants (from agent_t) - expected.expected_size = 1; - } - std::set seen_data; std::set seen_dims; for(size_t i = 0; i < num_headers; ++i) @@ -262,7 +213,7 @@ TEST(core, check_packet_generation) "Unable to create profile"); auto profile = counters::get_profile_config(cfg_id); ASSERT_TRUE(profile); - EXPECT_EQ(counters::counter_callback_info::setup_profile_config(agent, profile), + EXPECT_EQ(counters::counter_callback_info::setup_profile_config(profile), ROCPROFILER_STATUS_SUCCESS) << fmt::format("Could not build profile for {}", metric.name()); @@ -279,7 +230,7 @@ TEST(core, check_packet_generation) */ counters::counter_callback_info cb_info; std::unique_ptr pkt; - EXPECT_EQ(cb_info.get_packet(pkt, agent, profile), ROCPROFILER_STATUS_SUCCESS) + EXPECT_EQ(cb_info.get_packet(pkt, profile), ROCPROFILER_STATUS_SUCCESS) << "Unable to generate packet"; EXPECT_TRUE(pkt) << "Expected a packet to be generated"; cb_info.packet_return_map.wlock([&](const auto& data) { @@ -491,33 +442,17 @@ TEST(core, check_callbacks) ASSERT_TRUE(ret_pkt) << fmt::format("Expected a packet to be generated for - {}", metric.name()); - /** - * Fake some data for the counter - */ - size_t* fake_data = static_cast(ret_pkt->profile.output_buffer.ptr); - for(size_t i = 0; i < (ret_pkt->profile.output_buffer.size / sizeof(size_t)); i++) - { - fake_data[i] = i + 1; - } - /** * Create the buffer and run test */ rocprofiler_buffer_id_t opt_buff_id = {.handle = 0}; - buf_check check = { - .expected_size = ret_pkt->profile.output_buffer.size / sizeof(size_t), - .is_special = !metric.special().empty(), - .special_val = (metric.special().empty() ? 
0.0 - : double(counters::get_agent_property( - std::string_view(metric.name()), - *agent.get_rocp_agent())))}; ROCPROFILER_CALL(rocprofiler_create_buffer(get_client_ctx(), 500 * sizeof(size_t), 500 * sizeof(size_t), ROCPROFILER_BUFFER_POLICY_LOSSLESS, buffered_callback, - &check, + nullptr, &opt_buff_id), "Could not create buffer"); cb_info->buffer = opt_buff_id; @@ -707,6 +642,68 @@ TEST(core, start_stop_callback_ctx) context::pop_client(1); } +TEST(core, test_profile_incremental) +{ + ASSERT_EQ(hsa_init(), HSA_STATUS_SUCCESS); + test_init(); + ASSERT_TRUE(hsa::get_queue_controller() != nullptr); + auto agents = hsa::get_queue_controller()->get_supported_agents(); + ASSERT_GT(agents.size(), 0); + for(const auto& [_, agent] : agents) + { + auto metrics = findDeviceMetrics(agent, {}); + ASSERT_FALSE(metrics.empty()); + ASSERT_TRUE(agent.get_rocp_agent()); + + std::map> metric_blocks; + for(const auto& metric : metrics) + { + if(!metric.block().empty()) + { + metric_blocks[metric.block()].push_back(metric); + } + } + + rocprofiler_profile_config_id_t cfg_id = {}; + + // Add one counter from each block to incrementally to make sure we can + // add them incrementally + for(const auto& [block_name, block_metrics] : metric_blocks) + { + rocprofiler_profile_config_id_t old_id = cfg_id; + rocprofiler_counter_id_t id = {.handle = block_metrics.front().id()}; + ROCPROFILER_CALL( + rocprofiler_create_profile_config(agent.get_rocp_agent()->id, &id, 1, &cfg_id), + "Unable to create profile incrementally when we should be able to"); + EXPECT_NE(old_id.handle, cfg_id.handle) + << "We expect that the handle changes this is due to the existing profile being " + "unmodifiable after creation: " + << block_name; + } + + // Check that we encounter an error of exceeds hardware limits eventually + auto status = ROCPROFILER_STATUS_SUCCESS; + for(const auto& metric : metrics) + { + /** + * Check profile construction + */ + rocprofiler_counter_id_t id = {.handle = metric.id()}; + if(status 
= + rocprofiler_create_profile_config(agent.get_rocp_agent()->id, &id, 1, &cfg_id); + status != ROCPROFILER_STATUS_SUCCESS) + { + break; + } + } + EXPECT_EQ(status, ROCPROFILER_STATUS_ERROR_EXCEEDS_HW_LIMIT); + } + + registration::set_init_status(1); + + registration::finalize(); +} + TEST(core, public_api_iterate_agents) { ASSERT_EQ(hsa_init(), HSA_STATUS_SUCCESS); diff --git a/source/lib/rocprofiler-sdk/counters/tests/dimension.cpp b/source/lib/rocprofiler-sdk/counters/tests/dimension.cpp index 1a690a55..6d802f6e 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/dimension.cpp +++ b/source/lib/rocprofiler-sdk/counters/tests/dimension.cpp @@ -26,6 +26,7 @@ #include "lib/rocprofiler-sdk/counters/dimensions.hpp" #include "lib/rocprofiler-sdk/counters/id_decode.hpp" #include "lib/rocprofiler-sdk/counters/metrics.hpp" +#include "lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" #include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" #include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" #include "lib/rocprofiler-sdk/registration.hpp" @@ -39,6 +40,8 @@ #include #include +using namespace rocprofiler::counters::test_constants; + namespace { void @@ -128,6 +131,12 @@ TEST(dimension, set_get) check_dim_pos(test_id, dim, i * 5); set_dim_in_rec(test_id, dim, i * 3); check_dim_pos(test_id, dim, i * 3); + for(size_t j = 1; j < 64; j++) + { + test_id = 0; + set_dim_in_rec(test_id, dim, j); + check_dim_pos(test_id, dim, j); + } } test_counter.handle = 123; @@ -145,37 +154,6 @@ using namespace rocprofiler; namespace { -AmdExtTable& -get_ext_table() -{ - static auto _v = []() { - auto val = AmdExtTable{}; - val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; - val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; - val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; - val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; - val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; - 
val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; - return val; - }(); - return _v; -} - -CoreApiTable& -get_api_table() -{ - static auto _v = []() { - auto val = CoreApiTable{}; - val.hsa_iterate_agents_fn = hsa_iterate_agents; - val.hsa_agent_get_info_fn = hsa_agent_get_info; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_destroy_fn = hsa_queue_destroy; - val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; - return val; - }(); - return _v; -} - auto findDeviceMetrics(const hsa::AgentCache& agent, const std::unordered_set& metrics) { diff --git a/source/lib/rocprofiler-sdk/counters/tests/evaluate_ast_test.cpp b/source/lib/rocprofiler-sdk/counters/tests/evaluate_ast_test.cpp index f1764008..50ffd812 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/evaluate_ast_test.cpp +++ b/source/lib/rocprofiler-sdk/counters/tests/evaluate_ast_test.cpp @@ -526,6 +526,74 @@ TEST(evaluate_ast, evaluate_simple_counters) } } +TEST(evaulate_ast, evaulate_hybrid_counters) +{ + using namespace rocprofiler::counters; + + auto get_base_rec_id = [](uint64_t counter_id) { + rocprofiler_counter_instance_id_t base_id = 0; + set_counter_in_rec(base_id, {.handle = counter_id}); + return base_id; + }; + + std::unordered_map metrics = { + {"VOORHEES", Metric("gfx9", "VOORHEES", "a", "a", "a", "", "", 0)}, + {"KRUEGER", Metric("gfx9", "KRUEGER", "a", "a", "a", "", "", 1)}, + {"MYERS", Metric("gfx9", "MYERS", "a", "a", "a", "", "", 2)}, + {"BATES", Metric("gfx9", "BATES", "a", "a", "a", "accumulate(VOORHEES,NONE)", "", 3)}, + {"KRAMER", Metric("gfx9", "KRAMER", "a", "a", "a", "accumulate(KRUEGER,LOW_RES)", "", 4)}, + {"TORRANCE", + Metric("gfx9", "TORRANCE", "a", "a", "a", "accumulate(MYERS,HIGH_RES)", "", 5)}}; + std::unordered_map> base_counter_data = { + {"VOORHEES", construct_test_data_dim(get_base_rec_id(0), {ROCPROFILER_DIMENSION_NONE}, 8)}, + {"KRUEGER", construct_test_data_dim(get_base_rec_id(1), {ROCPROFILER_DIMENSION_NONE}, 8)}, + {"MYERS", 
construct_test_data_dim(get_base_rec_id(2), {ROCPROFILER_DIMENSION_NONE}, 8)}, + }; + + std::unordered_map> asts; + for(const auto& [val, metric] : metrics) + { + RawAST* ast = nullptr; + auto buf = yy_scan_string(metric.expression().empty() ? metric.name().c_str() + : metric.expression().c_str()); + yyparse(&ast); + ASSERT_TRUE(ast) << metric.expression() << " " << metric.name(); + asts.emplace("gfx9", std::unordered_map{}) + .first->second.emplace(val, + EvaluateAST({.handle = metric.id()}, metrics, *ast, "gfx9")); + yy_delete_buffer(buf); + delete ast; + } + + std::vector< + std::tuple, int64_t, uint32_t>> + derived_counters = { + {"BATES", base_counter_data["VOORHEES"], 1, 0}, + {"KRAMER", base_counter_data["KRUEGER"], 1, 1}, + {"TORRANCE", base_counter_data["MYERS"], 1, 2}, + }; + + std::unordered_map> base_counter_decode; + for(const auto& [name, base_counter_v] : base_counter_data) + { + base_counter_decode[metrics[name].id()] = base_counter_v; + } + + for(auto& [name, expected, eval_count, flag] : derived_counters) + { + LOG(INFO) << name; + auto eval_counters = + rocprofiler::counters::get_required_hardware_counters(asts, "gfx9", metrics[name]); + ASSERT_TRUE(eval_counters); + ASSERT_EQ(eval_counters->size(), eval_count); + ASSERT_EQ(eval_counters->begin()->flags(), flag); + std::vector>> cache; + asts.at("gfx9").at(name).expand_derived(asts.at("gfx9")); + auto ret = asts.at("gfx9").at(name).evaluate(base_counter_decode, cache); + EXPECT_EQ(ret->size(), expected.size()); + } +} + namespace { void diff --git a/source/lib/rocprofiler-sdk/counters/tests/hsa_tables.cpp b/source/lib/rocprofiler-sdk/counters/tests/hsa_tables.cpp new file mode 100644 index 00000000..83c7cb70 --- /dev/null +++ b/source/lib/rocprofiler-sdk/counters/tests/hsa_tables.cpp @@ -0,0 +1,266 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include +#include +#include + +#include +#include +#include +#include + +namespace rocprofiler +{ +namespace counters +{ +namespace test_constants +{ +AmdExtTable& +get_ext_table() +{ + static auto _v = []() { + auto val = AmdExtTable{}; + val.version.major_id = HSA_AMD_EXT_API_TABLE_MAJOR_VERSION; + val.version.minor_id = sizeof(AmdExtTable); + val.version.step_id = HSA_AMD_EXT_API_TABLE_STEP_VERSION; + val.hsa_amd_coherency_get_type_fn = hsa_amd_coherency_get_type; + val.hsa_amd_coherency_set_type_fn = hsa_amd_coherency_set_type; + val.hsa_amd_profiling_set_profiler_enabled_fn = hsa_amd_profiling_set_profiler_enabled; + val.hsa_amd_profiling_async_copy_enable_fn = hsa_amd_profiling_async_copy_enable; + val.hsa_amd_profiling_get_dispatch_time_fn = hsa_amd_profiling_get_dispatch_time; + val.hsa_amd_profiling_get_async_copy_time_fn = hsa_amd_profiling_get_async_copy_time; + val.hsa_amd_profiling_convert_tick_to_system_domain_fn = + hsa_amd_profiling_convert_tick_to_system_domain; + val.hsa_amd_signal_async_handler_fn = hsa_amd_signal_async_handler; + val.hsa_amd_async_function_fn = hsa_amd_async_function; + val.hsa_amd_signal_wait_any_fn = hsa_amd_signal_wait_any; + val.hsa_amd_queue_cu_set_mask_fn = hsa_amd_queue_cu_set_mask; + val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; + val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; + val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; + val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; + val.hsa_amd_memory_async_copy_fn = hsa_amd_memory_async_copy; + val.hsa_amd_memory_async_copy_on_engine_fn = hsa_amd_memory_async_copy_on_engine; + val.hsa_amd_memory_copy_engine_status_fn = hsa_amd_memory_copy_engine_status; + val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; + val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; + val.hsa_amd_memory_pool_can_migrate_fn = hsa_amd_memory_pool_can_migrate; + 
val.hsa_amd_memory_migrate_fn = hsa_amd_memory_migrate; + val.hsa_amd_memory_lock_fn = hsa_amd_memory_lock; + val.hsa_amd_memory_unlock_fn = hsa_amd_memory_unlock; + val.hsa_amd_memory_fill_fn = hsa_amd_memory_fill; + val.hsa_amd_interop_map_buffer_fn = hsa_amd_interop_map_buffer; + val.hsa_amd_interop_unmap_buffer_fn = hsa_amd_interop_unmap_buffer; + val.hsa_amd_image_create_fn = hsa_amd_image_create; + val.hsa_amd_pointer_info_fn = hsa_amd_pointer_info; + val.hsa_amd_pointer_info_set_userdata_fn = hsa_amd_pointer_info_set_userdata; + val.hsa_amd_ipc_memory_create_fn = hsa_amd_ipc_memory_create; + val.hsa_amd_ipc_memory_attach_fn = hsa_amd_ipc_memory_attach; + val.hsa_amd_ipc_memory_detach_fn = hsa_amd_ipc_memory_detach; + val.hsa_amd_signal_create_fn = hsa_amd_signal_create; + val.hsa_amd_ipc_signal_create_fn = hsa_amd_ipc_signal_create; + val.hsa_amd_ipc_signal_attach_fn = hsa_amd_ipc_signal_attach; + val.hsa_amd_register_system_event_handler_fn = hsa_amd_register_system_event_handler; + // Cannot be set, no visible public symbols + // val.hsa_amd_queue_intercept_create_fn = hsa_amd_queue_intercept_create; + // val.hsa_amd_queue_intercept_register_fn = hsa_amd_queue_intercept_register; + val.hsa_amd_queue_set_priority_fn = hsa_amd_queue_set_priority; + val.hsa_amd_memory_async_copy_rect_fn = hsa_amd_memory_async_copy_rect; + // val.hsa_amd_runtime_queue_create_register_fn = hsa_amd_runtime_queue_create_register; + val.hsa_amd_memory_lock_to_pool_fn = hsa_amd_memory_lock_to_pool; + val.hsa_amd_register_deallocation_callback_fn = hsa_amd_register_deallocation_callback; + val.hsa_amd_deregister_deallocation_callback_fn = hsa_amd_deregister_deallocation_callback; + val.hsa_amd_signal_value_pointer_fn = hsa_amd_signal_value_pointer; + val.hsa_amd_svm_attributes_set_fn = hsa_amd_svm_attributes_set; + val.hsa_amd_svm_attributes_get_fn = hsa_amd_svm_attributes_get; + val.hsa_amd_svm_prefetch_async_fn = hsa_amd_svm_prefetch_async; + val.hsa_amd_spm_acquire_fn =
hsa_amd_spm_acquire; + val.hsa_amd_spm_release_fn = hsa_amd_spm_release; + val.hsa_amd_spm_set_dest_buffer_fn = hsa_amd_spm_set_dest_buffer; + val.hsa_amd_queue_cu_get_mask_fn = hsa_amd_queue_cu_get_mask; + val.hsa_amd_portable_export_dmabuf_fn = hsa_amd_portable_export_dmabuf; + val.hsa_amd_portable_close_dmabuf_fn = hsa_amd_portable_close_dmabuf; + val.hsa_amd_vmem_address_reserve_fn = hsa_amd_vmem_address_reserve; + val.hsa_amd_vmem_address_free_fn = hsa_amd_vmem_address_free; + val.hsa_amd_vmem_handle_create_fn = hsa_amd_vmem_handle_create; + val.hsa_amd_vmem_handle_release_fn = hsa_amd_vmem_handle_release; + val.hsa_amd_vmem_map_fn = hsa_amd_vmem_map; + val.hsa_amd_vmem_unmap_fn = hsa_amd_vmem_unmap; + val.hsa_amd_vmem_set_access_fn = hsa_amd_vmem_set_access; + val.hsa_amd_vmem_get_access_fn = hsa_amd_vmem_get_access; + val.hsa_amd_vmem_export_shareable_handle_fn = hsa_amd_vmem_export_shareable_handle; + val.hsa_amd_vmem_import_shareable_handle_fn = hsa_amd_vmem_import_shareable_handle; + val.hsa_amd_vmem_retain_alloc_handle_fn = hsa_amd_vmem_retain_alloc_handle; + val.hsa_amd_vmem_get_alloc_properties_from_handle_fn = + hsa_amd_vmem_get_alloc_properties_from_handle; + val.hsa_amd_agent_set_async_scratch_limit_fn = hsa_amd_agent_set_async_scratch_limit; +#if HSA_AMD_EXT_API_TABLE_STEP_VERSION >= 0x02 + val.hsa_amd_queue_get_info_fn = hsa_amd_queue_get_info; +#endif + return val; + }(); + return _v; +} + +CoreApiTable& +get_api_table() +{ + static auto _v = []() { + auto val = CoreApiTable{}; + val.version.major_id = HSA_CORE_API_TABLE_MAJOR_VERSION; + val.version.minor_id = sizeof(CoreApiTable); + val.version.step_id = HSA_CORE_API_TABLE_STEP_VERSION; + val.hsa_init_fn = hsa_init; + val.hsa_shut_down_fn = hsa_shut_down; + val.hsa_system_get_info_fn = hsa_system_get_info; + val.hsa_system_extension_supported_fn = hsa_system_extension_supported; + val.hsa_system_get_extension_table_fn = hsa_system_get_extension_table; + val.hsa_iterate_agents_fn = 
hsa_iterate_agents; + val.hsa_agent_get_info_fn = hsa_agent_get_info; + val.hsa_queue_create_fn = hsa_queue_create; + val.hsa_soft_queue_create_fn = hsa_soft_queue_create; + val.hsa_queue_destroy_fn = hsa_queue_destroy; + val.hsa_queue_inactivate_fn = hsa_queue_inactivate; + val.hsa_queue_load_read_index_scacquire_fn = hsa_queue_load_read_index_scacquire; + val.hsa_queue_load_read_index_relaxed_fn = hsa_queue_load_read_index_relaxed; + val.hsa_queue_load_write_index_scacquire_fn = hsa_queue_load_write_index_scacquire; + val.hsa_queue_load_write_index_relaxed_fn = hsa_queue_load_write_index_relaxed; + val.hsa_queue_store_write_index_relaxed_fn = hsa_queue_store_write_index_relaxed; + val.hsa_queue_store_write_index_screlease_fn = hsa_queue_store_write_index_screlease; + val.hsa_queue_cas_write_index_scacq_screl_fn = hsa_queue_cas_write_index_scacq_screl; + val.hsa_queue_cas_write_index_scacquire_fn = hsa_queue_cas_write_index_scacquire; + val.hsa_queue_cas_write_index_relaxed_fn = hsa_queue_cas_write_index_relaxed; + val.hsa_queue_cas_write_index_screlease_fn = hsa_queue_cas_write_index_screlease; + val.hsa_queue_add_write_index_scacq_screl_fn = hsa_queue_add_write_index_scacq_screl; + val.hsa_queue_add_write_index_scacquire_fn = hsa_queue_add_write_index_scacquire; + val.hsa_queue_add_write_index_relaxed_fn = hsa_queue_add_write_index_relaxed; + val.hsa_queue_add_write_index_screlease_fn = hsa_queue_add_write_index_screlease; + val.hsa_queue_store_read_index_relaxed_fn = hsa_queue_store_read_index_relaxed; + val.hsa_queue_store_read_index_screlease_fn = hsa_queue_store_read_index_screlease; + val.hsa_agent_iterate_regions_fn = hsa_agent_iterate_regions; + val.hsa_region_get_info_fn = hsa_region_get_info; + val.hsa_agent_get_exception_policies_fn = hsa_agent_get_exception_policies; + val.hsa_agent_extension_supported_fn = hsa_agent_extension_supported; + val.hsa_memory_register_fn = hsa_memory_register; + val.hsa_memory_deregister_fn = hsa_memory_deregister; + 
val.hsa_memory_allocate_fn = hsa_memory_allocate; + val.hsa_memory_free_fn = hsa_memory_free; + val.hsa_memory_copy_fn = hsa_memory_copy; + val.hsa_memory_assign_agent_fn = hsa_memory_assign_agent; + val.hsa_signal_create_fn = hsa_signal_create; + val.hsa_signal_destroy_fn = hsa_signal_destroy; + val.hsa_signal_load_relaxed_fn = hsa_signal_load_relaxed; + val.hsa_signal_load_scacquire_fn = hsa_signal_load_scacquire; + val.hsa_signal_store_relaxed_fn = hsa_signal_store_relaxed; + val.hsa_signal_store_screlease_fn = hsa_signal_store_screlease; + val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; + val.hsa_signal_wait_scacquire_fn = hsa_signal_wait_scacquire; + val.hsa_signal_and_relaxed_fn = hsa_signal_and_relaxed; + val.hsa_signal_and_scacquire_fn = hsa_signal_and_scacquire; + val.hsa_signal_and_screlease_fn = hsa_signal_and_screlease; + val.hsa_signal_and_scacq_screl_fn = hsa_signal_and_scacq_screl; + val.hsa_signal_or_relaxed_fn = hsa_signal_or_relaxed; + val.hsa_signal_or_scacquire_fn = hsa_signal_or_scacquire; + val.hsa_signal_or_screlease_fn = hsa_signal_or_screlease; + val.hsa_signal_or_scacq_screl_fn = hsa_signal_or_scacq_screl; + val.hsa_signal_xor_relaxed_fn = hsa_signal_xor_relaxed; + val.hsa_signal_xor_scacquire_fn = hsa_signal_xor_scacquire; + val.hsa_signal_xor_screlease_fn = hsa_signal_xor_screlease; + val.hsa_signal_xor_scacq_screl_fn = hsa_signal_xor_scacq_screl; + val.hsa_signal_exchange_relaxed_fn = hsa_signal_exchange_relaxed; + val.hsa_signal_exchange_scacquire_fn = hsa_signal_exchange_scacquire; + val.hsa_signal_exchange_screlease_fn = hsa_signal_exchange_screlease; + val.hsa_signal_exchange_scacq_screl_fn = hsa_signal_exchange_scacq_screl; + val.hsa_signal_add_relaxed_fn = hsa_signal_add_relaxed; + val.hsa_signal_add_scacquire_fn = hsa_signal_add_scacquire; + val.hsa_signal_add_screlease_fn = hsa_signal_add_screlease; + val.hsa_signal_add_scacq_screl_fn = hsa_signal_add_scacq_screl; + val.hsa_signal_subtract_relaxed_fn = 
hsa_signal_subtract_relaxed; + val.hsa_signal_subtract_scacquire_fn = hsa_signal_subtract_scacquire; + val.hsa_signal_subtract_screlease_fn = hsa_signal_subtract_screlease; + val.hsa_signal_subtract_scacq_screl_fn = hsa_signal_subtract_scacq_screl; + val.hsa_signal_cas_relaxed_fn = hsa_signal_cas_relaxed; + val.hsa_signal_cas_scacquire_fn = hsa_signal_cas_scacquire; + val.hsa_signal_cas_screlease_fn = hsa_signal_cas_screlease; + val.hsa_signal_cas_scacq_screl_fn = hsa_signal_cas_scacq_screl; + val.hsa_isa_from_name_fn = hsa_isa_from_name; + val.hsa_isa_get_info_fn = hsa_isa_get_info; + val.hsa_isa_compatible_fn = hsa_isa_compatible; + val.hsa_code_object_serialize_fn = hsa_code_object_serialize; + val.hsa_code_object_deserialize_fn = hsa_code_object_deserialize; + val.hsa_code_object_destroy_fn = hsa_code_object_destroy; + val.hsa_code_object_get_info_fn = hsa_code_object_get_info; + val.hsa_code_object_get_symbol_fn = hsa_code_object_get_symbol; + val.hsa_code_symbol_get_info_fn = hsa_code_symbol_get_info; + val.hsa_code_object_iterate_symbols_fn = hsa_code_object_iterate_symbols; + val.hsa_executable_create_fn = hsa_executable_create; + val.hsa_executable_destroy_fn = hsa_executable_destroy; + val.hsa_executable_load_code_object_fn = hsa_executable_load_code_object; + val.hsa_executable_freeze_fn = hsa_executable_freeze; + val.hsa_executable_get_info_fn = hsa_executable_get_info; + val.hsa_executable_global_variable_define_fn = hsa_executable_global_variable_define; + val.hsa_executable_agent_global_variable_define_fn = + hsa_executable_agent_global_variable_define; + val.hsa_executable_readonly_variable_define_fn = hsa_executable_readonly_variable_define; + val.hsa_executable_validate_fn = hsa_executable_validate; + val.hsa_executable_get_symbol_fn = hsa_executable_get_symbol; + val.hsa_executable_symbol_get_info_fn = hsa_executable_symbol_get_info; + val.hsa_executable_iterate_symbols_fn = hsa_executable_iterate_symbols; + val.hsa_status_string_fn = 
hsa_status_string; + val.hsa_extension_get_name_fn = hsa_extension_get_name; + val.hsa_system_major_extension_supported_fn = hsa_system_major_extension_supported; + val.hsa_system_get_major_extension_table_fn = hsa_system_get_major_extension_table; + val.hsa_agent_major_extension_supported_fn = hsa_agent_major_extension_supported; + val.hsa_cache_get_info_fn = hsa_cache_get_info; + val.hsa_agent_iterate_caches_fn = hsa_agent_iterate_caches; + val.hsa_signal_silent_store_relaxed_fn = hsa_signal_silent_store_relaxed; + val.hsa_signal_silent_store_screlease_fn = hsa_signal_silent_store_screlease; + val.hsa_signal_group_create_fn = hsa_signal_group_create; + val.hsa_signal_group_destroy_fn = hsa_signal_group_destroy; + val.hsa_signal_group_wait_any_scacquire_fn = hsa_signal_group_wait_any_scacquire; + val.hsa_signal_group_wait_any_relaxed_fn = hsa_signal_group_wait_any_relaxed; + val.hsa_agent_iterate_isas_fn = hsa_agent_iterate_isas; + val.hsa_isa_get_info_alt_fn = hsa_isa_get_info_alt; + val.hsa_isa_get_exception_policies_fn = hsa_isa_get_exception_policies; + val.hsa_isa_get_round_method_fn = hsa_isa_get_round_method; + val.hsa_wavefront_get_info_fn = hsa_wavefront_get_info; + val.hsa_isa_iterate_wavefronts_fn = hsa_isa_iterate_wavefronts; + val.hsa_code_object_get_symbol_from_name_fn = hsa_code_object_get_symbol_from_name; + val.hsa_code_object_reader_create_from_file_fn = hsa_code_object_reader_create_from_file; + val.hsa_code_object_reader_create_from_memory_fn = + hsa_code_object_reader_create_from_memory; + val.hsa_code_object_reader_destroy_fn = hsa_code_object_reader_destroy; + val.hsa_executable_create_alt_fn = hsa_executable_create_alt; + val.hsa_executable_load_program_code_object_fn = hsa_executable_load_program_code_object; + val.hsa_executable_load_agent_code_object_fn = hsa_executable_load_agent_code_object; + val.hsa_executable_validate_alt_fn = hsa_executable_validate_alt; + val.hsa_executable_get_symbol_by_name_fn = 
hsa_executable_get_symbol_by_name; + val.hsa_executable_iterate_agent_symbols_fn = hsa_executable_iterate_agent_symbols; + val.hsa_executable_iterate_program_symbols_fn = hsa_executable_iterate_program_symbols; + return val; + }(); + return _v; +} +} // namespace test_constants +} // namespace counters +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp b/source/lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp new file mode 100644 index 00000000..b6a6c400 --- /dev/null +++ b/source/lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp @@ -0,0 +1,41 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+#pragma once + +#include +#include +#include + +namespace rocprofiler +{ +namespace counters +{ +namespace test_constants +{ +AmdExtTable& +get_ext_table(); + +CoreApiTable& +get_api_table(); +} // namespace test_constants +} // namespace counters +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/counters/tests/metrics_test.cpp b/source/lib/rocprofiler-sdk/counters/tests/metrics_test.cpp index e77e51d9..05bd3d02 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/metrics_test.cpp +++ b/source/lib/rocprofiler-sdk/counters/tests/metrics_test.cpp @@ -171,7 +171,7 @@ TEST(metrics, check_agent_valid) if(other_gfx == gfx) continue; for(const auto& metric : other_counters) { - if(common_metrics.count(metric.id())) continue; + if(common_metrics.count(metric.id()) || !metric.special().empty()) continue; EXPECT_EQ(counters::checkValidMetric(gfx, metric), false) << fmt::format("GFX {} has Metric {} but shouldn't", gfx, metric); } diff --git a/source/lib/rocprofiler-sdk/counters/tests/metrics_test.h b/source/lib/rocprofiler-sdk/counters/tests/metrics_test.h index 65b91794..b552924f 100644 --- a/source/lib/rocprofiler-sdk/counters/tests/metrics_test.h +++ b/source/lib/rocprofiler-sdk/counters/tests/metrics_test.h @@ -166,7 +166,52 @@ static const std::unordered_map>> derived_gfx908 = {{"gfx908", - {{"GPU_UTIL", + {{"GPUBusy", + "", + "", + "100*GRBM_GUI_ACTIVE/GRBM_COUNT", + "The percentage of time GPU was busy."}, + {"Wavefronts", "", "", "SQ_WAVES", "Total wavefronts."}, + {"VALUInsts", + "", + "", + "SQ_INSTS_VALU/SQ_WAVES", + "The average number of vector ALU instructions executed per work-item (affected by flow " + "control)."}, + {"SALUInsts", + "", + "", + "SQ_INSTS_SALU/SQ_WAVES", + "The average number of scalar ALU instructions executed per work-item (affected by flow " + "control)."}, + {"SFetchInsts", + "", + "", + "SQ_INSTS_SMEM/SQ_WAVES", + "The average number of scalar fetch instructions from the video memory executed per " + "work-item 
(affected by flow control)."}, + {"GDSInsts", + "", + "", + "SQ_INSTS_GDS/SQ_WAVES", + "The average number of GDS read or GDS write instructions executed per work item " + "(affected by flow control)."}, + {"MemUnitBusy", + "", + "", + "100*reduce(TA_TA_BUSY,max)/GRBM_GUI_ACTIVE/SE_NUM", + "The percentage of GPUTime the memory unit is active. The result includes the stall " + "time (MemUnitStalled). This is measured with all extra fetches and writes and any " + "cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."}, + {"ALUStalledByLDS", + "", + "", + "400*SQ_WAIT_INST_LDS/SQ_WAVES/GRBM_GUI_ACTIVE", + "The percentage of GPUTime ALU units are stalled by the LDS input queue being full or " + "the output queue being not ready. If there are LDS bank conflicts, reduce them. " + "Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% " + "(optimal) to 100% (bad)."}, + {"GPU_UTIL", "", "", "100*GRBM_GUI_ACTIVE/GRBM_COUNT", @@ -175,7 +220,8 @@ static const std::unordered_map + + # GPUBusy The percentage of time GPU was busy. + + + # Wavefronts Total wavefronts. + + + # VALUInsts The average number of vector ALU instructions executed per work-item (affected by flow control). + + + # SALUInsts The average number of scalar ALU instructions executed per work-item (affected by flow control). + + + # SFetchInsts The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control). + + + # GDSInsts The average number of GDS read or GDS write instructions executed per work item (affected by flow control). + + + # MemUnitBusy The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). 
+ + + # ALUStalledByLDS The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). + + + + + @@ -33,7 +92,7 @@ - + @@ -428,7 +487,7 @@ - + @@ -476,7 +535,7 @@ - + @@ -517,63 +576,3 @@ #Navi21 - - - - # GPUBusy The percentage of time GPU was busy. - - - # Wavefronts Total wavefronts. - - - # VALUInsts The average number of vector ALU instructions executed per work-item (affected by flow control). - - - # SALUInsts The average number of scalar ALU instructions executed per work-item (affected by flow control). - - - # SFetchInsts The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control). - - - # GDSInsts The average number of GDS read or GDS write instructions executed per work item (affected by flow control). - - - # MemUnitBusy The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound). - - - # ALUStalledByLDS The percentage of GPUTime ALU units are stalled by the LDS input queue being full or the output queue being not ready. If there are LDS bank conflicts, reduce them. Otherwise, try reducing the number of LDS accesses if possible. Value range: 0% (optimal) to 100% (bad). 
- - - diff --git a/source/lib/rocprofiler-sdk/details/kfd_ioctl.h b/source/lib/rocprofiler-sdk/details/kfd_ioctl.h index 081f6681..bd69ad06 100644 --- a/source/lib/rocprofiler-sdk/details/kfd_ioctl.h +++ b/source/lib/rocprofiler-sdk/details/kfd_ioctl.h @@ -23,8 +23,8 @@ #ifndef KFD_IOCTL_H_INCLUDED #define KFD_IOCTL_H_INCLUDED +#include #include -#include /* * - 1.1 - initial version @@ -42,10 +42,9 @@ * - 1.14 - Update kfd_event_data * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl * - 1.16 - Add contiguous VRAM allocation flag - * - 1.17 - Add PC Sampling ioctl */ #define KFD_IOCTL_MAJOR_VERSION 1 -#define KFD_IOCTL_MINOR_VERSION 17 +#define KFD_IOCTL_MINOR_VERSION 16 struct kfd_ioctl_get_version_args { @@ -1724,7 +1723,7 @@ struct kfd_ioctl_pc_sample_args __u32 gpu_id; __u32 trace_id; __u32 flags; /* kfd_ioctl_pcs_query flags */ - __u32 reserved; + __u32 version; }; #define AMDKFD_IOCTL_BASE 'K' diff --git a/source/lib/rocprofiler-sdk/external_correlation.cpp b/source/lib/rocprofiler-sdk/external_correlation.cpp index 8646f9f5..2502443f 100644 --- a/source/lib/rocprofiler-sdk/external_correlation.cpp +++ b/source/lib/rocprofiler-sdk/external_correlation.cpp @@ -146,10 +146,10 @@ external_correlation::pop(rocprofiler_thread_id_t tid) { static auto default_tid = get_default_tid(); - return data.wlock( - [](external_correlation_map_t& _data, rocprofiler_thread_id_t tid_v) { + return data.rlock( + [](const external_correlation_map_t& _data, rocprofiler_thread_id_t tid_v) { if(_data.count(tid_v) == 0) return empty_user_data; - auto& itr = _data.at(tid_v); + const auto& itr = _data.at(tid_v); return itr.wlock([tid_v](external_correlation_stack_t& data_stack) { if(data_stack.empty()) return empty_user_data; auto ret = data_stack.back(); diff --git a/source/lib/rocprofiler-sdk/hip/CMakeLists.txt b/source/lib/rocprofiler-sdk/hip/CMakeLists.txt index f835c243..404d4c61 100644 --- a/source/lib/rocprofiler-sdk/hip/CMakeLists.txt +++ 
b/source/lib/rocprofiler-sdk/hip/CMakeLists.txt @@ -1,5 +1,5 @@ -set(ROCPROFILER_LIB_HIP_SOURCES hip.cpp) -set(ROCPROFILER_LIB_HIP_HEADERS defines.hpp hip.hpp types.hpp utils.hpp) +set(ROCPROFILER_LIB_HIP_SOURCES abi.cpp hip.cpp) +set(ROCPROFILER_LIB_HIP_HEADERS defines.hpp hip.hpp utils.hpp) target_sources(rocprofiler-object-library PRIVATE ${ROCPROFILER_LIB_HIP_SOURCES} ${ROCPROFILER_LIB_HIP_HEADERS}) diff --git a/source/lib/rocprofiler-sdk/hip/abi.cpp b/source/lib/rocprofiler-sdk/hip/abi.cpp new file mode 100644 index 00000000..038feb3f --- /dev/null +++ b/source/lib/rocprofiler-sdk/hip/abi.cpp @@ -0,0 +1,526 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. 
+ +#include +#include + +#include "lib/common/abi.hpp" +#include "lib/common/defines.hpp" + +namespace rocprofiler +{ +namespace hip +{ +static_assert(HIP_COMPILER_API_TABLE_MAJOR_VERSION == 0, + "Major version updated for HIP compiler dispatch table"); + +// These ensure that function pointers are not re-ordered +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipPopCallConfiguration_fn, 0) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipPushCallConfiguration_fn, 1) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipRegisterFatBinary_fn, 2) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipRegisterFunction_fn, 3) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipRegisterManagedVar_fn, 4) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipRegisterSurface_fn, 5) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipRegisterTexture_fn, 6) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipRegisterVar_fn, 7) +ROCP_SDK_ENFORCE_ABI(HipCompilerDispatchTable, __hipUnregisterFatBinary_fn, 8) + +#if HIP_COMPILER_API_TABLE_STEP_VERSION == 0 +ROCP_SDK_ENFORCE_ABI_VERSIONING(HipCompilerDispatchTable, 9) +#endif + +static_assert(HIP_RUNTIME_API_TABLE_MAJOR_VERSION == 0, + "Major version updated for HIP runtime dispatch table"); + +// These ensure that function pointers are not re-ordered +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipApiName_fn, 0) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipArray3DCreate_fn, 1) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipArray3DGetDescriptor_fn, 2) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipArrayCreate_fn, 3) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipArrayDestroy_fn, 4) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipArrayGetDescriptor_fn, 5) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipArrayGetInfo_fn, 6) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipBindTexture_fn, 7) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipBindTexture2D_fn, 8) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipBindTextureToArray_fn, 9) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, 
hipBindTextureToMipmappedArray_fn, 10) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipChooseDevice_fn, 11) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipChooseDeviceR0000_fn, 12) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipConfigureCall_fn, 13) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCreateSurfaceObject_fn, 14) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCreateTextureObject_fn, 15) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxCreate_fn, 16) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxDestroy_fn, 17) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxDisablePeerAccess_fn, 18) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxEnablePeerAccess_fn, 19) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxGetApiVersion_fn, 20) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxGetCacheConfig_fn, 21) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxGetCurrent_fn, 22) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxGetDevice_fn, 23) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxGetFlags_fn, 24) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxGetSharedMemConfig_fn, 25) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxPopCurrent_fn, 26) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxPushCurrent_fn, 27) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxSetCacheConfig_fn, 28) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxSetCurrent_fn, 29) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxSetSharedMemConfig_fn, 30) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCtxSynchronize_fn, 31) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDestroyExternalMemory_fn, 32) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDestroyExternalSemaphore_fn, 33) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDestroySurfaceObject_fn, 34) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDestroyTextureObject_fn, 35) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceCanAccessPeer_fn, 36) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceComputeCapability_fn, 37) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceDisablePeerAccess_fn, 38) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceEnablePeerAccess_fn, 39) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGet_fn, 40) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetAttribute_fn, 41) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetByPCIBusId_fn, 42) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetCacheConfig_fn, 43) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetDefaultMemPool_fn, 44) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetGraphMemAttribute_fn, 45) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetLimit_fn, 46) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetMemPool_fn, 47) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetName_fn, 48) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetP2PAttribute_fn, 49) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetPCIBusId_fn, 50) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetSharedMemConfig_fn, 51) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetStreamPriorityRange_fn, 52) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGetUuid_fn, 53) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceGraphMemTrim_fn, 54) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDevicePrimaryCtxGetState_fn, 55) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDevicePrimaryCtxRelease_fn, 56) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDevicePrimaryCtxReset_fn, 57) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDevicePrimaryCtxRetain_fn, 58) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDevicePrimaryCtxSetFlags_fn, 59) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceReset_fn, 60) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceSetCacheConfig_fn, 61) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceSetGraphMemAttribute_fn, 62) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceSetLimit_fn, 63) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceSetMemPool_fn, 64) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceSetSharedMemConfig_fn, 65) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceSynchronize_fn, 66) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDeviceTotalMem_fn, 67) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDriverGetVersion_fn, 68) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvGetErrorName_fn, 69) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvGetErrorString_fn, 70) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvGraphAddMemcpyNode_fn, 71) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvMemcpy2DUnaligned_fn, 72) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvMemcpy3D_fn, 73) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvMemcpy3DAsync_fn, 74) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvPointerGetAttributes_fn, 75) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventCreate_fn, 76) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventCreateWithFlags_fn, 77) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventDestroy_fn, 78) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventElapsedTime_fn, 79) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventQuery_fn, 80) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventRecord_fn, 81) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventSynchronize_fn, 82) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtGetLinkTypeAndHopCount_fn, 83) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtLaunchKernel_fn, 84) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtLaunchMultiKernelMultiDevice_fn, 85) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtMallocWithFlags_fn, 86) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtStreamCreateWithCUMask_fn, 87) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtStreamGetCUMask_fn, 88) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExternalMemoryGetMappedBuffer_fn, 89) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFree_fn, 90) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFreeArray_fn, 91) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFreeAsync_fn, 92) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFreeHost_fn, 93) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFreeMipmappedArray_fn, 94) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFuncGetAttribute_fn, 95) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFuncGetAttributes_fn, 96) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFuncSetAttribute_fn, 97) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFuncSetCacheConfig_fn, 98) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipFuncSetSharedMemConfig_fn, 99) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGLGetDevices_fn, 100) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetChannelDesc_fn, 101) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetDevice_fn, 102) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetDeviceCount_fn, 103) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetDeviceFlags_fn, 104) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetDevicePropertiesR0600_fn, 105) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetDevicePropertiesR0000_fn, 106) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetErrorName_fn, 107) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetErrorString_fn, 108) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetLastError_fn, 109) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetMipmappedArrayLevel_fn, 110) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetSymbolAddress_fn, 111) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetSymbolSize_fn, 112) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetTextureAlignmentOffset_fn, 113) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetTextureObjectResourceDesc_fn, 114) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetTextureObjectResourceViewDesc_fn, 115) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetTextureObjectTextureDesc_fn, 116) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetTextureReference_fn, 117) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddChildGraphNode_fn, 118) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddDependencies_fn, 119) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddEmptyNode_fn, 120) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddEventRecordNode_fn, 121) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddEventWaitNode_fn, 122) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddHostNode_fn, 
123) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddKernelNode_fn, 124) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemAllocNode_fn, 125) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemFreeNode_fn, 126) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemcpyNode_fn, 127) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemcpyNode1D_fn, 128) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemcpyNodeFromSymbol_fn, 129) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemcpyNodeToSymbol_fn, 130) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddMemsetNode_fn, 131) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphChildGraphNodeGetGraph_fn, 132) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphClone_fn, 133) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphCreate_fn, 134) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphDebugDotPrint_fn, 135) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphDestroy_fn, 136) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphDestroyNode_fn, 137) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphEventRecordNodeGetEvent_fn, 138) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphEventRecordNodeSetEvent_fn, 139) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphEventWaitNodeGetEvent_fn, 140) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphEventWaitNodeSetEvent_fn, 141) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecChildGraphNodeSetParams_fn, 142) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecDestroy_fn, 143) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecEventRecordNodeSetEvent_fn, 144) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecEventWaitNodeSetEvent_fn, 145) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecHostNodeSetParams_fn, 146) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecKernelNodeSetParams_fn, 147) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecMemcpyNodeSetParams_fn, 148) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecMemcpyNodeSetParams1D_fn, 149) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecMemcpyNodeSetParamsFromSymbol_fn, 150) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecMemcpyNodeSetParamsToSymbol_fn, 151) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecMemsetNodeSetParams_fn, 152) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecUpdate_fn, 153) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphGetEdges_fn, 154) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphGetNodes_fn, 155) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphGetRootNodes_fn, 156) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphHostNodeGetParams_fn, 157) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphHostNodeSetParams_fn, 158) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphInstantiate_fn, 159) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphInstantiateWithFlags_fn, 160) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphKernelNodeCopyAttributes_fn, 161) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphKernelNodeGetAttribute_fn, 162) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphKernelNodeGetParams_fn, 163) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphKernelNodeSetAttribute_fn, 164) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphKernelNodeSetParams_fn, 165) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphLaunch_fn, 166) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemAllocNodeGetParams_fn, 167) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemFreeNodeGetParams_fn, 168) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemcpyNodeGetParams_fn, 169) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemcpyNodeSetParams_fn, 170) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemcpyNodeSetParams1D_fn, 171) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemcpyNodeSetParamsFromSymbol_fn, 172) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemcpyNodeSetParamsToSymbol_fn, 173) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemsetNodeGetParams_fn, 174) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphMemsetNodeSetParams_fn, 
175) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphNodeFindInClone_fn, 176) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphNodeGetDependencies_fn, 177) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphNodeGetDependentNodes_fn, 178) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphNodeGetEnabled_fn, 179) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphNodeGetType_fn, 180) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphNodeSetEnabled_fn, 181) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphReleaseUserObject_fn, 182) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphRemoveDependencies_fn, 183) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphRetainUserObject_fn, 184) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphUpload_fn, 185) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsGLRegisterBuffer_fn, 186) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsGLRegisterImage_fn, 187) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsMapResources_fn, 188) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsResourceGetMappedPointer_fn, 189) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsSubResourceGetMappedArray_fn, 190) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsUnmapResources_fn, 191) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphicsUnregisterResource_fn, 192) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostAlloc_fn, 193) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostFree_fn, 194) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostGetDevicePointer_fn, 195) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostGetFlags_fn, 196) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostMalloc_fn, 197) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostRegister_fn, 198) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHostUnregister_fn, 199) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipImportExternalMemory_fn, 200) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipImportExternalSemaphore_fn, 201) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipInit_fn, 202) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, 
hipIpcCloseMemHandle_fn, 203) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipIpcGetEventHandle_fn, 204) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipIpcGetMemHandle_fn, 205) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipIpcOpenEventHandle_fn, 206) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipIpcOpenMemHandle_fn, 207) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipKernelNameRef_fn, 208) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipKernelNameRefByPtr_fn, 209) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchByPtr_fn, 210) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchCooperativeKernel_fn, 211) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchCooperativeKernelMultiDevice_fn, 212) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchHostFunc_fn, 213) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchKernel_fn, 214) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMalloc_fn, 215) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMalloc3D_fn, 216) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMalloc3DArray_fn, 217) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocArray_fn, 218) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocAsync_fn, 219) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocFromPoolAsync_fn, 220) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocHost_fn, 221) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocManaged_fn, 222) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocMipmappedArray_fn, 223) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMallocPitch_fn, 224) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemAddressFree_fn, 225) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemAddressReserve_fn, 226) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemAdvise_fn, 227) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemAllocHost_fn, 228) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemAllocPitch_fn, 229) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemCreate_fn, 230) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemExportToShareableHandle_fn, 231) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemGetAccess_fn, 232) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemGetAddressRange_fn, 233) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemGetAllocationGranularity_fn, 234) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemGetAllocationPropertiesFromHandle_fn, 235) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemGetInfo_fn, 236) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemImportFromShareableHandle_fn, 237) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemMap_fn, 238) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemMapArrayAsync_fn, 239) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolCreate_fn, 240) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolDestroy_fn, 241) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolExportPointer_fn, 242) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolExportToShareableHandle_fn, 243) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolGetAccess_fn, 244) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolGetAttribute_fn, 245) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolImportFromShareableHandle_fn, 246) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolImportPointer_fn, 247) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolSetAccess_fn, 248) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolSetAttribute_fn, 249) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPoolTrimTo_fn, 250) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPrefetchAsync_fn, 251) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemPtrGetInfo_fn, 252) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemRangeGetAttribute_fn, 253) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemRangeGetAttributes_fn, 254) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemRelease_fn, 255) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemRetainAllocationHandle_fn, 256) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemSetAccess_fn, 257) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemUnmap_fn, 258) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy_fn, 259) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2D_fn, 260) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DAsync_fn, 261) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DFromArray_fn, 262) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DFromArrayAsync_fn, 263) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DToArray_fn, 264) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DToArrayAsync_fn, 265) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy3D_fn, 266) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy3DAsync_fn, 267) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyAsync_fn, 268) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyAtoH_fn, 269) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyDtoD_fn, 270) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyDtoDAsync_fn, 271) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyDtoH_fn, 272) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyDtoHAsync_fn, 273) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyFromArray_fn, 274) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyFromSymbol_fn, 275) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyFromSymbolAsync_fn, 276) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyHtoA_fn, 277) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyHtoD_fn, 278) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyHtoDAsync_fn, 279) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyParam2D_fn, 280) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyParam2DAsync_fn, 281) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyPeer_fn, 282) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyPeerAsync_fn, 283) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyToArray_fn, 284) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyToSymbol_fn, 285) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyToSymbolAsync_fn, 286) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyWithStream_fn, 287) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset_fn, 288) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset2D_fn, 289) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset2DAsync_fn, 
290) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset3D_fn, 291) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset3DAsync_fn, 292) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetAsync_fn, 293) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetD16_fn, 294) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetD16Async_fn, 295) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetD32_fn, 296) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetD32Async_fn, 297) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetD8_fn, 298) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetD8Async_fn, 299) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMipmappedArrayCreate_fn, 300) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMipmappedArrayDestroy_fn, 301) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMipmappedArrayGetLevel_fn, 302) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleGetFunction_fn, 303) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleGetGlobal_fn, 304) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleGetTexRef_fn, 305) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleLaunchCooperativeKernel_fn, 306) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleLaunchCooperativeKernelMultiDevice_fn, 307) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleLaunchKernel_fn, 308) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleLoad_fn, 309) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleLoadData_fn, 310) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleLoadDataEx_fn, 311) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleOccupancyMaxActiveBlocksPerMultiprocessor_fn, 312) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, + hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags_fn, + 313) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleOccupancyMaxPotentialBlockSize_fn, 314) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleOccupancyMaxPotentialBlockSizeWithFlags_fn, 315) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipModuleUnload_fn, 316) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, 
hipOccupancyMaxActiveBlocksPerMultiprocessor_fn, 317) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, + hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags_fn, + 318) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipOccupancyMaxPotentialBlockSize_fn, 319) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipPeekAtLastError_fn, 320) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipPointerGetAttribute_fn, 321) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipPointerGetAttributes_fn, 322) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipPointerSetAttribute_fn, 323) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipProfilerStart_fn, 324) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipProfilerStop_fn, 325) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipRuntimeGetVersion_fn, 326) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipSetDevice_fn, 327) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipSetDeviceFlags_fn, 328) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipSetupArgument_fn, 329) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipSignalExternalSemaphoresAsync_fn, 330) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamAddCallback_fn, 331) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamAttachMemAsync_fn, 332) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamBeginCapture_fn, 333) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamCreate_fn, 334) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamCreateWithFlags_fn, 335) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamCreateWithPriority_fn, 336) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamDestroy_fn, 337) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamEndCapture_fn, 338) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetCaptureInfo_fn, 339) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetCaptureInfo_v2_fn, 340) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetDevice_fn, 341) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetFlags_fn, 342) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetPriority_fn, 343) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamIsCapturing_fn, 344) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamQuery_fn, 345) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamSynchronize_fn, 346) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamUpdateCaptureDependencies_fn, 347) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamWaitEvent_fn, 348) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamWaitValue32_fn, 349) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamWaitValue64_fn, 350) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamWriteValue32_fn, 351) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamWriteValue64_fn, 352) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexObjectCreate_fn, 353) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexObjectDestroy_fn, 354) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexObjectGetResourceDesc_fn, 355) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexObjectGetResourceViewDesc_fn, 356) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexObjectGetTextureDesc_fn, 357) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetAddress_fn, 358) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetAddressMode_fn, 359) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetFilterMode_fn, 360) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetFlags_fn, 361) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetFormat_fn, 362) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetMaxAnisotropy_fn, 363) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetMipMappedArray_fn, 364) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetMipmapFilterMode_fn, 365) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetMipmapLevelBias_fn, 366) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetMipmapLevelClamp_fn, 367) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetAddress_fn, 368) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetAddress2D_fn, 369) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetAddressMode_fn, 370) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetArray_fn, 371) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, 
hipTexRefSetBorderColor_fn, 372) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetFilterMode_fn, 373) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetFlags_fn, 374) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetFormat_fn, 375) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetMaxAnisotropy_fn, 376) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetMipmapFilterMode_fn, 377) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetMipmapLevelBias_fn, 378) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetMipmapLevelClamp_fn, 379) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefSetMipmappedArray_fn, 380) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipThreadExchangeStreamCaptureMode_fn, 381) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipUnbindTexture_fn, 382) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipUserObjectCreate_fn, 383) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipUserObjectRelease_fn, 384) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipUserObjectRetain_fn, 385) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipWaitExternalSemaphoresAsync_fn, 386) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipCreateChannelDesc_fn, 387) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtModuleLaunchKernel_fn, 388) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipHccModuleLaunchKernel_fn, 389) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy_spt_fn, 390) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyToSymbol_spt_fn, 391) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyFromSymbol_spt_fn, 392) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2D_spt_fn, 393) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DFromArray_spt_fn, 394) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy3D_spt_fn, 395) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset_spt_fn, 396) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemsetAsync_spt_fn, 397) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset2D_spt_fn, 398) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset2DAsync_spt_fn, 399) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, 
hipMemset3DAsync_spt_fn, 400) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemset3D_spt_fn, 401) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyAsync_spt_fn, 402) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy3DAsync_spt_fn, 403) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DAsync_spt_fn, 404) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyFromSymbolAsync_spt_fn, 405) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyToSymbolAsync_spt_fn, 406) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpyFromArray_spt_fn, 407) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DToArray_spt_fn, 408) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DFromArrayAsync_spt_fn, 409) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipMemcpy2DToArrayAsync_spt_fn, 410) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamQuery_spt_fn, 411) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamSynchronize_spt_fn, 412) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetPriority_spt_fn, 413) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamWaitEvent_spt_fn, 414) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetFlags_spt_fn, 415) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamAddCallback_spt_fn, 416) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipEventRecord_spt_fn, 417) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchCooperativeKernel_spt_fn, 418) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchKernel_spt_fn, 419) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphLaunch_spt_fn, 420) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamBeginCapture_spt_fn, 421) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamEndCapture_spt_fn, 422) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamIsCapturing_spt_fn, 423) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetCaptureInfo_spt_fn, 424) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamGetCaptureInfo_v2_spt_fn, 425) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipLaunchHostFunc_spt_fn, 426) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetStreamDeviceId_fn, 427) 
+ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipDrvGraphAddMemsetNode_fn, 428) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddExternalSemaphoresWaitNode_fn, 429); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddExternalSemaphoresSignalNode_fn, 430); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExternalSemaphoresSignalNodeSetParams_fn, 431); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExternalSemaphoresWaitNodeSetParams_fn, 432); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExternalSemaphoresSignalNodeGetParams_fn, 433); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExternalSemaphoresWaitNodeGetParams_fn, 434); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecExternalSemaphoresSignalNodeSetParams_fn, 435); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphExecExternalSemaphoresWaitNodeSetParams_fn, 436); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphAddNode_fn, 437); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGraphInstantiateWithParams_fn, 438); +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipExtGetLastError_fn, 439) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetBorderColor_fn, 440) +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipTexRefGetArray_fn, 441) + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 1 +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetProcAddress_fn, 442) +#endif + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 2 +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipStreamBeginCaptureToGraph_fn, 443); +#endif + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 3 +ROCP_SDK_ENFORCE_ABI(HipDispatchTable, hipGetFuncBySymbol_fn, 444); +#endif + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION == 0 +ROCP_SDK_ENFORCE_ABI_VERSIONING(HipDispatchTable, 442) +#elif HIP_RUNTIME_API_TABLE_STEP_VERSION == 1 +ROCP_SDK_ENFORCE_ABI_VERSIONING(HipDispatchTable, 443) +#elif HIP_RUNTIME_API_TABLE_STEP_VERSION == 2 +ROCP_SDK_ENFORCE_ABI_VERSIONING(HipDispatchTable, 444) +#elif HIP_RUNTIME_API_TABLE_STEP_VERSION == 3 +ROCP_SDK_ENFORCE_ABI_VERSIONING(HipDispatchTable, 445) +#elif 
HIP_RUNTIME_API_TABLE_STEP_VERSION == 4 +ROCP_SDK_ENFORCE_ABI_VERSIONING(HipDispatchTable, 459) +#endif +} // namespace hip +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/hip/details/CMakeLists.txt b/source/lib/rocprofiler-sdk/hip/details/CMakeLists.txt index fdb54985..4565f4bb 100644 --- a/source/lib/rocprofiler-sdk/hip/details/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk/hip/details/CMakeLists.txt @@ -2,7 +2,7 @@ # # set(ROCPROFILER_LIB_HIP_DETAILS_SOURCES) -set(ROCPROFILER_LIB_HIP_DETAILS_HEADERS ostream.hpp) +set(ROCPROFILER_LIB_HIP_DETAILS_HEADERS format.hpp ostream.hpp) target_sources(rocprofiler-object-library PRIVATE ${ROCPROFILER_LIB_HIP_DETAILS_SOURCES} ${ROCPROFILER_LIB_HIP_DETAILS_HEADERS}) diff --git a/source/lib/rocprofiler-sdk/hip/details/format.hpp b/source/lib/rocprofiler-sdk/hip/details/format.hpp new file mode 100644 index 00000000..9579c377 --- /dev/null +++ b/source/lib/rocprofiler-sdk/hip/details/format.hpp @@ -0,0 +1,324 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in +// all copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +// THE SOFTWARE. + +#pragma once + +#include "lib/rocprofiler-sdk/hip/details/ostream.hpp" + +#include +#include + +#include +// must be included after runtime api +#include + +#include "fmt/core.h" +#include "fmt/ranges.h" + +#define ROCP_SDK_HIP_FORMATTER(TYPE, ...) \ + template <> \ + struct formatter<TYPE> : rocprofiler::hip::details::base_formatter \ + { \ + template <typename Ctx> \ + auto format(const TYPE& v, Ctx& ctx) const \ + { \ + return fmt::format_to(ctx.out(), __VA_ARGS__); \ + } \ + }; + +#define ROCP_SDK_HIP_OSTREAM_FORMATTER(TYPE) \ + template <> \ + struct formatter<TYPE> : rocprofiler::hip::details::base_formatter \ + { \ + template <typename Ctx> \ + auto format(const TYPE& v, Ctx& ctx) const \ + { \ + auto _ss = std::stringstream{}; \ + _ss << v; \ + return fmt::format_to(ctx.out(), "{}", _ss.str()); \ + } \ + }; + +#define ROCP_SDK_HIP_FORMAT_CASE_STMT(PREFIX, SUFFIX) \ + case PREFIX##SUFFIX: return fmt::format_to(ctx.out(), #SUFFIX) + +namespace rocprofiler +{ +namespace hip +{ +namespace details +{ +struct base_formatter +{ + template <typename ParseContext> + constexpr auto parse(ParseContext& ctx) + { + return ctx.begin(); + } +}; +} // namespace details +} // namespace hip +} // namespace rocprofiler + +namespace fmt +{ +template <> +struct formatter<rocprofiler_dim3_t> : rocprofiler::hip::details::base_formatter +{ + template <typename Ctx> + auto format(const rocprofiler_dim3_t& v, Ctx& ctx) const + { + return fmt::format_to(ctx.out(), "{}z={}, y={}, x={}{}", '{', v.z, v.y, v.x, '}'); + } +}; + +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipExtent) +ROCP_SDK_HIP_OSTREAM_FORMATTER(dim3) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipPitchedPtr) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipPos) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipMemcpy3DParms) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipMemAllocNodeParams)
+ROCP_SDK_HIP_OSTREAM_FORMATTER(hipMemsetParams) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipKernelNodeParams) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipHostNodeParams) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipExternalSemaphoreSignalNodeParams) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipExternalSemaphoreWaitNodeParams) +ROCP_SDK_HIP_OSTREAM_FORMATTER(hipMemPoolProps) + +ROCP_SDK_HIP_FORMATTER(hipMemcpyNodeParams, + "{}flags={}, copyParams={}{}", + '{', + v.flags, + v.copyParams, + '}') +ROCP_SDK_HIP_FORMATTER(hipChildGraphNodeParams, + "{}graph={}{}", + '{', + static_cast(v.graph), + '}') +ROCP_SDK_HIP_FORMATTER(hipEventWaitNodeParams, + "{}event={}{}", + '{', + static_cast(v.event), + '}') +ROCP_SDK_HIP_FORMATTER(hipEventRecordNodeParams, + "{}event={}{}", + '{', + static_cast(v.event), + '}') + +ROCP_SDK_HIP_FORMATTER(hipMemFreeNodeParams, "{}dptr={}{}", '{', v.dptr, '}') +ROCP_SDK_HIP_FORMATTER(hipGraphInstantiateParams, + "{}errNode_out={}, flags={}, result_out={}, uploadStream={}{}", + '{', + static_cast(v.errNode_out), + v.flags, + v.result_out, + static_cast(v.uploadStream), + '}') +ROCP_SDK_HIP_FORMATTER(hipGraphEdgeData, + "{}from_port={}, to_port={}, type={}{}", + '{', + v.from_port, + v.to_port, + v.type, + '}') +ROCP_SDK_HIP_FORMATTER(HIP_MEMSET_NODE_PARAMS, + "{}dst={}, pitch={}, value={}, elementSize={}, width={}, height={}{}", + '{', + v.dst, + v.pitch, + v.value, + v.elementSize, + v.width, + v.height, + '}') +ROCP_SDK_HIP_FORMATTER(hipMemLocation, "{}type={}, id={}{}", '{', v.type, v.id, '}') + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(hipGraphNodeType v, Ctx& ctx) const + { + switch(v) + { + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Kernel); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Memcpy); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Memset); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Host); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Graph); + 
ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, WaitEvent); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, EventRecord); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, ExtSemaphoreSignal); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, ExtSemaphoreWait); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, MemAlloc); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, MemFree); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, MemcpyFromSymbol); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, MemcpyToSymbol); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Empty); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphNodeType, Count); + } + return fmt::format_to(ctx.out(), "Unknown"); + } +}; + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(hipGraphInstantiateResult v, Ctx& ctx) const + { + switch(v) + { + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphInstantiate, Success); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphInstantiate, Error); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphInstantiate, InvalidStructure); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphInstantiate, NodeOperationNotSupported); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipGraphInstantiate, MultipleDevicesNotSupported); + } + return fmt::format_to(ctx.out(), "Unknown"); + } +}; + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(hipMemAllocationType v, Ctx& ctx) const + { + switch(v) + { + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemAllocationType, Invalid); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemAllocationType, Pinned); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemAllocationType, Max); + } + return fmt::format_to(ctx.out(), "Unknown"); + } +}; + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(hipMemLocationType v, Ctx& ctx) const + { + switch(v) + { + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemLocationType, Invalid); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemLocationType, Device); + } + return 
fmt::format_to(ctx.out(), "Unknown"); + } +}; + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(hipMemAllocationHandleType v, Ctx& ctx) const + { + switch(v) + { + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemHandleType, None); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemHandleType, PosixFileDescriptor); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemHandleType, Win32); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemHandleType, Win32Kmt); + } + return fmt::format_to(ctx.out(), "Unknown"); + } +}; + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(hipMemcpyKind v, Ctx& ctx) const + { + switch(v) + { + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemcpy, HostToHost); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemcpy, HostToDevice); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemcpy, DeviceToHost); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemcpy, DeviceToDevice); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemcpy, Default); + ROCP_SDK_HIP_FORMAT_CASE_STMT(hipMemcpy, DeviceToDeviceNoCU); + } + return fmt::format_to(ctx.out(), "Unknown"); + } +}; + +template <> +struct formatter : rocprofiler::hip::details::base_formatter +{ + template + auto format(const hipGraphNodeParams& v, Ctx& ctx) const + { + switch(v.type) + { + case hipGraphNodeTypeKernel: + return fmt::format_to( + ctx.out(), "{}type={}, kernel={}{}", '{', v.type, v.kernel, '}'); + case hipGraphNodeTypeMemcpy: + return fmt::format_to( + ctx.out(), "{}type={}, memcpy={}{}", '{', v.type, v.memcpy, '}'); + case hipGraphNodeTypeMemset: + return fmt::format_to( + ctx.out(), "{}type={}, memset={}{}", '{', v.type, v.memset, '}'); + case hipGraphNodeTypeHost: + return fmt::format_to(ctx.out(), "{}type={}, host={}{}", '{', v.type, v.host, '}'); + case hipGraphNodeTypeGraph: + return fmt::format_to( + ctx.out(), "{}type={}, graph={}{}", '{', v.type, v.graph, '}'); + case hipGraphNodeTypeWaitEvent: + return fmt::format_to( + ctx.out(), "{}type={}, eventWait={}{}", '{', v.type, v.eventWait, 
'}'); + case hipGraphNodeTypeEventRecord: + return fmt::format_to( + ctx.out(), "{}type={}, eventRecord={}{}", '{', v.type, v.eventRecord, '}'); + case hipGraphNodeTypeExtSemaphoreSignal: + return fmt::format_to( + ctx.out(), "{}type={}, extSemSignal={}{}", '{', v.type, v.extSemSignal, '}'); + case hipGraphNodeTypeExtSemaphoreWait: + return fmt::format_to( + ctx.out(), "{}type={}, extSemWait={}{}", '{', v.type, v.extSemWait, '}'); + case hipGraphNodeTypeMemAlloc: + return fmt::format_to( + ctx.out(), "{}type={}, alloc={}{}", '{', v.type, v.alloc, '}'); + case hipGraphNodeTypeMemFree: + return fmt::format_to(ctx.out(), "{}type={}, free={}{}", '{', v.type, v.free, '}'); + case hipGraphNodeTypeMemcpyFromSymbol: + case hipGraphNodeTypeMemcpyToSymbol: + case hipGraphNodeTypeEmpty: + case hipGraphNodeTypeCount: + { + break; + } + } + return fmt::format_to(ctx.out(), "{}type={}{}", '{', v.type, '}'); + } +}; +} // namespace fmt + +#undef ROCP_SDK_HIP_FORMATTER +#undef ROCP_SDK_HIP_OSTREAM_FORMATTER +#undef ROCP_SDK_HIP_FORMAT_CASE_STMT diff --git a/source/lib/rocprofiler-sdk/hip/details/ostream.hpp b/source/lib/rocprofiler-sdk/hip/details/ostream.hpp index f7c736f8..8d30f98f 100644 --- a/source/lib/rocprofiler-sdk/hip/details/ostream.hpp +++ b/source/lib/rocprofiler-sdk/hip/details/ostream.hpp @@ -4994,7 +4994,14 @@ operator<<(std::ostream& out, const hipUUID& v) } inline static std::ostream& -operator<<(std::ostream& out, const hipDeviceProp_t& v) +operator<<(std::ostream& out, const hipDeviceProp_tR0000& v) +{ + ::rocprofiler::hip::detail::operator<<(out, v); + return out; +} + +inline static std::ostream& +operator<<(std::ostream& out, const hipDeviceProp_tR0600& v) { ::rocprofiler::hip::detail::operator<<(out, v); return out; diff --git a/source/lib/rocprofiler-sdk/hip/hip.cpp b/source/lib/rocprofiler-sdk/hip/hip.cpp index e2eee921..da38aecd 100644 --- a/source/lib/rocprofiler-sdk/hip/hip.cpp +++ b/source/lib/rocprofiler-sdk/hip/hip.cpp @@ -25,8 +25,6 @@ #include 
"lib/common/utility.hpp" #include "lib/rocprofiler-sdk/buffer.hpp" #include "lib/rocprofiler-sdk/context/context.hpp" -#include "lib/rocprofiler-sdk/hip/details/ostream.hpp" -#include "lib/rocprofiler-sdk/hip/types.hpp" #include "lib/rocprofiler-sdk/hip/utils.hpp" #include "lib/rocprofiler-sdk/registration.hpp" #include "lib/rocprofiler-sdk/tracing/tracing.hpp" @@ -222,9 +220,6 @@ hip_api_impl::functor(Args... args) return; } - ROCP_FATAL_IF(external_corr_ids.size() < (callback_contexts.size() + buffered_contexts.size())) - << "missing external correlation ids"; - auto buffer_record = common::init_public_api_struct(buffered_api_data_t{}); auto tracer_data = common::init_public_api_struct(callback_api_data_t{}); auto* corr_id = tracing::correlation_service::construct(ref_count); diff --git a/source/lib/rocprofiler-sdk/hip/hip.def.cpp b/source/lib/rocprofiler-sdk/hip/hip.def.cpp index 9e7d485e..03130d75 100644 --- a/source/lib/rocprofiler-sdk/hip/hip.def.cpp +++ b/source/lib/rocprofiler-sdk/hip/hip.def.cpp @@ -509,7 +509,49 @@ HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNT HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipStreamGetCaptureInfo_v2_spt, hipStreamGetCaptureInfo_v2_spt, hipStreamGetCaptureInfo_v2_spt_fn, stream, captureStatus_out, id_out, graph_out, dependencies_out, numDependencies_out) HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchHostFunc_spt, hipLaunchHostFunc_spt, hipLaunchHostFunc_spt_fn, stream, fn, userData) HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGetStreamDeviceId, hipGetStreamDeviceId, hipGetStreamDeviceId_fn, stream) -// HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphAddMemsetNode, hipDrvGraphAddMemsetNode, hipDrvGraphAddMemsetNode_fn, phGraphNode, hGraph, dependencies, numDependencies, memsetParams, ctx) 
+HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphAddMemsetNode, hipDrvGraphAddMemsetNode, hipDrvGraphAddMemsetNode_fn, phGraphNode, hGraph, dependencies, numDependencies, memsetParams, ctx) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphAddExternalSemaphoresWaitNode, hipGraphAddExternalSemaphoresWaitNode, hipGraphAddExternalSemaphoresWaitNode_fn, pGraphNode, graph, pDependencies, numDependencies, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphAddExternalSemaphoresSignalNode, hipGraphAddExternalSemaphoresSignalNode, hipGraphAddExternalSemaphoresSignalNode_fn, pGraphNode, graph, pDependencies, numDependencies, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresSignalNodeSetParams, hipGraphExternalSemaphoresSignalNodeSetParams, hipGraphExternalSemaphoresSignalNodeSetParams_fn, hNode, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresWaitNodeSetParams, hipGraphExternalSemaphoresWaitNodeSetParams, hipGraphExternalSemaphoresWaitNodeSetParams_fn, hNode, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresSignalNodeGetParams, hipGraphExternalSemaphoresSignalNodeGetParams, hipGraphExternalSemaphoresSignalNodeGetParams_fn, hNode, params_out) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExternalSemaphoresWaitNodeGetParams, hipGraphExternalSemaphoresWaitNodeGetParams, hipGraphExternalSemaphoresWaitNodeGetParams_fn, hNode, params_out) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecExternalSemaphoresSignalNodeSetParams, 
hipGraphExecExternalSemaphoresSignalNodeSetParams, hipGraphExecExternalSemaphoresSignalNodeSetParams_fn, hGraphExec, hNode, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecExternalSemaphoresWaitNodeSetParams, hipGraphExecExternalSemaphoresWaitNodeSetParams, hipGraphExecExternalSemaphoresWaitNodeSetParams_fn, hGraphExec, hNode, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphAddNode, hipGraphAddNode, hipGraphAddNode_fn, pGraphNode, graph, pDependencies, numDependencies, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphInstantiateWithParams, hipGraphInstantiateWithParams, hipGraphInstantiateWithParams_fn, pGraphExec, graph, instantiateParams) +HIP_API_INFO_DEFINITION_0(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipExtGetLastError, hipExtGetLastError, hipExtGetLastError_fn) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipTexRefGetBorderColor, hipTexRefGetBorderColor, hipTexRefGetBorderColor_fn, pBorderColor, texRef) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipTexRefGetArray, hipTexRefGetArray, hipTexRefGetArray_fn, pArray, texRef) + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 1 +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGetProcAddress, hipGetProcAddress, hipGetProcAddress_fn, symbol, pfn, hipVersion, flags, symbolStatus) +#endif + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 2 +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipStreamBeginCaptureToGraph, hipStreamBeginCaptureToGraph, hipStreamBeginCaptureToGraph_fn, stream, graph, dependencies, dependencyData, numDependencies, mode) +#endif + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 3 
+HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGetFuncBySymbol, hipGetFuncBySymbol, hipGetFuncBySymbol_fn, functionPtr, symbolPtr) +#endif + +#if HIP_RUNTIME_API_TABLE_STEP_VERSION >= 4 +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphAddMemFreeNode, hipDrvGraphAddMemFreeNode, hipDrvGraphAddMemFreeNode_fn, phGraphNode, hGraph, dependencies, numDependencies, dptr) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphExecMemcpyNodeSetParams, hipDrvGraphExecMemcpyNodeSetParams, hipDrvGraphExecMemcpyNodeSetParams_fn, hGraphExec, hNode, copyParams, ctx) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipDrvGraphExecMemsetNodeSetParams, hipDrvGraphExecMemsetNodeSetParams, hipDrvGraphExecMemsetNodeSetParams_fn, hGraphExec, hNode, memsetParams, ctx) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipSetValidDevices, hipSetValidDevices, hipSetValidDevices_fn, device_arr, len) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAtoD, hipMemcpyAtoD, hipMemcpyAtoD_fn, dstDevice, srcArray, srcOffset, ByteCount) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyDtoA, hipMemcpyDtoA, hipMemcpyDtoA_fn, dstArray, dstOffset, srcDevice, ByteCount) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAtoA, hipMemcpyAtoA, hipMemcpyAtoA_fn, dstArray, dstOffset, srcArray, srcOffset, ByteCount) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAtoHAsync, hipMemcpyAtoHAsync, hipMemcpyAtoHAsync_fn, dstHost, srcArray, srcOffset, ByteCount, stream) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyHtoAAsync, 
hipMemcpyHtoAAsync, hipMemcpyHtoAAsync_fn, dstArray, dstOffset, srcHost, ByteCount, stream) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpy2DArrayToArray, hipMemcpy2DArrayToArray, hipMemcpy2DArrayToArray_fn, dst, wOffsetDst, hOffsetDst, src, wOffsetSrc, hOffsetSrc, width, height, kind) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecGetFlags, hipGraphExecGetFlags, hipGraphExecGetFlags_fn, graphExec, flags) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphNodeSetParams, hipGraphNodeSetParams, hipGraphNodeSetParams_fn, node, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipGraphExecNodeSetParams, hipGraphExecNodeSetParams, hipGraphExecNodeSetParams_fn, graphExec, node, nodeParams) +HIP_API_INFO_DEFINITION_V(ROCPROFILER_HIP_TABLE_ID_Runtime, ROCPROFILER_HIP_RUNTIME_API_ID_hipExternalMemoryGetMappedMipmappedArray, hipExternalMemoryGetMappedMipmappedArray, hipExternalMemoryGetMappedMipmappedArray_fn, mipmap, extMem, mipmapDesc) +#endif // clang-format on #else diff --git a/source/lib/rocprofiler-sdk/hip/hip.hpp b/source/lib/rocprofiler-sdk/hip/hip.hpp index e74cc597..b51be748 100644 --- a/source/lib/rocprofiler-sdk/hip/hip.hpp +++ b/source/lib/rocprofiler-sdk/hip/hip.hpp @@ -25,12 +25,7 @@ #include #include - -#if HIP_VERSION_MAJOR < 6 -# include "lib/rocprofiler-sdk/hip/details/hip_api_trace.hpp" -#else -# include -#endif +#include #include #include diff --git a/source/lib/rocprofiler-sdk/hip/types.hpp b/source/lib/rocprofiler-sdk/hip/types.hpp deleted file mode 100644 index c99eba80..00000000 --- a/source/lib/rocprofiler-sdk/hip/types.hpp +++ /dev/null @@ -1,182 +0,0 @@ -// MIT License -// -// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. 
-// -// Permission is hereby granted, free of charge, to any person obtaining a copy -// of this software and associated documentation files (the "Software"), to deal -// in the Software without restriction, including without limitation the rights -// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -// copies of the Software, and to permit persons to whom the Software is -// furnished to do so, subject to the following conditions: -// -// The above copyright notice and this permission notice shall be included in -// all copies or substantial portions of the Software. -// -// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN -// THE SOFTWARE. 
- -#pragma once - -#include -#include - -#include "lib/common/defines.hpp" - -// #ifndef ROCPROFILER_UNSAFE_NO_VERSION_CHECK -// # if defined(ROCPROFILER_CI) && ROCPROFILER_CI > 0 -// # if HIP_API_TABLE_MAJOR_VERSION <= 0x01 -// namespace rocprofiler -// { -// namespace hip -// { -// static_assert(HIP_CORE_API_TABLE_MAJOR_VERSION == 0x01, -// "Change in the major version of HIP core API table"); -// static_assert(HIP_AMD_EXT_API_TABLE_MAJOR_VERSION == 0x01, -// "Change in the major version of HIP amd-extended API table"); -// static_assert(HIP_FINALIZER_API_TABLE_MAJOR_VERSION == 0x01, -// "Change in the major version of HIP finalizer API table"); -// static_assert(HIP_IMAGE_API_TABLE_MAJOR_VERSION == 0x01, -// "Change in the major version of HIP image API table"); - -// static_assert(HIP_CORE_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP core API table"); -// static_assert(HIP_AMD_EXT_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP amd-extended API table"); -// static_assert(HIP_FINALIZER_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP finalizer API table"); -// static_assert(HIP_IMAGE_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP image API table"); - -// // this should always be updated to latest table size -// template -// struct table_size; - -// // latest version of hip runtime that has been updated for support by rocprofiler -// // and the current version of hip runtime during this compilation -// constexpr size_t latest_version = ROCPROFILER_COMPUTE_VERSION(1, 11, 0); -// constexpr size_t current_version = ROCPROFILER_HIP_RUNTIME_VERSION; - -// // aliases to the template specializations providing the table size info -// using current_table_size_t = table_size; -// using latest_table_size_t = table_size; - -// // specialization for v1.9 -// template <> -// struct table_size -// { -// static constexpr size_t finalizer_ext = 64; -// static constexpr size_t 
image_ext = 120; -// static constexpr size_t amd_ext = 456; -// static constexpr size_t core_api_ext = 1016; -// }; - -// // specialization for v1.10 - increased amd_ext by 10 functions -// template <> -// struct table_size -// : table_size -// { -// static constexpr size_t amd_ext = 552; -// }; - -// // version 1.11 is same as 1.10 -// template <> -// struct table_size -// : table_size -// {}; - -// // default static asserts to check against latest version -// // e.g. v1.12 might have the same table sizes as v1.11 so -// // we don't want to fail to compile if nothing has changed -// template -// struct table_size : latest_table_size_t -// {}; - -// // if you hit these static asserts, that means HIP added entries to the table but did not update -// the -// // step numbers -// static_assert(sizeof(FinalizerExtTable) == current_table_size_t::finalizer_ext, -// "HIP finalizer API table size changed or version not supported"); -// static_assert(sizeof(ImageExtTable) == current_table_size_t::image_ext, -// "HIP image-extended API table size changed or version not supported"); -// static_assert(sizeof(AmdExtTable) == current_table_size_t::amd_ext, -// "HIP amd-extended API table size changed or version not supported"); -// static_assert(sizeof(CoreApiTable) == current_table_size_t::core_api_ext, -// "HIP core API table size changed or version not supported"); -// } // namespace hip -// } // namespace rocprofiler -// # else -// namespace rocprofiler -// { -// namespace hip -// { -// static_assert(HIP_CORE_API_TABLE_MAJOR_VERSION == 0x02, -// "Change in the major version of HIP core API table"); -// static_assert(HIP_AMD_EXT_API_TABLE_MAJOR_VERSION == 0x02, -// "Change in the major version of HIP amd-extended API table"); -// static_assert(HIP_FINALIZER_API_TABLE_MAJOR_VERSION == 0x02, -// "Change in the major version of HIP finalizer API table"); -// static_assert(HIP_IMAGE_API_TABLE_MAJOR_VERSION == 0x02, -// "Change in the major version of HIP image API table"); - -// 
static_assert(HIP_CORE_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP core API table"); -// static_assert(HIP_AMD_EXT_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP amd-extended API table"); -// static_assert(HIP_FINALIZER_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP finalizer API table"); -// static_assert(HIP_IMAGE_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP image API table"); -// static_assert(HIP_AQLPROFILE_API_TABLE_STEP_VERSION == 0x00, -// "Change in the major version of HIP aqlprofile API table"); - -// // this should always be updated to latest table size -// template -// struct table_size; - -// // latest version of hip runtime that has been updated for support by rocprofiler -// // and the current version of hip runtime during this compilation -// constexpr size_t latest_version = ROCPROFILER_COMPUTE_VERSION(1, 12, 0); -// constexpr size_t current_version = ROCPROFILER_HIP_RUNTIME_VERSION; - -// // aliases to the template specializations providing the table size info -// using current_table_size_t = table_size; -// using latest_table_size_t = table_size; - -// // specialization for v1.12 -// template <> -// struct table_size -// { -// static constexpr size_t finalizer_ext = 64; -// static constexpr size_t image_ext = 120; -// static constexpr size_t amd_ext = 552; -// static constexpr size_t core_api_ext = 1016; -// }; - -// // default static asserts to check against latest version -// // e.g. 
v1.12 might have the same table sizes as v1.11 so -// // we don't want to fail to compile if nothing has changed -// template -// struct table_size : latest_table_size_t -// {}; - -// // if you hit these static asserts, that means HIP added entries to the table but did not update -// the -// // step numbers -// static_assert(sizeof(FinalizerExtTable) == current_table_size_t::finalizer_ext, -// "HIP finalizer API table size changed or version not supported"); -// static_assert(sizeof(ImageExtTable) == current_table_size_t::image_ext, -// "HIP image-extended API table size changed or version not supported"); -// static_assert(sizeof(AmdExtTable) == current_table_size_t::amd_ext, -// "HIP amd-extended API table size changed or version not supported"); -// static_assert(sizeof(CoreApiTable) == current_table_size_t::core_api_ext, -// "HIP core API table size changed or version not supported"); -// } // namespace hip -// } // namespace rocprofiler -// # endif -// # endif -// #endif diff --git a/source/lib/rocprofiler-sdk/hip/utils.hpp b/source/lib/rocprofiler-sdk/hip/utils.hpp index fa305e2a..f733263a 100644 --- a/source/lib/rocprofiler-sdk/hip/utils.hpp +++ b/source/lib/rocprofiler-sdk/hip/utils.hpp @@ -26,6 +26,7 @@ #include "lib/common/mpl.hpp" #include "lib/common/stringize_arg.hpp" +#include "lib/rocprofiler-sdk/hip/details/format.hpp" #include "lib/rocprofiler-sdk/hip/details/ostream.hpp" #include "fmt/core.h" @@ -44,12 +45,6 @@ namespace hip { namespace utils { -inline static std::ostream& -operator<<(std::ostream& out, const hipDeviceProp_tR0000& v) -{ - return ::rocprofiler::hip::detail::operator<<(out, v); -} - template auto stringize_impl(const Tp& _v) @@ -79,22 +74,3 @@ stringize(int32_t max_deref, Args... 
args) } // namespace utils } // namespace hip } // namespace rocprofiler - -namespace fmt -{ -template <> -struct formatter -{ - template - constexpr auto parse(ParseContext& ctx) - { - return ctx.begin(); - } - - template - auto format(const rocprofiler_dim3_t& v, Ctx& ctx) const - { - return fmt::format_to(ctx.out(), "{}z={}, y={}, x={}{}", '{', v.z, v.y, v.x, '}'); - } -}; -} // namespace fmt diff --git a/source/lib/rocprofiler-sdk/hsa/aql_packet.cpp b/source/lib/rocprofiler-sdk/hsa/aql_packet.cpp index dea20a56..97628b12 100644 --- a/source/lib/rocprofiler-sdk/hsa/aql_packet.cpp +++ b/source/lib/rocprofiler-sdk/hsa/aql_packet.cpp @@ -36,40 +36,96 @@ namespace rocprofiler { namespace hsa { -CounterAQLPacket::~CounterAQLPacket() +hsa_status_t +CounterAQLPacket::CounterMemoryPool::Alloc(void** ptr, size_t size, desc_t flags, void* data) { - if(!profile.command_buffer.ptr) - { - // pass, nothing malloced - } - else if(!command_buf_mallocd) - { - CHECK_HSA(free_func(profile.command_buffer.ptr), "freeing memory"); - } - else + if(size == 0) { - ::free(profile.command_buffer.ptr); + if(ptr != nullptr) *ptr = nullptr; + return HSA_STATUS_SUCCESS; } + if(!data) return HSA_STATUS_ERROR; + auto& pool = *reinterpret_cast(data); - if(!profile.output_buffer.ptr) - { - // pass, nothing malloced - } - else if(!output_buffer_malloced) - { - CHECK_HSA(free_func(profile.output_buffer.ptr), "freeing memory"); - } + if(!pool.allocate_fn || !pool.free_fn || !pool.allow_access_fn) return HSA_STATUS_ERROR; + if(!flags.host_access || pool.kernarg_pool_.handle == 0 || !pool.fill_fn) + return HSA_STATUS_ERROR; + + hsa_status_t status; + if(!pool.bIgnoreKernArg && flags.memory_hint == AQLPROFILE_MEMORY_HINT_DEVICE_UNCACHED) + status = pool.allocate_fn(pool.kernarg_pool_, size, 0, ptr); else - { - ::free(profile.output_buffer.ptr); - } + status = pool.allocate_fn(pool.cpu_pool_, size, 0, ptr); + + if(status != HSA_STATUS_SUCCESS) return status; + + status = pool.fill_fn(*ptr, 0u, size / 
sizeof(uint32_t)); + if(status != HSA_STATUS_SUCCESS) return status; + + status = pool.allow_access_fn(1, &pool.gpu_agent, nullptr, *ptr); + return status; +} + +void +CounterAQLPacket::CounterMemoryPool::Free(void* ptr, void* data) +{ + if(ptr == nullptr) return; + + assert(data); + auto& pool = *reinterpret_cast(data); + assert(pool.free_fn); + pool.free_fn(ptr); } hsa_status_t -BaseTTAQLPacket::Alloc(void** ptr, size_t size, desc_t flags, void* data) +CounterAQLPacket::CounterMemoryPool::Copy(void* dst, const void* src, size_t size, void* data) { + if(size == 0) return HSA_STATUS_SUCCESS; if(!data) return HSA_STATUS_ERROR; - auto& pool = reinterpret_cast(data)->tracepool; + auto& pool = *reinterpret_cast(data); + + if(!pool.api_copy_fn) return HSA_STATUS_ERROR; + + return pool.api_copy_fn(dst, src, size); +} + +CounterAQLPacket::CounterAQLPacket(aqlprofile_agent_handle_t agent, + CounterAQLPacket::CounterMemoryPool _pool, + const std::vector& events) +: pool(_pool) +{ + if(events.empty()) return; + + packets.start_packet = null_amd_aql_pm4_packet; + packets.stop_packet = null_amd_aql_pm4_packet; + packets.read_packet = null_amd_aql_pm4_packet; + + aqlprofile_pmc_profile_t profile{}; + profile.agent = agent; + profile.events = events.data(); + profile.event_count = static_cast(events.size()); + + hsa_status_t status = aqlprofile_pmc_create_packets(&this->handle, + &this->packets, + profile, + &CounterMemoryPool::Alloc, + &CounterMemoryPool::Free, + &CounterMemoryPool::Copy, + reinterpret_cast(&pool)); + if(status != HSA_STATUS_SUCCESS) ROCP_FATAL << "Could not create PMC packets!"; + + auto header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; + packets.start_packet.header = header; + packets.stop_packet.header = header; + packets.read_packet.header = header; + empty = false; +} + +hsa_status_t +TraceMemoryPool::Alloc(void** ptr, size_t size, desc_t flags, void* data) +{ + if(!data) return HSA_STATUS_ERROR; + auto& pool = *reinterpret_cast(data); 
if(!pool.allocate_fn || !pool.free_fn || !pool.allow_access_fn) return HSA_STATUS_ERROR; @@ -91,19 +147,19 @@ BaseTTAQLPacket::Alloc(void** ptr, size_t size, desc_t flags, void* data) } void -BaseTTAQLPacket::Free(void* ptr, void* data) +TraceMemoryPool::Free(void* ptr, void* data) { assert(data); - auto& pool = reinterpret_cast(data)->tracepool; + auto& pool = *reinterpret_cast(data); if(pool.free_fn) pool.free_fn(ptr); } hsa_status_t -BaseTTAQLPacket::Copy(void* dst, const void* src, size_t size, void* data) +TraceMemoryPool::Copy(void* dst, const void* src, size_t size, void* data) { if(!data) return HSA_STATUS_ERROR; - auto& pool = reinterpret_cast(data)->tracepool; + auto& pool = *reinterpret_cast(data); if(!pool.api_copy_fn) return HSA_STATUS_ERROR; @@ -112,9 +168,15 @@ BaseTTAQLPacket::Copy(void* dst, const void* src, size_t size, void* data) TraceControlAQLPacket::TraceControlAQLPacket(const TraceMemoryPool& _tracepool, const aqlprofile_att_profile_t& p) -: BaseTTAQLPacket(_tracepool) +: tracepool(std::make_shared(_tracepool)) { - auto status = aqlprofile_att_create_packets(&handle, &packets, p, &Alloc, &Free, &Copy, this); + auto status = aqlprofile_att_create_packets(&tracepool->handle, + &packets, + p, + &TraceMemoryPool::Alloc, + &TraceMemoryPool::Free, + &TraceMemoryPool::Copy, + tracepool.get()); CHECK_HSA(status, "failed to create ATT packet"); packets.start_packet.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; @@ -122,14 +184,8 @@ TraceControlAQLPacket::TraceControlAQLPacket(const TraceMemoryPool& _tr packets.start_packet.completion_signal = hsa_signal_t{.handle = 0}; packets.stop_packet.completion_signal = hsa_signal_t{.handle = 0}; this->empty = false; -}; -void -TraceControlAQLPacket::populate_before() -{ - before_krn_pkt.push_back(packets.start_packet); - for(auto& [_, codeobj] : loaded_codeobj) - if(codeobj) before_krn_pkt.push_back(codeobj->packet); + clear(); }; CodeobjMarkerAQLPacket::CodeobjMarkerAQLPacket(const 
TraceMemoryPool& _tracepool, @@ -138,22 +194,29 @@ CodeobjMarkerAQLPacket::CodeobjMarkerAQLPacket(const TraceMemoryPool& _tracepool uint64_t size, bool bFromStart, bool bIsUnload) -: BaseTTAQLPacket(_tracepool) +: tracepool(_tracepool) { aqlprofile_att_codeobj_data_t codeobj{}; codeobj.id = id; codeobj.addr = addr; codeobj.size = size; - codeobj.agent = _tracepool.gpu_agent; + codeobj.agent = tracepool.gpu_agent; codeobj.isUnload = bIsUnload; codeobj.fromStart = bFromStart; - auto status = aqlprofile_att_codeobj_marker(&packet, &handle, codeobj, &Alloc, &Free, this); + auto status = aqlprofile_att_codeobj_marker(&packet, + &tracepool.handle, + codeobj, + &TraceMemoryPool::Alloc, + &TraceMemoryPool::Free, + &tracepool); CHECK_HSA(status, "failed to create ATT marker"); packet.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; packet.completion_signal = hsa_signal_t{.handle = 0}; this->empty = false; + + clear(); } } // namespace hsa diff --git a/source/lib/rocprofiler-sdk/hsa/aql_packet.hpp b/source/lib/rocprofiler-sdk/hsa/aql_packet.hpp index 8136e876..e68f8555 100644 --- a/source/lib/rocprofiler-sdk/hsa/aql_packet.hpp +++ b/source/lib/rocprofiler-sdk/hsa/aql_packet.hpp @@ -25,6 +25,8 @@ #include "lib/common/container/small_vector.hpp" #include "lib/rocprofiler-sdk/aql/aql_profile_v2.h" +#include + #include #include @@ -66,21 +68,27 @@ class AQLPacket before_krn_pkt.clear(); after_krn_pkt.clear(); } + bool isEmpty() const { return empty; } virtual void populate_before() = 0; virtual void populate_after() = 0; - aqlprofile_handle_t pkt_handle = {.handle = 0}; + aqlprofile_handle_t GetHandle() const { return handle; } + aqlprofile_handle_t handle = {.handle = 0}; + bool empty = {true}; - bool empty = {true}; - hsa_ven_amd_aqlprofile_profile_t profile = {}; - hsa_ext_amd_aql_pm4_packet_t start = null_amd_aql_pm4_packet; - hsa_ext_amd_aql_pm4_packet_t stop = null_amd_aql_pm4_packet; - hsa_ext_amd_aql_pm4_packet_t read = null_amd_aql_pm4_packet; 
common::container::small_vector before_krn_pkt = {}; common::container::small_vector after_krn_pkt = {}; +}; - bool isEmpty() const { return empty; } +class EmptyAQLPacket : public AQLPacket +{ +public: + EmptyAQLPacket() = default; + ~EmptyAQLPacket() override = default; + + void populate_before() override{}; + void populate_after() override{}; }; class CounterAQLPacket : public AQLPacket @@ -88,26 +96,52 @@ class CounterAQLPacket : public AQLPacket friend class rocprofiler::aql::CounterPacketConstruct; using memory_pool_free_func_t = decltype(::hsa_amd_memory_pool_free)*; + struct CounterMemoryPool + { + using desc_t = aqlprofile_buffer_desc_flags_t; + + hsa_agent_t gpu_agent; + hsa_amd_memory_pool_t cpu_pool_; + hsa_amd_memory_pool_t kernarg_pool_; + decltype(hsa_amd_memory_pool_allocate)* allocate_fn; + decltype(hsa_amd_agents_allow_access)* allow_access_fn; + decltype(hsa_amd_memory_pool_free)* free_fn; + decltype(hsa_amd_memory_fill)* fill_fn; + decltype(hsa_memory_copy)* api_copy_fn; + bool bIgnoreKernArg; + + static void Free(void* ptr, void* data); + static hsa_status_t Alloc(void** ptr, size_t size, desc_t flags, void* data); + static hsa_status_t Copy(void* dst, const void* src, size_t size, void* data); + }; + public: - CounterAQLPacket(memory_pool_free_func_t func) - : free_func{func} {}; - ~CounterAQLPacket() override; + CounterAQLPacket(aqlprofile_agent_handle_t agent, + CounterMemoryPool pool, + const std::vector& events); + ~CounterAQLPacket() override { aqlprofile_pmc_delete_packets(this->handle); }; - void populate_before() override { before_krn_pkt.push_back(start); }; + void populate_before() override + { + if(!empty) before_krn_pkt.push_back(packets.start_packet); + }; void populate_after() override { - after_krn_pkt.push_back(stop); - after_krn_pkt.push_back(read); + if(empty) return; + after_krn_pkt.push_back(packets.read_packet); + after_krn_pkt.push_back(packets.stop_packet); }; + aqlprofile_pmc_aql_packets_t packets{}; + protected: - bool 
command_buf_mallocd = false; - bool output_buffer_malloced = false; - memory_pool_free_func_t free_func = nullptr; + CounterMemoryPool pool{}; }; struct TraceMemoryPool { + using desc_t = aqlprofile_buffer_desc_flags_t; + hsa_agent_t gpu_agent; hsa_amd_memory_pool_t cpu_pool_; hsa_amd_memory_pool_t gpu_pool_; @@ -115,33 +149,16 @@ struct TraceMemoryPool decltype(hsa_amd_agents_allow_access)* allow_access_fn; decltype(hsa_amd_memory_pool_free)* free_fn; decltype(hsa_memory_copy)* api_copy_fn; -}; - -class BaseTTAQLPacket : public AQLPacket -{ - friend class rocprofiler::aql::ThreadTraceAQLPacketFactory; - -protected: - using desc_t = aqlprofile_buffer_desc_flags_t; - -public: - BaseTTAQLPacket(const TraceMemoryPool& _tracepool) - : tracepool(_tracepool){}; - ~BaseTTAQLPacket() override { aqlprofile_att_delete_packets(this->handle); }; - aqlprofile_handle_t GetHandle() const { return handle; } - hsa_agent_t GetAgent() const { return tracepool.gpu_agent; } - -protected: - TraceMemoryPool tracepool; aqlprofile_handle_t handle; + ~TraceMemoryPool() { aqlprofile_att_delete_packets(this->handle); }; static hsa_status_t Alloc(void** ptr, size_t size, desc_t flags, void* data); static void Free(void* ptr, void* data); static hsa_status_t Copy(void* dst, const void* src, size_t size, void* data); }; -class CodeobjMarkerAQLPacket : public BaseTTAQLPacket +class CodeobjMarkerAQLPacket : public AQLPacket { friend class rocprofiler::aql::ThreadTraceAQLPacketFactory; @@ -157,10 +174,16 @@ class CodeobjMarkerAQLPacket : public BaseTTAQLPacket void populate_before() override { before_krn_pkt.push_back(packet); }; void populate_after() override{}; + aqlprofile_handle_t GetHandle() const { return tracepool.handle; } + hsa_agent_t GetAgent() const { return tracepool.gpu_agent; } + hsa_ext_amd_aql_pm4_packet_t packet; + +protected: + TraceMemoryPool tracepool; }; -class TraceControlAQLPacket : public BaseTTAQLPacket +class TraceControlAQLPacket : public AQLPacket { friend class 
rocprofiler::aql::ThreadTraceAQLPacketFactory; using code_object_id_t = uint64_t; @@ -170,19 +193,37 @@ class TraceControlAQLPacket : public BaseTTAQLPacket const aqlprofile_att_profile_t& profile); ~TraceControlAQLPacket() override = default; + explicit TraceControlAQLPacket(const TraceControlAQLPacket& other) + : AQLPacket() + { + this->tracepool = other.tracepool; + this->packets = other.packets; + this->loaded_codeobj = other.loaded_codeobj; + } + + aqlprofile_handle_t GetHandle() const { return tracepool->handle; } + hsa_agent_t GetAgent() const { return tracepool->gpu_agent; } + + void populate_before() override + { + before_krn_pkt.push_back(packets.start_packet); + for(auto& [_, codeobj] : loaded_codeobj) + before_krn_pkt.push_back(codeobj->packet); + } + void populate_after() override { after_krn_pkt.push_back(packets.stop_packet); } + void add_codeobj(code_object_id_t id, uint64_t addr, uint64_t size) { loaded_codeobj[id] = - std::make_unique(tracepool, id, addr, size, true, false); + std::make_shared(*tracepool, id, addr, size, true, false); } - void remove_codeobj(code_object_id_t id) { loaded_codeobj.erase(id); } + bool remove_codeobj(code_object_id_t id) { return loaded_codeobj.erase(id) != 0; } - void populate_before() override; - void populate_after() override { after_krn_pkt.push_back(packets.stop_packet); } +protected: + std::shared_ptr tracepool; + aqlprofile_att_control_aql_packets_t packets; -private: - aqlprofile_att_control_aql_packets_t packets; - std::unordered_map> loaded_codeobj; + std::unordered_map> loaded_codeobj; }; } // namespace hsa diff --git a/source/lib/rocprofiler-sdk/hsa/async_copy.cpp b/source/lib/rocprofiler-sdk/hsa/async_copy.cpp index 5160c3eb..7b2d97a4 100644 --- a/source/lib/rocprofiler-sdk/hsa/async_copy.cpp +++ b/source/lib/rocprofiler-sdk/hsa/async_copy.cpp @@ -239,7 +239,7 @@ active_signals::create() if(m_signal.handle != 0) return; // function pointer may be null during unit testing - 
if(get_core_table()->hsa_signal_create_fn) + if(hsa::get_hsa_ref_count() > 0 && get_core_table()->hsa_signal_create_fn) { ROCP_HSA_TABLE_CALL(ERROR, get_core_table()->hsa_signal_create_fn(0, 0, nullptr, &m_signal)); @@ -252,7 +252,7 @@ active_signals::destroy() if(m_signal.handle == 0) return; // function pointer may be null during unit testing - if(get_core_table()->hsa_signal_destroy_fn) + if(hsa::get_hsa_ref_count() > 0 && get_core_table()->hsa_signal_destroy_fn) { ROCP_HSA_TABLE_CALL(ERROR, get_core_table()->hsa_signal_destroy_fn(m_signal)); m_signal.handle = 0; @@ -853,11 +853,19 @@ async_copy_init(hsa_api_table_t* _orig, uint64_t _tbl_instance) } void -async_copy_fini() +async_copy_sync() { if(!async_copy::get_active_signals()) return; async_copy::get_active_signals()->sync(); +} + +void +async_copy_fini() +{ + if(!async_copy::get_active_signals()) return; + + async_copy_sync(); async_copy::get_active_signals()->destroy(); } } // namespace hsa diff --git a/source/lib/rocprofiler-sdk/hsa/async_copy.hpp b/source/lib/rocprofiler-sdk/hsa/async_copy.hpp index 1d549971..782bd7e8 100644 --- a/source/lib/rocprofiler-sdk/hsa/async_copy.hpp +++ b/source/lib/rocprofiler-sdk/hsa/async_copy.hpp @@ -48,6 +48,9 @@ get_ids(); void async_copy_init(hsa_api_table_t* _orig, uint64_t _tbl_instance); +void +async_copy_sync(); + void async_copy_fini(); } // namespace hsa diff --git a/source/lib/rocprofiler-sdk/hsa/hsa.cpp b/source/lib/rocprofiler-sdk/hsa/hsa.cpp index 9381192f..1e5a6ca3 100644 --- a/source/lib/rocprofiler-sdk/hsa/hsa.cpp +++ b/source/lib/rocprofiler-sdk/hsa/hsa.cpp @@ -337,9 +337,6 @@ hsa_api_impl::functor(Args... 
args) return; } - ROCP_FATAL_IF(external_corr_ids.size() < (callback_contexts.size() + buffered_contexts.size())) - << "missing external correlation ids"; - auto buffer_record = common::init_public_api_struct(buffer_hsa_api_record_t{}); auto tracer_data = common::init_public_api_struct(callback_hsa_api_data_t{}); auto* corr_id = tracing::correlation_service::construct(ref_count); @@ -536,6 +533,31 @@ should_wrap_functor(const context::context_array_t& _contexts, return false; } +auto hsa_reference_count_value = std::atomic{0}; + +hsa_status_t +hsa_init_refcnt_impl() +{ + struct scoped_dtor + { + scoped_dtor() = default; + ~scoped_dtor() { ++hsa_reference_count_value; } + }; + auto _dtor = scoped_dtor{}; + return get_core_table()->hsa_init_fn(); +} + +hsa_status_t +hsa_shut_down_refcnt_impl() +{ + if(hsa_reference_count_value > 0) + { + --hsa_reference_count_value; + return get_core_table()->hsa_shut_down_fn(); + } + return HSA_STATUS_SUCCESS; +} + template void copy_table(Tp* _orig, uint64_t _tbl_instance, std::integral_constant) @@ -573,6 +595,20 @@ copy_table(Tp* _orig, uint64_t _tbl_instance, std::integral_constant(hsa_pc_sampling_ext_table_t* _tbl, uint6 #endif #undef INSTANTIATE_HSA_TABLE_FUNC + +int +get_hsa_ref_count() +{ + auto _val = hsa_reference_count_value.load(); + ROCP_TRACE << "hsa reference count: " << _val; + return _val; +} } // namespace hsa } // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/hsa/hsa.def.cpp b/source/lib/rocprofiler-sdk/hsa/hsa.def.cpp index 1314a9fd..5e3dd6fa 100644 --- a/source/lib/rocprofiler-sdk/hsa/hsa.def.cpp +++ b/source/lib/rocprofiler-sdk/hsa/hsa.def.cpp @@ -444,6 +444,17 @@ HSA_API_INFO_DEFINITION_V(ROCPROFILER_HSA_TABLE_ID_AmdExt, attribute, value) # endif +# if HSA_AMD_EXT_API_TABLE_STEP_VERSION >= 0x03 +HSA_API_INFO_DEFINITION_V(ROCPROFILER_HSA_TABLE_ID_AmdExt, + ROCPROFILER_HSA_AMD_EXT_API_ID_hsa_amd_vmem_address_reserve_align, + hsa_amd_vmem_address_reserve_align, + 
hsa_amd_vmem_address_reserve_align_fn, + ptr, + size, + address, + alignment, + flags) +# endif # endif #elif defined(ROCPROFILER_LIB_ROCPROFILER_HSA_ASYNC_COPY_CPP_IMPL) && \ diff --git a/source/lib/rocprofiler-sdk/hsa/hsa.hpp b/source/lib/rocprofiler-sdk/hsa/hsa.hpp index 52aa3a67..eb76a49a 100644 --- a/source/lib/rocprofiler-sdk/hsa/hsa.hpp +++ b/source/lib/rocprofiler-sdk/hsa/hsa.hpp @@ -176,5 +176,8 @@ copy_table(TableT* _orig, uint64_t _tbl_instance); template void update_table(TableT* _orig, uint64_t _tbl_instance); + +int +get_hsa_ref_count(); } // namespace hsa } // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/hsa/queue.cpp b/source/lib/rocprofiler-sdk/hsa/queue.cpp index d8d0139e..7bfaeb29 100644 --- a/source/lib/rocprofiler-sdk/hsa/queue.cpp +++ b/source/lib/rocprofiler-sdk/hsa/queue.cpp @@ -60,6 +60,16 @@ static_assert(offsetof(hsa_ext_amd_aql_pm4_packet_t, completion_signal) == offsetof(hsa_barrier_or_packet_t, completion_signal), "unexpected ABI incompatibility"); +#define ROCP_HSA_TABLE_CALL(SEVERITY, EXPR) \ + auto ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__) = (EXPR); \ + ROCP_##SEVERITY##_IF(ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__) != \ + HSA_STATUS_SUCCESS) \ + << #EXPR << " returned non-zero status code " \ + << ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__) << " :: " \ + << ::rocprofiler::hsa::get_hsa_status_string( \ + ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__)) \ + << ". 
" + namespace rocprofiler { namespace hsa @@ -281,11 +291,17 @@ WriteInterceptor(const void* packets, internal_corr_id); queue.async_started(); + + const auto original_completion_signal = original_packet.completion_signal; + const bool existing_completion_signal = (original_completion_signal.handle != 0); + const uint64_t kernel_id = code_object::get_kernel_id(original_packet.kernel_object); + // Copy kernel pkt, copy is to allow for signal to be modified rocprofiler_packet kernel_pkt = packets_arr[i]; - uint64_t kernel_id = code_object::get_kernel_id(kernel_pkt.kernel_dispatch.kernel_object); - queue.create_signal(HSA_AMD_SIGNAL_AMD_GPU_ONLY, - &kernel_pkt.ext_amd_aql_pm4.completion_signal); + // create our own signal that we can get a callback on. if there is an original completion + // signal we will create a barrier packet, assign the original completion signal that that + // barrier packet, and add it right after the kernel packet + queue.create_signal(0, &kernel_pkt.kernel_dispatch.completion_signal); // computes the "size" based on the offset of reserved_padding field constexpr auto kernel_dispatch_info_rt_size = @@ -379,23 +395,19 @@ WriteInterceptor(const void* packets, } #endif + // emplace the kernel packet transformed_packets.emplace_back(kernel_pkt); - // Make a copy of the original packet, adding its signal to a barrier - // packet and create a new signal for it to get timestamps - if(original_packet.completion_signal.handle != 0u) + // if the original completion signal exists, trigger it via a barrier packet + if(existing_completion_signal) { - hsa_barrier_and_packet_t barrier{}; + auto barrier = hsa_barrier_and_packet_t{}; barrier.header = HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE; - barrier.header |= 1 << HSA_PACKET_HEADER_BARRIER; - barrier.completion_signal = original_packet.completion_signal; + barrier.header |= (1 << HSA_PACKET_HEADER_BARRIER); + barrier.completion_signal = original_completion_signal; 
transformed_packets.emplace_back(barrier); } - hsa_signal_t interrupt_signal{}; - // Adding a barrier packet with the original packet's completion signal. - queue.create_signal(0, &interrupt_signal); - bool injected_end_pkt = false; for(const auto& pkt_injection : inst_pkt) { @@ -406,20 +418,20 @@ WriteInterceptor(const void* packets, } } + auto completion_signal = hsa_signal_t{.handle = 0}; + auto interrupt_signal = hsa_signal_t{.handle = 0}; if(injected_end_pkt) { + // Adding a barrier packet with the original packet's completion signal. + queue.create_signal(0, &interrupt_signal); + completion_signal = interrupt_signal; transformed_packets.back().ext_amd_aql_pm4.completion_signal = interrupt_signal; CreateBarrierPacket(&interrupt_signal, &interrupt_signal, transformed_packets); } else { - get_core_table()->hsa_signal_store_screlease_fn(interrupt_signal, 0); - hsa_barrier_and_packet_t barrier{}; - barrier.header = HSA_PACKET_TYPE_BARRIER_AND << HSA_PACKET_HEADER_TYPE; - barrier.header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCACQUIRE_FENCE_SCOPE; - barrier.header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_SCRELEASE_FENCE_SCOPE; - barrier.completion_signal = interrupt_signal; - transformed_packets.emplace_back(barrier); + completion_signal = kernel_pkt.kernel_dispatch.completion_signal; + get_core_table()->hsa_signal_store_screlease_fn(completion_signal, 0); } ROCP_FATAL_IF(packet_type != HSA_PACKET_TYPE_KERNEL_DISPATCH) @@ -428,7 +440,7 @@ WriteInterceptor(const void* packets, // Enqueue the signal into the handler. Will call completed_cb when // signal completes. 
queue.signal_async_handler( - interrupt_signal, + completion_signal, new Queue::queue_info_session_t{.queue = queue, .inst_pkt = std::move(inst_pkt), .interrupt_signal = interrupt_signal, @@ -479,24 +491,46 @@ Queue::Queue(const AgentCache& agent, , _ext_api(ext_api) , _agent(agent) { - LOG_IF(FATAL, - _ext_api.hsa_amd_queue_intercept_create_fn(_agent.get_hsa_agent(), - size, - type, - callback, - data, - private_segment_size, - group_segment_size, - &_intercept_queue) != HSA_STATUS_SUCCESS) + ROCP_HSA_TABLE_CALL(FATAL, + _ext_api.hsa_amd_queue_intercept_create_fn(_agent.get_hsa_agent(), + size, + type, + callback, + data, + private_segment_size, + group_segment_size, + &_intercept_queue)) << "Could not create intercept queue"; - LOG_IF(FATAL, - _ext_api.hsa_amd_profiling_set_profiler_enabled_fn(_intercept_queue, true) != - HSA_STATUS_SUCCESS) + ROCP_HSA_TABLE_CALL(FATAL, + _ext_api.hsa_amd_profiling_set_profiler_enabled_fn(_intercept_queue, true)) << "Could not setup intercept profiler"; - LOG_IF(FATAL, - _ext_api.hsa_amd_queue_intercept_register_fn(_intercept_queue, WriteInterceptor, this)) + CHECK(_agent.cpu_pool().handle != 0); + CHECK(_agent.get_hsa_agent().handle != 0); + // Set state of the queue to allow profiling + aql::set_profiler_active_on_queue( + _agent.cpu_pool(), _agent.get_hsa_agent(), [&](hsa::rocprofiler_packet pkt) { + hsa_signal_t completion; + create_signal(0, &completion); + pkt.ext_amd_aql_pm4.completion_signal = completion; + counters::submitPacket(_intercept_queue, &pkt); + constexpr auto timeout_hint = + std::chrono::duration_cast(std::chrono::seconds{1}); + if(core_api.hsa_signal_wait_relaxed_fn(completion, + HSA_SIGNAL_CONDITION_EQ, + 0, + timeout_hint.count(), + HSA_WAIT_STATE_ACTIVE) != 0) + { + ROCP_FATAL << "Could not set agent to be profiled"; + } + core_api.hsa_signal_destroy_fn(completion); + }); + + ROCP_HSA_TABLE_CALL( + FATAL, + _ext_api.hsa_amd_queue_intercept_register_fn(_intercept_queue, WriteInterceptor, this)) << "Could 
not register interceptor"; create_signal(0, &ready_signal); @@ -522,9 +556,10 @@ Queue::signal_async_handler(const hsa_signal_t& signal, Queue::queue_info_sessio }); #endif hsa_status_t status = _ext_api.hsa_amd_signal_async_handler_fn( - signal, HSA_SIGNAL_CONDITION_EQ, -1, AsyncSignalHandler, static_cast(data)); + signal, HSA_SIGNAL_CONDITION_EQ, -1, AsyncSignalHandler, data); ROCP_FATAL_IF(status != HSA_STATUS_SUCCESS && status != HSA_STATUS_INFO_BREAK) - << "Error: hsa_amd_signal_async_handler failed"; + << "Error: hsa_amd_signal_async_handler failed with error code " << status + << " :: " << hsa::get_hsa_status_string(status); } void @@ -532,7 +567,8 @@ Queue::create_signal(uint32_t attribute, hsa_signal_t* signal) const { hsa_status_t status = _ext_api.hsa_amd_signal_create_fn(1, 0, nullptr, attribute, signal); ROCP_FATAL_IF(status != HSA_STATUS_SUCCESS && status != HSA_STATUS_INFO_BREAK) - << "Error: hsa_amd_signal_create failed"; + << "Error: hsa_amd_signal_create failed with error code " << status + << " :: " << hsa::get_hsa_status_string(status); } void @@ -541,7 +577,7 @@ Queue::sync() const if(_active_kernels.handle != 0u) { _core_api.hsa_signal_wait_relaxed_fn( - _active_kernels, HSA_SIGNAL_CONDITION_EQ, 0, -1, HSA_WAIT_STATE_ACTIVE); + _active_kernels, HSA_SIGNAL_CONDITION_EQ, 0, UINT64_MAX, HSA_WAIT_STATE_ACTIVE); } } diff --git a/source/lib/rocprofiler-sdk/hsa/queue_controller.cpp b/source/lib/rocprofiler-sdk/hsa/queue_controller.cpp index b966f9e6..3398b97a 100644 --- a/source/lib/rocprofiler-sdk/hsa/queue_controller.cpp +++ b/source/lib/rocprofiler-sdk/hsa/queue_controller.cpp @@ -137,14 +137,19 @@ constexpr rocprofiler_agent_t default_agent = .product_name = nullptr, .model_name = nullptr, .node_id = 0, - .logical_node_id = 0}; + .logical_node_id = 0, + .logical_node_type_id = 0, + .reserved_padding0 = 0}; } // namespace void QueueController::add_queue(hsa_queue_t* id, std::unique_ptr queue) { - for(auto& pre_initialize_fn : pre_initialize) - 
pre_initialize_fn(queue->get_agent(), get_core_table(), get_ext_table()); + for(const auto& itr : context::get_registered_contexts()) + { + if(itr->thread_trace) + itr->thread_trace->resource_init(queue->get_agent(), get_core_table(), get_ext_table()); + } CHECK(queue); _callback_cache.wlock([&](auto& callbacks) { @@ -167,11 +172,16 @@ void QueueController::destroy_queue(hsa_queue_t* id) { if(!id) return; - _queues.wlock([&](auto& map) { - for(auto& deinitialize_fn : pre_deinitialize) + + for(const auto& itr : context::get_registered_contexts()) + { + if(!itr->thread_trace) continue; + + _queues.wlock([&](auto& map) { if(map.find(id) != map.end()) - deinitialize_fn(map.at(id)->get_agent(), get_core_table(), get_ext_table()); - }); + itr->thread_trace->resource_deinit(map.at(id)->get_agent()); + }); + } const auto* queue = get_queue(*id); @@ -254,10 +264,9 @@ QueueController::init(CoreApiTable& core_table, AmdExtTable& ext_table) auto enable_intercepter = false; for(const auto& itr : context::get_registered_contexts()) { - constexpr auto expected_context_size = 200UL; + constexpr auto expected_context_size = 208UL; static_assert( - sizeof(context::context) == - expected_context_size + sizeof(std::shared_ptr), + sizeof(context::context) == expected_context_size, "If you added a new field to context struct, make sure there is a check here if it " "requires queue interception. 
Once you have done so, increment expected_context_size"); @@ -267,26 +276,12 @@ QueueController::init(CoreApiTable& core_table, AmdExtTable& ext_table) (itr->buffered_tracer && itr->buffered_tracer->domains(ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH)); - if(itr->counter_collection || itr->pc_sampler || has_kernel_tracing) + if(itr->counter_collection || itr->pc_sampler || has_kernel_tracing || + itr->agent_counter_collection || itr->thread_trace) { enable_intercepter = true; break; } - else if(itr->thread_trace) - { - enable_intercepter = true; - std::weak_ptr trace = itr->thread_trace; - - // TODO: Make it wrapper on HSA initialization - pre_initialize.emplace_back( - [trace](const AgentCache& cache, const CoreApiTable& core, const AmdExtTable& ext) { - if(auto locked = trace.lock()) locked->resource_init(cache, core, ext); - }); - pre_deinitialize.emplace_back( - [trace](const AgentCache& cache, const CoreApiTable&, const AmdExtTable&) { - if(auto locked = trace.lock()) locked->resource_deinit(cache); - }); - } } if(enable_intercepter) @@ -395,6 +390,13 @@ queue_controller_init(HsaApiTable* table) CHECK_NOTNULL(get_queue_controller())->init(*table->core_, *table->amd_ext_); } +void +queue_controller_sync() +{ + if(get_queue_controller()) + get_queue_controller()->iterate_queues([](const Queue* _queue) { _queue->sync(); }); +} + void queue_controller_fini() { diff --git a/source/lib/rocprofiler-sdk/hsa/queue_controller.hpp b/source/lib/rocprofiler-sdk/hsa/queue_controller.hpp index d7193904..eb0c7423 100644 --- a/source/lib/rocprofiler-sdk/hsa/queue_controller.hpp +++ b/source/lib/rocprofiler-sdk/hsa/queue_controller.hpp @@ -103,7 +103,6 @@ class QueueController private: using client_id_map_t = std::unordered_map; using agent_cache_map_t = std::unordered_map; - using resource_alloc_t = void(const AgentCache&, const CoreApiTable&, const AmdExtTable&); CoreApiTable _core_table = {}; AmdExtTable _ext_table = {}; @@ -111,9 +110,6 @@ class QueueController 
common::Synchronized _callback_cache = {}; agent_cache_map_t _supported_agents = {}; common::Synchronized _profiler_serializer; - - std::vector> pre_initialize; - std::vector> pre_deinitialize; }; QueueController* @@ -125,6 +121,9 @@ queue_controller_init(HsaApiTable* table); void queue_controller_fini(); +void +queue_controller_sync(); + void profiler_serializer_kernel_completion_signal(hsa_signal_t queue_block_signal); diff --git a/source/lib/rocprofiler-sdk/hsa/types.hpp b/source/lib/rocprofiler-sdk/hsa/types.hpp index aa38b140..7994393f 100644 --- a/source/lib/rocprofiler-sdk/hsa/types.hpp +++ b/source/lib/rocprofiler-sdk/hsa/types.hpp @@ -134,7 +134,7 @@ struct table_size; // latest version of hsa runtime that has been updated for support by rocprofiler // and the current version of hsa runtime during this compilation -constexpr size_t latest_version = ROCPROFILER_COMPUTE_VERSION(1, 13, 0); +constexpr size_t latest_version = ROCPROFILER_COMPUTE_VERSION(1, 14, 0); constexpr size_t current_version = ROCPROFILER_HSA_RUNTIME_VERSION; // aliases to the template specializations providing the table size info @@ -173,8 +173,25 @@ struct table_size static constexpr size_t amd_ext = 552; # elif HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x1 static constexpr size_t amd_ext = 560; -# else +# elif HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x2 static constexpr size_t amd_ext = 568; +# elif HSA_AMD_EXT_API_TABLE_STEP_VERSION > 0x2 + static constexpr size_t amd_ext = 576; +# endif +}; + +// specialization for v1.14 +template <> +struct table_size +{ + static constexpr size_t finalizer_ext = 64; + static constexpr size_t image_ext = 120; + static constexpr size_t core_api_ext = 1016; + static constexpr size_t amd_tool = 64; +# if HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x2 + static constexpr size_t amd_ext = 568; +# elif HSA_AMD_EXT_API_TABLE_STEP_VERSION > 0x2 + static constexpr size_t amd_ext = 576; # endif }; diff --git a/source/lib/rocprofiler-sdk/marker/marker.cpp 
b/source/lib/rocprofiler-sdk/marker/marker.cpp index 6a6de2bc..b6efa826 100644 --- a/source/lib/rocprofiler-sdk/marker/marker.cpp +++ b/source/lib/rocprofiler-sdk/marker/marker.cpp @@ -165,9 +165,6 @@ roctx_api_impl::functor(Args... args) return; } - ROCP_FATAL_IF(external_corr_ids.size() < (callback_contexts.size() + buffered_contexts.size())) - << "missing external correlation ids"; - auto ref_count = 2; auto buffer_record = common::init_public_api_struct(buffered_api_data_t{}); auto tracer_data = common::init_public_api_struct(callback_api_data_t{}); diff --git a/source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp b/source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp index 580833d6..d45b3c86 100644 --- a/source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp +++ b/source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp @@ -42,21 +42,21 @@ namespace pc_sampling { namespace ioctl { -// forward declaration -rocprofiler_ioctl_version_info_t& -get_ioctl_version(); - -// IOCTL 1.17 is the first one supporting PC sampling. -#define CHECK_IOCTL_VERSION \ - do \ - { \ - auto ioctl_version = get_ioctl_version(); \ - if(ioctl_version.major_version < 1 || ioctl_version.minor_version < 17) \ - { \ - LOG(ERROR) << "PC sampling unavailable\n"; \ - return ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL; \ - } \ - } while(0) +namespace +{ +#define PC_SAMPLING_IOCTL_BITMASK 0xFFFF + +/** + * @brief Used to determine the version of PC sampling + * IOCTL implementation in the driver. 
+ * + * @todo Remove this once the KFD IOCTL is upstreamed + */ +struct pc_sampling_ioctl_version_t +{ + uint32_t major_version; /// PC sampling IOCTL major version + uint32_t minor_version; /// PC sampling IOCTL minor version +}; int kfd_open() @@ -106,30 +106,136 @@ ioctl(int fd, unsigned long request, void* arg) } // More or less taken from the HsaKmt -rocprofiler_ioctl_version_info_t -query_ioctl_version(void) -{ - rocprofiler_ioctl_version_info_t ioctl_version; - ioctl_version.minor_version = 0; - ioctl_version.major_version = 0; - // If querying the IOCTL version fails, return major_version/minor_version = 0; +/** + * @brief Query KFD IOCTL version. + * + */ +rocprofiler_status_t +get_ioctl_version(rocprofiler_ioctl_version_info_t& ioctl_version) +{ struct kfd_ioctl_get_version_args args = {.major_version = 0, .minor_version = 0}; + if(ioctl(get_kfd_fd(), AMDKFD_IOC_GET_VERSION, &args) != 0) + { + // An error occurred while querying KFD IOCTL version. + return ROCPROFILER_STATUS_ERROR; + } + + // Extract KFD IOCTL version + ioctl_version.major_version = args.major_version; + ioctl_version.minor_version = args.minor_version; + return ROCPROFILER_STATUS_SUCCESS; +} - if(ioctl(get_kfd_fd(), AMDKFD_IOC_GET_VERSION, &args) == 0) +/** + * @brief KFD IOCTL PC Sampling API version is provided via + * the `kfd_ioctl_pc_sample_args.version` field by + * @ref ::KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES IOCTL function. + * The latter function requires @p kfd_gpu_id. + * This mechanism is used for internal versioning of the PC sampling + * implementation. + * + * @todo: Remove once KFD IOCTL is upstreamed. + * + * @param[in] kfd_gpu_id - KFD GPU identifier + * @param[out] pcs_ioctl_version - The PC sampling IOCTL version.
Invalid if + * the return value is different than ::ROCPROFILER_STATUS_SUCCESS + * @return ::rocprofiler_status_t + */ +rocprofiler_status_t +get_pc_sampling_ioctl_version(uint32_t kfd_gpu_id, pc_sampling_ioctl_version_t& pcs_ioctl_version) +{ + struct kfd_ioctl_pc_sample_args args; + args.op = KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES; + args.gpu_id = kfd_gpu_id; + args.sample_info_ptr = 0; + args.num_sample_info = 0; + args.flags = 0; + args.version = 0; + + auto ret = ioctl(get_kfd_fd(), AMDKFD_IOC_PC_SAMPLE, &args); + + if(ret == -EBUSY) + { + // The ROCProfiler-SDK is used inside the ROCgdb. + // The `KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES` is not executed, + // so the value of the args.version is irrelevant. + // Report that PC sampling cannot be used from within the ROCgdb. + return ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE; + } + else if(ret == -EOPNOTSUPP) + { + // The GPU does not support PC sampling. + return ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE; + } + else if(ret != 0) { - ioctl_version.major_version = args.major_version; - ioctl_version.minor_version = args.minor_version; + // An unexpected error occurred, so we cannot be sure if the + // content of the `version` field is valid. + return ROCPROFILER_STATUS_ERROR; } - return ioctl_version; + // `version` field contains PC Sampling IOCTL version + auto version = args.version; + // Lower 16 bits represent minor version + pcs_ioctl_version.minor_version = version & PC_SAMPLING_IOCTL_BITMASK; + // Upper 16 bits represent major version + pcs_ioctl_version.major_version = (version >> 16) & PC_SAMPLING_IOCTL_BITMASK; + + return ROCPROFILER_STATUS_SUCCESS; } -rocprofiler_ioctl_version_info_t& -get_ioctl_version() +/** + * @brief Check if PC sampling is supported on the device with @p kfd_gpu_id. + * + * Starting from KFD IOCTL 1.16, KFD delivers a beta implementation of PC sampling. + * Furthermore, ROCProfiler-SDK expects PC sampling IOCTL 0.1 version.
+ * @todo: Once KFD is upstreamed, ROCProfiler-SDK will rely only on the KFD IOCTL version. + * + * @return ::rocprofiler_status_t + * @retval ::ROCPROFILER_STATUS_SUCCESS PC sampling is supported in the driver. + * Other values inform users of the reason why PC sampling is not supported. + */ +rocprofiler_status_t +is_pc_sampling_supported(uint32_t kfd_gpu_id) { - static auto v = query_ioctl_version(); - return v; + // Verify KFD 1.16 version + rocprofiler_ioctl_version_info_t ioctl_version = {.major_version = 0, .minor_version = 0}; + auto status = get_ioctl_version(ioctl_version); + if(status != ROCPROFILER_STATUS_SUCCESS) + return status; + else if(ioctl_version.major_version < 1 || ioctl_version.minor_version < 16) + { + // The KFD IOCTL version is the same for all available devices. + // Thus, emit the message and skip all tests and samples on the system in use. + ROCP_ERROR << "PC sampling unavailable\n"; + return ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL; + } + + // TODO: remove once KFD is upstreamed + // Verify PC sampling IOCTL 0.1 version + pc_sampling_ioctl_version_t pcs_ioctl_version = {.major_version = 0, .minor_version = 0}; + status = get_pc_sampling_ioctl_version(kfd_gpu_id, pcs_ioctl_version); + if(status != ROCPROFILER_STATUS_SUCCESS) + { + // The reason for not emitting the "PC sampling unavailable" message is the following. + // Assume that all devices except one support PC sampling on the system. + // By emitting the message for that one device that doesn't support PC sampling, + // all tests and samples are skipped. Instead, tests and samples will ignore + // that one problematic device and continue using PC sampling on other devices + // that support this feature. + return status; + } + else if(pcs_ioctl_version.major_version < 1 && pcs_ioctl_version.minor_version < 1) + { + // The PC sampling IOCTL version is the same for all available devices. + // Thus, emit the message and skip all tests and samples on the system in use.
+ ROCP_ERROR << "PC sampling unavailable\n"; + return ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL; + } + + // PC sampling is supported on the device with `kfd_gpu_id`. + return ROCPROFILER_STATUS_SUCCESS; } /** @@ -231,12 +337,13 @@ convert_ioctl_pcs_config_to_rocp(const rocprofiler_ioctl_pc_sampling_info_t& ioc return ROCPROFILER_STATUS_SUCCESS; } +} // namespace rocprofiler_status_t ioctl_query_pcs_configs(const rocprofiler_agent_t* agent, rocp_pcs_cfgs_vec_t& rocp_configs) { - // Assert the IOCTL version - CHECK_IOCTL_VERSION; + if(auto status = is_pc_sampling_supported(agent->gpu_id); status != ROCPROFILER_STATUS_SUCCESS) + return status; uint32_t kfd_gpu_id = agent->gpu_id; @@ -337,8 +444,8 @@ ioctl_pcs_create(const rocprofiler_agent_t* agent, uint64_t interval, uint32_t* ioctl_pcs_id) { - // Assert the IOCTL version - CHECK_IOCTL_VERSION; + if(auto status = is_pc_sampling_supported(agent->gpu_id); status != ROCPROFILER_STATUS_SUCCESS) + return status; rocprofiler_ioctl_pc_sampling_info_t ioctl_cfg; auto ret = create_ioctl_pcs_config_from_rocp(ioctl_cfg, method, unit, interval); diff --git a/source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp b/source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp index 0add87bb..c697bef5 100644 --- a/source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp +++ b/source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp @@ -198,9 +198,9 @@ add_upcoming_samples(const device_handle device, samples[p].correlation_id = corr_map->get(device, trap); } catch(std::exception& e) { - samples[p].correlation_id = {.internal = ROCPROFILER_CORRELATION_ID_VALUE_NONE, + samples[p].correlation_id = {.internal = ROCPROFILER_CORRELATION_ID_INTERNAL_NONE, .external = rocprofiler_user_data_t{ - .value = ROCPROFILER_CORRELATION_ID_VALUE_NONE}}; + .value = ROCPROFILER_CORRELATION_ID_INTERNAL_NONE}}; status = PCSAMPLE_STATUS_PARSER_ERROR; } } diff --git 
a/source/lib/rocprofiler-sdk/pc_sampling/parser/pc_record_interface.cpp b/source/lib/rocprofiler-sdk/pc_sampling/parser/pc_record_interface.cpp index abaaac84..278866ac 100644 --- a/source/lib/rocprofiler-sdk/pc_sampling/parser/pc_record_interface.cpp +++ b/source/lib/rocprofiler-sdk/pc_sampling/parser/pc_record_interface.cpp @@ -114,4 +114,4 @@ PCSamplingParserContext::generate_upcoming_pc_record( buff->emplace(ROCPROFILER_BUFFER_CATEGORY_PC_SAMPLING, ROCPROFILER_PC_SAMPLING_RECORD_SAMPLE, samples[i]); -}; \ No newline at end of file +} diff --git a/source/lib/rocprofiler-sdk/pc_sampling/service.cpp b/source/lib/rocprofiler-sdk/pc_sampling/service.cpp index 1e303749..8e16fde6 100644 --- a/source/lib/rocprofiler-sdk/pc_sampling/service.cpp +++ b/source/lib/rocprofiler-sdk/pc_sampling/service.cpp @@ -162,6 +162,20 @@ configure_pc_sampling_service(context::context* ctx, uint64_t interval, rocprofiler_buffer_id_t buffer_id) { + // FIXME: PC Sampling cannot be used simultaneously with counter collection. + // PC sampling requires clock gating to be disabled on MI2xx and MI3xx, + // otherwise a weird GPU hang might appear and a machine must be rebooted. + // Current implementation of (dispatch) counter collection service assumes disabling + // the clock gating before dispatching a kernel and reenabling the clock gating + // after kernel completion. Consequently, if PC sampling is active, (dispatch) + // counter collection service can enable clock gating and hang might appear. + // As a workaround, PC sampling and (dispatch) counter collection service + // cannot coexist in the same context. 
+ if(ctx->counter_collection || ctx->agent_counter_collection) + { + return ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT; + } + if(!ctx->pc_sampler) { ctx->pc_sampler = std::make_unique(); diff --git a/source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt b/source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt index c78649f2..a4d973ed 100644 --- a/source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt @@ -5,7 +5,7 @@ include(GoogleTest) set(ROCPROFILER_LIB_PC_SAMPLING_TEST_SOURCES configure_service.cpp cid_manager.cpp # samples_processing.cpp - query_configuration.cpp) + pc_sampling_vs_counter_collection.cpp query_configuration.cpp) set(ROCPROFILER_LIB_PC_SAMPLING_TEST_HEADERS pc_sampling_internals.hpp) add_executable(pcs-test) diff --git a/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_internals.hpp b/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_internals.hpp index a752c249..d7e532c1 100644 --- a/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_internals.hpp +++ b/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_internals.hpp @@ -60,4 +60,4 @@ get_active_pc_sampling_service(); } // namespace hsa } // namespace pc_sampling -} // namespace rocprofiler \ No newline at end of file +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_vs_counter_collection.cpp b/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_vs_counter_collection.cpp new file mode 100644 index 00000000..79b8610d --- /dev/null +++ b/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_vs_counter_collection.cpp @@ -0,0 +1,514 @@ +// MIT License +// +// Copyright (c) 2023 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the 
rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include "lib/common/utility.hpp" + +#include +#include +#include +#include +#include + +#include + +#include +#include + +namespace +{ +constexpr size_t BUFFER_SIZE_BYTES = 8192; +constexpr size_t WATERMARK = (BUFFER_SIZE_BYTES / 4); + +#define ROCPROFILER_CALL(ARG, MSG) \ + { \ + auto _status = (ARG); \ + EXPECT_EQ(_status, ROCPROFILER_STATUS_SUCCESS) << MSG << " :: " << #ARG; \ + } + +using cc_setup_fn_t = std::function; + +struct callback_data +{ + rocprofiler_client_id_t* client_id = nullptr; + rocprofiler_client_finalize_t client_fini_func = nullptr; + rocprofiler_context_id_t client_ctx = {}; + rocprofiler_buffer_id_t client_buffer = {}; + rocprofiler_callback_thread_t client_thread = {}; + uint64_t client_workflow_count = {}; + uint64_t client_callback_count = {}; + int64_t current_depth = 0; + int64_t max_depth = 0; + std::map client_correlation = {}; + std::vector gpu_pcs_agents = {}; + cc_setup_fn_t cc_setup_fn = {}; +}; + +struct agent_data +{ + uint64_t agent_count = 0; + std::vector agents = {}; +}; + +// =========================== Functions related to the PC sampling service + +bool 
+is_pc_sampling_supported(rocprofiler_agent_id_t agent_id) +{ + auto cb = [](const rocprofiler_pc_sampling_configuration_t* configs, + size_t num_config, + void* user_data) { + auto* avail_configs = + static_cast*>(user_data); + // printf("The agent with the id: %lu supports the %lu configurations: \n", + // agent_id_.handle, num_config); + for(size_t i = 0; i < num_config; i++) + { + avail_configs->emplace_back(configs[i]); + } + return ROCPROFILER_STATUS_SUCCESS; + }; + + std::vector configs; + auto status = rocprofiler_query_pc_sampling_agent_configurations(agent_id, cb, &configs); + + if(status != ROCPROFILER_STATUS_SUCCESS) + { + // PC sampling is not supported + return false; + } + else if(configs.size() > 0) + { + return true; + } + else + { + return false; + } +} + +rocprofiler_status_t +find_all_gpu_agents_supporting_pc_sampling_impl(rocprofiler_agent_version_t version, + const void** agents, + size_t num_agents, + void* user_data) +{ + EXPECT_EQ(version, ROCPROFILER_AGENT_INFO_VERSION_0); + + // user_data represent the pointer to the array where gpu_agent will be stored + if(!user_data) return ROCPROFILER_STATUS_ERROR; + + auto* _out_agents = static_cast*>(user_data); + auto* _agents = reinterpret_cast(agents); + for(size_t i = 0; i < num_agents; i++) + { + if(_agents[i]->type == ROCPROFILER_AGENT_TYPE_GPU) + { + if(is_pc_sampling_supported(_agents[i]->id)) _out_agents->push_back(_agents[i]); + + printf("[%s] %s :: id=%zu, type=%i\n", + __FUNCTION__, + _agents[i]->name, + _agents[i]->id.handle, + _agents[i]->type); + } + else + { + printf("[%s] %s :: id=%zu, type=%i\n", + __FUNCTION__, + _agents[i]->name, + _agents[i]->id.handle, + _agents[i]->type); + } + } + + return ROCPROFILER_STATUS_SUCCESS; +} + +const rocprofiler_pc_sampling_configuration_t +extract_pc_sampling_config_prefer_stochastic(rocprofiler_agent_id_t agent_id) +{ + auto cb = [](const rocprofiler_pc_sampling_configuration_t* configs, + size_t num_config, + void* user_data) { + auto* 
avail_configs = + static_cast*>(user_data); + // printf("The agent with the id: %lu supports the %lu configurations: \n", + // agent_id_.handle, num_config); + for(size_t i = 0; i < num_config; i++) + { + avail_configs->emplace_back(configs[i]); + } + return ROCPROFILER_STATUS_SUCCESS; + }; + std::vector configs; + ROCPROFILER_CALL(rocprofiler_query_pc_sampling_agent_configurations(agent_id, cb, &configs), + "Failed to query available configurations"); + + const rocprofiler_pc_sampling_configuration_t* first_host_trap_config = nullptr; + const rocprofiler_pc_sampling_configuration_t* first_stochastic_config = nullptr; + // Search until encountering on the stochastic configuration, if any. + // Otherwise, use the host trap config + for(auto const& cfg : configs) + { + if(cfg.method == ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC) + { + first_stochastic_config = &cfg; + break; + } + else if(!first_host_trap_config && cfg.method == ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP) + { + first_host_trap_config = &cfg; + } + } + + // Check if the stochastic config is found. Use host trap config otherwise. + const rocprofiler_pc_sampling_configuration_t* picked_cfg = + (first_stochastic_config != nullptr) ? 
first_stochastic_config : first_host_trap_config; + + return *picked_cfg; +} + +void +rocprofiler_pc_sampling_callback(rocprofiler_context_id_t /*context_id*/, + rocprofiler_buffer_id_t /*buffer_id*/, + rocprofiler_record_header_t** /*headers*/, + size_t /*num_headers*/, + void* /*data*/, + uint64_t /*drop_count*/) +{} + +// =================== Functions related to the counter collection service +void +record_callback(rocprofiler_profile_counting_dispatch_data_t /*dispatch_data*/, + rocprofiler_record_counter_t* /*record_data*/, + size_t /*record_count*/, + rocprofiler_user_data_t /*user_data*/, + void* /*callback_data_args*/) +{} + +void +dispatch_callback(rocprofiler_profile_counting_dispatch_data_t /*dispatch_data*/, + rocprofiler_profile_config_id_t* /*config*/, + rocprofiler_user_data_t* /*user_data*/, + void* /*callback_data_args*/) +{} + +void +set_profile(rocprofiler_context_id_t /*context_id*/, + rocprofiler_agent_id_t /*agent*/, + rocprofiler_agent_set_profile_callback_t /*set_config*/, + void*) +{} + +void +rocprofiler_counter_collection_callback(rocprofiler_context_id_t /*context_id*/, + rocprofiler_buffer_id_t /*buffer_id*/, + rocprofiler_record_header_t** /*headers*/, + size_t /*num_headers*/, + void* /*data*/, + uint64_t /*drop_count*/) +{} + +using cc_setup_fn_t = std::function; + +void +pc_sampling_vs_counter_collection(cc_setup_fn_t cc_setup_fn) +{ + using init_func_t = int (*)(rocprofiler_client_finalize_t, void*); + using fini_func_t = void (*)(void*); + + // using hsa_iterate_agents_cb_t = hsa_status_t (*)(hsa_agent_t, void*); + + auto cmd_line = rocprofiler::common::read_command_line(getpid()); + ASSERT_FALSE(cmd_line.empty()); + + static init_func_t tool_init = [](rocprofiler_client_finalize_t fini_func, + void* client_data) -> int { + auto* cb_data = static_cast(client_data); + + cb_data->client_workflow_count++; + cb_data->client_fini_func = fini_func; + + // This function returns the all gpu agents supporting some kind of PC sampling + 
EXPECT_EQ( + rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0, + &find_all_gpu_agents_supporting_pc_sampling_impl, + sizeof(rocprofiler_agent_t), + static_cast(&cb_data->gpu_pcs_agents)), + ROCPROFILER_STATUS_SUCCESS); + + if(cb_data->gpu_pcs_agents.size() == 0) + { + ROCP_ERROR << "PC sampling unavailable\n"; + exit(0); + } + + EXPECT_EQ(rocprofiler_create_context(&cb_data->client_ctx), ROCPROFILER_STATUS_SUCCESS); + + // Create PC sampling buffer + EXPECT_EQ(rocprofiler_create_buffer(cb_data->client_ctx, + BUFFER_SIZE_BYTES, + WATERMARK, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + rocprofiler_pc_sampling_callback, + client_data, + &cb_data->client_buffer), + ROCPROFILER_STATUS_SUCCESS); + + // Configure counter collection service. + // Agent counter collection service is configured on the first listed GPU device. + cb_data->cc_setup_fn(cb_data->client_ctx, cb_data->gpu_pcs_agents.at(0)->id); + + // Configuring PC sampling service should fail + for(const auto* agent : cb_data->gpu_pcs_agents) + { + const auto agent_id = agent->id; + const auto pcs_config = extract_pc_sampling_config_prefer_stochastic(agent_id); + + size_t interval = pcs_config.max_interval; + + // This calls succeeds + EXPECT_EQ(rocprofiler_configure_pc_sampling_service(cb_data->client_ctx, + agent_id, + pcs_config.method, + pcs_config.unit, + interval, + cb_data->client_buffer), + ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT); + } + + // no errors + return 0; + }; + + static fini_func_t tool_fini = [](void* client_data) -> void { + auto* cb_data = static_cast(client_data); + EXPECT_EQ(rocprofiler_stop_context(cb_data->client_ctx), ROCPROFILER_STATUS_SUCCESS); + + static_cast(client_data)->client_workflow_count++; + }; + + static auto cb_data = callback_data{}; + cb_data.cc_setup_fn = cc_setup_fn; + + static auto cfg_result = + rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), + tool_init, + tool_fini, + static_cast(&cb_data)}; + + static 
rocprofiler_configure_func_t rocp_init = + [](uint32_t version, + const char* runtime_version, + uint32_t prio, + rocprofiler_client_id_t* client_id) -> rocprofiler_tool_configure_result_t* { + auto expected_version = ROCPROFILER_VERSION; + EXPECT_EQ(expected_version, version); + EXPECT_EQ(std::string_view{runtime_version}, std::string_view{ROCPROFILER_VERSION_STRING}); + EXPECT_EQ(prio, 0); + EXPECT_EQ(client_id->name, nullptr); + cb_data.client_id = client_id; + cb_data.client_id->name = ::testing::UnitTest::GetInstance()->current_test_info()->name(); + + return &cfg_result; + }; + + EXPECT_EQ(rocprofiler_force_configure(rocp_init), ROCPROFILER_STATUS_SUCCESS); +} + +void +counter_collection_vs_pc_sampling(cc_setup_fn_t cc_setup_fn) +{ + using init_func_t = int (*)(rocprofiler_client_finalize_t, void*); + using fini_func_t = void (*)(void*); + + // using hsa_iterate_agents_cb_t = hsa_status_t (*)(hsa_agent_t, void*); + + auto cmd_line = rocprofiler::common::read_command_line(getpid()); + ASSERT_FALSE(cmd_line.empty()); + + static init_func_t tool_init = [](rocprofiler_client_finalize_t fini_func, + void* client_data) -> int { + auto* cb_data = static_cast(client_data); + + cb_data->client_workflow_count++; + cb_data->client_fini_func = fini_func; + + // This function returns the all gpu agents supporting some kind of PC sampling + EXPECT_EQ( + rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0, + &find_all_gpu_agents_supporting_pc_sampling_impl, + sizeof(rocprofiler_agent_t), + static_cast(&cb_data->gpu_pcs_agents)), + ROCPROFILER_STATUS_SUCCESS); + + if(cb_data->gpu_pcs_agents.size() == 0) + { + ROCP_ERROR << "PC sampling unavailable\n"; + exit(0); + } + + EXPECT_EQ(rocprofiler_create_context(&cb_data->client_ctx), ROCPROFILER_STATUS_SUCCESS); + + // Create PC sampling buffer + EXPECT_EQ(rocprofiler_create_buffer(cb_data->client_ctx, + BUFFER_SIZE_BYTES, + WATERMARK, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + rocprofiler_pc_sampling_callback, + 
client_data, + &cb_data->client_buffer), + ROCPROFILER_STATUS_SUCCESS); + + // Configuring PC sampling service first + for(const auto* agent : cb_data->gpu_pcs_agents) + { + const auto agent_id = agent->id; + const auto pcs_config = extract_pc_sampling_config_prefer_stochastic(agent_id); + + size_t interval = pcs_config.max_interval; + + // This calls succeeds + EXPECT_EQ(rocprofiler_configure_pc_sampling_service(cb_data->client_ctx, + agent_id, + pcs_config.method, + pcs_config.unit, + interval, + cb_data->client_buffer), + ROCPROFILER_STATUS_SUCCESS); + } + + // Configuring counter collection service on the first listed GPU agent should fail + cb_data->cc_setup_fn(cb_data->client_ctx, cb_data->gpu_pcs_agents.at(0)->id); + + // no errors + return 0; + }; + + static fini_func_t tool_fini = [](void* client_data) -> void { + auto* cb_data = static_cast(client_data); + EXPECT_EQ(rocprofiler_stop_context(cb_data->client_ctx), ROCPROFILER_STATUS_SUCCESS); + + static_cast(client_data)->client_workflow_count++; + }; + + static auto cb_data = callback_data{}; + cb_data.cc_setup_fn = cc_setup_fn; + + static auto cfg_result = + rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), + tool_init, + tool_fini, + static_cast(&cb_data)}; + + static rocprofiler_configure_func_t rocp_init = + [](uint32_t version, + const char* runtime_version, + uint32_t prio, + rocprofiler_client_id_t* client_id) -> rocprofiler_tool_configure_result_t* { + auto expected_version = ROCPROFILER_VERSION; + EXPECT_EQ(expected_version, version); + EXPECT_EQ(std::string_view{runtime_version}, std::string_view{ROCPROFILER_VERSION_STRING}); + EXPECT_EQ(prio, 0); + EXPECT_EQ(client_id->name, nullptr); + cb_data.client_id = client_id; + cb_data.client_id->name = ::testing::UnitTest::GetInstance()->current_test_info()->name(); + + return &cfg_result; + }; + + EXPECT_EQ(rocprofiler_force_configure(rocp_init), ROCPROFILER_STATUS_SUCCESS); +} + +} // namespace + +TEST(pc_sampling, 
pc_sampling_vs_dispatch_counter_collection) +{ + auto dispatch_counter_collection_setup_fn = [](rocprofiler_context_id_t context_id, + rocprofiler_agent_id_t /*agent_id*/) { + // Configure dispatch counter collection service on all agents + EXPECT_EQ(rocprofiler_configure_callback_dispatch_profile_counting_service( + context_id, dispatch_callback, nullptr, record_callback, nullptr), + ROCPROFILER_STATUS_SUCCESS); + }; + + pc_sampling_vs_counter_collection(dispatch_counter_collection_setup_fn); +} + +TEST(pc_sampling, pc_sampling_vs_agent_counter_collection) +{ + auto agent_counter_collection_setup_fn = [](rocprofiler_context_id_t context_id, + rocprofiler_agent_id_t agent_id) { + rocprofiler_buffer_id_t cc_buf_id; + // Create PC sampling buffer + EXPECT_EQ(rocprofiler_create_buffer(context_id, + BUFFER_SIZE_BYTES, + WATERMARK, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + rocprofiler_counter_collection_callback, + nullptr, + &cc_buf_id), + ROCPROFILER_STATUS_SUCCESS); + + EXPECT_EQ(rocprofiler_configure_agent_profile_counting_service( + context_id, cc_buf_id, agent_id, set_profile, nullptr), + ROCPROFILER_STATUS_SUCCESS); + }; + + pc_sampling_vs_counter_collection(agent_counter_collection_setup_fn); +} + +TEST(pc_sampling, dispatch_counter_collection_vs_pc_sampling) +{ + auto dispatch_counter_collection_setup_fn = [](rocprofiler_context_id_t context_id, + rocprofiler_agent_id_t /*agent_id*/) { + // Configure dispatch counter collection service on all agents + EXPECT_EQ(rocprofiler_configure_callback_dispatch_profile_counting_service( + context_id, dispatch_callback, nullptr, record_callback, nullptr), + ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT); + }; + + counter_collection_vs_pc_sampling(dispatch_counter_collection_setup_fn); +} + +TEST(pc_sampling, agent_counter_collection_vs_pc_sampling) +{ + auto agent_counter_collection_setup_fn = [](rocprofiler_context_id_t context_id, + rocprofiler_agent_id_t agent_id) { + rocprofiler_buffer_id_t cc_buf_id; + // Create PC sampling 
buffer + EXPECT_EQ(rocprofiler_create_buffer(context_id, + BUFFER_SIZE_BYTES, + WATERMARK, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + rocprofiler_counter_collection_callback, + nullptr, + &cc_buf_id), + ROCPROFILER_STATUS_SUCCESS); + + EXPECT_EQ(rocprofiler_configure_agent_profile_counting_service( + context_id, cc_buf_id, agent_id, set_profile, nullptr), + ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT); + }; + + counter_collection_vs_pc_sampling(agent_counter_collection_setup_fn); +} diff --git a/source/lib/rocprofiler-sdk/profile_config.cpp b/source/lib/rocprofiler-sdk/profile_config.cpp index 86aa4524..b68fda2c 100644 --- a/source/lib/rocprofiler-sdk/profile_config.cpp +++ b/source/lib/rocprofiler-sdk/profile_config.cpp @@ -22,15 +22,14 @@ #include #include +#include +#include -#include "lib/common/synchronized.hpp" #include "lib/common/utility.hpp" #include "lib/rocprofiler-sdk/agent.hpp" -#include "lib/rocprofiler-sdk/aql/helpers.hpp" +#include "lib/rocprofiler-sdk/counters/controller.hpp" #include "lib/rocprofiler-sdk/counters/core.hpp" -#include "lib/rocprofiler-sdk/counters/evaluate_ast.hpp" #include "lib/rocprofiler-sdk/counters/metrics.hpp" -#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" extern "C" { /** @@ -39,7 +38,9 @@ extern "C" { * @param [in] agent Agent identifier * @param [in] counters_list List of GPU counters * @param [in] counters_count Size of counters list - * @param [out] config_id Identifier for GPU counters group + * @param [in/out] config_id Identifier for GPU counters group. If an existing + profile is supplied, that profiles counters will be copied + over to a new profile (returned via this id). 
* @return ::rocprofiler_status_t */ rocprofiler_status_t @@ -48,7 +49,8 @@ rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id, size_t counters_count, rocprofiler_profile_config_id_t* config_id) { - const auto* agent = ::rocprofiler::agent::get_agent(agent_id); + std::unordered_set already_added; + const auto* agent = ::rocprofiler::agent::get_agent(agent_id); if(!agent) return ROCPROFILER_STATUS_ERROR_AGENT_NOT_FOUND; std::shared_ptr config = @@ -61,6 +63,9 @@ rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id, const auto* metric_ptr = rocprofiler::common::get_val(id_map, counter_id.handle); if(!metric_ptr) return ROCPROFILER_STATUS_ERROR_COUNTER_NOT_FOUND; + // Don't add duplicates + if(!already_added.emplace(metric_ptr->id()).second) continue; + if(!rocprofiler::counters::checkValidMetric(std::string(agent->name), *metric_ptr)) { return ROCPROFILER_STATUS_ERROR_METRIC_NOT_VALID_FOR_AGENT; @@ -68,8 +73,26 @@ rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id, config->metrics.push_back(*metric_ptr); } - config->agent = agent; - config_id->handle = rocprofiler::counters::create_counter_profile(std::move(config)); + if(config_id->handle != 0) + { + // Copy existing counters from previous config + if(auto existing = rocprofiler::counters::get_profile_config(*config_id)) + { + for(const auto& metric : existing->metrics) + { + if(!already_added.emplace(metric.id()).second) continue; + config->metrics.push_back(metric); + } + } + } + + config->agent = agent; + if(auto status = rocprofiler::counters::create_counter_profile(config); + status != ROCPROFILER_STATUS_SUCCESS) + { + return status; + } + *config_id = config->id; return ROCPROFILER_STATUS_SUCCESS; } diff --git a/source/lib/rocprofiler-sdk/registration.cpp b/source/lib/rocprofiler-sdk/registration.cpp index 76251c97..8561c959 100644 --- a/source/lib/rocprofiler-sdk/registration.cpp +++ b/source/lib/rocprofiler-sdk/registration.cpp @@ -486,6 +486,9 @@ 
invoke_client_finalizer(rocprofiler_client_id_t client_id) rocprofiler_tool_finalize_t _finalize_func = nullptr; std::swap(_finalize_func, itr->configure_result->finalize); + hsa::async_copy_sync(); + hsa::queue_controller_sync(); + auto _fini_status = get_fini_status(); if(_fini_status == 0) set_fini_status(-1); _finalize_func(itr->configure_result->tool_data); @@ -759,6 +762,7 @@ rocprofiler_set_api_table(const char* name, rocprofiler::hsa::async_copy_init(hsa_api_table, lib_instance); rocprofiler::code_object::initialize(hsa_api_table); + rocprofiler::thread_trace::code_object::initialize(hsa_api_table); #if ROCPROFILER_SDK_HSA_PC_SAMPLING > 0 rocprofiler::pc_sampling::code_object::initialize(hsa_api_table); #endif diff --git a/source/lib/rocprofiler-sdk/rocprofiler.cpp b/source/lib/rocprofiler-sdk/rocprofiler.cpp index d5a1ed1e..8ab8274b 100644 --- a/source/lib/rocprofiler-sdk/rocprofiler.cpp +++ b/source/lib/rocprofiler-sdk/rocprofiler.cpp @@ -108,9 +108,10 @@ ROCPROFILER_STATUS_STRING(ROCPROFILER_STATUS_ERROR_NO_HARDWARE_COUNTERS, ROCPROFILER_STATUS_STRING(ROCPROFILER_STATUS_ERROR_AGENT_MISMATCH, "Counter profile agent does not match the agent in the context") ROCPROFILER_STATUS_STRING(ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE, - "The service is not available." - "Please refer to API functions that return this status code" - "for more information.") + "The service is not available. 
Please refer to API functions that return " + "this status code for more information.") +ROCPROFILER_STATUS_STRING(ROCPROFILER_STATUS_ERROR_EXCEEDS_HW_LIMIT, + "Request exceeds the capabilities of the hardware to collect") template const char* diff --git a/source/lib/rocprofiler-sdk/tests/CMakeLists.txt b/source/lib/rocprofiler-sdk/tests/CMakeLists.txt index 68f4fec3..8303de58 100644 --- a/source/lib/rocprofiler-sdk/tests/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk/tests/CMakeLists.txt @@ -20,7 +20,10 @@ target_link_libraries( rocprofiler-lib-tests PRIVATE rocprofiler-sdk::rocprofiler-static-library rocprofiler-sdk::rocprofiler-common-library - rocprofiler-sdk::rocprofiler-hsa-runtime GTest::gtest GTest::gtest_main) + rocprofiler-sdk::counter-test-constants + rocprofiler-sdk::rocprofiler-hsa-runtime + GTest::gtest + GTest::gtest_main) gtest_add_tests( TARGET rocprofiler-lib-tests diff --git a/source/lib/rocprofiler-sdk/tests/agent.cpp b/source/lib/rocprofiler-sdk/tests/agent.cpp index 538be36b..ff4a0a93 100644 --- a/source/lib/rocprofiler-sdk/tests/agent.cpp +++ b/source/lib/rocprofiler-sdk/tests/agent.cpp @@ -103,9 +103,11 @@ TEST(rocprofiler_lib, agent_abi) EXPECT_EQ(offsetof(rocprofiler_agent_t, model_name), 272) << msg; EXPECT_EQ(offsetof(rocprofiler_agent_t, node_id), 280) << msg; EXPECT_EQ(offsetof(rocprofiler_agent_t, logical_node_id), 284) << msg; + EXPECT_EQ(offsetof(rocprofiler_agent_t, logical_node_type_id), 288) << msg; + EXPECT_EQ(offsetof(rocprofiler_agent_t, reserved_padding0), 292) << msg; // Add test for offset of new field above this. Do NOT change any existing values! - constexpr auto expected_rocp_agent_size = 288; + constexpr auto expected_rocp_agent_size = 296; // If a new field is added, increase this value by the size of the new field(s) EXPECT_EQ(sizeof(rocprofiler_agent_t), expected_rocp_agent_size) << "ABI break. 
If you added a new field, make sure that this is the only new check that " diff --git a/source/lib/rocprofiler-sdk/tests/hsa_barrier.cpp b/source/lib/rocprofiler-sdk/tests/hsa_barrier.cpp index b567c5c7..91361e8d 100644 --- a/source/lib/rocprofiler-sdk/tests/hsa_barrier.cpp +++ b/source/lib/rocprofiler-sdk/tests/hsa_barrier.cpp @@ -1,3 +1,33 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+#include "lib/rocprofiler-sdk/hsa/hsa_barrier.hpp" +#include "lib/rocprofiler-sdk/agent.hpp" +#include "lib/rocprofiler-sdk/context/context.hpp" +#include "lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" +#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" +#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" +#include "lib/rocprofiler-sdk/registration.hpp" + +#include #include #include @@ -6,57 +36,12 @@ #include #include -#include - -#include "lib/rocprofiler-sdk/agent.hpp" -#include "lib/rocprofiler-sdk/context/context.hpp" -#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" -#include "lib/rocprofiler-sdk/hsa/hsa_barrier.hpp" -#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" -#include "lib/rocprofiler-sdk/registration.hpp" - using namespace rocprofiler; using namespace rocprofiler::hsa; +using namespace rocprofiler::counters::test_constants; namespace { -AmdExtTable& -get_ext_table() -{ - static auto _v = []() { - auto val = AmdExtTable{}; - val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; - val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; - val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; - val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; - val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; - val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; - return val; - }(); - return _v; -} - -CoreApiTable& -get_api_table() -{ - static auto _v = []() { - auto val = CoreApiTable{}; - val.hsa_iterate_agents_fn = hsa_iterate_agents; - val.hsa_agent_get_info_fn = hsa_agent_get_info; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_destroy_fn = hsa_queue_destroy; - val.hsa_signal_create_fn = hsa_signal_create; - val.hsa_signal_destroy_fn = hsa_signal_destroy; - val.hsa_signal_store_screlease_fn = hsa_signal_store_screlease; - val.hsa_signal_load_scacquire_fn = hsa_signal_load_scacquire; - val.hsa_signal_add_relaxed_fn = 
hsa_signal_add_relaxed; - val.hsa_signal_subtract_relaxed_fn = hsa_signal_subtract_relaxed; - val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; - return val; - }(); - return _v; -} - namespace rocprofiler { namespace hsa diff --git a/source/lib/rocprofiler-sdk/tests/intercept_table.cpp b/source/lib/rocprofiler-sdk/tests/intercept_table.cpp index 2ab135f3..311b7eb4 100644 --- a/source/lib/rocprofiler-sdk/tests/intercept_table.cpp +++ b/source/lib/rocprofiler-sdk/tests/intercept_table.cpp @@ -228,11 +228,8 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing) static auto& cb_data = get_client_callback_data(); - static auto cfg_result = - rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), - tool_init, - tool_fini, - static_cast(&cb_data)}; + static auto cfg_result = rocprofiler_tool_configure_result_t{ + sizeof(rocprofiler_tool_configure_result_t), tool_init, tool_fini, &cb_data}; static rocprofiler_configure_func_t rocp_init = [](uint32_t version, @@ -251,7 +248,7 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing) { ROCPROFILER_CALL_EXPECT( rocprofiler_at_intercept_table_registration( - api_registration_callback, itr, static_cast(&cb_data)), + api_registration_callback, itr, &cb_data), "test should be updated if new (non-HSA, non-HIP) intercept table is supported", ROCPROFILER_STATUS_SUCCESS); } @@ -273,9 +270,11 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing) return status; }; + hsa_init(); hsa_init(); auto _agent_data = agent_data{}; - hsa_status_t itr_status = hsa_iterate_agents(agent_cb, static_cast(&_agent_data)); + hsa_status_t itr_status = hsa_iterate_agents(agent_cb, &_agent_data); + hsa_shut_down(); hsa_shut_down(); EXPECT_EQ(itr_status, HSA_STATUS_SUCCESS); @@ -300,16 +299,7 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing) EXPECT_EQ(itr.second.first, itr.second.second) << "mismatched wrap counts for " << itr.first << " (lhs=tool_wrapper, rhs=rocprofiler_wrapper)"; - 
if(itr.first != "hsa_init") - { - EXPECT_GT(itr.second.first, 0) << itr.first << " not wrapped"; - } - else - { - EXPECT_EQ(itr.second.first, 0) << itr.first - << " was wrapped. If hsa runtime has been updated to " - "include first call to hsa_init, update this test"; - } + EXPECT_GT(itr.second.first, 0) << itr.first << " not wrapped"; } auto get_count = [](std::string_view func_name) { @@ -317,10 +307,10 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing) return cb_data.client_callback_count.at(func_name).first; }; - EXPECT_EQ(get_count("hsa_init"), 0); + EXPECT_EQ(get_count("hsa_init"), 1); EXPECT_EQ(get_count("hsa_iterate_agents"), 1); EXPECT_EQ(get_count("hsa_agent_get_info"), _agent_data.agent_count); - EXPECT_EQ(get_count("hsa_shut_down"), 1); + EXPECT_EQ(get_count("hsa_shut_down"), 2); } TEST(rocprofiler_lib, intercept_table_and_callback_tracing_disable_context) @@ -392,11 +382,8 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing_disable_context) static auto& cb_data = get_client_callback_data(); cb_data = callback_data_ext{}; - static auto cfg_result = - rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), - tool_init, - tool_fini, - static_cast(&cb_data)}; + static auto cfg_result = rocprofiler_tool_configure_result_t{ + sizeof(rocprofiler_tool_configure_result_t), tool_init, tool_fini, &cb_data}; static rocprofiler_configure_func_t rocp_init = [](uint32_t version, @@ -415,7 +402,7 @@ TEST(rocprofiler_lib, intercept_table_and_callback_tracing_disable_context) { ROCPROFILER_CALL_EXPECT( rocprofiler_at_intercept_table_registration( - api_registration_callback, itr, static_cast(&cb_data)), + api_registration_callback, itr, &cb_data), "test should be updated if new (non-HSA, non-HIP) intercept table is supported", ROCPROFILER_STATUS_SUCCESS); } diff --git a/source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt b/source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt index 802ad64f..60f552d6 100644 --- 
a/source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt @@ -1,5 +1,6 @@ -set(ROCPROFILER_LIB_THREAD_TRACE_SOURCES att_core.cpp att_service.cpp att_parser.cpp) -set(ROCPROFILER_LIB_THREAD_TRACE_HEADERS att_core.hpp) +set(ROCPROFILER_LIB_THREAD_TRACE_SOURCES att_core.cpp att_service.cpp att_parser.cpp + code_object.cpp) +set(ROCPROFILER_LIB_THREAD_TRACE_HEADERS att_core.hpp code_object.hpp) target_sources(rocprofiler-object-library PRIVATE ${ROCPROFILER_LIB_THREAD_TRACE_SOURCES} ${ROCPROFILER_LIB_THREAD_TRACE_HEADERS}) diff --git a/source/lib/rocprofiler-sdk/thread_trace/att_core.cpp b/source/lib/rocprofiler-sdk/thread_trace/att_core.cpp index 431059d8..5be0eed9 100644 --- a/source/lib/rocprofiler-sdk/thread_trace/att_core.cpp +++ b/source/lib/rocprofiler-sdk/thread_trace/att_core.cpp @@ -55,17 +55,18 @@ constexpr size_t ROCPROFILER_QUEUE_SIZE = 64; namespace rocprofiler { +namespace thread_trace +{ struct cbdata_t { - void* tool_userdata; rocprofiler_att_shader_data_callback_t cb_fn; - rocprofiler_correlation_id_t corr_id; + const rocprofiler_user_data_t* dispatch_userdata; }; common::Synchronized> client; bool -AgentThreadTracer::Submit(hsa_ext_amd_aql_pm4_packet_t* packet) +ThreadTracerQueue::Submit(hsa_ext_amd_aql_pm4_packet_t* packet) { const uint64_t write_idx = add_write_index_relaxed_fn(queue, 1); @@ -94,14 +95,15 @@ AgentThreadTracer::Submit(hsa_ext_amd_aql_pm4_packet_t* packet) return true; } -AgentThreadTracer::AgentThreadTracer(thread_trace_parameter_pack _params, +ThreadTracerQueue::ThreadTracerQueue(thread_trace_parameter_pack _params, const hsa::AgentCache& cache, const CoreApiTable& coreapi, const AmdExtTable& ext) : params(std::move(_params)) +, agent_id(cache.get_rocp_agent()->id) { factory = std::make_unique(cache, this->params, coreapi, ext); - cached_resources = factory->construct_packet(); + control_packet = factory->construct_control_packet(); auto status = 
coreapi.hsa_queue_create_fn(cache.get_hsa_agent(), ROCPROFILER_QUEUE_SIZE, @@ -121,142 +123,82 @@ AgentThreadTracer::AgentThreadTracer(thread_trace_parameter_pack _params, signal_store_screlease_fn = coreapi.hsa_signal_store_screlease_fn; add_write_index_relaxed_fn = coreapi.hsa_queue_add_write_index_relaxed_fn; load_read_index_relaxed_fn = coreapi.hsa_queue_load_read_index_relaxed_fn; + + codeobj_reg = std::make_unique( + [this](rocprofiler_agent_id_t agent, uint64_t codeobj_id, uint64_t addr, uint64_t size) { + if(agent == this->agent_id) this->load_codeobj(codeobj_id, addr, size); + }, + [this](uint64_t codeobj_id) { this->unload_codeobj(codeobj_id); }); + + codeobj_reg->IterateLoaded(); } -AgentThreadTracer::~AgentThreadTracer() +ThreadTracerQueue::~ThreadTracerQueue() { std::unique_lock lk(trace_resources_mut); - - if(active_resources.packet != nullptr) - ROCP_WARNING << "Thread tracer being destroyed with thread trace active"; - - if(!this->queue) return; - - auto* packet = static_cast(active_resources.packet.get()); - if(packet) + if(active_traces.load() < 1) { - packet->clear(); - packet->populate_after(); - - for(auto& after_packet : packet->after_krn_pkt) - Submit(&after_packet); + if(queue_destroy_fn) queue_destroy_fn(this->queue); + return; } - if(queue_destroy_fn) queue_destroy_fn(this->queue); + ROCP_WARNING << "Thread tracer being destroyed with thread trace active"; + + control_packet->clear(); + control_packet->populate_after(); + + for(auto& after_packet : control_packet->after_krn_pkt) + Submit(&after_packet); } /** * Callback we get from HSA interceptor when a kernel packet is being enqueued. * We return an AQLPacket containing the start/stop/read packets for injection. 
*/ -std::unique_ptr -AgentThreadTracer::pre_kernel_call(rocprofiler_att_control_flags_t control_flags, - rocprofiler_queue_id_t queue_id, - rocprofiler_correlation_id_t corr_id) +std::unique_ptr +ThreadTracerQueue::get_control(bool bStart) { - if(control_flags == ROCPROFILER_ATT_CONTROL_NONE) return nullptr; - std::unique_lock lk(trace_resources_mut); - if(control_flags == ROCPROFILER_ATT_CONTROL_STOP) - { - if(active_resources.packet == nullptr) - { - ROCP_ERROR << "Attempt at stopping a thread trace that has not started!\n"; - return nullptr; - } + auto active_resources = std::make_unique(*control_packet); + active_resources->clear(); - active_resources.packet->clear(); - active_resources.packet->populate_after(); - data_is_ready.fetch_add(1); - return std::move(active_resources.packet); - } - - if(active_resources.packet != nullptr) - { - ROCP_ERROR << "Attempt at starting a thread trace while another was active!\n"; - return nullptr; - } - else - { - active_resources.corr_id = corr_id; - active_resources.queue_id = queue_id; - } + if(bStart) active_traces.fetch_add(1); - if(cached_resources == nullptr) - { - ROCP_ERROR << "Attempt to initialize ATT without allocated resources!\n"; - return nullptr; - } - - cached_resources->clear(); - cached_resources->populate_before(); - - if((control_flags & ROCPROFILER_ATT_CONTROL_STOP) != 0) - { - cached_resources->populate_after(); - data_is_ready.fetch_add(1); - } - - return std::move(cached_resources); + return active_resources; } hsa_status_t thread_trace_callback(uint32_t shader, void* buffer, uint64_t size, void* callback_data) { - void* tool_userdata = static_cast(callback_data)->tool_userdata; - auto callback_fn = *static_cast(callback_data)->cb_fn; + auto& cb_data = *static_cast(callback_data); - callback_fn(shader, buffer, size, tool_userdata); + cb_data.cb_fn(shader, buffer, size, *cb_data.dispatch_userdata); return HSA_STATUS_SUCCESS; } void -AgentThreadTracer::post_kernel_call(std::unique_ptr&& aql) 
+ThreadTracerQueue::iterate_data(aqlprofile_handle_t handle, rocprofiler_user_data_t data) { - std::unique_lock lk(trace_resources_mut); - - active_resources.packet = std::move(aql); - - if(!active_resources.packet || data_is_ready.load() < 1) return; - auto* pkt = static_cast(active_resources.packet.get()); - - for(auto& record : remaining_codeobj_record) - { - if(!record.bUnload) - pkt->add_codeobj(record.id, record.addr, record.size); - else - pkt->remove_codeobj(record.id); - } - remaining_codeobj_record.clear(); - cbdata_t cb_dt{}; - cb_dt.corr_id = active_resources.corr_id; - cb_dt.tool_userdata = params.callback_userdata; - cb_dt.cb_fn = params.shader_cb_fn; + cb_dt.cb_fn = params.shader_cb_fn; + cb_dt.dispatch_userdata = &data; - auto status = aqlprofile_att_iterate_data(pkt->GetHandle(), thread_trace_callback, &cb_dt); + auto status = aqlprofile_att_iterate_data(handle, thread_trace_callback, &cb_dt); CHECK_HSA(status, "Failed to iterate ATT data"); - data_is_ready.fetch_sub(1); - cached_resources = std::move(active_resources.packet); + active_traces.fetch_sub(1); } void -AgentThreadTracer::load_codeobj(code_object_id_t id, uint64_t addr, uint64_t size) +ThreadTracerQueue::load_codeobj(code_object_id_t id, uint64_t addr, uint64_t size) { std::unique_lock lk(trace_resources_mut); - if(auto* pkt = static_cast(cached_resources.get())) - { - pkt->add_codeobj(id, addr, size); - return; - } + control_packet->add_codeobj(id, addr, size); - remaining_codeobj_record.push_back({id, addr, size, false}); - - if(!queue) return; + if(!queue || active_traces.load() < 1) return; auto packet = factory->construct_load_marker_packet(id, addr, size); bool bSuccess = Submit(&packet->packet); @@ -266,19 +208,12 @@ AgentThreadTracer::load_codeobj(code_object_id_t id, uint64_t addr, uint64_t siz } void -AgentThreadTracer::unload_codeobj(code_object_id_t id) +ThreadTracerQueue::unload_codeobj(code_object_id_t id) { std::unique_lock lk(trace_resources_mut); - if(auto* pkt = 
static_cast(cached_resources.get())) - { - pkt->remove_codeobj(id); - return; - } - - remaining_codeobj_record.push_back({id, 0, 0, true}); - - if(!queue) return; + if(!control_packet->remove_codeobj(id)) return; + if(!queue || active_traces.load() < 1) return; auto packet = factory->construct_unload_marker_packet(id); bool bSuccess = Submit(&packet->packet); @@ -287,52 +222,10 @@ AgentThreadTracer::unload_codeobj(code_object_id_t id) packet.release(); } -// TODO: make this a wrapper on HSA load instead of registering -void -GlobalThreadTracer::codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, - rocprofiler_user_data_t* /* user_data */, - void* callback_data) -{ - if(!callback_data) return; - if(record.kind != ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT) return; - if(record.operation != ROCPROFILER_CODE_OBJECT_LOAD) return; - - auto* rec = static_cast(record.payload); - assert(rec); - - GlobalThreadTracer& tracer = *static_cast(callback_data); - auto agent = rec->hsa_agent; - - std::shared_lock lk(tracer.agents_map_mut); - - if(record.phase == ROCPROFILER_CALLBACK_PHASE_UNLOAD) - { - try - { - tracer.loaded_codeobjs.at(rec->hsa_agent).erase(rec->code_object_id); - } catch(std::exception& e) - { - ROCP_WARNING << "Codeobj unload called for invalid ID " << rec->code_object_id; - } - } - else - { - tracer.loaded_codeobjs[agent][rec->code_object_id] = {rec->load_delta, rec->load_size}; - } - - auto tracer_it = tracer.agents.find(agent); - if(tracer_it == tracer.agents.end()) return; - - if(record.phase == ROCPROFILER_CALLBACK_PHASE_LOAD) - tracer_it->second->load_codeobj(rec->code_object_id, rec->load_delta, rec->load_size); - else if(record.phase == ROCPROFILER_CALLBACK_PHASE_UNLOAD) - tracer_it->second->unload_codeobj(rec->code_object_id); -} - void -GlobalThreadTracer::resource_init(const hsa::AgentCache& cache, - const CoreApiTable& coreapi, - const AmdExtTable& ext) +DispatchThreadTracer::resource_init(const hsa::AgentCache& cache, + const 
CoreApiTable& coreapi, + const AmdExtTable& ext) { auto agent = cache.get_hsa_agent(); std::unique_lock lk(agents_map_mut); @@ -344,13 +237,12 @@ GlobalThreadTracer::resource_init(const hsa::AgentCache& cache, return; } - auto new_tracer = std::make_unique(this->params, cache, coreapi, ext); - new_tracer->active_queues.store(1); + auto new_tracer = std::make_unique(this->params, cache, coreapi, ext); agents.emplace(agent, std::move(new_tracer)); } void -GlobalThreadTracer::resource_deinit(const hsa::AgentCache& cache) +DispatchThreadTracer::resource_deinit(const hsa::AgentCache& cache) { std::unique_lock lk(agents_map_mut); @@ -367,9 +259,11 @@ GlobalThreadTracer::resource_deinit(const hsa::AgentCache& cache) * We return an AQLPacket containing the start/stop/read packets for injection. */ std::unique_ptr -GlobalThreadTracer::pre_kernel_call(const hsa::Queue& queue, - rocprofiler_kernel_id_t kernel_id, - const context::correlation_id* corr_id) +DispatchThreadTracer::pre_kernel_call(const hsa::Queue& queue, + rocprofiler_kernel_id_t kernel_id, + rocprofiler_dispatch_id_t dispatch_id, + rocprofiler_user_data_t* user_data, + const context::correlation_id* corr_id) { rocprofiler_correlation_id_t rocprof_corr_id = rocprofiler_correlation_id_t{.internal = 0, .external = context::null_user_data}; @@ -377,27 +271,71 @@ GlobalThreadTracer::pre_kernel_call(const hsa::Queue& queue, if(corr_id) rocprof_corr_id.internal = corr_id->internal; // TODO: Get external + // Maybe adds serialization packets to the AQLPacket (if serializer is enabled) + // and maybe adds barrier packets if the state is transitioning from serialized <-> + // unserialized + auto maybe_add_serialization = [&](auto& gen_pkt) { + CHECK_NOTNULL(hsa::get_queue_controller())->serializer().rlock([&](const auto& serializer) { + for(auto& s_pkt : serializer.kernel_dispatch(queue)) + gen_pkt->before_krn_pkt.push_back(s_pkt.ext_amd_aql_pm4); + }); + }; + auto control_flags = params.dispatch_cb_fn(queue.get_id(), 
queue.get_agent().get_rocp_agent(), rocprof_corr_id, kernel_id, + dispatch_id, + user_data, params.callback_userdata); - if(control_flags == ROCPROFILER_ATT_CONTROL_NONE) return nullptr; + if(control_flags == ROCPROFILER_ATT_CONTROL_NONE) + { + auto empty = std::make_unique(); + maybe_add_serialization(empty); + return empty; + } std::shared_lock lk(agents_map_mut); auto it = agents.find(queue.get_agent().get_hsa_agent()); assert(it != agents.end() && it->second != nullptr); - auto packet = it->second->pre_kernel_call(control_flags, queue.get_id(), rocprof_corr_id); - if(packet != nullptr) post_move_data.fetch_add(1); + auto packet = it->second->get_control(bool(control_flags & ROCPROFILER_ATT_CONTROL_START)); + + post_move_data.fetch_add(1); + maybe_add_serialization(packet); + + if((control_flags & ROCPROFILER_ATT_CONTROL_START) != 0) packet->populate_before(); + + if((control_flags & ROCPROFILER_ATT_CONTROL_STOP) != 0) packet->populate_after(); + return packet; } +class SignalSerializerExit +{ +public: + SignalSerializerExit(const hsa::Queue::queue_info_session_t& _session) + : session(_session) + {} + ~SignalSerializerExit() + { + auto* controller = hsa::get_queue_controller(); + if(!controller) return; + + controller->serializer().wlock( + [&](auto& serializer) { serializer.kernel_completion_signal(session.queue); }); + } + const hsa::Queue::queue_info_session_t& session; +}; + void -GlobalThreadTracer::post_kernel_call(GlobalThreadTracer::inst_pkt_t& aql) +DispatchThreadTracer::post_kernel_call(DispatchThreadTracer::inst_pkt_t& aql, + const hsa::Queue::queue_info_session_t& session) { + SignalSerializerExit signal(session); + if(post_move_data.load() < 1) return; for(auto& aql_pkt : aql) @@ -408,20 +346,19 @@ GlobalThreadTracer::post_kernel_call(GlobalThreadTracer::inst_pkt_t& aql) std::shared_lock lk(agents_map_mut); post_move_data.fetch_sub(1); + if(pkt->after_krn_pkt.empty()) continue; + auto it = agents.find(pkt->GetAgent()); if(it != agents.end() && 
it->second != nullptr) - it->second->post_kernel_call(std::move(aql_pkt.first)); + it->second->iterate_data(pkt->GetHandle(), session.user_data); } } void -GlobalThreadTracer::start_context() +DispatchThreadTracer::start_context() { - if(codeobj_client_ctx.handle != 0) - { - auto status = rocprofiler_start_context(codeobj_client_ctx); - if(status != ROCPROFILER_STATUS_SUCCESS) throw std::exception(); - } + using corr_id_map_t = hsa::Queue::queue_info_session_t::external_corr_id_map_t; + CHECK_NOTNULL(hsa::get_queue_controller())->enable_serialization(); // Only one thread should be attempting to enable/disable this context client.wlock([&](auto& client_id) { @@ -431,22 +368,22 @@ GlobalThreadTracer::start_context() std::nullopt, [=](const hsa::Queue& q, const hsa::rocprofiler_packet& /* kern_pkt */, - rocprofiler_kernel_id_t kernel_id, - rocprofiler_dispatch_id_t /* dispatch_id */, - rocprofiler_user_data_t* /* user_data */, + rocprofiler_kernel_id_t kernel_id, + rocprofiler_dispatch_id_t dispatch_id, + rocprofiler_user_data_t* user_data, const corr_id_map_t& /* extern_corr_ids */, const context::correlation_id* corr_id) { - return this->pre_kernel_call(q, kernel_id, corr_id); + return this->pre_kernel_call(q, kernel_id, dispatch_id, user_data, corr_id); }, [=](const hsa::Queue& /* q */, hsa::rocprofiler_packet /* kern_pkt */, - const hsa::Queue::queue_info_session_t& /* session */, - inst_pkt_t& aql) { this->post_kernel_call(aql); }); + const hsa::Queue::queue_info_session_t& session, + inst_pkt_t& aql) { this->post_kernel_call(aql, session); }); }); } void -GlobalThreadTracer::stop_context() +DispatchThreadTracer::stop_context() { client.wlock([&](auto& client_id) { if(!client_id) return; @@ -455,6 +392,87 @@ GlobalThreadTracer::stop_context() hsa::get_queue_controller()->remove_callback(*client_id); client_id = std::nullopt; }); + + auto* controller = hsa::get_queue_controller(); + if(controller) controller->disable_serialization(); } +void 
+AgentThreadTracer::resource_init(const hsa::AgentCache& cache, + const CoreApiTable& coreapi, + const AmdExtTable& ext) +{ + auto id = cache.get_rocp_agent()->id; + std::unique_lock lk(agent_mut); + + if(params.find(id) == params.end()) return; + + if(tracers.find(id) != tracers.end()) + { + tracers.at(id)->active_queues.fetch_add(1); + return; + } + tracers.emplace(id, std::make_unique(params.at(id), cache, coreapi, ext)); +} + +void +AgentThreadTracer::resource_deinit(const hsa::AgentCache& cache) +{ + auto id = cache.get_rocp_agent()->id; + std::unique_lock lk(agent_mut); + + if(params.find(id) == params.end()) return; + if(tracers.find(id) == tracers.end()) return; + + auto& tracer = *tracers.at(id); + if(tracer.active_queues.fetch_sub(1) == 1) tracers.erase(id); +} + +void +AgentThreadTracer::start_context() +{ + std::unique_lock lk(agent_mut); + + if(tracers.empty()) + { + ROCP_FATAL << "Thread trace context not present for agent!"; + return; + } + + for(auto& [_, tracer] : tracers) + { + auto packet = tracer->get_control(true); + packet->populate_before(); + + for(auto& start : packet->before_krn_pkt) + tracer->Submit(&start); + } +} + +void +AgentThreadTracer::stop_context() +{ + std::unique_lock lk(agent_mut); + + if(tracers.empty()) + { + ROCP_FATAL << "Thread trace context not present for agent!"; + return; + } + + for(auto& [_, tracer] : tracers) + { + auto packet = tracer->get_control(false); + packet->populate_after(); + + for(auto& stop : packet->after_krn_pkt) + tracer->Submit(&stop); + + rocprofiler_user_data_t userdata{.ptr = tracer->params.callback_userdata}; + tracer->iterate_data(packet->GetHandle(), userdata); + } +} + +} // namespace thread_trace + } // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/thread_trace/att_core.hpp b/source/lib/rocprofiler-sdk/thread_trace/att_core.hpp index 91428b95..dcaab786 100644 --- a/source/lib/rocprofiler-sdk/thread_trace/att_core.hpp +++ b/source/lib/rocprofiler-sdk/thread_trace/att_core.hpp 
@@ -22,9 +22,11 @@ #pragma once +#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" +#include "lib/rocprofiler-sdk/thread_trace/code_object.hpp" + #include #include -#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" #include #include @@ -40,12 +42,19 @@ namespace rocprofiler { +namespace hsa +{ +class AQLPacket; +}; + +namespace thread_trace +{ struct thread_trace_parameter_pack { - rocprofiler_context_id_t context_id; - rocprofiler_att_dispatch_callback_t dispatch_cb_fn; - rocprofiler_att_shader_data_callback_t shader_cb_fn; - void* callback_userdata; + rocprofiler_context_id_t context_id{0}; + rocprofiler_att_dispatch_callback_t dispatch_cb_fn{nullptr}; + rocprofiler_att_shader_data_callback_t shader_cb_fn{nullptr}; + void* callback_userdata{nullptr}; // Parameters uint8_t target_cu = 1; @@ -55,110 +64,127 @@ struct thread_trace_parameter_pack uint64_t buffer_size = DEFAULT_BUFFER_SIZE; // GFX9 Only - std::vector perfcounters; - - static constexpr size_t DEFAULT_SIMD = 0x7; - static constexpr size_t DEFAULT_SE_MASK = 0x21; - static constexpr size_t DEFAULT_BUFFER_SIZE = 0x8000000; -}; - -namespace hsa -{ -class AQLPacket; -}; + std::vector> perfcounters; -struct ThreadTraceActiveResource -{ - rocprofiler_correlation_id_t corr_id; - rocprofiler_queue_id_t queue_id; - std::unique_ptr packet{nullptr}; + static constexpr size_t DEFAULT_SIMD = 0x7; + static constexpr size_t DEFAULT_PERFCOUNTER_SIMD_MASK = 0xF; + static constexpr size_t DEFAULT_SE_MASK = 0x21; + static constexpr size_t DEFAULT_BUFFER_SIZE = 0x8000000; + static constexpr size_t PERFCOUNTER_SIMD_MASK_SHIFT = 28; }; -class AgentThreadTracer +class ThreadTracerQueue { using code_object_id_t = uint64_t; - struct CodeobjRecord - { - code_object_id_t id; - uint64_t addr; - uint64_t size; - bool bUnload; - }; public: - AgentThreadTracer(thread_trace_parameter_pack _params, + ThreadTracerQueue(thread_trace_parameter_pack _params, const hsa::AgentCache&, const CoreApiTable&, const AmdExtTable&); - virtual 
~AgentThreadTracer(); + virtual ~ThreadTracerQueue(); void load_codeobj(code_object_id_t id, uint64_t addr, uint64_t size); void unload_codeobj(code_object_id_t id); - std::unique_ptr pre_kernel_call(rocprofiler_att_control_flags_t control_flags, - rocprofiler_queue_id_t queue_id, - rocprofiler_correlation_id_t corr_id); + std::unique_ptr get_control(bool bStart); + void iterate_data(aqlprofile_handle_t handle, rocprofiler_user_data_t data); - void post_kernel_call(std::unique_ptr&& aql); - - hsa_queue_t* queue = nullptr; - std::mutex trace_resources_mut; - thread_trace_parameter_pack params; - std::unique_ptr cached_resources; - ThreadTraceActiveResource active_resources; - std::atomic data_is_ready{0}; - std::atomic active_queues{1}; - std::vector remaining_codeobj_record; + hsa_queue_t* queue = nullptr; + std::mutex trace_resources_mut; + thread_trace_parameter_pack params; + std::atomic active_traces{0}; + std::atomic active_queues{1}; + std::unique_ptr control_packet; std::unique_ptr factory; -private: bool Submit(hsa_ext_amd_aql_pm4_packet_t* packet); +private: + std::unique_ptr codeobj_reg{nullptr}; + + rocprofiler_agent_id_t agent_id; + decltype(hsa_queue_load_read_index_relaxed)* load_read_index_relaxed_fn{nullptr}; decltype(hsa_queue_add_write_index_relaxed)* add_write_index_relaxed_fn{nullptr}; decltype(hsa_signal_store_screlease)* signal_store_screlease_fn{nullptr}; decltype(hsa_queue_destroy)* queue_destroy_fn{nullptr}; -}; // namespace thread_trace +}; -class GlobalThreadTracer +class ThreadTracerInterface { - struct CodeobjAddrRange - { - int64_t addr; - uint64_t size; - }; +public: + ThreadTracerInterface() = default; + virtual ~ThreadTracerInterface() = default; + + virtual void start_context() = 0; + virtual void stop_context() = 0; + virtual void resource_init(const hsa::AgentCache&, const CoreApiTable&, const AmdExtTable&) = 0; + virtual void resource_deinit(const hsa::AgentCache&) = 0; +}; + +class DispatchThreadTracer : public 
ThreadTracerInterface +{ + using code_object_id_t = uint64_t; using AQLPacketPtr = std::unique_ptr; using inst_pkt_t = common::container::small_vector, 4>; - using corr_id_map_t = hsa::Queue::queue_info_session_t::external_corr_id_map_t; - using code_object_id_t = uint64_t; public: - GlobalThreadTracer(thread_trace_parameter_pack _params) - : params(std::move(_params)){}; - virtual void start_context(); - virtual void stop_context(); - virtual void resource_init(const hsa::AgentCache&, const CoreApiTable&, const AmdExtTable&); - virtual void resource_deinit(const hsa::AgentCache&); - virtual ~GlobalThreadTracer() = default; - - static void codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, - rocprofiler_user_data_t* user_data, - void* callback_data); + DispatchThreadTracer(thread_trace_parameter_pack _params) + : params(std::move(_params)) + {} + ~DispatchThreadTracer() override = default; + + void start_context() override; + void stop_context() override; + void resource_init(const hsa::AgentCache&, const CoreApiTable&, const AmdExtTable&) override; + void resource_deinit(const hsa::AgentCache&) override; std::unique_ptr pre_kernel_call(const hsa::Queue& queue, uint64_t kernel_id, + rocprofiler_dispatch_id_t dispatch_id, + rocprofiler_user_data_t* user_data, const context::correlation_id* corr_id); - void post_kernel_call(inst_pkt_t& aql); + void post_kernel_call(inst_pkt_t& aql, const hsa::Queue::queue_info_session_t& session); + + std::unordered_map> agents; - std::map> loaded_codeobjs; - std::unordered_map> agents; + std::shared_mutex agents_map_mut; + std::atomic post_move_data{0}; - std::atomic post_move_data{0}; - std::shared_mutex agents_map_mut; - rocprofiler_context_id_t codeobj_client_ctx{0}; thread_trace_parameter_pack params; +}; + +class AgentThreadTracer : public ThreadTracerInterface +{ +public: + AgentThreadTracer() = default; + ~AgentThreadTracer() override = default; + + void start_context() override; + void stop_context() 
override; + void resource_init(const hsa::AgentCache&, const CoreApiTable&, const AmdExtTable&) override; + void resource_deinit(const hsa::AgentCache&) override; + + void add_agent(rocprofiler_agent_id_t id, thread_trace_parameter_pack _params) + { + std::unique_lock lk(agent_mut); + params[id] = std::move(_params); + } + bool has_agent(rocprofiler_agent_id_t id) + { + std::unique_lock lk(agent_mut); + return params.find(id) != params.end(); + } + + std::map> tracers{}; + std::map params; + + std::mutex agent_mut; +}; + }; // namespace thread_trace } // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/thread_trace/att_service.cpp b/source/lib/rocprofiler-sdk/thread_trace/att_service.cpp index 39b7dcff..277029ce 100644 --- a/source/lib/rocprofiler-sdk/thread_trace/att_service.cpp +++ b/source/lib/rocprofiler-sdk/thread_trace/att_service.cpp @@ -20,21 +20,25 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. +#include #include +#include #include "lib/rocprofiler-sdk/aql/helpers.hpp" #include "lib/rocprofiler-sdk/context/context.hpp" #include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" #include "lib/rocprofiler-sdk/registration.hpp" +#include "rocprofiler-sdk/amd_detail/thread_trace.h" extern "C" { rocprofiler_status_t ROCPROFILER_API -rocprofiler_configure_thread_trace_service(rocprofiler_context_id_t context_id, - rocprofiler_att_parameter_t* parameters, - size_t num_parameters, - rocprofiler_att_dispatch_callback_t dispatch_callback, - rocprofiler_att_shader_data_callback_t shader_callback, - void* callback_userdata) +rocprofiler_configure_dispatch_thread_trace_service( + rocprofiler_context_id_t context_id, + rocprofiler_att_parameter_t* parameters, + size_t num_parameters, + rocprofiler_att_dispatch_callback_t dispatch_callback, + rocprofiler_att_shader_data_callback_t shader_callback, + void* callback_userdata) { if(rocprofiler::registration::get_init_status() > -1) return 
ROCPROFILER_STATUS_ERROR_CONFIGURATION_LOCKED; @@ -43,14 +47,14 @@ rocprofiler_configure_thread_trace_service(rocprofiler_context_id_t if(!ctx) return ROCPROFILER_STATUS_ERROR_CONTEXT_NOT_STARTED; if(ctx->thread_trace) return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED; - auto param_pack = rocprofiler::thread_trace_parameter_pack{}; + auto pack = rocprofiler::thread_trace::thread_trace_parameter_pack{}; - param_pack.context_id = context_id; - param_pack.dispatch_cb_fn = dispatch_callback; - param_pack.shader_cb_fn = shader_callback; - param_pack.callback_userdata = callback_userdata; - bool bEnableCodeobj = false; + pack.context_id = context_id; + pack.dispatch_cb_fn = dispatch_callback; + pack.shader_cb_fn = shader_callback; + pack.callback_userdata = callback_userdata; + auto id_map = rocprofiler::counters::getPerfCountersIdMap(); for(size_t p = 0; p < num_parameters; p++) { const rocprofiler_att_parameter_t& param = parameters[p]; @@ -59,38 +63,88 @@ rocprofiler_configure_thread_trace_service(rocprofiler_context_id_t switch(param.type) { - case ROCPROFILER_ATT_PARAMETER_TARGET_CU: param_pack.target_cu = param.value; break; + case ROCPROFILER_ATT_PARAMETER_TARGET_CU: pack.target_cu = param.value; break; case ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK: - param_pack.shader_engine_mask = param.value; + pack.shader_engine_mask = param.value; break; - case ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE: param_pack.buffer_size = param.value; break; - case ROCPROFILER_ATT_PARAMETER_SIMD_SELECT: param_pack.simd_select = param.value; break; - case ROCPROFILER_ATT_PARAMETER_CODE_OBJECT_TRACE_ENABLE: - bEnableCodeobj = param.value != 0; + case ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE: pack.buffer_size = param.value; break; + case ROCPROFILER_ATT_PARAMETER_SIMD_SELECT: pack.simd_select = param.value; break; + case ROCPROFILER_ATT_PARAMETER_PERFCOUNTER: + { + auto event_it = id_map.find(param.counter_id.handle); + if(event_it != id_map.end()) + 
pack.perfcounters.push_back({event_it->second, param.simd_mask}); + } + break; + case ROCPROFILER_ATT_PARAMETER_PERFCOUNTERS_CTRL: + pack.perfcounter_ctrl = param.value; break; case ROCPROFILER_ATT_PARAMETER_LAST: return ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT; } - // for(int i = 0; i < parameters.perfcounter_num; i++) - // thread_tracer->perfcounters.emplace_back(parameters.perfcounter[i]); } - ctx->thread_trace = std::make_shared(param_pack); + ctx->thread_trace = std::make_unique(pack); + return ROCPROFILER_STATUS_SUCCESS; +} + +rocprofiler_status_t ROCPROFILER_API +rocprofiler_configure_agent_thread_trace_service( + rocprofiler_context_id_t context_id, + rocprofiler_att_parameter_t* parameters, + size_t num_parameters, + rocprofiler_agent_id_t agent, + rocprofiler_att_shader_data_callback_t shader_callback, + void* callback_userdata) +{ + using AgentThreadTracer = rocprofiler::thread_trace::AgentThreadTracer; + if(rocprofiler::registration::get_init_status() > -1) + return ROCPROFILER_STATUS_ERROR_CONFIGURATION_LOCKED; - if(!bEnableCodeobj) return ROCPROFILER_STATUS_SUCCESS; // Skip TRACING_CODE_OBJECT setup + auto* ctx = rocprofiler::context::get_mutable_registered_context(context_id); + if(!ctx) return ROCPROFILER_STATUS_ERROR_CONTEXT_NOT_STARTED; + + if(!ctx->thread_trace) ctx->thread_trace = std::make_unique(); - auto& client_ctx = ctx->thread_trace->codeobj_client_ctx; + auto pack = rocprofiler::thread_trace::thread_trace_parameter_pack{}; + + pack.context_id = context_id; + pack.shader_cb_fn = shader_callback; + pack.callback_userdata = callback_userdata; + + auto id_map = rocprofiler::counters::getPerfCountersIdMap(); + for(size_t p = 0; p < num_parameters; p++) + { + const rocprofiler_att_parameter_t& param = parameters[p]; + if(param.type > ROCPROFILER_ATT_PARAMETER_LAST) + return ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT; - rocprofiler_status_t status = rocprofiler_create_context(&client_ctx); - if(status != ROCPROFILER_STATUS_SUCCESS) return 
status; + switch(param.type) + { + case ROCPROFILER_ATT_PARAMETER_TARGET_CU: pack.target_cu = param.value; break; + case ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK: + pack.shader_engine_mask = param.value; + break; + case ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE: pack.buffer_size = param.value; break; + case ROCPROFILER_ATT_PARAMETER_SIMD_SELECT: pack.simd_select = param.value; break; + case ROCPROFILER_ATT_PARAMETER_PERFCOUNTER: + { + auto event_it = id_map.find(param.counter_id.handle); + if(event_it != id_map.end()) + pack.perfcounters.push_back({event_it->second, param.simd_mask}); + } + break; + case ROCPROFILER_ATT_PARAMETER_PERFCOUNTERS_CTRL: + pack.perfcounter_ctrl = param.value; + break; + case ROCPROFILER_ATT_PARAMETER_LAST: return ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT; + } + } - status = rocprofiler_configure_callback_tracing_service( - client_ctx, - ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT, - nullptr, - 0, - rocprofiler::GlobalThreadTracer::codeobj_tracing_callback, - ctx->thread_trace.get()); + auto* agent_tracer = dynamic_cast(ctx->thread_trace.get()); + if(agent_tracer == nullptr || agent_tracer->has_agent(agent)) + return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED; - return status; + agent_tracer->add_agent(agent, pack); + return ROCPROFILER_STATUS_SUCCESS; } } diff --git a/source/lib/rocprofiler-sdk/thread_trace/code_object.cpp b/source/lib/rocprofiler-sdk/thread_trace/code_object.cpp new file mode 100644 index 00000000..10498fd3 --- /dev/null +++ b/source/lib/rocprofiler-sdk/thread_trace/code_object.cpp @@ -0,0 +1,147 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and 
to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include "lib/rocprofiler-sdk/thread_trace/code_object.hpp" +#include "lib/rocprofiler-sdk/code_object/code_object.hpp" + +namespace rocprofiler +{ +namespace thread_trace +{ +namespace code_object +{ +std::mutex CodeobjCallbackRegistry::mut; +std::set CodeobjCallbackRegistry::all_registries{}; + +CodeobjCallbackRegistry::CodeobjCallbackRegistry(LoadCallback _ld, UnloadCallback _unld) +: ld_fn(std::move(_ld)) +, unld_fn(std::move(_unld)) +{ + std::unique_lock lg(mut); + all_registries.insert(this); +} + +CodeobjCallbackRegistry::~CodeobjCallbackRegistry() +{ + std::unique_lock lg(mut); + all_registries.erase(this); +} + +void +CodeobjCallbackRegistry::Load(rocprofiler_agent_id_t agent, + uint64_t id, + uint64_t addr, + uint64_t size) +{ + std::unique_lock lg(mut); + for(auto* reg : all_registries) + reg->ld_fn(agent, id, addr, size); +} + +void +CodeobjCallbackRegistry::Unload(uint64_t id) +{ + std::unique_lock lg(mut); + for(auto* reg : all_registries) + reg->unld_fn(id); +} + +void +CodeobjCallbackRegistry::IterateLoaded() const +{ + std::unique_lock lg(mut); + + rocprofiler::code_object::iterate_loaded_code_objects( + [&](const rocprofiler::code_object::hsa::code_object& code_object) { + const auto& data = code_object.rocp_data; + 
ld_fn(data.rocp_agent, data.code_object_id, data.load_delta, data.load_size); + }); +} + +namespace +{ +auto& +get_freeze_function() +{ + static decltype(::hsa_executable_freeze)* _v = nullptr; + return _v; +} + +auto& +get_destroy_function() +{ + static decltype(::hsa_executable_destroy)* _v = nullptr; + return _v; +} + +hsa_status_t +executable_freeze(hsa_executable_t executable, const char* options) +{ + // Call underlying function + hsa_status_t status = CHECK_NOTNULL(get_freeze_function())(executable, options); + if(status != HSA_STATUS_SUCCESS) return status; + + rocprofiler::code_object::iterate_loaded_code_objects( + [&](const rocprofiler::code_object::hsa::code_object& code_object) { + if(code_object.hsa_executable != executable) return; + + const auto& data = code_object.rocp_data; + CodeobjCallbackRegistry::Load( + data.rocp_agent, data.code_object_id, data.load_delta, data.load_size); + }); + + return HSA_STATUS_SUCCESS; +} + +hsa_status_t +executable_destroy(hsa_executable_t executable) +{ + rocprofiler::code_object::iterate_loaded_code_objects( + [&](const rocprofiler::code_object::hsa::code_object& code_object) { + if(code_object.hsa_executable == executable) + CodeobjCallbackRegistry::Unload(code_object.rocp_data.code_object_id); + }); + + // Call underlying function + return CHECK_NOTNULL(get_destroy_function())(executable); +} +} // namespace + +void +initialize(HsaApiTable* table) +{ + (void) table; + auto& core_table = *table->core_; + + get_freeze_function() = CHECK_NOTNULL(core_table.hsa_executable_freeze_fn); + get_destroy_function() = CHECK_NOTNULL(core_table.hsa_executable_destroy_fn); + core_table.hsa_executable_freeze_fn = executable_freeze; + core_table.hsa_executable_destroy_fn = executable_destroy; + LOG_IF(FATAL, get_freeze_function() == core_table.hsa_executable_freeze_fn) + << "infinite recursion"; + LOG_IF(FATAL, get_destroy_function() == core_table.hsa_executable_destroy_fn) + << "infinite recursion"; +} + +} // namespace 
code_object +} // namespace thread_trace +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/thread_trace/code_object.hpp b/source/lib/rocprofiler-sdk/thread_trace/code_object.hpp new file mode 100644 index 00000000..2ecebf9e --- /dev/null +++ b/source/lib/rocprofiler-sdk/thread_trace/code_object.hpp @@ -0,0 +1,63 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include + +#include + +#include +#include +#include + +namespace rocprofiler +{ +namespace thread_trace +{ +namespace code_object +{ +struct CodeobjCallbackRegistry +{ + using LoadCallback = std::function; + using UnloadCallback = std::function; + + CodeobjCallbackRegistry(LoadCallback ld, UnloadCallback unld); + virtual ~CodeobjCallbackRegistry(); + + void IterateLoaded() const; + static void Load(rocprofiler_agent_id_t agent, uint64_t id, uint64_t addr, uint64_t size); + static void Unload(uint64_t id); + +private: + LoadCallback ld_fn; + UnloadCallback unld_fn; + + static std::mutex mut; + static std::set all_registries; +}; + +void +initialize(HsaApiTable* table); +} // namespace code_object +} // namespace thread_trace +} // namespace rocprofiler diff --git a/source/lib/rocprofiler-sdk/thread_trace/tests/CMakeLists.txt b/source/lib/rocprofiler-sdk/thread_trace/tests/CMakeLists.txt index 1074fa92..c26a00d5 100644 --- a/source/lib/rocprofiler-sdk/thread_trace/tests/CMakeLists.txt +++ b/source/lib/rocprofiler-sdk/thread_trace/tests/CMakeLists.txt @@ -10,9 +10,14 @@ target_sources(thread-trace-packet-test PRIVATE ${ROCPROFILER_THREAD_TRACE_TEST_ target_link_libraries( thread-trace-packet-test - PRIVATE rocprofiler-sdk::rocprofiler-static-library rocprofiler-sdk::rocprofiler-glog - rocprofiler-sdk::rocprofiler-hsa-runtime rocprofiler-sdk::rocprofiler-hip - rocprofiler-sdk::rocprofiler-common-library GTest::gtest GTest::gtest_main) + PRIVATE rocprofiler-sdk::rocprofiler-static-library + rocprofiler-sdk::rocprofiler-glog + rocprofiler-sdk::rocprofiler-hsa-runtime + rocprofiler-sdk::rocprofiler-hip + rocprofiler-sdk::rocprofiler-common-library + GTest::gtest + GTest::gtest_main + rocprofiler-sdk::counter-test-constants) gtest_add_tests( TARGET thread-trace-packet-test diff --git a/source/lib/rocprofiler-sdk/thread_trace/tests/att_packet_test.cpp b/source/lib/rocprofiler-sdk/thread_trace/tests/att_packet_test.cpp index b8037902..d65430c9 100644 --- 
a/source/lib/rocprofiler-sdk/thread_trace/tests/att_packet_test.cpp +++ b/source/lib/rocprofiler-sdk/thread_trace/tests/att_packet_test.cpp @@ -20,26 +20,34 @@ // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. +#include "lib/rocprofiler-sdk/agent.hpp" +#include "lib/rocprofiler-sdk/aql/helpers.hpp" +#include "lib/rocprofiler-sdk/aql/packet_construct.hpp" +#include "lib/rocprofiler-sdk/context/context.hpp" +#include "lib/rocprofiler-sdk/counters/metrics.hpp" +#include "lib/rocprofiler-sdk/counters/tests/hsa_tables.hpp" +#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" +#include "lib/rocprofiler-sdk/hsa/queue.hpp" +#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" +#include "lib/rocprofiler-sdk/registration.hpp" +#include "lib/rocprofiler-sdk/thread_trace/att_core.hpp" + #include #include +#include +#include #include #include +#include +#include #include -#include "lib/rocprofiler-sdk/context/context.hpp" -#include "lib/rocprofiler-sdk/registration.hpp" #include #include #include -#include "lib/rocprofiler-sdk/agent.hpp" -#include "lib/rocprofiler-sdk/aql/helpers.hpp" -#include "lib/rocprofiler-sdk/aql/packet_construct.hpp" -#include "lib/rocprofiler-sdk/counters/metrics.hpp" -#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp" -#include "lib/rocprofiler-sdk/hsa/queue.hpp" -#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp" +using namespace rocprofiler::counters::test_constants; #define ROCPROFILER_CALL(ARG, MSG) \ { \ @@ -49,40 +57,6 @@ namespace rocprofiler { -AmdExtTable& -get_ext_table() -{ - static auto _v = []() { - auto val = AmdExtTable{}; - val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info; - val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools; - val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate; - val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free; - val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info; - 
val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access; - return val; - }(); - return _v; -} - -CoreApiTable& -get_api_table() -{ - static auto _v = []() { - auto val = CoreApiTable{}; - val.hsa_iterate_agents_fn = hsa_iterate_agents; - val.hsa_agent_get_info_fn = hsa_agent_get_info; - val.hsa_queue_create_fn = hsa_queue_create; - val.hsa_queue_destroy_fn = hsa_queue_destroy; - val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed; - val.hsa_queue_load_read_index_relaxed_fn = hsa_queue_load_read_index_relaxed; - val.hsa_queue_add_write_index_relaxed_fn = hsa_queue_add_write_index_relaxed; - val.hsa_signal_store_screlease_fn = hsa_signal_store_screlease; - return val; - }(); - return _v; -} - void test_init() { @@ -113,24 +87,24 @@ TEST(thread_trace, resource_creation) ASSERT_GT(agents.size(), 0); for(const auto& [_, agent] : agents) { - auto params = thread_trace_parameter_pack{}; + auto params = thread_trace::thread_trace_parameter_pack{}; aql::ThreadTraceAQLPacketFactory factory(agent, params, get_api_table(), get_ext_table()); - auto packet = factory.construct_packet(); + auto packet = factory.construct_control_packet(); packet->populate_before(); packet->populate_after(); size_t vendor_packet = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE; - ASSERT_TRUE(packet->start.header == vendor_packet); - ASSERT_TRUE(packet->stop.header == vendor_packet); ASSERT_TRUE(packet->before_krn_pkt.size() > 0); ASSERT_TRUE(packet->after_krn_pkt.size() > 0); + ASSERT_TRUE(packet->before_krn_pkt.at(0).header == vendor_packet); + ASSERT_TRUE(packet->after_krn_pkt.at(0).header == vendor_packet); } { - thread_trace_parameter_pack params{}; - GlobalThreadTracer tracer(std::move(params)); + thread_trace::thread_trace_parameter_pack params{}; + thread_trace::DispatchThreadTracer tracer(std::move(params)); for(const auto& [_, agent] : agents) { @@ -167,12 +141,12 @@ TEST(thread_trace, configure_test) ROCPROFILER_CALL(rocprofiler_create_context(&ctx), "context 
creation failed"); std::vector params; - params.push_back({ROCPROFILER_ATT_PARAMETER_TARGET_CU, 1}); - params.push_back({ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK, 0xF}); - params.push_back({ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE, 0x1000000}); - params.push_back({ROCPROFILER_ATT_PARAMETER_SIMD_SELECT, 0xF}); + params.push_back({ROCPROFILER_ATT_PARAMETER_TARGET_CU, {1}}); + params.push_back({ROCPROFILER_ATT_PARAMETER_SHADER_ENGINE_MASK, {0xF}}); + params.push_back({ROCPROFILER_ATT_PARAMETER_BUFFER_SIZE, {0x1000000}}); + params.push_back({ROCPROFILER_ATT_PARAMETER_SIMD_SELECT, {0xF}}); - rocprofiler_configure_thread_trace_service( + rocprofiler_configure_dispatch_thread_trace_service( ctx, params.data(), params.size(), @@ -180,12 +154,102 @@ TEST(thread_trace, configure_test) const rocprofiler_agent_t*, rocprofiler_correlation_id_t, rocprofiler_kernel_id_t, + rocprofiler_dispatch_id_t, + rocprofiler_user_data_t*, void*) { return ROCPROFILER_ATT_CONTROL_NONE; }, - [](int64_t, void*, size_t, void*) {}, + [](int64_t, void*, size_t, rocprofiler_user_data_t) {}, nullptr); ASSERT_EQ(hsa_init(), HSA_STATUS_SUCCESS); ROCPROFILER_CALL(rocprofiler_start_context(ctx), "context start failed"); ROCPROFILER_CALL(rocprofiler_stop_context(ctx), "context stop failed"); + context::pop_client(1); + hsa_shut_down(); +} + +TEST(thread_trace, perfcounters_configure_test) +{ + test_init(); + + registration::init_logging(); + registration::set_init_status(-1); + context::push_client(1); + rocprofiler_context_id_t ctx; + ROCPROFILER_CALL(rocprofiler_create_context(&ctx), "context creation failed"); + + // Only GFX9 SQ Block counters are supported + std::vector> perf_counters = { + {"SQ_WAVES", 0x1}, {"SQ_WAVES", 0x2}, {"SQ_WAVES", 0x2}, {"GRBM_COUNT", 0x3}}; + std::set> expected; + std::vector params; + params.push_back({ROCPROFILER_ATT_PARAMETER_PERFCOUNTERS_CTRL, {1}}); + auto metrics = rocprofiler::counters::getMetricsForAgent("gfx90a"); + + for(auto& [counter_name, simd_mask] : 
perf_counters) + for(auto& metric : metrics) + if(metric.name() == counter_name) + { + rocprofiler_att_parameter_t att_param; + att_param.type = ROCPROFILER_ATT_PARAMETER_PERFCOUNTER; + att_param.counter_id = rocprofiler_counter_id_t{.handle = metric.id()}; + att_param.simd_mask = simd_mask; + params.push_back(att_param); + expected.insert({std::atoi(metric.event().c_str()), simd_mask}); + } + + rocprofiler_configure_dispatch_thread_trace_service( + ctx, + params.data(), + params.size(), + [](rocprofiler_queue_id_t, + const rocprofiler_agent_t*, + rocprofiler_correlation_id_t, + rocprofiler_kernel_id_t, + rocprofiler_dispatch_id_t, + rocprofiler_user_data_t*, + void*) { return ROCPROFILER_ATT_CONTROL_NONE; }, + [](int64_t, void*, size_t, rocprofiler_user_data_t) {}, + nullptr); + + auto* context = rocprofiler::context::get_mutable_registered_context(ctx); + auto* tracer = dynamic_cast(context->thread_trace.get()); + + ASSERT_NE(tracer, nullptr); + ASSERT_EQ(tracer->params.perfcounter_ctrl, 1); + ASSERT_EQ(tracer->params.perfcounters.size(), 3); + for(const auto& param : tracer->params.perfcounters) + EXPECT_TRUE(expected.find(param) != expected.end()) + << "valid AQLprofile mask not generated for perfcounters"; + context::pop_client(1); + hsa_shut_down(); +} + +TEST(thread_trace, perfcounters_aql_options_test) +{ + hsa_init(); + test_init(); + + registration::init_logging(); + registration::set_init_status(-1); + context::push_client(1); + + const std::uint8_t sqtt_default_num_options = 5; + auto agents = hsa::get_queue_controller()->get_supported_agents(); + + thread_trace::thread_trace_parameter_pack _params = {}; + auto metrics = rocprofiler::counters::getMetricsForAgent("gfx90a"); + std::vector> perf_counters = { + {"SQ_WAVES", 0x1}, {"SQ_WAVES", 0x2}, {"GRBM_COUNT", 0x3}}; + for(auto& [counter_name, simd_mask] : perf_counters) + for(auto& metric : metrics) + if(metric.name() == counter_name) + _params.perfcounters.push_back({std::atoi(metric.event().c_str()), 
simd_mask}); + _params.perfcounter_ctrl = 2; + auto new_tracer = std::make_unique( + _params, begin(agents)->second, get_api_table(), get_ext_table()); + + ASSERT_EQ(new_tracer->factory->aql_params.size(), + sqtt_default_num_options + perf_counters.size()); + context::pop_client(1); hsa_shut_down(); } diff --git a/source/lib/rocprofiler-sdk/tracing/tracing.hpp b/source/lib/rocprofiler-sdk/tracing/tracing.hpp index 49f8863a..7cc214fe 100644 --- a/source/lib/rocprofiler-sdk/tracing/tracing.hpp +++ b/source/lib/rocprofiler-sdk/tracing/tracing.hpp @@ -37,94 +37,6 @@ namespace rocprofiler { namespace tracing { -// template -// bool -// context_filter(const context::context* ctx, DomainT domain, Args... args); - -// template -// void -// populate_contexts(rocprofiler_callback_tracing_kind_t callback_domain_idx, -// rocprofiler_buffer_tracing_kind_t buffered_domain_idx, -// rocprofiler_tracing_operation_t operation_idx, -// callback_context_data_vec_t& callback_contexts, -// buffered_context_data_vec_t& buffered_contexts, -// external_correlation_id_map_t& extern_corr_ids, -// ClearContainersT = ClearContainersT{}); - -// template -// void -// populate_contexts(rocprofiler_callback_tracing_kind_t callback_domain_idx, -// rocprofiler_buffer_tracing_kind_t buffered_domain_idx, -// callback_context_data_vec_t& callback_contexts, -// buffered_context_data_vec_t& buffered_contexts, -// external_correlation_id_map_t& extern_corr_ids, -// ClearContainersT = ClearContainersT{}); - -// template -// void -// populate_contexts(rocprofiler_callback_tracing_kind_t callback_domain_idx, -// rocprofiler_buffer_tracing_kind_t buffered_domain_idx, -// rocprofiler_tracing_operation_t operation_idx, -// tracing_data& data, -// ClearContainersT = ClearContainersT{}); - -// template -// void -// populate_contexts(rocprofiler_callback_tracing_kind_t callback_domain_idx, -// rocprofiler_buffer_tracing_kind_t buffered_domain_idx, -// tracing_data& data, -// ClearContainersT = ClearContainersT{}); 
- -// void -// populate_external_correlation_ids(external_correlation_id_map_t& external_corr_ids, -// rocprofiler_thread_id_t thr_id, -// rocprofiler_external_correlation_id_request_kind_t kind, -// rocprofiler_tracing_operation_t operation, -// uint64_t internal_corr_id); - -// void -// update_external_correlation_ids(external_correlation_id_map_t& external_corr_ids, -// rocprofiler_thread_id_t thr_id, -// rocprofiler_external_correlation_id_request_kind_t kind); - -// template -// void -// execute_phase_none_callbacks(callback_context_data_vec_t& callback_contexts, -// rocprofiler_thread_id_t thr_id, -// uint64_t internal_corr_id, -// external_correlation_id_map_t& external_corr_ids, -// rocprofiler_callback_tracing_kind_t domain, -// rocprofiler_tracing_operation_t operation, -// TracerDataT& tracer_data); - -// template -// void -// execute_phase_enter_callbacks(callback_context_data_vec_t& callback_contexts, -// rocprofiler_thread_id_t thr_id, -// uint64_t internal_corr_id, -// external_correlation_id_map_t& external_corr_ids, -// rocprofiler_callback_tracing_kind_t domain, -// rocprofiler_tracing_operation_t operation, -// TracerDataT& tracer_data); - -// template -// void -// execute_phase_exit_callbacks(callback_context_data_vec_t& callback_contexts, -// external_correlation_id_map_t& external_corr_ids, -// rocprofiler_callback_tracing_kind_t domain, -// rocprofiler_tracing_operation_t operation, -// TracerDataT& tracer_data); - -// template -// void -// execute_buffer_record_emplace(buffered_context_data_vec_t& buffered_contexts, -// rocprofiler_thread_id_t thr_id, -// uint64_t internal_corr_id, -// external_correlation_id_map_t& external_corr_ids, -// rocprofiler_buffer_tracing_kind_t domain, -// OperationT operation, -// BufferRecordT&& base_record); - template inline bool context_filter(const context::context* ctx, DomainT domain, Args... 
args) diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 397e3a33..6bbec14e 100644 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -57,6 +57,7 @@ add_subdirectory(async-copy-tracing) add_subdirectory(scratch-memory-tracing) add_subdirectory(c-tool) add_subdirectory(page-migration) +add_subdirectory(pc_sampling) add_subdirectory(thread-trace) add_subdirectory(hip-graph-tracing) diff --git a/tests/pc_sampling/CMakeLists.txt b/tests/pc_sampling/CMakeLists.txt new file mode 100644 index 00000000..a9096b2c --- /dev/null +++ b/tests/pc_sampling/CMakeLists.txt @@ -0,0 +1,142 @@ +# +# +# +cmake_minimum_required(VERSION 3.21.0 FATAL_ERROR) + +if(NOT CMAKE_HIP_COMPILER) + find_program( + amdclangpp_EXECUTABLE + NAMES amdclang++ + HINTS ${ROCM_PATH} ENV ROCM_PATH /opt/rocm + PATHS ${ROCM_PATH} ENV ROCM_PATH /opt/rocm + PATH_SUFFIXES bin llvm/bin NO_CACHE) + mark_as_advanced(amdclangpp_EXECUTABLE) + + if(amdclangpp_EXECUTABLE) + set(CMAKE_HIP_COMPILER "${amdclangpp_EXECUTABLE}") + endif() +endif() + +project(rocprofiler-sdk-samples-pc-sampling-integration-test LANGUAGES CXX HIP) + +foreach(_TYPE DEBUG MINSIZEREL RELEASE RELWITHDEBINFO) + if("${CMAKE_HIP_FLAGS_${_TYPE}}" STREQUAL "") + set(CMAKE_HIP_FLAGS_${_TYPE} "${CMAKE_CXX_FLAGS_${_TYPE}}") + endif() +endforeach() + +find_package(rocprofiler-sdk REQUIRED) + +find_package(PkgConfig) + +if(PkgConfig_FOUND) + set(ENV{PKG_CONFIG_SYSTEM_INCLUDE_PATH} "") + pkg_check_modules(DW libdw) + + if(DW_FOUND + AND DW_INCLUDE_DIRS + AND DW_LIBRARIES) + set(libdw_INCLUDE_DIR + "${DW_INCLUDE_DIRS}" + CACHE FILEPATH "libdw include directory") + set(libdw_LIBRARY + "${DW_LIBRARIES}" + CACHE FILEPATH "libdw libraries") + endif() +endif() + +if(NOT libdw_INCLUDE_DIR OR NOT libdw_LIBRARY) + find_path( + libdw_ROOT_DIR + NAMES include/elfutils/libdw.h + HINTS ${libdw_ROOT} + PATHS ${libdw_ROOT}) + + mark_as_advanced(libdw_ROOT_DIR) + + find_path( + libdw_INCLUDE_DIR + NAMES elfutils/libdw.h + HINTS ${libdw_ROOT} + PATHS 
${libdw_ROOT} + PATH_SUFFIXES include) + + find_library( + libdw_LIBRARY + NAMES dw + HINTS ${libdw_ROOT} + PATHS ${libdw_ROOT} + PATH_SUFFIXES lib lib64) +endif() + +include(FindPackageHandleStandardArgs) +find_package_handle_standard_args(libdw DEFAULT_MSG libdw_LIBRARY libdw_INCLUDE_DIR) + +if(libdw_FOUND AND NOT TARGET libdw::libdw) + add_library(libdw::libdw INTERFACE IMPORTED) + if(TARGET PkgConfig::DW AND DW_FOUND) + target_link_libraries(libdw::libdw INTERFACE PkgConfig::DW) + else() + target_link_libraries(libdw::libdw INTERFACE ${libdw_LIBRARY}) + target_include_directories(libdw::libdw SYSTEM INTERFACE ${libdw_INCLUDE_DIR}) + endif() +endif() + +add_library(pc-sampling-integration-test-client SHARED) +target_sources( + pc-sampling-integration-test-client + PRIVATE address_translation.cpp + address_translation.hpp + client.cpp + client.hpp + cid_retirement.cpp + cid_retirement.hpp + codeobj.cpp + codeobj.hpp + external_cid.cpp + external_cid.hpp + kernel_tracing.cpp + kernel_tracing.hpp + pcs.hpp + pcs.cpp + utils.hpp + utils.cpp) +target_link_libraries( + pc-sampling-integration-test-client + PRIVATE rocprofiler-sdk::rocprofiler-sdk rocprofiler-sdk::tests-build-flags + rocprofiler-sdk::tests-common-library amd_comgr dw) + +set_source_files_properties(main.cpp PROPERTIES LANGUAGE HIP) +find_package(Threads REQUIRED) + +add_executable(pc-sampling-integration-test) +target_sources(pc-sampling-integration-test PRIVATE main.cpp) +target_link_libraries( + pc-sampling-integration-test + PRIVATE pc-sampling-integration-test-client Threads::Threads + rocprofiler-sdk::tests-build-flags) + +# rocprofiler_pc-sampling-integration_get_preload_env(PRELOAD_ENV +# pc-sampling-integration-test-client) +# rocprofiler_pc-sampling-integration_get_ld_library_path_env(LIBRARY_PATH_ENV) + +# set(pc-sampling-integration-test-env ${PRELOAD_ENV} ${LIBRARY_PATH_ENV}) + +add_test(NAME pc-sampling-integration-test + COMMAND $) + +set_tests_properties( + pc-sampling-integration-test + 
PROPERTIES + TIMEOUT + 45 + LABELS + "integration-tests;pc-sampling" + # ENVIRONMENT + # "${ROCPROFILER_MEMCHECK_PRELOAD_ENV};HSA_TOOLS_LIB=$" + SKIP_REGULAR_EXPRESSION + "PC sampling unavailable" + ENVIRONMENT + "${pc-sampling-integration-test-env}" + FAIL_REGULAR_EXPRESSION + "${ROCPROFILER_DEFAULT_FAIL_REGEX}") diff --git a/tests/pc_sampling/address_translation.cpp b/tests/pc_sampling/address_translation.cpp new file mode 100644 index 00000000..0632f7ac --- /dev/null +++ b/tests/pc_sampling/address_translation.cpp @@ -0,0 +1,197 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +#include "address_translation.hpp" +#include "pcs.hpp" +#include "utils.hpp" + +#include +#include +#include +#include +#include +#include + +namespace client +{ +namespace address_translation +{ +namespace +{ +struct FlatProfiler +{ +public: + FlatProfiler() = default; + ~FlatProfiler() = default; + + CodeobjAddressTranslate translator; + KernelObjectMap kernel_object_map; + FlatProfile flat_profile; + std::mutex global_mut; +}; +} // namespace + +// Raw pointer to prevent early destruction of static objects +FlatProfiler* flat_profiler = nullptr; + +void +init() +{ + flat_profiler = new FlatProfiler(); +} + +void +fini() +{ + delete flat_profiler; +} + +CodeobjAddressTranslate& +get_address_translator() +{ + return flat_profiler->translator; +} + +KernelObjectMap& +get_kernel_object_map() +{ + return flat_profiler->kernel_object_map; +} + +FlatProfile& +get_flat_profile() +{ + return flat_profiler->flat_profile; +} + +std::mutex& +get_global_mutex() +{ + return flat_profiler->global_mut; +} + +KernelObject::KernelObject(uint64_t code_object_id, + std::string kernel_name, + uint64_t begin_address, + uint64_t end_address) +: code_object_id_(code_object_id) +, kernel_name_(kernel_name) +, begin_address_(begin_address) +, end_address_(end_address) +{ + auto& translator = get_address_translator(); + uint64_t vaddr = begin_address; + while(vaddr < end_address) + { + auto inst = translator.get(vaddr); + vaddr += inst->size; + this->add_instruction(std::move(inst)); + } +} + +void +dump_flat_profile() +{ + // It seems that an instruction can be part of multiple + // instances of the same kernel loaded on two different devices. + // We need to prevent counting the same instruction multiple times. 
+ std::unordered_set visited_instructions; + + const auto& kernel_object_map = get_kernel_object_map(); + const auto& flat_profile = get_flat_profile(); + + std::stringstream ss; + uint64_t samples_num = 0; + kernel_object_map.iterate_kernel_objects([&](const KernelObject* kernel_obj) { + ss << "\n===================================="; + ss << "The kernel: " << kernel_obj->kernel_name() + << " with the begin address: " << kernel_obj->begin_address() + << " from code object with id: " << kernel_obj->code_object_id() << std::endl; + kernel_obj->iterate_instrunctions([&](const Instruction& inst) { + ss << "\t"; + ss << inst.inst << "\t"; + ss << inst.comment << "\t"; + ss << "samples: "; + const auto* _sample_instruction = flat_profile.get_sample_instruction(inst); + if(_sample_instruction == nullptr) + ss << "0"; + else + { + _sample_instruction->process([&](const SampleInstruction& sample_instruction) { + ss << sample_instruction.sample_count(); + // Assure that each instruction is counted once. 
+ if(visited_instructions.count(sample_instruction.inst()) == 0) + { + samples_num += sample_instruction.sample_count(); + visited_instructions.insert(sample_instruction.inst()); + } + + if(sample_instruction.exec_mask_counts().size() <= 1) + { + ss << ", exec_mask: " << std::hex; + ss << sample_instruction.exec_mask_counts().begin()->first; + ss << std::dec; + assert(sample_instruction.sample_count() == + sample_instruction.exec_mask_counts().begin()->second); + } + else + { + uint64_t num_samples_sum = 0; + // More than one exec_mask + for(auto& [exec_mask, samples_per_exec] : + sample_instruction.exec_mask_counts()) + { + ss << std::endl; + ss << "\t\t" + << "exec_mask: " << std::hex << exec_mask; + ss << "\t" + << "samples: " << std::dec << samples_per_exec; + num_samples_sum += samples_per_exec; + ss << std::endl; + } + assert(sample_instruction.sample_count() == num_samples_sum); + } + }); + } + ss << std::endl; + }); + ss << "====================================\n" << std::endl; + }); + + ss << "The total number of decoded samples: " << samples_num << std::endl; + ss << "The total number of collected samples: " << client::pcs::total_samples_num() + << std::endl; + + *utils::get_output_stream() << ss.str() << std::endl; + + assert(samples_num == client::pcs::total_samples_num()); + // We expect at least one PC sample to be decoded/delivered; + assert(samples_num > 0); +} + +} // namespace address_translation +} // namespace client diff --git a/tests/pc_sampling/address_translation.hpp b/tests/pc_sampling/address_translation.hpp new file mode 100644 index 00000000..1426fcfe --- /dev/null +++ b/tests/pc_sampling/address_translation.hpp @@ -0,0 +1,273 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights 
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#pragma once + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace client +{ +namespace address_translation +{ +using Instruction = rocprofiler::codeobj::disassembly::Instruction; +using CodeobjAddressTranslate = rocprofiler::codeobj::disassembly::CodeobjAddressTranslate; + +class KernelObject +{ +private: + using process_inst_fn = std::function; + +public: + KernelObject() = default; + KernelObject(uint64_t code_object_id, + std::string kernel_name, + uint64_t begin_address, + uint64_t end_address); + + // write lock required + void add_instruction(std::unique_ptr instruction) + { + auto lock = std::unique_lock{mut}; + + instructions_.push_back(std::move(instruction)); + } + + // read lock required + void iterate_instrunctions(process_inst_fn fn) const + { + auto lock = std::shared_lock{mut}; + + for(const auto& inst : this->instructions_) + fn(*inst); + } + + uint64_t code_object_id() const { return code_object_id_; }; + std::string kernel_name() const { return kernel_name_; }; + uint64_t begin_address() const { return begin_address_; }; + uint64_t end_address() const { 
return end_address_; }; + +private: + mutable std::shared_mutex mut; + uint64_t code_object_id_; + std::string kernel_name_; + uint64_t begin_address_; + uint64_t end_address_; + std::vector> instructions_; +}; + +class KernelObjectMap +{ +private: + using process_kernel_fn = std::function; + +public: + KernelObjectMap() = default; + + // write lock required + void add_kernel(uint64_t code_object_id, + std::string name, + uint64_t begin_address, + uint64_t end_address) + { + auto lock = std::unique_lock{mut}; + + auto key = form_key(code_object_id, name, begin_address); + auto it = kernel_object_map.find(key); + assert(it == kernel_object_map.end()); + kernel_object_map.insert( + {key, + std::make_unique(code_object_id, name, begin_address, end_address)}); + } + +#if 0 + // read lock required + KernelObject* get_kernel(uint64_t code_object_id, std::string name) + { + auto lock = std::shared_lock{mut}; + + auto key = form_key(code_object_id, name); + auto it = kernel_object_map.find(key); + if(it == kernel_object_map.end()) + { + return nullptr; + } + + return it->second.get(); + } +#endif + + // read lock required + void iterate_kernel_objects(process_kernel_fn fn) const + { + auto lock = std::shared_lock{mut}; + + for(auto& [_, kernel_obj] : kernel_object_map) + fn(kernel_obj.get()); + } + +private: + std::unordered_map> kernel_object_map; + mutable std::shared_mutex mut; + + std::string form_key(uint64_t code_object_id, std::string kernel_name, uint64_t begin_address) + { + return std::to_string(code_object_id) + "_" + kernel_name + "_" + + std::to_string(begin_address); + } +}; + +class SampleInstruction +{ +private: + using proces_sample_inst_fn = std::function; + +public: + SampleInstruction() = default; + SampleInstruction(std::unique_ptr inst) + : inst_(std::move(inst)) + {} + + // write lock required + void add_sample(uint64_t exec_mask) + { + auto lock = std::unique_lock{mut}; + + if(exec_mask_counts_.find(exec_mask) == exec_mask_counts_.end()) + { + 
exec_mask_counts_[exec_mask] = 0; + } + exec_mask_counts_[exec_mask]++; + sample_count_++; + } + + // read lock required + void process(proces_sample_inst_fn fn) const + { + auto lock = std::shared_lock{mut}; + + fn(*this); + } + + Instruction* inst() const { return inst_.get(); }; + // In case an instruction is sampled with different exec masks, + // keep track of how many times each exec_mask was observed. + const std::map& exec_mask_counts() const { return exec_mask_counts_; } + // How many times this instruction was sampled + uint64_t sample_count() const { return sample_count_; }; + +private: + mutable std::shared_mutex mut; + + // FIXME: prevent direct access of the following fields. + // The following fields should be accessible only from within `process` function. + std::unique_ptr inst_; + // In case an instruction is sampled with different exec masks, + // keep track of how many times each exec_mask was observed. + std::map exec_mask_counts_; + // How many times this instruction was sampled + uint64_t sample_count_ = 0; +}; + +class FlatProfile +{ +public: + FlatProfile() = default; + + // write lock required + void add_sample(std::unique_ptr instruction, uint64_t exec_mask) + { + auto lock = std::unique_lock{mut}; + + auto inst_id = get_instruction_id(*instruction); + auto itr = samples.find(inst_id); + if(itr == samples.end()) + { + // Add new instruction + samples.insert({inst_id, std::make_unique(std::move(instruction))}); + itr = samples.find(inst_id); + } + + auto* sample_instruction = itr->second.get(); + sample_instruction->add_sample(exec_mask); + } + + // read lock required + const SampleInstruction* get_sample_instruction(const Instruction& inst) const + { + auto lock = std::shared_lock{mut}; + + auto inst_id = get_instruction_id(inst); + auto itr = samples.find(inst_id); + if(itr == samples.end()) return nullptr; + return itr->second.get(); + } + +private: + // For the sake of this test, we use `ld_addr` as the instruction identifier. 
+ // TODO: To cover code object loading/unloading and relocations, + // use `(code_object_id + ld_addr)` as the unique identifier. + // This assumes the decoder is changed to return code_object_id as part + // of the `LoadedCodeobjDecoder::get(uint64_t ld_addr)` method. + using instrution_id_t = uint64_t; + instrution_id_t get_instruction_id(const Instruction& instruction) const + { + // Ensure the decoder determined the `ld_addr`. + assert(instruction.ld_addr > 0); + return instruction.ld_addr; + } + + std::unordered_map> samples; + mutable std::shared_mutex mut; +}; + +std::mutex& +get_global_mutex(); + +CodeobjAddressTranslate& +get_address_translator(); + +KernelObjectMap& +get_kernel_object_map(); + +FlatProfile& +get_flat_profile(); + +void +dump_flat_profile(); + +void +init(); + +void +fini(); +} // namespace address_translation +} // namespace client diff --git a/tests/pc_sampling/cid_retirement.cpp b/tests/pc_sampling/cid_retirement.cpp new file mode 100644 index 00000000..fe2bb147 --- /dev/null +++ b/tests/pc_sampling/cid_retirement.cpp @@ -0,0 +1,129 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +/** + * @file samples/pc_sampling_library/client.cpp + * + * @brief Example rocprofiler client (tool) + */ + +#include "utils.hpp" + +#include +#include +#include +#include +#include + +#include +#include +#include + +namespace client +{ +namespace cid_retirement +{ +constexpr size_t BUFFER_SIZE_BYTES = 8192; +constexpr size_t WATERMARK = (BUFFER_SIZE_BYTES / 4); + +rocprofiler_buffer_id_t cid_retirement_buffer; + +void +cid_retirement_tracing_buffered(rocprofiler_context_id_t /*context*/, + rocprofiler_buffer_id_t /*buffer_id*/, + rocprofiler_record_header_t** headers, + size_t num_headers, + void* /*user_data*/, + uint64_t /*drop_count*/) +{ + std::stringstream ss; + + for(size_t i = 0; i < num_headers; ++i) + { + auto* header = headers[i]; + + if(header == nullptr) + { + throw std::runtime_error{ + "rocprofiler provided a null pointer to header. this should never happen"}; + } + else if(header->hash != + rocprofiler_record_header_compute_hash(header->category, header->kind)) + { + throw std::runtime_error{"rocprofiler_record_header_t (category | kind) != hash"}; + } + else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING) + { + if(header->kind == ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT) + { + auto* cid_record = + static_cast( + header->payload); + ss << "... The retired internal correlation id is: " + << cid_record->internal_correlation_id; + ss << ", the timestamp is: " << cid_record->timestamp; + ss << std::endl; + // TODO: assert that the retiring timestamp is greater than + // the greatest timestamp of PC samples matching the retired CID. 
+ } + } + } + + *utils::get_output_stream() << ss.str(); +} + +void +configure_cid_retirement_tracing(rocprofiler_context_id_t context) +{ + ROCPROFILER_CALL(rocprofiler_create_buffer(context, + BUFFER_SIZE_BYTES, + WATERMARK, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + cid_retirement_tracing_buffered, + nullptr, + &cid_retirement_buffer), + "buffer creation"); + + ROCPROFILER_CALL(rocprofiler_configure_buffer_tracing_service( + context, + ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT, + nullptr, + 0, + cid_retirement_buffer), + "buffer tracing service for cid retirement configure"); +} + +void +flush_retired_cids() +{ + ROCPROFILER_CALL(rocprofiler_flush_buffer(cid_retirement_buffer), + "Cannot flush retired CIDs buffer"); + *utils::get_output_stream() << "Retired CIDs flushed..." << std::endl; +} + +} // namespace cid_retirement +} // namespace client diff --git a/tests/pc_sampling/cid_retirement.hpp b/tests/pc_sampling/cid_retirement.hpp new file mode 100644 index 00000000..1585f7e3 --- /dev/null +++ b/tests/pc_sampling/cid_retirement.hpp @@ -0,0 +1,38 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#pragma once + +#include +#include + +namespace client +{ +namespace cid_retirement +{ +void +configure_cid_retirement_tracing(rocprofiler_context_id_t context); + +void +flush_retired_cids(); +} // namespace cid_retirement +} // namespace client diff --git a/tests/pc_sampling/client.cpp b/tests/pc_sampling/client.cpp new file mode 100644 index 00000000..b18062e7 --- /dev/null +++ b/tests/pc_sampling/client.cpp @@ -0,0 +1,225 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +/** + * @file samples/pc_sampling_library/client.cpp + * + * @brief Example rocprofiler client (tool) + */ + +#include "client.hpp" + +#include "address_translation.hpp" +#include "cid_retirement.hpp" +#include "codeobj.hpp" +#include "external_cid.hpp" +#include "kernel_tracing.hpp" +#include "pcs.hpp" +#include "utils.hpp" + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace client +{ +namespace +{ +rocprofiler_client_id_t* client_id = nullptr; +rocprofiler_client_finalize_t client_fini_func = nullptr; +rocprofiler_context_id_t client_ctx; + +int +tool_init(rocprofiler_client_finalize_t fini_func, void* /*tool_data*/) +{ + client_fini_func = fini_func; + + address_translation::init(); + external_cid::init(); + pcs::init(); + + ROCPROFILER_CALL(rocprofiler_create_context(&client_ctx), "Cannot create context\n"); + + pcs::configure_pc_sampling_on_all_agents(client_ctx); + + // Enable code object tracing service, to match PC samples to corresponding code object + ROCPROFILER_CALL( + rocprofiler_configure_callback_tracing_service(client_ctx, + ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT, + nullptr, + 0, + client::codeobj::codeobj_tracing_callback, + nullptr), + "code object tracing service configure"); + + cid_retirement::configure_cid_retirement_tracing(client_ctx); + // Kernel tracing service is needed for the external correlation service. + kernel_tracing::configure_kernel_tracing_service(client_ctx); + external_cid::configure_external_correlation_service(client_ctx); + + int valid_ctx = 0; + ROCPROFILER_CALL(rocprofiler_context_is_valid(client_ctx, &valid_ctx), + "failure checking context validity"); + if(valid_ctx == 0) + { + // notify rocprofiler that initialization failed + // and all the contexts, buffers, etc. 
created + // should be ignored + return -1; + } + + ROCPROFILER_CALL(rocprofiler_start_context(client_ctx), "rocprofiler context start failed"); + + return 0; +} + +void +tool_fini(void* /*tool_data*/) +{ + // Drain all retired correlation IDs + client::sync(); + + if(client_id) + { + // Assert the context is inactive. + int state = -1; + ROCPROFILER_CALL(rocprofiler_context_is_active(client_ctx, &state), + "Cannot inspect the state of the context.") + assert(state == 0); + + // No need to stop the context, since it has been stopped implicitly by the rocprofiler-SDK. + + // Flush remaining PC samples + pcs::flush_and_destroy_buffers(); + } + + address_translation::dump_flat_profile(); + // deallocation + address_translation::fini(); + external_cid::fini(); + pcs::fini(); +} + +} // namespace + +// forward declaration +void +setup(); + +void +setup() +{ + // Do not force configuration + if(int status = 0; + rocprofiler_is_initialized(&status) == ROCPROFILER_STATUS_SUCCESS && status == 0) + { + *utils::get_output_stream() << "Client forces rocprofiler configuration.\n" << std::endl; + ROCPROFILER_CALL(rocprofiler_force_configure(&rocprofiler_configure), + "failed to force configuration"); + } +} + +void +shutdown() +{} + +void +sync() +{ + // Flush rocprofiler-SDK's buffers containing PC samples. + pcs::flush_buffers(); + + // Flush retired correlation IDs. 
+ cid_retirement::flush_retired_cids(); +} + +} // namespace client + +extern "C" rocprofiler_tool_configure_result_t* +rocprofiler_configure(uint32_t version, + const char* runtime_version, + uint32_t priority, + rocprofiler_client_id_t* id) +{ + // only activate if main tool + if(priority > 0) return nullptr; + + // set the client name + id->name = "PCSamplingExampleTool"; + + // store client info + client::client_id = id; + + // compute major/minor/patch version info + uint32_t major = version / 10000; + uint32_t minor = (version % 10000) / 100; + uint32_t patch = version % 100; + + // generate info string + auto info = std::stringstream{}; + info << id->name << " is using rocprofiler v" << major << "." << minor << "." << patch << " (" + << runtime_version << ")"; + + std::clog << info.str() << std::endl; + + std::ostream* output_stream = nullptr; + std::string filename = "pc_sampling_integration_test.log"; + if(auto* outfile = getenv("ROCPROFILER_SAMPLE_OUTPUT_FILE"); outfile) filename = outfile; + if(filename == "stdout") + output_stream = &std::cout; + else if(filename == "stderr") + output_stream = &std::cerr; + else + output_stream = new std::ofstream{filename}; + + client::utils::get_output_stream() = output_stream; + + // create configure data + static auto cfg = + rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), + &client::tool_init, + &client::tool_fini, + static_cast(output_stream)}; + + // return pointer to configure data + return &cfg; +} diff --git a/tests/pc_sampling/client.hpp b/tests/pc_sampling/client.hpp new file mode 100644 index 00000000..b82f27d7 --- /dev/null +++ b/tests/pc_sampling/client.hpp @@ -0,0 +1,44 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the 
rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#pragma once + +#ifdef pc_sampling_code_obj_tracing_client_EXPORTS +# define CLIENT_API __attribute__((visibility("default"))) +#else +# define CLIENT_API +#endif + +#define USE_CLIENT_SHUTDOWN_EXPLICITLY 1 + +namespace client +{ +void +setup() CLIENT_API; + +void +shutdown() CLIENT_API; + +void +sync() CLIENT_API; + +} // namespace client diff --git a/tests/pc_sampling/codeobj.cpp b/tests/pc_sampling/codeobj.cpp new file mode 100644 index 00000000..a9cd688e --- /dev/null +++ b/tests/pc_sampling/codeobj.cpp @@ -0,0 +1,261 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all 
+// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +/** + * @file samples/pc_sampling_library/client.cpp + * + * @brief Example rocprofiler client (tool) + */ + +#include "address_translation.hpp" +#include "client.hpp" +#include "pcs.hpp" +#include "utils.hpp" + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace client +{ +namespace codeobj +{ +#define CODEOBJ_DEBUG 0 + +constexpr bool COPY_MEMORY_CODEOBJ = true; + +std::string +cxa_demangle(std::string_view _mangled_name, int* _status) +{ + constexpr size_t buffer_len = 4096; + // return the mangled since there is no buffer + if(_mangled_name.empty()) + { + *_status = -2; + return std::string{}; + } + + auto _demangled_name = std::string{_mangled_name}; + + // PARAMETERS to __cxa_demangle + // mangled_name: + // A NULL-terminated character string containing the name to be demangled. + // buffer: + // A region of memory, allocated with malloc, of *length bytes, into which the + // demangled name is stored. If output_buffer is not long enough, it is expanded + // using realloc. output_buffer may instead be NULL; in that case, the demangled + // name is placed in a region of memory allocated with malloc. + // _buflen: + // If length is non-NULL, the length of the buffer containing the demangled name + // is placed in *length. 
+ // status: + // *status is set to one of the following values + size_t _demang_len = 0; + char* _demang = abi::__cxa_demangle(_demangled_name.c_str(), nullptr, &_demang_len, _status); + switch(*_status) + { + // 0 : The demangling operation succeeded. + // -1 : A memory allocation failure occurred. + // -2 : mangled_name is not a valid name under the C++ ABI mangling rules. + // -3 : One of the arguments is invalid. + case 0: + { + if(_demang) _demangled_name = std::string{_demang}; + break; + } + case -1: + { + char _msg[buffer_len]; + ::memset(_msg, '\0', buffer_len * sizeof(char)); + ::snprintf(_msg, + buffer_len, + "memory allocation failure occurred demangling %s", + _demangled_name.c_str()); + ::perror(_msg); + break; + } + case -2: break; + case -3: + { + char _msg[buffer_len]; + ::memset(_msg, '\0', buffer_len * sizeof(char)); + ::snprintf(_msg, + buffer_len, + "Invalid argument in: (\"%s\", nullptr, nullptr, %p)", + _demangled_name.c_str(), + (void*) _status); + ::perror(_msg); + break; + } + default: break; + }; + + // if it "demangled" but the length is zero, set the status to -2 + if(_demang_len == 0 && *_status == 0) *_status = -2; + + // free allocated buffer + ::free(_demang); + return _demangled_name; +} + +template +std::string +as_hex(Tp _v, size_t _width = 16) +{ + auto _ss = std::stringstream{}; + _ss.fill('0'); + _ss << "0x" << std::hex << std::setw(_width) << _v; + return _ss.str(); +} + +void +codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* /*user_data*/, + void* /*callback_data*/) +{ + std::stringstream info; + + info << "-----------------------------\n"; + if(record.kind == ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT && + record.operation == ROCPROFILER_CODE_OBJECT_LOAD) + { + auto* data = + static_cast(record.payload); + + if(record.phase == ROCPROFILER_CALLBACK_PHASE_LOAD) + { + auto& global_mut = address_translation::get_global_mutex(); + { + auto lock = std::unique_lock{global_mut}; + + 
auto& translator = client::address_translation::get_address_translator(); + // register code object inside the decoder + if(std::string_view(data->uri).find("file:///") == 0) + { + translator.addDecoder( + data->uri, data->code_object_id, data->load_delta, data->load_size); + } + else if(COPY_MEMORY_CODEOBJ) + { + translator.addDecoder(reinterpret_cast(data->memory_base), + data->memory_size, + data->code_object_id, + data->load_delta, + data->load_size); + } + else + { + return; + } + + // extract symbols from code object + auto& kernel_object_map = client::address_translation::get_kernel_object_map(); + auto symbolmap = translator.getSymbolMap(); + for(auto& [vaddr, symbol] : symbolmap) + { + kernel_object_map.add_kernel( + data->code_object_id, symbol.name, vaddr, vaddr + symbol.mem_size); + } + } + + info << "code object load :: "; + } + else if(record.phase == ROCPROFILER_CALLBACK_PHASE_UNLOAD) + { + // Ensure all PC samples of the unloaded code object are decoded, + // prior to removing the decoder. 
+ client::sync(); + auto& global_mut = address_translation::get_global_mutex(); + { + auto lock = std::unique_lock{global_mut}; + auto& translator = client::address_translation::get_address_translator(); + translator.removeDecoder(data->code_object_id, data->load_delta); + } + + info << "code object unload :: "; + } + + info << "code_object_id=" << data->code_object_id + << ", rocp_agent=" << data->rocp_agent.handle << ", uri=" << data->uri + << ", load_base=" << as_hex(data->load_base) << ", load_size=" << data->load_size + << ", load_delta=" << as_hex(data->load_delta); + if(data->storage_type == ROCPROFILER_CODE_OBJECT_STORAGE_TYPE_FILE) + info << ", storage_file_descr=" << data->storage_file; + else if(data->storage_type == ROCPROFILER_CODE_OBJECT_STORAGE_TYPE_MEMORY) + info << ", storage_memory_base=" << as_hex(data->memory_base) + << ", storage_memory_size=" << data->memory_size; + + info << std::endl; + } + if(record.kind == ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT && + record.operation == ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) + { + auto* data = + static_cast( + record.payload); + + if(record.phase == ROCPROFILER_CALLBACK_PHASE_LOAD) + { + info << "kernel symbol load :: "; + } + else if(record.phase == ROCPROFILER_CALLBACK_PHASE_UNLOAD) + { + info << "kernel symbol unload :: "; + // client_kernels.erase(data->kernel_id); + } + + auto kernel_name = std::regex_replace(data->kernel_name, std::regex{"(\\.kd)$"}, ""); + int demangle_status = 0; + kernel_name = cxa_demangle(kernel_name, &demangle_status); + + info << "code_object_id=" << data->code_object_id << ", kernel_id=" << data->kernel_id + << ", kernel_object=" << as_hex(data->kernel_object) + << ", kernarg_segment_size=" << data->kernarg_segment_size + << ", kernarg_segment_alignment=" << data->kernarg_segment_alignment + << ", group_segment_size=" << data->group_segment_size + << ", private_segment_size=" << data->private_segment_size + << ", kernel_name=" << kernel_name; + + info << 
std::endl; + } + + *utils::get_output_stream() << info.str() << std::endl; +} + +} // namespace codeobj +} // namespace client diff --git a/tests/pc_sampling/codeobj.hpp b/tests/pc_sampling/codeobj.hpp new file mode 100644 index 00000000..4dc303e9 --- /dev/null +++ b/tests/pc_sampling/codeobj.hpp @@ -0,0 +1,38 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include +#include + +namespace client +{ +namespace codeobj +{ +void +codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* user_data, + void* callback_data); + +} // namespace codeobj +} // namespace client diff --git a/tests/pc_sampling/external_cid.cpp b/tests/pc_sampling/external_cid.cpp new file mode 100644 index 00000000..4592fa63 --- /dev/null +++ b/tests/pc_sampling/external_cid.cpp @@ -0,0 +1,110 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +/** + * @file samples/pc_sampling_library/client.cpp + * + * @brief Example rocprofiler client (tool) + */ + +#include "utils.hpp" + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +namespace client +{ +namespace external_cid +{ +namespace +{ +template +auto +make_array(Arg arg, Args&&... args) +{ + constexpr auto N = 1 + sizeof...(Args); + return std::array{std::forward(arg), std::forward(args)...}; +} +} // namespace + +/** + * @brief Must be called at the beginning of `tool_init`. + */ +void +init() +{} + +/** + * @brief Should be called at the end of `tool_fini`. + */ +void +fini() +{} + +int +set_external_correlation_id(rocprofiler_thread_id_t /*thr_id*/, + rocprofiler_context_id_t /*ctx_id*/, + rocprofiler_external_correlation_id_request_kind_t /*kind*/, + rocprofiler_tracing_operation_t /*op*/, + uint64_t internal_corr_id, + rocprofiler_user_data_t* external_corr_id, + void* /*user_data*/) +{ + // In a multi-queue (multi-device) scenario, incrementing external correlation IDs + // might not always match incrementing internal correlation IDs. + // Thus, use the value of the internal correlation ID and verify that both + // external correlation IDs and internal correlation IDs are the same + // in delivered PC samples. 
+ external_corr_id->value = internal_corr_id; + return 0; +} + +void +configure_external_correlation_service(rocprofiler_context_id_t context) +{ + auto external_corr_id_request_kinds = + make_array(ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_KERNEL_DISPATCH); + + ROCPROFILER_CHECK(rocprofiler_configure_external_correlation_id_request_service( + context, + external_corr_id_request_kinds.data(), + external_corr_id_request_kinds.size(), + set_external_correlation_id, + nullptr)); +} + +} // namespace external_cid +} // namespace client diff --git a/tests/pc_sampling/external_cid.hpp b/tests/pc_sampling/external_cid.hpp new file mode 100644 index 00000000..7e2d6675 --- /dev/null +++ b/tests/pc_sampling/external_cid.hpp @@ -0,0 +1,42 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include +#include +#include + +namespace client +{ +namespace external_cid +{ +void +configure_external_correlation_service(rocprofiler_context_id_t context); + +void +init(); + +void +fini(); +} // namespace external_cid +} // namespace client diff --git a/tests/pc_sampling/kernel_tracing.cpp b/tests/pc_sampling/kernel_tracing.cpp new file mode 100644 index 00000000..986feb59 --- /dev/null +++ b/tests/pc_sampling/kernel_tracing.cpp @@ -0,0 +1,78 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +/** + * @file samples/pc_sampling_library/client.cpp + * + * @brief Example rocprofiler client (tool) + */ + +#include "utils.hpp" + +#include +#include +#include + +#include +#include +#include + +namespace client +{ +namespace kernel_tracing +{ +constexpr size_t BUFFER_SIZE_BYTES = 8192; +constexpr size_t WATERMARK = (BUFFER_SIZE_BYTES / 4); + +rocprofiler_buffer_id_t kernel_tracing_buffer; + +void +kernel_tracing_buffered(rocprofiler_context_id_t /*context*/, + rocprofiler_buffer_id_t /*buffer_id*/, + rocprofiler_record_header_t** /*headers*/, + size_t /*num_headers*/, + void* /*user_data*/, + uint64_t /*drop_count*/) +{} + +void +configure_kernel_tracing_service(rocprofiler_context_id_t context) +{ + ROCPROFILER_CHECK(rocprofiler_create_buffer(context, + BUFFER_SIZE_BYTES, + WATERMARK, + ROCPROFILER_BUFFER_POLICY_LOSSLESS, + kernel_tracing_buffered, + nullptr, + &kernel_tracing_buffer)); + + ROCPROFILER_CHECK(rocprofiler_configure_buffer_tracing_service( + context, ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH, nullptr, 0, kernel_tracing_buffer)); +} + +} // namespace kernel_tracing +} // namespace client diff --git a/tests/pc_sampling/kernel_tracing.hpp b/tests/pc_sampling/kernel_tracing.hpp new file mode 100644 index 00000000..226337d4 --- /dev/null +++ b/tests/pc_sampling/kernel_tracing.hpp @@ -0,0 +1,41 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this 
permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#pragma once + +#include +#include + +namespace client +{ +namespace kernel_tracing +{ +void +kernel_tracing_callback(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* user_data, + void* callback_data); + +void +configure_kernel_tracing_service(rocprofiler_context_id_t context); + +} // namespace kernel_tracing +} // namespace client diff --git a/tests/pc_sampling/main.cpp b/tests/pc_sampling/main.cpp new file mode 100644 index 00000000..bc730377 --- /dev/null +++ b/tests/pc_sampling/main.cpp @@ -0,0 +1,224 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. 
+// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +#include + +#include +#include +#include +#include + +namespace +{ +#define M 8192 +#define N 8192 +#define K 8192 +#define TileSize 16 +#define BLOCK_SIZE_X 16 +#define BLOCK_SIZE_Y 16 +#define GRID_SIZE_X (M + BLOCK_SIZE_X - 1) / BLOCK_SIZE_X +#define GRID_SIZE_Y (N + BLOCK_SIZE_Y - 1) / BLOCK_SIZE_Y +#define WAVES_PER_BLOCK_MI200_PLUS (BLOCK_SIZE_X * BLOCK_SIZE_Y) / 64 + +#define HIP_API_CALL(CALL) \ + { \ + hipError_t error_ = (CALL); \ + if(error_ != hipSuccess) \ + { \ + fprintf(stderr, \ + "%s:%d :: HIP error : %s\n", \ + __FILE__, \ + __LINE__, \ + hipGetErrorString(error_)); \ + throw std::runtime_error("hip_api_call"); \ + } \ + } +} // namespace + +namespace +{ +void +check_hip_error(void); +} // namespace + +__global__ void +matrix_multiply(float* A, float* B, float* Out, int /*m*/, int n, int k) +{ + int gid_x = blockDim.x * blockIdx.x + threadIdx.x; + int gid_y = blockDim.y * blockIdx.y + threadIdx.y; + + if(gid_x < N && gid_y < M) + { + float sum = 0; + for(int i = 0; i < k; ++i) + { + sum += A[gid_y * k + i] * B[i * n + gid_x]; + } + + Out[gid_y * n + gid_x] = sum; + } +} + +#if 1 +__global__ void +matrix_multiply_tile(float* A, float* B, float* Out, int m, int n, int k) +{ + __shared__ float subTileM[TileSize][TileSize]; + __shared__ float subTileN[TileSize][TileSize]; + + int bx = blockIdx.x; + int by = blockIdx.y; + int tx = threadIdx.x; + int ty = threadIdx.y; + + int row = by * TileSize + ty; + int col = bx * TileSize + tx; + + float sum = 0; + for(int i = 
0; i < ((k - 1) / TileSize + 1); i++) + { + int curr_l = row * k + i * TileSize + tx; + int curr_r = (i * TileSize + ty) * n + col; + + if(i * TileSize + tx < k && row < m) + { + subTileM[ty][tx] = A[curr_l]; + } + else + { + subTileM[ty][tx] = 0.0; + } + + if(i * TileSize + ty < k && col < n) + { + subTileN[ty][tx] = B[curr_r]; + } + else + { + subTileN[ty][tx] = 0.0; + } + + __syncthreads(); + + for(int j = 0; j < TileSize; j++) + { + if(j + TileSize * i < k) + { + sum += subTileM[ty][j] * subTileN[j][tx]; + } + } + + __syncthreads(); + } + + if(row < m && col < n) + { + Out[row * n + col] = sum; + } +} +#endif + +void +run_hip_app() +{ + std::vector<float> A(M * K); + std::vector<float> B(K * N); + std::vector<float> Out(M * N); + + // Randomly initialize the matrices + for(int i = 0; i < M * K; ++i) + { + A[i] = (float) rand() / (float) RAND_MAX; + } + + for(int i = 0; i < K * N; ++i) + { + B[i] = (float) rand() / (float) RAND_MAX; + } + + // Allocate GPU Memory + float *d_A, *d_B, *d_Out; + HIP_API_CALL(hipMalloc(&d_A, sizeof(float) * M * K)); + HIP_API_CALL(hipMalloc(&d_B, sizeof(float) * K * N)); + HIP_API_CALL(hipMalloc(&d_Out, sizeof(float) * M * N)); + + // Copy data to GPU + HIP_API_CALL(hipMemcpy(d_A, A.data(), sizeof(float) * M * K, hipMemcpyHostToDevice)); + HIP_API_CALL(hipMemcpy(d_B, B.data(), sizeof(float) * K * N, hipMemcpyHostToDevice)); + + // Run the kernel + dim3 block_size(BLOCK_SIZE_X, BLOCK_SIZE_Y); + dim3 grid_size((M + block_size.x - 1) / block_size.x, (N + block_size.y - 1) / block_size.y); + matrix_multiply<<<grid_size, block_size>>>(d_A, d_B, d_Out, M, N, K); + check_hip_error(); + matrix_multiply_tile<<<grid_size, block_size>>>(d_A, d_B, d_Out, M, N, K); + check_hip_error(); + + // Copy data back to CPU + HIP_API_CALL(hipMemcpy(Out.data(), d_Out, sizeof(float) * M * N, hipMemcpyDeviceToHost)); + + // Free GPU Memory + HIP_API_CALL(hipFree(d_A)); + HIP_API_CALL(hipFree(d_B)); + HIP_API_CALL(hipFree(d_Out)); +} + +#define DEVICE_ID 0 + +int +main(int /*argc*/, char** /*argv*/) +{ + int deviceId = 
DEVICE_ID; + + auto status = hipSetDevice(deviceId); + assert(status == hipSuccess); + HIP_API_CALL(status); + + int currDeviceId = -1; + status = hipGetDevice(&currDeviceId); + HIP_API_CALL(status); + assert(status == hipSuccess); + assert(deviceId == currDeviceId); + + for(int i = 0; i < 1; i++) + { + std::cout << "<<< MatMul starts" << std::endl; + run_hip_app(); + std::cout << ">>> MatMul ends" << std::endl; + } + + return 0; +} + +namespace +{ +void +check_hip_error(void) +{ + hipError_t err = hipGetLastError(); + if(err != hipSuccess) + { + std::cerr << "Error: " << hipGetErrorString(err) << std::endl; + throw std::runtime_error("hip_api_call"); + } +} +} // namespace diff --git a/tests/pc_sampling/pcs.cpp b/tests/pc_sampling/pcs.cpp new file mode 100644 index 00000000..dbd6f0ba --- /dev/null +++ b/tests/pc_sampling/pcs.cpp @@ -0,0 +1,504 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. + +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +#include "pcs.hpp" +#include "address_translation.hpp" +#include "codeobj.hpp" +#include "external_cid.hpp" +#include "utils.hpp" + +#include +#include +#include +#include +#include +#include + +namespace client +{ +namespace pcs +{ +namespace +{ +constexpr int MAX_FAILURES = 10; +constexpr size_t BUFFER_SIZE_BYTES = 8192; +constexpr size_t WATERMARK = (BUFFER_SIZE_BYTES / 4); + +struct tool_agent_info; +using avail_configs_vec_t = std::vector<rocprofiler_pc_sampling_configuration_t>; +using tool_agent_info_vec_t = std::vector<std::unique_ptr<tool_agent_info>>; +using pc_sampling_buffer_id_vec_t = std::vector<rocprofiler_buffer_id_t>; + +struct tool_agent_info +{ + rocprofiler_agent_id_t agent_id; + std::unique_ptr<avail_configs_vec_t> avail_configs; + const rocprofiler_agent_t* agent; +}; + +struct PCSampler +{ +private: + using code_object_id_t = uint64_t; + using code_object_id_set_t = std::unordered_set<code_object_id_t>; + +public: + PCSampler() = default; + + ~PCSampler() + { + // Assert that `active_code_objects` is empty. + // For more information, refer to the comment on `active_code_objects` below. + assert(active_code_objects.empty()); + // Clear the data + buffer_ids.clear(); + } + + // GPU agents supporting PC sampling + tool_agent_info_vec_t gpu_agents; + // The total number of collected samples + std::atomic total_samples_num{0}; + // ROCProfiler-SDK PC sampling buffers + pc_sampling_buffer_id_vec_t buffer_ids; + // The set that keeps track of reported code object loading/unloading events. + // At the end of the test, the set must be empty. + // Namely, each loading event will insert a code object id into the set, + // while each unloading event will delete a code object id from the set. 
+ code_object_id_set_t active_code_objects; +}; + +// The reason for using raw pointers is the following. +// Sometimes, statically created objects of the client::pcs +// namespace might be freed prior to the `tool_fini`, +// meaning objects of the `pcs` namespace become unusable inside `tool_fini`. +// Instead, use raw pointers to control when the objects are deallocated. +PCSampler* pc_sampler = nullptr; + +// forward declaration +bool +query_avail_configs_for_agent(tool_agent_info* agent_info); + +rocprofiler_status_t +find_all_gpu_agents_supporting_pc_sampling_impl(rocprofiler_agent_version_t version, + const void** agents, + size_t num_agents, + void* user_data) +{ + assert(version == ROCPROFILER_AGENT_INFO_VERSION_0); + // user_data points to the vector in which the GPU agents will be stored + if(!user_data) return ROCPROFILER_STATUS_ERROR; + + std::stringstream ss; + + auto* _out_agents = static_cast<tool_agent_info_vec_t*>(user_data); + auto* _agents = reinterpret_cast<const rocprofiler_agent_t**>(agents); + for(size_t i = 0; i < num_agents; i++) + { + if(_agents[i]->type == ROCPROFILER_AGENT_TYPE_GPU) + { + // Instantiate the tool_agent_info. + // Store a pointer to the rocprofiler_agent_t and instantiate a vector of + // available configurations. + // Move the ownership to the _out_agents + auto tool_gpu_agent = std::make_unique<tool_agent_info>(); + tool_gpu_agent->agent_id = _agents[i]->id; + tool_gpu_agent->avail_configs = std::make_unique<avail_configs_vec_t>(); + tool_gpu_agent->agent = _agents[i]; + // Check if the GPU agent supports PC sampling. If so, add it to the + // output list `_out_agents`. 
+ if(query_avail_configs_for_agent(tool_gpu_agent.get())) + _out_agents->push_back(std::move(tool_gpu_agent)); + } + + ss << "[" << __FUNCTION__ << "] " << _agents[i]->name << " :: " + << "id=" << _agents[i]->id.handle << ", " + << "type=" << _agents[i]->type << "\n"; + } + + *utils::get_output_stream() << ss.str() << std::endl; + + return ROCPROFILER_STATUS_SUCCESS; +} + +void +find_all_gpu_agents_supporting_pc_sampling() +{ + // Collect all GPU agents supporting some kind of PC sampling into pc_sampler->gpu_agents + ROCPROFILER_CALL( + rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0, + &find_all_gpu_agents_supporting_pc_sampling_impl, + sizeof(rocprofiler_agent_t), + static_cast<void*>(&pc_sampler->gpu_agents)), + "Failed to find GPU agents"); +} + +/** + * @brief The function queries available PC sampling configurations. + * If there is at least one available configuration, it returns true. + * Otherwise, this function returns false to indicate the agent does + * not support PC sampling. + */ +bool +query_avail_configs_for_agent(tool_agent_info* agent_info) +{ + // Clear the available configurations vector + agent_info->avail_configs->clear(); + + auto cb = [](const rocprofiler_pc_sampling_configuration_t* configs, + size_t num_config, + void* user_data) { + auto* avail_configs = static_cast<avail_configs_vec_t*>(user_data); + for(size_t i = 0; i < num_config; i++) + { + avail_configs->emplace_back(configs[i]); + } + return ROCPROFILER_STATUS_SUCCESS; + }; + + auto status = rocprofiler_query_pc_sampling_agent_configurations( + agent_info->agent_id, cb, agent_info->avail_configs.get()); + + std::stringstream ss; + + if(status != ROCPROFILER_STATUS_SUCCESS) + { + // The query operation failed, so treat PC sampling as unsupported on this agent. + // This can happen if the PC sampling service is invoked within ROCgdb. 
+ ss << "Querying PC sampling capabilities failed with status: " << status << std::endl; + *utils::get_output_stream() << ss.str() << std::endl; + return false; + } + else if(agent_info->avail_configs->size() == 0) + { + // No available configuration at the moment, so mark the PC sampling as unsupported. + return false; + } + + ss << "The agent with the id: " << agent_info->agent_id.handle << " supports the " + << agent_info->avail_configs->size() << " configurations: " << std::endl; + size_t ind = 0; + for(auto& cfg : *agent_info->avail_configs) + { + ss << "(" << ++ind << ".) " + << "method: " << cfg.method << ", " + << "unit: " << cfg.unit << ", " + << "min_interval: " << cfg.min_interval << ", " + << "max_interval: " << cfg.max_interval << ", " + << "flags: " << std::hex << cfg.flags << std::dec << std::endl; + } + + *utils::get_output_stream() << ss.str() << std::flush; + + return true; +} + +void +configure_pc_sampling_prefer_stochastic(tool_agent_info* agent_info, + rocprofiler_context_id_t context_id, + rocprofiler_buffer_id_t buffer_id) +{ + int failures = MAX_FAILURES; + size_t interval = 0; + do + { + // Update the list of available configurations + auto success = query_avail_configs_for_agent(agent_info); + if(!success) + { + // An error occurred while querying PC sampling capabilities, + // so do not try to configure the PC sampling service. + // Instead, raise an error to indicate the failure. + ROCPROFILER_CALL(ROCPROFILER_STATUS_ERROR, + "Could not configure PC sampling service because querying " + "capabilities failed."); + } + + const rocprofiler_pc_sampling_configuration_t* first_host_trap_config = nullptr; + const rocprofiler_pc_sampling_configuration_t* first_stochastic_config = nullptr; + // Search until encountering the first stochastic configuration, if any. 
+ // Otherwise, use the host trap config + for(auto const& cfg : *agent_info->avail_configs) + { + if(cfg.method == ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC) + { + first_stochastic_config = &cfg; + break; + } + else if(!first_host_trap_config && + cfg.method == ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP) + { + first_host_trap_config = &cfg; + } + } + + // Check if the stochastic config is found. Use host trap config otherwise. + const rocprofiler_pc_sampling_configuration_t* picked_cfg = + (first_stochastic_config != nullptr) ? first_stochastic_config : first_host_trap_config; + + interval = picked_cfg->min_interval; + + auto status = rocprofiler_configure_pc_sampling_service(context_id, + agent_info->agent_id, + picked_cfg->method, + picked_cfg->unit, + interval, + buffer_id); + if(status == ROCPROFILER_STATUS_SUCCESS) + { + *utils::get_output_stream() + << ">>> We chose PC sampling interval: " << interval + << " on the agent: " << agent_info->agent->id.handle << std::endl; + return; + } + else if(status != ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE) + { + ROCPROFILER_CALL(status, "Failed to configure PC sampling"); + } + // status == ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE + // means another process P2 already configured PC sampling. + // Query available configurations again and receive the configurations picked by P2. + // However, if P2 destroys the PC sampling service after the query function finished, + // but before the `rocprofiler_configure_pc_sampling_service` is called, + // then the `rocprofiler_configure_pc_sampling_service` will fail again. + // The process P1 executing this loop can spin wait (starve) if it is unlucky enough + // to always be interrupted by some other process P2 that creates/destroys + // PC sampling service on the same device while P1 is executing the code + // after the `query_avail_configs_for_agent` and + // before the `rocprofiler_configure_pc_sampling_service`. 
+ // This should happen very rarely, but just to be sure, we introduce a counter `failures` + // that bounds the number of failed attempts allowed for process P1. + } while(--failures); + + // The process failed too many times configuring PC sampling, + // so report this to the user. + ROCPROFILER_CALL(ROCPROFILER_STATUS_ERROR, + "Failed too many times configuring PC sampling service"); +} + +void +rocprofiler_pc_sampling_callback(rocprofiler_context_id_t /*context_id*/, + rocprofiler_buffer_id_t /*buffer_id*/, + rocprofiler_record_header_t** headers, + size_t num_headers, + void* /*data*/, + uint64_t drop_count) +{ + std::stringstream ss; + ss << "The number of delivered samples is: " << num_headers << ", " + << "while the number of dropped samples is: " << drop_count << std::endl; + + auto& flat_profile = client::address_translation::get_flat_profile(); + auto& translator = client::address_translation::get_address_translator(); + auto& global_mut = address_translation::get_global_mutex(); + + { + auto lock = std::unique_lock{global_mut}; + + for(size_t i = 0; i < num_headers; i++) + { + auto* cur_header = headers[i]; + + if(cur_header == nullptr) + { + throw std::runtime_error{ + "rocprofiler provided a null pointer to header. 
this should never happen"}; + } + else if(cur_header->hash != + rocprofiler_record_header_compute_hash(cur_header->category, cur_header->kind)) + { + throw std::runtime_error{"rocprofiler_record_header_t (category | kind) != hash"}; + } + else if(cur_header->category == ROCPROFILER_BUFFER_CATEGORY_PC_SAMPLING) + { + if(cur_header->kind == ROCPROFILER_PC_SAMPLING_RECORD_SAMPLE) + { + auto* pc_sample = + static_cast(cur_header->payload); + + ss << "pc: " << std::hex << pc_sample->pc << ", " + << "timestamp: " << std::dec << pc_sample->timestamp << ", " + << "exec: " << std::hex << std::setw(16) << pc_sample->exec_mask << ", " + << "workgroup_id_(x=" << std::dec << std::setw(5) + << pc_sample->workgroup_id.x << ", " + << "y=" << std::setw(5) << pc_sample->workgroup_id.y << ", " + << "z=" << std::setw(5) << pc_sample->workgroup_id.z << "), " + << "wave_id: " << std::setw(2) + << static_cast(pc_sample->wave_id) << ", " + << "cu_id: " << pc_sample->hw_id << ", " + << "correlation: {internal=" << std::setw(7) + << pc_sample->correlation_id.internal << ", " + << "external=" << std::setw(5) << pc_sample->correlation_id.external.value + << "}" << std::endl; + + // Ignore samples from blit kernels. + if(pc_sample->correlation_id.internal == + ROCPROFILER_CORRELATION_ID_INTERNAL_NONE) + continue; + + total_samples_num() += 1; + + auto corr_id = pc_sample->correlation_id; + // Internal correlation IDs are generated by the ROCProfiler-SDK for + // kernel dispatches only. Similarly, the test tool generates external + // correlation IDs for the kernel dispatches only. + // Thus, we should expect them to be equal. 
+ assert(corr_id.internal == corr_id.external.value); + assert(corr_id.external.value > 0); + + // Decoding the PC + auto inst = translator.get(pc_sample->pc); + flat_profile.add_sample(std::move(inst), pc_sample->exec_mask); + } + else if(cur_header->kind == ROCPROFILER_PC_SAMPLING_RECORD_CODE_OBJECT_LOAD_MARKER) + { + auto* marker = static_cast( + cur_header->payload); + auto code_object_id = marker->code_object_id; + ss << "code object loading: " << code_object_id << std::endl; + // The code object load event can be reported once per code object id. + assert(pc_sampler->active_code_objects.count(code_object_id) == 0); + pc_sampler->active_code_objects.emplace(code_object_id); + } + else if(cur_header->kind == + ROCPROFILER_PC_SAMPLING_RECORD_CODE_OBJECT_UNLOAD_MARKER) + { + auto* marker = + static_cast( + cur_header->payload); + auto code_object_id = marker->code_object_id; + ss << "code object unloading: " << code_object_id << std::endl; + // The code object unload event can be reported once per code object id. + assert(pc_sampler->active_code_objects.count(code_object_id) == 1); + pc_sampler->active_code_objects.erase(code_object_id); + } + } + else + { + throw std::runtime_error{"unexpected rocprofiler_record_header_t category + kind"}; + } + } + + // TODO: do we need some sync here? + *utils::get_output_stream() << ss.str() << std::endl; + } +} +} // namespace + +void +init() +{ + pc_sampler = new PCSampler(); +} + +void +fini() +{ + delete pc_sampler; +} + +std::atomic& +total_samples_num() +{ + return pc_sampler->total_samples_num; +} + +void +configure_pc_sampling_on_all_agents(rocprofiler_context_id_t context) +{ + find_all_gpu_agents_supporting_pc_sampling(); + + if(pc_sampler->gpu_agents.empty()) + { + *utils::get_output_stream() << "No available gpu agents supporting PC sampling" << std::endl; + *utils::get_output_stream() << "PC sampling unavailable" << std::endl; + // Exit with no error if none of the GPUs supports PC sampling. 
+ exit(0); + } + + auto& buff_ids_vec = pc_sampler->buffer_ids; + + for(auto& gpu_agent : pc_sampler->gpu_agents) + { + // Create a buffer that will hold PC sampling information + rocprofiler_buffer_policy_t drop_buffer_action = ROCPROFILER_BUFFER_POLICY_LOSSLESS; + auto buffer_id = rocprofiler_buffer_id_t{}; + ROCPROFILER_CALL(rocprofiler_create_buffer(context, + client::pcs::BUFFER_SIZE_BYTES, + client::pcs::WATERMARK, + drop_buffer_action, + client::pcs::rocprofiler_pc_sampling_callback, + nullptr, + &buffer_id), + "Cannot create pc sampling buffer"); + + client::pcs::configure_pc_sampling_prefer_stochastic(gpu_agent.get(), context, buffer_id); + + // One helper thread per GPU agent's buffer. + auto client_agent_thread = rocprofiler_callback_thread_t{}; + ROCPROFILER_CALL(rocprofiler_create_callback_thread(&client_agent_thread), + "failure creating callback thread"); + + ROCPROFILER_CALL(rocprofiler_assign_callback_thread(buffer_id, client_agent_thread), + "failed to assign thread for buffer"); + + buff_ids_vec.emplace_back(buffer_id); + } +} + +void +flush_buffers() +{ + // Flush rocprofiler-SDK's buffers containing PC samples. + for(const auto& buff_id : pc_sampler->buffer_ids) + { + // Flush the buffer explicitly + ROCPROFILER_CALL(rocprofiler_flush_buffer(buff_id), "Failure flushing buffer"); + } +} + +void +flush_and_destroy_buffers() +{ + for(const auto& buff_id : pc_sampler->buffer_ids) + { + // Flush the buffer explicitly + ROCPROFILER_CALL(rocprofiler_flush_buffer(buff_id), "Failure flushing buffer"); + // Destroying the buffer + rocprofiler_status_t status = rocprofiler_destroy_buffer(buff_id); + if(status == ROCPROFILER_STATUS_ERROR_BUFFER_BUSY) + { + *utils::get_output_stream() + << "The buffer is busy, so we cannot destroy it at the moment." 
<< std::endl; + } + else + { + ROCPROFILER_CALL(status, "Cannot destroy buffer"); + } + } +} +} // namespace pcs +} // namespace client diff --git a/tests/pc_sampling/pcs.hpp b/tests/pc_sampling/pcs.hpp new file mode 100644 index 00000000..4b854613 --- /dev/null +++ b/tests/pc_sampling/pcs.hpp @@ -0,0 +1,55 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include +#include + +#include +#include + +namespace client +{ +namespace pcs +{ +// Must be called first (prior to any other function from this namespace) +void +init(); + +// Must be called at the end of the `tool_fini` +void +fini(); + +std::atomic& +total_samples_num(); + +void +configure_pc_sampling_on_all_agents(rocprofiler_context_id_t context); + +void +flush_buffers(); + +void +flush_and_destroy_buffers(); +} // namespace pcs +} // namespace client diff --git a/tests/pc_sampling/utils.cpp b/tests/pc_sampling/utils.cpp new file mode 100644 index 00000000..4fed10bd --- /dev/null +++ b/tests/pc_sampling/utils.cpp @@ -0,0 +1,37 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#include "utils.hpp" + +namespace client +{ +namespace utils +{ +std::ostream*& +get_output_stream() +{ + // The output stream is initially unset (nullptr) + static std::ostream* _v = nullptr; + return _v; +} +} // namespace utils +} // namespace client diff --git a/tests/pc_sampling/utils.hpp b/tests/pc_sampling/utils.hpp new file mode 100644 index 00000000..e9275160 --- /dev/null +++ b/tests/pc_sampling/utils.hpp @@ -0,0 +1,65 @@ +// MIT License +// +// Copyright (c) 2024 ROCm Developer Tools +// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ +#pragma once + +#include + +#include +#include + +#define ROCPROFILER_VAR_NAME_COMBINE(X, Y) X##Y +#define ROCPROFILER_VARIABLE(X, Y) ROCPROFILER_VAR_NAME_COMBINE(X, Y) + +#define ROCPROFILER_CALL(result, msg) \ + { \ + rocprofiler_status_t CHECKSTATUS = result; \ + if(CHECKSTATUS != ROCPROFILER_STATUS_SUCCESS) \ + { \ + std::cerr << #result << " failed with error code " << CHECKSTATUS << std::endl; \ + throw std::runtime_error(#result " failure"); \ + } \ + } + +#define ROCPROFILER_CHECK(result) \ + { \ + rocprofiler_status_t ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) = result; \ + if(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) != ROCPROFILER_STATUS_SUCCESS) \ + { \ + std::string status_msg = \ + rocprofiler_get_status_string(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__)); \ + std::stringstream errmsg{}; \ + errmsg << "[" << __FILE__ << ":" << __LINE__ << "] " << #result \ + << " failed with error code " << ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) \ + << " :: " << status_msg; \ + throw std::runtime_error(errmsg.str()); \ + } \ + } + +namespace client +{ +namespace utils +{ +std::ostream*& +get_output_stream(); +} +} // namespace client diff --git a/tests/rocprofv3/CMakeLists.txt b/tests/rocprofv3/CMakeLists.txt index dd3dc404..3ed0e68d 100644 --- a/tests/rocprofv3/CMakeLists.txt +++ b/tests/rocprofv3/CMakeLists.txt @@ -24,7 +24,7 @@ enable_testing() include(CTest) add_subdirectory(tracing) -add_subdirectory(tracing-plus-cc) +add_subdirectory(tracing-plus-counter-collection) add_subdirectory(tracing-hip-in-libraries) add_subdirectory(counter-collection) add_subdirectory(hsa-queue-dependency) diff --git a/tests/rocprofv3/counter-collection/input1/CMakeLists.txt b/tests/rocprofv3/counter-collection/input1/CMakeLists.txt index ebece7c1..2f576f14 100644 --- a/tests/rocprofv3/counter-collection/input1/CMakeLists.txt +++ b/tests/rocprofv3/counter-collection/input1/CMakeLists.txt @@ -15,7 +15,7 @@ rocprofiler_configure_pytest_files(CONFIG pytest.ini COPY validate.py 
conftest.p # pmc1 add_test( - NAME rocprofv3-test-cc-txt-pmc1-execute + NAME rocprofv3-test-counter-collection-txt-pmc1-execute COMMAND $ -i ${CMAKE_CURRENT_BINARY_DIR}/input.txt -T -d ${CMAKE_CURRENT_BINARY_DIR}/out_cc_1 @@ -27,19 +27,19 @@ string(REPLACE "LD_PRELOAD=" "ROCPROF_PRELOAD=" PRELOAD_ENV set(cc-env-pmc1 "${PRELOAD_ENV}") set_tests_properties( - rocprofv3-test-cc-txt-pmc1-execute + rocprofv3-test-counter-collection-txt-pmc1-execute PROPERTIES TIMEOUT 45 LABELS "integration-tests" ENVIRONMENT "${cc-env-pmc1}" FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") add_test( - NAME rocprofv3-test-cc-pmc1-validate + NAME rocprofv3-test-counter-collection-pmc1-validate COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/validate.py --input ${CMAKE_CURRENT_BINARY_DIR}/out_cc_1/pmc_1/pmc1_counter_collection.csv --json-input ${CMAKE_CURRENT_BINARY_DIR}/out_cc_1/pmc_1/pmc1_results.json) set_tests_properties( - rocprofv3-test-cc-pmc1-validate + rocprofv3-test-counter-collection-pmc1-validate PROPERTIES TIMEOUT 45 LABELS "integration-tests" DEPENDS - "rocprofv3-test-cc-txt-pmc1-execute" FAIL_REGULAR_EXPRESSION - "${ROCPROFILER_DEFAULT_FAIL_REGEX}") + "rocprofv3-test-counter-collection-txt-pmc1-execute" + FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") diff --git a/tests/rocprofv3/counter-collection/input2/CMakeLists.txt b/tests/rocprofv3/counter-collection/input2/CMakeLists.txt index c5b2e1aa..15783e61 100644 --- a/tests/rocprofv3/counter-collection/input2/CMakeLists.txt +++ b/tests/rocprofv3/counter-collection/input2/CMakeLists.txt @@ -15,7 +15,7 @@ rocprofiler_configure_pytest_files(CONFIG pytest.ini COPY validate.py conftest.p # pmc2 add_test( - NAME rocprofv3-test-cc-txt-pmc2-execute + NAME rocprofv3-test-counter-collection-txt-pmc2-execute COMMAND $ -i ${CMAKE_CURRENT_BINARY_DIR}/input.txt --output-format CSV JSON -d @@ -27,12 +27,12 @@ string(REPLACE "LD_PRELOAD=" "ROCPROF_PRELOAD=" PRELOAD_ENV set(cc-env-pmc2 "${PRELOAD_ENV}") 
set_tests_properties( - rocprofv3-test-cc-txt-pmc2-execute + rocprofv3-test-counter-collection-txt-pmc2-execute PROPERTIES TIMEOUT 45 LABELS "integration-tests" ENVIRONMENT "${cc-env-pmc2}" FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") add_test( - NAME rocprofv3-test-cc-txt-pmc2-execute-validate + NAME rocprofv3-test-counter-collection-txt-pmc2-execute-validate COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/validate.py --agent-input ${CMAKE_CURRENT_BINARY_DIR}/simple-transpose-cc/pmc_1/out_agent_info.csv @@ -50,7 +50,7 @@ set(SYS_VALIDATION_FILES ${CMAKE_CURRENT_BINARY_DIR}/simple-transpose-cc/pmc_2/out_counter_collection.csv) set_tests_properties( - rocprofv3-test-cc-txt-pmc2-execute-validate + rocprofv3-test-counter-collection-txt-pmc2-execute-validate PROPERTIES TIMEOUT 45 LABELS diff --git a/tests/rocprofv3/counter-collection/input3/CMakeLists.txt b/tests/rocprofv3/counter-collection/input3/CMakeLists.txt index afdd9a61..58b2df52 100644 --- a/tests/rocprofv3/counter-collection/input3/CMakeLists.txt +++ b/tests/rocprofv3/counter-collection/input3/CMakeLists.txt @@ -15,14 +15,14 @@ rocprofiler_configure_pytest_files(CONFIG pytest.ini COPY validate.py conftest.p # pmc1 add_test( - NAME rocprofv3-test-cc-json-pmc1-execute + NAME rocprofv3-test-counter-collection-json-pmc1-execute COMMAND $ -i ${CMAKE_CURRENT_BINARY_DIR}/input.json -d ${CMAKE_CURRENT_BINARY_DIR}/%argt%-cc -o out_json -- $) add_test( - NAME rocprofv3-test-cc-yaml-pmc1-execute + NAME rocprofv3-test-counter-collection-yaml-pmc1-execute COMMAND $ -i ${CMAKE_CURRENT_BINARY_DIR}/input.yml -d ${CMAKE_CURRENT_BINARY_DIR}/%argt%-cc -o @@ -34,12 +34,13 @@ string(REPLACE "LD_PRELOAD=" "ROCPROF_PRELOAD=" PRELOAD_ENV set(cc-env-pmc1 "${PRELOAD_ENV}") set_tests_properties( - rocprofv3-test-cc-json-pmc1-execute rocprofv3-test-cc-yaml-pmc1-execute + rocprofv3-test-counter-collection-json-pmc1-execute + rocprofv3-test-counter-collection-yaml-pmc1-execute PROPERTIES TIMEOUT 45 LABELS 
"integration-tests" ENVIRONMENT "${cc-env-pmc1}" FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") add_test( - NAME rocprofv3-test-cc-json-pmc1-validate + NAME rocprofv3-test-counter-collection-json-pmc1-validate COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/validate.py --agent-input ${CMAKE_CURRENT_BINARY_DIR}/simple-transpose-cc/pmc_1/out_json_agent_info.csv @@ -58,20 +59,20 @@ set(JSON_VALIDATION_FILES ${CMAKE_CURRENT_BINARY_DIR}/simple-transpose-cc/pmc_2/out_json_counter_collection.csv) set_tests_properties( - rocprofv3-test-cc-json-pmc1-validate + rocprofv3-test-counter-collection-json-pmc1-validate PROPERTIES TIMEOUT 45 LABELS "integration-tests" DEPENDS - rocprofv3-test-cc-json-pmc1-execute + rocprofv3-test-counter-collection-json-pmc1-execute FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}" ATTACHED_FILES_ON_FAIL "${JSON_VALIDATION_FILES}") add_test( - NAME rocprofv3-test-cc-yaml-pmc1-validate + NAME rocprofv3-test-counter-collection-yaml-pmc1-validate COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/validate.py --agent-input ${CMAKE_CURRENT_BINARY_DIR}/simple-transpose-cc/pmc_1/out_yaml_agent_info.csv @@ -90,13 +91,13 @@ set(YAML_VALIDATION_FILES ${CMAKE_CURRENT_BINARY_DIR}/simple-transpose-cc/pmc_2/out_yaml_counter_collection.csv) set_tests_properties( - rocprofv3-test-cc-yaml-pmc1-validate + rocprofv3-test-counter-collection-yaml-pmc1-validate PROPERTIES TIMEOUT 45 LABELS "integration-tests" DEPENDS - rocprofv3-test-cc-yaml-pmc1-execute + rocprofv3-test-counter-collection-yaml-pmc1-execute FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}" ATTACHED_FILES_ON_FAIL diff --git a/tests/rocprofv3/counter-collection/input3/validate.py b/tests/rocprofv3/counter-collection/input3/validate.py index b5d4f494..7831874d 100644 --- a/tests/rocprofv3/counter-collection/input3/validate.py +++ b/tests/rocprofv3/counter-collection/input3/validate.py @@ -22,7 +22,7 @@ def test_agent_info(agent_info_input_data): assert 
int(row["Max_Waves_Per_Simd"]) > 0 -def test_validate_cc_yml_pmc(counter_input_data): +def test_validate_counter_collection_yml_pmc(counter_input_data): counter_names = ["SQ_WAVES", "GRBM_COUNT", "GRBM_GUI_ACTIVE"] di_list = [] diff --git a/tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt b/tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt index 99ecd152..444018e3 100644 --- a/tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt +++ b/tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt @@ -27,7 +27,7 @@ add_test( set_tests_properties( rocprofv3-test-hsa-multiqueue-execute - PROPERTIES LABELS "integration-tests" ENVIRONMENT "${tracing-env}" + PROPERTIES TIMEOUT 45 LABELS "integration-tests" ENVIRONMENT "${tracing-env}" FAIL_REGULAR_EXPRESSION "HSA_API|HIP_API") add_test( diff --git a/tests/rocprofv3/tracing-plus-cc/CMakeLists.txt b/tests/rocprofv3/tracing-plus-counter-collection/CMakeLists.txt similarity index 87% rename from tests/rocprofv3/tracing-plus-cc/CMakeLists.txt rename to tests/rocprofv3/tracing-plus-counter-collection/CMakeLists.txt index ea3b406e..3feb88a9 100644 --- a/tests/rocprofv3/tracing-plus-cc/CMakeLists.txt +++ b/tests/rocprofv3/tracing-plus-counter-collection/CMakeLists.txt @@ -16,7 +16,7 @@ rocprofiler_configure_pytest_files(COPY validate.py conftest.py input.txt # pmc3 add_test( - NAME rocprofv3-test-tracing-plus-cc-execute + NAME rocprofv3-test-tracing-plus-counter-collection-execute COMMAND $ --hsa-trace -i ${CMAKE_CURRENT_BINARY_DIR}/input.txt -d ${CMAKE_CURRENT_BINARY_DIR}/out_cc_trace @@ -28,14 +28,14 @@ string(REPLACE "LD_PRELOAD=" "ROCPROF_PRELOAD=" PRELOAD_ENV set(cc-tracing-env "${PRELOAD_ENV}") set_tests_properties( - rocprofv3-test-tracing-plus-cc-execute + rocprofv3-test-tracing-plus-counter-collection-execute PROPERTIES TIMEOUT 45 LABELS "integration-tests;application-replay" ENVIRONMENT "${cc-tracing-env}" FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") foreach(_DIR "pmc_1" "pmc_2" "pmc_3" "pmc_4") add_test( - 
NAME rocprofv3-test-tracing-plus-cc-validate-${_DIR} + NAME rocprofv3-test-tracing-plus-counter-collection-validate-${_DIR} COMMAND ${Python3_EXECUTABLE} ${CMAKE_CURRENT_BINARY_DIR}/validate.py --json-input "${CMAKE_CURRENT_BINARY_DIR}/out_cc_trace/${_DIR}/pmc3_results.json" @@ -50,7 +50,7 @@ foreach(_DIR "pmc_1" "pmc_2" "pmc_3" "pmc_4") ) set_tests_properties( - rocprofv3-test-tracing-plus-cc-validate-${_DIR} + rocprofv3-test-tracing-plus-counter-collection-validate-${_DIR} PROPERTIES TIMEOUT 45 LABELS "integration-tests;application-replay" DEPENDS "rocprofv3-test-tracing-plus-cc-execute" FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") diff --git a/tests/rocprofv3/tracing-plus-cc/conftest.py b/tests/rocprofv3/tracing-plus-counter-collection/conftest.py similarity index 100% rename from tests/rocprofv3/tracing-plus-cc/conftest.py rename to tests/rocprofv3/tracing-plus-counter-collection/conftest.py diff --git a/tests/rocprofv3/tracing-plus-cc/input.txt b/tests/rocprofv3/tracing-plus-counter-collection/input.txt similarity index 100% rename from tests/rocprofv3/tracing-plus-cc/input.txt rename to tests/rocprofv3/tracing-plus-counter-collection/input.txt diff --git a/tests/rocprofv3/tracing-plus-cc/pytest.ini b/tests/rocprofv3/tracing-plus-counter-collection/pytest.ini similarity index 100% rename from tests/rocprofv3/tracing-plus-cc/pytest.ini rename to tests/rocprofv3/tracing-plus-counter-collection/pytest.ini diff --git a/tests/rocprofv3/tracing-plus-cc/validate.py b/tests/rocprofv3/tracing-plus-counter-collection/validate.py similarity index 100% rename from tests/rocprofv3/tracing-plus-cc/validate.py rename to tests/rocprofv3/tracing-plus-counter-collection/validate.py diff --git a/tests/thread-trace/CMakeLists.txt b/tests/thread-trace/CMakeLists.txt index 70d38730..7c86c142 100644 --- a/tests/thread-trace/CMakeLists.txt +++ b/tests/thread-trace/CMakeLists.txt @@ -108,6 +108,7 @@ set_source_files_properties(kernel_branch.cpp PROPERTIES COMPILE_FLAGS 
"-g -O2") set_source_files_properties(kernel_branch.cpp PROPERTIES LANGUAGE HIP) set_source_files_properties(kernel_lds.cpp PROPERTIES COMPILE_FLAGS "-g -O2") set_source_files_properties(kernel_lds.cpp PROPERTIES LANGUAGE HIP) +set_source_files_properties(agent_test.cpp PROPERTIES LANGUAGE HIP) set_source_files_properties(main.cpp PROPERTIES LANGUAGE HIP) # Single dispatch test @@ -155,3 +156,25 @@ set_tests_properties( thread-trace-api-multi-test PROPERTIES TIMEOUT 10 LABELS "integration-tests" ENVIRONMENT "${PRELOAD_ENV}" FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") + +# Agent profiling test +add_executable(thread-trace-api-agent-test) +target_sources(thread-trace-api-agent-test PRIVATE agent_test.cpp) + +target_link_libraries(thread-trace-api-agent-test + PRIVATE rocprofiler-sdk::rocprofiler-sdk) + +if(ROCPROFILER_MEMCHECK_PRELOAD_ENV) + set(PRELOAD_ENV + "${ROCPROFILER_MEMCHECK_PRELOAD_ENV}:$") +else() + set(PRELOAD_ENV "LD_PRELOAD=$") +endif() + +add_test(NAME thread-trace-api-agent-test + COMMAND $) + +set_tests_properties( + thread-trace-api-agent-test + PROPERTIES TIMEOUT 10 LABELS "integration-tests" ENVIRONMENT "${PRELOAD_ENV}" + FAIL_REGULAR_EXPRESSION "${ROCPROFILER_DEFAULT_FAIL_REGEX}") diff --git a/tests/thread-trace/agent_test.cpp b/tests/thread-trace/agent_test.cpp new file mode 100644 index 00000000..aeb9a2d4 --- /dev/null +++ b/tests/thread-trace/agent_test.cpp @@ -0,0 +1,168 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+// +// undefine NDEBUG so asserts are implemented +#ifdef NDEBUG +# undef NDEBUG +#endif + +#include +#include +#include +#include "common.hpp" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define HIP_API_CALL(CALL) assert((CALL) == hipSuccess) + +namespace ATTTest +{ +namespace Agent +{ +rocprofiler_context_id_t client_ctx = {}; +rocprofiler_client_id_t* client_id = nullptr; +std::atomic valid_data{false}; + +void +shader_data_callback(int64_t /* se_id */, + void* se_data, + size_t data_size, + rocprofiler_user_data_t /* userdata */) +{ + if(se_data && data_size) valid_data.store(true); +} + +rocprofiler_status_t +query_available_agents(rocprofiler_agent_version_t /* version */, + const void** agents, + size_t num_agents, + void* /* user_data */) +{ + for(size_t idx = 0; idx < num_agents; idx++) + { + const auto* agent = static_cast(agents[idx]); + if(agent->type != ROCPROFILER_AGENT_TYPE_GPU) continue; + + ROCPROFILER_CALL(rocprofiler_configure_agent_thread_trace_service( + client_ctx, nullptr, 0, agent->id, shader_data_callback, nullptr), + "thread trace service configure"); + + return ROCPROFILER_STATUS_SUCCESS; + } + return ROCPROFILER_STATUS_ERROR; +} + +int +tool_init(rocprofiler_client_finalize_t /* fini_func */, void* /* tool_data */) +{ + ROCPROFILER_CALL(rocprofiler_create_context(&client_ctx), "context creation"); + + ROCPROFILER_CALL(rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0, + query_available_agents, + sizeof(rocprofiler_agent_t), + nullptr), + ""); + + int valid = 0; + ROCPROFILER_CALL(rocprofiler_context_is_valid(client_ctx, &valid), "context validity check"); + return (valid == 0) ? 
-1 : 0; +} + +void +tool_fini(void* /* tool_data */) +{ + assert(valid_data.load()); +} + +} // namespace Agent +} // namespace ATTTest + +extern "C" rocprofiler_tool_configure_result_t* +rocprofiler_configure(uint32_t /* version */, + const char* /* runtime_version */, + uint32_t priority, + rocprofiler_client_id_t* id) +{ + // only activate if main tool + if(priority > 0) return nullptr; + + // set the client name + id->name = "ATT_test_agent_api"; + + // store client info + ATTTest::Agent::client_id = id; + + // create configure data + static auto cfg = + rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t), + &ATTTest::Agent::tool_init, + &ATTTest::Agent::tool_fini, + nullptr}; + + // return pointer to configure data + return &cfg; +} + +void +run(int dev) +{ + constexpr size_t size = 0x1000; + float* ptr = nullptr; + + HIP_API_CALL(hipSetDevice(dev)); + + HIP_API_CALL(hipMalloc(&ptr, size * sizeof(float))); + HIP_API_CALL(hipMemset(ptr, 0x55, size * sizeof(float))); + HIP_API_CALL(hipFree(ptr)); +} + +int +main() +{ + int ndev = 0; + HIP_API_CALL(hipGetDeviceCount(&ndev)); + + for(int dev = 0; dev < ndev; dev++) + run(dev); + + ROCPROFILER_CALL(rocprofiler_start_context(ATTTest::Agent::client_ctx), "context start"); + + for(int dev = 0; dev < ndev; dev++) + run(dev); + usleep(100); + + ROCPROFILER_CALL(rocprofiler_stop_context(ATTTest::Agent::client_ctx), "context stop"); + return 0; +} diff --git a/tests/thread-trace/common.hpp b/tests/thread-trace/common.hpp index 2e4f73ee..efbd06a4 100644 --- a/tests/thread-trace/common.hpp +++ b/tests/thread-trace/common.hpp @@ -1,4 +1,27 @@ +// MIT License +// +// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved. 
+// +// Permission is hereby granted, free of charge, to any person obtaining a copy +// of this software and associated documentation files (the "Software"), to deal +// in the Software without restriction, including without limitation the rights +// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +// copies of the Software, and to permit persons to whom the Software is +// furnished to do so, subject to the following conditions: +// +// The above copyright notice and this permission notice shall be included in all +// copies or substantial portions of the Software. +// +// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +// SOFTWARE. 
+ #pragma once + #include #include @@ -12,15 +35,19 @@ #include #include +#define ROCPROFILER_VAR_NAME_COMBINE(X, Y) X##Y +#define ROCPROFILER_VARIABLE(X, Y) ROCPROFILER_VAR_NAME_COMBINE(X, Y) + #define ROCPROFILER_CALL(result, msg) \ { \ - rocprofiler_status_t CHECKSTATUS = result; \ - if(CHECKSTATUS != ROCPROFILER_STATUS_SUCCESS) \ + rocprofiler_status_t ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) = result; \ + if(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) != ROCPROFILER_STATUS_SUCCESS) \ { \ - std::string status_msg = rocprofiler_get_status_string(CHECKSTATUS); \ + std::string status_msg = \ + rocprofiler_get_status_string(ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__)); \ std::cerr << "[" #result "][" << __FILE__ << ":" << __LINE__ << "] " << msg \ - << " failed with error code " << CHECKSTATUS << ": " << status_msg \ - << std::endl; \ + << " failed with error code " << ROCPROFILER_VARIABLE(CHECKSTATUS, __LINE__) \ + << ": " << status_msg << std::endl; \ std::stringstream errmsg{}; \ errmsg << "[" #result "][" << __FILE__ << ":" << __LINE__ << "] " << msg " failure (" \ << status_msg << ")"; \ @@ -83,7 +110,10 @@ tool_codeobj_tracing_callback(rocprofiler_callback_tracing_record_t record, void* callback_data); void -shader_data_callback(int64_t se_id, void* se_data, size_t data_size, void* userdata); +shader_data_callback(int64_t se_id, + void* se_data, + size_t data_size, + rocprofiler_user_data_t userdata); void callbacks_init(); @@ -93,4 +123,4 @@ callbacks_fini(); }; // namespace Callbacks -}; // namespace ATTTest \ No newline at end of file +}; // namespace ATTTest diff --git a/tests/thread-trace/main.cpp b/tests/thread-trace/main.cpp index 507fde9b..242bc433 100644 --- a/tests/thread-trace/main.cpp +++ b/tests/thread-trace/main.cpp @@ -92,4 +92,4 @@ main(int argc, char** argv) } return 0; -} \ No newline at end of file +} diff --git a/tests/thread-trace/multi_dispatch.cpp b/tests/thread-trace/multi_dispatch.cpp index 46c8c090..6a5dc457 100644 --- 
a/tests/thread-trace/multi_dispatch.cpp +++ b/tests/thread-trace/multi_dispatch.cpp @@ -55,11 +55,14 @@ dispatch_callback(rocprofiler_queue_id_t /* queue_id */, const rocprofiler_agent_t* /* agent */, rocprofiler_correlation_id_t /* correlation_id */, rocprofiler_kernel_id_t kernel_id, - void* userdata) + rocprofiler_dispatch_id_t /* dispatch_id */, + rocprofiler_user_data_t* dispatch_userdata, + void* userdata) { C_API_BEGIN assert(userdata && "Dispatch callback passed null!"); - ToolData& tool = *reinterpret_cast(userdata); + ToolData& tool = *reinterpret_cast(userdata); + dispatch_userdata->ptr = userdata; static std::atomic call_id{0}; static std::string_view desired_func_name = "branching_kernel"; @@ -102,15 +105,15 @@ tool_init(rocprofiler_client_finalize_t /* fini_func */, void* tool_data) "code object tracing service configure"); std::vector params{}; - params.push_back({ROCPROFILER_ATT_PARAMETER_CODE_OBJECT_TRACE_ENABLE, 1}); - - ROCPROFILER_CALL(rocprofiler_configure_thread_trace_service(client_ctx, - params.data(), - params.size(), - dispatch_callback, - Callbacks::shader_data_callback, - tool_data), - "thread trace service configure"); + + ROCPROFILER_CALL( + rocprofiler_configure_dispatch_thread_trace_service(client_ctx, + params.data(), + params.size(), + dispatch_callback, + Callbacks::shader_data_callback, + tool_data), + "thread trace service configure"); int valid_ctx = 0; ROCPROFILER_CALL(rocprofiler_context_is_valid(client_ctx, &valid_ctx), diff --git a/tests/thread-trace/single_dispatch.cpp b/tests/thread-trace/single_dispatch.cpp index 79a9e90c..b4dd1842 100644 --- a/tests/thread-trace/single_dispatch.cpp +++ b/tests/thread-trace/single_dispatch.cpp @@ -56,13 +56,15 @@ dispatch_callback(rocprofiler_queue_id_t /* queue_id */, const rocprofiler_agent_t* /* agent */, rocprofiler_correlation_id_t /* correlation_id */, rocprofiler_kernel_id_t kernel_id, - void* userdata) + rocprofiler_dispatch_id_t /* dispatch_id */, + rocprofiler_user_data_t* 
dispatch_userdata, + void* userdata) { C_API_BEGIN assert(userdata && "Dispatch callback passed null!"); - ToolData& tool = *reinterpret_cast(userdata); + ToolData& tool = *reinterpret_cast(userdata); + dispatch_userdata->ptr = userdata; - static std::atomic call_id{0}; static std::string_view desired_func_name = "branching_kernel"; try @@ -71,7 +73,7 @@ dispatch_callback(rocprofiler_queue_id_t /* queue_id */, if(kernel_name.find(desired_func_name) == std::string::npos) return ROCPROFILER_ATT_CONTROL_NONE; - if(call_id.fetch_add(1) == 0) return ROCPROFILER_ATT_CONTROL_START_AND_STOP; + return ROCPROFILER_ATT_CONTROL_START_AND_STOP; } catch(...) { std::cerr << "Could not find kernel id: " << kernel_id << std::endl; @@ -99,7 +101,7 @@ tool_init(rocprofiler_client_finalize_t /* fini_func */, void* tool_data) "code object tracing service configure"); ROCPROFILER_CALL( - rocprofiler_configure_thread_trace_service( + rocprofiler_configure_dispatch_thread_trace_service( client_ctx, nullptr, 0, dispatch_callback, Callbacks::shader_data_callback, tool_data), "thread trace service configure"); diff --git a/tests/thread-trace/trace_callbacks.cpp b/tests/thread-trace/trace_callbacks.cpp index 0c0b56b0..97a319e3 100644 --- a/tests/thread-trace/trace_callbacks.cpp +++ b/tests/thread-trace/trace_callbacks.cpp @@ -138,9 +138,9 @@ get_trace_data(rocprofiler_att_parser_data_type_t type, void* att_data, void* us auto ptr = std::make_unique(); try { - auto shared_inst = codeobjTranslate->get(pc.marker_id, pc.addr); - if(shared_inst == nullptr) return; - ptr->inst = shared_inst->inst; + auto unique_inst = codeobjTranslate->get(pc.marker_id, pc.addr); + if(unique_inst == nullptr) return; + ptr->inst = unique_inst->inst; } catch(...) 
{ return; @@ -178,7 +178,7 @@ isa_callback(char* isa_instruction, assert(trace_data.tool && "ISA callback passed null!"); ToolData& tool = *reinterpret_cast(trace_data.tool); - std::shared_ptr instruction; + std::unique_ptr instruction; try { @@ -210,11 +210,14 @@ isa_callback(char* isa_instruction, } void -shader_data_callback(int64_t se_id, void* se_data, size_t data_size, void* userdata) +shader_data_callback(int64_t se_id, + void* se_data, + size_t data_size, + rocprofiler_user_data_t userdata) { C_API_BEGIN - assert(userdata && "Shader callback passed null!"); - ToolData& tool = *reinterpret_cast(userdata); + assert(userdata.ptr && "Shader callback passed null!"); + ToolData& tool = *reinterpret_cast(userdata.ptr); trace_data_t data{.id = se_id, .data = (uint8_t*) se_data, .size = data_size, .tool = &tool}; auto status = rocprofiler_att_parse_data(copy_trace_data, get_trace_data, isa_callback, &data); @@ -235,4 +238,4 @@ callbacks_fini() } } // namespace Callbacks -} // namespace ATTTest \ No newline at end of file +} // namespace ATTTest
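Note on the callback changes above: `dispatch_callback` now receives a `rocprofiler_user_data_t*` slot and stores the tool pointer in it (`dispatch_userdata->ptr = userdata;`), and `shader_data_callback` reads it back through `userdata.ptr`. The sketch below illustrates that round trip in isolation. It does not use the rocprofiler-sdk headers: `user_data_t`, `tool_state`, `on_dispatch`, `on_shader_data`, and `simulate_dispatch` are hypothetical stand-ins, and the union layout (a 64-bit value overlaid with a pointer) is an assumption about the real type, not something this patch shows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for rocprofiler_user_data_t. The real type lives in
// the rocprofiler-sdk headers; a value/pointer union is assumed here.
union user_data_t
{
    uint64_t value;
    void*    ptr;
};

// Hypothetical tool state, playing the role of ToolData in the tests.
struct tool_state
{
    size_t bytes_seen = 0;
};

// Dispatch-side callback: stash the tool pointer in the per-dispatch slot,
// mirroring `dispatch_userdata->ptr = userdata;` in the diff.
void
on_dispatch(user_data_t* dispatch_userdata, void* userdata)
{
    assert(userdata && "Dispatch callback passed null!");
    dispatch_userdata->ptr = userdata;
}

// Data-side callback: takes the union by value and recovers the tool pointer,
// mirroring the new `userdata.ptr` access in shader_data_callback.
void
on_shader_data(int64_t /* se_id */, void* se_data, size_t data_size, user_data_t userdata)
{
    assert(userdata.ptr && "Shader callback passed null!");
    auto& tool = *static_cast<tool_state*>(userdata.ptr);
    if(se_data && data_size) tool.bytes_seen += data_size;
}

// Drives both callbacks in the order a runtime might, then reports how many
// bytes the data-side callback recorded.
size_t
simulate_dispatch(size_t data_size)
{
    tool_state    tool{};
    user_data_t   slot{};
    unsigned char buffer[16] = {};

    on_dispatch(&slot, &tool);
    on_shader_data(0, buffer, data_size, slot);
    return tool.bytes_seen;
}
```

Calling `simulate_dispatch(16)` exercises the same hand-off the tests rely on: whatever the dispatch callback writes into the slot is what the shader-data callback later sees, so a per-dispatch value can replace the single tool-wide `void*` if a tool needs one.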