Update release/rocm-rel-6.2 for RC3 (#968) (#972)
* Small doc update to remove restrictions no longer present (#917)
* Small doc update to remove restrictions no longer present
* Add calls to stop and wait for a debugger (#916)
* Small change to sample for clarity (#913)
* Added error log for query counter info (#903)
* Added error log for query counter info
* Add dimension query to counter collection sample (#918)
* Disable PC sampling service if counter collection service is configured (#899)
* The NULL value of an internal correlation ID defined (#901)
* Remove duplicate table code from tests (#922)
* Remove duplicate table code from tests
  Remove duplicate HSA table code from tests. Cleanup includes (and remove unnecessary ones).
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT (#910)
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT
* Apply suggestions from code review
* Adding unit tests
* Adding counters check for gfx9 and SQ block only
* Addressing review comments
* changing the struct size
* fixing header includes
---------
* Fix for SLES/RHEL compilers (#925)
* Fix for SLES/RHEL compilers
---------
* Fix agent profiling for SQ counters (#919)
* Fix agent profiling for SQ counters
---------
* Disable counter collection if PC sampling is enabled (#924)
* docs and tests format (#927)
* ATT API changes - add user_data field and separation of dispatch vs agent profiling (#893)
* DRM Issue Fix for SLES 15 (#897)
* DRM Issue Fix
* Formatting Fix
* PC sampling: CID manager unit test (#898)
* Adding per-dispatch userdata field to ATT
* Clang tidy
* Formatting
* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp
* Adding dispatch_id, fixing user_data and update aql_profile_v2
* Formatting
* Tidy fixes
* Second fix for userdata
* removing assert for union
* Adding serialization. Created agent profiling-like thread trace
* Implemented agent thread trace
* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp
* Restructured thread trace packets
* Added agent API tests
* Fixing multigpu for agent test
* Formatting
* Formatting
* Improving header locations
* Fixing merge conflicts
* Tidy
* Tidy
* Tidy
---------
* Allow multiple agents in a single context for agent profiling (#908)
  Allow multiple profiles for agent profiling
* Remove unnecessary AgentCache argument from profile construction (#931)
  This argument is not necessary. Removed.
* Update controller.cpp (#932)
* Update controller.cpp
* Update controller.cpp
* Formatting
* Pumping down the ioctl version for CI only (#928)
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Replicate global counters across all derived counters (#936)
  Fix derived counters to have globals replicated across all architectures (that support them).
---------
* Incremental Counter Profile Creation (#933)
* Incremental Counter Profile Creation
  Adds support for incremental counter creation. This is done by changing the behavior of rocprofiler_create_profile_config:
    rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id, rocprofiler_counter_id_t* counters_list, size_t counters_count, rocprofiler_profile_config_id_t* config_id)
  The function now allows an existing config to be supplied via config_id. The counters contained in that config are copied over and used as the base for a new config, along with any counters supplied in counters_list. The new config id is returned via config_id and can be used in future dispatch/agent counting sessions.
  A new config is created rather than modifying the existing one, since there is no guarantee that the existing config isn't already in use. While we could add locks (or other mutual exclusion properties) to check if it's in use and reject an update, the benefit from doing so is minor in comparison to just creating a new config. This also accommodates a common pattern in which a tool adds counters at some later point during execution: it can now do that without destroying the existing config.
---------
* PC Sampling IOCTL version check introduced (#944)
* doc update for 6.2 release (#938)
* doc update for 6.2 release
* Adding warning for gerrit->github nightly sync
* PC sampling IOCTL versioning refactored (#945)
  The following changes are introduced:
  - Use functions instead of macros.
  - Verify the error code when querying KFD IOCTL version.
  - Skip tests and samples if KFD IOCTL < 1.16 or PC Sampling IOCTL < 0.1.
* Add HSA tracing support for `hsa_amd_vmem_address_reserve_align` (#946)
* Add support for hsa_amd_vmem_address_reserve_align
* Update lib/rocprofiler-sdk/hsa/types.hpp
  - support HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x2 for HSA v1.14.0
---------
* readthedocs updates (#877)
* readthedocs updates
* Adding License
* correcting table of contents path
* Move doc requirements to sphinx dir
* Compile requirements.txt
* Update path to reqs
* Adding missing python module
* changing sphinx version
* changing docutils version
* enabling sphinx extensions
* trying sphinx-rtd-theme
* Remove unused doc configs
* Remove unused html theme options
* Add files to toc
* temp commit to test
* updating environment.yml for CI build
* Update doc requirements
  To include rocprofiler-sdk in projects.yaml
* Set external_projects_current_project as rocprofiler-sdk
* Exclude external projects
* Fix warning for missing static path
* updating conf.py
* Removing reST syntax
* Use rocm-docs-core doxygen integration
* Remove RST syntax from Markdown files
* Generate doxyfile post checkout on RTD
* Use custom RTD env
* Specify mambaforge
* Put conda before post checkout cmd
* Add doxyfile for RTD
* Run cmake from conf.py
* Update environment.yml
* Use mambaforge
* Fix path to environment.yml
* Call build doxyfile
* Add Developer API title to Doxyfile
* Config version header
* Fix typo in conf.py
* Format fix for conf.py
* Increasing timeout for build-docs-from-source
* Remove README as mainpage for doxyfile
* Fix formatting in conf.py
---------
* Fixing OpenSuse build (#947)
* Fix documentation (#949)
* Sync queue and async copy on client finalizer (#950)
* Add `logical_node_type_id` field to `rocprofiler_agent_t` (#948)
* Add logical_node_type_id field to rocprofiler_agent_t
* Patch queue_controller
* Remove fatal error when callback and buffer tracing API in one context (#952)
  - one context for callback and buffer tracing of same API produces erroneous fatal error -- this is a valid use case
* Adding wrappers on HSA for executable load/unload and allowing multiple agents per context on ATT (#951)
* Codeobj wrappers around HSA calls for ATT
* Formatting
* Bookkeeping
* Tidy
* Tidy
* Update source/lib/rocprofiler-sdk/thread_trace/code_object.hpp
* Update source/lib/rocprofiler-sdk/thread_trace/att_core.hpp
* Variable naming
---------
* Removing cache of decoded lines and returning shared_ptr (#953)
* Update continuous_integration.yml (#926)
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
---------
* Accumulation metrics support and update counter collection API to aqlprofile_v2 (#915)
* Updating to v3 API
* General fixes
* Extending dimension bits to 54
* Disabling agent profiling tests
* Fixed unit test
* Adding accumulate metric support for parsing counters (#609)
* Adding accumulate metric support for parsing counters
* Adding metric flag
* Updating tests
* source formatting (clang-format v11) (#610)
* source formatting (clang-format v11) (#614)
* Adding evaluate ast test
* source formatting (clang-format v11) (#633)
* Update scanner generated file
* Adding flags to events for aqlprofile
* Fix Mi200 failing test
---------
* Revert "Extending dimension bits to 54"
  This reverts commit 3cd6628452484044a93e129f27974f996a0e4c08.
* Removing CU dimension
* Fixing merge conflicts
* Revert "Disabling agent profiling tests"
  This reverts commit 7e01518ed8c51fbb0c3b2575e1e0b8f9ddfa8237.
* Fixing merge conflicts
* Fix parser tests
* Adding accumulate metric documentation
* Update counter_collection_services.md
* Update index.md
* fix nested expression use
* Update source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp
* Doc update
---------
* Fix kernel trace gaps (#961)
  - source/lib/rocprofiler-sdk/hsa/queue.cpp
    - Optimize WriteInterceptor to eliminate extra barrier packets causing gaps between kernels in kernel tracing
    - increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
    - misc logging improvements
  - source/lib/rocprofiler-sdk/counters/agent_profiling.cpp
    - increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
  - tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt
    - add TIMEOUT for rocprofv3-test-hsa-multiqueue-execute
* PC sampling: integration test with instruction decoding (#929)
* PC sampling: integration test with instruction decoding
* PC sampling: verifying internal and external CIDs
  The PC sampling integration test has been extended to verify internal and external correlation IDs.
* tmp solution of using Instructions as keys
* wrapper for HIP call
* PCS integration test: ld_addr as instruction id
  For the sake of the integration test, use as the instruction identifier. To support code object unloading and relocations, use as the identifier (the change in the decoder is required).
* PCS integration test: removing shared_ptr
  Completely removing usage of shared pointers.
* PCS integration test: removing decoder
  When a code object has been unloaded, ensure all PC samples corresponding to that object are decoded, prior to removing the decoder.
* PCS integration test: fixing build flags and imports
* PCS integration test: fixing labels
* PCS integration test: cmake flags fix
* PC sampling cmake labels renamed
* PCS integration test refactoring
* PCS integration test: minimize usage of raw pointers
* PCS integration test: at least one sample should be delivered.
* PC sampling labels: pc-sampling
* General fixes to ATT, packets and event ID retrieval (#960)
* General fixes to ATT, packets and event ID retrieval
* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp
---------
* Returning code object id information in code_printing.cpp:Instruction (#965)
* Returning code object id information in code_printing.cpp:Instruction
* Adding assertions
* Simplifying decoder library
* Miscellaneous updates (#959)
  - missing-new-line CI job: ensures all source files end with new line
  - logging updates
  - add new line to the end of many files
  - fix header include ordering in misc places
  - transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies
* Update HIP API tracing (#958)
  - support HipDispatchTable additions for HIP_RUNTIME_API_TABLE_STEP_VERSION 1 thru 4
* Fix agent shutdown destructor errors (#969)
* Update lib/rocprofiler-sdk/agent.cpp
  - use static_object wrapper for vector of agent_pair (rocp agent <-> hsa agent)
* Fix get_aql_handles() shutdown error
  - use `static_object` wrapper for vector of `aqlprofile_agent_handle_t`
---------
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Manjunath P Jakaraddi <manjunath180397@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
Co-authored-by: Giovanni Lenzi Baraldi <gbaraldi@amd.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Manjunath-Jakaraddi <21177428+Manjunath-Jakaraddi@users.noreply.github.com>
Co-authored-by: jrmadsen <6001865+jrmadsen@users.noreply.github.com>
Co-authored-by: Manjunath-Jakaraddi <manjunath.jakaraddi@amd.com>
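
Note on #933: the copy-and-extend behavior described for rocprofiler_create_profile_config can be illustrated with a small toy model. This is only a sketch of the semantics as the commit message states them, not the SDK implementation. The names `config_registry`, `counter_id`, and `config_id` are hypothetical stand-ins, and treating an unknown id (such as 0) as "no existing config" is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-ins; these are NOT the rocprofiler-sdk types.
using counter_id = std::uint64_t;
using config_id  = std::uint64_t;

class config_registry
{
public:
    // Creates a new config from `counters`. If *id names a config that is
    // already registered, its counters are copied in first. The existing
    // config is never modified, so it stays valid for anyone still using it.
    // The id of the new config is written back through `id`.
    void create(const counter_id* counters, std::size_t count, config_id* id)
    {
        std::set<counter_id> merged;
        auto                 base = m_configs.find(*id);
        if(base != m_configs.end()) merged = base->second;  // copy, do not alias
        merged.insert(counters, counters + count);

        *id = m_next++;
        m_configs.emplace(*id, std::move(merged));
    }

    // Read-only view of the counters stored in a config (throws if unknown).
    const std::set<counter_id>& counters(config_id id) const { return m_configs.at(id); }

private:
    std::map<config_id, std::set<counter_id>> m_configs;
    config_id                                 m_next = 1;  // 0 is never issued
};
```

In this model a tool would call `create` once with its initial counters, keep the returned id, and later call `create` again with that id plus the extra counters. The first config remains untouched, which mirrors the reasoning in the commit message for not adding locks around in-place updates.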
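
Note on #945: the message says the version checks moved from macros to functions and that tests and samples are skipped if KFD IOCTL < 1.16 or PC Sampling IOCTL < 0.1. A minimal sketch of such a function-based check follows. The struct and function names are hypothetical and are not taken from the SDK or the KFD headers; only the two thresholds come from the commit message.

```cpp
#include <cstdint>

// Hypothetical major.minor pair; not an SDK or KFD type.
struct ioctl_version
{
    std::uint32_t major_version;
    std::uint32_t minor_version;
};

// True when `have` is the same as or newer than `need`.
constexpr bool at_least(ioctl_version have, ioctl_version need)
{
    return have.major_version > need.major_version ||
           (have.major_version == need.major_version &&
            have.minor_version >= need.minor_version);
}

// Minimum versions quoted in the commit message.
constexpr ioctl_version min_kfd_ioctl{1, 16};
constexpr ioctl_version min_pc_sampling_ioctl{0, 1};

// Both interfaces must meet their minimum; otherwise a test or sample
// would be skipped.
constexpr bool pc_sampling_supported(ioctl_version kfd, ioctl_version pc_sampling)
{
    return at_least(kfd, min_kfd_ioctl) && at_least(pc_sampling, min_pc_sampling_ioctl);
}
```

A function like this is type-checked and can be unit tested, which a comparison macro cannot easily offer. Querying the actual versions from the driver, and verifying the error code of that query, is outside the scope of this sketch.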