comparing tool options in rocprof/rocprofv2/rocprofv3 (#1050)
* comparing tool options in rocprof/rocprofv2/rocprofv3

* Added more perfetto options

* Added new summary options

* Added Category for all options

* Apply suggestions from code review

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* changes requested in SWDEV-484472

* Correct broken table formatting

* detailed explanation on json custom format

* Feedback Resolution

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
3 people authored Sep 16, 2024
1 parent 7861dcc commit 863b608
Showing 2 changed files with 352 additions and 6 deletions.
344 changes: 344 additions & 0 deletions source/docs/conceptual/comparing-with-legacy-tools.rst
@@ -33,3 +33,347 @@ The design change empowers ROCprofiler-SDK to:
- Allow multiple tools to use certain services simultaneously.
- Improve thread safety without introducing parallel bottlenecks.
- Manage internal data and allocations more efficiently.

=====================================================================================================
Comparing command-line tool options: ROCprofiler (rocprof, rocprofv2) and ROCprofiler-SDK (rocprofv3)
=====================================================================================================

ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more efficient and flexible version of the ROCprofiler tool.

.. list-table:: Comparison of ROCprofiler command-line tool options
:header-rows: 1

* - Category
- Feature
- rocprof
- rocprofv2
- rocprofv3
- Improvements
- Notes
* - Basic tracing options
- HIP Trace
- `--hip-trace`
- `--hip-api`, `--hip-trace`
- `--hip-trace`
- No change
- | rocprof and rocprofv2 `--hip-trace` options include kernel dispatches and memory copy activities,
| which is not the case in rocprofv3
* - Basic tracing options
- HSA Trace
- `--hsa-trace`
- `--hsa-trace`
- `--hsa-trace`
- No change
- | rocprof and rocprofv2 `--hsa-trace` options include kernel dispatches and memory copy activities,
| which is not the case in rocprofv3
* - Basic tracing options
- Scratch Memory Trace
- *Not Available*
- *Not Available*
- `--scratch-memory-trace`
- New option to trace scratch memory operations
-
* - Basic tracing options
- Marker Trace (ROCTx)
- `--roctx-trace`
- `--roctx-trace`
- `--marker-trace`
- Improved ROCTx library with more features
-
* - Basic tracing options
- Memory Copy Trace
- Part of HIP and HSA Traces
- Part of HIP and HSA Traces
- `--memory-copy-trace`
- Provides granularity for memory move operations
-
* - Basic tracing options
- Kernel Trace
- `--kernel-trace`
- `--kernel-trace`
- `--kernel-trace`
- Performance improvement.
-
* - Granular tracing options
- HIP runtime trace
- Part of `--hip-trace` option
- Part of `--hip-trace` option
- `--hip-runtime-trace`
- For collecting HIP runtime API traces, i.e. public HIP API functions starting with 'hip' (e.g. hipSetDevice).
-
* - Granular tracing options
- HIP compiler trace
- *Not Available*
- *Not Available*
- `--hip-compiler-trace`
- For collecting HIP compiler-generated code traces, i.e. HIP API functions starting with '__hip' (e.g. __hipRegisterFatBinary).
-
* - Granular tracing options
- HSA core API trace
- Part of `--hsa-trace` option
- Part of `--hsa-trace` option
- `--hsa-core-trace`
- New option for collecting only HSA core API traces, i.e. HSA functions prefixed with only `hsa_` (e.g. hsa_init).
-
* - Granular tracing options
- HSA AMD trace
- Part of `--hsa-trace` option
- Part of `--hsa-trace` option
- `--hsa-amd-trace`
- For collecting HSA AMD-extension API traces, i.e. HSA functions prefixed with `hsa_amd_` (e.g. hsa_amd_coherency_get_type).
-
* - Granular tracing options
- HSA Image Extension trace
- Part of `--hsa-trace` option
- Part of `--hsa-trace` option
- `--hsa-image-trace`
- New option for collecting HSA image-extension API traces, i.e. HSA functions prefixed with `hsa_ext_image_` (e.g. hsa_ext_image_get_capability).
-
* - Granular tracing options
- HSA Finalizer trace
- Part of `--hsa-trace` option
- Part of `--hsa-trace` option
- `--hsa-finalizer-trace`
- New option for collecting HSA finalizer-extension API traces, i.e. HSA functions prefixed with `hsa_ext_program_` (e.g. hsa_ext_program_create).
-
* - Aggregate tracing options
- Sys Trace
- `--sys-trace` [hip-trace|hsa-trace|roctx-trace|kernel-trace]
- `--sys-trace` [hip-trace|hsa-trace|roctx-trace|kernel-trace]
- `-s`, `--sys-trace` [hip-trace|hsa-trace|scratch-trace|memory-copy-trace|roctx-trace|kernel-trace]
- Extends the sys trace options with more features
-
* - Aggregate tracing options
- Runtime Trace
- *Not available*
- *Not available*
- `-r`, `--runtime-trace` [hip-runtime-trace|scratch-trace|memory-copy-trace|roctx-trace|kernel-trace]
- New option to aggregate trace operations
-
* - Kernel naming options
- Kernel Name Mangling
- *Not Available*
- *Not Available*
- `-M`, `--mangled-kernels`
- New option for mangled kernel names
-
* - Kernel naming options
- Kernel Name Truncation
- `--basenames <on|off>`
- `--basenames`
- `-T`, `--truncate-kernels`
- New option for truncating the demangled kernel names
-
* - Kernel naming options
- Kernel Rename
- `--roctx-rename`
- *Not available*
- `--kernel-rename`
- New option to use region names defined by roctxRangePush/roctxRangePop regions to rename the kernels
-
* - Post-processing tracing options
- Statistics
- `--stats`
- *Not Available*
- `--stats`
- Statistics for the collected traces
-
* - Post-processing tracing options
- Summary
- *Not available*
- *Not available*
- `-S, --summary`
- New option to output a single summary of tracing data after the profiling session
- The summary, stats, JSON, and database files that `rocprof` generated during post-processing contained much less information.
* - Post-processing tracing options
- Summary Per Domain
- *Not available*
- *Not available*
- `-D, --summary-per-domain`
- New option to output summary for each tracing domain after the profiling session
- The `rocprof --stats` option covered fewer domains in its summary reports than `rocprofv3` does.
* - Post-processing tracing options
- Summary Groups
- *Not available*
- *Not available*
- `--summary-groups REGULAR_EXPRESSION`
- New option to output a summary for each set of domains matching the regular expression, e.g. 'KERNEL_DISPATCH|MEMORY_COPY' will generate a summary from all the tracing data in the KERNEL_DISPATCH and MEMORY_COPY domains
-
* - Summary options
- Summary Output File
- *Not available*
- *Not available*
- `--summary-output-file SUMMARY_OUTPUT_FILE`
- New option to output summary to a file, stdout, or stderr (default: stderr)
-
* - Summary options
- Summary Units
- *Not available*
- *Not available*
- `-u`, `--summary-units`
- New option to output summary in desired time units {sec,msec,usec,nsec}
-
* - Display options
- List Metrics
- `--list-basic`, `--list-derived`
- `--list-counters`
- `-L`, `--list-metrics`
- A valid YAML is supported for this option now
-
* - Perfetto-specific options
- Perfetto data collection backend
- *Not available*
- *Not available*
- `--perfetto-backend` {inprocess,system}
- New option for perfetto data collection backend. 'system' mode requires starting traced and perfetto daemons
- `rocprofv2` used only in-process collection for the Perfetto plugin, whereas `rocprofv3` lets the user choose the backend.
* - Perfetto-specific options
- Perfetto Buffer Size
- *Not available*
- Setting env variable `rocprofiler_PERFETTO_MAX_BUFFER_SIZE_KIB` to the desired buffer size
- `--perfetto-buffer-size` {KB}
- New option to define the size of the buffer for Perfetto output in KB. Default: 1 GB
-
* - Perfetto-specific options
- Perfetto Buffer fill Policy
- *Not available*
- *Not available*
- `--perfetto-buffer-fill-policy` {discard,ring_buffer}
- New option for handling new records when Perfetto has reached the buffer limit
- `rocprofv2` always used `TraceConfig_BufferConfig_FillPolicy_RING_BUFFER` fill policy.
* - Perfetto-specific options
- Perfetto shared memory size
- *Not available*
- *Not available*
- `--perfetto-shmem-size-hint` KB
- New option to define the Perfetto shared memory size hint in KB. Default: 64 KB
-
* - Filtering options
- Kernel Filtration options for Counter Collection
- Supported in input.xml file (supports range, gpu and kernel filtration)
- kernel: <kernel_name> (can only be provided in input.txt file)
- `--kernel-include-regex`, `--kernel-exclude-regex`, `--kernel-iteration-range`
- Extensive control over output options using regular expressions
-
* - I/O options
- Output Directory
- `-d` <data directory>
- `-d` | `--output-directory`
- `-d` OUTPUT_DIRECTORY, `--output-directory` OUTPUT_DIRECTORY
- rocprofv3 supports special keys for runtime values, e.g. %pid% gets replaced by the process ID
-
* - I/O options
- Output File
- `-o` <output file>
- `-o` | `--output-file-name`
- `-o` OUTPUT_FILE, `--output-file` OUTPUT_FILE
- rocprofv3 supports special keys for runtime values, e.g. %pid% gets replaced by the process ID
-
* - I/O options
- Logging
- Minimal logging via environment variable
- Minimal logging via environment variable
- `--log-level` {fatal,error,warning,info,trace,env}
- Extensive logging options
-
* - I/O options
- Plugins
- *Not Available*
- plugin support for different output formats
- Replaced by `--output-format` option
- Not needed as rocprofv3 supports multiple output formats
-
* - I/O options
- Output Formats
- CSV, JSON (Chrome-Tracing format)
- CSV, JSON (Chrome-Tracing format), Perfetto, CTF
- CSV, JSON (custom schema), Perfetto, OTF2
- | # Multiple output formats can be produced in a single run.
| # OTF2 can visualize larger trace files than Perfetto.
- The Perfetto UI does not accept the JSON output format produced by rocprofv3. Perfetto is dropping support for the JSON Chrome tracing format in favor of the binary Perfetto protobuf format (``.pftrace`` extension), which is supported by rocprofv3.
* - I/O options
- Counter Collection
- Supports input text and XML format
- Only supports input text format
- Input support for text, YAML and JSON formats
- | # It is not possible to validate a plain text file, so rocprofv3 supports strongly typed input formats.
| # YAML and JSON formats are more readable and easier to maintain.
| # They allow flexibility to add more features to the tool input.
-
* - I/O options
- Providing Custom metrics file
- `-m` <metric file>
- `-m` <metric file>
- Not available
- Not yet in rocprofv3
-
* - Advanced options
- Preload
- *Not Available*
- *Not Available*
- `--preload`
- Libraries to prepend to LD_PRELOAD (usually for sanitizers)
-
* - Trace Control options
- Trace Period
- `--trace-period`
- `-tp | --trace-period`
- *Not available*
- Not yet in rocprofv3
-
* - Trace Control options
- Trace start
- `--trace-start <on|off>`
- *Not available*
- *Not available*
- Not yet in rocprofv3
-
* - Trace Control options
- Flush Interval
- `--flush-rate`
- `--flush-interval`
- *Not available*
- Not applicable for rocprofv3
-
* - Trace Control options
- Merge Traces
- `--merge-traces`
- *Not available*
- *Not available*
- Not yet in rocprofv3
-
* - Legacy options
- Timestamp On/Off
- `--timestamp <on|off>`
- *Not available*
- *Not available*
- Not applicable for rocprofv3
-
* - Legacy options
- Context wait
- `--ctx-wait`
- *Not available*
- *Not available*
- Not applicable for rocprofv3
-
* - Legacy options
- Context Limit
- `--ctx-limit <max number>`
- *Not available*
- *Not available*
- Not applicable for rocprofv3
-
* - Legacy options
- Code Object Tracking
- `--obj-tracking <on|off>`
- Always ``ON`` in rocprofv2
- Always ``ON`` in rocprofv3
-
-
* - Legacy options
- Heartbeat
- `--heartbeat <rate sec>`
- *Not available*
- *Not available*
- Not applicable for rocprofv3
-
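
The renames in the table can be applied mechanically when porting an old ``rocprof`` command line. The following is a minimal sketch, not a tool shipped with ROCm: the function ``translate_flags`` and both lookup tables are hypothetical names introduced here for illustration, and they cover only a few valueless flags taken from the rows above. It also assumes that, because ``--hip-trace`` and ``--hsa-trace`` no longer include kernel dispatches and memory copies in rocprofv3, you want ``--kernel-trace`` and ``--memory-copy-trace`` added to keep comparable coverage.

```python
# Hypothetical helper, not part of rocprof/rocprofv3. It works on flag names
# only (no values) and uses just the mappings listed in the comparison table.

# rocprof flag -> rocprofv3 flag, per the table rows above.
RENAMED = {
    "--roctx-trace": "--marker-trace",
    "--roctx-rename": "--kernel-rename",
}

# rocprof flags that the table marks as "Not available" in rocprofv3.
NOT_IN_V3 = {"--merge-traces", "--ctx-wait", "--trace-period"}

# In rocprofv3 these traces no longer include kernel dispatches or memory
# copies, so the sketch requests those explicitly (an assumption about intent).
NEEDS_EXTRAS = {"--hip-trace", "--hsa-trace"}
EXTRAS = ("--kernel-trace", "--memory-copy-trace")


def translate_flags(legacy_flags):
    """Return (rocprofv3_flags, dropped_flags) for a list of rocprof flags."""
    translated, dropped = [], []
    for flag in legacy_flags:
        if flag in NOT_IN_V3:
            dropped.append(flag)
            continue
        flag = RENAMED.get(flag, flag)
        if flag not in translated:
            translated.append(flag)
        if flag in NEEDS_EXTRAS:
            for extra in EXTRAS:
                if extra not in translated:
                    translated.append(extra)
    return translated, dropped


if __name__ == "__main__":
    flags, dropped = translate_flags(["--hip-trace", "--roctx-trace", "--merge-traces"])
    print("rocprofv3", " ".join(flags))
    print("dropped:", dropped)
```

Flags that take a value, such as ``--basenames <on|off>``, are deliberately left out of the sketch because their rocprofv3 counterparts (``-T``, ``--truncate-kernels``) are plain switches and need case-by-case handling.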
14 changes: 8 additions & 6 deletions source/docs/how-to/using-rocprofv3.rst
@@ -685,10 +685,10 @@ Output formats
``rocprofv3`` supports the following output formats:
- CSV (default)
- JSON (custom format for programmatic analysis)
- PFTrace (Perfetto trace)
- OTF2 (Open Trace Format )
- CSV (Default)
- JSON (Custom format for programmatic analysis only)
- PFTrace (Perfetto trace for visualization with Perfetto)
- OTF2 (Open Trace Format for visualization with compatible third party tools)
You can specify the output format using the ``--output-format`` command-line option. Format selection is case-insensitive
and multiple output formats are supported. For example: ``--output-format json`` enables JSON output exclusively whereas
@@ -704,8 +704,10 @@ For .otf2 trace visualization, open the trace in `vampir.eu <https://vampir.eu/>
JSON output schema
++++++++++++++++++++
``rocprofv3`` supports a custom JSON output format designed for programmatic analysis. The schema is optimized for size
while factoring in usability. You can generate the JSON output using ``--output-format json`` command-line option.
``rocprofv3`` supports a **custom** JSON output format designed for programmatic analysis and **NOT** for visualization.
The schema is optimized for size while factoring in usability. The Perfetto UI does not accept this JSON output format produced by rocprofv3.
Perfetto is dropping support for the JSON Chrome tracing format in favor of the binary Perfetto protobuf format (.pftrace extension), which is supported by rocprofv3.
You can generate the JSON output using the ``--output-format json`` command-line option.
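
Since the JSON output is meant for programmatic analysis, a first step is usually to load it and look at its top-level layout. The sketch below assumes nothing about the schema itself: ``top_level_summary`` is a hypothetical helper, and the file name in the usage section is a placeholder for whatever file your run produced.

```python
import json
from pathlib import Path


def top_level_summary(path):
    """Map each top-level key of a JSON file to a short type description."""
    data = json.loads(Path(path).read_text())
    if not isinstance(data, dict):
        # The root is not an object; report its type only.
        return {"<root>": type(data).__name__}
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            # Containers: show the type and the number of entries.
            summary[key] = f"{type(value).__name__}[{len(value)}]"
        else:
            summary[key] = type(value).__name__
    return summary


if __name__ == "__main__":
    import sys

    # Placeholder default; pass the path of your own rocprofv3 JSON output.
    target = sys.argv[1] if len(sys.argv) > 1 else "results.json"
    for key, description in top_level_summary(target).items():
        print(f"{key}: {description}")
```

Inspecting the layout this way avoids hard-coding property names before consulting the schema description that follows.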
Properties
++++++++++++