// MIT License // Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal // in the Software without restriction, including without limitation the rights // to use, copy, modify, merge, publish, distribute, sublicense, and/or sell // copies of the Software, and to permit persons to whom the Software is // furnished to do so, subject to the following conditions: // The above copyright notice and this permission notice shall be included in // all copies or substantial portions of the Software. // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN // THE SOFTWARE.
-
--list
,-l
- List all devices and benchmarks without running them.
-
--help
,-h
- Print usage information and exit.
-
--help-axes
,--help-axis
- Print axis specification documentation and exit.
-
--version
- Print information about the version of NVBench used to build the executable.
-
--persistence-mode <state>
,--pm <state>
- Sets persistence mode for one or more GPU devices.
- Applies to the devices described by the most recent
--devices
option, or all devices if--devices
is not specified. - This option requires root / admin permissions.
- This option is only supported on Linux.
- This call must precede all other device modification options, if any.
- Note that persistence mode is deprecated and will be removed at some point in favor of the new persistence daemon. See the following link for more details: https://docs.nvidia.com/deploy/driver-persistence/index.html
- Valid values for
state
are:0
: Disable persistence mode.1
: Enable persistence mode.
-
--lock-gpu-clocks <rate>
,--lgc <rate>
- Lock GPU clocks for one or more devices to a particular rate.
- Applies to the devices described by the most recent
--devices
option, or all devices if--devices
is not specified. - This option requires root / admin permissions.
- This option is only supported in Volta+ (sm_70+) devices.
- Valid values for
rate
are:reset
,unlock
,none
: Unlock the GPU clocks.base
,tdp
: Lock clocks to base frequency (best for stable results).max
,maximum
: Lock clocks to max frequency (best for fastest results).
-
--csv <filename/stream>
- Write CSV output to a file, or "stdout" / "stderr".
-
--json <filename/stream>
- Write JSON output to a file, or "stdout" / "stderr".
-
--markdown <filename/stream>
,--md <filename/stream>
- Write markdown output to a file, or "stdout" / "stderr".
- Markdown is written to "stdout" by default.
-
--quiet
,-q
- Suppress output.
-
--color
- Use color in output (markdown + stdout only).
-
--benchmark <benchmark name/index>
,-b <benchmark name/index>
- Execute a specific benchmark.
- Argument is a benchmark name or index, taken from
--list
. - If not specified, all benchmarks will run.
--benchmark
may be specified multiple times to run several benchmarks.- The same benchmark may be specified multiple times with different configurations.
-
--axis <axis specification>
,-a <axis specification>
- Override an axis specification.
- See
--help-axis
for details on axis specifications. - Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
-
--devices <device ids>
,--device <device ids>
,-d <device ids>
- Limit execution to one or more devices.
<device ids>
is a single id, a comma separated list, or the string "all".- Device ids can be obtained from
--list
. - Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
-
--min-samples <count>
- Gather at least
<count>
samples per measurement. - Default is 10 samples.
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Gather at least
-
--min-time <seconds>
- Accumulate at least
<seconds>
of execution time per measurement. - Default is 0.5 seconds.
- If both GPU and CPU times are gathered, this applies to GPU time only.
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Accumulate at least
-
--max-noise <value>
- Gather samples until the error in the measurement drops below
<value>
. - Noise is specified as the percent relative standard deviation.
- Default is 0.5% (
--max-noise 0.5
) - Only applies to Cold measurements.
- If both GPU and CPU times are gathered, this applies to GPU noise only.
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Gather samples until the error in the measurement drops below
-
--skip-time <seconds>
- Skip a measurement when a warmup run executes in less than
<seconds>
. - Default is -1 seconds (disabled).
- Intended for testing / debugging only.
- Very fast kernels (<5us) often require an extremely large number of samples
to converge
max-noise
. This option allows them to be skipped to save time during testing. - Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Skip a measurement when a warmup run executes in less than
-
--timeout <seconds>
- Measurements will timeout after
<seconds>
have elapsed. - Default is 15 seconds.
<seconds>
is walltime, not accumulated sample time.- If a measurement times out, the default markdown log will print a warning to report any outstanding termination criteria (min samples, min time, max noise).
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Measurements will timeout after
-
--run-once
- Only run the benchmark once, skipping any warmup runs and batched measurements.
- Intended for use with external profiling tools.
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
-
--disable-blocking-kernel
- Don't use the
blocking_kernel
. - Intended for use with external profiling tools.
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Don't use the
-
--profile
- Implies
--run-once
and--disable-blocking-kernel
. - Intended for use with external profiling tools.
- Applies to the most recent
--benchmark
, or all benchmarks if specified before any--benchmark
arguments.
- Implies