-
--list,-l- List all devices and benchmarks without running them.
-
--help,-h- Print usage information and exit.
-
--help-axes,--help-axis- Print axis specification documentation and exit.
-
--version- Print information about the version of NVBench used to build the executable.
-
--persistence-mode <state>,--pm <state>- Sets persistence mode for one or more GPU devices.
- Applies to the devices described by the most recent
--devicesoption, or all devices if--devicesis not specified. - This option requires root / admin permissions.
- This option is only supported on Linux.
- This call must precede all other device modification options, if any.
- Note that persistence mode is deprecated and will be removed at some point in favor of the new persistence daemon. See the following link for more details: https://docs.nvidia.com/deploy/driver-persistence/index.html
- Valid values for
stateare:0: Disable persistence mode.1: Enable persistence mode.
-
--lock-gpu-clocks <rate>,--lgc <rate>- Lock GPU clocks for one or more devices to a particular rate.
- Applies to the devices described by the most recent
--devicesoption, or all devices if--devicesis not specified. - This option requires root / admin permissions.
- This option is only supported in Volta+ (sm_70+) devices.
- Valid values for
rateare:reset,unlock,none: Unlock the GPU clocks.base,tdp: Lock clocks to base frequency (best for stable results).max,maximum: Lock clocks to max frequency (best for fastest results).
-
--csv <filename/stream>- Write CSV output to a file, or "stdout" / "stderr".
-
--json <filename/stream>- Write JSON output to a file, or "stdout" / "stderr".
-
--markdown <filename/stream>,--md <filename/stream>- Write markdown output to a file, or "stdout" / "stderr".
- Markdown is written to "stdout" by default.
-
--quiet,-q- Suppress output.
-
--color- Use color in output (markdown + stdout only).
-
--benchmark <benchmark name/index>,-b <benchmark name/index>- Execute a specific benchmark.
- Argument is a benchmark name or index, taken from
--list. - If not specified, all benchmarks will run.
--benchmarkmay be specified multiple times to run several benchmarks.- The same benchmark may be specified multiple times with different configurations.
-
--axis <axis specification>,-a <axis specification>- Override an axis specification.
- See
--help-axisfor details on axis specifications. - Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--devices <device ids>,--device <device ids>,-d <device ids>- Limit execution to one or more devices.
<device ids>is a single id, a comma separated list, or the string "all".- Device ids can be obtained from
--list. - Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--skip-time <seconds>- Skip a measurement when a warmup run executes in less than
<seconds>. - Default is -1 seconds (disabled).
- Intended for testing / debugging only.
- Very fast kernels (<5us) often require an extremely large number of samples
to converge
max-noise. This option allows them to be skipped to save time during testing. - Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
- Skip a measurement when a warmup run executes in less than
-
--throttle-threshold <value>- Set the GPU throttle threshold as percentage of the device's default clock rate.
- Default is 75.
- Set to 0 to disable throttle detection entirely.
- Note that throttling is disabled when
nvbench::exec_tag::syncis used. - Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--throttle-recovery-delay <value>- Set the GPU throttle recovery delay in seconds.
- Default is 0.05 seconds.
- Note that throttling is disabled when
nvbench::exec_tag::syncis used. - Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--profile- Only run each benchmark once.
- Disable any instrumentation that may interfere with profilers.
- Intended for use with external profiling tools.
- Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--no-batch- Do not run batched measurements even if enabled.
- Intended to shorten run-time when batched measurements are not of interest.
- Applied to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--timeout <seconds>- Measurements will timeout after
<seconds>have elapsed. - Default is 15 seconds.
<seconds>is walltime, not accumulated sample time.- If a measurement times out, the default markdown log will print a warning to report any outstanding termination criteria (min samples, min time, max noise).
- Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
- Measurements will timeout after
-
--min-samples <count>- Gather at least
<count>samples per measurement before checking any other stopping criterion besides the timeout. - Default is 10 samples.
- Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
- Gather at least
-
--stopping-criterion <criterion>- After
--min-samplesis satisfied, use<criterion>to detect if enough samples were collected. - Only applies to Cold and CPU-only measurements.
- If both GPU and CPU times are gathered, GPU time is used for stopping analysis.
- Stopping criteria provided by NVBench are:
- "stdrel": (default) Converges to a minimal relative standard deviation, stdev / mean
- "entropy": Converges based on the cumulative entropy of all samples.
- Each stopping criterion may provide additional parameters to customize behavior, as detailed below:
- After
-
--min-time <seconds>- Accumulate at least
<seconds>of execution time per measurement. - Only applies to
stdrelstopping criterion. - Default is 0.5 seconds.
- Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
- Accumulate at least
-
--max-noise <value>- Gather samples until the error in the measurement drops below
<value>. - Noise is specified as the percent relative standard deviation (stdev/mean).
- Default is 0.5% (
--max-noise 0.5) - Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
- Gather samples until the error in the measurement drops below
-
--max-angle <value>- Maximum linear regression angle of cumulative entropy.
- Smaller values give more accurate results.
- Default is 0.048.
- Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.
-
--min-r2 <value>- Minimum coefficient of determination for linear regression of cumulative entropy.
- Larger values give more accurate results.
- Default is 0.36.
- Applies to the most recent
--benchmark, or all benchmarks if specified before any--benchmarkarguments.