Scenario-based benchmarking and profiling helpers for vLLM serving workloads.
For each scenario in a YAML config, `vllm_bench.py`:

- Launches `vllm serve` with scenario-specific parameters.
- Runs `vllm bench serve` across one or more concurrency points.
- Saves benchmark JSON output and a cross-scenario summary CSV.
- Optionally collects:
  - Nsight Systems (`nsys`) traces
  - PyTorch Profiler traces
This is useful for repeatable performance studies, regression tracking, and profiling runs.
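The per-scenario flow is roughly the following. This is a minimal sketch, not the actual `vllm_bench.py` implementation: the helper name, health-check loop, and the `vllm bench serve` flags (`--max-concurrency`, `--save-result`, `--result-filename`) are assumptions that may differ by vLLM version.

```python
# Minimal sketch of the per-scenario loop (illustrative only): launch the server,
# sweep the configured concurrencies with `vllm bench serve`, keep the JSON results.
# `scenario` is a dict from the YAML config; exact bench flags may vary by vLLM version.
import subprocess
import time
from pathlib import Path

import requests

def run_scenario(model: str, scenario: dict, results_dir: Path) -> None:
    results_dir.mkdir(parents=True, exist_ok=True)
    port = scenario["port"]
    server = subprocess.Popen(
        ["vllm", "serve", model, "--port", str(port), *scenario.get("params", "").split()]
    )
    try:
        # Wait until the server answers /health before benchmarking.
        while True:
            try:
                if requests.get(f"http://localhost:{port}/health", timeout=2).ok:
                    break
            except requests.RequestException:
                pass
            time.sleep(5)
        for conc in scenario["bench"]["concurrencies"]:
            result = results_dir / f"{scenario['name']}_conc{conc}.json"
            subprocess.run(
                ["vllm", "bench", "serve", "--model", model, "--port", str(port),
                 "--max-concurrency", str(conc),
                 "--save-result", "--result-filename", str(result)],
                check=True,
            )
    finally:
        server.terminate()
        server.wait()
```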
```bash
pip install -r requirements.txt
```

Install these if you use the legacy shell scripts or profiling workflows:

```bash
sudo apt-get install -y jq curl
pip install yq
```

Profiling tools (optional but recommended for GPU analysis):

- Nsight Systems (`nsys`)
- Nsight Compute (`ncu`)
- PyTorch Profiler (enabled through scenario config)
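A quick way to confirm the optional profilers are actually on `PATH` before enabling profiling in a scenario (a small standalone check, not part of the tool):

```python
# Check whether the optional GPU profilers are installed and on PATH.
import shutil

for tool in ("nsys", "ncu"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'not found'}")
```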
```bash
python vllm_bench.py <config.yaml> [--scenario name1,name2] [--delay SEC] [--duration SEC] \
    [--mlflow-experiment NAME] [--mlflow-run-name NAME] [--mlflow-tracking-uri URI] [--mlflow-tag KEY=VALUE]
```

```bash
# Run all scenarios
python vllm_bench.py configs/models/gpt-oss-20b.yaml

# Run a single scenario
python vllm_bench.py configs/models/gpt-oss-20b.yaml --scenario baseline

# Run multiple scenarios
python vllm_bench.py configs/models/gpt-oss-20b.yaml --scenarios baseline,async_scheduling

# Start nsys after the benchmark has already started
python vllm_bench.py configs/models/gpt-oss-20b.yaml --scenario baseline --delay 15

# Collect nsys for a fixed duration (seconds)
python vllm_bench.py configs/models/gpt-oss-20b.yaml --scenario baseline --duration 30

# Upload artifacts to a specific MLflow experiment
python vllm_bench.py configs/models/gpt-oss-20b.yaml --mlflow-experiment vllm-bench --mlflow-run-name gptoss20b-baseline

# Add MLflow tags (repeat --mlflow-tag for multiple)
python vllm_bench.py configs/models/gpt-oss-20b.yaml --mlflow-tag team=perf --mlflow-tag model_family=gptoss

# Disable MLflow uploads
python vllm_bench.py configs/models/gpt-oss-20b.yaml --no-mlflow
```

- `config`: YAML file with model/defaults/scenarios.
- `--scenario` / `--scenarios`: comma-separated scenario names to run.
- `--delay`: delay before `nsys start` (useful with warmup-heavy startup).
- `--duration`: stop nsys after a fixed time instead of at end-of-benchmark.
- `--mlflow-experiment`: MLflow experiment name.
- `--mlflow-run-name`: MLflow run name override.
- `--mlflow-tracking-uri`: custom MLflow tracking URI.
- `--mlflow-tag`: MLflow tag as `KEY=VALUE` (repeatable).
- `--no-mlflow`: skip MLflow upload.
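For reference, the CLI surface above maps onto a straightforward `argparse` setup. The sketch below is illustrative only (the real parser in `vllm_bench.py` may differ); it shows how the repeatable `--mlflow-tag KEY=VALUE` and the comma-separated scenario list are typically handled:

```python
# Illustrative parser for the CLI described above; not the actual vllm_bench.py parser.
import argparse

parser = argparse.ArgumentParser(description="Scenario-based vLLM benchmark driver")
parser.add_argument("config", help="YAML file with model/defaults/scenarios")
parser.add_argument("--scenario", "--scenarios", dest="scenarios",
                    help="comma-separated scenario names to run")
parser.add_argument("--delay", type=float, help="delay (s) before nsys start")
parser.add_argument("--duration", type=float, help="stop nsys after this many seconds")
parser.add_argument("--mlflow-experiment")
parser.add_argument("--mlflow-run-name")
parser.add_argument("--mlflow-tracking-uri")
parser.add_argument("--mlflow-tag", action="append", default=[],
                    help="MLflow tag as KEY=VALUE; repeat for multiple tags")
parser.add_argument("--no-mlflow", action="store_true", help="skip MLflow upload")

# Demo invocation so the snippet is self-contained.
args = parser.parse_args(
    ["configs/models/gpt-oss-20b.yaml", "--scenarios", "baseline,async_scheduling",
     "--mlflow-tag", "team=perf", "--mlflow-tag", "model_family=gptoss"]
)
scenarios = args.scenarios.split(",") if args.scenarios else None  # None = run all
mlflow_tags = dict(t.split("=", 1) for t in args.mlflow_tag)       # {"team": "perf", ...}
print(scenarios, mlflow_tags)
```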
At minimum, your config should contain:
```yaml
model:
  name: meta-llama/Llama-3.1-8B-Instruct
  base_params: "--gpu-memory-utilization 0.9"

defaults:
  study_dir: Study_llama
  env:
    VLLM_USE_V1: "1"
  bench:
    concurrencies: [1, 8, 32]
    input_len: 1024
    output_len: 128
    cc_mult: 10

scenarios:
  - name: baseline
    port: 8000
    params: "--max-model-len 8192"
    bench:
      concurrencies: [1, 16, 64]
    profile: true
    profiling:
      nsys_launch_args: "--trace=cuda,nvtx,osrt --start-later=true"
      nsys_start_args: "--force-overwrite=true --gpu-metrics-devices=cuda-visible"
      torch_profiler:
        enabled: true
```
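Per-scenario keys override the `defaults` section, so `baseline` above ends up with `concurrencies: [1, 16, 64]` while inheriting `input_len`, `output_len`, and `cc_mult`. A minimal sketch of that merge, assuming a simple shallow per-section merge (the actual logic lives in `vllm_bench.py` and may differ):

```python
# Sketch of resolving a scenario's effective bench settings from the config above.
# Assumes a shallow merge where scenario keys win; vllm_bench.py may merge differently.
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

def effective_bench(cfg: dict, scenario: dict) -> dict:
    merged = dict(cfg.get("defaults", {}).get("bench", {}))
    merged.update(scenario.get("bench", {}))  # scenario overrides defaults
    return merged

for scenario in cfg["scenarios"]:
    print(scenario["name"], effective_bench(cfg, scenario))
# baseline -> {'concurrencies': [1, 16, 64], 'input_len': 1024, 'output_len': 128, 'cc_mult': 10}
```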
Each run creates a timestamped study directory:

```text
<study_dir>_<timestamp>/
  config.yaml
  summary.csv
  scenario_<name>/
    logs/
      vllm_server.log
    results/
      <result_prefix>.json
    profiles/
      nsys_server.qdrep|.nsys-rep   # when using direct `nsys profile` mode
      nsys_conc<k>.qdrep|.nsys-rep  # when using start/stop session mode
      torch/
        trace_conc<k>*
```
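To compare runs afterwards, the newest study directory can be picked up and its summary loaded directly. This is a sketch: the glob pattern assumes the `Study_llama` name from the example config, and the exact `summary.csv` columns depend on the benchmark output.

```python
# Load the cross-scenario summary from the most recent study directory.
from pathlib import Path

import pandas as pd

# study_dir from the config, e.g. Study_llama -> Study_llama_<timestamp>/
study_dirs = sorted(Path(".").glob("Study_llama_*"))
latest = study_dirs[-1]

summary = pd.read_csv(latest / "summary.csv")
print(f"Loaded {len(summary)} rows from {latest / 'summary.csv'}")
print(summary.head())
```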
Notes:

- `summary.csv` aggregates every scenario/concurrency run.
- Nsight output now writes under each scenario's `profiles/` directory.
- The exact Nsight extension varies by nsys version (`.qdrep` and/or `.nsys-rep`).
- Run output is captured in `logs/benchmark_output.log`.
- MLflow upload includes the full study directory, `nvidia-smi`, `lscpu`, command metadata, config, and run log.
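The MLflow upload can also be reproduced manually with the standard MLflow client. A rough sketch with illustrative URIs, names, tags, and directory, not the exact code the tool runs:

```python
# Manually upload a finished study directory to MLflow, mirroring the tool's
# automatic upload. The tracking URI, names, tags, and path here are examples.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")               # --mlflow-tracking-uri
mlflow.set_experiment("vllm-bench")                             # --mlflow-experiment
with mlflow.start_run(run_name="gptoss20b-baseline"):           # --mlflow-run-name
    mlflow.set_tags({"team": "perf", "model_family": "gptoss"}) # --mlflow-tag KEY=VALUE
    mlflow.log_artifacts("Study_llama_20250101_120000")         # the full study directory
```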
`vllm_bench.sh` remains available for older workflows/configs:

```bash
bash vllm_bench.sh config.yaml
bash vllm_bench.sh configs/models/gpt-oss-20b.yaml --scenario baseline
```

For code flow and debugging notes, see CODE_README.md.