# Omni scripts

This folder contains helper scripts to run and stress-test Omni models via `scripts/omni/run_omni.py`.

- `run_qwen_omni_synthetic.sh`: generate random images and run Omni chat for throughput testing.
- `sglang/start_sglang_server*.sh`: start an SGLang server for Omni models.
- `vllm/start_vllm_server.sh`: start a vLLM OpenAI-compatible server for Omni models.

## Stress test knobs

`run_qwen_omni_synthetic.sh` supports the following environment variables:

- `SYNTHETIC_NUM_IMAGES`: total number of synthetic images to generate.
- `BATCH_SIZE`: groups images per call to the Python runner (best-effort; actual backend batching support varies).
- `MAX_NEW_TOKENS`: maximum generation length per sample.

The JSON output includes `time_sec_per_batch` and `time_sec_per_sample` for easier comparisons.
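For a concrete run, the sketch below sets all three knobs and post-processes the timing fields. It assumes the script prints its JSON summary to stdout as a single top-level object; the `result.json` file and the `jq` step are illustrative, not part of the scripts themselves.

```bash
# Run 64 synthetic images in batches of 8, generating up to 128 new tokens each.
SYNTHETIC_NUM_IMAGES=64 BATCH_SIZE=8 MAX_NEW_TOKENS=128 \
BACKEND=sglang BASE_URL=http://127.0.0.1:30000 \
./run_qwen_omni_synthetic.sh | tee result.json

# Extract the timing fields for comparison across runs (requires jq).
jq '{time_sec_per_batch, time_sec_per_sample}' result.json
```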

## Quickstart

### SGLang (HTTP)

1. Start the server:

   ```bash
   cd scripts/omni/sglang
   MODEL_DIR=/path/to/Qwen2.5-Omni-7B ./start_sglang_server.sh
   ```

2. Run the synthetic stress test:

   ```bash
   cd scripts/omni
   BACKEND=sglang BASE_URL=http://127.0.0.1:30000 ./run_qwen_omni_synthetic.sh
   ```
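Before launching the stress test, you can optionally confirm the server is reachable. This check assumes SGLang exposes its usual `/health` endpoint on the default port:

```bash
# Prints 200 once the SGLang server is ready to accept requests.
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:30000/health
```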

### vLLM (OpenAI-compatible HTTP)

1. Start the server:

   ```bash
   cd scripts/omni/vllm
   MODEL_DIR=/path/to/Qwen2.5-Omni-3B SERVED_MODEL_NAME=qwen2.5-omni-3b ./start_vllm_server.sh
   # or
   MODEL_DIR=/path/to/Qwen2.5-Omni-7B SERVED_MODEL_NAME=qwen2.5-omni-7b ./start_vllm_server.sh
   ```

2. Run the synthetic stress test:

   ```bash
   cd scripts/omni
   BACKEND=vllm-http BASE_URL=http://127.0.0.1:8000 ./run_qwen_omni_synthetic.sh
   ```
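To verify the server is up and serving the expected model name, query the OpenAI-compatible `/v1/models` endpoint (`jq` is used here only for readability):

```bash
# Lists served model IDs; the output should include your SERVED_MODEL_NAME.
curl -s http://127.0.0.1:8000/v1/models | jq '.data[].id'
```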

Notes:

- For `BACKEND=vllm-http`, the HTTP request's model name is taken from the runner's `--model` value, and it must match the `--served-model-name` used when starting vLLM (`SERVED_MODEL_NAME` in the startup script); see the request example after this list.
- For `BACKEND=sglang`, the default port is 30000, matching the SGLang startup script.
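As a concrete illustration of the name-matching requirement above, here is a hand-written chat-completion request against the vLLM server. The `model` field must equal the `SERVED_MODEL_NAME` the server was started with; the prompt and `max_tokens` here are arbitrary:

```bash
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-omni-7b",
        "messages": [{"role": "user", "content": "Describe this test."}],
        "max_tokens": 16
      }'
```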