feat(vllm): add Gemma 4 models, image, and ROCm serving recipes#144

Open
coketaste wants to merge 1 commit into ROCm:develop from coketaste:coketaste/gemma4

Conversation

@coketaste
Contributor

  • Register pyt_vllm_gemma-4-26b-a4b-it and pyt_vllm_gemma-4-31b-it in models.json (gemma4 Docker stack).
  • Add docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile from vllm/vllm-openai-rocm:gemma4 with transformers 5.5.0.
  • Extend scripts/vllm/configs/default.yaml with Gemma 4 serving blocks (TRITON_ATTN, gfx942 float16; 26B MoE disables AITER fused MoE).
  • Quote JSON-like and whitespace-containing extra_args values in run_vllm.py (via shlex.quote) so --limit-mm-per-prompt works alongside the existing --flag YAML keys.
  • Document Gemma 4 in benchmark/vllm/README.md.

@coketaste coketaste requested a review from gargrahul as a code owner April 14, 2026 18:43
Copilot AI review requested due to automatic review settings April 14, 2026 18:43
@coketaste coketaste self-assigned this Apr 14, 2026

Copilot AI left a comment

Pull request overview

Adds Gemma 4 (26B-A4B-it and 31B-it) vLLM serving support to the MAD benchmarking stack, including new model registrations, ROCm/Gemma4 Docker build plumbing, and documented serving recipes.

Changes:

  • Registered two Gemma 4 vLLM models in models.json and documented them in benchmark/vllm/README.md.
  • Added a Gemma4-specific AMD Ubuntu Dockerfile based on vllm/vllm-openai-rocm:gemma4 and extended scripts/vllm/configs/default.yaml with Gemma 4 serving recipes/overrides.
  • Updated scripts/vllm/run_vllm.py to shell-quote JSON-like/whitespace-containing extra_args values (notably --limit-mm-per-prompt).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • scripts/vllm/run_vllm.py: Adjusts extra_args formatting/quoting when composing the vLLM command line.
  • scripts/vllm/configs/default.yaml: Adds Gemma 4 serving benchmark blocks and gfx942 dtype overrides.
  • models.json: Registers Gemma 4 vLLM models and their MAD metadata/output CSV names.
  • docker/pyt_vllm_gemma4.ubuntu.amd.Dockerfile: Introduces a Gemma4-tagged base image Dockerfile and pins transformers.
  • benchmark/vllm/README.md: Documents Gemma 4 image tag usage, gating/token requirements, and recipe details.


RUN pip3 list

# Specify entrypoint to override upstream
ENTRYPOINT [""]
Comment thread scripts/vllm/run_vllm.py
Comment on lines +494 to +503
    s = str(v)
    st = s.strip()
    if (
        k == "--limit-mm-per-prompt"
        or (st[:1] in "{[")
        or any(ch.isspace() for ch in s)
    ):
        extra_args_str += f" {k} {shlex.quote(s)}"
    else:
        extra_args_str += f" {k} {v}"
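A minimal standalone sketch of the quoting rule in the diff above, pulled out into a hypothetical helper (`quote_extra_arg` is not a name from the PR; `k` and `v` follow the snippet's conventions):

```python
import shlex


def quote_extra_arg(k: str, v) -> str:
    """Mirror the PR's rule: shell-quote the value when the flag is
    --limit-mm-per-prompt, the value looks JSON-like (starts with '{'
    or '['), or it contains whitespace; otherwise pass it through."""
    s = str(v)
    st = s.strip()
    if (
        k == "--limit-mm-per-prompt"
        or st[:1] in "{["
        or any(ch.isspace() for ch in s)
    ):
        return f" {k} {shlex.quote(s)}"
    return f" {k} {v}"


# JSON-like value is single-quoted so the shell passes it through intact.
print(quote_extra_arg("--limit-mm-per-prompt", '{"image": 2}'))
# -> --limit-mm-per-prompt '{"image": 2}'

# Plain scalar values are left unquoted.
print(quote_extra_arg("--max-model-len", 8192))
# -> --max-model-len 8192
```

Without the quoting, a value like `{"image": 2}` would be split by the shell at the space and the braces could trigger brace expansion, so the vLLM CLI would never see the intact JSON argument.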

2 participants