
feat: Add llm-d infrastructure deployment guide and test tooling (#95)

Merged
hmoghani merged 6 commits into red-hat-data-services:main from hmoghani:add-llm-d-infrastructure
May 15, 2026

Conversation

@hmoghani
Contributor

Summary

  • Add deployment guide for llm-d on OpenShift AI (docs/llm-d-deployment.md)
  • Add infrastructure/llm-d/ with LLMInferenceService YAML template, NetworkPolicy manifests, and a traffic distribution test script
  • Add llm-d references to the main README (repo structure + documentation section)

What's included

| File | Description |
| --- | --- |
| docs/llm-d-deployment.md | Step-by-step guide covering prerequisites, GPU node pool creation, LLMInferenceService deployment, network policies, Llama Stack integration, and verification |
| infrastructure/llm-d/llminferenceservice.yaml | Parameterized LLMInferenceService template with placeholder comments |
| infrastructure/llm-d/network-policies.yaml | Required NetworkPolicies (RHOAI defaults block port 8000) |
| infrastructure/llm-d/test_distribution.py | Test script that validates llm-d's intelligent routing across vLLM replicas (concurrent load, prefix cache routing, sustained throughput, per-pod metrics) |
| infrastructure/llm-d/README.md | Overview and quick start |

All manifests and scripts are fully parameterized with no hardcoded cluster-specific values.

Test plan

  • Review YAML templates for correctness (oc apply --dry-run=client)
  • Verify test script runs with --help without errors
  • Deploy on an OpenShift AI 3.4+ cluster following the guide end-to-end
  • Run test_distribution.py against a live llm-d deployment
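The first two checks can be run locally without touching a cluster; a sketch, assuming the file paths listed above:

```shell
# Client-side validation of the YAML templates (no resources are created)
oc apply --dry-run=client -f infrastructure/llm-d/llminferenceservice.yaml
oc apply --dry-run=client -f infrastructure/llm-d/network-policies.yaml

# Smoke-test the distribution script's CLI surface
python infrastructure/llm-d/test_distribution.py --help
```

The remaining two items need a live OpenShift AI 3.4+ cluster and an actual llm-d deployment.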

hmoghani added 2 commits May 12, 2026 11:21
Add deployment guide, Kubernetes manifests, and a traffic distribution
test script for deploying llm-d on OpenShift AI with multiple vLLM
replicas across GPU nodes.

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Add infrastructure/llm-d to repository structure and link to the
llm-d deployment guide in the documentation section.

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

This PR adds infrastructure and documentation to deploy llm-d on OpenShift AI: a detailed deployment guide, LLMInferenceService and NetworkPolicy manifests, an infra README, README links, and a Python traffic-distribution test script.

Changes

LLM-D Infrastructure & Deployment

| Layer / File(s) | Summary |
| --- | --- |
| Documentation Overview & Navigation: README.md, infrastructure/llm-d/README.md | Repository README adds a reference to the new infrastructure/llm-d/ directory and links to the deployment guide. The llm-d component README provides overview text, a contents index linking to manifests and the test script, and quick-start command examples. |
| Comprehensive Deployment Guide: docs/llm-d-deployment.md | Full step-by-step deployment walkthrough covering prerequisites (OpenShift, RHOAI, GPU drivers), GPU node pool setup, Red Hat registry authentication, LLMInferenceService deployment from template with scheduler configuration, network policy application for port access, optional Llama Stack integration with environment variables, verification commands (readiness checks, test inference, traffic distribution script), and key behavioral notes on routing, gateway selection, and constraints. |
| Kubernetes Resource Manifests: infrastructure/llm-d/llminferenceservice.yaml, infrastructure/llm-d/network-policies.yaml | LLMInferenceService manifest defines model URI, replica count, external routing and scheduler configuration, gateway reference to maas-default-gateway, vLLM container with model/cache environment variables, CPU/memory/GPU resource bounds, and an HTTPS /health probe on port 8000. Two NetworkPolicy manifests allow ingress to vLLM/llm-d ports (8000, 9002, 9003, 9090) and the Llama Stack port (8321) in the redhat-ods-applications namespace. |
| Traffic Distribution Testing Script: infrastructure/llm-d/test_distribution.py | Python CLI tool for validating an llm-d deployment, featuring pod discovery and metrics collection via oc, concurrent load testing, repeated-prompt prefix-cache routing analysis, and sustained throughput testing with configurable concurrency and duration. Collects before/after per-pod metrics, including request counts, token counters, and cache hit rates, then reports a distribution skew assessment. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a deployment guide and test tooling for llm-d infrastructure on OpenShift AI. |
| Description check | ✅ Passed | The description is well-structured and clearly relates to the changeset, detailing all files added, their purposes, and the test plan. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai (Bot) left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infrastructure/llm-d/llminferenceservice.yaml`:
- Line 27: The annotation security.opendatahub.io/enable-auth: 'false' disables
authentication for the inference service; either flip this annotation to 'true'
on the Service/Ingress resource or, if authless access is intentional for
maas-default-gateway, add an explicit justification and compensate by
restricting access (e.g., apply a NetworkPolicy/ingress whitelist, limit the
Ingress to internal CIDRs, or document in deployment configs/helm values) and
verify the maas-default-gateway indeed has no OAuth proxy; update the annotation
and accompanying deployment documentation/config (reference the
security.opendatahub.io/enable-auth key and maas-default-gateway) accordingly.
- Around line 73-77: The livenessProbe is configured with scheme: HTTPS on port
8000 but vLLM is not configured with TLS; update either the vLLM startup args or
the probe: add TLS flags (--ssl-certfile and --ssl-keyfile) to the
VLLM_ADDITIONAL_ARGS environment variable so vLLM serves HTTPS (ensure valid
cert/key paths and permissions), or change the livenessProbe's scheme from HTTPS
to HTTP to match the existing vLLM HTTP server; locate the livenessProbe block
and the VLLM_ADDITIONAL_ARGS env var in the manifest to apply the appropriate
fix.
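If vLLM stays on plain HTTP, the smaller change is on the probe side. A sketch of the corrected probe block, using the /health path and port 8000 described above (the timing values are illustrative, not from the original manifest):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTP   # match vLLM's HTTP server; HTTPS requires --ssl-certfile/--ssl-keyfile
  initialDelaySeconds: 60   # illustrative
  periodSeconds: 10         # illustrative
```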

In `@infrastructure/llm-d/network-policies.yaml`:
- Around line 52-54: The ingress rule currently uses namespaceSelector: {} which
allows traffic from any namespace; replace that with a namespaceSelector that
matches only the OpenShift router namespace so only the router can reach Llama
Stack. In the ingress/from block, change namespaceSelector: {} to
namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-ingress
(or the cluster's label key that identifies the openshift-ingress namespace) so
the rule only selects the openshift-ingress namespace.
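The suggested restriction can be sketched as follows, using the kubernetes.io/metadata.name label that Kubernetes sets automatically on every namespace, and the Llama Stack port 8321 noted above:

```yaml
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: openshift-ingress
    ports:
      - protocol: TCP
        port: 8321
```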

In `@infrastructure/llm-d/test_distribution.py`:
- Line 195: Several print statements use f-strings with no placeholders (e.g.,
the line printing "  This run (delta):" that currently reads print(f"\n  This
run (delta):")), causing linter failures; remove the unnecessary f-prefix on
those prints so they are plain strings. Update all occurrences mentioned (the
prints at the shown content and the other instances printing static strings) by
changing print(f"...") to print("...") for the listed lines (the identical
prints at the other locations noted in the comment).
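The fix is mechanical: the f-prefix is only needed when the string interpolates a value. A minimal illustration (the `requests` variable is made up):

```python
# ruff F541 flags f-strings that contain no placeholders: the f-prefix is dead weight
# and the rendered string is byte-for-byte identical without it.
print(f"\n  This run (delta):")   # before: triggers F541
print("\n  This run (delta):")    # after: plain string, same output

requests = 3
print(f"Requests: {requests}")    # f-prefix justified: interpolates a value
```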


📥 Commits

Reviewing files that changed from the base of the PR and between 6be61b4 and 0b1e8d8.

📒 Files selected for processing (6)
  • README.md
  • docs/llm-d-deployment.md
  • infrastructure/llm-d/README.md
  • infrastructure/llm-d/llminferenceservice.yaml
  • infrastructure/llm-d/network-policies.yaml
  • infrastructure/llm-d/test_distribution.py

- Remove extraneous f-string prefixes (ruff F541)
- Restrict Llama Stack NetworkPolicy to openshift-ingress namespace only
- Add comments explaining disabled auth annotation and KServe TLS injection

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
hmoghani added 2 commits May 12, 2026 11:35
Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
@hmoghani changed the title from "Add llm-d infrastructure deployment guide and test tooling" to "feat: Add llm-d infrastructure deployment guide and test tooling" on May 12, 2026
@Nehanth

Nehanth commented May 14, 2026

Review by Claude with Nehanth's manual approval

Thanks for working on this Hamid!

Tried following the guide on an RHOAI 3.4.0 cluster. After running the oc patch dsc command to enable Models-as-a-Service, the MaaS controller fails to provision, reporting 3 blocking prerequisites that aren't mentioned in the doc:

  1. PostgreSQL database — needs a maas-db-config secret in redhat-ods-applications with a DB_CONNECTION_URL key
  2. Authorino TLS — needs spec.listener.tls.enabled=true with a certSecretRef on the Authorino instance
  3. User Workload Monitoring — needs to be enabled on the cluster

The operator logs show:

    provisioning failed: blocking prerequisites missing: database Secret 'maas-db-config' not found...
    Authorino TLS is not configured...
    unable to verify User Workload Monitoring status...

The guide should either document these setup steps or note that MaaS requires prior cluster-level configuration. Also, suggesting a specific model to test with would help users get started faster.
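Of the three prerequisites, only the database secret has a fully specified shape in the error above (secret name, namespace, and key). A sketch of creating it, assuming a reachable PostgreSQL instance; the connection URL is a placeholder:

```shell
# Create the secret the MaaS controller looks for; the DB_CONNECTION_URL key name
# comes from the operator's error message. Substitute real connection details.
oc create secret generic maas-db-config \
  -n redhat-ods-applications \
  --from-literal=DB_CONNECTION_URL='postgresql://USER:PASSWORD@HOST:5432/DBNAME'
```

The Authorino TLS and User Workload Monitoring steps are cluster-specific and are better covered by the official RHOAI/OpenShift docs.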

Add Authorino TLS, PostgreSQL database, and User Workload Monitoring
as prerequisites with links to official docs. Add recommended models
table with tested and alternative options.

Addresses review feedback from Nehanth.

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
@hmoghani
Contributor Author

Thanks for testing this @Nehanth ! Great catches. Those 3 prerequisites (PostgreSQL, Authorino TLS, User Workload Monitoring) must have been pre-configured on my cluster so I didn't hit them.

I've updated the guide in the latest commit:

  • Added all three as explicit prerequisites in the table, each linking to the official RHOAI/OpenShift docs for setup
  • Added a note warning that the maas-controller will fail to start without them and to check its logs
  • Added a "Recommended Models" section with openai/gpt-oss-20b (tested) and Llama-3.1-8B-Instruct as a smaller alternative

I didn't include detailed setup steps for these since I haven't validated them myself. Linking to the official docs felt safer than untested instructions.


@coderabbitai (Bot) left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infrastructure/llm-d/network-policies.yaml`:
- Around line 20-35: The network policy currently allows ingress to the llm-d
service from any source on ports 8000/9002/9003/9090; update the
allow-llmd-ingress rule to restrict sources to the actual gateway namespace
rather than redhat-ods-applications by adding a from block modeled on
allow-llamastack-from-router: include a namespaceSelector that matches the
openshift-ingress namespace (the gateway runs in openshift-ingress per
llminferenceservice.yaml) and a podSelector that targets the gateway/router pods
(use the same label keys used in allow-llamastack-from-router), keeping the
existing ports and policyTypes intact so only the gateway in openshift-ingress
can reach the llm-d service.
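A sketch of the suggested from block for allow-llmd-ingress, keeping the existing ports. The gateway pod label below is illustrative: in practice, reuse whichever label keys allow-llamastack-from-router actually matches on.

```yaml
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: openshift-ingress
        podSelector:
          matchLabels:
            # illustrative label; copy the selector used by allow-llamastack-from-router
            gateway.networking.k8s.io/gateway-name: maas-default-gateway
    ports:
      - protocol: TCP
        port: 8000
      - protocol: TCP
        port: 9002
      - protocol: TCP
        port: 9003
      - protocol: TCP
        port: 9090
```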


📥 Commits

Reviewing files that changed from the base of the PR and between 0b1e8d8 and 5abb2b3.

📒 Files selected for processing (4)
  • docs/llm-d-deployment.md
  • infrastructure/llm-d/llminferenceservice.yaml
  • infrastructure/llm-d/network-policies.yaml
  • infrastructure/llm-d/test_distribution.py
✅ Files skipped from review due to trivial changes (1)
  • docs/llm-d-deployment.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • infrastructure/llm-d/llminferenceservice.yaml
  • infrastructure/llm-d/test_distribution.py

@hmoghani merged commit 0be983d into red-hat-data-services:main on May 15, 2026
3 checks passed