
feat: Add llm-d infrastructure deployment guide and test tooling (#95)

Merged
hmoghani merged 6 commits into red-hat-data-services:main from hmoghani:add-llm-d-infrastructure
May 15, 2026

Conversation

@hmoghani
Contributor

Summary

  • Add deployment guide for llm-d on OpenShift AI (docs/llm-d-deployment.md)
  • Add infrastructure/llm-d/ with LLMInferenceService YAML template, NetworkPolicy manifests, and a traffic distribution test script
  • Add llm-d references to the main README (repo structure + documentation section)

What's included

| File | Description |
| --- | --- |
| docs/llm-d-deployment.md | Step-by-step guide covering prerequisites, GPU node pool creation, LLMInferenceService deployment, network policies, Llama Stack integration, and verification |
| infrastructure/llm-d/llminferenceservice.yaml | Parameterized LLMInferenceService template with placeholder comments |
| infrastructure/llm-d/network-policies.yaml | Required NetworkPolicies (RHOAI defaults block port 8000) |
| infrastructure/llm-d/test_distribution.py | Test script that validates llm-d's intelligent routing across vLLM replicas (concurrent load, prefix cache routing, sustained throughput, per-pod metrics) |
| infrastructure/llm-d/README.md | Overview and quick start |

All manifests and scripts are fully parameterized with no hardcoded cluster-specific values.

Test plan

  • Review YAML templates for correctness (oc apply --dry-run=client)
  • Verify test script runs with --help without errors
  • Deploy on an OpenShift AI 3.4+ cluster following the guide end-to-end
  • Run test_distribution.py against a live llm-d deployment
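The first two checks can be run locally without touching a cluster; a sketch, assuming the file paths listed above:

```shell
# Client-side validation of the YAML templates (no resources are created)
oc apply --dry-run=client -f infrastructure/llm-d/llminferenceservice.yaml
oc apply --dry-run=client -f infrastructure/llm-d/network-policies.yaml

# Smoke-test the distribution script's CLI surface
python infrastructure/llm-d/test_distribution.py --help
```

The remaining two items need a live OpenShift AI 3.4+ cluster and an actual llm-d deployment.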

hmoghani added 2 commits May 12, 2026 11:21
Add deployment guide, Kubernetes manifests, and a traffic distribution
test script for deploying llm-d on OpenShift AI with multiple vLLM
replicas across GPU nodes.

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Add infrastructure/llm-d to repository structure and link to the
llm-d deployment guide in the documentation section.

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

📝 Walkthrough

This PR adds infrastructure and documentation to deploy llm-d on OpenShift AI: a detailed deployment guide, LLMInferenceService and NetworkPolicy manifests, an infra README, README links, and a Python traffic-distribution test script.

Changes

LLM-D Infrastructure & Deployment

| Layer / File(s) | Summary |
| --- | --- |
| Documentation Overview & Navigation: README.md, infrastructure/llm-d/README.md | Repository README adds a reference to the new infrastructure/llm-d/ directory and links to the deployment guide. The llm-d component README provides overview text, a contents index linking to manifests and the test script, and quick-start command examples. |
| Comprehensive Deployment Guide: docs/llm-d-deployment.md | Full step-by-step deployment walkthrough covering prerequisites (OpenShift, RHOAI, GPU drivers), GPU node pool setup, Red Hat registry authentication, LLMInferenceService deployment from template with scheduler configuration, network policy application for port access, optional Llama Stack integration with environment variables, verification commands (readiness checks, test inference, traffic distribution script), and key behavioral notes on routing, gateway selection, and constraints. |
| Kubernetes Resource Manifests: infrastructure/llm-d/llminferenceservice.yaml, infrastructure/llm-d/network-policies.yaml | LLMInferenceService manifest defines model URI, replica count, external routing and scheduler configuration, gateway reference to maas-default-gateway, vLLM container with model/cache environment variables, CPU/memory/GPU resource bounds, and an HTTPS /health probe on port 8000. Two NetworkPolicy manifests allow ingress to vLLM/llm-d ports (8000, 9002, 9003, 9090) and the Llama Stack port (8321) in the redhat-ods-applications namespace. |
| Traffic Distribution Testing Script: infrastructure/llm-d/test_distribution.py | Python CLI tool for validating an llm-d deployment, featuring pod discovery and metrics collection via oc, concurrent load testing, repeated-prompt prefix-cache routing analysis, and sustained throughput testing with configurable concurrency and duration. Collects before/after per-pod metrics, including request counts, token counters, and cache hit rates, then reports a distribution skew assessment. |

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a deployment guide and test tooling for llm-d infrastructure on OpenShift AI. |
| Description check | ✅ Passed | The description is well-structured and clearly relates to the changeset, detailing all files added, their purposes, and the test plan. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai (Bot) left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infrastructure/llm-d/llminferenceservice.yaml`:
- Line 27: The annotation security.opendatahub.io/enable-auth: 'false' disables
authentication for the inference service; either flip this annotation to 'true'
on the Service/Ingress resource or, if authless access is intentional for
maas-default-gateway, add an explicit justification and compensate by
restricting access (e.g., apply a NetworkPolicy/ingress whitelist, limit the
Ingress to internal CIDRs, or document in deployment configs/helm values) and
verify the maas-default-gateway indeed has no OAuth proxy; update the annotation
and accompanying deployment documentation/config (reference the
security.opendatahub.io/enable-auth key and maas-default-gateway) accordingly.
- Around line 73-77: The livenessProbe is configured with scheme: HTTPS on port
8000 but vLLM is not configured with TLS; update either the vLLM startup args or
the probe: add TLS flags (--ssl-certfile and --ssl-keyfile) to the
VLLM_ADDITIONAL_ARGS environment variable so vLLM serves HTTPS (ensure valid
cert/key paths and permissions), or change the livenessProbe's scheme from HTTPS
to HTTP to match the existing vLLM HTTP server; locate the livenessProbe block
and the VLLM_ADDITIONAL_ARGS env var in the manifest to apply the appropriate
fix.
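If vLLM stays on plain HTTP, the smaller change is on the probe side. A sketch of the corrected probe block, using the /health path and port 8000 described above (the timing values are illustrative, not from the original manifest):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTP   # match vLLM's HTTP server; HTTPS requires --ssl-certfile/--ssl-keyfile
  initialDelaySeconds: 60   # illustrative
  periodSeconds: 10         # illustrative
```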

In `@infrastructure/llm-d/network-policies.yaml`:
- Around line 52-54: The ingress rule currently uses namespaceSelector: {} which
allows traffic from any namespace; replace that with a namespaceSelector that
matches only the OpenShift router namespace so only the router can reach Llama
Stack. In the ingress/from block, change namespaceSelector: {} to
namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-ingress
(or the cluster's label key that identifies the openshift-ingress namespace) so
the rule only selects the openshift-ingress namespace.
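The suggested restriction can be sketched as follows, using the kubernetes.io/metadata.name label that Kubernetes sets automatically on every namespace, and the Llama Stack port 8321 noted above:

```yaml
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: openshift-ingress
    ports:
      - protocol: TCP
        port: 8321
```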

In `@infrastructure/llm-d/test_distribution.py`:
- Line 195: Several print statements use f-strings with no placeholders (e.g.,
the line printing "  This run (delta):" that currently reads print(f"\n  This
run (delta):")), causing linter failures; remove the unnecessary f-prefix on
those prints so they are plain strings. Update all occurrences mentioned (the
prints at the shown content and the other instances printing static strings) by
changing print(f"...") to print("...") for the listed lines (the identical
prints at the other locations noted in the comment).
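The fix is mechanical: the f-prefix is only needed when the string interpolates a value. A minimal illustration (the `requests` variable is made up):

```python
# ruff F541 flags f-strings that contain no placeholders: the f-prefix is dead weight
# and the rendered string is byte-for-byte identical without it.
print(f"\n  This run (delta):")   # before: triggers F541
print("\n  This run (delta):")    # after: plain string, same output

requests = 3
print(f"Requests: {requests}")    # f-prefix justified: interpolates a value
```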


📥 Commits

Reviewing files that changed from the base of the PR and between 6be61b4 and 0b1e8d8.

📒 Files selected for processing (6)
  • README.md
  • docs/llm-d-deployment.md
  • infrastructure/llm-d/README.md
  • infrastructure/llm-d/llminferenceservice.yaml
  • infrastructure/llm-d/network-policies.yaml
  • infrastructure/llm-d/test_distribution.py

- Remove extraneous f-string prefixes (ruff F541)
- Restrict Llama Stack NetworkPolicy to openshift-ingress namespace only
- Add comments explaining disabled auth annotation and KServe TLS injection

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
hmoghani added 2 commits May 12, 2026 11:35
Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
@hmoghani changed the title from "Add llm-d infrastructure deployment guide and test tooling" to "feat: Add llm-d infrastructure deployment guide and test tooling" on May 12, 2026
@Nehanth

Nehanth commented May 14, 2026

Review by Claude with Nehanth's manual approval

Thanks for working on this Hamid!

Tried following the guide on an RHOAI 3.4.0 cluster. After running the oc patch dsc command to enable Models-as-a-Service, the MaaS controller fails to provision, reporting 3 blocking prerequisites that aren't mentioned in the doc:

  1. PostgreSQL database — needs a maas-db-config secret in redhat-ods-applications with a DB_CONNECTION_URL key
  2. Authorino TLS — needs spec.listener.tls.enabled=true with a certSecretRef on the Authorino instance
  3. User Workload Monitoring — needs to be enabled on the cluster

The operator logs show:

    provisioning failed: blocking prerequisites missing: database Secret 'maas-db-config' not found...
    Authorino TLS is not configured...
    unable to verify User Workload Monitoring status...

The guide should either document these setup steps or note that MaaS requires prior cluster-level configuration. Also, suggesting a specific model to test with would help users get started faster.
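Of the three prerequisites, only the database secret has a fully specified shape in the error above (secret name, namespace, and key). A sketch of creating it, assuming a reachable PostgreSQL instance; the connection URL is a placeholder:

```shell
# Create the secret the MaaS controller looks for; the DB_CONNECTION_URL key name
# comes from the operator's error message. Substitute real connection details.
oc create secret generic maas-db-config \
  -n redhat-ods-applications \
  --from-literal=DB_CONNECTION_URL='postgresql://USER:PASSWORD@HOST:5432/DBNAME'
```

The Authorino TLS and User Workload Monitoring steps are cluster-specific and are better covered by the official RHOAI/OpenShift docs.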

Add Authorino TLS, PostgreSQL database, and User Workload Monitoring
as prerequisites with links to official docs. Add recommended models
table with tested and alternative options.

Addresses review feedback from Nehanth.

Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
@hmoghani
Contributor Author

Thanks for testing this @Nehanth ! Great catches. Those 3 prerequisites (PostgreSQL, Authorino TLS, User Workload Monitoring) must have been pre-configured on my cluster so I didn't hit them.

I've updated the guide in the latest commit:

  • Added all three as explicit prerequisites in the table, each linking to the official RHOAI/OpenShift docs for setup
  • Added a note warning that the maas-controller will fail to start without them and to check its logs
  • Added a "Recommended Models" section with openai/gpt-oss-20b (tested) and Llama-3.1-8B-Instruct as a smaller alternative

I didn't include detailed setup steps for these since I haven't validated them myself. Linking to the official docs felt safer than untested instructions.


@coderabbitai (Bot) left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@infrastructure/llm-d/network-policies.yaml`:
- Around line 20-35: The network policy currently allows ingress to the llm-d
service from any source on ports 8000/9002/9003/9090; update the
allow-llmd-ingress rule to restrict sources to the actual gateway namespace
rather than redhat-ods-applications by adding a from block modeled on
allow-llamastack-from-router: include a namespaceSelector that matches the
openshift-ingress namespace (the gateway runs in openshift-ingress per
llminferenceservice.yaml) and a podSelector that targets the gateway/router pods
(use the same label keys used in allow-llamastack-from-router), keeping the
existing ports and policyTypes intact so only the gateway in openshift-ingress
can reach the llm-d service.
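A sketch of the suggested from block for allow-llmd-ingress, keeping the existing ports. The gateway pod label below is illustrative: in practice, reuse whichever label keys allow-llamastack-from-router actually matches on.

```yaml
ingress:
  - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: openshift-ingress
        podSelector:
          matchLabels:
            # illustrative label; copy the selector used by allow-llamastack-from-router
            gateway.networking.k8s.io/gateway-name: maas-default-gateway
    ports:
      - protocol: TCP
        port: 8000
      - protocol: TCP
        port: 9002
      - protocol: TCP
        port: 9003
      - protocol: TCP
        port: 9090
```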


📥 Commits

Reviewing files that changed from the base of the PR and between 0b1e8d8 and 5abb2b3.

📒 Files selected for processing (4)
  • docs/llm-d-deployment.md
  • infrastructure/llm-d/llminferenceservice.yaml
  • infrastructure/llm-d/network-policies.yaml
  • infrastructure/llm-d/test_distribution.py
✅ Files skipped from review due to trivial changes (1)
  • docs/llm-d-deployment.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • infrastructure/llm-d/llminferenceservice.yaml
  • infrastructure/llm-d/test_distribution.py

@hmoghani merged commit 0be983d into red-hat-data-services:main on May 15, 2026
3 checks passed