feat: Add llm-d infrastructure deployment guide and test tooling (#95)
Conversation
Add deployment guide, Kubernetes manifests, and a traffic distribution test script for deploying llm-d on OpenShift AI with multiple vLLM replicas across GPU nodes. Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Add infrastructure/llm-d to repository structure and link to the llm-d deployment guide in the documentation section. Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
📝 Walkthrough
This PR adds infrastructure and documentation to deploy llm-d on OpenShift AI: a detailed deployment guide, Kubernetes manifests, and a traffic distribution test script.
Changes: LLM-D Infrastructure & Deployment
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@infrastructure/llm-d/llminferenceservice.yaml`:
- Line 27: The annotation `security.opendatahub.io/enable-auth: 'false'` disables authentication for the inference service. Either set the annotation to `'true'`, or, if authless access is intentional for maas-default-gateway, add an explicit justification and compensate by restricting access (e.g., apply a NetworkPolicy ingress allowlist, limit the Ingress to internal CIDRs, or document the exposure in the deployment configs/Helm values). Verify that maas-default-gateway really has no OAuth proxy, then update the annotation and the accompanying deployment documentation accordingly.
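If the intent is to re-enable auth rather than justify authless access, the change is a one-line annotation flip. A minimal sketch, where only the annotation key comes from the manifest under review; the apiVersion and metadata.name are illustrative assumptions:

```yaml
# Sketch: flipping the auth annotation back on.
apiVersion: serving.kserve.io/v1alpha1   # assumed API group for LLMInferenceService
kind: LLMInferenceService
metadata:
  name: my-llm                           # hypothetical name
  annotations:
    security.opendatahub.io/enable-auth: 'true'   # was 'false'
```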
- Lines 73-77: The livenessProbe is configured with scheme: HTTPS on port 8000, but vLLM is not configured with TLS. Fix one side of the mismatch: either add TLS flags (--ssl-certfile and --ssl-keyfile) to the VLLM_ADDITIONAL_ARGS environment variable so vLLM serves HTTPS (with valid cert/key paths and permissions), or change the probe's scheme from HTTPS to HTTP to match the existing plain-HTTP vLLM server. Locate the livenessProbe block and the VLLM_ADDITIONAL_ARGS env var in the manifest to apply whichever fix is appropriate.
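Either side of the mismatch is a few lines. A sketch of both options, assuming vLLM's usual /health endpoint and hypothetical cert/key paths:

```yaml
# Option A: align the probe with vLLM's plain-HTTP server.
livenessProbe:
  httpGet:
    path: /health      # vLLM health endpoint (assumed)
    port: 8000
    scheme: HTTP       # was HTTPS

# Option B: make vLLM itself serve TLS so the HTTPS probe is valid.
# Cert/key paths are hypothetical; they must exist and be readable by the pod.
env:
- name: VLLM_ADDITIONAL_ARGS
  value: "--ssl-certfile=/etc/tls/tls.crt --ssl-keyfile=/etc/tls/tls.key"
```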
In `@infrastructure/llm-d/network-policies.yaml`:
- Lines 52-54: The ingress rule uses namespaceSelector: {}, which allows traffic from any namespace. In the ingress/from block, replace it with a namespaceSelector that matches only the OpenShift router namespace (e.g., matchLabels: kubernetes.io/metadata.name: openshift-ingress, or whatever label key identifies that namespace on the cluster) so only the router can reach Llama Stack.
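The corrected from block would look roughly like the sketch below. The kubernetes.io/metadata.name label is set automatically on namespaces in recent Kubernetes versions, but verify the key on the target cluster:

```yaml
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-ingress
```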
In `@infrastructure/llm-d/test_distribution.py`:
- Line 195: Several print statements use f-strings with no placeholders (e.g., print(f"\n  This run (delta):")), which trips the linter (ruff F541). Drop the unnecessary f-prefix on those prints, changing print(f"...") to print("...") for every occurrence that prints a static string, including the identical prints at the other locations noted in the comment.
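The fix is mechanical; a minimal sketch (the line content is taken from the review comment):

```python
# The two literals below evaluate to identical strings; ruff's F541 flags the
# first because the f-prefix does nothing when there are no {} placeholders.
flagged = f"\n  This run (delta):"   # f-string with no placeholders (F541)
fixed = "\n  This run (delta):"      # plain string literal, same value

print(fixed)
```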
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 35a9edc7-82be-4185-afbc-6d10c9d7534b
📒 Files selected for processing (6)
- README.md
- docs/llm-d-deployment.md
- infrastructure/llm-d/README.md
- infrastructure/llm-d/llminferenceservice.yaml
- infrastructure/llm-d/network-policies.yaml
- infrastructure/llm-d/test_distribution.py
- Remove extraneous f-string prefixes (ruff F541)
- Restrict Llama Stack NetworkPolicy to openshift-ingress namespace only
- Add comments explaining disabled auth annotation and KServe TLS injection
Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Review by Claude with Nehanth's manual approval
Thanks for working on this, Hamid! I tried following the guide on an RHOAI 3.4.0 cluster. After running the
The operator logs show: The guide should either document these setup steps or note that MaaS requires prior cluster-level configuration. Also, suggesting a specific model to test with would help users get started faster.
Add Authorino TLS, PostgreSQL database, and User Workload Monitoring as prerequisites with links to official docs. Add recommended models table with tested and alternative options. Addresses review feedback from Nehanth. Signed-off-by: Hamid Moghani <hmoghani@redhat.com>
Thanks for testing this @Nehanth! Great catches. Those three prerequisites (PostgreSQL, Authorino TLS, User Workload Monitoring) must have been pre-configured on my cluster, so I didn't hit them. I've updated the guide in the latest commit:
I didn't include detailed setup steps for these since I haven't validated them myself; linking to the official docs felt safer than untested instructions.
Actionable comments posted: 1
Inline comments:
In `@infrastructure/llm-d/network-policies.yaml`:
- Lines 20-35: The allow-llmd-ingress rule currently allows ingress to the llm-d service on ports 8000/9002/9003/9090 from any source. Restrict sources to the actual gateway namespace rather than redhat-ods-applications by adding a from block modeled on allow-llamastack-from-router: a namespaceSelector matching the openshift-ingress namespace (the gateway runs in openshift-ingress per llminferenceservice.yaml) and a podSelector targeting the gateway/router pods (reusing the label keys from allow-llamastack-from-router). Keep the existing ports and policyTypes intact so only the gateway in openshift-ingress can reach the llm-d service.
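A sketch of the tightened rule, keeping the ports from the existing policy; the pod label below is a placeholder and should be replaced with the label keys actually used in allow-llamastack-from-router:

```yaml
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-ingress
    podSelector:
      matchLabels:
        app: gateway            # placeholder; copy the router/gateway labels
  ports:
  - { protocol: TCP, port: 8000 }
  - { protocol: TCP, port: 9002 }
  - { protocol: TCP, port: 9003 }
  - { protocol: TCP, port: 9090 }
```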
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: efc30f51-b987-4ca7-8c3d-40f5ba5f5ee3
📒 Files selected for processing (4)
- docs/llm-d-deployment.md
- infrastructure/llm-d/llminferenceservice.yaml
- infrastructure/llm-d/network-policies.yaml
- infrastructure/llm-d/test_distribution.py
✅ Files skipped from review due to trivial changes (1)
- docs/llm-d-deployment.md
🚧 Files skipped from review as they are similar to previous changes (2)
- infrastructure/llm-d/llminferenceservice.yaml
- infrastructure/llm-d/test_distribution.py
Summary
- Add an llm-d deployment guide (docs/llm-d-deployment.md)
- Add infrastructure/llm-d/ with an LLMInferenceService YAML template, NetworkPolicy manifests, and a traffic distribution test script

What's included
- docs/llm-d-deployment.md
- infrastructure/llm-d/llminferenceservice.yaml
- infrastructure/llm-d/network-policies.yaml
- infrastructure/llm-d/test_distribution.py
- infrastructure/llm-d/README.md

All manifests and scripts are fully parameterized with no hardcoded cluster-specific values.

Test plan
- Manifests validated (oc apply --dry-run=client)
- test_distribution.py runs --help without errors
- test_distribution.py exercised against a live llm-d deployment