
feat: Support epp's "pods" interface in Dynamo fixes [DEP-424]#6302

Merged
atchernych merged 24 commits into main from dep-424-pod-iterface on Feb 25, 2026

@atchernych (Contributor) commented Feb 14, 2026

Overview:

Support the EPP "pods" interface in Dynamo (fixes [DEP-424]).

Details:

  • EPP passes a set of pods with each query() request. KvRouter uses this set as a filter when selecting the best pod.
    The pods from EPP do not overwrite KvRouter's internal pod storage or discovery.

  • Support deploying the Frontend as a sidecar for GAIE integration (required for this interface).
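The filtering behavior in the first bullet can be sketched in Go as follows. This is a minimal illustration, not the KvRouter implementation (which lives in Rust): the `Worker` type, the assumption that EPP identifies pods by name, and the choice to treat an empty EPP set as "no filter" are all hypothetical. The point it shows is that the EPP set only narrows the candidate list built from the router's own discovery and never replaces that list.

```go
package main

import "fmt"

// Worker is a hypothetical view of a worker the router knows from its own
// discovery. PodName is assumed to be the key EPP uses to identify pods.
type Worker struct {
	ID      uint64
	PodName string
}

// filterByEPPPods returns the discovered workers whose pod appears in the set
// EPP sent with the request. It builds a new slice and never mutates the
// discovered list. Treating an empty EPP set as "no filter" is an assumption.
func filterByEPPPods(discovered []Worker, eppPods []string) []Worker {
	if len(eppPods) == 0 {
		return discovered
	}
	allowed := make(map[string]struct{}, len(eppPods))
	for _, p := range eppPods {
		allowed[p] = struct{}{}
	}
	out := make([]Worker, 0, len(discovered))
	for _, w := range discovered {
		if _, ok := allowed[w.PodName]; ok {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	discovered := []Worker{{1, "decode-0"}, {2, "decode-1"}, {3, "decode-2"}}
	// Pods EPP knows about but discovery does not ("unknown-pod") are ignored.
	candidates := filterByEPPPods(discovered, []string{"decode-2", "decode-0", "unknown-pod"})
	fmt.Println(candidates)      // [{1 decode-0} {3 decode-2}]
	fmt.Println(len(discovered)) // 3: internal state is unchanged
}
```

Iterating over the discovered list (rather than the EPP list) keeps discovery as the source of truth, so a pod that EPP reports but the router has not discovered cannot be selected.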

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

  • New Features

    • Added label-based filtering for intelligent pod selection in the inference gateway
    • Introduced disaggregated deployment configuration for Llama 3 70B with dedicated prefill and decode worker separation
    • Added HTTP Gateway API routing configuration for improved traffic management
  • Improvements

    • Enhanced routing logic to consider pod availability when computing worker assignment

Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@atchernych atchernych requested reviews from several teams (including code owners) February 14, 2026 02:37
@github-actions github-actions Bot added the feat label Feb 14, 2026
coderabbitai Bot commented Feb 14, 2026

Walkthrough

This pull request introduces pod-aware label-based filtering to the Dynamo inference gateway. A new LabelFilter plugin filters pods based on configured label key-value pairs, integrated with updated router calls that thread pod information through the system. The C FFI interface is extended to accept pod JSON data, and new deployment manifests define a disaggregated llama-3-70b vLLM setup with prefill and decode workers.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Label Filter Plugin Infrastructure | `deploy/inference-gateway/epp/cmd/epp/main.go`, `deploy/inference-gateway/epp/pkg/plugins/label_filter/plugin.go`, `deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go` | Introduces the LabelFilter plugin with factory and TypedName methods and registers it in main initialization. Updates the dynamo_kv_scorer routing calls to accept a pods parameter and pass it to the underlying router for worker selection. |
| C FFI Bindings | `lib/bindings/c/src/lib.rs` | Extends the route_request function signature with a new pods_json parameter; adds conditional pod JSON parsing with error handling and logging. |
| Deployment & Routing Manifests | `recipes/llama-3-70b/vllm/agg/gaie/deploy.yaml`, `recipes/llama-3-70b/vllm/disagg-single-node/gaie/*` | Renames VllmPrefillWorker to VllmDecodeWorker in the aggregated deployment. Introduces a new disaggregated single-node deployment manifest with Epp, Frontend, and separate prefill/decode workers; adds an HTTPRoute for external access. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A label-filtered path so clean,
Where pods hop to their destined scene,
Prefill and decode, side by side,
With C bindings deep and routes that guide,
Dynamo's wisdom, now pod-aware,
Llama leaps with filtering flair!

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Merge Conflict Detection (⚠️ Warning): ❌ Merge conflicts detected (120 files):

⚔️ .devcontainer/README.md (content)
⚔️ .github/filters.yaml (content)
⚔️ .github/scripts/route_buildkit.sh (content)
⚔️ .github/workflows/pre-merge.yml (content)
⚔️ Cargo.lock (content)
⚔️ Cargo.toml (content)
⚔️ README.md (content)
⚔️ components/src/dynamo/common/configuration/groups/runtime_args.py (content)
⚔️ components/src/dynamo/common/utils/runtime.py (content)
⚔️ components/src/dynamo/frontend/frontend_args.py (content)
⚔️ components/src/dynamo/frontend/main.py (content)
⚔️ components/src/dynamo/frontend/vllm_processor.py (content)
⚔️ components/src/dynamo/global_router/__main__.py (content)
⚔️ components/src/dynamo/mocker/args.py (content)
⚔️ components/src/dynamo/mocker/main.py (content)
⚔️ components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml (content)
⚔️ components/src/dynamo/profiler/deploy/profile_sla_dgdr.yaml (content)
⚔️ components/src/dynamo/profiler/deploy/profile_sla_moe_dgdr.yaml (content)
⚔️ components/src/dynamo/profiler/profile_sla.py (content)
⚔️ components/src/dynamo/profiler/utils/config_modifiers/protocol.py (content)
⚔️ components/src/dynamo/profiler/utils/defaults.py (content)
⚔️ components/src/dynamo/profiler/utils/estimate_perf.py (content)
⚔️ components/src/dynamo/profiler/utils/profiler_argparse.py (content)
⚔️ components/src/dynamo/profiler/utils/search_space_autogen.py (content)
⚔️ components/src/dynamo/sglang/args.py (content)
⚔️ components/src/dynamo/sglang/main.py (content)
⚔️ components/src/dynamo/sglang/register.py (content)
⚔️ components/src/dynamo/trtllm/configs/diffusion_config.py (content)
⚔️ components/src/dynamo/trtllm/main.py (content)
⚔️ components/src/dynamo/trtllm/utils/trtllm_utils.py (content)
⚔️ components/src/dynamo/trtllm/workers/llm_worker.py (content)
⚔️ components/src/dynamo/trtllm/workers/video_diffusion_worker.py (content)
⚔️ components/src/dynamo/vllm/args.py (content)
⚔️ components/src/dynamo/vllm/backend_args.py (content)
⚔️ components/src/dynamo/vllm/handlers.py (content)
⚔️ components/src/dynamo/vllm/main.py (content)
⚔️ components/src/dynamo/vllm/tests/test_vllm_unit.py (content)
⚔️ container/context.yaml (content)
⚔️ container/templates/args.Dockerfile (content)
⚔️ container/templates/vllm_runtime.Dockerfile (content)
⚔️ deploy/helm/charts/crds/templates/nvidia.com_dynamographdeploymentrequests.yaml (content)
⚔️ deploy/inference-gateway/epp/cmd/epp/main.go (content)
⚔️ deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go (content)
⚔️ deploy/operator/api/v1alpha1/common.go (content)
⚔️ deploy/operator/api/v1alpha1/dynamographdeploymentrequest_types.go (content)
⚔️ deploy/operator/api/v1alpha1/zz_generated.deepcopy.go (content)
⚔️ deploy/operator/cmd/main.go (content)
⚔️ deploy/operator/config/crd/bases/nvidia.com_dynamographdeploymentrequests.yaml (content)
⚔️ deploy/operator/config/samples/nvidia.com_v1alpha1_dynamographdeploymentrequest.yaml (content)
⚔️ deploy/operator/go.mod (content)
⚔️ deploy/operator/go.sum (content)
⚔️ deploy/operator/internal/controller/dynamocomponentdeployment_controller.go (content)
⚔️ deploy/operator/internal/controller/dynamocomponentdeployment_controller_test.go (content)
⚔️ deploy/operator/internal/controller/dynamographdeployment_controller.go (content)
⚔️ deploy/operator/internal/controller/dynamographdeploymentrequest_controller.go (content)
⚔️ deploy/operator/internal/controller/dynamographdeploymentrequest_controller_test.go (content)
⚔️ deploy/operator/internal/dynamo/backend_trtllm.go (content)
⚔️ deploy/operator/internal/dynamo/backend_trtllm_test.go (content)
⚔️ deploy/operator/internal/webhook/validation/dynamographdeploymentrequest.go (content)
⚔️ deploy/operator/internal/webhook/validation/dynamographdeploymentrequest_test.go (content)
⚔️ docs/pages/backends/sglang/README.md (content)
⚔️ docs/pages/backends/trtllm/README.md (content)
⚔️ docs/pages/backends/vllm/README.md (content)
⚔️ docs/pages/components/frontend/README.md (content)
⚔️ docs/pages/components/frontend/frontend-guide.md (content)
⚔️ docs/pages/components/kvbm/kvbm-guide.md (content)
⚔️ docs/pages/components/profiler/profiler-guide.md (content)
⚔️ docs/pages/components/router/README.md (content)
⚔️ docs/pages/components/router/router-examples.md (content)
⚔️ docs/pages/components/router/router-guide.md (content)
⚔️ docs/pages/design-docs/discovery-plane.md (content)
⚔️ docs/pages/design-docs/distributed-runtime.md (content)
⚔️ docs/pages/development/backend-guide.md (content)
⚔️ docs/pages/features/multimodal/multimodal-trtllm.md (content)
⚔️ docs/pages/features/multimodal/multimodal-vllm.md (content)
⚔️ docs/pages/getting-started/quickstart.md (content)
⚔️ docs/pages/integrations/kv-events-custom-engines.md (content)
⚔️ docs/pages/kubernetes/api-reference.md (content)
⚔️ examples/backends/tritonserver/README.md (content)
⚔️ examples/backends/tritonserver/launch/identity.sh (content)
⚔️ examples/backends/tritonserver/src/tritonworker.py (content)
⚔️ examples/custom_backend/hello_world/README.md (content)
⚔️ examples/multimodal/components/processor.py (content)
⚔️ lib/async-openai/src/types/assistant.rs (content)
⚔️ lib/async-openai/src/types/chat.rs (content)
⚔️ lib/async-openai/src/types/completion.rs (content)
⚔️ lib/bindings/c/src/lib.rs (content)
⚔️ lib/bindings/kvbm/Cargo.lock (content)
⚔️ lib/bindings/kvbm/README.md (content)
⚔️ lib/bindings/kvbm/src/block_manager/cache_stats.rs (content)
⚔️ lib/bindings/python/Cargo.toml (content)
⚔️ lib/bindings/python/examples/hello_world/server_sglang.py (content)
⚔️ lib/bindings/python/examples/hello_world/server_sglang_tok.py (content)
⚔️ lib/bindings/python/examples/hello_world/server_vllm.py (content)
⚔️ lib/bindings/python/rust/lib.rs (content)
⚔️ lib/bindings/python/src/dynamo/_core.pyi (content)
⚔️ lib/bindings/python/src/dynamo/llm/__init__.py (content)
⚔️ lib/bindings/python/src/dynamo/runtime/__init__.py (content)
⚔️ lib/bindings/python/tests/conftest.py (content)
⚔️ lib/bindings/python/tests/test_tensor.py (content)
⚔️ lib/llm/src/entrypoint/input/http.rs (content)
⚔️ lib/llm/src/http/service/service_v2.rs (content)
⚔️ lib/llm/src/local_model.rs (content)
⚔️ lib/llm/src/preprocessor/media/README.md (content)
⚔️ lib/llm/src/protocols/openai/validate.rs (content)
⚔️ lib/mocker/src/perf_model.rs (content)
⚔️ lib/runtime/examples/Cargo.lock (content)
⚔️ lib/runtime/src/discovery/kv_store.rs (content)
⚔️ lib/runtime/src/discovery/mod.rs (content)
⚔️ lib/runtime/src/distributed.rs (content)
⚔️ lib/runtime/src/storage/kv/etcd.rs (content)
⚔️ recipes/llama-3-70b/vllm/agg/gaie/deploy.yaml (content)
⚔️ tests/conftest.py (content)
⚔️ tests/frontend/grpc/echo_tensor_worker.py (content)
⚔️ tests/frontend/test_prompt_embeds.py (content)
⚔️ tests/profiler/test_profile_sla_aiconfigurator.py (content)
⚔️ tests/profiler/test_profile_sla_dryrun.py (content)
⚔️ tests/router/common.py (content)
⚔️ tests/router/test_router_e2e_with_mockers.py (content)
⚔️ tests/serve/launch/template_verifier.py (content)

    Resolution: These conflicts must be resolved before merging into main. Resolve conflicts locally and push changes to this branch.
  • Description check (❓ Inconclusive): The PR description includes all template sections (Overview, Details, Where should the reviewer start, Related Issues) but lacks specific implementation details and file pointers.
    Resolution: Provide specific file paths and implementation details in the 'Where should the reviewer start?' section. Replace the placeholder 'closes GitHub issue: #xxx' with the actual issue number DEP-424 or its corresponding GitHub issue link.
✅ Passed checks (2 passed)
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Title check (✅ Passed): The title clearly summarizes the main change: adding support for EPP's pods interface in Dynamo, addressing issue DEP-424.


coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go (1)

60-62: ⚠️ Potential issue | 🔴 Critical

Critical: CGo declaration of route_request is missing the new pods_json parameter.

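The Rust FFI signature was updated to accept 4 parameters (handle, request_json, pods_json, out_result), but the CGo header declaration still declares the old 3-parameter version. C symbols carry no parameter information, so this does not surface as a linker error: a Go call that passes four arguments fails to compile against the stale declaration, while one that still passes three links cleanly but leaves the Rust side reading pods_json and out_result from the wrong argument slots, which is undefined behavior at runtime.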

Update the CGo declaration to include the new parameter:

Proposed fix

```diff
-query_router_result_t route_request(RouterHandles *handle,
-                                    const char *request_json,
-                                    CRoutingResult *out_result);
+query_router_result_t route_request(RouterHandles *handle,
+                                    const char *request_json,
+                                    const char *pods_json,
+                                    CRoutingResult *out_result);
```
🤖 Fix all issues with AI agents
In `@deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go`:
- Around line 474-479: The call to C.route_request is passing the Go slice pods
(type []schedtypes.Pod) directly where the C/Rust signature expects a C string
pointer (pods_json: *const c_char); fix by serializing pods to JSON and
converting that JSON to a C string (use C.CString) and pass that pointer to
C.route_request (ensure you free the C string after the call), or if you prefer
the minimal change for now (since Rust ignores _pods), pass nil instead of pods;
update the invocation around C.route_request (referencing symbols: pods,
C.route_request, cRequestJSON, result/CRoutingResult) accordingly.
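The serialization step suggested above can be sketched in pure Go. This is an assumption-laden illustration: `podRef` and its JSON field names are hypothetical and are not the schema that `lib.rs` actually parses, and `encodePodsJSON` is a hypothetical helper. The cgo part (converting with `C.CString` and releasing with `C.free`) appears only as comments, because it needs the real preamble declarations to compile.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podRef is a hypothetical, minimal wire shape for one pod. The field names
// are assumptions, not the schema the Rust side deserializes.
type podRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Address   string `json:"address"`
}

// encodePodsJSON serializes candidate pods for the pods_json argument.
// ok is false when there are no pods; in that case the caller would pass a
// nil *C.char rather than an empty JSON array (an assumed convention).
func encodePodsJSON(pods []podRef) (s string, ok bool, err error) {
	if len(pods) == 0 {
		return "", false, nil
	}
	b, err := json.Marshal(pods)
	if err != nil {
		return "", false, err
	}
	return string(b), true, nil
}

func main() {
	s, ok, err := encodePodsJSON([]podRef{{"decode-0", "default", "10.0.0.7"}})
	fmt.Println(s, ok, err)
	// In the cgo caller (needs #include <stdlib.h> and import "unsafe"):
	//   cPods := C.CString(s)
	//   defer C.free(unsafe.Pointer(cPods))
	//   rc := C.route_request(handle, cRequestJSON, cPods, &result)
}
```

Freeing with `defer` right after `C.CString` keeps the C allocation from leaking on early returns; this assumes the Rust side copies or parses the string before `route_request` returns and does not keep the pointer.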

In `@deploy/inference-gateway/epp/pkg/plugins/label_filter/plugin.go`:
- Around line 38-58: The plugin factory (LabelFilterFactory) expects JSON
matching LabelFilterConfig with fields "key" and "value", but the deployment
uses "label", "validValues", and "allowsNoLabel", causing cfg.Key to be empty
and the factory to error; fix by aligning schemas: either update
LabelFilterConfig (and LabelFilterFactory) to accept the deployment fields
(rename Key->Label, Value->ValidValues/validate first entry, and add
AllowsNoLabel handling) and adapt NewLabelFilter invocation accordingly, or
update the deployment YAML to provide "key" and "value" fields (or a single
value from validValues) so that LabelFilterFactory, LabelFilterConfig, and the
call to NewLabelFilter(cfg.Key, cfg.Value) receive the expected values.
- Around line 90-101: The Filter method on LabelFilter incorrectly treats an
empty f.value as matching pods missing the label because
pod.GetPod().Labels[f.key] returns "" for absent keys; update Filter
(LabelFilter.Filter) to use the two-value map lookup (e.g., val, ok :=
pod.GetPod().Labels[f.key]) and only compare val == f.value when ok is true, or
alternatively enforce non-empty f.value when constructing the LabelFilter in the
factory so empty values are rejected; modify either the Filter function (use ok
check) or the factory that creates LabelFilter (validate f.value != "") to
ensure absent labels don't falsely match.

In `@recipes/llama-3-70b/vllm/disagg-single-node/gaie/deploy.yaml`:
- Around line 24-53: The EPP YAML references plugins that aren't registered or
misnamed: replace or register the missing plugins and align label-filter params
with its implementation. Either change pluginRef: picker to the actual scorer
plugin used (e.g., pluginRef: kv-aware-scorer) or add a registered plugin named
picker/max-score-picker in main.go; register the dyn-kv-prefill and
pd-profile-handler plugin types in main.go (or remove them from
schedulingProfiles) so they are available at runtime; and update the
label-filter parameter keys (label/validValues/allowsNoLabel) to the exact
parameter names expected by the label-filter implementation (or change the
implementation to accept the YAML names) so the schema matches. Ensure all
referenced plugin type strings (dyn-kv-prefill, pd-profile-handler,
picker/max-score-picker, label-filter, kv-aware-scorer) are consistently defined
and registered in main.go.
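One way to catch this class of mismatch early is a startup check that every plugin type referenced by the EPP config is registered. The Go sketch below is hypothetical: `factory`, `missingPluginTypes`, and the sample registered/referenced names are illustrative values, not the actual registry in `main.go` or the real contents of the deploy.yaml.

```go
package main

import (
	"fmt"
	"sort"
)

// factory is a hypothetical plugin-factory signature used only for this sketch.
type factory func(params map[string]any) (any, error)

// missingPluginTypes returns, sorted and de-duplicated, every referenced
// plugin type that has no registered factory.
func missingPluginTypes(registered map[string]factory, referenced []string) []string {
	seen := map[string]bool{}
	var missing []string
	for _, t := range referenced {
		if _, ok := registered[t]; !ok && !seen[t] {
			seen[t] = true
			missing = append(missing, t)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	noop := func(map[string]any) (any, error) { return nil, nil }
	// Sample values chosen for illustration only.
	registered := map[string]factory{"label-filter": noop, "kv-aware-scorer": noop}
	referenced := []string{"label-filter", "dyn-kv-prefill", "pd-profile-handler", "max-score-picker", "kv-aware-scorer"}
	fmt.Println(missingPluginTypes(registered, referenced))
	// [dyn-kv-prefill max-score-picker pd-profile-handler]
}
```

Failing fast on a non-empty result surfaces a misnamed or unregistered plugin at pod startup rather than at the first scheduling call.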
🧹 Nitpick comments (1)
lib/bindings/c/src/lib.rs (1)

988-1005: Parsed pods_json is validated but never used in routing logic.

The parsed pods are stored in _pods (underscore-prefixed) and discarded. The routing path on lines 1007–1101 still relies entirely on discovery. If this is intentional scaffolding for a future PR, consider adding a brief // TODO comment so the intent is clear to other contributors.

Also, tracing::info! on line 993 will fire on every routed request when pods are provided; consider downgrading it to debug level for production.

Comment thread deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go Outdated
Comment thread deploy/inference-gateway/epp/pkg/plugins/label_filter/plugin.go Outdated
Comment thread deploy/inference-gateway/epp/pkg/plugins/label_filter/plugin.go
Comment thread recipes/llama-3-70b/vllm/disagg-single-node/gaie/deploy.yaml Outdated
@atchernych atchernych changed the title feat: Support epp pods interface in Dynamo fixes [DEP-424] feat: Support epp's "pods" interface in Dynamo fixes [DEP-424] Feb 17, 2026
@github-actions github-actions Bot added the router Relates to routing, KV-aware routing, etc. label Feb 18, 2026
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Feb 18, 2026
@github-actions github-actions Bot added the backend::vllm Relates to the vllm backend label Feb 24, 2026
Comment thread lib/llm/src/kv_router.rs
Comment thread lib/llm/src/kv_router/queue.rs Outdated
Comment thread lib/llm/src/kv_router/queue.rs Outdated
github-actions Bot commented Feb 25, 2026

@atchernych atchernych enabled auto-merge (squash) February 25, 2026 01:14
@atchernych atchernych requested a review from PeaBrane February 25, 2026 01:30
@atchernych atchernych merged commit c916cd4 into main Feb 25, 2026
86 of 91 checks passed
@atchernych atchernych deleted the dep-424-pod-iterface branch February 25, 2026 01:54
to parse the parameters.
format: byte
type: string
x-kubernetes-preserve-unknown-fields: true
Contributor Author

here


Labels

backend::vllm (Relates to the vllm backend), deployment::k8s (Relates to dynamo deployment in kubernetes), documentation (Improvements or additions to documentation), feat, router (Relates to routing, KV-aware routing, etc.), size/XXL


3 participants