feat: Support epp's "pods" interface in Dynamo fixes [DEP-424]#6302
atchernych merged 24 commits into main
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Walkthrough
This pull request introduces pod-aware label-based filtering to the Dynamo inference gateway. A new LabelFilter plugin filters pods based on configured label key-value pairs, integrated with updated router calls that thread pod information through the system. The C FFI interface is extended to accept pod JSON data, and new deployment manifests define a disaggregated llama-3-70b vLLM setup with prefill and decode workers.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 2 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go (1)
60-62: ⚠️ Potential issue | 🔴 Critical
Critical: CGo declaration of `route_request` is missing the new `pods_json` parameter.
The Rust FFI signature was updated to accept 4 parameters (`handle`, `request_json`, `pods_json`, `out_result`), but the CGo header declaration still declares the old 3-parameter version. Because C symbols carry no type information, the linker will not catch this: cgo rejects a 4-argument call against the 3-parameter declaration, and a 3-argument call would invoke the Rust function with a mismatched argument list.
Update the CGo declaration to include the new parameter:
Proposed fix
    -query_router_result_t route_request(RouterHandles *handle,
    -                                    const char *request_json,
    -                                    CRoutingResult *out_result);
    +query_router_result_t route_request(RouterHandles *handle,
    +                                    const char *request_json,
    +                                    const char *pods_json,
    +                                    CRoutingResult *out_result);
🤖 Fix all issues with AI agents
In `@deploy/inference-gateway/epp/pkg/plugins/dynamo_kv_scorer/plugin.go`:
- Around line 474-479: The call to C.route_request is passing the Go slice pods
(type []schedtypes.Pod) directly where the C/Rust signature expects a C string
pointer (pods_json: *const c_char); fix by serializing pods to JSON and
converting that JSON to a C string (use C.CString) and pass that pointer to
C.route_request (ensure you free the C string after the call), or if you prefer
the minimal change for now (since Rust ignores _pods), pass nil instead of pods;
update the invocation around C.route_request (referencing symbols: pods,
C.route_request, cRequestJSON, result/CRoutingResult) accordingly.
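The serialization step described above could look roughly like the following. This is a minimal sketch, not the PR's code: the `podInfo` type, its JSON field names, and the `encodePods` helper are hypothetical, since the payload schema expected by the Rust side is not shown here. The cgo conversion itself is left as a comment so the example stays runnable without a C toolchain.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podInfo is a hypothetical, minimal view of a candidate pod. The real
// schedtypes.Pod type and the schema the Rust side parses may differ.
type podInfo struct {
	Name    string            `json:"name"`
	Address string            `json:"address"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// encodePods returns the JSON payload for the pods_json argument.
// An empty string means "no pods supplied"; in that case the caller
// would pass a nil pointer instead of a C string.
func encodePods(pods []podInfo) (string, error) {
	if len(pods) == 0 {
		return "", nil
	}
	b, err := json.Marshal(pods)
	if err != nil {
		return "", fmt.Errorf("marshal pods: %w", err)
	}
	return string(b), nil
}

func main() {
	payload, err := encodePods([]podInfo{{Name: "decode-0", Address: "10.0.0.7"}})
	if err != nil {
		panic(err)
	}
	// At the cgo call site this string would be wrapped with C.CString and
	// released with C.free(unsafe.Pointer(...)) once C.route_request returns.
	fmt.Println(payload) // prints [{"name":"decode-0","address":"10.0.0.7"}]
}
```

Keeping the marshaling in a plain Go helper makes it testable on its own and confines the C string's lifetime to the few lines around the FFI call.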
In `@deploy/inference-gateway/epp/pkg/plugins/label_filter/plugin.go`:
- Around line 38-58: The plugin factory (LabelFilterFactory) expects JSON
matching LabelFilterConfig with fields "key" and "value", but the deployment
uses "label", "validValues", and "allowsNoLabel", causing cfg.Key to be empty
and the factory to error; fix by aligning schemas: either update
LabelFilterConfig (and LabelFilterFactory) to accept the deployment fields
(rename Key->Label, Value->ValidValues/validate first entry, and add
AllowsNoLabel handling) and adapt NewLabelFilter invocation accordingly, or
update the deployment YAML to provide "key" and "value" fields (or a single
value from validValues) so that LabelFilterFactory, LabelFilterConfig, and the
call to NewLabelFilter(cfg.Key, cfg.Value) receive the expected values.
- Around line 90-101: The Filter method on LabelFilter incorrectly treats an
empty f.value as matching pods missing the label because
pod.GetPod().Labels[f.key] returns "" for absent keys; update Filter
(LabelFilter.Filter) to use the two-value map lookup (e.g., val, ok :=
pod.GetPod().Labels[f.key]) and only compare val == f.value when ok is true, or
alternatively enforce non-empty f.value when constructing the LabelFilter in the
factory so empty values are rejected; modify either the Filter function (use ok
check) or the factory that creates LabelFilter (validate f.value != "") to
ensure absent labels don't falsely match.
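The missing-label pitfall can be illustrated with a small standalone sketch. The `pod` and `labelFilter` types below are hypothetical stand-ins for the scheduler and plugin types, and the `role`/`decode` label is only an example. The sketch applies both suggested safeguards (factory validation and the two-value lookup), although either one alone would address the reported issue.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the scheduler's pod type;
// only the labels matter for this example.
type pod struct {
	Name   string
	Labels map[string]string
}

// labelFilter keeps pods whose label key is present and equal to value.
type labelFilter struct {
	key, value string
}

// newLabelFilter rejects an empty key or value at construction time.
func newLabelFilter(key, value string) (*labelFilter, error) {
	if key == "" || value == "" {
		return nil, fmt.Errorf("label filter needs non-empty key and value (key=%q, value=%q)", key, value)
	}
	return &labelFilter{key: key, value: value}, nil
}

// filter returns the pods that carry the configured label with the
// configured value. Pods without the label are always dropped.
func (f *labelFilter) filter(pods []pod) []pod {
	kept := make([]pod, 0, len(pods))
	for _, p := range pods {
		// The two-value lookup distinguishes a missing label from an empty one.
		if v, ok := p.Labels[f.key]; ok && v == f.value {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	f, err := newLabelFilter("role", "decode")
	if err != nil {
		panic(err)
	}
	pods := []pod{
		{Name: "a", Labels: map[string]string{"role": "decode"}},
		{Name: "b", Labels: map[string]string{"role": "prefill"}},
		{Name: "c"}, // no labels at all; lookups on a nil map are safe in Go
	}
	for _, p := range f.filter(pods) {
		fmt.Println(p.Name) // prints a
	}
}
```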
In `@recipes/llama-3-70b/vllm/disagg-single-node/gaie/deploy.yaml`:
- Around line 24-53: The EPP YAML references plugins that aren't registered or
misnamed: replace or register the missing plugins and align label-filter params
with its implementation. Either change pluginRef: picker to the actual scorer
plugin used (e.g., pluginRef: kv-aware-scorer) or add a registered plugin named
picker/max-score-picker in main.go; register the dyn-kv-prefill and
pd-profile-handler plugin types in main.go (or remove them from
schedulingProfiles) so they are available at runtime; and update the
label-filter parameter keys (label/validValues/allowsNoLabel) to the exact
parameter names expected by the label-filter implementation (or change the
implementation to accept the YAML names) so the schema matches. Ensure all
referenced plugin type strings (dyn-kv-prefill, pd-profile-handler,
picker/max-score-picker, label-filter, kv-aware-scorer) are consistently defined
and registered in main.go.
🧹 Nitpick comments (1)
lib/bindings/c/src/lib.rs (1)
988-1005: Parsed `pods_json` is validated but never used in routing logic.
The parsed pods are stored in `_pods` (underscore-prefixed) and discarded. The routing path on lines 1007–1101 still relies entirely on discovery. If this is intentional scaffolding for a future PR, consider adding a brief `// TODO` comment so the intent is clear to other contributors.
Also, `tracing::info!` on line 993 will fire on every routed request when pods are provided; consider downgrading to `debug` level for production.
Overview:
Support EPP's "pods" interface in Dynamo. Fixes [DEP-424].
Details:
EPP will pass a set of pods on each query() request. KvRouter will use this set as a filter when choosing the best pod.
The pods from EPP will not overwrite KvRouter's internal pod storage or discovery.
Support deploying the FrontEnd as a sidecar for GAIE integration (required for this interface).
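The intended filter semantics can be sketched as follows. This is an illustration only: the `router` type, its score map, and `pickBest` are hypothetical and do not reflect KvRouter's actual data structures. It also assumes that an empty pod list means "no filter", which the description does not state.

```go
package main

import "fmt"

// router is a hypothetical stand-in for KvRouter. It owns its own view of
// workers (pod name to score, higher is better), populated by discovery.
type router struct {
	scores map[string]float64
}

// pickBest returns the highest-scoring known worker that also appears in
// allowed. The allowed list only narrows the candidates for this one call;
// the router's own state is read, never replaced.
// Assumption: an empty allowed list disables filtering.
func (r *router) pickBest(allowed []string) (string, bool) {
	var allow map[string]struct{}
	if len(allowed) > 0 {
		allow = make(map[string]struct{}, len(allowed))
		for _, name := range allowed {
			allow[name] = struct{}{}
		}
	}
	best, bestScore, found := "", 0.0, false
	for name, score := range r.scores {
		if allow != nil {
			if _, ok := allow[name]; !ok {
				continue // known to the router but not offered by the caller
			}
		}
		// Ties are broken by name so the result does not depend on map order.
		if !found || score > bestScore || (score == bestScore && name < best) {
			best, bestScore, found = name, score, true
		}
	}
	return best, found
}

func main() {
	r := &router{scores: map[string]float64{
		"decode-0": 0.9,
		"decode-1": 0.4,
		"decode-2": 0.7,
	}}
	// "unknown" is offered by the caller but was never discovered, so it is ignored.
	best, ok := r.pickBest([]string{"decode-1", "decode-2", "unknown"})
	fmt.Println(best, ok) // prints decode-2 true
}
```

Treating the caller's list as a per-request intersection keeps discovery as the single source of truth for which workers exist.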
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)