design: payload processing profiles and profile picker #92
Add design document for IPP profiles and profile picker architecture. Describes the three-stage pipeline (pre-processing, profile selection, profile execution) and how the model selector integrates as a regular plugin within profiles. Runtime architecture only; the config format is covered in llm-d#77. Relates to llm-d#15.
Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

@nirrozenbaum @shmuelk design document for profiles and profile picker; would appreciate your feedback. This is runtime architecture only and intentionally doesn't touch the config format (that's #77).

Reference Proposal 043 (ModelSelector Architecture) and describe how the Filter/Score/Pick pipeline fits within IPP profiles. Document the recursive pattern, multiple model selector instances across profiles, and current implementation status (PRs llm-d#72, llm-d#74).
Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

Explain how weighted scores accumulate across multiple scorers and how the picker uses those scores to make selection decisions.
Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

This reverts commit ed398ae.
This reverts commit 1dfe41a.

The profile picker's decision logic should be changeable at runtime without restart, e.g., shifting traffic between profiles by updating config.
Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

Add security and audit logging as use cases for shared post-processing that should run for every response regardless of profile.
Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.

Pre-processing is necessary because the profile picker often cannot decide at the start of the request. For example, the picker may need to know whether the user specified a model or requested auto-selection. But the model name lives in the request body; it needs to be extracted by `body-field-to-header` before the picker can check for it. Without pre-processing, the picker would have to duplicate body-parsing logic, breaking the plugin composability model.
Review comment:
this is very much related to how we implement the profile picker.
IMO we should start with it as any other plugin and not CEL. there are use cases where it would be very hard to implement the profile picker with a limited language such as regex/CEL/other.
for example, if one wants to select an external model ONLY when all internal models are saturated.
the profile picker should be able to take into account the request properties as well as the system state.
given that the profile picker is like any other plugin, it should have access to all fields like any other plugin, so pre-processing plugins are not required for that purpose.
pre-processing plugins might make sense anyway if we want to define that certain plugins should always run, no matter which profile was selected. but the reason for that is not to enrich data for the profile picker, but rather to have a set of common plugins that represent common behavior for all use cases.
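To make the saturation example concrete, here is a minimal Go sketch of a picker written as ordinary plugin code that combines a request property with system state. It assumes a hypothetical `ProfilePicker` interface; `Request`, `SystemState`, `SaturationPicker`, the profile names `internal`/`external`, and the `x-internal-only` header are illustrative names, not existing IPP APIs.

```go
package main

import "fmt"

// Request is a hypothetical, simplified view of a request as seen by a picker.
type Request struct {
	Headers map[string]string
}

// SystemState is a hypothetical snapshot of backend saturation,
// keyed by internal backend name.
type SystemState struct {
	InternalSaturated map[string]bool
}

// ProfilePicker is a hypothetical picker interface: it sees the request,
// the system state, and the configured profile names.
type ProfilePicker interface {
	Pick(req Request, state SystemState, profiles []string) (string, error)
}

// SaturationPicker selects "external" only when every known internal backend
// is saturated. A request can opt out of external routing with the
// hypothetical header x-internal-only: true.
type SaturationPicker struct{}

func (SaturationPicker) Pick(req Request, state SystemState, profiles []string) (string, error) {
	// With no saturation data at all, stay internal.
	allSaturated := len(state.InternalSaturated) > 0
	for _, saturated := range state.InternalSaturated {
		if !saturated {
			allSaturated = false
			break
		}
	}
	target := "internal"
	if allSaturated && req.Headers["x-internal-only"] != "true" {
		target = "external"
	}
	// Only return a profile that is actually configured.
	for _, p := range profiles {
		if p == target {
			return target, nil
		}
	}
	return "", fmt.Errorf("profile %q is not configured", target)
}

func main() {
	var picker ProfilePicker = SaturationPicker{}
	profiles := []string{"internal", "external"}
	busy := SystemState{InternalSaturated: map[string]bool{"pool-a": true, "pool-b": true}}
	fmt.Println(picker.Pick(Request{}, busy, profiles))
}
```

A rule like "all internal backends are saturated" requires iterating over live state, which is straightforward in code; a config-only expression language would first need that state exposed to it.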
The profile picker runs exactly once per request. It does not iterate or re-pick (unlike the upstream scheduler's ProfileHandler, which can iteratively select profiles based on previous results). If iterative profile selection is needed in the future, the interface can be extended.

The profile picker's decision logic should be adjustable at runtime without restarting the IPP. For example, an operator should be able to change which requests route to which profiles (e.g., shifting traffic from a cost-optimized profile to a quality-optimized profile) by updating configuration. This requires the picker to support reloadable decision logic, whether through hot-reloadable config, CEL expressions, or another mechanism that doesn't require recompilation or restart.
Review comment:
I doubt it would be possible to express all possible configurations with those mechanisms.
maybe a better path is to support HA if we use redis as a distributed datastore, which would allow rolling restarts without losing state.
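One way to make routing adjustable at runtime without CEL is to keep the picker in code but make its parameters reloadable. The Go sketch below (all names, including the `x-tier` header, are hypothetical) publishes a header-to-profile table through `atomic.Pointer` from `sync/atomic` (Go 1.19+), so a config watcher could swap the table while requests are being picked.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Rules is a hypothetical decision table: the value of one request header
// selects a profile, with a default for unmatched requests. A Rules value
// is treated as immutable once published.
type Rules struct {
	Header  string
	Mapping map[string]string
	Default string
}

// ReloadablePicker publishes the current Rules through an atomic pointer so
// readers never observe a half-updated table.
type ReloadablePicker struct {
	rules atomic.Pointer[Rules]
}

// Reload atomically replaces the active rules.
func (p *ReloadablePicker) Reload(r *Rules) { p.rules.Store(r) }

// Pick returns the profile for the given headers, or "" if no rules are loaded.
func (p *ReloadablePicker) Pick(headers map[string]string) string {
	r := p.rules.Load()
	if r == nil {
		return ""
	}
	if profile, ok := r.Mapping[headers[r.Header]]; ok {
		return profile
	}
	return r.Default
}

func main() {
	var picker ReloadablePicker
	gold := map[string]string{"x-tier": "gold"}
	picker.Reload(&Rules{Header: "x-tier", Mapping: map[string]string{"gold": "cost-optimized"}, Default: "cost-optimized"})
	fmt.Println(picker.Pick(gold))
	// The operator shifts gold traffic to the quality-optimized profile; no restart.
	picker.Reload(&Rules{Header: "x-tier", Mapping: map[string]string{"gold": "quality-optimized"}, Default: "cost-optimized"})
	fmt.Println(picker.Pick(gold))
}
```

This only reloads data, not logic: as the comment notes, changing the decision logic itself would still require a new build, which is where rolling restarts with shared state come in.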
1. **Should response plugins also be profile-specific, or always shared?** The current design makes them profile-specific. An alternative is to have shared post-processing (like shared pre-processing) that always runs regardless of profile. This could be useful for plugins that should run for every response: metrics collection, security validation, audit logging. A shared post-processing stage would mirror pre-processing and ensure these concerns are never accidentally omitted from a profile.

2. **Shared pre/post-processing symmetry:** If we support shared pre-processing on the request side, should we also support shared post-processing on the response side? The use cases are clear: metrics collection, security plugins, and audit logging all need to run for every response regardless of which profile handled the request.
Review comment:
aren't these points the same?
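If shared post-processing were added, the guarantee that shared response plugins always run could look like the following Go sketch. The types are hypothetical, and running profile-specific response plugins before the shared ones is an assumption; the doc does not fix that order.

```go
package main

import "fmt"

// Response is a hypothetical response; Log records which plugins ran.
type Response struct {
	Log []string
}

// ResponsePlugin is a hypothetical response-side plugin.
type ResponsePlugin func(*Response)

// Profile holds a profile's own response chain.
type Profile struct {
	Name     string
	Response []ResponsePlugin
}

// runResponse runs the selected profile's response chain, then the shared
// post-processing chain. Shared plugins therefore run for every response,
// even for a profile whose own chain is empty.
func runResponse(p Profile, shared []ResponsePlugin, resp *Response) {
	for _, plugin := range p.Response {
		plugin(resp)
	}
	for _, plugin := range shared {
		plugin(resp)
	}
}

// named returns a stand-in plugin that only records its name.
func named(name string) ResponsePlugin {
	return func(r *Response) { r.Log = append(r.Log, name) }
}

func main() {
	shared := []ResponsePlugin{named("metrics"), named("audit-log")}
	direct := Profile{Name: "direct", Response: []ResponsePlugin{named("translate-api")}}
	resp := &Response{}
	runResponse(direct, shared, resp)
	fmt.Println(resp.Log) // [translate-api metrics audit-log]
}
```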
nirrozenbaum left a comment:
finished reviewing the doc.
it feels more like motivation and very high-level concepts.
the main point is that there are no clear action items derived from the doc.
All requests go through the same plugins in the same order. The order is determined at startup via command-line flags. There is no way to conditionally run different plugins for different requests.

This works for simple use cases (extract model name, resolve provider, inject credentials). But as the IPP takes on intelligent routing responsibilities (model selection, cost-based routing, fallback), different requests need fundamentally different processing paths.
Review comment:
What does "resolve provider" mean?
Review comment:
I assume it means figuring out who is providing the model inference service....
Consider these scenarios:

**Scenario 1: Model specified in request**
Review comment:
the text does not align with the IPP as of today, i.e., resolve the provider, translate the API format, and inject credentials...

- CycleState (shared state from pre-processing)
- The set of available profile names

It returns the name of the profile to execute.
Review comment:
we can run all pre-processing plugins, then, if a profile exists in the cycle state, run that profile.
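The cycle-state approach (pre-processing plugins write the profile decision, and the framework then runs that profile) can be sketched in Go as follows. `CycleState`, the key names, `extractModel`, and `pickByModel` are hypothetical. Falling back to the first configured profile when no picker wrote a decision follows the later suggestion in this review; rejecting an unknown profile name is an added assumption.

```go
package main

import "fmt"

// CycleState is a hypothetical per-request store shared by plugins.
type CycleState map[string]any

// Hypothetical cycle-state keys.
const (
	modelKey   = "request-model"
	profileKey = "selected-profile"
)

// Request is a hypothetical request whose body is already decoded.
type Request struct {
	Body map[string]any
}

// PreProcessor is a hypothetical pre-processing plugin.
type PreProcessor func(req Request, state CycleState)

// extractModel records the requested model name, if any.
func extractModel(req Request, state CycleState) {
	if m, ok := req.Body["model"].(string); ok {
		state[modelKey] = m
	}
}

// pickByModel writes a profile decision: "auto-select" when the model is
// missing or "auto", otherwise "direct".
func pickByModel(_ Request, state CycleState) {
	m, _ := state[modelKey].(string)
	if m == "" || m == "auto" {
		state[profileKey] = "auto-select"
	} else {
		state[profileKey] = "direct"
	}
}

// resolveProfile runs all pre-processing plugins in order, then reads the
// profile decision from the cycle state.
func resolveProfile(pre []PreProcessor, req Request, profiles []string) (string, error) {
	if len(profiles) == 0 {
		return "", fmt.Errorf("no profiles configured")
	}
	state := CycleState{}
	for _, plugin := range pre {
		plugin(req, state)
	}
	name, ok := state[profileKey].(string)
	if !ok {
		return profiles[0], nil // no picker ran: first configured profile
	}
	for _, p := range profiles {
		if p == name {
			return name, nil
		}
	}
	return "", fmt.Errorf("picker selected unknown profile %q", name)
}

func main() {
	profiles := []string{"direct", "auto-select"}
	pre := []PreProcessor{extractModel, pickByModel}
	fmt.Println(resolveProfile(pre, Request{Body: map[string]any{"model": "auto"}}, profiles))
	// Without a picker plugin in the chain, the first configured profile runs.
	fmt.Println(resolveProfile(pre[:1], Request{Body: map[string]any{"model": "auto"}}, profiles))
}
```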
The profile picker runs exactly once per request. It does not iterate or re-pick (unlike the upstream scheduler's ProfileHandler, which can iteratively select profiles based on previous results). If iterative profile selection is needed in the future, the interface can be extended.

The profile picker's decision logic should be adjustable at runtime without restarting the IPP. For example, an operator should be able to change which requests route to which profiles (e.g., shifting traffic from a cost-optimized profile to a quality-optimized profile) by updating configuration. This requires the picker to support reloadable decision logic, whether through hot-reloadable config, CEL expressions, or another mechanism that doesn't require recompilation or restart.
Review comment:
reading the above, it starts to sound like another type of filter-score-pick mechanism....
### Stage 1: Pre-Processing

A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.
Review comment, suggested change replacing:
A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.
with:
A set of shared plugins that always run for every request in the order specified, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.

### Stage 2: Profile Selection

A profile picker plugin examines the enriched request and selects which profile to run. The picker is a plugin that implements a defined interface: it receives the request (after pre-processing has run) and the set of available profiles, and returns the name of the selected profile.
Review comment:
can we make the picker just one more request processing plugin? the set of plugins can come from the configuration or the data-layer.
if no profile-selection plugin is in the pre-processing, then a default profile (i.e., the first configured one) will run.
if one is configured, it writes the decision to cycle-state...
Review comment:
generally speaking, I actually think that representing every behavior as "just another request plugin" goes in the opposite direction (same for the model selector).
the framework should provide strong types, along with validation and defaults.
plugins should represent customized logic that hooks into extension points.
framework types should represent the built-in mechanism and how to hook the plugins.
in that context, the profile picker should be a framework strong type, and so should the model selector.
Review comment:
adopting this viewpoint, the IPP flow should be, for example:
- a pre-processing processor (of request processing plugins)
- a PRE-MODEL-SELECTOR profile-specific request processor, starting with a profile selector followed by running profile-specific plugins (of request processing plugins)
- a model selector processor, starting with a profile selector, selecting a selector profile, and running a filter-score-pick profile
- a POST-MODEL-SELECTOR profile-specific request processor, starting with a profile selector followed by running profile-specific plugins (of request processing plugins)
In this approach, the pre-processor, model selector, and profile processor are first-class citizens that run plugins.
Review comment:
We need to separate how the IPP framework consumes the various types of plugins from how they are described/configured/mentioned in the configuration.
Actually, I think the IPP flow is different. It is as follows:
- pre-processing plugins run
- the profile picker picks a profile
- the request plugins of the profile are run. This includes the model selector, which is just another request plugin.
We need to extend things so that any plugin can reference other plugins and run them. The assistance from the framework is simply to order the instantiation of the plugins, preventing loops, etc.
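The framework role described in this last comment (ordering plugin instantiation and preventing loops) amounts to a topological sort with cycle detection. A minimal Go sketch, assuming plugin references are known up front as a name-to-names map (a hypothetical representation, not the IPP config format; the plugin names are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// instantiationOrder returns plugin names so that every plugin appears after
// the plugins it references. refs maps a plugin name to the names it
// references; referenced names without an entry are treated as leaves.
// A reference cycle is returned as an error.
func instantiationOrder(refs map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := map[string]int{}
	var order []string

	var visit func(name string, path []string) error
	visit = func(name string, path []string) error {
		// Copy so sibling calls never share a backing array.
		path = append(append([]string(nil), path...), name)
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("plugin reference loop: %v", path)
		}
		state[name] = visiting
		for _, dep := range refs[name] {
			if err := visit(dep, path); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	// Visit roots in sorted order so the result is deterministic.
	names := make([]string, 0, len(refs))
	for name := range refs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name, nil); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	refs := map[string][]string{
		"auto-select-profile": {"model-selector", "inject-credentials"},
		"model-selector":      {"cost-scorer", "latency-scorer"},
	}
	order, err := instantiationOrder(refs)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(order)
	_, err = instantiationOrder(map[string][]string{"a": {"b"}, "b": {"a"}})
	fmt.Println(err)
}
```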

- Ability to define a set of pre-processing plugins with their order
- Ability to define multiple named profiles, each with an ordered request chain and response chain
- Ability to specify which plugin serves as the profile picker
Review comment, suggested change replacing:
- Ability to specify which plugin serves as the profile picker
with:
- A profile picker plugin must be specified in the pre-processing plugins list in order to use a profile beyond the default one.
Review comment:
I disagree. The profile picker reference is a first-class citizen in the configuration. If it is left out and there is only one profile in the configuration, a default single-profile-picker plugin is chosen, which simply always chooses the single profile in the configuration.
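The defaulting rule proposed in this reply could be expressed as follows. This is a hypothetical Go sketch: `Picker`, `singleProfilePicker`, and `resolvePicker` are illustrative names. Rejecting a multi-profile configuration that has no picker is an assumption; the earlier suggestion in this review (fall back to the first configured profile) is the other option.

```go
package main

import (
	"errors"
	"fmt"
)

// Picker is a hypothetical profile-picker interface.
type Picker interface {
	Pick(profiles []string) string
}

// singleProfilePicker always returns the one configured profile.
type singleProfilePicker struct{ name string }

func (s singleProfilePicker) Pick([]string) string { return s.name }

// resolvePicker uses an explicitly configured picker as-is; with no picker
// and exactly one profile, it synthesizes a single-profile picker.
func resolvePicker(configured Picker, profiles []string) (Picker, error) {
	if configured != nil {
		return configured, nil
	}
	switch len(profiles) {
	case 0:
		return nil, errors.New("no profiles configured")
	case 1:
		return singleProfilePicker{name: profiles[0]}, nil
	default:
		// Assumption: multiple profiles without an explicit picker are rejected.
		return nil, fmt.Errorf("%d profiles configured but no profile picker specified", len(profiles))
	}
}

func main() {
	picker, err := resolvePicker(nil, []string{"default"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(picker.Pick([]string{"default"}))
	if _, err := resolvePicker(nil, []string{"cost-optimized", "quality-optimized"}); err != nil {
		fmt.Println(err)
	}
}
```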

## Open Questions

1. **Should response plugins also be profile-specific, or always shared?** The current design makes them profile-specific. An alternative is to have shared post-processing (like shared pre-processing) that always runs regardless of profile. This could be useful for plugins that should run for every response: metrics collection, security validation, audit logging. A shared post-processing stage would mirror pre-processing and ensure these concerns are never accidentally omitted from a profile.
Review comment:
we should start simple...
### Stage 1: Pre-Processing

A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.
Review comment:
Can the pre-processing plugins change the request that will eventually be sent out?
Review comment:
I think the intention was yes, e.g., add a header with the model (header change).
Review comment:
assuming this is a plugin that is always needed.
otherwise, the alternative is to specify that plugin multiple times, in all profiles.

**Scenario 3: Priority routing**
A high-priority request (indicated by a header) should use a different model selection strategy than a standard request, perhaps a quality-optimized selector instead of a cost-optimized one.
Review comment:
Just for completeness, suggested addition:
**Scenario 4: Batch routing**
A latency-insensitive request (indicated by a header) should use a cost-optimized selector instead of the latency-optimized one.
The user sends `{"model": "auto"}`. The IPP needs to run a model selector (Filter → Score → Pick) to choose the best model, then proceed with provider resolution and credential injection.

**Scenario 3: Priority routing**
A high-priority request (indicated by a header) should use a different model selection strategy than a standard request, perhaps a quality-optimized selector instead of a cost-optimized one.
Review comment, suggested change replacing:
A high-priority request (indicated by a header) should use a different model selection strategy than a standard request, perhaps a quality-optimized selector instead of a cost-optimized one.
with:
A high-priority request (indicated by a header) should use a different model selection strategy than a standard request, perhaps a latency-optimized selector instead of a cost-optimized one.
- The execution order between model selection and other plugins is explicit: you can see exactly where in the chain model selection happens
- Profiles without model selection simply don't include a model selector plugin

The model selector can itself support multiple internal profiles with its own profile selection (same recursive pattern), but that is internal to the model selector and transparent to the IPP profile system.
Review comment:
@noyitz, just for my understanding: does this imply that the internal model selector profile can override the initial profile picker, and that these profiles could potentially push in different directions? What scenarios benefit from the recursive pattern?
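For reference, here is a minimal Go sketch of a Filter → Score → Pick selector in which weighted scores accumulate across scorers and the pick is the highest total. It is a generic illustration, not the Proposal 043 API: the candidate fields, scorer formulas, and weights are hypothetical. Running it with two weightings shows how two profiles could each hold their own model selector instance with a different strategy (cost-optimized vs. latency-optimized).

```go
package main

import (
	"errors"
	"fmt"
)

// Candidate is a hypothetical model the selector can choose from.
type Candidate struct {
	Name       string
	CostPer1K  float64 // illustrative cost, arbitrary units
	P50Latency float64 // illustrative latency in milliseconds
	Available  bool
}

// Filter reports whether a candidate may serve the request.
type Filter func(Candidate) bool

// Scorer returns a score in (0, 1] for non-negative inputs; higher is better.
type Scorer func(Candidate) float64

// WeightedScorer pairs a scorer with its weight.
type WeightedScorer struct {
	Weight float64
	Score  Scorer
}

func available(c Candidate) bool       { return c.Available }
func costScore(c Candidate) float64    { return 1 / (1 + c.CostPer1K) }
func latencyScore(c Candidate) float64 { return 100 / (100 + c.P50Latency) }

// selectModel filters candidates, sums Weight*Score over all scorers for
// each survivor, and picks the highest total (the first one wins ties).
func selectModel(cands []Candidate, filters []Filter, scorers []WeightedScorer) (string, error) {
	best, bestTotal, found := "", 0.0, false
	for _, c := range cands {
		keep := true
		for _, f := range filters {
			if !f(c) {
				keep = false
				break
			}
		}
		if !keep {
			continue
		}
		total := 0.0
		for _, s := range scorers {
			total += s.Weight * s.Score(c)
		}
		if !found || total > bestTotal {
			best, bestTotal, found = c.Name, total, true
		}
	}
	if !found {
		return "", errors.New("no candidate passed the filters")
	}
	return best, nil
}

func demoCandidates() []Candidate {
	return []Candidate{
		{Name: "small", CostPer1K: 0.1, P50Latency: 400, Available: true},
		{Name: "large", CostPer1K: 1.0, P50Latency: 100, Available: true},
		{Name: "offline", CostPer1K: 0.01, P50Latency: 50, Available: false},
	}
}

func main() {
	filters := []Filter{available}
	costOptimized := []WeightedScorer{{0.8, costScore}, {0.2, latencyScore}}
	latencyOptimized := []WeightedScorer{{0.2, costScore}, {0.8, latencyScore}}
	fmt.Println(selectModel(demoCandidates(), filters, costOptimized))
	fmt.Println(selectModel(demoCandidates(), filters, latencyOptimized))
}
```

With these numbers the cost-weighted selector picks `small` and the latency-weighted selector picks `large`, while `offline` is filtered out before scoring even though it would otherwise score highest under both weightings.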
What does this PR do?
Adds a design document for introducing profiles and a profile picker to the IPP (issue #15). This is a design-only PR — no code changes. The goal is to iterate on the architecture before implementation.
The design describes:
Why is this change needed?
Today the IPP runs the same plugin chain for every request. Different request types (model specified, auto-selection, priority routing) need different processing paths. Profiles enable this.
How was this tested?
Design document only — no code to test.
Checklist
- `git commit -s` per DCO
- `make test`
- `make lint`

Related Issues
Relates to #15
Related config work: #77