design: payload processing profiles and profile picker by noyitz · Pull Request #92 · llm-d/llm-d-inference-payload-processor

noyitz · 2026-05-13T22:12:18Z

What does this PR do?

Adds a design document for introducing profiles and a profile picker to the IPP (issue #15). This is a design-only PR — no code changes. The goal is to iterate on the architecture before implementation.

The design describes:

Three-stage pipeline: pre-processing → profile selection → profile execution
Profiles: named, ordered chains of request + response plugins
Profile picker: a plugin that selects which profile to run based on the enriched request
Pre-processing: shared plugins that always run before profile selection
Model selector integration: sits inside profiles as a regular RequestProcessor plugin
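
A minimal Go sketch of the three stages listed above, to make the control flow concrete. Every name here (`Request`, `Plugin`, `Profile`, `ProfilePicker`, `Pipeline`, the header `x-model`, and the profile names) is hypothetical and not taken from the IPP code base. Response plugins are left out for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is a hypothetical, simplified view of an inference request.
type Request struct {
	Headers map[string]string
	Body    map[string]any
}

// Plugin is a hypothetical request plugin; it may read or mutate the request.
type Plugin func(req *Request) error

// Profile is a named, ordered chain of request plugins.
type Profile struct {
	Name    string
	Plugins []Plugin
}

// ProfilePicker returns the name of the profile to run for a request.
type ProfilePicker func(req *Request, available []string) (string, error)

// Pipeline wires the three stages together.
type Pipeline struct {
	PreProcessing []Plugin
	Picker        ProfilePicker
	Profiles      map[string]Profile
}

// Run executes pre-processing, then profile selection, then the chosen profile.
func (p *Pipeline) Run(req *Request) (string, error) {
	for _, plugin := range p.PreProcessing {
		if err := plugin(req); err != nil {
			return "", fmt.Errorf("pre-processing: %w", err)
		}
	}
	names := make([]string, 0, len(p.Profiles))
	for name := range p.Profiles {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; keep the input stable
	chosen, err := p.Picker(req, names)
	if err != nil {
		return "", fmt.Errorf("profile selection: %w", err)
	}
	profile, ok := p.Profiles[chosen]
	if !ok {
		return "", fmt.Errorf("unknown profile %q", chosen)
	}
	for _, plugin := range profile.Plugins {
		if err := plugin(req); err != nil {
			return "", fmt.Errorf("profile %q: %w", chosen, err)
		}
	}
	return chosen, nil
}

// demoPipeline builds a toy pipeline with two made-up profiles.
func demoPipeline() *Pipeline {
	extractModel := func(req *Request) error {
		if m, ok := req.Body["model"].(string); ok {
			req.Headers["x-model"] = m
		}
		return nil
	}
	picker := func(req *Request, _ []string) (string, error) {
		if req.Headers["x-model"] == "auto" {
			return "auto-select", nil
		}
		return "model-specified", nil
	}
	// mark records which profile plugins ran, and in what order.
	mark := func(step string) Plugin {
		return func(req *Request) error {
			req.Headers["x-ran"] += step + ";"
			return nil
		}
	}
	return &Pipeline{
		PreProcessing: []Plugin{extractModel},
		Picker:        picker,
		Profiles: map[string]Profile{
			"model-specified": {Name: "model-specified", Plugins: []Plugin{mark("resolve-provider")}},
			"auto-select":     {Name: "auto-select", Plugins: []Plugin{mark("model-selector"), mark("resolve-provider")}},
		},
	}
}

func main() {
	req := &Request{Headers: map[string]string{}, Body: map[string]any{"model": "auto"}}
	chosen, err := demoPipeline().Run(req)
	fmt.Println(chosen, req.Headers["x-ran"], err)
}
```

In this sketch the model selector is just one entry in the `auto-select` chain, which matches the "regular RequestProcessor plugin" framing in the description.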

Why is this change needed?

Today the IPP runs the same plugin chain for every request. Different request types (model specified, auto-selection, priority routing) need different processing paths. Profiles enable this.

How was this tested?

Unit tests added/updated
Integration/e2e tests added/updated
Manual testing performed

Design document only — no code to test.

Checklist

Commits are signed off (git commit -s) per DCO
Code follows project contributing guidelines
Tests pass locally (make test)
Linters pass (make lint)
Documentation updated (if applicable)

Related Issues

Relates to #15
Related config work: #77

Add design document for IPP profiles and profile picker architecture. Describes the three-stage pipeline (pre-processing, profile selection, profile execution) and how model selector integrates as a regular plugin within profiles. Runtime architecture only; config format is covered in llm-d#77.

Relates to llm-d#15

Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

noyitz · 2026-05-13T22:12:26Z

@nirrozenbaum @shmuelk design document for profiles and profile picker — would appreciate your feedback. This is runtime architecture only, intentionally doesn't touch config format (that's #77).

Reference Proposal 043 (ModelSelector Architecture) and describe how the Filter/Score/Pick pipeline fits within IPP profiles. Document the recursive pattern, multiple model selector instances across profiles, and current implementation status (PRs llm-d#72, llm-d#74).

Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

Explain how weighted scores accumulate across multiple scorers and how the picker uses those scores to make selection decisions.

Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

This reverts commit ed398ae.

This reverts commit 1dfe41a.

The profile picker's decision logic should be changeable at runtime without restart, e.g., shifting traffic between profiles by updating config.

Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

Add security and audit logging as use cases for shared post-processing that should run for every response regardless of profile.

Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

nirrozenbaum · 2026-05-17T12:31:27Z

+
+A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.
+
+Pre-processing is necessary because the profile picker often cannot decide at the start of the request. For example, the picker may need to know whether the user specified a model or requested auto-selection. But the model name lives in the request body — it needs to be extracted by `body-field-to-header` before the picker can check for it. Without pre-processing, the picker would have to duplicate body-parsing logic, breaking the plugin composability model.


This is closely tied to how we implement the profile picker.
IMO the picker should start out as a regular plugin, not as CEL. There are use cases that would be very hard to implement with a limited language such as regex, CEL, or similar.

For example, selecting an external model ONLY when all internal models are saturated.
The profile picker should be able to take into account the request properties as well as the system state.

Given that the profile picker is like any other plugin, it should have access to all fields like any other plugin, so pre-processing plugins are not required for that purpose.

Pre-processing plugins might make sense anyway if we want certain plugins to always run, no matter which profile is selected. The reason for that is not to enrich data for the profile picker, though, but to have a set of common plugins that represent behavior shared by all use cases.
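
The saturation example can be sketched in Go as a picker that consults system state as well as the request. This is only an illustration: `SystemState`, `SaturationPicker`, the toy `staticState`, and the profile names `internal` and `external` are all hypothetical and do not exist in the IPP.

```go
package main

import "fmt"

// Request is a hypothetical, simplified request view.
type Request struct {
	Headers map[string]string
}

// SystemState is a hypothetical read-only view of runtime state that a
// picker plugin could consult alongside the request.
type SystemState interface {
	// Saturated reports whether the named internal model can take no more load.
	Saturated(model string) bool
}

// staticState is a toy SystemState backed by a map, for illustration only.
type staticState map[string]bool

func (s staticState) Saturated(model string) bool { return s[model] }

// SaturationPicker selects the external profile only when every internal
// model is saturated; otherwise it keeps traffic on the internal profile.
type SaturationPicker struct {
	State          SystemState
	InternalModels []string
}

// Pick returns a profile name. With no internal models configured, the
// "all saturated" condition holds vacuously and the external profile is chosen.
func (p SaturationPicker) Pick(_ *Request) string {
	for _, m := range p.InternalModels {
		if !p.State.Saturated(m) {
			return "internal"
		}
	}
	return "external"
}

func main() {
	picker := SaturationPicker{
		State:          staticState{"model-a": true, "model-b": false},
		InternalModels: []string{"model-a", "model-b"},
	}
	fmt.Println(picker.Pick(&Request{}))
}
```

A rule like this depends on live state rather than on request fields alone, which is the part that is awkward to express in a declarative expression language.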

nirrozenbaum · 2026-05-17T12:38:26Z

+
+The profile picker runs exactly once per request. It does not iterate or re-pick (unlike the upstream scheduler's ProfileHandler which can iteratively select profiles based on previous results). If iterative profile selection is needed in the future, the interface can be extended.
+
+The profile picker's decision logic should be adjustable at runtime without restarting the IPP. For example, an operator should be able to change which requests route to which profiles (e.g., shifting traffic from a cost-optimized profile to a quality-optimized profile) by updating configuration. This requires the picker to support reloadable decision logic — whether through hot-reloadable config, CEL expressions, or another mechanism that doesn't require recompilation or restart.


I doubt it would be possible to express all possible configurations with those mechanisms.
Maybe a better path is to support HA if we use Redis as a distributed datastore, which would allow rolling restarts without losing state.
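
For reference, the simplest form of "reloadable decision logic" from the quoted paragraph could look like the Go sketch below: a rule table swapped atomically while requests are in flight. The `Rules` type, `ReloadablePicker`, and the request classes are hypothetical. The sketch covers only table-driven rules, so it does not address decisions that cannot be expressed as a lookup.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Rules maps a hypothetical request class to a profile name.
type Rules map[string]string

// ReloadablePicker keeps the current rules behind an atomic value so that a
// config watcher can replace them without a restart.
type ReloadablePicker struct {
	rules atomic.Value // always holds a Rules
}

// NewReloadablePicker creates a picker with an initial rule set.
func NewReloadablePicker(initial Rules) *ReloadablePicker {
	p := &ReloadablePicker{}
	p.rules.Store(initial)
	return p
}

// Reload swaps in a new rule set. Callers must not mutate it afterwards,
// because concurrent readers may still be using it.
func (p *ReloadablePicker) Reload(next Rules) { p.rules.Store(next) }

// Pick returns the profile for a request class, or fallback if no rule matches.
func (p *ReloadablePicker) Pick(class, fallback string) string {
	if name, ok := p.rules.Load().(Rules)[class]; ok {
		return name
	}
	return fallback
}

func main() {
	picker := NewReloadablePicker(Rules{"chat": "cost-optimized"})
	fmt.Println(picker.Pick("chat", "default"))

	// Simulate an operator shifting traffic by updating configuration.
	picker.Reload(Rules{"chat": "quality-optimized"})
	fmt.Println(picker.Pick("chat", "default"))
}
```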

nirrozenbaum · 2026-05-17T12:42:28Z

+1. **Should response plugins also be profile-specific, or always shared?** The current design makes them profile-specific. An alternative is to have shared post-processing (like shared pre-processing) that always runs regardless of profile. This could be useful for plugins that should run for every response — metrics collection, security validation, audit logging. A shared post-processing stage would mirror pre-processing and ensure these concerns are never accidentally omitted from a profile.
+
+2. **Shared pre/post-processing symmetry** — If we support shared pre-processing on the request side, should we also support shared post-processing on the response side? The use cases are clear: metrics collection, security plugins, audit logging all need to run for every response regardless of which profile handled the request.
+


aren't these points the same?

nirrozenbaum

Finished reviewing the doc.
It reads more like motivation and very high-level concepts.

The main issue is that no clear action items are derived from the doc.

ronenkat · 2026-05-17T13:55:05Z

+
+All requests go through the same plugins in the same order. The order is determined at startup via command-line flags. There is no way to conditionally run different plugins for different requests.
+
+This works for simple use cases (extract model name, resolve provider, inject credentials). But as the IPP takes on intelligent routing responsibilities — model selection, cost-based routing, fallback — different requests need fundamentally different processing paths.


What does "resolve provider" mean?

I assume it means figuring out who is providing the model inference service...

ronenkat · 2026-05-17T13:55:57Z

+
+Consider these scenarios:
+
+**Scenario 1: Model specified in request**


The text does not align with the IPP as of today, i.e., resolve the provider, translate the API format, and inject credentials...

ronenkat · 2026-05-17T14:03:55Z

+- CycleState (shared state from pre-processing)
+- The set of available profile names
+
+It returns the name of the profile to execute.


We can run all pre-processing plugins, then, if a profile exists in the cycle state, run that profile.
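
A minimal Go sketch of that alternative, assuming a map-based cycle state and an agreed key under which a pre-processing plugin records its choice. `CycleState`, `profileKey`, `Plugin`, and `run` are hypothetical names, and returning an empty name when nothing was selected is an assumption made for this sketch.

```go
package main

import "fmt"

// CycleState is a hypothetical per-request key/value store shared by plugins.
type CycleState map[string]string

// profileKey is an assumed key under which a pre-processing plugin records
// the profile it selected.
const profileKey = "selected-profile"

// Plugin is a hypothetical request plugin that can read and write cycle state.
type Plugin func(state CycleState) error

// run executes every pre-processing plugin in order, then runs the profile
// named in cycle state, if any. It returns the name of the profile that ran,
// or "" when no plugin recorded a choice.
func run(state CycleState, pre []Plugin, profiles map[string][]Plugin) (string, error) {
	for _, p := range pre {
		if err := p(state); err != nil {
			return "", err
		}
	}
	name, ok := state[profileKey]
	if !ok {
		return "", nil // no selection was made; nothing further to run
	}
	chain, ok := profiles[name]
	if !ok {
		return "", fmt.Errorf("profile %q is not configured", name)
	}
	for _, p := range chain {
		if err := p(state); err != nil {
			return "", err
		}
	}
	return name, nil
}

func main() {
	// A pre-processing plugin that acts as the profile selector.
	selectProfile := func(s CycleState) error {
		s[profileKey] = "auto-select"
		return nil
	}
	profiles := map[string][]Plugin{
		"auto-select": {func(s CycleState) error {
			s["model"] = "chosen-by-selector"
			return nil
		}},
	}
	state := CycleState{}
	name, err := run(state, []Plugin{selectProfile}, profiles)
	fmt.Println(name, state["model"], err)
}
```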

ronenkat · 2026-05-17T14:07:07Z

+
+The profile picker runs exactly once per request. It does not iterate or re-pick (unlike the upstream scheduler's ProfileHandler which can iteratively select profiles based on previous results). If iterative profile selection is needed in the future, the interface can be extended.
+
+The profile picker's decision logic should be adjustable at runtime without restarting the IPP. For example, an operator should be able to change which requests route to which profiles (e.g., shifting traffic from a cost-optimized profile to a quality-optimized profile) by updating configuration. This requires the picker to support reloadable decision logic — whether through hot-reloadable config, CEL expressions, or another mechanism that doesn't require recompilation or restart.


Reading the above, it starts to sound like another type of filter-score-pick mechanism...

ronenkat · 2026-05-17T14:10:07Z

+
+### Stage 1: Pre-Processing
+
+A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.


Suggested change

A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.

A set of shared plugins that always run for every request in the order specified, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.

ronenkat · 2026-05-17T14:12:50Z

+
+### Stage 2: Profile Selection
+
+A profile picker plugin examines the enriched request and selects which profile to run. The picker is a plugin that implements a defined interface — it receives the request (after pre-processing has run) and the set of available profiles, and returns the name of the selected profile.


Can we make the picker just one more request-processing plugin? The set of plugins can come from the configuration or the data layer.
If no profile-selection plugin is in the pre-processing list, a default profile (i.e., the first one configured) will run.
If one is configured, it writes its decision to cycle state...

Generally speaking, I actually think that representing every behavior as "just another request plugin" goes in the opposite direction (same for the model selector).

The framework should offer strong types, along with validation and defaults.

Plugins should represent customized logic that hooks into extension points.
Framework types should represent the built-in mechanisms and how plugins hook into them.

In that context, the profile picker should be a strongly typed framework component, and so should the model selector.

Adopting this viewpoint, the IPP flow should be, for example:

1. A pre-processing processor (of request processing plugins).
2. A PRE-MODEL-SELECTOR profile-specific request processor, starting with a profile selector followed by running profile-specific plugins (of request processing plugins).
3. A model selector processor, starting with a profile selector, selecting a selector profile, and running a filter-score-pick profile.
4. A POST-MODEL-SELECTOR profile-specific request processor, starting with a profile selector followed by running profile-specific plugins (of request processing plugins).

In this approach, the pre-processor, model-selector and profile-processor are 1st class citizens that run plugins.

We need to separate how the IPP framework consumes the various types of plugins from how they are described, configured, or mentioned in the configuration.

Actually, I think the IPP flow is different. It is as follows:

1. The pre-processing plugins run.
2. The profile picker picks a profile.
3. The request plugins of the profile are run. This includes the model selector, which is just another request plugin.

We need to extend things so that any plugin can reference other plugins and run them. The assistance from the framework is simply to order the instantiation of the plugins, prevent loops, etc.

ronenkat · 2026-05-17T14:16:46Z

+
+- Ability to define a set of pre-processing plugins with their order
+- Ability to define multiple named profiles, each with an ordered request chain and response chain
+- Ability to specify which plugin serves as the profile picker


Suggested change

- Ability to specify which plugin serves as the profile picker

- A profile picker plugin must be specified in the pre-processing plugins list in order to use a profile beyond the default one.

I disagree. The profile picker reference is a 1st class citizen in the configuration. If left out and there is only one profile in the configuration, a default single-profile-picker plugin is chosen, which simply always chooses the single profile in the configuration.
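
The defaulting rule described in this reply can be sketched in Go as follows. The `ProfilePicker` interface, `singleProfilePicker`, and `resolvePicker` are hypothetical names. Treating "no picker and more than one profile" as a configuration error is an assumption of this sketch; the thread does not say what should happen in that case.

```go
package main

import (
	"errors"
	"fmt"
)

// ProfilePicker is a hypothetical picker interface that returns one of the
// available profile names.
type ProfilePicker interface {
	Pick(available []string) (string, error)
}

// singleProfilePicker always returns the one configured profile.
type singleProfilePicker struct{}

func (singleProfilePicker) Pick(available []string) (string, error) {
	if len(available) != 1 {
		return "", errors.New("single-profile picker requires exactly one profile")
	}
	return available[0], nil
}

// resolvePicker applies the defaulting rule: an explicitly configured picker
// wins; with no picker and exactly one profile, fall back to the
// single-profile picker; anything else is rejected (an assumption here).
func resolvePicker(configured ProfilePicker, profiles []string) (ProfilePicker, error) {
	if configured != nil {
		return configured, nil
	}
	if len(profiles) == 1 {
		return singleProfilePicker{}, nil
	}
	return nil, errors.New("a profile picker must be configured unless exactly one profile exists")
}

func main() {
	profiles := []string{"default"}
	picker, err := resolvePicker(nil, profiles)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	name, err := picker.Pick(profiles)
	fmt.Println(name, err)
}
```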

ronenkat · 2026-05-17T14:17:42Z

+
+## Open Questions
+
+1. **Should response plugins also be profile-specific, or always shared?** The current design makes them profile-specific. An alternative is to have shared post-processing (like shared pre-processing) that always runs regardless of profile. This could be useful for plugins that should run for every response — metrics collection, security validation, audit logging. A shared post-processing stage would mirror pre-processing and ensure these concerns are never accidentally omitted from a profile.


we should start simple...

shmuelk · 2026-05-18T18:08:43Z

+### Stage 1: Pre-Processing
+
+A set of shared plugins that always run for every request, regardless of which profile is selected. These plugins enrich the request with information that the profile picker needs to make its decision.
+


Can the pre-processing plugins change the request that will eventually be sent out?

I think the intention was yes, e.g., adding a header with the model (a header change).

That assumes this is a plugin that is always needed.
Otherwise, the alternative is to specify that plugin multiple times, in all profiles.

davidbreitgand · 2026-05-19T12:42:54Z

+
+**Scenario 3: Priority routing**
+A high-priority request (indicated by a header) should use a different model selection strategy than a standard request — perhaps a quality-optimized selector instead of a cost-optimized one.
+


Just for completeness:

Suggested change

**Scenario 4: Batch routing**

A latency-insensitive request (indicated by a header) should use a cost-optimized selector instead of the latency-optimized one.

davidbreitgand · 2026-05-19T12:43:42Z

+The user sends `{"model": "auto"}`. The IPP needs to run a model selector (Filter → Score → Pick) to choose the best model, then proceed with provider resolution and credential injection.
+
+**Scenario 3: Priority routing**
+A high-priority request (indicated by a header) should use a different model selection strategy than a standard request — perhaps a quality-optimized selector instead of a cost-optimized one.


Suggested change

A high-priority request (indicated by a header) should use a different model selection strategy than a standard request — perhaps a quality-optimized selector instead of a cost-optimized one.

A high-priority request (indicated by a header) should use a different model selection strategy than a standard request — perhaps a latency-optimized selector instead of a cost-optimized one.

davidbreitgand · 2026-05-19T13:43:34Z

+- The execution order between model selection and other plugins is explicit — you can see exactly where in the chain model selection happens
+- Profiles without model selection simply don't include a model selector plugin
+
+The model selector can itself support multiple internal profiles with its own profile selection (same recursive pattern), but that is internal to the model selector and transparent to the IPP profile system.


@noyitz, just for my understanding: does this imply that the internal model selector profile can override the initial profile picker, so that these profiles could potentially push in different directions? Which scenarios benefit from the recursive pattern?

github-actions bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on May 13, 2026

noyitz added 6 commits May 13, 2026 20:07

design: clarify data flow from scorers to picker

ed398ae

Explain how weighted scores accumulate across multiple scorers and how the picker uses those scores to make selection decisions. Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

Revert "design: clarify data flow from scorers to picker"

66fb2b7

This reverts commit ed398ae.

Revert "design: expand model selector integration section"

2ebf55c

This reverts commit 1dfe41a.

design: add runtime adjustability requirement for profile picker

f379b7d

The profile picker's decision logic should be changeable at runtime without restart — e.g., shifting traffic between profiles by updating config. Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

design: expand open questions on shared response post-processing

2407db8

Add security and audit logging as use cases for shared post-processing that should run for every response regardless of profile. Signed-off-by: Noy Itzikowitz <nitzikow@redhat.com>

noyitz mentioned this pull request May 15, 2026

feat: add model-selector RequestProcessor plugin #97

Draft

8 tasks

nirrozenbaum reviewed May 17, 2026

ronenkat reviewed May 17, 2026

shmuelk reviewed May 18, 2026

davidbreitgand reviewed May 19, 2026


