docs(plans): preview-model A/B experiment system (opt-in, partner trace export)#3247

Draft
markijbema wants to merge 2 commits into main from mark/experimental-models-plans

@markijbema
Contributor

Summary

Adds two planning documents under .plans/ for an upcoming preview/experimental model A/B testing system, in partnership with model providers.

  • experimental-models-1.md — Core A/B experiment system: schema, gateway routing, variant picker, admin tRPC + UI, telemetry, API key handling.
  • experimental-models-2.md — Partner trace export & replay roadmap (depends on Part 1).

Scope is intentionally narrow:

  • This is only for unreleased / preview model checkpoints we are evaluating with provider partners. It is not a general traffic-splitting mechanism for production models.
  • It is opt-in only: the public model ids under experiment are dedicated preview ids (e.g. kilo/preview-experiment-foo) that a user must explicitly select. They are excluded from kilo-auto candidate sets, and traffic is never silently routed to them on a user's behalf. Production model traffic is never bucketed.
  • Variant assignment is server-side and blinded from the client; partner exports in Part 2 cover only opt-in preview traffic.
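
To make the scope constraints above concrete for reviewers, here is a minimal sketch of an opt-in gate plus a deterministic server-side variant picker. It is not taken from the plan files: every name (`isExperimentEligible`, `pickVariant`, `routeRequest`, the `Variant` shape), the `kilo/preview-` prefix check, the integer weights, and the FNV-1a hash are assumptions chosen for illustration only.

```typescript
// Hypothetical types and names; the plans may define these differently.
type Variant = { id: string; weight: number }; // weight: assumed non-negative integer

// Assumed prefix, inferred only from the example id kilo/preview-experiment-foo.
const PREVIEW_PREFIX = "kilo/preview-";

// Opt-in gate: bucket only when the user explicitly selected a dedicated preview id.
function isExperimentEligible(requestedModelId: string, explicitlySelected: boolean): boolean {
  return explicitlySelected && requestedModelId.startsWith(PREVIEW_PREFIX);
}

// FNV-1a 32-bit hash, used here only to get a stable bucket per (experiment, user).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Deterministic weighted pick: the same user always lands on the same variant.
function pickVariant(userId: string, experimentId: string, variants: Variant[]): Variant {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || total <= 0) {
    throw new Error("experiment has no weighted variants");
  }
  let point = fnv1a(`${experimentId}:${userId}`) % total;
  for (const v of variants) {
    if (point < v.weight) return v;
    point -= v.weight;
  }
  return variants[variants.length - 1];
}

// Blinding: the client-facing part carries only the public id;
// the chosen variant id stays in the server-side part.
function routeRequest(
  userId: string,
  requestedModelId: string,
  explicitlySelected: boolean,
  variants: Variant[],
): { client: { model: string }; server: { variantId: string | null } } {
  if (!isExperimentEligible(requestedModelId, explicitlySelected)) {
    return { client: { model: requestedModelId }, server: { variantId: null } };
  }
  const variant = pickVariant(userId, requestedModelId, variants);
  return { client: { model: requestedModelId }, server: { variantId: variant.id } };
}
```

In this sketch a request for a production id, or a preview id that was not explicitly selected, passes through with `variantId: null`, which mirrors the "production traffic is never bucketed" constraint. Hashing on the experiment and user id keeps assignment sticky without storing state; whether the real design uses sticky hashing or stored assignments is for the plan to decide.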

Both plan files now carry a scope banner at the top stating these constraints explicitly.

Verification

N/A — docs only.

Visual Changes

N/A.

Reviewer Notes

  • Plans only; no code changes. Looking for feedback on the design before implementation.
  • Part 1 is self-contained and the intended first PR. Part 2 is roadmap-level and depends on Part 1's schema.
  • Drafted as a PR specifically to make the opt-in / preview-only scope visible early; please flag if the framing needs to be sharper anywhere in the plans.

…ce export)

Adds two planning docs under .plans/ for an upcoming preview/experimental
model A/B testing system. Scope is intentionally limited to opt-in preview
model ids; production traffic is never bucketed and partner exports cover
opt-in preview traffic only.
These two plans live in .plans/ and need to be tracked alongside the PR
that introduces them; keep the rest of .plans/ ignored.