
feat(deploy): record generating model(s) per cell in run manifest #1

Open
JoshuaBearup wants to merge 1 commit into orlyjamie:main from JoshuaBearup:feat/record-generator-model

@JoshuaBearup


What

Records which model(s) generated each cell in a run, alongside the existing record of which model attacked it.

  • Adds a generator_models column to runs/<id>/manifest.csv (e.g. claude-haiku-4-5-20251001:3|claude-opus-4-8:1, i.e. model:call-count entries separated by |).
  • Attaches a generation block (the existing getUsageReport() output: token usage and cost by model) to the per-cell local manifest.
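A minimal sketch of the generator_models formatting, assuming a hypothetical byModel shape in which each model id maps to an object with a numeric calls field (the real getUsageReport() output may name or structure this differently). Entries are sorted by model id so the cell is stable across runs; that ordering is a choice of this sketch, not necessarily what the PR does.

```javascript
// Sketch only. ASSUMPTION: byModel maps a model id to an object with a numeric
// `calls` field; the real getUsageReport() output may use a different shape.
function formatGeneratorModels(byModel) {
  return Object.entries(byModel ?? {})
    // Skip models that were registered but never called.
    .filter(([, usage]) => usage != null && usage.calls > 0)
    // Sort by model id so the cell is stable across runs (a choice of this sketch).
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    // One "model:call-count" entry per model...
    .map(([model, usage]) => `${model}:${usage.calls}`)
    // ...joined with "|", which does not collide with the CSV comma delimiter.
    .join("|");
}

// Illustrative input only, not real run data.
const sampleByModel = {
  "claude-opus-4-8": { calls: 1 },
  "claude-haiku-4-5-20251001": { calls: 3 },
};
console.log(formatGeneratorModels(sampleByModel));
// claude-haiku-4-5-20251001:3|claude-opus-4-8:1
```

Using | inside each entry keeps the whole mix in a single CSV field without quoting, as long as model ids contain no commas, quotes or newlines.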

Why

A run already records the model under test (the attacker) but not which model generated each target cell. Generation mixes models (Haiku by default, Opus for the higher-quality steps requested via quality: true), so recording the generator mix per cell helps reproducibility and cost/quality analysis, and makes it explicit when a quality step silently ran on a different model than expected.
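To act on that last point, a consumer has to parse the column back. A sketch, assuming the model:call-count|... format described under What; parseGeneratorModels and missingExpectedModel are hypothetical helpers, and splitting each entry on its last colon is a defensive choice of this sketch.

```javascript
// Sketch only: parse a generator_models cell such as
// "claude-haiku-4-5-20251001:3|claude-opus-4-8:1" into a Map of model -> calls.
function parseGeneratorModels(cell) {
  const mix = new Map();
  if (!cell) return mix; // empty cell: no generation recorded
  for (const entry of cell.split("|")) {
    // Split on the LAST colon, so the count is always the final segment.
    const sep = entry.lastIndexOf(":");
    const countText = sep > 0 ? entry.slice(sep + 1) : "";
    if (!/^\d+$/.test(countText)) {
      throw new Error(`malformed generator_models entry: ${entry}`);
    }
    const model = entry.slice(0, sep);
    mix.set(model, (mix.get(model) ?? 0) + Number(countText));
  }
  return mix;
}

// HYPOTHETICAL check: true when the cell's generator mix never used the model
// that the quality: true steps were expected to run on.
function missingExpectedModel(cell, expectedModel) {
  return !parseGeneratorModels(cell).has(expectedModel);
}

// Illustrative only: a cell generated entirely by Haiku gets flagged.
console.log(missingExpectedModel("claude-haiku-4-5-20251001:4", "claude-opus-4-8")); // true
```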

Design notes

  • Local manifest only, not baked into the container. The generating model is metadata the model under test must not see, and in the RCE/LFI classes the attacker can read the baked manifest. This mirrors how solvabilityProof is attached post-deploy (generator/deploy.mjs), so the generation block never reaches the container.
  • Additive. The new column is appended at the end of manifest.csv, so existing consumers that read the earlier columns by name or by position are unaffected.
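The attach-after-deploy ordering can be sketched as follows. deployCell and the injected bake callback are hypothetical stand-ins for whatever generator/deploy.mjs actually does; the point is only that the baked copy is taken before the generation block exists, so nothing the model under test can read contains it.

```javascript
// HYPOTHETICAL sketch of the attach-after-bake ordering; not the real deploy.mjs.
// `bake` stands in for whatever writes the manifest into the container image.
function deployCell(cellManifest, usageReport, bake) {
  // Guard added by this sketch: the generation block must not exist yet,
  // otherwise it would be baked where RCE/LFI cells can read it.
  if ("generation" in cellManifest) {
    throw new Error("generation block present before baking");
  }
  // 1. Bake a deep copy first, so later changes to the local copy cannot reach it.
  bake(JSON.parse(JSON.stringify(cellManifest)));
  // 2. Only then attach the usage report to the host-side (local) manifest.
  return { ...cellManifest, generation: usageReport };
}

// Illustrative usage with an in-memory stand-in for the container.
let bakedCopy;
const localCopy = deployCell(
  { id: "cell-1" },
  { byModel: { "claude-opus-4-8": { calls: 1 } } },
  (m) => {
    bakedCopy = m;
  },
);
console.log("generation" in bakedCopy, "generation" in localCopy); // false true
```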

Testing

Ran node -c (syntax check only) on both changed files, confirmed that the header and data rows have matching column counts (14 and 14), and checked the generator_models formatting against a sample byModel. I did not run a full end-to-end deploy on this branch (no ANTHROPIC_API_KEY in my environment), but the change only persists getUsageReport() output that deploy.mjs already computes and logs via logUsage(); it adds no new generation behaviour.
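The header/row column-count check can be reproduced with a small script. This is a sketch assuming manifest.csv is RFC 4180-style CSV; fieldCountsPerRecord and checkColumnCounts are hypothetical helpers, not the check the author ran. The scanner handles quoted fields, doubled quotes and separators inside quotes, skips blank lines, and is not a full CSV parser.

```javascript
// Sketch only: count fields per record in CSV text, honouring quoted fields.
function fieldCountsPerRecord(csvText) {
  const counts = [];
  let fields = 1;
  let inQuotes = false;
  let recordHasContent = false;
  for (let i = 0; i < csvText.length; i++) {
    const ch = csvText[i];
    if (inQuotes) {
      if (ch === '"') {
        if (csvText[i + 1] === '"') i++; // doubled quote: literal quote inside the field
        else inQuotes = false; // closing quote
      }
    } else if (ch === '"') {
      inQuotes = true;
      recordHasContent = true;
    } else if (ch === ",") {
      fields++;
      recordHasContent = true;
    } else if (ch === "\n") {
      if (recordHasContent) counts.push(fields); // blank lines are skipped
      fields = 1;
      recordHasContent = false;
    } else if (ch !== "\r") {
      recordHasContent = true;
    }
  }
  if (recordHasContent) counts.push(fields); // last record without trailing newline
  return counts;
}

// Compare every data row's field count with the header's.
function checkColumnCounts(csvText) {
  const [headerCount, ...rowCounts] = fieldCountsPerRecord(csvText);
  const bad = rowCounts
    .map((n, i) => ({ row: i + 1, n }))
    .filter(({ n }) => n !== headerCount);
  return { headerCount, ok: bad.length === 0, bad };
}

// Illustrative sample, not a real manifest.
const sampleCsv = 'a,b,generator_models\n1,"x,y",m:1|n:2\n2,z,m:3\n';
console.log(checkColumnCounts(sampleCsv).ok); // true
```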

Adds a `generator_models` column to runs/<id>/manifest.csv and a
`generation` block (token usage by model, from the existing
getUsageReport()) to the per-cell local manifest, so a run records which
model(s) actually produced each deployed cell.

Local manifest only — deliberately NOT baked into the container. RCE/LFI
classes can read the baked manifest, and the generating model is metadata
the model under test must not see. Mirrors how solvabilityProof is
attached post-deploy.

Additive column appended at the end of the CSV, so existing manifest.csv
consumers are unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>