Generation pipeline: speed + quality ideas that preserve unpredictability #6

These are a few generation-pipeline ideas that aim to improve build speed and quality **without** reducing the randomness that makes the targets contamination-resistant. PR #5 already implements the first one; the rest are written up here for discussion rather than as code.

## Guiding idea: separate the entropy budget from the LLM budget

The unpredictability that makes a cell hard to memorise can come from cheap, seeded, logged random draws; the LLM is best reserved for the parts that need natural-language realism (theme copy, scenario prose). Keeping those two budgets separate makes generation faster and cheaper, lets a specific cell be reproduced for debugging, and even lets you *increase* entropy without spending tokens. Most of the ideas below are facets of this.

## 1. Parallelise the independent asset steps — done in #5

The chrome, decoys, and 404 steps depend only on (theme, scenario) and not on each other, so they can be generated concurrently. See PR #5.
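For readers who have not opened the PR, the shape of the idea can be sketched as below. This is only an illustration, not necessarily how PR #5 does it: the three `generate_*` coroutines are hypothetical stand-ins for the real asset steps, and `asyncio.gather` is just one way to run them concurrently.

```python
import asyncio


# Hypothetical stand-ins for the real asset steps. Each one depends only on
# (theme, scenario) and never on another asset's output.
async def generate_chrome(theme: str, scenario: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for an LLM call
    return f"chrome[{theme}/{scenario}]"


async def generate_decoys(theme: str, scenario: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for an LLM call
    return f"decoys[{theme}/{scenario}]"


async def generate_404(theme: str, scenario: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for an LLM call
    return f"404[{theme}/{scenario}]"


async def generate_assets(theme: str, scenario: str) -> dict:
    # The three steps are independent, so they can be awaited together.
    # gather() returns results in the order the awaitables were passed.
    chrome, decoys, not_found = await asyncio.gather(
        generate_chrome(theme, scenario),
        generate_decoys(theme, scenario),
        generate_404(theme, scenario),
    )
    return {"chrome": chrome, "decoys": decoys, "404": not_found}


if __name__ == "__main__":
    print(asyncio.run(generate_assets("example-theme", "example-scenario")))
```

Wall-clock time for the three steps then approaches that of the slowest one rather than the sum, and nothing about what each step draws or generates changes.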

## 2. Route by step criticality, not a blanket quality flag

The scenario step is the only schema-strict, retry-heavy one; everything else (theme, chrome, 404) is cosmetic. Routing *scenario* to the strong model and the cosmetic steps to the fast model gets the quality where it matters without paying for it everywhere. (This would also be a natural place to retire the hardcoded `QUALITY_MODEL = 'claude-opus-4-7'`.)
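A minimal sketch of what per-step routing could look like. The model identifiers are placeholders, and `STEP_CRITICALITY` and `model_for_step` are hypothetical names rather than anything in the codebase. Sending unknown steps to the strong model is an assumption made here to fail safe; the opposite default would be equally easy to express.

```python
# Placeholder identifiers; the real values would come from configuration.
STRONG_MODEL = "strong-model-id"
FAST_MODEL = "fast-model-id"

# Criticality per pipeline step, following the split described above:
# only the scenario step is schema-strict and retry-heavy.
STEP_CRITICALITY = {
    "scenario": "critical",
    "theme": "cosmetic",
    "chrome": "cosmetic",
    "404": "cosmetic",
}


def model_for_step(step: str) -> str:
    """Pick a model from the step's criticality instead of a global flag."""
    # Assumption: a step missing from the table is treated as critical.
    criticality = STEP_CRITICALITY.get(step, "critical")
    return STRONG_MODEL if criticality == "critical" else FAST_MODEL


if __name__ == "__main__":
    for step in ("scenario", "theme", "chrome", "404"):
        print(step, "->", model_for_step(step))
```

With a table like this, the strong model's identifier lives in one place and the blanket quality flag is no longer needed.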

## 3. On a schema-bound violation, reprompt — don't clamp

When a generated scenario draw violates the schema's array bounds, the safest-looking fix is to clamp the arrays to fit. But clamping piles probability mass onto the bound values and quietly narrows the distribution, which spends exactly the unpredictability you want to keep. Rejection sampling (reprompt and redraw) preserves the shape of the distribution inside the valid range. A clamp/repair pass is fine as a *last-resort* fallback after N reprompts, but reprompting should be the first move.
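The control flow could be sketched roughly as follows. `draw_with_reprompt` is a hypothetical helper, `draw` stands in for "prompt the model and parse the array", and the bounds are illustrative. The clamp here only truncates an over-long array; an array that is still too short after all reprompts is raised as an error, because padding it would mean inventing content.

```python
from typing import Callable, List, Tuple


def draw_with_reprompt(
    draw: Callable[[], List],
    min_items: int,
    max_items: int,
    max_reprompts: int = 3,
) -> Tuple[List, str]:
    """Redraw until the array fits the bounds; clamp only as a last resort.

    Returns (items, outcome), where outcome is "accepted" or "clamped" so
    that clamped cells can be counted and inspected later.
    """
    items: List = []
    # One initial draw plus up to max_reprompts redraws.
    for _ in range(1 + max_reprompts):
        items = draw()
        if min_items <= len(items) <= max_items:
            return items, "accepted"
    # Last-resort repair: truncate the final over-long draw.
    if len(items) > max_items:
        return items[:max_items], "clamped"
    raise ValueError("draw is below the minimum length and cannot be clamped")


if __name__ == "__main__":
    # Fake draws of length 7, 1, then 3 against bounds [2, 5].
    lengths = iter([7, 1, 3])
    print(draw_with_reprompt(lambda: list(range(next(lengths))), 2, 5))
```

Logging the outcome alongside the cell makes it possible to check how often the fallback fires; if "clamped" is common, the prompt or the schema bounds probably need attention rather than more clamping.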

## 4. Partial regeneration on validation failure

When a solvability, discovery, or negative-control check fails post-deploy, regenerating the whole scenario throws away the parts that were fine. Feeding the specific failure back and regenerating only the broken element should recover faster, with no quality cost.
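One way the repair loop could look, as a sketch under stated assumptions: a scenario is modelled as a flat dict of named elements, `validate` returns a list of `(element, message)` failures, and `regenerate_element` stands in for a targeted LLM call that receives the failure message. None of these names come from the codebase.

```python
from typing import Callable, Dict, List, Tuple

Scenario = Dict[str, str]
Failure = Tuple[str, str]  # (element name, failure message)


def repair_scenario(
    scenario: Scenario,
    validate: Callable[[Scenario], List[Failure]],
    regenerate_element: Callable[[str, str, Scenario], str],
    max_rounds: int = 3,
) -> Scenario:
    """Regenerate only the elements that validation reports as broken."""
    current = dict(scenario)  # leave the caller's scenario untouched
    for _ in range(max_rounds):
        failures = validate(current)
        if not failures:
            return current
        for element, message in failures:
            # The failure message is fed back so the redraw can avoid it.
            current[element] = regenerate_element(element, message, current)
    if validate(current):
        raise RuntimeError("scenario still failing after partial regeneration")
    return current


if __name__ == "__main__":
    def validate(s: Scenario) -> List[Failure]:
        return [("hint", "not discoverable")] if s["hint"] == "broken" else []

    def regenerate(element: str, message: str, s: Scenario) -> str:
        return "fixed"

    print(repair_scenario({"goal": "ok", "hint": "broken"}, validate, regenerate))
```

Elements that passed validation are carried over unchanged, so only the broken element costs another call. A full regeneration can remain the fallback when the repair loop raises.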

## 5. Seeded-but-logged RNG

Recording the per-deploy random seed (in the local, non-baked manifest) keeps every deploy fully random while letting a researcher reproduce a specific cell exactly for debugging. Zero entropy cost.
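A sketch of the mechanism, assuming a JSON manifest file and hypothetical helper names (`new_deploy_rng`, `replay_rng`). The seed is drawn with `secrets.randbits`, so each deploy stays unpredictable, and every later draw goes through a `random.Random` instance built from that seed. A real manifest would probably hold other fields too, so the write would merge rather than overwrite.

```python
import json
import random
import secrets
import tempfile
from pathlib import Path
from typing import Optional


def new_deploy_rng(manifest_path: Path, seed: Optional[int] = None) -> random.Random:
    """Draw a fresh seed (unless one is given), log it, and return an RNG."""
    if seed is None:
        seed = secrets.randbits(64)  # fresh entropy for every deploy
    # Assumption: the manifest is local only and never baked into the target.
    manifest_path.write_text(json.dumps({"seed": seed}))
    return random.Random(seed)


def replay_rng(manifest_path: Path) -> random.Random:
    """Rebuild the same RNG stream from a previously logged seed."""
    seed = json.loads(manifest_path.read_text())["seed"]
    return random.Random(seed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "manifest.json"
        rng = new_deploy_rng(manifest)
        original = [rng.randint(0, 999) for _ in range(5)]
        replayed_rng = replay_rng(manifest)
        replayed = [replayed_rng.randint(0, 999) for _ in range(5)]
        print(original == replayed)
```

This replays the seeded draws only. LLM outputs are not covered by the seed, so reproducing a cell exactly would also need those responses cached or otherwise pinned.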

Happy to turn any of 2–5 into PRs if the direction is welcome.

