Problem
The Evolve agent must not make tangential or “helpful” changes that the user did not ask for. This is broader than preserving Nix formatting: if the user asks for one change, the resulting diff should stay scoped to that prompt.
This preserves the distinct acceptance criterion from ENG-210 while moving it under the broader Evolve quality eval suite.
Acceptance Criteria
- Representative eval cases include prompts where the correct behavior is a tightly scoped diff.
- The eval output classifies unrelated edits / prompt-scope violations separately from formatting, protected-file, and normal correctness failures.
- Launch-blocking threshold: zero severe cases where a simple user prompt produces unrelated package, shell, config, or structural changes.
- A post-evolution validation approach is specified: a prompt-level constraint, deterministic diff-vs-prompt checks, a reviewer rubric, or a combination of these.
- Any implementation path keeps the distinction clear between:
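As an illustration of the deterministic diff-vs-prompt check named in the criteria above, here is a minimal sketch. It assumes each eval case carries a list of glob patterns for the files the prompt may legitimately touch. The names `EvalCase`, `allowed_paths`, `out_of_scope_files`, and the label `prompt_scope_violation` are hypothetical and not part of the existing Evolve eval suite.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval case: a prompt plus the paths it is allowed to change."""
    prompt: str
    allowed_paths: Tuple[str, ...]  # glob patterns, e.g. ("flake.nix",)


def changed_files(unified_diff: str) -> List[str]:
    """Collect post-change paths from '+++ b/<path>' headers in a unified diff.

    Sketch-level parsing: deleted files ('+++ /dev/null') are not reported,
    and an added content line that itself begins with '++ b/' would be
    misread as a header.
    """
    prefix = "+++ b/"
    return [
        line[len(prefix):]
        for line in unified_diff.splitlines()
        if line.startswith(prefix)
    ]


def out_of_scope_files(case: EvalCase, unified_diff: str) -> List[str]:
    """Return changed files that match none of the case's allowed patterns."""
    return [
        path
        for path in changed_files(unified_diff)
        if not any(fnmatch(path, pattern) for pattern in case.allowed_paths)
    ]


def classify(case: EvalCase, unified_diff: str) -> str:
    """Label scope violations separately from other failure classes."""
    if out_of_scope_files(case, unified_diff):
        return "prompt_scope_violation"
    return "in_scope"
```

A file-level check like this only catches edits to files outside the expected set. Unrelated changes inside an allowed file would still need the reviewer rubric or a finer-grained hunk check.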
Source
Supersedes ENG-210's unique diff-scope requirement without keeping the old “Trust & Safety” framing.