Goal
Introduce role separation (Spec / Code / Test) into the autonomous agent system to prevent self-consistent but incorrect changes (e.g. the agent modifying tests to fit an incorrect implementation).
This establishes a minimal "separation of concerns / checks and balances" model while keeping the system lean and iterative (no heavy spec framework).
Background / Problem
Currently, the agent:
- reads issues (spec)
- implements code
- modifies tests if needed
This leads to a critical failure mode:
The agent can silently "make the system green" by modifying tests instead of fixing the implementation.
This breaks the implicit assumptions that:
- tests are a stable source of truth
- acceptance criteria are invariant
Target Model
Introduce three logical roles:
1. Spec Layer (human + optional agent)
- defines intent (issue, acceptance criteria)
- is the source of truth for behavior
- may update tests intentionally
2. Coding Agent (existing system, modified)
- implements code to satisfy spec
- must NOT modify tests or acceptance criteria
3. Test / Evaluation Layer (initially implicit)
- validates that implementation satisfies spec
- ensures invariants are not broken
Core Change (Phase 1 – Minimal Implementation)
1. Protect Tests from Modification
Constraint:
Agent MUST NOT modify:
- existing test files
- acceptance criteria sections in issues
Implementation:
- Detect test files (e.g. by path: /Tests/, *.Tests.cs, etc.)
- Exclude them from the agent's writable file set
- OR fail execution if the resulting diff includes test changes
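A minimal sketch of the "fail if diff includes test changes" variant, assuming the agent runner can supply the list of changed paths (for example from `git diff --name-only`). The names `is_test_file`, `find_test_changes`, `assert_no_test_changes`, and `ProtectedFileError` are hypothetical. The patterns only mirror the examples given above and would need adjusting to the real repository layout.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed conventions, taken from the examples in this issue; not exhaustive.
TEST_DIR_NAMES = {"Tests"}
TEST_FILE_PATTERNS = ["*.Tests.cs"]


class ProtectedFileError(RuntimeError):
    """Raised when a change set touches files the coding agent must not edit."""


def is_test_file(path: str) -> bool:
    """Return True if the path sits in a test directory or matches a test file pattern."""
    p = PurePosixPath(path.replace("\\", "/"))
    in_test_dir = any(part in TEST_DIR_NAMES for part in p.parts[:-1])
    matches_name = any(fnmatchcase(p.name, pat) for pat in TEST_FILE_PATTERNS)
    return in_test_dir or matches_name


def find_test_changes(changed_paths: list[str]) -> list[str]:
    """Return the subset of changed paths that look like test files."""
    return [p for p in changed_paths if is_test_file(p)]


def assert_no_test_changes(changed_paths: list[str]) -> None:
    """Fail the run if the diff includes any protected test file."""
    offending = find_test_changes(changed_paths)
    if offending:
        raise ProtectedFileError(
            "Diff modifies protected test files: " + ", ".join(offending)
        )
```

The same `is_test_file` predicate could also be used up front to filter the writable file set, so both enforcement options share one definition of "test file".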
2. Enforce Read-Only Acceptance Criteria
In issue parsing, sections like "## Success Criteria" and "## Acceptance Criteria" must be treated as immutable constraints.
Agent may:
- read them
- reason about them
Agent must NOT:
- rewrite or reinterpret them in output
- "optimize" them away
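One possible way to check this mechanically is to fingerprint the protected sections before the agent runs and compare afterwards. The sketch below assumes the issue body is Markdown with ATX headings ("## ...") and that any following heading ends a section. `extract_protected_sections` and `fingerprint` are hypothetical helper names.

```python
import hashlib
import re

# Assumed heading titles, matched case-insensitively.
PROTECTED_HEADINGS = ("success criteria", "acceptance criteria")
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def extract_protected_sections(issue_body: str) -> dict[str, str]:
    """Collect the text under each protected heading.

    Simplification: any heading, at any level, ends the current section.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in issue_body.splitlines():
        match = HEADING_RE.match(line)
        if match:
            title = match.group(2).strip().lower()
            current = title if title in PROTECTED_HEADINGS else None
            if current is not None:
                sections[current] = []
            continue
        if current is not None:
            sections[current].append(line)
    return {title: "\n".join(lines).strip() for title, lines in sections.items()}


def fingerprint(issue_body: str) -> str:
    """Hash only the protected sections, so edits elsewhere do not change the result."""
    sections = extract_protected_sections(issue_body)
    canonical = "\n".join(f"{title}\n{sections[title]}" for title in sorted(sections))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

If the fingerprint taken before the run differs from the one taken after, the run has changed a protected section and can be rejected.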
3. Fail Fast on Spec Violations
If the agent cannot satisfy the tests without modifying them, it should:
- stop execution
- create a PR with explanation OR comment on issue
Example message:
"Cannot satisfy existing tests without modifying them. Spec or implementation assumptions may be inconsistent."
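As a rough sketch, the stop condition could be a small decision function run after each attempt. It assumes the runner already knows whether the test suite passed and which test files the attempt tried to touch. `RunResult`, `Decision`, `Outcome`, and `decide` are hypothetical names, and treating any failing run as a spec conflict is a simplification (a real runner would probably retry first).

```python
from dataclasses import dataclass
from enum import Enum

SPEC_CONFLICT_MESSAGE = (
    "Cannot satisfy existing tests without modifying them. "
    "Spec or implementation assumptions may be inconsistent."
)


class Outcome(Enum):
    PROCEED = "proceed"
    STOP_AND_REPORT = "stop_and_report"


@dataclass(frozen=True)
class RunResult:
    tests_passed: bool
    touched_test_files: tuple[str, ...] = ()


@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    message: str = ""


def decide(result: RunResult) -> Decision:
    """Stop and report when tests fail or the attempt tried to edit test files."""
    if result.touched_test_files:
        files = ", ".join(result.touched_test_files)
        return Decision(
            Outcome.STOP_AND_REPORT,
            f"{SPEC_CONFLICT_MESSAGE} Attempted test changes: {files}",
        )
    if not result.tests_passed:
        return Decision(Outcome.STOP_AND_REPORT, SPEC_CONFLICT_MESSAGE)
    return Decision(Outcome.PROCEED)
```

The `message` would then be posted as the PR description or issue comment, whichever reporting channel is chosen.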
Optional (Phase 2 – Lightweight Evaluator)
Introduce a simple evaluation step. After code generation, compare:
- acceptance criteria
- test coverage
- implementation
Basic checks:
- Are all acceptance criteria referenced in changes?
- Are new behaviors untested?
This can initially be:
- a post-step prompt
- not a separate agent yet
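To illustrate the post-step prompt option, the sketch below assembles a review prompt from the three inputs being compared. `build_evaluation_prompt` and its parameters are hypothetical, and the wording of the prompt is only a starting point. How the prompt is sent to a model is left to the existing agent system.

```python
def build_evaluation_prompt(
    acceptance_criteria: list[str],
    diff: str,
    test_names: list[str],
) -> str:
    """Build a review prompt covering the two basic checks from this issue."""
    criteria = "\n".join(
        f"{i}. {text}" for i, text in enumerate(acceptance_criteria, start=1)
    )
    tests = "\n".join(f"- {name}" for name in test_names) or "- (none)"
    return (
        "You are reviewing a change. "
        "Do not propose edits to tests or acceptance criteria.\n\n"
        f"Acceptance criteria:\n{criteria}\n\n"
        f"Existing tests:\n{tests}\n\n"
        f"Diff:\n{diff}\n\n"
        "Answer:\n"
        "1. Is every acceptance criterion addressed by the diff? List any that are not.\n"
        "2. Does the diff introduce behavior that no listed test covers?\n"
    )
```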
Non-Goals
- No full spec-driven framework (no spec-kit, no multi-file specs)
- No heavy workflow changes
- No branching per spec
- No "spec as source"
This remains:
lean, issue-driven, incremental development
Success Criteria
Future Extensions (not part of this issue)
- Separate evaluator agent
- Feature-level persistent specs
- Diff-based semantic validation (spec vs implementation)
- Agent-to-agent review loop
Rationale
This introduces controlled resistance into the system:
Instead of:
one agent defining, implementing, and validating truth
We move toward:
truth emerging from constraints and role separation
This is a minimal step toward a more robust autonomous development system without adding heavy process overhead.