
Role separation spec/code/test #2

@aignermax

Description

Goal

Introduce role separation (Spec / Code / Test) into the autonomous agent system to prevent self-consistent but incorrect changes (e.g. the agent modifying tests to fit an incorrect implementation).

This establishes a minimal "separation of concerns / checks and balances" model while keeping the system lean and iterative (no heavy spec framework).


Background / Problem

Currently, the agent:

  • reads issues (spec)
  • implements code
  • modifies tests if needed

This leads to a critical failure mode:

The agent can silently "make the system green" by modifying tests instead of fixing the implementation.

This breaks the implicit assumption that:

  • tests are a stable source of truth
  • acceptance criteria are invariant

Target Model

Introduce three logical roles:

1. Spec Layer (human + optional agent)

  • defines intent (issue, acceptance criteria)
  • is the source of truth for behavior
  • may update tests intentionally

2. Coding Agent (existing system, modified)

  • implements code to satisfy spec
  • must NOT modify tests or acceptance criteria

3. Test / Evaluation Layer (initially implicit)

  • validates that implementation satisfies spec
  • ensures invariants are not broken

Core Change (Phase 1 – Minimal Implementation)

1. Protect Tests from Modification

Constraint:

  • Agent MUST NOT modify:

    • existing test files
    • acceptance criteria sections in issues

Implementation:

  • Detect test files (e.g. by path: /Tests/, *.Tests.cs, etc.)
  • Exclude them from writable file set
  • OR fail execution if diff includes test changes
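The detection-and-exclusion step above could be sketched as follows. This is a minimal illustration in Python (the project itself appears to be .NET, given `*.Tests.cs`); the patterns and function names are assumptions to be adapted to the actual repo layout and agent pipeline:

```python
import re

# Hypothetical test-file patterns for this repo; extend for the real layout.
TEST_PATTERNS = [
    re.compile(r"(^|/)Tests(/|$)"),   # e.g. src/Tests/FooTests.cs
    re.compile(r"\.Tests\.cs$"),      # e.g. Foo.Tests.cs
]

def is_test_file(path: str) -> bool:
    """Return True if the path matches any known test-file pattern."""
    return any(p.search(path) for p in TEST_PATTERNS)

def writable_files(all_files):
    """Variant A: exclude test files from the agent's writable file set."""
    return [f for f in all_files if not is_test_file(f)]

def assert_no_test_changes(changed_files):
    """Variant B: fail execution if the produced diff touches any test file."""
    violations = [f for f in changed_files if is_test_file(f)]
    if violations:
        raise RuntimeError(f"Agent modified protected test files: {violations}")
```

Variant A prevents the violation up front; Variant B catches it after the fact. Either one alone satisfies the constraint, but running both gives defense in depth.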

2. Enforce Read-Only Acceptance Criteria

During issue parsing, sections such as:

  • ## Success Criteria
  • ## Acceptance Criteria

must be treated as immutable constraints.

Agent may:

  • read them
  • reason about them

Agent must NOT:

  • rewrite or reinterpret them in output
  • "optimize" them away
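One way to enforce this mechanically is to compare the protected sections byte-for-byte between the original issue and whatever issue text the agent emits. A minimal sketch, assuming markdown issues with `##` headings (section extraction via regex; the function names are illustrative):

```python
import re

PROTECTED_HEADINGS = ("## Success Criteria", "## Acceptance Criteria")

def extract_section(markdown: str, heading: str):
    """Return the body of a '## ...' section, up to the next '## ' heading."""
    pattern = re.compile(re.escape(heading) + r"\n(.*?)(?=\n## |\Z)", re.DOTALL)
    match = pattern.search(markdown)
    return match.group(1).strip() if match else None

def criteria_unchanged(original_issue: str, agent_output: str) -> bool:
    """Check that every protected section survives exactly as written."""
    for heading in PROTECTED_HEADINGS:
        before = extract_section(original_issue, heading)
        after = extract_section(agent_output, heading)
        if before is not None and before != after:
            return False
    return True
```

An exact string comparison is deliberately strict: it rejects "harmless" rewording too, which is the point — reinterpretation is exactly the failure mode being blocked.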

3. Fail Fast on Spec Violations

If the agent cannot satisfy the existing tests without modifying them:

→ It should:

  • stop execution
  • create a PR with explanation OR comment on issue

Example message:

"Cannot satisfy existing tests without modifying them. Spec or implementation assumptions may be inconsistent."
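The fail-fast control flow could look like the sketch below. All three hooks (`generate_diff`, `is_test_file`, `report`) are hypothetical stand-ins for the existing system's diff generation, test detection, and PR/issue-comment reporting:

```python
FAILURE_MESSAGE = (
    "Cannot satisfy existing tests without modifying them. "
    "Spec or implementation assumptions may be inconsistent."
)

def run_agent_step(generate_diff, is_test_file, report):
    """Run one agent iteration with the fail-fast guard.

    generate_diff: returns (path, patch) pairs (hypothetical hook).
    is_test_file:  the detection predicate from the test-protection step.
    report:        posts a message, e.g. as a PR body or issue comment.
    """
    diff = list(generate_diff())
    touched_tests = [path for path, _ in diff if is_test_file(path)]
    if touched_tests:
        report(FAILURE_MESSAGE)
        return None   # stop execution; nothing is applied
    return diff       # safe to apply
```

Returning `None` (instead of raising) keeps the orchestrator in control of whether a PR or a comment is the right reporting channel.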


Optional (Phase 2 – Lightweight Evaluator)

Introduce a simple evaluation step:

After code generation:

  • compare:

    • acceptance criteria
    • test coverage
    • implementation

Basic checks:

  • Are all acceptance criteria referenced in changes?
  • Are new behaviors untested?

This can initially be:

  • a post-step prompt
  • not a separate agent yet
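Since Phase 2 is "a post-step prompt, not a separate agent," the evaluation step can be as simple as assembling a follow-up prompt for the same model. A sketch, with wording that is purely illustrative:

```python
def build_evaluation_prompt(acceptance_criteria, diff_text):
    """Assemble the Phase 2 post-step evaluation prompt.

    The same model that generated the code answers these questions in a
    follow-up turn; no separate evaluator agent is involved yet.
    """
    criteria = "\n".join(f"- {c}" for c in acceptance_criteria)
    return (
        "Review the change below against the acceptance criteria.\n"
        "Answer two questions:\n"
        "1. Is every acceptance criterion addressed by the change?\n"
        "2. Does the change introduce behavior not covered by any test?\n\n"
        f"Acceptance criteria:\n{criteria}\n\n"
        f"Diff:\n{diff_text}\n"
    )
```

Because this is advisory, the answers feed warnings into the PR description rather than blocking execution; promotion to a hard gate (or a separate agent) is left for the future extensions below.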

Non-Goals

  • No full spec-driven framework (no spec-kit, no multi-file specs)
  • No heavy workflow changes
  • No branching per spec
  • No "spec as source"

This remains:

lean, issue-driven, incremental development


Success Criteria

  • Agent does NOT modify any test files
  • Agent fails or reports instead of rewriting tests
  • Acceptance criteria are preserved exactly as written
  • Existing issues still execute successfully
  • At least one case observed where agent previously modified tests → now correctly fails

Future Extensions (not part of this issue)

  • Separate evaluator agent
  • Feature-level persistent specs
  • Diff-based semantic validation (spec vs implementation)
  • Agent-to-agent review loop

Rationale

This introduces controlled resistance into the system:

Instead of:

one agent defining, implementing, and validating truth

We move toward:

truth emerging from constraints and role separation

This is a minimal step toward a more robust autonomous development system without adding heavy process overhead.
