docs(phase3): README + reference docs + validation report by jramos · Pull Request #79 · jramos/agent-self-evolution

jramos · 2026-06-02T16:03:09Z

Documentation and validation report for Phase 3 (system-prompt section evolution), following the Phase 2 sequencing where docs + report ship as a separate PR after the feature lands.

README

New "Evolve a system prompt section" Quick Start subsection (behavioral closed-loop validation, compound verdict, splice-and-restore, --apply, --baseline-override-file).
Phase table: Phase 3 → Validated, linked to the report PDF.
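
As a rough illustration of the splice-and-restore idea named above, the sketch below swaps a candidate section into a prompt file for the duration of a validation run and restores the original afterwards. The helper name `spliced_section`, the plain string replacement, and the single-file layout are assumptions made for this example and are not taken from the project's code.

```python
from contextlib import contextmanager
from pathlib import Path
import tempfile


@contextmanager
def spliced_section(prompt_path: Path, baseline: str, candidate: str):
    """Temporarily replace `baseline` with `candidate` inside a prompt file.

    Hypothetical helper: the real project may locate sections differently.
    """
    original = prompt_path.read_text(encoding="utf-8")
    if original.count(baseline) != 1:
        raise ValueError("baseline section must appear exactly once")
    prompt_path.write_text(original.replace(baseline, candidate), encoding="utf-8")
    try:
        yield prompt_path
    finally:
        # Restore the original text even if the validation step raises.
        prompt_path.write_text(original, encoding="utf-8")


def demo() -> tuple[str, str]:
    """Splice a section into a throwaway file and report its content during and after."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "prompt.txt"
        path.write_text("intro\nOLD SECTION\noutro\n", encoding="utf-8")
        with spliced_section(path, "OLD SECTION", "NEW SECTION"):
            during = path.read_text(encoding="utf-8")
        after = path.read_text(encoding="utf-8")
        return during, after
```

Putting the restore in a `finally` block means the baseline text comes back even when the validation step fails, which is the property the term "splice-and-restore" suggests.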

Reference docs (knowledge base)

Phase 3 entries parallel to the existing evolve_tool ones:

interfaces.md — full evolve_prompt_section CLI flag reference.
components.md — orchestrator + supporting modules + the shared validation-stack changes.
workflows.md — Workflow 12 (prompt-section deploy path).
architecture.md — prompts tier + HermesPromptSectionInstaller in the module graph.
codebase_info.md — prompts package + LOC; Tier 3 marked implemented.
data_models.md — prompt-section gate_decision.json shape and the fields it deliberately omits vs the paired-bootstrap path.
index.md — routing rows.

Validation report

generate_report.py gains an additive artifact_type == "prompt_section" branch. Behavioral-only runs self-source from gate_decision.json (no metrics.json or run.log), and the _experiment and _results renderers lay out pass-rate and win-loss tables instead of the bootstrap, knee-point, and synthetic ones. The skill/tool report path is unchanged.
reports/phase3_prose.yaml + reports/phase3_validation_report.pdf, headlined by the adversarial-baseline run (67%→100% holdout pass-rate, 2 wins / 0 losses, section shrank 15.2%).
Honest framing, mirroring Phase 2's weakened-target headline: production MEMORY_GUIDANCE is saturated and correctly default-denies; the improvement is demonstrated on a deliberately weakened baseline.
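
To make the report changes above concrete, here is a minimal sketch of an additive branch on the artifact type, together with a pass-rate / win-loss summary. It assumes per-case pass/fail results are available as paired booleans. The function names, the dictionary layout, and the sample data are hypothetical: they are not generate_report.py's actual code, and the sample is not the holdout set from the headline run.

```python
import json
import tempfile
from pathlib import Path


def load_report_inputs(run_dir: Path, artifact_type: str) -> dict:
    """Pick input files by artifact type (hypothetical layout)."""
    gate = json.loads((run_dir / "gate_decision.json").read_text(encoding="utf-8"))
    if artifact_type == "prompt_section":
        # Behavioral-only run: gate_decision.json is the only source.
        return {"gate": gate}
    # Existing skill/tool path: metrics.json is read as well.
    metrics = json.loads((run_dir / "metrics.json").read_text(encoding="utf-8"))
    return {"gate": gate, "metrics": metrics}


def pass_rate_summary(baseline: list[bool], candidate: list[bool]) -> dict:
    """Compare paired per-case outcomes for a baseline and a candidate."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("need two non-empty result lists of equal length")
    pairs = list(zip(baseline, candidate))
    return {
        "baseline_pass_rate": sum(baseline) / len(baseline),
        "candidate_pass_rate": sum(candidate) / len(candidate),
        # A win is a case the candidate passes and the baseline fails.
        "wins": sum(c and not b for b, c in pairs),
        # A loss is a case the baseline passes and the candidate fails.
        "losses": sum(b and not c for b, c in pairs),
    }


def render_table(summary: dict) -> str:
    """Lay the summary out as a small fixed-width text table."""
    rows = [
        ("baseline pass rate", f"{summary['baseline_pass_rate']:.0%}"),
        ("candidate pass rate", f"{summary['candidate_pass_rate']:.0%}"),
        ("wins / losses", f"{summary['wins']} / {summary['losses']}"),
    ]
    return "\n".join(f"{label:<20} {value:>6}" for label, value in rows)


# Illustrative data only: four hypothetical holdout cases.
example = pass_rate_summary([True, False, True, True], [True, True, True, True])
```

Keeping the new case as an early return leaves the pre-existing path untouched, which is one way to read "additive" in the description above.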

No feature code changed — the only code touch is the additive report-generator branch.

Add an 'Evolve a system prompt section' Quick Start subsection (behavioral closed-loop validation, compound verdict, splice-and-restore, --apply, --baseline-override-file) and mark Phase 3 complete in the capabilities table.

…e base components.md (orchestrator + supporting modules + shared validation changes), workflows.md (Workflow 12: prompt-section deploy path), architecture.md (prompts tier + HermesPromptSectionInstaller in the module graph), codebase_info.md (prompts package + LOC + Tier 3 implemented), data_models.md (prompt-section gate_decision shape + the fields it deliberately omits vs the paired-bootstrap path), index.md (routing rows).

Add a prompt-section branch to generate_report.py (behavioral-only runs self-source from gate_decision.json — no metrics.json/run.log; the _experiment and _results renderers lay out pass-rate/win-loss tables instead of bootstrap/knee/synthetic), author reports/phase3_prose.yaml, render reports/phase3_validation_report.pdf from the adversarial-baseline headline run (67%→100% holdout, 2W/0L, section shrank 15.2%), and link it from the README phase table. The skill/tool report path is unchanged (additive artifact_type branch).

jramos added 4 commits June 2, 2026 09:04

docs(readme): document Phase 3 prompt-section evolution

d32e997

docs(interfaces): add evolve_prompt_section CLI reference

ef318c0

jramos merged commit 418396a into main Jun 2, 2026
4 checks passed

jramos deleted the docs/phase3-validation branch June 2, 2026 16:10

jramos merged 4 commits into main from docs/phase3-validation
