docs(phase3): README + reference docs + validation report #79

Merged: jramos merged 4 commits into main from docs/phase3-validation on Jun 2, 2026
jramos (Owner) commented Jun 2, 2026

Documentation and validation report for Phase 3 (system-prompt section evolution), following the Phase 2 sequencing where docs + report ship as a separate PR after the feature lands.

README

  • New "Evolve a system prompt section" Quick Start subsection (behavioral closed-loop validation, compound verdict, splice-and-restore, --apply, --baseline-override-file).
  • Phase table: Phase 3 → Validated, linked to the report PDF.

Reference docs (knowledge base)

Phase 3 entries parallel to the existing evolve_tool ones:

  • interfaces.md — full evolve_prompt_section CLI flag reference.
  • components.md — orchestrator + supporting modules + the shared validation-stack changes.
  • workflows.md — Workflow 12 (prompt-section deploy path).
  • architecture.md — prompts tier + HermesPromptSectionInstaller in the module graph.
  • codebase_info.md — prompts package + LOC; Tier 3 marked implemented.
  • data_models.md — prompt-section gate_decision.json shape and the fields it deliberately omits vs the paired-bootstrap path.
  • index.md — routing rows.

Validation report

  • generate_report.py gains an additive artifact_type == "prompt_section" branch: behavioral-only runs self-source from gate_decision.json (no metrics.json/run.log), and the _experiment/_results renderers lay out pass-rate / win-loss tables instead of bootstrap / knee-point / synthetic ones. The skill/tool report path is unchanged.
  • reports/phase3_prose.yaml + reports/phase3_validation_report.pdf, headlined by the adversarial-baseline run (67%→100% holdout pass-rate, 2 wins / 0 losses, section shrank 15.2%).
  • Honest framing, mirroring Phase 2's weakened-target headline: production MEMORY_GUIDANCE is saturated and correctly default-denies; the improvement is demonstrated on a deliberately-weakened baseline.
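As a rough illustration of how the headline figures in the list above relate to each other, here is a minimal sketch that derives pass-rates and a win/loss tally from paired per-example outcomes. The function name `summarize_paired_outcomes` and the six-example holdout are assumptions made for this example only; the actual holdout size and the report generator's internals are not stated in this PR.

```python
from typing import Dict, List, Union


def summarize_paired_outcomes(
    baseline: List[bool], candidate: List[bool]
) -> Dict[str, Union[int, float]]:
    """Summarize paired pass/fail results for the same holdout examples.

    Hypothetical helper: a "win" is an example the candidate passes and the
    baseline fails; a "loss" is the reverse.
    """
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("need two non-empty result lists of equal length")
    n = len(baseline)
    wins = sum(1 for b, c in zip(baseline, candidate) if c and not b)
    losses = sum(1 for b, c in zip(baseline, candidate) if b and not c)
    return {
        "baseline_pass_rate": sum(baseline) / n,
        "candidate_pass_rate": sum(candidate) / n,
        "wins": wins,
        "losses": losses,
    }


# Illustrative data only: six holdout examples, chosen because they are
# consistent with a 67% -> 100% pass-rate and 2 wins / 0 losses.
baseline_results = [True, True, True, True, False, False]
candidate_results = [True, True, True, True, True, True]
summary = summarize_paired_outcomes(baseline_results, candidate_results)
print(summary)
```

With these made-up inputs the baseline pass-rate is 4/6 (about 67%) and the candidate pass-rate is 6/6, with two wins and no losses.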

No feature code changed — the only code touch is the additive report-generator branch.
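The additive branch described above can be pictured with a minimal sketch like the following. It is not the code in generate_report.py: the function name `load_report_inputs` and the `layout` labels are hypothetical, and only the file names (gate_decision.json, metrics.json, run.log) and the `artifact_type == "prompt_section"` check come from the description.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_report_inputs(run_dir: Path, artifact_type: str) -> Dict[str, Any]:
    """Gather the inputs a report needs for one run (hypothetical helper)."""
    if artifact_type == "prompt_section":
        # Behavioral-only run: gate_decision.json is the only source,
        # so metrics.json and run.log are never read on this branch.
        gate = json.loads((run_dir / "gate_decision.json").read_text())
        return {"layout": "pass_rate_win_loss", "gate_decision": gate}

    # Pre-existing skill/tool path, left untouched by the new branch.
    metrics = json.loads((run_dir / "metrics.json").read_text())
    run_log = (run_dir / "run.log").read_text()
    return {"layout": "bootstrap", "metrics": metrics, "run_log": run_log}
```

Because the new case returns early, callers on the skill/tool path see the same behavior as before, which is the sense in which the change is additive.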

jramos added 4 commits June 2, 2026 09:04
Add an 'Evolve a system prompt section' Quick Start subsection (behavioral
closed-loop validation, compound verdict, splice-and-restore, --apply,
--baseline-override-file) and mark Phase 3 complete in the capabilities table.

…e base

components.md (orchestrator + supporting modules + shared validation changes),
workflows.md (Workflow 12: prompt-section deploy path), architecture.md (prompts
tier + HermesPromptSectionInstaller in the module graph), codebase_info.md
(prompts package + LOC + Tier 3 implemented), data_models.md (prompt-section
gate_decision shape + the fields it deliberately omits vs the paired-bootstrap
path), index.md (routing rows).

Add a prompt-section branch to generate_report.py (behavioral-only runs
self-source from gate_decision.json — no metrics.json/run.log; the _experiment
and _results renderers lay out pass-rate/win-loss tables instead of
bootstrap/knee/synthetic), author reports/phase3_prose.yaml, render
reports/phase3_validation_report.pdf from the adversarial-baseline headline run
(67%→100% holdout, 2W/0L, section shrank 15.2%), and link it from the README
phase table. The skill/tool report path is unchanged (additive artifact_type
branch).
jramos merged commit 418396a into main on Jun 2, 2026
4 checks passed
jramos deleted the docs/phase3-validation branch on June 2, 2026 at 16:10