docs(phase3): README + reference docs + validation report #79
Merged
Add an 'Evolve a system prompt section' Quick Start subsection (behavioral closed-loop validation, compound verdict, splice-and-restore, --apply, --baseline-override-file) and mark Phase 3 complete in the capabilities table.
…e base components.md (orchestrator + supporting modules + shared validation changes), workflows.md (Workflow 12: prompt-section deploy path), architecture.md (prompts tier + HermesPromptSectionInstaller in the module graph), codebase_info.md (prompts package + LOC + Tier 3 implemented), data_models.md (prompt-section gate_decision shape + the fields it deliberately omits vs the paired-bootstrap path), index.md (routing rows).
Add a prompt-section branch to generate_report.py (behavioral-only runs self-source from gate_decision.json — no metrics.json/run.log; the _experiment and _results renderers lay out pass-rate/win-loss tables instead of bootstrap/knee/synthetic), author reports/phase3_prose.yaml, render reports/phase3_validation_report.pdf from the adversarial-baseline headline run (67%→100% holdout, 2W/0L, section shrank 15.2%), and link it from the README phase table. The skill/tool report path is unchanged (additive artifact_type branch).
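The pass-rate and win/loss figures in the headline run can be read as a tally over paired per-case outcomes. The sketch below is illustrative only: `PairedOutcome`, `summarize`, and `TOY_HOLDOUT` are hypothetical names, not taken from generate_report.py, and the six-case holdout is a toy set chosen because it reproduces the 67%→100%, 2W/0L shape. The real holdout size is not stated in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedOutcome:
    """One holdout case scored under both the baseline and the candidate section."""
    case_id: str
    baseline_passed: bool
    candidate_passed: bool


def summarize(outcomes: list[PairedOutcome]) -> dict:
    """Tally pass rates plus wins (candidate fixes a failure) and losses (candidate regresses)."""
    if not outcomes:
        raise ValueError("need at least one paired outcome")
    n = len(outcomes)
    return {
        "baseline_pass_rate": sum(o.baseline_passed for o in outcomes) / n,
        "candidate_pass_rate": sum(o.candidate_passed for o in outcomes) / n,
        "wins": sum(o.candidate_passed and not o.baseline_passed for o in outcomes),
        "losses": sum(o.baseline_passed and not o.candidate_passed for o in outcomes),
    }


# Toy data (assumption): six cases, two of which only the candidate passes.
TOY_HOLDOUT = [
    PairedOutcome("case-1", True, True),
    PairedOutcome("case-2", True, True),
    PairedOutcome("case-3", True, True),
    PairedOutcome("case-4", True, True),
    PairedOutcome("case-5", False, True),
    PairedOutcome("case-6", False, True),
]

if __name__ == "__main__":
    s = summarize(TOY_HOLDOUT)
    print(f"{s['baseline_pass_rate']:.0%} -> {s['candidate_pass_rate']:.0%}, "
          f"{s['wins']}W/{s['losses']}L")
```

Counting wins and losses per case, rather than only comparing the two rates, makes a regression visible even when the aggregate pass rate goes up.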
Documentation and validation report for Phase 3 (system-prompt section evolution), following the Phase 2 sequencing where docs + report ship as a separate PR after the feature lands.
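README
- New 'Evolve a system prompt section' Quick Start subsection (behavioral closed-loop validation, compound verdict, splice-and-restore, --apply, --baseline-override-file).
- Phase 3 marked complete in the capabilities table.

Reference docs (knowledge base)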
Phase 3 entries parallel to the existing evolve_tool ones:
- interfaces.md: full evolve_prompt_section CLI flag reference.
- components.md: orchestrator + supporting modules + the shared validation-stack changes.
- workflows.md: Workflow 12 (prompt-section deploy path).
- architecture.md: prompts tier + HermesPromptSectionInstaller in the module graph.
- codebase_info.md: prompts package + LOC; Tier 3 marked implemented.
- data_models.md: prompt-section gate_decision.json shape and the fields it deliberately omits vs the paired-bootstrap path.
- index.md: routing rows.

Validation report
- generate_report.py gains an additive artifact_type == "prompt_section" branch: behavioral-only runs self-source from gate_decision.json (no metrics.json / run.log), and the _experiment / _results renderers lay out pass-rate / win-loss tables instead of bootstrap / knee-point / synthetic ones. The skill/tool report path is unchanged.
- reports/phase3_prose.yaml + reports/phase3_validation_report.pdf, headlined by the adversarial-baseline run (67%→100% holdout pass-rate, 2 wins / 0 losses, section shrank 15.2%).
- MEMORY_GUIDANCE is saturated and correctly default-denies; the improvement is demonstrated on a deliberately-weakened baseline.

No feature code changed; the only code touch is the additive report-generator branch.
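For reviewers who want a mental model of the additive branch, here is a minimal sketch of how a report generator might choose its inputs by artifact type. It is not the code in generate_report.py: `load_report_inputs` is a hypothetical helper, and reading `artifact_type` from gate_decision.json (with a fallback of "skill") is an assumption made for illustration. Only the file names and the table families come from this PR.

```python
import json
from pathlib import Path


def load_report_inputs(run_dir: Path) -> dict:
    """Pick report inputs by artifact type (hypothetical helper, not the real API)."""
    gate = json.loads((run_dir / "gate_decision.json").read_text())
    # Assumption: the artifact type is recorded in gate_decision.json.
    artifact_type = gate.get("artifact_type", "skill")

    if artifact_type == "prompt_section":
        # Behavioral-only run: everything comes from gate_decision.json,
        # so metrics.json and run.log are never opened.
        return {
            "artifact_type": artifact_type,
            "gate_decision": gate,
            "tables": ["pass_rate", "win_loss"],
        }

    # Existing skill/tool path, left as it was: needs metrics.json and run.log.
    return {
        "artifact_type": artifact_type,
        "gate_decision": gate,
        "metrics": json.loads((run_dir / "metrics.json").read_text()),
        "run_log": (run_dir / "run.log").read_text(),
        "tables": ["bootstrap", "knee_point", "synthetic"],
    }
```

Keeping the new case as an early return leaves the skill/tool path byte-for-byte untouched, which is what makes the change additive.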