From d32e9976376ea4ce0a953bb2e17a9da12ac6d9fd Mon Sep 17 00:00:00 2001
From: Justin Ramos <justin.ramos@gmail.com>
Date: Tue, 2 Jun 2026 09:04:07 -0600
Subject: [PATCH 1/4] docs(readme): document Phase 3 prompt-section evolution

Add an 'Evolve a system prompt section' Quick Start subsection (behavioral
closed-loop validation, compound verdict, splice-and-restore, --apply,
--baseline-override-file) and mark Phase 3 complete in the capabilities table.
---
 README.md | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index e145a91..60386b8 100644
--- a/README.md
+++ b/README.md
@@ -169,6 +169,22 @@ The framework parses every `*_SCHEMA = {...}` and `*_SCHEMAS = [...]` declaratio
 
 With `--apply`, the evolved description is spliced into the source file's bytes at the original position — comments, formatting, and unrelated tools are untouched. Multi-line parenthesized concatenations collapse to a single triple-quoted string at the same indent.
 
+### Evolve a system prompt section
+
+For Hermes Agent, evolve a named section of the assembled system prompt — any top-level string constant in `agent/prompt_builder.py` (e.g. `MEMORY_GUIDANCE`, which governs when and what the agent saves to memory):
+
+```bash
+uv run python -m evolution.prompts.evolve_prompt_section \
+    --section MEMORY_GUIDANCE \
+    --hermes-repo /path/to/hermes-agent \
+    --tasks evolution/validation/suites/memory_guidance.jsonl \
+    --iterations 10
+```
+
+Unlike skill and tool evolution — where the deploy gate can lean on a synthetic LLM-judge signal — a prompt section is evaluated **purely behaviorally**: every candidate is spliced into the live `prompt_builder.py` and scored by running the real agent (`hermes -z`) against the task suite. The verdict is compound — Layer 1 checks whether the agent invoked the expected tool (e.g. `memory`), and Layer 2 runs an LLM judge over the saved content against each task's `expected_save_content` rubric. The candidate is spliced in only for the duration of the run; the file is restored byte-for-byte afterward (atomic backup + flock + checksum-drift detection, shared with the tool-description path).
+
+`--apply` writes the evolved section into `prompt_builder.py` in place; results land in `output/prompts/<section>/<timestamp>/`. PR automation (`--create-pr`) is not yet wired for prompt sections — use `--apply` plus a manual PR. To demonstrate the loop on an already-tuned section (which the saturation pre-flight will otherwise correctly default-deny as having no headroom), `--baseline-override-file` starts evolution from arbitrary text — e.g. a deliberately-weakened baseline that gives GEPA real failures to learn from.
+
 ### Mine real session history for evals
 
 For skill evolution:
@@ -331,7 +347,7 @@ Cost: each task is one `hermes -z` run (~$0.05–$0.50). The bundled `patch.json
 |-------|--------|--------|--------|
 | **Phase 1** | Skill files (SKILL.md) | DSPy + GEPA | ✅ [Validated](reports/phase1_validation_report.pdf) |
 | **Phase 2** | Tool descriptions + dual-signal deploy gate | DSPy + GEPA | ✅ [Validated](reports/phase2_validation_report.pdf) |
-| **Phase 3** | System prompt sections | DSPy + GEPA | 🔲 Planned |
+| **Phase 3** | System prompt sections | DSPy + GEPA | ✅ Complete |
 | **Phase 4** | Tool implementation code | Darwinian Evolver | 🔲 Planned |
 | **Phase 5** | Continuous improvement loop | Automated pipeline | 🔲 Planned |
 
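The compound verdict described in the "Evolve a system prompt section" subsection above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's code: `SketchTask`, `layer1_passes`, and `compound_verdict` are hypothetical names, the Layer 1 rule (every expected tool called, no forbidden tool called) is an assumed reading of the membership check, and the 0.7 threshold mirrors the documented `--layer2-threshold` default.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional

# Memory actions that carry a content payload, per the docs in this series.
SAVE_ACTIONS = {"add", "replace"}


@dataclass
class SketchTask:
    """Hypothetical task shape; the real Task class may differ."""
    expected_tools: set = field(default_factory=set)
    forbidden_tools: set = field(default_factory=set)
    expected_save_content: Optional[str] = None  # optional Layer 2 rubric


def layer1_passes(task, called_tools):
    # Assumed rule: all expected tools were called and no forbidden tool was.
    called = set(called_tools)
    return task.expected_tools <= called and not (task.forbidden_tools & called)


def compound_verdict(task, tool_calls, judge, threshold=0.7):
    """tool_calls: list of (tool_name, args_dict).
    judge(content, rubric) returns a float in [0, 1]."""
    if not layer1_passes(task, [name for name, _ in tool_calls]):
        return False  # Layer 1 failed, so the judge is never called
    if task.expected_save_content is None:
        return True  # no rubric declared: Layer 1 decides alone
    saves = [
        args.get("content", "")
        for name, args in tool_calls
        if name == "memory" and args.get("action") in SAVE_ACTIONS
    ]
    if not saves:
        return True  # nothing to judge is treated as a vacuous pass
    scores = [judge(content, task.expected_save_content) for content in saves]
    return mean(scores) >= threshold
```

A task that expects the `memory` tool and declares a rubric passes only when the agent called `memory` and the mean judge score over its `add`/`replace` calls reaches the threshold; a task without a rubric is decided by Layer 1 alone.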

From ef318c0797e60586484dedd851ba8176e8c9ba58 Mon Sep 17 00:00:00 2001
From: Justin Ramos <justin.ramos@gmail.com>
Date: Tue, 2 Jun 2026 09:05:52 -0600
Subject: [PATCH 2/4] docs(interfaces): add evolve_prompt_section CLI reference

---
 docs/interfaces.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/interfaces.md b/docs/interfaces.md
index 412291b..4476fd3 100644
--- a/docs/interfaces.md
+++ b/docs/interfaces.md
@@ -140,6 +140,46 @@ Evolves one tool's top-level `description` field inside an MCP-shape manifest. T
 - `sys.exit(1)` if the holdout split has fewer than `min_holdout_size` (default 10) examples.
 - Returns normally (rejection path) if static or growth-quality gate fails — `evolved_FAILED.json` + `gate_decision.json` are written.
 
+## CLI: `python -m evolution.prompts.evolve_prompt_section`
+
+Evolves one named section of an agent's system prompt: a top-level string constant in Hermes Agent's `agent/prompt_builder.py` (e.g. `MEMORY_GUIDANCE`). Unlike the skill and tool paths, evaluation is **purely behavioral**: no LLM judge scores synthetic examples, and the only judge (Layer 2, below) grades what the real agent actually saved. Every candidate is spliced into the live `prompt_builder.py` and scored by running the real agent (`hermes -z`) against the task suite, so the deploy gate is a `ClosedLoopValidator` run (pass-rate + win/loss), not a paired-bootstrap CI over judge scores.
+
+The verdict is **compound**: Layer 1 is the same `expected_tools` / `forbidden_tools` membership rule as the closed-loop tool path; Layer 2 is an LLM judge that scores each `memory(action=add|replace)` call's content against the task's `expected_save_content` rubric (only tasks that declare a rubric are Layer-2 judged). The candidate is spliced in for the duration of the run and the file is restored byte-for-byte afterward, reusing the tool-path backup + flock + checksum-drift machinery.
+
+### Required flags
+| Flag | Purpose |
+|---|---|
+| `--section <name>` | The `prompt_builder.py` top-level string constant to evolve (e.g. `MEMORY_GUIDANCE`). Dict-typed constants (e.g. `PLATFORM_HINTS`) are not supported. |
+| `--hermes-repo <path>` | Path to your hermes-agent checkout. `agent/prompt_builder.py` inside it is the splice/restore target. |
+| `--tasks <path>` | JSONL eval suite (e.g. `evolution/validation/suites/memory_guidance.jsonl`). Same task shape as the closed-loop tool suite, plus an optional `expected_save_content` rubric per task for Layer 2. Must contain ≥2 tasks (so the split yields a non-empty trainset and holdout). |
+
+### Optional flags
+| Flag | Default | Notes |
+|---|---|---|
+| `--iterations <int>` | `10` | GEPA `max_full_evals`. |
+| `--holdout-ratio <float>` | `0.5` | Fraction of tasks held out for the deploy gate. Clamped to keep both the trainset and holdout non-empty. |
+| `--seed <int>` | `42` | RNG seed for the train/holdout split and GEPA. |
+| `--max-growth <float>` | `0.2` | Section length budget, expressed as allowed growth over the baseline length as a fraction. The budget is surfaced to the `PromptSectionProposer` so candidates stay near the baseline length; set it higher when evolving from a short baseline that needs to grow. |
+| `--optimizer-model` / `--reflection-model` / `--eval-model <name>` | config default | Per-role LiteLLM model overrides; resolved like the other CLIs. `--eval-model` is the Layer 2 content judge. |
+| `--agent-model <name>` | config default | The model the `hermes -z` agent itself runs as. A deliberately weaker agent exposes more behavioral signal (a strong agent saturates the suite regardless of the prompt). LiteLLM provider prefixes are stripped before `hermes -m`. |
+| `--layer2-threshold <float>` | `0.7` | Minimum mean content-judge score for a save task to pass Layer 2. |
+| `--task-timeout-seconds <int>` | `120` | Per-task wall-clock cap for `hermes -z`. A task that times out abstains, so it does not tip the decision either way. |
+| `--max-cost-usd <float>` | `150.0` | Abort cleanly when cumulative **in-process** LM cost (judge + reflection + the passthrough predictor) exceeds this. The agent's own LM spend happens inside the `hermes` child process and is not captured by this ceiling. |
+| `--gepa-minibatch-size <int>` | `3` | GEPA reflective minibatch size; same meaning as the other paths. |
+| `--gepa-acceptance {improvement-or-equal,strict-improvement}` | `improvement-or-equal` | Same meaning as the other paths. |
+| `--apply` | off | On a deploy decision, write the evolved section into `prompt_builder.py` in place (byte-precise AST splice, `ast.parse`-guarded, atomic). |
+| `--create-pr` | off | **Deferred for prompt sections** — accepted and recorded as a `skipped` PR block in `gate_decision.json`, but no PR is opened (copying a full evolved `prompt_builder.py` over `origin/<base>` would carry unrelated local changes into the diff). Use `--apply` + a manual PR. |
+| `--baseline-override-file <path>` | off | Start evolution from this text instead of the live section. The live section is still the splice/restore target (backed up + restored); `--apply` still writes the evolved text. Use it to create headroom on an already-tuned section (e.g. a deliberately-weakened baseline) or for regression-injection ablations. |
+| `--skip-saturation-check` | off | Skip the saturation pre-flight entirely. |
+| `--force-saturation-check` | off | Run the pre-flight, render the panel, but proceed regardless of band — required to override a non-`healthy` verdict non-interactively. |
+| `--dry-run` | off | Resolve the baseline + build the modules, then stop — exercises wiring with no LM/agent calls. Writes a `decision="dry_run"` `gate_decision.json`. |
+| `--output-dir <path>` | `output/prompts/<section>/<timestamp>/` | Where `gate_decision.json` and the baseline/evolved section text files land. |
+
+### Exit conditions
+- `0` on a `deploy` decision (or a `--dry-run`).
+- `1` on `reject` (the holdout deploy gate found a regression), `denied` (saturated baseline default-denied non-interactively), or `aborted` (cost ceiling).
+- `ValueError` at startup if the suite has fewer than 2 tasks.
+
 ## CLI: `python -m evolution.core.external_importers`
 
 Standalone session-history importer. Useful for previewing what `--eval-source sessiondb` would produce without running the full evolution.
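
The split behavior implied by `--holdout-ratio`, `--seed`, and the two-task minimum in the `evolve_prompt_section` reference above might look like the sketch below. `sketch_split` is a hypothetical stand-in: the real helper's rounding and shuffle details are not documented here, so this only illustrates a seeded shuffle plus clamping that keeps both the trainset and the holdout non-empty.

```python
import random


def sketch_split(tasks, holdout_ratio=0.5, seed=42):
    """Hypothetical train/holdout split: seeded shuffle, then clamp."""
    if len(tasks) < 2:
        # Mirrors the documented startup error for suites with < 2 tasks.
        raise ValueError("suite must contain at least 2 tasks")
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_holdout = round(len(shuffled) * holdout_ratio)
    # Clamp so that at least one task lands on each side of the split.
    n_holdout = max(1, min(len(shuffled) - 1, n_holdout))
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

With the defaults, a ten-task suite yields five training tasks and five holdout tasks, and an extreme ratio such as `0.0` or `1.0` still leaves one task on the smaller side.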

From 04a3efb73efb95253ad9c7c1ba24d2a143031998 Mon Sep 17 00:00:00 2001
From: Justin Ramos <justin.ramos@gmail.com>
Date: Tue, 2 Jun 2026 09:13:25 -0600
Subject: [PATCH 3/4] docs(reference): add Phase 3 prompt-section evolution to
 the knowledge base

components.md (orchestrator + supporting modules + shared validation changes),
workflows.md (Workflow 12: prompt-section deploy path), architecture.md (prompts
tier + HermesPromptSectionInstaller in the module graph), codebase_info.md
(prompts package + LOC + Tier 3 implemented), data_models.md (prompt-section
gate_decision shape + the fields it deliberately omits vs the paired-bootstrap
path), index.md (routing rows).
---
 docs/architecture.md  |  32 ++++++++-
 docs/codebase_info.md |  30 +++++---
 docs/components.md    |  49 ++++++++++++-
 docs/data_models.md   |  73 ++++++++++++++++++++
 docs/index.md         |   3 +-
 docs/workflows.md     | 157 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 330 insertions(+), 14 deletions(-)

diff --git a/docs/architecture.md b/docs/architecture.md
index 772a94b..8e73e94 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -56,10 +56,20 @@ graph TB
         hermes_source[tools.hermes_source<br/>Hermes *_SCHEMA AST adapter]
     end
 
+    subgraph prompts_tier[Prompt Tier]
+        evolve_prompt[prompts.evolve_prompt_section<br/>main + evolve]
+        prompt_module[prompts.prompt_module<br/>PromptModule + sentinels]
+        prompt_proposer[prompts.prompt_proposer<br/>PromptSectionProposer]
+        prompt_judge[prompts.prompt_judge<br/>SaveCallJudge + judge_save_calls<br/>+ prompt fitness/splice scorer]
+        prompt_source[prompts.prompt_source<br/>PromptSource protocol + SectionDescriptor]
+        hermes_prompt_source[prompts.hermes_prompt_source<br/>HermesPromptSource — prompt_builder.py AST]
+    end
+
     subgraph validation_subsystem[Closed-loop validation]
         validator[validation.validator<br/>ClosedLoopValidator]
         hermes_runner[validation.hermes_runner<br/>hermes -z subprocess]
-        installer[validation.artifact_installer<br/>HermesToolDescriptionInstaller]
+        installer[validation.artifact_installer<br/>HermesToolDescriptionInstaller +<br/>HermesPromptSectionInstaller]
+        savejudge[validation.report<br/>score_task Layer-2 judge hook]
         report[validation.report<br/>ValidationReport + decision]
         task[validation.task<br/>Task + TaskSuite]
         cl_cli[validation.closed_loop<br/>CLI]
@@ -117,10 +127,26 @@ graph TB
     tool_judge --> fitness
     tool_proposer --> budget
 
+    evolve_prompt --> prompt_module
+    evolve_prompt --> prompt_proposer
+    evolve_prompt --> prompt_judge
+    evolve_prompt --> prompt_source
+    evolve_prompt --> hermes_prompt_source
+    evolve_prompt --> config
+    evolve_prompt --> quality
+    evolve_prompt --> timing
+    evolve_prompt --> validator
+    hermes_prompt_source --> prompt_source
+    prompt_module --> dspy
+    prompt_proposer --> budget
+    prompt_judge --> fitness
+    installer --> hermes_prompt_source
+
     validator --> hermes_runner
     validator --> installer
     validator --> report
     validator --> task
+    validator --> savejudge
     cl_cli --> validator
     hermes_runner --> hermes
 
@@ -138,7 +164,9 @@ graph TB
     importers --> dataset
 ```
 
-`evolution/core/` has no dependency on `evolution/skills/`, `evolution/tools/`, or `evolution/validation/`. The reverse holds: tier packages use core helpers but core never imports from a tier package. `closed_loop_feedback.py` imports `evolution.validation.*` types because it's the integration seam, but the validation subpackage doesn't import from skills/tools. This keeps the tier-3/4/5 expansion path open.
+`evolution/core/` has no dependency on `evolution/skills/`, `evolution/tools/`, `evolution/prompts/`, or `evolution/validation/`, with one exception: `closed_loop_feedback.py` imports `evolution.validation.*` types because it is the integration seam. Tier packages use core helpers, but core never imports from a tier package, and the validation subpackage doesn't import from skills/tools/prompts. This keeps the tier-4/5 expansion path open.
+
+The `prompts` tier (Phase 3) is the prompt-section evolution path: `evolve_prompt_section` wraps a named `prompt_builder.py` constant as a `PromptModule` (a passthrough predictor carrying the candidate in sentinel-delimited instructions), mutates it with `PromptSectionProposer`, and — because there is no synthetic classification signal for a system-prompt section — scores **purely behaviorally** through the closed-loop validator running a real `hermes -z` against a curated JSONL suite. The deploy gate is therefore a closed-loop pass-rate / win-loss decision, not a paired-bootstrap one. Unlike the skill/tool tiers it reuses `ClosedLoopValidator` directly rather than going through `closed_loop_feedback.py`, and it integrates by AST-splicing the candidate into the live `agent/prompt_builder.py` (`HermesPromptSectionInstaller`) with atomic restore. The Layer-2 content judge (`SaveCallJudge` / `judge_save_calls`) runs inside `score_task` to grade memory-save *content* on top of the Layer-1 trigger-membership check.
 
 ## Design patterns in active use
 
diff --git a/docs/codebase_info.md b/docs/codebase_info.md
index 2900d46..c16decc 100644
--- a/docs/codebase_info.md
+++ b/docs/codebase_info.md
@@ -67,14 +67,20 @@ evolution/
 │   └── tool_judge.py                    # tool-flavored LLMJudge + GEPA-shaped metric
 ├── validation/                          # closed-loop validation against a real agent
 │   ├── agent_runner.py                  # AgentRunner Protocol + AgentRunResult dataclass
-│   ├── artifact_installer.py            # ArtifactInstaller Protocol + HermesToolDescriptionInstaller
+│   ├── artifact_installer.py            # ArtifactInstaller Protocol + HermesToolDescriptionInstaller + HermesPromptSectionInstaller
 │   ├── closed_loop.py                   # CLI: drive baseline + evolved through hermes -z, compare
-│   ├── hermes_runner.py                 # HermesAgentRunner — subprocess hermes -z with sandboxed HOME
-│   ├── report.py                        # ValidationReport + TaskResult + decision rule
-│   ├── suites/                          # JSONL task suites (patch.jsonl, write_file.jsonl, search_files.jsonl)
+│   ├── hermes_runner.py                 # HermesAgentRunner — subprocess hermes -z; reads sessions from SQLite state.db (parse_session_from_db)
+│   ├── report.py                        # ValidationReport + TaskResult + decision rule + Layer-2 SaveCallJudge in score_task
+│   ├── suites/                          # JSONL task suites (patch.jsonl, write_file.jsonl, search_files.jsonl, memory_guidance.jsonl)
 │   ├── task.py                          # Task + TaskSuite.from_jsonl (with sha256 audit)
 │   └── validator.py                     # ClosedLoopValidator.validate — mutates + restores live agent file
-├── prompts/                             # Tier 3: planned, empty package
+├── prompts/                             # Tier 3: system-prompt-section evolution
+│   ├── evolve_prompt_section.py         # CLI + orchestration; purely-behavioral closed-loop gate
+│   ├── prompt_source.py                 # PromptSource Protocol (read + write) + SectionDescriptor
+│   ├── hermes_prompt_source.py          # HermesPromptSource — AST read/write of prompt_builder.py constants
+│   ├── prompt_module.py                 # PromptModule — passthrough predictor carrying candidate in sentinels
+│   ├── prompt_proposer.py               # PromptSectionProposer — sentinel-preserving GEPA proposer
+│   └── prompt_judge.py                  # SaveCallJudge + judge_save_calls Layer-2 content judge + fitness/splice scorers
 ├── code/                                # Tier 4: planned, empty package
 └── monitor/                             # planned, empty package
 ```
@@ -86,6 +92,7 @@ evolution/
 | `evolution/skills/evolve_skill.py` | ~1340 | CLI, orchestration, gate-decision payload assembly |
 | `evolution/tools/evolve_tool.py` | ~1170 | CLI + orchestration for tool-description evolution |
 | `evolution/core/external_importers.py` | ~770 | 3 importers + relevance filter + standalone CLI |
+| `evolution/prompts/evolve_prompt_section.py` | ~660 | CLI + orchestration; purely-behavioral closed-loop deploy gate |
 | `evolution/core/dataset_builder.py` | ~480 | synthetic generator + golden loader + tool-selection three-bucket gen |
 | `evolution/core/lm_timing_callback.py` | ~400 | DSPy BaseCallback + litellm.failure_callback + cost ledger |
 | `evolution/core/fitness.py` | ~380 | LLMJudge + skill/tool fitness metrics + behavioral score helper |
@@ -94,6 +101,7 @@ evolution/
 | `evolution/core/closed_loop_feedback.py` | ~320 | cache + saturation gate + deterministic feedback block + `force_run` (bypasses gate for pre-flight) |
 | `evolution/core/saturation_check.py` | ~255 | pre-flight: band classifier + `SaturationReport` + Rich panel + interactive confirm |
 | `evolution/tools/tool_judge.py` | ~230 | tool-flavored judge + GEPA-shaped metric with behavioral branch |
+| `evolution/prompts/prompt_judge.py` | ~230 | SaveCallJudge + judge_save_calls Layer-2 content judge + prompt fitness/splice scorers |
 | `evolution/validation/validator.py` | ~220 | mutate + restore live agent file with flock + checksum drift check |
 | `evolution/validation/report.py` | ~225 | ValidationReport JSON + Rich rendering + two-condition decision |
 | `evolution/core/skill_sources.py` | ~210 | Hermes / Claude Code / LocalDir |
@@ -101,15 +109,19 @@ evolution/
 | `evolution/skills/knee_point.py` | ~205 | parsimony-based candidate picker |
 | `evolution/validation/hermes_runner.py` | ~205 | hermes -z subprocess with sandboxed HOME |
 | `evolution/tools/tool_proposer.py` | ~200 | sentinel-preserving reflection prompt |
-| `evolution/validation/artifact_installer.py` | ~150 | byte-precise splice + atomic restore |
+| `evolution/prompts/prompt_proposer.py` | ~160 | sentinel-preserving GEPA proposer for prompt sections |
+| `evolution/validation/artifact_installer.py` | ~150 | byte-precise splice + atomic restore (tool + prompt-section installers) |
+| `evolution/prompts/hermes_prompt_source.py` | ~135 | AST read/write of prompt_builder.py string constants |
+| `evolution/prompts/prompt_module.py` | ~120 | PromptModule passthrough predictor + sentinel parse |
 | `evolution/validation/closed_loop.py` | ~135 | standalone closed-loop CLI |
 | `evolution/skills/skill_module.py` | ~125 | wraps SKILL.md as `dspy.Module` |
 | `evolution/validation/task.py` | ~90 | Task + TaskSuite.from_jsonl |
 | `evolution/core/config.py` | ~80 | `EvolutionConfig` dataclass |
 | `evolution/core/stats.py` | ~60 | `paired_bootstrap` helper |
+| `evolution/prompts/prompt_source.py` | ~55 | PromptSource Protocol + SectionDescriptor |
 | `evolution/validation/agent_runner.py` | ~55 | AgentRunner Protocol + dataclasses |
 | `evolution/core/behavioral_example.py` | ~35 | builder for behavioral dspy.Examples |
-| **Total** | **~9,000** | excludes empty `__init__.py` shims |
+| **Total** | **~10,400** | excludes empty `__init__.py` shims |
 
 Test suite: 61 test files under `tests/core/`, `tests/skills/`, `tests/tools/`, `tests/validation/`. **1166 tests** collected.
 
@@ -139,11 +151,11 @@ The README's table summarizes intent; reality:
 |---|---|---|---|
 | 1 | Skill files (SKILL.md) | DSPy + GEPA | ✅ implemented in `evolution/skills/` |
 | 2 | Tool descriptions | DSPy + GEPA | ✅ implemented in `evolution/tools/` — MCP-JSON and Hermes-Python-AST adapters; one target tool per run |
-| 3 | System prompt sections | DSPy + GEPA | 🔲 `evolution/prompts/` package exists, empty |
+| 3 | System prompt sections | DSPy + GEPA | ✅ implemented in `evolution/prompts/` — AST splice of `prompt_builder.py` constants; purely-behavioral closed-loop deploy gate (no synthetic signal) |
 | 4 | Tool implementation code | Darwinian Evolver | 🔲 `evolution/code/` package exists, empty; `[darwinian]` extra reserves the dep |
 | 5 | Continuous improvement loop | Automated pipeline | 🔲 `evolution/monitor/` package exists, empty |
 
-Tiers 1 and 2 are built. Tier 3-5 packages exist as empty stubs to anchor the planned architecture. See PLAN.md's per-phase "Deviations from plan" subsections for where the built tiers diverge from the original spec.
+Tiers 1-3 are built. The Tier 4 and Tier 5 packages exist as empty stubs to anchor the planned architecture. See PLAN.md's per-phase "Deviations from plan" subsections for where the built tiers diverge from the original spec.
 
 **Orthogonal validation surface.** `evolution/validation/` runs a real agent (`hermes -z`) through a JSONL task suite with baseline vs evolved artifacts spliced into the live install. Scores actual tool-selection behavior with `expected_tools` / `forbidden_tools` per task; compares with a two-condition decision rule. Available three ways:
 
diff --git a/docs/components.md b/docs/components.md
index 8821142..2d3c85c 100644
--- a/docs/components.md
+++ b/docs/components.md
@@ -368,6 +368,51 @@ Score is **never** modified by `pred_trace` enrichment — GEPA enforces score e
 
 **Cost ceiling + benchmark hook (shared with `evolve_skill`):** `--max-total-cost-usd` participates in the same `CostLedger` kill switch (see `lm_timing_callback.py`); `--benchmark-cmd` is a post-gate shell hook whose env vars include `EVOLVED_PATH` / `BASELINE_PATH` pointing at the rendered manifest JSONs and `ARTIFACT_TYPE="tool_description"`. Both write structured blocks into `gate_decision.json` — see `data_models.md`.
 
+## evolution/prompts/evolve_prompt_section.py — CLI + orchestrator
+
+**Owns:** the end-to-end `evolve_prompt_section()` flow and the Click CLI (`main`) for evolving a named system-prompt section, i.e. a top-level string constant in Hermes `agent/prompt_builder.py` (e.g. `MEMORY_GUIDANCE`). It is the Phase 3 analogue of `evolve_tool`, but with a fundamentally different eval substrate: there is no cheap synthetic classification task for GEPA to score, so **every** candidate is spliced into the live `prompt_builder.py` and run through a real `hermes -z` subprocess. The deploy gate is therefore a `ClosedLoopValidator` win/loss decision, not a paired-bootstrap CI.
+
+**Public surface:**
+- `main()` — Click command. CLI flags map onto `evolve_prompt_section()` kwargs.
+- `evolve_prompt_section(section_name, hermes_repo, tasks_path, ...) -> dict` — orchestrator function. Importable and used directly by tests.
+
+**Integration model — in-place splice + atomic restore.** Unlike skills (which are evolved in a separate writable workdir), there is no env-var hook or plugin seam: the section is a constant inside Hermes' own source, so the framework edits that file in place and restores it. The saturation pre-flight and the GEPA loop run inside `_prompt_builder_guard(target_path)`, a context manager that takes an atomic `.cl_backup` (`_BACKUP_SUFFIX`), grabs an exclusive `fcntl.flock` on `.cl_validation.lock` (`_LOCK_FILENAME`) in the target's parent dir, and byte-restores the original on exit; it refuses to start on a stale backup or a held lock. These are the *same* lock + backup names `ClosedLoopValidator` uses, so the guard is sequenced *before* the deploy-gate validator and never nested around it. The deploy gate then re-acquires the lock itself.
+
+**Phases inside `evolve_prompt_section()`:**
+1. Resolve baseline: `HermesPromptSource.read(section_name)` validates the section is a top-level string constant, then reads its text — or `--baseline-override-file` supplies starting text (a deliberately-weakened baseline for headroom, or a regression ablation) while the *live* file is still backed up/restored and `--apply` still writes the live section.
+2. Train/holdout split of the JSONL suite (`_split_train_holdout`, deterministic shuffle+seed, ≥1 task each side; suites with <2 tasks are rejected).
+3. Build the eval stack: `SaveCallJudge` + a per-task Layer-2 factory (`_make_layer2_factory`, binds each task's `expected_save_content` rubric + message into a `score_task`-shaped scorer; returns `None` for tasks with no rubric) → `HermesPromptSectionInstaller` + `HermesAgentRunner` + a `make_memoizing_splice_scorer` over `install_candidate` / `score_task_id`, serialized under a `threading.Lock`.
+4. `dspy.configure(lm=eval_lm)` sets the **global** default LM (not just `dspy.context`) so the passthrough predictor resolves an LM inside GEPA's worker threads — without it, `forward()`'s passthrough call raises "No LM is loaded" in those threads, yielding no trajectories and no proposal.
+5. Inside `_prompt_builder_guard`: saturation pre-flight (baseline behavior on the holdout; aborts/denies on a non-`healthy` band unless `--force-saturation-check`, with non-interactive contexts refusing rather than prompting) followed by GEPA(`PromptModule`, `PromptSectionProposer`, `make_prompt_fitness_metric` + the memoizing splice scorer). Trainset/valset are `_behavioral_examples` (task message + `closed_loop_task_id`).
+6. Select the evolved section via GEPA val-argmax (`detailed_results.best_idx`), reading the body back out of the winning candidate's sentinel region (`_section_text_from_candidate`).
+7. Deploy gate: `ClosedLoopValidator.validate(...)` runs baseline vs evolved on the holdout suite (the same per-task Layer-2 factory + threshold threaded in). `report.decision == "pass"` is the deploy verdict.
+8. Write `gate_decision.json`; on a passing gate `--apply` writes the evolved section back into `prompt_builder.py`. `baseline_section.txt` / `evolved_section.txt` are also emitted.
+
+`_run_one_task_score` is the GEPA in-loop scorer: materialize the task fixture into a tmp dir, run the agent against whatever section is currently spliced, `score_task`, return 1.0/0.0 (in-loop abstentions score 0.0 — the deploy gate handles abstentions properly). Budget rides the shared `COST_LEDGER` + `CostCeilingExceeded` kill switch; the ceiling abort writes a `cost_ceiling_exceeded` gate decision.
+
+**`gate_decision.json` additions:** `artifact_type: "prompt_section"`, `target_section: <name>`, `baseline_chars` / `evolved_chars` / `growth_pct`, a `closed_loop` block (the validator decision + pass rates + W/L/T), and `sentinel_failures` (proposer candidates rejected for losing the sentinels). `decision_signal` is always `"closed_loop"`. `--create-pr` is **deferred** for prompt sections (it would pollute the diff with the local override-hook commit) and is recorded as `skipped`; use `--apply` + a manual PR.
+
+### Supporting modules (`evolution/prompts/`)
+
+- `prompt_source.py` — `PromptSource` Protocol (`read` + `write` only, `runtime_checkable`) + `SectionDescriptor` (frozen metadata). The Protocol is deliberately minimal — the driver only reads a baseline and writes/splices an evolved value. `list_sections` is a concrete convenience on `HermesPromptSource` (a future `--list-sections` affordance), not part of the contract.
+- `hermes_prompt_source.py` — `HermesPromptSource`, the splice primitive. `read` AST-walks top-level `NAME = "..."` string constants (v1 string-typed only; dict-typed constants like `PLATFORM_HINTS` raise `KeyError`). `write` splices by byte offset using `repr(new_text)` so the literal round-trips byte-equal regardless of embedded quotes/newlines, and `ast.parse`-guards the result before an atomic `os.replace` — it **refuses to write non-parseable Python**, leaving the user's Hermes startable.
+- `prompt_module.py` — `PromptModule(section_name, candidate_text)`: a `dspy.Module` whose `ChainOfThought` passthrough predictor carries the candidate in `signature.instructions` between sentinel markers (`<!-- SECTION:name -->` … `<!-- /SECTION:name -->`). There is no cheap classification to score, so the predictor exists only as a mutation target. `forward()` **must** invoke the passthrough so GEPA captures a trace for `passthrough.predict` — without a traced predictor call, `make_reflective_dataset` finds "no valid predictions" and never proposes a mutation. It returns a placeholder response with `_closed_loop_task_id` + `_candidate_text` attached for the behavioral metric. GEPA discovers the target via `named_predictors()` → `"passthrough.predict"`.
+- `prompt_proposer.py` — `PromptSectionProposer`, a sentinel-preserving GEPA `instruction_proposer` subclassing `BudgetAwareProposer` (inherits the char-budget infrastructure; see `budget_aware_proposer.py`). Runs the proposer LM, then passes the candidate through `extract_and_rebuild` so only the sentinel-delimited region survives. On a candidate that loses the sentinels it increments `sentinel_failures` and **re-raises** `SentinelParseError` rather than returning the parent unchanged — GEPA's reflective-mutation path skips the iteration instead of admitting a phantom identical-to-parent candidate into the selection pool.
+- `prompt_judge.py` —
+  - `SaveCallJudge` — LLM-as-judge scoring an individual memory-save's content against `MEMORY_GUIDANCE`'s rules (durable, declarative, fact-focused; not task progress / PR numbers / completed-work logs). Unparseable judge output falls back to a neutral 0.5 (logged so it's distinguishable from a real mediocre score).
+  - `judge_save_calls` — the Layer-2 aggregate. It judges only `SAVE_ACTIONS = {add, replace}` (the real Hermes `memory` tool actions that carry a `content` payload; `remove` is not a save), caps judged calls at `MAX_JUDGED_CALLS_PER_TASK = 5` (each save call beyond the cap scores 0), and returns a vacuous 1.0 when there are no save calls or no judge/rubric is configured.
+  - `make_prompt_fitness_metric` — the GEPA 5-arg metric. Routes purely behaviorally: a prediction missing `_closed_loop_task_id` is degenerate and scores 0 with a diagnostic; otherwise `closed_loop_scorer(task_id, candidate_text)` runs one closed-loop trial. Appends a `[BUDGET]` feedback line.
+  - `make_memoizing_splice_scorer` — builds `closed_loop_scorer(task_id, candidate_text)` that splices **only when `candidate_text` changes** (consecutive tasks for one candidate reuse the live splice). Serialized under a `threading.Lock` because `dspy.Evaluate` is multi-threaded but `prompt_builder.py` is one shared mutable file — behavioral scoring is therefore effectively serial, an accepted v1 cost of splice-and-restore. Backup/restore is the caller's job (the guard wraps the whole run).
+
+### Shared validation-stack changes that enable the prompt path
+
+These let the prompt path reuse `ClosedLoopValidator` unchanged (see the validation section below for the base machinery):
+
+- `HermesPromptSectionInstaller` (in `artifact_installer.py`) — implements the `ArtifactInstaller` Protocol. `target_path` = `agent/prompt_builder.py`; `install(text_file)` reads the candidate body and calls `HermesPromptSource.write`, returning the post-install `sha256`; `verify_backup` = `verify_python_parses`. Constraint: the section must be a top-level string constant.
+- `ClosedLoopValidator` gained an optional `layer2_judge_factory` (per-task — prompt-section judging needs the task's `expected_save_content` rubric + message, which a single global fn couldn't carry) plus a `layer2_threshold`. When unset, scoring is Layer 1 only and the tool-description path is unchanged.
+- `report.py`'s `score_task` gained the compound Layer 2: when a `layer2_judge_fn` is supplied a task passes only if Layer 1 (trigger membership) passes **and** the judge scores `>= layer2_threshold`. Layer 1 short-circuits — the judge is never called (no LLM cost) on a task that already failed the trigger test, and `test_command` mode ignores Layer 2. The judge receives the subset of `run.tool_calls_with_args` whose name is `memory`. `Task` gained `expected_save_content`; `AgentRunResult` gained `tool_calls_with_args`.
+- `hermes_runner.py` (shared change): reads agent sessions from the SQLite `state.db` (`parse_session_from_db`) since the current one-shot `hermes -z` is ephemeral and no longer writes `session_*.json`. A row whose `tool_calls` column won't parse as JSON aborts with an `error` result (the task **abstains**) rather than being silently read as "no tools."
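As a rough sketch of the compound verdict described for `score_task`: the helper below is hypothetical (its name, argument list, the `{"name": ...}` call shape, and the default threshold of 0.5 are assumptions for illustration), but it follows the stated ordering. Layer 1 is checked first, and the judge is only invoked when Layer 1 has already passed.

```python
from typing import Callable, Optional


def compound_pass(
    layer1_passed: bool,
    tool_calls: list[dict],
    judge_fn: Optional[Callable[[list[dict]], float]] = None,
    threshold: float = 0.5,  # assumed default, stands in for layer2_threshold
) -> bool:
    """Return True only if Layer 1 passes and, when a judge is set, Layer 2 clears the threshold."""
    if not layer1_passed:
        return False  # short-circuit: no judge call, no LLM cost
    if judge_fn is None:
        return True   # Layer 1 only, as on the tool-description path
    memory_calls = [c for c in tool_calls if c.get("name") == "memory"]
    return judge_fn(memory_calls) >= threshold
```

Keeping the judge behind the Layer 1 check means a suite with many trigger failures costs no more in judge calls than one with none.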
+
 ## evolution/validation/ — closed-loop validation against a real agent
 
 Drives an actual agent (`HermesAgentRunner` via `hermes -z`) through a small task suite with baseline and evolved artifacts, scores real tool-selection behavior, compares. Orthogonal to skills/tools/prompts/code — measures agent behavior, not artifact production.
@@ -388,6 +433,6 @@ Drives an actual agent (`HermesAgentRunner` via `hermes -z`) through a small tas
 
 **During-evolution integration.** Beyond the standalone CLI, the same `ClosedLoopValidator` powers `evolution/core/closed_loop_feedback.py`'s `ClosedLoopFeedbackCache`. The cache writes the candidate description into a tmp manifest JSON, calls `validator.validate(ValidationInputs(...))` with it as `evolved_artifact`, and caches the returned `ValidationReport` by candidate text. The cache surfaces verdicts to the metric two ways: as a deterministic feedback block on the reflection path (`feedback` mode), or as per-task `TaskResult.passed` reads via `get_task_verdict(candidate, task_id)` for the behavioral-example branch (`trainset` mode). The validator itself doesn't know about the cache; it always sees a `ValidationInputs` with two artifacts and produces a `ValidationReport`.
 
-## evolution/{prompts, code, monitor}/ — planned, empty
+## evolution/{code, monitor}/ — planned, empty
 
-These packages exist as empty stubs anchoring the planned tier-3/4/5 work. See `PLAN.md` for the design.
+These packages exist as empty stubs anchoring the planned tier-4/5 work. See `PLAN.md` for the design. (`prompts/` is now implemented — see the phase-3 section above.)
diff --git a/docs/data_models.md b/docs/data_models.md
index c2455d4..6c0c23e 100644
--- a/docs/data_models.md
+++ b/docs/data_models.md
@@ -555,6 +555,79 @@ Written by `evolution/core/quality_gate.py::append_cl_decision_fields` when the
 | `band_trigger_score` | `dict` | Pre-flight scores that decided whether CL-primary fired. Keys: `holdout` (`float \| None`), `closed_loop` (`float \| None`). |
 | `validator_agent_model` | `str` | The LiteLLM model id used for the closed-loop validator agent. Recorded so historical decisions stay analysable if the default changes. |
 
+### Prompt-section additions (`artifact_type == "prompt_section"`)
+
+Runs of `evolution.prompts.evolve_prompt_section` (Phase 3) write the same `schema_version` "5" envelope but a **deliberately different field set** from the skill/tool variant, because the deploy gate is a closed-loop pass-rate / win-loss decision, **not** a paired-bootstrap one. There is no synthetic classification signal for a system-prompt section — every candidate is scored behaviorally by a real `hermes -z` against a curated suite — so the bootstrap substrate doesn't apply.
+
+```json
+{
+  "schema_version": "5",
+  "artifact_type": "prompt_section",
+  "target_section": "MEMORY_GUIDANCE",
+  "decision": "deploy",                          // "deploy" | "reject" | "denied" | "dry_run" | "aborted"
+  "decision_signal": "closed_loop",              // always "closed_loop" on this path
+  "baseline_chars": 1840,
+  "evolved_chars": 2104,
+  "growth_pct": 0.143,                           // (evolved_chars - baseline_chars) / baseline_chars
+  "closed_loop": {
+    "decision": "pass",                          // "pass" | "regression" (ValidationReport.decision)
+    "decision_reasons": ["pass_rate 0.92 >= baseline 0.75", "n_wins 2 >= 2*n_losses 0"],
+    "baseline_pass_rate": 0.75,
+    "evolved_pass_rate": 0.92,
+    "n_wins": 2,
+    "n_losses": 0,
+    "n_ties": 10
+  },
+  "sentinel_failures": 1,                         // reflection-LM outputs the proposer rejected for breaking sentinel preservation
+  "elapsed_seconds": 412.6,
+  "cost": { /* same shape as cost_summary: total_usd + by_model */ },
+  "run_inputs": { /* seed, iterations, model versions, suite path/sha, validator_agent_model, ... */ },
+  "pr_created": { "status": "skipped", "reason": "prompt_section_pr_unsupported", "branch": null, "commit_sha": null, "url": null }
+}
+```
+
+**Fields this variant carries** (and the tool/skill variant does not, or differs on):
+
+| Field | Type | Notes |
+|---|---|---|
+| `artifact_type` | `"prompt_section"` | Disjoint from `"skill"` / `"tool_description"`. |
+| `target_section` | `str` | The `prompt_builder.py` constant whose text was evolved (e.g. `MEMORY_GUIDANCE`). |
+| `decision` | `"deploy" \| "reject" \| "denied" \| "dry_run" \| "aborted"` | `"denied"` lands on a saturation pre-flight default-deny; `"dry_run"` when the run was asked to evaluate without splicing; `"aborted"` on cost-ceiling / interrupt. |
+| `decision_signal` | `"closed_loop"` | Always `"closed_loop"` here — the synthetic value never appears on this path. |
+| `baseline_chars` / `evolved_chars` / `growth_pct` | int / int / float | Size telemetry; growth informs the closed-loop required-gain threshold but is not gated on a bootstrap. |
+| `closed_loop` | `dict` | `{decision, decision_reasons, baseline_pass_rate, evolved_pass_rate, n_wins, n_losses, n_ties}` — the deploy gate's primary evidence (sourced from `ValidationReport` over the behavioral suite). |
+| `sentinel_failures` | `int` | Count of reflection-LM proposals rejected for failing sentinel preservation (same meaning as the tool path). |
+| `elapsed_seconds` / `cost` | float / dict | Wall-clock + per-model cost ledger. |
+| `run_inputs` | `dict` | Reproduction inputs (seed, iterations, models, suite path + sha, `validator_agent_model`). |
+| `pr_created` | `dict` | Shape-stable with the skill/tool path, but the prompt-section path currently emits a `status: "skipped"` block (PR automation for in-place `prompt_builder.py` splices is not wired). |
+
+**Fields the prompt-section variant deliberately OMITS.** A reader or calibration script must not assume these are present — they exist only on the skill/tool (paired-bootstrap) path:
+
+- `bootstrap` — no per-example bootstrap CI; the gate is win-loss, not a resampled mean.
+- `avg_baseline` / `avg_evolved` — no synthetic holdout mean. The analogous numbers live inside `closed_loop` as `baseline_pass_rate` / `evolved_pass_rate`.
+- `dataset` — there is no synthetic eval dataset and no `dataset` block with per-source/per-category counts; the behavioral suite is the JSONL passed via `--tasks`. `run_inputs` records the run config (models, seed, iterations, holdout-ratio, `eval_source: "closed_loop"`), not the suite path or sha.
+- `knee_point` — Pareto knee-point selection over a synthetic valset doesn't apply; candidates are chosen on behavioral score.
+
+#### Saturation-denied variant (prompt section)
+
+When the saturation pre-flight default-denies (non-healthy band, non-interactive context, no `--force-saturation-check`), the prompt-section gate writes `decision: "denied"` and carries a `saturation_band` field naming the band that triggered the denial:
+
+```json
+{
+  "schema_version": "5",
+  "artifact_type": "prompt_section",
+  "target_section": "MEMORY_GUIDANCE",
+  "decision": "denied",
+  "decision_signal": "closed_loop",
+  "saturation_band": "no_headroom",              // "healthy" never lands here; one of no_headroom | weak_signal | uniform_failure
+  "baseline_chars": 1840,
+  "run_inputs": { /* ... */ },
+  "pr_created": { "status": "skipped", "reason": "prompt_section_pr_unsupported", "branch": null, "commit_sha": null, "url": null }
+}
+```
+
+`saturation_band` appears only on the `"denied"` decision (it records why the run never started); it is absent on `deploy` / `reject` / `dry_run` / `aborted`.
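A reader that consumes these files might look like the sketch below. It is an illustration only: `summarize_prompt_gate` is a hypothetical helper, and the set of keys it returns is arbitrary. The point it demonstrates is the one made above: use `.get` with defaults and never index `bootstrap`, `avg_baseline`, `dataset`, or `knee_point` on this variant.

```python
def summarize_prompt_gate(gate: dict) -> dict:
    """Hypothetical reader for a prompt-section gate_decision.json payload."""
    if gate.get("artifact_type") != "prompt_section":
        raise ValueError("not a prompt-section gate decision")
    # `closed_loop` is missing on the saturation-denied variant, so default to {}.
    cl = gate.get("closed_loop") or {}
    return {
        "section": gate.get("target_section"),
        "decision": gate.get("decision"),
        "baseline_pass_rate": cl.get("baseline_pass_rate"),
        "evolved_pass_rate": cl.get("evolved_pass_rate"),
        # Present only when decision == "denied".
        "saturation_band": gate.get("saturation_band"),
    }
```

For example, feeding it the saturation-denied payload yields `None` for both pass rates and `"no_headroom"` for the band, without raising on the missing `closed_loop` block.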
+
 ## metrics.json (deploy-only summary)
 
 Written to `output/<skill>/<timestamp>/metrics.json` only on deploy. Top-level summary for quick scanning:
diff --git a/docs/index.md b/docs/index.md
index 1d810c8..dfe14e0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -15,6 +15,7 @@ The codebase is mid-sized (~9K LOC of source + 61 test files / ~1166 tests) and
 | **What this project is** | `codebase_info.md` → `architecture.md` → repo-root `README.md` |
 | **How a skill run works end-to-end** | `workflows.md` (Workflow 1) → `architecture.md` (top-level flow) |
 | **How a tool-description run works end-to-end** | `workflows.md` (Workflow 9) → `components.md` (`evolve_tool.py`) |
+| **How a prompt-section run works end-to-end** | `workflows.md` (Workflow 12) → `components.md` (`evolve_prompt_section.py`) |
 | **What flag does X / how to run the CLI** | `interfaces.md` (CLI section) |
 | **Why the deploy gate rejected a run** | `data_models.md` (gate_decision.json) → `components.md` (`constraints.py`) |
 | **What's in `gate_decision.json` / `metrics.json`** | `data_models.md` (full schema with examples) |
@@ -53,7 +54,7 @@ The codebase is mid-sized (~9K LOC of source + 61 test files / ~1166 tests) and
 | [`components.md`](components.md) | Per-module reference: what each owns, public surface, load-bearing implementation notes |
 | [`interfaces.md`](interfaces.md) | CLIs (skill, tool, closed-loop, sessiondb importer), Python API, SkillSource + ToolSource Protocols, output artifacts, DSPy + litellm integration, test surfaces, env vars |
 | [`data_models.md`](data_models.md) | All dataclasses, on-disk formats, full `gate_decision.json` schema with worked examples, `ValidationReport` schema |
-| [`workflows.md`](workflows.md) | Step-by-step workflows with mermaid sequence diagrams: skill deploy path, reject paths, GEPA→MIPROv2 fallback, sessiondb mining, tool evolution, closed-loop validation, closed-loop signal during evolution |
+| [`workflows.md`](workflows.md) | Step-by-step workflows with mermaid sequence diagrams: skill deploy path, reject paths, GEPA→MIPROv2 fallback, sessiondb mining, tool evolution, closed-loop validation, closed-loop signal during evolution, prompt-section evolution |
 | [`dependencies.md`](dependencies.md) | Each external package — what it's used for, why it's pinned, what we don't depend on |
 | [`framework_advantages.md`](framework_advantages.md) | User-facing explainer of how this framework's selection layer, deploy gate, proposer, and composite fitness differ from raw DSPy + GEPA — and when raw GEPA is the right choice |
 
diff --git a/docs/workflows.md b/docs/workflows.md
index eb80148..e686908 100644
--- a/docs/workflows.md
+++ b/docs/workflows.md
@@ -545,6 +545,158 @@ When your daily-driver Hermes model is capable enough to solve every textbook bu
 
 Manual smoke harness: `tests/manual/skill_closed_loop_smoke.py` (supports `--suite {basic,advanced}`, `--agent-model MODEL`, `--task-timeout-seconds N`).
 
+## Workflow 12: Evolve a prompt section (deploy path)
+
+The prompt-section analog of Workflow 9 (tool descriptions), but **purely behavioral** end to end. There is no synthetic judge dataset and no paired-bootstrap gate: every candidate is spliced into the live `prompt_builder.py` and scored by a real `hermes -z` subprocess, and the deploy gate is a `ClosedLoopValidator` run. Three structural contrasts with the tool path:
+
+- **Integration is in-place splice-and-restore**, not an MCP manifest rewrite or a copied skill directory. The target is a single named string constant inside the user's `prompt_builder.py`; the harness backs it up byte-for-byte and restores it on exit.
+- **The deploy gate is closed-loop pass-rate / win-loss**, not a paired-bootstrap confidence interval. Decision = pass-rate no-regression + `n_wins >= 2 * n_losses` (the `ClosedLoopValidator.decide` rule), all behavioral.
+- **PR automation is deferred.** `--create-pr` is recorded as `skipped`; deploy means `--apply` writes the evolved section into `prompt_builder.py` in place, and the user opens a PR by hand.
+
+```bash
+python -m evolution.prompts.evolve_prompt_section \
+    --section MEMORY_GUIDANCE \
+    --hermes-repo ~/src/NousResearch/hermes-agent \
+    --tasks evolution/validation/suites/memory_guidance.jsonl \
+    --iterations 10 \
+    --apply
+```
+
+### Phase A — Setup: resolve baseline, split, build the behavioral harness
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant CLI as evolve_prompt_section
+    participant Src as HermesPromptSource
+    participant Suite as TaskSuite
+    participant Judge as SaveCallJudge
+    participant Inst as HermesPromptSectionInstaller
+    participant Run as HermesAgentRunner
+    participant V as ClosedLoopValidator
+
+    CLI->>Src: read(section_name) — validate it exists / is a string constant
+    alt --baseline-override-file
+        CLI->>CLI: baseline_text = override_file.read_text()
+    else
+        CLI->>Src: baseline_text = read(section_name)
+    end
+    CLI->>Suite: TaskSuite.from_jsonl(tasks) — reject < 2 tasks
+    CLI->>CLI: _split_train_holdout(seed) — ≥1 task each side
+    CLI->>Judge: SaveCallJudge(config)  → layer2_factory(task)
+    CLI->>Inst: HermesPromptSectionInstaller(repo, section)
+    CLI->>Run: HermesAgentRunner(timeout, agent_model?)
+    CLI->>V: ClosedLoopValidator(installer, runner, layer2_judge_factory, layer2_threshold)
+```
+
+The baseline is the **live section text** unless `--baseline-override-file` points evolution at arbitrary text — e.g. a deliberately-weakened baseline to manufacture headroom, or a regression-injection ablation. The override only changes where evolution *starts*; the guard still backs up and restores the real file, and `--apply` writes the evolved text back into the live section. The suite floor is 2 tasks so the seeded split yields a non-empty GEPA trainset **and** a non-empty deploy-gate holdout.
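The 2-task floor and the seeded split can be illustrated with a short sketch. The real helper is `_split_train_holdout`; the version below is a hypothetical stand-in, and its signature, the `holdout_ratio` parameter, and the 0.5 default are assumptions made for the example.

```python
import random


def split_train_holdout(task_ids: list[str], seed: int, holdout_ratio: float = 0.5):
    """Deterministically split tasks so both train and holdout are non-empty."""
    if len(task_ids) < 2:
        raise ValueError("need at least 2 tasks to form a train set and a holdout")
    shuffled = list(task_ids)
    random.Random(seed).shuffle(shuffled)  # private RNG: same seed, same split
    n_holdout = round(len(shuffled) * holdout_ratio)
    # Clamp so at least one task lands on each side, whatever the ratio.
    n_holdout = max(1, min(len(shuffled) - 1, n_holdout))
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

Using a private `random.Random(seed)` rather than the module-level RNG keeps the split reproducible even when other code has consumed random state.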
+
+### Phase B — Configure the global LM, then enter the guard
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant CLI as evolve_prompt_section
+    participant Scorer as memoizing_splice_scorer
+    participant Metric as prompt_fitness_metric
+    participant LM as eval_lm
+    participant DSPy as dspy.configure
+
+    CLI->>Scorer: make_memoizing_splice_scorer(install_fn=source.write, score_fn=run_one_task, lock)
+    CLI->>Metric: make_prompt_fitness_metric(baseline_text, max_growth, closed_loop_scorer=scorer)
+    CLI->>LM: instantiate eval_lm (role=eval, temp=0)
+    CLI->>DSPy: dspy.configure(lm=eval_lm, callbacks=[LMTimingCallback()])
+    Note over CLI,DSPy: global LM set so GEPA worker threads can run PromptModule's<br/>passthrough predictor — the pre-flight's dspy.context doesn't reach them
+```
+
+The `closed_loop_scorer` is the spine of behavioral scoring: `score(task_id, candidate_text)` splices the candidate into the live `prompt_builder.py` **only when it changes** (consecutive tasks for the same candidate reuse the live splice), runs the task via `hermes -z`, and reads the session back from the sandbox `state.db`. The splice+run is serialized under one `threading.Lock` because `dspy.Evaluate` scores with a thread pool but the spliced file is a single shared mutable resource — behavioral scoring is therefore effectively serial, an accepted v1 cost. The explicit `dspy.configure` is load-bearing: `dspy.context` inside the saturation pre-flight does **not** propagate into GEPA's worker threads, so without the global LM the passthrough predictor raises "No LM is loaded" → no trajectories → no proposal.
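A minimal sketch of the memoize-then-serialize pattern follows. It is not the project's `make_memoizing_splice_scorer`: the name `make_memoizing_scorer` is hypothetical, and `install_fn` is simplified here to take only the candidate text, while `score_fn` takes only a task id.

```python
import threading
from typing import Callable, Optional


def make_memoizing_scorer(
    install_fn: Callable[[str], None],
    score_fn: Callable[[str], float],
    lock: Optional[threading.Lock] = None,
) -> Callable[[str, str], float]:
    """Build scorer(task_id, candidate_text) that re-splices only on change."""
    lock = lock or threading.Lock()
    live = {"text": None}  # the candidate currently spliced into the shared file

    def scorer(task_id: str, candidate_text: str) -> float:
        # One lock around splice + run: worker threads must not interleave
        # writes to, or reads of, the single shared file.
        with lock:
            if candidate_text != live["text"]:
                install_fn(candidate_text)
                live["text"] = candidate_text
            return score_fn(task_id)

    return scorer
```

Scoring three tasks for candidate A and then one for candidate B triggers two installs rather than four under this scheme; the price is that tasks run one at a time.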
+
+### Phase C — Inside the guard: saturation pre-flight, then GEPA
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant CLI as evolve_prompt_section
+    participant Guard as _prompt_builder_guard
+    participant FS as live prompt_builder.py
+    participant Sat as saturation_preflight
+    participant GEPA as dspy.GEPA
+    participant PM as PromptModule
+    participant Prop as PromptSectionProposer
+    participant Scorer as splice scorer
+    participant H as hermes -z + state.db
+
+    CLI->>Guard: enter(installer.target_path)
+    Guard->>FS: refuse if stale .cl_backup; flock parent dir (LOCK_EX | LOCK_NB)
+    Guard->>FS: atomic_write_bytes(.cl_backup, target.read_bytes())
+    opt not --skip-saturation-check
+        CLI->>Sat: saturation_preflight(baseline_module, holdout, metric, eval_lm, baseline_text)
+        Sat->>Scorer: behavioral score of baseline on each holdout task
+        Sat-->>CLI: SaturationReport(band, ...)
+        alt band != healthy
+            alt --force-saturation-check
+                Note over CLI: proceed regardless
+            else non-interactive
+                CLI->>FS: write gate_decision.json (decision=denied, reason=saturated_baseline)
+                Note over CLI: return — GEPA never runs (default-deny)
+            else interactive
+                CLI->>CLI: prompt "Continue anyway? [y/N]"
+            end
+        end
+    end
+    CLI->>GEPA: compile(PromptModule(baseline), trainset, valset, instruction_proposer=PromptSectionProposer)
+    loop per iteration
+        GEPA->>PM: forward(task, closed_loop_task_id) — candidate in sentinel region of predictor instructions
+        PM-->>GEPA: Prediction(_candidate_text, _closed_loop_task_id)
+        GEPA->>Scorer: metric → closed_loop_scorer(task_id, candidate_text)
+        Scorer->>FS: splice candidate into live section (only if changed)
+        Scorer->>H: run task; read session from sandbox state.db
+        H-->>Scorer: tool_calls_with_args + final text
+        Scorer->>Scorer: compound verdict = Layer 1 (memory fired?) + Layer 2 (judge on memory add/replace content)
+        Scorer-->>GEPA: score ∈ {0.0, 1.0}
+        GEPA->>Prop: reflect on failures → sentinel-preserving candidate
+    end
+    GEPA-->>CLI: optimized module with detailed_results
+    CLI->>Guard: exit → atomic_write_bytes(target, .cl_backup); unlink backup; release flock
+```
+
+Everything that mutates the file lives **inside** the guard, which holds an exclusive `flock` (the same lock name the deploy-gate `ClosedLoopValidator` uses — sequenced before it, never nested) and restores the original bytes on exit. The saturation pre-flight scores the baseline behaviorally on the holdout; a non-`healthy` band (e.g. `no_headroom` on an already-tuned section) **default-denies in non-interactive contexts** unless `--force-saturation-check`, writing a `decision="denied"` gate before GEPA spends a cent. The compound per-task verdict is two layers: **Layer 1** is trigger membership (did the `memory` tool fire, via `expected_tools` / `forbidden_tools`), **Layer 2** is the `SaveCallJudge` scoring `memory(action=add|replace)` content against the task's `expected_save_content` rubric (`remove` is not a save; a passing Layer 1 with no save action scores a vacuous 1.0 on Layer 2). GEPA mutates only the sentinel-delimited region of the passthrough predictor's instructions; the `PromptSectionProposer` rejects any reflection-LM output that fails sentinel preservation.
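The guard's backup, lock, and restore sequence could be sketched as a context manager along these lines. This is an assumption-laden illustration for POSIX systems, not `_prompt_builder_guard` itself: the names `file_guard` and `_atomic_write_bytes`, the `<name>.cl_backup` and `.tmp` file naming, and the omission of checksum-drift detection are all choices made for the example.

```python
import contextlib
import fcntl
import os
from pathlib import Path


def _atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a sibling temp file, then rename over the destination."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)  # atomic when both paths are on one filesystem


@contextlib.contextmanager
def file_guard(target: Path):
    """Hold an exclusive lock and restore `target` byte-for-byte on exit."""
    backup = target.with_name(target.name + ".cl_backup")
    if backup.exists():
        raise RuntimeError(f"stale backup present: {backup}")
    lock_fd = os.open(target.parent, os.O_RDONLY)  # lock the parent directory
    try:
        # Non-blocking: raises BlockingIOError if another run holds the lock.
        fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        _atomic_write_bytes(backup, target.read_bytes())
        try:
            yield target
        finally:
            _atomic_write_bytes(target, backup.read_bytes())
            backup.unlink()
    finally:
        os.close(lock_fd)  # closing the descriptor releases the flock
```

Because the restore sits in a `finally`, the original bytes come back even when the body raises, which is the property the splice-and-restore design depends on.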
+
+### Phase D — Deploy gate (closed-loop on the holdout), persist, apply
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant CLI as evolve_prompt_section
+    participant Sel as candidate selection
+    participant V as ClosedLoopValidator
+    participant Inst as HermesPromptSectionInstaller
+    participant FS as live prompt_builder.py
+    participant H as hermes -z
+    participant Src as HermesPromptSource
+
+    Note over CLI: guard already exited — file restored to baseline
+    CLI->>Sel: evolved_text = section_from_candidate(best_idx)  # GEPA val-argmax
+    CLI->>FS: write baseline_section.txt + evolved_section.txt
+    CLI->>V: validate(ValidationInputs(section, holdout_suite, baseline_file, evolved_file))
+    Note over V: own backup/restore + flock — independent of the Phase C guard
+    loop baseline phase, then evolved phase
+        V->>Inst: install(section_file) — splice into live prompt_builder.py
+        loop each holdout task
+            V->>H: run task; score Layer 1 + Layer 2 via layer2_judge_factory
+        end
+    end
+    V-->>CLI: ValidationReport(baseline_pass_rate, evolved_pass_rate, n_wins/n_losses, decision)
+    CLI->>FS: write gate_decision.json (artifact_type="prompt_section", decision=deploy|reject)
+    alt decision == pass AND --apply
+        CLI->>Src: write(section_name, evolved_text) — live section updated in place
+    end
+```
+
+The selected candidate is GEPA's val-argmax (`detailed_results.best_idx`) — there's no knee-point parsimony pass on the prompt-section path. The deploy gate is a fresh `ClosedLoopValidator.validate` over the **holdout** suite, with its own backup/restore + `flock` (it runs after the Phase C guard has already exited and restored the file, so the two never nest). Its decision is closed-loop only: pass-rate no-regression plus `n_wins >= 2 * n_losses`. The gate decision is written with `artifact_type="prompt_section"`, `target_section`, `baseline_chars` / `evolved_chars` / `growth_pct`, a `closed_loop` block (both pass-rates + win/loss/tie counts), and `sentinel_failures`. `--create-pr` records a `skipped` PR block (deferred for sections); `--apply` is the only way to ship, writing the evolved text into the live section.
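Read literally, the closed-loop decision rule is two inequalities. The sketch below is a hypothetical restatement, not `ClosedLoopValidator.decide`: the function name, return shape, and reason strings are invented for illustration, and any additional required-gain condition the real gate applies is not modeled.

```python
def closed_loop_decision(
    baseline_pass_rate: float,
    evolved_pass_rate: float,
    n_wins: int,
    n_losses: int,
) -> tuple[str, list[str]]:
    """Return ("pass" | "regression", reasons) from the two stated conditions."""
    failures = []
    if evolved_pass_rate < baseline_pass_rate:
        failures.append(
            f"pass_rate {evolved_pass_rate:.2f} < baseline {baseline_pass_rate:.2f}"
        )
    if n_wins < 2 * n_losses:
        failures.append(f"n_wins {n_wins} < 2*n_losses {n_losses}")
    return ("pass" if not failures else "regression", failures)
```

With the headline run's numbers (`0.67` to `1.00`, 2 wins, 0 losses) both conditions hold, so this reading yields `"pass"`; a run with 1 win and 1 loss would fail the second condition even at an equal pass rate.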
+
+**Empirical anchors.** The real `MEMORY_GUIDANCE` section saturates — it scored 1.0 across the holdout (`no_headroom` band) and the harness correctly default-denied a non-interactive run before GEPA started. To exercise the full deploy path, an adversarially-weakened baseline (via `--baseline-override-file`) evolved `0.67 → 1.00` pass-rate with 2 wins / 0 losses on the holdout, clearing the closed-loop gate and deploying. The saturating-real-section result is the expected, correct outcome, not a bug: there is no headroom to evolve into when the section already passes every behavioral task.
+
 ## Failure-mode summary
 
 | Trigger | Outcome | Where to look |
@@ -565,3 +717,8 @@ Manual smoke harness: `tests/manual/skill_closed_loop_smoke.py` (supports `--sui
 | Closed-loop validator concurrent run | `ConcurrentRunError` (`fcntl.flock` non-blocking acquire fails) | console only |
 | Closed-loop validator drift between tasks | `ChecksumDriftError` after the offending task; phase aborts, restore still runs | run.log + raised error |
 | Closed-loop cache validator failure during evolution | `WARNING` logged, cache returns `None`, GEPA continues without the verdict — never aborts the run | run.log |
+| Prompt-section suite < 2 tasks | `ValueError` (can't split into non-empty train + holdout) | console only |
+| Prompt-section stale `.cl_backup` on guard entry | `RuntimeError` naming the backup file; refuses to start | console only |
+| Prompt-section saturated baseline, non-interactive | `decision="denied"` `gate_decision.json`; GEPA never runs (override with `--force-saturation-check`) | `gate_decision.json` (`saturation_band`) |
+| Prompt-section closed-loop gate rejects | `decision="reject"` `reason="closed_loop_gate"`; section not applied | `gate_decision.json` (`closed_loop` block) |
+| Prompt-section `--create-pr` | recorded as `skipped` (PR automation deferred); use `--apply` + manual PR | `gate_decision.json` (`pr_created` block) |

From e254e604ed297dee29d1b808a3297cb2e0cf9a12 Mon Sep 17 00:00:00 2001
From: Justin Ramos <justin.ramos@gmail.com>
Date: Tue, 2 Jun 2026 09:32:10 -0600
Subject: [PATCH 4/4] docs(reports): Phase 3 validation report
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a prompt-section branch to generate_report.py (behavioral-only runs
self-source from gate_decision.json — no metrics.json/run.log; the _experiment
and _results renderers lay out pass-rate/win-loss tables instead of
bootstrap/knee/synthetic), author reports/phase3_prose.yaml, render
reports/phase3_validation_report.pdf from the adversarial-baseline headline run
(67%→100% holdout, 2W/0L, section shrank 15.2%), and link it from the README
phase table. The skill/tool report path is unchanged (additive artifact_type
branch).
---
 README.md                            |   2 +-
 generate_report.py                   | 170 ++++++++++++++++---
 reports/phase3_prose.yaml            | 233 +++++++++++++++++++++++++++
 reports/phase3_validation_report.pdf | Bin 0 -> 24431 bytes
 4 files changed, 384 insertions(+), 21 deletions(-)
 create mode 100644 reports/phase3_prose.yaml
 create mode 100644 reports/phase3_validation_report.pdf

diff --git a/README.md b/README.md
index 60386b8..5fae14d 100644
--- a/README.md
+++ b/README.md
@@ -347,7 +347,7 @@ Cost: each task is one `hermes -z` run (~$0.05–$0.50). The bundled `patch.json
 |-------|--------|--------|--------|
 | **Phase 1** | Skill files (SKILL.md) | DSPy + GEPA | ✅ [Validated](reports/phase1_validation_report.pdf) |
 | **Phase 2** | Tool descriptions + dual-signal deploy gate | DSPy + GEPA | ✅ [Validated](reports/phase2_validation_report.pdf) |
-| **Phase 3** | System prompt sections | DSPy + GEPA | ✅ Complete |
+| **Phase 3** | System prompt sections | DSPy + GEPA | ✅ [Validated](reports/phase3_validation_report.pdf) |
 | **Phase 4** | Tool implementation code | Darwinian Evolver | 🔲 Planned |
 | **Phase 5** | Continuous improvement loop | Automated pipeline | 🔲 Planned |
 
diff --git a/generate_report.py b/generate_report.py
index 3008116..b7a1ec7 100644
--- a/generate_report.py
+++ b/generate_report.py
@@ -45,13 +45,81 @@
 DEFAULT_LOGO = REPO_ROOT / "assets" / "dna.png"
 
 
+def _extract_prompt_section_data(gate: dict, run_dir: Path) -> dict[str, Any]:
+    """Build the render context for a Phase 3 prompt-section run.
+
+    The prompt-section path is behavioral-only — its gate_decision carries a
+    ``closed_loop`` pass-rate / win-loss block instead of the skill/tool
+    bootstrap-CI + synthetic-dataset + knee-point fields, and self-sources
+    cost/timing/call-count (no metrics.json needed). The ``_experiment`` and
+    ``_results`` renderers branch on ``artifact_type`` to lay out the matching
+    tables; every other section is prose-driven via the keys returned here.
+    """
+    cl = gate.get("closed_loop", {})
+    cost = gate.get("cost", {})
+    resolved = (gate.get("run_inputs", {}) or {}).get("resolved_lms", {})
+
+    n_wins = int(cl.get("n_wins", 0))
+    n_losses = int(cl.get("n_losses", 0))
+    n_ties = int(cl.get("n_ties", 0))
+    cl_total = n_wins + n_losses + n_ties
+    baseline_rate = float(cl.get("baseline_pass_rate", 0.0))
+    evolved_rate = float(cl.get("evolved_pass_rate", 0.0))
+    cl_baseline_pass = round(baseline_rate * cl_total)
+    cl_evolved_pass = round(evolved_rate * cl_total)
+    elapsed = int(float(gate.get("elapsed_seconds", 0)))
+    lm_calls = sum(int(m.get("calls", 0)) for m in (cost.get("by_model") or {}).values())
+    decision = gate.get("decision", "")
+
+    def _model(role: str) -> str:
+        return (resolved.get(role) or {}).get("model", "—")
+
+    return {
+        "artifact_type": "prompt_section",
+        "skill_name": gate.get("target_section", run_dir.parent.name),
+        "section_name": gate.get("target_section", ""),
+        "baseline_chars": int(gate.get("baseline_chars", 0)),
+        "evolved_chars": int(gate.get("evolved_chars", 0)),
+        "growth_pct": float(gate.get("growth_pct", 0.0)),
+        "growth_abs_pct": abs(float(gate.get("growth_pct", 0.0))),
+        "decision": decision,
+        "decision_upper": "DEPLOYED" if decision == "deploy" else "REJECTED",
+        "decision_signal": gate.get("decision_signal", "closed_loop"),
+        "baseline_pass_rate": baseline_rate,
+        "evolved_pass_rate": evolved_rate,
+        "baseline_pass_pct": baseline_rate * 100,
+        "evolved_pass_pct": evolved_rate * 100,
+        "cl_baseline_pass": cl_baseline_pass,
+        "cl_evolved_pass": cl_evolved_pass,
+        "cl_total_tasks": cl_total,
+        "cl_tasks_gained": cl_evolved_pass - cl_baseline_pass,
+        "n_wins": n_wins,
+        "n_losses": n_losses,
+        "n_ties": n_ties,
+        "elapsed_seconds": elapsed,
+        "elapsed_minutes": elapsed // 60,
+        "cost_total_usd": float(cost.get("total_usd", 0.0)),
+        "lm_calls_metrics": lm_calls,
+        "optimizer_lm": _model("optimizer"),
+        "reflection_lm": _model("reflection"),
+        "eval_lm": _model("eval"),
+        "saturation_band": gate.get("saturation_band", ""),
+        "sentinel_failures": int(gate.get("sentinel_failures", 0)),
+        "decision_reasons": "; ".join(cl.get("decision_reasons", [])),
+    }
+
+
 def _extract_run_data(run_dir: Path) -> dict[str, Any]:
     """Pull all numbers the renderer needs from a run dir.
 
     Reads gate_decision.json (always present) + metrics.json (deploy only) +
-    run.log (LM call counts grep'd from timing-callback lines).
+    run.log (LM call counts grep'd from timing-callback lines). Prompt-section
+    (Phase 3) runs are behavioral-only and self-source from gate_decision.json
+    alone — see ``_extract_prompt_section_data``.
     """
     gate = json.loads((run_dir / "gate_decision.json").read_text())
+    if gate.get("artifact_type") == "prompt_section":
+        return _extract_prompt_section_data(gate, run_dir)
     metrics_path = run_dir / "metrics.json"
     metrics = json.loads(metrics_path.read_text()) if metrics_path.is_file() else {}
 
@@ -442,7 +510,7 @@ def _approach(prose: dict, ctx: dict, styles) -> list:
     ap = prose["approach"]
     engines = ap["engines"]
     flow = [
-        Paragraph("Approach: Evolutionary Skill Optimization", styles['SectionHead']),
+        Paragraph(ap.get("section_title", "Approach: Evolutionary Optimization"), styles['SectionHead']),
         Paragraph("Three Optimization Engines", styles['SubSection']),
         _highlight_table(
             header=engines["header"],
@@ -463,9 +531,57 @@ def _approach(prose: dict, ctx: dict, styles) -> list:
     return flow
 
 
+def _experiment_prompt_section(exp: dict, overrides: dict, ctx: dict, styles) -> list:
+    """Phase 3 experiment section: behavioral config (no synthetic eval set,
+    no knee-point), and the suite is described in prose rather than via a
+    train.jsonl examples table (prompt-section runs don't write one)."""
+    config_rows = [
+        ['Target Section', _fmt(overrides["target_section_label"], ctx)],
+        ['Baseline Size', f'{ctx["baseline_chars"]:,} characters'],
+        ['Optimizer LM', _fmt(overrides["optimizer_lm"], ctx)],
+        ['Reflection LM (GEPA)', _fmt(overrides["reflection_lm"], ctx)],
+        ['Content-Judge LM (Layer 2)', _fmt(overrides["eval_judge_lm"], ctx)],
+        ['Agent (hermes -z)', _fmt(overrides["agent_lm"], ctx)],
+        ['Behavioral Suite', f'{ctx["cl_total_tasks"]} holdout tasks (real hermes -z, scored end-to-end)'],
+        ['Total Optimization Time',
+         f'{ctx["elapsed_seconds"]:,} seconds (~{ctx["elapsed_minutes"]} minutes)'],
+        ['Total LM Calls (in-process)', f'{ctx["lm_calls_metrics"]:,}'],
+        ['Total Cost (USD, in-process)', f'${ctx["cost_total_usd"]:.2f}'],
+        ['Deploy Gate', _fmt(overrides["quality_gate_label"], ctx)],
+        ['Saturation Pre-flight', _fmt(overrides["saturation_label"], ctx)],
+    ]
+    config_data = [[_wrap_cell(c, styles['TableHeaderCell']) for c in ['Parameter', 'Value']]]
+    config_data += [[_wrap_cell(c, styles['TableCell']) for c in row] for row in config_rows]
+    config_table = Table(config_data, colWidths=[2.0 * inch, 4.0 * inch])
+    config_table.setStyle(TableStyle([
+        ('BACKGROUND', (0, 0), (-1, 0), HexColor('#1a1a2e')),
+        ('GRID', (0, 0), (-1, -1), 0.5, HexColor('#cccccc')),
+        ('TOPPADDING', (0, 0), (-1, -1), 5),
+        ('BOTTOMPADDING', (0, 0), (-1, -1), 5),
+        ('LEFTPADDING', (0, 0), (-1, -1), 8),
+    ]))
+    return [
+        Paragraph(exp.get("section_title", "Experiment"), styles['SectionHead']),
+        Paragraph("Configuration", styles['SubSection']),
+        config_table,
+        Paragraph("Evaluation Suite", styles['SubSection']),
+        Paragraph(_fmt(exp["dataset_intro"], ctx), styles['BodyJust']),
+        Paragraph("Fitness Function", styles['SubSection']),
+        Paragraph(_fmt(exp["fitness_intro"], ctx), styles['BodyJust']),
+        Paragraph(
+            f"<font face='Courier' size=9>{exp['fitness_formula']}</font>",
+            ParagraphStyle('Formula', parent=styles['Normal'], alignment=TA_CENTER,
+                           spaceBefore=8, spaceAfter=8, fontSize=10),
+        ),
+        Paragraph(_fmt(exp["fitness_closing"], ctx), styles['BodyJust']),
+    ]
+
+
 def _experiment(prose: dict, ctx: dict, styles, examples: list[tuple[str, str]]) -> list:
     exp = prose["experiment"]
     overrides = exp["config_overrides"]
+    if ctx.get("artifact_type") == "prompt_section":
+        return _experiment_prompt_section(exp, overrides, ctx, styles)
 
     # Phase 1 runs counted gpt-4.1-mini + gpt-5-mini explicitly via run.log grep;
     # Phase 2 runs use a single optimizer LM tier (e.g., gpt-5.4-mini), so fall
@@ -571,24 +687,38 @@ def _results(prose: dict, ctx: dict, styles) -> list:
         accent_bg = HexColor('#fff8e1')
         accent_fg = HexColor('#5d4037')
 
-    results_rows = [
-        ['Metric', 'Baseline', 'Evolved (knee-point pick)', 'Δ'],
-        ['Body size (chars)', f'{ctx["baseline_chars"]:,}', f'{ctx["evolved_chars"]:,}', f'{ctx["growth_pct"]:+.1%}'],
-        [f'Avg holdout score (n={ctx["n_holdout"]})',
-         f'{ctx["avg_baseline"]:.3f}', f'{ctx["avg_evolved"]:.3f}', f'{ctx["improvement"]:+.3f}'],
-        ['Bootstrap mean diff', '—', f'{ctx["bootstrap_mean"]:+.3f}', '—'],
-        ['Bootstrap 90% CI lower', '—', f'{ctx["bootstrap_lower"]:+.3f}', '—'],
-        ['Bootstrap 90% CI upper', '—', f'{ctx["bootstrap_upper"]:+.3f}', '—'],
-    ]
-    # Phase 2: surface the closed-loop behavioral signal when the v5 schema
-    # exposed it (absent on synthetic-only runs).
-    if ctx.get("cl_total_tasks"):
-        results_rows.append([
-            f'Closed-loop tasks (n={ctx["cl_total_tasks"]})',
-            f'{ctx["cl_baseline_pass"]}/{ctx["cl_total_tasks"]}',
-            f'{ctx["cl_evolved_pass"]}/{ctx["cl_total_tasks"]}',
-            f'+{ctx["cl_tasks_gained"]} (req ≥{ctx["cl_required_gain"]})',
-        ])
+    if ctx.get("artifact_type") == "prompt_section":
+        # Behavioral-only: pass-rate + win/loss, no bootstrap/synthetic rows.
+        delta_rate = ctx["evolved_pass_rate"] - ctx["baseline_pass_rate"]
+        results_rows = [
+            ['Metric', 'Baseline', 'Evolved', 'Δ'],
+            ['Section size (chars)', f'{ctx["baseline_chars"]:,}', f'{ctx["evolved_chars"]:,}', f'{ctx["growth_pct"]:+.1%}'],
+            [f'Holdout pass-rate (n={ctx["cl_total_tasks"]})',
+             f'{ctx["baseline_pass_rate"]:.0%}', f'{ctx["evolved_pass_rate"]:.0%}', f'{delta_rate:+.0%}'],
+            [f'Tasks passing (n={ctx["cl_total_tasks"]})',
+             f'{ctx["cl_baseline_pass"]}/{ctx["cl_total_tasks"]}',
+             f'{ctx["cl_evolved_pass"]}/{ctx["cl_total_tasks"]}',
+             f'+{ctx["n_wins"]}W / {ctx["n_losses"]}L'],
+        ]
+    else:
+        results_rows = [
+            ['Metric', 'Baseline', 'Evolved (knee-point pick)', 'Δ'],
+            ['Body size (chars)', f'{ctx["baseline_chars"]:,}', f'{ctx["evolved_chars"]:,}', f'{ctx["growth_pct"]:+.1%}'],
+            [f'Avg holdout score (n={ctx["n_holdout"]})',
+             f'{ctx["avg_baseline"]:.3f}', f'{ctx["avg_evolved"]:.3f}', f'{ctx["improvement"]:+.3f}'],
+            ['Bootstrap mean diff', '—', f'{ctx["bootstrap_mean"]:+.3f}', '—'],
+            ['Bootstrap 90% CI lower', '—', f'{ctx["bootstrap_lower"]:+.3f}', '—'],
+            ['Bootstrap 90% CI upper', '—', f'{ctx["bootstrap_upper"]:+.3f}', '—'],
+        ]
+        # Phase 2: surface the closed-loop behavioral signal when the v5 schema
+        # exposed it (absent on synthetic-only runs).
+        if ctx.get("cl_total_tasks"):
+            results_rows.append([
+                f'Closed-loop tasks (n={ctx["cl_total_tasks"]})',
+                f'{ctx["cl_baseline_pass"]}/{ctx["cl_total_tasks"]}',
+                f'{ctx["cl_evolved_pass"]}/{ctx["cl_total_tasks"]}',
+                f'+{ctx["cl_tasks_gained"]} (req ≥{ctx["cl_required_gain"]})',
+            ])
     results_rows.append(['Decision', '—', decision_cell, decision_note])
 
     # Per-cell style picks: header row uses bold/white; first column (metric
diff --git a/reports/phase3_prose.yaml b/reports/phase3_prose.yaml
new file mode 100644
index 0000000..e68499c
--- /dev/null
+++ b/reports/phase3_prose.yaml
@@ -0,0 +1,233 @@
+# Editorial content for the Phase 3 validation report.
+# Numbers come from the run dir's gate_decision.json (the prompt-section path is
+# behavioral-only and self-sources cost/timing/calls — no metrics.json/run.log
+# needed). Pass via `generate_report.py --run output/prompts/<run>/`. Text blocks
+# may include {placeholder} substitutions the renderer fills from that data.
+
+meta:
+  title: "Agent Self-Evolution"
+  subtitle: "Phase 3 Validation Report<br/>System-prompt section evolution via splice-and-restore"
+  organization: ""
+  repository: "github.com/jramos/agent-self-evolution"
+
+executive_summary:
+  framework_intro: >
+    Agent Self-Evolution is a standalone optimization pipeline that uses DSPy and GEPA
+    (Genetic-Pareto Prompt Evolution) to automatically improve an agent's skills, tool
+    descriptions, system prompts, and code through evolutionary search — all via API
+    calls with no GPU training required. Phase 1 shipped a synthetic-only deploy gate;
+    Phase 2 made it behavior-aware and brought tool-description parity. Phase 3 extends
+    the framework to the third instructions surface — <i>named sections of the agent's
+    system prompt</i> — evaluated end-to-end against the real agent.
+  run_summary: >
+    This report documents the Phase 3 validation of system-prompt section evolution.
+    The target is a top-level string constant in Hermes Agent's
+    <font face="Courier">prompt_builder.py</font> (here, <b>{section_name}</b>), evolved
+    via GEPA and validated <i>purely behaviorally</i>: every candidate is spliced into
+    the live prompt file and scored by running the real agent
+    (<font face="Courier">hermes -z</font>) against a curated task suite — there is no
+    synthetic LLM-as-judge signal to lean on. Production <b>{section_name}</b> is already
+    well-tuned, so the saturation pre-flight correctly default-denies it (no headroom).
+    To exercise the loop end-to-end, the headline run evolves a <i>deliberately weakened</i>
+    baseline (supplied via <font face="Courier">--baseline-override-file</font>): the
+    agent's holdout pass-rate moved <b>{baseline_pass_rate:.0%} → {evolved_pass_rate:.0%}</b>
+    ({cl_baseline_pass}/{cl_total_tasks} → {cl_evolved_pass}/{cl_total_tasks} tasks,
+    <b>+{n_wins}W / {n_losses}L</b>) while the section size changed by <b>{growth_pct:+.1%}</b>.
+    The closed-loop deploy gate decided <b>{decision_upper}</b>, and the live prompt file
+    was restored byte-for-byte after every trial.
+
+key_result_box:
+  title_template: "KEY RESULT — {section_name} (prompt-section deploy via closed-loop gate)"
+  rows:
+    - "Holdout pass-rate (n={cl_total_tasks}):   {baseline_pass_rate:.0%} → {evolved_pass_rate:.0%}   (+{n_wins}W / {n_losses}L)"
+    - "Tasks passing:   {cl_baseline_pass}/{cl_total_tasks} → {cl_evolved_pass}/{cl_total_tasks}"
+    - "Section size:   {baseline_chars:,} → {evolved_chars:,} chars   ({growth_pct:+.1%})"
+    - "Decision:   {decision_upper}   via the closed-loop behavioral gate"
+
+background:
+  intro: >
+    Agent Self-Evolution targets the instructions layer of an LLM agent — skill files,
+    tool descriptions, and system prompts — and evolves the text via API-only
+    evolutionary search. An agent's behavior is governed by three layers:
+  layers:
+    header: ["Layer", "What It Is", "How It's Currently Improved"]
+    rows:
+      - ["Model Weights", "The underlying LLM (Claude, GPT, etc.)", "RL training (Tinker-Atropos)"]
+      - ["Instructions", "Skills, system prompts, tool descriptions", "Manual authoring (static)"]
+      - ["Tool Code", "Python implementations of each tool", "Manual development"]
+    highlight_row: 1
+  closing: >
+    Phases 1 and 2 validated skill files and tool descriptions. Phase 3 completes the
+    instructions trio with <b>system-prompt sections</b> — the highest-leverage, widest
+    blast-radius surface, since one section governs the agent across every task. The
+    section is a string constant inside Hermes' own source, so unlike the skill path
+    (separate writable workdir) there is no env-var hook or plugin seam: the framework
+    edits <font face="Courier">prompt_builder.py</font> in place. The integration is an
+    AST-precise <b>splice-and-restore</b> — the candidate is byte-spliced into the live
+    file for the duration of a trial and restored from an atomic backup afterward
+    (<font face="Courier">flock</font> + checksum-drift detection + parse-guard, reused
+    from the Phase 2 closed-loop validator). Crucially, a system-prompt section has no
+    cheap synthetic proxy: the only honest measure of "did this guidance help" is
+    running the real agent, so Phase 3's deploy gate is purely behavioral.
+
+approach:
+  section_title: "Approach: Behavioral Prompt-Section Evolution"
+  engines:
+    header: ["Engine", "What It Optimizes", "License", "Role"]
+    rows:
+      - ["DSPy + GEPA", "Skills, prompts, tool descriptions", "MIT", "Primary (validated)"]
+      - ["DSPy MIPROv2", "Few-shot examples, instruction text", "MIT", "Fallback optimizer"]
+      - ["Darwinian Evolver", "Code files, algorithms", "AGPL v3", "Code evolution (Phase 4)"]
+  gepa_narrative: >
+    <b>GEPA</b> (Genetic-Pareto Prompt Evolution) is the primary engine, introduced in an
+    ICLR 2026 Oral paper from Stanford/UC Berkeley. Unlike traditional evolutionary search that
+    only sees pass/fail scores, GEPA reads full execution traces to understand
+    <i>why</i> things failed, then proposes targeted mutations. Phase 3 wires GEPA to a
+    sentinel-preserving proposer (mutations are confined to the section's text, never
+    the surrounding scaffolding) and routes every candidate score through a real
+    <font face="Courier">hermes -z</font> subprocess. Because the spliced
+    <font face="Courier">prompt_builder.py</font> is a single shared file and DSPy
+    evaluates with a thread pool, candidate scoring is serialized under a lock — an
+    accepted cost of the splice-and-restore model.
+  pipeline_steps:
+    - "<b>Resolve baseline</b> — Read the section's current text from <font face=\"Courier\">prompt_builder.py</font> (or accept a weakened baseline via <font face=\"Courier\">--baseline-override-file</font> to create headroom on an already-tuned section)"
+    - "<b>Split</b> — Deterministic seeded train / holdout split of the curated JSONL task suite"
+    - "<b>Saturation pre-flight</b> — Score the baseline behaviorally on the holdout; a <font face=\"Courier\">no_headroom</font> band default-denies (correctly refusing to evolve a saturated section) unless overridden"
+    - "<b>GEPA loop</b> — The section text is a sentinel-delimited region of a passthrough predictor's instructions; GEPA mutates it with the sentinel-preserving proposer. Each candidate is spliced into the live file and scored by running the agent on each task"
+    - "<b>Compound verdict</b> — Layer 1: did the agent invoke the expected tool (e.g. <font face=\"Courier\">memory</font>)? Layer 2: an LLM judge scores the saved content against each task's rubric"
+    - "<b>Closed-loop deploy gate</b> — Select the GEPA val-best candidate, then run baseline vs. evolved on the holdout suite; deploy iff holdout pass-rate doesn't regress and per-task wins offset losses ≥ 2:1"
+    - "<b>Report + restore</b> — Structured <font face=\"Courier\">gate_decision.json</font> (v5 schema, prompt-section variant); the live file is restored byte-for-byte"
+  cost_paragraph: >
+    The honest Phase 3 story is two-part. First, the framework's <b>regression-catching
+    discipline</b>: the production <font face="Courier">{section_name}</font> is already
+    well-tuned, so a capable agent satisfies the suite regardless of small wording
+    changes — the saturation pre-flight scores the baseline at ceiling and correctly
+    <i>default-denies</i>, refusing to spend GEPA budget where no improvement is
+    possible. This mirrors the Phase 2 finding that the framework is improvement-finding
+    only where headroom genuinely exists. Second, to demonstrate that the loop produces
+    a real, grounded improvement when headroom <i>does</i> exist, the headline run
+    evolves a deliberately adversarial baseline (one that instructs the agent <i>not</i>
+    to save) — exactly the weakened-target approach Phase 2 used for its headline. That
+    run consumed <b>${cost_total_usd:.2f}</b> across {lm_calls_metrics:,} in-process LM
+    calls in ~{elapsed_minutes:.0f} minutes (the agent's own subprocess spend is
+    separate). Splicing a different section measurably changed live agent behavior, and
+    GEPA recovered a corrected section that the closed-loop gate deployed.
+
+experiment:
+  section_title: "Phase 3 Experiment"
+  config_overrides:
+    target_section_label: "{section_name} — evolved from a deliberately-weakened baseline (production {section_name} is saturated; the weak baseline, supplied via --baseline-override-file, exercises the loop end-to-end)"
+    optimizer_lm: "{optimizer_lm}"
+    reflection_lm: "{reflection_lm}"
+    eval_judge_lm: "{eval_lm}"
+    agent_lm: "openai/gpt-5.4-mini (Hermes-configured default)"
+    quality_gate_label: "closed-loop behavioral — holdout pass-rate no-regression + per-task wins ≥ 2·losses; compound verdict (Layer 1 trigger + Layer 2 content judge)"
+    saturation_label: "forced via --force-saturation-check (the weakened baseline had real headroom; production {section_name} default-denies as no_headroom)"
+  dataset_intro: >
+    The evaluation suite is a curated, hand-authored JSONL benchmark
+    (<font face="Courier">memory_guidance.jsonl</font>, 12 tasks across five categories:
+    save-preference, save-correction, dont-save-task-progress,
+    dont-save-completed-work-log, and declarative-vs-imperative). Unlike Phases 1 and 2,
+    there is <i>no</i> synthetically-generated train/val/holdout of LLM-judge examples —
+    every task is scored behaviorally by running the real agent, and the deploy gate's
+    holdout is {cl_total_tasks} of those tasks. Each save task carries an
+    <font face="Courier">expected_save_content</font> rubric consumed by the Layer 2
+    content judge.
+  fitness_intro: >
+    Fitness is behavioral, not a synthetic judge score. For each task, the candidate
+    section is spliced into the live <font face="Courier">prompt_builder.py</font>, the
+    agent runs once via <font face="Courier">hermes -z</font>, and the resulting session
+    is read back from Hermes' SQLite session store. The verdict is compound:
+  fitness_formula: "pass  =  Layer1(expected memory action fired, forbidden actions absent)  AND  Layer2(content-judge score ≥ 0.7 on save tasks)"
+  fitness_closing: >
+    GEPA's reflection LM reads the per-task failures and proposes a targeted mutation of
+    the section text; the sentinel-preserving proposer confines edits to the section and
+    re-raises rather than admit a candidate that drops the markers. The deploy gate then
+    re-runs baseline vs. evolved on the holdout and decides on the behavioral signal
+    alone — holdout pass-rate no-regression plus a per-task win/loss rule — with no
+    paired-bootstrap CI, because there is no synthetic per-example distribution to
+    resample.
+
+results:
+  narrative: >
+    Evolving the weakened <b>{section_name}</b> baseline, the agent's holdout pass-rate
+    moved <b>{baseline_pass_rate:.0%} → {evolved_pass_rate:.0%}</b>
+    ({cl_baseline_pass}/{cl_total_tasks} → {cl_evolved_pass}/{cl_total_tasks} tasks,
+    <b>+{n_wins} wins / {n_losses} losses</b>, {n_ties} ties) while the section text
+    changed in size by <b>{growth_pct:+.1%}</b> ({baseline_chars:,} → {evolved_chars:,}
+    chars). GEPA learned from the save-task failures and inverted the adversarial
+    instruction — it removed the "never proactively save" misdirection and restored
+    proactive saving while keeping the legitimate "don't store passing remarks"
+    discrimination, in fewer characters. <b>Decision: {decision_upper}</b> via the
+    closed-loop gate ({decision_reasons}). The proposer rejected
+    {sentinel_failures} sentinel-breaking candidates. Throughout, the live
+    <font face="Courier">prompt_builder.py</font> was restored byte-for-byte after every
+    trial. The production {section_name} itself is saturated and correctly
+    default-denies — the framework is regression-catching, and only finds improvements
+    where real headroom exists.
+  how_produced_intro: "GEPA evolves the section text through a reflective loop; the gate then reads the behavioral signal:"
+  how_produced_steps:
+    - "Splice a candidate section into the live <font face=\"Courier\">prompt_builder.py</font> (only when the candidate changes); run each holdout task once via <font face=\"Courier\">hermes -z</font> and read the session from Hermes' <font face=\"Courier\">state.db</font>"
+    - "Score each run with the compound verdict (Layer 1 tool-trigger membership + Layer 2 content judge on memory-save content); abstentions (agent/runner errors) score 0 in-loop and tie at the gate"
+    - "The reflection LM reads the failures and proposes a sentinel-confined mutation of the section text; GEPA accepts on improvement-or-equal"
+    - "Select the GEPA val-best candidate; run the closed-loop deploy gate (baseline vs. evolved on {cl_total_tasks} holdout tasks, its own backup/restore)"
+    - "<b>Decide</b> — Deploy iff evolved holdout pass-rate ≥ baseline AND per-task wins offset losses ≥ 2:1. On this run: {baseline_pass_rate:.0%} → {evolved_pass_rate:.0%}, {n_wins}W/{n_losses}L → DEPLOY"
+  how_produced_closing: >
+    Two design choices made this outcome trustworthy. First, the splice-and-restore
+    guard (atomic backup + exclusive <font face="Courier">flock</font> + byte-restore,
+    with stale-backup refusal) means the user's Hermes checkout is never left mutated,
+    even on crash. Second, the deploy gate is the same proven closed-loop validator used
+    for tool descriptions — the prompt path adds only a thin installer plus a per-task
+    content judge, so the decision rule, audit trail, and restore machinery are shared
+    and already battle-tested. The behavioral-only design is not a shortcut: it is the
+    only honest measure for a system-prompt section, which has no cheap synthetic proxy.
+
+safety:
+  intro: "Every evolved section must clear these constraints, and the live prompt file is protected throughout:"
+  table:
+    header: ["Constraint", "Enforcement", "Status"]
+    rows:
+      - ["Self-evolution test suite", "1,232 pytest tests pass on the optimizer itself", "Implemented"]
+      - ["Byte-clean splice/restore", "Atomic backup + byte-for-byte restore of prompt_builder.py after every run", "Implemented"]
+      - ["Parse-guarded write", "Candidate spliced via repr() + ast.parse check; refuses to write non-parseable Python", "Implemented"]
+      - ["Exclusive lock + drift check", "flock on the prompt file's dir + sha-drift detection; stale-backup refusal on startup", "Implemented"]
+      - ["Compound verdict", "Layer 1 tool-trigger membership AND Layer 2 LLM content judge (≥ threshold)", "Implemented"]
+      - ["Abstain on corrupt session", "A malformed agent session abstains (neutral), never scores as a behavioral regression", "Implemented"]
+      - ["Closed-loop deploy gate", "Holdout pass-rate no-regression + per-task wins ≥ 2·losses", "Implemented"]
+      - ["Saturation pre-flight", "Default-denies a saturated (no_headroom) section before spending GEPA budget", "Implemented"]
+      - ["Budget ceiling", "--max-cost-usd aborts on in-process LM spend overrun", "Implemented"]
+      - ["Deployment via apply + review", "--apply writes the section; PR automation deferred for prompt sections", "By design"]
+      - ["Benchmark regression", "External --benchmark-cmd hook (TBLite / harness)", "Planned"]
+  closing: >
+    The source Hermes repository is never left modified: the section is spliced in only
+    for the duration of a trial and restored from an atomic backup, and all evolution
+    output (gate decisions, section before/after text, run logs) is written under the
+    framework's local <font face="Courier">output/</font> directory. PR automation is
+    deferred for prompt sections — a section-scoped PR path is future work — so the
+    deploy step is an explicit <font face="Courier">--apply</font> plus a human-authored
+    pull request.
+
+roadmap:
+  table:
+    header: ["Phase", "Target", "Engine", "Timeline", "Status"]
+    rows:
+      - ["Phase 1", "Skill files (SKILL.md)", "DSPy + GEPA", "3-4 weeks", "Validated ✓"]
+      - ["Phase 2", "Tool descriptions", "DSPy + GEPA", "2-3 weeks", "Validated ✓"]
+      - ["Phase 3", "System prompt sections", "DSPy + GEPA", "2-3 weeks", "Validated ✓"]
+      - ["Phase 4", "Tool implementation code", "Darwinian Evolver", "3-4 weeks", "Planned"]
+      - ["Phase 5", "Continuous improvement", "Automated pipeline", "2 weeks", "Planned"]
+    highlight_row: 2
+  closing: >
+    Phase 3 completes the instructions trio — skills, tool descriptions, and now
+    system-prompt sections — all gated by the same closed-loop discipline. The
+    behavioral-only deploy gate shows the framework can evolve the highest-blast-radius
+    instructions surface safely: it default-denies a saturated section, produces a real
+    grounded improvement where headroom exists, and never leaves the agent's source
+    mutated. Phase 4 (tool implementation code) and Phase 5 (continuous improvement)
+    extend the framework beyond the instructions layer.
+
+next_steps:
+  - "<b>Harder behavioral suites</b> — Production system-prompt sections are heavily tuned and saturate the current suites; develop richer, harder task suites (and weaker agent tiers) so headroom exists on real targets, not only adversarial baselines."
+  - "<b>Additional sections</b> — The same path supports any string-constant section (SKILLS_GUIDANCE, SESSION_SEARCH_GUIDANCE, etc.); MEMORY_GUIDANCE was the first proof point, chosen for its clear tool-call anchor."
+  - "<b>Section-scoped PR automation</b> — Wire --create-pr for prompt sections by splicing into origin/&lt;base&gt;'s prompt_builder.py (not the local checkout), so the PR diff carries only the section change."
+  - "<b>Agent-side cost capture</b> — The agent's own LM spend happens inside the hermes subprocess and is invisible to the in-process budget ceiling; surface it from the session store so --max-cost-usd accounts for end-to-end spend."
diff --git a/reports/phase3_validation_report.pdf b/reports/phase3_validation_report.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..b2cd415560bce8a8ba870816882e2f5649df6e29
GIT binary patch
literal 24431
zcmdSB+19FRwk>!cPca3=3PccV5veUuK*b6a0RamtQBR(@$rs4H%lmxRWUe*ithHlD
zW=6!tIc>MUwO>H|3iueKk3RaKsPZC;*U7&W|F8e^|M<Uuth_YIKC<onki6K-?2mW-
zk)QG7nz-M{i$6cgN$SP#<eRx(<uCY;#*f5#{AD=&3nt<}h<tbT0~3iKtzTQsADv%Y
z<d5#Jt=5m;udVhE{I?VTG8+BGi?2Uh`q{+aT4e3FCnkBH*u)F}oA<f9_76;m-Q;?Y
zgCE%L9y9U#{>Oo(9eH{0_z!ITbv}Ph^PNrpKi~iPk$;o7^{1|yf9~r3d8=N=B~hBh
zKd0IKfpJNgWb4#+^PlqGf4jThN@DX??%~JFei%P6X3b5f-4^fVBl_8Vm|v}5JExyx
zz6ZC@j}Gwzlf5|j+<xFho6N_yu;0%g<ZqYYFF*b?)@fu1zlQt++bhn)bMvY1?$`XM
z9pdM}-=EC0qrXT_LFZlI`Z<)(>@?Y3TYc$t>i6oLXEg79zu$5@{l)78UnA$!e)0nj
zsj$PrN^eZw+fIw!lG`<??nXS@tFz^Ka#+7n-Au5TXAs6_wj%LPhVti*zt-mlff}uK
z;!2e{ZJ5g{7>w7axKx_$S(U0ITGJ|f^xVHe+omfV3aerEnuv6*hb-k4W!~n+*>2EX
zm5a*EYgD1@toeGDmvpec)crJ^f>XTI<UcjWER@XY^8TW_lWphJ0PELtx&7^SNNWwm
zVj#$y>s(m!m6g<~o5KdU8p84Ivo=irIC3Mfp6*%#Q)IF8zV0rM-a76XT<=m@VA+G%
z?M)|Rrk6QMJ+7}}`NmM~)9I@38*!Wee2s}0wQ%-%w6-rw{l2Os1i|dUZ|Dd3sYi=;
z=87BU6OR{XS#4|0q>18o3{>0p=zcq13$ar>9I`169O=yTYL?$yw+`>u;6xQJYx}^)
zjpwQhtuTDbw-(hY$IiBc<Sx3vacujw{SdlsgXQ@)q{4PoeywjA42EY<E}o0nhe4c|
zH9k=gSTfZo$u@&=tKlu-lPsJAtv7fWPl%lkA(kwz`5hq7&wCya`Umgv$T9*#-BW+<
z$x_lRILC`V0@*nQ2mM0G5!P+WKQ^bteCkFe>q{@PD|SMZls5o0JB|t8dyI<Kq~iJ3
zX`tSZ%q#$*fp;Z34nh$X(t8xk%!=NWYF$^C@5h__MgzH9VU{<FwPc8M3b19+!Y7#t
z_BH{&j?84$xT@S|W2GJ)rn=DX7D&o?-N=17mdQY7Mj-=WVc?%4bbI=KcW^Yh-*p2y
zQYpcPCU|$(Yq?wAe~OlaxA0Lo7;+PiZDmK!nng6=Sl!9uf+1zTFAf|SQD8HfH$>W+
zR<7B+f9qCv8)#N3WS{YQfA*8HuuJ)Nte+1jw=zuW>~tAzNq^MkxnnfOZi_|33Ow_a
z77*G74qj~v-8aSBb85}E=WcayE~0FyI0cN^Sg!<OR$FF|@2g=u-N_y9!KQFp6zRC^
z4f@C{<Dcrca?+abf!>o7#`Dw18ByB$bD|Qc(;A#K<kanA`sHLSd%baxj(VG+_^s`F
z@%z;Q-XGp|jzP>CS)Z)SA^3t)TF}94QVFbuD4Wh)-+J-6F}iLxEjZolVgy`nLw}`6
zi<!RSS9tr%+1s$YJP)kO-r%>?YkG_Eh*|5+8PauI)Ut&ZyN$7|bi}K}K4LYhi}ir<
zg6o|hG1lR}eiuwJ`W!~@tc4z$+ADLA6yCVdY_>SDy{(O;^)^3iJ-1ulD6yUmE6$2?
zTi{@@Av3itEK=jzuq_i!_!?UjrMPD8-Faiuf7O-+c%RM;K-mUpb1{FGbg&Z@t$_L8
zN4abGW%+(NiJ!*ve>jPM%5qDY{nwvX{5MN&<UhxM<R<#xH|3os{^Q>?-u5T<N!PqB
zVHB-?&lapiAq|pZ#aHVuyWd)Oj-Tcyv5=iwt+ux7rg|!rWLN8`kiw>=*LE?HMw9R|
z&t?lQehG9Z?GMdKLny$6!LZM%xa%Bt>j6=qrd|;!CBDTscei7rM9+;FHd|gcluwQ?
zfh#0bDz|o5Hi5HrWX8BYE_wB(yzQm)Le-yG{jds@W{B4I`ipyAD5<%bSx&K$*7`h1
zpredM?d<8hW*@5ypc}-t+iFV((+iJywm@d&lX4nS)>crA@ly&d_3nKNo5{EJDy{5I
z-O%tBnaLP7@GA9WTv<q7lExHq82Z&`jjw0yv{|j7*UCoZi_(DKbPLaJ17TjwY#jUG
zxgeTnPpJySQVXi=I)x@xcivfTxSm7ejq2BT2~*r4GK9YeW8doVG{nQarBQ5=Kb0rQ
z7w0d?!^2HbtvbYXQIi&PxtS<sTmsokrG$HRayFA^7_w?BgZA=zzhKZ~P=IEPoVCt{
z>#2{v;@;kCU{w3eMC;*GmEVZ-WnYi>1W^>mLJH}4Qko!z(}LMlRr6RMI!b0%K3d>z
zYIH>ihb{=*{kSSRa(Zs6+HNvVUs1`X2K)8t*?n~;Vsq7!m+}i+H>9%oP^Iw(?htR=
zz~pfP5emMOMWu(@?OM1UvNdB1%qN-#UWCG0w;({6?C{y9si&QwT||Ip`*tWF$dUUV
z)Q3|T1$`PZyd5Gc@m*RyYQ;jK?5twO<HwyEk6D@1iP(g_Bz+D}j>_ZV%;0&+*`8Q{
z=?d1nRxT{+aT)4w|G{(m6Wsle=kz~=Ih0Mp>pz3&atQTLA-eXT`r~r2mjk)?58@|C
z%~9#XyWVX!`R_S$Yvg}zocPh{5kJj<YG*$oAMx{We&Wje9p1iw{03`(0?=Pgs9EB!
zz0XhV_Y)Ii8}Gm82<<oAmIK4zKDl3?)1M9h4aEK4P8RJGr{2VE!o#m#w@nZYvf=t|
zyGeFCzVEtjv)gL__z6o_KhfT3l#eC(<e)M?f^9N8{%u4*kN-41zhhc8$BaKu{$&||
zqV4~$E3P=<{q@aJ<{zl}{|!br|H+H~6W0IdF1b8HxaTnWpXlf(CjMve`p>B0KX<hL
zYvd&159}BCZ~c>v;XiNo-&*@)t^A5${->{%f9q!Be-6L?n8R-i=zqa_|M%wb=Ya2z
zIs6_T{m*}9|Ghc<Ig<Qi4!>gH{~~MY&zS0uIsCM!|0`_gzjq)0vA;Rmsn6$(fn(?1
z;BR)K{m<Hof9z9!u^RtH?(Ltw!XNkc7a#LqWDb9JGe4c);y2Dke|yWEu=(l3126aW
zzg+n5zLB#(c9;bDTKb!N#Q*F=nD6J7WIxJg*^ZO=uTk=So0Uq=`{X3c=bps;PtTjj
zl0<`OH`)!N(QD#(6K%9hjYjEj=l$NmH-4v{d-w0&^|vPG-e3Q>B)h(Ul=JiQk^cWR
zl>>|G*UjB!`49f>C&G~LPG)C+{ek{4|HQlf?BZAO@f$~!&*1L|u(^N2eDHriwH!pG
zY0l_jq8+~fj2nKtaDTYN@9zxQ{GLz2`0=|D_|0Rke*ETi@{iv=CH}j!!GE{c_^&g6
ze>!pi^an<H|LHFLhRFWb**_bD{o>Htc;jbFe`CVzoC~n9k@JQJ^L_6!dR7=|cg>0_
zeJGMz*{1@GCP9jl8x1(QJmA@cU@A2v>Mpr4xLQTAv>U%bhdiEBJ+P{pv@qksFe(hj
z6{U+$t&`U~2|A6UtqiH4NvMo+7#)Pn_naai!CS-WAPA1rHlUu%O?Fhy+Vc5{8=D4!
zHV^fgm6~UU2nJ$#R-BeL*M_<e_o&!9+^@yfc8M0V=A=A5$XJnGr{hzQQOm~SWO%b3
z2bI1vATia(d$wo_DAKNL*keXb-R!<7w3HqMj{>mk7N|)8JhXkG)vSo*v`yaz^MYF5
z*+aeA&Ym(rO66>WjW3~ey-Md!woocOvPalzdv>`?Z%V>d4KNiaS<kPn8izn+ZMgXa
zk#Z5M>5C@UllXRNI)jxt=u2%i8dR@##f@tmt40ms2;_9m@Yp(#LwTVGRaAArV0SZs
z0oy?in+K`*yU*#xFkht}wkE)B2U@7Yg9G=2rA6Inr0;Wcfiwp2s1y$7UtQ8R^GeH3
z8*K}o2=#Wi!bg|=Y<exQ4N9r5Pvjs-<*&%C#s?q)(kXCQEywqgj5^aQ!K|PiF~6zh
z#XiGZvinMh%2Vj5U+z2|@Y?y^9S**Xqp!YW3toO3y`1_w)!*HcOt6?Jen<YPX|(Lc
z)Za(d^p5_6n1=W>VEGppFVXCF{^RP6hCF6s9z5xG>w-v1=19^t3q`icZYkIN6A^Wq
z-%`E;QaBOKW{0jJuxSeG%V1UCX!cW>8Wl~^K8-Op8JzaTkDfg&pEKrdtW#8%y_H2-
z%>_U=6>o)%TzKj&s(N7bfwmls`I_Hqzy}XD+9u1wCl}=8<9rSSr5gCCmLH*UE|krE
za!&hQvozXIxanm6;oha1?alb4>*&>X!|bf5LH&*OX(c?Z`H}%t*UHNQhW>kMLEobi
zQ8y}VlphsuIZgWTVuCY!cK?y53ifgOd-38>1^QV8RAH7F6qQKTc&kk_@dRz-vp-Hk
z*jI|9(Y@)(v+=`Kn@{B!OK4b-g!x|ZdQO%$M<65SyauU$Dm|y1diCw}UN#^~)<b_L
zt<>oiA%SW^V7s&z&hT2fS{7_~dnDtUqn)b<LWx0aViOu*9nZHxN4M7gK@s{Ja_9xA
zxN$2*3aD?}GSP2r7wzy!=_H$2bkclh-+Pndc>)K$O?Br0=R@r@<Z*=^-&LuhBHnIl
z-%2ZCw<Q`~{XXz#gwaqyV%rGj#L6Q^GxxMlJ&J%c)9z%iC&-<NY3KO~OuHB=T7)&;
z5I{5n>u$oQ(P3b`NhSqF^imq?x8@0*x6|PPB4zOPVc4TvsEG$@gyrnga@6W0202Kd
za9SQ!u~P9lY9T@#4DS3a-oIF;b>DkeYzft7z}&AvO<$Rgsj7E2IPNfiEqloGB`(e4
z#?W|-7*`GkyG1wD+_laL`fcNre*IMUgP}PX-CIVbd!JQe5^hQb1<e}1)N0bZCJ1)l
zE3|>?5uvq+o#%49d%5pCOQH%uy$URicf}<ZW+U%isV~GAcC|9i7~7I=q{i{oV7l||
zv7{l-Q>CE};-1Rd?K6e0_GnBa89eGc9Hvcx$vrLv@T<<5`$GAP56MT2%zeF@QVg{i
z>`KI|bnc!RLwmV3esiR1<51gxpEg1tI3H_-&1az`^^_O4pMQ-MEAV1C&RRq^KAg@L
zsnG3CSp+|L9dV8&aOS&>Pdwc4?STL(=8N0#rKgVcf%k}9xMjL1ha-oz4v+V35Xx+9
zf53<>88$V#0r%%UV~0z)KzaPEhiTd3$W+lLvhJ(5-ddsn-HEWZI%Yi@bwRXWxl09~
zDNq$b4j#;&gSxz{=(P;4*~7Agx5UnOj^dLJDYAH_SEbq1YU<ntEwkP_U*jk{dp4%m
z?buyc8SN}mt$0$RjK{2BVQ*rqF)&Z7ZcXp!cY8Acw76b8_5Fb&7{+FU;6w%L0}iSP
zL<uVn?(}T^RXamD)SBST(H%anfXBN}9;|VK=<%`C_(BWsyy43YN2))*i$?ZUv%?<F
zikE>0O=c4)EIQisz~XVI%-!w&`m@{5y=OlvN$bgVC6k$X*^I2?deYE3An&{ftG)Cl
zcgM=2yfdNl0V0^jz7~(V(0Vm#_xp|c@!cHVF3?jiqb-0RCABY_d39rR49VaT=G>|`
zxI9^kY#Q5H_-#H2t5s5iDKE>svJi51Z@H7^2o;UYts}e8X}Q@4Np**gz)VDdqt?Dj
zhI`^Ino6vNa$=o5_-4qD$kk?-88$m7pVRsT?%Y*c%r=xs-^SUwSYaR0yH}Q~9t|HC
z%jm9h$^?I6qFH4QN##-4KB+M^E>0M0f0+)+D)YTV)q{kt37Z}6vl})Qd-iKytxWNH
zr!ZqP(L*f4dhaX#$*DZ~i}A#h$&?G-*lgmre(`ew%+YzK4qEnnIG#R+K0uU-)9KR-
z*3Hu!xjKh6zxFOH#|}+mS?i6HY5*|wosD3Gd1mY9Twp1ENa;s)yXa2Q1)+9$Y`Z&-
zLza2?*SEApx_P>3cH7!F$BI$gOIPV#=#0lb?$QT8@_jl3BIec^GG{w!^-1wUjw-z-
zarx?2^tdC3=A5m`&BYxpmJi~l%JqE|m^Z)0O95?&=CIc9PXb0>+0c0NoHSq5dRp#{
z+*#tUTS9Id)TkQrxdE5k@GTBO=(st;nen=|7qjW%t#-05NUM~>YwSL22~m}NjYF=u
z2(ni=?WKB5=>9mH2Ao^G+gr0M-gl8#Bj!kX5Zm}tw6orP^&k(kkJ(A(#^PN9liOJv
zdk+}^<fDmA^eq{F>E*H2YwpeanAu09Ew3g3(RK+@C|<B>k*B+xgzn#w!e?LW;#{%_
zRRMJ!f4O=sP9L@l?+on*#@%|J%<dMF%teo4q1Ni$z?Zi5hSk`UdobC)O%Hu^gq@Mi
zwzFJFqPkA=>|023+ICM*#^(L&A3Fm9!AtE?*HJF=IDf>J@TBoA5@tB)Vuk83(D#Rp
z9~`fbj!$bzReuSE^DFZ%rEqe(#yb^qrV=#a!9JIE1Z~(x3gNUgtXez!mJV0tJlS+A
z(#0F?LDfZ-#rGJ88qlYBBl!Bddo35$$*ipGnQ`H~o~6xG!=nzB-k6W$g3IriDZjoR
zoJUI-FS7*MWzRMU&8oLBcv1+AnVUIr*rShN;}dktdc2<6*cdGJ{cSi{JY-?b)8+m5
zSe!y96x+uKB6$dt`)*vKDLZ5kYSO@zVaQUqd#g-UqSG5lcKv?@*Z$1S{Ht)SO|<_W
zt_90Vu@1i)vMw&8DOYLQ;7YaO{gU5f7yHrZ&*a8%51L)I_^2a0DbL%?Afm9+N+sJE
zQiwcrxB6aCqM9g+AN_2KNm%H36Og#c4bDxTZI&RSA*bE+{w4=)754SYB=aV~qs20p
zUSQVz3&5(rcr^=%e8lZ)16K<r>u#yjRk7kkND{qoixf6z8c+#}7g1cmwE~@{)l~;A
ztyakGa@%v*k)f$4F9Ae}IVi2Z#&0kqQ66)TgeIYsv}c7!bJIZ{J?ndzsmPecog35d
z>(dHo)$5CeIqX)7)l2(T?9|1YIF<`yyOTlk7tw9vZEiw$%7-9VEe%TV(=yOD9#A?M
z&bzb#qC-8$r|GopO&3JjD8aV#fEzdgj@Z(Ue>3o+ix)`g^*j(AMt23nPQ61=%f$j0
zIB^b2;R~J3L6y4W)vJF2AX=-#$CD4n+@8LtTUrP!I1GZ7%lSMh7Vmf~!=*-HTv@uH
z))&`@Kpu(Fq+nU>29Kb|zG@0g*L|6W2Le6W#JxwsBH~Js-at3O^EXSCH-LuD$x@%I
zGRw%Hwg;#icO3nK-)GB8FB@sdiwNnZd{JXTimx}#5RVlSzl^PiQF$)4GWVf7g;s_3
zrGdA#G$+~L#p0_AU5vgvoo@F-5F}exAF8jxktj4*GW?J(W<Q1}NDZ$VbygXwJ_#{B
zHl55bYE4?ftjCOK9N#`ZrCL~6s{n3vu#|af00=oChbW@ZE(kRT>QnTz1Gg*irHWfm
zXn|F%(x#uM2FLGO$6c|3{9xwe>|U=53Jl)Hh1(K;!DLBnZKd+7_4FDr32ZS)S`DXV
z@a51fp)AB=KMhX%IH|DeZrHCs;qllM#+|jW26KO}_D(A>i_fp#U0n6-t9R?J=LXDm
zZ22)1pBjr{M&<rIj3Rf>iw=wqx7%K`*jYOpve5rN(8JIia&fiD!vonml!sCRJojC>
zyJEFP8C8k-bZDWORgU+_+M7IBp_?$7zy4Uw-j(d7FdmPab+!9-+orPj8)^&tgxhmU
zWI?OfN!%FUswb=RE+WBpKDr*p$Z7%Ui^&5!m*@pBnKB|mg~vB|tF>Ho0IK?2`E=@*
zQ+P`L{)pwC=1|otn@LS+J^^y)o{9|wEzA>@Q<Xf8XjGJo1gLbLtaA-8d#Q8ia(nw?
zpLV>$o-DiV&vUOzs+(IomYocjqaHEZMDF{w=zfr~s6>ctY!?{9J3Vn0pFhonaNS!#
z_j|kQU?*c{=lSH%n*%J7T&d9$o7S}e<^C&2c0vO@tj>Z`PaB494$ER%YB1q#g07ys
zK*u#v=&uWt2*$spPakqHigBAh2a=8XtnMnjl6uQPUF+@1aaJSuMLw!w3Yj|ZKKd@6
zvu*9uLH#=Nf;j7VO3%{mNze8}o_r;rZa9OCvmu(5jA+vA7V^{iw0pzfD?@;v{c@ik
zAgNI`8<kQe=!J3JOLiwREM+CJIJ@~xpl3gmpgAioA;!+lJL1cwHfsdVpaj)#<a_qQ
z+}Zr`_L-Dmcsqe}_>nH@D0c`x8q$FAF_GtUzZ&$rcsSc^F`7?C%m+Bs_y*9P8EV8m
zSca{z-FAOQqRKtzeWKUivwN58YN{j*(im<IL!;-D0J`zZ<0tA7^sD!LCL3egD3um-
z^O;d4*4Jo|JJ54u)$DFM@fWKGJ8z|>>*=<KuDONLVb{d0Z!2VV7U8YN_@q3RgJbuI
zbat)eTex?$^W8tm-|3EO+`WrZ?wqB;?$j1swF95~^OzjIw8mwVu-)Qdr-Z}qEyZq=
z%BZWUSZ~3hQ|NFS44CrFy>8F+@y&iJ{T_C@*VLhK7qKHrookm}@FkQ=+p-{a&F;lr
zoLP>i$D<v-3ZmAfJa|abFw*6k3qFCm?MdBy72i5_&q9(t5VBXky26)1vs@^eu1i6Y
zkb=`iRw^FJ5uEvk4z6!U`;hyVW%@P@Xb$ddaLmL0DYZ+_V``kmWZLdy0PlCR9nf5X
zU`<hTxznI>o8>?PPd94YdS-fH&#xB;Ae+~hgmX9N>a(wI@?2fYX!<e0_O)3zt}B60
zn={BN`c1k>4vuMQ!t?Z=OS9HnYl*Gi428;0ugX+LldoFEjrlAf(|)fB*JPtZ9dAV`
zpUtIyds5USvk+Z`r)3<U@$+Hf6L9`L!h6-Jhcg_JH18X2y=38f2{5NsF;AEWLM|f1
z&IJyIu)e}IzC1IrzLnazysc(=4*jd*dZ}2m^v-=97r@&~u}Ag3P#kqhX;$yFlAhfk
z#<=0PF7{$0zPeji2)UrZkEYDyLho8_(s0N{xVMAZs*%<ovtgS#-wHKdN1o)SySO1`
z^f3W(C7j$(7kHS>P#)`w_|~r9#@2I+)f(AJ=e(Wlm2y6((B7+`wxr?DOngP!+utND
zCp7lUOmzDo(HO+Oqdg)Qsu02Rkte2)HJ}bgtBT6ZgZmfMU!8^7Ue5xHkv2-TFt0q@
zErK1)Zzu2i8UuwkvO4Y3Zr?iWeXn7<>t3G<2aA~?lT*2gbak+{Jjl}m)9j9dcCN=m
zOM7(!S)2k1u5MmB-0LH}Hj#2w*CS<BuNyq8wDK?vQk?e-Yx8OS$@#VpVtpPaCbZYU
z_k`8MJ<N7{{p&^0u=7AxK#E|cVSW6%wMIo#21{)OTOP&zwn}T>`k=L1U2Hb1ImvuR
zM*M3rW2@AnFckq5Th#T1;*lkfLf(=KvrVwy_lJ$SBM=g=bs<aBO3fFWEZye$h_Ea4
z@&J~vtP>**wsbf9$arh1Plf{U>{hhi=l#q5HYV!3+Q}JHj>**$_ADVheYzqGIm_~o
z(CMGEZvQGeZT9|-3a5}9osMi+e{h>S_^m8?XL0q+tYcwEmAGmn@za#4akz+vuz2Sb
z36MMVeGf3M^K4=3EA%*0*|Al`=8Z;Acn;9Jbvg+<D5nrqMy!wCudFud%$=TxQdjf+
zSXcU*{pz<CGiy)JUJka6TJq+*u`=ndZzUgI-yruED(I@0nKPZ<W|iasXlz*-LyNer
z%K)zQ)Zu*#qodNA+#PTZ;ym<P`*sLyHg8|*<X8MGMo;1yvh!-F$gci&<MP^jhwKx~
z7C1L9sAPY5*;!pOCIF4-JW8ZmB_8N@BvrNQ{ydSsQ~z6Dwago)+*kchKRPWI9?D@x
zzll8$Uu@QB7{f7o-VJWYI_K`A$TmjT-J#GZ8@RS{t3Za_EWk^3S(_^*PS8e^njJqo
z;m+WO!v#{j7LS6Yt%y=C(&W3}$<IL5trkBw#oq1OeK8NmJsNy3*%9*Xa@C}SyoBIf
zM$QtW+Lshp7eq()$iCaF7RJ@lV}JYvvuP_zp5ffQ_AsOSKB+ZmHGKOw?J;AI3#?|-
z!ji(ttWhq!qIjs#USELBwJhN)*cnXpWpLWLY!9e(`=)89vxecEhE%^)d7;h86KzyZ
z>(}VIIJTUj_Z9nP**)xz<f~3xx@#s{ji>A3WpsR|v9>7`Yd*r1WqPNe=!(P&myNn=
zScB#JI;<|2{pq{>a+u+0Sep*^B7X+DLDa#@knsvSMk=UA>H`dte74Zs3O4QROVvlk
z_4?auexT?9BvHFN7$s>|y3o&gG$jrUBbTl@8^ER~HHql^tO%^f*s-x7BKPdj<S@S1
z<e{ffc|J~-Ue_}X2(TJF_^3DS+cNFO^2#|3bGG$RB8IxK>NOsA)IXqCXnhcc9(pXm
z*M-K;&D|EI474fP*VdlGkF&bjv=-N*;+4DT_tAgNRUJzUtMhi-m(A2c397AQNpNqq
z7W%ai+G<;oTGNmk_$40@hxP)2@fY+AOVk;wRP_0)H_*iQ;uLxe&V$_ry&rTMdl0&y
z0BQEG`&Q-4A(U>S>uttwy@}fUT$a8y?L2O#Y^pHoFD7t)POT^FLM2rF&2YT(=bYk2
z?czxn_Qj)BgA^}@iVdz2dcbx4&VvAV0-dn8{vK<`uKZr~dJhe6NT_gXsrITU7;gv8
zW!J)nK8}Z7yIzI%aq-Hg!mK_>2ct7X>%i>dA+>#)aM1RXLfiJ#>{b>?$D)e6(dbyM
z@#Vf*ZEW3d4aZBXNXL&QpbdQY4^&;lMW8XDTWj|Ec;g&BysnJ{*09ZvHgiI*-#Fuo
zy0`c7t+Ody=@;_=PVr%k)l2ol6%y3)1h716sq7#(nNyM?Oi8JudEI)m+*N>@L9ZPG
zZdQ@bmrnxUtIT$zvXTb?+8nrYf|+x-im|KJkMp7~58zEj4DQml-#xXwuY7vd-qxmQ
zR!?=z`m}1v<OI&Uda}%M#Lk{wJ#d-m0;rrm8VsLXQNHw`&a$$9O`fAUFFRYPckMNW
zd1-vrf+|z9J1Fp~8sf_hm2K-n%d-keEy_>vI5<X`R=PY#?}^^=J^J>V7#7@L62O}9
zng>3T;6omcw!<<uDSWO&cZ7E7bPQaE7H*ApQY~9Gs7`;hM~U<l>xn!zGXaBE(R9(p
z=3yGY-uv2PYqAGuIh=W-f=6d&I@*RM0Hrq0YO+}aXw-k<b#2P6Rmr;N59WLzp9^}Y
zOVMNeh{OIXgSfkshGpvG!3RKC7UQ=Df-X0zS(9O>(>b~BE0|C#>R5B{XE;6<>Svb;
zu)#a0YwMT2y1Foz{-`@gix3!m&10?FEnjEEeN}25s^{AGGw+#x@o@{{!2v4>OQ|e<
zj1)~eBiwE&E0Ln*>h$<9Tip_wlH1q5dc0oZ8q7XdkJpr$)VB0m$Tl!2C$*TbB_as0
zEK(ic)ZL-DNEhq=8a_uIS$(5aZ8EjK*5N{6vZm6TEGJ3k#CZP{QQy3jLs7Fz%O*z~
z0g=}66J3)h-Z&1M1MN~gPkRW{ABIlrwLuC=0DO$Kb5W1(cQH}9z30%cQ<l`6HGuWM
z>HV`>zuPOYZ;R1lF21a0xok15I<l-8_t>x0$rGK^*Ugh!jj_r#S^&p5uR0(nx(wUX
z`I*a{_k&v#=dPT(W!`>cz14b{`(mP3Bb`yF=16u!DLm)*_3Ze<sMdivABwD2MVAZg
zxOq)%!|ayg=C%=&)|<|;_6@QTeyxAw@gp_wDZyub%9~*G`U09Z)-|Dc>Z$EQaPg->
z8?Kr2$?OoS=QjPA<_7uG+bhMlOU5EyzaPiUJx_?uh%L8m=<an|uIz@%Sm&lmS{M4c
zh`;53A1;TUk<mM~dGT%IE}2&`WNHUGeXWG%+tZL2g#}c9Z;ooNCLZ<p{<~?>mCk}%
z^{~x%)uJly<0icxj@DbRuyEYk8Br~KKDME7E7ZmC`dZzSSDV^g4nd>FP>*&_6~Cl$
z*^iw^6VT*lk?9rJ{pC(X+iR`PX<!)y8r6}<=+F5XT*2}>)`nfl<>VxrXoJx#i+N%x
zaR|9?@}@`EPbFHH!Pw_I_51wR??3e7Rk++<`&_J_7dl*?1pSW?+n+T?|0-f@{XJrf
zhTY9>CzS{6ae6vjg=Sd1)!&hE+2N&J%}pJJ$VhrRv9(I#nD*+O+Po?R<b0IoxwZF|
z9_l_i$oV=a_frw+R1cdB{Ip=Px5cc&eR3OsxqzOQZ#*2B+r}!hk2h>i?MlNV*kuYG
z?Nh72@3-6-X<zhp85}=wD;+=fhrCP!!Q_wB9*?m3>^jOrcZL}%ckPnPhd8wNOt97M
znGL3I{ZQIk-O4Vnb2!jL=<2#=$cdHiHV1&{Pzb~1sl44%I;``H%~i{*jhwj+PBovQ
zi7MFY$;zd$86TJ|UIAgF)2Nku6DekvceJbZ7pegVMzKTI#sYv<7$H4A9`t6}Vz`E&
zO<u#G#pGX&dOhx|7OQHJD4YoOu9L`>-3_#N+A9vdUJCT_)qb7n>nvGZqlt^egNevR
zOT(_*QSu$qw9{MKY1@_>QXTTZ3EW*7wF~ohlrVBF8vUrBp*%%uHB4?YDtcQBCce-r
z9d7@ikMZ&qz|qv~3<{MF?h;H_%(LTf#pY_rr`PUipfZ3<Z)a8Q`J1)@gzZ(Qcrxv+
zIwB{_T{>aj%6%{l$UT&MH)g0GT6vX~IhP{p9(RcIbo)7p_{Yg}2lJl$DDBplHh&9V
zmFHshT=iay*Ikh2SYFt56ofG90BDW<JPr2VtF19Y2fCBbo^P5<$WsB@|L{kxXz$^X
zb@!YcCGlT_<zHZ(hb8e<y0zVP$Iwa#C9f+I;Bbe~S(Tp{BZVe%W&g`Z%iRGO$bM`-
z>qcbq%gX_^^#@S2Li+xQv`+{O+mgr~deseFuz5udSCuKc@ASOEeHjsNt$e9hs#SZ?
z2W_I)81yGj0^kAzuRfnXvpbe%#p6+dhqn*~tYxOg%~g|B+q%~X&Yb$OOGt2np!tKC
z^4NgiMV(cN++uk#oPPF0eoq%A{JaMC&>$zhsuCU^Xi>SeNu+g3mCYF6QWJJc)py(o
z)m=^b5}WbBnHpj>yQ&-Dco?phvpk;;O&@!W{YqKbQnnNBd*5}rS~AG1u@>M^t0fxz
zw(lLGqGMf(75GJo=gjkbx1Db9FuX%AD!h>=WLxUCjzCM9620jhHz9mA9<V_>yG3#}
ztUJg`plrLfJx6x?bSOi0$OYRJ4_3eK*Cs|BCu8YW5utesfX2&+E9H9QbsBm22UxVX
z1bh=5LdlB;wnXl&a{63Z^ZH&c?v|55gMe9T?pZ+gT<c0}OEHJRc?2Rk`|51zp!_1z
z2i5TPeV}Fpk*!L8ukB3Q@bVEqs7LGZO2CUX0hLMR)4R71D{zLb*B#hq$2Rh4)-MUy
z{ybja&24JQ%Uc;g^T*SYd4xrlx1~{46D!HGg#~)4v$}WF3aURgTb+P;9vPO*{_Z>U
zTM}=$r4g;II#CIW!)0yr;%BCaun&*{m&!^%0Q^0vwA0OzBgflt{P{H1Y{<vQq<T{~
zMgSm{=_;$%8ugWpw5#X5jO%W!&B-NQjz#gFiy<_#VXmEV*I0^GmOP!aVzsdAfYCzk
zrweFsol*Q{u?N1VQwV-xwl44BPW5sN9CMbv<|i8Lr)BTXxe|dr0d-#gc)l3zWJdK~
z9ZNjysC!9VCjpkIqrSVIq}{a0yp-qj`*@2q9$M5oey^iLz;BaSpGD0%yixbMWn8q!
z+|@3(_N*Azn)^boONzmG#0w%M+|Y$!zjd-7&5Om=npa&29~^2JBXm3UuitoXk7_Sm
z#=UFZIc=*-n4O0^JrCjgU|Fquz3sR1Y8}iYx<Or=!|qJFexbKsfiIl+nl?4<627TR
z-CJwCHXOMi?<&XACs9n^m~DAyIeEzJsiB)}@Bpy&Qt~HaIyM?q&%Hc!<H4n0Cw&Y*
zJBz#v6(i!RqXMBHEx=zU>aj6LN}&`^0<(UPF!!bot|kptpoi^jHlsZ8khRsU-ZMf3
ze`%~G;UB)Ez{z<Ih*EFs#I@rE(;@fD-s@HjJ=fufuEW$RsjJ5VWj`;QMxG)gue&SI
zd<K`o3bcf#U&nHQ4p1=C9By_8t&67j%p9B%qZc%2c_s9B&E}QgUhRDVK^B7g)xL6e
z`OcJ*co9OG&fsLPhP@alvNmZd7i>M>0r(=6c|T5iN^?+=%JifWXARYG11V+Wyx8X3
zT{Is9Rq2uH>0l7ED$UWy<>uR+%pD}!ZNK2H+1psXdi1Wa1be+U`g2UA`oqp+>c_mU
zRqMtePY&$2F*f&8?|b+dKq{QCUlEvfjd5%Jt*<=rpxdqdLLXW8dWRxFqV=ZqQk-M2
z-Y<87^V0FWRR~|9c`>if>K@7JTQ1YLlX9o&yoUFCiF!zO1VD_b0@^TX*1FrxrSNU}
zOj%r&Yk75-g8`%E^K6vBf_&PxwbpYPWgB~O*ac5MnoWA>T$X8~n@q;1QGIrz7Sj_C
z7GY^_#l($nY{FRs&I-HSQoK5;IBuHXqC#53ON$Qm<>pe_x}tgoX?m^&w^8rdT#V4V
zW}`8F*0~NDs%@i)R^~G%qjy!)A8Yq^_#K+#OY!Zk8NtEC!^))}rSt4GneR_`vY!O&
zy}u*X+KrzlTweFPuuW0U<~3@;k;y*vfkxB|jOGr%9;nuJ#iwr)KZVz?-dYVsw$;%a
zs=HZXyQL4oTIX|KE%jh;1XWE?N#q?6Lp)b&5DUFtoR2QlF0;n=U5h7+1n%T%Mm4uQ
zhXlJ6rEfItv&a4kMg=2ZIz#CZh23XVVX_m0>a(mT^krIqUF+fUHfER6iZnp)XJpB`
zFNoo19N57elQ5f<lgGBp4UlnD%+GfBDbtVsVJ7>}O7wpfd3NysP(W^fC?uEIm7!9_
zDSG2|kT39ZdEq$Kd0m|3bGIU?2eJ}G!xFIZ*>?x2OgwulmKJBM)OvR(po<ZHV=QtV
z{JFz5g*J->d9yd8zVzKoTH4>Hg?9mJnwxuR!=4Qn9QfhXE0FtLCH=x3uG-z>P<v7)
z*XVYAnGab#UfucnW%}uA9qKL=_50Y%=0>wOyazapM|-<OED`PfE}>pk@^$h&L4oWM
zRQ%f2s+V#2b2_ZP_2e^dojt;@Ng~~kfu&@3#MOjDHrc*;6wus_-Yw*l1<~`MzgKb{
zdNR%N=|;}{=!O%W*0!h|z&yw37v3%ANr4Gtjw&KY%{TWD1dP?}(WPfuMVwS-S7QLq
zDulV^ce}$RuQIPZEjBOW4L0#Icg<rhr!O#HDZJOcrVo8D+wDTTK1xbJ)o^*ZALL-5
zYv$zG=ZV&;iaVR|WvZD+zk0A6(T6;bUwK`=O*<FM>=?{%Tkbl!1T9r?>Up*D0C1gA
zaj?Hx(CctNMXkQrt2SHb<?-1*iG>ey8<Fxi3OjH!FZC`fE8sKbs#~VgI16P=vG)gm
zT}h_>DYV?=nMC@vlrF<^pO2{1EiW9$E$Zf1So?zK99ZE&cfavuKvwZ9u@us;eJ5K<
z|8}C|S}&-Z=gmAp_V?3xxR9?$CFja!*G4d4Tvi^8H#4TugK53Ht&gnb_~o~+r$hG<
zKk+wn&Gko@Ed@OEx!<l`fnbr$;eo~{xX0ehN9{7WbaUVdTreZEtN!-!hOeA*F0!G&
zfM3~}UoXn%QZcGcM`>%|L$v_I_GvhLwqMfm&L}57r9Q>X2%|H8kOhXE4TDO{ooE$I
z4VM8v`YeQF{7HrxJ{l{_vVYxT^25B9A-x5dlG~c$Udl8ESN;?s5c=r)MXX&oIo#S*
zF{Kq;o0`Le>o#`Z%wVC`Uy~PEEx(Q7#a+c!i0~^@AMDSO7k@c)a$}kefLt?LcDf#c
z@sByE_BY%0g1({!Y@DlD?%ulwql=2sx{wcfUf8sJ$NR_ysS5zFTR=6L2}#Kxf3xTc
z2drPN2e)mN6~b+;dCcXAEpaSe`!+qAH7?zFNWBh^YwNp&{K`%mKBJT;yCch6bhGx#
zwg8yQN)hMh@-|;4$ZE|e3c*NZ+*5Bq8RVtgtsv$iX)a12869)(QE$8o^ZPham+iyt
zF$$xJN$rcEvcezL?)h@<o+dD-apxD;pPofelsxUbN>kp{3OC{tRnZ6=qt8jmEtkZo
zO1{6RwRHiTG$S>PNz%c;SFrCH5L8FkccoPsWy>DqNRY6tct{o`z4>+LHqt!J(|Ssu
z-4F}E!mab*pzHAaDoc<yXW`0cw;8M}=}tZ!Q^6*LRlKlrrK?DwT0505CyW+=c~(6}
zJKJK9B`5*hOVK%erfSgRz?-O}hc$FKe!!w~YGu8~4!xx+x-UiTH$8=GH>3b;TlTVo
zeSk}~LqZxzl|cA#G;1VEOXscda!j>(Y=T7pL_T>vSDokDpL5M0PX;p%-P2y1F(>hg
zr#{G&MqUG#wgYZpwZKUgvq$zJFB3>08;`{lny%XA4>@6$^7TTi><D}T@$mI#E`D-=
zN6TDu*PHeE(LOxh1w0cU*G_AZA<^lE<{8WW2`N?s=W$jK&;v!Pxn@M!KbF<erbSyL
zNT?-f$aY7kD14bUX<CKn^Wjj#$<3hn<Rslc59q6Br}1I6$IE%WG=w~K{?n2~h<+;j
z+@jQNayq)~j7&L7$<fL+BWFpklP}hsj*B-~5fjvX@YS<B-os8JG5g_aHclM<yF5rH
z5MD=@{AyH|4K@nqex9ZCro-)lvtOSYpYntJO5TJ)$*q7rlIrg2;Z89`eOy2_^vmNx
zY)Rr^ofz}QX;mgXyK+^Gsury#I}_B-<ASp{spCWcyi_i~FJ+o?%EivE`r=#L<~R%a
zBGwS+rEBBRccZnO>)FumNlbDnWQw=ySbXg)tl>M)nis4A^5TR;XeE#7^GQqtxl)OQ
z>zNGeX!Z^^MRxJ-=OV?RAsta1coN?cENp9I_gUh_5gHZgg4R2QIrc;@t%t`{BKf$c
z=ZU>M&OCHf&?cDZ48XO`Rrpw#_C^lad*_*l?Z@rB>6$Rh9Z~4k@^?4%z9}ZS9{SnO
zb9UmPkrm%3oTkzAJI<}lH>&c^tEsnY@3Gyc)r}Ik00}IV7WUld;=?q!oHsi`<A@Jj
z6|XE+8rGmP^e!Q*vJ7<OzNMcGpXnd!vf{NVG;NcQO=)-&HRIA5T;mSXHnqTv@Y<D|
zY(GwWbU(FQX|Xw>xW~(u-8WIK(@zw?=LxXwDXF=rm1SYNprQg@sP~s0Mlj834V3Qj
zsq1(9%uS`#fQ*k^;Wo)`@`MUn=X%cZP2v$U7u|W8t%P?j*kE847}HT3?|=5yLIYyG
z$|~D}<CfP+dmf{{LI4zQwZ;8nl;KNn$n`kQ#ml|qTDNX585ooHa(M(E;UbnUj{L>D
zrEx(TU1s3hs#?N9_)M9ueY?X#c{~)BIVW?$M<s4uy2Qm%dE0Cg%MnjVAJC&}x?LOx
z#oXL>Kq)nY8dhyb0@e1=aK92dhq_2z7R=#Gblj(auvUEH2OA+CgC*SU5l8#HuFfu7
z)dbJ|XNpZ`8|k?$Y(6r*Yz{c;);p2)K&H=+&pft{<LO&L96<wuWr+9u+h@eeiPf|G
zoFnwe)-p5IPPv9x@G>WzD}#POEX`WBRLH_oGl-Ot&RLmBw!`xiIaO%&nl0*0Cg2kP
zNSnQ0uV9}<=QQ1@*3v%CpLxFW@tyc>uGbDmLt$IE<?25}*lsVlwAy8viyPzkLsyYC
zf5y763|S13SsQJv`sGo*C4*5P9lXoJ_WjCV7qFaxH|=Cf{Z@8?ufwCRbLyi>ENE2h
zbYXC$A743we|6OH@!S>j>JTQnkkcjbkJ#2fS1$gmn3wFd{)2k++YqSBML)r>-um#o
zZpt}#%C35KvVLzi+ZdkfF;oN<E`s!^Z5eV(-_6}6rNpK~QgCCXlerQWhPkhB;y8$E
zAk~Mkc<i_ygfGSo!tR7`@}mxx?e2ShtgbJ;{B1Cgq3Y6eCVei_#$r;84i&1~G?eDt
z>MX6OyOk!;VX8c<exr&GZu|+Mx5g`~YmXadU{8mXTN2(w{CNv~tE5MLZW3T$CtF}4
zCiX4<j#gC5HP);SQtc{N>kRv|g<t*%gs@&Jy?4#CsPWr5Ps<lri`H0j8$1Wo>B2_s
z3E}A0t@)_bXUMa{Z658-2+<p}`{;~mNw{Brt7M4`&&YZRIEBUmPfwN1Ip8b5w7Ffr
z*+uVS(#@!qKs~{p8?{?+T|Q0iZA=SLX*(;XCXm0eqPp1NxMjyywcgwK__vVAh*nSS
z3{uC9zO~p*^JVF5wPAP)2pn29#4}eJl#downqRRzm0Lxe{+*{P+3+RN_4&I{H}k|?
z&S=NA1yp;ZZO;>2t#WK$?{n{*77h!edAy+P#(X4TtM&*V!`<xItIEfgxqFl}CD;oY
z;?TUoU1xFx7_?5S8`JvaB@>q3Z4^$?Ti|4?qoC4^1WyL{SkLlUS;^neF_b4I>^-UV
zUaqjWZ8%rK#*d65uUzD;3k#x(EMj_|=6ToLxvizive6!V<=c6!6^p0oEQir7H!)=W
zHF$M7*K;59aEp|vp28F$pmg8Ox_z|8v5)IuW6lFfI0Z+-Uy9#1Yt82y>n$IdvWw*~
z-tH_@0wu-uUYjrxSON&WIpV%I`(QSQ!S3NVy@M@_J7`$JzRD;oeKoFF+un+;!aKW}
z^Z%!kbL~+U%cAi2_gB1L@B%7#L<I!}R1_6NMe&A!;0**dnar=BQ=LqwyQ-7P_^o!1
z@qlI6XYI9~9oL3Jjn|hw+b$Qiold=UESrsL<JZv9to$)DGP=S~EFJtTa~oh|l@U4s
zaGs0U**A#wKB+j#9=_aK-RCUjon|)80zG6W4)?y-UuF-D=E!BKDab6%vudE8%z;PV
z%%<0@tAXyFGN8}e*Q<|4@qw-?qfc8XoMd*0Gr~4w_p8}86J!hY{Ro<flhy(`A8^9w
zHM{iAC&E%v&B>4y&)roXSU)@X=?L;f<*?6bFZf;^B`U!_rF^8GfPL~Qt`NF*fr0GM
zQ<3brfo4r)DsAS<@$&*1zbV4pO8I);>35&=T{sEsa!|>b3$xem#dE&f64}BQ9soTT
zm^Q}9YTNhZjzV13o$~12HOpCV&eiv*g*e!uGxkkE**}2X;x)rux6wG|2hoQyUR3|n
z$!QF(rnvusI#xQDZP3Sr$S&(=(=OUS^E02SndL|efY4seCH|1^mTi8T22r`3GXeX|
zrb3~85~9{Y7`fyaffjcLsZXDV6==tx?N%WyW=t0g(b8q5bqRo)3i;Wfi!M5?B@FT-
zYTGWwWYEci2IA?C%TMjY@wa2PQ~=(oa&e&@F#g8Vo!s_`gk*6dUnfBwDvkt$3k6EK
zV*bk<-bd&yqxLoO^jT|gM}BQr0Jvr@I~bjtamh%-UUE$Y64ge;H7~B$oaUloc22V%
zJWKUY+&hBM6dZW7D|#F>nwVC<<wG{|1VCT0cqa5N`^upo)d1-%<cDzsGAH<EA4?`k
zZer4ffB5ba0;UfUnENJpStjyQi)x)<&4=1vXPZyiePR_)Tv4&p{=`@&;6MN+X4tFM
zXnA3)`tk*g8W1tcZ~2VQ6mw1YwykZlJ$b;+O`Rzbl{QzgZwdD*wLU=k3MgEsF6S;t
zu#CbR+x|R3Hs^T(2Zn`3FVSvibymf=CN}!4Qx97aFoqia_{cW${hG5b$1~pK^~>W%
z#MA!cJ{8cy=5Uf~3&z0g&g>re>}A4M<VEX~9W*ez2HcEHv3YM&EUDigrO6xMs=PuP
zgvH%DXG7N5&)k6VpMYI!Y9En7!ul_DUJlKC_LGxjsFDGmQ>Vm^QxC~2dzyJQTc~mQ
zkSK?U{+p~zido;l{d91&1oOJk<PrLY#<W@4V8;qdHiIHEYQbx2!&NA9O9^xdaXv*p
zS5kY|vy2&gHE9iIm9trP=8aj4JDtb*rcV;{X@yl}ahFysD}y+ub$d$%<ocM2faHe<
zp}syK?{#7fcgt(?SRVQ9edfHE&Zx@>z(q5Bb2bZaZD8&f<Gxz)EQJNnrh$&Bg+{H@
zLw8qI00V8)B~FjtQQvjxF;KJVLjgo5G(Sq<Gy(F&ZRK^0+i~iagfNq<(PoS(tp;*S
zj6qz;e})8QUG8{}Q8WyD&CMzkx#9yP>iTpE=cN^qYWat06$Us@O&0I~u<L-hS^&?K
zch9BvccZ^Q<HKjQBDMRTnB7>(c~;lUz13zTuiEvtxLsD;n@cU(bu#9tqZIV=trH7J
zd7N&i?sEJIC0`yqF%JfkaE@YfS}Pqwi~&qGv|hYsc34Jrh-OeR?c6nzF6~r={rvI{
zd_s4(FxtB}!h1jE_StTC<-Jyew-&W6HW%QSArR~#;k)z8oK6jjQ7%+>`}PnRaR*SQ
zUOEdyQIzSzfi%#@>a8niYw%)CY?Rc)HZKHXe3STDcZkk;mV4xsy_HrbZz<SrfcH-!
z8iA|5HK#e52NqVosE%2>+b(H0L5^Zlg1QtZUdtjDpr$zob`nUP0X3ySt~@)OPUX9W
zkBgX^*W281pL@~uRra!js`_Vtd5EY7qgvamsiSOs?HsP2yZ@&)@UQTP|FaFm*>7#&
zn+II0KQq_!_z~z*m48GRfy@O&airNxAo!IIvOrhSGBA-8)Hff$%`UKYsJVP@pr3oi
zGBdgjYgxT^GwdK59lDP+lg{9JEVjV8Ug=e;jbjB`oaxE2p1dW?;tQq9J*yigrk^Fp
zXG?<!2_$u1@VcJOA5ru@Z6#V89xTz~v4<9gyRc%{u3>amI>>D3X+;4!YZKOXo+FZP
zb&of&9dsGLfPM@;J@sW(;FF$o;%3pP;z<Q2fMO1v*Lbsj-FI&Hrz3W5$a;m!1v*_Q
z^OrE=fZHDE=+nqtpE^C!Nt>Kb8Rcc)Z<mb9pv&gTT+fz#SRVSjCb^!$)6m3I5IugR
zX^<@i=!#U(<#d)`0UdrK6{h%(W%_5LzXgN9;o+HmZZ5mg;|#&T@X=~Ht}!_<$0j5Q
zSujbe<q{-Ca>Qh$792Ar(l*<e0w*231Owt`M;^YY$H<tJ^J5b(6<vF1G>?^P$*_D}
zrdu%SL;(#v{0VNY6JjgSHL+QxRHt(2!Um}f+uJ&{zG)d?o1ZLS@3<uj^^v_*iwlF?
zaqsq_gXao-4mBJ*SSPe0I6|u|auCKIOCW0bymi~#j=k<<<9TlO+T1Fpey(hQ>8S`Q
z+ps=s-!oEN=;A_S<)OX0v52Ce7WoS4D$5mm9hBgCEWEFGixay-5j{Q2<3aTuMg*H8
zD_N}OS5fmwD61Rrw}+OqnihwH)5R79U2BkY(u~f|_^A^xCJP(hjY*Ulw9UG?2GSm~
zL_f<m>&Xu=4HS>eJ9-@2mE2*O$ys~@e#UqY<gIhTu{#v^<3^w4)OxF3^;?;}*D|C-
zkeg^yzY973gUTI&u@M4S=^#!qX9#HX(r^OemEBOmx&NuIEw!{7NZj6DHUgLfr-lNh
zg!KOCG=W7nkv>ylMb&NB7ZYU1%Z)5}TYj5&EX;&=>sAd6X}Sgxcdd&Et(I`rzYl@N
z2E<z(f+2nS_;3Y0$0?Cqh(_njxEOcv>k2Om@x|(nrh5kw$woK^I+Lu9o$Cm6YtiL{
zP!}SYQ_<9ptg{5E4NIOok&HYxJ1rEi=5y@kM8Mt<s&(gJz~y&z&5mAABR{>nMCZwI
zQqMaa;LYVSh!1&KL?`ej*q;K4BDEH8DC0_!nMmGNxJ{cvy^kup5FK;q;f!>~E}p6{
zW(PM5E_aQl^DClj7bz{d1>D()qvc~Tcl~f3Q+EHlD6(he3F1u~qbJ_9teSbiA!F@%
z0MYwsAb6c0+EN93Dqej%WL#J^vl({wuoq>6p_i-06nISws~5&W;?So}?)1_)wYx4q
z)z)p7wN6gbR;`X51~sW|+;GqtI$R-HQ!8bVJ07L5g|d;sEAN!RiG+;g(V8D_0j@*b
z1Q{L#M|~CXHrY>$Lg!r13Oa-i47`i3`Ki`tn^r@uv}8v{pVMb@dEWh;P<bk@OWVlK
z<)!CTIT~rb4O$F)9%S<ZeW{+KakYly>>PwQW}y{k@GwzhwNHbP(38ClxbR>(Bfz?w
zjk?R8K3viFH+eE%@S-#X03UO9zIS&`%5P_-d?1_sv)4OAOmIYKes<gOSw>h~=)2Fi
zI`{pxF_bkR_nmvK>%<`?4%5gkf8W)N>fVPTkdi@i%)KHn3uNZuy+V*+@6XZZqikn*
zcR+(V%h%W|Ggv5TKQeOJ`>Eb%jiRx(Gns9kZ?UXqA3}NFs5F^90rf^5e&U^8LvFRA
zwp8x*B;vpt6EssEfmW>ray;(Qo$u(r56$RvHFkIRL;gM}t{2_|jh*i2);A_t-mh~Z
za@|DC6U5^YzO)}mbmmZ@dym0~XWevd#v)zT<EX|x+s9#?JwLtE()PydXj7%@D$~*J
zO5tPFsI}&GyZeWC)b9O(5q+jbPq^-$j~YDL9ur>>AEEzp`xm|n@WItNT+56~j#0a=
z&x3HCj%y<_32xbw1-Ab>0oW*%T$E%2b9$*it5#QO5!f90>8k>n@yf`*S445lH$x!h
z7Ua+D>ZF`JthL+pKRmAgmHOvj*l(io3;Xl+o1tlv{(b)aXN>{$y06#o=+7EXfwb<o
z8bJf5)8A`&{R=eo_Zk7hDL-m;z}x<|j>NtIE`NV7iIYHQ`c@;TpX<mke$=;h^cTYX
zTdn>DN%~f6U_U;C!m%IMOA(D9pFxrA&vkVD=em0R$6hFw`cb0+i2VEhXaamY-}Vf0
z?myPC)Q{`M0Tt`}{&14{xsD+k-|qq5z_IW5ioh8LG<^TPfA7?}-2L<q#{5wF{mbTP
zFh8sN6zP+`>ytEvlLW9-lm3ZjJdY*Z_Wxe7`vv_4bszjQ>b_e2?Ogy?>>Kp{7dnj2
v@$WxaV#Y6k=!^j%^s7S!f6Ooc_P_q||3FkZ>HCkL0zon){R@KRsq*VT@K2hj

literal 0
HcmV?d00001