feat(claude-plugin): add synthesize-skill skill + experiment runner #262
vinodmut wants to merge 9 commits into
📝 Walkthrough
Adds synthesize-skill templates and platform finalize CLIs plus SKILL.md docs, and introduces experiments/skill_from_trajectory.py, which seeds trajectories, invokes synthesize-skill, and benchmarks three recall conditions with progressive raw results and a generated report.md.
Changes
- Skill Synthesis Workflow and Implementation
- Skill Synthesis Benchmarking Experiment
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
experiments/skill_from_trajectory.py (1)
117-131: ⚡ Quick win
Unhandled `subprocess.TimeoutExpired` crashes the whole experiment.
`subprocess.run(..., timeout=SESSION_TIMEOUT_SECONDS)` raises `TimeoutExpired` if a docker run hangs. That exception isn't caught here or by callers (`_seed_and_synthesize`, `_do_measure_run`), so one stuck run aborts all remaining trials and the in-flight run's result isn't persisted. Catching it and returning a sentinel `proc`/`None` lets the existing `returncode != 0` error paths record the failure and continue.
♻️ Proposed handling
```diff
-    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=SESSION_TIMEOUT_SECONDS)
+    try:
+        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=SESSION_TIMEOUT_SECONDS)
+    except subprocess.TimeoutExpired as exc:
+        proc = subprocess.CompletedProcess(
+            cmd, returncode=124, stdout=exc.stdout or "", stderr=(exc.stderr or "") + "\n[timeout]"
+        )
+        return proc, None
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@experiments/skill_from_trajectory.py` around lines 117 - 131, Wrap the subprocess.run call in a try/except that catches subprocess.TimeoutExpired and, on timeout, construct and return a sentinel result (e.g. a CompletedProcess-like object or None for proc) with parsed left as None so callers (_seed_and_synthesize, _do_measure_run) follow the existing error paths; specifically update the block around subprocess.run(cmd, capture_output=True, text=True, timeout=SESSION_TIMEOUT_SECONDS) to catch TimeoutExpired and return (proc_timeout_sentinel, None) so the timeout is treated like a non-zero return and does not crash the experiment.
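The timeout handling suggested above can be tried in isolation. The following is a minimal sketch, not the runner's actual code: `run_with_timeout` and `_as_text` are hypothetical names, the one-second `SESSION_TIMEOUT_SECONDS` is illustrative, and a sleeping Python child stands in for a hung docker run. It also normalizes `exc.stdout`/`exc.stderr`, since these may be `None` or bytes on a timeout.

```python
import subprocess
import sys

# Illustrative value only; the runner's real timeout is not shown in this thread.
SESSION_TIMEOUT_SECONDS = 1


def _as_text(value):
    """Normalize captured output, which may be None, bytes, or str on timeout."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def run_with_timeout(cmd, timeout=SESSION_TIMEOUT_SECONDS):
    """Run cmd; on timeout, return a synthetic failed result instead of raising."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        return subprocess.CompletedProcess(
            cmd,
            returncode=124,  # same convention as the coreutils `timeout` command
            stdout=_as_text(exc.stdout),
            stderr=_as_text(exc.stderr) + "\n[timeout]",
        )


# A child that sleeps far longer than the timeout stands in for a hung run.
hung = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"])
ok = run_with_timeout([sys.executable, "-c", "print('done')"], timeout=30)
```

Because the sentinel has a non-zero `returncode`, any caller that already branches on `returncode != 0` would treat the timeout as an ordinary failed run.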
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py`:
- Around line 102-113: cmd_finalize currently calls _copy_into sequentially
which can leave evolve_dst written if claude_dst already exists and --force is
false; pre-validate both destinations before performing any copy by checking
existence of evolve_dst and claude_dst and their compatibility with args.force
(or reuse existing _copy_into validation logic) and raise/exit early if either
target would block the install so no partial install occurs; update cmd_finalize
to perform these existence checks (or a helper like _validate_targets) prior to
calling _copy_into for either destination.
- Around line 66-83: _validate_draft currently calls _parse_frontmatter which
can raise ValueError; those parse errors bubble up as tracebacks instead of the
intended friendly SystemExit messages, so wrap the call to _parse_frontmatter in
a try/except ValueError block inside _validate_draft, catch the ValueError from
_parse_frontmatter and raise SystemExit with a clear message (include the
original error text) so malformed SKILL.md files produce clean CLI errors rather
than tracebacks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c784d3d7-e422-4423-8097-0589334ccbd5
📒 Files selected for processing (3)
- experiments/skill_from_trajectory.py
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
7a3d87e to 85a35a0
Actionable comments posted: 2
♻️ Duplicate comments (2)
platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py (2)
102-113: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Pre-check both destinations before copying to avoid a partial install.
`_copy_into` runs sequentially: `evolve_dst` is written first, then `claude_dst`. If `claude_dst` already exists without `--force`, the `SystemExit` fires after `evolve_dst` was already created. The install is now half-done, and a retry (still without `--force`) fails immediately because `evolve_dst` now exists, leaving the user stuck. Validate both targets up front.
🛡️ Proposed fix
```diff
     evolve_dst = workspace / ".evolve" / "skills" / name
     claude_dst = workspace / ".claude" / "skills" / name
+    for dst in (evolve_dst, claude_dst):
+        if dst.exists() and not args.force:
+            raise SystemExit(f"{dst} already exists (use --force to overwrite)")
+
     _copy_into(src, evolve_dst, args.force)
     _copy_into(src, claude_dst, args.force)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py` around lines 102 - 113, In cmd_finalize, pre-check both destination paths (evolve_dst and claude_dst) before calling _copy_into so you can't create one and then fail on the other: compute evolve_dst and claude_dst as you already do, then if not args.force validate each target (if target.exists() -> raise SystemExit or similar error) and only after both checks pass call _copy_into for each; keep using the existing symbols cmd_finalize, evolve_dst, claude_dst, _copy_into, and args.force to locate and implement the checks.
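The check-then-copy ordering described in this comment can be sketched on its own. This is an illustration under assumptions, not the script's code: `install` is a hypothetical wrapper, and `_check_dest` / `_copy_into` are simplified stand-ins whose signatures may differ from the real helpers.

```python
import shutil
from pathlib import Path


def _check_dest(dst: Path, force: bool) -> None:
    """Existence guard only: performs no writes."""
    if dst.exists() and not force:
        raise SystemExit(f"{dst} already exists (use --force to overwrite)")


def _copy_into(src: Path, dst: Path) -> None:
    """Actual write: replace dst with a copy of src."""
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


def install(src: Path, targets: list[Path], force: bool = False) -> None:
    """Validate every target first, so a failure leaves nothing half-written."""
    for dst in targets:
        _check_dest(dst, force)
    # Only reached when every target passed the guard.
    for dst in targets:
        _copy_into(src, dst)
```

Splitting the guard from the write means the first loop either rejects the whole install or approves it, so a blocked second target can no longer strand a freshly written first target.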
66-83: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Frontmatter parse errors escape as tracebacks instead of clean CLI errors.
`_validate_draft` reports problems via `SystemExit` (friendly one-line messages), but `_parse_frontmatter` raises `ValueError` (Lines 53, 60), which isn't caught here: a malformed SKILL.md surfaces as a full traceback rather than a clean validation message.
🛠️ Proposed fix
```diff
-    fm, body = _parse_frontmatter(skill_md)
+    try:
+        fm, body = _parse_frontmatter(skill_md)
+    except ValueError as exc:
+        raise SystemExit(str(exc)) from exc
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py` around lines 66 - 83, _validate_draft currently lets parsing errors from _parse_frontmatter bubble up as ValueError tracebacks; wrap the call to _parse_frontmatter(skill_md) in a try/except catching ValueError (and optionally other parsing-related exceptions), and re-raise a SystemExit with a clean, user-facing message (e.g., "SKILL.md parse error: <error>") so malformed SKILL.md files produce the same friendly CLI errors; keep the rest of the existing checks (fm/body usage and messages) unchanged.
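To make the error path concrete, here is a small self-contained sketch of converting a parser's `ValueError` into a one-line `SystemExit`. The `_parse_frontmatter` below is a hypothetical stand-in written for this example (a naive `key: value` parser); the real parser in `synthesize.py` is not shown in this thread and may behave differently. The required `name` and `description` keys follow the frontmatter template quoted in the review.

```python
import re

# Hypothetical stand-in parser: a leading block delimited by '---' lines.
_FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def _parse_frontmatter(text):
    """Return (fields, body); raise ValueError on malformed input."""
    match = _FRONTMATTER.match(text)
    if match is None:
        raise ValueError("missing '---' frontmatter block")
    fields = {}
    for line in match.group(1).splitlines():
        if ":" not in line:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields, text[match.end() :]


def validate_draft(text):
    """Surface parse failures as clean CLI errors rather than tracebacks."""
    try:
        fm, body = _parse_frontmatter(text)
    except ValueError as exc:
        raise SystemExit(f"SKILL.md parse error: {exc}") from exc
    for required in ("name", "description"):
        if not fm.get(required):
            raise SystemExit(f"SKILL.md frontmatter is missing '{required}'")
    return fm, body
```

Raising `SystemExit` with a string makes the interpreter print that message to stderr and exit with status 1, which is the friendly behavior the comment asks for.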
🧹 Nitpick comments (4)
experiments/skill_from_trajectory.py (1)
237-237: 💤 Low value
Remove unused variable `last`.
This variable is assigned but never used. The `noqa: F841` suppresses the linter warning but the dead code should be removed.
♻️ Proposed fix
```diff
     if seed_runs:
-        last = seed_runs[-1]["usage"]  # noqa: F841
         for k in ("input_tokens", "output_tokens", "cache_creation_input_tokens", "cache_read_input_tokens", "total_tokens"):
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@experiments/skill_from_trajectory.py` at line 237, The variable last assigned from seed_runs[-1]["usage"] in experiments/skill_from_trajectory.py is unused; remove the assignment line (and the trailing noqa comment) so the dead code is eliminated, i.e., delete the statement that assigns to last and rely on seed_runs directly where needed.
platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md (2)
106-111: ⚡ Quick win
Add language specifier to code block.
The fenced code block showing the file tree structure is missing a language specifier. Consider adding `text` or another appropriate identifier.
📝 Proposed fix
````diff
-```
+```text
 .evolve/skills/<name>/
 ├── SKILL.md
````
As per coding guidelines, based on static analysis hint from markdownlint-cli2.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md` around lines 106 - 111, The fenced code block in SKILL.md that shows the file tree (the triple-backtick block containing ".evolve/skills/<name>/ ... scripts/<action>.py") lacks a language specifier; update that block to use a language tag such as "text" (i.e., change the opening ``` to ```text) so the file-tree is correctly treated as plain text by markdown linters/renderers.
75-92: ⚡ Quick win
Add language specifier to code block.
The fenced code block showing the SKILL.md frontmatter template is missing a language specifier. Adding one improves syntax highlighting and satisfies linters.
📝 Proposed fix
````diff
-```
+```markdown
 ---
 name: <kebab-case-name>
 description: <one-line task description>
````
As per coding guidelines, based on static analysis hint from markdownlint-cli2.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md` around lines 75 - 92, Update the fenced code block that contains the SKILL.md frontmatter template in SKILL.md to include a language specifier (use "markdown") on the opening triple backticks so the frontmatter block is ```markdown ... ```; modify the SKILL.md frontmatter code block (the template under the "Workflow" sample) to add the language tag to satisfy markdownlint and enable syntax highlighting.
plugin-source/_claude/skills/evolve-lite/synthesize-skill/scripts/synthesize.py (1)
94-99: ⚡ Quick win
Consider defensive handling for the edge case where destination exists as a file.
If `dst` exists as a file rather than a directory, `shutil.rmtree(dst)` will raise an unclear exception. While unlikely in normal operation, adding a check improves error clarity.
🛡️ Proposed defensive fix
```diff
 def _copy_into(src: Path, dst: Path, force: bool) -> None:
     if dst.exists():
         if not force:
             raise SystemExit(f"{dst} already exists (use --force to overwrite)")
-        shutil.rmtree(dst)
+        if dst.is_file():
+            dst.unlink()
+        else:
+            shutil.rmtree(dst)
     shutil.copytree(src, dst)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugin-source/_claude/skills/evolve-lite/synthesize-skill/scripts/synthesize.py` around lines 94 - 99, The _copy_into function currently assumes dst is a directory and calls shutil.rmtree(dst) when force is true, which will raise a confusing error if dst is a file; update _copy_into to check whether dst.exists() and then whether dst.is_dir() or dst.is_file() and handle each: if dst.is_file() remove it with dst.unlink() (or raise a clear SystemExit if you prefer not to delete files), if dst.is_dir() use shutil.rmtree(dst), and only then call shutil.copytree(src, dst); keep the existing SystemExit when not forcing and preserve function signature and behavior otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugin-source/_claude/skills/evolve-lite/synthesize-skill/SKILL.md`:
- Around line 106-114: The fenced code block in SKILL.md showing the directory
tree lacks a language identifier; update the opening fence from ``` to ```text
so the block is marked as plain text (locate the directory-tree fenced block in
SKILL.md inside the .evolve/skills/<name>/ example and change its opening fence
to ```text).
- Around line 75-92: The fenced code block containing the YAML frontmatter (the
lines starting with `---`, `name:`, and `description:`) lacks a language
identifier; update the opening fence to ```yaml so the frontmatter is marked as
YAML for proper syntax highlighting—locate the block in SKILL.md under the
example snippet and change the opening triple backticks to include `yaml`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b8aaed47-cc3a-4339-a672-fc205e4b4349
📒 Files selected for processing (5)
- experiments/skill_from_trajectory.py
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
- plugin-source/_claude/skills/evolve-lite/synthesize-skill/SKILL.md
- plugin-source/_claude/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
A new evolve-lite skill that converts a saved trajectory into a reusable agent skill (SKILL.md + supporting scripts). Models the SKILL.md shape on the existing learn skill: judgment lives in a forked subagent (read trajectory, identify the successful workflow, draft a SKILL.md and scripts), file-system plumbing lives in scripts/synthesize.py (frontmatter validation, dual writes, audit-log entry).
Lives at plugin-source/skills/evolve-lite/synthesize-skill/ and is universal: it ships to all four platforms (claude, codex, claw-code, bob). The SKILL.md template uses the shared invoke()/skill_ref() macros for platform-aware shell paths and slash-prefixes; the script is templated to set _RUNTIME_MIRROR_DIR per platform.
On claude, the script writes both to .evolve/skills/<name>/ (canonical) and .claude/skills/<name>/ (so the Claude Code skill loader picks it up automatically). Other platforms write only to .evolve/skills/<name>/ for now; adopting an automatic runtime mirror on those hosts is a follow-up. The skill is invoked manually for now; it is not wired into a Stop hook.
Three-way comparison driver (no_recall / guidelines / skill) that runs the seed → synthesize → measure flow per trial. Reuses helpers from experiments/token_savings.py. Captures token usage from --output-format json plus per-turn usage and tool-call summaries from saved transcripts. Supports --seed-utterances to test multi-utterance seeding (e.g. gps + focal_length, then measure on lens) for skill generalization.
Also resolves /tmp -> /private/tmp on macOS (Docker bind-mount of /tmp subdirs doesn't follow the symlink, breaking the prior plumbing for hidden subdirs).
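The `/tmp` fix amounts to resolving symlinks in the host path before handing it to Docker. A minimal sketch, assuming POSIX symlinks; `bind_mount_source`, `docker_volume_arg`, and the `/workspace` container path are hypothetical names chosen for this example and do not come from the runner.

```python
from pathlib import Path


def bind_mount_source(host_dir) -> Path:
    """Resolve symlinks so the bind mount uses the real host path.

    On macOS, /tmp is a symlink to /private/tmp, so a path under /tmp
    resolves to the corresponding path under /private/tmp.
    """
    return Path(host_dir).resolve()


def docker_volume_arg(host_dir, container_dir: str) -> str:
    """Build the HOST:CONTAINER value for `docker run -v`."""
    return f"{bind_mount_source(host_dir)}:{container_dir}"
```

A caller would pass the result as `["docker", "run", "-v", docker_volume_arg(workspace, "/workspace"), ...]`; on Linux, where `/tmp` is normally a real directory, the resolve step leaves the path unchanged.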
85a35a0 to 5191fd7
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugin-source/skills/evolve-lite/synthesize-skill/scripts/synthesize.py.j2`:
- Around line 114-142: Compute both destination paths (evolve_dst and
runtime_dst) immediately after resolving workspace in cmd_finalize, then
validate that neither target would cause a partial install before performing any
copies: if evolve_dst.exists() or (runtime_dst is not None and
runtime_dst.exists()) and args.force is false, abort (same exit behavior as
_copy_into) so nothing is copied; only after those pre-checks call
_copy_into(evolve_dst, ...) and then _copy_into(runtime_dst, ...) as before.
Reference cmd_finalize, evolve_dst, runtime_dst, _RUNTIME_MIRROR_DIR, args.force
and reuse the same failure/exit pattern as _copy_into to keep behavior
consistent.
In `@plugin-source/skills/evolve-lite/synthesize-skill/SKILL.md.j2`:
- Line 78: The template SKILL.md.j2 uses plain fenced code blocks which trigger
MD040; update the two example fences in the template (the frontmatter example
and the directory tree example referenced around the fences at the current diff)
to include language identifiers—change the frontmatter fence from ``` to ```yaml
and change the directory-tree fence from ``` to ```text so generated SKILL.md
files include language tags and silence markdownlint warnings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3c37c576-3dd2-4830-b79f-948ef6fe36b0
📒 Files selected for processing (12)
- experiments/skill_from_trajectory.py
- platform-integrations/bob/evolve-lite/commands/evolve-lite-synthesize-skill.md
- platform-integrations/bob/evolve-lite/skills/evolve-lite-synthesize-skill/SKILL.md
- platform-integrations/bob/evolve-lite/skills/evolve-lite-synthesize-skill/scripts/synthesize.py
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
- platform-integrations/claw-code/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
- platform-integrations/claw-code/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
- platform-integrations/codex/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
- platform-integrations/codex/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
- plugin-source/skills/evolve-lite/synthesize-skill/SKILL.md.j2
- plugin-source/skills/evolve-lite/synthesize-skill/scripts/synthesize.py.j2
✅ Files skipped from review due to trivial changes (1)
- platform-integrations/bob/evolve-lite/commands/evolve-lite-synthesize-skill.md
🚧 Files skipped from review as they are similar to previous changes (2)
- platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
- experiments/skill_from_trajectory.py
Ruff format wants `text[match.end() :]` (slice spacing) instead of `text[match.end():]`. Apply to the .j2 source plus all four rendered outputs so re-running the renderer stays consistent. Fixes failing CI check: check-formatting (3.12)
The token_savings import goes through sys.path.insert so mypy can't resolve the module. Add `# type: ignore[import-not-found]` and explicitly annotate the wrapper return so the no-any-return error is also resolved. Fixes failing CI check: check-typing (3.12)
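The typing fix described in this commit follows a common pattern, sketched below under assumptions. `hypothetical_token_helpers` and `count_tokens` are invented for the example (the real module is `token_savings`, whose API is not shown here), and the helper module is written to a temporary directory only so the snippet runs on its own.

```python
import sys
import tempfile
from pathlib import Path
from typing import Any

# Stand-in for a helper module that is only importable via a sys.path edit.
_helpers_dir = Path(tempfile.mkdtemp())
(_helpers_dir / "hypothetical_token_helpers.py").write_text(
    "def count_tokens(usage):\n"
    "    return usage['input_tokens'] + usage['output_tokens']\n"
)
sys.path.insert(0, str(_helpers_dir))

# mypy cannot follow the sys.path edit, so the import is ignored explicitly.
import hypothetical_token_helpers  # type: ignore[import-not-found]  # noqa: E402


def total_tokens(usage: dict[str, Any]) -> int:
    """Typed wrapper: the annotated local avoids mypy's no-any-return error."""
    result: int = hypothetical_token_helpers.count_tokens(usage)
    return result
```

Because the ignored module is typed as `Any`, returning its result directly from a function declared `-> int` trips `no-any-return` under strict settings; binding it to an annotated local first states the intended type at the boundary.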
Markdownlint MD040 wants every fenced code block to declare a language. The frontmatter example fence becomes ```yaml; the directory-tree fence becomes ```text. Edit the .j2 source so re-rendering propagates to all four platforms.
Addresses CodeRabbit review findings:
- "Add language identifier to fenced code block" (frontmatter)
- "Add language identifier to fenced code block" (directory tree)
- "Add language identifiers to fenced code blocks (root cause for deployed copies)"
…talls
Two robustness fixes in cmd_finalize:
1. Wrap the _parse_frontmatter call in a try/except ValueError so malformed SKILL.md files exit cleanly (SystemExit with the parser's message) instead of bubbling up a traceback.
2. Pre-check both destinations (.evolve/skills/<name>/ and the platform-specific runtime mirror, when set) before performing any copy. Previously, if the runtime-mirror destination already existed and --force was off, evolve_dst would already have been written, leaving a partial install on disk.
Refactor: extract _check_dest() (existence guard) from _copy_into() (actual write), and call _check_dest on both targets before either _copy_into.
Also collapse the platform-specific _RUNTIME_MIRROR_DIR declaration to a single line so the rendered output matches ruff format directly (no post-render reformatting cycle).
Addresses CodeRabbit review findings:
- "Frontmatter parse errors escape as tracebacks instead of clean CLI errors"
- "Pre-check both destinations before copying to avoid a partial install"
- "Partial install on runtime-mirror failure (claude variant)"
PR AgentToolkit#258 moved each platform's shared lib from `lib/` to `lib/evolve-lite/`. My _lib walker was looking for `lib/audit.py` and `evolve-lib/audit.py` and would fail to find the helpers post-merge. Update the walker to match the simplified pattern used by save_entities and other recall scripts: a single `lib/evolve-lite/` candidate.
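A walker with a single `lib/evolve-lite/` candidate might look like the sketch below. It is an assumption-laden illustration: `find_lib_dir` is a hypothetical name, and the real `_lib` walker in the scripts may differ in how it searches and how it reports failure.

```python
import sys
from pathlib import Path


def find_lib_dir(start: Path) -> Path:
    """Walk up from start and return the first lib/evolve-lite directory found."""
    for parent in start.resolve().parents:
        candidate = parent / "lib" / "evolve-lite"
        if candidate.is_dir():
            return candidate
    raise ImportError(f"could not find lib/evolve-lite above {start}")


def add_lib_to_path(script_file: str) -> Path:
    """Make the shared helpers importable from a script's location."""
    lib_dir = find_lib_dir(Path(script_file))
    if str(lib_dir) not in sys.path:
        sys.path.insert(0, str(lib_dir))
    return lib_dir
```

A script would typically call `add_lib_to_path(__file__)` before importing the shared helpers; checking one directory name per ancestor keeps the lookup predictable after the `lib/` to `lib/evolve-lite/` move.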
Summary
Adds a new evolve-lite skill that converts a saved trajectory into a reusable Claude Code skill (SKILL.md + supporting scripts), plus a standalone experiment driver that compares this procedural memory path against the existing declarative recall (guidelines) path and a no-memory baseline.
This is the runner + skill only. Result artifacts from my own runs against the EXIF demo are kept on `procedural`; headline numbers and the writeup are on issue #260.
What's in here
`platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/`
A new skill that mirrors the shape of `learn`:
- `SKILL.md`: instructions the forked subagent follows: read the trajectory, identify the successful workflow, draft a SKILL.md and any scripts.
- `scripts/synthesize.py finalize`: file-system plumbing: validates frontmatter, dual-writes to both `.evolve/skills/<name>/` and `.claude/skills/<name>/`, appends a `synthesize_skill` event to `.evolve/audit.log`.
The split mirrors `learn`'s pattern: judgment in the subagent, plumbing in a Python script. The skill is invoked manually for now and is not wired into a Stop hook.
`experiments/skill_from_trajectory.py`
Three-way comparison driver (no_recall / guidelines / skill). Per trial:
- … `gps`). Each fires the existing `learn` Stop hook.
- … `/evolve-lite:synthesize-skill` on the seed trajectory.
Supports `--seed-utterances` for multi-utterance seeding to test skill generalization (e.g. `--seed-utterances gps focal_length` then measure on `lens`). Reuses helpers from `experiments/token_savings.py`.
Also includes a small fix for macOS Docker bind-mount behavior: resolves `/tmp` → `/private/tmp` so subdirectories of `/tmp` are visible inside the sandbox container (the prior token-savings experiment got away with this because pytest's `tmp_path` is under `/private/var/...`).
Test plan
- `python3 experiments/skill_from_trajectory.py --help` lists `--trials`, `--seed-utterances`, `--utterances`, `--keep-workspaces`.
- `python3 platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py finalize --help` works.
- `python3 experiments/skill_from_trajectory.py --trials 1 --utterances focal_length` end-to-end (requires Docker, the `claude-sandbox` image, and Anthropic API credentials). Should produce `experiments/results/skill_from_trajectory_<ts>/{report.md, raw.json, synthesized_skills/}`.