
feat(claude-plugin): add synthesize-skill skill + experiment runner #262

Open
vinodmut wants to merge 9 commits into AgentToolkit:main from vinodmut:synthesize-skill-experiment

Conversation


@vinodmut vinodmut commented Jun 1, 2026

Summary

Adds a new evolve-lite skill that converts a saved trajectory into a reusable Claude Code skill (SKILL.md + supporting scripts), plus a standalone experiment driver that compares this procedural memory path against the existing declarative recall (guidelines) path and a no-memory baseline.

This is the runner + skill only. Result artifacts from my own runs against the EXIF demo are kept on procedural; headline numbers and the writeup are on issue #260.

What's in here

platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/

A new skill that mirrors the shape of learn:

  • SKILL.md — instructions the forked subagent follows: read the trajectory, identify the successful workflow, draft a SKILL.md and any scripts.
  • scripts/synthesize.py finalize — file-system plumbing: validates frontmatter, dual-writes to both .evolve/skills/<name>/ and .claude/skills/<name>/, appends a synthesize_skill event to .evolve/audit.log.

The split mirrors learn's pattern: judgment in the subagent, plumbing in a Python script. The skill is invoked manually for now — not wired into a Stop hook.
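To make the "plumbing" half concrete, here is a minimal sketch of the kind of check the finalize step performs before anything is written. It is not the code in scripts/synthesize.py: the function name `validate_skill_metadata`, the exact kebab-case rule, and the assumption that `name` and `description` are the required keys are all illustrative.

```python
import re

# Assumed rule: lowercase alphanumeric words joined by single hyphens.
_KEBAB_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

# Assumed required frontmatter keys (taken from the SKILL.md template fields).
_REQUIRED_KEYS = ("name", "description")


def validate_skill_metadata(name: str, frontmatter: dict[str, str]) -> None:
    """Exit with a one-line message if the name or frontmatter is invalid."""
    if not _KEBAB_RE.fullmatch(name):
        raise SystemExit(f"invalid skill name {name!r}: expected kebab-case")
    # A key counts as missing when it is absent or blank.
    missing = [k for k in _REQUIRED_KEYS if not frontmatter.get(k, "").strip()]
    if missing:
        raise SystemExit(f"SKILL.md frontmatter is missing: {', '.join(missing)}")
```

Raising SystemExit keeps failures as single-line CLI messages, which is the behavior the review comments below ask for.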

experiments/skill_from_trajectory.py

Three-way comparison driver (no_recall / guidelines / skill). Per trial:

  1. Seed the workspace by running one or more utterances (default: gps). Each fires the existing learn Stop hook.
  2. Invoke /evolve-lite:synthesize-skill on the seed trajectory.
  3. Branch into three condition workspaces and run measure utterance(s) in each.
  4. Capture token usage, per-turn breakdown, and tool-call summaries.

Supports --seed-utterances for multi-utterance seeding to test skill generalization (e.g. --seed-utterances gps focal_length then measure on lens). Reuses helpers from experiments/token_savings.py.
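As a rough illustration of the comparison the driver produces, the sketch below averages total tokens per condition. The record shape (a `condition` key next to a `usage` dict) and the function name `mean_tokens_by_condition` are assumptions for illustration; the actual raw.json layout may differ.

```python
from collections import defaultdict
from statistics import mean

CONDITIONS = ("no_recall", "guidelines", "skill")


def mean_tokens_by_condition(runs: list[dict]) -> dict[str, float]:
    """Average usage["total_tokens"] over all runs recorded for each condition."""
    totals: dict[str, list[int]] = defaultdict(list)
    for run in runs:
        totals[run["condition"]].append(run["usage"]["total_tokens"])
    # Keep the fixed condition order and skip conditions that have no runs.
    return {c: mean(totals[c]) for c in CONDITIONS if totals[c]}
```

Conditions with no recorded runs are left out rather than reported as zero, so a failed branch does not look like a token saving.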

Also includes a small fix for macOS Docker bind-mount behavior: resolves /tmp -> /private/tmp so subdirectories of /tmp are visible inside the sandbox container (the prior token-savings experiment avoided the problem because pytest's tmp_path is under /private/var/...).
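A minimal sketch of the symlink fix, assuming the host path is resolved before it is handed to docker run. The helper names and the `/workspace` container path are hypothetical and not taken from the runner.

```python
from pathlib import Path
from typing import Union


def bind_mount_source(path: Union[str, Path]) -> str:
    """Return the symlink-free absolute path to use as a Docker -v source."""
    # On macOS, /tmp is a symlink to /private/tmp; resolve() follows it.
    return str(Path(path).resolve())


def docker_volume_args(host_dir: Union[str, Path], container_dir: str = "/workspace") -> list[str]:
    """Build the -v arguments for mounting host_dir at container_dir (hypothetical path)."""
    return ["-v", f"{bind_mount_source(host_dir)}:{container_dir}"]
```

Resolving on the host side leaves the container layout unchanged and is a no-op on systems where the path contains no symlinks.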

Test plan

  • python3 experiments/skill_from_trajectory.py --help lists --trials, --seed-utterances, --utterances, --keep-workspaces.
  • python3 platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py finalize --help works.
  • Smoke run: python3 experiments/skill_from_trajectory.py --trials 1 --utterances focal_length end-to-end (requires Docker, the claude-sandbox image, and Anthropic API credentials). Should produce experiments/results/skill_from_trajectory_<ts>/{report.md, raw.json, synthesized_skills/}.

Related

Summary by CodeRabbit

  • New Features

    • Convert saved trajectories into reusable, validated agent skills via a finalize/install CLI (with overwrite option) and mirror installs for runtime platforms.
    • New benchmark experiment that runs seeded trials across three recall conditions, records per-run token/cost/duration/turn metrics, and collects synthesized-skill artifacts.
  • Documentation

    • Added user-facing guides, templates, and best-practice rules for the synthesize workflow and validation.

coderabbitai Bot commented Jun 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds synthesize-skill templates and platform finalize CLIs plus SKILL.md docs, and introduces experiments/skill_from_trajectory.py which seeds trajectories, invokes synthesize-skill, and benchmarks three recall conditions with progressive raw results and a generated report.md.

Changes

Skill Synthesis Workflow and Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Plugin templates | `plugin-source/skills/evolve-lite/synthesize-skill/SKILL.md.j2`, `plugin-source/skills/evolve-lite/synthesize-skill/scripts/synthesize.py.j2` | Jinja2 templates that produce platform-specific SKILL.md and synthesize.py finalize helper including conditional runtime mirroring and optional Claude fork context. |
| Platform SKILL.md documentation & command guidance | `platform-integrations/*/skills/evolve-lite/synthesize-skill/SKILL.md`, `platform-integrations/bob/evolve-lite/commands/evolve-lite-synthesize-skill.md` | Deployed SKILL.md and command guidance describing trajectory sourcing, extracting the final successful tool sequence, drafting required SKILL.md frontmatter, optional scripts, finalize invocation, collision handling, and best-practice rules. |
| Platform synthesize.py implementations | `platform-integrations/*/skills/evolve-lite/synthesize-skill/scripts/synthesize.py` | Finalize CLI that locates plugin audit helper, parses minimal frontmatter, validates kebab-case --name and required keys, copies draft into .evolve/skills/<name> (and optional runtime mirror), appends an audit-log entry, prints installed paths, and supports --force. |

Skill Synthesis Benchmarking Experiment

| Layer | File(s) | Summary |
| --- | --- | --- |
| Experiment framework | `experiments/skill_from_trajectory.py` (imports, runner, utilities) | Adds CLI and constants, Docker/macOS-backed sandbox runner executing claude with JSON fallback parsing, workspace creation/copy helpers, transcript parsing for tool-call summaries, and usage extraction including total_cost_usd. |
| Experiment benchmarking and reporting | `experiments/skill_from_trajectory.py` (trial loop, branching, measures, reporting) | Implements seed→synthesize step, branches per-condition workspaces (no_recall, guidelines, skill), runs measures capturing headline usage, per-turn metrics and tool-call summaries, persists progressive raw.json, copies synthesized skills into results, writes report.md with synthesis-cost and cross-condition comparison, and provides CLI orchestration (exit code 1 if any run errors). |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #219 — The changes implement synthesize-skill templates, per-platform finalize scripts, and experiment benchmarking described by this issue.

Suggested reviewers

  • visahak
  • illeatmyhat

Poem

🐰 From saved steps I stitched a clever art,

I found the end and gave it a start.
Three workspaces ran, the reports took flight,
Templates and skills tucked in just right.
Hooray — a rabbit hop for code tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 34.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main additions: a new synthesize-skill feature for the Claude plugin and an experiment runner script comparing skill-based vs. guideline-based vs. baseline recall. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
experiments/skill_from_trajectory.py (1)

117-131: ⚡ Quick win

Unhandled subprocess.TimeoutExpired crashes the whole experiment.

subprocess.run(..., timeout=SESSION_TIMEOUT_SECONDS) raises TimeoutExpired if a docker run hangs. That exception isn't caught here or by callers (_seed_and_synthesize, _do_measure_run), so one stuck run aborts all remaining trials and the in-flight run's result isn't persisted. Catching it and returning a sentinel proc/None lets the existing returncode != 0 error paths record the failure and continue.

♻️ Proposed handling
-    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=SESSION_TIMEOUT_SECONDS)
+    try:
+        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=SESSION_TIMEOUT_SECONDS)
+    except subprocess.TimeoutExpired as exc:
+        proc = subprocess.CompletedProcess(
+            cmd, returncode=124, stdout=exc.stdout or "", stderr=(exc.stderr or "") + "\n[timeout]"
+        )
+        return proc, None
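A self-contained variant of this handling, offered as a sketch and not as the runner's code. It adds one assumption beyond the diff: on some Python versions and platforms the captured output attached to TimeoutExpired can be bytes even when text=True, so it is normalized before concatenation. The names `run_with_timeout` and `_as_text` are hypothetical.

```python
import subprocess
import sys

TIMEOUT_RETURNCODE = 124  # same value as the diff above; also GNU timeout's exit code


def _as_text(data) -> str:
    """Normalize captured output that may be None, bytes, or str."""
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_with_timeout(cmd: list[str], timeout: float) -> subprocess.CompletedProcess:
    """Run cmd; on timeout return a failed CompletedProcess instead of raising."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        return subprocess.CompletedProcess(
            cmd,
            returncode=TIMEOUT_RETURNCODE,
            stdout=_as_text(exc.stdout),
            stderr=_as_text(exc.stderr) + "\n[timeout]",
        )
```

Because the result is an ordinary CompletedProcess with a non-zero return code, callers that already branch on returncode need no extra timeout path.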
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@experiments/skill_from_trajectory.py` around lines 117 - 131, Wrap the
subprocess.run call in a try/except that catches subprocess.TimeoutExpired and,
on timeout, construct and return a sentinel result (e.g. a CompletedProcess-like
object or None for proc) with parsed left as None so callers
(_seed_and_synthesize, _do_measure_run) follow the existing error paths;
specifically update the block around subprocess.run(cmd, capture_output=True,
text=True, timeout=SESSION_TIMEOUT_SECONDS) to catch TimeoutExpired and return
(proc_timeout_sentinel, None) so the timeout is treated like a non-zero return
and does not crash the experiment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py`:
- Around line 102-113: cmd_finalize currently calls _copy_into sequentially
which can leave evolve_dst written if claude_dst already exists and --force is
false; pre-validate both destinations before performing any copy by checking
existence of evolve_dst and claude_dst and their compatibility with args.force
(or reuse existing _copy_into validation logic) and raise/exit early if either
target would block the install so no partial install occurs; update cmd_finalize
to perform these existence checks (or a helper like _validate_targets) prior to
calling _copy_into for either destination.
- Around line 66-83: _validate_draft currently calls _parse_frontmatter which
can raise ValueError; those parse errors bubble up as tracebacks instead of the
intended friendly SystemExit messages, so wrap the call to _parse_frontmatter in
a try/except ValueError block inside _validate_draft, catch the ValueError from
_parse_frontmatter and raise SystemExit with a clear message (include the
original error text) so malformed SKILL.md files produce clean CLI errors rather
than tracebacks.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c784d3d7-e422-4423-8097-0589334ccbd5

📥 Commits

Reviewing files that changed from the base of the PR and between c57148b and 7a3d87e.

📒 Files selected for processing (3)
  • experiments/skill_from_trajectory.py
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py

@vinodmut vinodmut force-pushed the synthesize-skill-experiment branch from 7a3d87e to 85a35a0 Compare June 1, 2026 15:48

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py (2)

102-113: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pre-check both destinations before copying to avoid a partial install.

_copy_into runs sequentially: evolve_dst is written first, then claude_dst. If claude_dst already exists without --force, the SystemExit fires after evolve_dst was already created. The install is now half-done, and a retry (still without --force) fails immediately because evolve_dst now exists — leaving the user stuck. Validate both targets up front.

🛡️ Proposed fix
     evolve_dst = workspace / ".evolve" / "skills" / name
     claude_dst = workspace / ".claude" / "skills" / name
 
+    for dst in (evolve_dst, claude_dst):
+        if dst.exists() and not args.force:
+            raise SystemExit(f"{dst} already exists (use --force to overwrite)")
+
     _copy_into(src, evolve_dst, args.force)
     _copy_into(src, claude_dst, args.force)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py`
around lines 102 - 113, In cmd_finalize, pre-check both destination paths
(evolve_dst and claude_dst) before calling _copy_into so you can't create one
and then fail on the other: compute evolve_dst and claude_dst as you already do,
then if not args.force validate each target (if target.exists() -> raise
SystemExit or similar error) and only after both checks pass call _copy_into for
each; keep using the existing symbols cmd_finalize, evolve_dst, claude_dst,
_copy_into, and args.force to locate and implement the checks.

66-83: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Frontmatter parse errors escape as tracebacks instead of clean CLI errors.

_validate_draft reports problems via SystemExit (friendly one-line messages), but _parse_frontmatter raises ValueError (Lines 53, 60) which isn't caught here — a malformed SKILL.md surfaces as a full traceback rather than a clean validation message.

🛠️ Proposed fix
-    fm, body = _parse_frontmatter(skill_md)
+    try:
+        fm, body = _parse_frontmatter(skill_md)
+    except ValueError as exc:
+        raise SystemExit(str(exc)) from exc
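For context, the pattern can be shown end to end with a deliberately small parser. This is a sketch under assumptions: `parse_frontmatter` and `load_draft` are hypothetical stand-ins for `_parse_frontmatter` and `_validate_draft`, and the real parser may accept a different frontmatter subset.

```python
import re

# Frontmatter is assumed to be a leading block delimited by '---' lines.
_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n?", re.DOTALL)


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter dict, body); raise ValueError if malformed."""
    match = _FRONTMATTER_RE.match(text)
    if match is None:
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    fields: dict[str, str] = {}
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields, text[match.end() :]


def load_draft(text: str) -> tuple[dict[str, str], str]:
    """CLI-facing wrapper: report parse failures as a one-line exit message."""
    try:
        return parse_frontmatter(text)
    except ValueError as exc:
        raise SystemExit(f"SKILL.md parse error: {exc}") from exc
```

Keeping ValueError inside the parser and converting it at the CLI boundary leaves the parser reusable from code that wants to handle the error itself.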
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py`
around lines 66 - 83, _validate_draft currently lets parsing errors from
_parse_frontmatter bubble up as ValueError tracebacks; wrap the call to
_parse_frontmatter(skill_md) in a try/except catching ValueError (and optionally
other parsing-related exceptions), and re-raise a SystemExit with a clean,
user-facing message (e.g., "SKILL.md parse error: <error>") so malformed
SKILL.md files produce the same friendly CLI errors; keep the rest of the
existing checks (fm/body usage and messages) unchanged.
🧹 Nitpick comments (4)
experiments/skill_from_trajectory.py (1)

237-237: 💤 Low value

Remove unused variable last.

This variable is assigned but never used. The noqa: F841 suppresses the linter warning but the dead code should be removed.

♻️ Proposed fix
     if seed_runs:
-        last = seed_runs[-1]["usage"]  # noqa: F841
         for k in ("input_tokens", "output_tokens", "cache_creation_input_tokens", "cache_read_input_tokens", "total_tokens"):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@experiments/skill_from_trajectory.py` at line 237, The variable last assigned
from seed_runs[-1]["usage"] in experiments/skill_from_trajectory.py is unused;
remove the assignment line (and the trailing noqa comment) so the dead code is
eliminated—i.e., delete the statement that assigns to last and rely on seed_runs
directly where needed.
platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md (2)

106-111: ⚡ Quick win

Add language specifier to code block.

The fenced code block showing the file tree structure is missing a language specifier. Consider adding text or another appropriate identifier.

📝 Proposed fix
-```
+```text
 .evolve/skills/<name>/
 ├── SKILL.md

As per coding guidelines, based on static analysis hint from markdownlint-cli2.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md`
around lines 106 - 111, The fenced code block in SKILL.md that shows the file
tree (the triple-backtick block containing ".evolve/skills/<name>/ ...
scripts/<action>.py") lacks a language specifier; update that block to use a
language tag such as "text" (i.e., change the opening ``` to ```text) so the
file-tree is correctly treated as plain text by markdown linters/renderers.

75-92: ⚡ Quick win

Add language specifier to code block.

The fenced code block showing the SKILL.md frontmatter template is missing a language specifier. Adding one improves syntax highlighting and satisfies linters.

📝 Proposed fix
-```
+```markdown
 ---
 name: <kebab-case-name>
 description: <one-line task description>

As per coding guidelines, based on static analysis hint from markdownlint-cli2.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md`
around lines 75 - 92, Update the fenced code block that contains the SKILL.md
frontmatter template in SKILL.md to include a language specifier (use
"markdown") on the opening triple backticks so the frontmatter block is
```markdown ... ```; modify the SKILL.md frontmatter code block (the template
under the "Workflow" sample) to add the language tag to satisfy markdownlint and
enable syntax highlighting.
plugin-source/_claude/skills/evolve-lite/synthesize-skill/scripts/synthesize.py (1)

94-99: ⚡ Quick win

Consider defensive handling for the edge case where destination exists as a file.

If dst exists as a file rather than a directory, shutil.rmtree(dst) will raise an unclear exception. While unlikely in normal operation, adding a check improves error clarity.

🛡️ Proposed defensive fix
 def _copy_into(src: Path, dst: Path, force: bool) -> None:
     if dst.exists():
         if not force:
             raise SystemExit(f"{dst} already exists (use --force to overwrite)")
-        shutil.rmtree(dst)
+        if dst.is_file():
+            dst.unlink()
+        else:
+            shutil.rmtree(dst)
     shutil.copytree(src, dst)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugin-source/_claude/skills/evolve-lite/synthesize-skill/scripts/synthesize.py`
around lines 94 - 99, The _copy_into function currently assumes dst is a
directory and calls shutil.rmtree(dst) when force is true, which will raise a
confusing error if dst is a file; update _copy_into to check whether
dst.exists() and then whether dst.is_dir() or dst.is_file() and handle each: if
dst.is_file() remove it with dst.unlink() (or raise a clear SystemExit if you
prefer not to delete files), if dst.is_dir() use shutil.rmtree(dst), and only
then call shutil.copytree(src, dst); keep the existing SystemExit when not
forcing and preserve function signature and behavior otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugin-source/_claude/skills/evolve-lite/synthesize-skill/SKILL.md`:
- Around line 106-114: The fenced code block in SKILL.md showing the directory
tree lacks a language identifier; update the opening fence from ``` to ```text
so the block is marked as plain text (locate the directory-tree fenced block in
SKILL.md inside the .evolve/skills/<name>/ example and change its opening fence
to ```text).
- Around line 75-92: The fenced code block containing the YAML frontmatter (the
lines starting with `---`, `name:`, and `description:`) lacks a language
identifier; update the opening fence to ```yaml so the frontmatter is marked as
YAML for proper syntax highlighting—locate the block in SKILL.md under the
example snippet and change the opening triple backticks to include `yaml`.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8aaed47-cc3a-4339-a672-fc205e4b4349

📥 Commits

Reviewing files that changed from the base of the PR and between 7a3d87e and 85a35a0.

📒 Files selected for processing (5)
  • experiments/skill_from_trajectory.py
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
  • plugin-source/_claude/skills/evolve-lite/synthesize-skill/SKILL.md
  • plugin-source/_claude/skills/evolve-lite/synthesize-skill/scripts/synthesize.py

vinodmut added 2 commits June 1, 2026 11:04
A new evolve-lite skill that converts a saved trajectory into a reusable
agent skill (SKILL.md + supporting scripts). Models the SKILL.md shape
on the existing learn skill: judgment lives in a forked subagent (read
trajectory, identify the successful workflow, draft a SKILL.md and
scripts), file-system plumbing lives in scripts/synthesize.py
(frontmatter validation, dual writes, audit-log entry).

Lives at plugin-source/skills/evolve-lite/synthesize-skill/ — universal,
ships to all four platforms (claude, codex, claw-code, bob). The SKILL.md
template uses the shared invoke()/skill_ref() macros for platform-aware
shell paths and slash-prefixes; the script is templated to set
_RUNTIME_MIRROR_DIR per platform. On claude, the script writes both to
.evolve/skills/<name>/ (canonical) and .claude/skills/<name>/ (so the
Claude Code skill loader picks it up automatically). Other platforms
write only to .evolve/skills/<name>/ for now — adopting an automatic
runtime mirror on those hosts is a follow-up.

The skill is invoked manually for now; not wired into a Stop hook.

Three-way comparison driver (no_recall / guidelines / skill) that runs
the seed → synthesize → measure flow per trial. Reuses helpers from
experiments/token_savings.py. Captures token usage from
--output-format json plus per-turn usage and tool-call summaries from
saved transcripts. Supports --seed-utterances to test multi-utterance
seeding (e.g. gps + focal_length, then measure on lens) for skill
generalization.

Also resolves /tmp -> /private/tmp on macOS (Docker bind-mount of /tmp
subdirs doesn't follow the symlink, breaking the prior plumbing for
hidden subdirs).
@vinodmut vinodmut force-pushed the synthesize-skill-experiment branch from 85a35a0 to 5191fd7 Compare June 1, 2026 16:11

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugin-source/skills/evolve-lite/synthesize-skill/scripts/synthesize.py.j2`:
- Around line 114-142: Compute both destination paths (evolve_dst and
runtime_dst) immediately after resolving workspace in cmd_finalize, then
validate that neither target would cause a partial install before performing any
copies: if evolve_dst.exists() or (runtime_dst is not None and
runtime_dst.exists()) and args.force is false, abort (same exit behavior as
_copy_into) so nothing is copied; only after those pre-checks call
_copy_into(evolve_dst, ...) and then _copy_into(runtime_dst, ...) as before.
Reference cmd_finalize, evolve_dst, runtime_dst, _RUNTIME_MIRROR_DIR, args.force
and reuse the same failure/exit pattern as _copy_into to keep behavior
consistent.

In `@plugin-source/skills/evolve-lite/synthesize-skill/SKILL.md.j2`:
- Line 78: The template SKILL.md.j2 uses plain fenced code blocks which trigger
MD040; update the two example fences in the template (the frontmatter example
and the directory tree example referenced around the fences at the current diff)
to include language identifiers—change the frontmatter fence from ``` to ```yaml
and change the directory-tree fence from ``` to ```text so generated SKILL.md
files include language tags and silence markdownlint warnings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c37c576-3dd2-4830-b79f-948ef6fe36b0

📥 Commits

Reviewing files that changed from the base of the PR and between 85a35a0 and 5191fd7.

📒 Files selected for processing (12)
  • experiments/skill_from_trajectory.py
  • platform-integrations/bob/evolve-lite/commands/evolve-lite-synthesize-skill.md
  • platform-integrations/bob/evolve-lite/skills/evolve-lite-synthesize-skill/SKILL.md
  • platform-integrations/bob/evolve-lite/skills/evolve-lite-synthesize-skill/scripts/synthesize.py
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
  • platform-integrations/claw-code/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
  • platform-integrations/claw-code/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
  • platform-integrations/codex/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/SKILL.md
  • platform-integrations/codex/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
  • plugin-source/skills/evolve-lite/synthesize-skill/SKILL.md.j2
  • plugin-source/skills/evolve-lite/synthesize-skill/scripts/synthesize.py.j2
✅ Files skipped from review due to trivial changes (1)
  • platform-integrations/bob/evolve-lite/commands/evolve-lite-synthesize-skill.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • platform-integrations/claude/plugins/evolve-lite/skills/evolve-lite/synthesize-skill/scripts/synthesize.py
  • experiments/skill_from_trajectory.py

Comment thread plugin-source/skills/evolve-lite/synthesize-skill/SKILL.md.j2 Outdated
vinodmut added 7 commits June 1, 2026 11:57
Ruff format wants `text[match.end() :]` (slice spacing) instead of
`text[match.end():]`. Apply to the .j2 source plus all four rendered
outputs so re-running the renderer stays consistent.

Fixes failing CI check: check-formatting (3.12)
Fixes failing CI check: check-formatting (3.12)
The token_savings import goes through sys.path.insert so mypy can't
resolve the module. Add `# type: ignore[import-not-found]` and
explicitly annotate the wrapper return so the no-any-return error is
also resolved.

Fixes failing CI check: check-typing (3.12)
Markdownlint MD040 wants every fenced code block to declare a language.
The frontmatter example fence becomes ```yaml; the directory-tree fence
becomes ```text. Edit the .j2 source so re-rendering propagates to all
four platforms.

Addresses CodeRabbit review findings:
- "Add language identifier to fenced code block" (frontmatter)
- "Add language identifier to fenced code block" (directory tree)
- "Add language identifiers to fenced code blocks (root cause for deployed copies)"
…talls

Two robustness fixes in cmd_finalize:

1. Wrap the _parse_frontmatter call in a try/except ValueError so
   malformed SKILL.md files exit cleanly (SystemExit with the parser's
   message) instead of bubbling up a traceback.

2. Pre-check both destinations (.evolve/skills/<name>/ and the
   platform-specific runtime mirror, when set) before performing any
   copy. Previously, if the runtime-mirror destination already existed
   and --force was off, evolve_dst would already have been written —
   leaving a partial install on disk.

Refactor: extract _check_dest() (existence guard) from _copy_into()
(actual write), and call _check_dest on both targets before either
_copy_into. Also collapse the platform-specific _RUNTIME_MIRROR_DIR
declaration to a single line so the rendered output matches ruff
format directly (no post-render reformatting cycle).

Addresses CodeRabbit review findings:
- "Frontmatter parse errors escape as tracebacks instead of clean CLI errors"
- "Pre-check both destinations before copying to avoid a partial install"
- "Partial install on runtime-mirror failure (claude variant)"
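The check-then-write split described in this commit can be sketched as follows. The helper names `_check_dest` and `_copy_into` come from the commit message, but their bodies here, the `install` wrapper, and its signature are assumptions, not the rendered script.

```python
import shutil
from pathlib import Path


def _check_dest(dst: Path, force: bool) -> None:
    """Existence guard only: reads the filesystem, never writes to it."""
    if dst.exists() and not force:
        raise SystemExit(f"{dst} already exists (use --force to overwrite)")


def _copy_into(src: Path, dst: Path) -> None:
    """Actual write: replace whatever is at dst with a copy of src."""
    if dst.is_dir() and not dst.is_symlink():
        shutil.rmtree(dst)
    elif dst.exists() or dst.is_symlink():
        dst.unlink()  # a stray file or symlink where a directory is expected
    shutil.copytree(src, dst)


def install(src: Path, targets: list[Path], force: bool = False) -> None:
    """Validate every target first, then copy, so a refusal leaves nothing behind."""
    for dst in targets:
        _check_dest(dst, force)
    for dst in targets:
        _copy_into(src, dst)
```

This only prevents partial installs caused by an existing destination. A copy that fails midway (for example on a full disk) can still leave the first target written.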
PR AgentToolkit#258 moved each platform's shared lib from `lib/` to
`lib/evolve-lite/`. My _lib walker was looking for `lib/audit.py` and
`evolve-lib/audit.py` and would fail to find the helpers post-merge.
Update the walker to match the simplified pattern used by save_entities
and other recall scripts: a single `lib/evolve-lite/` candidate.
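A minimal sketch of such a walker, assuming it searches upward from the script's directory for a single `lib/evolve-lite/` candidate that contains `audit.py`. The function name `find_lib_dir` and the error message are hypothetical.

```python
from pathlib import Path


def find_lib_dir(start: Path, marker: str = "audit.py") -> Path:
    """Walk up from start; return the first ancestor's lib/evolve-lite/ holding marker."""
    start = start.resolve()
    for parent in (start, *start.parents):
        candidate = parent / "lib" / "evolve-lite"
        if (candidate / marker).is_file():
            return candidate
    raise SystemExit(f"could not locate lib/evolve-lite/{marker} above {start}")
```

A script would typically call this with `Path(__file__).parent` and prepend the result to `sys.path` before importing the helper.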