515 changes: 515 additions & 0 deletions experiments/skill_from_trajectory.py

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
description: Convert a saved trajectory into a reusable agent skill (SKILL.md + supporting scripts) that future agents can invoke to skip rediscovered work. Use when a session captured a non-trivial workflow worth promoting from a free-text guideline to an executable skill.
---
Use the `evolve-lite-synthesize-skill` skill on the current conversation. Follow the skill's instructions exactly.
@@ -0,0 +1,148 @@
---
name: evolve-lite:synthesize-skill
description: Convert a saved trajectory into a reusable agent skill (SKILL.md + supporting scripts) that future agents can invoke to skip rediscovered work. Use when a session captured a non-trivial workflow worth promoting from a free-text guideline to an executable skill.
---

# Skill Synthesizer

## Overview

This skill reads a saved trajectory and produces a **reusable agent skill** — a `SKILL.md` plus any supporting scripts — that captures the *successful* workflow the session discovered. The output goes to `.evolve/skills/<skill-name>/` (canonical, evolve-managed). Future sessions on the same project can then invoke the skill directly instead of re-deriving the workflow.

This is the **executable** counterpart to the `learn` skill's free-text guidelines: `learn` writes Markdown the next agent has to *read and decide what to do*; `synthesize-skill` writes a skill the next agent can simply *call*.

## When To Use

Use this skill when a trajectory captured:

- A **non-trivial workflow** that succeeded after trial-and-error (the eventual happy path is worth promoting from free-text advice to an invocable artifact).
- A **reusable script or command sequence** the model wrote during the session — particularly one the agent had to reconstruct over multiple attempts.
- An environment-specific workaround (a missing system tool, a permissions wrinkle, a fallback pipeline) that future sessions in the same project will hit.

Skip this skill — and let `learn` cover the case with a guideline alone — when:

- The successful path was a single trivial command.
- The workflow embeds secrets, tokens, or one-off user inputs that can't be safely generalized.
- A skill with the same trigger already exists in `.evolve/skills/` (use `learn`'s guideline path to refine the existing skill instead of creating a duplicate).

## Workflow

### Step 0: Locate the Trajectory

This skill runs in a forked context. **You cannot see the parent conversation directly** — read the trajectory the parent passed in via `args` or via the `Run evolve-lite:synthesize-skill on <path>` instruction.

The trajectory path is either:

- supplied directly as `args` to the skill invocation, or
- stated in the parent's invocation message as `The saved trajectory path is: <path>` — take everything after the colon, strip surrounding whitespace and quotes.

If neither is present, scan `.evolve/trajectories/` for the most recently modified `claude-transcript_<session-id>.jsonl` and use that. If `.evolve/trajectories/` does not exist or is empty, output zero artifacts and exit — do not invent a trajectory.
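The lookup order above can be sketched in Python. This is only an illustration of the precedence rules: the function name `resolve_trajectory` and its parameters are hypothetical, and the skill itself performs these steps with its own tools rather than by running a script.

```python
from __future__ import annotations

import re
from pathlib import Path

# Marker phrase quoted from the invocation message described above.
_PATH_MARKER = re.compile(r"The saved trajectory path is:(.*)")


def resolve_trajectory(args: str | None, message: str, workspace: Path) -> Path | None:
    """Return the trajectory path, or None when there is nothing to read."""
    # 1. An explicit `args` value wins.
    if args and args.strip():
        return Path(args.strip().strip("'\""))
    # 2. Otherwise take everything after the colon in the parent's message,
    #    stripping surrounding whitespace and quotes.
    match = _PATH_MARKER.search(message)
    if match:
        candidate = match.group(1).strip().strip("'\"").strip()
        if candidate:
            return Path(candidate)
    # 3. Fall back to the most recently modified transcript on disk.
    traj_dir = workspace / ".evolve" / "trajectories"
    if not traj_dir.is_dir():
        return None
    transcripts = list(traj_dir.glob("claude-transcript_*.jsonl"))
    if not transcripts:
        return None  # nothing saved: emit zero artifacts, never invent a path
    return max(transcripts, key=lambda p: p.stat().st_mtime)
```

A `None` result corresponds to the "output zero artifacts and exit" branch.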

**Read the trajectory with the `Read` tool — do NOT shell out.** The transcript is JSONL: one JSON object per line. Filter for `"type": "assistant"` and `"type": "human"` records and reconstruct the flow from `message.content`.
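As a rough illustration of that filter, the sketch below parses JSONL text that has already been read and keeps only the conversational records. The helper name `iter_turns` is hypothetical, the record layout is assumed to be exactly what the paragraph above describes (`type` plus `message.content`), and skipping unparseable lines is an assumption rather than documented behavior. The skill still obtains the text through the `Read` tool.

```python
from __future__ import annotations

import json
from typing import Iterator

# Record types named in the step above.
WANTED_TYPES = {"assistant", "human"}


def iter_turns(jsonl_text: str) -> Iterator[tuple[str, object]]:
    """Yield (record type, message.content) for conversational records."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: skip damaged lines rather than abort
        if not isinstance(record, dict) or record.get("type") not in WANTED_TYPES:
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        yield record["type"], content
```

The content value is passed through untouched because it may be a plain string or a list of content blocks, depending on the record.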

### Step 1: Identify the Successful Workflow

Walk the trajectory and locate the **final, working** tool sequence — the one that actually produced the answer. Distinguish it from the trial-and-error leading up to it.

Capture:

- **What the user asked** (the original prompt).
- **What ultimately worked** — the exact tool calls, scripts, or command sequences that produced the answer. Quote them verbatim from the trajectory.
- **What didn't work** — the dead-ends. You will use these to write a `Triggers` section so the future agent knows when to reach for this skill *instead of* the failing approaches.
- **Environment assumptions** — what was missing or had to be installed (e.g. "no exiftool, pip install Pillow needed").

If no clearly successful workflow is in the trajectory (the session ended without reaching an answer, or the answer came from a single trivial call), output zero artifacts and exit.

### Step 2: Decide a Skill Name and Trigger

The skill **name** must be:

- kebab-case, action-oriented (`extract-exif-metadata`, `parse-cloudwatch-logs`, `restart-stuck-deploy`)
- specific enough that a future agent reading just the name can guess what it does
- not a duplicate of any existing entry under `.evolve/skills/`

The skill **description** (one line, in the SKILL.md frontmatter) should describe the *task* the skill solves, not the trajectory it came from. Bad: "Solves the focal-length question from session abc123." Good: "Extract EXIF metadata (focal length, GPS, lens, timestamps) from JPEG/HEIC images using Pillow when system EXIF tools are unavailable."

The **trigger** (in the SKILL.md body, under `## When To Use`) should describe the broad task context, not the narrow original request — same rule as the `learn` skill's guidelines.

Before continuing, list `.evolve/skills/` (use the `Glob` tool, not `find` / `ls`) and confirm your chosen name does not collide with an existing skill.
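The two mechanical checks (kebab-case and no collision) look roughly like this. The pattern is the same one `scripts/synthesize.py` applies to `--name`; the helper `name_problem` is hypothetical and only mirrors the check, since the skill itself lists the directory with `Glob`.

```python
from __future__ import annotations

import re
from pathlib import Path

# Same pattern as KEBAB_RE in scripts/synthesize.py.
KEBAB_RE = re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$")


def name_problem(name: str, workspace: Path) -> str | None:
    """Return a reason the name is unusable, or None if it passes both checks."""
    if not KEBAB_RE.match(name):
        return f"{name!r} is not kebab-case"
    if (workspace / ".evolve" / "skills" / name).exists():
        return f"a skill named {name!r} already exists"
    return None
```

Whether a name is specific and action-oriented remains a judgment call that no pattern can check.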

### Step 3: Draft the SKILL.md

Author a SKILL.md with this exact frontmatter shape — the validator in Step 5 will reject it otherwise:

```yaml
---
name: <kebab-case-name>
description: <one-line task description>
---

# <Title Case Name>

## Overview
<1–2 sentences: what the skill does and when to use it>

## When To Use
- <trigger 1>
- <trigger 2>

## Workflow
<step-by-step instructions for the agent>
```

Notes:

- `context: fork` is **omitted** for synthesized skills. They run in the parent context so they can write files into the workspace and report back.
- Do NOT inline the full successful script into the SKILL.md if it's more than ~10 lines — put it in a sibling `scripts/` file (Step 4) and reference it from the SKILL.md.
- The Workflow section should describe what to do *to solve the task*, not retell the original session. A future agent reading this should be able to act without ever seeing the trajectory.
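For illustration, a draft in this shape could be assembled as below. `render_skill_md` is a hypothetical helper, not part of the plugin, and numbering the workflow steps is one possible layout rather than a requirement of the template.

```python
from __future__ import annotations


def render_skill_md(
    name: str,
    description: str,
    overview: str,
    triggers: list[str],
    workflow_steps: list[str],
) -> str:
    """Render a draft SKILL.md in the frontmatter shape shown above."""
    if "\n" in description:
        raise ValueError("description must fit on one line")
    title = name.replace("-", " ").title()
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        "---",  # no `context: fork` key: synthesized skills omit it
        "",
        f"# {title}",
        "",
        "## Overview",
        overview,
        "",
        "## When To Use",
        *[f"- {trigger}" for trigger in triggers],
        "",
        "## Workflow",
        # Numbered steps are an assumption about layout, not a rule.
        *[f"{i}. {step}" for i, step in enumerate(workflow_steps, start=1)],
    ]
    return "\n".join(lines) + "\n"
```

The derived title is only a starting point; acronyms such as EXIF come out as "Exif" and usually need a manual fix.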

### Step 4: Emit Supporting Scripts

If the successful workflow used a non-trivial script (more than a one-liner), write it as a sibling file under `scripts/` of your draft skill directory. Use the **already-validated code from the trajectory** — do not invent variations. Strip incidental one-off inputs (literal file names, IDs, hard-coded outputs) and replace with arguments or stdin where appropriate.

Common shape:

```text
.evolve/skills/<name>/
├── SKILL.md
└── scripts/
    └── <action>.py   # callable as `python3 scripts/<action>.py <args>`
```

If the workflow was a sequence of shell commands rather than a script, encode it as an executable shell script (`scripts/<action>.sh`) so future agents can invoke it as a single unit instead of replaying each command.

If no non-trivial script is needed (the workflow is a sequence of standard tool calls), skip this step — the SKILL.md alone is the skill.
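To make "replace one-off inputs with arguments or stdin" concrete, here is a hypothetical skeleton for a `scripts/<action>.py`. The `process` body is a placeholder (it only reports file size); in a real skill it would hold the validated code lifted from the trajectory.

```python
"""Hypothetical scripts/<action>.py skeleton: inputs come from argv or stdin."""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path


def process(path: Path) -> dict[str, object]:
    """Placeholder for the validated logic taken from the trajectory."""
    return {"path": str(path), "bytes": path.stat().st_size}


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "paths", nargs="*", help="input files; read from stdin when omitted"
    )
    args = parser.parse_args(argv)
    # The session's literal file name becomes an argument (or a stdin line).
    paths = args.paths or [line.strip() for line in sys.stdin if line.strip()]
    for raw in paths:
        # One JSON object per line keeps the output easy for an agent to parse.
        print(json.dumps(process(Path(raw))))
    return 0
```

A real script would end with the usual `if __name__ == "__main__":` guard calling `main()`, so it stays callable as `python3 scripts/<action>.py <args>`.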

### Step 5: Finalize

Place your draft files (SKILL.md and any scripts) in a temporary draft directory, e.g. `/tmp/synthesized-<name>/` (`--src` accepts any readable directory; it does not have to live inside the workspace), then call:

```bash
python3 .bob/skills/evolve-lite-synthesize-skill/scripts/synthesize.py finalize --src /tmp/synthesized-<name>/ --name <kebab-case-name> --trajectory <saved_trajectory_path>
```

The script will:

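- Validate the draft: `--name` must be kebab-case, the SKILL.md frontmatter must contain `name` and a non-empty `description`, `name` must match `--name`, and the SKILL.md body must be at least 50 characters long.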
- Reject the skill if a same-named skill already exists in `.evolve/skills/` (overwriting requires `--force`).
- Copy the directory into `.evolve/skills/<name>/` (canonical).
- Append a `synthesize_skill` event to `.evolve/audit.log` recording the new skill, the source trajectory, and the timestamp.
- Print the destination path(s).

If the validator rejects the draft, fix the SKILL.md and retry — do not edit files under `.evolve/skills/` directly.
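A minimal sketch of driving that fix-and-retry loop from Python, assuming the script path shown above and using only the flags documented in this step. `finalize_command` and `run_finalize` are hypothetical helpers. The script reports rejections by exiting with a message, which Python writes to stderr with a non-zero exit status.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Script location as given in the command above.
SCRIPT = Path(".bob/skills/evolve-lite-synthesize-skill/scripts/synthesize.py")


def finalize_command(
    src: Path, name: str, trajectory: str | None = None, force: bool = False
) -> list[str]:
    """Build the argv for `synthesize.py finalize`."""
    cmd = [sys.executable, str(SCRIPT), "finalize", "--src", str(src), "--name", name]
    if trajectory:
        cmd += ["--trajectory", str(trajectory)]
    if force:
        cmd.append("--force")
    return cmd


def run_finalize(src: Path, name: str, trajectory: str | None = None) -> tuple[bool, str]:
    """Run finalize once; return (ok, message) so the caller can fix and retry."""
    result = subprocess.run(
        finalize_command(src, name, trajectory), capture_output=True, text=True
    )
    ok = result.returncode == 0
    # On rejection the validator's reason is on stderr.
    return ok, (result.stdout if ok else result.stderr).strip()
```

On a rejection, the returned message names the failing check; correct the draft in the temporary directory and call `run_finalize` again rather than touching `.evolve/skills/`.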

### Step 6: Confirm

After the script returns, list the destination directories with the `Glob` tool to confirm the files landed. Output a short summary:

- The skill name and description.
- The destination paths.
- A one-line note on what future sessions should now be able to do that they couldn't before.

## Best Practices

1. **One skill per workflow.** If the trajectory contains two unrelated successful workflows, run synthesis twice with different names — do not pack them into one skill.
2. **Cite the trajectory.** Include the `--trajectory` flag so the audit log records provenance; future maintainers can trace the skill back to the session that produced it.
3. **Don't promote one-shots.** A skill is worth synthesizing only if the trigger is plausibly recurring. If the trajectory looks like a one-off, prefer the `learn` skill's guideline path instead.
4. **Don't paraphrase failure.** The skill describes what *worked*. If you find yourself writing "this skill avoids the problem where exiftool isn't installed," restate it as "uses Pillow to extract EXIF; works in environments without system EXIF tools." Triggers describe *when*, not *what failed*.
5. **Keep scripts minimal.** Strip incidental log lines, debug prints, and validation that wasn't actually exercised in the trajectory. If a feature wasn't validated, leave it out.
@@ -0,0 +1,171 @@
#!/usr/bin/env python3
"""Synthesize-skill helper: validate and install a synthesized skill.

The synthesize-skill skill (a subagent) is responsible for the *judgment* —
reading the trajectory, identifying the successful workflow, and writing
draft SKILL.md + supporting scripts into a temporary directory.

This script is the *plumbing*: it validates the draft, copies the directory
into the canonical evolve-managed location (and into the platform-specific
skills loader location when one is configured), and writes an audit-log entry.

Usage:
    synthesize.py finalize --src <draft_dir> --name <kebab-case-name> \
        [--trajectory <path>] [--workspace <path>] [--force]
"""

from __future__ import annotations

import argparse
import re
import shutil
import sys
from pathlib import Path

# Reuse the plugin's lib helpers (audit-log writer + entities-dir locator).
_script = Path(__file__).resolve()
_lib = None
for _ancestor in _script.parents:
    _candidate = _ancestor / "lib" / "evolve-lite"
    if (_candidate / "audit.py").is_file():
        _lib = _candidate
        break
if _lib is None:
    raise ImportError(f"Cannot find plugin lib directory above {_script}")
sys.path.insert(0, str(_lib))
from audit import append as audit_append # noqa: E402


KEBAB_RE = re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$")
FRONTMATTER_RE = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def _parse_frontmatter(skill_md: Path) -> tuple[dict[str, str], str]:
    """Minimal YAML-ish frontmatter parser. Supports `key: value` lines only.

    Returns (frontmatter_dict, body_text).
    """
    text = skill_md.read_text(encoding="utf-8")
    match = FRONTMATTER_RE.match(text)
    if not match:
        raise ValueError(f"{skill_md} has no frontmatter block")
    fm: dict[str, str] = {}
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if ":" not in line:
            raise ValueError(f"{skill_md}: malformed frontmatter line: {line!r}")
        key, _, value = line.partition(":")
        fm[key.strip()] = value.strip().strip('"').strip("'")
    return fm, text[match.end() :]


def _validate_draft(src: Path, name: str) -> None:
    if not KEBAB_RE.match(name):
        raise SystemExit(f"--name {name!r} is not kebab-case")
    if not src.is_dir():
        raise SystemExit(f"--src {src} is not a directory")
    skill_md = src / "SKILL.md"
    if not skill_md.is_file():
        raise SystemExit(f"missing SKILL.md in {src}")

    try:
        fm, body = _parse_frontmatter(skill_md)
    except ValueError as exc:
        raise SystemExit(f"SKILL.md frontmatter is malformed: {exc}") from exc
    if "name" not in fm or "description" not in fm:
        raise SystemExit(f"SKILL.md frontmatter must include `name` and `description` (got: {sorted(fm.keys())})")
    if fm["name"] != name:
        raise SystemExit(f"frontmatter name {fm['name']!r} does not match --name {name!r}")
    if not fm["description"]:
        raise SystemExit("SKILL.md description is empty")
    if len(body.strip()) < 50:
        raise SystemExit("SKILL.md body is suspiciously short — not enough instructions to be useful")


def _resolve_workspace(arg: str | None) -> Path:
    if arg:
        return Path(arg).resolve()
    # Fall back to the current working directory; at runtime this is the
    # workspace mounted into the sandbox (`/workspace`) or the host repo root.
    return Path.cwd().resolve()


def _check_dest(dst: Path, force: bool) -> None:
    """Reject the install if dst would block it; let _copy_into do the actual write."""
    if dst.exists() and not force:
        raise SystemExit(f"{dst} already exists (use --force to overwrite)")


def _copy_into(src: Path, dst: Path) -> None:
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


# Per-platform runtime mirror — the loader-discoverable directory the
# synthesized skill is copied into so the host agent picks it up
# automatically. Set to None where the platform doesn't have a runtime
# skills directory; only the canonical `.evolve/skills/<name>/` write
# happens in that case.
_RUNTIME_MIRROR_DIR: str | None = None


def cmd_finalize(args: argparse.Namespace) -> int:
    src = Path(args.src).resolve()
    name = args.name
    workspace = _resolve_workspace(args.workspace)

    _validate_draft(src, name)

    evolve_dst = workspace / ".evolve" / "skills" / name
    runtime_dst: Path | None = workspace / _RUNTIME_MIRROR_DIR / name if _RUNTIME_MIRROR_DIR is not None else None

    # Pre-check both destinations before any copy so a blocked second
    # write doesn't leave the first half of the install on disk.
    _check_dest(evolve_dst, args.force)
    if runtime_dst is not None:
        _check_dest(runtime_dst, args.force)

    _copy_into(src, evolve_dst)
    if runtime_dst is not None:
        _copy_into(src, runtime_dst)

    audit_append(
        project_root=str(workspace),
        event="synthesize_skill",
        skill=name,
        evolve_path=str(evolve_dst.relative_to(workspace)),
        runtime_path=str(runtime_dst.relative_to(workspace)) if runtime_dst else "",
        trajectory=args.trajectory or "",
    )

    print(f"Installed skill {name!r}:")
    print(f"  evolve:  {evolve_dst}")
    if runtime_dst is not None:
        print(f"  runtime: {runtime_dst}")
    return 0


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    sub = parser.add_subparsers(dest="cmd", required=True)

    p_finalize = sub.add_parser(
        "finalize",
        help="Validate a draft skill directory and install it under .evolve/skills/ (and the platform's runtime skills dir, if any).",
    )
    p_finalize.add_argument("--src", required=True, help="Draft directory containing SKILL.md and any scripts/")
    p_finalize.add_argument("--name", required=True, help="Kebab-case skill name; must match SKILL.md frontmatter")
    p_finalize.add_argument("--trajectory", default="", help="Source trajectory path (recorded in audit.log)")
    p_finalize.add_argument("--workspace", default=None, help="Project root (defaults to CWD)")
    p_finalize.add_argument("--force", action="store_true", help="Overwrite existing skill of the same name")
    p_finalize.set_defaults(func=cmd_finalize)

    args = parser.parse_args(argv)
    return args.func(args)


if __name__ == "__main__":
    raise SystemExit(main())