Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
355 changes: 355 additions & 0 deletions skills/workflow-audit/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,355 @@
---
name: workflow-audit
description: >
Mine the user's Claude Code chat corpus for self-optimization patterns —
repeated corrections, tool thrash, dropped threads, restated constraints —
and produce a ranked, PII-scrubbed, anti-recency-biased markdown report.
Different from `self-audit` (which reviews PR code). This audits how the
user works with agents. Designed to be dispatched as a sub-agent so the
caller's context stays clean.
allowed-tools: "Bash,Read,Grep,Glob,Write"
argument-hint: "[output-path] [weeks=12]"
---

# workflow-audit — Mine the agent-chat corpus for optimization targets

Find 10–15 high-leverage interventions in HOW the user works with agents,
ranked by `(bucket_coverage × cost)`, output as a single PII-scrubbed
markdown report.

This skill is intended to be **dispatched as a sub-agent** so the caller's
context stays clean and the analysis is reproducible across machines.

## When to use

- User asks to "audit my chat history", "find optimization targets",
"what am I correcting agents on repeatedly", or similar.
- User explicitly invokes `/workflow-audit`.
- Periodically (e.g. monthly) as a maintenance ritual on the user's CC setup.

## Arguments

`$ARGUMENTS` — optional, positional:
1. Output path. Default: `~/workflow-audit-<YYYYMMDD>.md`.
2. Weeks of corpus to scan. Default: `12`.

## Prerequisites

The agent must have:

1. **`mache` binary** for the chat-corpus ingest. If missing:
```bash
git clone https://github.com/agentic-research/mache ~/.cache/workflow-audit/mache
cd ~/.cache/workflow-audit/mache && task install
```

2. **`chat-embed` binary** from ley-line-open. If missing:
```bash
git clone https://github.com/agentic-research/ley-line-open ~/.cache/workflow-audit/llo
cd ~/.cache/workflow-audit/llo && cargo install --path rs/ll-open/chat-embed
```

3. **Python 3** with stdlib only (used for JSONL parsing + scoring).
No third-party deps.

If any prereq install fails, **bail with a clear error**, do not fabricate
findings.

## Load-bearing rules

Three rules that MUST hold for the report to be trustworthy. If you find
yourself relaxing any of them, stop and report why instead of soldiering on.

### Rule 1 — Anti-temporal-bias

The user dispatches a lot of agents; recent weeks are dense and biased.

1. Bucket sessions by ISO week (`YYYY-Www`).
2. **Exclude the 2 most recent calendar weeks** from aggregation.
3. For each finding, compute `bucket_coverage = weeks_appearing / weeks_scanned`.
4. **Rank by `bucket_coverage × estimated_cost`**, NOT by raw count.
5. Drop any finding appearing in fewer than 3 distinct buckets.
6. If a single week contributes >40% of a finding's occurrences,
either drop the finding or flag the concentration in the report.

### Rule 2 — PII scrub before anything leaves the process

This must run BEFORE any tokens reach the report, the user-facing transcript,
or any tool call's input. Including intermediate thinking and scratch buffers.

Build a deterministic substitution table, then apply it to ALL output. The
table itself **never** appears in the report.

| Real form | Placeholder |
|---|---|
| GitHub user names (mine + colleagues) | `USER_N` |
| Org names (`agentic-research`, employer, …) | `ORG_A`, `ORG_B`, … |
| Repo names | `REPO_N` |
| Branch names | `BRANCH_N` |
| `/Users/<me>/...` and `/home/<me>/...` | `$HOME/...` |
| Email addresses | `EMAIL_N` |
| Phone numbers | `PHONE_N` |
| Slack / Discord / Notion handles | `HANDLE_N` |
| Internal URLs, S3 buckets, DB connection strings | `INTERNAL_URL_N` |
| External-tracker ticket IDs (JIRA, Linear, etc.) | `TICKET_N` |
| Customer / project codenames | `PROJECT_X` |

Rules:
- **Deterministic.** Same input → same placeholder. Patterns must stay
visible across sessions.
- **Pessimistic on name-shaped tokens.** If it looks like a proper noun
and isn't a common English word, scrub. False positives are cheap;
leaked PII is expensive.
- **Pass-through.** Bead IDs, commit SHAs, opaque tokens (UUIDs, hashes)
pass through unchanged. They're traceable but not identifying.

### Rule 3 — Don't fabricate

- If the corpus is too small (<50 sessions in scope), return a stats-only
report. Do not invent patterns.
- If a finding hits Rule 1's coverage threshold but the agent isn't sure
what it means, surface it as a "needs human interpretation" finding.
- Every example snippet in the report must be a real (scrubbed) quote.
No paraphrases.
- Contradictions between findings must be reported, not collapsed.

## Step 1 — Locate the corpus

The agent-chat corpus lives at:

```
~/.claude/projects/<sanitized-cwd>/
```

Each sub-directory is one project; each `*.jsonl` file is one session.
Skip projects matching `$WORKFLOW_AUDIT_SKIP_PATHS` (env var, glob list) —
this is how the user excludes work projects on the work machine.

```bash
ls -la ~/.claude/projects/ | head -20
```

## Step 2 — Ingest into a single sqlite db

Use mache to consolidate sessions into a queryable shape:

```bash
mache ingest claude-chats \
--src ~/.claude/projects \
--out ~/.cache/workflow-audit/chats.db \
--since "$(date -v-${WEEKS}w +%Y-%m-%d)" # macOS; Linux: --date "${WEEKS} weeks ago"
```

If `--since` isn't supported, ingest everything and filter at scoring time.

## Step 3 — Embed for semantic clustering

```bash
chat-embed index ~/.cache/workflow-audit/chats.db
chat-embed stats ~/.cache/workflow-audit/chats.db
```

Embeddings drive **Rule 1's pattern persistence detection**. Without them
you can only do exact-phrase matching, which under-counts paraphrased
recurrences (the user says "no" in 30 ways; the embedding model sees them
as one cluster).

If `chat-embed` fails or the user's machine can't run fastembed, fall back
to substring matching with explicit phrase lists. Note the degradation in
the report.

## Step 4 — Sample with anti-temporal-bias

```bash
# Walk the corpus chronologically, bucketing by ISO week.
# Exclude the 2 most recent weeks. Cap at $WEEKS buckets total.
python3 - <<'PY'
import json, glob, os, datetime, collections
import pathlib

BUCKETS = collections.defaultdict(list)
WEEKS = int(os.environ.get("WEEKS", 12))
SKIP = os.environ.get("WORKFLOW_AUDIT_SKIP_PATHS", "").split(":")

for path in glob.glob(os.path.expanduser("~/.claude/projects/*/*.jsonl")):
if any(s and s in path for s in SKIP):
continue
with open(path) as f:
for line in f:
try:
evt = json.loads(line)
except Exception:
continue
ts = evt.get("timestamp") or evt.get("created_at")
if not ts: continue
d = datetime.datetime.fromisoformat(ts.rstrip("Z"))
iso_year, iso_week, _ = d.isocalendar()
BUCKETS[(iso_year, iso_week)].append((path, evt))

# Drop the 2 most recent buckets.
recent = sorted(BUCKETS.keys())[-2:]
for k in recent: BUCKETS.pop(k, None)

# Cap to the last $WEEKS buckets.
kept = sorted(BUCKETS.keys())[-WEEKS:]
for k in list(BUCKETS.keys()):
if k not in kept: BUCKETS.pop(k)

print(f"buckets={len(BUCKETS)} events={sum(len(v) for v in BUCKETS.values())}")
PY
```

## Step 5 — Pattern detection

Eight pattern shapes the report ranks. Each is a separate pass over the
bucketed corpus. For every match: record `(bucket, session_id, snippet,
estimated_cost)`.

1. **Repeated user corrections.** User reply ≤30 tokens following an agent
turn ≥500 tokens, matching one of: `no`, `wait`, `ugh`, `stop`, `ew`,
`wrong`, `lying`, `you didn't`, `that's not`, `actually`.
2. **Tool-use thrash.** Sessions where ≥4 tool calls precede the first
`Edit`/`Write`, OR where the same file is `Read` more than twice without
intervening edits.
3. **Pre-action over-asking.** Agent turn ends with a `?`; next user turn
≤20 tokens contains the answer derivable from prior context (heuristic:
the answer's keywords appear earlier in the same session).
4. **Post-action under-asking.** Tool calls matching `git push`,
`git push --force`, `gh pr create`, `rm -rf`, `git reset --hard`,
`gh pr merge` followed by user objection in the next 3 turns.
5. **Dropped threads.** Sessions ending without `git commit`, `gh pr ...`,
or a final `Stop`/`success` marker AND lasting >10 turns.
6. **Restated constraints.** User instructions (≥3 tokens) appearing in
`$N` sessions where `N ≥ 3` AND the agent demonstrably violated the
instruction at least once in a session AFTER the first one. Semantic
match via chat-embed query.
7. **Long-cycle recurrence.** Same semantic cluster (chat-embed query)
appearing in ≥4 buckets separated by ≥2 weeks. Surface as "design debt
the user keeps re-touching."
8. **Phrasing-shift markers.** Register shifts (`honest`, `lowkey`, `tbh`,
`wait`, `ya feel`, `srsly`) followed within 3 turns by a correction.
Map to what the agent had just done.

## Step 6 — Score and rank

```
cost_proxy = (tokens_wasted_in_correction_recovery
+ tokens_in_dropped_thread_dead_weight
+ 0.5 × user_turns_in_correction_burst)

bucket_coverage = unique_buckets_appearing / total_buckets_scanned

score = bucket_coverage × cost_proxy
```

Sort findings descending by score. Apply Rule 1 step 5 (drop coverage < 3
buckets). Apply Rule 1 step 6 (flag concentration > 40%). Take top 15.

## Step 7 — Scrub

Run the PII scrub pass over every snippet, finding title, intervention
text, and example. Build the mapping table on first encounter; reuse on
subsequent ones. Verify NO raw forms appear in the output before writing.

```python
# Pseudo:
def scrub(text, table):
for real, placeholder in table.items():
text = re.sub(re.escape(real), placeholder, text, flags=re.IGNORECASE)
return text
```

## Step 8 — Write the report

Output to `$OUTPUT_PATH` (arg 1) using this exact skeleton:

```markdown
# Workflow audit — YYYY-MM-DD
# Buckets scanned: N weeks (recent 2 excluded)
# Sessions: N across M projects
# PII scrub: applied (mapping table not included)
# Corpus health: <one line>

## Top findings (ranked by bucket_coverage × cost)

### F1. <one-line title>
- **Pattern**: <2 sentences, scrubbed>
- **Bucket coverage**: x/N weeks | **Concentration**: passed | flagged
- **Estimated cost per occurrence**: <tokens or qualitative>
- **Example (scrubbed)**: > <real quote, scrubbed>
- **Intervention**: <one concrete line>
- WHERE: `CLAUDE.md` | `~/.claude/settings.json` | new skill | new hook | dispatch pattern
- WHAT (copy-paste):
```
<fragment>
```

### F2. … (repeat through F10–F15)

## Findings dropped (low coverage; for transparency)
- F-D1. <one line>
- F-D2. …

## Corpus health stats
- Sessions / week (last 12 weeks, with the 2 recent included for context)
- Median tool calls / session
- Median user turns / session
- % sessions ending in a `git commit`
- % sessions ending in a user objection within 3 turns
- Top 5 most-edited files (across the corpus, scrubbed to REPO_N pattern)

## Ready-to-apply (3 hand-picked)
For each: WHERE, copy-paste fragment, expected effect, contradiction check
against existing `CLAUDE.md` / `~/.claude/CLAUDE.md` memory entries.

### A1. <title>
### A2. <title>
### A3. <title>

## Notes
- Contradictions surfaced (if any): <list>
- Findings flagged for human interpretation: <list>
- PII scrub: <count> unique forms mapped to placeholders. Mapping lives
at `~/.cache/workflow-audit/scrub-table.json` and is NOT in this file.
```

## Step 9 — Hand-off

Print the absolute path of the report. Do NOT cat the file (it may be
large). Do NOT post the report to any external system — the user reviews
it locally first.

If invoked by a dispatcher agent, return:

```
report=<absolute path>
buckets=<N>
sessions=<N>
findings_top=<count>
findings_dropped=<count>
prereq_status=<ok|degraded|failed>
```

## Running on work machines

Two safeties:

1. **Skip work projects via env var**: `WORKFLOW_AUDIT_SKIP_PATHS` is a
colon-separated glob list. Default empty; on work machines set to your
work-project path patterns.
2. **Scrub-table local-only**: the scrub mapping never leaves the machine.
If the user wants to compare audits across machines, they compare the
*reports* (which are placeholder-only); the mappings stay home.

When the report finishes, leave a brief one-liner explaining what's in
`~/.cache/workflow-audit/` so the user knows what to delete vs keep.

## Why this skill exists separately from `self-audit`

| Skill | Surface | Trigger |
|---|---|---|
| `self-audit` | PR code | Before tagging a reviewer |
| `workflow-audit` | Chat corpus | Maintenance ritual, or when the user feels friction |

Don't confuse them. `self-audit` reviews what you wrote. `workflow-audit`
reviews how you wrote it.