feat(skill): Mode A — self-improving personal skills distilled locally from your own sessions#95
feat(skill): Mode A — self-improving personal skills distilled locally from your own sessions#95kai-rayward wants to merge 29 commits into
Conversation
…e Code + Codex
New `clawjournal/skill/` package + `clawjournal skill` command. Selects recurring
failures ("avoid") + successes/recoveries ("do") from the user's own scored
sessions, distills <=5 rules via ONE local agent-CLI call (anonymize +
deterministic secrets-scrub before the LLM), runs a deterministic gate (external/
exec hard-deny + secrets/PII/TruffleHog), previews, then installs a
`clawjournal-lessons` skill globally for Claude (~/.claude/skills/) and Codex
(~/.codex/AGENTS.md managed region). Fully local, no upload. First run uses --all;
weekly default window is 7 days. Tests in tests/skill/ (16, fake-caller, no live LLM).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
… state + merge - Preflight (§7.0): require source scope + confirmed projects + an installed backend + TruffleHog (unless CLAWJOURNAL_SKIP_TRUFFLEHOG); block with next steps. - Scan + score the window (§7.1/§7.2): reuse `_run_scan` + `query_unscored_sessions` + `_score_single_session`, bounded by --score-limit with progress (first run --all). - Durable state (§9): `skill_rules` table (fingerprint + approval/rejection/install timestamps) in skill/store.py. - Merge loop (§9): merge new distill output into the kept set, dedupe by fingerprint, replace the weakest (cap 5), skip rejected fingerprints; preview shows a NEW/KEPT/dropped diff. `--reject <fp>` bans a rule from re-proposal. - New flags: --no-scan --no-score --score-limit --reject --skip-preflight. Tests: preflight, store, merge, + store-integration in test_generate (30 total). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
…y (§9/D9) Track the targeted failure modes' incidence *rate* over eligible scored sessions week-over-week as a directional "is it helping?" signal (not a powered metric): - select adds `eligible_scored` (denominator) + `mode_rates()`. - store adds `skill_mode_snapshots` + save/last helpers. - generate_skill computes prev-vs-current rates for the modes the "avoid" rules target; preview shows `mode: 30% -> 22% (improving)` / baseline / insufficient data (n<10); a snapshot is recorded on install for next week's comparison. Tests: tests/skill/test_telemetry.py (denominator, snapshot round-trip, trend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
test_select/test_generate/test_store share basenames with tests/benchmark; making tests/skill a package namespaces its modules so the full suite collects cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
…ce scope, orchestration test Combines Codex's in-flight hardening (candidate impact/recency/rank_score fields, evidence-id limiting to selected sessions, TruffleHog fail-closed gate, drop-on- reinstall, richer preflight/tests) with the four correctness fixes surfaced by an independent adversarial review of the branch against the Mode A spec: 1. Pool partition (§6/D5): each session lands in exactly one pool (avoid wins) and the recurrence counters count each session once — a recovered/high-value trace no longer double-books as both avoid+do, and mode_rates() can no longer exceed 100%. 2. Merge good+bad mix (D2/§9): merge_rules now interleaves avoid/do so 'do' rules (weaker support signal) are not structurally crowded out over weekly runs. 3. Source scope (§4.4): the confirmed `config --source` scope is threaded through selection, scan, and scoring — it now constrains which sessions are distilled, not just a presence gate. 4. Defense-in-depth: the external/exec hard-deny is re-applied to the full merged install set (not only fresh rules); honest --reject timing message. Tests: +good/bad-mix, +no-double-booking, +rate<=100%, +source-scope, and a real _ensure_corpus scan/score orchestration test (closes the §14 coverage gap where it was monkeypatched to a no-op). 57 skill tests + 155 cli tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
generate_skill read the on-disk config via _config_sources(), so the pure pipeline's tests silently depended on the developer's real ~/.clawjournal/config.json — a dev who ran `config --source claude` would see spurious failures (seeds use source="codex"). Make `sources` an explicit parameter defaulting to the coding-agent scope; run_skill passes _config_sources() so production still enforces the configured source scope. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
The distiller flattened rich learning_summary signal into platitudes ("add
tests", "check the environment") because the prompt asked only for "portable
guidance" and ran on the fast default model. Rewrite _SYSTEM to require each
rule to name the concrete situation/surface/pattern, preserve technical detail
from the source summary, and reject vague platitudes unless bound to a concrete
trigger + checkable mechanism. De-identification now scopes to PII only (names,
emails, URLs, home paths, secrets, shell commands) — technical specifics (repo
and module names, failure surfaces, tool categories, patterns) are kept.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
Rules rendered the entire guidance sentence as the ### heading, so lessons read as unnamed blobs. Add a short imperative `title` (3-6 words) as the heading, with the full rule kept as a "Rule:" body line. - schema: SkillRule.title + display_title() fallback; title in SKILL_DISTILL_SCHEMA (required) and parse_rules (derives from guidance when the model omits it); find_external_tokens also scans the title. - distill: prompt asks for a title; _RULE_SHAPE documents it. - render: title is the heading; guidance moves to the body (deduped when a rule is unnamed and falls back to guidance). - store: title column (+ migration for pre-title tables), round-tripped through upsert/load so kept rules keep their name across weekly merges. Fingerprint stays on kind+guidance, so renaming never changes a rule's identity (dedup/reject/merge are unaffected). 63 skill tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
Mirror the rendered SKILL.md: the preview headline is now display_title() with the full guidance indented under `rule:`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
Rule titles ran up to 6 words ("Never leak secrets in script errors"). Tighten
the distill prompt to 2-4 words (aim 2-3) reading like a command/skill name, and
shorten the fallback derivation from 8 to 4 words. Full guidance is preserved in
the Rule: body line, so truncation never loses detail.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
The distill step used default_model_for_backend (the fast scoring default, claude-haiku-4-5), which flattened rich session signal into generic, over-long rules. Distill is a SINGLE call per run — unlike the 25-call scoring fleet — so it can afford a frontier model. Add a distill-only model map (DEFAULT_DISTILL_MODELS: claude->opus, codex->gpt-5.4) and use it in DefaultCaller. Scoring is untouched (share/benchmark still use the fast default). `--model` still overrides; CLI help updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f73243fc0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| unscored = query_unscored_sessions( | ||
| conn, limit=score_limit, source=_config_sources(), since=since) |
There was a problem hiding this comment.
Skip held sessions before auto-scoring
When clawjournal skill runs with its default scoring step and an unscored session is in pending_review or an active embargoed hold, query_unscored_sessions still returns it here; the later hold-state check in select_skill_candidates only runs after _score_single_session has already sent that transcript to the scoring backend. This violates the new skill path's hold-state egress guarantee, so filter these IDs with release_gate_blockers before the scoring loop or otherwise skip held rows.
Useful? React with 👍 / 👎.
| "AND EXISTS (SELECT 1 FROM json_each(COALESCE(ai_recovery_labels,'[]')) " | ||
| " WHERE value IN ('self_recovered','user_corrected_recovery')))" |
There was a problem hiding this comment.
Validate recovery-label JSON before json_each
If any scored row has ai_recovery_labels set to a non-JSON string, SQLite raises OperationalError: malformed JSON while evaluating this json_each, so one bad/manual row aborts the entire skill command instead of being treated as having no clean recovery labels. Add json_valid(ai_recovery_labels) to this branch or coalesce invalid values to [] before calling json_each.
Useful? React with 👍 / 👎.
Correctness (install/gate path): - #1 run_skill persisted every merged rule (state='proposed') BEFORE the render-time gate ran, so a rule the gate flagged was stored, reloaded by load_kept every run, and re-blocked the install forever. Persist nothing when gate_issues is set; record 'seen' only after the gate passes. - #2 install_claude/install_codex could raise (hand-edited SKILL.md -> RuntimeError; non-writable ~/.claude/~/.codex -> OSError) with no except, so the command aborted on a raw traceback. Catch, report cleanly, exit 1 without advancing the store. - #3 scoring auto-scored unscored sessions with no hold-state filter, sending held/embargoed content to the model provider even though distill gates that egress. Filter unscored sessions through release_gate_blockers before scoring. - #4 `--model` (documented as distill-only) was forwarded into scoring, silently re-pricing up to --score-limit sessions on a frontier model. Scoring now uses its own default; --model reaches distill only. - #5 mode_rates() could exceed 100%: badge-only (NULL-score) candidates counted into the numerator but were excluded from eligible_scored. Widen the eligible denominator to a superset of the candidate set (score OR badge OR modes). - #6 DefaultCaller() (which resolves the backend and can raise) was built outside distill_skills' degrade-gracefully try; move it inside. - #7 succ_sql called json_each on ai_recovery_labels without the json_valid guard its sibling ai_failure_modes clause uses -> crash on a malformed row. - #12 installed rules were upserted twice/run (loop + mark_installed); the install path now persists only via mark_installed. Docs: - #15 add `clawjournal skill` to the README command reference. Tests: +5 regression guards (held-not-scored, no distill-model in scoring, badge-only rate<=1, malformed recovery_labels no-crash, distill degrades on backend-resolution failure). 70 skill tests pass. Deferred (low): #9-#11,#13,#14. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bf6b95f5d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if not src or src in ("all", "both", "auto"): | ||
| return list(FAILURE_VALUE_SOURCE_SCOPE) |
There was a problem hiding this comment.
Preserve the legacy both source scope
When the confirmed config source is both, config.source_scope_sources() defines that legacy scope as only Claude+Codex, but this branch expands it to FAILURE_VALUE_SOURCE_SCOPE (claude, codex, opencode, openclaw). In a clawjournal skill run for a user who confirmed --source both, opencode/openclaw rows can therefore be scored and distilled to the backend even though those sources/projects were never confirmed; keep both mapped to ['claude', 'codex'] instead of treating it like all.
Useful? React with 👍 / 👎.
| unscored = query_unscored_sessions( | ||
| conn, limit=score_limit, source=_config_sources(), since=since) |
There was a problem hiding this comment.
Over-fetch before dropping held scoring rows
Because query_unscored_sessions applies LIMIT before the hold-state filter below, a window where the newest score_limit unscored sessions are pending_review or actively embargoed fills the page with rows that are then dropped, so older shareable unscored sessions are never scored and reruns keep hitting the same blocked page. Fetch extra and cap after filtering, or push the hold-state predicate into the query, so eligible sessions behind held rows can still feed the skill corpus.
Useful? React with 👍 / 👎.
Low batch from the review: - #9 window filter compared start_time strings lexicographically against an isoformat bound, mis-filtering mixed offsets. Widen the SQL prefilter by 2 days (> max 14h offset skew) and filter on the parsed instant in Python. - #10 _parse_start_time now truncates >6-digit fractional seconds so Python 3.10 (a CI target) doesn't reject them (recency -> 0.01 -> dropped from the pool). - #11 the trend arrows / gate glyphs crashed on non-UTF-8 consoles. Add _ascii_safe() downgrades + reconfigure stdout to backslashreplace. - #13 DefaultCaller duplicated benchmark AgentBackendCaller; extract the shared run_agent_json_call() and delegate from both. - #14 release_gate_blockers ran twice over overlapping sets; compute the block set once over the candidate∪eligible union (candidates ⊆ eligible). Confirmed-scope + hold-state egress (reported separately): - 'both' source scope wrongly widened to the 4-source coding scope (opencode/openclaw were never confirmed). Delegate to the canonical config.source_scope_sources so 'both' stays Claude+Codex. - scoring applied query_unscored_sessions' LIMIT BEFORE the hold-state filter, so a page of held rows starved older shareable ones and re-runs stalled. Compute the window's egress-blocked set first (canonical release_gate_blockers, resolves expired embargoes), over-fetch by that count, filter, then cap — held transcripts are never scored (egress) and shareable rows always make progress. Tests: +starvation and 'both'-scope guards; test_orchestration rewired for the block-first flow. 71 skill + 151 benchmark tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c0deb2110
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fetched = query_unscored_sessions( | ||
| conn, limit=score_limit + len(blocked), source=_config_sources(), since=since) |
There was a problem hiding this comment.
Use the widened window when scoring
When an unscored session's stored start_time has a source offset that sorts before the UTC cutoff even though the instant is inside the window (for example 2026-05-24T01:00:00-07:00 versus a 2026-05-24T07:00:00+00:00 cutoff), passing since here makes query_unscored_sessions exclude it with a lexicographic SQL comparison. The selector below already widens and parses timestamps for this reason, so these in-window sessions are never scored by clawjournal skill and cannot enter the corpus until they are scored some other way; use the same widened/filtering approach for the scoring prepass.
Useful? React with 👍 / 👎.
…w parsing From the second xhigh review of the branch: - #0 (regression) _ensure_corpus called release_gate_blockers unchunked over a held_where set that also matched hold_state IS NULL, so `skill --all` could exceed SQLite's variable limit and crash. hold_state defaults to the shareable 'auto_redacted', so drop the NULL clause (set is now small) and reuse the chunked _select._release_blocked_ids. - #1 install_claude bricked future runs if the SKILL.md/.sha256 pair desynced (a stale sidecar hash read as "hand-edited"). Recover when the file is our unmodified managed output (provenance comment is the last content) even if the sidecar is stale/missing; still refuse genuine edits (content after it) and foreign files. - #2 the scoring-window filter compared start_time to `since` lexicographically (mixed-offset mis-filter, same class as the select.py #9 fix). Widen the bound by 2 days; over-including a couple of just-out-of-window sessions for scoring is harmless (the corpus is filtered on the precise instant). - #3 (regression) _parse_start_time only truncated >6 fractional digits; Py3.10 also rejects 1/2/4/5-digit fractions. Normalize any fractional length to 6. Deferred: #4 (load_config re-read 5-6x/run) — low, and a clean fix threads config through preflight/distill; noted for a follow-up. Tests: +stale/missing-sidecar recovery, odd-fractional-seconds; test_orchestration rewired to patch select.release_gate_blockers. 74 skill tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa6b3b3583
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| # Over-fetch by the held count so >= score_limit shareable rows survive the filter. | ||
| fetched = query_unscored_sessions( | ||
| conn, limit=score_limit + len(blocked), source=_config_sources(), since=since) |
There was a problem hiding this comment.
Honor excluded projects before scoring skill sessions
When a user has run clawjournal config --exclude ..., those projects are intentionally removed from export/share via session_matches_excluded_projects, but the skill scorer fetch here only applies source/since/hold filters. clawjournal skill can therefore score an excluded project's transcript through _score_single_session, and already-scored excluded rows can still be distilled unless select_skill_candidates gets the same filter; apply the effective share settings' excluded_projects before both scoring and selection.
Useful? React with 👍 / 👎.
…(all 7) - #0 excluded projects were never filtered from selection/scoring, so --exclude'd content could be distilled + installed. Thread excluded_projects through select_skill_candidates (drops candidates AND the denominator) and _ensure_corpus (skips them before scoring, over-fetching so they can't starve shareable rows), reusing the canonical session_matches_excluded_projects — same gate as export. - #1 the hard-deny dropped the safety lessons the feature exists to produce: bare command names (eval/sudo/rm -rf) matched _SHELL_RE. Deny only actionable injection/exfil (command substitution, pipe-to-shell, URLs, secrets, mcp__ ids), not advisory mentions; preview now lists which rules were dropped and why. - #2 (regression) _is_unmodified_managed only checked the tail, so a mid-body edit with a valid sidecar was silently overwritten. Replace the two-file sidecar with an integrity hash embedded IN SKILL.md (one atomic write): detects any body edit AND can't desync/brick. Legacy sidecars are retired on install. - #3 partial install (Claude ok, Codex fails) left store != disk. Install each target independently and mark_installed/save_mode_snapshot when >=1 landed, so the store reflects reality; report per-target failures and exit non-zero. - #4 support was a monotonic MAX peak and merge ranked by it alone, so stale rules never decayed. Rank by recency-weighted support (30-day half-life via last_seen); rules re-proposed this run stay fresh. - #5 _scan_source_filter now derives from the canonical source_scope_sources. - #6 promote a shared paths.atomic_write_text; install.py and bundle.py delegate. Tests: +excluded-projects, advisory-safety-vs-injection, mid-body-edit detection, pre-integrity migration, recency decay. 78 skill + 56 bundle tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cfcc06048d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for r in rules: | ||
| fp = upsert_seen(conn, r, now=ts) |
There was a problem hiding this comment.
Stop refreshing stale rules as seen
When a previously installed rule is carried forward but is not re-proposed by the current distill run, this call still writes last_seen_at = ts via upsert_seen and the following update. Because merge_rules() decays stored support based on last_seen, a rule that merely survives each weekly install is made fresh every run, so high-support stale lessons never accumulate idle age and can continue outranking newly recurring rules. Keep last_seen_at tied to actual fresh evidence (or separate it from install time) when marking carried rules installed.
Useful? React with 👍 / 👎.
… FPs, +more
- #0 SKILL_DISTILL_SCHEMA now lists ALL rule properties in `required`; Codex
strict --output-schema rejected the partial list, making the default Codex
distill a silent no-op ("No usable rules").
- #1 a secret/PII rule dead-ended the whole install with no rejectable fingerprint
(gate-failed runs persist nothing). Add a PER-RULE secret/PII/TruffleHog gate so
only the dirty rule is dropped; the preview now lists dropped rules + reasons.
- #2 _SHELL_META_RE matched any markdown backtick span (`rg`, `npm run build`) —
drop the backtick alternative; keep $(...) and pipe-to-shell.
- #3 _SECRET_RE's bare 40-hex/base64 matched git commit SHAs and hashes; restrict
to high-confidence key prefixes (real secrets caught by _scrub + render gate).
- #4 merge kept the STALE stored object when a re-distilled rule had lower current
support, discarding fresh why/trigger/title/evidence — now keep fresh content and
carry the peak support forward.
- #5 install loop now also catches UnicodeDecodeError (non-UTF-8 existing file).
- #6 codex distill model no longer forced to an unverified 'gpt-5.4'; falls back to
the known-good fast default.
- #8 install_claude no longer hard-refuses an externally-touched SKILL.md (a
weekly-regenerated artifact); it backs the copy up to .local.bak and regenerates.
- backend bleed: scoring uses the configured scorer_backend, not the distill
--backend. NAME_MAX: shared temp prefix no longer adds a byte. dropped->proposed:
upsert_seen revives a re-seen dropped fingerprint.
Tests: +codex-strict schema, per-rule secret gate, backup-on-edit, advisory-vs-
injection. 80 skill + 56 bundle tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f27a5b1f37
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| """Return *existing* with the managed region replaced (or appended).""" | ||
| block = f"{BEGIN_MARKER}\n{region_body.rstrip()}\n{END_MARKER}\n" | ||
| start = existing.find(BEGIN_MARKER) | ||
| end = existing.find(END_MARKER) |
There was a problem hiding this comment.
Prevent rule text from closing the managed region
When a distilled rule contains the literal <!-- END clawjournal-lessons --> marker, this uses the first end marker in the file on the next install. That lets the remainder of the old generated rule escape outside the managed block and persist in ~/.codex/AGENTS.md even after the rule is removed or rejected, which defeats the managed-region cleanup and prompt-injection backstop for untrusted distilled text. Deny/escape the markers in rendered rules or splice against a delimiter that cannot appear in the body.
Useful? React with 👍 / 👎.
Correctness: - #0 install reset the decay clock: upsert_seen AND mark_installed bumped last_seen_at to now for EVERY installed rule, including carried-over rules not re-distilled this run, so a stale rule kept ~7-day age and never decayed out. last_seen now tracks last-seen-in-distillation (preserved for carried rules, stamped now only for freshly distilled ones); mark_installed no longer touches it. - #3 an idle-week run (empty corpus, kept rules re-proposed) saved an empty mode snapshot, clobbering the last real one and resetting the week-over-week trend. Only snapshot when eligible_scored > 0. - #1/#2 the --model help claimed Codex distill uses gpt-5.4; it uses the fast default now, so correct the help text. - #4 _decayed_support crashed on a naive `now`; coerce to aware in merge_rules. Cleanup/perf: - #5 the whole-doc gate re-ran TruffleHog after the per-rule gate (N+1 subprocess spawns). Per-rule gate now does fast regex secrets/PII only; TruffleHog runs once on the whole doc as the backstop. - #6 drop the dead ai_summary column/field from select (never read downstream). Deferred (low/plausible): load_config re-read ~7x/run (perf micro-opt; a clean fix threads config through preflight/distill). Tests: +carried-vs-fresh last_seen, naive-now merge. 83 skill tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
…, cleanups Correctness/robustness: - #0 preflight validated only the distill backend, but scoring uses the configured scorer_backend (decoupled last round). A broken scorer passed preflight then scored nothing. preflight now validates scorer_backend too (skipped on --no-score). - #1 select_skill_candidates didn't normalize a naive `now` (merge_rules already did), risking a naive/aware TypeError. Coerce to aware. - #2 the scoring loop only handled the {'error'} dict; a deeper exception (e.g. sqlite 'database is locked' vs a running daemon) aborted the whole run. Wrap the per-session score call and skip on any exception. - #3 a TruffleHog-only finding (regex-missed) dead-ended the install again, since the per-rule gate skips TruffleHog. Now when the whole-doc gate fires, re-run the per-rule gate WITH TruffleHog to pinpoint + drop the offending rule instead. Perf/cleanup: - #4 load_config re-read ~7x/run -> read once and thread `cfg` through the scope / scorer helpers (they take an optional cfg). - #5 ensure_table did a CREATE + PRAGMA + commit on every store call; fast-path returns after a cheap PRAGMA when the schema is already present. - #6 _candidate_aliases computed twice per distill -> once, passed to build_prompt. - #7 drop the redundant source_agents field from the distill prompt. Tests: +preflight scorer check. 84 skill tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d11ef4e045
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| """Projects the user has --exclude'd (same egress gate as export/share).""" | ||
| from .config import load_config | ||
| cfg = cfg if cfg is not None else load_config() | ||
| return list(cfg.get("excluded_projects") or []) |
There was a problem hiding this comment.
Honor policy excludes in skill egress
When a project is excluded from the workbench Policies UI/API (policy_type='exclude_project'), export/share drops it via get_effective_share_settings() by merging those policies into excluded_projects, but the skill path calls this helper for both the scoring prepass and distill selection and it only returns config.json exclusions. In that context, sessions from a policy-excluded project are still sent to the scorer/distiller backend by clawjournal skill; load the effective share settings from the open index or otherwise include exclude_project policies before scoring and selection.
Useful? React with 👍 / 👎.
…on dup, +3 - #0 the recurring gate dead-end: when the whole-doc gate flags a finding no individual rule reproduces (a context artifact of the concatenated doc / static frontmatter), the per-rule pinpoint returns empty and install dead-ended forever with no rejectable fingerprint. Now clear gate_issues in that case — the per-rule TruffleHog pass already cleared every rule, so the install proceeds. - #1 the "hit the per-run score cap" warning was unreachable (fetched exactly score_limit+skip); fetch one extra so "more remain" is detectable and warned. - #2 upsert_region duplicated the managed AGENTS.md block if an END marker appeared before BEGIN in the user's own notes; find the END that CLOSES this BEGIN (search after it), replacing through EOF when malformed. - #3 parse_rules kept a failure `taxonomy` on a 'do' rule, mislabeling a success under "Do" with a failure mode + inflated recurrence count; clear it for 'do'. - #4 a frontier (Opus) distill failure was swallowed -> "No usable rules" for users whose plan lacks Opus. Retry once on the backend's fast default (with a note). - #5 select's `float(x or 3)` turned a real 0 score into 3; use an explicit None check. Tests: +unattributable-gate no-dead-end, do-taxonomy-cleared, region stray-END dedup, +1 score-cap probe. 87 skill tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 67f94720d7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if not value: | ||
| return "" | ||
| text = anon.text(str(value)) | ||
| redacted, _, _ = redact_text(text) |
There was a problem hiding this comment.
Apply configured redactions before distilling
When a user has configured custom redactions via clawjournal config --redact ... or the Workbench redact-string policy, the export/share path removes those strings, but this scrubber only applies home/user anonymization and the generic secret regex before build_prompt sends project/title/learning_summary/score_reason to the agent backend. A client or internal name that the user explicitly marked for redaction can therefore leave the machine during clawjournal skill; thread the effective share settings' custom strings (and policy usernames) into this scrub step before the distill call.
Useful? React with 👍 / 👎.
…, +6 Security-critical: - #0 my round-7 gate fix was a bypass: clearing an unattributable whole-doc finding silently INSTALLED flagged content, violating the fail-closed "any TruffleHog finding blocks" invariant. Now fail closed — an unattributable real finding keeps gate_issues set and blocks the install; nothing is written. - #1 a TruffleHog SCAN ERROR (blocking, zero findings) was misattributed to every rule by the per-rule pinpoint, silently dropping the whole set. Detect scanner errors and fail closed as-is without the per-rule pass; clearer block message. - #3 the distill call auto-loaded the installed clawjournal-lessons skill + user CLAUDE.md, so the model re-emitted/reinforced already-installed and --rejected rules (inflating support, defeating decay). Run Claude with --bare (claude_bare). Correctness/cleanup: - #2 stop rendering ephemeral per-run case-NN evidence aliases — a carried-over rule cited case ids mapping to unrelated sessions this run (misleading provenance). - #4 the trend is per-run, not calendar-weekly; relabel "vs your last run" (was "week-over-week" / "re-run next week"). - #5 the selection window was only lower-bounded; drop future-dated sessions (bound both ends in _in_window). - #6 remove dead SkillRule.as_dict. #7 the dirty-path re-gate skips the redundant whole-doc TruffleHog re-scan (surviving rules already passed per-rule). Deferred (low): consolidate the 4th private ISO parser (drift risk vs the Py3.10 fractional-second handling). Tests: +fail-closed unattributable finding, +scanner-error no-misattribution. 88 skill + 151 benchmark tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e07ea67edf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # A scanner INFRA failure (blocking with no finding: missing binary, timeout, | ||
| # crash) must fail closed AS-IS — never run the per-rule pinpoint, which would | ||
| # misattribute the same error to every rule and silently drop the whole set. | ||
| scanner_error = any(("error" in i) or ("scan failed" in i) for i in gate_issues) |
There was a problem hiding this comment.
Preserve scanner failures as gate issues
When TruffleHog is missing, gate_rendered returns trufflehog: trufflehog-not-installed, which does not match this scanner-error predicate. With --skip-preflight or if the binary disappears after preflight, the code falls into per-rule pinpointing, drops every rule, clears gate_issues once rules is empty, and the CLI reports only “No usable rules” instead of the mandatory scanner failure; treat not-installed/missing-binary reasons as scanner errors so the run fails closed with an actionable gate issue.
Useful? React with 👍 / 👎.
| print("Try `clawjournal skill --all` for the first run.") | ||
| else: | ||
| print("No usable rules this run.") | ||
| return |
There was a problem hiding this comment.
Show safety-gate drops when none survive
When every distilled rule is rejected by gate_rules or gate_secret_pii_per_rule, res.rules is empty but res.blocked contains the reasons; this early return skips the blocked/gate output. In a run where the model produced only URL-, secret-, or PII-tainted rules, users see only “No usable rules” with no indication that the safety gate fired, so report res.blocked before returning or special-case blocked-only results.
Useful? React with 👍 / 👎.
| evidence.append(aliases[sid]) | ||
| elif sid in allowed_aliases: | ||
| evidence.append(sid) | ||
| r.evidence_session_ids = list(dict.fromkeys(evidence)) |
There was a problem hiding this comment.
Drop rules without validated evidence IDs
If a backend omits evidence_session_ids or returns only fabricated/stale IDs, this remapping leaves evidence empty but still returns the rule. In that scenario an ungrounded lesson can be merged and installed despite the distill contract that each rule be grounded in selected cases; filter out rules whose remapped evidence list is empty before support backfill.
Useful? React with 👍 / 👎.
…cleanups - #0/#1 the scanner-error guard only matched "error"/"scan failed", so a missing binary ("trufflehog-not-installed") was misclassified as a content finding and the per-rule pinpoint dropped the whole clean set. Classify by the CONTENT marker instead: an issue lacking "match(es)"/"finding(s)" is an infra error -> fail closed. - #2 the interactive install prompt used bare input(); Ctrl-D/EOF now declines gracefully instead of tracing back. - #3 the distill fallback re-ran the IDENTICAL model on Codex (whose distill default already is the fast model), doubling a ~240s call. Retry only when the fast default actually differs from the frontier default the first attempt used. - #4 _row_to_rule now isinstance-guards a valid-JSON non-array evidence_json (was an uncaught TypeError). #5 last_mode_snapshot guards a non-object rates_json (was an uncaught AttributeError from .items()). - #6 fix the backwards NAME_MAX comment in paths.atomic_write_text and cap the temp stem at 200 chars so a long output filename can't overflow NAME_MAX. - #7 config now read ONCE per run and threaded through preflight/_ensure_corpus/ select (was 3 reads). Tests: +not-installed scanner classification, +non-array/non-object JSON guards. 241 skill + benchmark tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
…window edge - #0 (security) my round-8 perf tweak made the dirty-path re-gate skip TruffleHog (run_trufflehog=False), so a residual detector-only finding in the reduced document could slip through and flagged content be written to disk — a fail-open of the "any TruffleHog finding blocks" invariant. Re-run the FULL gate (incl. TruffleHog) so an unattributable residual finding still fails closed. - #1 the post-install durable-store writes (mark_installed / save_mode_snapshot) ran after the irreversible on-disk install with no guard; a DB error (sqlite lock vs the daemon) crashed the command with the skill installed but state unrecorded. Guard them: warn and continue (worst case the next run re-labels [NEW]). - #2 _in_window's fallback for an unparseable start_time did a lexicographic compare ('not-a-date' >= '2026-...') that pulled out-of-window sessions into the corpus. Exclude unparseable timestamps instead (the parser already handles Z/offsets/frac). - #3 distill_skills re-read config for redact_usernames; thread the already-loaded cfg through generate_skill -> distill_skills (config now read once per run). Tests: +unparseable-timestamp exclusion. 242 skill + benchmark tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a9e9e2eac5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sid = s["session_id"] | ||
| print(f" [{i}/{len(unscored)}] {sid}", flush=True) | ||
| try: | ||
| res = _score_single_session(conn, sid, backend=scorer_backend) |
There was a problem hiding this comment.
Apply custom redactions before skill scoring
When clawjournal skill runs with its default scoring step and an unscored session contains a string the user configured via clawjournal config --redact or a Workbench redact-string policy, this call sends the full transcript to _score_single_session before the skill distill scrubber runs. The scoring path only anonymizes home/usernames (_anonymize_for_scoring reads redact_usernames, not custom strings), so client/internal names explicitly marked for redaction can leave the machine during the skill prepass; thread the effective share redactions into scoring here or skip scoring those sessions until they can be scrubbed.
Useful? React with 👍 / 👎.
…, cleanups - #0 `clawjournal skill --all` capped scoring at --score-limit (25) per run, contradicting the README/design promise to score ALL history on the first run. --all now scores every unscored session (uncapped, with a "this can take a while" note); windowed runs keep the cap + re-run warning. - #2 a cleanly-recovered failure session with empty ai_failure_modes fell out of the avoid pool (clean recovery) AND the success query (which required modes length > 0), vanishing from both. The success query now admits any clean recovery regardless of failure_modes, so the "what worked" lesson is surfaced. - #3 decouple the infra-vs-content gate classification from gate_rendered's prose: move it to render.gate_has_scanner_error, colocated with the message producer. - #4 render the shared body ONCE via render.render_targets (was built 2x, 4x on the TruffleHog-pinpoint path). Not changed (by design / low): the fail-closed gate dead-end on an unattributable whole-doc finding is intentional (security invariant; clear message + rules shown in preview); the 3x resolve_backend on the rare frontier->fast fallback is deferred. Tests: +clean-recovery-without-modes success candidate. 243 skill + benchmark pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 210c3c6072
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| fetched = query_unscored_sessions( | ||
| conn, limit=10**9, source=_config_sources(cfg), since=since) | ||
| else: | ||
| fetched = query_unscored_sessions( | ||
| conn, limit=score_limit + len(skip) + 1, source=_config_sources(cfg), since=since) |
There was a problem hiding this comment.
Defer active sessions before skill scoring
When clawjournal skill runs right after indexing a still-active trace, these calls use query_unscored_sessions with its defaults, which the index docs say do not defer in-flight sessions and do not re-select rows that later grew after being scored (workbench/index.py lines 2646-2661). The daemon avoids this by passing include_stale_scored=True and settle_seconds=SCORE_SETTLE_SECONDS (workbench/daemon.py lines 413-419); without the same options here, the skill prepass can persist a premature score for a partial transcript and later skill runs will skip it as already scored.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2fe309b3bd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if caller is not None or model is not None or effort is not None: | ||
| return [] |
There was a problem hiding this comment.
Keep fallback when only effort is overridden
When a user passes only --effort and the default frontier model is unavailable or over quota, this guard treats the effort override like an explicit model choice and returns [] before the fast-model fallback. That makes an advertised effort-only override turn a recoverable model-availability failure into “No usable rules”; the model was still chosen by default, so retry the fast default with the requested effort (or without the default xhigh effort) instead of skipping fallback.
Useful? React with 👍 / 👎.
Surface RECURRING, broadly-applicable habits (lessons that fire on most tasks) instead of rare-but-severe incident post-mortems: - select: rank candidates by frequency^1.5 x gentle-severity x gentle-recency, so a frequently-recurring habit outranks a single severe one-off; widen the pool DEFAULT_POOL_CAP 5 -> 15 so the distiller has diverse failure modes to synthesize from (it still emits <=5 rules). - distill prompt: prioritize by recurrence + breadth — a durable everyday habit beats a narrow post-mortem tied to one project/value — while keeping guidance concrete. - merge: semantic dedup collapses paraphrase clusters the kind+guidance fingerprint misses (same failure mode + word overlap, or strong overlap) and prefers the carried/installed rule over a fresh reword, so the skill file doesn't churn its wording every run; distinct lessons within one mode are preserved. Regenerating over a 14-day window flipped the set from incident lessons (Recompute Final Badge, Validate config folder-ID) to daily habits (Fix Root Cause, Self-Review Invariants, Close With Recommendation, Isolate Environment Failures, Trust Primary Data). Tests: +paraphrase-collapse (prefers carried), +distinct-same-mode preserved; merge/select tests updated for the wider pool. 96 skill / 498 skill+scoring+benchmark tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36dd824392
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| codex_output_schema=SKILL_DISTILL_SCHEMA, | ||
| claude_safe_mode=True, | ||
| ) |
There was a problem hiding this comment.
Disable Claude tools during distillation
When the resolved backend is Claude and a scored session's learning_summary/score_reason contains prompt-injection text, setting only claude_safe_mode=True does not sandbox the subprocess: the Claude CLI docs state safe mode disables customizations while built-in tools and permissions still work, and the shared runner still launches Claude with --permission-mode bypassPermissions. The distill step only needs a JSON answer from the provided corpus, so restrict tools/permissions for this call (for example, no Bash/Edit) or a malicious log summary can trigger local command/file access during clawjournal skill before any rendered-rule gate runs. docs
Useful? React with 👍 / 👎.
…dedup decay - #1 (privacy) the skill egress gate read excluded projects from config.json only, so a project excluded via the workbench Policies UI (a DB exclude_project row) was still scored and sent to the distill AI. _config_excluded_projects now returns the EFFECTIVE set via get_effective_share_settings (config + DB policies) — the same gate as export/share — threaded with the open conn into scoring and selection. - #2 the frontier->fast distill fallback fired on ANY exception, so a transient Opus timeout silently downgraded to haiku at low effort with a misleading "model unavailable" message. Only downgrade on a genuine plan-unavailability (is_backend_unavailable_error); a transient failure degrades to [] without downgrade. - #3 on an old agent CLI that rejects the new --safe-mode/--effort flags, the retry kept them and failed identically. Detect a flag-rejection and retry with those flags relaxed (DefaultCaller gains a claude_safe_mode param); the message names the cause. - dedup: when semantic dedup keeps a carried rule over a fresh paraphrase, refresh its last_seen so a lesson the distiller keeps re-teaching (reworded) doesn't decay out. Tests: +DB-policy exclusion, +error classification, +safe-mode relax, +carried last_seen refresh. 502 skill+scoring+benchmark tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
Word-overlap dedup missed real LLM paraphrases: two `avoid` rules on the SAME
failure mode ('Answer The Actual Question' vs 'Close With Recommendation', both
communication_error) share ~0.1 word overlap yet teach the same lesson, so both
installed — wasting a slot and reducing mode coverage.
_same_lesson now treats same non-empty taxonomy as the same lesson unconditionally
(for a 5-rule budget, covering 5 DISTINCT modes beats two rules on one mode); 'do'
rules (no taxonomy) fall back to word overlap at a lowered 0.3 threshold so genuine
rewrites still collapse. Distinct 'do' topics with low overlap are preserved.
Tests: same-mode avoid rules collapse to one; distinct do rules survive. 101 pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
…nd avoid
Taxonomy-collapse fixed same-mode avoid duplicates, but the distiller also emits one
lesson as BOTH a 'do' and an 'avoid' with the same title ('Verify Beyond Green Tests'
appeared twice). The within-kind dedup never compared them.
_same_lesson now treats an identical title as the same lesson across kinds, and
cross-kind rules with different titles collapse only on strong word overlap (>=0.45).
merge_rules dedups the FULL ranked pool before splitting for the good+bad interleave.
Tests: +cross-kind same-title collapse. 102 skill tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b471a2bdf4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # deny bare dotfile names (.env / .ssh) — a lesson like "keep secrets in .env, never | ||
| # commit it" is legitimate advice, and the anonymizer already strips real home paths. | ||
| _OUT_OF_REPO_PATH_RE = re.compile( | ||
| r"(?:^|\s)(?:/Users/|/home/|/etc/|/var/|/root/|~/\.|[A-Za-z]:\\)\S+|\.aws/credentials", re.I |
There was a problem hiding this comment.
Block tilde home paths in rule text
When a distilled rule mentions a non-dotfile home path such as ~/Downloads/client-notes.txt, this hard-deny does not match because it only accepts ~/\.. The later secret/PII gates do not reject ordinary paths, so an untrusted rule can still be installed into global agent context with instructions pointing at arbitrary files under the user's home directory, bypassing the intended out-of-repo path backstop.
Useful? React with 👍 / 👎.
| elif (_store.fingerprint(kept[dup]) in new_fps) and (_store.fingerprint(r) not in new_fps): | ||
| r.last_seen = "" # distiller re-taught this lesson (reworded) -> keep it fresh, don't decay out | ||
| kept[dup] = r # replace a fresh paraphrase with the carried original (no churn) |
There was a problem hiding this comment.
Refresh carried paraphrases seen this run
When an already-installed rule sorts before a freshly distilled paraphrase of the same lesson, this condition is false because kept[dup] is not in new_fps, so the fresh duplicate is discarded without clearing the carried rule's last_seen. This is common for do rules whose fresh support is zero, and it lets a lesson the distiller re-taught keep aging and eventually drop out despite current evidence.
Useful? React with 👍 / 👎.
Mode A — self-improving personal skills, distilled locally
Adds
clawjournal skill: a local-first pipeline that turns a user's own scored sessions into a small, installable skill file for Claude Code and Codex. It inverts the personalized benchmark (test → teach): recurring failures become avoid rules, strong successes/recoveries become do rules. Everything runs on the machine; nothing is uploaded.Pipeline (
clawjournal/skill/)preflight(source scope + confirmed projects + installed backend + TruffleHog) →select(two pools from scored sessions, hold-state-gated, capped at 5) → anonymize + secret-scrub →distill(one agent-CLI call, structured output) →render→ deterministic gate →install(atomic, off-tree:~/.claude/skills/clawjournal-lessons/+ a managed region in~/.codex/AGENTS.md) → durablestorefor week-over-week merge and recurrence telemetry.Invariants preserved
auto_redacted/releasedsessions can feed the AI call.kind + guidance, so renames/rewording never change a rule's identity (dedup, reject, merge stay stable).Quality iteration (in this branch)
title(command-like) as its heading; the full rule is kept in aRule:body line.gpt-5.4for Codex); the 25-call scoring fleet stays on the fast model.--modeloverrides.Correctness fixes folded in
Single-pool partition (no double-booking; mode-rates ≤ 100%), good+bad merge interleave (no drift to all-avoid), enforced configured
--sourcescope end-to-end, and real scan/score orchestration coverage.Tests
+2310 lines, purely additive. New
tests/skill/suite (65 tests) covering select/partition, distill + scrub, render/install, gate, preflight, store + merge, telemetry, orchestration, naming, and the distill default-model.🤖 Generated with Claude Code
https://claude.ai/code/session_01LnzqCuYZudab6cSjAgZvfE