feat: add self-learning system (workflow detection + auto-commands)#160
feat: add self-learning system (workflow detection + auto-commands)#160
Conversation
Detect repeated workflows and procedural knowledge across sessions, automatically creating slash commands and skills when confidence thresholds are met. New files: - scripts/hooks/background-learning: core analysis script (Sonnet) - scripts/hooks/stop-update-learning: Stop hook trigger - src/cli/commands/learn.ts: CLI command (enable/disable/status/list/configure/clear) - tests/learn.test.ts: 28 tests for pure functions Modified files: - init.ts: --learn/--no-learn option, hook registration - uninstall.ts: learning hook cleanup - cli.ts: learn command registration - manifest.ts: learn feature flag - session-start-memory: inject learned behaviors into context - CLAUDE.md, README.md: documentation
Security (P0): - Sanitize model-generated artifact names to prevent path traversal - Replace unquoted heredoc with printf %s to prevent command substitution from model-generated content in artifact file writes - Use grep -F (fixed string) instead of regex grep for ID matching throughout background-learning to prevent regex injection Complexity (P1): - Extract applyConfigLayer helper to eliminate duplicated config parsing - Remove dead catch branches that all resolved to the same fallback - Replace unsafe `as string` cast with String() coercion
Code Review: Self-Learning Feature (PR #160)Overall Assessment: CHANGES_REQUESTED Reviewed 9 reports (9,000+ lines of analysis). The self-learning feature follows established patterns well, but has significant architectural, security, and performance issues that need resolution before merge. CRITICAL FINDINGS (≥80% confidence)🔴 Security - Prompt Injection to Code Execution (95% confidence)
🔴 Architecture - 560-Line God Script (95% confidence)
🔴 Architecture - HookEntry Duplication (95% confidence)
🔴 Performance - O(n*k) Subprocess Spawning (95% confidence)
🔴 Performance - Session-Start Synchronous Jq Spawning (92% confidence)
🔴 TypeScript - Unsafe JSON Cast (85% confidence)
🔴 TypeScript - Untyped Options Parameter (82% confidence)
🔴 Consistency - Manifest Field Type Inconsistency (90% confidence)
🔴 Documentation - Missing .memory/ Tree Entry (92% confidence)
MEDIUM FINDINGS (60-79% confidence, for follow-up PRs)
Summary by Category
Positive observations:
Next steps: Address CRITICAL security findings first, then refactor architecture (extract functions, consolidate types, fix performance), then update documentation and add missing tests. Review by Claude Code via code-review skill. Full analysis at |
| return 0 | ||
| } | ||
|
|
||
| # --- Main --- |
There was a problem hiding this comment.
🔴 CRITICAL ARCHITECTURE: 560-Line Monolithic Script
Main section (164-560) is 397 lines of unstructured logic handling 7+ distinct concerns:
- Locking + config loading
- Daily cap enforcement
- Temporal decay calculation
- Transcript extraction
- LLM invocation
- Observation merge/update
- Artifact creation
- YAML frontmatter writing
Problems:
- Zero unit tests (script is completely untestable as-is)
- Any change to data model requires modifying monolith
- Debugging is extremely difficult
- Cognitive load to understand full flow
Fix:
Extract into named functions with single responsibilities:
apply_temporal_decay()— decay + prune loopinvoke_model()— LLM analysis + response parseprocess_observations()— merge/update observationscreate_artifacts()— write commands/skills
Main section becomes ~20-line orchestrator calling these in sequence.
Alternatively, move JSON-heavy logic to TypeScript CLI where it can be properly typed, tested, and refactored.
| /** | ||
| * The hook entry structure used by Claude Code settings.json. | ||
| */ | ||
| interface HookEntry { |
There was a problem hiding this comment.
🔴 ARCHITECTURE: HookEntry/HookMatcher/Settings Duplication (4th Copy)
These three interfaces are identically defined in:
- learn.ts (this file, lines 11-23)
- ambient.ts (lines 11-23)
- memory.ts (lines 12-24)
- hud.ts (lines 18-24)
This is a DRY violation that scales linearly with each new hook feature. If Claude Code's settings.json schema changes (e.g., new required field on HookEntry), all 4 files must be updated in lockstep — risk of silent divergence.
Fix: Extract to shared module:
// src/cli/utils/settings-types.ts
export interface HookEntry {
type: string;
command: string;
timeout?: number;
}
export interface HookMatcher {
hooks: HookEntry[];
}
export interface Settings {
hooks?: Record<string, HookMatcher[]>;
[key: string]: unknown;
}Then import from this shared module in all four command files.
| if (!trimmed) continue; | ||
|
|
||
| try { | ||
| const parsed = JSON.parse(trimmed) as LearningObservation; |
There was a problem hiding this comment.
🔴 TYPESCRIPT: Unsafe as Cast on Unvalidated JSON
JSON.parse(trimmed) as LearningObservation performs an unsafe type assertion. The subsequent check validates only 3 of 10 required fields (id, type, pattern). A JSONL line like {"id":"obs_1","type":"workflow","pattern":"test"} would pass validation but be accepted as LearningObservation with all other fields (confidence, observations, first_seen, last_seen, status, etc.) as undefined at runtime.
This violates CLAUDE.md principle #9: "Validate at boundaries — Parse, don't validate (Zod schemas)"
Fix: Parse as unknown and use a type guard:
const parsed: unknown = JSON.parse(trimmed);
if (
typeof parsed === 'object' && parsed !== null &&
'id' in parsed && typeof (parsed as Record<string, unknown>).id === 'string' &&
'type' in parsed && (parsed as Record<string, unknown>).type in {'workflow':1,'procedural':1} &&
'pattern' in parsed && typeof (parsed as Record<string, unknown>).pattern === 'string' &&
'confidence' in parsed && typeof (parsed as Record<string, unknown>).confidence === 'number' &&
'status' in parsed && typeof (parsed as Record<string, unknown>).status === 'string'
) {
observations.push(parsed as LearningObservation);
}Or define a proper type guard function.
| .option('--list', 'Show all observations sorted by confidence') | ||
| .option('--configure', 'Interactive configuration wizard') | ||
| .option('--clear', 'Reset learning log (removes all observations)') | ||
| .action(async (options) => { |
There was a problem hiding this comment.
🔴 TYPESCRIPT: Untyped Options Parameter
.action(async (options) => { ... }) receives options as implicitly-typed parameter (Commander infers as Record<string, any>). This loses compile-time safety — typos like options.enabl would silently be undefined instead of causing an error.
The init.ts command in this same PR defines an explicit InitOptions interface and passes it via .action(async (options: InitOptions) => ...), which is the correct pattern.
Fix: Define explicit options interface and type the parameter:
interface LearnOptions {
enable?: boolean;
disable?: boolean;
status?: boolean;
list?: boolean;
configure?: boolean;
clear?: boolean;
}
// ...
.action(async (options: LearnOptions) => {| teams: boolean; | ||
| ambient: boolean; | ||
| memory: boolean; | ||
| learn?: boolean; |
There was a problem hiding this comment.
🔴 CONSISTENCY: Manifest Field Type Asymmetry
learn field is declared as optional (learn?: boolean) while peer features teams, ambient, and memory are all required (boolean). However, the code always writes this field (init.ts:855 learn: learnEnabled), making it optional creates an inconsistency where future consumers must null-check learn but not the others.
This was likely done for backward compatibility (like hud?), but since learn is brand-new and init always writes a value, there's no need for optionality.
Fix: Make it required to match peer features:
features: {
teams: boolean;
ambient: boolean;
memory: boolean;
learn: boolean; // not optional — always written by init
hud?: boolean; // remains optional for backwards compat
}Note: For backward compatibility with existing manifests lacking learn, default it in readManifest(): data.features.learn = data.features.learn ?? false;
| NOW_EPOCH=$(date +%s) | ||
| TEMP_FILE="$LEARNING_LOG.tmp" | ||
| > "$TEMP_FILE" | ||
| while IFS= read -r line; do |
There was a problem hiding this comment.
🔴 PERFORMANCE: O(n*k) Subprocess Spawning in Loops
Per-line while-read loop spawns multiple jq subprocesses per entry. With 100 entries (the cap), this means 300-600 forked processes. Each jq invocation:
- Forks a process
- Parses entire JSON file from scratch
- Extracts single field
- Serializes result
- Exits
Impact: Measurable latency for decay pass (estimated 2-5 seconds on macOS). Degrades further as log approaches 100 entries.
Fix: Replace per-line jq invocations with single jq -s (slurp) pass:
NOW_EPOCH=$(date +%s)
jq -c --argjson now "$NOW_EPOCH" '
map(
(.last_seen // "" | if . != "" then
(now - (. | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime)) / 86400 / 30 | floor
else 0 end) as $periods |
if $periods > 0 then
.confidence *= (pow(0.9; $periods)) |
if .confidence < 0.1 then empty else . end
else . end
)
' "$LEARNING_LOG" > "$TEMP_FILE"
mv "$TEMP_FILE" "$LEARNING_LOG"This eliminates all per-line subprocess spawning and replaces many full-file reads with one.
| LEARNED_SKILLS="" | ||
|
|
||
| while IFS= read -r line; do | ||
| if ! echo "$line" | jq -e . >/dev/null 2>&1; then continue; fi |
There was a problem hiding this comment.
🔴 PERFORMANCE: Synchronous O(n) Jq Spawning During Session Startup
New learned-behaviors section reads learning-log.jsonl in two separate while-read loops (lines 142-175 and 196-219), each spawning 4-6 jq subprocesses per line. This runs synchronously during SessionStart hook, directly adding 1-3 seconds to every session launch.
Impact: Direct latency hit on session startup perceived by user.
Fix: Merge both loops into a single jq -s pass that extracts all needed data at once. For example:
PARSED=$(jq -s --argjson cutoff "$LAST_NOTIFIED" '
[.[] | select(.status == "created" and .artifact_path != null)] |
{
commands: [.[] | select(.type == "workflow") | {name: (.artifact_path | split("/") | last | rtrimstr(".md")), conf: .confidence}],
skills: [.[] | select(.type != "workflow") | {name: (.artifact_path | capture("learned-(?<n>[^/]*)") // {n:""} | .n), conf: .confidence}],
new: [.[] | select(.last_seen != null) | select((.last_seen | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime) > $cutoff)]
}
' "$LEARNING_LOG" 2>/dev/null)This eliminates all per-line subprocess spawning and replaces two full-file scans with one.
| if [ "$OBS_STATUS" != "ready" ]; then | ||
| log "Skipping artifact for $ART_OBS_ID (status: $OBS_STATUS, need: ready)" | ||
| continue | ||
| fi |
There was a problem hiding this comment.
🔴 CRITICAL SECURITY: Prompt Injection to Code Execution
Model-generated artifact content is written directly to executable slash commands and skills (lines ~500-560). These are loaded in future sessions where they execute as instructions. Combined with --dangerously-skip-permissions (line 389), this creates an indirect prompt injection vector.
Attack path:
- User session contains attacker-controlled text (copy-pasted from malicious README, etc.)
- Background learning analyzes transcript with full permissions (
--dangerously-skip-permissions) - Model generates malicious artifact content
- Content written to
.claude/commands/learned/*.mdas executable slash command - In next session, the malicious command loads and influences Claude's behavior
Critical fixes needed:
- Remove
--dangerously-skip-permissionsor replace with restricted tool set (--allowedTools "") - Validate artifact content before write:
- Reject patterns: 'dangerously-skip-permissions', '.ssh', '.aws', 'curl', 'eval'
- Enforce max length (e.g., 5000 chars)
- Consider requiring user confirmation before creating artifacts
See detailed security analysis at .docs/reviews/feat-self-learning/security.md
| EXISTING_OBS="" | ||
| if [ -f "$LEARNING_LOG" ]; then | ||
| # Apply temporal decay first | ||
| NOW_EPOCH=$(date +%s) |
There was a problem hiding this comment.
🔴 CRITICAL ARCHITECTURE: 560-Line Monolithic Script
This file concentrates 7+ distinct responsibilities into an unstructured 397-line main section (lines 164-560):
- Locking + config loading
- Daily cap enforcement
- Temporal decay calculation
- Transcript extraction
- LLM invocation
- Observation merge/update
- Artifact creation
- YAML frontmatter writing
Problems:
- Zero unit tests — script is completely untestable as-is
- Any data model change requires modifying monolith
- High cognitive load; difficult to debug
- Future maintainers must understand entire 560-line flow
Required fix:
Extract into 4 named functions with single responsibilities:
apply_temporal_decay()— decay + pruning logicinvoke_model()— LLM analysis + response parsingprocess_observations()— merge/update observationscreate_artifacts()— write commands/skills to disk
Main section becomes ~20-line orchestrator calling these in sequence.
Alternatively, move JSON-heavy logic to TypeScript CLI where it can be properly typed, tested, and refactored.
| LAST_EPOCH=$(date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LAST_SEEN" +%s 2>/dev/null \ | ||
| || date -d "$LAST_SEEN" +%s 2>/dev/null \ | ||
| || echo "$NOW_EPOCH") | ||
| DAYS_ELAPSED=$(( (NOW_EPOCH - LAST_EPOCH) / 86400 )) |
There was a problem hiding this comment.
🔴 PERFORMANCE: O(n*k) Subprocess Spawning
This per-line while-read loop spawns 3-6 jq subprocesses per entry (validity check, field extraction, update). With 100 entries (the cap), this means 300-600 forked processes. Each jq:
- Forks a new process
- Parses entire JSON file from scratch
- Extracts single field or transforms record
- Serializes result
- Exits
Impact: 2-5 second latency for decay pass alone. Degrades further as log approaches 100 entries.
Fix: Replace per-line jq invocations with single jq -s (slurp) pass processing all entries at once:
NOW_EPOCH=$(date +%s)
jq -c --argjson now "$NOW_EPOCH" '
map(
((.last_seen // "") | if . != "" then
(now - (. | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime)) / 86400 / 30 | floor
else 0 end) as $periods |
if $periods > 0 then
.confidence *= (pow(0.9; $periods)) |
if .confidence < 0.1 then empty else . end
else . end
)
' "$LEARNING_LOG" > "$TEMP_FILE"
mv "$TEMP_FILE" "$LEARNING_LOG"This eliminates all per-line subprocess spawning and processes file in single pass.
| type: string; | ||
| command: string; | ||
| timeout?: number; | ||
| } |
There was a problem hiding this comment.
🔴 ARCHITECTURE: Interface Duplication (4th Copy)
HookEntry, HookMatcher, and Settings interfaces are identically duplicated across 4 files:
- learn.ts (this file, lines 11-23)
- ambient.ts (lines 11-23)
- memory.ts (lines 12-24)
- hud.ts (lines 18-24)
This DRY violation scales linearly with each new hook-based feature. If Claude Code's settings.json schema changes, all 4 files must be synchronized — risk of silent divergence.
Fix: Extract to shared module:
// src/cli/utils/settings-types.ts
export interface HookEntry {
type: string;
command: string;
timeout?: number;
}
export interface HookMatcher {
hooks: HookEntry[];
}
export interface Settings {
hooks?: Record<string, HookMatcher[]>;
[key: string]: unknown;
}Then import in all 4 command files.
| if (!trimmed) continue; | ||
|
|
||
| try { | ||
| const parsed = JSON.parse(trimmed) as LearningObservation; |
There was a problem hiding this comment.
🔴 TYPESCRIPT: Unsafe as Cast on Unvalidated JSON
JSON.parse(trimmed) as LearningObservation performs unsafe type assertion. The subsequent check validates only 3 of 10 required fields (id, type, pattern). A JSONL line like {"id":"obs_1","type":"workflow","pattern":"test"} would pass the check but be accepted as full LearningObservation with all other required fields (confidence, observations, first_seen, last_seen, status, etc.) as undefined at runtime.
This violates CLAUDE.md principle #9: "Validate at boundaries — Parse, don't validate (Zod schemas)"
Fix: Use proper type guard:
const parsed: unknown = JSON.parse(trimmed);
if (
typeof parsed === 'object' && parsed !== null &&
'id' in parsed && typeof (parsed as Record<string, unknown>).id === 'string' &&
'type' in parsed && (parsed as Record<string, unknown>).type in {'workflow':1,'procedural':1} &&
'pattern' in parsed && typeof (parsed as Record<string, unknown>).pattern === 'string' &&
'confidence' in parsed && typeof (parsed as Record<string, unknown>).confidence === 'number' &&
'status' in parsed && typeof (parsed as Record<string, unknown>).status === 'string'
) {
observations.push(parsed as LearningObservation);
}| const hasFlag = options.enable || options.disable || options.status || options.list || options.configure || options.clear; | ||
| if (!hasFlag) { | ||
| p.intro(color.bgYellow(color.black(' Self-Learning '))); | ||
| p.note( |
There was a problem hiding this comment.
🔴 TYPESCRIPT: Untyped Options Parameter
.action(async (options) => { ... }) receives implicitly-typed options (Commander infers as Record<string, any>). This loses compile-time safety — typos like options.enabl would silently be undefined instead of causing an error.
The init.ts command defines explicit InitOptions interface, which is the correct pattern.
Fix: Define interface and type the parameter:
interface LearnOptions {
enable?: boolean;
disable?: boolean;
status?: boolean;
list?: boolean;
configure?: boolean;
clear?: boolean;
}
.action(async (options: LearnOptions) => {| teams: boolean; | ||
| ambient: boolean; | ||
| memory: boolean; | ||
| learn?: boolean; |
There was a problem hiding this comment.
🔴 CONSISTENCY: Manifest Field Type Inconsistency
learn field is declared optional (learn?: boolean) while peers teams, ambient, memory are required. Code always writes it (init.ts:855), so making it optional creates asymmetry where consumers must null-check learn but not others.
Fix: Make required to match peers:
features: {
teams: boolean;
ambient: boolean;
memory: boolean;
learn: boolean; // not optional
hud?: boolean; // remains optional for backwards compat
}For backward compatibility with existing manifests lacking learn, default it in readManifest().
| ARTIFACT_PATH=$(echo "$line" | jq -r '.artifact_path // ""') | ||
|
|
||
| # Only show created artifacts | ||
| if [ "$STATUS" != "created" ]; then continue; fi |
There was a problem hiding this comment.
🔴 PERFORMANCE: Synchronous Session-Start Latency
New learned-behaviors section reads learning-log.jsonl in two separate while-read loops (lines 140-171 and 193-215), each spawning 4-6 jq subprocesses per line. This runs synchronously during the SessionStart hook, directly adding 1-3 seconds to every session launch with 100 observations.
Unlike background-learning which runs detached after session stop, session-start hooks are blocking — they run before the user sees the Claude Code prompt.
Impact: Noticeable latency spike perceived by user on every /clear and session startup.
Fix: Merge both loops into single jq -s (slurp) pass:
PARSED=$(jq -s --argjson cutoff "$LAST_NOTIFIED" '
[.[] | select(.status == "created" and .artifact_path != null)] |
{
commands: [.[] | select(.type == "workflow") | {name: (.artifact_path | split("/") | last | rtrimstr(".md")), conf: .confidence}],
skills: [.[] | select(.type != "workflow") | {name: (.artifact_path | capture("learned-(?<n>[^/]*)") // {n:""} | .n), conf: .confidence}],
new: [.[] | select(.last_seen != null) | select((.last_seen | strptime("%Y-%m-%dT%H:%M:%SZ") | mktime) > $cutoff)]
}
' "$LEARNING_LOG" 2>/dev/null)This eliminates all per-line subprocess spawning and replaces two full-file scans with one.
Summary
Implements intelligent workflow detection and auto-command generation for DevFlow. The system learns from repeated user patterns (within and across sessions) to automatically create skills and commands that reduce friction for recurring tasks.
Changes
New Files
scripts/hooks/background-learning- Detects workflow patterns from session transcriptsscripts/hooks/stop-update-learning- Hook to persist learning data at session endscripts/hooks/session-start-memory- Hook to inject learned patterns + base memorysrc/cli/commands/learn.ts- CLI command for learning system managementtests/learn.test.ts- Learning system unit tests (28 tests)Modified Files
src/cli/cli.ts- Register/learncommandsrc/cli/commands/init.ts- Enable learning hooks during setupsrc/cli/commands/uninstall.ts- Remove learning hooks on uninstallsrc/cli/utils/manifest.ts- Add learning metadata trackingCLAUDE.md- Document learning system for developersREADME.md- User-facing learning feature docsKey Features
.memory/learned-patterns.jsonLEARNED:prefix for transparencyBreaking Changes
None.
Testing
/learn status,/learn list, learning hooks trigger correctlySecurity Fixes (from self-review)
.claude/scopeRelated Issues
Closes #161