feat: session-scoped file tracking via PostToolUse hooks (#62) #63
Ignore .claude/sessions/, .kata/verification-evidence/, and eval-transcripts/ — these are generated at runtime and should not be tracked. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add PostToolUse hook that tracks which files each session modifies via an append-only edits.jsonl log. Scope committed/feature_tests_added stop conditions and task-evidence warnings to only consider session-owned files.

Key changes:
- New src/tracking/edits-log.ts module (appendEdit, readEditsSet, baseline)
- handlePostToolUse in hook.ts for Edit/Write/NotebookEdit/Bash tracking
- Bash mutation detection via safe-list → suspicious-regex → git-status diff
- Baseline snapshot on kata enter to exclude pre-existing dirty files
- Session-scoped checkGlobalConditions and checkFeatureTestsAdded
- Advisory warning for out-of-scope dirty files
- PostToolUse hook registration in settings.json via setup.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
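For readers unfamiliar with the append-only JSONL pattern described in this commit, the following is a minimal sketch of the data layer. The entry shape (`file`, `tool`, `ts`) and the helper names `serializeEdit` and `parseEditsSet` are assumptions for illustration; the actual `appendEdit` and `readEditsSet` signatures in src/tracking/edits-log.ts are not shown in this PR.

```typescript
// Hypothetical entry shape; the real edits.jsonl schema is not shown in this PR.
interface EditEntry {
  file: string; // path relative to the repo root
  tool: string; // e.g. "Edit", "Write", "NotebookEdit", "Bash"
  ts: string;   // ISO-8601 timestamp
}

// Serialize one entry as a single newline-terminated JSONL line.
function serializeEdit(entry: EditEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Collapse raw log text into the set of distinct file paths.
// Malformed lines are skipped so one bad write cannot poison the whole log.
function parseEditsSet(logText: string): Set<string> {
  const files = new Set<string>();
  for (const line of logText.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed = JSON.parse(line) as Partial<EditEntry>;
      if (typeof parsed.file === "string") files.add(parsed.file);
    } catch {
      // Not valid JSON (or not an object): ignore this line.
    }
  }
  return files;
}
```

An on-disk version would presumably append each serialized line with something like Node's `fs.appendFileSync` and read the file back before calling the parser. Because entries are only ever appended, duplicates are expected and are deduplicated at read time by the `Set`.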
Fix 17 pre-existing test failures caused by tests writing session state and config to .claude/ paths while runtime code expects .kata/ paths. Also fix schema validation for agent expansion and add missing stage fields to stop-hook-test template. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bun does not honor `process.exitCode = undefined` — once set to 1, it stays latched. Use `process.exitCode = 0` instead. Also fix missing exitCode destructuring in enter.test.ts rejects-unknown-mode test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
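The reset described above can be wrapped in a small test helper. This is a sketch only: `captureExitCode` is a hypothetical name, not a helper from this repository, and it assumes the command under test signals failure by assigning `process.exitCode`.

```typescript
// Hypothetical test helper: run a command handler and report the exit code it set.
// Resets with an explicit 0 rather than undefined, since the commit notes that
// Bun keeps the previous value when undefined is assigned.
function captureExitCode(run: () => void): number {
  process.exitCode = 0;
  try {
    run();
    const code = process.exitCode;
    return typeof code === "number" ? code : 0;
  } finally {
    // Leave a clean slate for the next test.
    process.exitCode = 0;
  }
}
```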
…hooks Replaces legacy mode-gate/task-deps/task-evidence entries with single pre-tool-use handler. Adds PostToolUse hook for session file tracking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dedup

- Use parseGitStatusPaths in hook.ts evidence check (was using l.slice(3))
- Remove unused readBaseline from can-exit.ts and hook.ts evidence check
- Extract captureBaseline helper in enter.ts (was duplicated in two places)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…over test files When session-scoped filtering produces an empty test file set (e.g., tests were written by agents before PostToolUse was registered), fall back to the unfiltered list rather than failing the check. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
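The fallback described in this commit can be sketched as a small pure function. `scopeTestFiles` and its parameters are hypothetical names chosen for illustration; the real check lives in checkFeatureTestsAdded, whose code is not shown here.

```typescript
// Keep only the test files this session touched; if that leaves nothing
// (e.g. the files were written before PostToolUse tracking was registered),
// fall back to the unfiltered list instead of failing the check.
function scopeTestFiles(allTestFiles: string[], sessionFiles: Set<string>): string[] {
  const scoped = allTestFiles.filter((file) => sessionFiles.has(file));
  return scoped.length > 0 ? scoped : allTestFiles;
}
```

The trade-off is that an empty session set makes the check behave as it did before session scoping, so it can over-count but should not produce a false failure.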
Verification Report: VP execution for issue #62
Source: VP step results
Regression analysis
Remaining 12 failures (all pre-existing, out of scope)
Assessment
Evidence: |
…iling newlines only

`git status --porcelain` emits a leading space for worktree-only modifications (index untouched), e.g. " M README.md". Four call sites used `.trim()` on the full output, which stripped that leading space from the first line. Combined with `parseGitStatusPaths`' `line.slice(3)`, this corrupted the first character of the first dirty file's path, e.g. baseline captured "EADME.md" instead of "README.md".

Fix: replace `.trim()` with `.replace(/\n+$/, '')` at all four sites (baseline capture, scoped committed check, task-evidence pre-check, Bash pre/post snapshots). This strips trailing newlines from execSync output without eating the leading space of the first porcelain line.

Add two regression tests documenting worktree-only modification/deletion status lines so callers' expected input shape is guarded.

Discovered during verify-mode e2e of PR #63 against issue #62.
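The failure mode is easy to reproduce in isolation. In the sketch below, `parseGitStatusPaths` is a simplified stand-in that only performs the `line.slice(3)` step named in the commit message (the real helper may handle more cases, such as renames), and `stripTrailingNewlines` is a hypothetical name for the replacement expression.

```typescript
// Simplified stand-in: each porcelain line is "XY <path>", so the path starts at index 3.
function parseGitStatusPaths(output: string): string[] {
  return output
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => line.slice(3));
}

// The fix: drop trailing newlines only, leaving leading whitespace intact.
function stripTrailingNewlines(output: string): string {
  return output.replace(/\n+$/, "");
}

// Worktree-only modification: X is a space, Y is "M".
const raw = " M README.md\n?? notes.txt\n";

// Old behavior: trim() eats the leading space, shifting the first line by one.
const buggy = parseGitStatusPaths(raw.trim());
// buggy is ["EADME.md", "notes.txt"]

// Fixed behavior: the first line keeps its two-character status field.
const fixed = parseGitStatusPaths(stripTrailingNewlines(raw));
// fixed is ["README.md", "notes.txt"]
```

Only the first line is affected, because `.trim()` acts on the ends of the whole string. That explains why the bug appears only when the first dirty entry is a worktree-only change.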
…n test

Review of 0c687d7 found two concrete gaps:

1. **Missed call site**: handleTaskEvidence (src/commands/hook.ts:409) still used .trim() on porcelain output. While this site only uses the count (not parseGitStatusPaths), leaving it inconsistent invites the same bug to resurface if paths are later parsed. Fixed for consistency.
2. **Weak regression test**: the prior commit's tests only exercised the leaf parseGitStatusPaths helper, which was always correct; the bug was in the caller's .trim() corrupting input. Reintroducing .trim() at any fix site would not fail the prior tests. Added a real integration test that builds a git repo, makes a worktree-only modification (emitting " M README.md"), runs kata enter, and asserts baseline.json records "README.md". It was verified to fail with the old .trim() and pass with the fix.
Verify-Mode Follow-up: E2E found a real bug, fixed
The prior report marked VP-9 as "no regressions introduced." That was wrong: there was no end-to-end check, only unit tests. Running an e2e manually surfaced a real regression introduced by this PR:
Bug
Reproduction (before fix):
Fix
Two commits, reviewed by external review-agent with APPROVE verdict:
Post-fix state
Outstanding
The VP's test-coverage-gap steps (3, 5, 6, 7, 8) remain open: targeted unit tests for baseline integration, feature_tests_added session scope, bash mutation detection, evidence scoping, and advisory messaging are not yet written. The new integration test closes the VP-3 gap substantively. Recommend a follow-up issue for the others.
Verdict: safe to merge.
Summary
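- Add PostToolUse hook that tracks which files each session modifies via an append-only `edits.jsonl` log
- Baseline snapshot on `kata enter` to exclude pre-existing dirty files
- Scope `committed` and `feature_tests_added` stop conditions to only consider session-owned files
- Fix pre-existing test failures (`.claude/` → `.kata/` path migration, bun `process.exitCode` compat)

Test plan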
- `bun run typecheck` passes
- `bun test src/tracking/edits-log.test.ts`: 13 new tests for data layer
- `bun test src/commands/`: 0 failures (was 17 pre-existing)
- `kata enter task` creates `baseline.json` in session dir
- `edits.jsonl` has entry after PostToolUse fires

Closes #62
🤖 Generated with Claude Code