Claude has both regular project transcripts and nested subagent transcripts, plus an existing append-only incremental parser. Moving it behind a concrete provider keeps those source shapes and optional incremental capability explicit at the provider boundary.

The provider preserves recursive project discovery, symlinked project directories, standard and subagent raw-ID lookup, changed-path classification, content hashing, project-name normalization, excluded-session reporting, relationship inference, and incremental append parsing for linear JSONL growth.
fix(parser): preserve claude provider edge events
Claude provider sync must distinguish true append idleness from files that were truncated or replaced, and watcher classification must still identify deleted primary and subagent transcripts after the file is gone. Otherwise provider-path sync can retain stale messages or miss removals.
Return full-parse status for truncated incremental inputs, add missing-path classification for valid Claude source shapes, and make raw subagent lookup follow symlinked project directories like discovery does. This branch now opts Claude into shadow comparison.
Validation: go test -tags "fts5" ./internal/parser -run 'Test(ClaudeProvider|FindClaudeSourceFile|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check
fix(sync): replace claude content after file rewrites
Claude incremental parsing is append-oriented, so any fallback caused by truncation or file replacement must replace persisted messages instead of flowing through the append-preserving write path. Otherwise stale higher ordinals or stale tool rows can survive a full parse fallback.
The provider now marks truncated incremental inputs as force-replace, and the legacy engine path carries forceReplace when file identity changes or the file shrinks before falling back to a full parse.
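A minimal sketch of the decision described above, assuming the previous sync recorded size and inode/device for the file. The fileState and parsePlan types and the planParse function are hypothetical names for this example; the real engine tracks more state (data version, ordinals, counters).

```go
package main

import "fmt"

// fileState is a hypothetical snapshot of a transcript file, either as
// recorded by the previous sync or as observed now.
type fileState struct {
	Size   int64
	Inode  uint64
	Device uint64
}

// parsePlan is a hypothetical description of how the next sync reads the file.
type parsePlan struct {
	Incremental  bool  // resume at Offset and append new rows
	ForceReplace bool  // full parse that replaces persisted rows
	Offset       int64 // byte offset to resume from when Incremental is true
}

// planParse chooses between an append-only read and a replacing full parse.
func planParse(stored, current fileState) parsePlan {
	switch {
	case stored.Inode != current.Inode || stored.Device != current.Device:
		// File identity changed: the file was replaced, not appended to.
		return parsePlan{ForceReplace: true}
	case current.Size < stored.Size:
		// The file shrank (including to zero bytes): it was truncated.
		return parsePlan{ForceReplace: true}
	default:
		// Same identity and no shrinkage: read only bytes past the old size.
		return parsePlan{Incremental: true, Offset: stored.Size}
	}
}

func main() {
	prev := fileState{Size: 100, Inode: 7, Device: 1}
	fmt.Printf("%+v\n", planParse(prev, fileState{Size: 160, Inode: 7, Device: 1}))
	fmt.Printf("%+v\n", planParse(prev, fileState{Size: 40, Inode: 7, Device: 1}))
	fmt.Printf("%+v\n", planParse(prev, fileState{Size: 160, Inode: 9, Device: 1}))
}
```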
Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestClaudeProviderParseIncremental|TestIncrementalSync_Claude(FileReplaced|TruncatedFileReplacesStoredMessages|SameSizeFileReplaceUsesFullParse|MidStreamSplitFallsBackToFullParse|AgentIDFallbackUpdatesStoredToolCall)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check
fix(sync): replace claude same-size rewrites
A same-size rewrite can reach the full-parse fallback when the normal skip check did not skip the file, which means the content changed even though the byte count did not. That fallback must replace persisted rows, or stale higher ordinals and tool rows can survive the parse.
The regression rewrites a Claude file in place to the same byte length with fewer logical messages and verifies the stale assistant row is deleted.
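The failure mode can be shown with a small in-memory sketch. The rows type and writeParsed function are hypothetical stand-ins for the persisted message table and the two write paths; they only illustrate why an append-preserving write leaves a stale higher ordinal behind.

```go
package main

import "fmt"

// rows maps a message ordinal to its stored text (a stand-in for persisted rows).
type rows map[int]string

// writeParsed applies a parse result to stored rows. With forceReplace false it
// keeps existing rows and upserts the parsed ordinals (the append-preserving
// path). With forceReplace true it discards everything stored first.
func writeParsed(stored rows, parsed []string, forceReplace bool) rows {
	out := rows{}
	if !forceReplace {
		for ord, text := range stored {
			out[ord] = text
		}
	}
	for ord, text := range parsed {
		out[ord] = text
	}
	return out
}

func main() {
	stored := rows{0: "user: hi", 1: "assistant: hello", 2: "assistant: stale"}
	// The file was rewritten in place and now holds two logical messages.
	rewritten := []string{"user: hi", "assistant: hello again"}

	fmt.Println(len(writeParsed(stored, rewritten, false))) // 3: ordinal 2 survives
	fmt.Println(len(writeParsed(stored, rewritten, true)))  // 2: stale row is gone
}
```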
Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesClaudeLegacyParser|TestClaudeProviderParseIncremental|TestIncrementalSync_Claude(FileReplaced|TruncatedFileReplacesStoredMessages|SameSizeFileReplaceUsesFullParse|SameSizeInPlaceRewriteClearsStaleRows|MidStreamSplitFallsBackToFullParse|AgentIDFallbackUpdatesStoredToolCall)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check
test(sync): compare claude shadow parity
Claude is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseClaudeSessionWithExclusions.
The fixture exercises the project-directory source shape and verifies session, message, usage, exclusion, and data-version planning parity while preserving provider-computed file hashes.
Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesClaudeLegacyParser -count=1
test(sync): cover claude provider usage exclusions
Roborev job 2721 caught that the Claude shadow parity fixture only compared a plain exchange, so it did not prove provider parity for per-message token usage or /usage-only session exclusions.
Add assistant message usage metadata to the normal fixture and a separate /usage-only source discovered by the provider, then assert non-empty token metadata and excluded IDs against the legacy parser.
Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesClaudeLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check
refactor(parser): fold claude into provider
Move Claude source discovery, lookup, full parse, exclusion handling,
and append-only incremental parse ownership onto the concrete
claudeProvider and delete the package-level DiscoverClaudeProjects,
FindClaudeSourceFile, ParseClaudeSessionFrom, and
ParseClaudeSessionWithExclusions free functions. The discover and
find-source bodies stay as provider-neutral helpers
(ClaudeProjectSessionFiles, claudeFindSourceFile) and the parse bodies
become claudeParseWithExclusions and claudeParseSessionFrom; the public
ParseClaudeSession wrapper and the Cowork parser (which reuses the
Claude transcript format) call the shared helper, so no provider file
references a legacy Discover/Find/Parse entrypoint.
Make Claude provider-authoritative and drop its legacy sync dispatch:
the classifyOnePath Claude block, the processFile case arm, and the
processClaude method. Source classification, project resolution, and
exclusion handling are reproduced through the provider's changed-path
and parse paths. The provider's SourcesForChangedPath also reproduces
the legacy "classify despite a transient stat error" behavior so a
changed path under a momentarily unreadable parent is not dropped.
Wire the provider-authoritative engine path to preserve Claude's
DB-aware single-file semantics, which a stateless provider cannot do
alone:
- tryProviderIncrementalAppend drives the provider's ParseIncremental
  through the shared tryIncrementalJSONL bookkeeping (session lookup,
  data-version and inode/device identity guards, ordinal resume,
  cross-sync split detection, cumulative counters, and forceReplace
  fallback), so append-only syncs keep the stored file hash and append
  rows instead of recomputing and rewriting.
- providerSingleSessionFresh reproduces the shouldSkipFile gate so an
  unchanged, already-synced session is skipped instead of re-parsed
  every full sync and a single-session resync does not reapply a
  worktree project mapping to an unchanged file.
- stampProviderFileIdentity stamps inode/device on parsed results so
  the incremental path can later detect an atomic file replacement.
- processProviderFile honors a caller-supplied file.Project as the
  source ProjectHint when no explicit ProviderSource was given, so a
  SyncSingleSession does not revert a user's project override.
The engine's expandClaudeDuplicateCandidates and
dedupeClaudeDiscoveredFiles stay as provider-neutral engine-level dedup
plumbing; expansion now enumerates via ClaudeProjectSessionFiles. The
duplicate-candidate expansion and session-ID dedup/precedence behavior
is unchanged.
Because dropping the Claude DiscoverFunc would otherwise remove Claude
from surfaces that gate on DiscoverFunc != nil, parse-diff (engine and
CLI flag validation) and the SSH remote resolve script now also include
file-based agents that have left legacy-only mode through the provider
facade, restoring Claude (and the other already-folded agents) to those
surfaces.
Drop the Claude AgentDef DiscoverFunc/FindSourceFunc hooks, set its
provider migration mode to ProviderAuthoritative, remove
claude_provider.go from the pending shim scan list, replace the shadow
baseline test with provider-API coverage plus a guard asserting the
four legacy entrypoints stay gone, and move the generic
shadow-mechanism caller tests onto the still-legacy Cowork agent, since
Claude no longer has a legacy process arm to observe in shadow.
refactor(parser): fold ParseClaudeSession onto the Claude provider
Delete the ParseClaudeSession free function and route its only production
caller (the session upload handler) plus the test suite through the Claude
provider's new ParseUploadedTranscript method, exposed via the
ClaudeUploadParser interface. Uploads live outside any configured root, so
the method parses the staged transcript directly under the caller-supplied
project. That project stays authoritative rather than being overridden by
the transcript's recorded cwd, matching the prior upload behavior and
unlike the discovered-session Parse path.
Unexport ClassifyClaudeSystemMessage to classifyClaudeSystemMessage; it is
a Claude-internal classifier with no callers outside the package. Both
removals clear the last provider-specific legacy parse/classify entrypoints
this branch owned.
fix(sync): skip fresh claude before fingerprinting
The Claude provider migration preserved DB freshness skipping, but only after provider fingerprinting had already hashed the whole transcript. That lost the legacy cheap size/mtime/data-version gate for unchanged files.

Run the single-session freshness check before provider fingerprinting, and pass the computed fingerprint into incremental parsing so truncation detection can distinguish appended files from zero-byte rewrites. Zero-byte truncation now forces a full replacement parse instead of reporting no new data.

Validation: go test -tags "fts5" ./internal/parser -run 'TestClaudeProviderParseIncremental(Truncated|EmptyTruncation)NeedsFullParse' -count=1; go test -tags "fts5" ./internal/sync -run 'TestIncrementalSync_ClaudeAppend|TestProcessFileProviderAuthoritativeSkipsFreshClaudeBeforeFingerprint' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
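The ordering fix in this commit can be sketched as follows: the cheap size/mtime/data-version comparison runs first, and the expensive fingerprint is computed only when that gate fails. The syncState type and the isFresh and syncOne functions are hypothetical names for this example, and the fingerprint callback stands in for hashing the whole transcript.

```go
package main

import "fmt"

// syncState is a hypothetical record of the cheap metadata stored at last sync.
type syncState struct {
	Size        int64
	MtimeNs     int64
	DataVersion int
}

// isFresh reports whether the observed metadata matches what was stored.
func isFresh(stored, observed syncState) bool {
	return stored == observed
}

// syncOne skips a fresh file before calling fingerprint. It returns the
// fingerprint (empty when skipped) and whether the file was skipped.
func syncOne(stored *syncState, observed syncState, fingerprint func() string) (string, bool) {
	if stored != nil && isFresh(*stored, observed) {
		return "", true // unchanged: never pay for hashing
	}
	return fingerprint(), false
}

func main() {
	hashCalls := 0
	fingerprint := func() string {
		hashCalls++
		return "placeholder-hash" // stand-in for a real content hash
	}

	stored := &syncState{Size: 100, MtimeNs: 1, DataVersion: 3}

	_, skipped := syncOne(stored, syncState{Size: 100, MtimeNs: 1, DataVersion: 3}, fingerprint)
	fmt.Println(skipped, hashCalls) // true 0

	_, skipped = syncOne(stored, syncState{Size: 140, MtimeNs: 2, DataVersion: 3}, fingerprint)
	fmt.Println(skipped, hashCalls) // false 1
}
```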
Claude now uses a concrete provider for regular project transcripts and nested subagent transcripts. The provider keeps recursive project discovery, symlinked project directories, standard and subagent lookup, changed-path classification, content hashing, project normalization, excluded-session reporting, and relationship inference.
The provider also exposes Claude's existing incremental append parser as an optional provider capability so linear JSONL growth can continue to avoid full reparses while full-parse fallback remains available for DAG or row-rewrite cases.