Codex sessions have an append-only JSONL transcript plus a session_index.jsonl title sidecar. Moving Codex behind a concrete provider keeps that composite source identity and incremental append capability explicit at the provider boundary.
The provider preserves dated and archived discovery, live-over-archived lookup, shallow index watch planning, index-event classification, index-aware mtimes, source hashing, full parse output, and append parsing with full-parse fallback signals.
fix(parser): preserve codex provider sidecar semantics
Codex index changes are part of source freshness, so the provider cannot treat unchanged transcript size as no new data when the index mtime drove the fingerprint. The provider also needs to keep legacy live-over-archived UUID behavior and classify removed transcript paths syntactically.
Index events now conservatively refresh sibling Codex sources because this provider layer has no DB state for title diffing; the sync engine can still apply its DB-aware filtering before provider dispatch is fully authoritative.
Validation: go test -tags "fts5" ./internal/parser -run TestCodexProvider -count=1; go vet ./...; git diff --check. go test -tags "fts5" ./internal/parser -count=1 currently fails on TestProviderMigrationModes because inherited lower provider branches such as claude still need their branch-local shadow opt-ins.
fix(parser): make codex provider sidecars authoritative
The Codex provider could not safely infer sidecar-only freshness from a single max mtime. Rather than advertise append-only parsing with incomplete sidecar state, keep provider-authoritative Codex parses on the full-parse path until the facade can model sidecar dirtiness explicitly.
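To illustrate why one max mtime cannot carry sidecar state, here is a minimal sketch. The `fingerprint` type and the `canAppend` helper are hypothetical names invented for this example and are not the provider's API. It only shows that two different situations collapse to the same size and max mtime once the index mtime is folded in.

```go
package main

import "fmt"

// fingerprint is a hypothetical composite fingerprint that keeps the
// transcript and the session_index.jsonl sidecar as separate fields.
type fingerprint struct {
	TranscriptSize  int64
	TranscriptMtime int64 // unix nanoseconds
	IndexMtime      int64 // unix nanoseconds
}

// maxMtime collapses both mtimes into a single value, which is the
// lossy representation the commit message warns about.
func (f fingerprint) maxMtime() int64 {
	if f.IndexMtime > f.TranscriptMtime {
		return f.IndexMtime
	}
	return f.TranscriptMtime
}

// canAppend reports whether parsing only the appended bytes would be
// safe: the transcript grew and the sidecar did not change.
func canAppend(prev, cur fingerprint) bool {
	return cur.IndexMtime == prev.IndexMtime &&
		cur.TranscriptSize > prev.TranscriptSize
}

func main() {
	prev := fingerprint{TranscriptSize: 1000, TranscriptMtime: 10, IndexMtime: 10}
	appendedOnly := fingerprint{TranscriptSize: 1200, TranscriptMtime: 20, IndexMtime: 10}
	appendedAndRetitled := fingerprint{TranscriptSize: 1200, TranscriptMtime: 20, IndexMtime: 20}

	// Size and max mtime are identical, so the collapsed form cannot
	// tell the two cases apart.
	fmt.Println(appendedOnly.maxMtime() == appendedAndRetitled.maxMtime())
	// The split form can.
	fmt.Println(canAppend(prev, appendedOnly), canAppend(prev, appendedAndRetitled))
}
```

Until something like the split form exists at the facade, staying on the full-parse path is the conservative choice described above.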
Also route persisted path lookup and changed-path classification through the same UUID canonicalization as discovery so archived duplicates do not win over live dated transcripts.
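The live-over-archived rule can be sketched roughly as follows. This is an illustration under assumptions, not the provider's code: the `archived_sessions` directory name, the file-name pattern, and the `canonicalize` helper are all hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"sort"
	"strings"
)

// uuidSuffix matches a UUID-shaped session id at the end of a .jsonl
// file name (assumed naming scheme for this sketch).
var uuidSuffix = regexp.MustCompile(
	`([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\.jsonl$`)

// isArchived reports whether any path element is the assumed archive
// directory name.
func isArchived(path string) bool {
	for _, part := range strings.Split(filepath.ToSlash(path), "/") {
		if part == "archived_sessions" {
			return true
		}
	}
	return false
}

// canonicalize keeps one path per UUID and lets a live copy replace
// an archived copy of the same session. Paths without a UUID-shaped
// id are ignored here.
func canonicalize(paths []string) []string {
	chosen := map[string]string{}
	for _, p := range paths {
		m := uuidSuffix.FindStringSubmatch(filepath.Base(p))
		if m == nil {
			continue
		}
		id := strings.ToLower(m[1])
		cur, ok := chosen[id]
		if !ok || (isArchived(cur) && !isArchived(p)) {
			chosen[id] = p
		}
	}
	out := make([]string, 0, len(chosen))
	for _, p := range chosen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	paths := []string{
		"archived_sessions/rollout-11111111-2222-3333-4444-555555555555.jsonl",
		"sessions/2025/01/02/rollout-11111111-2222-3333-4444-555555555555.jsonl",
		"archived_sessions/rollout-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.jsonl",
	}
	for _, p := range canonicalize(paths) {
		fmt.Println(p)
	}
}
```

Using one helper like this for discovery, persisted path lookup, and changed-path classification is what keeps the three code paths from disagreeing about which duplicate wins.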
Validation: go test -tags "fts5" ./internal/parser -run 'Test(CodexProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check
test(sync): compare codex shadow parity
Codex is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseCodexSession.
The fixture uses the real sessions/YYYY/MM/DD layout plus sibling session_index.jsonl, proving the provider preserves title sidecar behavior, parser output, and data-version planning.
Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCodexLegacyParser|TestCodexProvider|TestParseCodex|TestProviderMigrationModes' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check
fix(parser): accept codex legacy-shaped sources
Provider-authoritative Codex sync still has to rediscover sessions that were stored by the legacy parser even when their rollout filename does not expose a UUID-shaped session id. Without that compatibility path, the later dispatch migration can drop or fail to reprocess valid Codex transcripts that ParseCodexSession can read from session metadata.
Keep the UUID-aware source contract as the preferred path and fall back to root-scoped JSONL sources only when Codex path metadata does not apply, so normal duplicate canonicalization remains unchanged while legacy-shaped fixtures stay reachable.
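A rough sketch of that preference order, assuming a hypothetical `sourceKey` helper and key prefixes that do not appear in the codebase: a UUID-shaped file name yields a UUID key, and only otherwise does a `.jsonl` file under the root fall back to a root-relative key.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// uuidSuffix matches a UUID-shaped id at the end of a .jsonl file
// name (assumed naming scheme for this sketch).
var uuidSuffix = regexp.MustCompile(
	`([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\.jsonl$`)

// sourceKey prefers the UUID contract and falls back to a
// root-scoped path key for legacy-shaped JSONL files.
func sourceKey(root, path string) (string, error) {
	if m := uuidSuffix.FindStringSubmatch(filepath.Base(path)); m != nil {
		return "uuid:" + strings.ToLower(m[1]), nil
	}
	if filepath.Ext(path) != ".jsonl" {
		return "", fmt.Errorf("not a JSONL source: %s", path)
	}
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return "", err
	}
	rel = filepath.ToSlash(rel)
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return "", fmt.Errorf("source outside root: %s", path)
	}
	return "path:" + rel, nil
}

func main() {
	root := "/home/u/.codex/sessions"
	for _, p := range []string{
		root + "/2025/01/02/rollout-11111111-2222-3333-4444-555555555555.jsonl",
		root + "/2025/01/02/rollout-legacy.jsonl",
	} {
		key, err := sourceKey(root, p)
		fmt.Println(key, err)
	}
}
```

Because the fallback only fires when no UUID is found, duplicate canonicalization for normally named transcripts is untouched.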
Validation: go test ./internal/parser -count=1; go fmt ./...; go vet ./...; go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestCodexProvider|TestSyncEngineCodex|TestSyncSingleSessionHashCodex|TestSyncEngineSkipCache' -count=1; git diff --check
refactor(parser): fold codex into provider
Make the Codex provider own its source discovery, lookup, and parse
behavior instead of shimming the package-level free functions. Delete
DiscoverCodexSessions, FindCodexSourceFile, ParseCodexSession, and
ParseCodexSessionFrom: discovery and find-source bodies move onto the
codex source set (discoverSessionPaths, findSourceFile), and parse moves
onto the provider (parseSession, parseSessionFrom). Drop the Codex
AgentDef DiscoverFunc/FindSourceFunc hooks and make Codex
provider-authoritative; ShallowWatchRootsFunc and the exec-source helpers
(IsCodexExecSessionFile, ResolveCodexShallowWatchRoots, the one-time
codex_exec skip migration) stay since only the four parser entrypoints
must go.
A provider has no database handle, so the engine reproduces the DB-aware
and mtime-aware bookkeeping the legacy single-session JSONL path
performed, scoped to Codex to preserve behavior exactly:
- shouldSkipProviderSourceByDB folds the session_index.jsonl sidecar
into a DB-stored fingerprint skip, so an unchanged transcript is not
reparsed when only the shared index mtime advanced and this session's
title did not change, and a resync still skips after the in-memory
skip cache is cleared.
- The provider Parse force-replaces stored rows because Codex emits a
full parse (it does not advertise incremental append); a late
token_count line appended to an existing turn rewrites the stored
message instead of being dropped by an append-only write.
- Index events keep flowing through the engine's DB-aware
classifyCodexIndexPath rather than the provider's broad index
fan-out: the engine fans out only to sessions whose stored title
changed and pins the chosen on-disk copy (SourceRefForPath) so the
provider's live-over-archived canonicalization cannot resurrect a
stale duplicate over the stored copy.
- SyncAllSince re-expands a UUID's live and archived duplicates
(AllSourcePathsForUUID) before the mtime cutoff filter, restoring the
legacy discover-then-filter order so a changed archived copy newer
than the cutoff is not lost behind an older live copy.
Route parse-diff, the token-use disk probe, and the SSH remote resolve
script through provider Discover/FindSource for provider-authoritative
agents that no longer carry a DiscoverFunc, so Codex sources stay
discoverable, resolvable on disk, and transferable (including the
session_index.jsonl sidecar).
Replace the deleted shadow-baseline test with provider-API coverage
(provider Discover/Parse through ObserveProviderSource) plus a guard that
the four legacy entrypoints stay gone, route the package and engine tests
through the provider methods, and remove codex_provider.go from the
pending shim scan list. This also fixes the previously known-failing
TestSyncPathsCodexIndexEventRefreshesStoredDuplicate, since the index
event now honors the stored archived copy.
test(sync): host shared shadow source helper at codex fold
The per-provider shadow/parse tests share writeProviderShadowSourceFile
to write source fixtures. The Codex fold is the lowest branch that calls
it, so the canonical definition lives here; later provider folds inherit
it instead of redeclaring their own copies.
test(sync): remove unused codex stat assignments
The pre-commit lint hook rejects two Codex appended-fixture tests because they assign os.Stat results back to info without using the value. The tests already assert the append and close operations that matter for setup.
Removing the unused assignments keeps staticcheck clean for the Codex provider migration branch.
fix(parser): pin codex duplicate sources
Codex discovery and raw-ID lookup should still prefer the live dated transcript, but exact filesystem events and DB-stored source hints are different: the caller has already selected a concrete source path. Canonicalizing those paths back to a stale live duplicate can overwrite an updated archived transcript.
Changed-path classification now returns the source pinned to the event path, and non-fresh stored path/fingerprint lookup returns the exact source so SyncSingleSession preserves the archived path already recorded in the database.
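As a small illustration of the two lookup modes, assuming a hypothetical `lookup` request type and `resolve` helper that are not part of the codebase: a request that already names a concrete path is returned as-is, and only an ID-only request goes through the live-over-archived preference.

```go
package main

import "fmt"

// lookup is a hypothetical request: PinnedPath is set when a
// filesystem event or a DB-stored hint already chose a file.
type lookup struct {
	UUID       string
	PinnedPath string
}

// resolve returns the pinned path when present; otherwise it prefers
// the live copy and falls back to the archived one.
func resolve(l lookup, live, archived string) string {
	if l.PinnedPath != "" {
		return l.PinnedPath
	}
	if live != "" {
		return live
	}
	return archived
}

func main() {
	live, archived := "sessions/2025/01/02/s.jsonl", "archived/s.jsonl"
	fmt.Println(resolve(lookup{UUID: "u1"}, live, archived))
	fmt.Println(resolve(lookup{UUID: "u1", PinnedPath: archived}, live, archived))
}
```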
Validation: go test -tags "fts5" ./internal/parser -run 'TestCodexProvider(FindSourcePinsExactArchivedDuplicate|ChangedPathPinsArchivedDuplicate|SourceMethods|DiscoverDedupesLiveAndArchivedByUUID)' -count=1; go test -tags "fts5" ./internal/sync -run 'TestSync(PathsCodexArchivedDuplicateEventPinsChangedFile|SingleSessionCodexPreservesStoredArchivedDuplicate|PathsCodexIndexEventRefreshesStoredDuplicate|AllSinceCodexKeepsChangedArchivedDuplicate)' -count=1; go test -tags "fts5" ./internal/parser -run 'TestCodexProvider|TestParseCodex|TestDiscoverCodex' -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*Codex.*' -count=1; go vet ./...; git diff --check
fix(sync): keep codex freshness skips out of cache
Codex provider DB-fresh skips are successful freshness decisions, not parse failures or intentional no-session skips. Recording them in the persistent skip cache can hide a later parser data-version bump because the cache check runs before the DB freshness check.
Keep DB-fresh provider skips non-cacheable and make existing skip-cache entries fall through when a stored row at that path has a stale data version. The same bypass helper still preserves the existing stale-project self-healing behavior.
Validation: go test -tags "fts5" ./internal/sync -run 'TestProcessFile(SkipCacheReparsesStaleCodex(Project|DataVersion)|CodexDBFreshSkipIsNotCached)|Test.*Codex.*' -count=1; go test -tags "fts5" ./internal/parser -run 'TestCodexProvider|TestParseCodex|TestDiscoverCodex' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
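The two rules in this commit can be sketched as follows. All names here (`skipReason`, `cacheable`, `cacheHitApplies`, `storedRow`) are hypothetical, and "stale" is modeled as a stored data version lower than the current one, which is an assumption of this example.

```go
package main

import "fmt"

// skipReason is a hypothetical classification of why a file was
// skipped during sync.
type skipReason int

const (
	skipParseFailed skipReason = iota
	skipNoSession
	skipDBFresh
)

// cacheable reports whether a skip may be written to the persistent
// skip cache. DB-fresh skips are freshness decisions, so they are not.
func cacheable(r skipReason) bool {
	return r != skipDBFresh
}

// storedRow is a hypothetical view of the DB row stored for a path.
type storedRow struct {
	DataVersion int
}

// cacheHitApplies decides whether an existing skip-cache entry should
// short-circuit processing. A stored row with a stale data version
// makes the entry fall through so the file is reparsed.
func cacheHitApplies(cached bool, row *storedRow, currentVersion int) bool {
	if !cached {
		return false
	}
	if row != nil && row.DataVersion < currentVersion {
		return false
	}
	return true
}

func main() {
	fmt.Println(cacheable(skipParseFailed), cacheable(skipDBFresh))
	fmt.Println(cacheHitApplies(true, &storedRow{DataVersion: 3}, 4))
	fmt.Println(cacheHitApplies(true, &storedRow{DataVersion: 4}, 4))
}
```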
fix(sync): surface codex provider discovery failures
Provider-backed parse-diff should not report a clean or incomplete diff when provider discovery failed. Returning that error keeps requested provider-authoritative agents honest and matches the expectation that parse-diff is a verification surface, not a best-effort sync.
Also pin coverage for stale Codex index entries whose transcripts no longer resolve, so the existing empty-candidate guard cannot regress into an invalid empty work item.
Validation: go test -tags "fts5" ./internal/sync -run 'Test(ParseDiffProviderDiscoveryErrorFails|ClassifyCodexIndexPathSkipsMissingTranscript|ProcessFile(SkipCacheReparsesStaleCodex(Project|DataVersion)|CodexDBFreshSkipIsNotCached))' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
fix(sync): drop duplicate shadowCallerProvider Discover in codex test
Codex now has a concrete parser provider for dated and archived JSONL transcripts plus the session_index.jsonl title sidecar. The provider owns discovery, watch planning, changed-path classification, lookup, index-aware fingerprinting, full parse output, and append parsing with full-parse fallback signals.
Index-file events conservatively map to matching local session sources because the provider layer does not receive stored DB session names; skip and replacement decisions remain with the sync layer.