Skip to content

feat(parser): migrate cowork provider#773

Draft
mariusvniekerk wants to merge 1 commit into
provider-claudefrom
provider-cowork
Draft

feat(parser): migrate cowork provider#773
mariusvniekerk wants to merge 1 commit into
provider-claudefrom
provider-cowork

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Cowork now has a concrete parser provider instead of routing through the legacy adapter. The provider keeps the local-agent metadata relationship explicit: metadata changes map back to the main transcript, fingerprints fold metadata mtime into transcript freshness, and source lookup handles both prefixed main sessions and subagent transcripts.\n\nThis branch preserves the existing Cowork parser helpers and shallow metadata watch behavior while exposing discovery, classification, lookup, fingerprinting, and parse capabilities through the shared provider interface.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (ed34c04)

Medium: WatchPlan advertises cowork watch support, but its watch roots will miss normal workspace updates.

  • Medium: internal/parser/cowork_provider.go:149
    • WatchPlan watches only the configured root non-recursively for local_*.json, while cowork metadata lives under <root>/<org>/<workspace>/local_*.json and transcripts live deeper. A provider-based watcher will not observe typical existing session updates.
    • Suggested fix: return watch roots that cover actual workspace directories shallowly, or use a recursive plan with appropriate includes/excludes. Otherwise, do not report WatchSources as supported.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 5m20s), codex_security (codex/security, done, 1m37s) | Total: 7m4s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (17e6b3e)

Summary

This commit migrates the cowork agent provider to the new Provider/ProviderFactory interface, mirroring the pattern used by the previously-migrated claude, cursor, vibe, and hermes providers. The new internal/parser/cowork_provider.go implements source discovery, watch planning, changed-path classification, fingerprinting, and parsing of local cowork session transcripts; provider.go registers the factory.

I reviewed the path-handling surface, which is the only part of this diff touching potentially externally-influenced input (file paths and RawSessionID lookups). The handling is sound:

  • All path resolution funnels through relUnder (cowork.go:335), which uses filepath.Rel and rejects any path that escapes the configured root via ... sourceRef additionally requires the path to be a valid transcript under a .claude/projects/ subtree.
  • FindCoworkSourceFile (cowork.go:265) does not construct a path from the supplied RawSessionID; it first gates on IsValidSessionID (a strict [A-Za-z0-9_-] allowlist that rejects separators and dots — discovery.go:828) and then walks the tree comparing basenames, so a crafted session ID cannot induce traversal.
  • The non-comma-ok assertion ref.Opaque.(coworkSource) in pathFromSource is safe because sourceRef always populates Opaque with a coworkSource value; in any case a panic there is a local crash, not a trust-boundary issue.

No SQL, command execution, subprocess spawning, auth/authz logic, credential handling, or new network surface is introduced. Parsed session data flows to the SPA, but markdown rendering is DOMPurify-sanitized and displaying session contents is the tool's stated purpose. Session transcript files are local, user-owned, and non-adversarial per the project's documented threat model.

No issues found.


Panel: ci_default_security | Synthesis: claude-code | Members: codex_default (claude-code/default, failed, 2s), codex_security (claude-code/security, done, 1m2s) | Total: 1m4s

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (c1e366b)

Summary verdict: one medium-risk sync correctness issue remains; no security issues were found.

Medium

  • Location: internal/sync/engine.go:5029
  • Problem: The new provider can classify deleted local_*.json metadata as the transcript source, but processCowork still runs the DB size/mtime skip unconditionally. If the removed metadata file was older than or equal to the transcript, the composite mtime after deletion matches the stored transcript mtime, so SyncPaths(metaPath) can skip and leave stale title/project metadata.
  • Fix: Honor file.ForceParse around this skip, or otherwise force parsing for metadata-delete events. Add an end-to-end deletion test where metadata mtime is not newer than the transcript.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 7m36s), codex_security (codex/security, done, 2m16s) | Total: 9m59s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (6ad405b)

Summary verdict: one medium correctness issue remains; no security issues were reported.

Medium

  • internal/parser/cowork_provider.go:239 - The Cowork provider fingerprints and returns hash/mtime/size, but does not carry file inode/device into parsed results. Once AgentCowork moves to ProviderMigrationShadowCompare, shadow comparison against processCowork will falsely report session differs on Unix because processCowork sets Session.File.Inode and Session.File.Device.

    Fix: Populate inode/device in the provider fingerprint and copy them into each returned Session.File, or explicitly normalize those fields out of provider shadow comparisons.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 8m8s), codex_security (codex/security, done, 3m14s) | Total: 11m30s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (e86b1f2)

Medium severity issue found; no High or Critical findings.

Medium

  • internal/parser/provider_migration.go:21 - Migrating Cowork to provider-authoritative drops the legacy processCowork freshness check. Cowork reports IncrementalAppend as not applicable, so providerSingleSessionFresh never skips it, causing every full sync to reparse and rewrite unchanged Cowork transcripts instead of using stored size/composite mtime/data version. Add a provider freshness path for Cowork or generic non-incremental provider sources using SourceFingerprint.Size and MTimeNS against GetFileInfoByPath, preserving the old CoworkSessionMtime behavior before calling Parse.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 7m51s), codex_security (codex/security, done, 2m32s) | Total: 10m29s

Cowork stores Claude-shaped transcripts behind local-agent metadata, so the provider boundary needs to preserve that metadata-to-transcript relationship instead of treating the files as plain Claude JSONL sources.

The concrete provider keeps shallow metadata watching, metadata change classification, subagent transcript discovery, raw/full ID lookup, composite mtime freshness, and hash propagation explicit for the sync path.

fix(parser): cover cowork nested watch events

Cowork metadata and transcripts live below org/workspace/session directories, so a shallow root watch could not deliver the paths the provider claimed to classify. Deleted metadata also lost the JSON needed to resolve the transcript, leaving stale provider state after remove or rename events.

Make the watch plan recursive for Cowork source globs, recover deleted metadata from the local session directory shape, cover removed metadata/main/subagent paths, and move Cowork into shadow comparison as its branch-local migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CoworkProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(parser): reject ambiguous cowork metadata removal

Deleted Cowork metadata can only be recovered from the local session directory shape. If that directory contains multiple main transcripts, choosing the first filesystem match would attach the event to an arbitrary source and leave the real stale source unresolved.

Refuse ambiguous deleted-metadata recovery unless exactly one main transcript is present, and cover the multi-transcript case. The regular single-transcript metadata removal path remains supported.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CoworkProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(parser): validate cowork deleted metadata candidates

Cowork metadata deletion recovery scans project directories after the metadata file is gone, so it cannot rely on the normal metadata-guided resolution path. It still needs the same transcript validity rules as normal discovery: regular files only, and symlink targets must stay inside the local session directory.

Apply that validation before selecting or counting fallback candidates so symlink escapes are ignored and broken symlinks do not create false ambiguity.

Validation: go test -tags "fts5" ./internal/parser -run 'TestCoworkProvider|TestResolveCoworkSessionRejectsSymlinkEscape|TestClassifyCoworkPath|TestParseCowork' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(sync): compare cowork shadow parity

Cowork is a sidecar-backed Claude transcript provider, so add source-level migration coverage that compares provider observation with ParseCoworkSession.

The fixture includes local-agent metadata plus the nested Claude transcript and verifies session, messages, usage, excluded IDs, and data-version planning parity while preserving provider-computed hashes.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCoworkLegacyParser|TestCoworkProvider|TestParseCowork|TestClassifyCoworkPath' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

refactor(parser): fold cowork into provider

Move Cowork source discovery, lookup, parse, and changed-path classification onto the concrete coworkProvider and delete the package-level DiscoverCoworkSessions, FindCoworkSourceFile, ParseCoworkSession, and ClassifyCoworkPath free functions. Discovery and find-source bodies now live as provider-owned helpers (discoverTranscriptPaths, coworkFindSourceFile), parseSession is a receiver method, and the metadata-to-transcript classifier moves onto SourcesForChangedPath as classifyCoworkPath so a sibling local_<uuid>.json change still resolves to the session's main transcript.

Make Cowork provider-authoritative and drop its legacy sync dispatch: the classifyOnePath cowork block, the processFile case arm, and the processCowork method. The sibling-meta composite freshness is preserved on the provider's Fingerprint, which already folds CoworkSessionMtime (the max of transcript and metadata mtime) into the freshness identity so a title-only rename triggers a reparse through processProviderFile. CoworkSessionMtime stays exported and the engine's skip-cache and SourceMtime watcher-fallback blocks keep calling it, mirroring how the commandcode fold retained commandCodeEffectiveInfo.

Replace the legacy free-function tests with provider API coverage plus a guard asserting the four entrypoints stay gone, drop the shadow-baseline comparison test, relocate the shared writeProviderShadowSourceFile helper into provider_shadow_support_test.go, and remove cowork_provider.go from the pending-shim scan list.

test(sync): drop obsolete cowork shadow-legacy tests

Folding cowork into its provider removes its legacy processFile arm, so
the two shadow-compare tests that built fixtures via the deleted
parser.ParseCoworkSession and asserted a legacy result coexisting with
the shadow provider can no longer pass: a non-authoritative cowork file
now falls through to the unknown-agent default. The shadow machinery
keeps coverage through provider_shadow_test.go and the cached-skip
not-comparable case.

fix(sync): skip fresh cowork provider sources

Cowork moved behind the provider-authoritative sync path, but the migrated path still fingerprinted and parsed unchanged transcripts before checking the stored file metadata. That dropped the cheap DB freshness gate the legacy Cowork path relied on and made full syncs rewrite fresh sessions unnecessarily.\n\nRestore that gate for Cowork before provider fingerprinting, using the same transcript size plus CoworkSessionMtime identity stored in the database. Per-file force parses still bypass the gate so metadata-driven refreshes and explicit reparses continue to reach the provider.\n\nValidation: go test -tags "fts5" ./internal/sync -run 'TestProcessFileProviderAuthoritative(SkipsFreshCoworkBeforeFingerprint|ForceParseBypassesFreshCoworkSkip)|TestSyncAllSinceCoworkMetaUpdateTriggersResync|TestSyncPathsCoworkReplacesUpdatedMessageOrdinal' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (a99c56c)

Verdict: One medium regression should be fixed before merging.

Medium

  • internal/parser/types.go:108 - Cowork no longer has a FindSourceFunc, but resolveRawSessionID only probes on-disk sessions through FindSourceFunc. An unsynced Cowork session passed to agentsview session usage cowork:<id> or as a bare ID will no longer be marked known, so the local backend skips on-demand sync and reports it as not found.
    • Fix: Add a provider-authoritative disk probe to resolveRawSessionID, or keep a Cowork FindSourceFunc shim until that path is migrated. Add coverage for unsynced Cowork canonical and raw IDs.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 11m21s), codex_security (codex/security, done, 2m31s) | Total: 13m58s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant