Skip to content

feat(parser): migrate zed shelley providers#780

Draft
mariusvniekerk wants to merge 1 commit into
provider-positronfrom
provider-zed-shelley
Draft

feat(parser): migrate zed shelley providers#780
mariusvniekerk wants to merge 1 commit into
provider-positronfrom
provider-zed-shelley

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Zed and Shelley now have concrete parser providers for their shared SQLite database layouts. The providers model physical DB sources separately from per-session virtual paths, preserving discovery, watch planning, WAL/SHM changed-path classification, raw and full ID lookup, source fingerprinting, multi-session parse fan-out, force-replace parse semantics, and parser output normalization.

Zed uses DB-backed thread discovery with per-thread virtual sources, while Shelley keeps its per-conversation content fingerprint so same-second metadata and message rewrites remain visible through the provider facade.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (3f4f89f)

Summary verdict: No Medium, High, or Critical findings to report.

Both reviews are clean at the requested severity threshold. The only reported issue was Low severity, so it is omitted per instructions.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 5m20s), codex_security (codex/security, done, 51s) | Total: 6m15s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (18becee)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (a17b839)

Medium confidence: two medium correctness issues need attention before enabling the new providers broadly.

Medium

  • internal/parser/provider_migration.go:56
    Enabling shadow compare for Zed/Shelley can produce false mismatches for shared SQLite sources. The provider uses the physical DB path as the source key, while legacy results use per-session virtual paths like db#id; later syncs also let legacy skip unchanged rows while the provider emits all rows.
    Fix: Keep these agents LegacyOnly until shared-DB shadow comparison accounts for virtual per-session source keys and stored-state skips, or update provider planning/comparison to use each result’s Session.File.Path.

  • internal/parser/shelley_provider.go:334
    Deleted Shelley virtual conversations cannot reach the provider’s SkipNoSession parse path because the sync/provider observation flow fingerprints before parsing, and Fingerprint returns an error when the conversation is missing.
    Fix: For missing virtual conversations, return a fallback fingerprint based on the physical DB/source key so Parse can return SkipNoSession, and add an ObserveProviderSource/process-flow test for stale Shelley virtual paths.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m42s), codex_security (codex/security, done, 1m27s) | Total: 7m17s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (bd46e82)

Summary verdict: medium-risk correctness issues remain in the Zed/Shelley provider shadow comparison; no security findings were reported.

Medium

  • internal/parser/zed_provider.go:416 and internal/parser/shelley_provider.go:425
    Physical DB sources use the physical DB path as the provider fingerprint/source key, but parsed sessions persist virtual db#id file paths. Shadow comparison plans SourceKeys from the physical source while legacy plans them from each result’s virtual Session.File.Path, so every Zed/Shelley physical-source shadow comparison will report planned.source_keys mismatches even when parsing is otherwise identical.
    Fix: Make planned effects for multi-session physical sources use the result Session.File.Path values, or add an explicit per-result source-key surface; extend the new shadow tests to run compareProviderObservationToProcessResult.

  • internal/parser/zed_provider.go:142 and internal/parser/shelley_provider.go:141
    The provider physical-source parse paths always parse every row, but the legacy Zed/Shelley paths skip unchanged per-session virtual rows using stored mtime/hash state. With ProviderMigrationShadowCompare enabled, periodic syncs after the initial import can compare “all provider sessions” against “only changed legacy sessions,” producing noisy result-count and data-version mismatches.
    Fix: Make shadow comparison aware of legacy per-session skips, or pass/apply the same stored freshness filter when observing these multi-session providers; add coverage with one unchanged stored virtual session and one changed session.


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 5m58s), codex_security (codex/security, done, 1m51s) | Total: 7m58s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (a6b8d5e)

Summary verdict: Medium issues remain in the Zed/Shelley provider migration; no security findings were reported.

Medium

  • internal/sync/engine.go:3561
    Deleted Zed/Shelley DB paths classified through provider changed-path handling carry a ProviderSource, so processProviderFile skips the !found deleted-source branch and calls provider.Fingerprint; that stats the removed DB and returns an error instead of the intended empty force-replace result.
    Fix: Treat os.IsNotExist from Fingerprint as an empty force-replace when file.ForceParse and providerDeletedPhysicalSQLiteSource(...) are true, or avoid attaching a provider source for missing physical DB delete events.

  • internal/sync/engine.go:3674
    dropUnchangedSharedSQLiteResults only honors e.forceParse, but SyncSingleSession uses per-file ForceParse; explicit Zed/Shelley resyncs can therefore drop the only parsed result as “unchanged” and become a no-op, regressing the legacy direct resync path.
    Fix: Pass the per-file force flag into the drop helper, or bypass the unchanged filter when file.ForceParse is true.


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 6m57s), codex_security (codex/security, done, 4m11s) | Total: 11m19s

Zed and Shelley both store multiple conversations in a shared SQLite database, so their provider boundary needs to model the physical database source separately from per-session virtual paths. Keeping that source shape explicit makes lookup, watch classification, and force-replace parse behavior available through the shared facade instead of the legacy adapter path.

The providers preserve physical DB discovery, WAL/SHM change classification, raw and full ID lookup, virtual source fingerprints, multi-session parse fan-out, per-session Shelley content fingerprints, and parser output normalization.

fix(parser): define sqlite provider deletion semantics

Shared SQLite providers need to treat deletion events as source-level state, not as unclassifiable paths. Without that, a deleted Zed or Shelley database can disappear before the provider facade reports a complete empty source, leaving future cleanup behavior under-specified.

Classifying syntactically valid DB, WAL, and SHM paths even after the main DB is gone preserves the watcher-to-parse path. A missing backing DB now produces a complete force-replace SkipNoSession outcome, and the Zed fingerprint comment records the intentional legacy whole-DB hash tradeoff.

fix(parser): align zed shelley capabilities

Provider capabilities are used as a contract for what normalized content a provider can actually emit. Shelley currently records token totals from messages but does not emit aggregate usage events, and Zed filters child threads before relationship fields can surface.

Keep the declarations conservative so the facade does not advertise unsupported content features while the providers continue to preserve the parser behavior they actually expose today.

fix(parser): tighten sqlite sidecar classification

Deleted DB sidecar events need to remain classifiable, but basename-only matching is too broad once missing DB parses become complete force-replace outcomes. Unrelated files under the same provider root should not synthesize the canonical shared DB source.

Restrict Zed sidecars to the watched threads directory and Shelley sidecars to the provider root, with regression tests for unrelated matching basenames.

fix(parser): shadow zed shelley providers

The Zed and Shelley branch had concrete providers and passing provider coverage, but still left both agents marked legacy-only. That kept the stack additive and prevented provider changed-path classification from participating while both shapes run.

Move both agents into shadow compare on their migration branch so the runtime bridge can exercise the concrete providers before the legacy path is removed later in the stack.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(ZedProvider|ShelleyProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/sync -run 'TestSync.*(Zed|Shelley)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

fix(sync): tolerate deleted sqlite provider sources

Zed and Shelley providers can classify removed physical database paths while shadow comparison is enabled. Legacy sync still owns writes on this branch, so those forced remove events must not fail at the pre-parse stat step.

Treat provider-classified deleted physical SQLite sources as an OK no-result parse. This keeps the archive-preserving legacy behavior while avoiding watcher failures until provider-authoritative deletion semantics are implemented at the stack tip.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestEngine_(ClassifyPathsProviderRemoveKeepsDeletedSQLiteSources|ProcessFileProviderDeletedSQLiteSourcesDoNotFail)|Test(Zed|Shelley)ProviderClassifiesDeletedPhysicalDB|Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(sync): compare zed shelley shadow parity

Zed and Shelley are shadow-compared database-backed providers, so provider method tests are not enough to prove the sync migration preserves legacy parser output.

Cover physical DB source observation for both providers and compare sessions, messages, force-replace intent, and data-version planning against the legacy DB parsers.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatches(Zed|Shelley)LegacyParser|Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(parser): cover sqlite provider stored hints

SQLite fan-out providers need to treat stored virtual source paths differently depending on caller intent. Fresh lookups should reject deleted rows or malformed/stale virtual paths, while non-fresh lookup still needs to preserve the virtual source identity so changed-path cleanup can observe a SkipNoSession tombstone.

This closes part of the stored-hint compatibility gap for the Zed/Shelley branch without making provider writes authoritative. The tests document row deletion, invalid virtual paths, stale DB-path hints, and tombstone parse behavior while legacy sync remains the write path.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(Zed|Shelley)Provider' -count=1 -v; go test -tags "fts5" ./internal/sync -run 'TestObserveProviderSourceMatches(Zed|Shelley)LegacyParser|TestEngine_ClassifyPathsProviderRemoveKeepsDeletedSQLiteSources|TestEngine_ProcessFileProviderDeletedSQLiteSourcesDoNotFail' -count=1 -v; go test -tags "fts5" ./internal/parser ./internal/sync -count=1 (sync fails only known TestSyncPathsCodexIndexEventRefreshesStoredDuplicate on this branch); go fmt ./...; go vet ./...; manual ./custom-gcl package loop with GOMAXPROCS=1 GOGC=5 GOMEMLIMIT=128MiB; git diff --check

Generated with Codex
Co-authored-by: Codex <codex@openai.com>

refactor(parser): fold zed and shelley providers

Move Zed and Shelley source ownership onto their concrete providers and
delete the ten package-level legacy entrypoints
(DiscoverZedSessions, FindZedSourceFile, ParseZedSQLiteVirtualPath,
ParseZedThreadDirect, ParseZedThreadFromDB, DiscoverShelleySessions,
FindShelleySourceFile, ParseShelleyConversationDirect,
ParseShelleyConversationFromDB, ParseShelleyVirtualPath) plus the
now-orphaned FindShelleyDBPath helper. Both agents become
provider-authoritative so runtime sync routes through provider
discovery, changed-path classification, and processProviderFile instead
of the removed processZed/processShelley methods and syncSingleZed.

Both providers keep multiple conversations in one shared SQLite
database addressed by a "<dbPath>#<id>" virtual path. The fold preserves
that shape:

- Discovery surfaces the single physical threads.db / shelley.db as one
  source; Parse fans it out to one session per thread/conversation.
- Virtual-path resolution flows through the provider-neutral
  ParseVirtualSourcePathForBase helper. parseZedVirtualPath restores the
  legacy IsValidSessionID guard the bespoke parser enforced; parseShelley
  VirtualPath maps directly onto the shared helper. Every engine call
  site that split a virtual path now uses the neutral helper too, and the
  surviving ZedSQLiteSourceMtime / ShelleySourceMtime watchers were
  repointed at it.
- The Zed and Shelley single-conversation direct parses move onto the
  providers as parseThreadDirect / parseConversationDirect over the
  unexported parseZedThreadFromDB / parseShelleyConversationFromDB.

Because a provider has no database handle, the engine reproduces the
per-session skip the legacy fan-out loops performed in
dropUnchangedSharedSQLiteResults: the provider re-parses every session on
any database change, and the engine drops results whose stored file_mtime
(plus the content fingerprint in file_hash for Shelley's second-precision
timestamps) and data_version already match, applying the path rewriter so
remote stored paths resolve. Force-parse runs keep every result. A forced
parse on a deleted shared database now completes as an empty force-replace
in processProviderFile so the engine retires the removed sessions instead
of failing. ParseDiff synthesizes the Zed/Shelley database source the way
it already does for Kiro/OpenCode/Kilo so --agent zed/shelley keeps
working without a DiscoverFunc.

Tests move from the deleted free functions to provider API coverage, add
a guard asserting the legacy entrypoints stay gone, drop both provider
files from the pending shim scan list, and remove the shadow comparison
test. The shared writeProviderShadowSourceFile helper is rehomed into a
dedicated support file so the sync package keeps compiling after the
shadow test is deleted.

refactor(parser): delete zed legacy whole-database parser

ParseZedSessions parsed every top-level thread in a Zed threads.db, but
the provider routes per-thread through parseZedThreadFromDB, so the
free function survived only as test-exercised dead production code.

Delete ParseZedSessions; the retained parse tests reproduce the
whole-database walk with the provider's own primitives (ListZedThreadMetas
+ parseZedThreadFromDB), which share the top-level parent_id filter and
ordering, so they exercise the production path without the deleted shim.

fix(sync): preserve zed shelley force replaces

Zed and Shelley share one physical SQLite database across many virtual session paths. Provider-authoritative sync needs source-level force-replace behavior to retire those virtual sessions when a physical database disappears or when a provider reports a complete empty outcome.

SyncSingleSession also marks the discovered source with ForceParse, so unchanged-result filtering must respect the per-file flag instead of only the engine-wide flag. Otherwise a targeted resync can silently keep corrupted stored rows because the shared SQLite fingerprint has not changed.

Validation: go test -tags "fts5" ./internal/sync -run 'TestSync(SingleSessionZedForce|PathsZedDeleted|SingleSessionShelleyForce|PathsShelleyDeleted)' -count=1; go test -tags "fts5" ./internal/parser -run 'Test(Zed|Shelley)Provider' -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*(Zed|Shelley).*' -count=1; go vet ./...; git diff --check
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (dd7bf96)

High-level verdict: changes need fixes before merge due to one failing test and two medium sync correctness regressions.

High

  • internal/parser/provider_shim_scan_test.go:62
    positron_provider.go was added to pendingShimProviderFiles, but that test requires every pending file to still reference a legacy free-function shim. Positron no longer references any legacy entrypoint, so TestProviderFilesDoNotReferenceLegacyEntrypoints will fail.
    Fix: Remove positron_provider.go from pendingShimProviderFiles.

Medium

  • internal/sync/engine.go:715
    Zed/Shelley WAL or SHM events now map back to the physical DB source, but providerChangedPathForceParse marks them ForceParse because sourcePath != eventPath. That bypasses dropUnchangedSharedSQLiteResults, causing every sidecar write to reparse and rewrite all sessions in the shared DB.
    Fix: Treat sourcePath+"-wal" and sourcePath+"-shm" as backing events for physical SQLite DB sources, and reserve ForceParse for actual deleted DB sources or true tombstone cases.

  • internal/parser/shelley_provider.go:370
    A stale stored virtual Shelley path whose conversation row was deleted errors during Fingerprint, before Parse can return the intended SkipNoSession/ForceReplace tombstone outcome. This regresses forced single-session resync or stale-source cleanup for Shelley rows while shelley.db still exists.
    Fix: Make Shelley virtual-source fingerprinting tolerate a missing conversation, similar to Zed, so the provider parse path can emit the deletion outcome.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 6m45s), codex_security (codex/security, done, 1m31s) | Total: 8m26s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant