Skip to content

feat(parser): migrate opencode-family providers#774

Draft
mariusvniekerk wants to merge 1 commit into
provider-coworkfrom
provider-opencode-family
Draft

feat(parser): migrate opencode-family providers#774
mariusvniekerk wants to merge 1 commit into
provider-coworkfrom
provider-opencode-family

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

OpenCode, Kilo, and MiMoCode now share a concrete parser provider for their common storage format. The provider owns storage session files, message and part child-file classification, legacy SQLite virtual paths, hybrid storage-plus-SQLite discovery, duplicate filtering, and storage fingerprinting.\n\nThis keeps the existing parser helpers and fork-specific relabeling behavior while exposing discovery, watch planning, changed-path classification, lookup, fingerprinting, and parse output through the shared provider interface.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (f821bdc)

Summary verdict: two medium correctness issues remain; no security findings were reported.

Medium

  • internal/parser/opencode_provider.go:130
    Successful OpenCode-family provider parses do not return ForceReplace, while the existing sync path forces full message replacement because these agents rewrite messages and tool results in place. Without this, the provider-based write path can retain stale rows with existing ordinals.
    Fix: Return ForceReplace: true for successful OpenCode/Kilo/MiMoCode parse outcomes, mark ForceReplaceOnParse as supported, and add a provider test for in-place content rewrite.

  • internal/parser/opencode_provider.go:252
    Hybrid storage+SQLite discovery fails the entire provider when the optional SQLite fallback DB cannot be listed, dropping otherwise valid storage-backed sessions.
    Fix: Treat SQLite listing errors as non-fatal when storage mode already yielded file-backed sources; only return the error for pure SQLite roots.


Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 4m44s), codex_security (codex/security, done, 1m33s) | Total: 6m23s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (12c22e5)

This commit migrates the opencode/kilo/mimocode session parsers to the new provider-factory abstraction. It is a structural refactoring: the new openCodeFormatProvider and openCodeFormatSourceSet wrap pre-existing parsing/discovery helpers (ResolveOpenCodeSource, FindOpenCodeSourceFile, ParseOpenCodeSession, etc.) behind the Provider interface, and provider.go registers the three factories.

I assessed the change against the relevant trust boundaries:

  • Path handling: All filesystem lookups route through relUnder (which rejects ../../-prefixed relative paths) and isStorageSessionPath/sourceRef, which re-validate that resolved and virtual-DB paths stay under the configured root before use. Session IDs read from local JSON (readOpenCodeProviderStorageSessionID) feed find() whose results are re-validated by sourceRef. No traversal escape is introduced.
  • No new external attack surface: There are no HTTP handlers, auth/authz decisions, secret-bearing paths, command execution, or CI workflow changes in the diff. The provider operates entirely within the local sync engine over the configured session roots.
  • Input source: The parsed JSON files and SQLite DBs are the user's own local agent-CLI session stores. Per the project threat model, these are non-adversarial, user-owned files, and TOCTOU/data-exposure on local paths is out of scope.
  • SQLite access: The parseSQLite/listSQLite helpers operate on the user's own local DB; even setting that aside, the query-building logic is pre-existing and unchanged by this migration.

No trust boundary, security control, or privileged operation is weakened or newly reached by this change.

No issues found.


Panel: ci_default_security | Synthesis: claude-code | Members: codex_default (claude-code/default, failed, 2s), codex_security (claude-code/security, done, 1m1s) | Total: 1m3s

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (4dd6ebf)

Medium-risk issues remain in OpenCode-family provider migration; no Critical or High findings were reported.

Medium

  • internal/sync/engine.go:2003
    Deleted OpenCode-family session JSON files are now classified as processable sources, but processFile immediately stats the deleted path and returns a sync error before parser logic can handle it.
    Fix: Treat deleted session JSON events as a no-op, or add explicit tombstone handling before the generic stat/read path.

  • internal/parser/opencode_provider.go:453
    DB/WAL changes return per-session virtual sources while the legacy classifier still schedules the physical DB path. In shadow mode this can reparse/write the same SQLite-backed sessions twice and also logs a shadow “source not found” for the legacy DB fan-out path.
    Fix: Either support a DB-level provider source for the legacy DB path, or suppress the provider virtual-source expansion while legacy DB fan-out remains authoritative.


Panel: ci_default_security | Synthesis: codex, 20s | Members: codex_default (codex/default, done, 6m53s), codex_security (codex/security, done, 1m44s) | Total: 8m57s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (11bfc10)

Summary verdict: medium-risk sync regressions remain; no critical or high issues reported.

Medium

  • internal/parser/opencode_provider.go:453
    SQLite DB/WAL changes now return per-session virtual sources while the legacy classifier still returns the physical DB file. In shadow mode, those virtual sources are enqueued as normal legacy work; opencode.db#session is not resolved before os.Stat, so OpenCode SQLite updates produce sync errors, and Kilo/MiMoCode can be parsed twice.
    Fix: Avoid appending DB-backed virtual OpenCode-family sources alongside the legacy DB file in shadow mode, or switch the legacy classifier/stat path to handle and dedupe those virtual sources exclusively.

  • internal/sync/engine.go:2003
    Removed storage session files are classified even though processFile immediately stats the returned path. Since the file is gone, the sync path records a stat failure instead of handling the removal/preservation case cleanly.
    Fix: Do not enqueue missing physical session files, or add an explicit deleted-source path that bypasses os.Stat and preserves/removes state according to the intended archive semantics.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 7m36s), codex_security (codex/security, done, 2m19s) | Total: 10m3s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (b4396d4)

Summary verdict: one medium correctness issue needs attention; no security findings were reported.

Medium

  • internal/sync/parsediff.go:469 - OpenCode/Kilo/MiMoCode SQLite sessions are now discovered as per-session virtual sources, but parse-diff still strips db#session to the shared DB path before error handling and the presence sweep. A parse error, --limit, or any partial parse-diff run for one virtual source can be applied to every stored session sharing that DB, producing false parse errors or false “not emitted” presence diffs.

    Suggested fix: For provider-owned per-session OpenCode-family virtual sources, key parse-diff errors and presence tracking by the exact virtual source path/session instead of the stripped DB path; keep shared-base grouping only for true multi-session physical DB jobs.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 7m56s), codex_security (codex/security, done, 49s) | Total: 8m53s

OpenCode, Kilo, and MiMoCode share the same storage/session, message, part, and legacy SQLite source model. Moving them behind one concrete provider keeps that shared source contract explicit instead of spreading it across sync-only classifier paths.

The provider preserves storage-first discovery, hybrid SQLite fallback, duplicate filtering, child-file changed-path classification, SQLite virtual paths, composite source mtimes, storage fingerprints, and fork-specific ID relabeling.

fix(parser): classify removed opencode storage sessions

OpenCode-family storage sessions are watched recursively, so delete and rename-style events for the primary session JSON need to map back to the same provider source even after the file no longer exists. Without that syntactic fallback, provider-path sync can miss stale storage sources until a broader resync.

Move OpenCode, Kilo, and MiMoCode into shadow comparison on this branch so the stack continues as a real migration rather than an additive provider implementation.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(OpenCodeProvider|OpenCodeFamilyProvider|ProviderMigration)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(sync): classify removed opencode session files

The provider path now handles deleted OpenCode-family storage session JSONs, but the legacy SyncPaths classifier is still active during the migration. It needs the same syntactic fallback so watcher-driven sync remains behaviorally equivalent while both forms run.

Validation: go test -tags "fts5" ./internal/sync -run 'TestEngine_ClassifyPathsOpenCodeFamilyRemovedSessionFile' -count=1; go test -tags "fts5" ./internal/parser -run 'Test(OpenCodeProvider|OpenCodeFamilyProvider|ProviderMigration)' -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

test(sync): compare opencode family shadow parity

OpenCode, Kilo, and MiMoCode share the OpenCode-format provider implementation on this branch, so add source-level migration coverage for all three storage-mode source shapes.

The table test compares provider observation with each legacy parser and verifies session/message/data-version parity while preserving provider-computed storage fingerprints.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesOpenCodeFamilyLegacyParsers|TestOpenCode|TestParseOpenCode|TestParseKilo|TestParseMiMoCode|TestDiscoverKilo|TestDiscoverMiMoCode|TestProviderMigrationModes' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

refactor(parser): fold opencode family into providers

OpenCode, Kilo, and MiMoCode share one on-disk format (storage/session
JSON plus message/part files, with a legacy SQLite fallback exposed as
<db>#<sessionID> virtual paths). They were still shims: the concrete
provider delegated to package-level free functions and the agents stayed
LegacyOnly, which violated the migration manifest invariant (a concrete
provider must not remain legacy-only) and left the runtime on the legacy
sync dispatch.

Make the three providers authoritative and own their behavior. One
shared openCodeFormatProvider implementation is parameterized per agent
by a format struct (SQLite filename, storage session subdir, ID prefix);
Kilo and MiMoCode reuse the OpenCode storage and SQLite readers and only
relabel the parsed session onto their own agent and ID prefix, so the
parse/discover/find logic is not duplicated three times.

The 15 legacy free functions (Discover/Find/ParseFile/ParseSession/
ParseSQLiteVirtualPath for each of opencode/kilo/mimocode) are deleted.
ParseOpenCodeFile/ParseOpenCodeSession move to unexported helpers
(parseOpenCodeStorageFile/parseOpenCodeDBSession) that the provider spec
drives. SQLite virtual-path resolution now goes through the
provider-neutral ParseVirtualSourcePathForBase, so engine, parsediff, and
resume callers no longer reference deleted parsers.

Remove the opencode-family legacy engine dispatch: the classify blocks
and classifyOpenCodeFormatPath, the processFile arm and
processOpenCodeFormat, the DB-backed sync pass, single-session resync,
and orphaned helpers. Runtime now routes through provider changed-path
classification and processProviderFile. Because these agents now flow
through file discovery, the resync empty-discovery guard tracks a
non-container discovered count so a self-preserving storage store cannot
mask plain file-backed sessions whose directories went empty, and
parse-diff discovers them through the provider facade.

fix(sync): skip provider-authoritative agents in parse-diff db synthesis

parseDiffDatabaseSources synthesized a raw opencode.db/kilo.db source so
the legacy processOpenCode fan-out re-parsed every DB session. Once those
agents became provider-authoritative, parseDiffProviderSources already
enumerates their DB sessions through the provider, which applies the
storage-ID filter that drops a file-backed storage session's stale db
row. Re-adding the raw db then double-counted those sessions and parsed
the filtered storage row, surfacing a spurious ParseError and an extra
examined file. Skip agents that have dropped their DiscoverFunc; the Kiro
data.sqlite3 synthesis still runs because Kiro keeps its legacy
DiscoverFunc until its own fold.

refactor(parser): delete opencode legacy whole-database parser

ParseOpenCodeDB parsed every session in an OpenCode SQLite database, but
the provider (and the Kilo/MiMoCode reuse) routes per-session through
parseOpenCodeDBSession, so the free function survived only as
test-exercised dead production code.

Delete ParseOpenCodeDB along with the orphaned loadOpenCodeSessions and
the OpenCodeSession bundle type; loadOpenCodeProjects stays since the
per-session path also resolves worktrees through it. The retained parse
tests reproduce the whole-database walk with the provider's own
primitives (ListOpenCodeSessionMeta + parseOpenCodeDBSession).

fix(sync): preserve parse-diff virtual sqlite identity

OpenCode-family providers expose SQLite sessions as per-session virtual sources. Parse-diff was still collapsing those db#session paths to the shared database path before error attribution, presence sweep, and limit accounting, which could apply one session's parse failure or omitted sample to every sibling in the DB.

Keep exact source keys for OpenCode, Kilo, and MiMoCode provider virtual SQLite paths while retaining shared-base grouping for true physical multi-session jobs. Source existence checks still stat the physical DB path so virtual identities do not look missing.

Validation: go test -tags "fts5" ./internal/sync -run 'TestParseDiffProviderVirtualSQLite(ErrorUsesExactSource|PresenceUsesExactSource|LimitUsesExactSource)|TestStripVirtualSourceSuffixVisualStudioCopilot' -count=1; go test -tags "fts5" ./internal/sync -run 'TestParseDiff(CoversMixedOpenCodeRoot|CoversMixedKiloRoot|ProviderVirtualSQLite|PresenceSweep|LimitNewestFirst|ReportHasFailures)' -count=1; go test -tags "fts5" ./internal/parser -run 'Test(OpenCodeProvider|OpenCodeFamilyProvider|Kilo|MiMoCode)' -count=1; go vet ./...; git diff --check
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (192f26c)

Summary verdict: one medium issue remains; no high or critical findings were reported.

Medium

  • internal/parser/opencode_provider.go:301 - In hybrid OpenCode/Kilo/MiMoCode roots, storage sessions are discovered before the optional SQLite fallback is listed, but any SQLite listing error returns from Discover and causes callers to discard already found storage sources. A corrupt or schema-incompatible opencode.db/kilo.db/mimocode.db in a storage root can block healthy file-backed sessions from syncing.

    Fix: Treat SQLite fallback listing errors as non-fatal in storage mode: keep storage sources and continue, while still surfacing errors for pure SQLite roots where no storage source can be used. Add a hybrid storage plus broken DB test.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 9m59s), codex_security (codex/security, done, 2m45s) | Total: 12m52s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant