Skip to content

fix(parser): require explicit provider factories#784

Draft
mariusvniekerk wants to merge 5 commits into
provider-db-backed-familyfrom
provider-explicit-registry
Draft

fix(parser): require explicit provider factories#784
mariusvniekerk wants to merge 5 commits into
provider-db-backed-familyfrom
provider-explicit-registry

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Every registered parser agent now has an explicit provider factory instead of relying on the legacy fallback. Claude.ai and ChatGPT are import-only parsers, so they use a small ProviderBase-backed import-only provider with no source discovery/watch behavior while preserving the standard unsupported parse response. The provider registry tests now fail if any future AgentDef falls through to legacyProviderFactory, which keeps new providers on the shared facade path.

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (a63405c)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m7s), codex_security (codex/security, done, 33s) | Total: 2m40s

@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from a63405c to 41637b8 Compare June 20, 2026 00:06
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (41637b8)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m15s), codex_security (codex/security, done, 25s) | Total: 1m40s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (555b938)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m54s), codex_security (codex/security, done, 11s) | Total: 3m5s

@mariusvniekerk

mariusvniekerk commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (30ae7e6)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m1s), codex_security (codex/security, done, 14s) | Total: 2m15s

@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch 3 times, most recently from 2fa10c1 to 089ddd7 Compare June 20, 2026 00:24
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (089ddd7)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m4s), codex_security (codex/security, done, 23s) | Total: 2m27s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (0671d6c)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m19s), codex_security (codex/security, done, 23s) | Total: 1m42s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 81b0cad to ae27e4f Compare June 20, 2026 00:42
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 0671d6c to b6de96f Compare June 20, 2026 00:42
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (b6de96f)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m55s), codex_security (codex/security, done, 17s) | Total: 2m12s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from ae27e4f to 0799e94 Compare June 20, 2026 00:53
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from b6de96f to 1ab0ad4 Compare June 20, 2026 00:53
@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 0799e94 to 82f9e12 Compare June 20, 2026 00:57
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 1ab0ad4 to b68ac14 Compare June 20, 2026 00:57
@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 82f9e12 to e8149e5 Compare June 20, 2026 01:02
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from b68ac14 to 42bcdb4 Compare June 20, 2026 01:02
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (42bcdb4)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 3m11s), codex_security (claude-code/security, done, 1m21s) | Total: 4m32s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from e8149e5 to 83f8b73 Compare June 20, 2026 01:42
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 42bcdb4 to 53ea4b2 Compare June 20, 2026 01:42
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (53ea4b2)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 83f8b73 to 1f17524 Compare June 21, 2026 00:41
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (9671172)

Synthesis unavailable. Showing individual review outputs.

claude-code — default (done)

Review Findings

Severity: Low
Location: internal/sync/engine.go:695-825 (attachProviderSources and its only callee discoverProviderSources)
Problem: These two methods are dead in production. The full-sync path uses discoverProviderFiles, and the changed-path path uses attachChangedPathProviderSources. The only caller of attachProviderSources (and therefore the only thing that reaches discoverProviderSources) is internal/sync/provider_process_test.go:1003. That is ~95 lines of unexercised production code that largely duplicates discoverProviderFiles, which will rot independently of the live path.
Fix: Remove both methods (and adjust the test), or wire attachProviderSources into the path it was intended for if it is meant to be used.


Severity: Low
Location: internal/sync/parsediff.go:614-620 (parseDiffPresencePath)
Problem: The helper is effectively return path. For a virtual path (base != path) it returns path; for a physical path (base == path) it returns base, which equals path. Both branches return the input unchanged, so the function name and structure imply a stripping/keeping distinction that does not exist. The correct exact-path behavior actually comes from storedByPath now being keyed by both full and base paths; this wrapper just obscures that.
Fix: Replace the call with job.path directly, or drop the helper.

Summary

A large, well-tested refactor that removes the legacy DiscoverFunc/FindSourceFunc adapters and makes per-agent providers authoritative for discovery, source lookup, watch planning, incremental parsing, export, and sync skip/caching; the only issues found are minor dead/redundant code.


claude-code — security (done)

I've reviewed the full diff. This is a large (65-commit, ~13k line) internal refactor that migrates the parser/sync system from per-agent function callbacks (DiscoverFunc, FindSourceFunc, WatchRootsFunc) to a unified provider abstraction, flipping all agents to provider-authoritative/import-only migration modes and removing the legacy adapter.

I focused on the changed surfaces that could touch a trust boundary:

  • cmd/agentsview/session_export.go — New provider-based source resolution. The Visual Studio Copilot multi-conversation trace filtering (the one path with an explicit disclosure concern) is preserved and extended: when the resolved source is a real trace file, the conversation ID is derived from the session ID and filtered, guarded by IsValidSessionID. The new visualStudioCopilotTraceHasConversation also gates on IsValidSessionID(rawID) before interpolating rawID into the trace-content match string, so a raw ID can't inject an arbitrary search substring. This is a local CLI over user-owned session files regardless.
  • internal/ssh/resolve.go — Only the filter predicate changed (ProviderSupportsSourceDiscovery vs DiscoverFunc != nil); the values interpolated into the generated remote shell script remain static parser.Registry constants, so no new injection.
  • internal/sync/engine.go — New provider source-discovery/attachment plumbing. The one new SQL query (countRootProviderVirtualDBSessions) is fully parameterized with ? placeholders. pathRewriter/idPrefix remote-sync machinery is moved but not semantically changed. No new exec/command sinks.
  • internal/parser/* providers — Codex incremental append parsing, fingerprint inode/device additions, factory wiring (panic on missing factory is a programmer-error guard). All operate on local, user-owned files that per the project threat model are non-adversarial.

The externally-reachable code (HTTP API, auth/bearer-token, CORS, managed Caddy config generation) and the DB/Postgres schema are not touched by this diff. No changes construct paths, queries, or commands from attacker-controlled input crossing a trust boundary, and no secrets are exposed.

No issues found.

Make the provider registry force every agent onto an explicit facade path
instead of silently inheriting a legacy fallback factory. Remove the
legacy provider fallback entirely so an unhandled AgentDef is a loud
construction error, and represent the non-filesystem export parsers
(Claude.ai, ChatGPT) with explicit import-only providers. Mark the
concrete providers authoritative in the migration manifest and drop the
legacy-only marker.

Route FindSourceFile and SourceMtime through provider FindSource and
Fingerprint so the stack tip exercises provider-owned source identities
and composite mtimes rather than parallel legacy dispatch.

Retire the test scaffolding that depended on the removed legacy types:
per-provider tests assert concrete construction, the obsolete
legacy-fallback and legacy-only-mode registry tests are dropped, and the
zero-legacy anti-shim gate runs with an empty pending list.

With codex's legacy ShallowWatchRootsFunc removed in favor of the provider
WatchPlan, fix collectProviderWatchRoots so a WatchPlan root that does not
exist yet but lives under an already-watched ancestor no longer marks the
whole directory for unwatched polling. A not-yet-created per-session
recursive root otherwise regressed parity by polling two codex dirs that
share a watched state-directory root; the ancestor watch observes the
target's creation and a later sync establishes the deeper watch.
ChatGPT and Claude.ai sessions never come from disk discovery; they enter
the archive only through a one-shot import. Move the ParseChatGPTExport and
ParseClaudeAIExport free functions onto the import-only provider as the
ChatGPTExportParser and ClaudeAIExportParser methods, and route the importer
and tests through NewProvider plus a type assertion.

This removes the last provider-specific legacy parse free functions, so the
parser package now owns every agent's parse behavior on provider receivers
rather than on standalone entrypoints.
No provider runs in shadow-compare mode: every agent is now either
provider-authoritative or import-only. The shadow harness that dual-ran a
side-effect-free provider parse against the legacy result and recorded the
diff was therefore never invoked at runtime.

Delete the harness end to end: the ObserveProviderSource entry point and its
comparison machinery, the Engine.observeProviderShadow hook and its two call
sites, the ProviderShadowRecorder config/field wiring, and the
ProviderMigrationShadowCompare mode (collapsing every switch that paired it
with provider-authoritative). The provider outcome validation and effect
planning helpers that the live parse path still relies on move to
provider_effects.go, which is all that file ever held that was reachable.
With every provider folded onto receiver methods and zero provider-specific
legacy parse free functions left, the migration's enforcement tests have
served their purpose. They assert the absence of named functions and that
provider files do not shim legacy entrypoints, which is only meaningful
while the stack is mid-migration; after merge they are pure maintenance drag
that breaks whenever a symbol is legitimately renamed.

Delete the per-provider Test*ProviderOwnsLegacyEntrypoints guards and the
shared anti-shim scan (provider_shim_scan_test.go). The providers' behavioral
tests remain and are what actually protect the parse paths going forward.
origin/main carries three agents the facade stack never migrated: Aider,
OhMyPi, and Reasonix. After rebasing onto it, the explicit provider registry
panicked on startup because those agents had no concrete factory, and the
migration manifest still listed them as legacy-only against a manifest that no
longer defines that mode.

Give each a concrete provider so the zero-legacy registry stays intact:

- OMP shares Pi's JSONL session format, so the Pi provider is parameterized by
  AgentDef (type and ID prefix) and serves both Pi and OhMyPi; ParseOMPSession
  is folded away.
- Reasonix gets a single-file provider whose composite fingerprint folds the
  .jsonl.meta sidecar (mirroring reasonixEffectiveInfo) and whose changed-path
  classifier reproduces the project/global/archive/subagent layouts and the
  sidecar-to-transcript mapping.
- Aider gets a multi-session provider that fans one history file out into one
  session per run under "<history>#<idx>" virtual paths and force-replaces on
  parse, mirroring the Shelley shape. Per-run skip is handled by
  dropUnchangedSharedSQLiteResults (content-hash compare); remote-sync identity
  stability is preserved by threading the path rewriter through ProviderConfig
  so per-run IDs stay stable across temp extraction dirs.

The three manifest entries flip to provider-authoritative and the legacy engine
methods (processAider, processReasonix, aiderFileUnchanged, aiderIdentityPath,
classifyAiderPath) plus the now-dead legacy processFile fall-through are
removed.

The two codex append regression tests that were re-pointed onto processFile no
longer consume their re-stat result; drop the unused assignment to satisfy
staticcheck.
@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 4f27d46 to 39a2870 Compare June 23, 2026 23:56
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 9671172 to 245482d Compare June 23, 2026 23:56
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (245482d)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 9m51s), codex_security (codex/security, done, 4m54s) | Total: 14m45s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant