Skip to content

Add parser provider facade core#751

Draft
mariusvniekerk wants to merge 1 commit into
locate-parser-interfacefrom
provider-facade-core
Draft

Add parser provider facade core#751
mariusvniekerk wants to merge 1 commit into
locate-parser-interfacefrom
provider-facade-core

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Adds the first implementation slice for the parser provider facade: the provider and factory contract, config snapshots, source/fingerprint/parse/incremental outcome types, zero-value ProviderBase optional methods, typed unsupported-feature errors, generated capability enum support, and a transitional factory registry that mirrors the current AgentDef registry.

This does not move sync dispatch or parser execution yet. The legacy factories intentionally return unsupported parse errors so later stacked branches can replace them provider group by provider group, with interface invariants documented at the package boundary for source identity, parse error semantics, and capability declarations.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (103d72b)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m11s), codex_security (codex/security, done, 13s) | Total: 2m24s

@mariusvniekerk mariusvniekerk force-pushed the locate-parser-interface branch from f8a39c8 to 204a879 Compare June 19, 2026 18:45
@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (b6abc79)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m46s), codex_security (codex/security, done, 10s) | Total: 2m56s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (63b5b72)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (120e40f)

High-level verdict: changes need fixes before merge due to one High and one Medium correctness risk.

High

  • internal/sync/engine.go:4004
    The new skip branch persists every processResult{skip: true} into the skip cache. Many skips are normal freshness skips after shouldSkipFile / shouldSkipByPath confirms the session is already current, not intentional parser skips. Once persisted, a later parser data-version bump can be bypassed by the early skip-cache check before data_version is revalidated, leaving unchanged sessions stale until their source mtime changes.
    Fix: Do not cache generic r.skip results. Add an explicit cacheable-skip flag/reason for intentional parser/provider skips, or mark freshness skips as non-cacheable.

Medium

  • internal/sync/engine.go:593
    Shadow-mode provider changed-path sources are appended to the live SyncPaths work list with ProviderProcess false, so they are processed by the legacy parser. Provider-only or virtual source paths from stored hints, such as state.sqlite3#session, can then fail legacy stat/parse and turn a shadow observation into a real sync failure.
    Fix: In shadow mode, only merge provider metadata onto paths already classified by the legacy path, or run provider-only changed-path observations outside the live worker list. Enqueue provider-only sources for live processing only in provider-authoritative mode.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 5m35s), codex_security (codex/security, done, 33s) | Total: 6m17s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (9cbce59)

Medium risk: the change appears to broaden skip-cache persistence and can weaken normal sync self-healing; no security issues were reported.

Medium

  • internal/sync/engine.go:4245-4248, interacting with internal/sync/engine.go:4610 and legacy process* skip returns such as internal/sync/engine.go:5134-5142

    The new r.skip branch in collectAndBatch persists a skip-cache entry via e.cacheSkip(r.skipCacheKey(), r.mtime). Previously this branch only continued.

    Legacy process* functions return processResult{skip: true} for already-up-to-date sessions, and processFile now unconditionally stamps res.cacheSkip = e.shouldCacheSkip(file) for many agents. This means unchanged legacy sessions can now be persisted into skipped_files, causing the table to grow with every unchanged session after subsequent syncs.

    It also weakens self-healing: once an unchanged session is skip-cached, the mtime short-circuit at internal/sync/engine.go:4512-4528 can fire before legacy DB-existence checks such as processClaude’s GetSession check. If a DB row disappears while its source file is unchanged, a normal sync may no longer re-add it; recovery would require a forced/full resync.

    Recommended fix: scope the new skip-branch caching to the provider path only, for example by gating it on the provider cache key or a dedicated flag set only by processProviderFile’s SkipReason return. If caching legacy up-to-date skips is intentional, add tests covering the new skipped_files contents and the DB self-healing behavior.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (claude-code/default, done, 11m27s), codex_security (claude-code/security, done, 1m22s) | Total: 12m59s

Introduce the Provider interface, ProviderBase/ProviderFactory, and source-set
helpers; own provider discovery and lookup at the root; and add the
legacy-call shim scan that gates provider files.

fix(parser): cover Aider, OMP, Reasonix in migration manifest

These agents live in the registry but were absent from the provider
migration manifest, so ValidateProviderMigrationModes failed once the
registry began enforcing that every agent has a mode. They remain on the
legacy path here; later stack commits migrate them to concrete providers
and flip these entries to provider-authoritative.

fix(sync): keep shadow provider discovery observational

Shadow provider mode must not add provider-only work or satisfy source lookups that the legacy runtime would miss. Otherwise a migration comparison can change live sync behavior before the provider becomes authoritative.\n\nProvider-authoritative discovery now reports discovery failures as sync failures and suppresses the provider completion watermark for that run, preserving the next incremental pass. The shim scan also keeps pending exemptions honest by failing stale entries while ignoring provider-owned selector methods.\n\nValidation: go test -tags "fts5" ./internal/parser -run 'TestProviderFilesDoNotReferenceLegacyEntrypoints' -count=1; go test -tags "fts5" ./internal/sync -run 'Test(DiscoverProviderSourcesOnlyRunsAuthoritativeProviders|SyncAllProviderDiscoveryFailureSkipsFinishedWatermark|FindSourceFileFallsBackToAuthoritativeNonFileProvider|ClassifyProviderChangedPath|ProcessFileShadow|ProcessFileProviderAuthoritative|ProviderVirtualSourceBackedByEvent)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check

docs(parser): clarify provider freshness contract

The facade spec still described successful parses as eligible for a clean skip-cache entry, which conflicts with the no-schema-change data-version model and can leave unchanged sessions stale after parser upgrades.\n\nDocument stored changed-path hints explicitly and keep successful unchanged-source freshness tied to DB metadata plus parser data version, reserving skipped_files for retry, failure, and explicit skip cases.\n\nValidation: go test -tags "fts5" ./internal/parser -run 'TestProviderFilesDoNotReferenceLegacyEntrypoints' -count=1; git diff --check. mdformat ran via commit hook.

docs(parser): pin provider source identity semantics

The facade contract needs to say exactly which provider source key is persisted because the migration intentionally avoids a schema change. Without that rule, providers could diverge between SourceRef, SourceFingerprint, and sessions.file_path identities.\n\nAlso define capability conformance by meaningful field presence so unsupported zero-value fields are treated consistently in provider tests.\n\nValidation: git diff --check. mdformat is unavailable on PATH, but the commit hook ran.

style(docs): mdformat provider dual-run harness plan
@mariusvniekerk mariusvniekerk force-pushed the locate-parser-interface branch from 3d2b79d to 833ccf4 Compare June 25, 2026 05:47
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (8bc8fef)

Summary verdict: Medium-risk issues remain in the provider migration/shadow sync path; no security-specific findings were reported.

Medium

  • internal/sync/engine.go:580
    Shadow-compare providers can add DiscoveredFiles into the live SyncPaths pipeline with ForceParse enabled and ProviderProcess false. This can let shadow-only changed-path classification trigger legacy parsing, DB writes, skip-cache changes, or failures for paths the legacy classifier would have ignored, violating the shadow mode invariant that provider work is side-effect-free.
    Fix: Do not append provider-classified files to the live file list in ProviderMigrationShadowCompare; keep changed-path shadow observations separate, or only use provider sources for comparison without changing the legacy work item.

  • internal/sync/engine.go:599
    classifyProviderChangedPath calls SourcesForChangedPath for every provider watch root without checking whether the changed path is actually under that root. Providers receive an unmatched WatchRoot plus unrelated stored hints, despite the request contract describing WatchRoot as the matched root. This can cause unrelated filesystem events to be classified into provider sources.
    Fix: Filter watch roots before fetching hints and calling the provider, using the same path/root descendant semantics as the rest of the sync code, with any explicit companion-path exceptions documented.

  • internal/sync/engine.go:738
    providerDiscoveredPath prefers DisplayPath over FingerprintKey, but DisplayPath is human-readable while FingerprintKey is the persisted freshness/source identity. When they differ, provider discovery and provider-backed source lookup can feed a display string into DiscoveredFile.Path and later FindSource fallback fields, breaking skip-cache keys, SyncSingleSession, and source lookup for virtual or composite providers.
    Fix: Use FingerprintKey as the engine identity first, then fall back to stable provider key/display only when no fingerprint key exists.


Panel: ci_default_security | Synthesis: codex, 12s | Members: codex_default (codex/default, done, 6m41s), codex_security (codex/security, done, 3m59s) | Total: 10m52s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant