Skip to content

feat(parser): migrate pi provider#760

Draft
mariusvniekerk wants to merge 1 commit into
provider-amp-zencoderfrom
provider-pi
Draft

feat(parser): migrate pi provider#760
mariusvniekerk wants to merge 1 commit into
provider-amp-zencoderfrom
provider-pi

Conversation

@mariusvniekerk

@mariusvniekerk mariusvniekerk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Pi is the next JSONL-shaped parser that can move behind the provider facade without introducing a new source framework. Its source layout composes the directory JSONL helper, but keeps provider-owned filtering because legacy discovery validates the session header while raw session lookup only checks the expected filename under encoded-cwd directories.

This keeps that discovery-versus-lookup asymmetry explicit in the provider and preserves symlinked encoded-cwd directory support while parse output continues to come from the existing Pi parser.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (b027db9)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m9s), codex_security (codex/security, done, 1m17s) | Total: 3m26s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (5a4c47f)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (93fe904)

Summary: One medium issue remains in the Pi provider migration; no security findings were reported.

Medium

  • internal/parser/pi_provider.go:189 - Pi is now opted into shadow comparison, but the provider fingerprint never includes a file hash. piProvider.Parse only copies req.Fingerprint.Hash, while the legacy processPi path computes and stores Session.File.Hash, so real Pi shadow comparisons will report a session mismatch that the new test misses.

    Fix: Set Hash: true in the Pi JSONLSourceSetOptions and update the shadow parity test to compare against the legacy sync result or otherwise assert the expected file hash.


Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 4m2s), codex_security (codex/security, done, 1m41s) | Total: 5m54s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (28e3ce8)

This PR has one medium-severity issue to address before merge.

Medium

  • internal/parser/pi_provider.go:189: The Pi provider fingerprint is configured without Hash: true, so Parse receives an empty Fingerprint.Hash and leaves Session.File.Hash empty. The legacy processPi path computes and stores ComputeFileHash, and shadow comparison compares the full ParsedSession, so every Pi shadow observation will report a session mismatch. A future provider-authoritative switch would also drop the stored hash.

    Fix: Add Hash: true to the Pi JSONLSourceSetOptions and update the Pi shadow test to compare against legacy process behavior or set and verify the expected hash from observation.Fingerprint.Hash.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 3m19s), codex_security (codex/security, done, 1m46s) | Total: 5m12s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (66756c4)

Summary verdict: Changes are not ready; the Pi provider migration appears to break OMP discovery and parse-diff support.

High

  • Location: internal/parser/types.go:317
  • Problem: OMP remains ProviderMigrationLegacyOnly, but its legacy DiscoverFunc/FindSourceFunc were removed and the sync engine no longer has an OMP/Pi legacy processing case. That leaves OhMyPi sessions undiscoverable in full sync and changed-path sync, and any OMP DiscoveredFile would fall through to unknown agent type.
  • Fix: Restore the OMP legacy discovery/find/process path, or add a concrete OMP provider that uses AgentOMP/omp: and move OMP to provider-authoritative mode.

Medium

  • Location: internal/parser/types.go:307
  • Problem: Removing Pi’s DiscoverFunc also removes Pi from parse-diff, which still only supports file-based agents with DiscoverFunc. parse-diff --agent pi will now be rejected, and unrestricted parse-diff runs will classify stored Pi sessions as not discovered instead of re-parsing them.
  • Fix: Make parse-diff discover and process provider-authoritative sources, and update the supported-agent checks, or keep Pi’s legacy discovery hook until parse-diff is provider-aware.

Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 3m3s), codex_security (codex/security, done, 2m52s) | Total: 6m5s

Pi is the next JSONL-shaped parser that can move behind the provider facade without introducing a new source framework. Its source layout is still simple enough to compose the directory JSONL helper, but it needs provider-owned filtering because legacy discovery validates the session header while raw session lookup only checks the expected filename under encoded-cwd directories.

This keeps that discovery-versus-lookup asymmetry explicit in the provider and preserves symlinked encoded-cwd directory support while parse output continues to come from the existing Pi parser.

Validation: go test -tags "fts5" ./internal/parser -run TestPiProvider -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; make test-short; git diff --check

fix(parser): preserve pi header-based discovery

Pi discovery has historically treated the filename as source shape only: any one-level JSONL file under an encoded-cwd directory can be a session if its header has type=session. The provider migration accidentally applied raw session ID filename validation before header validation, which would drop valid files whose session ID comes from the header instead of the filename.

Raw-ID lookup still validates the requested ID before reconstructing <id>.jsonl, so the legacy discovery-versus-lookup asymmetry remains explicit without broadening lookup inputs.

Validation: go test -tags "fts5" ./internal/parser -run TestPiProviderDiscoveryAcceptsSessionHeaderInNonSessionIDFilename -count=1; go test -tags "fts5" ./internal/parser -run 'TestPiProvider(DiscoveryAcceptsSessionHeaderInNonSessionIDFilename|SourceMethods|Parse|DiscoversSymlinkedCWDDirectory|FactoryReplacesLegacyAdapter)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; make test-short; git diff --check

test(parser): opt pi into provider shadow

Pi now has a concrete facade provider on this branch, so its migration mode should enter the shared shadow-compare harness instead of remaining an additive implementation behind legacy-only dispatch.

The stack keeps lower provider opt-ins inherited and leaves later provider branches legacy-only until their own migrations land.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare pi shadow parity

Pi is shadow-compared on this branch, so add the shared source-level proof that provider observation matches ParsePiSession output for a representative session file.

Validation: go fmt ./...; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold pi into provider

Pi should not keep exported parser and source callback APIs after its concrete provider exists. Removing those hooks also exposed that full sync and single-session lookup still assumed AgentDef callbacks, so provider-authoritative agents were not actually runnable without legacy callbacks.

Move Pi parsing behind the provider, remove its legacy discovery and sync dispatch, add provider discovery and provider lookup to the sync root path, and replace shadow-baseline coverage with provider API tests plus a guard that the old symbols stay gone.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync ./cmd/agentsview -count=1; go vet ./...; git diff --check

fix(parser): preserve pi family provider capabilities

OMP shared the Pi on-disk format but was left legacy-only after the legacy registry hooks were removed, so full sync and changed-path sync could no longer reach it through the migrated provider path. Parse-diff had the same shape of regression for provider-authoritative agents because it only trusted AgentDef discovery callbacks.\n\nFold OMP into the concrete Pi-family provider, derive parse identity from the provider definition, and teach parse-diff plus CLI validation to accept provider-authoritative on-disk sources. This keeps the branch as an actual migration rather than a shim around removed legacy functions.\n\nValidation: go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./cmd/agentsview -run 'TestParseDiff' -count=1; go test -tags "fts5" ./internal/sync -run 'Test(ParseDiff|OMPSyncAllAndChangedPathUseProvider)' -count=1; go test -tags "fts5" ./internal/sync -count=1; go vet ./...; git diff --check

fix(parser): thread ctx through pi source lookups
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (c3da94b)

Medium-risk migration regressions need fixing before merge.

Medium

  • internal/parser/pi_provider.go:197: The new Pi/OMP source set does not request content hashes, so normal provider sync leaves req.Fingerprint.Hash empty and piProvider.Parse stores no file_hash. Resyncing existing Pi/OMP rows will clear hashes that the legacy processPi path populated.
    Fix: Set Hash: true in newPiSourceSet or compute the file hash before returning the parse result, and assert stored file_hash in integration coverage.

  • internal/parser/provider_migration.go:35: Moving Pi/OMP to the provider-authoritative path drops the legacy processPi size/mtime/data-version skip. processProviderFile only checks the skipped-files cache before parsing, so unchanged Pi/OMP sessions are reparsed and rewritten on every full sync instead of being counted as skipped.
    Fix: Add provider-path DB skip logic using the provider fingerprint and stored file metadata before calling Parse, or keep Pi/OMP on the legacy path until skip parity exists.

  • internal/parser/types.go:308: Pi/OMP now have nil DiscoverFunc/FindSourceFunc, but existing call sites still depend on those hooks: internal/ssh/resolve.go filters remote transfer targets by DiscoverFunc, and cmd/agentsview/token_use.go resolves unsynced disk IDs via FindSourceFunc. This regresses remote SSH sync and on-demand session usage/token-use resolution for Pi/OMP sessions.
    Fix: Update those consumers to use provider-aware discoverability/source lookup for provider-authoritative file agents, or retain compatibility hooks until all callers are migrated.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 7m13s), codex_security (codex/security, done, 2m19s) | Total: 9m42s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant