Skip to content

feat(parser): migrate amp and zencoder providers#759

Draft
mariusvniekerk wants to merge 1 commit into
provider-deepseek-tuifrom
provider-amp-zencoder
Draft

feat(parser): migrate amp and zencoder providers#759
mariusvniekerk wants to merge 1 commit into
provider-deepseek-tuifrom
provider-amp-zencoder

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Amp and Zencoder both use shallow session-file roots, so migrating them together keeps the provider stack moving without introducing another source helper.

The concrete providers preserve legacy filename filters, raw/full ID lookup, deleted-path classification, fingerprint propagation, and parse normalization while continuing to compose the shared JSON source mechanics explicitly.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (c9fdb0b)

PR needs changes: one medium issue must be fixed before merge.

Medium

  • internal/parser/amp_provider.go:111 and internal/parser/zencoder_provider.go:112 - The new source sets do not enable hashing, so provider.Fingerprint() returns an empty Hash. Since Parse only copies req.Fingerprint.Hash, the normal provider flow would persist an empty Session.File.Hash, unlike the existing Amp/Zencoder sync paths that compute and store ComputeFileHash. The current tests mask this by injecting a fake hash directly. Set Hash: true in both JSONLSourceSetOptions and update the parse tests to use provider.Fingerprint().

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 4m4s), codex_security (codex/security, done, 1m40s) | Total: 5m50s

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (172d38f)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 5m24s), codex_security (codex/security, done, 1m3s) | Total: 6m27s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (a9f62c4)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@mariusvniekerk mariusvniekerk force-pushed the provider-deepseek-tui branch from 808b17a to 4b11d8c Compare June 21, 2026 00:40
@mariusvniekerk mariusvniekerk force-pushed the provider-amp-zencoder branch from a9f62c4 to 870586d Compare June 21, 2026 00:40
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (870586d)

Summary verdict: One medium issue should be fixed before merge.

Medium

  • internal/parser/amp_provider.go:111 and internal/parser/zencoder_provider.go:112
    The new providers populate Session.File.Hash from req.Fingerprint.Hash, but their JSONLSourceSetOptions do not enable hashing. Because these agents are in shadow-compare mode, provider observations will omit the hash while the legacy Amp/Zencoder processors compute and store it, causing shadow parity mismatches and potentially dropping the hash if these providers become authoritative.

    Fix: Add Hash: true to both source set option blocks, and update parity tests to compare against the hashed legacy process output.


Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 5m23s), codex_security (codex/security, done, 3m0s) | Total: 8m29s

@mariusvniekerk mariusvniekerk force-pushed the provider-deepseek-tui branch from 4b11d8c to 282fb83 Compare June 21, 2026 01:35
@mariusvniekerk mariusvniekerk force-pushed the provider-amp-zencoder branch from 870586d to 3002195 Compare June 21, 2026 01:36
@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (3002195)

Summary verdict: One medium issue remains; no high or critical findings were reported.

Medium

  • internal/parser/amp_provider.go:111 and internal/parser/zencoder_provider.go:112: The new Amp and Zencoder source sets do not enable hashing, so provider.Fingerprint() returns an empty hash and Parse() leaves Session.File.Hash empty. The legacy sync paths compute and store sess.File.Hash, so shadow comparison will produce systematic session mismatches, while the new shadow tests miss this because they compare only raw parser output.
    • Fix: Set Hash: true in both source set options and update the shadow/parity tests to assert against the same hashed session state produced by the legacy sync path.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m32s), codex_security (codex/security, done, 23s) | Total: 7m2s

@mariusvniekerk mariusvniekerk force-pushed the provider-deepseek-tui branch from 282fb83 to adb7150 Compare June 21, 2026 01:46
@mariusvniekerk mariusvniekerk force-pushed the provider-amp-zencoder branch from 3002195 to 97d6ed3 Compare June 21, 2026 01:46
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (97d6ed3)

Provider migration has one medium issue to fix before merge.

Medium

  • internal/parser/amp_provider.go:111, internal/parser/zencoder_provider.go:112
    The new Amp and Zencoder providers do not enable Hash: true, so ObserveProviderSource produces an empty Session.File.Hash. The legacy processAmp and processZencoder paths always compute and populate the file hash, and shadow comparison uses reflect.DeepEqual on the whole session. As a result, every successful shadow comparison for these newly enabled providers will report a false session mismatch.

    Fix: Set Hash: true in both source sets, and update the shadow tests to assert parity including the observed fingerprint hash.


Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 3m57s), codex_security (codex/security, done, 1m9s) | Total: 5m12s

@mariusvniekerk mariusvniekerk force-pushed the provider-deepseek-tui branch from adb7150 to 990379a Compare June 23, 2026 23:55
@mariusvniekerk mariusvniekerk force-pushed the provider-amp-zencoder branch from 97d6ed3 to 4d6a1d3 Compare June 23, 2026 23:55
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (4d6a1d3)

Synthesis unavailable. Showing individual review outputs.

claude-code — default (done)

Review Findings

Severity: Low
Location: internal/parser/amp_provider.go / internal/parser/zencoder_provider.go (newAmpSourceSet / newZencoderSourceSet)
Problem: Legacy processAmp/processZencoder always computed a full content hash via ComputeFileHash and stored it in sess.File.Hash (→ file_hash). The new source sets do not set Hash: true, and JSONLSourceSet.Fingerprint only hashes when Hash is enabled, so in the production provider-authoritative path req.Fingerprint.Hash is empty and File.Hash is now persisted as empty for amp/zencoder. This is a deliberate-looking but inconsistent choice: the sibling migrations gptme and iflow do set Hash: true "to mirror the legacy sync path," while amp/zencoder (and the already-merged deepseek/commandcode) do not. I traced the only reader of the stored hash (GetFileHashByPath) to the Shelley skip path exclusively, which amp/zencoder never traverse — their skip is MTimeNS-based — so there is no functional skip regression, only emptier metadata (and a one-time re-push if file_hash participates in the PG push fingerprint).
Fix: None required functionally, but confirm the selective application of Hash: true (gptme/iflow yes; amp/zencoder/deepseek/commandcode no) is intentional rather than an oversight; if file_hash parity for these agents is desired, add Hash: true to the two new source sets.


Severity: Low
Location: internal/parser/amp.go:35-36 (and internal/parser/amp_test.go:647)
Problem: The comment "The session ID must match the filename so that FindAmpSourceFile can locate the file by ID" references FindAmpSourceFile, which this commit deletes. The reasoning is still valid (provider FindSource now matches by the filename-derived sessionID), but the named function no longer exists, which will mislead future readers.
Fix: Update the comments to reference the provider FindSource/SessionIDFromPath lookup instead of the removed function.

Summary

Migrates the Amp and Zencoder parsers to the provider facade (new ampProvider/zencoderProvider, registry/factory/migration-mode updates, removal of legacy discover/find/process entrypoints) and consolidates the JSONLSourceSet symlink-stat helpers; the source-set refactor and lookup/validation flow are behavior-preserving and well covered by tests.


claude-code — security (done)

I've now verified the key security-relevant aspects of this change. Let me confirm my analysis of the three areas that could matter: session-ID path traversal in FindSource, the symlink-following change, and the removed engine paths.

The findings from my review:

1. FindSource session-ID resolution (path traversal): The new Amp/Zencoder providers route lookups through the shared JSONLSourceSet.FindSource (jsonl_source_set.go:157-196). This is safer than the legacy FindAmpSourceFile/FindZencoderSourceFile, which constructed a path via filepath.Join(dir, id+".json") from the raw ID. The new path (a) validates req.RawSessionID with IsValidSessionID (line 175), and (b) never constructs a path from the ID — it enumerates already-discovered, root-confined files via Discover and matches each file's computed session ID against the request (lines 183-193). No attacker-controlled ID can escape the configured roots. The diff's own TestAmpProviderFindsSourceFile "Validation" subtest confirms "../bad" is rejected.

2. FollowSymlinkFiles: true (new for amp/zencoder/commandcode/gptme/iflow): This restores parity with legacy discovery, which iterated os.ReadDir and matched on filename only (entry.IsDir() returns false for symlinks-to-files, so they were already accepted and parsed). The symlink target is read and displayed to the same local user who owns the session directories (~/.local/share/amp/threads, ~/.zencoder/sessions, etc.). Planting a malicious symlink there requires write access to the user's own home directory — an actor who already holds the user's privileges. No less-privileged actor and no trust boundary is crossed; the data is rendered back to its owner. Per the project threat model this is not a privilege escalation.

3. sourceFileInfo refactor (jsonl_source_set.go:303-319): The consolidation of sourceFileInfo/sourcePathInfo preserves semantics — os.Lstat for the entry, accept regular files, and only os.Stat-through-symlink when FollowSymlinkFiles is set. Discovery and changed-path resolution remain root-confined via pathAllowedByRoot/pathIncluded. The removed classifyOnePath/processAmp/processZencoder engine branches are replaced by the provider's own root-confined SourcesForChangedPath/Parse.

This is a mechanical migration to the provider facade plus a deliberate symlink-parity restoration. No new attacker-reachable input, no auth/authz change, no injection sink, no schema/PG-parity change, and no path-traversal regression.

No issues found.

Amp and Zencoder both use shallow session-file roots, so migrating them together keeps the provider stack moving without introducing another source helper.

The concrete providers preserve legacy filename filters, raw/full ID lookup, deleted-path classification, fingerprint propagation, and parse normalization while continuing to compose the shared JSON source mechanics explicitly.

fix(parser): preserve JSONL symlink file sources

Migrated providers are intended to preserve legacy source discovery while moving behind the provider facade. Several legacy JSON/JSONL discoveries accepted matching symlinked session files and the parsers read through those symlink targets, so the shared source helper needs an explicit opt-in for that source shape instead of treating every symlink as non-regular metadata.

This keeps the default helper behavior strict while allowing shallow and directory JSONL providers to opt into the compatibility path they already had before the migration.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(Amp|Zencoder)ProviderSourceMethodsFollowSymlinkedSessionFile' -count=1; go test -tags "fts5" ./internal/parser -run 'Test(Amp|Zencoder|DeepSeekTUI)ProviderSourceMethodsFollowSymlinkedSessionFile|Test(CommandCode|Iflow)ProviderDiscoversSymlinkedProjectDirectory|TestGptmeProvider' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; make test-short; make nilaway; git diff --check

test(parser): opt amp zencoder into provider shadow

Amp and Zencoder now have concrete facade providers on this branch, so their migration modes should fail closed through the shared shadow-compare harness instead of leaving those implementations additive.

Earlier provider opt-ins stay inherited from lower stack branches, and later provider families remain legacy-only until their own branches introduce concrete providers.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare amp zencoder shadow parity

Amp and Zencoder are shadow-compared on this branch, so add source-level migration tests that run ObserveProviderSource and compare provider output to the legacy ParseAmpSession and ParseZencoderSession functions.

Validation: go fmt ./...; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold amp zencoder into providers

Amp and Zencoder should stop carrying two public parser shapes once their concrete providers exist. Keeping exported parser entrypoints and legacy sync dispatch made this branch additive instead of a real migration.

Make both providers authoritative, move parsing behind provider methods, remove source callbacks and engine dispatch, and replace shadow-baseline tests with provider API coverage plus guards that the old symbols stay gone.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync ./cmd/agentsview -count=1; go vet ./...; git diff --check

fix(parser): preserve amp zencoder file hashes

Amp and Zencoder legacy sync stored the source content hash, but the migrated providers did not request hashed source fingerprints. Provider-authoritative writes would therefore clear file_hash when running through the real provider path.\n\nEnable source hashing for both providers and update their provider tests to exercise Fingerprint -> Parse instead of passing manually injected hashes.\n\nValidation: go test -tags "fts5" ./internal/parser -run 'Test(Amp|Zencoder)ProviderParse' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check
@mariusvniekerk mariusvniekerk force-pushed the provider-deepseek-tui branch from 990379a to 49d1447 Compare June 25, 2026 05:47
@mariusvniekerk mariusvniekerk force-pushed the provider-amp-zencoder branch from 4d6a1d3 to d9a938d Compare June 25, 2026 05:48
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (d9a938d)

Medium: Removing the Amp/Zencoder legacy discovery hooks breaks still-legacy consumers.

  • internal/parser/types.go:235
  • internal/parser/types.go:244

DiscoverFunc and FindSourceFunc were removed for Amp/Zencoder, which makes file-backed agents appear unsupported to consumers that are not yet provider-aware. Impact includes parse-diff --agent amp|zencoder errors, all-agent parse-diff skipping them, SSH remote resolve omitting their default dirs, and token-use raw disk probing failing to resolve unsynced Amp/Zencoder IDs.

Suggested fix: keep the legacy discovery/source lookup wrappers until all consumers are provider-aware, or update parse-diff, SSH resolve, and token-use to use the concrete provider facade for provider-authoritative file-backed agents.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m54s), codex_security (codex/security, done, 1m18s) | Total: 8m19s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant