Skip to content

feat(parser): migrate deepseek tui provider#758

Draft
mariusvniekerk wants to merge 1 commit into
provider-gptmefrom
provider-deepseek-tui
Draft

feat(parser): migrate deepseek tui provider#758
mariusvniekerk wants to merge 1 commit into
provider-gptmefrom
provider-deepseek-tui

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

DeepSeek TUI has a shallow one-file-per-session JSON layout, so moving it next keeps the provider migration incremental while exercising the JSON source helper with non-JSONL extensions.

The provider preserves legacy discovery filters for latest and offline queue files, raw/full ID lookup, changed-path classification, fingerprint propagation, and parse normalization without changing runtime sync dispatch.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (b601036)

Summary verdict: One medium-severity issue remains; no high or critical findings were reported.

Medium

  • internal/parser/deepseek_tui_provider.go:111
    The new source set omits Hash: true, so provider.Fingerprint will never populate SourceFingerprint.Hash. Parse only copies req.Fingerprint.Hash, which means the migrated DeepSeek provider will stop persisting the file_hash that the existing DeepSeek sync path computes.
    Fix: Enable hashing in JSONLSourceSetOptions for DeepSeek TUI, and update the provider test to call Fingerprint and verify the parsed session receives that hash.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 4m0s), codex_security (codex/security, done, 1m19s) | Total: 5m25s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (808b17a)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 4m5s), codex_security (claude-code/security, done, 41s) | Total: 4m46s

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (4b11d8c)

Summary verdict: One medium issue remains; no critical or high findings were reported.

Medium

  • internal/parser/deepseek_tui_provider.go:111: The new DeepSeek TUI provider never requests a content hash, but Parse only preserves sess.File.Hash when req.Fingerprint.Hash is populated. The legacy DeepSeek path computes file_hash plus inode/device metadata, so ProviderMigrationShadowCompare can report mismatches for successful parses and would drop that metadata if this provider becomes authoritative.
    • Fix: Enable hashing for the DeepSeek source set and carry file identity through the provider fingerprint/parse path, or otherwise make the provider output match the legacy processDeepSeekTUI session metadata. Add a caller-level shadow comparison test that exercises the real legacy process result.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 6m31s), codex_security (codex/security, done, 1m9s) | Total: 7m48s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (282fb83)

Summary verdict: one medium issue remains; no high or critical findings were reported.

Medium

  • internal/parser/deepseek_tui_provider.go:95 - The new DeepSeek TUI provider does not populate File.Hash, File.Inode, or File.Device the way the legacy processDeepSeekTUI path does. newDeepSeekTUISourceSet also does not enable hashing, so shadow compare will report session mismatches for valid DeepSeek files when this mode is enabled. Fix by computing/copying the same file hash and identity metadata as the legacy path, and add an engine-level shadow comparison test that asserts no mismatches for DeepSeek TUI.

Panel: ci_default_security | Synthesis: codex, 13s | Members: codex_default (codex/default, done, 5m23s), codex_security (codex/security, done, 1m41s) | Total: 7m17s

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (adb7150)

Summary verdict: one medium issue remains; no high or critical findings were reported.

Medium

  • internal/parser/deepseek_tui_provider.go:95
    The new provider does not preserve DeepSeek TUI file metadata parity with the legacy sync path. The legacy processDeepSeekTUI populates Session.File.Hash, Inode, and Device, but real provider parses only copy req.Fingerprint.Hash, and the source set is not configured to compute that hash. In shadow mode this can report mismatches for otherwise equivalent sessions, and provider-authoritative mode would stop persisting those fields.

    Fix: Populate the provider fingerprint/parse result with the same hash and file identity fields as the legacy path, and add a shadow comparison test that compares against the full legacy process result rather than only ParseDeepSeekTUISession.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 5m13s), codex_security (codex/security, done, 45s) | Total: 6m5s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (990379a)

Synthesis unavailable. Showing individual review outputs.

claude-code — default (done)

Review Findings

Severity: Medium
Location: internal/parser/deepseek_tui_provider.go, newDeepSeekTUISourceSet (JSONLSourceSetOptions block)
Problem: The migration drops the persisted per-session content hash. The legacy processDeepSeekTUI always computed ComputeFileHash(file.Path) and stored it in sess.File.Hash. In the new provider, Parse only sets sess.File.Hash = req.Fingerprint.Hash when that value is non-empty, and the fingerprint's Hash is populated only when the source set enables Hash: true (see JSONLSourceSet.Fingerprint). newDeepSeekTUISourceSet does not set Hash: true, so Fingerprint.Hash is always "" in the real engine path and DeepSeek TUI sessions are now persisted with an empty file_hash. This diverges from the sibling JSONL providers migrated in the same stack: newIflowSourceSet and newGptmeSourceSet both set Hash: true, with iflow even commenting that it "mirrors the legacy sync path, which set the session file hash from a full content hash of the transcript." The TestDeepSeekTUIProviderParse test masks this because it injects Fingerprint{Hash: "abc123"} manually rather than going through provider.Fingerprint. Practical impact is limited (DeepSeek skip/change-detection is mtime+size based, not hash based, so updates are not missed), but it silently stops populating a DB column that was previously written and is still written for other agents, and will cause a one-time PG re-push churn for existing DeepSeek sessions as their fingerprint flips to the empty hash.
Fix: Add Hash: true to the JSONLSourceSetOptions in newDeepSeekTUISourceSet, matching iflow/gptme; optionally extend TestDeepSeekTUIProviderParse to assert provider.Fingerprint(...) returns a non-empty hash so the regression is caught without manually supplying one.

Summary

Migrates the DeepSeek TUI agent from legacy discover/find/parse functions to the Provider facade backed by JSONLSourceSet (adding a FollowSymlinkFiles option), but omits Hash: true, dropping the persisted session content hash that the legacy path and sibling provider migrations preserve.


claude-code — security (done)

I've reviewed the diff and traced the security-relevant control flow into the implementations not shown in the diff (JSONLSourceSet.FindSource, sourceForPath, and the symlink-resolution helpers).

This commit migrates the DeepSeek TUI parser from standalone discovery/find/parse functions to the Provider/JSONLSourceSet facade. I focused on the two security-relevant surfaces:

Path-traversal on session-ID lookup (the one attacker-influenceable input path). The legacy FindDeepSeekTUISourceFile guarded against traversal with IsValidSessionID(rawID) before filepath.Join(root, rawID+".json"). The replacement (JSONLSourceSet.FindSource, lines 175–193) is strictly stronger: it still validates RawSessionID with IsValidSessionID, and rather than joining the raw ID into a path, it only equality-matches the ID against already-discovered sources under configured roots. StoredFilePath/FingerprintKey lookups go through sourceForPath, which constrains the path with pathAllowedByRoot (must be under a configured root) and pathIncluded. So even an authenticated remote-mode caller passing a crafted session ID cannot escape the configured roots. The retained ../session_123 → not-found test confirms parity. Protection preserved.

New FollowSymlinkFiles option. sourceFileInfo/sourcePathInfo resolve symlinks-to-files via os.Stat, so a symlink under the sessions dir (e.g. evil.json -> /etc/passwd) would be treated as a source and read by parseSession. This matches legacy behavior: the old DiscoverDeepSeekTUISessions used entry.IsDir() (which reports false for symlinks) and then os.ReadFile followed the link at parse time. Planting such a symlink requires write access to the user-owned ~/.deepseek/sessions directory, which already implies same-user privileges; the remote API exposes no way to create it. Per the project threat model (local same-user access is not an attacker boundary; TOCTOU/symlinks on user-owned local paths are out of scope), and because the behavior is unchanged from the legacy path, this is not a new boundary crossing.

No SQL/command/template sinks, secrets, auth decisions, or privileged operations are touched; parsing is gjson-based with no eval, and IsValidSessionID still gates the stored ID prefix.

No issues found.

DeepSeek TUI has a shallow one-file-per-session JSON layout, so moving it next keeps the provider migration incremental while exercising the JSON source helper with non-JSONL extensions.

The provider preserves legacy discovery filters for latest and offline queue files, raw/full ID lookup, changed-path classification, fingerprint propagation, and parse normalization without changing runtime sync dispatch.

fix(parser): preserve deepseek tui symlink files

DeepSeek TUI legacy lookup and parsing followed direct symlinks to session JSON files, so the facade provider needs an explicit way to preserve that source shape instead of silently dropping linked archives.

The JSONL source helper keeps symlink-file following opt-in, DeepSeek TUI enables it, and the branch manifest opts the concrete provider into shadow comparison so the migration is exercised rather than additive.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(DeepSeekTUIProvider|JSONLSourceSet|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(parser): skip deepseek symlink test when unsupported

Some test environments deny symlink creation even though the provider behavior is valid when links are available. The regression should skip in that environment instead of failing for host permissions.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(DeepSeekTUIProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare deepseek tui shadow parity

DeepSeek TUI is shadow-compared on this branch, so add the shared source-level proof that provider observation matches the existing ParseDeepSeekTUISession output. This keeps the branch review focused on an actual migration surface rather than only provider-local parser tests.

Validation: go fmt ./...; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold deepseek tui into provider

DeepSeek TUI should have one maintained parser shape on this branch. Leaving exported discover, lookup, and parse functions beside the concrete provider kept the migration additive and forced sync to preserve a second dispatch path.

Make the concrete provider authoritative, move parsing onto the provider, remove the AgentDef legacy hooks and engine dispatch, and replace shadow-baseline tests with provider API coverage plus a guard that the old symbols stay gone.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync ./cmd/agentsview -count=1; go vet ./...; git diff --check

fix(parser): preserve deepseek tui file hash

DeepSeek TUI legacy sync stored the transcript content hash, but the migrated provider source set did not request hashing, so provider-authoritative Parse left Session.File.Hash empty when using the real Fingerprint path.\n\nEnable source hashing for DeepSeek TUI and make the provider parse test use Fingerprint -> Parse to assert the persisted file_hash value comes from the session JSON content.\n\nValidation: go test -tags "fts5" ./internal/parser -run TestDeepSeekTUIProvider -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (49d1447)

Medium issue found: DeepSeek TUI parse-diff coverage regressed; no security findings.

Medium

  • internal/parser/types.go:365 and internal/sync/parsediff.go:217
    DeepSeek TUI no longer has a DiscoverFunc, but parse-diff still only discovers and accepts file-based agents with DiscoverFunc. As a result, --agent deepseek-tui will be rejected as having no on-disk source, and unrestricted parse-diff will skip existing DeepSeek sessions as “not discovered” instead of comparing them.
    Fix: Teach parse-diff to include provider-authoritative agents and build provider-backed work items, or keep the legacy discovery hook until parse-diff is provider-aware. Add parse-diff coverage for DeepSeek TUI/provider-authoritative sources.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m18s), codex_security (codex/security, done, 1m17s) | Total: 7m42s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant