Skip to content

feat(parser): migrate commandcode and iflow providers#757

Draft
mariusvniekerk wants to merge 1 commit into
provider-directory-jsonl-source-setfrom
provider-commandcode-iflow
Draft

feat(parser): migrate commandcode and iflow providers#757
mariusvniekerk wants to merge 1 commit into
provider-directory-jsonl-source-setfrom
provider-commandcode-iflow

Conversation

@mariusvniekerk

@mariusvniekerk mariusvniekerk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Migrates Command Code and iFlow from the transitional legacy provider adapter to concrete provider facade implementations. Both providers share the directory-shaped JSONL source helper while still keeping parser-specific filters, source IDs, project hints, fingerprinting, and parse normalization explicit in their own files.

The slice keeps legacy behavior for deleted-path classification, full-ID and raw-ID lookup, source fingerprints, parse output, and symlinked project-directory discovery. Command Code and iFlow are grouped together because they exercise the same //.jsonl layout without adding nested or composite provider complexity.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (7a75a70)

Summary verdict: Two medium-risk parity regressions should be fixed before merge.

Medium

  • internal/parser/commandcode_provider.go:57
    The migrated Command Code provider delegates change classification, watch planning, and fingerprinting to the generic JSONL source set, so <session>.meta.json sidecar changes are ignored even though the parser reads that file for title/cwd metadata. This regresses legacy behavior that routed meta changes to the transcript and folded sidecar size/mtime into freshness.
    Fix: Add Command Code-specific handling that maps .meta.json changes to the sibling .jsonl, watches/classifies those files, and includes sidecar size/mtime in the provider fingerprint and parsed File metadata.

  • internal/parser/iflow_provider.go:87
    The iFlow provider now passes the raw project directory name directly into ParseIflowSession, dropping legacy project normalization and cwd/git-branch inference from processIflow. Encoded project directories or sessions whose JSONL includes a better cwd hint will now be grouped under the wrong project.
    Fix: Mirror the legacy logic in the provider: start with GetProjectName for the directory hint, then use ExtractIflowProjectHints and ExtractProjectFromCwdWithBranchContext(ctx, cwd, gitBranch) when available.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 3m59s), codex_security (codex/security, done, 1m56s) | Total: 6m5s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (ce4ee0b)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 5m17s), codex_security (claude-code/security, done, 2m27s) | Total: 7m44s

@mariusvniekerk mariusvniekerk force-pushed the provider-directory-jsonl-source-set branch from 55cfa0e to 5f1b70c Compare June 21, 2026 00:40
@mariusvniekerk mariusvniekerk force-pushed the provider-commandcode-iflow branch from ce4ee0b to a58099b Compare June 21, 2026 00:40
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (a58099b)

Summary verdict: Provider implementation has medium-severity parity gaps; no security issues were reported.

Medium

  • Location: internal/parser/commandcode_provider.go:115, internal/parser/iflow_provider.go:122
    Problem: The new source sets never enable hashing, so provider fingerprints have an empty Hash. The provider parse path only copies req.Fingerprint.Hash, while the legacy engine stores a SHA-256 Session.File.Hash for both agents. This causes every shadow comparison to differ and could eventually clear hashes if the provider path becomes authoritative.
    Fix: Enable Hash: true for both JSONL source sets or compute the same SHA-256 hash before parsing.

  • Location: internal/parser/commandcode_provider.go:71
    Problem: Command Code fingerprints and parsed FileInfo only use the .jsonl file, but the legacy path folds the sibling .meta.json into size/mtime and classifies meta-only changes back to the transcript. Sessions with metadata will shadow-mismatch on File.Size/File.Mtime, and a future authoritative provider can miss title/cwd updates when only metadata changes.
    Fix: Mirror commandCodeEffectiveInfo in the provider fingerprint/parse result and classify/watch .meta.json sidecar changes for the owning session.

  • Location: internal/parser/iflow_provider.go:87
    Problem: The iFlow provider uses the directory name directly as the project and returns parser results without the legacy GetProjectName, ExtractIflowProjectHints/cwd-based project resolution, or InferRelationshipTypes normalization. This makes shadow results diverge for common iFlow files that include cwd or parent session metadata.
    Fix: Mirror processIflow’s project resolution and relationship inference in iflowProvider.Parse.


Panel: ci_default_security | Synthesis: codex, 12s | Members: codex_default (codex/default, done, 5m2s), codex_security (codex/security, done, 2m38s) | Total: 7m52s

@mariusvniekerk mariusvniekerk force-pushed the provider-commandcode-iflow branch from a58099b to f98e057 Compare June 21, 2026 01:35
@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (f98e057)

Medium-risk parity issues remain in the new provider paths; no security findings were reported.

Medium

  • Location: internal/parser/commandcode_provider.go:115 and internal/parser/iflow_provider.go:122
    Problem: The new providers never enable JSONLSourceSetOptions.Hash, but their parsers only populate Session.File.Hash from req.Fingerprint.Hash. In real shadow mode, ObserveProviderSource gets that fingerprint from the provider, so hashes stay empty while the legacy paths compute and store SHA-256 hashes.
    Fix: Set Hash: true for both source sets, or otherwise compute the same file hash before returning parse results.

  • Location: internal/parser/iflow_provider.go:87
    Problem: iFlow provider parsing uses the directory name directly as the project, but the legacy sync path normalizes it with GetProjectName and then prefers ExtractIflowProjectHints plus ExtractProjectFromCwdWithBranchContext. This will produce different projects for common iFlow files and makes the shadow comparison unreliable.
    Fix: Mirror the legacy project derivation in the provider before calling ParseIflowSession.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 4m29s), codex_security (codex/security, done, 2m13s) | Total: 6m50s

@mariusvniekerk mariusvniekerk force-pushed the provider-commandcode-iflow branch from f98e057 to 21ad0bb Compare June 21, 2026 01:46
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (21ad0bb)

Medium severity findings:

  • internal/parser/iflow_provider.go:87 - The iFlow provider returns raw ParseIflowSession output, but the legacy sync path normalizes the project from cwd/git metadata, computes the file hash, and calls InferRelationshipTypes. Shadow comparison will report mismatches for normal iFlow sessions, and provider-authoritative mode would write different metadata. Reuse or extract the legacy iFlow post-processing into the provider path, including project derivation, hash population, and relationship inference.

  • internal/parser/commandcode_provider.go:70 - The Command Code provider fingerprints only the .jsonl file and does not hash by default, while the legacy path stores the transcript hash and treats the sibling .meta.json as part of the effective source size/mtime. This makes shadow comparisons mismatch and would miss or skip meta-only updates if made authoritative. Add Command Code-specific fingerprint and changed-path handling for .meta.json, compute the transcript hash, and mirror the legacy effective size/mtime fields.


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 4m33s), codex_security (codex/security, done, 1m21s) | Total: 6m3s

@mariusvniekerk mariusvniekerk force-pushed the provider-directory-jsonl-source-set branch from 5ef4189 to a302cd4 Compare June 23, 2026 23:55
@mariusvniekerk mariusvniekerk force-pushed the provider-commandcode-iflow branch from 21ad0bb to 95f661b Compare June 23, 2026 23:55
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (95f661b)

Synthesis unavailable. Showing individual review outputs.

claude-code — default (done)

Review Findings

Severity: Low
Location: internal/parser/commandcode_provider.go:230-249 (addCommandCodeFingerprintPart), consumed at commandcode_provider.go:155-158 (Parse)
Problem: The migrated Command Code provider stores a different value in sess.File.Hash than the legacy path did. Legacy processCommandCode set sess.File.Hash = ComputeFileHash(file.Path) — a pure SHA-256 of the transcript content. The new provider sets sess.File.Hash = req.Fingerprint.Hash, and that fingerprint is a composite digest built by addCommandCodeFingerprintPart, which folds the file basename, info.Size(), and info.ModTime().UnixNano() into the hashed input for both the transcript and the .meta.json companion. Because mtime is baked into the digest, the persisted file_hash now changes on a mtime-only touch with identical content, whereas the legacy content hash was stable. file_hash feeds change detection (internal/sessionwatch) and the PG push fingerprint (internal/postgres/push.go:1010), so touch-triggered reparses will now produce a "changed" hash and extra rewrites/re-pushes even when content is unchanged. This is not a correctness/data-loss bug (the skip-cache gates on mtime in both old and new paths, and a genuine content change is still detected on reparse), but it is a deliberate semantic divergence. Note that Command Code went straight from ProviderMigrationLegacyOnly to ProviderMigrationProviderAuthoritative (skipping ShadowCompare), so this File.Hash difference was never exercised by the dual-run reflect.DeepEqual session comparison. iFlow does not have this issue: it uses Hash: true on its source set, so its fingerprint hash equals hashJSONLSourceFile == ComputeFileHash, matching legacy.
Fix: If byte-for-byte parity with the legacy content hash is desired, set sess.File.Hash from a pure content hash of the transcript (e.g. reuse hashJSONLSourceFile(path)) rather than the composite fingerprint, and keep the size/mtime composite only for the freshness/skip identity. If the composite-in-file_hash behavior is intentional (it matches the established Shelley/OpenCode pattern), consider dropping info.ModTime().UnixNano() from the hashed parts so the stored hash stays content-stable, and confirm the extra PG re-pushes on no-op touches are acceptable.

Summary

Migrates the Command Code and iFlow agents from legacy AgentDef discover/find/parse hooks to provider-authoritative Provider implementations backed by DirectoryJSONLSourceSet, removing the corresponding legacy free functions and sync-engine branches; the wiring (discovery, changed-path classification, processing, and source lookup) is correct and well-tested, with one low-impact divergence in how Command Code computes the persisted file_hash.


claude-code — security (done)

I've completed my analysis of this migration diff. Let me summarize my security assessment.

Summary

This change migrates the Command Code and iFlow agent parsers from legacy DiscoverFunc/FindSourceFunc/Parse* free functions to the Provider abstraction backed by DirectoryJSONLSourceSet. It removes the corresponding legacy discovery/classification/processing code from discovery.go, engine.go, and the Registry entries, and adds commandcode_provider.go / iflow_provider.go plus helper files.

I focused on whether the migration weakens any path-handling or input-validation boundary, since session IDs and watched paths are the only externally/locally-influenced inputs here.

Key checks performed:

  • Session-ID path safety is preserved or improved. The legacy FindCommandCodeSourceFile/FindIflowSourceFile validated IDs with IsValidSessionID and then built a path via filepath.Join(..., rawID+".jsonl"). The new JSONLSourceSet.FindSource (jsonl_source_set.go:171) still gates on IsValidSessionID, and critically no longer constructs a path from the raw ID at all — it enumerates real discovered files and matches by computed session ID. IsValidSessionID (discovery.go:831) rejects everything except [A-Za-z0-9_-], so /, ., and .. cannot pass. No traversal is reachable through the find path.
  • Persisted-hint lookups stay root-confined. StoredFilePath/FingerprintKey resolution routes through sourceForPathpathAllowedByRoot (must be under a configured root) + IncludePath, so a tampered hint can't escape the configured roots.
  • sourceForMetaCompanion (new) is bounded. It validates the stem with IsValidSessionID, derives the transcript path, then re-validates it via sourceForPath, plus an explicit WatchRoot containment check.
  • Discovery filters match legacy semantics. isCommandCodeSourcePath keeps the .checkpoints/.prompts exclusions and IsValidSessionID check; isIflowSourcePath keeps the session- prefix requirement; isDirectoryJSONLPath still enforces the exact <root>/<project>/<file> two-component shape.
  • Symlink following (FollowSymlinkDirs: true) is preserved legacy behavior, and per the project trust model the local-filesystem/symlink/TOCTOU surface in ~/.commandcode / ~/.iflow is not an attacker boundary (home-dir write access already implies equivalent privilege).
  • No schema or session-model changes, so the PostgreSQL parity requirements do not apply to this diff.
  • Error strings (fmt.Errorf("stat %s: ...")) carry only local file paths into local sync logs on a single-user tool; not a cross-boundary leak.

The validation that mattered in the legacy code is intact, and the new find-source path is actually hardened (enumerate-and-match rather than construct-from-input). Nothing here opens a new attack surface across a real trust boundary.

No issues found.

Command Code and iFlow both fit the directory JSONL source shape, so moving them together proves the helper against real providers without mixing in nested layouts like Qwen or composite providers like WorkBuddy.

The providers keep source discovery, changed-path classification, persisted lookup, fingerprinting, and parse normalization behind concrete facade implementations while preserving the legacy parser functions for current runtime callers.

fix(parser): preserve JSONL provider symlink discovery

Command Code and iFlow legacy discovery followed symlinked project directories. The migrated providers should keep that behavior so users with linked project roots do not silently lose discovery or raw-session lookup after moving onto the provider facade.

test(parser): opt commandcode iflow into provider shadow

CommandCode and iFlow now have concrete facade providers on this branch, so keeping them legacy-only would let the migration branch remain additive instead of exercised by the shared provider harness.

This makes the migration manifest fail closed for the providers introduced here while leaving unrelated providers for their own stack branches.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare commandcode iflow shadow parity

Command Code and iFlow now opt into shadow comparison, so their provider branch should prove more than provider-local parsing. Add source-level migration tests that run ObserveProviderSource and compare the normalized provider output against the legacy parser functions for both agents.

Validation: go fmt ./...; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold commandcode and iflow into provider

Move Command Code and iFlow parse ownership onto their concrete
providers and delete the package-level discover/find/parse entrypoints
plus the legacy sync dispatch for both agents. Both agents become
provider-authoritative so runtime sync routes through provider
changed-path classification and processProviderFile instead of the
removed processCommandCode/processIflow methods.

Command Code:
- parseSession moves onto the provider; DiscoverCommandCodeSessions,
  FindCommandCodeSourceFile, and ParseCommandCodeSession are removed.
- The provider reproduces the legacy .meta.json companion behavior:
  WatchPlan includes *.meta.json, SourcesForChangedPath remaps a
  changed .meta.json back to its .jsonl transcript, the composite
  Fingerprint folds the companion size, mtime, and content into the
  freshness identity, and Parse overrides File.Size/File.Mtime with the
  combined transcript+meta effective info. commandCodeEffectiveInfo
  stays in the engine for the SourceMtime watcher fallback.

iFlow:
- parseSession moves onto the provider; DiscoverIflowProjects,
  FindIflowSourceFile, and ParseIflowSession are removed.
- Parse mirrors the legacy sync path: it resolves the project from the
  recorded cwd and git branch (falling back to GetProjectName of the
  project directory), applies InferRelationshipTypes to derive
  continuation/subagent links, and enables source content hashing so
  File.Hash matches the legacy ComputeFileHash value.

Tests move from the deleted free functions to provider API coverage,
add guard tests asserting the legacy entrypoints stay gone, drop the
shadow comparison test, and remove both provider files from the
pending-shim scan list.

fix(parser): preserve commandcode file hash parity

Command Code needs a composite provider fingerprint so metadata-only edits invalidate freshness, but that value should not replace the persisted transcript content hash. The legacy sync path stored the SHA-256 of the transcript file in file_hash, and changing that semantic would make metadata-only edits look like transcript content changes.\n\nKeep the composite value scoped to SourceFingerprint and recompute Session.File.Hash from the transcript during provider parse. The provider test now exercises Fingerprint -> Parse with a .meta.json companion to prove the two hashes remain distinct.\n\nValidation: go test -tags "fts5" ./internal/parser -run TestCommandCodeProvider -count=1; go test -tags "fts5" ./internal/parser -count=1; go test -tags "fts5" ./internal/sync -run 'Test.*CommandCode|Test.*Iflow' -count=1; go vet ./...; git diff --check

fix(parser): thread ctx through commandcode and iflow source lookups
@mariusvniekerk mariusvniekerk force-pushed the provider-directory-jsonl-source-set branch from a302cd4 to 6c48133 Compare June 25, 2026 05:47
@mariusvniekerk mariusvniekerk force-pushed the provider-commandcode-iflow branch from 95f661b to 1591073 Compare June 25, 2026 05:47
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (1591073)

Medium: iFlow and Command Code migration drops legacy discovery consumers.

  • Medium: internal/parser/types.go:257 and internal/parser/types.go:348
    iFlow and Command Code are still file-backed agents, but removing their DiscoverFunc / FindSourceFunc without migrating all legacy hook consumers drops them from parse-diff, SSH remote directory resolution, and token-use on-disk ID resolution. Default parse-diff will skip them, explicit --agent iflow / commandcode will be rejected, SSH sync will not transfer their directories, and unsynced on-disk IDs will not resolve.
    Fix: Update those callers to use provider discovery/lookup for provider-authoritative agents, or keep registry hook adapters that delegate to the new providers until every consumer is migrated.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m42s), codex_security (codex/security, done, 1m49s) | Total: 8m38s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant