Skip to content

feat(parser): migrate claw providers#766

Draft
mariusvniekerk wants to merge 1 commit into
provider-kimifrom
provider-openclaw-qclaw
Draft

feat(parser): migrate claw providers#766
mariusvniekerk wants to merge 1 commit into
provider-kimifrom
provider-openclaw-qclaw

Conversation

@mariusvniekerk

@mariusvniekerk mariusvniekerk commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

OpenClaw and QClaw now have concrete provider facades for their shared Claw-style session layout, where each agent directory owns a sessions folder containing active JSONL files and archived JSONL variants.

The providers preserve active-over-archive and newest-archive source selection, colon-delimited agent/session lookup, selected-source changed-path classification, symlinked agent directories, stale stored-path remapping, source fingerprinting, archive promotion on remove/rename-style events, and existing parse normalization while keeping the archive rules local to the Claw provider implementation.

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (b53058f)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 4m18s), codex_security (codex/security, done, 2m3s) | Total: 6m21s

@roborev-ci

roborev-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

roborev: Combined Review (1faf81d)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 4m14s), codex_security (codex/security, done, 15s) | Total: 4m29s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (c9203fc)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 5m31s), codex_security (claude-code/security, done, 1m30s) | Total: 7m1s

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (aefb920)

Summary verdict: one medium-risk regression should be fixed before merge; no security issues were identified.

Medium

  • internal/parser/claw_provider.go:410
    ProjectHint is set to the Claw agent directory, but shadow changed-path classification copies that into DiscoveredFile.Project. The legacy processOpenClaw/processQClaw path then passes it to the parser, which can bypass the existing cwd-derived project name and overwrite sessions with projects like main.

    Fix: Leave Claw ProjectHint empty, or make shadow-mode changed-path merging avoid overriding the legacy project. Add a SyncPaths/changed-path regression test.


Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 6m21s), codex_security (codex/security, done, 2m5s) | Total: 8m32s

@mariusvniekerk

mariusvniekerk commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (d7112d3)

Summary verdict: two medium issues need attention before merge; no high or critical findings were reported.

Medium

  • Location: internal/parser/claw_provider.go:410
    Problem: ProjectHint is set to the Claw agent directory (main, etc.), but provider changed-path classification copies ProjectHint into DiscoveredFile.Project. The legacy Claw parsers treat a non-empty project as authoritative, so watcher-triggered syncs can overwrite the cwd-derived project with the agent ID.
    Fix: Leave ProjectHint empty for Claw sources unless it is an actual project name, and add sync/classify coverage that verifies the parsed project still comes from cwd.

  • Location: internal/sync/engine.go:1496
    Problem: On removal of an active Claw .jsonl, the new provider can emit the promoted archive, but the legacy classifier still returns the deleted active path because the .jsonl branch does not require the file to exist. The engine then processes both and reports a stat error for the deleted active file.
    Fix: Add an engine-level active-removal test and suppress the missing active Claw path when a provider maps the event to an archive, or make the legacy Claw classifier ignore missing active files in that case.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 8m11s), codex_security (codex/security, done, 2m22s) | Total: 10m43s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (1b8c2d1)

Synthesis unavailable. Showing individual review outputs.

claude-code — default (done)

Review Findings

Severity: Low
Location: internal/parser/claw_provider.go:422 (sourceRef), with clawAgentIDFromRawID
Problem: sourceRef sets ProjectHint: clawAgentIDFromRawID(rawID), i.e. the per-agent directory name (e.g. "main"). But both Parse methods intentionally call p.parseSession(path, "", machine) — passing an empty project so the project is derived from the session's cwd — so this hint is never consumed for claw. The agent-id is not a project. Sibling providers that likewise don't consume the hint set it to empty (cortex_provider.go:178, commandcode_provider.go:186, pi_provider.go:193), while providers that set a non-empty hint (kimi, qwen, workbuddy, pi) actually pass req.Source.ProjectHint into parseSession. The claw provider is the odd one out: a non-empty, semantically-wrong hint that is silently ignored. This is currently inert (the parsed cwd project wins, and processProviderFile does not apply file.Project), but if any future change starts consuming ProjectHint as a project fallback, claw sessions would incorrectly be grouped under "main".
Fix: Return "" for the claw ProjectHint (matching cortex/commandcode/pi) to avoid the latent mismatch, or document why the agent-id is stored there.


Severity: Low
Location: internal/parser/claw_provider_test.go:21-65 (TestClawProvidersOwnLegacyEntrypoints)
Problem: This test reads the .go source files as text and asserts the absence of symbol/call substrings (with a convoluted strings.ReplaceAll(providerText, "func "+call, "") to distinguish definitions from calls). Since claw_provider.go never references the legacy functions at all, the call-absence assertions pass trivially and verify little. It is fragile to cosmetic source edits and runs against the project's test-style guidance favoring observable-behavior tests over source-text matching.
Fix: Drop the source-text assertions and rely on the behavioral tests (assertClawProviderReplacesLegacyAdapter, the source-method/parse tests) which already prove the providers own these entrypoints.

Summary

A clean, faithful migration of the OpenClaw/QClaw parsers from the legacy discover/find/parse functions to the provider architecture; routing, dedup/archive selection, and existing engine classification tests are correctly preserved through the generic provider path, with only minor code-quality nits.


claude-code — security (done)

I've completed my analysis of this diff, which migrates the OpenClaw and QClaw session parsers from the legacy function-based architecture (DiscoverOpenClawSessions, FindOpenClawSourceFile, ParseOpenClawSession, and their QClaw equivalents) to the provider-based architecture (openClawProvider/qClawProvider with a shared clawSourceSet), and flips both agents from ProviderMigrationLegacyOnly to ProviderMigrationProviderAuthoritative.

The one security-relevant surface is path construction from session/agent IDs. I verified:

  • Path traversal is not reachable. Both sourcePathForRawID (raw-ID → path) and rawIDFromPath (path → raw-ID) gate every path component through IsValidSessionID, which (discovery.go:831) rejects any rune that isn't [A-Za-z0-9_-] — so .., /, and \ cannot appear in agentID or sessionID before they reach filepath.Join. The session ID extracted from a filename is re-validated after suffix stripping.
  • The control is preserved, not weakened. The new rawIDFromPath enforces the same len(parts) == 3 && parts[1] == "sessions" structure and IsValidSessionID(agentID) check the removed classifyOnePath/FindOpenClawSourceFile code used, and adds an extra IsValidSessionID(sessionID) check on the filename-derived ID that the old discovery path did not have. The externally-influenceable inputs (StoredFilePath, FingerprintKey, RawSessionID) all flow through this same validation before any os.Stat/os.ReadDir.
  • No new sink. Parse/Fingerprint resolve paths from a clawSource.Opaque that the provider itself constructs from validated components during discovery; the fallback candidates re-validate via sourceForPathInRoot.

Symlinked agent directories are still followed via isDirOrSymlink, matching prior behavior, and this is a local single-user tool reading user-owned session files (per the project trust model). No schema/column changes are involved, so PG parity is not implicated.

This is a behavior-preserving refactor that keeps (and slightly tightens) the existing input validation. No injection, traversal, auth, credential, or data-exposure changes are introduced.

No issues found.

@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (cacb2e2)

Summary verdict: one medium correctness regression should be fixed before merge.

Medium

  • internal/parser/claw_provider.go:312 — The new OpenClaw/QClaw provider fingerprint only returns size and mtime, so clawParseOutcome never receives a hash during normal engine processing. Legacy processOpenClaw/processQClaw computed sess.File.Hash, so the first provider-backed resync will clear sessions.file_hash for these agents.

    Fix: Compute and populate SourceFingerprint.Hash in clawSourceSet.Fingerprint, or compute the file hash during provider parse. Add a provider test that exercises Fingerprint plus Parse without manually injecting a hash.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m59s), codex_security (codex/security, done, 1m53s) | Total: 8m59s

OpenClaw and QClaw share a Claw-style source layout where each agent directory owns a sessions folder and active JSONL files compete with archived JSONL variants for the same logical session.

Moving them behind concrete provider facades keeps that active-over-archive and newest-archive policy explicit without broadening the generic JSONL source helpers around variable archive suffixes. The providers preserve colon-delimited agent/session lookup, selected-source change classification, symlinked agent directories, stale stored-path remapping, source fingerprinting, and existing parse normalization.

fix(parser): promote claw archives on removal

Claw providers choose a single source per logical session, so live-sync removal events need to account for source promotion. When an active file or newest archive disappears, another archive may become the selected source even though the changed path is no longer the source to parse.

This keeps write events strict about the selected path, while remove and rename-style missing-path events can remap a valid stale Claw path to the newly selected source for the same raw session ID.

test(parser): opt openclaw qclaw into provider shadow

OpenClaw and QClaw now have concrete facade providers on this branch, so their migration modes should enter shadow comparison rather than staying legacy-only and additive.

Earlier provider opt-ins remain inherited; later provider branches still own their own modes.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare claw shadow parity

OpenClaw and QClaw are shadow-compared on this branch, so add source-level migration coverage that compares provider observation with their legacy parsers.

The paired test follows the shared provider implementation and keeps the agent/session raw ID shape and planned data-version behavior visible during review.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesClawLegacyParsers|Test(OpenClaw|QClaw)Provider|TestParse(OpenClaw|QClaw)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold claw providers into provider

OpenClaw and QClaw should no longer keep exported discover/find/parse entrypoints beside the provider facade. Folding discovery, raw-ID lookup, archive selection, and parsing into the concrete providers makes this branch a real migration instead of another shim around the legacy path.

The sync engine now relies on provider changed-path handling for this family, so the provider migration mode can become authoritative and the shadow-only comparison test is removed.

Validation: go test -tags "fts5" ./internal/parser -run 'TestClawProvidersOwnLegacyEntrypoints|TestOpenClaw|TestQClaw|TestClawProvider|TestParseOpenClaw|TestParseQClaw|TestDiscoverOpenClaw|TestDiscoverQClaw|TestFindOpenClaw|TestFindQClaw' -count=1 -v; go test -tags "fts5" ./internal/sync -run 'TestEngine_ClassifyPathsQClaw|TestProviderMigration|TestObserveProvider|TestProviderProcess' -count=1 -v; go fmt ./...; go test -tags "fts5" ./internal/parser ./internal/sync ./cmd/agentsview -count=1; go vet ./...; git diff --check
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (ef5bc88)

Summary verdict: changes are not clean; two medium issues need follow-up before merge.

Medium

  • internal/parser/types.go:367 - Removing DiscoverFunc/FindSourceFunc for OpenClaw/QClaw leaves remaining callers behind. SSH remote resolution still includes only file-based agents with DiscoverFunc, and token-use disk probes still require FindSourceFunc, so remote SSH sync may skip these agents and on-disk resolution of unsynced OpenClaw/QClaw IDs can fail.

    Fix: Move those callers to the provider facade for provider-authoritative agents, or keep legacy hooks until all callers are migrated.

  • internal/parser/claw_provider.go:312 - The new provider fingerprint does not compute Hash, but clawParseOutcome only preserves sess.File.Hash when req.Fingerprint.Hash is set. The removed legacy sync path computed ComputeFileHash, so the next provider-backed sync can clear existing file_hash metadata for OpenClaw/QClaw sessions.

    Fix: Include a content hash in clawSourceSet.Fingerprint or otherwise compute and set sess.File.Hash during parse, with a regression test that synced sessions retain a non-empty file_hash.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m59s), codex_security (codex/security, done, 1m58s) | Total: 8m5s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant