Skip to content

feat(parse-diff): classify live-write skew as raced, not changed#805

Merged
wesm merged 10 commits into
kenn-io:mainfrom
mjacobs:feat/parsediff-raced-skew-detection
Jun 24, 2026
Merged

feat(parse-diff): classify live-write skew as raced, not changed#805
wesm merged 10 commits into
kenn-io:mainfrom
mjacobs:feat/parsediff-raced-skew-detection

Conversation

@mjacobs

@mjacobs mjacobs commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

parse-diff re-parses on-disk sources and compares them against stored rows, so a parser change's effect can be gated in CI with --fail-on-change. Two things made that gate untrustworthy; this PR fixes both, plus a Codex title-refresh gap the work surfaced.

Live-write skew classified as raced

parse-diff snapshots stored rows, then re-parses sources; a daemon or an active session writing the archive mid-run produced torn comparisons reported as changed, tripping --fail-on-change on a write unrelated to the parser. After each per-file compare, the source mtime is re-stat'd (the same agent-aware nanosecond value sync stores in file_mtime) and compared against the snapshot's Session.FileMtime; a file written after the snapshot is classified raced instead of changed. Raced sessions are excluded from --fail-on-change, so a concurrent write no longer fails the run, while a genuine change on a demonstrably untouched file still does. Classification is conservative (unreadable source or missing stored mtime → raced) but never masks a real change.

Report-only engine write guard

NewDiffEngine returns the full *Engine, which still exposed SyncAll/ResyncAll, so a report-only/force-parse engine could in principle be driven into a write path. refuseWriteInForceParse now guards those entrypoints as a no-op when forceParse is set (only NewDiffEngine sets it). Additive; no existing write behavior changes.

Codex title refresh (surfaced by this work)

To keep the raced guard consistent, the incremental write now folds the session_index.jsonl mtime into the stored file_mtime, exactly as a full parse does. That raised the stored watermark on the append path and exposed a latent gap: a title-only index rename whose mtime lands at or below the stored value was filtered out by an indexMtime > storedMtime gate (in both the incremental and SyncAllSince quick-sync paths), leaving the title stale behind shouldSkipCodex's fast path. Both paths now compare the index title to the stored session_name directly. The broader "mtime-watermark conflates transcript + title" class is tracked as a follow-up.

Scope

Covers the parse-diff v2 sub-items for live-write skew and the write-path guard, plus the Codex title-refresh fixes above. Incremental-append skew detection and DB-backed coverage for Warp/Forge/Piebald remain for a follow-up.

Reviewers: internal/sync/parsediff.go, internal/sync/parsediff_compare.go, internal/sync/parsediff_report.go, and the Codex paths in internal/sync/engine.go.

@roborev-ci

roborev-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown

roborev: Combined Review (913765d)

Medium confidence: one Medium correctness issue remains; no High or Critical findings were reported.

Medium

  • internal/sync/parsediff.go:425 - The race check compares one live mtime from job.path against each row’s per-session file_mtime. For fan-out or composite sources these values may not be comparable, so parser drift for older sessions can be incorrectly marked as raced and excluded from --fail-on-change; composite sources can also miss sibling-file changes. Recompute the live mtime per emitted session using the same source identity/effective mtime that populated sessions.file_mtime, such as pw.sess.File.Path via a per-session source-mtime helper, and add coverage for shared DB/virtual/composite sources.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 5m14s), codex_security (codex/security, done, 1m18s) | Total: 6m39s

@roborev-ci

roborev-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown

roborev: Combined Review (bec29cd)

Summary verdict: One medium-severity correctness issue needs fixing before relying on parse-diff results for virtual-path agents.

Medium

  • internal/sync/parsediff.go:438: The skew guard uses discoveredFileMtime for parsed session virtual paths, but that helper does not always match the mtime basis stored in sessions.file_mtime. For Aider virtual paths it stats the literal <history>#idx path and fails, causing real diffs to become DiffRaced; for Zed/Shelley it uses composite DB mtimes instead of the per-session row timestamp, which can mask genuine parser drift and let --fail-on-change exit clean.

    Fix: Add a parse-diff live-mtime resolver that mirrors each agent’s stored File.Mtime semantics for virtual paths, including Aider physical history stat, Zed per-thread mtime, and Shelley conversation updated_at, with regression tests for those agents.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 4m22s), codex_security (codex/security, done, 1m3s) | Total: 5m33s

@roborev-ci

roborev-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown

roborev: Combined Review (150e7a0)

Summary verdict: One medium issue remains; no critical or high findings were reported.

Medium

  • internal/sync/parsediff.go:482: The raced guard resolves live mtimes through discoveredFileMtime, but some reliable file-based sources do not use the same mtime basis stored in file_mtime. For OpenHands, sync stores OpenHandsSnapshot(...).Mtime from child files, while discoveredFileMtime falls through to the session directory stat. Rewriting an existing event file can advance the stored source basis without changing the directory mtime, so live-write skew remains DiffChanged and can still trip --fail-on-change.

    Suggested fix: Use the same per-agent effective mtime used to populate ParsedSession.File.Mtime for the raced check, at least adding an OpenHands snapshot case and auditing other effective-mtime agents.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m47s), codex_security (codex/security, done, 15s) | Total: 7m9s

@mjacobs mjacobs force-pushed the feat/parsediff-raced-skew-detection branch from 150e7a0 to 6323060 Compare June 22, 2026 19:46
@roborev-ci

roborev-ci Bot commented Jun 22, 2026

Copy link
Copy Markdown

roborev: Combined Review (1e90b53)

Summary verdict: One medium issue remains; no high or critical findings were reported.

Medium

  • internal/sync/parsediff.go:451 - The raced guard uses parser.CodexEffectiveMtime, which includes Codex’s global session_index.jsonl mtime. Because that index is shared by all Codex sessions, an unrelated title/index update can make every Codex session look raced and reclassify real parser drift as DiffRaced, causing --fail-on-change to pass incorrectly.

    Suggested fix: Do not use the global index mtime as a per-session raced signal. Use the transcript mtime for non-title diffs, or only apply the index mtime when the changed field is attributable to that session’s own title/index entry.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m57s), codex_security (codex/security, done, 44s) | Total: 7m48s

@mjacobs mjacobs force-pushed the feat/parsediff-raced-skew-detection branch from 246b9ae to 5d77679 Compare June 23, 2026 02:44
mjacobs and others added 2 commits June 24, 2026 11:41
Squash the parse-diff raced-skew work into one branch commit before rebasing onto origin/main. The series adds race-aware classification for live source writes while tightening the guard so shared, virtual, or DB-backed sources keep failing closed instead of masking real parser drift. It also fixes Codex index-title refresh paths that interacted with the same effective-mtime basis.

Included follow-ups:
- feat(parse-diff): classify live-write skew as raced, not changed
- fix(parse-diff): resolve live-write skew mtime per session
- fix(parse-diff): only apply raced guard to reliable file-based sources
- fix(parse-diff): source raced live mtime from the parsed effective File.Mtime
- fix(parse-diff): re-stat source after read for the raced live mtime
- fix(parse-diff): recompute effective live mtime at collect time
- fix(parse-diff): skip raced guard for sources shared by many sessions
- fix(sync): refresh Codex title when index mtime is below stored mtime
- fix(sync): refresh Codex title in quick sync when index mtime is below stored mtime
- test(parse-diff): isolate all agent env vars in fail-on-change tests
- fix(parse-diff): ignore Codex index mtime for raced drift

Co-authored-by: Wes McKinney <wesmckinn+git@gmail.com>
Codex stores file_mtime on an index-folded basis so title changes invalidate normal sync caches. Parse-diff intentionally ignores the shared index as a raced signal, but that left a blind spot when the stored index mtime was newer than both transcript mtimes: a later transcript write could still compare below the stored value and be reported as real parser drift.

Use the existing transcript fingerprint as the Codex fallback when the mtime comparison alone is inconclusive. That keeps index-only skew from masking unchanged transcript drift while still treating proven transcript rewrites as raced.
@wesm wesm force-pushed the feat/parsediff-raced-skew-detection branch from 1e90b53 to bd73851 Compare June 24, 2026 16:51
The Codex raced-skew fallback treats file_hash as the transcript fingerprint. Incremental JSONL sync advanced file_size and file_mtime while preserving the previous full-parse hash, so an already-synced appended transcript could later look like a live rewrite and hide unchanged-source parser drift from --fail-on-change.

Refresh the hash only for Codex incremental snapshots, where parse-diff depends on that fingerprint. Other incremental agents keep preserving the columns they do not recompute.
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (bd73851)

High-level verdict: changes are not clean; one High and one Medium issue need attention before merge.

High

  • Location: internal/sync/parsediff.go:488
    Problem: The Codex hash fallback treats any stored/parsed file_hash mismatch as proof that the transcript changed, but incremental writes intentionally leave file_hash stale while updating file_size and file_mtime. After a Codex session has been incrementally synced, real parser drift on an otherwise clean session can be reclassified as DiffRaced, excluded from FieldCounts, and ignored by --fail-on-change.
    Fix: Do not use stale incremental file_hash as race evidence; either update the hash during Codex incremental writes or store/use a reliable transcript-only fingerprint/mtime for this guard.

Medium

  • Location: internal/sync/engine.go:2544
    Problem: SyncThenRun and RunExclusive remain writable entrypoints on a NewDiffEngine instance. A force-parse/report-only engine can still reach syncAllLocked/resyncAllLocked through SyncThenRun, or execute arbitrary DB-writing work through RunExclusive, bypassing the new refusal guard.
    Fix: Apply refuseWriteInForceParse to these methods as well, returning zero stats/error for SyncThenRun and an error for RunExclusive.

Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 4m29s), codex_security (codex/security, done, 1m16s) | Total: 5m55s

wesm added 3 commits June 24, 2026 12:19
Codex incremental sync stores file_size at the byte offset consumed by the incremental parser. Hashing the whole live file could pair that prefix size with a fingerprint from a later-grown transcript, which made parse-diff's Codex raced-skew fallback compare mismatched snapshots.

Hash the same consumed prefix that file_size records so incremental metadata remains internally consistent even when a partial or concurrent append exists beyond the parser boundary.
NewDiffEngine returns the regular Engine type for report-only parse-diff work, so every public write-capable entrypoint has to enforce the forceParse boundary itself. SyncThenRun and RunExclusive were missed, leaving paths that could still drive sync work or caller-provided DB writes through a report-only engine.

Guard both methods at entry so SyncThenRun behaves like the other no-op sync methods and RunExclusive fails before running caller work.
NewDiffEngine returns the normal Engine type for report-only parse-diff work, so every public DB-writing entrypoint has to enforce the forceParse boundary itself. RecomputeSignals and ScanSecrets were still able to rewrite quality signal and secret metadata through a parse-diff engine.

Refuse both methods before recomputing or scanning so parse-diff callers cannot mutate stored session metadata through those public APIs.
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (903c624)

Summary verdict: one medium-severity issue should be addressed before relying on --fail-on-change.

Medium

  • internal/sync/parsediff.go:488: Codex raced detection compares stored incremental file_hash/file_size against the freshly parsed full-file hash/size. After an incremental sync with a partial trailing JSON line, the new code stores a hash of only the consumed prefix, while processCodex later sets parsed.File.Hash and parsed.File.Size from the whole file. An unchanged source with a still-partial tail can look like “transcript changed” and reclassify genuine parser drift as DiffRaced, causing --fail-on-change to pass incorrectly.

    Fix: Compare the same byte range, such as hashing the live prefix up to the stored file_size before declaring a Codex transcript race, or store and compare a separate full-file fingerprint so prefix hashes are never compared to full-file hashes.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m2s), codex_security (codex/security, done, 1m42s) | Total: 6m52s

Codex incremental sync can store file_hash and file_size for the parsed prefix when a transcript has a partial trailing JSON line. Parse-diff compared that prefix snapshot with the full-file hash and size from a later full parse, so an unchanged ignored tail could look like a live transcript rewrite and hide real parser drift from --fail-on-change.

Compare the stored prefix against the live prefix and only treat size skew as raced when the Codex parser would consume a different JSONL boundary. That keeps real consumed appends classified as races without letting ignored partial tails mask parser changes.
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (101887b)

Medium risk: one parse-diff race classification issue should be fixed before relying on --fail-on-change.

Medium

  • internal/sync/parsediff.go:497
    The new Codex raced check treats any stored file_hash mismatch as proof that the transcript changed, but existing archives may already contain stale Codex hashes because the old incremental path advanced file_size/file_mtime without refreshing file_hash. Since dataVersion was not bumped and shouldSkipCodex only checks size/mtime, those rows may not be healed before parse-diff. A real parser diff can therefore be incorrectly reclassified as DiffRaced and excluded from --fail-on-change.

    Suggested fix: Add a one-time refresh path for legacy Codex hashes, such as a data-version bump/full reparse, a targeted migration/backfill, or a guard that only trusts hashes known to match the stored file-size basis.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m58s), codex_security (codex/security, done, 25s) | Total: 6m31s

Existing archives can contain Codex rows whose incremental metadata advanced file_size and file_mtime while file_hash still points at an older full-parse snapshot. Without a data-version bump or row marker, a hash-only mismatch is ambiguous and can hide real parser drift by classifying it as DiffRaced.

Use the parser-consumed JSONL byte boundary as the raced fallback and fail closed on hash-only mismatches. That preserves append-race detection for consumed transcript changes while keeping legacy stale hashes visible to --fail-on-change.
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (10f7971)

Summary verdict: One medium issue needs attention; no security issues were found.

Medium

  • internal/sync/parsediff.go:493: The Codex raced fallback returns early using parsed.File.Size, captured before the collect-time check. If the transcript changes after the parser stat/read, the live consumed boundary can differ while parsed.File.Size still equals the stored size, leaving the change as DiffChanged and potentially tripping --fail-on-change.

    Suggested fix: Recompute the live consumed size before returning unchanged. Use the current consumed prefix, and optionally the stored/current prefix hash, rather than the parser’s earlier size snapshot.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 7m53s), codex_security (codex/security, done, 1m8s) | Total: 9m8s

wesm added 2 commits June 24, 2026 14:08
Parse-diff classifies races after the worker has already parsed the file, so the parser's File.Size can be stale by collection time. Codex's fallback used that stale size to return early, which left post-parse transcript appends classified as real parser drift.

Recompute the live Codex consumed JSONL boundary whenever a stored file_size exists, and compare that collect-time boundary to the archived snapshot. This keeps the raced decision on the same post-read basis as the surrounding mtime guard.
Codex file_size has two historical bases: full parses record the raw file length, while incremental parses can record the consumed JSONL boundary when a partial line trails the file. Comparing only against the consumed boundary made stable full-parse archives with ignored partial tails look like live transcript changes.

Check the live raw file size first and only fall back to the consumed boundary when the raw size no longer matches. That preserves collect-time append detection without masking parser drift on unchanged full-parse snapshots.
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (d3a6086)

Clean for merge: no Medium, High, or Critical findings were reported.

Only a Low-severity test robustness issue was raised, so it is omitted per instructions.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 9m49s), codex_security (codex/security, done, 14s) | Total: 10m7s

@wesm

wesm commented Jun 24, 2026

Copy link
Copy Markdown
Member

phew! merging

@wesm wesm merged commit b96075c into kenn-io:main Jun 24, 2026
12 checks passed
@mjacobs

mjacobs commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @wesm! Sorry about the churn on this one (and #802), I'm doing a "micro-postmortem"(?) to understand + close process gaps.

@mjacobs mjacobs deleted the feat/parsediff-raced-skew-detection branch June 24, 2026 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants