
fix(s3-datastore, gcs-datastore): persist localMtime + markSynced after non-zero pullChanged #119

Merged
stack72 merged 1 commit into main from fix/issue-222-pullChanged-redundant-push on May 4, 2026

Conversation

@stack72 stack72 commented May 4, 2026

Summary

Closes swamp-club #222.

The first explicit `pushChanged` after a fresh-cache `pullChanged` against a populated remote re-uploaded every just-downloaded file with byte-identical content. Not a correctness issue (data converges after the first redundant push), but a per-bootstrap bandwidth waste of N S3/GCS PUTs and a confusing `filesPushed: N` immediately after `filesPulled: N`.

Root cause

In `pullChanged`'s slow path:

  • `pullIndex(forceRemote)` writes the local `.datastore-index.json` from the raw remote payload, which carries the original pusher's `localMtime` values.
  • For each downloaded file we set `this.index.entries[rel].localMtime = stat.mtime.toISOString()` — but only in memory.
  • `markSynced(indexETag)` was gated on `pulled === 0`, so neither the on-disk index nor the sidecar reflected what `pullChanged` just learned.

In a subsequent process, `pushChanged`'s `tryFastPushChanged` returned null (no sidecar), the slow path's `pullIndex(forceRemote)` reloaded the pusher's mtimes into `this.index`, and the walk fell through:

  • size matched
  • `existing.localMtime` was defined (pusher's value), so the line-947 mtime-equality continue and the line-951 undefined fallback both missed
  • every file landed in `toPush`
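
The walk decision above can be sketched in isolation. This is a hypothetical, simplified model — `shouldPush` and the entry shape are assumptions based on the description, not the real `s3_cache_sync.ts` code — but it shows why every file landed in `toPush` pre-fix:

```typescript
// Simplified model of the pushChanged walk's per-file decision.
interface IndexEntry {
  size: number;
  localMtime?: string;
}

function shouldPush(
  local: { size: number; mtime: string },
  existing?: IndexEntry,
): boolean {
  if (!existing) return true; // unknown file: always push
  if (existing.size !== local.size) return true; // size mismatch
  if (existing.localMtime === undefined) return true; // undefined fallback
  return existing.localMtime !== local.mtime; // mtime-equality check
}

// Pre-fix: the on-disk index still carried the pusher's mtime, which
// never equals the puller's own stat.mtime from the download.
const pusherMtime = "2026-05-01T00:00:00.000Z";
const pullerMtime = "2026-05-04T12:00:00.000Z";
console.log(shouldPush({ size: 10, mtime: pullerMtime }, { size: 10, localMtime: pusherMtime })); // true: redundant push
// Post-fix: the persisted index carries the puller's own mtime.
console.log(shouldPush({ size: 10, mtime: pullerMtime }, { size: 10, localMtime: pullerMtime })); // false: no push
```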

Fix

Extend the existing `pulled === 0` branch to also handle `pulled > 0`: persist the in-memory index back to disk (with the puller's `stat`-derived `localMtime` values) and call `markSynced(indexETag)`. Wrapped in the same opportunistic try/catch as the pre-existing path.
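
A minimal control-flow sketch of the extended branch, assuming the helper names from the description (`atomicWriteTextFile`, `markSynced`) and the `pulled`/`indexETag` locals; callbacks stand in for the real file-system operations:

```typescript
// Hypothetical shape of the post-fix slow-path tail in pullChanged.
type Index = { entries: Record<string, { size: number; localMtime?: string }> };

function persistAfterPull(
  pulled: number,
  indexETag: string | null,
  index: Index | null,
  writeIndex: (json: string) => void, // stands in for atomicWriteTextFile
  markSynced: (etag: string) => void,
): void {
  if (!indexETag) return;
  try {
    if (pulled > 0 && index) {
      // Persist the puller's stat-derived localMtime values first...
      writeIndex(JSON.stringify(index));
    }
    // ...then write the sidecar, so lastVerifiedAt is derived from the
    // final on-disk index mtime (ordering invariant).
    markSynced(indexETag);
  } catch {
    // Opportunistic: a failed persist self-heals on the next slow-path sync;
    // markSynced is skipped so a clean sidecar is never recorded over a stale index.
  }
}

// With pulled > 0, the index write precedes markSynced.
const order: string[] = [];
persistAfterPull(5, "etag-1", { entries: {} }, () => order.push("write"), () => order.push("sync"));
console.log(order.join(",")); // "write,sync"
```

Note that a throwing `writeIndex` skips `markSynced` entirely, which is the behavior the review below calls out as intentional.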

Ordering invariant preserved: `atomicWriteTextFile(indexPath, ...)` runs before `markSynced` because `markSynced` derives `lastVerifiedAt` from `Deno.stat(indexPath).mtime + 1ms`. Reversing would spuriously fail `tryFastPullChanged`'s mtime check on every subsequent sync.
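
The arithmetic behind the invariant can be illustrated with a toy check. This is a hypothetical model — `fastPathAccepts` is not a real function in the codebase — assuming, per the description, that `markSynced` records `lastVerifiedAt` as the index file's mtime + 1ms and the fast path trusts the sidecar only if the index was not modified afterwards:

```typescript
// Toy model of the fast path's staleness check on the sidecar.
function fastPathAccepts(indexMtimeMs: number, lastVerifiedAtMs: number): boolean {
  return indexMtimeMs < lastVerifiedAtMs;
}

const writtenAt = 1_000; // index written first, then markSynced
console.log(fastPathAccepts(writtenAt, writtenAt + 1)); // true: fast path holds

// Reversed order: a later index write bumps the mtime past lastVerifiedAt.
const verifiedAt = 1_001; // markSynced ran first
const rewrittenAt = 1_050; // atomicWriteTextFile ran after
console.log(fastPathAccepts(rewrittenAt, verifiedAt)); // false: spurious slow path
```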

Mirrored line-for-line in `gcs_cache_sync.ts` (uses `indexGeneration` instead of `indexETag`), per the precedent set in PRs #105/#106/#108/#111-#114.

Out-of-scope follow-up

When a concurrent writer bumps the remote ETag between this process's `pullChanged` and `pushChanged`, the sidecar mismatch forces the slow path, `pullIndex(forceRemote)` overwrites `this.index` with the new remote payload (carrying the writer's `localMtime` values), and the walk re-fires redundant pushes. Pre-existing behavior, not widened by this fix; will file as a separate swamp-club issue post-merge.

Test plan

  • New 5-file regression test in `s3_cache_sync_test.ts` (`pullChanged + pushChanged: fresh-process push after pull against populated remote does not redundantly upload (swamp-club #222)`) — fails pre-fix on the sidecar assertion, passes post-fix
  • Equivalent test in `gcs_cache_sync_test.ts`
  • Positive assertions pin the persistence machinery: sidecar present with `localDirty=false` and matching ETag/generation, on-disk index entries carry valid ISO `localMtime` values (not the seeded epoch)
  • Symptom assertion: fresh `S3CacheSyncService` / `GcsCacheSyncService` against the same cache dir → `pushed === 0`, zero data PUTs
  • s3 suite: 61/61 passing (two consecutive runs)
  • gcs suite: 64/64 passing (two consecutive runs)
  • `deno fmt --check`, `deno lint`, `deno check`, `deno install --frozen` clean in both extension directories
  • MinIO end-to-end: phase 1 pushed 5, phase 2 pulled 5 (sidecar persisted with `localDirty=false`), phase 3 fresh-process push = 0 redundant uploads
  • GCS end-to-end: same scenario against `swamp-uat-gcs-datastore-testing` — pulled 5, pushed 0, bucket cleaned up

Manifest bumps:

  • `@swamp/s3-datastore`: `2026.04.28.4` → `2026.05.04.1`
  • `@swamp/gcs-datastore`: `2026.04.28.4` → `2026.05.04.1`

🤖 Generated with Claude Code

fix(s3-datastore, gcs-datastore): persist localMtime + markSynced after non-zero pullChanged

Pre-fix, pullChanged's slow path skipped markSynced when pulled > 0 and
left the on-disk index file carrying the original pusher's localMtime
values (the raw remote payload pullIndex wrote at the start of the
slow path). A subsequent fresh-process pushChanged then took the slow
path (no sidecar), pullIndex(forceRemote) reloaded the pusher's mtimes
into this.index, and the walk pushed every file — size matched the
index but localMtime didn't (pusher's mtime ≠ local stat.mtime from
Deno.writeFile). N redundant byte-identical PUTs per fresh-cache
bootstrap (swamp-club #222).

The fix mirrors the existing pulled === 0 markSynced path: when the
local cache matches the remote index whose ETag we GET'd (whether by
walk-finds-zero-diff OR by downloading the missing files), persist
the in-memory index back to disk (carrying the puller's
stat-derived localMtimes) AND mark synced so the next pushChanged
hits the fast path. Ordering invariant preserved: atomicWriteTextFile
runs BEFORE markSynced because markSynced derives lastVerifiedAt from
indexPath mtime + 1ms.

Out-of-scope follow-up: when a concurrent writer bumps the remote
ETag between pullChanged and pushChanged, the sidecar mismatch forces
the slow path, and pullIndex(forceRemote) overwrites this.index with
the new remote payload (carrying the writer's localMtimes), and the
walk re-fires redundant pushes for any local file whose mtime differs.
Pre-existing behavior, not widened by this fix; will be filed
separately post-merge.

Verified:
- 5-file regression test in each of s3 and gcs cache_sync_test.ts:
  fresh-process pushChanged after fresh-cache pullChanged uploads zero
  bytes, with positive assertions on sidecar presence/cleanliness and
  on-disk index localMtime values
- s3 suite: 61/61 passing, gcs suite: 64/64 passing
- MinIO end-to-end: pulled=5, pushed=0
- GCS end-to-end: pulled=5, pushed=0

Manifests bumped to 2026.05.04.1 in both extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment


Code Review

Blocking Issues

None.

Summary

This is a targeted, correct fix for a real bandwidth-waste bug. The root cause analysis in the PR description is accurate and matches the code: pullChanged was updating this.index in memory with the locally-derived stat.mtime values after each download, but only flushing the in-memory state to disk and calling markSynced when pulled === 0. A fresh-process pushChanged loaded the stale on-disk index (carrying the original pusher's localMtime epoch values), found a mismatch against the local stat.mtime, and re-uploaded all files unnecessarily.

Fix correctness:

  • The change from if (pulled === 0 && indexETag) to if (indexETag), with an inner if (pulled > 0 && this.index) guard to write the index first, is correct.
  • The ordering invariant (index write before markSynced) is essential and clearly documented: markSynced derives lastVerifiedAt from Deno.stat(indexPath).mtime + 1ms, so writing the index after would bump the mtime past lastVerifiedAt and cause spurious slow-path fallbacks.
  • If atomicWriteTextFile throws, it's caught by the outer try/catch and markSynced is correctly skipped — we don't record a clean sidecar when the on-disk index is stale.
  • The this.index null guard (pulled > 0 && this.index) is sound: if the index were null, no files would be in toPull, so pulled would remain 0 and the guard never fires.
  • Mirrored symmetrically between s3_cache_sync.ts and gcs_cache_sync.ts as expected.

Tests:

  • Both new regression tests correctly use in-memory mock clients (no live cloud services).
  • Temp dirs are cleaned up in finally blocks; no env vars set/restored needed.
  • The two positive assertions (sidecar contents + on-disk index mtimes) are load-bearing: they pin the persistence mechanism so a future refactor can't satisfy only the symptom assertion while silently breaking the fix.
  • ETag normalization is handled correctly: the mock stores '"index-pre-pull"' (quoted HTTP ETag format), normalizeETag strips the quotes before storage, and the assertion checks the unquoted form. The GCS test uses a raw generation string ("42"), which doesn't need normalization.
  • No sanitizeResources: false needed — mock clients don't open connection pools.
  • CLAUDE.md rules: no any types in new hand-written code (the typed cast as { entries: ... } is correct), named exports, no model files touched.

Suggestions

  • Minor naming oddity: The second service in each test is named serviceC (skipping serviceB), which is slightly confusing. Not a bug — the comment explains it simulates a cross-process scenario — but serviceB or freshService would be clearer.
  • Minor inconsistency: The GCS test reads the sidecar via Deno.readTextFile + JSON.parse directly, while the S3 test uses the readSidecar helper. Functionally identical; using the helper in both would be marginally cleaner.


@github-actions github-actions Bot left a comment


Adversarial Review

Traced the full data flow through pullChanged → atomicWriteTextFile → markSynced → (next process) pushChanged → tryFastPushChanged / slow-path walk for both S3 and GCS implementations. Examined edge cases around null indexETag/indexGeneration, null this.index, multipart ETags, generation "0", concurrent writers, and atomicWriteTextFile failures.

Critical / High

None found.

Medium

None found.

Low

  1. atomicWriteTextFile failure skips markSynced (s3_cache_sync.ts:840 / gcs_cache_sync.ts:813): If the index rewrite throws (disk full, permissions), the catch block swallows it and also skips markSynced. The on-disk index retains the remote payload's localMtime values, so a subsequent fresh-process pushChanged still sees the pre-fix mismatch and redundantly pushes. This is the same as pre-fix behavior (self-healing on the next slow-path sync), and the comment correctly describes it as "opportunistic." Not a regression — just an inherent limitation of the opportunistic design that prevents full self-healing on disk-full.

  2. pulled === 0 with reconciled in-memory mtimes not persisted (s3_cache_sync.ts:836 / gcs_cache_sync.ts:809): During the walk phase (before downloads), the code reconciles this.index.entries[rel].localMtime in memory when a local file's stat.mtime differs from the remote entry's localMtime but size matches. When pulled === 0, these reconciled values are NOT written to disk (the if (pulled > 0 && this.index) guard skips the rewrite). The sidecar IS written, so the next sync hits the fast path — meaning the stale on-disk mtimes are never consulted. If the fast path later fails (concurrent writer bumps ETag), pullIndex(forceRemote) reloads from remote anyway. No practical impact, but worth noting for future refactors that might remove the fast path.

Verdict

PASS. The fix is well-targeted: it extends the existing pulled === 0 sidecar-persistence block to also handle pulled > 0 by (a) rewriting the on-disk index with the puller's stat-derived localMtime values before (b) calling markSynced. The ordering invariant (atomicWriteTextFile before markSynced) is correctly maintained — markSynced derives lastVerifiedAt from Deno.stat(indexPath).mtime + 1ms, so reversing would capture a stale baseline. The error handling preserves the existing opportunistic/self-healing contract. The S3 and GCS implementations are structurally identical modulo indexETag → indexGeneration. Both regression tests correctly pin the persistence machinery (sidecar contents + on-disk index mtimes) in addition to the symptom assertion (zero redundant PUTs).

@stack72 stack72 merged commit e964c00 into main May 4, 2026
30 checks passed
@stack72 stack72 deleted the fix/issue-222-pullChanged-redundant-push branch May 4, 2026 16:39