
fix(s3-datastore, gcs-datastore): persist localMtime + markSynced after non-zero pullChanged #119

Merged
stack72 merged 1 commit into main from fix/issue-222-pullChanged-redundant-push on May 4, 2026

Conversation

@stack72 stack72 commented May 4, 2026

Summary

Closes swamp-club #222.

The first explicit `pushChanged` after a fresh-cache `pullChanged` against a populated remote re-uploaded every just-downloaded file with byte-identical content. Not a correctness issue (data converges after the first redundant push), but a per-bootstrap bandwidth waste of N S3/GCS PUTs and a confusing `filesPushed: N` immediately after `filesPulled: N`.

Root cause

In `pullChanged`'s slow path:

  • `pullIndex(forceRemote)` writes the local `.datastore-index.json` from the raw remote payload, which carries the original pusher's `localMtime` values.
  • For each downloaded file we set `this.index.entries[rel].localMtime = stat.mtime.toISOString()` — but only in memory.
  • `markSynced(indexETag)` was gated on `pulled === 0`, so neither the on-disk index nor the sidecar reflected what `pullChanged` just learned.

In a subsequent process, `pushChanged`'s `tryFastPushChanged` returned null (no sidecar), the slow path's `pullIndex(forceRemote)` reloaded the pusher's mtimes into `this.index`, and the walk fell through:

  • size matched
  • `existing.localMtime` was defined (pusher's value), so the line-947 mtime-equality continue and the line-951 undefined fallback both missed
  • every file landed in `toPush`
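
The walk decision above can be sketched in isolation. This is a hypothetical, simplified model — `shouldPush` and the entry shape are assumptions based on the description, not the real `s3_cache_sync.ts` code — but it shows why every file landed in `toPush` pre-fix:

```typescript
// Simplified model of the pushChanged walk's per-file decision.
interface IndexEntry {
  size: number;
  localMtime?: string;
}

function shouldPush(
  local: { size: number; mtime: string },
  existing?: IndexEntry,
): boolean {
  if (!existing) return true; // unknown file: always push
  if (existing.size !== local.size) return true; // size mismatch
  if (existing.localMtime === undefined) return true; // undefined fallback
  return existing.localMtime !== local.mtime; // mtime-equality check
}

// Pre-fix: the on-disk index still carried the pusher's mtime, which
// never equals the puller's own stat.mtime from the download.
const pusherMtime = "2026-05-01T00:00:00.000Z";
const pullerMtime = "2026-05-04T12:00:00.000Z";
console.log(shouldPush({ size: 10, mtime: pullerMtime }, { size: 10, localMtime: pusherMtime })); // true: redundant push
// Post-fix: the persisted index carries the puller's own mtime.
console.log(shouldPush({ size: 10, mtime: pullerMtime }, { size: 10, localMtime: pullerMtime })); // false: no push
```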

Fix

Extend the existing `pulled === 0` branch to also handle `pulled > 0`: persist the in-memory index back to disk (with the puller's `stat`-derived `localMtime` values) and call `markSynced(indexETag)`. Wrapped in the same opportunistic try/catch as the pre-existing path.
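
A minimal control-flow sketch of the extended branch, assuming the helper names from the description (`atomicWriteTextFile`, `markSynced`) and the `pulled`/`indexETag` locals; callbacks stand in for the real file-system operations:

```typescript
// Hypothetical shape of the post-fix slow-path tail in pullChanged.
type Index = { entries: Record<string, { size: number; localMtime?: string }> };

function persistAfterPull(
  pulled: number,
  indexETag: string | null,
  index: Index | null,
  writeIndex: (json: string) => void, // stands in for atomicWriteTextFile
  markSynced: (etag: string) => void,
): void {
  if (!indexETag) return;
  try {
    if (pulled > 0 && index) {
      // Persist the puller's stat-derived localMtime values first...
      writeIndex(JSON.stringify(index));
    }
    // ...then write the sidecar, so lastVerifiedAt is derived from the
    // final on-disk index mtime (ordering invariant).
    markSynced(indexETag);
  } catch {
    // Opportunistic: a failed persist self-heals on the next slow-path sync;
    // markSynced is skipped so a clean sidecar is never recorded over a stale index.
  }
}

// With pulled > 0, the index write precedes markSynced.
const order: string[] = [];
persistAfterPull(5, "etag-1", { entries: {} }, () => order.push("write"), () => order.push("sync"));
console.log(order.join(",")); // "write,sync"
```

Note that a throwing `writeIndex` skips `markSynced` entirely, which is the behavior the review below calls out as intentional.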

Ordering invariant preserved: `atomicWriteTextFile(indexPath, ...)` runs before `markSynced` because `markSynced` derives `lastVerifiedAt` from `Deno.stat(indexPath).mtime + 1ms`. Reversing would spuriously fail `tryFastPullChanged`'s mtime check on every subsequent sync.
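
The arithmetic behind the invariant can be illustrated with a toy check. This is a hypothetical model — `fastPathAccepts` is not a real function in the codebase — assuming, per the description, that `markSynced` records `lastVerifiedAt` as the index file's mtime + 1ms and the fast path trusts the sidecar only if the index was not modified afterwards:

```typescript
// Toy model of the fast path's staleness check on the sidecar.
function fastPathAccepts(indexMtimeMs: number, lastVerifiedAtMs: number): boolean {
  return indexMtimeMs < lastVerifiedAtMs;
}

const writtenAt = 1_000; // index written first, then markSynced
console.log(fastPathAccepts(writtenAt, writtenAt + 1)); // true: fast path holds

// Reversed order: a later index write bumps the mtime past lastVerifiedAt.
const verifiedAt = 1_001; // markSynced ran first
const rewrittenAt = 1_050; // atomicWriteTextFile ran after
console.log(fastPathAccepts(rewrittenAt, verifiedAt)); // false: spurious slow path
```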

Mirrored line-for-line in `gcs_cache_sync.ts` (uses `indexGeneration` instead of `indexETag`), per the precedent set in PRs #105/#106/#108/#111-#114.

Out-of-scope follow-up

When a concurrent writer bumps the remote ETag between this process's `pullChanged` and `pushChanged`, the sidecar mismatch forces the slow path, `pullIndex(forceRemote)` overwrites `this.index` with the new remote payload (carrying the writer's `localMtime` values), and the walk re-fires redundant pushes. Pre-existing behavior, not widened by this fix; will file as a separate swamp-club issue post-merge.

Test plan

  • New 5-file regression test in `s3_cache_sync_test.ts` (`pullChanged + pushChanged: fresh-process push after pull against populated remote does not redundantly upload (swamp-club #222)`) — fails pre-fix on the sidecar assertion, passes post-fix
  • Equivalent test in `gcs_cache_sync_test.ts`
  • Positive assertions pin the persistence machinery: sidecar present with `localDirty=false` and matching ETag/generation, on-disk index entries carry valid ISO `localMtime` values (not the seeded epoch)
  • Symptom assertion: fresh `S3CacheSyncService` / `GcsCacheSyncService` against the same cache dir → `pushed === 0`, zero data PUTs
  • s3 suite: 61/61 passing (two consecutive runs)
  • gcs suite: 64/64 passing (two consecutive runs)
  • `deno fmt --check`, `deno lint`, `deno check`, `deno install --frozen` clean in both extension directories
  • MinIO end-to-end: phase 1 pushed 5, phase 2 pulled 5 (sidecar persisted with `localDirty=false`), phase 3 fresh-process push = 0 redundant uploads
  • GCS end-to-end: same scenario against `swamp-uat-gcs-datastore-testing` — pulled 5, pushed 0, bucket cleaned up

Manifest bumps:

  • `@swamp/s3-datastore`: `2026.04.28.4` → `2026.05.04.1`
  • `@swamp/gcs-datastore`: `2026.04.28.4` → `2026.05.04.1`

🤖 Generated with Claude Code

fix(s3-datastore, gcs-datastore): persist localMtime + markSynced after non-zero pullChanged

Pre-fix, pullChanged's slow path skipped markSynced when pulled > 0 and
left the on-disk index file carrying the original pusher's localMtime
values (the raw remote payload pullIndex wrote at the start of the
slow path). A subsequent fresh-process pushChanged then took the slow
path (no sidecar), pullIndex(forceRemote) reloaded the pusher's mtimes
into this.index, and the walk pushed every file — size matched the
index but localMtime didn't (pusher's mtime ≠ local stat.mtime from
Deno.writeFile). N redundant byte-identical PUTs per fresh-cache
bootstrap (swamp-club #222).

The fix mirrors the existing pulled === 0 markSynced path: when the
local cache matches the remote index whose ETag we GET'd (whether by
walk-finds-zero-diff OR by downloading the missing files), persist
the in-memory index back to disk (carrying the puller's
stat-derived localMtimes) AND mark synced so the next pushChanged
hits the fast path. Ordering invariant preserved: atomicWriteTextFile
runs BEFORE markSynced because markSynced derives lastVerifiedAt from
indexPath mtime + 1ms.

Out-of-scope follow-up: when a concurrent writer bumps the remote
ETag between pullChanged and pushChanged, the sidecar mismatch forces
the slow path, and pullIndex(forceRemote) overwrites this.index with
the new remote payload (carrying the writer's localMtimes), and the
walk re-fires redundant pushes for any local file whose mtime differs.
Pre-existing behavior, not widened by this fix; will be filed
separately post-merge.

Verified:
- 5-file regression test in each of s3 and gcs cache_sync_test.ts:
  fresh-process pushChanged after fresh-cache pullChanged uploads zero
  bytes, with positive assertions on sidecar presence/cleanliness and
  on-disk index localMtime values
- s3 suite: 61/61 passing, gcs suite: 64/64 passing
- MinIO end-to-end: pulled=5, pushed=0
- GCS end-to-end: pulled=5, pushed=0

Manifests bumped to 2026.05.04.1 in both extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment


Code Review

Blocking Issues

None.

Summary

This is a targeted, correct fix for a real bandwidth-waste bug. The root cause analysis in the PR description is accurate and matches the code: pullChanged was updating this.index in memory with the locally-derived stat.mtime values after each download, but only flushing the in-memory state to disk and calling markSynced when pulled === 0. A fresh-process pushChanged loaded the stale on-disk index (carrying the original pusher's localMtime epoch values), found a mismatch against the local stat.mtime, and re-uploaded all files unnecessarily.

Fix correctness:

  • The change from if (pulled === 0 && indexETag) to if (indexETag), with an inner if (pulled > 0 && this.index) guard to write the index first, is correct.
  • The ordering invariant (index write before markSynced) is essential and clearly documented: markSynced derives lastVerifiedAt from Deno.stat(indexPath).mtime + 1ms, so writing the index after would bump the mtime past lastVerifiedAt and cause spurious slow-path fallbacks.
  • If atomicWriteTextFile throws, it's caught by the outer try/catch and markSynced is correctly skipped — we don't record a clean sidecar when the on-disk index is stale.
  • The this.index null guard (pulled > 0 && this.index) is sound: if the index were null, no files would be in toPull, so pulled would remain 0 and the guard never fires.
  • Mirrored symmetrically between s3_cache_sync.ts and gcs_cache_sync.ts as expected.

Tests:

  • Both new regression tests correctly use in-memory mock clients (no live cloud services).
  • Temp dirs are cleaned up in finally blocks; no env vars set/restored needed.
  • The two positive assertions (sidecar contents + on-disk index mtimes) are load-bearing: they pin the persistence mechanism so a future refactor can't satisfy only the symptom assertion while silently breaking the fix.
  • ETag normalization is handled correctly: the mock stores '"index-pre-pull"' (quoted HTTP ETag format), normalizeETag strips the quotes before storage, and the assertion checks the unquoted form. The GCS test uses a raw generation string ("42"), which doesn't need normalization.
  • No sanitizeResources: false needed — mock clients don't open connection pools.
  • CLAUDE.md rules: no any types in new hand-written code (the typed cast as { entries: ... } is correct), named exports, no model files touched.

Suggestions

  • Minor naming oddity: The second service in each test is named serviceC (skipping serviceB), which is slightly confusing. Not a bug — the comment explains it simulates a cross-process scenario — but serviceB or freshService would be clearer.
  • Minor inconsistency: The GCS test reads the sidecar via Deno.readTextFile + JSON.parse directly, while the S3 test uses the readSidecar helper. Functionally identical; using the helper in both would be marginally cleaner.


@github-actions github-actions Bot left a comment


Adversarial Review

Traced the full data flow through pullChanged → atomicWriteTextFile → markSynced → (next process) pushChanged → tryFastPushChanged / slow-path walk for both S3 and GCS implementations. Examined edge cases around null indexETag/indexGeneration, null this.index, multipart ETags, generation "0", concurrent writers, and atomicWriteTextFile failures.

Critical / High

None found.

Medium

None found.

Low

  1. atomicWriteTextFile failure skips markSynced (s3_cache_sync.ts:840 / gcs_cache_sync.ts:813): If the index rewrite throws (disk full, permissions), the catch block swallows it and also skips markSynced. The on-disk index retains the remote payload's localMtime values, so a subsequent fresh-process pushChanged still sees the pre-fix mismatch and redundantly pushes. This is the same as pre-fix behavior (self-healing on the next slow-path sync), and the comment correctly describes it as "opportunistic." Not a regression — just an inherent limitation of the opportunistic design that prevents full self-healing on disk-full.

  2. pulled === 0 with reconciled in-memory mtimes not persisted (s3_cache_sync.ts:836 / gcs_cache_sync.ts:809): During the walk phase (before downloads), the code reconciles this.index.entries[rel].localMtime in memory when a local file's stat.mtime differs from the remote entry's localMtime but size matches. When pulled === 0, these reconciled values are NOT written to disk (the if (pulled > 0 && this.index) guard skips the rewrite). The sidecar IS written, so the next sync hits the fast path — meaning the stale on-disk mtimes are never consulted. If the fast path later fails (concurrent writer bumps ETag), pullIndex(forceRemote) reloads from remote anyway. No practical impact, but worth noting for future refactors that might remove the fast path.

Verdict

PASS. The fix is well-targeted: it extends the existing pulled === 0 sidecar-persistence block to also handle pulled > 0 by (a) rewriting the on-disk index with the puller's stat-derived localMtime values before (b) calling markSynced. The ordering invariant (atomicWriteTextFile before markSynced) is correctly maintained — markSynced derives lastVerifiedAt from Deno.stat(indexPath).mtime + 1ms, so reversing would capture a stale baseline. The error handling preserves the existing opportunistic/self-healing contract. The S3 and GCS implementations are structurally identical modulo indexETag → indexGeneration. Both regression tests correctly pin the persistence machinery (sidecar contents + on-disk index mtimes) in addition to the symptom assertion (zero redundant PUTs).

@stack72 stack72 merged commit e964c00 into main May 4, 2026
30 checks passed
@stack72 stack72 deleted the fix/issue-222-pullChanged-redundant-push branch May 4, 2026 16:39