
fix(s3-datastore, gcs-datastore): exclude per-target FileLock files from cache sync#123

Merged
stack72 merged 1 commit into main from fix/s3-gcs-datastore-exclude-per-target-locks on May 5, 2026

Conversation


@stack72 stack72 commented May 4, 2026

Summary

  • Extend isInternalCacheFile (s3 + gcs) to exclude any path with basename .lock, in addition to the existing top-level .datastore.lock exact-match.
  • Bump both manifests to 2026.05.05.1.
  • Add a unit test per extension pinning the new exclusion at any depth, plus negative cases (data/@m/.locked.yaml, data/@m/lock are NOT filtered).
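For reference, a minimal sketch of what the extended predicate looks like after this change. The function name and the two checks are from the description above; the exact signature and any other exclusions the real function performs are assumptions:

```typescript
// Sketch of the extended predicate described in this PR. Only the
// ".datastore.lock" exact match and the new basename check are from the
// diff; the rest of the real exclusion logic (if any) is not shown.
function isInternalCacheFile(rel: string): boolean {
  // Existing behavior: the top-level distributed lock is excluded exactly.
  if (rel === ".datastore.lock") return true;
  // New behavior: any path whose basename is exactly ".lock", at any depth.
  const base = rel.split("/").pop() ?? "";
  return base === ".lock";
}
```

The exact-string comparison on the basename is what keeps near-miss names like `.locked.yaml` and `lock` out of the exclusion.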

Why

The self-healing discoverIndexFromBucket path added in #122 lists the bucket and synthesizes an index from whatever it finds. The data tier writes per-target FileLock files at data/<kind>/<type>/<id>/.lock via the s3-datastore's own createLock — which PUTs and DELETEs them directly through S3, outside cache sync. The old isInternalCacheFile excluded only the top-level .datastore.lock, so:

  1. Writer's setup-time pullIndex 404s → discoverIndexFromBucket → LISTs the bucket → catches the freshly PUT data/.../<id>/.lock → builds an index containing it → uploads.
  2. Workflow runs, releases the per-target lock → S3 DELETE removes the .lock object.
  3. Reader's setup pulls the now-stale index, hydrates → 404 on the missing .lock → setup reports an error and never persists datastore: to .swamp.yaml.
  4. Reader's next swamp datastore sync --pull reads .swamp.yaml, sees no datastore section, falls back to filesystem, and the sync command's preflight emits Current datastore type: filesystem.

Surfaced by the swamp-uat tests/cli/datastore/sync_test.ts cross-repo pull tests on @swamp/s3-datastore@2026.05.04.4: https://github.com/systeminit/swamp-uat/actions/runs/25343366047. GCS got the parallel fix in #122; the latent bug is identical there.
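To make the failure mode concrete, here is a hedged model of the discovery path's filtering step. The listing, index shape, and helper name `synthesizeIndex` are hypothetical; only the predicate's behavior reflects the fix:

```typescript
// Hypothetical model of index synthesis in discoverIndexFromBucket:
// list the bucket, drop internal cache files, index the rest.
// The real listing goes through the cloud SDK and is not shown.
function isInternalCacheFile(rel: string): boolean {
  if (rel === ".datastore.lock") return true;
  return (rel.split("/").pop() ?? "") === ".lock";
}

function synthesizeIndex(listedKeys: string[]): string[] {
  return listedKeys.filter((key) => !isInternalCacheFile(key));
}

const listed = [
  "data/command/shell/abc-123/record.yaml",
  "data/command/shell/abc-123/.lock", // transient; deleted on lock release
];
// With the fix, the transient lock never enters the synthesized index,
// so a later reader cannot 404 hydrating a since-deleted object.
const index = synthesizeIndex(listed);
```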

Test plan

  • deno test — 1 new unit test per extension pinning .lock exclusion at any depth, plus negative cases.
  • All existing s3 (45) + gcs (43) cache_sync tests still pass.
  • Linux + mock-S3 end-to-end repro (writer workflow run → reader setup → reader sync --pull):
    • Before fix: setup errors: ["…/.lock - S3 getObject failed HTTP 404 NoSuchKey"], .swamp.yaml missing datastore block, sync --pull reports Current datastore type: filesystem.
    • After fix: setup filesPulled: 19, errors: [], .swamp.yaml has the s3 datastore block, sync --pull succeeds, data list shell-echo returns the writer's 4 records.
  • deno fmt --check and deno lint clean for both extensions.
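A self-contained sketch of the kind of pinning test described above. The real suites live in the extensions' cache_sync test files and use Deno's std asserts; here the predicate and a local `assertEquals` are re-sketched so the example runs standalone:

```typescript
// Sketch of the pinning test: positive cases at any depth, plus the
// negative near-miss cases called out in the test plan.
function isInternalCacheFile(rel: string): boolean {
  if (rel === ".datastore.lock") return true;
  return (rel.split("/").pop() ?? "") === ".lock";
}

function assertEquals<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

// Positive: per-target lock files are excluded at any depth.
assertEquals(isInternalCacheFile("data/command/shell/abc-123/.lock"), true);
assertEquals(isInternalCacheFile("data/@m/.lock"), true);
// Negative: similarly named files are NOT filtered.
assertEquals(isInternalCacheFile("data/@m/.locked.yaml"), false);
assertEquals(isInternalCacheFile("data/@m/lock"), false);
```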

🤖 Generated with Claude Code

fix(s3-datastore, gcs-datastore): exclude per-target FileLock files from cache sync

Extend `isInternalCacheFile` to exclude any path with basename `.lock` (in
addition to the existing top-level `.datastore.lock`). Without this, the
self-healing `discoverIndexFromBucket` path added in #122 captures
transient per-target lock files (`data/<kind>/<type>/<id>/.lock`, written
by the data tier's lock subsystem via S3 PUT/DELETE) into the synthesized
index. The lock subsystem then deletes them on release, leaving the
remote index referencing missing objects. A fresh reader running
`datastore setup` 404s during hydration, fails to persist the
`datastore:` block to `.swamp.yaml`, and the next `datastore sync --pull`
falls back to the filesystem datastore and reports
`Current datastore type: filesystem`.

Surfaced by the swamp-uat sync_test.ts cross-repo pull tests
(`sync pull retrieves data written by another repo …`) on
@swamp/s3-datastore@2026.05.04.4. Reproduced end-to-end in a Linux
container with a mock S3 server: writer runs a workflow (acquires the
per-target lock, pushes data, releases the lock), reader's setup hits
the 404 on the stale `.lock` index entry, `.swamp.yaml` never gets the
datastore block, sync --pull surfaces the filesystem-fallback error.
With the fix the reader hydrates 19 files cleanly and the pull
succeeds.

Mirrored in @swamp/gcs-datastore — same `discoverIndexFromBucket` path,
same latent bug.

Test plan:
- 1 new unit test per extension pinning the `.lock` exclusion at any
  depth, plus negative cases (`data/@m/.locked.yaml`, `data/@m/lock`).
- All existing s3 (45) + gcs (43) cache_sync tests still pass.
- Linux+mock-S3 end-to-end repro (writer workflow run → reader setup →
  reader sync --pull) goes from `errors: ["…/.lock - 404 NoSuchKey"]`
  + filesystem-fallback to `filesPulled: 19` + a successful pull
  returning the writer's 4 data records.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot left a comment


Code Review

Blocking Issues

None.

Suggestions

  1. Misleading comment in s3_cache_sync_test.ts — The new test includes:

    // The top-level distributed lock stays excluded (already covered by the
    // exact-match branch, but the basename branch catches it too).
    assertEquals(isInternalCacheFile(".datastore.lock"), true);

    The claim "the basename branch catches it too" is factually wrong. ".datastore.lock".split("/").pop() yields ".datastore.lock", not ".lock", so the base === ".lock" guard does not fire for it. The assertion is correct (the exact-match branch covers it), but the comment misstates the mechanism and could mislead a future reader into thinking the exact-match check is redundant. The GCS test omits this comment entirely — worth fixing the S3 version to match.

  2. The .datastore.lock assertion is already covered by the existing isInternalCacheFile: excludes the sync-state sidecar test (line 1487). Repeating it for context is fine, but it's redundant.


The fix is correct and minimal: a single exact basename check base === ".lock" is the right tool — precise enough not to catch .locked.yaml or lock, confirmed by the negative test cases. The root-cause explanation in the PR description and the comment blocks in both source files are thorough and accurate. Version bumps are appropriate. No CLAUDE.md violations, no security concerns, no type issues.


@github-actions github-actions Bot left a comment


Adversarial Review

I traced every code path through the modified isInternalCacheFile in both S3 and GCS, verified the discoverIndexFromBucket filter application at s3_cache_sync.ts:781 and gcs_cache_sync.ts:744, and tested edge cases against the basename extraction logic (rel.split("/").pop() ?? "").

Critical / High

None found.

Medium

None found.

Low

  1. No test for bare .lock at root level (both test files) — The tests cover deep paths (data/command/shell/<uuid>/.lock), shallow paths (data/@m/.lock), and the existing .datastore.lock exact-match, but not a bare .lock as the entire relative path. This case is handled correctly by the basename branch (since ".lock".split("/").pop() is ".lock"), and it's unlikely to occur in practice since the data tier nests lock files under data/<kind>/<type>/<id>/. Purely theoretical gap in test coverage.

  2. Trailing-slash keys would bypass the filter — If a cloud storage listing ever returned a key like data/foo/.lock/ (trailing slash), ".lock/".split("/").pop() yields "", not ".lock", so the filter would miss it. This can't happen in practice: S3 ListObjectsV2 and GCS list return exact object keys, and .lock files are regular objects, not prefix placeholders. Not actionable.
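The two low-severity cases above hinge entirely on how `split("/").pop()` extracts the basename; a small demonstration of the edge cases discussed (the `basename` helper is introduced here for illustration):

```typescript
// Demonstrates the basename-extraction behavior behind both edge cases.
const basename = (rel: string): string => rel.split("/").pop() ?? "";

// Bare ".lock" at root: basename is ".lock", so the branch fires.
console.log(basename(".lock")); // ".lock"
// Trailing slash: the final empty segment wins, so the branch misses.
console.log(basename("data/foo/.lock/")); // ""
// The top-level sidecar: basename is ".datastore.lock", not ".lock",
// so only the exact-match branch covers it.
console.log(basename(".datastore.lock")); // ".datastore.lock"
```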

Verdict

PASS — The fix is minimal, correct, and well-targeted. The single-line if (base === ".lock") return true addition in each file correctly excludes per-target FileLock files at any depth from cache sync. The exact-string comparison avoids false positives on similar names (.locked.yaml, lock). Both S3 and GCS implementations are identical, matching the mirrored bug. Tests cover the primary positive and negative cases. Manifest versions follow CalVer convention. No logic errors, no security concerns, no resource leaks.

@stack72 stack72 merged commit 132195b into main May 5, 2026
30 checks passed
@stack72 stack72 deleted the fix/s3-gcs-datastore-exclude-per-target-locks branch May 5, 2026 00:05