Skip to content

fix(s3-datastore, gcs-datastore): self-healing index discovery for un-indexed buckets (swamp-club#225)#122

Merged
stack72 merged 3 commits intomainfrom
issue-225-discovery
May 4, 2026
Merged

fix(s3-datastore, gcs-datastore): self-healing index discovery for un-indexed buckets (swamp-club#225)#122
stack72 merged 3 commits intomainfrom
issue-225-discovery

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 4, 2026

Summary

Closes swamp-club#225. Paired follow-up to swamp-club#220.

When a bucket held data under standard prefixes (data/, workflow-runs/, …) but no .datastore-index.json, the post-#220 hydrate step in swamp datastore setup reported Hydrated: 0 pulled and silently left the local cache empty. From the contributor's view, setup exited 0 and swamp datastore status was healthy, but swamp data list and swamp workflow run search returned nothing.

pullIndex()'s NotFound/NoSuchKey branch now lists the bucket via ListObjectsV2, filters via isInternalCacheFile, builds an index from the listing, publishes it back, and continues. The same fix is mirrored in @swamp/gcs-datastore. Both pullChanged and pushChanged callers inherit discovery automatically.

Side benefit

Previously, pushChanged against an unindexed populated bucket would write an index reflecting only LOCAL files, dropping the existing remote entries from the index even though the storage still held them. With discovery in pullIndex, push first builds a complete merged view, then walks local against it. Pinned by Test E.

Design choices documented in inline comments

  • Unconditional PutObject — discovery is idempotent (same listing → same synthesized index across racing peers), matches pushChanged writeback (line 1029), and If-None-Match: * is not portable across all S3-compatible backends this extension supports.
  • this.index set BEFORE put — mirrors the existing post-fetch order. If put throws, no local file is written and the next forceRemote: true re-triggers discovery — idempotent, no orphaned state.
  • Both callers of pullIndex inherit discovery — no separate change to pushChanged is needed.

Invariant note (regression pin)

pullChanged alone never writes the remote index — only pushChanged does. Setup's hydrate calls pullChanged but not pushChanged, so a second peer's setup against an already-discovered bucket must NOT modify the remote index. The second-peer ETag-stability check below pins this property.

MinIO verification

Container running on :9100; bucket repro-bucket seeded directly with aws s3 cp (no .datastore-index.json).

Before fix (origin/main):

INF datastore·setup Hydrating cache from remote...
Datastore Setup Complete
  Type:     @swamp/s3-datastore
  Hydrated: 0 pulled

swamp workflow run search → empty. .datastore-index.json still absent in bucket.

After fix, peer 1:

Hydrated: 3 pulled
  • 3 data files in local cache.
  • .datastore-index.json now present in bucket (size 602, ETag 8771228f99da99eeaa91a198860bf0ed).

After fix, peer 2 (/tmp/swamp-repro-issue-225-peer2, fresh repo, same bucket):

Hydrated: 3 pulled
  • 3 data files in local cache.
  • Bucket's .datastore-index.json ETag UNCHANGED (still 8771228f99da99eeaa91a198860bf0ed) — no redundant discovery write fired.

Test plan

  • 5 new unit tests per extension pinning discovery (A populates+writes back with exact-match assertion, B pullChanged hydrates, C internal-file filter for .datastore.lock / .push-queue.json / _catalog.db, D empty-bucket fallthrough regression pin, E pushChanged side benefit)
  • 1 new programmable-server integration test proving the AWS SDK surfaces the missing-index 404 with the name === "NoSuchKey" discriminant our catch branch expects
  • End-to-end MinIO verification (single-peer + second-peer ETag stability)
  • deno fmt --check and deno lint clean on all 8 changed files
  • Updated DEF-2 integration: 403 throws immediately with credential hint test for the new credential-hint format from #226 (rebase merge artifact, not a behaviour change)

Manifest bumps

  • @swamp/s3-datastore: 2026.05.04.32026.05.04.4
  • @swamp/gcs-datastore: 2026.05.04.32026.05.04.4

Post-merge follow-ups (human-side bookkeeping, NOT a lifecycle gate)

These live outside this PR and the issue-225 lifecycle:

  1. File adversarial UAT issue in systeminit/swamp-uat (label adversarial): "datastore setup against bucket with data and no index file should hydrate, not silently report 0 pulled". The existing tests/cli/datastore/setup_test.ts covers basic setup but not this edge case.
  2. Optional manual docs polish in systeminit/swamp-club (content/manual/reference/datastore-configuration.md) noting the self-healing behaviour under the Hydrate step.
  3. Companion design-doc PR in systeminit/swamp adding an "Initial Index Discovery" subsection to design/datastores.md.

🤖 Generated with Claude Code

…-indexed buckets (swamp-club#225)

Residual case from swamp-club#220: when a bucket holds data under
standard prefixes but no `.datastore-index.json`, the post-#220 hydrate
step reported `Hydrated: 0 pulled` and silently left the cache empty.
The user's contributors saw a successful `swamp datastore setup` that
delivered no data.

`pullIndex()`'s NotFound branch now lists the bucket via ListObjectsV2,
filters via `isInternalCacheFile`, builds an index from the listing,
publishes it back unconditionally, and continues. Both `pullChanged`
and `pushChanged` callers inherit discovery — push against an unindexed
populated bucket also now produces a correct merged index instead of
losing remote entries.

The unconditional PutObject is deliberate: discovery is idempotent
(same listing → same synthesized index across racing peers), matches
the existing `pushChanged` writeback semantics, and `If-None-Match: *`
is not portable across all S3-compatible backends this extension
supports. Inline comments document the rationale and the failure-state
ordering (`this.index` set before put, atomicWriteTextFile after) so
the post-fetch path's contract isn't accidentally diverged from.

Mirrors the same fix in @swamp/gcs-datastore.

Test plan:
- 5 new unit tests per extension pinning the discovery path (discovery
  populates+writes back, pullChanged hydrates, internal-file filter,
  empty-bucket fallthrough, pushChanged side benefit).
- 1 new programmable-server integration test proving the AWS SDK
  surfaces the missing-index 404 with the discriminant the catch
  branch matches against.
- End-to-end MinIO repro: single-peer setup hydrates 3 files and
  publishes the index; second-peer setup hydrates 3 files with no
  redundant writeback (index ETag stable).

Closes swamp-club#225.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Reviewed all 8 changed files: s3_cache_sync.ts, gcs_cache_sync.ts, s3_client.ts, gcs_client.ts, their test files, and both manifests. Traced every new code path, checked error handling on all branches, verified the API migration has no orphaned callers, and stress-tested the concurrency claims.

Critical / High

None found.

Medium

  1. Inaccurate idempotency claim across racing peerss3_cache_sync.ts:731, gcs_cache_sync.ts:700

    Both doc comments state "same listing → same synthesized index across racing peers" to justify the unconditional PutObject. This is technically false: lastPulled (line 787 S3 / line 746 GCS) uses new Date().toISOString(), which differs per peer. Additionally, the lastModified fallback (entry.lastModified ?? new Date()).toISOString() (S3 line 782) and (entry.updated ?? new Date()).toISOString() (GCS line 741) produce different timestamps if the listing entry lacks the field.

    Concrete scenario: Two peers race through discovery at timestamps T1 and T2. Both see the same 3 objects (with lastModified populated), but the serialized JSONs differ in the lastPulled field. Last-writer-wins overwrites the first peer's index.

    Impact: Functionally benign — the entry keys, sizes, and (usually) modification timestamps are identical, so sync behavior is correct. No data loss. But a future maintainer reading "same… index" could rely on byte-level idempotency for a conditional write optimization and be surprised. Consider softening the comment to "same listing → functionally equivalent synthesized index (entry keys and sizes match; only metadata timestamps may differ)".

  2. S3ListResult.keysS3ListResult.entries and GcsListResult.keysGcsListResult.entries are breaking interface changess3_client.ts:71, gcs_client.ts:72

    These are exported interfaces. No in-tree callers use the old .keys shape (verified via grep), but any external consumer importing S3ListResult or GcsListResult from the _lib module will get a compile-time break. The _lib path convention suggests internal-only, so this is likely fine, but worth confirming no external extensions depend on these types.

Low

  1. No upper bound on listAllObjects in the discovery paths3_cache_sync.ts:758, gcs_cache_sync.ts:727

    A bucket with millions of legacy objects (not inconceivable for a long-running datastore) produces an unbounded in-memory array, an O(n) filter pass, an O(n) entries map, and a potentially multi-MB JSON string — all held simultaneously. For the migration/setup path this targets, this is acceptable, but a bucket with 1M+ objects would cause significant memory pressure. A maxKeys/maxResults guard with a warning log would be a belt-and-suspenders safety net.

  2. Unnecessary optional chaining on retryWithBackoff returns3_cache_sync.ts:798, gcs_cache_sync.ts:761

    putResult?.etag and putResult?.generation use optional chaining, but retryWithBackoff<T> returns Promise<T> (never undefined) — it either resolves with the value or throws. The ?. is harmless but suggests putResult could be nullish, which it can't be. Cosmetic only; not blocking.

Verdict

PASS — The core logic is correct. The discovery path properly handles both sub-cases (empty bucket vs. populated-but-unindexed), error propagation is sound (listing/put failures throw rather than leaving corrupted state), isInternalCacheFile covers all known internal files including .datastore-index.json itself (defensive against concurrent writes), the this.index-before-put ordering mirrors the existing post-fetch path, and the API migration from keys: string[] to entries: Entry[] is complete with no orphaned callers. Tests are thorough — 5 unit tests per extension plus an integration test proving the real SDK error shape. The one substantive note (Medium #1) is a documentation accuracy issue, not a correctness issue.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. Missing inline comment on the new S3 integration test — CLAUDE.md requires a comment explaining why is needed whenever it appears. The DEF-2 section comment at line 1076 covers the earlier integration tests, but the new pullIndex discovery test at the end of the file is in its own block and has no comment. The GCS test file has a standalone comment block (around line 1508) that serves exactly this purpose; a similar one-liner near the new S3 test would satisfy the rule (e.g. "AWS SDK connection pooling trips Deno's resource leak detection").

  2. Synthesized falls back to when listing lacks timestamps — in both implementations, / sets the index entry's to the time discovery ran rather than the object's actual modification time. This is documented and reconciles real mtimes during download, so it's not incorrect — just worth noting that a discovery index is not a faithful reflection of remote object timestamps.


The implementation is sound. The idempotency argument (same listing → same synthesized index across racing peers, so unconditional is benign) is correctly reasoned and documented. The interface changes to / (adding , /, / to list entries) are fully migrated with no orphaned callers. The five unit tests per extension cover all the cases claimed (discovery write-back with exact-match assertion, hydration via , internal-file filter, empty-bucket fallthrough regression pin, and the side benefit). The S3 integration test additionally proves the real AWS SDK surfaces the missing-index 404 with the discriminant the catch branch relies on — a valuable guard against future SDK reclassification. No types, no default exports, no live cloud services, env vars restored in via 's cleanup. All good.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. Missing inline sanitizeResources comment on new S3 integration test — CLAUDE.md requires a comment explaining why sanitizeResources: false is needed whenever it appears. The DEF-2 section comment at line 1076 covers the earlier integration tests, but the new "pullIndex discovery" test added at the end of the file is in its own block and has no comment. The GCS test file has a standalone comment block (around line 1508) that serves exactly this purpose; a similar one-liner near the new S3 test would satisfy the rule (e.g. "AWS SDK connection pooling trips Deno resource leak detection").

  2. Synthesized lastModified falls back to now when listing lacks timestamps — in both discoverIndexFromBucket implementations, the fallback (entry.updated ?? new Date()).toISOString() sets the index entry's lastModified to the time discovery ran rather than the object's actual modification time. This is documented and pullChanged reconciles real mtimes during download, so it is not incorrect — just worth noting that a discovery index is not a faithful reflection of remote object timestamps.


The implementation is sound. The idempotency argument (same listing produces same synthesized index across racing peers, so unconditional PutObject is benign) is correctly reasoned and documented. The interface changes to S3ListResult / GcsListResult (adding size, etag/generation, lastModified/updated to list entries) are fully migrated with no orphaned callers. The five unit tests per extension cover all the cases claimed: discovery write-back with exact-match assertion, hydration via pullChanged, internal-file filter, empty-bucket fallthrough regression pin, and the pushChanged side benefit. The S3 integration test additionally proves the real AWS SDK surfaces the missing-index 404 with the name === "NoSuchKey" discriminant the catch branch relies on — a valuable guard against future SDK reclassification. No any types, no default exports, no live cloud services, env vars restored in finally via withProgrammableServer's cleanup.

- Soften the "idempotent across racing peers" claim in both
  discoverIndexFromBucket inline notes (s3 + gcs). Discovery is
  FUNCTIONALLY idempotent — entry keys and sizes match, but
  per-peer metadata timestamps (`lastPulled`, the `lastModified`
  fallback when the SDK didn't surface one) make the JSON bodies
  byte-different. Sync behaviour depends on keys + sizes, not
  timestamps, so last-writer-wins is still benign — but the comment
  now warns future maintainers NOT to add a content-fingerprint
  optimization on the assumption of byte-equality.

- Add a comment block above the new s3 integration test explaining
  why `sanitizeResources: false` is set (mirrors the same setting
  used by every test in the DEF-2 integration block — AWS SDK
  keep-alive connections trip Deno's resource-leak detector and are
  reclaimed at runtime teardown).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. GCS fallback on zero-byte objects: In gcs_client.ts, parseInt(obj.size, 10) returns NaN only if obj.size is present but non-numeric — GCS always returns a numeric string so this is safe in practice, but an explicit Number(obj.size) || 0 would be more defensive. Non-blocking.

  2. ETag double-quotes note: The integration test comments that the raw ETag from the PUT includes surrounding double-quotes (e.g. '"index-etag"') and correctly documents that normalizeETag() is what callers use for comparison. Consider adding a similar note in discoverIndexFromBucket itself for future readers who may be surprised the returned fingerprint is not normalized. Non-blocking.


Overall: The fix is correct and well-designed. The discoverIndexFromBucket method is properly symmetric between S3 and GCS, the documented invariants (unconditional put, in-memory write before remote put, all callers inherit discovery) are sound, and the test suite covers all meaningful cases including the regression pin for empty buckets and the pushChanged side-benefit. The S3ListResult/GcsListResult API change from keys: string[] to entries: S3ListEntry[]/GcsListEntry[] is internal to each extension and all callers are updated. CLAUDE.md rules are followed — no model files touched, no any types, named exports, exact version pins, sanitizeResources: false with comment. Approved.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. GCS size fallback: In gcs_client.ts, parseInt(obj.size, 10) is safe since GCS always returns a numeric string, but Number(obj.size) || 0 would be more defensive if obj.size is ever an unexpected value. Non-blocking.

  2. ETag double-quotes: The integration test correctly notes the raw ETag from PUT includes surrounding double-quotes and that normalizeETag() is used for comparisons. Adding a similar note in discoverIndexFromBucket itself would help future readers who may be surprised the returned fingerprint is not normalized. Non-blocking.

Overall: The fix is correct and well-designed. discoverIndexFromBucket is properly symmetric between S3 and GCS, the documented invariants (unconditional put, in-memory write before remote put, all callers inherit discovery) are sound, and the test suite covers all meaningful cases including regression pins for empty buckets and the pushChanged side-benefit. The S3ListResult/GcsListResult API change from keys: string[] to entries: S3ListEntry[]/GcsListEntry[] is internal and all callers are updated. CLAUDE.md rules are followed: no model files touched, no any types, named exports, exact version pins, sanitizeResources: false with comment. Approved.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Reviewed all 6 hand-written files (2 sync services, 2 client modules, 2 test files); skipped manifest YAMLs (version bumps only, no logic).

Critical / High

None found.

Medium

  1. datastore/gcs/extensions/datastores/_lib/gcs_client.ts:1009parseInt on obj.size is safe but fragile against a hypothetical empty string.
    parseInt("", 10) returns NaN. The guard obj.size ? parseInt(obj.size, 10) : 0 handles undefined and empty-string (both falsy), so this is safe today. However, a GCS JSON API response with "size": "0" (truthy string) correctly produces 0. No actual breakage scenario exists with the real API — noting only because the S3 sibling gets Size as a native number from the SDK and avoids parsing altogether. No action required.

  2. Both discoverIndexFromBucket implementations skip scrubIndex() after building the index.
    The normal pullIndex fetch path calls scrubIndex() to remove zombie internal-file entries (swamp-club#29). The discovery path builds the index by filtering isInternalCacheFile before creating entries, which is functionally equivalent to a post-hoc scrub. This is correct — but it's an implicit invariant: if scrubIndex ever gains additional scrub criteria beyond isInternalCacheFile, the discovery path won't inherit them. Acceptable because both functions live in the same file and the relationship is documented. No action required.

Low

  1. Test mocks for listAllObjects don't return updated (GCS) or lastModified (S3) timestamps.
    The mocks always exercise the ?? new Date() fallback path in discoverIndexFromBucket. The happy path where timestamps are present and propagated correctly is untested. Unlikely to break (it's a direct assignment), but a gap. Cosmetic — no action required.

  2. S3 mock listAllObjects doesn't accept/check signal parameter.

    listAllObjects(): Promise<Array<{...}>>

    vs the GCS mock which accepts and checks it:

    listAllObjects(_subPrefix?: string, signal?: AbortSignal): Promise<...>

    JavaScript silently ignores extra arguments, so no test breakage, but the S3 mock wouldn't catch a bug where discovery fails to propagate abort signals. Cosmetic inconsistency — no action required.

Verified non-issues

  • Return type change (keys: string[]entries: S3ListEntry[] / GcsListEntry[]). Confirmed all callers of listObjects and listAllObjects across the datastore/ tree. Only internal pagination (listAllObjects → listObjects) and the new discoverIndexFromBucket methods call them. No external consumers reference the old .keys property. No breakage.

  • isInternalCacheFile correctly excludes .datastore-index.json from the listing. During discovery the index is absent (that's the trigger), so the listing won't include it. After the put-back, any subsequent listing would include it, but isInternalCacheFile filters it. No self-referential index entry possible.

  • Race between concurrent peers discovering simultaneously. Both list the same objects, build functionally-equivalent indexes (same keys, same sizes, different metadata timestamps), and put unconditionally. Last writer wins. Since sync depends on keys + sizes (not timestamps), the last-writer-wins outcome is benign. The unconditional put is the correct choice given the documented cross-backend portability concern (If-None-Match: * not portable).

  • indexMutated not set in discovery path. Correct: pushChanged uses indexMutated to trigger writeback when scrub removed zombie entries. Discovery produces a clean index by construction, so no writeback is needed beyond what discovery itself performs. If pushChanged subsequently pushes files, pushed > 0 triggers writeback independently.

  • Error propagation. If listAllObjects or putObject throws inside discoverIndexFromBucket, the error propagates to pullIndex's caller. This is correct: this.index is set in-memory but atomicWriteTextFile hasn't run, so no orphaned local state. The next forceRemote call re-triggers discovery.

Verdict

PASS. The discovery logic is sound, the two provider implementations are properly symmetric, the listObjects return-type change is fully contained, and the test coverage (5 unit tests + 1 integration test per provider) pins the important invariants. The edge cases I probed (empty bucket, internal-only-files bucket, concurrent peers, partial failure mid-discovery) are all handled correctly.

- Swap `obj.size ? parseInt(obj.size, 10) : 0` for `Number(obj.size)
  || 0` in gcs_client.ts. More defensive against a hypothetical
  empty-string `size` field; matches the S3 sibling's spirit (which
  gets a native number from the SDK and avoids parsing entirely).
  Comment explains the asymmetry between the two clients.

- Add inline notes in both `discoverIndexFromBucket` implementations
  documenting that the returned fingerprint is the raw form from the
  PUT response — S3's quoted ETag, GCS's numeric `generation` string —
  matching the post-fetch path's contract. `normalizeETag()` is what
  callers apply for byte-level comparison; don't strip quotes here or
  the discovery path would diverge from the existing fingerprint
  convention.

Both observations from the reviewer were "no action required" but
worth documenting for future readers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Summary

This PR correctly implements self-healing index discovery for the @swamp/s3-datastore and @swamp/gcs-datastore extensions. When pullIndex hits a 404/NotFound for .datastore-index.json, it now lists the bucket, filters internal cache files, synthesizes an index, publishes it, and returns a fingerprint — restoring the pre-existing empty-bucket semantics as a fallthrough. The side benefit (preventing push from clobbering remote-only entries on unindexed buckets) is also correctly handled since both pullChanged and pushChanged go through pullIndex.

CLAUDE.md compliance — no model/ files touched, no any types in hand-written code, all exports are named, no new npm deps introduced.

Testing — comprehensive unit coverage (A: discovery+writeback, B: pullChanged hydration, C: internal-file filter, D: empty-bucket fallthrough, E: pushChanged merged view), plus a programmable-server integration test proving the real AWS SDK surfaces name === "NoSuchKey". The integration test correctly uses sanitizeResources: false with an explanatory comment. All mocks use in-memory storage and temp dirs; no live cloud services.

API surface changelistAllObjects / listObjects return types changed from string[] / { keys: string[] } to richer entry structs. Both methods have exactly one external caller each (the respective cache_sync.ts files), both updated correctly. TypeScript strict mode would catch any missed callers at check time.

Security — no credential exposure, no shell interpolation, path traversal protection unchanged (assertSafePath still applied on all file-path operations).

Minor observation (not blocking): In the GCS discoverIndexFromBucket, putResult?.generation uses optional chaining even though retryWithBackoff returns GcsWriteResult (non-nullable) and GcsWriteResult.generation is a required string. The ?. and ?? null will never trigger for GCS. Harmless defensive code — mirrors the S3 sibling where etag is genuinely optional — so no action needed.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

I traced every code path in the new discoverIndexFromBucket methods (both S3 and GCS), the caller integration in pullIndex, the interactions with pullChanged and pushChanged, the fast-path sidecar, signal propagation, error handling, and concurrent-peer scenarios. The client return-type changes (listObjects/listAllObjects from string[] to structured entries) have no other callers in production code — no breakage.

Critical / High

None found.

Medium

  1. listAllObjects in discovery not wrapped in retryWithBackoffs3_cache_sync.ts:765 and gcs_cache_sync.ts:732. The PutObject that publishes the discovered index IS retried (line 799 / 760), but the listing call that precedes it is not. A transient 5xx or timeout during listing would fail the entire sync without retry, while the adjacent PutObject would have been retried up to 3 times. This is inconsistent — if the retry budget exists because these operations are flaky, the listing is equally flaky. Breaking example: S3 returns a 503 ServiceUnavailable on the ListObjectsV2 call during discovery; the sync fails immediately instead of retrying. Mitigation: wrap the listAllObjects call in retryWithBackoff, same as the putObject call two lines below. Not blocking because: (a) the discovery path is only reached on a 404 (uncommon), and (b) the user retries the sync at the CLI level.

  2. S3ListResult and GcsListResult interfaces changed from { keys: string[] } to { entries: S3ListEntry[]/GcsListEntry[] }s3_client.ts:71 and gcs_client.ts:72. These are exported interfaces. While no current callers broke (I verified), any downstream consumers of these clients who reference result.keys would get a compile-time error. The listAllObjects return type also changed from string[] to S3ListEntry[]/GcsListEntry[]. This is a breaking API contract change. Not blocking because: the _lib/ location signals internal use, there are no external callers, and this repo pins exact versions.

Low

  1. Theoretical: discovery races with concurrent index creation — Between the initial getObject 404 and the listAllObjects call, another peer could publish a valid index. Discovery would still list, build, and unconditionally overwrite with a functionally-equivalent index. This is documented as intentional (note i in both implementations), and the overwrite is benign (entry keys + sizes match, timestamps differ). No action needed.

  2. GCS Number(obj.size) || 0 treats a genuine size-zero object as 0gcs_client.ts:1017. Number("0") || 0 evaluates to 0, which is correct, but arrives via the || 0 fallback rather than the Number() conversion. If a future change replaced || with ?? (to distinguish 0 from NaN), the semantics would silently change. This is a standard JS idiom and not worth fixing, but worth noting that Number(obj.size) ?? 0 would be more precise.

Verdict

PASS. The code is well-structured with careful documentation of invariants and failure-state ordering. Discovery is functionally idempotent across racing peers, the internal-file filter is correctly applied before index construction (not just after), empty-bucket semantics are preserved, both pullChanged and pushChanged callers inherit discovery transparently, and the test coverage is thorough (5 new unit tests per extension + 1 integration test). The medium findings are minor consistency and API hygiene issues, neither of which risks data loss or incorrect sync behavior.

@stack72 stack72 merged commit ae35c6f into main May 4, 2026
30 checks passed
@stack72 stack72 deleted the issue-225-discovery branch May 4, 2026 20:47
stack72 added a commit that referenced this pull request May 5, 2026
…rom cache sync (#123)

Extend `isInternalCacheFile` to exclude any path with basename `.lock` (in
addition to the existing top-level `.datastore.lock`). Without this, the
self-healing `discoverIndexFromBucket` path added in #122 captures
transient per-target lock files (`data/<kind>/<type>/<id>/.lock`, written
by the data tier's lock subsystem via S3 PUT/DELETE) into the synthesized
index. The lock subsystem then deletes them on release, leaving the
remote index referencing missing objects. A fresh reader running
`datastore setup` 404s during hydration, fails to persist the
`datastore:` block to `.swamp.yaml`, and the next `datastore sync --pull`
falls back to the filesystem datastore and reports
`Current datastore type: filesystem`.

Surfaced by the swamp-uat sync_test.ts cross-repo pull tests
(`sync pull retrieves data written by another repo …`) on
@swamp/s3-datastore@2026.05.04.4. Reproduced end-to-end in a Linux
container with a mock S3 server: writer runs a workflow (acquires the
per-target lock, pushes data, releases the lock), reader's setup hits
the 404 on the stale `.lock` index entry, `.swamp.yaml` never gets the
datastore block, sync --pull surfaces the filesystem-fallback error.
With the fix the reader hydrates 19 files cleanly and the pull
succeeds.

Mirrored in @swamp/gcs-datastore — same `discoverIndexFromBucket` path,
same latent bug.

Test plan:
- 1 new unit test per extension pinning the `.lock` exclusion at any
  depth, plus negative cases (`data/@m/.locked.yaml`, `data/@m/lock`).
- All existing s3 (45) + gcs (43) cache_sync tests still pass.
- Linux+mock-S3 end-to-end repro (writer workflow run → reader setup →
  reader sync --pull) goes from `errors: ["…/.lock - 404 NoSuchKey"]`
  + filesystem-fallback to `filesPulled: 19` + a successful pull
  returning the writer's 4 data records.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant