Skip to content

refactor(persistence): extract LockfileRepository (W2 prequel for swamp-club#231)#1298

Merged
stack72 merged 1 commit intomainfrom
feat/w2-prequel-lockfile-repository
May 4, 2026
Merged

refactor(persistence): extract LockfileRepository (W2 prequel for swamp-club#231)#1298
stack72 merged 1 commit intomainfrom
feat/w2-prequel-lockfile-repository

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 4, 2026

Summary

Pure persistence-layer refactor. Consolidates the scattered upstream_extensions.json read/write surface into a single LockfileRepository. No user-visible CLI behavior changes; lockfile JSON shape is byte-identical pre/post.

W2 prequel for swamp-club#231 — the lifecycle services planned in W2 (Install/Remove/Upgrade) need to write the lockfile as part of their unit-of-work, but W1b only exposed read access via the getLockedVersion closure. Filed and lifecycled separately (swamp-club#233) per W2's Phase A audit; pre-committed threshold rule (>7 callsites → SPLIT) chose this carve-out.

Net: ~160 LOC for the new repository, 16 callsites migrated, 2 duplicate acquireLock helpers consolidated.

Architectural decision: asymmetric read/write semantics

This is the load-bearing design choice and it's documented in the JSDoc — LockfileRepository deliberately splits read and write semantics:

  • Reads (getEntry, getAllEntries, getLockedVersion) serve from a cache populated once at construction. The cache does NOT auto-refresh. This preserves W1b's "snapshot frozen at construction" race-window contract documented in extension_repository.ts:88-95 — relocated one layer out, contract identical.
  • Writes (writeEntry, removeEntry) acquire the advisory file lock, re-read CURRENT disk state, merge, atomic-write, release, and update the local cache. Two concurrent writers don't clobber each other.

A regression test (LockfileRepository: cross-instance snapshot regression) constructs repoA, mutates the lockfile via a sibling LockfileRepository, asserts repoA still serves the OLD value, then constructs repoB and asserts it sees the NEW value. Future contributors who try to "fix" the cached read to be live will hit this test.

What changed

  • New: src/infrastructure/persistence/lockfile_repository.ts + 12 unit tests.
  • ExtensionRepository constructor takes LockfileRepository (replaces W1b's getLockedVersion closure). All 5 construction sites + the stub helper migrate in the same commit.
  • 9 installExtension callsites and 16 lockfile reader files migrated.
  • 3 DI surfaces (doctor.ts, list.ts, update.ts) take LockfileRepository directly instead of function-shape DIs.
  • emptyLockedVersionLookup deleted (zero callers verified by grep).
  • createInstallContext / createExtensionPullDeps / createExtensionRmDeps / createExtensionUpdateDeps are now async (they construct the snapshot-capturing repository); JSDoc pins the single-use rule (don't reuse a context across multiple operations).

Test plan

  • deno check clean
  • deno lint clean
  • deno fmt clean
  • deno run test5397 passed, 0 failed
  • deno run compile — binary built
  • Author smoke against a real repo (/tmp/swamp-smoke-233):
    • swamp repo init
    • swamp extension pull @stack72/system-extensions ✓ (lockfile written with all expected fields)
    • swamp extension list (log mode) ✓
    • swamp extension list --json ✓ (JSON shape unchanged)
    • swamp doctor extensions ✓ — all 5 registries pass (validates DI surface change)
    • swamp model type search system ✓ — types resolve via ExtensionRepository lockfile-fallback path through new injection
    • swamp extension rm --force ✓ — lockfile becomes {}, files deleted, dirs pruned
    • Sequential extension pull --force ✓ — second pull's writeEntry re-reads disk under lock
    • swamp extension update --check
  • Cross-process concurrency stress: 50 concurrent Deno subprocesses each writing a distinct entry to the same lockfile, repeated 3 times. Result: 150 operations, 0 failures, 0 leftover .lock files, 0 structural corruption (574–766ms per 50-worker run).
  • Reviewer smoke on a different repo
  • ~2-day diversity-matrix soak (Linux + macOS, mixed local/pulled, version transitions). REQUIRED before merge per the prequel-specific gate established in swamp-club#231 Phase A. Don't skip — the architecture-agent's note: "don't skip the soak just because it's obviously trivial."

Forward-only revert posture

No schema changes; no migration. Lockfile JSON shape unchanged → binary downgrade reads the same file back. Reverting this PR is purely a code revert.

Out of scope (deferred by design)

  • W2 lifecycle services (swamp-club#231) — blocked on this landing + soaking.
  • Cross-platform lockfile concurrency improvements — W3 territory; this prequel does not change the existing acquireLock race window.
  • ADV-13 test-seam audit (SQLite-commit-vs-lockfile-write fault injection) — stays in W2's plan v4 step 10; this prequel doesn't introduce that boundary.

🤖 Generated with Claude Code

…mp-club#231) (swamp-club#233)

## Summary

Pure persistence-layer refactor that consolidates the scattered
upstream_extensions.json read/write surface into a single
`LockfileRepository`. No user-visible CLI behavior changes; lockfile
JSON shape is byte-identical pre/post.

Filed and lifecycled separately as a refactor-prequel for W2
(swamp-club#231) per its Phase A audit. The pre-committed mechanical
threshold rule (>7 callsites → SPLIT) chose this carve-out: ~160 LOC
new + 16 callsites touched.

## What ships

- `src/infrastructure/persistence/lockfile_repository.ts` — sole
  gateway for upstream_extensions.json. ASYMMETRIC semantics:
  - Reads serve from a snapshot captured at construction time
    (preserves W1b's "snapshot frozen at construction" race-window
    contract documented in `extension_repository.ts:88-95`).
  - Writes acquire the advisory file lock, re-read disk, merge, atomic-
    write, release, and update the local cache. Two concurrent writers
    don't clobber each other.
- 12 unit tests including a cross-instance snapshot-divergence
  regression and concurrent-writer contention.
- `ExtensionRepository` constructor migrated to take
  `LockfileRepository` (replaces the W1b `getLockedVersion` closure).
  All 5 construction sites + the stub helper move in the same commit.
- 9 `installExtension` callsites and 16 lockfile reader files migrated
  to the repository surface.
- Duplicate `acquireLock` and `removeUpstreamExtension` helpers in
  pull.ts and rm.ts deleted (consolidated into the repository).
- Three DI surfaces (`doctor.ts`, `list.ts`, `update.ts`) take
  `LockfileRepository` directly instead of function-shape DIs.
- `emptyLockedVersionLookup` deleted (zero callers verified).

## Why split from W2

W2 lifecycle services (Install/Remove/Upgrade) need to write the
lockfile as part of their unit-of-work; W1b only exposed read access.
Bundling this refactor with W2's ~1300 LOC of new service code would
have pushed the W2 PR over the auto-ship-on-merge soak threshold.
Splitting matches the W1a → W1b precedent (each ships and soaks before
the next builds on it).

## Test plan

- [x] deno check, deno lint, deno fmt, deno run test (5397 passed, 0 failed)
- [x] deno run compile (binary built)
- [x] Author smoke against a real repo: pull + rm round-trip, list
      (log + JSON), doctor extensions, model type search, sequential
      pulls (writeEntry re-read-under-lock), extension update --check.
      All green.
- [x] Cross-process concurrency: 50 concurrent Deno subprocesses
      writing distinct entries to the same lockfile, repeated 3 times.
      All 150 operations completed; no failures, no leftover .lock
      files, no structural corruption (~600-800ms per 50-worker run).
- [ ] Reviewer smoke on a different repo
- [ ] ~2-day diversity-matrix soak (Linux + macOS, mixed local/pulled,
      version transitions). REQUIRED before merge per the prequel-
      specific gate from swamp-club#231 Phase A.

## Forward-only revert posture

No schema changes; no migration. Lockfile JSON shape is unchanged so a
binary downgrade reads the same file back. Reverting is purely a code
revert.

## Out of scope

- W2 lifecycle services (swamp-club#231) — depends on this landing.
- Cross-platform lockfile concurrency improvements — W3 territory.
- ADV-13 test-seam audit (SQLite-commit-vs-lockfile-write fault
  injection) — stays in W2's plan v4 step 10; this prequel doesn't
  introduce that boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

None.

Verdict

PASS — This is a pure persistence-layer refactor. No user-visible changes: no flags, no help text, no output formats, no error messages (beyond the minor improvement adding "Please retry." to the lock-contention error). Lockfile JSON shape is unchanged. The audit of command files (extension_pull.ts, extension_rm.ts, extension_update.ts, extension_search.ts, doctor_extensions.ts, open.ts) confirms all user-facing surfaces are identical before and after.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean persistence-layer refactor that consolidates scattered lockfile read/write into a single LockfileRepository. The migration is thorough — 16 callsites migrated, duplicate acquireLock helpers consolidated, tests comprehensive (12 new unit tests including cross-instance snapshot regression and concurrent writer tests).

DDD assessment

LockfileRepository is well-placed in infrastructure/persistence/ as a persistence gateway. It encapsulates the advisory file lock, atomic writes, and snapshot-caching semantics behind a clean interface. The asymmetric read/write design (cached reads, lock-re-read-merge writes) is clearly documented and protected by a regression test. The constructor/static-factory split (public constructor for test seams, create() for production) is a good pattern.

No blocking issues

The code adheres to CLAUDE.md conventions: license headers present, named exports, no any types, test files co-located, libswamp import boundary respected (CLI files importing LockfileRepository directly from infrastructure/persistence/ follows the established pattern for ExtensionRepository and ExtensionCatalogStore). All async signature changes (createExtensionPullDeps, createExtensionRmDeps, createExtensionUpdateDeps, createInstallContext, createExtensionListDeps) have their callers properly updated.

Suggestions

  1. acquireLock error type: The old acquireLock in pull.ts threw UserError (clean CLI message), but the consolidated version in LockfileRepository throws plain Error. For the pull/rm paths where this surfaces to users, a UserError would give a cleaner experience. Minor since the lock timeout is already rare in practice.

  2. removeEntry missing mkdir: writeEntry calls Deno.mkdir(dirname(...)) before acquireLock, but removeEntry does not. If removeEntry is called when the parent directory doesn't exist, acquireLock would fail trying to create the .lock file. This is safe in practice (you can't remove an extension without a lockfile), but adding the same mkdir guard would make it symmetric.

  3. shorthandUpstream duplication: update_test.ts defines shorthandUpstream() which duplicates fixedLockedVersions() from stub_extension_repository.ts. Could be consolidated into a shared test helper.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Thorough code-path trace of all 39 changed files. This is a well-executed mechanical refactoring that correctly preserves the existing concurrency and snapshot semantics. The asymmetric read/write design is sound, the lock-re-read-merge-write pattern prevents clobbering, and the cache-update-after-write keeps the writer's own reads consistent.

Critical / High

None found.

Medium

  1. Error type regression in lock acquisition failuresrc/infrastructure/persistence/lockfile_repository.ts:192: The consolidated acquireLock throws plain Error, but the old acquireLock in pull.ts threw UserError. The top-level error handler in error_output.ts:64-68 renders UserError as a clean message ("Error: {message}") and Error as the full object with stack trace. This means a lock-exhaustion failure during extension pull now produces a noisy stack trace instead of a clean error message.

    Breaking example: Two concurrent swamp extension pull operations contend for the lock, the loser retries 10 times and fails — user sees a stack trace instead of "Error: Could not acquire lock on upstream_extensions.json. Another pull may be in progress. Please retry.".

    Suggested fix: Import UserError and throw it instead of plain Error, matching the old pull.ts behavior. The old rm.ts also threw plain Error, so this was an inconsistency even before — but the UserError path is the better UX.

Low

  1. removeEntry lacks Deno.mkdir for parent directorylockfile_repository.ts:162: writeEntry (line 130) calls Deno.mkdir(dirname(this.lockfilePath), { recursive: true }) before acquiring the lock, but removeEntry does not. If removeEntry is ever called when the lockfile's parent directory doesn't exist, acquireLock will throw a NotFound error from Deno.open (not AlreadyExists), which gets rethrown as an unhandled error. In practice this is unreachable — callers confirm the entry exists before calling removeEntry, which implies the lockfile was readable — but the asymmetry with writeEntry is notable.

  2. getAllEntries shallow copy doesn't protect nested entry mutationlockfile_repository.ts:106: The method returns { ...this.cache } which copies top-level keys but shares inner UpstreamExtensionEntry object references. A caller doing entries["@foo/bar"].files!.push("injected") would mutate the internal cache. I verified all current callers are read-only, so this is theoretical. The regression test only verifies top-level key deletion, not nested mutation.

Verdict

PASS — No critical or high-severity issues. The one medium finding (error type regression) is on an uncommon failure path and the error message content is still descriptive; it's a cosmetic presentation regression, not a logic error. The refactoring is mechanically correct, all 16 callsites are properly migrated, the snapshot-frozen-at-construction contract is preserved, and the concurrent-writer protection (lock → re-read → merge → atomic-write) is faithfully ported. The cross-instance snapshot regression test is a good guardrail.

@stack72 stack72 merged commit 430c153 into main May 4, 2026
11 checks passed
@stack72 stack72 deleted the feat/w2-prequel-lockfile-repository branch May 4, 2026 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant