Skip to content

feat(datastore): thread relPath through markDirty for per-path dirty tracking (swamp-club#232)#1300

Merged
stack72 merged 1 commit intomainfrom
feat/markdirty-relpath-issue-232
May 4, 2026
Merged

feat(datastore): thread relPath through markDirty for per-path dirty tracking (swamp-club#232)#1300
stack72 merged 1 commit intomainfrom
feat/markdirty-relpath-issue-232

Conversation

@keeb
Copy link
Copy Markdown
Contributor

@keeb keeb commented May 4, 2026

Summary

Adds an optional relPath field to DatastoreSyncOptions so swamp core can
attribute each markDirty call to a specific cache-relative path. Strictly
additive: existing extensions, call sites, and bulk-invalidate semantics are
unchanged. Per-key-queryable backends (Mongo/Redis/etcd-style) can now record
exactly which paths changed and skip the full-cache walk on pushChanged.

The eight-rule contract is on the markDirty JSDoc and in
design/datastores.md "markDirty() contract": pre-write timing,
absence-on-disk = delete, undefined = bulk, restart-loses-set, cache-relative

  • forward-slash with native-separator conversion on Windows, backward
    compatibility, field scope, bulk-overrides-per-path-within-one-operation.

Per-call granularity emitted by core

Method relPath
save, append, allocateVersion data-name directory
removeLatestMarker data-name directory
delete(version) version directory
delete() data-name directory
finalizeVersion version directory
Yaml save, delete per-yaml file path
rename, collectGarbage, bulk yaml deletes undefined (bulk)

The internal MarkDirtyHook is positional (relPath?: string) because
repositories don't have an AbortSignal in scope; the wiring layer in
repo_context.ts (buildMarkDirtyHook helper) packs into the public options
bag and owns the absolute → cache-relative + forward-slash normalization.

Test plan

  • deno fmt --check clean (1311 files)
  • deno lint clean (1185 files)
  • deno check clean
  • deno run test — 5328 passed, 0 failed
  • deno run compile — binary built
  • Per-repository contract tests assert relPath per granularity (incl. new yaml_evaluated_definition_repository_test.ts).
  • repo_context_test.ts end-to-end wiring assertion with Windows-aware forward-slash check.
  • testing_package_compat_test.ts exercises both markDirty() and markDirty({ relPath }).
  • rename ordering test pins rule 8: bulk signal emitted before the inner save()'s per-path signal.
  • End-to-end verified against an out-of-tree Mongo-backed datastore extension — relPath flows correctly, per-path pushChanged works with subtree expansion, tombstone safety gate holds, pull steady-state ~0.5s, method run + push 1.55s for 13 files.

Closes swamp-club#232

…tracking (swamp-club#232)

Adds an optional `relPath` field to `DatastoreSyncOptions` so swamp core can
attribute each `markDirty` call to a specific cache-relative path. Strictly
additive: existing extensions, call sites, and bulk-invalidate semantics are
unchanged. Per-key-queryable backends (Mongo/Redis/etcd-style) can now record
exactly which paths changed and skip the full-cache walk on `pushChanged`.

The contract is documented in eight load-bearing rules on the JSDoc and in
`design/datastores.md` "markDirty() contract" section: pre-write timing,
absence-on-disk = delete, undefined = bulk, restart-loses-set, cache-relative
+ forward-slash with native-separator conversion on Windows, backward
compatibility, field scope, and bulk-overrides-per-path-within-one-operation.

Internal `MarkDirtyHook` becomes positional `(relPath?: string)` because
repositories don't have an `AbortSignal` in scope; the wiring layer in
`repo_context.ts` packs it into the public-interface options bag and owns
the absolute-to-cache-relative + forward-slash normalization.

Per-call granularity emitted by core:
- save / append / allocateVersion: data-name directory (version not yet allocated)
- removeLatestMarker: data-name directory
- delete (with version): version directory
- delete (no version): data-name directory (whole subtree)
- finalizeVersion: version directory
- yaml save / delete: per-yaml file path
- rename / collectGarbage / deleteAllByWorkflowId / clearAll: undefined (bulk)

End-to-end verified against an out-of-tree Mongo-backed datastore extension:
relPath flows correctly, per-path pushChanged works with subtree expansion,
tombstone safety gate holds.

Closes swamp-club#232
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-structured PR that threads relPath through the markDirty contract for per-path dirty tracking. Backward compatible, thoroughly tested, cross-platform safe.

Blocking Issues

None.

Suggestions

  1. notifyDirty JSDoc vs parameter name (unified_data_repository.ts:460): The JSDoc says "relPath is the absolute path of the file or directory" but the parameter name is relPath, which could mislead readers into thinking it's already cache-relative. The wiring layer comment explains the conversion, but renaming the parameter to absPath (matching buildMarkDirtyHook's parameter name) would eliminate ambiguity. Minor — the MarkDirtyHook type signature makes the intent clear.

What looks good

  • DDD alignment: buildMarkDirtyHook lives in the composition root (repo_context.ts) where wiring belongs. Repositories don't know about cache roots — proper separation of concerns.
  • Contract design: The 8-rule contract on markDirty is thorough and well-placed — in the JSDoc, design doc, and extension docs simultaneously.
  • Backward compatibility: relPath is optional throughout; existing implementations are unaffected.
  • Test coverage: New yaml_evaluated_definition_repository_test.ts, updated tests for all four repository types, end-to-end wiring test in repo_context_test.ts, and type-compat test updated.
  • Cross-platform: Forward-slash normalization via SEPARATOR check in buildMarkDirtyHook, with the wiring test exercising paths containing separators to catch Windows regressions.
  • yaml_output_repository.delete improvement: Moving notifyDirty inside the loop so it only fires when a match is found (and passing the resolved path) is a nice refinement — no spurious dirty signals on no-op deletes, while maintaining the pre-write timing contract.
  • Security: relative() output is metadata/hint only; actual file I/O continues through assertSafePath.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

No critical or high severity findings.

Medium

  1. removeLatestMarker has no per-path relPath testsrc/infrastructure/persistence/unified_data_repository_test.ts: The test exercises save, append, allocateVersion, finalizeVersion, rename, delete (both forms), and collectGarbage, but removeLatestMarker is missing. It passes dataNameDir to notifyDirty, which is structurally identical to the other per-path cases, but the lack of test coverage means a future refactor could regress the path argument without detection. Not blocking because the implementation is trivially correct, but worth adding.

Low

  1. buildMarkDirtyHook does not validate that absPath is under cacheRootsrc/cli/repo_context.ts:146: If a caller passes an absolute path outside the cache root, relative(cacheRoot, absPath) produces a ../-prefixed string. In practice this cannot happen because all repository baseDir values are resolved under cacheRoot via dsPath() in the repository factory, and all getPath/getDataNameDir methods join on baseDir. The invariant is sound but implicit — if a future change wires a repo whose baseDir is not under cacheRoot, the relPath would contain .. components and confuse extensions. Documenting this precondition in a one-line assertion or comment would be cheap insurance.

  2. Inline type { relPath?: string } in test mock vs DatastoreSyncOptionspackages/testing/datastore_test_context.ts:215: The markDirty mock parameter is typed as { relPath?: string } rather than DatastoreSyncOptions. Structurally compatible in TypeScript, but if DatastoreSyncOptions gains a new field that extensions rely on, this mock won't exercise it. Cosmetic — not blocking.

  3. Example pushChanged does a full walk when dirty.size === 0 && !bulkInvalidated.claude/skills/swamp-extension-datastore/references/examples.md:128-132: This is correct for the "first push after process restart" case (rule 4), but means every idle push does a full walk even when nothing changed. The comment mentions rule 4 but the code could be clearer — e.g. a separate firstPush flag. This is example/documentation code, not production, so purely advisory.

Verdict

PASS — Strictly additive, well-documented contract, thorough test coverage on the main code paths, backward-compatible interface evolution. The change is sound and ready to merge.

@stack72 stack72 merged commit da05608 into main May 4, 2026
11 checks passed
@stack72 stack72 deleted the feat/markdirty-relpath-issue-232 branch May 4, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants