test: add property-based tests for import normalization and chunking invariants by edheltzel · Pull Request #57 · edheltzel/Recall

edheltzel · 2026-06-11T00:31:52Z

Part of #44 — completes the unblocked import + chunking property groups; the issue stays open to track the export (#43) and dedup (#45) groups.

Summary

Adds fast-check (devDependency, Bun-compatible, runs under bun:test) and two focused property-based test suites covering the groups of #44 that are currently unblocked. Property oracles are implemented independently on generator inputs (spec restated, not implementation mirrored), and a manual mutation check (i += size → i += size + 1 in chunked) confirmed the chunking properties fail loudly on an exactly-once/order bug.

Chunking properties — `tests/lib/chunk.property.test.ts`

Flatten round-trip over arbitrary arrays (duplicates included) and sizes: order preservation + exactly-once processing in one invariant
Chunk-count formula, non-empty chunks, all-but-last exactly size
Boundary lengths 0 / 1 / 499 / 500 / 501 / 999 / 1000 / 1001 plus larger N (bounded at 2,500) under the default SQLITE_SAFE_CHUNK_SIZE
Empty input → zero chunks; RangeError on every non-positive/non-integer size (incl. NaN/±Infinity); input never mutated

Import normalization properties — `tests/lib/conversation-import.property.test.ts`

Shared normalizeSession contract across all three source adapters (table-driven for claude-ai/chatgpt; slack separate for its single-session/path-derived semantics): sanitized + prefixed session IDs, trimmed non-empty titles, project derivation incl. empty-string fallback, startedAt/endedAt from sorted message timestamps, every message stamped with session ID/project/valid role/non-empty content
Messages survive normalization exactly as (role, timestamp, content) tuples (multiset equality — nothing dropped, invented, restamped, or role-mis-mapped; pins claude-ai human→user)
Whitespace-only and empty conversations yield zero sessions in every adapter
Unrecognized input (arbitrary JSON + explicit empties): no detection, zero sessions from every adapter, repeatably
Same input → deep-equal normalized output (generators always include timestamps; messages without one legitimately get a new Date() fallback in the parsers, which would make determinism assertions meaningless)

Deferred scope (per #44 dependencies)

Export properties — deferred until Add mem export with JSON, Markdown, SQL dump, and SQLite backup formats #43 lands
Dedup properties — deferred until Add non-destructive dedup with provenance-aware survivor selection #45 lands

Issue #44 should stay open-tracked for those two groups; this PR completes only the currently unblocked import + chunking groups (chunking helper from #41/PR #55, adapter registry from PR #56).

Testing

bun run lint clean (tsc --noEmit)
Full suite: 491 pass / 0 fail across 45 files (~13s); the two property files add 13 tests / ~21k assertions in ~80ms
Generator sizes bounded (≤3 conversations × ≤8 messages; arrays ≤2,500) per the issue's practicality requirement

Post-Deploy Monitoring & Validation

No additional operational monitoring required — test-only change plus a devDependency; no runtime code touched. CI green on both platforms is the validation signal.

…invariants Adds fast-check (devDependency) and two focused property suites per issue #44: - tests/lib/chunk.property.test.ts — chunked() order preservation, exactly-once processing via flatten round-trip, chunk-count formula, boundary lengths 0/1/499/500/501/999/1000/1001 plus larger N (bounded at 2,500), RangeError on invalid sizes, input non-mutation. - tests/lib/conversation-import.property.test.ts — shared normalizeSession contract across all three source adapters (required fields preserved or derived, sanitized session IDs, sorted timestamps, message stamping), whitespace-only and empty conversations rejected, unrecognized input rejected deterministically by every adapter, and parse determinism for fully-timestamped payloads. Export (#43) and dedup (#45) property groups are deferred until those issues land, per the dependency note on #44.

edheltzel · 2026-06-11T00:47:25Z

Review — PR #57 (property-based tests for import normalization + chunking)

Verified independently before this review:

CI green on head c9356e8 (macOS primary + Ubuntu compatibility smoke, both pass).
Full suite reproduced locally at PR head: 491 pass / 0 fail (~13s), matching the PR claim.
Mutation spot-checks (3): the PR-claimed mutation (i += size → i += size + 1 in src/lib/chunk.ts) reproduces exactly — 3 of 6 chunk properties fail, within 1–2 generated cases. Two additional independent mutations: overlapping chunks (slice(i, i + size + 1)) — killed (3 failures); removing the timestamp sort in normalizeSession — killed by all three adapter properties. The oracles are genuinely independent (flatten round-trip, Math.ceil count formula, spec restated), not the implementation mirrored back at itself.
fast-check ^4.8.0 devDependency pin matches the existing caret style; lockfile adds only fast-check@4.8.0 + pure-rand@8.4.0. Sane.
Export (Add mem export with JSON, Markdown, SQL dump, and SQLite backup formats #43) / dedup (Add non-destructive dedup with provenance-aware survivor selection #45) deferral is correct per Add focused property-based tests for import, chunking, export, and dedup invariants #44's instructions — both issues exist and are open.
The conversation-import file's "virtual paths never touch the filesystem" claim verified against parseSlackConversations (statSync only runs when rootPath is passed; tests never pass it).

This is strong work — disciplined scope, bounded generators, the shared expectSessionInvariants contract enforced across all three adapters, and the whole suite costs ~80ms. Three blocking items below, two of which are small strengthenings of an existing assertion; the third is a one-word PR-body fix with process consequences.

Blocking

1. Closes #44 contradicts the PR's own deferred scope — change to Part of #44 / Refs #44.
The PR body says "Issue #44 should stay open-tracked for those two groups," and #44 itself instructs: "complete only the currently unblocked property groups and leave the remaining groups explicitly pending in this issue." The Closes #44 keyword will auto-close the issue on merge, silently dropping the tracking for the export/dedup groups. Edit the PR body before merge.

2. Timestamp values are never pinned — a parser that stamps every message with fallbackNow passes the entire suite.
The generators produce known atSeconds per message but no assertion ever uses them as an oracle. In expectSessionInvariants, the per-message timestamp check is format-regex only (conversation-import.property.test.ts:136), and startedAt/endedAt (:128-129) are compared against session.messages[*].timestamp — self-referential, so they hold even when every timestamp is wrong. The determinism property would catch a fallbackNow regression only if the clock ticks between two back-to-back parse calls — a flaky catch at best. Issue #44's first normalization criterion is "required fields are preserved/derived correctly"; timestamps are currently derived-unverified.
Fix: in the multiset comparison (:180-181), assert the multiset of session.messages.map(m => m.timestamp) equals the multiset of new Date(m.atSeconds * 1000).toISOString() over surviving generated messages (claude-ai/chatgpt cases; slack analogously).

3. claude-ai role mapping (human → user) is asserted nowhere in the repo.
The property only checks set membership (:134), and the example suite pins roles for chatgpt (tests/lib/conversation-import.test.ts:100) and slack (:114) but not claude-ai — verified by reading the claude-ai test block, which asserts counts/extraction writes only. A regression in roleFrom mapping human → assistant passes all 491 tests.
Fix: compare multisets of (role, content) pairs (or (role, content, timestamp) triples, folding in item 2) instead of bare contents at :180-181 — pins mapping and pairing for both multi-conversation adapters in one change.

Non-blocking suggestions

ChatGPT no-current_node fallback is uncovered everywhere: the generator always sets current_node when messages exist (:96), so the create_time-sort fallback, cycle guard, and is_visually_hidden_from_conversation skip in src/lib/conversation-import.ts have zero behavioral coverage. Sometimes omitting current_node in the generator covers it nearly for free.
Timestamp ties are effectively never generated (secondsArb spans ~3B seconds with ≤8 messages), so stable-sort tie behavior is unpinned despite being a classic edge. A tie-prone fc.oneof variant fixes it.
Detector near-misses: fc.jsonValue() essentially never produces mapping/chat_messages+sender/ts+text, so the unrecognized-input property only sees deep garbage. A dedicated near-miss arbitrary ({mapping: 'str'}, slack message missing ts, …) would probe where detect/parse disagreement bugs actually live.
Slack content survival: the slack property checks count and role-by-botness but never that generated text survives into content; toContain(msg.text) pins survival without coupling to the [sender] prefix format.
Unicode: fast-check v4 fc.string() defaults to ASCII graphemes, so sanitizeId's non-ASCII replacement and the 180-char slice are unexercised. Mixing in fc.string({ unit: 'grapheme' }) for ids/titles would cover it.
The 8 boundary lengths in lengthArb are each hit only probabilistically per run; passing them via fc.assert(..., { examples: [...] }) would make the property file deterministically self-sufficient (the example matrix in chunk.test.ts does already cover them suite-wide).

Verdict

REQUEST CHANGES — the invariant architecture is sound and mutation-verified, but items 2–3 leave two of #44's "required fields preserved/derived correctly" criteria (timestamps, claude-ai roles) silently corruptible across the whole repo suite, and item 1 would auto-close a tracking issue that must stay open. All three are small, contained fixes; happy to re-review immediately after.

…ontent) tuples Addresses PR #57 review blocking items 2 and 3: the content-only multiset comparison was blind to a parser restamping every message with the fallback timestamp, and claude-ai's human→user role mapping was asserted nowhere in the repo. The shared expectMessageTuples helper now compares multisets of JSON-encoded (role, timestamp, content) tuples for all three adapters. Mutation-verified: fallbackNow restamping fails 4 properties; human→assistant mis-mapping fails the claude-ai property.

edheltzel · 2026-06-11T00:52:00Z

Review blocking items addressed in 4a29472:

PR body: Closes #44 → Part of #44 — the issue stays open to track the export (Add mem export with JSON, Markdown, SQL dump, and SQLite backup formats #43) and dedup (Add non-destructive dedup with provenance-aware survivor selection #45) groups.
Timestamps pinned: the multiset comparison now uses JSON-encoded (role, timestamp, content) tuples against the generated atSeconds oracle for all three adapters (slack included, with sub-ms ts suffix truncation noted). Mutation check: forcing timestampFrom to always return fallback fails 4 of 7 properties.
claude-ai role mapping pinned: same tuple change; generator roles round-trip (user is sent as the human sender). Mutation check: mapping human → assistant in roleFrom fails exactly the claude-ai property.

Gates re-verified at head: bun run lint clean, 491 pass / 0 fail (~13s). The six non-blocking suggestions are deferred as noted.

edheltzel · 2026-06-11T00:58:46Z

Re-review — delta at `4a29472`

All three blocking items from my previous review are resolved. Verified independently:

Delta scope confirmed: one commit touching only tests/lib/conversation-import.property.test.ts (+26/−6). Chunk properties, package.json, and bun.lock untouched. PR body now reads "Part of #44" with no auto-close keyword anywhere — issue #44 stays open for the export (#43) / dedup (#45) groups.

The tuple oracle genuinely closes the gap — it is not the old assertion renamed. The expected side of expectMessageTuples is built solely from generator inputs — m.role, isoFromSeconds(m.atSeconds), m.text — with no reference to parser output, replacing the self-referential startedAt/endedAt situation. The claude-ai case round-trips user through the human sender wire format, so the roleFrom mapping is pinned, and the slack case correctly derives whole-second ISO strings (sub-millisecond ts suffix truncation, accurately commented). Bonus: full tuple equality on slack also pins exact content survival ([sender] text), subsuming one of my non-blocking suggestions.

Mutation spot-checks (re-run myself at 4a29472):

roleFrom: human → assistant — fails exactly the claude-ai property (1/7), matching the worker's claim. This mutation passed all 491 tests before this commit.
timestampFrom → always return fallback — fails 3/7 deterministically (claude-ai, chatgpt, slack: the three properties carrying the new oracle). The worker reported 4/7; the fourth (determinism) only trips when the clock ticks between back-to-back parse calls, so it's run-dependent — the substantive claim (fallback restamping is now caught loudly) is confirmed.

Gates: CI green on 4a29472 (macOS primary + Ubuntu smoke). Full suite locally: one transient failure on a single loaded run (~2× normal duration), not reproducible across 4 subsequent clean 491/0 runs; the two property files ran 15/15 clean in isolation — pre-existing environment flake, not attributable to this PR.

Deferred items: the remaining non-blocking suggestions (chatgpt no-current_node fallback coverage, timestamp ties, detector near-misses, unicode generators, deterministic boundary examples) stay deferred as agreed; slack content survival is now covered by the tuple equality.

APPROVE

Phase 2 of issue #44 — the export (#43) and dedup (#45) property groups deferred from PR #57. Oracles are computed from generator inputs only (tuple-multiset pattern); duplicate groups are built structurally so expected grouping, survivors, and similarities never read the implementation under test.

edheltzel mentioned this pull request Jun 11, 2026

Add focused property-based tests for import, chunking, export, and dedup invariants #44

Closed

edheltzel merged commit 074579a into main Jun 11, 2026
2 checks passed

edheltzel mentioned this pull request Jun 11, 2026

test: add property-based tests for export and dedup invariants (issue #44 phase 2) #61

Merged

edheltzel deleted the test/property-tests-import-chunking branch June 11, 2026 20:06

edheltzel mentioned this pull request Jun 11, 2026

tests: extraction-semaphore SIGKILL crash-recovery test is flaky on CI #68

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: add property-based tests for import normalization and chunking invariants#57

test: add property-based tests for import normalization and chunking invariants#57
edheltzel merged 2 commits into
mainfrom
test/property-tests-import-chunking

edheltzel commented Jun 11, 2026 •

edited

Loading

Uh oh!

edheltzel commented Jun 11, 2026

Uh oh!

edheltzel commented Jun 11, 2026

Uh oh!

edheltzel commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

edheltzel commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Chunking properties — tests/lib/chunk.property.test.ts

Import normalization properties — tests/lib/conversation-import.property.test.ts

Deferred scope (per #44 dependencies)

Testing

Post-Deploy Monitoring & Validation

Uh oh!

edheltzel commented Jun 11, 2026

Review — PR #57 (property-based tests for import normalization + chunking)

Blocking

Non-blocking suggestions

Verdict

Uh oh!

edheltzel commented Jun 11, 2026

Uh oh!

edheltzel commented Jun 11, 2026

Re-review — delta at 4a29472

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

edheltzel commented Jun 11, 2026 •

edited

Loading

Chunking properties — `tests/lib/chunk.property.test.ts`

Import normalization properties — `tests/lib/conversation-import.property.test.ts`

Re-review — delta at `4a29472`