Skip to content

test: add property-based tests for import normalization and chunking invariants#57

Merged
edheltzel merged 2 commits into
mainfrom
test/property-tests-import-chunking
Jun 11, 2026
Merged

test: add property-based tests for import normalization and chunking invariants#57
edheltzel merged 2 commits into
mainfrom
test/property-tests-import-chunking

Conversation

@edheltzel

@edheltzel edheltzel commented Jun 11, 2026

Copy link
Copy Markdown
Owner

Part of #44 — completes the unblocked import + chunking property groups; the issue stays open to track the export (#43) and dedup (#45) groups.

Summary

Adds fast-check (devDependency, Bun-compatible, runs under bun:test) and two focused property-based test suites covering the groups of #44 that are currently unblocked. Property oracles are implemented independently on generator inputs (spec restated, not implementation mirrored), and a manual mutation check (i += sizei += size + 1 in chunked) confirmed the chunking properties fail loudly on an exactly-once/order bug.

Chunking properties — tests/lib/chunk.property.test.ts

  • Flatten round-trip over arbitrary arrays (duplicates included) and sizes: order preservation + exactly-once processing in one invariant
  • Chunk-count formula, non-empty chunks, all-but-last exactly size
  • Boundary lengths 0 / 1 / 499 / 500 / 501 / 999 / 1000 / 1001 plus larger N (bounded at 2,500) under the default SQLITE_SAFE_CHUNK_SIZE
  • Empty input → zero chunks; RangeError on every non-positive/non-integer size (incl. NaN/±Infinity); input never mutated

Import normalization properties — tests/lib/conversation-import.property.test.ts

  • Shared normalizeSession contract across all three source adapters (table-driven for claude-ai/chatgpt; slack separate for its single-session/path-derived semantics): sanitized + prefixed session IDs, trimmed non-empty titles, project derivation incl. empty-string fallback, startedAt/endedAt from sorted message timestamps, every message stamped with session ID/project/valid role/non-empty content
  • Messages survive normalization exactly as (role, timestamp, content) tuples (multiset equality — nothing dropped, invented, restamped, or role-mis-mapped; pins claude-ai human→user)
  • Whitespace-only and empty conversations yield zero sessions in every adapter
  • Unrecognized input (arbitrary JSON + explicit empties): no detection, zero sessions from every adapter, repeatably
  • Same input → deep-equal normalized output (generators always include timestamps; messages without one legitimately get a new Date() fallback in the parsers, which would make determinism assertions meaningless)

Deferred scope (per #44 dependencies)

Issue #44 should stay open-tracked for those two groups; this PR completes only the currently unblocked import + chunking groups (chunking helper from #41/PR #55, adapter registry from PR #56).

Testing

  • bun run lint clean (tsc --noEmit)
  • Full suite: 491 pass / 0 fail across 45 files (~13s); the two property files add 13 tests / ~21k assertions in ~80ms
  • Generator sizes bounded (≤3 conversations × ≤8 messages; arrays ≤2,500) per the issue's practicality requirement

Post-Deploy Monitoring & Validation

No additional operational monitoring required — test-only change plus a devDependency; no runtime code touched. CI green on both platforms is the validation signal.

…invariants

Adds fast-check (devDependency) and two focused property suites per issue #44:

- tests/lib/chunk.property.test.ts — chunked() order preservation,
  exactly-once processing via flatten round-trip, chunk-count formula,
  boundary lengths 0/1/499/500/501/999/1000/1001 plus larger N (bounded
  at 2,500), RangeError on invalid sizes, input non-mutation.
- tests/lib/conversation-import.property.test.ts — shared normalizeSession
  contract across all three source adapters (required fields preserved or
  derived, sanitized session IDs, sorted timestamps, message stamping),
  whitespace-only and empty conversations rejected, unrecognized input
  rejected deterministically by every adapter, and parse determinism for
  fully-timestamped payloads.

Export (#43) and dedup (#45) property groups are deferred until those
issues land, per the dependency note on #44.
@edheltzel

Copy link
Copy Markdown
Owner Author

Review — PR #57 (property-based tests for import normalization + chunking)

Verified independently before this review:

  • CI green on head c9356e8 (macOS primary + Ubuntu compatibility smoke, both pass).
  • Full suite reproduced locally at PR head: 491 pass / 0 fail (~13s), matching the PR claim.
  • Mutation spot-checks (3): the PR-claimed mutation (i += sizei += size + 1 in src/lib/chunk.ts) reproduces exactly — 3 of 6 chunk properties fail, within 1–2 generated cases. Two additional independent mutations: overlapping chunks (slice(i, i + size + 1)) — killed (3 failures); removing the timestamp sort in normalizeSession — killed by all three adapter properties. The oracles are genuinely independent (flatten round-trip, Math.ceil count formula, spec restated), not the implementation mirrored back at itself.
  • fast-check ^4.8.0 devDependency pin matches the existing caret style; lockfile adds only fast-check@4.8.0 + pure-rand@8.4.0. Sane.
  • Export (Add mem export with JSON, Markdown, SQL dump, and SQLite backup formats #43) / dedup (Add non-destructive dedup with provenance-aware survivor selection #45) deferral is correct per Add focused property-based tests for import, chunking, export, and dedup invariants #44's instructions — both issues exist and are open.
  • The conversation-import file's "virtual paths never touch the filesystem" claim verified against parseSlackConversations (statSync only runs when rootPath is passed; tests never pass it).

This is strong work — disciplined scope, bounded generators, the shared expectSessionInvariants contract enforced across all three adapters, and the whole suite costs ~80ms. Three blocking items below, two of which are small strengthenings of an existing assertion; the third is a one-word PR-body fix with process consequences.

Blocking

1. Closes #44 contradicts the PR's own deferred scope — change to Part of #44 / Refs #44.
The PR body says "Issue #44 should stay open-tracked for those two groups," and #44 itself instructs: "complete only the currently unblocked property groups and leave the remaining groups explicitly pending in this issue." The Closes #44 keyword will auto-close the issue on merge, silently dropping the tracking for the export/dedup groups. Edit the PR body before merge.

2. Timestamp values are never pinned — a parser that stamps every message with fallbackNow passes the entire suite.
The generators produce known atSeconds per message but no assertion ever uses them as an oracle. In expectSessionInvariants, the per-message timestamp check is format-regex only (conversation-import.property.test.ts:136), and startedAt/endedAt (:128-129) are compared against session.messages[*].timestamp — self-referential, so they hold even when every timestamp is wrong. The determinism property would catch a fallbackNow regression only if the clock ticks between two back-to-back parse calls — a flaky catch at best. Issue #44's first normalization criterion is "required fields are preserved/derived correctly"; timestamps are currently derived-unverified.
Fix: in the multiset comparison (:180-181), assert the multiset of session.messages.map(m => m.timestamp) equals the multiset of new Date(m.atSeconds * 1000).toISOString() over surviving generated messages (claude-ai/chatgpt cases; slack analogously).

3. claude-ai role mapping (humanuser) is asserted nowhere in the repo.
The property only checks set membership (:134), and the example suite pins roles for chatgpt (tests/lib/conversation-import.test.ts:100) and slack (:114) but not claude-ai — verified by reading the claude-ai test block, which asserts counts/extraction writes only. A regression in roleFrom mapping humanassistant passes all 491 tests.
Fix: compare multisets of (role, content) pairs (or (role, content, timestamp) triples, folding in item 2) instead of bare contents at :180-181 — pins mapping and pairing for both multi-conversation adapters in one change.

Non-blocking suggestions

  • ChatGPT no-current_node fallback is uncovered everywhere: the generator always sets current_node when messages exist (:96), so the create_time-sort fallback, cycle guard, and is_visually_hidden_from_conversation skip in src/lib/conversation-import.ts have zero behavioral coverage. Sometimes omitting current_node in the generator covers it nearly for free.
  • Timestamp ties are effectively never generated (secondsArb spans ~3B seconds with ≤8 messages), so stable-sort tie behavior is unpinned despite being a classic edge. A tie-prone fc.oneof variant fixes it.
  • Detector near-misses: fc.jsonValue() essentially never produces mapping/chat_messages+sender/ts+text, so the unrecognized-input property only sees deep garbage. A dedicated near-miss arbitrary ({mapping: 'str'}, slack message missing ts, …) would probe where detect/parse disagreement bugs actually live.
  • Slack content survival: the slack property checks count and role-by-botness but never that generated text survives into content; toContain(msg.text) pins survival without coupling to the [sender] prefix format.
  • Unicode: fast-check v4 fc.string() defaults to ASCII graphemes, so sanitizeId's non-ASCII replacement and the 180-char slice are unexercised. Mixing in fc.string({ unit: 'grapheme' }) for ids/titles would cover it.
  • The 8 boundary lengths in lengthArb are each hit only probabilistically per run; passing them via fc.assert(..., { examples: [...] }) would make the property file deterministically self-sufficient (the example matrix in chunk.test.ts does already cover them suite-wide).

Verdict

REQUEST CHANGES — the invariant architecture is sound and mutation-verified, but items 2–3 leave two of #44's "required fields preserved/derived correctly" criteria (timestamps, claude-ai roles) silently corruptible across the whole repo suite, and item 1 would auto-close a tracking issue that must stay open. All three are small, contained fixes; happy to re-review immediately after.

…ontent) tuples

Addresses PR #57 review blocking items 2 and 3: the content-only multiset
comparison was blind to a parser restamping every message with the
fallback timestamp, and claude-ai's human→user role mapping was asserted
nowhere in the repo. The shared expectMessageTuples helper now compares
multisets of JSON-encoded (role, timestamp, content) tuples for all three
adapters. Mutation-verified: fallbackNow restamping fails 4 properties;
human→assistant mis-mapping fails the claude-ai property.
@edheltzel

Copy link
Copy Markdown
Owner Author

Review blocking items addressed in 4a29472:

  1. PR body: Closes #44Part of #44 — the issue stays open to track the export (Add mem export with JSON, Markdown, SQL dump, and SQLite backup formats #43) and dedup (Add non-destructive dedup with provenance-aware survivor selection #45) groups.
  2. Timestamps pinned: the multiset comparison now uses JSON-encoded (role, timestamp, content) tuples against the generated atSeconds oracle for all three adapters (slack included, with sub-ms ts suffix truncation noted). Mutation check: forcing timestampFrom to always return fallback fails 4 of 7 properties.
  3. claude-ai role mapping pinned: same tuple change; generator roles round-trip (user is sent as the human sender). Mutation check: mapping humanassistant in roleFrom fails exactly the claude-ai property.

Gates re-verified at head: bun run lint clean, 491 pass / 0 fail (~13s). The six non-blocking suggestions are deferred as noted.

@edheltzel

Copy link
Copy Markdown
Owner Author

Re-review — delta at 4a29472

All three blocking items from my previous review are resolved. Verified independently:

Delta scope confirmed: one commit touching only tests/lib/conversation-import.property.test.ts (+26/−6). Chunk properties, package.json, and bun.lock untouched. PR body now reads "Part of #44" with no auto-close keyword anywhere — issue #44 stays open for the export (#43) / dedup (#45) groups.

The tuple oracle genuinely closes the gap — it is not the old assertion renamed. The expected side of expectMessageTuples is built solely from generator inputs — m.role, isoFromSeconds(m.atSeconds), m.text — with no reference to parser output, replacing the self-referential startedAt/endedAt situation. The claude-ai case round-trips user through the human sender wire format, so the roleFrom mapping is pinned, and the slack case correctly derives whole-second ISO strings (sub-millisecond ts suffix truncation, accurately commented). Bonus: full tuple equality on slack also pins exact content survival ([sender] text), subsuming one of my non-blocking suggestions.

Mutation spot-checks (re-run myself at 4a29472):

  • roleFrom: humanassistant — fails exactly the claude-ai property (1/7), matching the worker's claim. This mutation passed all 491 tests before this commit.
  • timestampFrom → always return fallback — fails 3/7 deterministically (claude-ai, chatgpt, slack: the three properties carrying the new oracle). The worker reported 4/7; the fourth (determinism) only trips when the clock ticks between back-to-back parse calls, so it's run-dependent — the substantive claim (fallback restamping is now caught loudly) is confirmed.

Gates: CI green on 4a29472 (macOS primary + Ubuntu smoke). Full suite locally: one transient failure on a single loaded run (~2× normal duration), not reproducible across 4 subsequent clean 491/0 runs; the two property files ran 15/15 clean in isolation — pre-existing environment flake, not attributable to this PR.

Deferred items: the remaining non-blocking suggestions (chatgpt no-current_node fallback coverage, timestamp ties, detector near-misses, unicode generators, deterministic boundary examples) stay deferred as agreed; slack content survival is now covered by the tuple equality.

APPROVE

@edheltzel edheltzel merged commit 074579a into main Jun 11, 2026
2 checks passed
edheltzel added a commit that referenced this pull request Jun 11, 2026
Phase 2 of issue #44 — the export (#43) and dedup (#45) property groups
deferred from PR #57. Oracles are computed from generator inputs only
(tuple-multiset pattern); duplicate groups are built structurally so
expected grouping, survivors, and similarities never read the
implementation under test.
@edheltzel edheltzel deleted the test/property-tests-import-chunking branch June 11, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant