benchmarks: re-run Suite C post-dedup and diff against the 2026-06-11 baseline #78

@edheltzel

This is the roadmap's outcome loop and the capstone of this milestone.

The committed 2026-06-11 Suite C baseline records exact-lookup MRR collapsing 1.0 → 0.75 → 0.13 → 0.00 across the 100/1k/10k/100k corpus ladder, with true targets buried at ranks 9/10/102/498 under unmarked near-duplicates. The baseline explicitly excluded dedup ("Dedup was NOT run before measurement").
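For readers less familiar with the metric, here is a minimal sketch of mean reciprocal rank (MRR) under its standard definition. The function name and the convention that a miss is passed as `None` are assumptions for illustration. The Suite C harness may apply a rank cutoff or other rules that this sketch does not model.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    Each entry is the 1-indexed rank of the true target for one query,
    or None if the target was not retrieved (contributes 0).
    """
    if not ranks:
        # An empty run has no meaningful MRR; fail loudly instead of returning 0.
        raise ValueError("no queries: refusing to report an MRR for an empty run")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # a miss adds nothing to the sum but still counts in the mean
        if rank < 1:
            raise ValueError(f"ranks are 1-indexed, got {rank}")
        total += 1.0 / rank
    return total / len(ranks)
```

Under this definition a single target at rank 498 contributes 1/498, roughly 0.002, which rounds to 0.00 at two decimals. A target at rank 9 or 10 contributes about 0.11 or 0.10.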

Task: run recall dedup --execute over the seeded benchmark corpora, re-run Suite C, and diff against benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl. The diff is dedup's efficacy report — and the empirical evidence the parked entity-keying gate (#49) is waiting on.
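The diff step could look roughly like the sketch below. The JSONL schema is an assumption: the field names `corpus_size` and `mrr` are hypothetical and would need to match whatever the Suite C results file actually records. The helper names are likewise illustrative and not part of the repository.

```python
import json
from pathlib import Path


def load_mrr_by_size(path):
    """Read a JSONL results file into {corpus_size: mrr}.

    Assumes one JSON object per line with hypothetical fields
    'corpus_size' and 'mrr'.
    """
    results = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        results[record["corpus_size"]] = float(record["mrr"])
    if not results:
        raise ValueError(f"{path}: no records found")
    return results


def diff_mrr(baseline, rerun):
    """Return (corpus_size, baseline_mrr, rerun_mrr, delta) rows, smallest corpus first.

    Raises if the two runs do not cover the same corpus sizes, so a
    missing rung on the ladder cannot pass as an improvement or a regression.
    """
    mismatched = set(baseline) ^ set(rerun)
    if mismatched:
        raise ValueError(f"corpus sizes differ between runs: {sorted(mismatched)}")
    return [
        (size, baseline[size], rerun[size], rerun[size] - baseline[size])
        for size in sorted(baseline)
    ]
```

The baseline argument would come from `load_mrr_by_size("benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl")`, and the rerun argument from the post-dedup results file. The deltas are reported as measured, with no pass/fail threshold applied.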

Blocked by: #70 (silent-zero hardening — REQUIRED before any baseline re-record) and #63 (cross-run safety fix should land first so measured behavior is final behavior).

Methodology note: keep baseline-first discipline — record honest numbers, no invented thresholds; document the dedup invocation in the run manifest so the comparison is reproducible.
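One way to record the dedup invocation for reproducibility is sketched below. The manifest layout and every field name are hypothetical, since the issue does not describe the existing run manifest format. Only the command and the baseline path are taken from the text above.

```python
import json
import shlex
from datetime import datetime, timezone


def build_run_manifest(dedup_argv, baseline_path, suite="C"):
    """Build a manifest dict that records the pre-measurement dedup step.

    All keys here are illustrative; adapt them to the real manifest schema.
    """
    return {
        "suite": suite,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "baseline": baseline_path,
        "pre_measurement_steps": [
            # Store the exact command line so the comparison can be re-run verbatim.
            {"name": "dedup", "command": shlex.join(dedup_argv)},
        ],
    }


manifest = build_run_manifest(
    ["recall", "dedup", "--execute"],
    "benchmarks/results/2026-06-11T09-36-53-suite-C.jsonl",
)
print(json.dumps(manifest, indent=2))
```

Storing the argument list through `shlex.join` keeps the recorded command copy-pasteable even if later invocations gain arguments that contain spaces.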

Labels: enhancement (New feature or request)