Add stage-local contract tests for the compiler and query pipelines. The current test suite (6 files, 1,530 lines) covers the full lifecycle end-to-end but cannot pinpoint which internal stage broke when an answer is wrong.
Compiler pipeline tests needed:
| Test category | What it validates |
| --- | --- |
| Parser resilience | Same semantic IR under whitespace, reorder, and prose-only rewrites |
| Graph closure | No dangling references, route closure intact, relation typing intact |
| Index round-trip | Aliases, anchors, IDs, and route labels remain resolvable |
| Validation coverage | Known-bad inputs produce specific validation findings |
| Snapshot determinism | Same source → byte-identical snapshot |
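As a rough illustration of two rows in the table above (graph closure and snapshot determinism), here is a minimal sketch. The `GraphNode` shape and both helper functions are hypothetical stand-ins, since the real compiler types in src/runtime/ are not shown in this issue; a real contract test would call the actual stage instead.

```typescript
// Hypothetical stand-in type: the real compiler's graph shape is not shown in this issue.
interface GraphNode {
  id: string;
  refs: string[]; // IDs of the nodes this node points at
}

// Graph closure contract: every reference must resolve to a node in the graph.
function findDanglingRefs(nodes: GraphNode[]): string[] {
  const known = new Set(nodes.map((n) => n.id));
  const dangling: string[] = [];
  for (const node of nodes) {
    for (const ref of node.refs) {
      if (!known.has(ref)) dangling.push(`${node.id} -> ${ref}`);
    }
  }
  return dangling;
}

// Snapshot determinism contract: serialization must not depend on input order.
// Sorting nodes and refs before JSON.stringify is one way to get stable output.
function serializeSnapshot(nodes: GraphNode[]): string {
  const canonical = [...nodes]
    .map((n) => ({ id: n.id, refs: [...n.refs].sort() }))
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return JSON.stringify(canonical);
}

// Fixtures: one closed graph, one graph with a reference to a missing node.
const closed: GraphNode[] = [
  { id: "A.15", refs: ["B.3"] },
  { id: "B.3", refs: [] },
];
const broken: GraphNode[] = [{ id: "A.15", refs: ["B.9"] }];

console.log(findDanglingRefs(closed).length === 0); // closure holds
console.log(findDanglingRefs(broken)); // names the dangling edge
console.log(serializeSnapshot(closed) === serializeSnapshot([...closed].reverse()));
```

The point of the sketch is that each check takes a small fixture and reports a stage-specific finding, so a failure names the graph or snapshot stage rather than the final answer.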
Query pipeline tests needed:
| Test category | What it validates |
| --- | --- |
| Trace determinism | Same snapshot + same query → same selected support set |
| Seed coverage | Known question patterns activate expected candidate categories |
| Frontier bounds | Expansion respects hop budget and anchor limits |
| Projection stability | Answer surface may rephrase; support set must not silently drift |
| Synthesis isolation | Synthesis failure does not alter deterministic answer content |
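The frontier-bounds and trace-determinism rows above could be checked along these lines. This is only a sketch: `expandFrontier`, `FrontierBudget`, and the edge map are hypothetical stand-ins for the real frontier stage, and breadth-first expansion is an assumption about how it works.

```typescript
// Hypothetical stand-in for the frontier stage; the real expander is not shown in this issue.
type Edges = Record<string, string[]>;

interface FrontierBudget {
  maxHops: number; // how far from a seed the expansion may travel
  maxAnchors: number; // how many nodes the support set may contain
}

// Breadth-first expansion from seed IDs, bounded by hop budget and anchor limit.
// Returns each selected node with its hop distance from the nearest seed.
function expandFrontier(edges: Edges, seeds: string[], budget: FrontierBudget): Map<string, number> {
  const hops = new Map<string, number>();
  const queue: string[] = [];
  for (const seed of seeds) {
    if (hops.size >= budget.maxAnchors) break;
    if (!hops.has(seed)) {
      hops.set(seed, 0);
      queue.push(seed);
    }
  }
  while (queue.length > 0) {
    const current = queue.shift()!;
    const depth = hops.get(current)!;
    if (depth >= budget.maxHops) continue;
    // Sort neighbours so the visit order, and thus the support set, is deterministic.
    for (const next of [...(edges[current] ?? [])].sort()) {
      if (hops.size >= budget.maxAnchors) return hops;
      if (!hops.has(next)) {
        hops.set(next, depth + 1);
        queue.push(next);
      }
    }
  }
  return hops;
}

const edges: Edges = { a: ["c", "b"], b: ["d"], c: ["d"], d: ["e"] };
const supportSet = (budget: FrontierBudget): string =>
  Array.from(expandFrontier(edges, ["a"], budget).keys()).sort().join(",");

console.log(supportSet({ maxHops: 2, maxAnchors: 10 })); // "e" is three hops away, so it is excluded
console.log(supportSet({ maxHops: 2, maxAnchors: 3 })); // the anchor limit cuts the set short
```

A contract test would assert both bounds on a fixed fixture and then rerun the same query to confirm the support set is unchanged.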
"Done" looks like: Each compiler/query stage has at least one focused contract test that fails precisely when that stage's promise breaks.
Kind
test coverage
Affected area
src/runtime/ (compiler & query engine)
Acceptance criteria
At least one contract test per compiler stage (parser, graph, index, validation, snapshot)
At least one contract test per query stage (normalizer, seeder, ranker, frontier, projector, synthesis)
Tests pinpoint the failing stage — not just "end-to-end answer is wrong"
Some tests (graph closure, trace determinism) can be written against the current monolithic modules
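To make the "pinpoint the failing stage" criterion concrete, the sketch below runs one contract per stage against fixed fixtures and reports which stage names fail. The two-stage toy pipeline, the `StageContract` shape, and the deliberately buggy `buildIndex` are all hypothetical and exist only for illustration; they are not the real modules.

```typescript
// Toy two-stage pipeline; both stages are hypothetical stand-ins, not the real src/runtime/ modules.
function parse(source: string): string[] {
  return source.split(/\s+/).filter((token) => token.length > 0);
}

// Deliberately buggy index stage: it drops the last token.
function buildIndex(tokens: string[]): Map<string, number> {
  const index = new Map<string, number>();
  tokens.slice(0, -1).forEach((token, position) => index.set(token, position));
  return index;
}

interface StageContract {
  stage: string;
  violations: () => string[]; // an empty list means the stage kept its promise
}

const contracts: StageContract[] = [
  {
    stage: "parser",
    // Parser resilience: extra whitespace must not change the token list.
    violations: () =>
      parse("a  b\n c").join("|") === parse("a b c").join("|") ? [] : ["whitespace changed tokens"],
  },
  {
    stage: "index",
    // Index round-trip: every token must remain resolvable.
    violations: () => {
      const tokens = ["a", "b", "c"]; // fixed fixture, so a parser bug cannot leak into this check
      const index = buildIndex(tokens);
      return tokens.filter((t) => !index.has(t)).map((t) => `unresolvable: ${t}`);
    },
  },
];

// Run every contract independently and collect the names of the stages that failed.
function failingStages(all: StageContract[]): string[] {
  return all.filter((c) => c.violations().length > 0).map((c) => c.stage);
}

console.log(failingStages(contracts)); // only the index stage is reported
```

Because each contract feeds its stage a fixed fixture instead of the previous stage's output, a failure is attributed to that stage alone, which an end-to-end assertion on the final answer cannot do.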
FPF grounding:
B.3 (Trust & Assurance Calculus, Stable): Each stage needs its own evidence of correctness anchored at the stage level. End-to-end tests create a single assurance layer where trust is opaque — when a test fails, you cannot attribute the failure to a specific stage.
A.15 (Role–Method–Work Alignment, Stable): Each stage's Work must be independently auditable. The current end-to-end-only tests violate A.15's principle that work evidence should be traceable to the specific role/method that produced it.
B.3.4 (Evidence Decay & Epistemic Debt, Stable): Tests are evidence artifacts with freshness constraints. When stage code changes, stage-local tests provide targeted evidence refresh rather than requiring full end-to-end reruns.
Note: Previous version cited E.19 (Pattern Quality Gates) and F.15 (SCR/RSCR Harness). E.19 governs admission/refresh of FPF spec patterns to the canonical corpus, not software testing. F.15 is the harness for the FPF unification process (Part F), not software test suites. Both were misapplied and have been replaced with B.3, A.15, and B.3.4.
Agent-delegable?
partially — needs human review
Additional context
tests/fpf-spec-runtime.test.ts (464 lines)
tests/lm-studio-synthesizer.test.ts (488 lines)
tests/mcp-server.test.ts (297 lines)
tests/docs-projection.test.ts (189 lines)
tests/runtime-path-resolution.test.ts (69 lines)
tests/server-config.test.ts (23 lines)
Measurable impact
ls tests/*.test.ts | wc -l
grep -Ec 'describe.*parser|describe.*graph|describe.*index|describe.*validation|describe.*snapshot' tests/*.test.ts
grep -Ec 'describe.*normalizer|describe.*seeder|describe.*ranker|describe.*frontier|describe.*projector' tests/*.test.ts
git diff HEAD -- tests/fpf-spec-runtime.test.ts tests/mcp-server.test.ts | wc -l (should be 0)