test: add stage-local contract tests for compiler and query pipelines#32
Conversation
Original prompt from Stanislau
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
📝 WalkthroughWalkthroughAdds two stage-scoped contract test suites: one for the compiler pipeline validating parser, graph, index, validation, and snapshot invariants; and one for the query pipeline validating normalizer, seeder, ranker, frontier, projection, and synthesis isolation behaviors. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces stage-local contract tests for the compiler and query pipelines in tests/compiler-contracts.test.ts and tests/query-contracts.test.ts. The reviewer provided feedback focused on improving test robustness and determinism, specifically recommending the use of relative paths instead of process.cwd() and fixed timestamps for snapshots to ensure reproducible results. Additionally, the feedback suggests replacing magic numbers with constants and adding explicit assertions to conditional tests to prevent silent passes and ensure meaningful coverage.
| if (cachedOutput) { | ||
| return cachedOutput; | ||
| } | ||
| const sourcePath = resolve(process.cwd(), 'FPF-spec.md'); |
There was a problem hiding this comment.
Using process.cwd() to resolve the source file path is fragile as it depends on the directory from which the test runner is executed. This can lead to failures if tests are run from a different directory level (e.g., in a monorepo or a sub-package). It is safer to resolve paths relative to the test file's location.
| cachedOutput = compileFpfSource({ | ||
| sourcePath, | ||
| sourceHash, | ||
| builtAt: new Date().toISOString(), |
There was a problem hiding this comment.
Using new Date().toISOString() for builtAt introduces non-determinism into the cached compiler output. For contract tests, especially those verifying snapshot consistency and determinism, using a fixed timestamp is preferred to ensure reproducible results across different runs.
| builtAt: new Date().toISOString(), | |
| builtAt: '2025-01-01T00:00:00.000Z', |
| expect(validation.parsedSections).toBeGreaterThan(100); | ||
| expect(validation.parsedPatterns).toBeGreaterThan(50); | ||
| expect(validation.parsedRoutes).toBeGreaterThan(0); | ||
| expect(validation.parsedLexiconEntries).toBeGreaterThan(5); |
There was a problem hiding this comment.
These assertions use arbitrary magic numbers as thresholds. While they serve as a baseline for the current FPF-spec.md, they make the tests brittle and tightly coupled to the specific content of the source document. If the document is refactored or reduced in size, these tests may fail even if the compiler logic is correct. Consider defining these thresholds as constants.
| if (cachedSnapshot) { | ||
| return cachedSnapshot; | ||
| } | ||
| const sourcePath = resolve(process.cwd(), 'FPF-spec.md'); |
| const output = compileFpfSource({ | ||
| sourcePath, | ||
| sourceHash, | ||
| builtAt: new Date().toISOString(), |
| if (firstRoute) { | ||
| const trace = engine(snapshot).trace(`Tell me about the ${firstRoute} route`); | ||
| expect(trace.detected.routeNames).toContain(firstRoute); | ||
| } |
There was a problem hiding this comment.
This test is conditional on firstRoute being defined. If the parser fails to find any routes, the test will pass silently without actually asserting the behavior. It should explicitly check that routes exist before proceeding.
| if (firstRoute) { | |
| const trace = engine(snapshot).trace(`Tell me about the ${firstRoute} route`); | |
| expect(trace.detected.routeNames).toContain(firstRoute); | |
| } | |
| expect(routeNames.length).toBeGreaterThan(0); | |
| const firstRoute = routeNames[0]!; | |
| const trace = engine(snapshot).trace('Tell me about the ' + firstRoute + ' route'); | |
| expect(trace.detected.routeNames).toContain(firstRoute); |
| if (statusKeys.length > 0) { | ||
| const firstStatus = statusKeys[0]!; | ||
| const trace = engine(snapshot).trace(`Show me ${firstStatus} patterns`); | ||
| expect(trace.detected.statusTerms).toContain(firstStatus); | ||
| } |
There was a problem hiding this comment.
Similar to the route detection test, this test passes silently if statusKeys is empty. Asserting that status terms are present in the index ensures that the test coverage is meaningful and that the parser is working as expected.
expect(statusKeys.length).toBeGreaterThan(0);
const firstStatus = statusKeys[0]!;
const trace = engine(snapshot).trace('Show me ' + firstStatus + ' patterns');
expect(trace.detected.statusTerms).toContain(firstStatus);| if (trace.retrievalHops.length > 0) { | ||
| const firstHop = trace.retrievalHops[0]!; | ||
| expect(firstHop.iteration).toBe(1); | ||
| expect(firstHop.reason.length).toBeGreaterThan(0); | ||
| expect(typeof firstHop.sufficientAfter).toBe('boolean'); | ||
| } |
There was a problem hiding this comment.
The assertions for hop metadata are wrapped in an if block. If no hops are recorded, the assertions are skipped, and the test passes. This could mask a bug where the engine fails to perform any retrieval hops for a complex query. You should assert that at least one hop occurred.
expect(trace.retrievalHops.length).toBeGreaterThan(0);
const firstHop = trace.retrievalHops[0]!;
expect(firstHop.iteration).toBe(1);
expect(firstHop.reason.length).toBeGreaterThan(0);
expect(typeof firstHop.sufficientAfter).toBe('boolean');| if (trace.sufficient) { | ||
| expect(trace.selectedAnchorIds.length).toBeGreaterThan(0); | ||
| } |
There was a problem hiding this comment.
This test only verifies anchor selection if the trace is already marked as sufficient. For a known pattern query like 'What is A.1.1?', the engine should always reach sufficiency. Asserting sufficiency directly makes the test more robust.
expect(trace.sufficient).toBe(true);
expect(trace.selectedAnchorIds.length).toBeGreaterThan(0);There was a problem hiding this comment.
Pull request overview
Adds stage-local contract tests for the runtime compiler and query engine so failures localize to a specific pipeline stage rather than surfacing only as end-to-end mismatches.
Changes:
- Introduces compiler stage contract tests (parser, graph closure, indexes, validation, determinism) against
FPF-spec.md. - Introduces query stage contract tests (normalization, seeding, ranking, frontier bounds, projection, synthesis isolation, trace determinism) against a compiled snapshot.
- Keeps existing end-to-end/integration tests unchanged while expanding stage-level evidence coverage.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/compiler-contracts.test.ts | Adds stage-local compiler contract tests validating parse/graph/index/validation/determinism invariants. |
| tests/query-contracts.test.ts | Adds stage-local query contract tests validating normalization, candidate seeding/ranking, frontier bounds, projection behavior, synthesis isolation, and trace determinism. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const routeNames = Object.values(snapshot.routeGraph.nodes).map((r) => r.name); | ||
| const firstRoute = routeNames[0]; | ||
|
|
||
| if (firstRoute) { | ||
| const trace = engine(snapshot).trace(`Tell me about the ${firstRoute} route`); | ||
| expect(trace.detected.routeNames).toContain(firstRoute); | ||
| } |
There was a problem hiding this comment.
This test can silently pass if routeGraph.nodes is empty (or contains routes without names) because it guards on if (firstRoute) and otherwise performs no assertions. Since the contract is specifically about route-name detection, assert a non-empty route set (e.g., expect(routeNames.length).toBeGreaterThan(0)) and then run the detection expectations unconditionally so regressions can’t be skipped.
| const routeNames = Object.values(snapshot.routeGraph.nodes).map((r) => r.name); | |
| const firstRoute = routeNames[0]; | |
| if (firstRoute) { | |
| const trace = engine(snapshot).trace(`Tell me about the ${firstRoute} route`); | |
| expect(trace.detected.routeNames).toContain(firstRoute); | |
| } | |
| const routeNames = Object.values(snapshot.routeGraph.nodes) | |
| .map((r) => r.name) | |
| .filter((name): name is string => typeof name === 'string' && name.length > 0); | |
| expect(routeNames.length).toBeGreaterThan(0); | |
| const firstRoute = routeNames[0]!; | |
| const trace = engine(snapshot).trace(`Tell me about the ${firstRoute} route`); | |
| expect(trace.detected.routeNames).toContain(firstRoute); |
| if (statusKeys.length > 0) { | ||
| const firstStatus = statusKeys[0]!; | ||
| const trace = engine(snapshot).trace(`Show me ${firstStatus} patterns`); | ||
| expect(trace.detected.statusTerms).toContain(firstStatus); | ||
| } |
There was a problem hiding this comment.
Similar to the route-name test: this can become a vacuous pass if the status index ever becomes empty because the assertions are skipped under if (statusKeys.length > 0). Consider asserting statusKeys.length > 0 (or using a known status term like stable) so the test reliably fails when status-term detection breaks.
| if (statusKeys.length > 0) { | |
| const firstStatus = statusKeys[0]!; | |
| const trace = engine(snapshot).trace(`Show me ${firstStatus} patterns`); | |
| expect(trace.detected.statusTerms).toContain(firstStatus); | |
| } | |
| expect(statusKeys.length).toBeGreaterThan(0); | |
| const firstStatus = statusKeys[0]!; | |
| const trace = engine(snapshot).trace(`Show me ${firstStatus} patterns`); | |
| expect(trace.detected.statusTerms).toContain(firstStatus); |
| const routeCandidates = trace.candidateScores.filter((c) => c.kind === 'route'); | ||
| expect(routeCandidates.length).toBeGreaterThan(0); |
There was a problem hiding this comment.
This is intended to validate seeding route expansion candidates, but it inspects trace.candidateScores, which is a ranked + truncated view (slice(0, 16)). A route can be correctly seeded yet not appear in the top 16, causing a false failure. Prefer asserting against trace.frontierCandidates (e.g., presence of origin === 'route_expansion') which represents the seed set without truncation.
| const routeCandidates = trace.candidateScores.filter((c) => c.kind === 'route'); | |
| expect(routeCandidates.length).toBeGreaterThan(0); | |
| const routeFrontier = trace.frontierCandidates.filter( | |
| (c) => c.origin === 'route_expansion', | |
| ); | |
| expect(routeFrontier.length).toBeGreaterThan(0); |
| // threshold (100) and total count stays low relative to the full catalog. | ||
| const highScoring = trace.candidateScores.filter((c) => c.score >= 100); | ||
| expect(highScoring.length).toBe(0); | ||
| expect(trace.candidateScores.length).toBeLessThan( |
There was a problem hiding this comment.
trace.candidateScores is always capped to the top 16 candidates (see QueryEngine.trace()), so the assertion about “total count stays low relative to the full catalog” is not actually testing what it claims—it will almost always be true regardless of how many candidates were generated. To make this contract meaningful, assert against an un-truncated signal (e.g., trace.frontierCandidates.length) or extend TraceResult to include the full candidate count.
| expect(trace.candidateScores.length).toBeLessThan( | |
| expect(trace.frontierCandidates.length).toBeLessThan( |
| it('status index keys partition compiled nodes without overlap', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const statusIndex = snapshot.indexes.statusIndex; | ||
|
|
||
| expect(Object.keys(statusIndex).length).toBeGreaterThan(0); | ||
|
|
||
| for (const [_status, nodeIds] of Object.entries(statusIndex)) { | ||
| for (const nodeId of nodeIds) { | ||
| expect(snapshot.compiledNodes[nodeId]).toBeDefined(); | ||
| } | ||
| } | ||
| }); |
There was a problem hiding this comment.
The test name/comment claims the status index “partitions compiled nodes without overlap”, but the assertions only check that referenced node IDs exist. Either tighten the assertions to actually verify the partition/overlap property (likely for pattern nodes only), or rename the test to reflect what it currently validates (e.g., that status index entries resolve to compiled nodes).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c0595bfd6e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (statusKeys.length > 0) { | ||
| const firstStatus = statusKeys[0]!; | ||
| const trace = engine(snapshot).trace(`Show me ${firstStatus} patterns`); | ||
| expect(trace.detected.statusTerms).toContain(firstStatus); |
There was a problem hiding this comment.
Use a supported status token in normalizer contract test
This test picks statusKeys[0] from snapshot.indexes.statusIndex and expects it in trace.detected.statusTerms, but QueryEngine.trace() only emits the fixed tokens draft, stable, stub, and transitional. The status index currently also contains values like new, full text, and transitional stub, so a harmless change in insertion order or source ordering can make this test fail even when normalizer behavior is unchanged, creating a flaky CI gate.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid point — the normalizer only detects the fixed tokens draft, stable, stub, transitional via substring matching, so picking an arbitrary statusKeys[0] could yield a key like new or full text that the normalizer never emits.
Fixed in 3239888: the test now finds a known token from the hardcoded list that exists in the status index, ensuring the test validates actual normalizer behavior.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/compiler-contracts.test.ts`:
- Around line 193-203: The test currently only checks that each nodeId in
statusIndex exists in snapshot.compiledNodes but doesn't ensure nodeIds are
unique across status keys; update the test (referencing statusIndex, snapshot,
compiledNodes) to assert there is no overlap by tracking seen node IDs (e.g., a
Set) as you iterate all Object.entries(statusIndex) and for each nodeId assert
it's not already present before adding it, and optionally assert that the total
number of unique node IDs equals the sum of all nodeIds lengths to catch
duplicates.
- Around line 220-236: The tests currently only run against live production data
(getCompilerOutput) so they won't detect regressions that stop reporting issues;
add a deterministic malformed fixture and a unit test that feeds it through the
same validation path and asserts specific findings. Create a new test case
(e.g., in the same suite 'Compiler / Validation stage') that builds a small
synthetic snapshot with intentionally missing required fields and an unresolved
reference, call the same validation logic used by getCompilerOutput (reuse the
functions that populate snapshot.validation or invoke the validator directly),
then assert exact expected counts on validation.missingRequiredFields and
validation.unresolvedReferences (or indexMapNodes if applicable) to ensure the
validator flags the known-bad inputs. Ensure the test references
getCompilerOutput/validator entrypoint and the validation fields
missingRequiredFields and unresolvedReferences/indexMapNodes so future
regressions fail deterministically.
- Around line 38-82: The tests under "Compiler / Parser stage" currently only
validate the canonical snapshot returned by getCompilerOutput(); add additional
test cases that recompile semantic-equivalent variants (whitespace-only,
reordered sections, and prose-only edits) and assert the same invariants
(parsedSections/parsedPatterns/parsedRoutes/parsedLexiconEntries counts,
compiledNodes IDs, pattern metadata for 'A.1.1', and valid anchors) hold for
each variant; implement helper functions to produce or load variant inputs, call
the same getCompilerOutput or a variant-taking compile function, and run the
existing assertions against each resulting snapshot to ensure parser resilience.
In `@tests/query-contracts.test.ts`:
- Around line 268-316: The tests for fallback synthesis check status, ids,
citations, and relations but don’t assert that the fallback answer and
constraints remain identical to the deterministic baseline; update the three
tests using engine(...) and LocalAnswerSynthesizer (the "returns deterministic
answer when synthesizer is unavailable", "falls back to deterministic answer
when synthesizer throws", and "does not alter deterministic IDs or citations
when synthesis fails") to also capture the deterministic baseline by calling
eng.query(...) (or engine(snapshot) once) and then assert that result.answer and
result.constraints equal the baseline's answer and constraints (for both the
unavailable and throwing synthesizer cases) so any regression that changes
answer content or constraints will fail.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bf33b511-7b48-4d5a-a9f4-7322e0c100c8
📒 Files selected for processing (2)
tests/compiler-contracts.test.tstests/query-contracts.test.ts
| describe('Compiler / Parser stage', () => { | ||
| it('parses a non-trivial number of sections, patterns, routes, and lexicon entries', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const { validation } = snapshot; | ||
|
|
||
| expect(validation.parsedSections).toBeGreaterThan(100); | ||
| expect(validation.parsedPatterns).toBeGreaterThan(50); | ||
| expect(validation.parsedRoutes).toBeGreaterThan(0); | ||
| expect(validation.parsedLexiconEntries).toBeGreaterThan(5); | ||
| }); | ||
|
|
||
| it('assigns IDs to all compiled nodes and none are empty strings', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const nodeIds = Object.keys(snapshot.compiledNodes); | ||
|
|
||
| expect(nodeIds.length).toBeGreaterThan(50); | ||
| for (const nodeId of nodeIds) { | ||
| expect(nodeId.length).toBeGreaterThan(0); | ||
| } | ||
| }); | ||
|
|
||
| it('preserves pattern metadata fields (title, status, part)', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const pattern = snapshot.patternGraph.nodes['A.1.1']; | ||
|
|
||
| expect(pattern).toBeDefined(); | ||
| expect(pattern!.title.length).toBeGreaterThan(0); | ||
| expect(pattern!.status.length).toBeGreaterThan(0); | ||
| expect(pattern!.sectionIds.length).toBeGreaterThan(0); | ||
| }); | ||
|
|
||
| it('produces anchors with valid line ranges', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const anchors = Object.values(snapshot.anchorMap); | ||
|
|
||
| expect(anchors.length).toBeGreaterThan(50); | ||
| for (const anchor of anchors.slice(0, 20)) { | ||
| expect(anchor.lineStart).toBeGreaterThanOrEqual(0); | ||
| expect(anchor.lineEnd).toBeGreaterThan(anchor.lineStart); | ||
| } | ||
|
|
||
| const nonEmpty = anchors.filter((a) => a.text.length > 0); | ||
| expect(nonEmpty.length).toBeGreaterThan(anchors.length / 2); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Parser resilience is not actually under test here.
These assertions only inspect the canonical FPF-spec.md snapshot. They never recompile whitespace-only, reordered, or prose-only variants, so a regression on semantically equivalent source edits would still pass this whole block.
Suggested direction
+ it('preserves semantic IR under whitespace-only rewrites', async () => {
+ const sourcePath = resolve(process.cwd(), 'FPF-spec.md');
+ const sourceText = await readFile(sourcePath, 'utf8');
+ const rewritten = sourceText
+ .replace(/[ \t]+$/gm, '')
+ .replace(/\n{3,}/g, '\n\n');
+ const builtAt = '2025-01-01T00:00:00.000Z';
+
+ const base = compileFpfSource({
+ sourcePath,
+ sourceHash: createHash('sha256').update(sourceText).digest('hex'),
+ builtAt,
+ sourceText,
+ });
+ const variant = compileFpfSource({
+ sourcePath,
+ sourceHash: createHash('sha256').update(rewritten).digest('hex'),
+ builtAt,
+ sourceText: rewritten,
+ });
+
+ expect(Object.keys(variant.snapshot.patternGraph.nodes)).toEqual(
+ Object.keys(base.snapshot.patternGraph.nodes),
+ );
+ expect(Object.keys(variant.snapshot.routeGraph.nodes)).toEqual(
+ Object.keys(base.snapshot.routeGraph.nodes),
+ );
+ });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/compiler-contracts.test.ts` around lines 38 - 82, The tests under
"Compiler / Parser stage" currently only validate the canonical snapshot
returned by getCompilerOutput(); add additional test cases that recompile
semantic-equivalent variants (whitespace-only, reordered sections, and
prose-only edits) and assert the same invariants
(parsedSections/parsedPatterns/parsedRoutes/parsedLexiconEntries counts,
compiledNodes IDs, pattern metadata for 'A.1.1', and valid anchors) hold for
each variant; implement helper functions to produce or load variant inputs, call
the same getCompilerOutput or a variant-taking compile function, and run the
existing assertions against each resulting snapshot to ensure parser resilience.
| it('status index keys partition compiled nodes without overlap', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const statusIndex = snapshot.indexes.statusIndex; | ||
|
|
||
| expect(Object.keys(statusIndex).length).toBeGreaterThan(0); | ||
|
|
||
| for (const [_status, nodeIds] of Object.entries(statusIndex)) { | ||
| for (const nodeId of nodeIds) { | ||
| expect(snapshot.compiledNodes[nodeId]).toBeDefined(); | ||
| } | ||
| } |
There was a problem hiding this comment.
This does not verify the “without overlap” part of the contract.
The loop only checks that every indexed ID exists in compiledNodes. If the same node is emitted under multiple status keys, this test still passes.
Suggested fix
expect(Object.keys(statusIndex).length).toBeGreaterThan(0);
- for (const [_status, nodeIds] of Object.entries(statusIndex)) {
+ const seen = new Map<string, string>();
+ for (const [status, nodeIds] of Object.entries(statusIndex)) {
for (const nodeId of nodeIds) {
expect(snapshot.compiledNodes[nodeId]).toBeDefined();
+ expect(seen.has(nodeId)).toBe(false);
+ seen.set(nodeId, status);
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it('status index keys partition compiled nodes without overlap', async () => { | |
| const { snapshot } = await getCompilerOutput(); | |
| const statusIndex = snapshot.indexes.statusIndex; | |
| expect(Object.keys(statusIndex).length).toBeGreaterThan(0); | |
| for (const [_status, nodeIds] of Object.entries(statusIndex)) { | |
| for (const nodeId of nodeIds) { | |
| expect(snapshot.compiledNodes[nodeId]).toBeDefined(); | |
| } | |
| } | |
| it('status index keys partition compiled nodes without overlap', async () => { | |
| const { snapshot } = await getCompilerOutput(); | |
| const statusIndex = snapshot.indexes.statusIndex; | |
| expect(Object.keys(statusIndex).length).toBeGreaterThan(0); | |
| const seen = new Map<string, string>(); | |
| for (const [status, nodeIds] of Object.entries(statusIndex)) { | |
| for (const nodeId of nodeIds) { | |
| expect(snapshot.compiledNodes[nodeId]).toBeDefined(); | |
| expect(seen.has(nodeId)).toBe(false); | |
| seen.set(nodeId, status); | |
| } | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/compiler-contracts.test.ts` around lines 193 - 203, The test currently
only checks that each nodeId in statusIndex exists in snapshot.compiledNodes but
doesn't ensure nodeIds are unique across status keys; update the test
(referencing statusIndex, snapshot, compiledNodes) to assert there is no overlap
by tracking seen node IDs (e.g., a Set) as you iterate all
Object.entries(statusIndex) and for each nodeId assert it's not already present
before adding it, and optionally assert that the total number of unique node IDs
equals the sum of all nodeIds lengths to catch duplicates.
There was a problem hiding this comment.
The test was renamed from "status index keys partition compiled nodes without overlap" to "status index entries resolve to existing compiled nodes" in 3239888, so the name now accurately describes what the assertions check (existence only).
The overlap assertion is intentionally omitted because the FPF status index legitimately assigns nodes to multiple statuses (e.g., a pattern can be both "stub" and "transitional"). Adding a uniqueness constraint would assert a property the data model doesn't guarantee.
There was a problem hiding this comment.
Skipped: comment is from another GitHub bot.
There was a problem hiding this comment.
The test was renamed from "status index keys partition compiled nodes without overlap" to "status index entries resolve to existing compiled nodes" in 3239888, so the name now accurately describes what the assertions check (existence only).
The overlap assertion is intentionally omitted because the FPF status index legitimately assigns nodes to multiple statuses (e.g., a pattern can be both "stub" and "transitional"). Adding a uniqueness constraint would assert a property the data model doesn't guarantee.
There was a problem hiding this comment.
Skipped: comment is from another GitHub bot.
| describe('Compiler / Validation stage', () => { | ||
| it('keeps missing required fields bounded', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const { validation } = snapshot; | ||
|
|
||
| // The FPF spec has a small number of patterns with incomplete metadata. | ||
| // The contract is that this stays bounded — a regression would spike it. | ||
| expect(validation.missingRequiredFields).toBeLessThan(25); | ||
| }); | ||
|
|
||
| it('counts a plausible number of index map nodes', async () => { | ||
| const { snapshot } = await getCompilerOutput(); | ||
| const { validation } = snapshot; | ||
|
|
||
| expect(validation.indexMapNodes).toBeGreaterThan(50); | ||
| expect(validation.indexMapNodes).toBe(Object.keys(snapshot.indexMap).length); | ||
| }); |
There was a problem hiding this comment.
The validator is never exercised with known-bad input.
Both checks run against the current production spec, so a regression that stops reporting missing fields or unresolved references can still go green here. This stage needs at least one deliberately malformed fixture and explicit findings assertions.
Suggested direction
+ it('reports specific findings for malformed input', () => {
+ const sourceText = `
+## A.1.1
+Status: draft
+
+References: Missing.Id
+`;
+ const output = compileFpfSource({
+ sourcePath: 'inline',
+ sourceHash: createHash('sha256').update(sourceText).digest('hex'),
+ builtAt: '2025-01-01T00:00:00.000Z',
+ sourceText,
+ });
+
+ expect(output.snapshot.validation.missingRequiredFields).toBeGreaterThan(0);
+ expect(output.snapshot.validation.unresolvedReferences.length).toBeGreaterThan(0);
+ });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/compiler-contracts.test.ts` around lines 220 - 236, The tests currently
only run against live production data (getCompilerOutput) so they won't detect
regressions that stop reporting issues; add a deterministic malformed fixture
and a unit test that feeds it through the same validation path and asserts
specific findings. Create a new test case (e.g., in the same suite 'Compiler /
Validation stage') that builds a small synthetic snapshot with intentionally
missing required fields and an unresolved reference, call the same validation
logic used by getCompilerOutput (reuse the functions that populate
snapshot.validation or invoke the validator directly), then assert exact
expected counts on validation.missingRequiredFields and
validation.unresolvedReferences (or indexMapNodes if applicable) to ensure the
validator flags the known-bad inputs. Ensure the test references
getCompilerOutput/validator entrypoint and the validation fields
missingRequiredFields and unresolvedReferences/indexMapNodes so future
regressions fail deterministically.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32398887bb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| sourceText: `${sourceText}\n<!-- change -->\n`, | ||
| }); | ||
|
|
||
| expect(first.snapshot.sourceHash).not.toBe(second.snapshot.sourceHash); |
There was a problem hiding this comment.
Assert source hash changes through runtime path
This check is tautological for the compiler contract: compileFpfSource stores the caller-provided sourceHash in the snapshot, so passing two already-different hashes guarantees this assertion passes even if source-text handling regresses. As written, the test validates the test’s own hash precomputation rather than any compiler behavior, so a real compiler regression could slip through with green CI.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — you're right, this assertion is tautological since compileFpfSource just stores the caller-provided sourceHash verbatim. The test validates the test's own hash precomputation, not any compiler behavior.
I'll fix this by verifying that the compiler actually produces structurally different output when the source text changes (e.g., different compiled node count or different searchable text), rather than just checking the pass-through hash.
ccd523a to
2fe8114
Compare
Code ReviewOverviewAdds two new test files — Strengths
Issues & nitsBrittle to spec-specific IDsexpect(trace.detected.ids).toContain('A.1.1');
expect(snapshot.patternGraph.nodes['A.1.1']).toBeDefined();Hardcoded IDs (
|
|
@venikman All three nits fixed in
Re: coverage gaps (heuristic seed rules, session cache persistence, IndexingView classification) — agreed, these belong in follow-up PRs. The current suite covers the six query stages and five compiler stages. Re: coupling with #31 — agreed on landing together. Stage names map 1:1 to #31's new files. |
venikman
left a comment
There was a problem hiding this comment.
One substantive issue before merge:
tests/query-contracts.test.ts is described as stage-local coverage, but the file still routes every assertion through new QueryEngine(...).trace() or .query(). That means a regression in ranking, frontier expansion, or projection can still fail the blocks labeled Normalizer or Seeder, so the PR does not yet provide true stage-local fault attribution for the query pipeline.
If the goal for #6 is real per-stage isolation, the query-side tests need a smaller seam than QueryEngine itself, for example by exposing and testing the individual query stages directly or by extracting stage helpers behind explicit contracts.
I validated the branch locally in an isolated snapshot and did not find a runtime regression beyond this review point.
|
Devin is archived and cannot be woken up. Please unarchive Devin if you want to continue using it. |
Closes #6 - compiler-contracts.test.ts: 5 describe blocks covering Parser resilience, Graph closure, Index round-trip, Validation coverage, and Snapshot determinism - query-contracts.test.ts: 7 describe blocks covering Normalizer, Seeder, Ranker, Frontier expansion, Projection stability, Synthesis isolation, and cross-cutting Trace determinism Each test targets a specific pipeline stage promise so that a failure pinpoints the broken stage rather than surfacing as a generic 'end-to-end answer is wrong.' Existing end-to-end tests remain untouched. Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
- Add total>0 guard before division in route resolution test (Devin Review) - Use hardcoded normalizer tokens for status term test (Codex) - Use frontierCandidates for route seeder and nonsense tests (Copilot) - Rename status index test to match actual assertions (CodeRabbit) - Fix hop metadata test to handle already-sufficient grounding - Use real FPF IDs (A.1.1, A.15, B.3) for hop metadata query Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
…ec-ID fixture comments - Replace 'xyzzy plugh' with '__FPFTEST_NONSENSE_999__' to avoid false failures if spec ever mentions those words - Strengthen Z.99 structural assertion: verify parsedSections grows (not just 'something grew'), with comment explaining why Z.99 appears as a section not a pattern node - Add canonical fixture ID comments to both test files explaining that A.1.1/A.15/B.3 are stable spec anchors and where to update if renamed Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
Address venikman's review: replace QueryEngine.trace()/query() with direct imports of normalizeQuery, seedCandidates, rankCandidates, expandGrounding, buildPatternAnswer, synthesizeAnswer, confidenceFromTrace, and gapsFromTrace. Each describe block now targets its stage function in isolation so a regression in one stage cannot masquerade as a failure in another. Co-Authored-By: Stanislau <nedbailov375426@gmail.com>
195ced1 to
13f7af7
Compare
|
Addressed in Before: Every test imported After: Each describe block imports and calls its stage function directly:
The
|
What
Add stage-local contract tests for both the compiler and query pipelines, so that a test failure pinpoints the specific broken stage rather than surfacing as a generic "end-to-end answer is wrong."
Closes #6
Why
Per FPF grounding (B.3 Trust & Assurance, A.15 Role–Method–Work Alignment, B.3.4 Evidence Decay): each pipeline stage needs independent evidence of correctness. The existing 6 test files (1,530 lines) are all end-to-end/integration tests — none isolate individual stages. Stage-local contract tests provide targeted evidence refresh rather than requiring full end-to-end reruns.
Type
feat— new capabilityfix— bug fixrefactor— code restructuringdocs— documentation onlychore— maintenance (deps, CI, cleanup)Changes
tests/compiler-contracts.test.ts— 5 describe blocks (~20 test cases):tests/query-contracts.test.ts— 7 describe blocks (~30 test cases), refactored to call stage modules directly (noQueryEngineimport):normalizeQuery()fromquery-normalizer.tsseedCandidates()fromcandidate-seeder.tsrankCandidates()fromcandidate-ranker.tsexpandGrounding()fromfrontier-expander.tsbuildPatternAnswer(),confidenceFromTrace(),gapsFromTrace()fromanswer-projector.tssynthesizeAnswer()fromsynthesis-adapter.tsQueryEngineEach describe block now provides true stage-local fault attribution: a regression in ranking cannot fail a test labeled "Normalizer."
Existing 6 end-to-end test files are untouched
Note: Rebased on main after PR #31 merge. Stage modules are now available as direct imports.
Validation
bun run lintpasses locallybun run checkpasses locallybun run testpasses locally (CI green)bun run buildsucceeds (if runtime/server code touched)bun run docs:buildsucceeds (if docs touched)Boundary check
FPF-spec.mdonly — no additional corpora addedsrc/mcp/tool-contracts.tsAgent metadata
Requested by: @venikman