feat(α-8 Phase B): C_rag-ontology sector cell + pipeline_context typed-filter overlay #689
Merged
…d-filter overlay

α-8 implementation Phase B: wires the Phase A typed filter module into the production pipeline and adds the matrix runner cell for A/B measurement against C_rag-graph (post-α-7 baseline reference).

Design memo: docs/design/v0.4-alpha-8-ontology-typed-filter.md §2.1 + §4.

## Sector cell registry (scripts/qvt_ablation_matrix.py)

- New cell `C_rag-ontology` between C_rag-graph and C_rag-full: C_rag-graph + typed filter (R1-R5 evidence-of-absence preservation).
- C_rag-graph now sets `JAMES_DISABLE_TYPED_FILTER=1` explicitly so it remains the pre-α-8 graph baseline (Δ = C_rag-ontology − C_rag-graph measures the typed filter contribution in isolation).
- C_rag-full intentionally does NOT set the disable flag → typed filter ON in the full stack (production default once α-8 ships).
- Label dict updated with new entry.

## Pipeline overlay (core/reasoning/pipeline_context.py)

- Imports `apply_typed_filter` + `is_typed_filter_disabled` from α-8 Phase A module.
- `build_unified_context` now prepends typed entity summary BEFORE the existing `build_graph_context_str` output (byte-additive: original block preserved verbatim).
- Reads query from `loop_state["expanded_query"]`.
- Defensive: exception in typed filter falls through with empty prefix (= byte-identical to pre-α-8), engine._log records the error.
- When `JAMES_DISABLE_TYPED_FILTER=1` the prefix is skipped (= C_rag-graph byte-identical pre-α-8 path).

## Tests

- `tests/test_pipeline_typed_filter_overlay.py`, 5 new integration tests:
  * Flag disabled → no typed prefix (byte-identical)
  * Flag enabled → typed prefix prepended (additive)
  * Temporal query emits `[Date]: (none found in graph for this query)`
  * Person query emits `[Person]: Alice` row
  * Order check: typed prefix BEFORE existing graph block
- `tests/test_qvt_matrix_sector_cells.py`, updated:
  * Six → seven standard sector cells (α-8 adds C_rag-ontology)
  * New test: C_rag-graph sets JAMES_DISABLE_TYPED_FILTER=1
  * New test: C_rag-ontology omits JAMES_DISABLE_TYPED_FILTER (filter ON)

## Verification

- 137/137 tests pass (pipeline + ontology + sector + typed_filter)
- ruff F-class clean

## Default behavior

Default (no JAMES_DISABLE_TYPED_FILTER env var): typed filter ACTIVE.

This means after merge:

- Production /query/ path emits typed entity summary BEFORE graph context: additive change, original block preserved.
- LLM sees R1-R5 evidence-of-absence rows for query-relevant types with no entities surfaced.
- gemma4 grounding training should now detect "Date is empty → null query" the way it did pre-α-7 with 41-161 entity surface.

## Bench numbers — DEFERRED to Phase C

Phase C (next PR) runs the actual measurement: N=3 baseline at M_M + 5-tier remeasurement. The Quality Delta Card values will land with Phase C closure, NOT Phase B.

Per design memo §1.3 honest framing:

- ⭐⭐ partial if graded Δ ≥ +0.030
- ⭐ operational if Δ ≤ noise
- 5-axis cross-tab + per-question-type required (memo §4)

The C_rag-ontology cell registration in this PR is the **measurement instrument enablement**, not the measurement itself. Phase C runs the matrix, Phase D writes closure.

Quality delta: exempt (label: code — A/B oracle infrastructure; measurement deferred to Phase C per design memo §4)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
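The per-cell flag wiring described in the sector cell registry notes can be pictured roughly as follows. This is only a sketch: the cell names and the `JAMES_DISABLE_TYPED_FILTER` flag come from the commit message, while the dict layout and the helpers `typed_filter_enabled` and `cell_environment` are hypothetical and need not match `scripts/qvt_ablation_matrix.py`. Dropping an inherited flag from the parent environment is likewise an assumption made here so that "omits the flag" really means filter ON.

```python
# Hypothetical sketch: cell names and the env flag are taken from the PR text;
# the registry shape and helper names are illustrative assumptions.
FLAG = "JAMES_DISABLE_TYPED_FILTER"

SECTOR_CELL_ENV = {
    "C_rag-graph": {FLAG: "1"},  # pre-α-8 graph baseline: typed filter OFF
    "C_rag-ontology": {},        # graph + typed filter: flag omitted, filter ON
    "C_rag-full": {},            # full stack: flag omitted, filter ON
}


def typed_filter_enabled(cell: str) -> bool:
    """Return True when the cell does not set the disable flag to "1"."""
    return SECTOR_CELL_ENV[cell].get(FLAG) != "1"


def cell_environment(cell: str, base_env: dict) -> dict:
    """Build the environment for one cell run.

    Assumption: a flag inherited from the parent environment is dropped first,
    so only the cell's own overrides decide whether the filter is disabled.
    """
    env = {key: value for key, value in base_env.items() if key != FLAG}
    env.update(SECTOR_CELL_ENV[cell])
    return env
```

Under this layout the only difference between the `C_rag-graph` and `C_rag-ontology` environments is the disable flag, which is what lets Δ = C_rag-ontology − C_rag-graph isolate the typed filter contribution.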
Summary
α-8 Phase B — wires the Phase A typed filter module into the production pipeline and adds the matrix runner cell for A/B measurement.
Design memo: `docs/design/v0.4-alpha-8-ontology-typed-filter.md` §2.1 + §4.
Changes
Default behavior post-merge
`JAMES_DISABLE_TYPED_FILTER` unset → typed filter ACTIVE in /query/:
- `[ENTITIES BY TYPE]` summary is prepended BEFORE the existing graph context block
- Defensive: any exception in typed filter falls through with empty prefix → byte-identical to pre-α-8. Engine logs the error.
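The prepend-with-fallback behavior above could look something like the sketch below. It is not the code in `core/reasoning/pipeline_context.py`: `build_context_with_overlay` is a hypothetical name, the `typed_filter` callable stands in for the Phase A `apply_typed_filter` (whose real signature is not shown in this PR), and `is_typed_filter_disabled` is re-implemented here under the assumption that it simply checks for the value `1`.

```python
import logging
import os
from typing import Callable

log = logging.getLogger("typed_filter_overlay_sketch")


def is_typed_filter_disabled() -> bool:
    # Stand-in for the Phase A helper of the same name; the real check may differ.
    return os.environ.get("JAMES_DISABLE_TYPED_FILTER") == "1"


def build_context_with_overlay(
    query: str,
    graph_block: str,
    typed_filter: Callable[[str], str],
) -> str:
    """Prepend a typed entity summary to an unchanged graph context block."""
    if is_typed_filter_disabled():
        # Flag set: skip the prefix so the output is the graph block verbatim.
        return graph_block
    try:
        prefix = typed_filter(query)
    except Exception:
        # Defensive path: log the error and fall through with an empty prefix.
        log.exception("typed filter failed; using graph-only context")
        prefix = ""
    # Additive only: the original graph block is never modified.
    return prefix + graph_block
```

Because the graph block is only ever concatenated after the prefix, both the disabled path and the exception path return the original block untouched.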
Verification
```
$ python -m pytest tests/test_pipeline_typed_filter_overlay.py
5 passed in 0.05s
$ python -m pytest tests/ -k "pipeline or graph_typed_filter or ontology or sector or qvt_matrix"
137 passed, 3350 deselected
$ python -m ruff check core/reasoning/pipeline_context.py scripts/qvt_ablation_matrix.py tests/test_pipeline_typed_filter_overlay.py tests/test_qvt_matrix_sector_cells.py
All checks passed!
```
Bench numbers — DEFERRED to Phase C
Phase B is the measurement instrument enablement, not the measurement. Phase C (next PR) runs N=3 baseline at M_M + 5-tier remeasurement. Quality Delta Card values land with Phase C closure.
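Per design memo §1.3 honest framing tier:
- ⭐⭐ partial if graded Δ ≥ +0.030
- ⭐ operational if Δ ≤ noise
- 5-axis cross-tab + per-question-type required (memo §4)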
Out of scope (Phase C-D)
- Phase C: runs the matrix (N=3 baseline at M_M + 5-tier remeasurement)
- Phase D: writes closure
`Quality delta: exempt (label: code — A/B oracle infrastructure; measurement deferred to Phase C per design memo §4)`
🤖 Generated with Claude Code