feat(synthesize): kb.synthesize answer-mode retrieval over the review-gated KB #238
…#222) Add deterministic, citation-gated synthesis over approved claims with an explicit gaps block and synthesis_confidence. Wired across CLI, MCP, and JSONL; capabilities lists kb.synthesize. Tests cover citation traceability and the no-coverage gaps path.
Important: review skipped. Auto reviews are disabled on base/target branches other than the default branch.
📝 Walkthrough

Adds `kb.synthesize` answer-mode retrieval.
Sequence Diagram

sequenceDiagram
participant CLI as vouch synthesize
participant MCP as kb_synthesize
participant JSONL as kb.synthesize handler
participant synthesize as synthesize module
participant KBStore
CLI->>synthesize: synthesize(store, query, depth, max_chars)
MCP->>synthesize: synthesize(_store(), query, depth, max_chars)
JSONL->>synthesize: synthesize(store, query, depth, max_chars, llm)
synthesize->>KBStore: context_pack(query, depth)
KBStore-->>synthesize: ranked claim items
synthesize->>KBStore: load claim artifacts
KBStore-->>synthesize: claim objects with lifecycle status
synthesize-->>CLI: query, answer with citations, claims, gaps, synthesis_confidence
synthesize-->>MCP: query, answer with citations, claims, gaps, synthesis_confidence
synthesize-->>JSONL: query, answer with citations, claims, gaps, synthesis_confidence
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@PR_BODY.md`:
- Line 65: Add a language tag to the fenced code block that contains the ruff
check command output. Change the opening fence from an unlabeled ``` to ```bash
to properly indicate the code block contains bash shell commands, which
satisfies the markdownlint MD040 requirement for labeled code blocks.
In `@src/vouch/synthesize.py`:
- Around line 128-132: The gaps computation currently only checks coverage
against cited_claims, which may exclude approved claims that were dropped due to
truncation. Modify the condition in the gaps list comprehension where _covers is
called to also check coverage against the approved claims in addition to
cited_claims, so that terms are not marked as gaps if they are covered by any
discovered approved claim, regardless of whether that claim was included in the
final cited_claims output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 492b7f02-ccad-4554-88cf-e152ff2d3ba3
📒 Files selected for processing (8)
- CHANGELOG.md
- PR_BODY.md
- src/vouch/capabilities.py
- src/vouch/cli.py
- src/vouch/jsonl_server.py
- src/vouch/server.py
- src/vouch/synthesize.py
- tests/test_synthesize.py
`PR_BODY.md` context: the line `Verification gate (fresh venv, editable install of this worktree):` is followed on line 65 by an unlabeled ```` ``` ```` fence.
Add a language tag to the fenced verification block.
Line 65 uses an unlabeled fenced block; markdownlint MD040 flags this.
📝 Suggested fix
-```
+```bash
$ ./.venv/bin/ruff check src tests
All checks passed!
...
-```
+```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 65-65: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Source: Linters/SAST tools
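The language tag belongs on the opening fence only; a closing fence never carries one. A simplified sketch of what MD040 checks, for illustration: `unlabeled_fences` is a hypothetical helper that handles backtick fences at column 0 and ignores tildes, indentation, and container blocks, so it is not markdownlint's actual implementation.

```python
import re
from typing import Optional

# Simplified: a run of 3+ backticks at column 0, optionally followed by an info string.
FENCE = re.compile(r"^(`{3,})(.*)$")


def unlabeled_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no info string."""
    flagged: list[int] = []
    open_fence: Optional[str] = None  # backtick run that opened the current block
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE.match(line)
        if not m:
            continue
        ticks, info = m.group(1), m.group(2).strip()
        if open_fence is None:
            # Opening fence: MD040 wants a language such as "bash" right after it.
            open_fence = ticks
            if not info:
                flagged.append(lineno)
        elif len(ticks) >= len(open_fence) and not info:
            # Closing fence: at least as many backticks, no info string.
            open_fence = None
    return flagged
```

Run on a copy of the verification block, the unlabeled version is flagged at its opening line, and the suggested `bash` version passes.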
Context (`src/vouch/synthesize.py`, lines 128–132):
gaps = [
    term
    for term in _salient_terms(query)
    if not (cited_claims and _covers(term, *cited_claims))
]
Compute gaps from discovered approved claims, not only emitted sentences.
Lines 128–132 currently check coverage only against `cited_claims` (the claims that fit within `max_chars`). If truncation drops a covering claim, `gaps` incorrectly reports missing knowledge even though an approved claim was found.
💡 Suggested fix
- cited_claims = [c for c in approved if c.id in set(cited)]
gaps = [
term
for term in _salient_terms(query)
- if not (cited_claims and _covers(term, *cited_claims))
+ if not _covers(term, *approved)
 ]
The test-branch merge into feat/222-synthesize left jsonl_server._load_cfg referencing yaml and sessions.session_end referencing salience without their imports, and an unsorted cli import block — ruff F821/I001 failed the CI lint step. Add import yaml to jsonl_server, add salience to the sessions package import, and sort the cli imports. ruff, mypy, and pytest all pass.
feat(synthesize): `kb.synthesize` answer-mode retrieval over the review-gated KB

What changed
Adds `kb.synthesize`, an answer-mode counterpart to `kb.context`. Where `kb.context` returns a ranked list of relevant items, `kb.synthesize` answers a query in prose, but strictly from approved (durable) claims, with an inline `[claim_id]` citation behind every sentence.

New surface, wired across all three transports that the capabilities test keeps in sync:
- `src/vouch/synthesize.py`: `synthesize(store, *, query, depth=3, max_chars=4000, llm=False)`. Walks `build_context_pack(... limit=depth)`, keeps only `claim` items that resolve to a durable claim via `store.get_claim`, and composes a deterministic answer: one short, single-clause sentence per claim, each carrying at least one `[claim_id]` citation. No sentence is emitted that isn't traceable to a claim id. `max_chars` truncates by dropping trailing claims (never by cutting a citation). Returns `{"query", "answer", "claims", "gaps", "_meta": {"synthesis_confidence"}}`. `gaps` lists the query's salient terms for which no approved claim was found (and is the whole answer when nothing matched). `synthesis_confidence` is `high` when every cited claim is `stable`, `medium` when any is `working`/`actionable`, and `low` when any is `contested`. `llm=True` raises (reserved for an opt-in generative backend; deterministic synthesis is the v1 default).
- `src/vouch/capabilities.py`: `kb.synthesize` appended to `METHODS`.
- `src/vouch/jsonl_server.py`: `_h_synthesize` handler + `HANDLERS` entry.
- `src/vouch/server.py`: `@mcp.tool() kb_synthesize(query, depth=3, max_chars=4000)`.
- `src/vouch/cli.py`: `vouch synthesize "<query>" [--depth N] [--max-chars N]`.
- `CHANGELOG.md`: `### Added` bullet under `## [Unreleased]`.

Why / root cause
`kb.context` is a retrieval primitive: it ranks and budgets items but leaves answer composition (and the discipline of only using approved knowledge) to the caller. There was no first-class way to ask the KB a question and get a prose answer whose every clause is provably backed by a reviewed claim, with the uncovered parts of the question surfaced rather than silently dropped.

`kb.synthesize` fills that gap deterministically. It is citation-gated by construction, so it cannot fabricate an unbacked sentence, and it grades its own confidence from the lifecycle status of the claims it actually cited.
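A minimal sketch of the composition rules described under What changed: one sentence per approved claim ending in `[claim_id]`, truncation that drops whole trailing sentences once `max_chars` would be exceeded, gaps computed from salient query terms, and confidence graded from lifecycle status. `Claim`, `grade`, `salient_terms`, and `synthesize_sketch` are hypothetical; the sentence template, the term heuristic, returning `claims` as a list of ids, and grading an empty citation set as `low` are all assumptions. The real module reads claims through `build_context_pack` and `store.get_claim` instead of taking a list.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    id: str
    text: str
    status: str  # lifecycle status: "stable", "working", "actionable", "contested"


def grade(cited: list[Claim]) -> str:
    """Map the lifecycle statuses of the cited claims to a confidence label."""
    statuses = {c.status for c in cited}
    # Assumption: an empty citation set grades as "low".
    if not statuses or "contested" in statuses:
        return "low"
    if statuses == {"stable"}:
        return "high"
    return "medium"  # any other mix, e.g. a working/actionable claim


def salient_terms(query: str) -> list[str]:
    # Stand-in heuristic: lowercase words longer than three characters.
    return [w for w in query.lower().split() if len(w) > 3]


def synthesize_sketch(query: str, approved: list[Claim], max_chars: int = 4000) -> dict:
    sentences: list[str] = []
    cited: list[Claim] = []
    used = 0
    for claim in approved:
        # Each sentence ends with its claim id, so every sentence is traceable.
        sentence = f"{claim.text.rstrip('.')} [{claim.id}]."
        extra = len(sentence) + (1 if sentences else 0)  # +1 for the joining space
        if used + extra > max_chars:
            break  # drop this claim and every trailing one; never cut a citation
        sentences.append(sentence)
        cited.append(claim)
        used += extra
    # Gaps are checked against all approved claims, not only the cited ones.
    gaps = [t for t in salient_terms(query)
            if not any(t in c.text.lower() for c in approved)]
    return {
        "query": query,
        "answer": " ".join(sentences),
        "claims": [c.id for c in cited],
        "gaps": gaps,
        "_meta": {"synthesis_confidence": grade(cited)},
    }


approved = [
    Claim("c1", "Tokens expire after 15 minutes.", "stable"),
    Claim("c2", "Refresh rotates keys.", "working"),
]
result = synthesize_sketch("token refresh", approved, max_chars=40)
print(result["answer"])  # Tokens expire after 15 minutes [c1].
print(result["_meta"])   # {'synthesis_confidence': 'high'}
```

With a 40-character budget the second sentence would overflow, so it is dropped whole and confidence is graded only from the claim that was actually cited. With no approved claims, `answer` is empty and every salient term lands in `gaps`.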
Test plan

`tests/test_synthesize.py` covers:

- … `auth` claims → non-empty answer citing all 3 ids by `[id]`, confidence `high`.
- … `answer == ""`, `claims == []`, `gaps` populated with the query's salient terms.
- … `[id]` citation whose id is in `claims` and resolves via `store.get_claim`.
- `max_chars` drops trailing claims without cutting a citation (citation count == cited-claim count).
- … `working` → medium, `contested` → low.
- `llm=True` raises the reserved-backend `ValueError`.
- `kb.synthesize` is in `capabilities().methods` and in the JSONL `HANDLERS`, and is callable via `handle_request` end-to-end.

Verification gate (fresh venv, editable install of this worktree):
(The 6 skips are pre-existing numpy/embedding-optional tests, unrelated to this
change.)
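The traceability and truncation items in the test plan reduce to a few assertions on the returned dict. A pytest-style sketch, assuming `claims` is a list of ids and that ids contain no brackets or whitespace; `check_traceable` is a hypothetical helper, and the `known_ids` set stands in for resolving each id through `store.get_claim`.

```python
import re

# Matches "[c1]"-style citations; assumes ids contain no brackets or whitespace.
CITATION = re.compile(r"\[([^\[\]\s]+)\]")


def check_traceable(result: dict, known_ids: set[str]) -> None:
    """Raise AssertionError if any citation is not backed by a returned, stored claim."""
    cited = CITATION.findall(result["answer"])
    claim_ids = set(result["claims"])
    for cid in cited:
        assert cid in claim_ids, f"{cid} cited but not listed in claims"
        assert cid in known_ids, f"{cid} does not resolve to a stored claim"
    # Truncation never cuts a citation: one citation per cited claim.
    assert len(cited) == len(result["claims"]), "citation count != cited-claim count"
```

An answer that cites an id missing from `claims`, or that lost a citation to truncation, fails these checks.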
Closes #222
Summary by CodeRabbit
New Features
- `vouch synthesize` CLI command and corresponding API/MCP interfaces.

Documentation