feat(synthesize): kb.synthesize answer-mode retrieval over the review-gated KB #238

Merged
plind-junior merged 3 commits into vouchdev:test from dripsmvcp:feat/222-synthesize on Jun 17, 2026

@dripsmvcp dripsmvcp commented Jun 17, 2026

Contributor

feat(synthesize): kb.synthesize answer-mode retrieval over the review-gated KB

What changed

Adds kb.synthesize — an answer-mode counterpart to kb.context. Where
kb.context returns a ranked list of relevant items, kb.synthesize
answers a query in prose, but strictly from approved (durable) claims,
with an inline [claim_id] citation behind every sentence.

New surface, wired across all three transports that the capabilities test
keeps in sync:

  • src/vouch/synthesize.py: synthesize(store, *, query, depth=3, max_chars=4000, llm=False). Walks build_context_pack(... limit=depth),
    keeps only claim items that resolve to a durable claim via
    store.get_claim, and composes a deterministic answer: one short,
    single-clause sentence per claim, each carrying at least one [claim_id]
    citation. No sentence is emitted that isn't traceable to a claim id.
    max_chars truncates by dropping trailing claims (never by cutting a
    citation). Returns
    {"query", "answer", "claims", "gaps", "_meta": {"synthesis_confidence"}}.
    gaps lists the query's salient terms for which no approved claim was
    found (and is the whole answer when nothing matched). synthesis_confidence
    is high when every cited claim is stable, medium when any is
    working/actionable, low when any is contested. llm=True raises ValueError
    (reserved for an opt-in generative backend; deterministic synthesis is the
    v1 default).
  • src/vouch/capabilities.py: kb.synthesize appended to METHODS.
  • src/vouch/jsonl_server.py: _h_synthesize handler + HANDLERS entry.
  • src/vouch/server.py: @mcp.tool() kb_synthesize(query, depth=3, max_chars=4000).
  • src/vouch/cli.py: vouch synthesize "<query>" [--depth N] [--max-chars N].
  • CHANGELOG.md: ### Added bullet under ## [Unreleased].
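
The composition and truncation rules described for synthesize() can be illustrated with a minimal sketch. This is not the code in src/vouch/synthesize.py: the Claim dataclass, the compose_answer name, and the exact sentence format (citation placed before the final period) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical stand-in for an approved (durable) claim."""
    id: str
    text: str


def compose_answer(claims: list[Claim], max_chars: int = 4000) -> tuple[str, list[str]]:
    """Emit one cited sentence per claim; drop trailing claims that do not fit."""
    sentences: list[str] = []
    cited: list[str] = []
    used = 0
    for claim in claims:
        # The citation is part of the sentence, so a sentence is emitted
        # whole or not at all: truncation can never cut a citation.
        sentence = f"{claim.text.rstrip('. ')} [{claim.id}]."
        extra = len(sentence) + (1 if sentences else 0)  # +1 for the joining space
        if used + extra > max_chars:
            break  # drop this claim and every later one
        sentences.append(sentence)
        cited.append(claim.id)
        used += extra
    return " ".join(sentences), cited


claims = [
    Claim("c1", "Tokens expire after 15 minutes."),
    Claim("c2", "Refresh tokens rotate on use"),
    Claim("c3", "Sessions are revoked on password change."),
]
answer, cited = compose_answer(claims, max_chars=80)
print(answer)  # Tokens expire after 15 minutes [c1]. Refresh tokens rotate on use [c2].
print(cited)   # ['c1', 'c2']
```

With max_chars=80 the third claim is dropped entirely, so the number of citations in the answer still equals the number of cited claims.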

Why / root cause

kb.context is a retrieval primitive: it ranks and budgets items but leaves
answer composition (and the discipline of only using approved knowledge) to
the caller. There was no first-class way to ask the KB a question and get a
prose answer whose every clause is provably backed by a reviewed claim, with
the uncovered parts of the question surfaced rather than silently dropped.
kb.synthesize fills that gap deterministically — citation-gated by
construction, so it cannot fabricate an unbacked sentence — and grades its own
confidence from the lifecycle status of the claims it actually cited.
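
As a rough sketch of the grading rule, assuming that the weakest cited status wins and that "working" and "actionable" are two separate status values (the PR text does not spell out either point), the mapping might look like this. The grade_confidence name and the handling of an empty citation list are hypothetical.

```python
def grade_confidence(statuses: list[str]) -> str:
    """Map the lifecycle statuses of the cited claims to a confidence grade."""
    if not statuses:
        # Assumption: nothing cited means nothing to be confident about.
        # The PR description does not state the grade for this case.
        return "low"
    if "contested" in statuses:
        return "low"  # assumption: contested outranks working/actionable
    if any(s in ("working", "actionable") for s in statuses):
        return "medium"
    return "high"  # reached only when no weaker status was cited


print(grade_confidence(["stable", "stable"]))      # high
print(grade_confidence(["stable", "working"]))     # medium
print(grade_confidence(["working", "contested"]))  # low
```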

Test plan

tests/test_synthesize.py covers:

  • 3 approved auth claims → non-empty answer citing all 3 ids by [id],
    confidence high.
  • A query the KB doesn't cover → answer == "", claims == [], gaps
    populated with the query's salient terms.
  • Fuzz/traceability: every sentence in a non-empty answer carries at least one
    [id] citation whose id is in claims and resolves via store.get_claim.
  • max_chars drops trailing claims without cutting a citation
    (citation count == cited-claim count).
  • Confidence reflects claim status (working → medium, contested → low).
  • llm=True raises the reserved-backend ValueError.
  • kb.synthesize is in capabilities().methods and in the JSONL HANDLERS,
    and is callable via handle_request end-to-end.
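
The traceability property in the test plan can be checked with a small helper along these lines. It is a sketch, not the contents of tests/test_synthesize.py: the uncited_sentences name is hypothetical, and it assumes sentences end with a period followed by whitespace and that citations are written as [id].

```python
import re

CITATION = re.compile(r"\[([^\[\]]+)\]")


def uncited_sentences(answer: str, known_ids: set[str]) -> list[str]:
    """Return every sentence that lacks a citation to a known claim id."""
    # Assumption: a sentence boundary is a period followed by whitespace.
    sentences = [s for s in re.split(r"(?<=\.)\s+", answer.strip()) if s]
    bad = []
    for sentence in sentences:
        ids = CITATION.findall(sentence)
        # At least one cited id must belong to the returned claims.
        if not any(i in known_ids for i in ids):
            bad.append(sentence)
    return bad


good = "Tokens expire after 15 minutes [c1]. Refresh tokens rotate on use [c2]."
print(uncited_sentences(good, {"c1", "c2"}))  # []
print(uncited_sentences("Tokens expire [c1]. This one is unbacked.", {"c1"}))
# ['This one is unbacked.']
```

A real test would additionally resolve each id through store.get_claim, which this sketch replaces with a plain set lookup.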

Verification gate (fresh venv, editable install of this worktree):

$ ./.venv/bin/ruff check src tests
All checks passed!

$ ./.venv/bin/mypy src
Success: no issues found in 30 source files

$ ./.venv/bin/python -m pytest -q
94 passed, 6 skipped in 0.81s

(The 6 skips are pre-existing numpy/embedding-optional tests, unrelated to this
change.)

Closes #222

Summary by CodeRabbit

  • New Features

    • Answer synthesis capability now available over approved knowledge base claims with inline citations.
    • Gap reporting identifies uncovered query topics; confidence grading reflects claim stability.
    • New vouch synthesize CLI command and corresponding API/MCP interfaces.
  • Documentation

    • Updated changelog and release documentation for synthesis feature.

…#222)

Add deterministic, citation-gated synthesis over approved claims with an explicit gaps block and synthesis_confidence. Wired across CLI, MCP, and JSONL; capabilities lists kb.synthesize. Tests cover citation traceability and the no-coverage gaps path.

coderabbitai Bot commented Jun 17, 2026

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c50bcd2f-a8da-4388-a1cd-19dbce4cf5b9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds kb.synthesize, a new deterministic answer-mode retrieval feature. A new synthesize.py module builds citation-bearing prose exclusively from approved KB claims, reporting gaps for uncovered query terms and a confidence grade. The feature is wired into the capabilities list, JSONL server, MCP server, and CLI, and validated by a new test suite.

Changes

kb.synthesize answer-mode retrieval

| Layer / File(s) | Summary |
| --- | --- |
| Core synthesis implementation (src/vouch/synthesize.py) | New module with _salient_terms, _clause, _covers, _confidence helpers and the top-level synthesize() function. Builds citation-bearing prose from approved claims, stops at max_chars, computes gaps from uncovered query terms, and assigns synthesis_confidence from claim lifecycle statuses. Raises ValueError for llm=True. |
| Transport wiring (src/vouch/capabilities.py, src/vouch/jsonl_server.py, src/vouch/server.py, src/vouch/cli.py) | Adds "kb.synthesize" to METHODS; introduces _h_synthesize handler and HANDLERS["kb.synthesize"] in the JSONL server; adds kb_synthesize MCP tool in server.py; registers vouch synthesize Click command with --depth and --max-chars options in cli.py. |
| Tests and documentation (tests/test_synthesize.py, CHANGELOG.md, PR_BODY.md) | Full test suite covering citation completeness, gap reporting, per-sentence citation traceability, max_chars truncation, confidence grading, llm=True error, capabilities registration, and the JSONL handler. Changelog and PR body document the feature, output shape, and transport surfaces. |

Sequence Diagram

sequenceDiagram
  participant CLI as vouch synthesize
  participant MCP as kb_synthesize
  participant JSONL as kb.synthesize handler
  participant synthesize as synthesize module
  participant KBStore

  CLI->>synthesize: synthesize(store, query, depth, max_chars)
  MCP->>synthesize: synthesize(_store(), query, depth, max_chars)
  JSONL->>synthesize: synthesize(store, query, depth, max_chars, llm)
  synthesize->>KBStore: context_pack(query, depth)
  KBStore-->>synthesize: ranked claim items
  synthesize->>KBStore: load claim artifacts
  KBStore-->>synthesize: claim objects with lifecycle status
  synthesize-->>CLI: query, answer with citations, claims, gaps, synthesis_confidence
  synthesize-->>MCP: query, answer with citations, claims, gaps, synthesis_confidence
  synthesize-->>JSONL: query, answer with citations, claims, gaps, synthesis_confidence

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A query arrives, the burrow hums with thought,
Only approved claims — the freshest that were caught!
Each sentence wears a [claim_id] badge with pride,
And gaps are named for topics that weren't inside.
No hallucination here, just carrots, crisp and true —
synthesis_confidence: high 🥕 for me and you!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing kb.synthesize for answer-mode retrieval over the review-gated KB, which is the core feature of this PR. |
| Linked Issues check | ✅ Passed | The PR implementation addresses all core objectives from #222: deterministic answer-mode synthesis, inline [claim_id] citations, gaps reporting, synthesis_confidence grading, integration across CLI/MCP/JSONL transports, and llm=True error handling. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to #222: new synthesize module, integration points (CLI/MCP/JSONL/capabilities), tests, and documentation, with no extraneous changes detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@PR_BODY.md`:
- Line 65: Add a language tag to the fenced code block that contains the ruff
check command output. Change the opening fence from an unlabeled ``` to ```bash
to properly indicate the code block contains bash shell commands, which
satisfies the markdownlint MD040 requirement for labeled code blocks.

In `@src/vouch/synthesize.py`:
- Around line 128-132: The gaps computation currently only checks coverage
against cited_claims, which may exclude approved claims that were dropped due to
truncation. Modify the condition in the gaps list comprehension where _covers is
called to also check coverage against the approved claims in addition to
cited_claims, so that terms are not marked as gaps if they are covered by any
discovered approved claim, regardless of whether that claim was included in the
final cited_claims output.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 492b7f02-ccad-4554-88cf-e152ff2d3ba3

📥 Commits

Reviewing files that changed from the base of the PR and between 3beb821 and 07d8376.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • PR_BODY.md
  • src/vouch/capabilities.py
  • src/vouch/cli.py
  • src/vouch/jsonl_server.py
  • src/vouch/server.py
  • src/vouch/synthesize.py
  • tests/test_synthesize.py

Comment thread PR_BODY.md

Verification gate (fresh venv, editable install of this worktree):

```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced verification block.

Line 65 uses an unlabeled fenced block; markdownlint MD040 flags this.

📝 Suggested fix
-```
+```bash
$ ./.venv/bin/ruff check src tests
All checks passed!
...
-```
+```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 65-65: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@PR_BODY.md` at line 65, Add a language tag to the fenced code block that
contains the ruff check command output. Change the opening fence from an
unlabeled ``` to ```bash to properly indicate the code block contains bash shell
commands, which satisfies the markdownlint MD040 requirement for labeled code
blocks.

Source: Linters/SAST tools

Comment thread src/vouch/synthesize.py
Comment on lines +128 to +132
    gaps = [
        term
        for term in _salient_terms(query)
        if not (cited_claims and _covers(term, *cited_claims))
    ]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Compute gaps from discovered approved claims, not only emitted sentences.

Lines 128–132 currently check coverage only against cited_claims (claims that fit max_chars). If truncation drops a covering claim, gaps incorrectly reports missing knowledge even though an approved claim was found.

💡 Suggested fix
-    cited_claims = [c for c in approved if c.id in set(cited)]
     gaps = [
         term
         for term in _salient_terms(query)
-        if not (cited_claims and _covers(term, *cited_claims))
+        if not _covers(term, *approved)
     ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/synthesize.py` around lines 128 - 132, The gaps computation
currently only checks coverage against cited_claims, which may exclude approved
claims that were dropped due to truncation. Modify the condition in the gaps
list comprehension where _covers is called to also check coverage against the
approved claims in addition to cited_claims, so that terms are not marked as
gaps if they are covered by any discovered approved claim, regardless of whether
that claim was included in the final cited_claims output.
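
The effect of this finding can be reproduced with a small sketch. The Claim dataclass, the covers() helper (a case-insensitive substring match standing in for _covers), and the gaps_for() function are hypothetical; the real helpers in src/vouch/synthesize.py may match terms differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical stand-in for an approved claim."""
    id: str
    text: str


def covers(term: str, *claims: Claim) -> bool:
    """Assumed coverage rule: the term appears in at least one claim's text."""
    return any(term.lower() in c.text.lower() for c in claims)


def gaps_for(terms: list[str], claims: list[Claim]) -> list[str]:
    """Terms that none of the given claims cover."""
    return [t for t in terms if not covers(t, *claims)]


approved = [
    Claim("c1", "Tokens expire after 15 minutes"),
    Claim("c2", "Refresh tokens rotate on use"),
]
cited = approved[:1]  # pretend max_chars truncation dropped c2
terms = ["tokens", "refresh", "oauth"]

print(gaps_for(terms, cited))     # ['refresh', 'oauth']  (false gap: 'refresh')
print(gaps_for(terms, approved))  # ['oauth']
```

Because covers() returns False when called with no claims, every term becomes a gap in the nothing-matched case without a separate emptiness guard, which is consistent with the guard being dropped in the suggested fix.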

@plind-junior plind-junior changed the base branch from main to test June 17, 2026 04:48
plind-junior and others added 2 commits June 16, 2026 21:48
The test-branch merge into feat/222-synthesize left jsonl_server._load_cfg referencing yaml and sessions.session_end referencing salience without their imports, and an unsorted cli import block — ruff F821/I001 failed the CI lint step. Add import yaml to jsonl_server, add salience to the sessions package import, and sort the cli imports. ruff, mypy, and pytest all pass.
@plind-junior plind-junior merged commit a661bd2 into vouchdev:test Jun 17, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

feat: kb.synthesize — answer-mode retrieval over the review-gated KB