feat(synthesize): kb.synthesize answer-mode retrieval over the review-gated KB #238
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| # feat(synthesize): `kb.synthesize` answer-mode retrieval over the review-gated KB | ||
|
|
||
| ## What changed | ||
|
|
||
| Adds `kb.synthesize` — an answer-mode counterpart to `kb.context`. Where | ||
| `kb.context` returns a *ranked list* of relevant items, `kb.synthesize` | ||
| answers a query in prose, but strictly from **approved (durable) claims**, | ||
| with an inline `[claim_id]` citation behind every sentence. | ||
|
|
||
| New surface, wired across all three transports that the capabilities test | ||
| keeps in sync: | ||
|
|
||
| - `src/vouch/synthesize.py` — `synthesize(store, *, query, depth=3, | ||
| max_chars=4000, llm=False)`. Walks `build_context_pack(... limit=depth)`, | ||
| keeps only `claim` items that resolve to a durable claim via | ||
| `store.get_claim`, and composes a deterministic answer: one short, | ||
| single-clause sentence per claim, each carrying at least one `[claim_id]` | ||
| citation. No sentence is emitted that isn't traceable to a claim id. | ||
| `max_chars` truncates by dropping trailing claims (never by cutting a | ||
| citation). Returns | ||
| `{"query", "answer", "claims", "gaps", "_meta": {"synthesis_confidence"}}`. | ||
| `gaps` lists the query's salient terms for which no approved claim was | ||
| found (and is the whole answer when nothing matched). `synthesis_confidence` | ||
| is `high` when every cited claim is `stable`, `medium` when any is | ||
| `working`/`actionable`, `low` when any is `contested`. `llm=True` raises | ||
| (reserved for an opt-in generative backend; deterministic synthesis is the | ||
| v1 default). | ||
| - `src/vouch/capabilities.py` — `kb.synthesize` appended to `METHODS`. | ||
| - `src/vouch/jsonl_server.py` — `_h_synthesize` handler + `HANDLERS` entry. | ||
| - `src/vouch/server.py` — `@mcp.tool() kb_synthesize(query, depth=3, | ||
| max_chars=4000)`. | ||
| - `src/vouch/cli.py` — `vouch synthesize "<query>" [--depth N] [--max-chars N]`. | ||
| - `CHANGELOG.md` — `### Added` bullet under `## [Unreleased]`. | ||
|
|
||
| ## Why / root cause | ||
|
|
||
| `kb.context` is a retrieval primitive: it ranks and budgets items but leaves | ||
| answer composition (and the discipline of *only* using approved knowledge) to | ||
| the caller. There was no first-class way to ask the KB a question and get a | ||
| prose answer whose every clause is provably backed by a reviewed claim, with | ||
| the uncovered parts of the question surfaced rather than silently dropped. | ||
| `kb.synthesize` fills that gap deterministically — citation-gated by | ||
| construction, so it cannot fabricate an unbacked sentence — and grades its own | ||
| confidence from the lifecycle status of the claims it actually cited. | ||
|
|
||
| ## Test plan | ||
|
|
||
| `tests/test_synthesize.py` covers: | ||
|
|
||
| - 3 approved `auth` claims → non-empty answer citing all 3 ids by `[id]`, | ||
| confidence `high`. | ||
| - A query the KB doesn't cover → `answer == ""`, `claims == []`, `gaps` | ||
| populated with the query's salient terms. | ||
| - Fuzz/traceability: every sentence in a non-empty answer carries at least one | ||
| `[id]` citation whose id is in `claims` and resolves via `store.get_claim`. | ||
| - `max_chars` drops trailing claims without cutting a citation | ||
| (citation count == cited-claim count). | ||
| - Confidence reflects claim status (`working` → medium, `contested` → low). | ||
| - `llm=True` raises the reserved-backend `ValueError`. | ||
| - `kb.synthesize` is in `capabilities().methods` and in the JSONL `HANDLERS`, | ||
| and is callable via `handle_request` end-to-end. | ||
|
|
||
| Verification gate (fresh venv, editable install of this worktree): | ||
|
|
||
| ``` | ||
| $ ./.venv/bin/ruff check src tests | ||
| All checks passed! | ||
|
|
||
| $ ./.venv/bin/mypy src | ||
| Success: no issues found in 30 source files | ||
|
|
||
| $ ./.venv/bin/python -m pytest -q | ||
| 94 passed, 6 skipped in 0.81s | ||
| ``` | ||
|
|
||
| (The 6 skips are pre-existing numpy/embedding-optional tests, unrelated to this | ||
| change.) | ||
|
|
||
| Closes #222 | ||
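The traceability property in the test plan (every sentence carries at least one `[id]` citation whose id is listed in `claims`) can be checked with standard-library Python alone. This is a minimal sketch, not the code in `tests/test_synthesize.py`: the helper name `uncited_sentences`, the sentence-splitting regex, and the sample ids are assumptions for illustration. It relies only on each synthesized sentence ending in `[<id>].` and on sentences being joined by whitespace.

```python
import re

# Assumption: sentences are separated by whitespace that follows a period.
_SENTENCE_BREAK = re.compile(r"(?<=\.)\s+")
# Matches one bracketed citation such as "[clm-1]" and captures the id.
_CITATION = re.compile(r"\[([^\[\]]+)\]")


def uncited_sentences(answer: str, claims: list[str]) -> list[str]:
    """Return sentences that lack a citation to an id listed in `claims`."""
    if not answer:
        return []
    known = set(claims)
    bad: list[str] = []
    for sentence in _SENTENCE_BREAK.split(answer):
        ids = _CITATION.findall(sentence)
        # A sentence fails if it cites nothing or cites an unknown id.
        if not ids or not all(i in known for i in ids):
            bad.append(sentence)
    return bad


answer = (
    "Tokens rotate every 24 hours [clm-1]. "
    "Sessions expire after idle timeout [clm-2]."
)
ok = uncited_sentences(answer, ["clm-1", "clm-2"])
missing = uncited_sentences(answer, ["clm-1"])
```

Here `ok` is empty, while `missing` contains the second sentence because `clm-2` is not in the supplied `claims` list. A real test would additionally resolve each id through `store.get_claim`, which this sketch does not attempt.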
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,140 @@ | ||
| """Answer-mode synthesis over the review-gated KB. | ||
|
|
||
| `kb.context` returns a *ranked list* of relevant items; `kb.synthesize` | ||
| answers a query in prose, but only from APPROVED (durable) claims, with an | ||
| inline `[claim_id]` citation behind every sentence. It never invents a | ||
| sentence that isn't traceable to a claim, reports the query topics it found | ||
| no claim for in an explicit `gaps` block, and grades its own confidence from | ||
| the lifecycle status of the claims it cited. | ||
|
|
||
| The synthesis is deterministic in v1 — there is no LLM in the loop. The | ||
| `llm` flag is reserved so the wire shape is stable when an opt-in generative | ||
| backend lands; passing `llm=True` raises rather than silently degrading. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| from typing import Any, Literal | ||
|
|
||
| from .context import build_context_pack | ||
| from .models import Claim, ClaimStatus | ||
| from .storage import ArtifactNotFoundError, KBStore | ||
|
|
||
| Confidence = Literal["high", "medium", "low"] | ||
|
|
||
| _STOPWORDS = frozenset( | ||
| { | ||
| "a", "an", "and", "are", "as", "at", "be", "by", "do", "does", "for", | ||
| "from", "how", "in", "into", "is", "it", "its", "of", "on", "or", | ||
| "the", "their", "them", "then", "there", "these", "this", "to", "was", | ||
| "were", "what", "when", "where", "which", "who", "why", "will", "with", | ||
| "you", "your", | ||
| } | ||
| ) | ||
|
|
||
|
|
||
| def _salient_terms(query: str) -> list[str]: | ||
| """Lowercased, de-duplicated, order-preserving content words of the query.""" | ||
| seen: set[str] = set() | ||
| terms: list[str] = [] | ||
| for raw in query.split(): | ||
| token = "".join(ch for ch in raw.lower() if ch.isalnum()) | ||
| if len(token) < 3 or token in _STOPWORDS or token in seen: | ||
| continue | ||
| seen.add(token) | ||
| terms.append(token) | ||
| return terms | ||
|
|
||
|
|
||
| def _clause(text: str) -> str: | ||
| """One short, single-clause rendering of a claim's text.""" | ||
| clause = text.strip().split("\n", 1)[0].strip() | ||
| for sep in (". ", "; ", " — ", " - "): | ||
| head = clause.split(sep, 1)[0] | ||
| if head: | ||
| clause = head | ||
| clause = clause.rstrip(".;,") | ||
| return clause | ||
|
|
||
|
|
||
| def _covers(term: str, *claims: Claim) -> bool: | ||
| return any(term in c.text.lower() for c in claims) | ||
|
|
||
|
|
||
| def _confidence(statuses: list[ClaimStatus]) -> Confidence: | ||
| if any(s == ClaimStatus.CONTESTED for s in statuses): | ||
| return "low" | ||
| if any(s in (ClaimStatus.WORKING, ClaimStatus.ACTIONABLE) for s in statuses): | ||
| return "medium" | ||
| if statuses and all(s == ClaimStatus.STABLE for s in statuses): | ||
| return "high" | ||
| return "medium" | ||
|
|
||
|
|
||
| def synthesize( | ||
| store: KBStore, | ||
| *, | ||
| query: str, | ||
| depth: int = 3, | ||
| max_chars: int = 4000, | ||
| llm: bool = False, | ||
| ) -> dict[str, Any]: | ||
| """Answer `query` from approved claims only, with inline citations. | ||
|
|
||
| Returns a dict with `query`, `answer` (citation-bearing prose, possibly | ||
| empty), `claims` (the cited claim ids), `gaps` (query topics no approved | ||
| claim covered) and `_meta.synthesis_confidence`. | ||
| """ | ||
| if llm: | ||
| raise ValueError( | ||
| "llm synthesis backend not configured; " | ||
| "deterministic synthesis is the default" | ||
| ) | ||
|
|
||
| pack = build_context_pack(store, query=query, limit=depth) | ||
| items = pack["items"] if isinstance(pack, dict) else pack.items | ||
|
|
||
| approved: list[Claim] = [] | ||
| seen_ids: set[str] = set() | ||
| for item in items: | ||
| if (item["type"] if isinstance(item, dict) else item.type) != "claim": | ||
| continue | ||
| cid = item["id"] if isinstance(item, dict) else item.id | ||
| if cid in seen_ids: | ||
| continue | ||
| try: | ||
| claim = store.get_claim(cid) | ||
| except ArtifactNotFoundError: | ||
| continue | ||
| seen_ids.add(cid) | ||
| approved.append(claim) | ||
|
|
||
| sentences: list[str] = [] | ||
| cited: list[str] = [] | ||
| statuses: list[ClaimStatus] = [] | ||
| used = 0 | ||
| for claim in approved: | ||
| sentence = f"{_clause(claim.text)} [{claim.id}]." | ||
| projected = used + len(sentence) + (1 if sentences else 0) | ||
| if projected > max_chars: | ||
| break | ||
| sentences.append(sentence) | ||
| cited.append(claim.id) | ||
| statuses.append(claim.status) | ||
| used = projected | ||
|
|
||
| answer = " ".join(sentences) | ||
| cited_claims = [c for c in approved if c.id in set(cited)] | ||
| gaps = [ | ||
| term | ||
| for term in _salient_terms(query) | ||
| if not (cited_claims and _covers(term, *cited_claims)) | ||
| ] | ||
Comment on lines +128 to +132

Compute `gaps` against all approved claims. Lines 128–132 currently check coverage only against `cited_claims`.

Suggested fix:

    - cited_claims = [c for c in approved if c.id in set(cited)]
      gaps = [
          term
          for term in _salient_terms(query)
    -     if not (cited_claims and _covers(term, *cited_claims))
    +     if not _covers(term, *approved)
      ]
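To make the review comment concrete, the following self-contained sketch compares the two ways of computing `gaps` when `max_chars` drops a trailing claim. `FakeClaim`, the sample ids and texts, and the function names `gaps_cited_only` and `gaps_all_approved` are stand-ins invented for illustration; only the substring test mirrors `_covers` from the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeClaim:  # hypothetical stand-in for vouch.models.Claim
    id: str
    text: str


def covers(term: str, *claims: FakeClaim) -> bool:
    # Same substring test as `_covers` in the diff.
    return any(term in c.text.lower() for c in claims)


def gaps_cited_only(terms, approved, cited):
    # Behaviour of the code under review: only cited claims count as coverage.
    cited_ids = set(cited)
    cited_claims = [c for c in approved if c.id in cited_ids]
    return [t for t in terms if not (cited_claims and covers(t, *cited_claims))]


def gaps_all_approved(terms, approved):
    # Behaviour after the suggested fix: any approved claim counts.
    return [t for t in terms if not covers(t, *approved)]


approved = [
    FakeClaim("clm-1", "Auth tokens rotate daily"),
    FakeClaim("clm-2", "Sessions expire after 30 minutes"),
]
terms = ["auth", "sessions", "billing"]
cited = ["clm-1"]  # pretend max_chars dropped clm-2 from the answer

before = gaps_cited_only(terms, approved, cited)
after = gaps_all_approved(terms, approved)
```

In this sketch `before` reports both `sessions` and `billing`, while `after` reports only `billing`. Which result is preferable depends on whether `gaps` is meant to describe the KB (as the PR description words it) or the possibly truncated answer.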
|
|
||
| return { | ||
| "query": query, | ||
| "answer": answer, | ||
| "claims": cited, | ||
| "gaps": gaps, | ||
| "_meta": {"synthesis_confidence": _confidence(statuses)}, | ||
| } | ||
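The precedence applied by `_confidence` can be exercised in isolation. The sketch below copies its branch order onto a stand-in enum; `Status` and its string values are assumptions that mirror only the four `ClaimStatus` member names used in the diff, not the real model.

```python
from enum import Enum


class Status(Enum):  # hypothetical stand-in for vouch.models.ClaimStatus
    STABLE = "stable"
    WORKING = "working"
    ACTIONABLE = "actionable"
    CONTESTED = "contested"


def confidence(statuses: list[Status]) -> str:
    # Same branch order as `_confidence` in the diff: contested wins,
    # then working/actionable, then all-stable.
    if any(s is Status.CONTESTED for s in statuses):
        return "low"
    if any(s in (Status.WORKING, Status.ACTIONABLE) for s in statuses):
        return "medium"
    if statuses and all(s is Status.STABLE for s in statuses):
        return "high"
    # Reached when the list is empty (or holds a status not handled above).
    return "medium"


all_stable = confidence([Status.STABLE, Status.STABLE])
mixed = confidence([Status.STABLE, Status.WORKING])
contested = confidence([Status.WORKING, Status.CONTESTED])
nothing_cited = confidence([])
```

Note the last case: with this branch order, an answer that cites no claims is graded `medium`, since none of the earlier branches match an empty list.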
Add a language tag to the fenced verification block.
Line 65 uses an unlabeled fenced block; markdownlint MD040 flags this.
Tools: markdownlint-cli2 (0.22.1)
[warning] 65-65: Fenced code blocks should have a language specified (MD040, fenced-code-language)
Source: Linters/SAST tools