Open
Labels
- area/routing: Routing engine: catalog, graph, router, cards
- complexity/complex: Cross-cutting, significant design or risk
- enhancement: New feature or request
- priority/medium: Medium priority (production readiness)
Description
Context
Issue #47 defines the Reranker protocol for rescoring routing candidates after initial retrieval. The default implementation will be NoOpReranker (pass-through). This issue adds an LLM-powered reranker that uses a language model to assess semantic relevance between the query and candidate tools.
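For orientation, the protocol and its pass-through default might look roughly like the sketch below. The actual definitions live in #47 and are not reproduced here, so everything in this block is an assumption: the `rerank` signature is borrowed from the Implementation Notes in this issue, and the `rerankers` dict is only a hypothetical stand-in for `EngineRegistry`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reranker(Protocol):
    """Assumed shape of the protocol from #47 (not confirmed by this issue)."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]: ...


class NoOpReranker:
    """Pass-through reranker: returns the candidates in their original order."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        return list(candidates)


# Hypothetical stand-in for EngineRegistry: a plain name -> class mapping.
rerankers: dict[str, type] = {"noop": NoOpReranker}
reranker = rerankers["noop"]()
```

Because the protocol is structural, `LLMReranker` would only need a matching `rerank` method to be usable wherever `NoOpReranker` is.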
Current state
- `Router.route()` scores candidates using `TfIdfScorer` (cosine similarity on tokenized text).
- TF-IDF is fast but misses semantic relevance: "schedule a meeting" scores low against "calendar event creation" because the vocabulary doesn't overlap.
- [routing] Add `contextweaver[retrieval]` extra with BM25 and fuzzy matching backends #55 adds BM25/fuzzy matching, which is still lexical. [routing] Add optional embedding-based retrieval backend for improved recall at scale #8 adds embedding retrieval, which is semantic but coarse.
- A reranker sits after retrieval and before navigation, using a more expensive model to reorder the top-k candidates by true relevance.
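The vocabulary gap described above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not contextweaver's `TfIdfScorer`: the whitespace tokenizer, the smoothed IDF formula, and the helper names `tfidf_vectors` and `cosine` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (whitespace tokens, smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens))
                * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
                for term, count in counts.items()
            }
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


query, tool, other = tfidf_vectors(
    ["schedule a meeting", "calendar event creation", "meeting notes summary"]
)
no_overlap = cosine(query, tool)     # query vs. the semantically right tool
some_overlap = cosine(query, other)  # query vs. a tool that shares "meeting"
```

Because the query and the calendar tool share no tokens, `no_overlap` is exactly zero, while the less relevant notes tool gets a positive score from the shared word "meeting". This is the kind of ordering a reranker is meant to correct.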
Why it matters
- Accuracy at scale — Cross-encoder and LLM-based reranking consistently outperform bi-encoder and lexical scoring in IR benchmarks (by 10-30% on nDCG).
- Pillar 3 — "Use an LM to better understand the relationship between tools" — reranking is the most targeted application: given this specific query, which 5 of these 20 candidates are truly relevant?
- Composable with existing scoring — Fits cleanly into the retrieve → rerank → navigate pipeline ([routing] Decompose routing into explicit pipeline stages (retrieve → rerank → navigate → pack) #56).
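For readers unfamiliar with the nDCG metric cited in the first bullet, the sketch below computes it for a ranked list of relevance labels. It assumes the linear-gain variant (gain equals the relevance label, discounted by `log2(rank + 1)`); the relevance labels are made up for illustration and are not benchmark results.

```python
import math


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(
        rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: list[float], k: int) -> float:
    """DCG of the top-k as ranked, divided by the DCG of the ideal top-k."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0


# Illustrative labels only: 1 = relevant tool, 0 = irrelevant tool.
before = ndcg([0, 1, 0, 1, 1], k=3)  # relevant tools scattered down the list
after = ndcg([1, 1, 1, 0, 0], k=3)   # relevant tools moved to the top
```

A reranker that moves the relevant candidates to the front raises nDCG@k even though the candidate set itself is unchanged.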
Acceptance Criteria
- `LLMReranker` class implementing `Reranker` protocol (from [routing] Add EngineRegistry with pluggable Retriever, Reranker, and ClusteringEngine protocols #47)
- Accepts `llm_fn: Callable[[str], str]`, so there is no dependency on any LLM provider
- Takes query + candidate list (id, score), asks LLM to reorder by relevance
- Prompt template: presents query + candidate descriptions, asks for ranked ordering with scores
- Parses structured LLM output into reordered `list[tuple[str, float]]`
- Graceful fallback: if LLM output is unparseable, returns original ordering
- Optional `top_k` parameter to limit how many candidates are sent to the LLM (cost control)
- Registers in `EngineRegistry` as `"llm"` reranker
- Unit tests with mock `llm_fn` (valid response, invalid response, empty candidates)
- No new runtime dependencies
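The prompt-template, parsing, and fallback criteria could be met along the lines of the sketch below. The issue does not fix an output format, so the JSON array of `{"id", "score"}` objects is an assumption, as are the function names `build_prompt` and `parse_response` and the `descriptions` lookup (the candidate list itself only carries ids and scores).

```python
import json


def build_prompt(
    query: str, candidates: list[tuple[str, float]], descriptions: dict[str, str]
) -> str:
    """Render the query and candidate descriptions, asking for a JSON ranking."""
    lines = [
        "Rank the tools below by relevance to the query.",
        'Reply with JSON only: [{"id": "<tool id>", "score": <number 0 to 1>}, ...]',
        f"Query: {query}",
        "Tools:",
    ]
    for tool_id, _ in candidates:
        lines.append(f"- {tool_id}: {descriptions.get(tool_id, '(no description)')}")
    return "\n".join(lines)


def parse_response(
    response: str, candidates: list[tuple[str, float]]
) -> list[tuple[str, float]]:
    """Parse the JSON ranking; return the original ordering if it is unusable."""
    known = {tool_id for tool_id, _ in candidates}
    try:
        items = json.loads(response)
        scored: dict[str, float] = {}
        for item in items:
            tool_id = item["id"]
            # Ignore ids the model invented and keep the first score per id.
            if tool_id in known and tool_id not in scored:
                scored[tool_id] = float(item["score"])
    except (ValueError, TypeError, KeyError):
        return list(candidates)
    if not scored:
        return list(candidates)
    ranked = sorted(scored.items(), key=lambda pair: pair[1], reverse=True)
    # Candidates the model omitted keep their original order and scores at the end.
    ranked += [(tool_id, s) for tool_id, s in candidates if tool_id not in scored]
    return ranked
```

Appending omitted candidates rather than dropping them is a design choice in this sketch; it mixes LLM scores with retrieval scores, so a real implementation may prefer to renormalize or to treat a partial answer as a parse failure.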
Implementation Notes
```python
from collections.abc import Callable


class LLMReranker:
    def __init__(
        self,
        llm_fn: Callable[[str], str],
        *,
        top_k: int = 20,
        fallback_on_error: bool = True,
    ) -> None:
        self.llm_fn = llm_fn
        self.top_k = top_k
        self.fallback_on_error = fallback_on_error

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        # Only the first top_k candidates are sent to the LLM (cost control).
        # As written, candidates beyond top_k are not part of the result.
        to_rerank = candidates[: self.top_k]
        prompt = self._build_prompt(query, to_rerank)
        response = self.llm_fn(prompt)
        return self._parse_response(response, to_rerank)
```

Files likely touched:
- `src/contextweaver/engines.py` (or new `src/contextweaver/extras/reranker_llm.py`)
- `tests/test_engines.py`
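The three unit-test cases from the acceptance criteria (valid response, invalid response, empty candidates) can be exercised with plain functions as the mock `llm_fn`. The sketch below uses a compact stand-in class, `SketchReranker`, rather than the real `LLMReranker`; its JSON reply format and its choice to append candidates beyond `top_k` unchanged are assumptions, not decisions recorded in this issue.

```python
import json
from collections.abc import Callable

Candidates = list[tuple[str, float]]


class SketchReranker:
    """Stand-in for LLMReranker; the JSON reply format is an assumption."""

    def __init__(self, llm_fn: Callable[[str], str], *, top_k: int = 20) -> None:
        self.llm_fn = llm_fn
        self.top_k = top_k

    def rerank(self, query: str, candidates: Candidates) -> Candidates:
        if not candidates:
            return []  # nothing to rank, so skip the LLM call entirely
        head, tail = candidates[: self.top_k], candidates[self.top_k :]
        prompt = f"Query: {query}\nTools: {[tool_id for tool_id, _ in head]}"
        try:
            reply = json.loads(self.llm_fn(prompt))
            scores = {item["id"]: float(item["score"]) for item in reply}
            ranked = sorted(
                ((tool_id, scores[tool_id]) for tool_id, _ in head),
                key=lambda pair: pair[1],
                reverse=True,
            )
        except (ValueError, TypeError, KeyError):
            return list(candidates)  # unparseable output: keep original ordering
        return ranked + tail


calls: list[str] = []  # records every prompt the valid mock receives


def valid_llm(prompt: str) -> str:
    calls.append(prompt)
    return '[{"id": "calendar.create", "score": 0.9}, {"id": "email.send", "score": 0.2}]'


def invalid_llm(prompt: str) -> str:
    return "Sorry, I cannot rank these."


candidates = [("email.send", 0.6), ("calendar.create", 0.4), ("files.list", 0.1)]

reranked = SketchReranker(valid_llm, top_k=2).rerank("schedule a meeting", candidates)
unchanged = SketchReranker(invalid_llm).rerank("schedule a meeting", candidates)
empty = SketchReranker(valid_llm).rerank("schedule a meeting", [])
```

Recording prompts in `calls` lets a test confirm that only the `top_k` head reaches the model and that an empty candidate list triggers no LLM call at all.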
Dependencies
- Requires [routing] Add EngineRegistry with pluggable Retriever, Reranker, and ClusteringEngine protocols #47: `Reranker` protocol and `EngineRegistry`
- Benefits from [routing] Decompose routing into explicit pipeline stages (retrieve → rerank → navigate → pack) #56: explicit pipeline stages make reranking a clean insertion point
- Complementary to [routing] Add `contextweaver[retrieval]` extra with BM25 and fuzzy matching backends #55 (BM25 retrieval) and [routing] Add optional embedding-based retrieval backend for improved recall at scale #8 (embedding retrieval): reranking refines their output