[routing] Add LLM-powered Reranker implementation for semantic candidate scoring #154

@dgenio

Context

Issue #47 defines the Reranker protocol for rescoring routing candidates after initial retrieval. The default implementation will be NoOpReranker (pass-through). This issue adds an LLM-powered reranker that uses a language model to assess semantic relevance between the query and candidate tools.
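
For orientation, the protocol and its pass-through default might look roughly like the sketch below. The exact definition lives in #47 and is not reproduced here, so the `rerank` signature is an assumption borrowed from the Implementation Notes in this issue, and the `runtime_checkable` decorator is purely illustrative.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reranker(Protocol):
    """Assumed shape of the protocol from #47 (not confirmed by that issue)."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]: ...


class NoOpReranker:
    """Pass-through reranker: returns the candidates in their original order."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        return list(candidates)


candidates = [("tool.search", 0.9), ("tool.fetch", 0.4)]
reranker = NoOpReranker()
assert isinstance(reranker, Reranker)  # structural check only
assert reranker.rerank("find docs", candidates) == candidates
```

Any class with a matching `rerank` method satisfies the protocol structurally, which is what would let an LLM-powered implementation slot in without inheriting from a base class.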

Current state

Why it matters

  • Accuracy at scale: cross-encoder and LLM-based reranking consistently outperform bi-encoder and lexical scoring in IR benchmarks (by 10-30% on nDCG).
  • Pillar 3 ("Use an LM to better understand the relationship between tools"): reranking is the most targeted application. Given this specific query, which 5 of these 20 candidates are truly relevant?
  • Composable with existing scoring: fits cleanly into the retrieve → rerank → navigate pipeline (see #56, "[routing] Decompose routing into explicit pipeline stages (retrieve → rerank → navigate → pack)").

Acceptance Criteria

  • LLMReranker class implementing the Reranker protocol (from #47, "[routing] Add EngineRegistry with pluggable Retriever, Reranker, and ClusteringEngine protocols")
  • Accepts llm_fn: Callable[[str], str] — no dependency on any LLM provider
  • Takes query + candidate list (id, score), asks LLM to reorder by relevance
  • Prompt template: presents query + candidate descriptions, asks for ranked ordering with scores
  • Parses structured LLM output into reordered list[tuple[str, float]]
  • Graceful fallback: if LLM output is unparseable, returns original ordering
  • Optional top_k parameter to limit how many candidates are sent to the LLM (cost control)
  • Registers in EngineRegistry as "llm" reranker
  • Unit tests with mock llm_fn (valid response, invalid response, empty candidates)
  • No new runtime dependencies
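
The prompt-template, parsing, and fallback criteria above could be sketched as follows. Everything here is an assumption made for illustration: the JSON reply format, the `descriptions` lookup (candidates carry only an id and a score, so descriptions have to come from somewhere), and the function names `build_prompt` and `parse_response`. It uses only the standard library, in line with the "no new runtime dependencies" criterion.

```python
import json


def build_prompt(
    query: str, candidates: list[tuple[str, float]], descriptions: dict[str, str]
) -> str:
    """Render the query and candidate descriptions into a ranking prompt."""
    lines = [
        "Rank the candidate tools by relevance to the query.",
        'Reply with only a JSON array like [{"id": "<tool id>", "score": 0.0}],',
        "most relevant first, with scores between 0 and 1.",
        "",
        f"Query: {query}",
        "Candidates:",
    ]
    for cand_id, _score in candidates:
        lines.append(f"- {cand_id}: {descriptions.get(cand_id, '(no description)')}")
    return "\n".join(lines)


def parse_response(
    response: str, candidates: list[tuple[str, float]]
) -> list[tuple[str, float]]:
    """Parse the JSON reply; return the original ordering if it is unusable."""
    known = {cand_id for cand_id, _ in candidates}
    try:
        items = json.loads(response)
        ranked = [(str(item["id"]), float(item["score"])) for item in items]
    except (ValueError, TypeError, KeyError):
        return list(candidates)  # unparseable output: keep the original order
    ids = [cand_id for cand_id, _ in ranked]
    # Reject empty replies, duplicate ids, and ids that were never sent.
    if not ranked or len(set(ids)) != len(ids) or not set(ids) <= known:
        return list(candidates)
    # Candidates the LLM omitted keep their original relative order at the end.
    missing = [c for c in candidates if c[0] not in set(ids)]
    return ranked + missing
```

One design question this leaves open is score scale: omitted candidates retain their retrieval scores, which may not be comparable to the LLM-assigned ones, so a real implementation may want to normalize or document that.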

Implementation Notes

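from typing import Callable


class LLMReranker:
    def __init__(
        self,
        llm_fn: Callable[[str], str],
        *,
        top_k: int = 20,
        fallback_on_error: bool = True,
    ) -> None:
        self.llm_fn = llm_fn
        self.top_k = top_k
        self.fallback_on_error = fallback_on_error

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        # Only send the top_k candidates to the LLM (cost control).
        # Assumption: the remaining candidates are appended unchanged, not dropped.
        to_rerank, rest = candidates[: self.top_k], candidates[self.top_k :]
        if not to_rerank:
            return list(candidates)
        try:
            prompt = self._build_prompt(query, to_rerank)
            response = self.llm_fn(prompt)
            reranked = self._parse_response(response, to_rerank)
        except Exception:
            if not self.fallback_on_error:
                raise
            reranked = to_rerank  # graceful fallback: original ordering
        return reranked + rest

    # _build_prompt and _parse_response are still to be implemented.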

Files likely touched:

  • src/contextweaver/engines.py (or new src/contextweaver/extras/reranker_llm.py)
  • tests/test_engines.py
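
The three unit-test scenarios named in the acceptance criteria (valid response, invalid response, empty candidates) could be exercised with plain functions as mock `llm_fn` values. The sketch below is self-contained, so it uses a simplified stand-in, `rerank_sketch`, rather than the real `LLMReranker`. The stand-in, its JSON reply format, and the mock names are all hypothetical.

```python
import json
from typing import Callable

Candidates = list[tuple[str, float]]


def rerank_sketch(
    llm_fn: Callable[[str], str], query: str, candidates: Candidates, top_k: int = 20
) -> Candidates:
    """Simplified stand-in for LLMReranker.rerank, used only to illustrate tests."""
    head, tail = candidates[:top_k], candidates[top_k:]
    if not head:
        return []  # nothing to rerank, so the LLM is never called
    prompt = f"Query: {query}\nCandidates: {[cid for cid, _ in head]}"
    try:
        items = json.loads(llm_fn(prompt))
        ranked = [(str(i["id"]), float(i["score"])) for i in items]
        if sorted(cid for cid, _ in ranked) != sorted(cid for cid, _ in head):
            raise ValueError("response does not cover exactly the candidates sent")
    except Exception:
        ranked = head  # graceful fallback: original ordering
    return ranked + tail


def valid_llm(prompt: str) -> str:
    return '[{"id": "b", "score": 0.9}, {"id": "a", "score": 0.2}]'


def invalid_llm(prompt: str) -> str:
    return "Sorry, I cannot rank these."


def failing_llm(prompt: str) -> str:
    raise AssertionError("llm_fn must not be called for empty input")


cands = [("a", 0.8), ("b", 0.6), ("c", 0.1)]
# Valid response: the top 2 are reordered, the third is appended untouched.
assert rerank_sketch(valid_llm, "q", cands, top_k=2) == [("b", 0.9), ("a", 0.2), ("c", 0.1)]
# Invalid response: the original ordering comes back.
assert rerank_sketch(invalid_llm, "q", cands) == cands
# Empty candidates: empty result, and no LLM call is made.
assert rerank_sketch(failing_llm, "q", []) == []
```

Because `llm_fn` is just a `Callable[[str], str]`, no mocking library is needed. Each scenario is a small function that returns, or refuses to return, a canned string.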

Dependencies

Metadata

    Labels

    area/routing (Routing engine: catalog, graph, router, cards), complexity/complex (Cross-cutting, significant design or risk), enhancement (New feature or request), priority/medium (Medium priority, production readiness)
