[routing] Add LLM-powered Reranker implementation for semantic candidate scoring #154

## Context

Issue #47 defines the `Reranker` protocol for rescoring routing candidates after initial retrieval. The default implementation will be `NoOpReranker` (pass-through). This issue adds an LLM-powered reranker that uses a language model to assess semantic relevance between the query and candidate tools.
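
For orientation, the protocol from #47 presumably looks something like the sketch below. It is inferred from the `rerank` signature in the Implementation Notes, not copied from #47; the `runtime_checkable` decorator and the body of `NoOpReranker` are assumptions.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Reranker(Protocol):
    """Assumed shape of the #47 protocol: rescore (id, score) candidates."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]: ...


class NoOpReranker:
    """Pass-through default: returns the candidates unchanged."""

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        return list(candidates)


# Example tool ids are made up for illustration.
candidates = [("email.send", 0.40), ("calendar.create_event", 0.12)]
reranker: Reranker = NoOpReranker()
result = reranker.rerank("schedule a meeting", candidates)
```

Because the protocol is structural, `LLMReranker` would not need to inherit from anything; matching the `rerank` signature is enough.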

### Current state
- `Router.route()` scores candidates using `TfIdfScorer` (cosine similarity on tokenized text).
- TF-IDF is fast but misses semantic relevance: "schedule a meeting" scores low against "calendar event creation" because the vocabulary doesn't overlap.
- #55 adds BM25/fuzzy matching — still lexical. #8 adds embedding retrieval — semantic but coarse.
- A reranker sits *after* retrieval and *before* navigation, using a more expensive model to reorder the top-k candidates by true relevance.
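
The vocabulary-mismatch problem can be reproduced in a few lines. This is a simplified bag-of-words cosine, not the project's `TfIdfScorer`; IDF weighting rescales shared terms but cannot produce a non-zero score when no terms are shared. The third phrase ("cancel a meeting") is an invented counter-example.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity over raw token counts (no IDF weighting)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


query = "schedule a meeting"
# Semantically close, but no shared tokens: the lexical score is zero.
no_overlap = cosine(query, "calendar event creation")
# Opposite intent, but two shared tokens: the lexical score is higher.
some_overlap = cosine(query, "cancel a meeting")
```

A lexical scorer therefore ranks the wrong tool first here, which is the gap a reranker is meant to close.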

## Why it matters

- **Accuracy at scale**: Cross-encoder and LLM-based reranking generally outperform bi-encoder and lexical scoring on IR benchmarks; reported gains are often in the 10-30% nDCG range, depending on dataset and baseline.
- **Pillar 3** ("Use an LM to better understand the relationship between tools"): reranking is the most targeted application. Given *this specific query*, which 5 of these 20 candidates are truly relevant?
- **Composable with existing scoring**: Fits cleanly into the retrieve → rerank → navigate pipeline (#56).

## Acceptance Criteria

- [ ] `LLMReranker` class implementing `Reranker` protocol (from #47)
- [ ] Accepts `llm_fn: Callable[[str], str]` — no dependency on any LLM provider
- [ ] Takes query + candidate list (id, score), asks LLM to reorder by relevance
- [ ] Prompt template: presents query + candidate descriptions, asks for ranked ordering with scores
- [ ] Parses structured LLM output into reordered `list[tuple[str, float]]`
- [ ] Graceful fallback: if LLM output is unparseable, returns original ordering
- [ ] Optional `top_k` parameter to limit how many candidates are sent to the LLM (cost control)
- [ ] Registers in `EngineRegistry` as `"llm"` reranker
- [ ] Unit tests with mock `llm_fn` (valid response, invalid response, empty candidates)
- [ ] No new runtime dependencies
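
The prompt, parsing, and fallback criteria above could be met along these lines. This is a minimal sketch, not a settled design: the JSON reply format, the prompt wording, and the `descriptions` mapping are all assumptions. In particular, `rerank` only receives `(id, score)` pairs, so the real implementation needs some way to look up candidate descriptions; a plain dict stands in for that here.

```python
import json


def build_prompt(
    query: str,
    candidates: list[tuple[str, float]],
    descriptions: dict[str, str],  # hypothetical id -> description lookup
) -> str:
    """Render the query and candidate descriptions into a single prompt."""
    lines = [
        f"- {cid}: {descriptions.get(cid, '(no description)')}"
        for cid, _ in candidates
    ]
    return (
        "Rank the tools below by relevance to the query.\n"
        f"Query: {query}\n"
        "Tools:\n" + "\n".join(lines) + "\n"
        'Reply with JSON only: [{"id": "<tool id>", "score": <0.0-1.0>}, ...]'
    )


def parse_response(
    response: str, candidates: list[tuple[str, float]]
) -> list[tuple[str, float]]:
    """Parse the LLM reply; return the original ordering if it is unusable."""
    known = {cid for cid, _ in candidates}
    try:
        items = json.loads(response)
        scored = {
            str(item["id"]): float(item["score"])
            for item in items
            if item["id"] in known  # ignore ids the LLM made up
        }
    except (ValueError, TypeError, KeyError):
        return list(candidates)
    if not scored:
        return list(candidates)
    ranked = sorted(scored.items(), key=lambda pair: pair[1], reverse=True)
    # Candidates the LLM omitted keep their original score and relative order.
    missing = [(cid, s) for cid, s in candidates if cid not in scored]
    return ranked + missing
```

Using only the standard-library `json` module keeps the "no new runtime dependencies" criterion intact. One open question the sketch does not settle is whether LLM scores and retrieval scores should share a scale, since omitted candidates keep their retrieval score.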

## Implementation Notes

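```python
from typing import Callable


class LLMReranker:
    def __init__(
        self,
        llm_fn: Callable[[str], str],
        *,
        top_k: int = 20,
        fallback_on_error: bool = True,
    ) -> None:
        self.llm_fn = llm_fn
        self.top_k = top_k
        self.fallback_on_error = fallback_on_error

    def rerank(
        self, query: str, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        if not candidates:
            return []
        # Only the top_k candidates go to the LLM (cost control); the tail is
        # appended unchanged so no candidate is silently dropped.
        to_rerank, rest = candidates[: self.top_k], candidates[self.top_k :]
        try:
            prompt = self._build_prompt(query, to_rerank)
            response = self.llm_fn(prompt)
            return self._parse_response(response, to_rerank) + rest
        except Exception:
            if not self.fallback_on_error:
                raise
            return list(candidates)  # graceful fallback: original ordering

    # _build_prompt() and _parse_response() are still to be written.
```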

**Files likely touched:**
- `src/contextweaver/engines.py` (or new `src/contextweaver/extras/reranker_llm.py`)
- `tests/test_engines.py`
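
The three mock-`llm_fn` cases from the acceptance criteria could be laid out roughly as follows. `rerank_with_llm` is a hypothetical stand-in that exists only so the example runs on its own; real tests would import `LLMReranker` instead. The JSON reply format is likewise an assumption.

```python
import json
from typing import Callable

Candidates = list[tuple[str, float]]


def rerank_with_llm(
    llm_fn: Callable[[str], str], query: str, candidates: Candidates
) -> Candidates:
    """Stand-in for LLMReranker.rerank (hypothetical, for illustration only)."""
    if not candidates:
        return []
    ids = [cid for cid, _ in candidates]
    try:
        items = json.loads(llm_fn(f"Query: {query}\nCandidates: {ids}"))
        scored = [(str(i["id"]), float(i["score"])) for i in items if i["id"] in ids]
    except (ValueError, TypeError, KeyError):
        return list(candidates)
    return sorted(scored, key=lambda p: p[1], reverse=True) or list(candidates)


CANDIDATES = [("email.send", 0.4), ("calendar.create_event", 0.1)]


def test_valid_response_reorders() -> None:
    def llm_fn(prompt: str) -> str:
        return json.dumps(
            [
                {"id": "calendar.create_event", "score": 0.9},
                {"id": "email.send", "score": 0.2},
            ]
        )

    assert rerank_with_llm(llm_fn, "schedule a meeting", CANDIDATES) == [
        ("calendar.create_event", 0.9),
        ("email.send", 0.2),
    ]


def test_invalid_response_falls_back() -> None:
    result = rerank_with_llm(lambda prompt: "sorry, I can't", "q", CANDIDATES)
    assert result == CANDIDATES


def test_empty_candidates_skips_llm() -> None:
    calls: list[str] = []

    def llm_fn(prompt: str) -> str:
        calls.append(prompt)
        return "[]"

    assert rerank_with_llm(llm_fn, "q", []) == []
    assert calls == []  # no LLM call, so no cost, for an empty candidate list
```

Since `llm_fn` is a plain callable, the mocks are ordinary functions or lambdas and need no mocking library.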

## Dependencies

- **Requires #47** — `Reranker` protocol and `EngineRegistry`
- Benefits from #56 — explicit pipeline stages make reranking a clean insertion point
- Complementary to #55 (BM25 retrieval) and #8 (embedding retrieval) — reranking refines their output
