No reranker between Qdrant top-k and document-rag synthesis — chunk ordering is raw cosine

## Observation

`trustgraph/retrieval/document_rag/document_rag.py:78-136` does:

```python
results = await asyncio.gather(*[query_concept(v) for v in vectors])
# dedupe by chunk_id, fetch from Garage
# ... pass deduped chunks straight to document_prompt synthesis
```

There is **no reranker stage**. Qdrant returns raw cosine top-k; the results are deduplicated and passed straight to the synthesis LLM. No MMR, no diversity penalty, no token-budget cap, no cross-encoder rerank.

## Why this matters

Cosine top-k measures topical similarity (and the nearest-neighbour search behind it is approximate); it is not answer-aware. For executive-synthesis questions ("Who are X's main competitors?"), the top-3 cosine matches may all be the same paragraph rephrased, or all come from the same source document, when the answer needs **diversity across sources** to be trustworthy.
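One standard remedy for the "same paragraph rephrased" failure is Maximal Marginal Relevance (MMR), which trades relevance to the query against similarity to chunks already picked. The sketch below is a minimal, dependency-free illustration, not TrustGraph code: it assumes the candidate embeddings are available alongside the Qdrant hits, and `mmr_select` and `lambda_` are hypothetical names. Note that MMR penalises near-duplicate *content*; enforcing diversity across *source documents* would need a separate rule, such as a per-document cap.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(
    query_vec: Sequence[float],
    cand_vecs: Sequence[Sequence[float]],
    k: int,
    lambda_: float = 0.5,
) -> list[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    Each step picks the remaining candidate c maximising
        lambda_ * sim(query, c) - (1 - lambda_) * max_{s in selected} sim(c, s),
    so a near-duplicate of an already-selected chunk is penalised.
    lambda_ = 1.0 reduces to plain cosine ranking.
    """
    relevance = [cosine(query_vec, v) for v in cand_vecs]
    selected: list[int] = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with query `[1, 1, 0]` and candidates `A=[1, 0, 0]`, `A'=[1, 0.05, 0]` (a near-duplicate of A) and `B=[0, 1, 0]`, `mmr_select(..., k=2)` keeps `A'` and `B` and drops `A` as redundant, whereas plain cosine ranking scores `A` and `B` equally for the second slot.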

Issue [#878](https://github.com/trustgraph-ai/trustgraph/issues/878) (open, 2026-05-07) raises the cross-encoder reranking concern. This issue is the same concern, scoped specifically to the document-rag synthesis path (vs the general retrieval surface).

## Measured impact

In our Sizzl deployment, raising `--doc-limit` from 3 to 30 moved the rubric score by only +0.46 points, which suggests the additional chunks at the tail of the top-30 were *not* materially improving synthesis. A reranker that surfaces 10 diverse, high-relevance chunks out of 30 retrieved would likely beat 30 unreranked chunks on both quality and latency (fewer tokens to synthesize).
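The "keep 10 of 30" idea pairs naturally with the missing token-budget cap. Below is a minimal sketch, assuming each candidate already carries a reranker score; `ScoredChunk`, `select_for_synthesis`, `max_chunks` and `token_budget` are hypothetical names, and word count stands in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    chunk_id: str
    text: str
    score: float  # reranker relevance score; higher is better


def approx_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words. A real deployment would count
    # with the synthesis model's own tokenizer.
    return len(text.split())


def select_for_synthesis(
    chunks: list[ScoredChunk], max_chunks: int, token_budget: int
) -> list[ScoredChunk]:
    """Keep the highest-scoring chunks until the chunk cap or token budget is hit."""
    kept: list[ScoredChunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if len(kept) >= max_chunks:
            break
        cost = approx_tokens(chunk.text)
        if used + cost > token_budget:
            continue  # skip a chunk that would overflow; a smaller one may still fit
        kept.append(chunk)
        used += cost
    return kept
```

Skipping (rather than stopping at) an oversized chunk is a design choice: it lets a short, lower-ranked chunk fill the remaining budget instead of leaving it unused.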

## Proposal

Add a `reranker` stage between `get_docs()` and `document_prompt()` in `document_rag.py`. Pluggable design:

- Cohere Rerank (API)
- BGE-reranker (local)
- Cross-encoder (local, e.g. ms-marco-MiniLM)
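A pluggable design could hide every backend behind one small interface that returns a score per passage. The sketch below assumes such an interface (`Reranker`, `CrossEncoderReranker`, `KeywordOverlapReranker` and `top_n` are hypothetical names, not existing TrustGraph code). The cross-encoder adapter uses the `CrossEncoder` class from `sentence-transformers`, imported lazily so the module loads without it; BGE rerankers are also published as Hugging Face cross-encoder checkpoints and could plausibly load through the same adapter, but that is untested here. A Cohere backend would wrap the hosted rerank call behind the same `rerank` method and is not shown.

```python
from typing import Protocol, Sequence


class Reranker(Protocol):
    def rerank(self, query: str, passages: Sequence[str]) -> list[float]:
        """Return one relevance score per passage (higher = more relevant)."""
        ...


class CrossEncoderReranker:
    """Local cross-encoder backend via sentence-transformers."""

    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        from sentence_transformers import CrossEncoder  # lazy: optional dependency
        self.model = CrossEncoder(model_name)

    def rerank(self, query: str, passages: Sequence[str]) -> list[float]:
        # The model scores each (query, passage) pair jointly.
        scores = self.model.predict([(query, p) for p in passages])
        return [float(s) for s in scores]


class KeywordOverlapReranker:
    """Dependency-free stand-in for tests: fraction of query words in the passage."""

    def rerank(self, query: str, passages: Sequence[str]) -> list[float]:
        q = set(query.lower().split())
        return [len(q & set(p.lower().split())) / (len(q) or 1) for p in passages]


def top_n(reranker: Reranker, query: str, passages: Sequence[str], n: int) -> list[int]:
    """Indices of the n best passages according to the given backend."""
    scores = reranker.rerank(query, passages)
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return order[:n]
```

Because `Reranker` is a structural `Protocol`, any backend with a matching `rerank` method plugs in without inheriting from a base class, which keeps optional dependencies out of the core retrieval module.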

Insertion point: inside `get_docs()` after the Qdrant gather and before `fetch_chunk()` — rerank the chunk_id list, fetch fewer chunks, send leaner context to synthesis.
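The insertion point above can be sketched as follows. This is not the actual `document_rag.py` code: it assumes `query_concept` returns `(chunk_id, score)` pairs and that `fetch_chunk` returns chunk text, and `get_docs_reranked` and `select` are hypothetical names. One caveat: a cross-encoder scores (query, passage text) pairs, so reranking *IDs only* before `fetch_chunk()` works for score- or vector-based selection such as MMR; a cross-encoder would instead need the text of a wider candidate pool, either from the Qdrant payload (if stored there) or from a first fetch.

```python
import asyncio
from typing import Awaitable, Callable


async def get_docs_reranked(
    vectors: list[list[float]],
    query_concept: Callable[[list[float]], Awaitable[list[tuple[str, float]]]],
    fetch_chunk: Callable[[str], Awaitable[str]],
    select: Callable[[list[tuple[str, float]]], list[str]],
) -> list[str]:
    # 1. Fan out one vector query per concept, as the current code does.
    results = await asyncio.gather(*[query_concept(v) for v in vectors])

    # 2. Dedupe by chunk_id, keeping each chunk's best similarity score.
    best: dict[str, float] = {}
    for hits in results:
        for chunk_id, score in hits:
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score

    # 3. Rerank/select on IDs (and scores) before touching the chunk store.
    chosen = select(list(best.items()))

    # 4. Fetch only the chosen chunks, concurrently, preserving selected order.
    return list(await asyncio.gather(*[fetch_chunk(cid) for cid in chosen]))
```

Passing `select` as a callable keeps the fetch path unchanged while letting MMR, a per-source cap, or a plain top-n be swapped in.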

Estimated latency cost: <500ms for local cross-encoder, ~100-300ms for Cohere API.
Estimated quality lift: 5-15% on synthesis rubric metrics (anecdotal, varies by corpus).

## Related

- #878 — Cross-encoder reranking for graph_rag (the broader concern)
- TG also lacks MMR / diversity penalty on Qdrant top-k — same retrieval-quality issue family

## Stack

TrustGraph 2.3.21.
