
Feature Request: strengthen cross-language recall with explicit multilingual rerank/fusion support #130

@furedericca-lab

Description


Problem / Motivation

Cross-language recall currently feels weaker than the README/config surface suggests.

After reviewing the current code path and runtime config, I found two related gaps that together make multilingual retrieval harder to tune and less predictable:

  1. BM25 only indexes the text column, so cross-language lexical recall depends on authoring conventions inside prose.
    In practice, Chinese -> English recall improves only when entries manually include bilingual anchors such as Keywords (zh): .... There is no first-class structured bilingual keyword field or query expansion path.

  2. vectorWeight / bm25Weight are exposed in config but are not actually used in fusion.
    This removes an obvious tuning lever for multilingual retrieval, where users often need to bias the system toward vector evidence when BM25 has little or no same-language overlap.

These two issues are tightly connected: when lexical overlap is weak across languages, retrieval quality depends more on vector quality, but today neither the user-facing tuning knobs nor the lexical layer fully supports that workflow.
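For concreteness, this is roughly the tuning surface the config already advertises. The exact nesting is an assumption on my part; the key names `retrieval.vectorWeight` / `retrieval.bm25Weight` are the ones exposed in the current config/types:

```json
{
  "retrieval": {
    "vectorWeight": 0.7,
    "bm25Weight": 0.3
  }
}
```

Setting these today has no effect on ranking, which is the core of gap 2.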

Proposed Solution

I would like memory-lancedb-pro to make cross-language retrieval a first-class, explicitly supported path.

The cleanest direction is to treat this as two complementary layers instead of a single heuristic patch:

1. Memory-side bilingual keywords / aliases (preferred long-term contract)

Add a first-class bilingual keywords / alias mechanism for stored memories.

Concretely, this means a memory should be able to carry structured bilingual lookup terms in addition to its main text, for example via a dedicated keywords / aliases field. BM25 / FTS should be able to index and search those lookup terms alongside text.

This is different from query expansion:

  • memory-side aliases answer "what other names or bilingual terms can this memory be found by?"
  • they are attached to the memory itself, not guessed at query time

Why this matters:

  • this makes cross-language lexical recall a native data capability, not just a writing convention
  • it is more stable and controllable than manually appending Keywords (zh): ... to the main memory text
  • it keeps the body text human-readable while still allowing Chinese <-> English lexical matches
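A minimal sketch of what the memory-side contract could look like. The `keywords` field name, the `MemoryRecord` shape, and the `ftsDocument` helper are all illustrative, not existing project API; the only point is that BM25/FTS indexes the structured aliases together with `text` while the body stays clean:

```typescript
// Hypothetical shape: a stored memory carries structured bilingual
// lookup terms alongside its main text. Field names are illustrative.
interface MemoryRecord {
  id: string;
  text: string;       // human-readable body, kept free of keyword stuffing
  keywords: string[]; // bilingual aliases attached to the memory itself
}

// Build the document string that BM25/FTS would index: the body text
// plus the structured aliases, so Chinese <-> English terms match
// lexically without being written into the prose.
function ftsDocument(mem: MemoryRecord): string {
  return [mem.text, ...mem.keywords].join(" ");
}

const mem: MemoryRecord = {
  id: "m1",
  text: "Notes on hybrid retrieval with LanceDB.",
  keywords: ["混合检索", "hybrid retrieval", "重排序", "rerank"],
};

console.log(ftsDocument(mem));
```

With this in place, a Chinese query like 混合检索 gets a lexical hit on an English-bodied memory without the author ever writing `Keywords (zh): ...` into the text.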

2. Query-side expansion / dictionary (good short-term / complementary layer)

Add an optional query expansion path for bilingual or colloquial queries.

Concretely, this means taking the incoming query and expanding it into additional bilingual or technical variants before BM25 search, for example through:

  • a static synonym dictionary
  • bilingual alias expansion
  • domain-specific colloquial -> technical term expansion

This is different from memory-side aliases:

  • query expansion answers "what other forms of this query should we try?"
  • it enriches the search input at retrieval time, instead of attaching structured aliases to stored memories

Why this matters:

  • it is easier to ship quickly
  • it improves recall for fuzzy Chinese queries and colloquial wording
  • it complements, but should not replace, the memory-side alias mechanism
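The query-side layer above can be sketched in a few lines. The dictionary entries and the `expandQuery` helper are hypothetical examples, not proposed API; the idea is just substring-triggered expansion into bilingual variants before the BM25 call:

```typescript
// A static synonym dictionary mapping terms to bilingual/technical
// variants. Entries here are illustrative placeholders.
const synonymDict: Record<string, string[]> = {
  "向量检索": ["vector search", "vector retrieval"],
  "重排": ["rerank", "reranking"],
};

// Expand an incoming query into additional variants to try in BM25.
// The original query is always kept as the first variant.
function expandQuery(query: string): string[] {
  const variants = new Set<string>([query]);
  for (const [term, aliases] of Object.entries(synonymDict)) {
    if (query.includes(term)) {
      for (const alias of aliases) variants.add(alias);
    }
  }
  return [...variants];
}

console.log(expandQuery("如何配置向量检索"));
```

Each variant would be issued as its own BM25 search and the candidates merged, which is exactly why this layer depends on the fusion fix below being real.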

3. Fix the fusion contract so lexical improvements can actually be tuned

Make retrieval.vectorWeight / retrieval.bm25Weight participate in the real fusion logic.

Why this matters:

  • lexical improvements alone only help if those candidates can be combined reasonably with vector results
  • in cross-language cases, BM25 may still be sparse or uneven, so users need an actual way to bias toward vector evidence when appropriate
  • right now the public weighting knobs are exposed, but the current fusion implementation does not consume them
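One plausible shape for honoring the weights: min-max normalize each result list's scores onto [0, 1], then combine with the configured weights. This is a sketch of one reasonable fusion scheme, not the project's current implementation, and all names here are assumptions:

```typescript
interface Scored { id: string; score: number; }

// Min-max normalize a result list's scores so vector similarity and
// raw BM25 scores become comparable before weighting.
function normalize(results: Scored[]): Map<string, number> {
  const scores = results.map(r => r.score);
  const lo = Math.min(...scores);
  const hi = Math.max(...scores);
  const span = hi - lo || 1; // guard against all-equal scores
  return new Map(results.map(r => [r.id, (r.score - lo) / span]));
}

// Weighted fusion: candidates missing from one list contribute 0 from
// that list, so vectorWeight/bm25Weight genuinely bias the ranking.
function fuse(vector: Scored[], bm25: Scored[],
              vectorWeight: number, bm25Weight: number): Scored[] {
  const v = normalize(vector);
  const b = normalize(bm25);
  const ids = new Set([...v.keys(), ...b.keys()]);
  return [...ids]
    .map(id => ({
      id,
      score: vectorWeight * (v.get(id) ?? 0) + bm25Weight * (b.get(id) ?? 0),
    }))
    .sort((a, c) => c.score - a.score);
}

const fused = fuse(
  [{ id: "a", score: 0.9 }, { id: "b", score: 0.2 }],
  [{ id: "b", score: 12 }, { id: "c", score: 3 }],
  0.7, 0.3,
);
console.log(fused.map(r => r.id));
```

In a cross-language query where BM25 returns little, raising `vectorWeight` would then actually shift ranking toward vector evidence, which is the missing lever described above. Rank-based fusion (e.g. reciprocal rank fusion with per-list weights) would be an equally valid alternative.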

Suggested rollout shape

If this is easier to stage incrementally, I think the order should be:

  1. short-term: query expansion / dictionary-based lexical help
  2. mid-term: structured bilingual keywords / alias support on the memory side
  3. alongside or immediately after: fix fusion weighting so the new lexical candidates can be tuned correctly

That gives a practical path forward without locking the design into prose-only keyword stuffing.

Alternatives Considered

Current workarounds are possible but all feel weaker than a first-class fix:

  • manually append Keywords (zh): ... to memory text
  • switch to a more multilingual embedding model and hope vector similarity is enough
  • lower thresholds or increase candidate pool

These help, but they do not solve the underlying contract/tuning gaps.

Area

Retrieval / Search

Additional Context

Evidence from the current repository state:

  • src/store.ts
    • FTS index is created on text
    • BM25 search reads from the same text field
  • README.md
    • recommends Keywords (zh) authoring patterns, which currently act as an implicit bilingual retrieval aid
  • vectorWeight / bm25Weight
    • exposed in config/types, but not consumed by the current fusion implementation

This issue is intentionally framed as a feature request rather than a narrow bug report because the main problem is the current multilingual retrieval contract and tuning surface, not a single crash or regression.
