
Add reranker option to semantic_code_search tool #22

@simianhacker


Feature: Reranker Option for Semantic Search

Overview

Add a use_reranker flag to semantic_code_search so callers can opt into Elasticsearch's text_similarity_reranker retriever for higher-quality ranking. Also surface max_score in responses, and make the reranker inference_id configurable via an environment variable (defaulting to .rerank-v1-elasticsearch).

Implementation Roadmap

Task 1: Extend Configuration ✅

File: src/config.ts
Changes: Add rerankerInferenceId to the exported config, read from an environment variable such as SEMANTIC_CODE_SEARCH_RERANKER_INFERENCE_ID and defaulting to .rerank-v1-elasticsearch; update typings and usages.
Why: Centralize inference ID configuration with a sane default.
Dependencies: None
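A minimal sketch of the Task 1 config change, assuming the config is assembled from an environment-like record. The loadRerankerConfig helper and RerankerConfig type are hypothetical; only the variable name and default value come from this issue.

```typescript
// Hypothetical slice of the exported config; the real src/config.ts may differ.
interface RerankerConfig {
  rerankerInferenceId: string;
}

const RERANKER_INFERENCE_ID_ENV = 'SEMANTIC_CODE_SEARCH_RERANKER_INFERENCE_ID';
const DEFAULT_RERANKER_INFERENCE_ID = '.rerank-v1-elasticsearch';

// Reads the reranker inference ID from an environment-like record and falls
// back to the default when the variable is unset or blank.
function loadRerankerConfig(
  env: Record<string, string | undefined>,
): RerankerConfig {
  const fromEnv = env[RERANKER_INFERENCE_ID_ENV]?.trim();
  return {
    rerankerInferenceId: fromEnv ? fromEnv : DEFAULT_RERANKER_INFERENCE_ID,
  };
}
```

In src/config.ts this could be wired up as loadRerankerConfig(process.env). Taking the record as a parameter also lets a test cover the override without mutating process.env.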

Task 2: Extend Schema and Params ✅

File: src/mcp_server/tools/semantic_code_search.ts
Changes: Add an optional boolean use_reranker to semanticCodeSearchSchema and the inferred params type, defaulting to false (so reranking stays opt-in) and documented in the schema description.
Why: The input contract needs to expose the flag.
Dependencies: Task 1
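A dependency-free sketch of the Task 2 input contract, showing use_reranker defaulting to false. The SemanticCodeSearchParams shape and parseSemanticCodeSearchParams helper are hypothetical; the real tool presumably derives its params from semanticCodeSearchSchema instead.

```typescript
// Hypothetical params shape; the real schema may carry more fields.
interface SemanticCodeSearchParams {
  query?: string;
  kql?: string;
  // Opt-in flag: false keeps the existing (non-reranker) search path.
  use_reranker: boolean;
}

function optionalString(value: unknown, name: string): string | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== 'string') throw new Error(`${name} must be a string`);
  return value;
}

// Validates raw tool input and applies the use_reranker default.
function parseSemanticCodeSearchParams(
  raw: Record<string, unknown>,
): SemanticCodeSearchParams {
  const useReranker = raw.use_reranker ?? false;
  if (typeof useReranker !== 'boolean') {
    throw new Error('use_reranker must be a boolean');
  }
  return {
    query: optionalString(raw.query, 'query'),
    kql: optionalString(raw.kql, 'kql'),
    use_reranker: useReranker,
  };
}
```

If the schema is defined with zod (an assumption), the same default could be expressed as z.boolean().default(false).describe('...').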

Task 3: Build Reranker Query Flow ✅

File: src/mcp_server/tools/semantic_code_search.ts
Changes:

  • When use_reranker is true, build a retriever.text_similarity_reranker request body matching the example query below, taking inference_id from config.
  • Use the semantic query both in the inner semantic clause and as inference_text; reject the request if use_reranker is set without a query.
  • Merge any KQL filter by embedding its translated DSL in the bool.must array of the standard retriever's query.
  • Preserve the existing non-reranker path.
Why: Enable Elastic reranking without breaking current behavior.
Dependencies: Task 2
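A sketch of the Task 3 branching under a few assumptions: KQL has already been translated to DSL (the translation step is not shown), the existing path sends the shared bool query as a plain query, and buildBaseBoolQuery / buildSearchBody are hypothetical names. The reranker fields mirror the payload under Technical Details.

```typescript
// Loose JSON type for Elasticsearch query DSL fragments (no client typings assumed).
type Dsl = Record<string, unknown>;

interface SearchInput {
  query?: string;
  kqlDsl?: Dsl; // KQL already translated to DSL
  useReranker: boolean;
  rerankerInferenceId: string;
}

// Shared bool query: semantic clause plus optional KQL, both under bool.must.
function buildBaseBoolQuery(query: string | undefined, kqlDsl?: Dsl): Dsl {
  const must: Dsl[] = [];
  if (query) {
    must.push({ semantic: { query, field: 'semantic_text' } });
  }
  if (kqlDsl) {
    must.push(kqlDsl);
  }
  return { bool: { must } };
}

// Returns the body fragment to merge into the _search request.
function buildSearchBody(input: SearchInput): Dsl {
  if (input.useReranker && !input.query) {
    throw new Error('use_reranker requires a non-empty query');
  }
  const baseBoolQuery = buildBaseBoolQuery(input.query, input.kqlDsl);
  if (!input.useReranker) {
    // Existing path: plain query, behavior unchanged.
    return { query: baseBoolQuery };
  }
  return {
    retriever: {
      text_similarity_reranker: {
        retriever: { standard: { query: baseBoolQuery } },
        field: 'semantic_text',
        inference_id: input.rerankerInferenceId,
        inference_text: input.query,
        rank_window_size: 100,
        min_score: 0.5,
      },
    },
  };
}
```

Validating the query before building anything keeps the guard in one place, and sharing buildBaseBoolQuery means both paths apply KQL the same way.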

Task 4: Include max_score in Results ✅

File: src/mcp_server/tools/semantic_code_search.ts
Changes: Capture response.hits.max_score and include it alongside the hit list (e.g., as a sibling field in the serialized payload).
Why: Give consumers a reference score for interpreting individual hit scores.
Dependencies: Task 3
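A sketch of the Task 4 payload, assuming hits are passed through unchanged; SearchResponseLike and serializeResults are hypothetical and model only the fields used here. Elasticsearch can return a null max_score (for example when sorting by a field), so the sketch normalizes a missing value to null.

```typescript
// Minimal slice of an Elasticsearch search response.
interface SearchHit {
  _id: string;
  _score: number | null;
  _source?: Record<string, unknown>;
}

interface SearchResponseLike {
  hits: {
    max_score?: number | null;
    hits: SearchHit[];
  };
}

// Serializes hits with max_score as a sibling field; null when absent.
function serializeResults(response: SearchResponseLike): string {
  return JSON.stringify({
    hits: response.hits.hits,
    max_score: response.hits.max_score ?? null,
  });
}
```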

Task 5: Update Documentation ✅

File: src/mcp_server/tools/semantic_code_search.md
Changes: Document the new use_reranker option, the environment variable for the inference ID, and behavior constraints (requires a query, scoring context).
Why: Keep tool guidance current.
Dependencies: Task 4

Task 6: Expand Tests ✅

File: tests/mcp_server/semantic_code_search.test.ts
Changes:

  • Add a test verifying the reranker request body (mock client) when the flag is set, including inference values from config and KQL handling.
  • Assert that max_score is present in responses.
  • Add coverage for a configurable inference ID (e.g., override the environment variable in a test).
Why: Guarantee the new logic works and stays stable.
Dependencies: Task 3

Technical Details

  • Introduce a helper that builds the base bool query so both the retriever and traditional paths share the KQL and semantic clauses.

  • The reranker payload mirrors:

    const retriever = {
      text_similarity_reranker: {
        retriever: { standard: { query: baseBoolQuery } },
        field: 'semantic_text',
        inference_id: elasticsearchConfig.rerankerInferenceId,
        inference_text: query,
        rank_window_size: 100,
        min_score: 0.5,
      },
    };
  • When kql is present, include the translated DSL in baseBoolQuery.bool.must.

  • For responses, return JSON such as { hits, max_score: response.hits.max_score } (or a similar structure) so existing consumers keep working while the new metric is exposed.

  • Ensure the documentation and tests mention the new environment variable and its default value.

Example Reranker Query

POST kibana-repo/_search
{
  "size": 20,
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "bool": {
              "must": [
                {
                  "semantic": {
                    "query": "Agent Builder one chat plugin",
                    "field": "semantic_text"
                  }
                }
              ],
              "should": [
                {
                  "term": {
                    "language": {
                      "value": "markdown",
                      "boost": 2
                    }
                  }
                }
              ]
            }
          }
        }
      },
      "field": "semantic_text",
      "inference_id": ".rerank-v1-elasticsearch",
      "inference_text": "Agent Builder one chat plugin",
      "rank_window_size": 100,
      "min_score": 0.5
    }
  }
}
