
feat(mcp): add configurable cross-encoder for search result reranking #1174

Open

lehcode wants to merge 24 commits into getzep:main from lehcode:fix/mcp-cross-encoder-config

Conversation

@lehcode lehcode commented Jan 23, 2026

Summary

Add CrossEncoderConfig to MCP server configuration, allowing users to configure the cross-encoder (reranker) used for search result ranking.

Problem: The cross-encoder previously defaulted to OpenAIRerankerClient with hardcoded model gpt-4.1-nano, causing "No deployments available" errors when using LiteLLM or other OpenAI-compatible proxies that don't have this model configured.

Solution: Add configurable cross-encoder with support for multiple providers.

Changes

  • Add CrossEncoderConfig and CrossEncoderProvidersConfig to schema.py
  • Add CrossEncoderFactory supporting multiple providers
  • Pass configured cross_encoder to Graphiti initialization
  • Add cross_encoder section to all config YAML files
  • Suppress neo4j.notifications logger to reduce log noise

Supported Providers

| Provider       | Description                        |
|----------------|------------------------------------|
| openai         | Direct OpenAI API                  |
| openai_generic | LiteLLM, Ollama, vLLM compatible   |
| azure_openai   | Azure OpenAI with AD auth support  |
| gemini         | Google Gemini                      |
| bge            | Local BGE model (no API required)  |
| none/disabled  | Disable reranking entirely         |

Example Configuration

```yaml
cross_encoder:
  enabled: true
  provider: "openai_generic"  # LiteLLM-compatible
  model: "gpt-4o-mini"

  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      api_url: ${OPENAI_BASE_URL}
```
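
For orientation, here is a minimal sketch of how the factory described in the change list above might turn this configuration into a graphiti-core reranker client and pass it to `Graphiti`. The class paths come from graphiti-core, but `build_cross_encoder` and its signature are hypothetical and may not match the PR's `CrossEncoderFactory` exactly:

```python
# Sketch only: maps the cross_encoder provider/model settings above to a
# graphiti-core reranker client. build_cross_encoder is a hypothetical helper,
# not the PR's actual CrossEncoderFactory.
import os

from graphiti_core import Graphiti
from graphiti_core.cross_encoder.bge_reranker_client import BGERerankerClient
from graphiti_core.cross_encoder.openai_reranker_client import OpenAIRerankerClient
from graphiti_core.llm_client.config import LLMConfig


def build_cross_encoder(provider: str, model: str | None = None,
                        api_key: str | None = None, api_url: str | None = None):
    if provider in ("none", "disabled"):
        return None  # reranking disabled entirely
    if provider == "bge":
        return BGERerankerClient()  # local model, no API required
    if provider in ("openai", "openai_generic"):
        # openai_generic simply points the OpenAI reranker at a LiteLLM /
        # Ollama / vLLM base URL instead of api.openai.com.
        return OpenAIRerankerClient(config=LLMConfig(api_key=api_key, base_url=api_url, model=model))
    raise ValueError(f"Unsupported cross_encoder provider: {provider}")


cross_encoder = build_cross_encoder(
    provider="openai_generic",
    model="gpt-4o-mini",
    api_key=os.environ.get("OPENAI_API_KEY"),
    api_url=os.environ.get("OPENAI_BASE_URL"),
)
graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password", cross_encoder=cross_encoder)
```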

Test Plan

  • Verify schema imports and instantiation work correctly
  • Verify factory handles all edge cases (disabled, none, invalid provider, missing config)
  • Verify YAML config loading with cross_encoder section
  • Test with LiteLLM proxy - no more "No deployments available" errors
  • Ruff linting passes

Allow setting log level through LOG_LEVEL env var (DEBUG, INFO, WARNING, ERROR, CRITICAL).
Defaults to INFO for backward compatibility.
- Add LOG_LEVEL to .env.example with description
- Add LOG_LEVEL to README.md Environment Variables section
- Add test_log_level_environment_variable() unit test
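
As a rough sketch, the override can be applied at startup roughly as follows (LOG_LEVEL and the INFO default are from the commit above; where exactly the server performs this setup is an assumption):

```python
import logging
import os

# Honor LOG_LEVEL (DEBUG, INFO, WARNING, ERROR, CRITICAL); unknown or unset
# values fall back to INFO for backward compatibility.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))
```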
lehcode force-pushed the fix/mcp-cross-encoder-config branch 3 times, most recently from 7acdf6b to b00f2fa, on January 23, 2026 at 18:27
@antonkulaga

I was trying to run graphiti with Gemini and had a panic with the cross-encoder.

lehcode and others added 17 commits January 29, 2026 02:04
Add support for `openai_generic` LLM provider in the MCP server factory.

This provider uses `OpenAIGenericClient` which calls `/chat/completions`
with `response_format` for structured output instead of the `/responses`
endpoint. This enables compatibility with:
- LiteLLM proxy
- Ollama
- vLLM
- Any OpenAI-compatible API that doesn't support `/responses`

The `/responses` endpoint is only available on OpenAI's native API, so
this provider is essential for self-hosted LLM deployments.

Usage in config.yaml:
```yaml
llm:
  provider: "openai_generic"
  model: "your-model"
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      api_url: ${OPENAI_BASE_URL}
```
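
For illustration, the provider dispatch might look roughly like this, using graphiti-core's `OpenAIClient` and `OpenAIGenericClient`; `build_llm_client` is a hypothetical helper rather than the PR's actual factory code:

```python
# Sketch only: choose the LLM client class based on the configured provider.
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_client import OpenAIClient
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient


def build_llm_client(provider: str, model: str, api_key: str, api_url: str | None = None):
    config = LLMConfig(api_key=api_key, base_url=api_url, model=model)
    if provider == "openai_generic":
        # Structured output via /chat/completions + response_format, which
        # LiteLLM, Ollama, vLLM, and other OpenAI-compatible APIs support.
        return OpenAIGenericClient(config=config)
    # The default client relies on the /responses endpoint, which is only
    # available on OpenAI's native API.
    return OpenAIClient(config=config)
```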
Detect when providers (e.g., LiteLLM with Gemini) return the schema definition instead of data, and automatically switch to json_object mode with the schema embedded in the prompt.

- Add _is_schema_returned_as_data() detection helper
- Add instance-level _use_json_object_mode fallback state
- Modify _generate_response() to support dual modes
- Fallback persists for client lifetime after first trigger
- Add _extract_json() method to handle responses with trailing content
- Simplify _is_json_schema() detection logic
- Handle "Extra data" JSON parse errors gracefully
- Document openai_generic provider in README.md with LiteLLM and Ollama examples
- Add provider configuration to .env.example
- Add unit tests for _is_schema_returned_as_data() and _extract_json() methods
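
A simplified sketch of what the two helpers named above might do; the exact heuristics and error handling in the PR may differ:

```python
import json
from typing import Any


def _is_schema_returned_as_data(parsed: dict[str, Any]) -> bool:
    # Heuristic: a JSON-Schema-shaped reply (e.g. LiteLLM proxying Gemini)
    # carries "type": "object" plus a "properties" mapping instead of values.
    return parsed.get("type") == "object" and isinstance(parsed.get("properties"), dict)


def _extract_json(raw: str) -> dict[str, Any]:
    # Parse the first JSON object in the response, tolerating trailing content
    # that would otherwise raise "Extra data" from json.loads().
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    obj, _ = json.JSONDecoder().raw_decode(raw, start)
    return obj
```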
Revert "Fix dependabot security vulnerabilities (getzep#1184)"

This reverts commit 30cd907.
* Fix dependabot security vulnerabilities in dependencies

Update lock files to address multiple security alerts:
- pyasn1: 0.6.1 → 0.6.2 (CVE-2026-23490)
- langchain-core: 0.3.74 → 0.3.83 (CVE-2025-68664)
- mcp: 1.9.4 → 1.26.0 (DNS rebinding, DoS)
- azure-core: 1.34.0 → 1.38.0 (deserialization)
- starlette: 0.46.2/0.47.1 → 0.50.0/0.52.1 (DoS vulnerabilities)
- python-multipart: 0.0.20 → 0.0.22 (arbitrary file write)
- fastapi: 0.115.14 → 0.128.0 (for starlette compatibility)
- nbconvert: 7.16.6 → 7.17.0
- orjson: 3.11.5 → 3.11.6
- protobuf: 6.33.4 → 6.33.5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Pin mcp_server to graphiti-core 0.26.3 from PyPI

- Change dependency from >=0.23.1 to ==0.26.3
- Remove editable source override to use published package
- Addresses code review feedback about RC version usage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix remaining security vulnerabilities in mcp_server

Update vulnerable transitive dependencies:
- aiohttp: 3.12.15 → 3.13.3 (High: zip bomb, DoS)
- urllib3: 2.5.0 → 2.6.3 (High: decompression bomb bypass)
- filelock: 3.19.1 → 3.20.3 (Medium: TOCTOU symlink)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Enables Azure Active Directory authentication for Azure OpenAI LLM and Embedder clients.
Conditionally configures the `AsyncOpenAI` client to use an Azure AD token provider when specified.
Retains API key authentication as an alternative.
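
A minimal sketch of that conditional wiring, assuming the standard `azure-identity` token provider and the OpenAI SDK's `AsyncAzureOpenAI` client; the environment variable names here are placeholders, not necessarily the ones used by the MCP server:

```python
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AsyncAzureOpenAI

# Placeholder flag: switch between Azure AD and API key authentication.
if os.environ.get("AZURE_OPENAI_USE_MANAGED_IDENTITY", "").lower() == "true":
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    client = AsyncAzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-10-21",
        azure_ad_token_provider=token_provider,
    )
else:
    # API key authentication remains available as an alternative.
    client = AsyncAzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-10-21",
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
```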
feat: add openai_generic provider for LiteLLM/Ollama compatibility
feat: add configurable LOG_LEVEL for MCP server
Add CrossEncoderConfig to MCP server configuration, allowing users to
configure the cross-encoder (reranker) used for search result ranking.

Previously, the cross-encoder defaulted to OpenAIRerankerClient with
hardcoded model 'gpt-4.1-nano', causing "No deployments available"
errors when using LiteLLM or other OpenAI-compatible proxies.

Changes:
- Add CrossEncoderConfig and CrossEncoderProvidersConfig to schema.py
- Add CrossEncoderFactory supporting providers: openai, openai_generic,
  azure_openai, gemini, bge, and none/disabled
- Pass configured cross_encoder to Graphiti initialization
- Add cross_encoder section to all config YAML files
- Suppress neo4j.notifications logger to reduce log noise

Supported providers:
- openai: Direct OpenAI API
- openai_generic: LiteLLM, Ollama, vLLM compatible
- azure_openai: Azure OpenAI with AD auth support
- gemini: Google Gemini
- bge: Local BGE model (no API required)
- none/disabled: Disable reranking entirely
lehcode force-pushed the fix/mcp-cross-encoder-config branch from 7760624 to 192c3e0 on February 8, 2026 at 02:38