Multimodal MCP server that auto-routes requests to the right local model based on input modality — vision, audio, or text. One server, one protocol, all modalities.
Built for Mac M-series with vllm-mlx and Ollama as inference backends.
LLM tooling shouldn't care which model handles a request. Send text, an image, or an audio clip — omni-mcp detects the modality and routes to the best local model automatically. No model-switching, no separate endpoints, no config juggling.
| Modality | Model | Backend |
|---|---|---|
| Text | Qwen3.5 | Ollama |
| Vision | Qwen3-VL | vllm-mlx (Ollama fallback) |
| Audio | Whisper Large v3 Turbo | mlx-whisper |
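For the text path, the Ollama backend talks to Ollama's HTTP API. As an illustrative sketch (not omni-mcp's actual client — the `build_payload` and `generate` helpers are hypothetical), a non-streaming request to Ollama's `/api/generate` endpoint might look like:

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # matches the default in the config table below

def build_payload(model: str, prompt: str) -> dict:
    # Hypothetical helper: the shape of a non-streaming Ollama generate request.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "qwen3.5") -> str:
    # POST to Ollama's /api/generate and return the completed response text.
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```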
```bash
# Install dependencies
uv sync

# Pull the text model
ollama pull qwen3.5

# Run the MCP server
uv run mcp dev src/omni_mcp/server.py
```

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "omni-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/omni-mcp", "mcp", "run", "src/omni_mcp/server.py"]
    }
  }
}
```

```bash
git clone https://github.com/Dharit13/omni-mcp.git
cd omni-mcp
uv sync
```

omni-mcp exposes a single `query` tool over MCP:
```
query(prompt, image?, audio?)
```

- **Text:** pass `prompt` only
- **Vision:** pass `prompt` + `image` (base64 data URI or file path)
- **Audio:** pass `prompt` + `audio` (base64 data URI or file path)
The server detects the modality and routes to the right backend. If both image and audio are provided, audio takes priority.
All settings are configurable via environment variables prefixed with `OMNI_`:

| Variable | Default | Description |
|---|---|---|
| `OMNI_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
| `OMNI_OLLAMA_TEXT_MODEL` | `qwen3.5` | Text generation model |
| `OMNI_VLLM_BASE_URL` | `http://localhost:8000` | vllm-mlx API endpoint |
| `OMNI_VLLM_VISION_MODEL` | `Qwen/Qwen3-VL` | Vision model |
| `OMNI_WHISPER_MODEL` | `mlx-community/whisper-large-v3-turbo` | Whisper model |
| `OMNI_LOG_LEVEL` | `INFO` | Logging level |
| `OMNI_REQUEST_TIMEOUT` | `120.0` | Request timeout in seconds |
```bash
# Install with dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/

# Type check
uv run ty check
```

```
src/omni_mcp/
├── server.py        # MCP server entry point (FastMCP)
├── router.py        # Modality detection + dispatch
├── config.py        # Settings via environment variables
├── schemas.py       # Pydantic request/response models
├── backends/
│   ├── base.py      # Abstract backend protocol
│   ├── ollama.py    # Ollama HTTP client
│   └── vllm_mlx.py  # vllm-mlx OpenAI-compat client
└── modalities/
    ├── text.py      # Text processing
    ├── vision.py    # Image handling + encoding
    └── audio.py     # Audio transcription via mlx-whisper
```
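As an illustration of the image handling in `modalities/vision.py`, here is a hedged sketch of normalizing the two accepted input forms (file path or base64 data URI) into raw bytes — the helper name is hypothetical, not the module's actual API:

```python
import base64
from pathlib import Path

def load_image_bytes(image: str) -> bytes:
    # Data URIs look like "data:image/png;base64,<payload>".
    if image.startswith("data:"):
        _, _, payload = image.partition(",")
        return base64.b64decode(payload)
    # Otherwise treat the string as a path on disk.
    return Path(image).read_bytes()
```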
MIT