
# omni-mcp

Multimodal MCP server that auto-routes requests to the right local model based on input modality — vision, audio, or text. One server, one protocol, all modalities.

Built for Mac M-series with vllm-mlx and Ollama as inference backends.

## Why

LLM tooling shouldn't care which model handles a request. Send text, an image, or an audio clip — omni-mcp detects the modality and routes to the best local model automatically. No model-switching, no separate endpoints, no config juggling.

## Models

| Modality | Model | Backend |
|----------|-------|---------|
| Text | Qwen3.5 | Ollama |
| Vision | Qwen3-VL | vllm-mlx (Ollama fallback) |
| Audio | Whisper Large v3 Turbo | mlx-whisper |

## Prerequisites

- macOS with Apple Silicon (M1 or later)
- Python 3.12+
- uv package manager
- Ollama installed and running

## Quickstart

```sh
# Install dependencies
uv sync

# Pull the text model
ollama pull qwen3.5

# Run the MCP server
uv run mcp dev src/omni_mcp/server.py
```

## Installation

### As an MCP server in Claude Desktop

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "omni-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/omni-mcp", "mcp", "run", "src/omni_mcp/server.py"]
    }
  }
}
```

### From source

```sh
git clone https://github.com/Dharit13/omni-mcp.git
cd omni-mcp
uv sync
```

## Usage

omni-mcp exposes a single `query` tool over MCP:

```
query(prompt, image?, audio?)
```

- **Text:** pass `prompt` only
- **Vision:** pass `prompt` + `image` (base64 data URI or file path)
- **Audio:** pass `prompt` + `audio` (base64 data URI or file path)

The server detects the modality and routes to the right backend. If both image and audio are provided, audio takes priority.

## Configuration

All settings are configurable via environment variables prefixed with `OMNI_`:

| Variable | Default | Description |
|----------|---------|-------------|
| `OMNI_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
| `OMNI_OLLAMA_TEXT_MODEL` | `qwen3.5` | Text generation model |
| `OMNI_VLLM_BASE_URL` | `http://localhost:8000` | vllm-mlx API endpoint |
| `OMNI_VLLM_VISION_MODEL` | `Qwen/Qwen3-VL` | Vision model |
| `OMNI_WHISPER_MODEL` | `mlx-community/whisper-large-v3-turbo` | Whisper model |
| `OMNI_LOG_LEVEL` | `INFO` | Logging level |
| `OMNI_REQUEST_TIMEOUT` | `120.0` | Request timeout in seconds |
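One common way to implement an `OMNI_`-prefixed settings table like this is `pydantic-settings`. The sketch below is a hypothetical reconstruction, not necessarily how the repo's `config.py` is written:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class OmniSettings(BaseSettings):
    """Settings loaded from OMNI_* environment variables, with defaults
    mirroring the table above. Field names map to env vars via the prefix,
    e.g. OMNI_OLLAMA_BASE_URL -> ollama_base_url."""

    model_config = SettingsConfigDict(env_prefix="OMNI_")

    ollama_base_url: str = "http://localhost:11434"
    ollama_text_model: str = "qwen3.5"
    vllm_base_url: str = "http://localhost:8000"
    vllm_vision_model: str = "Qwen/Qwen3-VL"
    whisper_model: str = "mlx-community/whisper-large-v3-turbo"
    log_level: str = "INFO"
    request_timeout: float = 120.0
```

Setting `OMNI_LOG_LEVEL=DEBUG` in the environment would then override the `log_level` default at startup.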

## Development

```sh
# Install with dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/

# Type check
uv run ty check
```

## Architecture

```
src/omni_mcp/
├── server.py            # MCP server entry point (FastMCP)
├── router.py            # Modality detection + dispatch
├── config.py            # Settings via environment variables
├── schemas.py           # Pydantic request/response models
├── backends/
│   ├── base.py          # Abstract backend protocol
│   ├── ollama.py        # Ollama HTTP client
│   └── vllm_mlx.py      # vllm-mlx OpenAI-compat client
└── modalities/
    ├── text.py          # Text processing
    ├── vision.py        # Image handling + encoding
    └── audio.py         # Audio transcription via mlx-whisper
```

## License

MIT
