Multimodal MCP server that auto-routes requests to the right local model based on input modality — vision, audio, or text. One server, one protocol, all modalities.
Built for Mac M-series with vllm-mlx and Ollama as inference backends.
LLM tooling shouldn't care which model handles a request. Send text, an image, or an audio clip — omni-mcp detects the modality and routes to the best local model automatically. No model-switching, no separate endpoints, no config juggling.
| Modality | Model | Backend |
|---|---|---|
| Text | Qwen3.5 | Ollama |
| Vision | Qwen3-VL | vllm-mlx (Ollama fallback) |
| Audio | Whisper Large v3 Turbo | mlx-whisper |
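For the text path, the Ollama backend talks to Ollama's HTTP API. As an illustrative sketch (not omni-mcp's actual client — the `build_payload` and `generate` helpers are hypothetical), a non-streaming request to Ollama's `/api/generate` endpoint might look like:

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # matches the default in the config table below

def build_payload(model: str, prompt: str) -> dict:
    # Hypothetical helper: the shape of a non-streaming Ollama generate request.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "qwen3.5") -> str:
    # POST to Ollama's /api/generate and return the completed response text.
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```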
```bash
# Install dependencies
uv sync

# Pull the text model
ollama pull qwen3.5

# Run the MCP server
uv run mcp dev src/omni_mcp/server.py
```

Add to your Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "omni-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/omni-mcp", "mcp", "run", "src/omni_mcp/server.py"]
    }
  }
}
```

```bash
git clone https://github.com/Dharit13/omni-mcp.git
cd omni-mcp
uv sync
```

omni-mcp exposes a single `query` tool over MCP:
```
query(prompt, image?, audio?)
```

- **Text:** pass `prompt` only
- **Vision:** pass `prompt` + `image` (base64 data URI or file path)
- **Audio:** pass `prompt` + `audio` (base64 data URI or file path)
The server detects the modality and routes to the right backend. If both image and audio are provided, audio takes priority.
All settings are configurable via environment variables prefixed with `OMNI_`:

| Variable | Default | Description |
|---|---|---|
| `OMNI_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API endpoint |
| `OMNI_OLLAMA_TEXT_MODEL` | `qwen3.5` | Text generation model |
| `OMNI_VLLM_BASE_URL` | `http://localhost:8000` | vllm-mlx API endpoint |
| `OMNI_VLLM_VISION_MODEL` | `Qwen/Qwen3-VL` | Vision model |
| `OMNI_WHISPER_MODEL` | `mlx-community/whisper-large-v3-turbo` | Whisper model |
| `OMNI_LOG_LEVEL` | `INFO` | Logging level |
| `OMNI_REQUEST_TIMEOUT` | `120.0` | Request timeout in seconds |
```bash
# Install with dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Lint
uv run ruff check src/ tests/

# Type check
uv run ty check
```

```
src/omni_mcp/
├── server.py        # MCP server entry point (FastMCP)
├── router.py        # Modality detection + dispatch
├── config.py        # Settings via environment variables
├── schemas.py       # Pydantic request/response models
├── backends/
│   ├── base.py      # Abstract backend protocol
│   ├── ollama.py    # Ollama HTTP client
│   └── vllm_mlx.py  # vllm-mlx OpenAI-compat client
└── modalities/
    ├── text.py      # Text processing
    ├── vision.py    # Image handling + encoding
    └── audio.py     # Audio transcription via mlx-whisper
```
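As an illustration of the image handling in `modalities/vision.py`, here is a hedged sketch of normalizing the two accepted input forms (file path or base64 data URI) into raw bytes — the helper name is hypothetical, not the module's actual API:

```python
import base64
from pathlib import Path

def load_image_bytes(image: str) -> bytes:
    # Data URIs look like "data:image/png;base64,<payload>".
    if image.startswith("data:"):
        _, _, payload = image.partition(",")
        return base64.b64decode(payload)
    # Otherwise treat the string as a path on disk.
    return Path(image).read_bytes()
```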
MIT