Multi-provider AI orchestrator with MCP, RAG, and semantic memory. Built on Spring AI.
graph TB
User["User"]
subgraph "AscendAI Platform"
Agent["AscendAgent<br/>REST API :9917<br/>Spring Boot · Java 21"]
subgraph "MCP Tool Services"
AudioScribe["AudioScribe<br/>:7017<br/>Audio Transcription"]
Weather["WeatherMCP<br/>:9998<br/>Weather Data"]
WebSearch["AscendWebSearch<br/>:7021<br/>Web Search"]
PaddleOCR["PaddleOCR<br/>:7022<br/>OCR"]
end
Memory["AscendMemory<br/>:7020<br/>Semantic Memory"]
end
subgraph "AI Providers"
Local["LM Studio<br/>(local, default)"]
Cloud["OpenAI · Anthropic<br/>Gemini · MiniMax"]
end
subgraph "Data Layer"
PG["PostgreSQL<br/>Chat · Metadata"]
RD["Redis<br/>Chat cache"]
QD["Qdrant<br/>Vector DB"]
S3["MinIO<br/>Documents"]
end
User -->|"REST"| Agent
Agent -->|"MCP"| AudioScribe
Agent -->|"MCP"| Weather
Agent -->|"MCP"| WebSearch
Agent -->|"MCP"| PaddleOCR
Agent -->|"REST"| Memory
Agent -.-> Local
Agent -.-> Cloud
Agent --> PG
Agent --> RD
Agent --> QD
Agent --> S3
Memory --> QD
- Why this exists
- Features
- How it compares
- Demo
- Architecture
- Quick Start
- Supported AI Providers
- Configuration & Ports
- Documentation
I built AscendAI because off-the-shelf orchestrators don't let you swap providers per request, run a privacy-respecting search backend you fully control, and persist semantic memory across sessions in one coherent platform. AscendAI does all three. Each prompt routes to the model you pick at call time (local LM Studio, OpenAI, Anthropic, Gemini, MiniMax). MCP tool servers handle audio, web, weather, and OCR. Long-term context lives in a Mem0-backed Qdrant store so conversations actually accumulate knowledge.
- Per-request provider routing. Pick LM Studio, OpenAI, Anthropic, Gemini, or MiniMax on every API call without restarts or config changes.
- RAG pipeline with Qdrant. Thresholded soft-retrieval over ingested documents using provider-matched embedding dimensions (768 / 1536).
- Semantic memory via Mem0. Long-lived, user-scoped memories searchable across sessions through the AscendMemory service.
- MCP tool servers. First-class integrations for audio transcription (AudioScribe), web search (AscendWebSearch + SearXNG), weather (WeatherMCP), and OCR (PaddleOCR).
- Document ingestion to MinIO. Drop files (Markdown, PDF, DOCX) into a bucket and the pipeline parses them via Docling / Unstructured and indexes them automatically.
- Hybrid chat history. Redis for the active context window, PostgreSQL for durable long-term archives and analytics.
- Privacy-respecting web. SearXNG meta-search plus FlareSolverr for Cloudflare-protected pages, all self-hosted.
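To make the "thresholded soft-retrieval" bullet concrete, here is a minimal Python sketch of the idea. It is not the agent's actual code: the `ScoredChunk` type, the `0.6` threshold, and `top_k=3` are hypothetical, and reading "soft" as "an empty result is acceptable" is an assumption. Only the 768 / 1536 dimensions and the `ascendai-768` / `ascendai-1536` collection names come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    text: str
    score: float  # similarity score, higher is better (assumed)


def collection_for(dimension: int) -> str:
    """Map an embedding dimension to its collection name."""
    if dimension not in (768, 1536):
        raise ValueError(f"unsupported embedding dimension: {dimension}")
    return f"ascendai-{dimension}"


def soft_retrieve(hits, threshold=0.6, top_k=3):
    """Keep hits scoring at or above the threshold, best first, at most top_k.

    Returning an empty list is a valid outcome: the prompt then goes out
    without any RAG context instead of failing.
    """
    kept = [h for h in hits if h.score >= threshold]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:top_k]
```

The point of the threshold is that weakly related chunks are dropped rather than padded into the prompt, so a query with no good matches degrades to a plain chat turn.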
The honest peer set is other deployable AI orchestration backends that bundle multi-provider routing, RAG, memory, and tools into one self-hosted service. Not chat UIs, not low-code workflow builders, not pure router proxies, not libraries. All four below are mature and well-known in this niche.
| | AscendAI | R2R | Letta | Onyx | Quivr | LangChain |
|---|---|---|---|---|---|---|
| Shape | Deployable service | Deployable service | Deployable service | Deployable service | Deployable service | Library / framework |
| Stack | Java 21 / Spring AI | Python | Python | Python | Python | Python (TS port) |
| API-first (no UI shipped) | Yes | Yes | Yes (server on :8283) | UI bundled, API-driven | UI bundled, API exposed | N/A, you build it |
| Per-request provider switch | Built-in | Built-in | Built-in | Built-in | Built-in | Possible via chain rebuild |
| RAG over uploaded docs | Built-in (Qdrant + threshold) | Built-in (multimodal, hybrid, KGs) | Lighter, agent-state focused | Built-in (40+ connectors) | Built-in (pluggable stores) | Many backends, you wire it |
| Persistent semantic memory | Mem0 + Qdrant | Add-on | Native (OS-style hierarchical) | Add-on | Built-in | Roll-your-own |
| Tool integration model | MCP-native (Spring AI MCP client) | Function tools | Function tools | Function tools + connectors | Function tools | Tools + MCP via adapters |
| Single docker compose deploy | Yes | Yes | Yes | Yes | Yes | Bring-your-own |
LangChain isn't strictly a peer. It's a framework, not a deployable service. It's in the table because it's the most likely thing readers reach for when they think "AI orchestration", and the honest answer is "if you're already wiring your own service in LangChain, you don't need AscendAI."
- JVM-native. Every credible peer in this niche is Python or TypeScript. If you live in Spring Boot already, AscendAI drops in alongside the rest of your services without a polyglot deploy.
- MCP-first tool model. Onyx and Letta do tool use; AscendAI is built around MCP from day one with multiple bundled MCP servers (audio, OCR, web search, weather). Add new tools by pointing the agent at another MCP server, no code changes.
- Breadth of integration in one stack. RAG, semantic memory, MCP tools, multi-provider routing, hot / archive chat history. All present, no add-ons.
- No UI. Onyx and Quivr ship one. AscendAI is a backend you put behind your own client.
- Smaller community. All four peers above have more stars, more contributors, more battle testing.
- RAG depth. R2R has a more sophisticated RAG pipeline (knowledge graphs, multimodal). AscendAI's RAG is solid but plain.
- Memory depth. Letta's memory architecture is more advanced than the Mem0-based memory here.
If you're already happy in Python with R2R or Letta, you don't need this. AscendAI exists because I wanted these capabilities in a Spring-native deployment.
Send a prompt with per-request provider and model selection. The endpoint accepts multipart/form-data (so you can
attach an optional image or document).
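For callers that are not using a shell, the sketch below shows one way the multipart body could be assembled with only the Python standard library. The helper name `build_prompt_request` is hypothetical. The endpoint path, the `X-User-Id` header, and the form field names are the ones used in this README. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import uuid
import urllib.request


def build_prompt_request(base_url, user_id, fields):
    """Build (but do not send) a multipart/form-data POST for /api/v1/ai/prompt.

    fields is a dict of text form fields such as prompt, provider, and model.
    File attachments are left out to keep the sketch short.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ai/prompt",
        data=body,
        method="POST",
        headers={
            "X-User-Id": user_id,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Usage, assuming the agent is listening locally: `urllib.request.urlopen(build_prompt_request("http://localhost:9917", "luksarna", {"prompt": "Hello", "provider": "anthropic"}))`.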
Bash:
curl -X POST http://localhost:9917/api/v1/ai/prompt \
-H "X-User-Id: luksarna" \
-F "prompt=Summarize my notes on Spring AI and MCP." \
-F "provider=anthropic" \
-F "model=claude-sonnet-4-6" \
-F "embeddingProvider=lmstudio"
PowerShell 7+ (-Form supports multipart):
Invoke-RestMethod -Uri http://localhost:9917/api/v1/ai/prompt -Method Post -Headers @{ "X-User-Id" = "luksarna" } -Form @{ prompt = "Summarize my notes on Spring AI and MCP."; provider = "anthropic"; model = "claude-sonnet-4-6"; embeddingProvider = "lmstudio" }
Sample response (AiResponse: content plus an unwrapped Spring AI ChatResponseMetadata and the list of MCP tools invoked during the turn):
{
"content": "Your notes describe AscendAI as a Spring AI orchestrator that routes prompts across providers and uses MCP for tool calls. Per-request model selection happens via /api/v1/ai/prompt; RAG runs over Qdrant collections (ascendai-768 / -1536); semantic memory is backed by Mem0…",
"id": "msg_01ABcDEf…",
"model": "claude-sonnet-4-6",
"usage": { "promptTokens": 1842, "completionTokens": 312, "totalTokens": 2154 },
"toolsUsed": ["ascend_memory_search", "web_search"]
}
Two architecture entry points, depending on what you're after.
- Monorepo architecture. System overview, service interactions, deployment, ADRs.
- AscendAgent arc42. Internals, component diagrams, module ADRs.
| Module | Stack | Port | Role |
|---|---|---|---|
| AscendAgent | Java 21 / Spring Boot | 9917 | API gateway, multi-provider AI, RAG, MCP client |
| AudioScribe | Python / FastMCP | 7017 | Audio transcription (Whisper / OpenAI / HF) |
| AscendWebSearch | Python / FastMCP | 7021 | Web search and scraping via SearXNG |
| AscendMemory | Python / FastAPI | 7020 | Semantic memory (Mem0 + Qdrant) |
| WeatherMCP | Java / Spring Boot | 9998 | Weather data MCP server |
| PaddleOCR | Python / FastMCP | 7022 | OCR service |
How a single prompt traverses the platform.
sequenceDiagram
autonumber
participant U as User
participant A as AscendAgent
participant R as Redis<br/>(short-term)
participant M as AscendMemory<br/>(Mem0)
participant Q as Qdrant<br/>(RAG)
participant T as MCP Tools<br/>(Web/Audio/OCR/Weather)
participant L as LLM Provider<br/>(per-request)
participant P as PostgreSQL<br/>(long-term)
U->>A: POST /api/v1/ai/prompt
A->>R: Load chat window
A->>M: Search semantic memory
M->>Q: Vector search (memory collection)
M-->>A: Top-k memories
A->>Q: RAG retrieval (doc collection)
Q-->>A: Top-k chunks (above threshold)
A->>L: Prompt + memory + RAG + tool defs
L-->>A: Tool call request (optional)
A->>T: Invoke MCP tool
T-->>A: Tool result
A->>L: Tool result, then final answer
L-->>A: Response + usage
A->>R: Append turn
A->>P: Persist transcript
A->>M: Extract and store new memories
A-->>U: AiResponse {content, metadata, toolsUsed}
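The sequence above can be read as a small control loop. The Python sketch below mirrors the step order with plain callables standing in for Redis, AscendMemory, Qdrant, the MCP tools, the LLM provider, and PostgreSQL. It is an illustration, not the agent's Java implementation: `handle_prompt`, `LlmReply`, the `deps` keys, and the `max_tool_rounds` cap are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmReply:
    content: str = ""
    tool_call: Optional[str] = None  # name of a requested MCP tool, if any


def handle_prompt(prompt, deps, max_tool_rounds=3):
    """Walk one prompt through the steps of the sequence diagram.

    deps is a dict of callables; each key stands in for one participant.
    """
    # Gather context: chat window, semantic memories, RAG chunks.
    context = {
        "window": deps["load_window"](),
        "memories": deps["search_memory"](prompt),
        "chunks": deps["retrieve_chunks"](prompt),
    }

    tools_used = []
    reply = deps["call_llm"](prompt, context, None)
    # The tool round-trip is optional. The cap is an assumption so that a
    # model which keeps requesting tools cannot loop forever.
    while reply.tool_call is not None and len(tools_used) < max_tool_rounds:
        tool_result = deps["invoke_tool"](reply.tool_call)
        tools_used.append(reply.tool_call)
        reply = deps["call_llm"](prompt, context, tool_result)

    # Write-back: short-term cache, durable transcript, new memories.
    deps["append_turn"](prompt, reply.content)
    deps["persist_transcript"](prompt, reply.content)
    deps["store_memories"](prompt, reply.content)
    return {"content": reply.content, "toolsUsed": tools_used}
```

Passing the collaborators in as callables keeps the flow testable without any of the backing services running.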
- Docker Desktop
- Java 21
- PostgreSQL on 5432, Redis on 6379, Qdrant on 6333/6334, MinIO on 9070/9071 (admin/password)
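Before starting the stack it can help to confirm that the data-layer ports are actually reachable. The Python sketch below is one way to do that with a plain TCP connect. The helper names are hypothetical, and checking only the first port of each pair (6333 for Qdrant, 9070 for MinIO) is a simplification.

```python
import socket

# Host ports taken from the prerequisites list above.
REQUIRED_PORTS = {
    "PostgreSQL": 5432,
    "Redis": 6379,
    "Qdrant": 6333,
    "MinIO": 9070,
}


def port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_dependencies(host="localhost", ports=REQUIRED_PORTS):
    """Names of dependencies whose port does not accept connections."""
    return [name for name, port in ports.items() if not port_open(host, port)]
```

An empty list from `missing_dependencies()` only means the ports accept connections. It does not verify credentials or that the expected service is the one listening.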
1. Provision secrets.
Copy the example file and fill in the API keys you plan to use. .env is gitignored. Provider keys are optional
individually. Leave a key blank and just don't pick that provider at request time.
Bash:
cp .env.example .env
PowerShell:
Copy-Item .env.example .env
2. Bring up the stack.
The main compose file pulls in ascend-scrapper.docker-compose.yaml via include:,
so a single up brings up the full stack (AscendAgent + tool services + scrapper).
Bash:
docker compose up -d --build
PowerShell:
docker compose up -d --build
Optional, bring up only the web-scraping stack as its own Docker Desktop group.
Bash:
docker compose -f ascend-scrapper.docker-compose.yaml up -d --build
PowerShell:
docker compose -f ascend-scrapper.docker-compose.yaml up -d --build
3. Ensure PostgreSQL has the ascend_ai database (user postgres, password local).
On first start the agent creates the MinIO knowledge-base bucket and initialises metadata tables. The API is then
available at http://localhost:9917. Check the startup banner for live status of every
dependency.
4. Optional: run the agent on the host.
For active development with hot reload and an attached debugger, run the agent on the host instead of in the container.
Stop the container first (docker compose stop ascend-agent) so port 9917 is free.
Bash:
cd AscendAgent
./gradlew bootRun
PowerShell:
cd AscendAgent
./gradlew bootRun
For advanced compose flags, per-service rebuilds, and production notes see docs/DEPLOYMENT.md. For document ingestion see docs/INGESTION.md.
Per-request selection across the providers below. Models listed are the ones currently wired in
application.yaml (chat default, memory extraction, and history
compaction). Any model the provider accepts works at request time via the model form field; these are the values
that ship with the agent.
- OpenAI. gpt-4o (default), gpt-4o-mini (extraction + compaction).
- Anthropic. claude-sonnet-4-5 (default), claude-3-5-haiku-20241022 (extraction), claude-haiku-4-5 (compaction).
- Gemini. gemini-flash-latest (default), gemini-flash-lite-latest (extraction + compaction).
- MiniMax. MiniMax-M2.7 (default + extraction + compaction).
- LM Studio. meta-llama-3.1-8b-instruct (default, local).
Override per request with the provider and model form fields, or globally via the *_MODEL env vars listed in
.env.example.
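The override order can be pictured as a small lookup. In the Python sketch below, a model passed on the request wins, then an environment variable, then the shipped chat default. Treat the details as assumptions: the exact precedence between the two overrides, the env var names (built here as `<PROVIDER>_MODEL` to match the `*_MODEL` pattern), and the provider identifiers other than `anthropic` and `lmstudio` are not confirmed by this README. The real variable names are in .env.example.

```python
import os

# Shipped chat defaults as listed above.
SHIPPED_DEFAULTS = {
    "openai": "gpt-4o",
    "anthropic": "claude-sonnet-4-5",
    "gemini": "gemini-flash-latest",
    "minimax": "MiniMax-M2.7",
    "lmstudio": "meta-llama-3.1-8b-instruct",
}


def resolve_model(provider, requested=None, env=None):
    """Pick the chat model: request field, then env var, then shipped default."""
    env = os.environ if env is None else env
    if provider not in SHIPPED_DEFAULTS:
        raise ValueError(f"unknown provider: {provider}")
    if requested:
        return requested
    # Hypothetical variable name, e.g. ANTHROPIC_MODEL.
    return env.get(f"{provider.upper()}_MODEL") or SHIPPED_DEFAULTS[provider]
```

For example, `resolve_model("anthropic", requested="claude-sonnet-4-6")` returns the requested model regardless of what the environment says.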
Each service ships both REST and MCP surfaces (except WeatherMCP, MCP-only). The "Used by AscendAgent via" column shows the actual transport AscendAgent uses today. The other surface is available for direct external use.
| Service | Port | Surfaces | Used by AscendAgent via | Role |
|---|---|---|---|---|
| AscendAgent | 9917 | REST | (this is the agent) | API gateway and orchestrator. POST /api/v1/ai/prompt is the entry. |
| AscendMemory | 7020 | REST + MCP | REST | Semantic memory store (Mem0 + Qdrant). Search / insert per user. |
| AudioScribe | 7017 | REST + MCP | MCP (Streamable HTTP) | Speech-to-text (faster-whisper / OpenAI / HF / Audacity merge). |
| AscendWebSearch | 7021 | REST + MCP | MCP (Streamable HTTP) | Web search + content extraction (SearXNG, Cloudflare, NoVNC). |
| PaddleOCR | 7022 | REST + MCP | MCP (Streamable HTTP) | Image OCR. |
| WeatherMCP | 9998 | MCP only (SSE) | MCP (SSE) | Weather data tool (reference Spring AI MCP server). |
| Service | Port | Default credentials | Role |
|---|---|---|---|
| SearXNG | 9020 | (none) | Privacy-respecting meta-search; backend for AscendWebSearch. |
| FlareSolverr | 8191 | (none) | Cloudflare bypass proxy used by AscendWebSearch. |
| ngrok (web-search) | (none) | NGROK_AUTHTOKEN | Public tunnel to AscendWebSearch's NoVNC for remote CAPTCHA intervention. |
| Docling Serve | 5001 | (none) | PDF / DOCX to structured JSON (used by ingestion pipeline). |
| Unstructured API | 9080 | (none) | Generic document parsing fallback for ingestion. |
Full setup and usage in observability/README.md.
| Service | Port | Exposed | Role |
|---|---|---|---|
| Grafana | 7078 → 3000 | Browser UI | Dashboards + Explore. Anonymous Viewer; admin / admin to edit. |
| Prometheus | 7077 → 9090 | Browser UI | Scrapes metrics from the 6 services, Qdrant, and MinIO. |
| Loki | 3100 | Internal only | Log store; receives logs from Vector. |
| Tempo | (none) | Internal only | Trace store; receives traces from the OTel Collector. |
| Vector | (none) | Internal only | Tails the 6 app containers' Docker logs and ships them to Loki. |
| OTel Collector | (none) | Internal only | Receives OTLP traces from the services and exports them to Tempo. |
| Service | Port | Default credentials | Role |
|---|---|---|---|
| PostgreSQL | 5432 | postgres / local | Chat-history archive, ingestion metadata, user instructions. |
| Redis | 6379 | (none) | Short-term chat-history cache, session state. |
| Qdrant | 6333 / 6334 | (none) | Vector DB for RAG (ascendai-768/1536) and Mem0 memory. |
| MinIO | 9070 / 9071 | admin / password | S3-compatible object store for ingested documents. |
Canonical index. Every doc the repo ships, in one place.
| File | What's in it |
|---|---|
| docs/architecture/README.md | Monorepo architecture: system view, ADRs, deployment topology. |
| docs/architecture/arc42/01-introduction-and-goals.md | Arc42 entry point for the platform. |
| AscendAgent/docs/architecture/arc42/01-introduction-and-goals.md | Arc42 for the agent internals. |
| docs/DEPLOYMENT.md | Docker Compose recipes, image publishing, prod notes. |
| docs/INGESTION.md | Upload flows for the RAG pipeline. |
| docs/TROUBLESHOOTING.md | Qdrant / MinIO / PostgreSQL / Redis reset recipes. |
| docs/OBSERVABILITY.md | Metrics, logs, traces — what is collected, dashboards, how to instrument. |
| observability/README.md | Observability stack services (Grafana / Prometheus / Loki / Tempo / Vector / OTel), pipeline, and how to view logs. |
| docs/AGENT_TOOLING.md | Agent-standards import, OpenSpec workflow. |
| docs/AGENTS-UPDATE.md | Per-OS selective refresh of skills, subagents, and shipped docs. |
| docs/MCP_SETUP.md | How to configure the MCP servers wired into agent sessions. |
| AscendAgent/e2e/README.md | End-to-end capability tests, fixtures, Bruno collection. |
| AGENTS.md | Shared instructions for any AI coding agent operating in this repo. |
| .github/workflows/README.md | CI and Release workflow operator notes: secrets, bump convention, how to cut a release. |
Released under the MIT License.