AscendAI

Multi-provider AI orchestrator with MCP, RAG, and semantic memory. Built on Spring AI.

License: MIT · Java 21 · Spring Boot 3.5 · Spring AI 1.1.5 · MCP enabled · Python 3.11+ · Docker Compose

PostgreSQL · Redis · Qdrant · MinIO · FastAPI · Mem0 · SearXNG · Liquibase

graph TB
    User["User"]

    subgraph "AscendAI Platform"
        Agent["AscendAgent<br/>REST API :9917<br/>Spring Boot · Java 21"]

        subgraph "MCP Tool Services"
            AudioScribe["AudioScribe<br/>:7017<br/>Audio Transcription"]
            Weather["WeatherMCP<br/>:9998<br/>Weather Data"]
            WebSearch["AscendWebSearch<br/>:7021<br/>Web Search"]
            PaddleOCR["PaddleOCR<br/>:7022<br/>OCR"]
        end

        Memory["AscendMemory<br/>:7020<br/>Semantic Memory"]
    end

    subgraph "AI Providers"
        Local["LM Studio<br/>(local, default)"]
        Cloud["OpenAI · Anthropic<br/>Gemini · MiniMax"]
    end

    subgraph "Data Layer"
        PG["PostgreSQL<br/>Chat · Metadata"]
        RD["Redis<br/>Chat cache"]
        QD["Qdrant<br/>Vector DB"]
        S3["MinIO<br/>Documents"]
    end

    User -->|"REST"| Agent
    Agent -->|"MCP"| AudioScribe
    Agent -->|"MCP"| Weather
    Agent -->|"MCP"| WebSearch
    Agent -->|"MCP"| PaddleOCR
    Agent -->|"REST"| Memory
    Agent -.-> Local
    Agent -.-> Cloud
    Agent --> PG
    Agent --> RD
    Agent --> QD
    Agent --> S3
    Memory --> QD


Why this exists

I built AscendAI because I couldn't find an off-the-shelf orchestrator that does three things in one coherent platform: swap providers per request, run a privacy-respecting search backend you fully control, and persist semantic memory across sessions. AscendAI does all three. Each prompt routes to the model you pick at call time (local LM Studio, OpenAI, Anthropic, Gemini, MiniMax). MCP tool servers handle audio, web, weather, and OCR. Long-term context lives in a Mem0-backed Qdrant store so conversations actually accumulate knowledge.


Features

  • Per-request provider routing. Pick LM Studio, OpenAI, Anthropic, Gemini, or MiniMax on every API call without restarts or config changes.
  • RAG pipeline with Qdrant. Thresholded soft-retrieval over ingested documents using provider-matched embedding dimensions (768 / 1536).
  • Semantic memory via Mem0. Long-lived, user-scoped memories searchable across sessions through the AscendMemory service.
  • MCP tool servers. First-class integrations for audio transcription (AudioScribe), web search (AscendWebSearch + SearXNG), weather (WeatherMCP), and OCR (PaddleOCR).
  • Document ingestion to MinIO. Drop files (Markdown, PDF, DOCX) into a bucket and the pipeline parses them via Docling / Unstructured and indexes them automatically.
  • Hybrid chat history. Redis for the active context window, PostgreSQL for durable long-term archives and analytics.
  • Privacy-respecting web. SearXNG meta-search plus FlareSolverr for Cloudflare-protected pages, all self-hosted.
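
To make the "thresholded soft-retrieval" bullet concrete, here is a minimal Python sketch. It is not the agent's code (that lives in Java / Spring AI). It assumes cosine similarity, a hypothetical threshold of 0.5, and that "soft" means an empty result is acceptable and the prompt simply goes out without RAG context. The collection names are the ones mentioned elsewhere in this README; every function name is illustrative.

```python
import math

# Collection names as they appear in this README; the lookup logic is an assumption.
COLLECTIONS = {768: "ascendai-768", 1536: "ascendai-1536"}


def collection_for(embedding: list) -> str:
    """Pick the collection whose dimension matches the query embedding."""
    dim = len(embedding)
    if dim not in COLLECTIONS:
        raise ValueError(f"unsupported embedding dimension: {dim}")
    return COLLECTIONS[dim]


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soft_retrieve(query: list, chunks: list, threshold: float = 0.5, top_k: int = 3) -> list:
    """Return up to top_k (score, text) pairs scoring at or above threshold.

    `chunks` is a list of (text, vector) pairs. An empty list is a valid
    result: nothing relevant enough was found, so no context is injected.
    """
    scored = [(cosine(query, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:top_k]
```

The threshold is what keeps weakly related chunks out of the prompt; a plain top-k search would always return something, relevant or not.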

How it compares

The honest peer set is other deployable AI orchestration backends that bundle multi-provider routing, RAG, memory, and tools into one self-hosted service. Not chat UIs, not low-code workflow builders, not pure router proxies, not libraries. The four services below (R2R, Letta, Onyx, Quivr) are mature and well-known in this niche.

| | AscendAI | R2R | Letta | Onyx | Quivr | LangChain |
|---|---|---|---|---|---|---|
| Shape | Deployable service | Deployable service | Deployable service | Deployable service | Deployable service | Library / framework |
| Stack | Java 21 / Spring AI | Python | Python | Python | Python | Python (TS port) |
| API-first (no UI shipped) | Yes | Yes | Yes (server on :8283) | UI bundled, API-driven | UI bundled, API exposed | N/A, you build it |
| Per-request provider switch | Built-in | Built-in | Built-in | Built-in | Built-in | Possible via chain rebuild |
| RAG over uploaded docs | Built-in (Qdrant + threshold) | Built-in (multimodal, hybrid, KGs) | Lighter, agent-state focused | Built-in (40+ connectors) | Built-in (pluggable stores) | Many backends, you wire it |
| Persistent semantic memory | Mem0 + Qdrant | Add-on | Native (OS-style hierarchical) | Add-on | Built-in | Roll-your-own |
| Tool integration model | MCP-native (Spring AI MCP client) | Function tools | Function tools | Function tools + connectors | Function tools | Tools + MCP via adapters |
| Single docker compose deploy | Yes | Yes | Yes | Yes | Yes | Bring-your-own |

LangChain isn't strictly a peer. It's a framework, not a deployable service. It's in the table because it's the most likely thing readers reach for when they think "AI orchestration", and the honest answer is "if you're already wiring your own service in LangChain, you don't need AscendAI."

Where AscendAI is honestly distinctive

  • JVM-native. Every credible peer in this niche is Python or TypeScript. If you live in Spring Boot already, AscendAI drops in alongside the rest of your services without a polyglot deploy.
  • MCP-first tool model. Onyx and Letta do tool use; AscendAI is built around MCP from day one with multiple bundled MCP servers (audio, OCR, web search, weather). Add new tools by pointing the agent at another MCP server, no code changes.
  • Breadth of integration in one stack. RAG, semantic memory, MCP tools, multi-provider routing, hot / archive chat history. All present, no add-ons.

Where it loses

  • No UI. Onyx and Quivr ship one. AscendAI is a backend you put behind your own client.
  • Smaller community. All four peers above have more stars, more contributors, more battle testing.
  • RAG depth. R2R has a more sophisticated RAG pipeline (knowledge graphs, multimodal). AscendAI's RAG is solid but plain.
  • Memory depth. Letta's memory architecture is more advanced than the Mem0-based memory here.

If you're already happy in Python with R2R or Letta, you don't need this. AscendAI exists because I wanted these capabilities in a Spring-native deployment.


Demo

Send a prompt with per-request provider and model selection. The endpoint accepts multipart/form-data (so you can attach an optional image or document).

Bash:

curl -X POST http://localhost:9917/api/v1/ai/prompt \
  -H "X-User-Id: luksarna" \
  -F "prompt=Summarize my notes on Spring AI and MCP." \
  -F "provider=anthropic" \
  -F "model=claude-sonnet-4-6" \
  -F "embeddingProvider=lmstudio"

PowerShell 7+ (-Form supports multipart):

Invoke-RestMethod -Uri http://localhost:9917/api/v1/ai/prompt -Method Post -Headers @{ "X-User-Id" = "luksarna" } -Form @{ prompt = "Summarize my notes on Spring AI and MCP."; provider = "anthropic"; model = "claude-sonnet-4-6"; embeddingProvider = "lmstudio" }
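
Python (standard library only). This is a sketch of the same call, mirroring the endpoint, header, and form fields from the curl example above. The helper names are made up for illustration, and it only handles plain text fields, not the optional image or document attachment.

```python
import urllib.request
import uuid


def build_multipart(fields: dict) -> tuple:
    """Encode plain text fields as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_prompt_request(base_url: str, user_id: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/ai/prompt."""
    body, content_type = build_multipart(fields)
    return urllib.request.Request(
        f"{base_url}/api/v1/ai/prompt",
        data=body,
        headers={"X-User-Id": user_id, "Content-Type": content_type},
        method="POST",
    )


request = build_prompt_request(
    "http://localhost:9917",
    "luksarna",
    {
        "prompt": "Summarize my notes on Spring AI and MCP.",
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "embeddingProvider": "lmstudio",
    },
)
```

With the stack running, `urllib.request.urlopen(request)` sends it and returns the JSON body.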

Sample response (AiResponse: content plus an unwrapped Spring AI ChatResponseMetadata and the list of MCP tools invoked during the turn):

{
  "content": "Your notes describe AscendAI as a Spring AI orchestrator that routes prompts across providers and uses MCP for tool calls. Per-request model selection happens via /api/v1/ai/prompt; RAG runs over Qdrant collections (ascendai-768 / -1536); semantic memory is backed by Mem0…",
  "id": "msg_01ABcDEf…",
  "model": "claude-sonnet-4-6",
  "usage": { "promptTokens": 1842, "completionTokens": 312, "totalTokens": 2154 },
  "toolsUsed": ["ascend_memory_search", "web_search"]
}
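
On the client side, a response of this shape can be read into a small typed object. The sketch below uses only the field names visible in the sample; which fields are optional is an assumption, and the Python `AiResponse` class here is a hypothetical client-side mirror, not the agent's Java type.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AiResponse:
    content: str
    model: str
    total_tokens: int
    tools_used: list = field(default_factory=list)


def parse_ai_response(raw: str) -> AiResponse:
    """Parse the JSON body returned by POST /api/v1/ai/prompt."""
    data = json.loads(raw)
    usage = data.get("usage") or {}  # assumed optional
    return AiResponse(
        content=data["content"],
        model=data.get("model", ""),
        total_tokens=usage.get("totalTokens", 0),
        tools_used=list(data.get("toolsUsed") or []),
    )
```

`toolsUsed` is the useful field when debugging: it tells you whether the turn was answered from the model alone or went through MCP tools.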

Architecture

Two architecture entry points, depending on what you're after.

| Module | Stack | Port | Role |
|---|---|---|---|
| AscendAgent | Java 21 / Spring Boot | 9917 | API gateway, multi-provider AI, RAG, MCP client |
| AudioScribe | Python / FastMCP | 7017 | Audio transcription (Whisper / OpenAI / HF) |
| AscendWebSearch | Python / FastMCP | 7021 | Web search and scraping via SearXNG |
| AscendMemory | Python / FastAPI | 7020 | Semantic memory (Mem0 + Qdrant) |
| WeatherMCP | Java / Spring Boot | 9998 | Weather data MCP server |
| PaddleOCR | Python / FastMCP | 7022 | OCR service |

Request flow

How a single prompt traverses the platform.

sequenceDiagram
    autonumber
    participant U as User
    participant A as AscendAgent
    participant R as Redis<br/>(short-term)
    participant M as AscendMemory<br/>(Mem0)
    participant Q as Qdrant<br/>(RAG)
    participant T as MCP Tools<br/>(Web/Audio/OCR/Weather)
    participant L as LLM Provider<br/>(per-request)
    participant P as PostgreSQL<br/>(long-term)

    U->>A: POST /api/v1/ai/prompt
    A->>R: Load chat window
    A->>M: Search semantic memory
    M->>Q: Vector search (memory collection)
    M-->>A: Top-k memories
    A->>Q: RAG retrieval (doc collection)
    Q-->>A: Top-k chunks (above threshold)
    A->>L: Prompt + memory + RAG + tool defs
    L-->>A: Tool call request (optional)
    A->>T: Invoke MCP tool
    T-->>A: Tool result
    A->>L: Tool result, then final answer
    L-->>A: Response + usage
    A->>R: Append turn
    A->>P: Persist transcript
    A->>M: Extract and store new memories
    A-->>U: AiResponse {content, metadata, toolsUsed}
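
The same flow, reduced to a Python sketch with in-memory stand-ins so the ordering is easy to follow. None of this is the agent's real code: `ChatHistory`, `handle_prompt`, the callables, the reply dictionary shape, and the window size are all hypothetical, and step 16 (memory extraction) is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChatHistory:
    """In-memory stand-ins: `window` plays Redis, `archive` plays PostgreSQL."""
    window_size: int = 4  # hypothetical cap on the hot context window
    window: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def append(self, turn: str) -> None:
        self.archive.append(turn)  # durable transcript, never trimmed
        self.window = (self.window + [turn])[-self.window_size:]  # keep newest turns only


def handle_prompt(prompt: str, history: ChatHistory,
                  search_memory: Callable, retrieve_chunks: Callable,
                  call_llm: Callable, tools: dict, max_tool_rounds: int = 3) -> dict:
    # Steps 2-7: gather context before the first model call.
    context = {
        "history": list(history.window),
        "memories": search_memory(prompt),
        "chunks": retrieve_chunks(prompt),
    }
    tools_used = []
    # Steps 8-13: call the model, running any tool it asks for.
    reply = call_llm(prompt, context, None)
    rounds = 0
    while reply.get("tool") and rounds < max_tool_rounds:
        name = reply["tool"]
        tools_used.append(name)
        result = tools[name](reply.get("args", {}))
        reply = call_llm(prompt, context, result)
        rounds += 1
    content = reply.get("content", "")
    # Steps 14-15: update the hot window and the durable archive.
    history.append(f"user: {prompt}")
    history.append(f"assistant: {content}")
    return {"content": content, "toolsUsed": tools_used}
```

The cap on tool rounds is a design choice of the sketch: it guarantees the loop ends even if the model keeps requesting tools.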

Quick Start

Prerequisites

  • Docker Desktop
  • Java 21
  • PostgreSQL on 5432, Redis on 6379, Qdrant on 6333 / 6334, MinIO on 9070 / 9071 (admin / password)
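
Before bringing the stack up, it can help to confirm those external services are actually listening. The following is a small pre-flight sketch in Python, not part of the repo. It assumes everything runs on localhost and checks only the first port of each pair listed above.

```python
import socket

# Ports from the prerequisites list; the host is an assumption.
DEPENDENCIES = {"PostgreSQL": 5432, "Redis": 6379, "Qdrant": 6333, "MinIO": 9070}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dependencies(host: str = "localhost", deps: dict = DEPENDENCIES) -> dict:
    """Map each dependency name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in deps.items()}
```

Calling `check_dependencies()` returns a dictionary such as `{"PostgreSQL": True, "Redis": False, ...}`. A `True` only means the port accepts TCP connections; it does not verify credentials or that the `ascend_ai` database exists.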

Run it

1. Provision secrets.

Copy the example file and fill in the API keys you plan to use. .env is gitignored. Provider keys are optional individually. Leave a key blank and just don't pick that provider at request time.

Bash:

cp .env.example .env

PowerShell:

Copy-Item .env.example .env

2. Bring up the stack.

The main compose file pulls in ascend-scrapper.docker-compose.yaml via include:, so a single up brings up the full stack (AscendAgent + tool services + scrapper).

Bash:

docker compose up -d --build

PowerShell:

docker compose up -d --build

Optional, bring up only the web-scraping stack as its own Docker Desktop group.

Bash:

docker compose -f ascend-scrapper.docker-compose.yaml up -d --build

PowerShell:

docker compose -f ascend-scrapper.docker-compose.yaml up -d --build

3. Ensure PostgreSQL has the ascend_ai database (user postgres, password local).

On first start the agent creates the MinIO knowledge-base bucket and initialises metadata tables. The API is then available at http://localhost:9917. Check the startup banner for live status of every dependency.

4. Optional: run the agent on the host.

For active development with hot reload and an attached debugger, run the agent on the host instead of in the container. Stop the container first (docker compose stop ascend-agent) so port 9917 is free.

Bash:

cd AscendAgent
./gradlew bootRun

PowerShell:

cd AscendAgent
./gradlew bootRun

For advanced compose flags, per-service rebuilds, and production notes see docs/DEPLOYMENT.md. For document ingestion see docs/INGESTION.md.


Supported AI Providers

Per-request selection across the providers below. Models listed are the ones currently wired in application.yaml (chat default, memory extraction, and history compaction). Any model the provider accepts works at request time via the model form field; these are the values that ship with the agent.

  • OpenAI. gpt-4o (default), gpt-4o-mini (extraction + compaction).
  • Anthropic. claude-sonnet-4-5 (default), claude-3-5-haiku-20241022 (extraction), claude-haiku-4-5 (compaction).
  • Gemini. gemini-flash-latest (default), gemini-flash-lite-latest (extraction + compaction).
  • MiniMax. MiniMax-M2.7 (default + extraction + compaction).
  • LM Studio. meta-llama-3.1-8b-instruct (default, local).

Override per request with the provider and model form fields, or globally via the *_MODEL env vars listed in .env.example.
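
As a rough illustration of how those layers could combine, here is a Python sketch of model resolution. The defaults are the chat defaults listed above. The precedence (request field, then env var, then shipped default) is an assumption, as are the lowercase provider keys other than `anthropic` and `lmstudio`, and the exact env var names (`OPENAI_MODEL` and so on, which follow the `*_MODEL` pattern but should be checked against .env.example).

```python
from typing import Optional

# Chat defaults as listed in this README.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "anthropic": "claude-sonnet-4-5",
    "gemini": "gemini-flash-latest",
    "minimax": "MiniMax-M2.7",
    "lmstudio": "meta-llama-3.1-8b-instruct",
}


def resolve_model(provider: str, requested: Optional[str] = None,
                  env: Optional[dict] = None) -> str:
    """Pick a model: per-request value, then env override, then shipped default."""
    key = provider.lower()
    if key not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider}")
    if requested:
        return requested  # the `model` form field wins
    override = (env or {}).get(f"{key.upper()}_MODEL")  # hypothetical variable name
    return override or DEFAULT_MODELS[key]
```

For example, `resolve_model("anthropic")` falls back to `claude-sonnet-4-5`, while passing `requested="claude-sonnet-4-6"` mirrors what the Demo request does.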


Configuration & Ports

AscendAI services

Apart from AscendAgent itself (REST only) and WeatherMCP (MCP only), each service ships both REST and MCP surfaces. The "Used by AscendAgent via" column shows the transport AscendAgent actually uses today; the other surface is available for direct external use.

| Service | Port | Surfaces | Used by AscendAgent via | Role |
|---|---|---|---|---|
| AscendAgent | 9917 | REST | (this is the agent) | API gateway and orchestrator. POST /api/v1/ai/prompt is the entry. |
| AscendMemory | 7020 | REST + MCP | REST | Semantic memory store (Mem0 + Qdrant). Search / insert per user. |
| AudioScribe | 7017 | REST + MCP | MCP (Streamable HTTP) | Speech-to-text (faster-whisper / OpenAI / HF / Audacity merge). |
| AscendWebSearch | 7021 | REST + MCP | MCP (Streamable HTTP) | Web search + content extraction (SearXNG, Cloudflare, NoVNC). |
| PaddleOCR | 7022 | REST + MCP | MCP (Streamable HTTP) | Image OCR. |
| WeatherMCP | 9998 | MCP only (SSE) | MCP (SSE) | Weather data tool (reference Spring AI MCP server). |

Support services (in-stack, deployed via compose)

| Service | Port | Default credentials | Role |
|---|---|---|---|
| SearXNG | 9020 | (none) | Privacy-respecting meta-search; backend for AscendWebSearch. |
| FlareSolverr | 8191 | (none) | Cloudflare bypass proxy used by AscendWebSearch. |
| ngrok (web-search) | (none) | NGROK_AUTHTOKEN | Public tunnel to AscendWebSearch's NoVNC for remote CAPTCHA intervention. |
| Docling Serve | 5001 | (none) | PDF / DOCX to structured JSON (used by ingestion pipeline). |
| Unstructured API | 9080 | (none) | Generic document parsing fallback for ingestion. |

Observability stack (in-stack, deployed via compose)

Full setup and usage in observability/README.md.

| Service | Port | Exposed | Role |
|---|---|---|---|
| Grafana | 7078 → 3000 | Browser UI | Dashboards + Explore. Anonymous Viewer; admin / admin to edit. |
| Prometheus | 7077 → 9090 | Browser UI | Scrapes metrics from the 6 services, Qdrant, and MinIO. |
| Loki | 3100 | Internal only | Log store; receives logs from Vector. |
| Tempo | (none) | Internal only | Trace store; receives traces from the OTel Collector. |
| Vector | (none) | Internal only | Tails the 6 app containers' Docker logs and ships them to Loki. |
| OTel Collector | (none) | Internal only | Receives OTLP traces from the services and exports them to Tempo. |

External prerequisites (not in compose, managed / cloud in production)

| Service | Port | Default credentials | Role |
|---|---|---|---|
| PostgreSQL | 5432 | postgres / local | Chat-history archive, ingestion metadata, user instructions. |
| Redis | 6379 | (none) | Short-term chat-history cache, session state. |
| Qdrant | 6333 / 6334 | (none) | Vector DB for RAG (ascendai-768/1536) and Mem0 memory. |
| MinIO | 9070 / 9071 | admin / password | S3-compatible object store for ingested documents. |

Documentation

Canonical index. Every doc the repo ships, in one place.

| File | What's in it |
|---|---|
| docs/architecture/README.md | Monorepo architecture: system view, ADRs, deployment topology. |
| docs/architecture/arc42/01-introduction-and-goals.md | Arc42 entry point for the platform. |
| AscendAgent/docs/architecture/arc42/01-introduction-and-goals.md | Arc42 for the agent internals. |
| docs/DEPLOYMENT.md | Docker Compose recipes, image publishing, prod notes. |
| docs/INGESTION.md | Upload flows for the RAG pipeline. |
| docs/TROUBLESHOOTING.md | Qdrant / MinIO / PostgreSQL / Redis reset recipes. |
| docs/OBSERVABILITY.md | Metrics, logs, traces: what is collected, dashboards, how to instrument. |
| observability/README.md | Observability stack services (Grafana / Prometheus / Loki / Tempo / Vector / OTel), pipeline, and how to view logs. |
| docs/AGENT_TOOLING.md | Agent-standards import, OpenSpec workflow. |
| docs/AGENTS-UPDATE.md | Per-OS selective refresh of skills, subagents, and shipped docs. |
| docs/MCP_SETUP.md | How to configure the MCP servers wired into agent sessions. |
| AscendAgent/e2e/README.md | End-to-end capability tests, fixtures, Bruno collection. |
| AGENTS.md | Shared instructions for any AI coding agent operating in this repo. |
| .github/workflows/README.md | CI and Release workflow operator notes: secrets, bump convention, how to cut a release. |

License

Released under the MIT License.
