daimon

The spirit that runs alongside your AI app.


Daimon is a local sidecar process that gives your application a single, stable HTTP interface to any LLM. Swap providers, rotate keys, add tracing, wire up MCP tools, query vector stores, or traverse knowledge graphs — without touching your app code.

Inspired by Dapr's component model, adapted for AI-native primitives: streaming responses, pluggable providers, MCP tool calls, vector/graph stores, and persistent sessions.


How it works

your app  ──POST /v1/converse/claude──▶  daimon  ──▶  Anthropic API
          ◀── text/event-stream ────────────────────────────────────
                                            │
                                     MCP tool server(s)
                                   (filesystem, GitHub, ...)
                                            │
                                   vector stores (Chroma, Qdrant,
                                     Redis, pgvector, in-memory)
                                            │
                                   graph stores (Neo4j, Memgraph)

Daimon runs on localhost:3500. Your app speaks plain HTTP + Server-Sent Events. The provider, model, credentials, and tool servers all live in a YAML config — not in your code.


Quick start

Prerequisites: An OpenAI or Anthropic API key.

1 — Install

macOS / Linux — Homebrew

brew tap sonicboom15/tap
brew install daimon

Windows — winget

winget install sonicboom15.daimon

Windows — Scoop

scoop bucket add sonicboom15 https://github.com/sonicboom15/scoop-bucket
scoop install daimon

Linux — deb / rpm

Download the .deb or .rpm from the latest release and install with dpkg -i or rpm -i.

Build from source

git clone https://github.com/sonicboom15/daimon.git && cd daimon && make build
# → ./bin/daimon

2 — Create a config

# config.yaml
port: 3500

components:
  - name: claude
    type: anthropic
    metadata:
      default_model: claude-haiku-4-5-20251001
      # api_key: sk-ant-...  # or set ANTHROPIC_API_KEY

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o-mini
      # api_key: sk-...  # or set OPENAI_API_KEY

3 — Run

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
daimon serve --config config.yaml
INFO daimon listening addr=127.0.0.1:3500

4 — First request

curl:

curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is a daimon?"}]}'
data: {"type":"text","text":"In ancient Greek thought, a daimon"}
data: {"type":"text","text":" is a guiding spirit..."}
data: {"type":"done"}

Python SDK:

pip install daimon-client
import daimon_client as daimon

with daimon.Client() as client:
    for text in client.stream("claude", "What is a daimon?"):
        print(text, end="", flush=True)

TypeScript SDK:

npm install daimon-client
import { Client } from 'daimon-client';

const client = new Client();
for await (const text of client.stream('claude', 'What is a daimon?')) {
  process.stdout.write(text);
}

Configuration

port: 3500

components:

  # ── Embedder (declare before vector stores) ──────────────────────────────
  # - name: embedder
  #   type: embedding/openai
  #   metadata:
  #     base_url: http://localhost:11434/v1   # Ollama; omit for OpenAI
  #     model: nomic-embed-text
  #     dimensions: "768"

  # ── Session store (optional; defaults to in-memory) ──────────────────────
  # - name: sessions
  #   type: session/redis
  #   metadata:
  #     addr: localhost:6379
  #     ttl: "24h"

  # ── Vector / document stores ─────────────────────────────────────────────
  # - name: docs
  #   type: inmemory          # BM25 lexical, no deps — dev/testing only
  #
  # - name: chroma-docs
  #   type: chroma
  #   metadata:
  #     base_url: http://localhost:8000
  #     collection: daimon
  #     create_if_missing: "true"
  #
  # - name: qdrant-docs
  #   type: qdrant
  #   metadata:
  #     base_url: http://localhost:6333
  #     collection: daimon
  #     embedder: embedder
  #     create_if_missing: "true"

  # ── Graph stores ──────────────────────────────────────────────────────────
  # - name: kg
  #   type: neo4j
  #   metadata:
  #     bolt_url: bolt://localhost:7687
  #     username: neo4j
  #     password: secret

  # ── LLM components ────────────────────────────────────────────────────────
  - name: claude
    type: anthropic
    # memory_store: chroma-docs   # enable transparent RAG from a vector store
    metadata:
      default_model: claude-opus-4-7
      # api_key: sk-ant-...  # or set ANTHROPIC_API_KEY
    # defaults:
    #   temperature: 1.0
    #   max_tokens: 4096
    #   top_p: 0.9
    #   top_k: 50          # Anthropic-specific
    #   stop: ["Human:"]
    #   system: "You are a helpful assistant."

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o
      # api_key: sk-...  # or set OPENAI_API_KEY
    # defaults:
    #   temperature: 0.7
    #   max_tokens: 2048
    #   frequency_penalty: 0.0
    #   presence_penalty: 0.0
    #   seed: 42

  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1   # Ollama default
      # base_url: http://localhost:1234/v1  # LM Studio default
      # base_url: http://localhost:8080/v1  # llama.cpp default
      default_model: llama3.2:3b

# MCP tool servers — daimon connects at startup and injects their tools into
# every chat request automatically. The model can call them; daimon runs the loop.
# mcp_servers:
#   - name: filesystem
#     command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
#   - name: github
#     command: ["npx", "-y", "@modelcontextprotocol/server-github"]

telemetry:
  otlp_endpoint: ""   # e.g. "localhost:4318" — leave empty to disable

All component types — LLMs, embedders, session stores, vector stores, and graph stores — live under components:. Declaration order matters: embedders before vector stores, vector stores before LLMs that reference them via memory_store:. See examples/config.yaml for the fully-documented reference.


API

POST /v1/converse/{component}

Send a chat request and receive a streaming response over Server-Sent Events.

Request body:

{
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "What is a daimon?" }
  ],
  "model":             "gpt-4o-mini",
  "system":            "Override or set a system prompt here.",
  "max_tokens":        512,
  "temperature":       0.7,
  "top_p":             0.9,
  "top_k":             50,
  "stop":              ["Human:"],
  "frequency_penalty": 0.0,
  "presence_penalty":  0.0,
  "seed":              42,
  "tools": [
    {
      "name":        "get_weather",
      "description": "Get current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required":   ["city"]
      }
    }
  ]
}

All fields except messages are optional. Omitted inference parameters fall back to the component's configured defaults.

Sessions: include "session_id" to have daimon maintain conversation history server-side. Only send the new user turn — the server prepends stored history automatically.

# Turn 1
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"My name is Alice."}]}'

# Turn 2 — server prepends the previous exchange automatically
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"What is my name?"}]}'

Clear a session with DELETE /v1/sessions/{id} (returns 204, idempotent).

Provider support matrix:

Parameter           OpenAI   Anthropic   llamacpp
temperature         ✓        ✓           ✓
max_tokens          ✓        ✓           ✓
top_p               ✓        ✓           ✓
top_k               ✗        ✓           ✓
stop                ✓        ✓           ✓
frequency_penalty   ✓        ✗           ✓
presence_penalty    ✓        ✗           ✓
seed                ✓        ✗           ✓

Unsupported parameters are silently ignored per provider.

Response (text/event-stream):

data: {"type":"text","text":"In ancient Greek thought..."}

data: {"type":"tool_call","tool_call":{"id":"call_1","name":"get_weather","input":{"city":"London"}}}

data: {"type":"text","text":"The weather in London is 12°C."}

data: {"type":"done"}

Each data: line is a JSON object:

type        additional fields              meaning
text        text                           a fragment of the model's response
tool_call   tool_call.id, .name, .input    model invoked a tool (daimon executes it and continues)
done        (none)                         stream finished successfully
error       error                          terminal error; stream ends

tool_call events are forwarded so clients can show progress ("calling tool X…"). Daimon executes the tool automatically and loops back to the model — no client-side action needed.
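
If you are not using one of the SDKs, the stream is straightforward to consume directly. Below is a minimal sketch in Python using the third-party requests library (an assumption; any HTTP client that can stream works) to parse the data: lines shown above.

import json
import requests  # assumption: any streaming-capable HTTP client works

resp = requests.post(
    "http://127.0.0.1:3500/v1/converse/claude",
    json={"messages": [{"role": "user", "content": "What is a daimon?"}]},
    stream=True,
)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank separators
    event = json.loads(line[len("data: "):])
    if event["type"] == "text":
        print(event["text"], end="", flush=True)
    elif event["type"] == "tool_call":
        print(f"\n[calling {event['tool_call']['name']}...]")
    elif event["type"] == "error":
        raise RuntimeError(event["error"])
    elif event["type"] == "done":
        break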

DELETE /v1/sessions/{id}

Clears server-side session history for the given ID. Returns 204 No Content. Idempotent — deleting a session that does not exist is not an error.

GET /healthz

Returns 200 ok when the sidecar is up.
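
Clients that launch daimon as a child process can poll this endpoint before sending traffic. A minimal sketch using Python's standard library, assuming the default address:

import time
import urllib.request

def wait_for_daimon(url="http://127.0.0.1:3500/healthz", timeout=30.0):
    """Poll /healthz until daimon answers 200, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # sidecar not up yet
        time.sleep(0.5)
    raise TimeoutError("daimon did not become ready in time")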


Python SDK

Install:

pip install daimon-client

Streaming text:

import daimon_client as daimon

# context manager reuses the HTTP connection
with daimon.Client() as client:
    for text in client.stream("claude", "Explain recursion in one sentence."):
        print(text, end="", flush=True)
print()

Convenience: collect the full response:

reply = client.chat("gpt4o", "What is the capital of France?")
print(reply)  # "The capital of France is Paris."

Multi-turn conversation:

messages = [
    daimon.Message(role="system", content="You are a helpful assistant."),
    daimon.Message(role="user",   content="My name is Alice."),
]
reply = client.chat("claude", messages)
messages.append(daimon.Message(role="assistant", content=reply))
messages.append(daimon.Message(role="user", content="What is my name?"))
print(client.chat("claude", messages))

Sessions:

client.chat("claude", "My name is Alice.", session_id="chat-1")
reply = client.chat("claude", "What is my name?", session_id="chat-1")
# reply: "Your name is Alice."
client.clear_session("chat-1")

With inference parameters:

reply = client.chat(
    "gpt4o",
    "Write a haiku about Go.",
    model="gpt-4o",
    temperature=0.9,
    max_tokens=64,
)

Observing tool calls:

def on_tool(tc: daimon.ToolCall) -> None:
    print(f"[tool: {tc.name}({tc.input})]")

for text in client.stream("claude", "What's the weather in Tokyo?", on_tool_call=on_tool):
    print(text, end="", flush=True)

Async:

import asyncio
import daimon_client as daimon

async def main():
    async with daimon.AsyncClient() as client:
        async for text in client.stream("claude", "Hello!"):
            print(text, end="", flush=True)

asyncio.run(main())

Full runnable examples: examples/client/chat.py · examples/client/chat_async.py


TypeScript SDK

Install:

npm install daimon-client

Streaming text:

import { Client } from 'daimon-client';

const client = new Client();
for await (const text of client.stream('claude', 'Explain recursion in one sentence.')) {
  process.stdout.write(text);
}

Convenience: collect the full response:

const reply = await client.chat('gpt4o', 'What is the capital of France?');
console.log(reply); // "The capital of France is Paris."

Sessions:

await client.chat('claude', 'My name is Alice.', { session_id: 'chat-1' });
const reply = await client.chat('claude', 'What is my name?', { session_id: 'chat-1' });
// reply: "Your name is Alice."
await client.clearSession('chat-1');

With inference parameters:

const reply = await client.chat('gpt4o', 'Write a haiku about Go.', {
  model:       'gpt-4o',
  temperature: 0.9,
  max_tokens:  64,
});

Full runnable examples: sdk/typescript/examples/


Tool calls via MCP

Daimon acts as an MCP client. Configure MCP servers in YAML and daimon:

  1. Connects to each server at startup and fetches its tool catalogue.
  2. Injects all tools into every chat request automatically.
  3. When the model calls a tool, daimon executes it via the MCP server and feeds the result back — looping until the model returns a plain text response.

Your application sees a single streaming response with the final answer, plus tool_call events for progress. Example configuration:

mcp_servers:
  - name: filesystem
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  - name: brave-search
    command: ["npx", "-y", "@modelcontextprotocol/server-brave-search"]

No client-side changes required.
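
For example, with the filesystem server configured above, a client can still watch tool activity through the Python SDK's on_tool_call hook. A sketch; the prompt is illustrative:

import daimon_client as daimon

def on_tool(tc: daimon.ToolCall) -> None:
    # Fired once per MCP tool invocation; daimon executes the tool itself.
    print(f"\n[tool: {tc.name}({tc.input})]")

with daimon.Client() as client:
    for text in client.stream("claude", "List the files in /tmp.", on_tool_call=on_tool):
        print(text, end="", flush=True)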


Memory & Graph Stores

Daimon ships with five vector stores and two graph stores, all configured the same way — as components: entries.

Vector stores

Type        External service         Embedding
inmemory    None                     BM25 (lexical)
chroma      Chroma                   Server-side
qdrant      Qdrant                   Configurable endpoint
redis       Redis Stack              Configurable endpoint
pgvector    PostgreSQL + pgvector    Configurable endpoint

HTTP API: PUT /v1/memory/{store}/{id} · POST /v1/memory/{store} · POST /v1/memory/{store}/query · DELETE /v1/memory/{store}/{id}

Python SDK:

store = client.memory("docs")
store.upsert("The Eiffel Tower is 330 m tall.", id="doc1", metadata={"src": "wiki"})
results = store.query("tall Paris structures", top_k=3)
# results[0].id, .content, .score, .metadata
store.delete("doc1")

TypeScript SDK:

const store = client.memory('docs');
await store.upsert('The Eiffel Tower is 330 m tall.', { id: 'doc1', metadata: { src: 'wiki' } });
const results = await store.query('tall Paris structures', 3);
await store.delete('doc1');

Transparent RAG

Add memory_store: <name> to any LLM component and daimon automatically queries the store before every chat request, injecting the top results as a system message:

- name: claude
  type: anthropic
  memory_store: chroma-docs

No client code changes needed — the enrichment happens inside the sidecar.
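
End to end, that means you can seed the store through the memory API and then chat as usual. A sketch using the Python SDK, assuming the chroma-docs and claude components from the config above:

import daimon_client as daimon

with daimon.Client() as client:
    # Seed the vector store that the "claude" component points at via memory_store.
    store = client.memory("chroma-docs")
    store.upsert("Daimon listens on 127.0.0.1:3500 by default.", id="note1",
                 metadata={"src": "readme"})

    # No RAG-specific request fields: daimon queries chroma-docs and injects
    # the top results as a system message before calling the model.
    print(client.chat("claude", "Which port does daimon listen on?"))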

Graph stores

Type       External service   Protocol
neo4j      Neo4j              Bolt (default) / HTTP
memgraph   Memgraph           Bolt (default) / HTTP

HTTP API: PUT /v1/graph/{store}/nodes/{id} · POST /v1/graph/{store}/edges · POST /v1/graph/{store}/cypher · DELETE /v1/graph/{store}/nodes/{id}

Python SDK:

graph = client.graph("kg")
graph.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
graph.add_edge("alice", "bob", "KNOWS")
rows = graph.cypher("MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name")

Both stores also generate {name}_cypher, {name}_add_node, and {name}_add_edge tools that the LLM can call directly via the agentic loop.
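
A sketch of that loop from the client side, assuming the kg graph component and the claude LLM component configured above: the model calls kg_cypher on its own, and the client only observes the tool_call events.

import daimon_client as daimon

with daimon.Client() as client:
    # Seed a tiny graph through the graph API.
    graph = client.graph("kg")
    graph.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
    graph.add_node(id="bob", labels=["Person"], props={"name": "Bob"})
    graph.add_edge("alice", "bob", "KNOWS")

    # Ask a question the model can only answer by querying the graph.
    def on_tool(tc: daimon.ToolCall) -> None:
        print(f"\n[tool: {tc.name}]")

    for text in client.stream("claude", "Who does Alice know?", on_tool_call=on_tool):
        print(text, end="", flush=True)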


Supported providers

Type        Env var              Default model
openai      OPENAI_API_KEY       gpt-4o
anthropic   ANTHROPIC_API_KEY    claude-opus-4-7
llamacpp    (none)               (required in config)

llamacpp connects to any OpenAI-compatible local server: llama.cpp, Ollama, or LM Studio. Set base_url in metadata to point at your server's /v1 endpoint.
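
From the client's point of view, a local component behaves exactly like the hosted ones. A sketch, assuming the local component from the config above:

import daimon_client as daimon

with daimon.Client() as client:
    # Same call shape as for "claude" or "gpt4o"; only the component name changes.
    print(client.chat("local", "Say hello in five words."))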


Adding a provider

  1. Create internal/components/llm/<name>/<name>.go.
  2. Implement conversation.Conversation:
    type Component struct { /* ... */ }
    
    func (c *Component) Chat(ctx context.Context, req conversation.Request) (<-chan conversation.Chunk, error) {
        // stream chunks through the returned channel
    }
  3. Register in init():
    func init() {
        conversation.Register("<name>", func(cfg conversation.ComponentConfig) (conversation.Conversation, error) {
            return New(cfg)
        })
    }
  4. Blank-import the package from cmd/daimon/serve.go and cmd/daimon/run.go.
  5. Add a worked example to examples/config.yaml.

No changes to the server, config loader, or any other package. See Development for adding vector stores or graph stores.


Development

make build          # compile → ./bin/daimon
make run            # build + run with examples/config.yaml
make test           # go test ./...
make lint           # golangci-lint
make fmt            # gofmt + goimports
make license-check

Integration tests (require API keys / Docker):

# OpenAI + Anthropic
OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... \
  go test -tags integration -v ./internal/components/...

# llamacpp — starts Ollama in Docker automatically, pulls qwen2.5:1.5b
go test -tags integration -v ./internal/components/llm/llamacpp/

# Full e2e suite (Go + Python SDK + TypeScript SDK) — requires Docker
go test -tags integration -v -timeout 20m ./test/e2e/

Python SDK tests:

cd sdk/python
pip install -e ".[dev]"
pytest tests/ -v

TypeScript SDK tests:

cd sdk/typescript
npm install
npm test

Roadmap

  • AI-native memory systems (Zep, Mem0) — session-aware, auto-summarising, distinct from vector stores
  • Middleware pipeline — per-request hooks for moderation, PII redaction, semantic cache, rate limiting
  • Multi-agent routing — fallback chains, load balancing across LLM components
  • Metrics alongside traces (OTel)
  • Authentication and per-client rate limiting

Explicitly out of scope for now: gRPC, external plugin loading, pub/sub.


License

Apache 2.0 — see LICENSE.
