daimon

The spirit that runs alongside your AI app.


Daimon is a local sidecar process that gives your application a single, stable HTTP interface to any LLM. Swap providers, rotate keys, add tracing, wire up MCP tools, query vector stores, or traverse knowledge graphs — without touching your app code.

Inspired by Dapr's component model, adapted for AI-native primitives: streaming responses, pluggable providers, MCP tool calls, vector/graph stores, and persistent sessions.


How it works

your app  ──POST /v1/converse/claude──▶  daimon  ──▶  Anthropic API
          ◀── text/event-stream ────────────────────────────────────
                                            │
                                     MCP tool server(s)
                                   (filesystem, GitHub, ...)
                                            │
                                   vector stores (Chroma, Qdrant,
                                     Redis, pgvector, in-memory)
                                            │
                                   graph stores (Neo4j, Memgraph)

Daimon runs on localhost:3500. Your app speaks plain HTTP + Server-Sent Events. The provider, model, credentials, and tool servers all live in a YAML config — not in your code.


Quick start

Prerequisites: An OpenAI or Anthropic API key.

1 — Install

macOS / Linux — Homebrew

brew tap sonicboom15/tap
brew install daimon

Windows — winget

winget install sonicboom15.daimon

Windows — Scoop

scoop bucket add sonicboom15 https://github.com/sonicboom15/scoop-bucket
scoop install daimon

Linux — deb / rpm

Download the .deb or .rpm from the latest release and install with dpkg -i or rpm -i.

Build from source

git clone https://github.com/sonicboom15/daimon.git && cd daimon && make build
# → ./bin/daimon

2 — Create a config

# config.yaml
port: 3500

components:
  - name: claude
    type: anthropic
    metadata:
      default_model: claude-haiku-4-5-20251001
      # api_key: sk-ant-...  # or set ANTHROPIC_API_KEY

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o-mini
      # api_key: sk-...  # or set OPENAI_API_KEY

3 — Run

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
daimon serve --config config.yaml
INFO daimon listening addr=127.0.0.1:3500

4 — First request

curl:

curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is a daimon?"}]}'
data: {"type":"text","text":"In ancient Greek thought, a daimon"}
data: {"type":"text","text":" is a guiding spirit..."}
data: {"type":"done"}

Python SDK:

pip install daimon-client
import daimon_client as daimon

with daimon.Client() as client:
    for text in client.stream("claude", "What is a daimon?"):
        print(text, end="", flush=True)

TypeScript SDK:

npm install daimon-client
import { Client } from 'daimon-client';

const client = new Client();
for await (const text of client.stream('claude', 'What is a daimon?')) {
  process.stdout.write(text);
}

Configuration

port: 3500

components:

  # ── Embedder (declare before vector stores) ──────────────────────────────
  # - name: embedder
  #   type: embedding/openai
  #   metadata:
  #     base_url: http://localhost:11434/v1   # Ollama; omit for OpenAI
  #     model: nomic-embed-text
  #     dimensions: "768"

  # ── Session store (optional; defaults to in-memory) ──────────────────────
  # - name: sessions
  #   type: session/redis
  #   metadata:
  #     addr: localhost:6379
  #     ttl: "24h"

  # ── Vector / document stores ─────────────────────────────────────────────
  # - name: docs
  #   type: inmemory          # BM25 lexical, no deps — dev/testing only
  #
  # - name: chroma-docs
  #   type: chroma
  #   metadata:
  #     base_url: http://localhost:8000
  #     collection: daimon
  #     create_if_missing: "true"
  #
  # - name: qdrant-docs
  #   type: qdrant
  #   metadata:
  #     base_url: http://localhost:6333
  #     collection: daimon
  #     embedder: embedder
  #     create_if_missing: "true"

  # ── Graph stores ──────────────────────────────────────────────────────────
  # - name: kg
  #   type: neo4j
  #   metadata:
  #     bolt_url: bolt://localhost:7687
  #     username: neo4j
  #     password: secret

  # ── LLM components ────────────────────────────────────────────────────────
  - name: claude
    type: anthropic
    # memory_store: chroma-docs   # enable transparent RAG from a vector store
    metadata:
      default_model: claude-opus-4-7
      # api_key: sk-ant-...  # or set ANTHROPIC_API_KEY
    # defaults:
    #   temperature: 1.0
    #   max_tokens: 4096
    #   top_p: 0.9
    #   top_k: 50          # Anthropic-specific
    #   stop: ["Human:"]
    #   system: "You are a helpful assistant."

  - name: gpt4o
    type: openai
    metadata:
      default_model: gpt-4o
      # api_key: sk-...  # or set OPENAI_API_KEY
    # defaults:
    #   temperature: 0.7
    #   max_tokens: 2048
    #   frequency_penalty: 0.0
    #   presence_penalty: 0.0
    #   seed: 42

  - name: local
    type: llamacpp
    metadata:
      base_url: http://localhost:11434/v1   # Ollama default
      # base_url: http://localhost:1234/v1  # LM Studio default
      # base_url: http://localhost:8080/v1  # llama.cpp default
      default_model: llama3.2:3b

# MCP tool servers — daimon connects at startup and injects their tools into
# every chat request automatically. The model can call them; daimon runs the loop.
# mcp_servers:
#   - name: filesystem
#     command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
#   - name: github
#     command: ["npx", "-y", "@modelcontextprotocol/server-github"]

telemetry:
  otlp_endpoint: ""   # e.g. "localhost:4318" — leave empty to disable

All component types — LLMs, embedders, session stores, vector stores, and graph stores — live under components:. Declaration order matters: embedders before vector stores, vector stores before LLMs that reference them via memory_store:. See examples/config.yaml for the fully-documented reference.


API

POST /v1/converse/{component}

Send a chat request and receive a streaming response over Server-Sent Events.

Request body:

{
  "messages": [
    { "role": "system",    "content": "You are a helpful assistant." },
    { "role": "user",      "content": "What is a daimon?" }
  ],
  "model":             "gpt-4o-mini",
  "system":            "Override or set a system prompt here.",
  "max_tokens":        512,
  "temperature":       0.7,
  "top_p":             0.9,
  "top_k":             50,
  "stop":              ["Human:"],
  "frequency_penalty": 0.0,
  "presence_penalty":  0.0,
  "seed":              42,
  "tools": [
    {
      "name":        "get_weather",
      "description": "Get current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required":   ["city"]
      }
    }
  ]
}

All fields except messages are optional. Omitted inference parameters fall back to the component's configured defaults.

Sessions: include "session_id" to have daimon maintain conversation history server-side. Only send the new user turn — the server prepends stored history automatically.

# Turn 1
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"My name is Alice."}]}'

# Turn 2 — server prepends the previous exchange automatically
curl -sN http://127.0.0.1:3500/v1/converse/claude \
  -H "Content-Type: application/json" \
  -d '{"session_id":"chat-1","messages":[{"role":"user","content":"What is my name?"}]}'

Clear a session with DELETE /v1/sessions/{id} (returns 204, idempotent).

Provider support matrix:

Parameter           OpenAI   Anthropic   llamacpp
temperature         ✓        ✓           ✓
max_tokens          ✓        ✓           ✓
top_p               ✓        ✓           ✓
top_k               ✗        ✓           ✓
stop                ✓        ✓           ✓
frequency_penalty   ✓        ✗           ✓
presence_penalty    ✓        ✗           ✓
seed                ✓        ✗           ✓

Unsupported parameters are silently ignored per provider.

Response (text/event-stream):

data: {"type":"text","text":"In ancient Greek thought..."}

data: {"type":"tool_call","tool_call":{"id":"call_1","name":"get_weather","input":{"city":"London"}}}

data: {"type":"text","text":"The weather in London is 12°C."}

data: {"type":"done"}

Each data: line is a JSON object:

type        additional fields              meaning
text        text                           a fragment of the model's response
tool_call   tool_call.id, .name, .input    model invoked a tool (daimon executes it and continues)
done        (none)                         stream finished successfully
error       error                          terminal error; stream ends

tool_call events are forwarded so clients can show progress ("calling tool X…"). Daimon executes the tool automatically and loops back to the model — no client-side action needed.
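
If you are not using one of the SDKs, the stream is straightforward to consume directly. Below is a minimal sketch in Python using the third-party requests library (an assumption; any HTTP client that can stream works) to parse the data: lines shown above.

import json
import requests  # assumption: any streaming-capable HTTP client works

resp = requests.post(
    "http://127.0.0.1:3500/v1/converse/claude",
    json={"messages": [{"role": "user", "content": "What is a daimon?"}]},
    stream=True,
)
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank separators
    event = json.loads(line[len("data: "):])
    if event["type"] == "text":
        print(event["text"], end="", flush=True)
    elif event["type"] == "tool_call":
        print(f"\n[calling {event['tool_call']['name']}...]")
    elif event["type"] == "error":
        raise RuntimeError(event["error"])
    elif event["type"] == "done":
        break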

DELETE /v1/sessions/{id}

Clears server-side session history for the given ID. Returns 204 No Content. Idempotent — deleting a session that does not exist is not an error.

GET /healthz

Returns 200 ok when the sidecar is up.
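
Clients that launch daimon as a child process can poll this endpoint before sending traffic. A minimal sketch using Python's standard library, assuming the default address:

import time
import urllib.request

def wait_for_daimon(url="http://127.0.0.1:3500/healthz", timeout=30.0):
    """Poll /healthz until daimon answers 200, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return
        except OSError:
            pass  # sidecar not up yet
        time.sleep(0.5)
    raise TimeoutError("daimon did not become ready in time")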


Python SDK

Install:

pip install daimon-client

Streaming text:

import daimon_client as daimon

# context manager reuses the HTTP connection
with daimon.Client() as client:
    for text in client.stream("claude", "Explain recursion in one sentence."):
        print(text, end="", flush=True)
print()

Convenience: collect the full response:

reply = client.chat("gpt4o", "What is the capital of France?")
print(reply)  # "The capital of France is Paris."

Multi-turn conversation:

messages = [
    daimon.Message(role="system", content="You are a helpful assistant."),
    daimon.Message(role="user",   content="My name is Alice."),
]
reply = client.chat("claude", messages)
messages.append(daimon.Message(role="assistant", content=reply))
messages.append(daimon.Message(role="user", content="What is my name?"))
print(client.chat("claude", messages))

Sessions:

client.chat("claude", "My name is Alice.", session_id="chat-1")
reply = client.chat("claude", "What is my name?", session_id="chat-1")
# reply: "Your name is Alice."
client.clear_session("chat-1")

With inference parameters:

reply = client.chat(
    "gpt4o",
    "Write a haiku about Go.",
    model="gpt-4o",
    temperature=0.9,
    max_tokens=64,
)

Observing tool calls:

def on_tool(tc: daimon.ToolCall) -> None:
    print(f"[tool: {tc.name}({tc.input})]")

for text in client.stream("claude", "What's the weather in Tokyo?", on_tool_call=on_tool):
    print(text, end="", flush=True)

Async:

import asyncio
import daimon_client as daimon

async def main():
    async with daimon.AsyncClient() as client:
        async for text in client.stream("claude", "Hello!"):
            print(text, end="", flush=True)

asyncio.run(main())

Full runnable examples: examples/client/chat.py · examples/client/chat_async.py


TypeScript SDK

Install:

npm install daimon-client

Streaming text:

import { Client } from 'daimon-client';

const client = new Client();
for await (const text of client.stream('claude', 'Explain recursion in one sentence.')) {
  process.stdout.write(text);
}

Convenience: collect the full response:

const reply = await client.chat('gpt4o', 'What is the capital of France?');
console.log(reply); // "The capital of France is Paris."

Sessions:

await client.chat('claude', 'My name is Alice.', { session_id: 'chat-1' });
const reply = await client.chat('claude', 'What is my name?', { session_id: 'chat-1' });
// reply: "Your name is Alice."
await client.clearSession('chat-1');

With inference parameters:

const reply = await client.chat('gpt4o', 'Write a haiku about Go.', {
  model:       'gpt-4o',
  temperature: 0.9,
  max_tokens:  64,
});

Full runnable examples: sdk/typescript/examples/


Tool calls via MCP

Daimon acts as an MCP client. Configure MCP servers in YAML and daimon:

  1. Connects to each server at startup and fetches its tool catalogue.
  2. Injects all tools into every chat request automatically.
  3. When the model calls a tool, daimon executes it via the MCP server and feeds the result back — looping until the model returns a plain text response.

Your application sees a single streaming response with the final answer, plus tool_call events for progress. Example configuration:

mcp_servers:
  - name: filesystem
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  - name: brave-search
    command: ["npx", "-y", "@modelcontextprotocol/server-brave-search"]

No client-side changes required.
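
For example, with the filesystem server configured above, a client can still watch tool activity through the Python SDK's on_tool_call hook. A sketch; the prompt is illustrative:

import daimon_client as daimon

def on_tool(tc: daimon.ToolCall) -> None:
    # Fired once per MCP tool invocation; daimon executes the tool itself.
    print(f"\n[tool: {tc.name}({tc.input})]")

with daimon.Client() as client:
    for text in client.stream("claude", "List the files in /tmp.", on_tool_call=on_tool):
        print(text, end="", flush=True)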


Memory & Graph Stores

Daimon ships with five vector stores and two graph stores, all configured the same way — as components: entries.

Vector stores

Type        External service         Embedding
inmemory    None                     BM25 (lexical)
chroma      Chroma                   Server-side
qdrant      Qdrant                   Configurable endpoint
redis       Redis Stack              Configurable endpoint
pgvector    PostgreSQL + pgvector    Configurable endpoint

HTTP API: PUT /v1/memory/{store}/{id} · POST /v1/memory/{store} · POST /v1/memory/{store}/query · DELETE /v1/memory/{store}/{id}

Python SDK:

store = client.memory("docs")
store.upsert("The Eiffel Tower is 330 m tall.", id="doc1", metadata={"src": "wiki"})
results = store.query("tall Paris structures", top_k=3)
# results[0].id, .content, .score, .metadata
store.delete("doc1")

TypeScript SDK:

const store = client.memory('docs');
await store.upsert('The Eiffel Tower is 330 m tall.', { id: 'doc1', metadata: { src: 'wiki' } });
const results = await store.query('tall Paris structures', 3);
await store.delete('doc1');

Transparent RAG

Add memory_store: <name> to any LLM component and daimon automatically queries the store before every chat request, injecting the top results as a system message:

- name: claude
  type: anthropic
  memory_store: chroma-docs

No client code changes needed — the enrichment happens inside the sidecar.
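
End to end, that means you can seed the store through the memory API and then chat as usual. A sketch using the Python SDK, assuming the chroma-docs and claude components from the config above:

import daimon_client as daimon

with daimon.Client() as client:
    # Seed the vector store that the "claude" component points at via memory_store.
    store = client.memory("chroma-docs")
    store.upsert("Daimon listens on 127.0.0.1:3500 by default.", id="note1",
                 metadata={"src": "readme"})

    # No RAG-specific request fields: daimon queries chroma-docs and injects
    # the top results as a system message before calling the model.
    print(client.chat("claude", "Which port does daimon listen on?"))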

Graph stores

Type       External service   Protocol
neo4j      Neo4j              Bolt (default) / HTTP
memgraph   Memgraph           Bolt (default) / HTTP

HTTP API: PUT /v1/graph/{store}/nodes/{id} · POST /v1/graph/{store}/edges · POST /v1/graph/{store}/cypher · DELETE /v1/graph/{store}/nodes/{id}

Python SDK:

graph = client.graph("kg")
graph.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
graph.add_edge("alice", "bob", "KNOWS")
rows = graph.cypher("MATCH (a)-[:KNOWS]->(b) RETURN a.name, b.name")

Both stores also generate {name}_cypher, {name}_add_node, and {name}_add_edge tools that the LLM can call directly via the agentic loop.
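
A sketch of that loop from the client side, assuming the kg graph component and the claude LLM component configured above: the model calls kg_cypher on its own, and the client only observes the tool_call events.

import daimon_client as daimon

with daimon.Client() as client:
    # Seed a tiny graph through the graph API.
    graph = client.graph("kg")
    graph.add_node(id="alice", labels=["Person"], props={"name": "Alice"})
    graph.add_node(id="bob", labels=["Person"], props={"name": "Bob"})
    graph.add_edge("alice", "bob", "KNOWS")

    # Ask a question the model can only answer by querying the graph.
    def on_tool(tc: daimon.ToolCall) -> None:
        print(f"\n[tool: {tc.name}]")

    for text in client.stream("claude", "Who does Alice know?", on_tool_call=on_tool):
        print(text, end="", flush=True)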


Supported providers

Type        Env var              Default model
openai      OPENAI_API_KEY       gpt-4o
anthropic   ANTHROPIC_API_KEY    claude-opus-4-7
llamacpp    (none)               (required in config)

llamacpp connects to any OpenAI-compatible local server: llama.cpp, Ollama, or LM Studio. Set base_url in metadata to point at your server's /v1 endpoint.
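
From the client's point of view, a local component behaves exactly like the hosted ones. A sketch, assuming the local component from the config above:

import daimon_client as daimon

with daimon.Client() as client:
    # Same call shape as for "claude" or "gpt4o"; only the component name changes.
    print(client.chat("local", "Say hello in five words."))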


Adding a provider

  1. Create internal/components/llm/<name>/<name>.go.
  2. Implement conversation.Conversation:
    type Component struct { /* ... */ }
    
    func (c *Component) Chat(ctx context.Context, req conversation.Request) (<-chan conversation.Chunk, error) {
        // stream chunks through the returned channel
    }
  3. Register in init():
    func init() {
        conversation.Register("<name>", func(cfg conversation.ComponentConfig) (conversation.Conversation, error) {
            return New(cfg)
        })
    }
  4. Blank-import the package from cmd/daimon/serve.go and cmd/daimon/run.go.
  5. Add a worked example to examples/config.yaml.

No changes to the server, config loader, or any other package. See Development for adding vector stores or graph stores.


Development

make build          # compile → ./bin/daimon
make run            # build + run with examples/config.yaml
make test           # go test ./...
make lint           # golangci-lint
make fmt            # gofmt + goimports
make license-check

Integration tests (require API keys / Docker):

# OpenAI + Anthropic
OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-... \
  go test -tags integration -v ./internal/components/...

# llamacpp — starts Ollama in Docker automatically, pulls qwen2.5:1.5b
go test -tags integration -v ./internal/components/llm/llamacpp/

# Full e2e suite (Go + Python SDK + TypeScript SDK) — requires Docker
go test -tags integration -v -timeout 20m ./test/e2e/

Python SDK tests:

cd sdk/python
pip install -e ".[dev]"
pytest tests/ -v

TypeScript SDK tests:

cd sdk/typescript
npm install
npm test

Roadmap

  • AI-native memory systems (Zep, Mem0) — session-aware, auto-summarising, distinct from vector stores
  • Middleware pipeline — per-request hooks for moderation, PII redaction, semantic cache, rate limiting
  • Multi-agent routing — fallback chains, load balancing across LLM components
  • Metrics alongside traces (OTel)
  • Authentication and per-client rate limiting

Explicitly out of scope for now: gRPC, external plugin loading, pub/sub.


License

Apache 2.0 — see LICENSE.
