# Calcifer

Your personal AI assistant. Fully local. Fully yours.


No cloud APIs. No subscriptions. No telemetry. Just your machine, your data, your models.

Status: Under active development. Open-source release coming soon.

Main chat interface


## What It Does

Chat with your own knowledge base. Calcifer collects content from Reddit, YouTube, web pages, and your own notes, indexes everything with vector embeddings, and answers questions grounded in real sources. It tells you where its answers come from.
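The grounded-answer flow can be sketched roughly like this. The stand-in below uses a toy bag-of-characters embedding and an in-memory list in place of sentence-transformers and ChromaDB; all names and data are illustrative, not Calcifer's actual API:

```python
import math

# Toy embedding: map text to a 26-dim letter-frequency vector.
# Calcifer uses sentence-transformers embeddings; this stand-in just
# illustrates the shape of the pipeline.
def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Ingested documents keep their source metadata so answers can cite it.
index = [
    {"text": "Kokoro is a lightweight local TTS model.", "source": "reddit:r/LocalLLaMA"},
    {"text": "SearxNG is a self-hosted metasearch engine.", "source": "web:searxng.org"},
]
for doc in index:
    doc["vec"] = embed(doc["text"])

def retrieve(question: str, k: int = 1) -> list[dict]:
    qv = embed(question)
    ranked = sorted(index, key=lambda d: cosine(qv, d["vec"]), reverse=True)
    return ranked[:k]

hits = retrieve("what is searxng?")
# The retrieved text is stuffed into the LLM prompt, and the source
# string is what gets surfaced to you as a citation.
print(hits[0]["source"])  # web:searxng.org
```

The key point is the metadata riding alongside each chunk: because every indexed document carries its origin, the citation comes for free at answer time.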

Conversation with tool calling, knowledge base search, and map markers

Talk to it. Three voice modes, each suited to different tasks:

  • Assistant -- You speak, Calcifer listens (Whisper STT), thinks (your Ollama model), and speaks back (Kokoro TTS). Full agent pipeline -- it can search your knowledge base, look things up on the web, check the weather, save notes, anything the text chat can do. This is the main mode.
  • Dictate -- Voice input, text output. Same agent capabilities as Assistant but responds in text instead of speech. Useful when you want to talk but don't need audio back, or if you haven't set up TTS.
  • Conversation -- Experimental. Full-duplex speech-to-speech using Moshi, a duplex spoken dialogue model. Sub-200ms latency, feels like a real back-and-forth conversation. Good for casual chitchat. It runs its own small model, so it is not as smart as your main LLM and cannot call tools or access your knowledge base. Think of it as a fun tech demo, not a productivity feature.

Pick from several TTS voices or clone your own with a reference audio file.
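The Assistant and Dictate modes share one pipeline shape: audio in, Whisper STT, the agent loop, then either Kokoro TTS or plain text out. A minimal sketch, with placeholder functions standing in for the real models (none of these names are Calcifer internals):

```python
# Stand-ins for the real components: faster-whisper, the Ollama agent
# loop, and Kokoro TTS. "Audio" is faked as UTF-8 bytes for the demo.
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")

def run_agent(prompt: str) -> str:
    # The real agent may call tools (knowledge search, weather, notes)
    # before producing its final answer.
    return f"You said: {prompt}"

def synthesize(text: str) -> bytes:
    return text.encode("utf-8")

def assistant_turn(audio: bytes) -> bytes:
    """Assistant mode: speech in, speech out."""
    return synthesize(run_agent(transcribe(audio)))

def dictate_turn(audio: bytes) -> str:
    """Dictate mode: same agent, but skip TTS and return text."""
    return run_agent(transcribe(audio))

print(dictate_turn(b"hello"))  # You said: hello
```

Conversation mode does not fit this shape at all: Moshi is a single full-duplex model, not a transcribe-think-speak chain, which is why it gains latency but loses tools.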

Voice conversation

Generate images, video, and music. Connected to ComfyUI for local generation. Ask it to create a portrait, a Studio Ghibli-style video, or a melody, and it runs the workflow on your GPU.
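Under the hood, driving ComfyUI programmatically amounts to POSTing a workflow graph (exported in ComfyUI's API format) to its `/prompt` endpoint. A sketch of the request construction, assuming the default port; the tiny workflow dict is a placeholder, not a runnable graph:

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # ComfyUI's default port

def build_request(workflow: dict, client_id: str) -> urllib.request.Request:
    """Wrap an API-format workflow graph in the payload that ComfyUI's
    /prompt endpoint expects."""
    payload = {"prompt": workflow, "client_id": client_id}
    return urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# A real workflow is a node graph exported from the ComfyUI editor;
# this single-node dict is just a placeholder.
req = build_request({"3": {"class_type": "KSampler", "inputs": {}}}, "calcifer")
print(req.full_url)  # http://127.0.0.1:8188/prompt

# Actually sending it requires a running ComfyUI instance:
# with urllib.request.urlopen(req) as resp:
#     prompt_id = json.loads(resp.read())["prompt_id"]
```

The agent's job is therefore mostly templating: fill the user's prompt into a saved workflow graph, submit it, and poll for the finished image or video.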

Image generation

Video and music generation

Recording studio. Two modes for turning text or voice into any voice you want:

  • Voice mode (real-time style transfer) -- Record yourself speaking, then convert it to a target voice using Seed-VC. This is zero-shot voice conversion -- it takes your raw speech and transforms it into the target voice while preserving your original timing, pauses, intonation, and emotion. You sound like someone else, but the performance is still yours. No training, no fine-tuning, just a short reference clip of the target voice.
  • Text mode (TTS with voice cloning) -- Type text, pick a voice, get speech. Uses Qwen3-TTS, a new text-to-speech model with both built-in speaker voices and zero-shot voice cloning from a reference audio file. Natural-sounding output with good prosody and multilingual support.

Recording studio

Maps and navigation. Interactive map with markers, geocoding, directions, and nearby POI search -- all through OpenStreetMap. The agent can save locations and give directions during conversation.

Map with marker detail

Map tools and saved markers
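Geocoding through Nominatim is a plain HTTP GET; a `find_location`-style call presumably reduces to building a URL like this (the function name is illustrative):

```python
from urllib.parse import urlencode

NOMINATIM = "https://nominatim.openstreetmap.org/search"

def geocode_url(query: str, limit: int = 1) -> str:
    """Build a Nominatim geocoding request URL. Note that Nominatim's
    usage policy requires a descriptive User-Agent header when you
    actually perform the fetch."""
    params = {"q": query, "format": "jsonv2", "limit": limit}
    return f"{NOMINATIM}?{urlencode(params)}"

print(geocode_url("Brandenburg Gate, Berlin"))
# https://nominatim.openstreetmap.org/search?q=Brandenburg+Gate%2C+Berlin&format=jsonv2&limit=1
```

The response is a JSON list of candidates with `lat`/`lon` fields, which is all the map needs to drop a marker.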

Remember things about you. Long-term memory that persists across conversations. It learns your preferences, facts about you, and things you ask it to remember.

Memories

Search everything. Web search (via SearxNG), Reddit search, YouTube search and transcript extraction, URL crawling. All results can be ingested into the knowledge base.

Available tools
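The web-search tool boils down to a GET against the self-hosted SearxNG instance's JSON API (enabled via `formats` in SearxNG's settings). A sketch, assuming the instance lives on port 8080:

```python
from urllib.parse import urlencode

SEARXNG = "http://localhost:8080/search"  # self-hosted instance

def search_url(query: str) -> str:
    """SearxNG exposes a JSON API when `json` is listed under
    search.formats in its settings; a web_search call reduces to a
    GET like this."""
    return f"{SEARXNG}?{urlencode({'q': query, 'format': 'json'})}"

print(search_url("local llm rag"))
# http://localhost:8080/search?q=local+llm+rag&format=json
```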

Knowledge base. Browse, search, and manage all collected content. Filter by source (Reddit, web, YouTube, manual), subreddit, date, score.

Knowledge base

Personal notes. Write and organize notes with topic tags. Notes are searchable and available to the AI during conversations.

Notes

Admin dashboard. Monitor collection runs, manage subreddit watchers, configure MCP servers, view system stats. Live GPU monitoring in the sidebar.

Admin panel


## Architecture

```
You  --->  Next.js frontend  --->  FastAPI backend  --->  Ollama (LLM)
                                        |
                                        +---> ChromaDB (vector search)
                                        +---> SQLite (metadata, conversations, notes)
                                        +---> SearxNG (private web search)
                                        +---> ComfyUI (image/video/audio generation)
                                        +---> Whisper / Kokoro / Qwen3-TTS / Seed-VC (voice)
                                        +---> MCP servers (extensible tools)
```

Everything runs in Docker containers on your local machine, except Ollama, which runs directly on the host for the best GPU performance.
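A deployment along these lines might look like the following compose sketch. Service names, images, and ports are guesses for illustration; the real compose file ships with the open-source release:

```yaml
# Illustrative sketch only -- not the project's actual compose file.
services:
  backend:
    build: ./backend            # FastAPI app
    ports: ["8000:8000"]
    environment:
      OLLAMA_URL: http://host.docker.internal:11434  # Ollama stays on the host
    extra_hosts:
      - "host.docker.internal:host-gateway"
  frontend:
    build: ./frontend           # Next.js app
    ports: ["3000:3000"]
  searxng:
    image: searxng/searxng
    ports: ["8080:8080"]
```

The one structural choice worth noting is the `host-gateway` alias: it is how the containerized backend reaches the host-resident Ollama without giving up container isolation for everything else.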

## Tech Stack

| Layer | Tech |
| --- | --- |
| Frontend | Next.js 15, App Router, Tailwind CSS, shadcn/ui |
| Backend | Python 3.12, FastAPI, SQLModel |
| Vector DB | ChromaDB with sentence-transformers embeddings |
| Metadata DB | SQLite |
| LLM | Ollama (any model -- qwen3, llama, mistral, etc.) |
| Speech-to-Text | faster-whisper (local) |
| Text-to-Speech | Kokoro (chat voices), Qwen3-TTS (studio cloning + built-in speakers) |
| Voice Conversion | Seed-VC (zero-shot style transfer) |
| Web Search | SearxNG (self-hosted, no tracking) |
| Media Generation | ComfyUI (image, video, audio workflows) |
| Container | Docker with NVIDIA GPU passthrough |
| MCP | Extensible tool system via Model Context Protocol |

## Agent Tools

The AI agent has access to these tools during conversation:

  • knowledge_search -- search the local vector database
  • web_search -- private web search via SearxNG
  • crawl_url -- fetch and parse any URL
  • reddit_search -- search Reddit directly
  • youtube_search / youtube_transcript -- find videos and pull transcripts
  • save_note / get_notes -- read and write personal notes
  • ingest_to_knowledge -- add content to the knowledge base
  • find_location / get_directions / find_nearby -- maps and navigation via OSM
  • save_map_marker / get_map_markers -- persistent map pins
  • get_weather -- current weather for any location
  • save_memory -- remember facts about the user
  • run_comfyui_workflow -- generate images, video, or music locally
  • Any tool from connected MCP servers
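The shape of the loop behind this list is simple: the model emits a tool name plus JSON arguments, the agent looks the tool up in a registry, runs it, and feeds the result back into the conversation. A minimal sketch (registry, decorator, and stubs are illustrative, not Calcifer's actual internals):

```python
# name -> callable registry; real tools would also carry JSON schemas
# so the model knows what arguments each one takes.
TOOLS = {}

def tool(fn):
    """Register a function as an agent tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # The real version would call Open-Meteo; stubbed here.
    return f"Sunny in {city}"

@tool
def save_note(text: str) -> str:
    return f"Saved note: {text!r}"

def dispatch(name: str, args: dict) -> str:
    """Run one tool call requested by the model."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**args)

print(dispatch("get_weather", {"city": "Berlin"}))  # Sunny in Berlin
```

Tools arriving from MCP servers slot into the same registry, which is why the agent "picks them up automatically": dispatch does not care where a name came from.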

## Hardware Reference

This is what Calcifer is developed and tested on:

| Component | Spec |
| --- | --- |
| GPU | NVIDIA GeForce RTX 4070 Ti SUPER (16 GB VRAM) |
| CPU | Intel Core i7-14700 |
| RAM | 32 GB |

The GPU handles LLM inference (via Ollama), embedding generation, speech-to-text, text-to-speech, and ComfyUI generation. 16 GB VRAM is comfortable for running a 7-8B parameter model alongside embeddings and voice. Larger models or simultaneous ComfyUI generation may need more.

You could run this on less -- an RTX 3060 (12 GB) would handle smaller models and most features. You could also run it on more -- the stack will happily use whatever you give it.
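The "7-8B fits comfortably" claim follows from back-of-envelope VRAM math: a 4-bit quantized model needs roughly half a byte per parameter, plus a couple of gigabytes for KV cache and activations. A rough calculator (the 0.5 bytes/param and 2 GB overhead figures are rules of thumb, not guarantees):

```python
def model_vram_gb(params_billion: float,
                  bytes_per_param: float = 0.5,  # ~Q4 quantization
                  overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weights plus a flat allowance for
    KV cache and activations."""
    return params_billion * bytes_per_param + overhead_gb

print(round(model_vram_gb(8), 1))  # ~6.0 GB for an 8B model at Q4
```

On a 16 GB card that leaves roughly 10 GB for embeddings, the voice models, and a ComfyUI workflow, which is why larger models or simultaneous generation start to squeeze.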


## Privacy

Everything runs locally. Your conversations, your knowledge base, your voice recordings -- none of it leaves your machine. There are no accounts, no analytics, no tracking.

The only network calls go to these services, and only when you use the features that need them:

| Service | Used for | Account/API key | Data sent |
| --- | --- | --- | --- |
| Ollama (localhost) | LLM inference | No | Nothing leaves your machine |
| SearxNG (localhost) | Web search | No | Search queries proxied through Google/DuckDuckGo/Bing/Brave (your IP, not an API key) |
| Open-Meteo | Weather forecasts | No | City name or coordinates |
| Nominatim | Geocoding, address lookup | No | Address or coordinates |
| Overpass API | Nearby POI search | No | Coordinates and search radius |
| OpenStreetMap | Map tiles | No | Tile coordinates (standard map loading) |

Zero API keys. Zero accounts. Open-Meteo, Nominatim, and Overpass are free public APIs run by non-profits and open-source projects. SearxNG is a self-hosted metasearch engine that distributes your queries across multiple search engines so no single provider builds a profile on you.

If you disconnect from the internet, everything except web search, weather, and map tiles continues to work -- chat, voice, knowledge base, notes, memories, media generation all run offline.


## What Makes This Different

Actually local. Not "local but phones home for embeddings" or "local but needs an API key for search." Every component runs on your machine. SearxNG replaces Google. Ollama replaces OpenAI. Whisper and Kokoro replace cloud speech APIs. ComfyUI replaces Midjourney/DALL-E.

Knowledge-grounded. Answers come from your collected sources with citations. It shows where information came from -- which Reddit post, which article, which YouTube video. When it uses web search as a fallback, it tells you.

Extensible via MCP. Add new tools without touching the core codebase. Drop a config entry for any MCP-compatible server and the agent picks up its tools automatically.
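A server entry would look something like this, following the common `mcpServers` convention (the exact file location and schema are Calcifer-specific and will be documented at release; the filesystem server shown is a real MCP reference server):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/me/docs"]
    }
  }
}
```

On startup the agent launches the server, asks it to list its tools, and merges them into its own tool registry alongside the built-ins.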

Not a wrapper. This is a full application with its own storage, its own ingestion pipeline, its own agent loop. It is not a thin UI over an API.


## Open Source Release

Coming soon. The codebase is being cleaned up for public release. When it ships, you will get:

  • Full source code (Python backend + Next.js frontend)
  • Docker Compose setup for one-command deployment
  • Configuration guides for Ollama, ComfyUI, and SearxNG
  • Documentation for adding custom MCP tools

Watch this repo for the release.


## Support

If you find this project useful and want to support its development:

Buy Me a Coffee

## License

TBD -- will be announced with the open-source release.
