# Calcifer

Your personal AI assistant. Fully local. Fully yours.


No cloud APIs. No subscriptions. No telemetry. Just your machine, your data, your models.

Status: Under active development. Open-source release coming soon.

Main chat interface


## What It Does

Chat with your own knowledge base. Calcifer collects content from Reddit, YouTube, web pages, and your own notes, indexes everything with vector embeddings, and answers questions grounded in real sources. It tells you where its answers come from.
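The grounded-answer flow can be sketched roughly like this. The stand-in below uses a toy bag-of-characters embedding and an in-memory list in place of sentence-transformers and ChromaDB; all names and data are illustrative, not Calcifer's actual API:

```python
import math

# Toy embedding: map text to a 26-dim letter-frequency vector.
# Calcifer uses sentence-transformers embeddings; this stand-in just
# illustrates the shape of the pipeline.
def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Ingested documents keep their source metadata so answers can cite it.
index = [
    {"text": "Kokoro is a lightweight local TTS model.", "source": "reddit:r/LocalLLaMA"},
    {"text": "SearxNG is a self-hosted metasearch engine.", "source": "web:searxng.org"},
]
for doc in index:
    doc["vec"] = embed(doc["text"])

def retrieve(question: str, k: int = 1) -> list[dict]:
    qv = embed(question)
    ranked = sorted(index, key=lambda d: cosine(qv, d["vec"]), reverse=True)
    return ranked[:k]

hits = retrieve("what is searxng?")
# The retrieved text is stuffed into the LLM prompt, and the source
# string is what gets surfaced to you as a citation.
print(hits[0]["source"])  # web:searxng.org
```

The key point is the metadata riding alongside each chunk: because every indexed document carries its origin, the citation comes for free at answer time.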

Conversation with tool calling, knowledge base search, and map markers

Talk to it. Three voice modes, each suited to different tasks:

  • Assistant -- You speak, Calcifer listens (Whisper STT), thinks (your Ollama model), and speaks back (Kokoro TTS). Full agent pipeline -- it can search your knowledge base, look things up on the web, check the weather, save notes, anything the text chat can do. This is the main mode.
  • Dictate -- Voice input, text output. Same agent capabilities as Assistant but responds in text instead of speech. Useful when you want to talk but don't need audio back, or if you haven't set up TTS.
  • Conversation -- Experimental. Full-duplex speech-to-speech using Moshi, a duplex spoken dialogue model. Sub-200ms latency, feels like a real back-and-forth conversation. Good for casual chitchat. It runs its own small model, so it is not as smart as your main LLM and cannot call tools or access your knowledge base. Think of it as a fun tech demo, not a productivity feature.

Pick from several TTS voices or clone your own with a reference audio file.
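The Assistant and Dictate modes share one pipeline shape: audio in, Whisper STT, the agent loop, then either Kokoro TTS or plain text out. A minimal sketch, with placeholder functions standing in for the real models (none of these names are Calcifer internals):

```python
# Stand-ins for the real components: faster-whisper, the Ollama agent
# loop, and Kokoro TTS. "Audio" is faked as UTF-8 bytes for the demo.
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")

def run_agent(prompt: str) -> str:
    # The real agent may call tools (knowledge search, weather, notes)
    # before producing its final answer.
    return f"You said: {prompt}"

def synthesize(text: str) -> bytes:
    return text.encode("utf-8")

def assistant_turn(audio: bytes) -> bytes:
    """Assistant mode: speech in, speech out."""
    return synthesize(run_agent(transcribe(audio)))

def dictate_turn(audio: bytes) -> str:
    """Dictate mode: same agent, but skip TTS and return text."""
    return run_agent(transcribe(audio))

print(dictate_turn(b"hello"))  # You said: hello
```

Conversation mode does not fit this shape at all: Moshi is a single full-duplex model, not a transcribe-think-speak chain, which is why it gains latency but loses tools.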

Voice conversation

Generate images, video, and music. Connected to ComfyUI for local generation. Ask it to create a portrait, a Studio Ghibli-style video, or a melody, and it runs the workflow on your GPU.
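Under the hood, driving ComfyUI programmatically amounts to POSTing a workflow graph (exported in ComfyUI's API format) to its `/prompt` endpoint. A sketch of the request construction, assuming the default port; the tiny workflow dict is a placeholder, not a runnable graph:

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # ComfyUI's default port

def build_request(workflow: dict, client_id: str) -> urllib.request.Request:
    """Wrap an API-format workflow graph in the payload that ComfyUI's
    /prompt endpoint expects."""
    payload = {"prompt": workflow, "client_id": client_id}
    return urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# A real workflow is a node graph exported from the ComfyUI editor;
# this single-node dict is just a placeholder.
req = build_request({"3": {"class_type": "KSampler", "inputs": {}}}, "calcifer")
print(req.full_url)  # http://127.0.0.1:8188/prompt

# Actually sending it requires a running ComfyUI instance:
# with urllib.request.urlopen(req) as resp:
#     prompt_id = json.loads(resp.read())["prompt_id"]
```

The agent's job is therefore mostly templating: fill the user's prompt into a saved workflow graph, submit it, and poll for the finished image or video.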

Image generation

Video and music generation

Recording studio. Two modes for turning text or voice into any voice you want:

  • Voice mode (real-time style transfer) -- Record yourself speaking, then convert it to a target voice using Seed-VC. This is zero-shot voice conversion -- it takes your raw speech and transforms it into the target voice while preserving your original timing, pauses, intonation, and emotion. You sound like someone else, but the performance is still yours. No training, no fine-tuning, just a short reference clip of the target voice.
  • Text mode (TTS with voice cloning) -- Type text, pick a voice, get speech. Uses Qwen3-TTS, a new text-to-speech model with both built-in speaker voices and zero-shot voice cloning from a reference audio file. Natural-sounding output with good prosody and multilingual support.

Recording studio

Maps and navigation. Interactive map with markers, geocoding, directions, and nearby POI search -- all through OpenStreetMap. The agent can save locations and give directions during conversation.

Map with marker detail

Map tools and saved markers
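Geocoding through Nominatim is a plain HTTP GET; a `find_location`-style call presumably reduces to building a URL like this (the function name is illustrative):

```python
from urllib.parse import urlencode

NOMINATIM = "https://nominatim.openstreetmap.org/search"

def geocode_url(query: str, limit: int = 1) -> str:
    """Build a Nominatim geocoding request URL. Note that Nominatim's
    usage policy requires a descriptive User-Agent header when you
    actually perform the fetch."""
    params = {"q": query, "format": "jsonv2", "limit": limit}
    return f"{NOMINATIM}?{urlencode(params)}"

print(geocode_url("Brandenburg Gate, Berlin"))
# https://nominatim.openstreetmap.org/search?q=Brandenburg+Gate%2C+Berlin&format=jsonv2&limit=1
```

The response is a JSON list of candidates with `lat`/`lon` fields, which is all the map needs to drop a marker.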

Remember things about you. Long-term memory that persists across conversations. It learns your preferences, facts about you, and things you ask it to remember.

Memories

Search everything. Web search (via SearxNG), Reddit search, YouTube search and transcript extraction, URL crawling. All results can be ingested into the knowledge base.

Available tools
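The web-search tool boils down to a GET against the self-hosted SearxNG instance's JSON API (enabled via `formats` in SearxNG's settings). A sketch, assuming the instance lives on port 8080:

```python
from urllib.parse import urlencode

SEARXNG = "http://localhost:8080/search"  # self-hosted instance

def search_url(query: str) -> str:
    """SearxNG exposes a JSON API when `json` is listed under
    search.formats in its settings; a web_search call reduces to a
    GET like this."""
    return f"{SEARXNG}?{urlencode({'q': query, 'format': 'json'})}"

print(search_url("local llm rag"))
# http://localhost:8080/search?q=local+llm+rag&format=json
```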

Knowledge base. Browse, search, and manage all collected content. Filter by source (Reddit, web, YouTube, manual), subreddit, date, score.

Knowledge base

Personal notes. Write and organize notes with topic tags. Notes are searchable and available to the AI during conversations.

Notes

Admin dashboard. Monitor collection runs, manage subreddit watchers, configure MCP servers, view system stats. Live GPU monitoring in the sidebar.

Admin panel


## Architecture

```
You  --->  Next.js frontend  --->  FastAPI backend  --->  Ollama (LLM)
                                        |
                                        +---> ChromaDB (vector search)
                                        +---> SQLite (metadata, conversations, notes)
                                        +---> SearxNG (private web search)
                                        +---> ComfyUI (image/video/audio generation)
                                        +---> Whisper / Kokoro / Qwen3-TTS / Seed-VC (voice)
                                        +---> MCP servers (extensible tools)
```

Everything runs in Docker containers on your local machine, except Ollama, which runs directly on the host for the best GPU performance.
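A deployment along these lines might look like the following compose sketch. Service names, images, and ports are guesses for illustration; the real compose file ships with the open-source release:

```yaml
# Illustrative sketch only -- not the project's actual compose file.
services:
  backend:
    build: ./backend            # FastAPI app
    ports: ["8000:8000"]
    environment:
      OLLAMA_URL: http://host.docker.internal:11434  # Ollama stays on the host
    extra_hosts:
      - "host.docker.internal:host-gateway"
  frontend:
    build: ./frontend           # Next.js app
    ports: ["3000:3000"]
  searxng:
    image: searxng/searxng
    ports: ["8080:8080"]
```

The one structural choice worth noting is the `host-gateway` alias: it is how the containerized backend reaches the host-resident Ollama without giving up container isolation for everything else.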

## Tech Stack

| Layer | Tech |
| --- | --- |
| Frontend | Next.js 15, App Router, Tailwind CSS, shadcn/ui |
| Backend | Python 3.12, FastAPI, SQLModel |
| Vector DB | ChromaDB with sentence-transformers embeddings |
| Metadata DB | SQLite |
| LLM | Ollama (any model -- qwen3, llama, mistral, etc.) |
| Speech-to-Text | faster-whisper (local) |
| Text-to-Speech | Kokoro (chat voices), Qwen3-TTS (studio cloning + built-in speakers) |
| Voice Conversion | Seed-VC (zero-shot style transfer) |
| Web Search | SearxNG (self-hosted, no tracking) |
| Media Generation | ComfyUI (image, video, audio workflows) |
| Container | Docker with NVIDIA GPU passthrough |
| MCP | Extensible tool system via Model Context Protocol |

## Agent Tools

The AI agent has access to these tools during conversation:

  • knowledge_search -- search the local vector database
  • web_search -- private web search via SearxNG
  • crawl_url -- fetch and parse any URL
  • reddit_search -- search Reddit directly
  • youtube_search / youtube_transcript -- find videos and pull transcripts
  • save_note / get_notes -- read and write personal notes
  • ingest_to_knowledge -- add content to the knowledge base
  • find_location / get_directions / find_nearby -- maps and navigation via OSM
  • save_map_marker / get_map_markers -- persistent map pins
  • get_weather -- current weather for any location
  • save_memory -- remember facts about the user
  • run_comfyui_workflow -- generate images, video, or music locally
  • Any tool from connected MCP servers
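The shape of the loop behind this list is simple: the model emits a tool name plus JSON arguments, the agent looks the tool up in a registry, runs it, and feeds the result back into the conversation. A minimal sketch (registry, decorator, and stubs are illustrative, not Calcifer's actual internals):

```python
# name -> callable registry; real tools would also carry JSON schemas
# so the model knows what arguments each one takes.
TOOLS = {}

def tool(fn):
    """Register a function as an agent tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # The real version would call Open-Meteo; stubbed here.
    return f"Sunny in {city}"

@tool
def save_note(text: str) -> str:
    return f"Saved note: {text!r}"

def dispatch(name: str, args: dict) -> str:
    """Run one tool call requested by the model."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**args)

print(dispatch("get_weather", {"city": "Berlin"}))  # Sunny in Berlin
```

Tools arriving from MCP servers slot into the same registry, which is why the agent "picks them up automatically": dispatch does not care where a name came from.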

## Hardware Reference

This is what Calcifer is developed and tested on:

| Component | Spec |
| --- | --- |
| GPU | NVIDIA GeForce RTX 4070 Ti SUPER (16 GB VRAM) |
| CPU | Intel Core i7-14700 |
| RAM | 32 GB |

The GPU handles LLM inference (via Ollama), embedding generation, speech-to-text, text-to-speech, and ComfyUI generation. 16 GB VRAM is comfortable for running a 7-8B parameter model alongside embeddings and voice. Larger models or simultaneous ComfyUI generation may need more.

You could run this on less -- an RTX 3060 (12 GB) would handle smaller models and most features. You could also run it on more -- the stack will happily use whatever you give it.
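The "7-8B fits comfortably" claim follows from back-of-envelope VRAM math: a 4-bit quantized model needs roughly half a byte per parameter, plus a couple of gigabytes for KV cache and activations. A rough calculator (the 0.5 bytes/param and 2 GB overhead figures are rules of thumb, not guarantees):

```python
def model_vram_gb(params_billion: float,
                  bytes_per_param: float = 0.5,  # ~Q4 quantization
                  overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weights plus a flat allowance for
    KV cache and activations."""
    return params_billion * bytes_per_param + overhead_gb

print(round(model_vram_gb(8), 1))  # ~6.0 GB for an 8B model at Q4
```

On a 16 GB card that leaves roughly 10 GB for embeddings, the voice models, and a ComfyUI workflow, which is why larger models or simultaneous generation start to squeeze.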


## Privacy

Everything runs locally. Your conversations, your knowledge base, your voice recordings -- none of it leaves your machine. There are no accounts, no analytics, no tracking.

The only network calls go to these services, and only when you use the features that need them:

| Service | Used for | Account/API key | Data sent |
| --- | --- | --- | --- |
| Ollama (localhost) | LLM inference | No | Nothing leaves your machine |
| SearxNG (localhost) | Web search | No | Search queries proxied through Google/DuckDuckGo/Bing/Brave (your IP, not an API key) |
| Open-Meteo | Weather forecasts | No | City name or coordinates |
| Nominatim | Geocoding, address lookup | No | Address or coordinates |
| Overpass API | Nearby POI search | No | Coordinates and search radius |
| OpenStreetMap | Map tiles | No | Tile coordinates (standard map loading) |

Zero API keys. Zero accounts. Open-Meteo, Nominatim, and Overpass are free public APIs run by non-profits and open-source projects. SearxNG is a self-hosted metasearch engine that distributes your queries across multiple search engines so no single provider builds a profile on you.

If you disconnect from the internet, everything except web search, weather, and map tiles continues to work -- chat, voice, knowledge base, notes, memories, media generation all run offline.


## What Makes This Different

Actually local. Not "local but phones home for embeddings" or "local but needs an API key for search." Every component runs on your machine. SearxNG replaces Google. Ollama replaces OpenAI. Whisper and Kokoro replace cloud speech APIs. ComfyUI replaces Midjourney/DALL-E.

Knowledge-grounded. Answers come from your collected sources with citations. It shows where information came from -- which Reddit post, which article, which YouTube video. When it uses web search as a fallback, it tells you.

Extensible via MCP. Add new tools without touching the core codebase. Drop a config entry for any MCP-compatible server and the agent picks up its tools automatically.
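A server entry would look something like this, following the common `mcpServers` convention (the exact file location and schema are Calcifer-specific and will be documented at release; the filesystem server shown is a real MCP reference server):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/me/docs"]
    }
  }
}
```

On startup the agent launches the server, asks it to list its tools, and merges them into its own tool registry alongside the built-ins.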

Not a wrapper. This is a full application with its own storage, its own ingestion pipeline, its own agent loop. It is not a thin UI over an API.


## Open Source Release

Coming soon. The codebase is being cleaned up for public release. When it ships, you will get:

  • Full source code (Python backend + Next.js frontend)
  • Docker Compose setup for one-command deployment
  • Configuration guides for Ollama, ComfyUI, and SearxNG
  • Documentation for adding custom MCP tools

Watch this repo for the release.


## Support

If you find this project useful and want to support its development:

Buy Me a Coffee

## License

TBD -- will be announced with the open-source release.
