How to deploy FormicOS from clone to running stack.
No GPU required. Three containers: FormicOS + Qdrant + Docker proxy.
git clone https://github.com/Intradyne/FormicOS.git
cd FormicOS
cp .env.example .env
# Edit .env: set ANTHROPIC_API_KEY=sk-ant-...
docker compose build && docker compose up -dVerify:
docker compose ps # 3 containers should show "healthy"
curl http://localhost:8080/healthNavigate to http://localhost:8080. Wait for the startup panel to clear and the Queen welcome message to appear.
| Requirement | Minimum |
|---|---|
| Docker | Docker Engine 24+ with Compose V2, or Docker Desktop 4.30+ |
| API key | At least one of: ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY |
Embeddings use sentence-transformers MiniLM (~80 MB auto-download, 384-dim). No GPU needed.
Five containers: adds llama.cpp LLM + Qwen3-Embedding sidecar (GPU).
Production local model: Qwen3.5-35B-A3B (MoE, 3.5B active params/token). Benchmarked at 0.503 average quality across 5/5 real-repo tasks with zero hangs. This is the default and recommended local profile.
| Requirement | Minimum | Recommended |
|---|---|---|
| GPU | 24 GB VRAM (RTX 3090 / 4090 / A5000) | 32 GB VRAM (RTX 5090) |
| CPU | 4 cores | 8+ cores |
| RAM | 32 GB | 64 GB |
| Disk | ~20 GB (models + Docker images) | 40 GB+ |
| NVIDIA | NVIDIA Container Toolkit installed | Driver 555+ for Blackwell GPUs |
bash scripts/setup-local-gpu.sh
docker compose up -dThis downloads models, builds the Blackwell llama.cpp image, and enables the
local-gpu Docker Compose profile in your .env.
cp .env.example .envUncomment the "Local GPU override" block in .env, then:
# Download models
mkdir -p .models && cd .models
huggingface-cli download Qwen/Qwen3.5-35B-A3B-GGUF \
Qwen3.5-35B-A3B-Q4_K_M.gguf --local-dir .
huggingface-cli download Qwen/Qwen3-Embedding-0.6B-GGUF \
Qwen3-Embedding-0.6B-Q8_0.gguf --local-dir .
cd ..
# Build Blackwell-native llama.cpp image
bash scripts/build_llm_image.sh
# Start (5 containers)
docker compose up -dFor non-Blackwell GPUs, set LLM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-cuda
in .env. The generic image uses PTX JIT and falls back to ~16k effective
context.
Cloud-first uses 384-dim MiniLM embeddings. Local GPU uses 1024-dim Qwen3
via the sidecar. You cannot switch between cloud and local embedding on
an existing Qdrant collection without re-indexing. Choose once at setup
time. To re-index: stop FormicOS, delete the qdrant-data volume, restart.
| Container | Port | Purpose | Profile |
|---|---|---|---|
formicos-colony |
8080 | FormicOS application (backend + frontend) | (always) |
formicos-qdrant |
6333, 6334 | Qdrant vector store | (always) |
formicos-docker-proxy |
-- (internal 2375) | Docker socket proxy for sandbox spawning | (always) |
formicos-llm |
8008 → 8080 | llama.cpp LLM inference (GPU) | local-gpu |
formicos-embed |
8200 | Qwen3-Embedding sidecar (GPU) | local-gpu |
Profile-gated services only start when COMPOSE_PROFILES=local-gpu is set
in .env. Adapters handle missing endpoints gracefully at first LLM call.
Copy .env.example to .env. See .env.example for the full list with
inline documentation. Key variables:
| Variable | Default | Purpose |
|---|---|---|
ANTHROPIC_API_KEY |
(none) | Anthropic API key for cloud models (Claude) |
COMPOSE_PROFILES |
(none) | Set to local-gpu to enable LLM + embedding containers |
QUEEN_MODEL |
anthropic/claude-sonnet-4-6 |
Queen model (env var overrides formicos.yaml) |
CODER_MODEL |
anthropic/claude-sonnet-4-6 |
Coder model |
REVIEWER_MODEL |
anthropic/claude-haiku-4-5 |
Reviewer model |
EMBED_MODEL |
all-MiniLM-L6-v2 |
Embedding model (sentence-transformers or sidecar) |
EMBED_URL |
(empty) | Embedding sidecar URL (set for local GPU) |
EMBED_DIMENSIONS |
384 |
Embedding vector dimensions (384 for MiniLM, 1024 for Qwen3) |
LLM_HOST |
(empty) | Local LLM endpoint (set for local GPU) |
FORMICOS_DATA_DIR |
./data (dev) / /data (Docker) |
Persistent data directory |
SANDBOX_ENABLED |
true |
Enable/disable Docker sandbox for code execution |
| File | Purpose |
|---|---|
config/formicos.yaml |
Model routing, tier definitions, context windows |
config/caste_recipes.yaml |
Caste prompts, tool lists, model assignments |
config/templates/ |
Colony templates (7 built-in) |
On multi-GPU systems, set CUDA_DEVICE in .env:
CUDA_DEVICE=0This sets CUDA_VISIBLE_DEVICES inside the LLM and embedding containers.
Docker Desktop on WSL2 ignores device_ids in the deploy block and
passes through all GPUs. CUDA_VISIBLE_DEVICES is the effective control.
Without pinning, llama.cpp may split model layers across GPUs, causing segfaults or severe throughput degradation.
FormicOS is event-sourced. A single SQLite file is the source of truth. On startup, events replay into in-memory projections.
Rules:
- Use named Docker volumes (the default). The
formicos-datavolume indocker-compose.ymlis a named volume. - Never bind-mount the SQLite database on macOS or Windows Docker Desktop.
Docker Desktop uses a Linux VM with filesystem translation (gRPC-FUSE or
VirtioFS). SQLite WAL mode requires POSIX shared-memory semantics
(
mmapon.db-shm) that do not translate correctly through this layer. Bind-mounting will cause silent corruption or locking failures under load. - Keep
.db,.db-wal, and.db-shmon the same filesystem. WAL mode requires all three files to be co-located. Moving or copying just the.dbfile without its WAL companions will lose uncommitted data. - FormicOS is a single-writer system. Do not run multiple FormicOS instances against the same SQLite file.
The SQLite adapter currently enables WAL journaling (PRAGMA journal_mode=WAL).
Qdrant stores knowledge embeddings for vector search. Data persists in the
qdrant-data named volume.
- Qdrant v1.16.2 is required for server-side BM25 hybrid search support.
- Back up by snapshotting the
qdrant-datavolume (see Backup below). - Qdrant data is reconstructable from the event store — if you lose the Qdrant volume, restart FormicOS and embeddings will be re-indexed from events on replay.
The event store is the primary backup target. Back up the formicos-data
volume:
docker run --rm -v formicos-data:/data -v $(pwd):/backup \
alpine tar czf /backup/formicos-backup-$(date +%Y%m%d).tar.gz /dataTo restore:
docker compose down
docker volume rm formicosa_formicos-data
docker volume create formicosa_formicos-data
docker run --rm -v formicosa_formicos-data:/data -v $(pwd):/backup \
alpine tar xzf /backup/formicos-backup-YYYYMMDD.tar.gz -C /
docker compose up -ddocker compose down -v
docker compose up -dThe code_execute agent tool runs code inside disposable Docker containers
with:
--network=none— no network access--memory=256m— memory limit--read-only— read-only root filesystem--tmpfs /tmp:size=10m— small writable temp space
These provide basic isolation for code execution tasks.
The FormicOS container routes Docker API calls through a socket proxy
(tecnativa/docker-socket-proxy) configured in docker-compose.yml.
The proxy restricts API access to container operations only (CONTAINERS=1,
POST=1; images, networks, volumes, and all other operations are blocked).
The raw Docker socket is mounted read-only into the proxy container, not into FormicOS itself. This limits the blast radius of a compromise — FormicOS can create/start/stop containers but cannot pull images, create networks, or access other Docker API endpoints.
Mitigations:
- Set
SANDBOX_ENABLED=falseto disable sandbox spawning entirely. - The socket proxy is the default path — no raw socket mount is needed.
- Set
DOCKER_HOST=tcp://docker-proxy:2375(already configured indocker-compose.yml).
For stronger isolation, consider running FormicOS inside a Sysbox or gVisor-based runtime where nested containers do not require host socket access. This is not yet a shipped configuration.
The workspace executor (for repo-backed commands like git, test runners,
and build tools) runs commands inside disposable Docker containers when
WORKSPACE_ISOLATION=true (the default). The workspace directory is
bind-mounted into the container, and commands run with --cap-drop=ALL,
--security-opt=no-new-privileges, --pids-limit, and a custom seccomp
profile.
Phase-aware networking: dependency-install commands (pip install,
npm install, etc.) get network access; test and build commands run with
--network=none.
When Docker is unavailable, the executor falls back to host-shell
asyncio.create_subprocess_shell. Set WORKSPACE_ISOLATION=false to
force this fallback. The host-shell path does not have container
isolation.
| Control | Status |
|---|---|
Sandbox containers with --network=none, --memory, --read-only |
Enforced |
Sandbox --cap-drop=ALL, --no-new-privileges, --pids-limit=256 |
Enforced |
Custom seccomp profile for sandbox (config/seccomp-sandbox.json) |
Enforced |
| Knowledge security scanning (5-axis) | Enforced |
| Event-sourced audit trail | Enforced |
| Layer isolation (CI-enforced import rules) | Enforced |
| Bayesian federation trust scoring | Enforced |
| Docker socket proxy (container-ops only) | Enforced |
Containerized workspace executor (WORKSPACE_ISOLATION=true) |
Enforced (fallback to host-shell when Docker unavailable) |
| Git clone security defaults (hooks disabled, no submodules, symlinks off) | Enforced via safe_git_clone() |
For local development, HTTPS is not needed. Claude Desktop and Claude Code
both connect over HTTP via localhost (Claude Desktop uses mcp-remote to
bridge stdio to HTTP -- see docs/DEVELOPER_BRIDGE.md).
HTTPS is only needed when exposing FormicOS to external clients over the network. The repo includes an optional Caddy reverse proxy sidecar that terminates TLS on port 8443 using locally-trusted certificates generated by mkcert.
# Install mkcert
# Windows: winget install FiloSottile.mkcert
# macOS: brew install mkcert
# Linux: see https://github.com/FiloSottile/mkcert#installation
# Install the local CA into system trust stores (one-time)
mkcert -install
# Generate certs for localhost
mkdir -p certs
mkcert -cert-file certs/localhost.pem -key-file certs/localhost-key.pem \
localhost 127.0.0.1 ::1The certs/ directory is gitignored -- certificates are generated per-machine.
The caddy service is commented out in docker-compose.yml by default.
Uncomment it, or use the override file:
docker compose -f docker-compose.yml -f docker-compose.https.yml up -dCaddy listens on :8443 and reverse-proxies all traffic to formicos:8080.
The Caddyfile at the repo root configures TLS:
:8443 {
tls /certs/localhost.pem /certs/localhost-key.pem
reverse_proxy formicos:8080
}
curl https://localhost:8443/healthIf this fails with a certificate error, run mkcert -install to add the
local CA to your system trust store.
| Component | VRAM | Notes |
|---|---|---|
| Qwen3.5-35B-A3B Q4_K_M weights | ~19.5 GB | MoE, 3.5B active params/token |
| KV cache (65k ctx × 5 slots) | ~5.5 GB | --fit on auto-sizes to available VRAM |
| Compute buffers | ~2.4 GB | |
| Qwen3-Embedding-0.6B Q8_0 | ~0.7 GB | Set EMBED_GPU_LAYERS=0 for CPU fallback |
| Total GPU | ~28.1 GB | ~3.9 GB headroom |
The same stack fits with reduced context. Set LLM_CONTEXT_SIZE=32000 in
.env. With --fit on, llama.cpp auto-sizes the KV cache to available VRAM.
To free ~700 MB VRAM, move embedding to CPU:
EMBED_GPU_LAYERS=0Add API keys to .env and skip model downloads and image builds:
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AI...The tier system routes all castes to cloud providers. Comment out the llm
and formicos-embed services in docker-compose.yml if desired (FormicOS
will still need Qdrant for vector search, or will fall back to the
sentence-transformers embedding path).
git pull
docker compose build formicos
docker compose up -dImportant: After modifying frontend source or pulling updates, you must
rebuild the Docker image. The frontend is built inside the Docker multi-stage
build — a local npm run build does not update the served bundle inside the
container.
docker logs formicos-colony # Application (structlog JSON)
docker logs formicos-llm # LLM inference
docker logs formicos-embed # Embedding sidecar
docker logs formicos-qdrant # Vector store| Endpoint | What it checks |
|---|---|
GET /health (port 8080) |
FormicOS app, replay status, workspace/thread/colony counts |
GET /health (port 8008) |
llama.cpp LLM readiness |
GET /health (port 8200) |
Embedding sidecar readiness |
GET /healthz (port 6333) |
Qdrant readiness |
FormicOS ships two telemetry adapters:
- JSONL sink (
adapters/telemetry_jsonl.py) — lightweight, always available. Events appended to a JSONL file in the data directory. - OpenTelemetry adapter (
adapters/telemetry_otel.py) — activates whenopentelemetry-apiis importable. Instruments LLM calls, colony lifecycle, and round execution with spans and metrics. Configure via standard OTel environment variables (OTEL_EXPORTER_OTLP_ENDPOINT, etc.).
The OTel adapter is additive — the JSONL sink remains usable as a debug-level fallback even when OTel is configured. Live integration into the runtime call sites is available but not yet wired into all execution paths.
See RUNBOOK.md for detailed troubleshooting guidance covering:
- LLM container startup failures
- Embedding sidecar crashes and multi-GPU issues
- Qdrant connection problems
- Colony execution issues
- Provider fallback behavior
- Frontend bundle staleness after rebuilds
Devstral Small 2 can be used as an alternative local model. It has strong instruction following but is significantly slower than Qwen3.5 MoE on consumer hardware due to its dense architecture.
Status: Experiment/reference profile. Not recommended as the default production path on current consumer GPUs.
To use Devstral:
- Copy
.env.devstralto.env(or merge its model settings) - Download the Devstral GGUF into
.models/ - Set
LLM_FLASH_ATTN=off,LLM_CACHE_TYPE_K=f16,LLM_CACHE_TYPE_V=f16 - Use conservative slot/context settings (see
.env.devstral)
Devstral is useful for testing instruction-following quality or as a comparison baseline. For iterative colony work, Qwen3.5-35B remains materially faster and is the recommended production profile.
Set FORMICOS_ASYNCIO_DEBUG=1 in .env to enable asyncio slow-callback
warnings. This logs any callback that blocks the event loop for >100ms,
with enough context to identify the blocking function.
Use this when diagnosing app health-check failures or WebSocket stream deaths during sustained colony work. Disable in normal operation.
Colony completion hooks (memory extraction, transcript harvest) now drain through a deferred idle queue instead of competing immediately with live colony work. This prevents LLM capacity starvation during bursty multi-colony completions. No configuration needed — always active.
The local OpenAI-compatible adapter uses explicit httpx connection-pool
limits (max_connections=10, max_keepalive_connections=5) on top of
earlier transport hardening (Connection: close, transport reset/retry).
An Ollama variant is documented (commented out) in docker-compose.yml.
To use Ollama instead of llama.cpp:
- Uncomment the
ollamaservice block indocker-compose.yml - Comment out the
llmservice - Set
LLM_HOST=http://ollama:11434in the FormicOS environment - Update model defaults in
config/formicos.yamlto useollama/*prefixes - Pull the model:
docker exec formicos-ollama ollama pull qwen3:30b-a3b
Ollama is simpler to set up but has higher VRAM overhead and less control over inference parameters.
For development without Docker:
# Backend
uv sync --dev
python -m formicos # Starts on :8080
# Frontend (separate terminal, for HMR)
cd frontend && npm ci && npm run dev
# Full CI
uv run ruff check src/ && uv run pyright src/ && python scripts/lint_imports.py && python -m pytest -qYou need either a running local LLM (llama.cpp, Ollama) or cloud API keys
in .env. Qdrant must be running for vector search (docker compose up qdrant).