Commit dc5a8f0 — wave 75 to 89

1 parent 6f0bd2b
285 files changed: 44,259 additions & 2,376 deletions

.claude/settings.json

Lines changed: 231 additions & 1 deletion (large diff not rendered)

.env.example

Lines changed: 100 additions & 54 deletions
@@ -1,78 +1,124 @@
 # FormicOS environment configuration
-# Copy to .env and configure for your setup.
+# Copy to .env and set your API key. That's it.
 #
-# Default stack: llama.cpp (GPU, Blackwell-native) + Qwen3-Embedding sidecar + Qdrant.
-# See docs/DEPLOYMENT.md for the full deployment guide.
-# See docker-compose.yml for service definitions.
+# Cloud-first by default: 3 containers, no GPU needed.
+# For local GPU: uncomment the "Local GPU" section below,
+# then run: bash scripts/setup-local-gpu.sh
 
-# --- Local LLM Docker image ---
-# Default: Blackwell-native image (sm_120, CUDA 12.8). Build first:
-# bash scripts/build_llm_image.sh
-#
-# Fallback for non-Blackwell GPUs (PTX JIT, ~10x slower on RTX 5090,
-# --fit on auto-sizes context down to ~16k):
-# LLM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-cuda
-
-# --- Cloud LLM API keys (optional — enables cloud model routing) ---
-# ANTHROPIC_API_KEY=sk-ant-...
+# --- Cloud API keys (set at least one) ---
+ANTHROPIC_API_KEY=sk-ant-...
 # GEMINI_API_KEY=AI...
+# OPENAI_API_KEY=sk-...
 # DEEPSEEK_API_KEY=sk-...
-# MINIMAX_API_KEY=eyJ...
 
-# --- Local LLM model file ---
-# Override the default Qwen3-30B-A3B model:
-# LLM_MODEL_FILE=Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf
+# --- Project binding (Wave 81) ---
+# Set to your project directory. Colonies will read/write against this root.
+# PROJECT_DIR=/path/to/your/project
 
-# --- Local LLM context size ---
-# Default: 80000. With --fit on, llama.cpp auto-sizes KV cache to VRAM.
-# The Blackwell image supports 80k on RTX 5090 (32 GB) with comfortable headroom.
-# The generic CUDA image falls back to ~16k via PTX JIT.
-# Must match config/formicos.yaml context_window for llama-cpp/gpt-4.
-# LLM_CONTEXT_SIZE=80000
-
-# --- Local LLM parallel slots ---
-# Number of concurrent inference slots. The adapter reads this env var
-# to set its concurrency semaphore — no manual coupling needed.
+# --- Local GPU override (uncomment entire block) ---
+# COMPOSE_PROFILES=local-gpu
+# QUEEN_MODEL=llama-cpp/qwen3.5-35b
+# CODER_MODEL=llama-cpp/qwen3.5-35b
+# REVIEWER_MODEL=llama-cpp/qwen3.5-35b
+# RESEARCHER_MODEL=llama-cpp/qwen3.5-35b
+# ARCHIVIST_MODEL=llama-cpp/qwen3.5-35b
+# LLM_HOST=http://llm:8080
+# EMBED_URL=http://formicos-embed:8200
+# EMBED_MODEL=nomic-ai/nomic-embed-text-v1.5
+# EMBED_DIMENSIONS=768
+#
+# Local LLM tuning:
+# LLM_IMAGE=local/llama.cpp:server-cuda-blackwell
+# LLM_MODEL_FILE=Qwen3.5-35B-A3B-Q4_K_M.gguf
+# LLM_MODEL_ALIAS=qwen3.5-35b
+# LLM_CHAT_TEMPLATE_ARGS=--chat-template-file /config/qwen35-chat.jinja
+# LLM_FLASH_ATTN=on
+# LLM_CACHE_TYPE_K=q4_0
+# LLM_CACHE_TYPE_V=q4_0
+# LLM_BATCH_SIZE=8192
+# LLM_UBATCH_SIZE=4096
+# LLM_CONTEXT_SIZE=65536
 # LLM_SLOTS=2
-
-# --- Slot prompt similarity ---
-# Controls how aggressively slots reuse cached prompt prefixes (0.0-1.0).
-# Higher values = more aggressive reuse. Good for multi-agent shared prompts.
 # LLM_SLOT_PROMPT_SIMILARITY=0.5
-
-# --- Prompt cache in system RAM (MB) ---
-# Stores previously computed KV cache states for prefix reuse.
-# Free performance for multi-agent workloads with shared system prompts.
 # LLM_CACHE_RAM=1024
-
-# --- Embedding GPU layers ---
-# Set to 0 to move embedding model to CPU, freeing ~700 MB VRAM.
-# Useful if VRAM is tight at 131k context.
 # EMBED_GPU_LAYERS=99
+# CUDA_DEVICE=0
 
-# --- Local LLM port (host-side) ---
-# LLM_PORT=8008
+# --- Devstral local eval (24B dense; safer defaults on 32 GB VRAM) ---
+# COMPOSE_PROFILES=local-gpu
+# QUEEN_MODEL=llama-cpp/devstral-small-2-24b
+# CODER_MODEL=llama-cpp/devstral-small-2-24b
+# REVIEWER_MODEL=llama-cpp/devstral-small-2-24b
+# RESEARCHER_MODEL=llama-cpp/devstral-small-2-24b
+# ARCHIVIST_MODEL=llama-cpp/devstral-small-2-24b
+# FORMICOS_ENV_FILE=.env.devstral
+# LLM_HOST=http://llm:8080
+# EMBED_URL=http://formicos-embed:8200
+# LLM_MODEL_FILE=mistralai_Devstral-Small-2-24B-Instruct-2512-Q4_K_M.gguf
+# LLM_MODEL_ALIAS=devstral-small-2-24b
+# LLM_CHAT_TEMPLATE_ARGS=
+# LLM_FLASH_ATTN=off
+# LLM_CACHE_TYPE_K=f16
+# LLM_CACHE_TYPE_V=f16
+# LLM_BATCH_SIZE=4096
+# LLM_UBATCH_SIZE=2048
+# LLM_CACHE_RAM=0
+# LLM_CONTEXT_SIZE=32768
+# LLM_SLOTS=3
 
-# --- Model directory (shared by LLM and embedding sidecar) ---
-# LLM_MODEL_DIR=./.models
+# --- Hybrid routing (GPU + API keys — RECOMMENDED for local GPU users) ---
+# Queen on cloud (unlimited context), colonies on local GPU (fast parallel).
+# COMPOSE_PROFILES=local-gpu
+# QUEEN_MODEL=anthropic/claude-sonnet-4-6
+# CODER_MODEL=llama-cpp/qwen3.5-35b
+# REVIEWER_MODEL=llama-cpp/qwen3.5-35b
+# RESEARCHER_MODEL=anthropic/claude-haiku-4-5
+# ARCHIVIST_MODEL=llama-cpp/qwen3.5-35b
+# LLM_SLOTS=3
 
-# --- GPU device index ---
-# Pin LLM + embedding to a specific GPU. Essential on multi-GPU systems.
-# Docker Desktop / WSL2 ignores device_ids in compose deploy blocks;
-# CUDA_VISIBLE_DEVICES (set via this variable) is the effective control.
-# Without pinning, llama.cpp may split layers across GPUs causing segfaults.
+# --- Multi-GPU pinning ---
+# Each GPU-using service has its own device variable.
+# Default: everything on GPU 0. Multi-GPU splits the load.
+#
+# Single GPU (default):
+# CUDA_DEVICE=0
+#
+# Multi-GPU (recommended for 2+ GPUs):
+# GPU 0 (primary, e.g. RTX 5090): Queen model only — full VRAM for large context
+# GPU 1 (secondary, e.g. RTX 3080): Swarm workers + embedding — uses multi-arch image
 # CUDA_DEVICE=0
+# CUDA_DEVICE_SWARM=1
+# CUDA_DEVICE_EMBED=1
+# EMBED_IMAGE=ghcr.io/ggml-org/llama.cpp:server-cuda
+#
+# The swarm image defaults to the official multi-arch build (ghcr.io/ggml-org/llama.cpp:server-cuda)
+# which runs on any CUDA GPU. Override LLM_SWARM_IMAGE for a native build on specific hardware.
+
+# --- Local Swarm (parallel colony workers on a second llama.cpp instance) ---
+# Setup: bash scripts/setup-local-swarm.sh
+# Start: docker compose -f docker-compose.yml -f docker-compose.local-swarm.yml up -d
+#
+# Deep Queen (RECOMMENDED for multi-GPU):
+# Queen gets full 65K context on GPU 0, 4 parallel workers on GPU 1.
+# GPU 0 VRAM: ~23GB (35B weights + bf16 KV). GPU 1 VRAM: ~8.7GB (4B + embed).
+# LLM_SWARM_HOST=http://llm-swarm:8080
+# LLM_SLOTS=1
+# LLM_CONTEXT_SIZE=65536
+# LLM_SWARM_CTX=128000
+# LLM_SWARM_SLOTS=4
+# CODER_MODEL=llama-cpp-swarm/qwen3.5-4b-swarm
+# REVIEWER_MODEL=llama-cpp-swarm/qwen3.5-4b-swarm
+# ARCHIVIST_MODEL=llama-cpp-swarm/qwen3.5-4b-swarm
 
 # --- Sandbox execution ---
 # Set to false to disable Docker sandbox container spawning (code_execute tool).
-# Also remove the /var/run/docker.sock mount from docker-compose.yml if you
-# want to opt out of Docker daemon access entirely.
 # SANDBOX_ENABLED=true
 
 # --- Data directory ---
 # Default: ./data in development, /data in Docker.
 # IMPORTANT: Use named Docker volumes for SQLite persistence.
-# Never bind-mount the SQLite database on macOS/Windows Docker Desktop —
-# WAL mode requires POSIX shared-memory semantics that don't translate
-# through Docker Desktop's filesystem layer.
 # FORMICOS_DATA_DIR=./data
+
+# --- Benchmark directory (dev only) ---
+# Mount a benchmark exercises directory into the container.
+# BENCHMARK_DIR=/path/to/polyglot-benchmark
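An illustrative sketch of how the role-to-model variables above (QUEEN_MODEL, CODER_MODEL, ...) could be resolved into a routing table. The `parse_env` helper and the `cloud-default` fallback are hypothetical, not FormicOS code; variable names mirror this file.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out entries stay inactive
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """\
ANTHROPIC_API_KEY=sk-ant-xxx
QUEEN_MODEL=anthropic/claude-sonnet-4-6
CODER_MODEL=llama-cpp/qwen3.5-35b
# REVIEWER_MODEL=llama-cpp/qwen3.5-35b
"""

env = parse_env(sample)
# Roles missing from .env fall back to a default (hypothetical here).
routing = {role: env.get(f"{role}_MODEL", "cloud-default")
           for role in ("QUEEN", "CODER", "REVIEWER")}
print(routing)
```

This mirrors the hybrid-routing idea in the file: each role gets its own model string, and commenting a line out simply drops it from the parsed environment.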

.gitignore

Lines changed: 15 additions & 0 deletions
@@ -51,3 +51,18 @@ lint-imports-*
 
 # Lock file (committed but generated)
 # uv.lock
+
+# TLS certs (mkcert -- never commit private keys)
+certs/
+.certs/
+
+# Test mock artifacts
+MagicMock/
+
+# Temp directories
+.codex_tmp/
+.tmp_pytest/
+.tmp_runtime_tests/
+
+# Ruff cache
+.ruff_cache/

.tmp_inspect_db.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+import sqlite3, sys
+conn = sqlite3.connect(sys.argv[1])
+c = conn.cursor()
+c.execute("SELECT name FROM sqlite_master WHERE type='table'")
+tables = [t[0] for t in c.fetchall()]
+print("Tables:", tables)
+for t in tables:
+    c.execute("SELECT * FROM " + t + " LIMIT 5")
+    cols = [d[0] for d in c.description]
+    rows = c.fetchall()
+    if rows:
+        print("--- " + t + " (" + str(cols) + ") ---")
+        for r in rows:
+            print(r)
+conn.close()
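The throwaway script above concatenates table names straight into SQL. That is tolerable only because the names come from `sqlite_master`; a variant that quotes identifiers is more robust. A sketch against an in-memory database (the `events` table here is illustrative, not the repo schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'TokensConsumed')")

c = conn.cursor()
c.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = [t[0] for t in c.fetchall()]

for t in tables:
    # Double-quote the identifier (doubling embedded quotes) rather than
    # concatenating it bare into the statement.
    quoted = '"' + t.replace('"', '""') + '"'
    c.execute(f"SELECT * FROM {quoted} LIMIT 5")
    cols = [d[0] for d in c.description]
    print(t, cols, c.fetchall())
conn.close()
```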

CLAUDE.md

Lines changed: 78 additions & 7 deletions
@@ -2,13 +2,29 @@
 
 Open-source Python system: AI agents coordinate through shared environmental
 signals (pheromones), not direct messaging. Tree-structured data model.
-Event-sourced (69 events, closed union). Single operator. Local-first with
+Event-sourced (70 events, closed union). Single operator. Local-first with
 cloud model support. Bayesian knowledge metabolism with Thompson Sampling
 retrieval. Federated knowledge exchange via Computational CRDTs.
-Multi-colony orchestration via DelegationPlan DAG parallelism.
-MCP developer bridge (27 tools, 9 resources, 6 prompts) for Claude Code
-integration. Queen Command & Control surface with behavioral overrides,
-display board, tool tracking, and context budget visibility.
+Multi-colony orchestration via DelegationPlan DAG parallelism with deferred
+group dispatch. MCP developer bridge (29 tools, 12 resources, 8 prompts)
+for Claude Code integration. Queen Command & Control surface with behavioral
+overrides, display board, tool tracking, and context budget visibility.
+Planning workbench with structural analysis, replay-derived capability
+calibration, deterministic reviewed-plan validation and dispatch, saved plan
+patterns, and DAG editing. Live planning policy (`planning_policy.py`) as
+the Queen routing authority with fast_path enforcement at the execution
+layer. Project binding with real-repo codebase indexing (14K+ chunks).
+Production local profile: Qwen3.5-35B MoE (0.804 quality on fast_path
+tasks, 5/5 real-repo tasks completed, zero hangs).
+
+## Knowledge base
+
+The FormicOS knowledge system contains 100+ entries covering agent
+architecture state of the art (loop patterns, tool calling, context
+engineering, multi-agent coordination, production deployment, evaluation).
+Before making architectural choices, search the knowledge base via the
+`search_knowledge` MCP tool or the Queen's `memory_search` tool. Cite
+relevant entries by title when they influence design decisions.
 
 ## Architecture
 
@@ -275,6 +291,48 @@ not a DAG; they are Queen scaffolding. When a colony completes a step, the
 system prompts the Queen with the next pending step via the follow_up_colony
 summary.
 
+### Metering and billing (Wave 75)
+
+Token metering aggregates `TokensConsumed` events per billing period with
+chain-hash integrity. `formicos billing` CLI subgroup (status, estimate,
+attest, history, self-test). Attestations are deterministic and stored in
+`data_dir/attestations/`. The `metering.py` surface module computes fees
+from tiered token thresholds. `scripts/attribution.py` computes contributor
+revenue-share proportions from git history.
+
+### A2A economic contracts (Wave 75)
+
+Task receipts (`surface/task_receipts.py`) produce deterministic cost/quality
+summaries for completed A2A work. `get_task_receipt` MCP tool and
+`formicos://receipt/{task_id}` resource expose receipts to clients.
+`search_knowledge` MCP tool provides full-pipeline retrieval from external
+clients (semantic + Thompson + freshness + co-occurrence + graph proximity).
+
+### Structural integrity (Wave 76)
+
+16 correctness fixes across 3 teams (data truth, operational safety,
+context integrity). No new features -- fixes silent errors and race
+conditions that would surface under real multi-client load.
+
+Data truth: `BudgetSnapshot.total_tokens` includes reasoning tokens.
+Agent-to-colony reverse index (`_agent_colony_index`) in ProjectionStore
+for O(1) token attribution. Daily spend persistence to disk with reload
+on restart. Budget reconciliation (estimated vs actual colony cost)
+wired through `_post_colony_hooks`.
+
+Operational safety: Action queue compaction preserves `pending_review`
+items. State transition validation via `_VALID_TRANSITIONS` map (409 on
+invalid). Operational sweep reentrancy guard (`asyncio.Lock`).
+Kill/completion race guard at both colony completion paths. Journal
+entries for all approval/execution branches. Operator-idle detection
+includes Queen thread messages.
+
+Context integrity: Budget caps on memory retrieval, notes, and thread
+context injections. Workspace-scoped session and plan paths with
+migration fallback. Queen chat workspace propagation across all 4
+dispatch sites. Settings and queen-overview workspace resolution via
+`activeWorkspaceId` property.
+
 ## Tech stack
 
 Use Python 3.12+, uv, Pydantic v2 (sole serialization), asyncio, httpx,
@@ -483,15 +541,28 @@ IMPORTANT: These are non-negotiable. Violating any of these requires operator ap
 | `surface/self_maintenance.py` | MaintenanceDispatcher, autonomy policy, blast radius, autonomy scoring | Self-maintenance |
 | `surface/project_plan.py` | Project plan parser/helper, milestone tools, plan rendering | Project plan |
 | `surface/queen_budget.py` | 9-slot proportional Queen context budget (ADR-051) | Queen budget |
-| `surface/queen_tools.py` | Queen tool dispatch (42 tools), spawn_parallel, DelegationPlan | Queen tools |
+| `surface/queen_tools.py` | Queen tool dispatch (~45 tools, dynamic toolsets), spawn_parallel, DelegationPlan | Queen tools |
 | `surface/transcript_view.py` | Canonical colony transcript schema | A2A/MCP export |
 | `surface/proactive_intelligence.py` | 17 deterministic briefing rules (7 knowledge + 4 performance + evaporation + branching + earned autonomy + template health + outcome digest + popular unexamined) | Proactive intel |
 | `surface/routes/api.py` | REST endpoints: outcomes, create-demo, project-plan, autonomy-status, maintenance-policy, add-model | API surface |
 | `surface/workflow_learning.py` | Deterministic workflow pattern recognition + procedure suggestions (Wave 72) | Workflow learning |
 | `docs/AUTONOMOUS_OPERATIONS.md` | Autonomy operator runbook: action queue, levels, learning, controls | Reference |
 | `docs/DEVELOPER_BRIDGE.md` | Developer onboarding guide for Claude Code integration | Reference |
-| `surface/mcp_server.py` | MCP server (27 tools, 9 resources, 6 prompts) | MCP surface |
+| `surface/mcp_server.py` | MCP server (29 tools, 12 resources, 8 prompts) | MCP surface |
 | `config/templates/demo-workspace.yaml` | Demo workspace template with seeded entries | Demo path |
+| `surface/workspace_roots.py` | Project/library/runtime root resolution (Wave 81) | Project binding |
+| `surface/parallel_plans.py` | Deferred group dispatch, honest plan aggregation (Wave 81) | Parallel execution |
+| `surface/planning_signals.py` | Structured planning signal builder (Wave 82) | Planning |
+| `surface/structural_planner.py` | File matching, import coupling, grouping hints (Wave 82) | Planning |
+| `surface/capability_profiles.py` | Replay-derived capability calibration (Wave 82) | Planning |
+| `surface/reviewed_plan.py` | Reviewed-plan validation and normalization (Wave 83) | Planning workbench |
+| `surface/plan_patterns.py` | YAML-backed saved plan-pattern store (Wave 83) | Planning workbench |
+| `surface/planning_policy.py` | Consolidated routing: `decide_planning_route()` + `PlanningDecision` (Wave 85) | Queen routing |
+| `surface/commands.py` | WebSocket command handlers incl. `confirm_reviewed_plan` | WS surface |
+| `surface/metering.py` | Token metering, fee computation, attestation generation | Billing |
+| `surface/task_receipts.py` | Deterministic task receipts for A2A economic contracts | A2A economics |
+| `scripts/attribution.py` | Contributor revenue-share attribution from git history | Billing |
+| `docs/waves/wave_81/real_repo_task_pack.md` | Real-repo evaluation tasks (rtp-01 through rtp-05) | Benchmark |
 
 ## Common patterns
 
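The CLAUDE.md section above says token metering aggregates `TokensConsumed` events "with chain-hash integrity". A sketch of that idea, not the repo's `metering.py`: each event's hash folds in the previous digest, so editing or reordering any event changes the final value. The genesis value and event fields here are assumptions.

```python
import hashlib
import json

def chain_hash(events: list[dict]) -> str:
    digest = "0" * 64  # genesis value (assumed)
    for event in events:
        payload = json.dumps(event, sort_keys=True)  # deterministic encoding
        digest = hashlib.sha256((digest + payload).encode()).hexdigest()
    return digest

period = [
    {"type": "TokensConsumed", "tokens": 1200, "model": "qwen3.5-35b"},
    {"type": "TokensConsumed", "tokens": 800, "model": "claude-sonnet-4-6"},
]
total = sum(e["tokens"] for e in period)
print(total)  # 2000

# Tampering with any event yields a different chain hash.
tampered = [dict(period[0], tokens=999), period[1]]
print(chain_hash(period) != chain_hash(tampered))  # True
```

Determinism (same events, same hash) is what makes attestations reproducible; the tamper check is what makes the total auditable.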
Caddyfile

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+:8443 {
+    tls /certs/localhost.pem /certs/localhost-key.pem
+
+    # CORS headers for Claude Desktop connector
+    header {
+        Access-Control-Allow-Origin *
+        Access-Control-Allow-Methods "GET, POST, OPTIONS"
+        Access-Control-Allow-Headers "Content-Type, Accept, Authorization, Mcp-Session-Id"
+        Access-Control-Expose-Headers "Mcp-Session-Id"
+    }
+
+    # Handle CORS preflight
+    @options method OPTIONS
+    respond @options 204
+
+    reverse_proxy formicos:8080
+}

Dockerfile

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,11 @@ RUN npm run build
 # Stage 2: Python runtime
 FROM python:3.12-slim AS runtime
 
+# System dependencies (git required for shadow checkpoints — Wave 78)
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends git \
+    && rm -rf /var/lib/apt/lists/*
+
 # Install uv for fast dependency resolution
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
 

METERING.md

Lines changed: 24 additions & 0 deletions
@@ -8,6 +8,30 @@ the canonical method for computing Total Tokens and producing Usage
 Attestations under Tier 2 and Tier 3 Commercial Licenses.
 
 
+## Implementation status (Wave 75)
+
+**Implemented:**
+- Event-store-backed token aggregation (`surface/metering.py`)
+- Fee computation (`compute_fee`) — single source of truth
+- Unsigned v1 attestation generation
+- CLI: `formicos billing status|estimate|attest|history|self-test`
+- REST: `GET /api/v1/billing/status`
+- MCP: `formicos://billing` resource, `economic-status` prompt
+
+**Deferred:**
+- Ed25519 key derivation and signing (attestations are `"unsigned"`)
+- Billing submission endpoint (`formicos billing submit`)
+- External billing service integration
+
+**Repo truth notes:**
+- `TokensConsumed` events carry `cost` (not `cost_usd` as shown in the
+  schema example below). The aggregate uses the actual event field name.
+- `TokensConsumed` events do not carry a `provider` field. The example
+  below shows `provider` for specification completeness. The implementation
+  derives provider best-effort from model name prefixes. `by_model` is
+  canonical; `by_provider` is not available in the current event schema.
+
+
 ## What is metered
 
 **Total Tokens** is the sum of all input tokens, output tokens, and
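The diff above lists `compute_fee` as the single source of truth for fees "from tiered token thresholds". A sketch of marginal (bracket-style) tiering; the tier boundaries and rates below are invented for illustration and do not reproduce `surface/metering.py`.

```python
# Hypothetical tier table: (upper bound in tokens, USD per million tokens
# in that band). Real thresholds live in surface/metering.py.
TIERS = [
    (10_000_000, 0.0),     # first 10M tokens: free
    (100_000_000, 2.0),    # next 90M: $2 per million
    (float("inf"), 5.0),   # beyond 100M: $5 per million
]

def tiered_fee(total_tokens: int) -> float:
    """Charge each band at its own rate, like tax brackets."""
    fee, lower = 0.0, 0
    for upper, rate_per_million in TIERS:
        band = max(0, min(total_tokens, upper) - lower)
        fee += band / 1_000_000 * rate_per_million
        lower = upper
        if total_tokens <= upper:
            break
    return fee

print(tiered_fee(5_000_000))   # 0.0
print(tiered_fee(20_000_000))  # 20.0  (10M free + 10M at $2/M)
```

Marginal tiering keeps the fee continuous at each threshold, so crossing a boundary never makes the total jump.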
