◉ ⬡ ⬡ 🧠 ⚙ ⚙ → AI Workbench
A local AI development workbench — real-time token usage monitoring for claude-code, multi-model serving, and MCP tool calling, all on one machine.
| Component | Description | Docs |
|---|---|---|
| Usage Widget | Python tray app — daily/weekly token totals, per-project breakdown, rolling averages, claude.ai account stats via CDP, local LLM toggle | widget/USAGE_WIDGET.md |
| AI Infrastructure | Main compose (LiteLLM + Unsloth + vLLM), multi-model serving, GPU configuration | ai/AI_INFRA.md |
All settings live in config.json (project root) and .env. The widget reads config.json at startup; changes made in the Settings window (tray right-click → Settings…) are applied immediately.
See the Usage Widget docs for the full key reference.
make setup

Run make help to list all available targets, including any service stacks currently registered in the Makefile.
make network # Create shared Docker network (ai_shared)

All containers share an ai_shared Docker network so they can resolve each other by container name. make setup creates it automatically.
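As a rough illustration of what creating the shared network idempotently involves, here is a minimal Python sketch. It is not the Makefile's actual recipe: the function names are hypothetical, and only the network name ai_shared and the standard docker network inspect/create subcommands are taken as given.

```python
import subprocess

NETWORK = "ai_shared"  # network name from this README


def network_exists(name, run=subprocess.run):
    """Return True if `docker network inspect <name>` succeeds."""
    result = run(["docker", "network", "inspect", name], capture_output=True)
    return result.returncode == 0


def ensure_network(name=NETWORK, run=subprocess.run):
    """Create the network only if it is missing.

    Returns True when the network was created, False when it already existed.
    The `run` parameter is injectable so the logic can be tested without Docker.
    """
    if network_exists(name, run):
        return False
    run(["docker", "network", "create", name], check=True)
    return True
```

Calling ensure_network() twice in a row should create the network at most once, which is the property that lets a setup step run it unconditionally.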
make up # Start all services
make down # Stop all services
make clean # Stop and remove containers + volumes
make very-clean # Stop, remove containers, volumes, and images
make logs # Follow logs
make build # Build images

The Makefile uses a macro to generate up-*, down-*, clean-*, very-clean-*, logs-*, and build-* targets for each service stack. Adding a service is a two-step change:
- Create ai/docker-compose.<name>.yml with the service definition.
- Add one line to the Makefile with the stack name and the space-separated list of Docker Compose service names to target:
$(eval $(call service,<name>,<service1> <service2> ...))

Example (adding a Whisper stack that runs two containers):
$(eval $(call service,whisper,whisper-api whisper-worker))

This immediately makes make up-whisper, make down-whisper, make clean-whisper, make very-clean-whisper, make logs-whisper, and make build-whisper available. The compose file must be named ai/docker-compose.whisper.yml.
To remove a service, delete the corresponding $(eval $(call service,...)) line from the Makefile. That's it: all six generated targets disappear with it.
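The relationship between one registration line and its generated targets can be modelled in a few lines of Python. This is only an illustration of the naming scheme described above, not the macro itself; generated_targets is a hypothetical helper name.

```python
# Illustrative model of the targets produced by one
# $(eval $(call service,<name>,...)) line. Not the project's Makefile code.
ACTIONS = ("up", "down", "clean", "very-clean", "logs", "build")


def generated_targets(stack: str) -> list[str]:
    """Return the make targets that exist once `stack` is registered."""
    return [f"{action}-{stack}" for action in ACTIONS]


if __name__ == "__main__":
    # Prints one `make <target>` line per generated target for the whisper stack.
    for target in generated_targets("whisper"):
        print(f"make {target}")
```

Because every target name is derived from the single stack name, deleting the registration line leaves nothing else to clean up.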
| Constraint | Detail |
|---|---|
| Stack name | Must match the suffix of the compose filename: service,foo,... → ai/docker-compose.foo.yml |
| Service list | Space-separated Docker Compose service names (second argument). These are passed directly to docker compose up/stop/rm/build. |
| Multiple services | All listed services are started/stopped together as a group (e.g. vllm-qwen vllm-llama). |
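To make the constraints concrete, the sketch below builds the kind of docker compose command line a generated target might run. The exact flags the Makefile uses are not shown in this README, so the -d flag in the usage example and the helper name compose_argv are assumptions; only the compose file naming and the grouped service list come from the table.

```python
import shlex


def compose_argv(stack: str, services: str, subcommand: str, *flags: str) -> list[str]:
    """Build a docker compose command for one stack.

    `stack` selects the compose file (ai/docker-compose.<stack>.yml) and
    `services` is the space-separated list from the macro's second argument.
    """
    names = services.split()
    if not names:
        raise ValueError("service list must not be empty")
    compose_file = f"ai/docker-compose.{stack}.yml"
    return ["docker", "compose", "-f", compose_file, subcommand, *flags, *names]


if __name__ == "__main__":
    # Assumed flags: detached start for the two-container Whisper example.
    argv = compose_argv("whisper", "whisper-api whisper-worker", "up", "-d")
    print(shlex.join(argv))
```

All listed services land on the same command line, which is why they start and stop as a group.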
All inference services (Unsloth, vLLM, Kokoro) require a CUDA-capable NVIDIA GPU. The full list of supported GPUs is at https://developer.nvidia.com/cuda/gpus. The Unsloth build defaults to compute capability 89 (Ada Lovelace — RTX 4080/4090); adjust Dockerfile.unsloth for other architectures.
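To find the value that matches your GPU, a small Python sketch like the following may help. It assumes a driver recent enough for nvidia-smi to support the compute_cap query field, and the helper names are hypothetical. How Dockerfile.unsloth consumes the value is not covered here, so check that file before editing it.

```python
import subprocess


def capability_to_arch(capability: str) -> str:
    """Convert a dotted compute capability such as "8.9" to the form "89"."""
    major, minor = capability.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_capabilities() -> list[str]:
    """Ask nvidia-smi for the compute capability of each visible GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line.strip() for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    for cap in detect_capabilities():
        print(f"compute capability {cap} -> arch {capability_to_arch(cap)}")
```

An RTX 4090 reports 8.9, which corresponds to the default of 89 mentioned above.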
Copy .env.example to .env and fill in the values.
| Variable | Default | Description |
|---|---|---|
| DEBUG_LOGGING | false | Write DEBUG-level logs to the log file |
| INCLUDE_PATHS | (empty) | Comma-separated path prefixes to filter project sessions. Leave blank to include all |
| EXCLUDE_WEEKDAYS | 5,6 | Days excluded from rolling averages (0=Monday, 6=Sunday) |
| CONSOLE_FETCHER_ENABLED | false | Scrape console.anthropic.com for account-level usage |
| CONSOLE_REFRESH_MINUTES | 30 | Minutes between console scraping refreshes |
| CONSOLE_HEADLESS | true | Run Chrome headless for scraping (false keeps window visible) |
| CHROME_PATHS_VAR | (empty) | Alternative Chrome executable path. Leave empty to use defaults |
| LLM_LOG_MAX_LINES | 200 | Max lines kept in the LLM Backend server log |
| LLM_URL | http://localhost:8001 | Base URL for the local LLM server |
| LLM_API_KEY | (empty) | API key sent to the local LLM server |
| LLM_MODEL | (empty) | Model alias passed to Claude Code |
| LLAMA_SERVER_CMD | (empty) | Full shell command to launch llama-server |
| BROWSER_DEBUG_PORT | 9222 | Chrome remote-debugging port for CDP |
| KEEP_LLM_ACTIVE | true | Keep the local LLM server running when switching away |
| HF_TOKEN | (empty) | HuggingFace token for gated model downloads (vLLM, LiteLLM) |
| DEFAULT_LITELLM_MASTER_KEY | (empty) | Master key for the LiteLLM proxy |
| DEFAULT_LITELLM_MODEL_NAME | (empty) | Model alias used in LiteLLM requests |
| DEFAULT_LITELLM_MODEL | (empty) | Full model spec (e.g. openai/unsloth/Qwen3.6-35B-A3B-GGUF) |
| DEFAULT_LITELLM_MODEL_API_KEY | (empty) | API key forwarded to the upstream LLM |
| DEFAULT_LITELLM_MODEL_API_BASE | (empty) | Upstream API base URL (must include /v1 for OpenAI-compatible endpoints) |
| DEFAULT_LITELLM_DATABASE_URL | (empty) | PostgreSQL connection string for LiteLLM |
| DEFAULT_LITELLM_MCP_PHOENIX_URL | (empty) | URL for the Phoenix MCP server |
| DEFAULT_LITELLM_MCP_PHOENIX_AUTH_VALUE | (empty) | Bearer token for Phoenix MCP authentication |
| KOKORO_APP_URL | http://kokoro-app:8085 | URL the Kokoro API proxy uses to reach the inference container. Set to http://localhost:8080 for local dev |
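As an illustration of how values such as EXCLUDE_WEEKDAYS might be interpreted, here is a minimal Python sketch. It is not the widget's code: the .env parser is deliberately simplified, the function names are hypothetical, and the 7-day window is an assumption. The weekday numbering matches Python's date.weekday(), where 0 is Monday and 6 is Sunday.

```python
from datetime import date


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def parse_weekdays(value: str) -> set[int]:
    """Turn "5,6" into {5, 6}; an empty string excludes nothing."""
    days = {int(part) for part in value.split(",") if part.strip()}
    if any(day < 0 or day > 6 for day in days):
        raise ValueError("weekdays must be in the range 0 (Monday) to 6 (Sunday)")
    return days


def rolling_average(daily_totals: dict[date, int], excluded: set[int], window: int = 7) -> float:
    """Average the most recent `window` days whose weekday is not excluded."""
    included = [day for day in sorted(daily_totals) if day.weekday() not in excluded]
    recent = included[-window:]
    if not recent:
        return 0.0
    return sum(daily_totals[day] for day in recent) / len(recent)
```

With the default of 5,6, quiet weekends do not pull the average down, which is the stated purpose of the setting.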