Ambyli/AI-Workbench

AI Workbench

A local AI development workbench — real-time token usage monitoring for claude-code, multi-model serving, and MCP tool calling, all on one machine.

Components

| Component | Description | Docs |
| --- | --- | --- |
| Usage Widget | Python tray app: daily/weekly token totals, per-project breakdown, rolling averages, claude.ai account stats via CDP, local LLM toggle | widget/USAGE_WIDGET.md |
| AI Infrastructure | Main compose (LiteLLM + Unsloth + vLLM), multi-model serving, GPU configuration | ai/AI_INFRA.md |

Configuration

All settings live in config.json (project root) and .env. The widget reads config.json at startup; changes made in the Settings window (tray right-click → Settings…) take effect immediately.

See the Usage Widget docs for the full key reference.

Quick start

make setup

Docker Compose Commands

Run make help to list all available targets, including any service stacks currently registered in the Makefile.

Network

make network         # Create shared Docker network (ai_shared)

All containers share an ai_shared Docker network so they can resolve each other by container name. make setup creates it automatically.

Main stack

make up              # Start all services
make down            # Stop all services
make clean           # Stop and remove containers + volumes
make very-clean      # Stop, remove containers, volumes, and images
make logs            # Follow logs
make build           # Build images

Adding or Removing a Service

The Makefile uses a macro to generate up-*, down-*, clean-*, very-clean-*, logs-*, and build-* targets for each service stack. Adding a service is a two-step change; removing one is a single deletion.

Add a service

  1. Create ai/docker-compose.<name>.yml with the service definition.
  2. Add one line to the Makefile with the stack name and the space-separated list of Docker Compose service names to target:
$(eval $(call service,<name>,<service1> <service2> ...))

Example — adding a Whisper stack that runs two containers:

$(eval $(call service,whisper,whisper-api whisper-worker))

This immediately makes make up-whisper, make down-whisper, make clean-whisper, make very-clean-whisper, make logs-whisper, and make build-whisper available. The compose file must be named ai/docker-compose.whisper.yml.
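As a quick way to see what one service line implies, the naming convention can be sketched in Python. This is a hypothetical helper (`stack_targets` and `ACTIONS` are names invented for illustration); it mirrors the convention described above and is not the Makefile's actual macro.

```python
# Hypothetical helper: mirrors the naming convention described in this README.
# It does not read or execute the Makefile.
ACTIONS = ["up", "down", "clean", "very-clean", "logs", "build"]


def stack_targets(name: str, services: str) -> dict:
    """Return the compose file, service list, and make targets for one stack."""
    return {
        "compose_file": f"ai/docker-compose.{name}.yml",
        "services": services.split(),
        "targets": [f"{action}-{name}" for action in ACTIONS],
    }


info = stack_targets("whisper", "whisper-api whisper-worker")
print(info["compose_file"])
print(" ".join(info["targets"]))
```

Running it for the Whisper example lists the six targets and the compose filename the stack name has to match.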

Remove a service

Delete the corresponding $(eval $(call service,...)) line from the Makefile. That's it: all six generated targets disappear with it.

Naming rules

| Constraint | Detail |
| --- | --- |
| Stack name | Must match the suffix of the compose filename: service,foo,... → ai/docker-compose.foo.yml |
| Service list | Space-separated Docker Compose service names (second argument). These are passed directly to docker compose up/stop/rm/build. |
| Multiple services | All listed services are started/stopped together as a group (e.g. vllm-qwen vllm-llama). |
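The naming rules can be checked mechanically. The sketch below is an assumption-laden illustration, not part of the repository: `parse_service_lines` and `missing_compose_files` are hypothetical helpers, the regular expression assumes each service line is written exactly in the `$(eval $(call service,<name>,<services>))` form shown earlier, and the `vllm` stack name in the example is made up.

```python
import re
from pathlib import Path

# Matches: $(eval $(call service,<name>,<service1> <service2> ...))
SERVICE_LINE = re.compile(r"^\$\(eval \$\(call service,([^,]+),([^)]*)\)\)\s*$")


def parse_service_lines(makefile_text: str) -> dict:
    """Map each stack name to its list of Docker Compose service names."""
    stacks = {}
    for line in makefile_text.splitlines():
        match = SERVICE_LINE.match(line.strip())
        if match:
            stacks[match.group(1).strip()] = match.group(2).split()
    return stacks


def missing_compose_files(stacks: dict, root: str = ".") -> list:
    """List the compose files the naming rule expects but that do not exist."""
    return [
        f"ai/docker-compose.{name}.yml"
        for name in stacks
        if not (Path(root) / "ai" / f"docker-compose.{name}.yml").is_file()
    ]


# Example input; the "vllm" stack name is hypothetical.
example = (
    "$(eval $(call service,whisper,whisper-api whisper-worker))\n"
    "$(eval $(call service,vllm,vllm-qwen vllm-llama))\n"
)
stacks = parse_service_lines(example)
print(stacks)
```

Calling `missing_compose_files(stacks)` from the project root would then report any stack whose compose file is absent or misnamed.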

GPU Compatibility

All inference services (Unsloth, vLLM, Kokoro) require a CUDA-capable NVIDIA GPU. The full list of supported GPUs is at https://developer.nvidia.com/cuda/gpus. The Unsloth build defaults to compute capability 8.9 (sm_89; Ada Lovelace, e.g. RTX 4080/4090); adjust Dockerfile.unsloth for other architectures.

Environment Variables

Copy .env.example to .env and fill in the values.

| Variable | Default | Description |
| --- | --- | --- |
| DEBUG_LOGGING | false | Write DEBUG-level logs to the log file |
| INCLUDE_PATHS | (empty) | Comma-separated path prefixes to filter project sessions. Leave blank to include all |
| EXCLUDE_WEEKDAYS | 5,6 | Days excluded from rolling averages (0=Monday, 6=Sunday) |
| CONSOLE_FETCHER_ENABLED | false | Scrape console.anthropic.com for account-level usage |
| CONSOLE_REFRESH_MINUTES | 30 | Minutes between console scraping refreshes |
| CONSOLE_HEADLESS | true | Run Chrome headless for scraping (false keeps window visible) |
| CHROME_PATHS_VAR | (empty) | Alternative Chrome executable path. Leave empty to use defaults |
| LLM_LOG_MAX_LINES | 200 | Max lines kept in the LLM Backend server log |
| LLM_URL | http://localhost:8001 | Base URL for the local LLM server |
| LLM_API_KEY | (empty) | API key sent to the local LLM server |
| LLM_MODEL | (empty) | Model alias passed to Claude Code |
| LLAMA_SERVER_CMD | (empty) | Full shell command to launch llama-server |
| BROWSER_DEBUG_PORT | 9222 | Chrome remote-debugging port for CDP |
| KEEP_LLM_ACTIVE | true | Keep the local LLM server running when switching away |
| HF_TOKEN | (empty) | HuggingFace token for gated model downloads (vLLM, LiteLLM) |
| DEFAULT_LITELLM_MASTER_KEY | (empty) | Master key for the LiteLLM proxy |
| DEFAULT_LITELLM_MODEL_NAME | (empty) | Model alias used in LiteLLM requests |
| DEFAULT_LITELLM_MODEL | (empty) | Full model spec (e.g. openai/unsloth/Qwen3.6-35B-A3B-GGUF) |
| DEFAULT_LITELLM_MODEL_API_KEY | (empty) | API key forwarded to the upstream LLM |
| DEFAULT_LITELLM_MODEL_API_BASE | (empty) | Upstream API base URL (must include /v1 for OpenAI-compatible endpoints) |
| DEFAULT_LITELLM_DATABASE_URL | (empty) | PostgreSQL connection string for LiteLLM |
| DEFAULT_LITELLM_MCP_PHOENIX_URL | (empty) | URL for the Phoenix MCP server |
| DEFAULT_LITELLM_MCP_PHOENIX_AUTH_VALUE | (empty) | Bearer token for Phoenix MCP authentication |
| KOKORO_APP_URL | http://kokoro-app:8085 | URL the Kokoro API proxy uses to reach the inference container. Set to http://localhost:8080 for local dev |
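To make the list-valued and boolean variables concrete, here is a minimal Python sketch of how a few of them could be parsed and how EXCLUDE_WEEKDAYS could feed a rolling average. Everything here is an assumption for illustration: `load_settings`, `parse_int_list`, `parse_prefixes`, and `rolling_average` are hypothetical names, and the widget's real parsing may differ. The weekday numbering (0=Monday, 6=Sunday) happens to match Python's `date.weekday()`.

```python
import os
from datetime import date


def parse_int_list(raw: str) -> set:
    """Parse a comma-separated string such as "5,6" into a set of ints."""
    return {int(part) for part in raw.split(",") if part.strip()}


def parse_prefixes(raw: str) -> list:
    """Parse comma-separated path prefixes; an empty string means no filter."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def load_settings(env=os.environ) -> dict:
    """Read a few variables, falling back to the defaults from the table."""
    return {
        "exclude_weekdays": parse_int_list(env.get("EXCLUDE_WEEKDAYS", "5,6")),
        "include_paths": parse_prefixes(env.get("INCLUDE_PATHS", "")),
        # Assumption: only the literal "true" (any case) enables a flag.
        "debug_logging": env.get("DEBUG_LOGGING", "false").strip().lower() == "true",
        "console_refresh_minutes": int(env.get("CONSOLE_REFRESH_MINUTES", "30")),
    }


def rolling_average(daily_totals: dict, exclude_weekdays: set) -> float:
    """Average daily token totals, skipping days whose weekday is excluded."""
    kept = [
        total
        for day, total in daily_totals.items()
        if day.weekday() not in exclude_weekdays
    ]
    return sum(kept) / len(kept) if kept else 0.0


# Made-up totals: Friday, Saturday, Sunday, Monday.
totals = {
    date(2024, 1, 5): 100,
    date(2024, 1, 6): 900,
    date(2024, 1, 7): 900,
    date(2024, 1, 8): 300,
}
settings = load_settings({})  # empty mapping, so every default applies
print(rolling_average(totals, settings["exclude_weekdays"]))
```

With the default 5,6 the Saturday and Sunday totals are skipped, so only the Friday and Monday values contribute to the average.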

About

Displays usage stats by reading Claude's token counts from the session history in the user's .claude/ folder.
