Warm a box, sync the diff, run the suite.
Crabbox is a remote software testing and execution control plane for maintainers and AI agents. Lease fast managed cloud capacity, point at an existing SSH host, or use an agent sandbox provider — then sync your dirty checkout, run commands remotely, stream output, collect evidence, and release. Local edit-save-run loop, cloud-grade compute, agent-ready observability.
```sh
crabbox run -- pnpm test
```

Behind that one command: a Go CLI on your laptop, a Cloudflare Worker broker that owns provider credentials and lease state, and a managed or delegated runner.
```text
  your laptop               Cloudflare Worker                 cloud provider
 -------------             ------------------                 --------------
 crabbox CLI  -- HTTPS --> Fleet Durable Object --> Hetzner / AWS / Azure / GCP
      |                     lease + cost state                       |
      |                                                              |
      +------------- SSH + rsync to leased runner <------------------+
```
- CLI: Go binary. Loads config, mints a per-lease SSH key, asks the broker for a lease, waits for SSH, seeds remote Git, rsyncs the dirty checkout (with a fingerprint skip when nothing changed), runs the command, streams output, releases.
- Broker: Cloudflare Worker plus a single Fleet Durable Object. Owns provider credentials, serializes lease state, enforces active-lease and monthly spend caps, and expires stale leases by alarm. Auth is GitHub browser login or a shared bearer token.
- Runner: a throwaway machine reachable over SSH on the primary port (default 2222) plus configured fallback ports, prepared with Crabbox's sync/run prerequisites. Linux uses Ubuntu with cloud-init and `/work/crabbox`; native Windows uses OpenSSH, Git for Windows, and `C:\crabbox`. No broker credentials live on the box. Project runtimes (Go, Node, Docker, services, secrets) come from your repo's GitHub Actions hydration, devcontainer, Nix, mise/asdf, or setup scripts, not from Crabbox.
The data plane — SSH, rsync, command execution — always runs directly from the CLI to the runner. The broker only manages leases, cost, and observability.
Only aws, azure, gcp, and hetzner can be brokered through the Worker,
and even those run direct from the CLI when no broker URL is configured. Every
other provider always runs direct. A direct-provider mode
(--provider hetzner|aws|azure|gcp|proxmox with local credentials) exists for
debugging the broker itself or using private infrastructure.
For the full mental model, see How Crabbox Works. For the doc-to-code map, see Source Map.
```sh
brew install openclaw/tap/crabbox
crabbox --version
```

No Homebrew? Grab a GoReleaser archive for macOS, Linux, or Windows.
Laptop prerequisites: git, ssh, ssh-keygen, rsync, curl.
Broker access is deployment-specific. Use a coordinator URL from your team, use direct-provider mode for a personal cloud account, or self-host the Worker broker with your own provider credentials and spend caps. See Getting started and Infrastructure for the setup paths.
```sh
# log in once per machine (stores a broker token in user config)
crabbox login --url https://broker.example.com

# verify local prerequisites and broker reachability
crabbox doctor

# one-shot: lease, sync, run, release
crabbox run -- pnpm test

# named repo workflow from .crabbox.yaml
crabbox job run full-ci

# or warm a box once, then reuse it
crabbox warmup                       # prints cbx_... + a slug
crabbox run --id blue-lobster -- pnpm test:changed
crabbox ssh --id blue-lobster
crabbox stop blue-lobster
```

Every lease has a stable `cbx_...` ID and a friendly crustacean slug
(`blue-lobster`, `swift-hermit`, …). Either works wherever an `--id` is
accepted. Use `--slug <name>` on fresh leases when a specific reusable slug
helps, and `--label <text>` on `run` when the history entry needs a
human-readable name.
Coordinator: brokered providers can run through the Worker (or direct when no
broker is configured); every other provider always runs direct from the CLI.
Targets: L = Linux, M = macOS, W = Windows.

| Provider | provider: (aliases) | Targets | Coordinator | Notes |
|---|---|---|---|---|
| AWS EC2 | aws | L / M / W | brokered | EC2 instances and EC2 Mac; native AMI/EBS checkpoints. |
| Azure | azure | L / W | brokered | VMs with Tailscale support; native Windows and WSL2. |
| Google Cloud | gcp (google, google-cloud) | L | brokered | Linux Compute Engine VMs with Tailscale support. |
| Hetzner Cloud | hetzner | L | brokered | Linux VMs with desktop/browser/code and Tailscale. |
| Parallels | parallels | L / M / W | direct | Local or remote macOS host; checkpoint/fork/restore/snapshot. |
| Proxmox | proxmox | L | direct | Clone Linux QEMU templates on a private Proxmox VE cluster. |
| Static SSH | ssh (static, static-ssh) | L / M / W | direct | Existing machines; no provisioning. |
| Local Container | local-container (docker, container, local-docker) | L | direct | Local Docker-compatible runtime (Docker Desktop, OrbStack, Colima). |
| exe.dev | exe-dev (exe, exedev) | L | direct | exe.dev VMs exposed as public SSH leases. |
| Namespace Devbox | namespace-devbox (namespace, namespace-devboxes) | L | direct | Namespace.so Devboxes over SSH. |
| Semaphore | semaphore (sem) | L | direct | A Semaphore CI job leased as a testbox. |
| Sprites | sprites | L | direct | Sprites microVMs through sprite proxy. |
| Daytona | daytona | L | direct | Daytona-managed dev sandbox over SSH. |
| RunPod | runpod (run-pod, runpodio) | L | direct | RunPod GPU pods with public SSH. |

| Provider | provider: (aliases) | Targets | Notes |
|---|---|---|---|
| Cloudflare | cloudflare (cf) | L | Cloudflare Containers via the Worker runtime. |
| E2B | e2b | L | E2B Firecracker sandbox. |
| Islo | islo | L | Islo sandbox. |
| Modal | modal | L | Modal Sandbox through the local Python client. |
| Railway | railway (rail, railwayapp) | L | Redeploy and stream an existing Railway service. |
| Tensorlake | tensorlake (tl, tensorlake-sbx) | L | Tensorlake Firecracker sandbox via the Tensorlake CLI. |
| Upstash Box | upstash-box (upstash, box, upstashbox) | L | Upstash Box through the Box REST API. |
| Azure Dynamic Sessions | azure-dynamic-sessions | L | Azure Container Apps dynamic sessions. |
| Blacksmith Testbox | blacksmith-testbox (blacksmith) | L | Delegated Blacksmith CI Testbox lifecycle and execution. |
| W&B Sandboxes | wandb (weights-and-biases) | L | Weights & Biases Sandboxes; reuses wandb login credentials. |

See Providers for the full reference, capabilities, and authoring guide.
- One-shot or warm workspaces. `crabbox run` for fire-and-forget; `crabbox warmup` + `--id` for repeated runs against the same box. See warmup and run.
- Named repo jobs. `crabbox job run <name>` lets repos define warmup, optional Actions hydration, run command, and cleanup policy in `.crabbox.yaml`. See Jobs.
- Local-first workspace sync. No clean-checkout requirement. Tracked and nonignored files only, fingerprint skip on no-op runs, sanity checks against suspicious mass deletions, optional shallow base-ref hydration for changed-test workflows. See Sync.
- Run observability. Every coordinator-backed run gets an early `run_...` handle. Use `crabbox attach <run-id>` while it is active, `crabbox events <run-id>` for durable lifecycle/output events, and `crabbox logs <run-id>` for retained output after completion. See History and logs and Observability.
- GitHub Actions hydration. `crabbox actions hydrate` runs supported setup steps from the repo's workflow locally over SSH, so leased boxes get the same runtimes and tooling without GitHub write access. Use `--github-runner` only when setup needs full Actions semantics such as repository secrets, OIDC, service containers, or unsupported `uses:` steps. See Actions hydration.
- Failure capsules. `crabbox capsule from-actions <run-url>` captures a failing CI run into a portable, replayable bundle; `capsule replay` reruns it. See Capsules.
- Checkpoints. Save VM-or-workspace state and `restore`/`fork` from it, via workspace archives or provider-native snapshots/images. See Checkpoints.
- Pond peer groups. Leases that share a `--pond <name>` label form an emergent peer group with discovery (`pond peers`), an SSH-mesh of `ssh -L` forwards to members' `--expose` ports (`pond connect`), and bulk `pond release`. See Pond.
- Brokered cloud with cost guardrails. Maintainers and agents share infra without sharing provider tokens. Hetzner, AWS, Azure, and Google Cloud are the managed providers; per-lease and monthly spend caps reject over-budget leases. Providers fall back across compatible instance families when capacity or quota rejects a request. `crabbox usage` summarizes spend by user, org, provider, and type. See Coordinator, Capacity fallback, and Cost and usage.
- Interactive desktop, browser, and code leases. `--browser` provisions Chrome/Chromium for headless automation, `--desktop` provisions a visible UI with tunnel-only VNC takeover, and `--code` provisions code-server on managed Linux. `crabbox desktop click/paste/type/key` provide first-class input helpers; `desktop proof` captures metadata, screenshot, diagnostics, MP4, and a contact-sheet PNG in one publishable bundle. See Interactive desktop and VNC.
- Authenticated web portal. Browser login opens owner-scoped and shared lease/run views with run logs/events, WebVNC, code-server, and telemetry charts. `crabbox webvnc` / `crabbox code` bridge a lease into the portal; `crabbox share` grants a lease to a user or the owning org. See Portal.
- Agent workspace evidence. History, logs, events, telemetry, JUnit summaries, screenshots, recordings, artifacts, and PR publishing make autonomous work reviewable instead of only ephemeral terminal output. See Artifacts and Telemetry.
- Stable timing records. `--timing-json` on `run`, `warmup`, and `actions hydrate` gives scripts one machine-readable sync/command/total timing schema across providers.
- Hardened coordinator auth. GitHub browser login, owner-scoped leases, admin-only routes, optional GitHub team allowlists, Cloudflare Access JWT verification, and service-token support keep normal use and operator automation separate. See Auth and admin and Security.
- OpenClaw plugin. The repo root is a native OpenClaw plugin for box lifecycle operations. See the OpenClaw plugin section below and the OpenClaw plugin docs.
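
The "fingerprint skip on no-op runs" in the sync bullet can be pictured with a small sketch. The README does not say how Crabbox computes its fingerprint, so the SHA-256 scheme and the `fingerprint` / `needsSync` names below are assumptions for illustration only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// fingerprint hashes file paths and contents in sorted path order, so the
// result does not depend on map iteration order. Assumed scheme, not Crabbox's.
func fingerprint(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		sum := sha256.Sum256(files[p])
		// Length-prefix the path so adjacent fields cannot run together.
		fmt.Fprintf(h, "%d:%s:%x\n", len(p), p, sum)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsSync reports whether the checkout differs from the last synced fingerprint.
func needsSync(lastSynced string, files map[string][]byte) bool {
	return fingerprint(files) != lastSynced
}

func main() {
	files := map[string][]byte{
		"main.go": []byte("package main\n"),
		"go.mod":  []byte("module demo\n"),
	}
	last := fingerprint(files)
	fmt.Println(needsSync(last, files)) // false: nothing changed, the sync can be skipped

	files["main.go"] = []byte("package main // edited\n")
	fmt.Println(needsSync(last, files)) // true: content changed, sync again
}
```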
beast is the default for providers that expose class-based managed capacity.
The providers below fall back across ordered instance-type lists unless --type
pins a specific provider-native size.
```text
Hetzner         standard  ccx33, cpx62, cx53
                fast      ccx43, cpx62, cx53
                large     ccx53, ccx43, cpx62, cx53
                beast     ccx63, ccx53, ccx43, cpx62, cx53
AWS Linux       standard  c7a/c7i/m7a/m7i.8xlarge family
                fast      …16xlarge family
                large     …24xlarge family
                beast     …48xlarge family, falling back to 32x/24x/16x
                arm64     c7g/m7g/r7g families with --arch arm64
AWS Win         standard  m7i.large, m7a.large, t3.large
                fast      m7i.xlarge, m7a.xlarge, t3.xlarge
                large     m7i.2xlarge, m7a.2xlarge, t3.2xlarge
                beast     m7i.4xlarge, m7a.4xlarge, m7i.2xlarge
AWS WSL2        standard  m8i.large, m8i-flex.large, c8i.large, r8i.large
                fast      m8i.xlarge, m8i-flex.xlarge, c8i.xlarge, r8i.xlarge
                large     m8i.2xlarge, m8i-flex.2xlarge, c8i.2xlarge, r8i.2xlarge
                beast     m8i.4xlarge, m8i-flex.4xlarge, c8i.4xlarge, r8i.4xlarge, m8i.2xlarge
AWS macOS       all       mac2.metal, then mac1.metal unless --type is set
Azure           standard  Standard_D32ads_v6, Standard_D32ds_v6, Standard_F32s_v2, then 16-vCPU fallbacks
                fast      Standard_D64ads_v6, Standard_D64ds_v6, Standard_F64s_v2, then 48/32-vCPU fallbacks
                large     Standard_D96ads_v6, Standard_D96ds_v6, then 64/48-vCPU fallbacks
                beast     Standard_D192ds_v6, Standard_D128ds_v6, then 96/64-vCPU fallbacks
                arm64     Standard_D*ps_v6 / D*pds_v6 Cobalt families with --arch arm64
Azure Win/WSL2  standard  Standard_D2ads_v6, Standard_D2ds_v6, Standard_D2ads_v5, Standard_D2ds_v5, Standard_D2as_v6
                fast      Standard_D4ads_v6, Standard_D4ds_v6, Standard_D4ads_v5, Standard_D4ds_v5, Standard_D4as_v6
                large     Standard_D8ads_v6, Standard_D8ds_v6, Standard_D8ads_v5, Standard_D8ds_v5, Standard_D8as_v6
                beast     Standard_D16ads_v6, Standard_D16ds_v6, Standard_D16ads_v5, Standard_D16ds_v5, Standard_D8ads_v6
Namespace       standard  S
                fast      M
                large     L
                beast     XL
Cloudflare      standard  standard-4
                fast      standard-4
                large     standard-4
                beast     standard-4
```

Override with --type or CRABBOX_SERVER_TYPE for a specific instance. Use
--arch arm64 / architecture: arm64 for Linux ARM capacity on Azure or AWS;
explicit ARM provider types also select ARM images when no custom image is set.
Cloudflare also accepts lite, basic, standard-1, standard-2, and
standard-3 as smaller explicit --type values; standard-4 is the default.
Providers without a row either use provider-native capacity settings or reject
class/type selection.
Config resolves in order: flags → env → repo .crabbox.yaml → user
~/.config/crabbox/config.yaml → defaults.

```yaml
broker:
  url: https://broker.example.com
  provider: aws
  token: ...
class: beast
capacity:
  market: spot
  strategy: most-available
  fallback: on-demand-after-120s
  hints: true
aws:
  region: eu-west-1
  rootGB: 400
lease:
  idleTimeout: 30m
  ttl: 90m
ssh:
  key: ~/.ssh/id_ed25519
  user: crabbox
  port: "2222"
  # Ordered fallback ports tried after ssh.port; use [] to disable fallback.
  fallbackPorts:
    - "22"
```

Forwarded environment is intentionally narrow: NODE_OPTIONS and CI. Do not
pass secrets as command-line arguments. For live-secret smoke tests, use
crabbox run --env-from-profile <file> --allow-env NAME so Crabbox forwards
only selected names and prints redacted presence/length metadata. For stale warm
boxes, --full-resync (alias --fresh-sync) resets the remote workdir before
syncing. For larger commands, use --script <file> or --script-stdin so the
remote runner executes an uploaded file instead of a giant quoted shell string.
For binary or terminal-hostile output, use crabbox run --capture-stdout <path>
or --capture-stderr <path>. Add --preflight for a remote capability
snapshot, --keep-on-failure to SSH into the exact failed one-shot lease, or
--download remote=local to copy a successful-run artifact back. Failed
SSH-backed and Blacksmith delegated runs save local .crabbox/captures/*.tar.gz
bundles by default. Captured files are not redacted by Crabbox.
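
The resolution order stated at the top of this section (flags → env → repo config → user config → defaults) amounts to "first non-empty layer wins". A minimal sketch, assuming string-valued settings; `resolve` and the sample values are hypothetical, and which keys each layer actually exposes is not shown here.

```go
package main

import "fmt"

// resolve returns the first non-empty value, given layers ordered from highest
// to lowest precedence: flags, env, repo config, user config, defaults.
func resolve(layers ...string) string {
	for _, v := range layers {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	fromFlag := ""      // nothing passed on the command line
	fromEnv := ""       // no environment override set
	fromRepo := "fast"  // value in the repo's .crabbox.yaml
	fromUser := "large" // value in ~/.config/crabbox/config.yaml
	fallback := "beast" // built-in default

	fmt.Println(resolve(fromFlag, fromEnv, fromRepo, fromUser, fallback)) // fast
}
```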
Optional Tailscale reachability for managed Linux leases:

```yaml
tailscale:
  enabled: true
  network: auto
  tags:
    - tag:crabbox
  hostnameTemplate: crabbox-{slug}
  authKeyEnv: CRABBOX_TAILSCALE_AUTH_KEY
  exitNode: mac-studio.example.ts.net
  exitNodeAllowLanAccess: true
```

Tailscale is a network plane, not a provider. --tailscale joins new managed
Linux leases to the tailnet; --network auto|tailscale|public chooses how SSH
and VNC tunnel commands resolve the host. Brokered mode uses Worker OAuth
secrets to mint one-off keys; direct-provider mode reads the auth key from the
configured env var. See Tailscale.
A few provider-specific config snippets:

```yaml
# Static macOS or Windows target (existing machine, no provisioning)
provider: ssh
target: windows
windows:
  mode: normal # or wsl2
static:
  host: win-dev.local
  user: alice
  port: "22"
  workRoot: C:\crabbox
```

```yaml
# Local container (alias: docker; works with OrbStack as the active context)
provider: local-container
localContainer:
  runtime: docker
  image: debian:bookworm
  workRoot: /work/crabbox
```

```yaml
# Delegated Blacksmith CI Testbox
provider: blacksmith-testbox
blacksmith:
  org: example-org
  workflow: .github/workflows/ci-check-testbox.yml
  job: test
  ref: main
  idleTimeout: 90m
```

Keep provider tokens in environment variables, not repo config (for example
CRABBOX_SEMAPHORE_TOKEN, CRABBOX_SPRITES_TOKEN, RUNPOD_API_KEY,
E2B_API_KEY, DAYTONA_API_KEY). The full env-var reference, per-provider
sections, and per-command flags are in docs/cli.md,
Configuration, and the
provider docs.
The repo root is a native OpenClaw plugin package. Once installed, it exposes Crabbox as agent tools:
crabbox_run,crabbox_warmup,crabbox_status,crabbox_list,crabbox_stop
The plugin shells out to the configured crabbox binary with argv arrays, so
local config, broker login, repo claims, and sync behavior stay owned by the
CLI. Set plugins.entries.crabbox.config.binary if crabbox is not on PATH.
Durable run inspection is intentionally CLI/skill-led instead of additional
plugin tools: use crabbox history, crabbox events --after --limit,
crabbox attach, crabbox logs, crabbox results, and crabbox usage from a
shell-capable agent. See OpenClaw plugin.

```sh
# Go CLI
go build -trimpath -o bin/crabbox ./cmd/crabbox
go vet ./...
go test -race ./...

# Cloudflare Worker (Node 22+ locally; CI runs Node 24)
npm ci --prefix worker
npm test --prefix worker
npm run build --prefix worker

# Docs
npm run docs:check

# Optional live smoke, when broker/provider credentials are available
CRABBOX_LIVE=1 CRABBOX_LIVE_REPO=/path/to/my-app scripts/live-smoke.sh
```

CI runs the full gate (gofmt, vet, race tests, all Go modules, coverage
threshold, docs link/build check, GoReleaser snapshot, and Worker
lint/typecheck/tests/build) on every push and PR. Tagged pushes matching v*
publish Go archives via GoReleaser and bump the Homebrew formula at
openclaw/homebrew-tap.
Worker deployment, required secrets, and DNS routing live in docs/infrastructure.md.
- Get the model: How Crabbox Works, Architecture, Concepts, Orchestrator
- Use the CLI: CLI, Commands, Features, Configuration
- Choose a provider: Providers, AWS, Azure, GCP, Hetzner
- Advanced features: Actions hydration, Capsules, Checkpoints, Jobs, Pond
- Interactive QA: Interactive Desktop and VNC, Artifacts, Portal
- Operate it: Operations, Observability, Troubleshooting, Performance
- Set it up or audit it: Infrastructure, Security, Getting Started, Source Map
- Changes: CHANGELOG.md
The GitHub Pages site at https://openclaw.github.io/crabbox/ is generated from
the docs/ Markdown:

```sh
npm run docs:check
open dist/docs-site/index.html
```

MIT. See LICENSE.
