A fork of beam-cloud/beta9 for Agentosaurus
Enable external worker support with a new Go-based agent and unified inference routing, so you can run hybrid GPU workloads over SSH/Tailscale with clear docs and a multi-arch CI build pipeline.
Work in Progress - This fork is under active development as the compute infrastructure layer for Agentosaurus, a platform focused on democratizing GPU access for climate research and AI workloads within the European Union.
Build a distributed GPU compute platform that:
- Brings Your Own GPU: Connect any machine (cloud VMs, workstations, Mac Studios) to a unified compute pool
- Supports Heterogeneous Hardware: NVIDIA CUDA, Apple MPS, AMD ROCm (planned), Intel Arc (planned)
- Enables Serverless by Default: Scale to zero, pay only for compute time used
- Maintains EU Data Sovereignty: All data and compute remain within European infrastructure
- Prioritizes Carbon Efficiency: Verified renewable energy usage with transparent carbon reporting
TAILSCALE MESH VPN
(Encrypted overlay network - 100.x.x.x addressing)
┌─────────────────────────────────────────────────────────────┐
│ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ GATEWAY │ │ WORKER 1 │ │ WORKER 2 │
│ (OCI Cloud) │ │ (Mac MPS) │ │ (NVIDIA GPU) │
│ │ │ │ │ │
│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │
│ │ Gateway │◄─┼─────┼──│ b9agent │ │ │ │ b9agent │ │
│ │ :1993/94 │ │ │ │ :9999 │ │ │ │ :9999 │ │
│ └───────────┘ │ │ └─────┬─────┘ │ │ └─────┬─────┘ │
│ │ │ │ │ │ │ │ │
│ ▼ │ │ ▼ │ │ ▼ │
│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │
│ │ k3s API │◄─┼─────┼──│ Ollama │ │ │ │ vLLM │ │
│ │ Scheduler │ │ │ │ :11434 │ │ │ │ :8000 │ │
│ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Control Plane MPS Inference CUDA Inference
- New Go-based Agent (`b9agent`): Includes TUI, persistent config, control API (start/stop/pull models), keepalive, and job monitoring.
- Unified Inference Routing: Gateway inference router, model registry, and OpenAI-compatible endpoints; enabled Tailscale and hostNetwork for external connectivity (see the curl sketch after this list).
- External Worker API: Expanded machine API with register/keepalive and TTL-based lifecycle; added detailed API/docs for external workers and self-hosting.
- Python SDK: Added `beta9.inference` module (chat/generate/embed) and test scripts.
- Flexible Configuration: External worker config supports direct Redis host, external image registries/ports, CoreDNS override, and Podman-friendly k3d settings.
- Multi-Arch CI: New GitHub Action to build/push multi-arch images to `registry.agentosaurus.com` with improved sequencing and reliability.
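Because the workers expose OpenAI-compatible endpoints, a quick smoke test is to query a worker's Ollama server directly over the tailnet. This is a minimal sketch, assuming the Mac worker from the examples below (100.100.74.117, Ollama on :11434) and a model that has already been pulled; whether the Gateway's inference router mirrors the same /v1 paths is an assumption.

# Minimal sketch: query a worker's OpenAI-compatible endpoint over the tailnet.
# Assumes the Mac worker from the diagram (Ollama on 100.100.74.117:11434)
# and a model that has already been pulled (gemma3:1b).
curl -s http://100.100.74.117:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:1b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'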
- Registry Credentials: Create a `registry-credentials` secret and set `ExternalImageRegistry` / runner registry in config.
- Networking: Configure `TAILSCALE_AUTHKEY` or a direct `REDIS_HOST`; expose NodePorts for Redis/S3/Registry; deploy the Gateway with `hostNetwork: true`.
- Agent Setup: Initialize and run `b9agent`, then set up an SSH tunnel (forward 1994; reverse 6443 if needed); see the sketch after this list.
- K8s Config: Apply CoreDNS and metrics-server manifests; update the k3d config for Podman compatibility.
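The registry and tunnel steps above can look roughly like the following. This is a minimal sketch with placeholder credentials, user, and namespace; the exact registry and tunnel topology depend on your deployment.

# 1. Registry credentials for pulling worker images (placeholders shown)
kubectl create secret docker-registry registry-credentials \
  --docker-server=registry.agentosaurus.com \
  --docker-username=<username> \
  --docker-password=<password>

# 2. SSH tunnel from the worker machine: forward the Gateway API (1994) to
#    localhost; reverse-forward 6443 only if the Gateway must reach this
#    machine's k3s API.
ssh -N \
  -L 1994:127.0.0.1:1994 \
  -R 6443:127.0.0.1:6443 \
  user@<GATEWAY_TAILSCALE_IP>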
Real-time terminal interface showing:
╔══ Beta9 Agent: 1c1b50c8 ═══════════════════════════════════════════════╗
║ Status: READY │ Gateway: http://100.72.101.23:1994 │ Pool: external ║
║ CPU: 28.1% │ Memory: 78.1% │ GPUs: 0 │ Last Heartbeat: 25s ago ║
╠════════════════════════════════════════════════════════════════════════╣
║ WORKER PODS ║
╟────────────────────────────────────────────────────────────────────────╢
║ No jobs yet ║
╠════════════════════════════════════════════════════════════════════════╣
║ INFERENCE ║
╟────────────────────────────────────────────────────────────────────────╢
║ Status: running │ Endpoint: 100.100.74.117:11434 ║
║ Models: gemma3:1b ║
╠════════════════════════════════════════════════════════════════════════╣
║ LOGS ║
╟────────────────────────────────────────────────────────────────────────╢
║ 10:15:23 Control API listening on :9999 ║
║ 10:15:24 Inference: starting Ollama... ║
║ 10:15:26 Inference: ready on :11434 ║
╚════════════════════════════════════════════════════════════════════════╝
Press Ctrl+C to quit
- Go 1.21+ (for building the agent)
- Tailscale account and network
- Ollama (for Mac inference)
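The prerequisites above can be installed on a Mac worker roughly as follows; this is a minimal sketch assuming Homebrew (the Tailscale macOS app works just as well, and the model name is only an example).

# Go toolchain and the Ollama inference server via Homebrew
brew install go ollama
brew services start ollama      # or run `ollama serve` in another terminal
ollama pull gemma3:1b           # pre-pull a small model used in the examples
# Install Tailscale (app or `brew install tailscale`), then join the tailnet
tailscale up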
cd backend/beta9
go build ./cmd/b9agent/...

./b9agent init \
  --gateway <GATEWAY_TAILSCALE_IP>:1994 \
  --token <MACHINE_TOKEN> \
  --pool external

./b9agent

# Test inference pipeline
TEST_MODEL=llama3.2 ./backend/remote_servers/scripts/dgpu/test_inference.sh

Agent configuration is stored in `~/.b9agent/config.yaml`:
gateway:
  host: "100.72.101.23"
  port: 1994
machine:
  id: "1c1b50c8"
  token: "<machine-token>"
  hostname: "100.100.74.117"
  pool: "external"
k3s:
  token: "<k3s-bearer-token>"

The inference module provides a lightweight client for inference endpoints:
from beta9 import inference
# Configure endpoint
inference.configure(host="100.100.74.117", port=11434)
# Chat completion
result = inference.chat(
model="llama3.2",
messages=[{"role": "user", "content": "Hello!"}]
)
print(result.content)
# Text generation
result = inference.generate(
model="llama3.2",
prompt="Once upon a time"
)
# Embeddings
embedding = inference.embed(
model="nomic-embed-text",
input="Hello world"
)
# List models
models = inference.list_models()

Run the inference test suite:
# Default model (llama3.2)
./backend/remote_servers/scripts/dgpu/test_inference.sh
# Custom model
TEST_MODEL=gemma3:1b ./backend/remote_servers/scripts/dgpu/test_inference.sh
# Custom host
BETA9_INFERENCE_HOST=100.100.74.117 ./backend/remote_servers/scripts/dgpu/test_inference.sh

Test output:
[0/6] Sending start-inference command to agent... ✓
[1/6] Testing health endpoint... ✓
[2/6] Checking model availability... ✓
[3/6] Testing chat via curl... ✓
[4/6] Testing Python SDK... ✓
[5/6] Testing latency (3 requests)... ✓
[6/6] Stopping inference server... ✓
- External worker registration and keepalive
- Apple Silicon (MPS) inference via Ollama
- Control API for inference lifecycle
- TUI dashboard with inference status and logs
- Python SDK for inference
- vLLM integration for NVIDIA GPUs
- SGLang integration for structured outputs
- Model routing based on hardware capabilities
- Automatic model format conversion (GGUF/Safetensors)
- Prometheus metrics export
- Carbon footprint tracking
- Rate limiting and quotas
- Multi-tenant isolation
- EU AI Act compliance reporting
- Model provenance tracking
- Audit logging
- SSO integration
beta9/
├── cmd/
│ └── b9agent/ # Go agent binary
│ └── main.go
├── pkg/
│ └── agent/
│ ├── agent.go # Agent lifecycle management
│ ├── control.go # HTTP control API
│ ├── inference.go # OllamaManager and inference types
│ ├── state.go # Agent state for TUI
│ └── tui.go # Terminal UI rendering
├── sdk/
│ └── src/
│ └── beta9/
│ └── inference.py # Python inference SDK
└── docs/
└── external-workers/ # External worker documentation
- Agentosaurus - Organization discovery platform and parent project
- FlowState - AI presentation system using this compute layer
- beam-cloud/beta9 - Upstream project
This fork maintains the same AGPL-3.0 license as the upstream beta9 project.
Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.
- Beam Cloud for the original beta9 project
- Ollama for the inference server
- Tailscale for the mesh VPN infrastructure
Beta9 uses Tailscale for secure mesh networking between components.
- Network Isolation: All traffic between Gateway, Workers, and Clients travels over an encrypted WireGuard mesh.
- Endpoint Security:
  - Internal management endpoints (inference control, keepalives) are bound to `0.0.0.0` but are effectively protected because the nodes are only reachable via the private Tailscale network.
  - The Gateway uses `hostNetwork: true` to expose these services directly to the mesh.
  - Note: Do not expose the Gateway's ports (1993, 1994) to the public internet. Access should only be possible via the Tailscale mesh or a secure ingress.
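One way to sanity-check this posture from a client machine: the Gateway ports should answer on the Tailscale address and nowhere else. A minimal sketch, using the example mesh addresses above; the public address is a placeholder.

# The Gateway should be reachable only over the tailnet (100.x.x.x)
tailscale status                      # both machines should appear as peers
nc -zv 100.72.101.23 1994             # expect: connection succeeded
nc -zv <GATEWAY_PUBLIC_IP> 1994       # expect: timeout or connection refused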
The following security enhancements are planned for the upcoming Security Epic but are currently mitigated by the network architecture:
- Inference Endpoint Auth (`beta9-b1j`): Direct inference endpoints (ports 11434/8000) currently do not require per-request authentication.
  - Mitigation: These ports are only accessible within the encrypted Tailscale mesh. Workers are isolated from the public internet.
- RBAC Scoping (`beta9-7aw`): The k3d manifest uses broad ClusterRole permissions for simplicity during beta.
  - Mitigation: The cluster is intended for single-tenant use. Scoped RBAC profiles will be introduced in Phase 3.
- Token Binding (`beta9-bx0`): Keepalive tokens are not strictly bound to machine IDs in the current implementation.
  - Mitigation: Tokens are secret and transmitted over encrypted channels. Token binding will be enforced in the next auth refactor.