Multi-tenant NanoClaw on AWS
Create Bots · Connect Channels · Run Claude Agents in Isolated Cloud Environments
| Section | Description |
|---|---|
| 📐 Architecture Doc | Full design details |
| 🚀 Deploy Guide | One-command deploy |
| 💻 Local Dev | Dev setup |
| 📨 Message Flow | End-to-end walkthrough |
| 🔒 Security | Auth & isolation |
| 📦 Packages | Monorepo structure |
| 📋 TODO | Roadmap & backlog |
📚 Architecture Deep-Dive Docs
| Doc | Topic |
|---|---|
| 04 — Layered Architecture | Service layers, channels, providers |
| 05 — Data Model | DynamoDB tables, S3 layout |
| 06–07 — Lifecycles | Bot & session lifecycle |
| 08 — Channel Management | Telegram, Discord, Slack, Feishu |
| 09–10 — Agent Runtime | AgentCore, Claude SDK, MCP tools |
| 11–12 — Security & Observability | ABAC, WAF, CloudWatch |
| 15 — CDK Deployment | 6-stack CDK infrastructure |
| 16 — System Prompt Builder | Agent context construction |
This project evolved from NanoClaw, a single-user local bot framework, into a fully managed, multi-tenant cloud platform. Each user gets their own Bots with independent memory, conversations, and scheduled tasks.
User (Telegram/Discord/Slack)
│
▼ Webhook
ALB ──► ECS Fargate (Control Plane)
├── Webhook Handler → SQS FIFO
├── SQS Consumer → AgentCore Runtime (microVM)
│ └── Claude Agent SDK
│ └── Bedrock Claude
├── Reply Consumer → Channel API → User
└── REST API (JWT auth) ◄── Web Console (React SPA on CloudFront)
Data Layer: DynamoDB (state) │ S3 (sessions, memory) │ Secrets Manager (credentials)
Scheduling: EventBridge Scheduler → SQS → Agent
Auth: Cognito User Pool (JWT)
Security: WAF │ ABAC via STS SessionTags │ Per-tenant S3/DynamoDB isolation
Deployment modes: The diagram above shows the default `agentcore` mode. In `ecs` mode (for AWS China regions), AgentCore microVMs are replaced by dedicated ECS Fargate tasks (one per `botId#groupJid` session, with a warm pool for instant dispatch), Cognito is replaced by a self-hosted OIDC auth service, and Bedrock is replaced by the Anthropic API. See ECS Mode below.
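The substitutions between the two modes can be summarized as a lookup table. This is a minimal sketch: the names `DeployMode`, `ModeComponents`, and `MODE_COMPONENTS` are illustrative and not taken from the repository; only the component choices come from the description above.

```typescript
// Hypothetical names; the values restate the two deployment modes described above.
type DeployMode = "agentcore" | "ecs";

interface ModeComponents {
  agentRuntime: string;
  auth: string;
  modelProvider: string;
}

const MODE_COMPONENTS: Record<DeployMode, ModeComponents> = {
  agentcore: {
    agentRuntime: "AgentCore microVM",
    auth: "Cognito User Pool",
    modelProvider: "Bedrock",
  },
  ecs: {
    agentRuntime: "ECS Fargate dedicated task (one per botId#groupJid)",
    auth: "Self-hosted OIDC auth service",
    modelProvider: "Anthropic API",
  },
};

// Look up which components a given mode deploys.
function componentsFor(mode: DeployMode): ModeComponents {
  return MODE_COMPONENTS[mode];
}
```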
graph TB
Users["👤 User Clients<br/>Feishu / Discord / Slack"]
subgraph Ingress["Ingress Layer"]
CloudFront["CloudFront + S3<br/>Web Console"]
ALB["ALB<br/>/api/* · /webhook/*"]
end
subgraph ECS["ECS Fargate (Control Plane)"]
Fastify["Fastify HTTP Server<br/>REST API · Webhook · Health Check"]
SQSConsumer["SQS Consumer<br/>Inbound FIFO"]
Adapters["Channel Adapters"]
end
subgraph Agent["Agent Execution Layer"]
MicroVM["AgentCore microVM<br/>Claude Agent SDK + Bedrock<br/>Tools · MCP · Skills"]
end
subgraph Data["Data Layer"]
DynamoDB[(DynamoDB)]
S3[(S3)]
Secrets[(Secrets Mgr)]
EventBridge[(EventBridge)]
end
Users -->|HTTPS / Webhook| Ingress
CloudFront --> ALB
ALB --> Fastify
Fastify --> SQSConsumer
SQSConsumer -->|InvokeAgent| MicroVM
MicroVM -->|send_message| SQSConsumer
SQSConsumer --> Adapters
Adapters -->|Reply| Users
MicroVM --> DynamoDB
MicroVM --> S3
Fastify --> Secrets
MicroVM --> EventBridge
style Ingress fill:#E3F2FD,stroke:#1565C0,color:#000
style ECS fill:#FFF3E0,stroke:#E65100,color:#000
style Agent fill:#F3E5F5,stroke:#6A1B9A,color:#000
style Data fill:#FCE4EC,stroke:#C62828,color:#000
| Package | Description |
|---|---|
| shared/ | TypeScript types and utilities (ported from NanoClaw) |
| infra/ | AWS CDK: 6 stacks (Foundation, Auth, Agent, ControlPlane, Frontend, Monitoring) |
| control-plane/ | Fastify HTTP server + SQS consumers (runs on ECS Fargate) |
| agent-runtime/ | Claude Agent SDK wrapper (runs in AgentCore microVMs) |
| web-console/ | React SPA: bot management, channel config, message history, tasks |
| auth-service/ | Self-hosted OIDC auth service (JWT + DynamoDB user store, ECS mode only) |
| Decision | Choice | Why |
|---|---|---|
| Tenant model | One user, many Bots | Per-scenario isolation |
| Channel credentials | BYOK (Bring Your Own Key) | User controls their bots |
| Control plane | ECS Fargate (always-on) | No 15-min Lambda timeout |
| Agent runtime | AgentCore (microVM) / ECS Fargate (China) | Per-session isolation (global) / dedicated task per session with ABAC (China) |
| Agent SDK | Claude Agent SDK + Bedrock / Anthropic API | Configurable via AGENT_MODE |
| Message queue | SQS FIFO | Per-group ordering, cross-group parallelism |
| Database | DynamoDB | Serverless, millisecond latency |
| Auth | Cognito / Self-hosted OIDC | Cognito globally, self-hosted JWT for China |
| IaC | CDK (TypeScript) | Type-safe, same language as app |
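Per-group ordering with cross-group parallelism maps onto the SQS FIFO `MessageGroupId`: messages that share a group ID are delivered in order, while different groups can be processed concurrently. The sketch below builds the parameters for an SQS `SendMessage` call. The payload shape and the choice of `botId#groupJid` as the group ID are assumptions for illustration, not confirmed details of the control plane.

```typescript
// Hypothetical payload shape for an inbound channel message.
interface InboundMessage {
  botId: string;
  groupJid: string;
  messageId: string;
  text: string;
}

// Field names mirror the SQS SendMessage request parameters.
interface FifoSendParams {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId: string;
  MessageDeduplicationId: string;
}

function toFifoParams(queueUrl: string, msg: InboundMessage): FifoSendParams {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(msg),
    // Assumption: one ordering group per bot/chat pair.
    MessageGroupId: `${msg.botId}#${msg.groupJid}`,
    // Assumption: the channel's message ID is unique per bot, so a retried
    // webhook delivery is dropped by FIFO deduplication.
    MessageDeduplicationId: `${msg.botId}:${msg.messageId}`,
  };
}
```

Two messages from the same chat get the same `MessageGroupId` and are therefore consumed in arrival order; messages from different chats do not block each other.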
| NanoClaw (single-user) | ClawBot Cloud (multi-tenant) |
|---|---|
| SQLite | DynamoDB (7 tables) |
| Local filesystem (groups/) | S3 (sessions, CLAUDE.md memory) |
| Docker containers | AgentCore microVMs |
| File-based IPC | MCP tools → AWS SDK (SQS, DynamoDB, EventBridge) |
| Polling loop | SQS FIFO consumer |
| Channel self-registration | Webhook HTTP endpoints |
| Credential proxy | IAM Roles + STS ABAC |
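The "IAM Roles + STS ABAC" row relies on session tags: the caller assumes a role and attaches tags that IAM policies can then match against. The sketch below only builds the parameters for an STS `AssumeRole` call. The tag keys `userId` and `botId` and the session-name format are assumptions; the repository may use different ones.

```typescript
interface SessionTag {
  Key: string;
  Value: string;
}

// Field names mirror the STS AssumeRole request parameters.
interface AssumeRoleParams {
  RoleArn: string;
  RoleSessionName: string;
  DurationSeconds: number;
  Tags: SessionTag[];
}

function scopedAssumeRoleParams(roleArn: string, userId: string, botId: string): AssumeRoleParams {
  // RoleSessionName accepts only [\w+=,.@-] and at most 64 characters.
  const sessionName = `bot-${botId}`.replace(/[^\w+=,.@-]/g, "_").slice(0, 64);
  return {
    RoleArn: roleArn,
    RoleSessionName: sessionName,
    DurationSeconds: 900, // the minimum STS allows
    Tags: [
      { Key: "userId", Value: userId }, // hypothetical tag key
      { Key: "botId", Value: botId },   // hypothetical tag key
    ],
  };
}
```

A role policy can then reference the tag through the `${aws:PrincipalTag/userId}` policy variable, for example to limit S3 access to a prefix that contains the owner's ID.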
- Node.js >= 20
- Docker (for building ARM64 container images)
- AWS CLI configured (`aws configure`)
- AWS CDK bootstrapped (`cd infra && npx cdk bootstrap`)
- `jq` installed (used by deploy script for JSON parsing)
# Full deployment (default stage: dev)
ADMIN_EMAIL=admin@example.com ADMIN_PASSWORD=SecurePass123! ./scripts/deploy.sh
# Deploy to a specific stage
CDK_STAGE=prod AWS_REGION=us-east-1 ADMIN_EMAIL=admin@company.com ADMIN_PASSWORD=Pr0d!Pass ./scripts/deploy.sh
`ADMIN_EMAIL` and `ADMIN_PASSWORD` are required; the script will abort if they are not set.
For deployment in AWS China regions (cn-north-1, cn-northwest-1) where Cognito, Bedrock, and AgentCore are unavailable:
# ECS mode deployment
DEPLOY_MODE=ecs ADMIN_EMAIL=admin@example.com ADMIN_PASSWORD=SecurePass123! ./scripts/deploy.sh
# ECS mode uses:
# - Self-hosted OIDC auth service (replaces Cognito)
# - ECS Fargate dedicated task per session (replaces AgentCore microVMs)
# Each botId#groupJid gets its own Fargate task with warm pool for instant dispatch
# - Anthropic API (replaces Bedrock); requires per-user API keys

| Variable | Required | Default | Description |
|---|---|---|---|
| `DEPLOY_MODE` | No | `agentcore` | Deployment mode: `agentcore` (default) or `ecs` (China regions) |
| `ADMIN_EMAIL` | Yes | none | Email for the initial admin account |
| `ADMIN_PASSWORD` | Yes | none | Password for the initial admin account |
| `CDK_STAGE` | No | `dev` | Deployment stage name |
| `AWS_REGION` | No | `us-west-2` | Target AWS region (use `cn-northwest-1` or `cn-north-1` for China) |
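The defaults and required variables in the table can be expressed as a small validation function. The deploy script itself is a shell script, so this TypeScript sketch is only an illustration of the same rules; `DeployConfig` and `parseDeployConfig` are hypothetical names.

```typescript
type DeployMode = "agentcore" | "ecs";

interface DeployConfig {
  mode: DeployMode;
  adminEmail: string;
  adminPassword: string;
  stage: string;
  region: string;
}

function isDeployMode(value: string): value is DeployMode {
  return value === "agentcore" || value === "ecs";
}

// Applies the defaults from the table and rejects missing required variables.
function parseDeployConfig(env: Record<string, string | undefined>): DeployConfig {
  const adminEmail = env.ADMIN_EMAIL;
  const adminPassword = env.ADMIN_PASSWORD;
  if (!adminEmail || !adminPassword) {
    throw new Error("ADMIN_EMAIL and ADMIN_PASSWORD are required");
  }
  const mode = env.DEPLOY_MODE ?? "agentcore";
  if (!isDeployMode(mode)) {
    throw new Error(`Unknown DEPLOY_MODE: ${mode}`);
  }
  return {
    mode,
    adminEmail,
    adminPassword,
    stage: env.CDK_STAGE ?? "dev",
    region: env.AWS_REGION ?? "us-west-2",
  };
}
```

In a Node.js context this could be called as `parseDeployConfig(process.env)`.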
These parameters are passed via CDK context and control the ECS dedicated task model:
| Parameter | Default | Description |
|---|---|---|
| `minWarmTasks` | 2 | Number of pre-started idle Fargate tasks kept in warm pool for instant dispatch |
| `maxTasks` | 500 | Maximum total ECS agent tasks in the cluster |
| `idleTimeoutMinutes` | 15 | Minutes of inactivity before a dedicated task auto-stops |
The warm pool eliminates cold-start latency (~30-90s) for new sessions. When a message arrives for a new session, the control plane claims a warm task instantly from the pool and replenishes it in the background. If the pool is empty, a cold-start fallback launches a new task. Each session (botId#groupJid) gets its own dedicated Fargate task; tasks self-stop after the idle timeout.
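The claim-then-replenish behavior can be sketched with an in-memory pool. This is a simplified model under stated assumptions: the `WarmPool` class and its methods are hypothetical, task launches are simulated with generated IDs rather than real ECS calls, and replenishment is an explicit method call instead of a background job.

```typescript
interface ClaimResult {
  taskId: string;
  coldStart: boolean;
}

class WarmPool {
  private idle: string[] = [];
  private readonly dedicated = new Map<string, string>(); // sessionKey -> taskId
  private nextId = 1;
  private readonly minWarmTasks: number;
  private readonly maxTasks: number;

  constructor(minWarmTasks: number, maxTasks: number) {
    this.minWarmTasks = minWarmTasks;
    this.maxTasks = maxTasks;
    this.replenish();
  }

  private runningCount(): number {
    return this.idle.length + this.dedicated.size;
  }

  // Simulated launch; a real implementation would start a Fargate task here.
  private launchTask(): string {
    if (this.runningCount() >= this.maxTasks) throw new Error("maxTasks reached");
    return `task-${this.nextId++}`;
  }

  // Tops the pool back up to minWarmTasks without exceeding maxTasks.
  replenish(): void {
    while (this.idle.length < this.minWarmTasks && this.runningCount() < this.maxTasks) {
      this.idle.push(this.launchTask());
    }
  }

  // Returns the session's dedicated task, claiming a warm one or cold-starting if needed.
  claim(sessionKey: string): ClaimResult {
    const existing = this.dedicated.get(sessionKey);
    if (existing !== undefined) return { taskId: existing, coldStart: false };
    const warm = this.idle.shift();
    const taskId = warm ?? this.launchTask();
    this.dedicated.set(sessionKey, taskId);
    return { taskId, coldStart: warm === undefined };
  }

  // Called when a task stops itself after the idle timeout.
  release(sessionKey: string): void {
    this.dedicated.delete(sessionKey);
  }
}
```

With `new WarmPool(2, 3)`, the first two new sessions receive warm tasks, a third falls back to a cold start, and a fourth is rejected until a task is released.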
The DEPLOY_MODE=ecs flag:
- Builds and pushes an additional auth-service Docker image
- Passes `--context mode=ecs` to CDK (creates auth ECS service + agent ECS service instead of Cognito + AgentCore)
- Skips AgentCore registration steps (8, 9, 9b, 10, 11)
- Configures web-console for OIDC auth instead of Cognito
- Seeds admin directly in DynamoDB (bcrypt hash generated via Node.js)
The deploy script runs 17 steps in order:
1. Pre-flight checks (aws, docker, node, jq)
2. `npm install` + build all workspaces
3. ECR login (creates repos if missing)
4. Build & push control-plane Docker image (ARM64)
5. Build & push agent-runtime Docker image (ARM64)
6. CDK deploy all 6 stacks
7. Read stack outputs (Cognito IDs, bucket names, role ARNs, ALB DNS, CDN domain)
8. Register AgentCore runtime (idempotent: skips if already exists)
9. Wait for AgentCore status READY
10. Update ECS task definition with `AGENTCORE_RUNTIME_ARN` env var
11. Force new ECS deployment
12. Build web-console with Cognito config injected via env vars
13. Sync `web-console/dist/` to S3 frontend bucket
14. CloudFront cache invalidation
15. Smoke test (`/health` endpoint)
16. Seed default admin account (idempotent: skips if already exists)
17. Write AgentCore runtime ARN to SSM Parameter Store
Steps 5b, 8-11, 16-17 behave differently in ECS mode. See `DEPLOY_MODE=ecs` above.
Admin account: Since Cognito self-signup is disabled, Step 16 creates the initial admin user. `ADMIN_EMAIL` and `ADMIN_PASSWORD` are required env vars; the script will not start without them.
./scripts/destroy.sh # default stage: dev
CDK_STAGE=prod ./scripts/destroy.sh # specific stage

Teardown runs in reverse order: delete AgentCore runtime (wait for deletion) → CDK destroy all stacks → delete ECR repos.
# Run control plane locally (pointing at deployed AWS resources)
cd control-plane
cp .env.example .env # fill in values from CDK outputs
npm run dev
# Run web console locally
cd web-console
npm run dev # opens http://localhost:5173

cloud_native_nanoclaw/
├── scripts/
│ ├── deploy.sh # One-command full deployment (17 steps)
│ └── destroy.sh # Reverse teardown
├── shared/src/
│ ├── types.ts # User, Bot, Channel, Message, Task, Session...
│ ├── xml-formatter.ts # Agent context formatting (from NanoClaw)
│ └── text-utils.ts # Output processing
├── auth-service/src/
│ ├── server.ts # Fastify auth service (login, refresh, admin)
│ ├── jwt.ts # RS256 signing, JWKS endpoint
│ └── password.ts # bcrypt password hashing
├── infra/
│ ├── bin/app.ts # CDK app entry
│ └── lib/
│ ├── foundation-stack.ts # VPC, S3, DynamoDB, SQS, ECR
│ ├── auth-stack.ts # Cognito
│ ├── agent-stack.ts # IAM Roles (ABAC)
│ ├── control-plane-stack.ts # ALB, ECS Fargate, WAF
│ ├── frontend-stack.ts # CloudFront + S3
│ └── monitoring-stack.ts # CloudWatch, alarms
├── control-plane/src/
│ ├── index.ts # Fastify app + SQS consumer startup
│ ├── webhooks/ # Telegram, Discord, Slack handlers
│ ├── sqs/ # Message dispatcher + reply consumer
│ ├── routes/api/ # REST API (bots, channels, groups, tasks)
│ ├── services/ # DynamoDB, cache, credential lookups
│ └── channels/ # Channel API clients
├── agent-runtime/src/
│ ├── server.ts # HTTP server (/invocations, /ping)
│ ├── agent.ts # Claude Agent SDK integration
│ ├── session.ts # S3 session sync
│ ├── memory.ts # Multi-layer CLAUDE.md
│ ├── scoped-credentials.ts # STS ABAC
│ ├── mcp-tools.ts # send_message, schedule_task, etc.
│ └── mcp-server.ts # MCP stdio server
└── web-console/src/
├── pages/ # Login, Dashboard, BotDetail, ChannelSetup...
├── lib/ # Auth (Cognito), API client
└── components/ # Layout
1. User sends `@Bot hello` in Telegram group
2. Telegram POST → `/webhook/telegram/{bot_id}` (ALB → Fargate)
3. Webhook handler verifies signature, stores message in DynamoDB, enqueues to SQS FIFO
4. SQS consumer dequeues, loads recent messages, invokes AgentCore Runtime (async fire-and-forget)
5. AgentCore returns `accepted` immediately, agent runs in background → Claude Agent SDK `query()`
6. Agent generates response, optionally uses MCP tools (schedule_task, send_message)
7. Final reply sent via SQS reply queue → Reply Consumer stores in DynamoDB, sends via Telegram API
8. User sees reply in Telegram
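The flow above can be modeled with in-memory stand-ins to make the fire-and-forget hand-off visible: dispatch returns `accepted` before the agent has produced anything, and the reply travels back through a separate queue. Everything in this sketch is hypothetical (the `Pipeline` class, its arrays in place of DynamoDB and SQS, and the synchronous agent function); signature verification is omitted.

```typescript
interface ChatMessage {
  botId: string;
  groupJid: string;
  text: string;
}

// Stand-in for the agent: maps recent history to a reply text.
type Agent = (history: ChatMessage[]) => string;

class Pipeline {
  readonly store: ChatMessage[] = [];   // stands in for DynamoDB
  readonly inbound: ChatMessage[] = []; // stands in for the inbound SQS FIFO queue
  readonly replies: ChatMessage[] = []; // stands in for the reply queue
  readonly sent: ChatMessage[] = [];    // messages delivered through the channel API
  private readonly background: Array<() => void> = [];
  private readonly agent: Agent;

  constructor(agent: Agent) {
    this.agent = agent;
  }

  // Webhook handler: store the message, then enqueue it.
  handleWebhook(msg: ChatMessage): void {
    this.store.push(msg);
    this.inbound.push(msg);
  }

  // Consumer: dequeue, load history, schedule the agent run, return immediately.
  dispatchOne(): "accepted" | "empty" {
    const msg = this.inbound.shift();
    if (msg === undefined) return "empty";
    const history = this.store.filter(
      (m) => m.botId === msg.botId && m.groupJid === msg.groupJid,
    );
    this.background.push(() => {
      const text = this.agent(history);
      this.replies.push({ botId: msg.botId, groupJid: msg.groupJid, text });
    });
    return "accepted";
  }

  // Simulates the agent finishing its background work.
  runBackgroundJobs(): void {
    for (const job of this.background.splice(0)) job();
  }

  // Reply consumer: store each reply and deliver it to the channel.
  deliverReplies(): void {
    for (const reply of this.replies.splice(0)) {
      this.store.push(reply);
      this.sent.push(reply);
    }
  }
}
```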
- Auth: Cognito JWT on all `/api/*` routes (agentcore mode) or self-hosted JWKS JWT (ecs mode)
- Webhooks: Per-channel signature verification (Telegram secret token, Discord Ed25519, Slack HMAC-SHA256)
- Data isolation: ABAC via STS SessionTags — agents can only access their owner's S3 paths and DynamoDB records
- Network: Fargate in private subnets, WAF rate limiting (2000 req/5min/IP)
- Credentials: Channel tokens stored in Secrets Manager, never exposed to agents
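Two of the webhook checks listed above can be sketched with Node's built-in `crypto` module. Slack signs `v0:{timestamp}:{rawBody}` with HMAC-SHA256 and sends the result as `v0=<hex>` in `X-Slack-Signature`; Telegram echoes the `secret_token` given to `setWebhook` in the `X-Telegram-Bot-Api-Secret-Token` header. The function names here are illustrative, not the repository's, and Discord's Ed25519 check is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Constant-time string comparison; a length mismatch fails fast.
function safeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

// Computes the value Slack places in the X-Slack-Signature header.
function slackSignature(signingSecret: string, timestamp: string, rawBody: string): string {
  const digest = createHmac("sha256", signingSecret)
    .update(`v0:${timestamp}:${rawBody}`)
    .digest("hex");
  return `v0=${digest}`;
}

function verifySlack(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  signatureHeader: string,
  nowSeconds: number,
): boolean {
  const ts = Number(timestamp);
  // Reject malformed or stale timestamps (five-minute window) to limit replay.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 300) return false;
  return safeEqual(slackSignature(signingSecret, timestamp, rawBody), signatureHeader);
}

// Compares the secret token header value against the one registered for the bot.
function verifyTelegram(expectedToken: string, headerValue: string | undefined): boolean {
  return headerValue !== undefined && safeEqual(expectedToken, headerValue);
}
```

Both checks need the raw request body and headers, so they have to run before any JSON parsing that could alter the body.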
| Component | ~Monthly Cost |
|---|---|
| AgentCore (30 msgs/day, 18s avg) | $0.40 |
| Bedrock Claude tokens | $5.40 |
| Fargate (2 tasks, 0.5 vCPU) | $30 |
| ALB | $16 |
| DynamoDB (on-demand) | $0.50 |
| S3 + CloudFront | $0.60 |
| Total (1 user) | ~$53/mo |
| 100 users (amortized) | ~$8/user/mo |
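The single-user total can be checked by summing the component rows. The sketch works in US cents to avoid floating-point drift; it covers only the single-user total, not the amortized per-user figure, and the variable names are illustrative.

```typescript
// Monthly cost rows from the table above, in US cents.
const monthlyCostCents: Record<string, number> = {
  "AgentCore (30 msgs/day, 18s avg)": 40,
  "Bedrock Claude tokens": 540,
  "Fargate (2 tasks, 0.5 vCPU)": 3000,
  "ALB": 1600,
  "DynamoDB (on-demand)": 50,
  "S3 + CloudFront": 60,
};

function totalCents(rows: Record<string, number>): number {
  return Object.values(rows).reduce((sum, cents) => sum + cents, 0);
}

const singleUserTotal = totalCents(monthlyCostCents);
// Formats the total as dollars: "52.90", which rounds to the ~$53/mo in the table.
const singleUserDollars = (singleUserTotal / 100).toFixed(2);
```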
| Resource | Description |
|---|---|
| 📊 Architecture PPT | Visual system overview slides |
| 📐 Cloud Architecture | Full design document with all details |
| 📋 TODO & Roadmap | Backlog, deferred items, future work |