GitHub - aws-samples/sample-cloud-native-nanoclaw

Multi-tenant NanoClaw on AWS

Create Bots · Connect Channels · Run Claude Agents in Isolated Cloud Environments

📐 Architecture Doc (Full design details)	🚀 Deploy Guide (One-command deploy)	💻 Local Dev (Dev setup)	📨 Message Flow (End-to-end walkthrough)
🔒 Security (Auth & isolation)	📦 Packages (Monorepo structure)	📋 TODO (Roadmap & backlog)

📚 Architecture Deep-Dive Docs

Doc	Topic
04 — Layered Architecture	Service layers, channels, providers
05 — Data Model	DynamoDB tables, S3 layout
06–07 — Lifecycles	Bot & session lifecycle
08 — Channel Management	Telegram, Discord, Slack, Feishu
09–10 — Agent Runtime	AgentCore, Claude SDK, MCP tools
11–12 — Security & Observability	ABAC, WAF, CloudWatch
15 — CDK Deployment	6-stack CDK infrastructure
16 — System Prompt Builder	Agent context construction

This project evolved from NanoClaw, a single-user local bot framework, into a fully managed, multi-tenant cloud platform. Each user gets their own Bots with independent memory, conversations, and scheduled tasks.

Architecture

User (Telegram/Discord/Slack)
  │
  ▼ Webhook
ALB ──► ECS Fargate (Control Plane)
         ├── Webhook Handler → SQS FIFO
         ├── SQS Consumer → AgentCore Runtime (microVM)
         │                    └── Claude Agent SDK
         │                        └── Bedrock Claude
         ├── Reply Consumer → Channel API → User
         └── REST API (JWT auth) ◄── Web Console (React SPA on CloudFront)

Data Layer: DynamoDB (state) │ S3 (sessions, memory) │ Secrets Manager (credentials)
Scheduling: EventBridge Scheduler → SQS → Agent
Auth: Cognito User Pool (JWT)
Security: WAF │ ABAC via STS SessionTags │ Per-tenant S3/DynamoDB isolation

Deployment modes: The diagram above shows the default agentcore mode. In ecs mode (for AWS China regions), three components are swapped: AgentCore microVMs become dedicated ECS Fargate tasks (one per botId#groupJid session, with a warm pool for instant dispatch), Cognito becomes a self-hosted OIDC auth service, and Bedrock becomes the Anthropic API. See ECS Mode below.

graph TB
    Users["👤 User clients<br/>Feishu / Discord / Slack"]

    subgraph Ingress["Ingress Layer"]
        CloudFront["CloudFront + S3<br/>Web Console"]
        ALB["ALB<br/>/api/* · /webhook/*"]
    end

    subgraph ECS["ECS Fargate (Control Plane)"]
        Fastify["Fastify HTTP Server<br/>REST API · Webhook · Health check"]
        SQSConsumer["SQS Consumer<br/>Inbound FIFO"]
        Adapters["Channel Adapters"]
    end

    subgraph Agent["Agent Execution Layer"]
        MicroVM["AgentCore microVM<br/>Claude Agent SDK + Bedrock<br/>Tools · MCP · Skills"]
    end

    subgraph Data["Data Layer"]
        DynamoDB[(DynamoDB)]
        S3[(S3)]
        Secrets[(Secrets Mgr)]
        EventBridge[(EventBridge)]
    end

    Users -->|HTTPS / Webhook| Ingress
    CloudFront --> ALB
    ALB --> Fastify
    Fastify --> SQSConsumer
    SQSConsumer -->|InvokeAgent| MicroVM
    MicroVM -->|send_message| SQSConsumer
    SQSConsumer --> Adapters
    Adapters -->|Reply| Users

    MicroVM --> DynamoDB
    MicroVM --> S3
    Fastify --> Secrets
    MicroVM --> EventBridge

    style Ingress fill:#E3F2FD,stroke:#1565C0,color:#000
    style ECS fill:#FFF3E0,stroke:#E65100,color:#000
    style Agent fill:#F3E5F5,stroke:#6A1B9A,color:#000
    style Data fill:#FCE4EC,stroke:#C62828,color:#000

Packages

Package	Description
`shared/`	TypeScript types and utilities (ported from NanoClaw)
`infra/`	AWS CDK — 6 stacks (Foundation, Auth, Agent, ControlPlane, Frontend, Monitoring)
`control-plane/`	Fastify HTTP server + SQS consumers (runs on ECS Fargate)
`agent-runtime/`	Claude Agent SDK wrapper (runs in AgentCore microVMs)
`web-console/`	React SPA — bot management, channel config, message history, tasks
`auth-service/`	Self-hosted OIDC auth service (JWT + DynamoDB user store, ECS mode only)

Key Decisions

Decision	Choice	Why
Tenant model	One user, many Bots	Per-scenario isolation
Channel credentials	BYOK (Bring Your Own Key)	User controls their bots
Control plane	ECS Fargate (always-on)	No 15-min Lambda timeout
Agent runtime	AgentCore (microVM) / ECS Fargate (China)	Per-session isolation (global) / dedicated task per session with ABAC (China)
Agent SDK	Claude Agent SDK + Bedrock / Anthropic API	Configurable via AGENT_MODE
Message queue	SQS FIFO	Per-group ordering, cross-group parallelism
Database	DynamoDB	Serverless, millisecond latency
Auth	Cognito / Self-hosted OIDC	Cognito globally, self-hosted JWT for China
IaC	CDK (TypeScript)	Type-safe, same language as app
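
The "per-group ordering, cross-group parallelism" property of SQS FIFO comes from the message group ID. The sketch below shows one way the send parameters could be built. It assumes the botId#groupJid session key doubles as the group ID; the message shape and the function names are hypothetical and not taken from the repository. Only `MessageGroupId` and `MessageDeduplicationId` are actual SQS `SendMessage` fields.

```typescript
// Hypothetical inbound message shape; field names are illustrative only.
interface InboundMessage {
  botId: string;
  groupJid: string;
  messageId: string;
  text: string;
}

// Subset of the SQS SendMessage request fields that matter for FIFO queues.
interface FifoSendParams {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId: string;
  MessageDeduplicationId: string;
}

// Assumption: the session key botId#groupJid is reused as the FIFO group.
function sessionKey(botId: string, groupJid: string): string {
  return `${botId}#${groupJid}`;
}

function toFifoParams(queueUrl: string, msg: InboundMessage): FifoSendParams {
  const group = sessionKey(msg.botId, msg.groupJid);
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(msg),
    // Messages sharing a group ID are delivered in order;
    // messages in different groups can be consumed in parallel.
    MessageGroupId: group,
    // A retried webhook delivery with the same ID is dropped by SQS
    // while it falls inside the queue's deduplication interval.
    MessageDeduplicationId: `${group}:${msg.messageId}`,
  };
}
```

With this scheme, two messages from the same chat share a group and stay ordered, while messages for another bot or chat land in a different group and do not wait behind them.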

NanoClaw → Cloud Mapping

NanoClaw (single-user)	ClawBot Cloud (multi-tenant)
SQLite	DynamoDB (7 tables)
Local filesystem (`groups/`)	S3 (sessions, CLAUDE.md memory)
Docker containers	AgentCore microVMs
File-based IPC	MCP tools → AWS SDK (SQS, DynamoDB, EventBridge)
Polling loop	SQS FIFO consumer
Channel self-registration	Webhook HTTP endpoints
Credential proxy	IAM Roles + STS ABAC

Prerequisites

Node.js >= 20
Docker (for building ARM64 container images)
AWS CLI configured (aws configure)
AWS CDK bootstrapped (cd infra && npx cdk bootstrap)
jq installed (used by deploy script for JSON parsing)

Deployment

One-Command Deploy

# Full deployment (default stage: dev)
ADMIN_EMAIL=admin@example.com ADMIN_PASSWORD=SecurePass123! ./scripts/deploy.sh

# Deploy to a specific stage
CDK_STAGE=prod AWS_REGION=us-east-1 ADMIN_EMAIL=admin@company.com ADMIN_PASSWORD=Pr0d!Pass ./scripts/deploy.sh

ADMIN_EMAIL and ADMIN_PASSWORD are required — the script will abort if not set.

ECS Mode (China Regions)

For deployment in AWS China regions (cn-north-1, cn-northwest-1) where Cognito, Bedrock, and AgentCore are unavailable:

# ECS mode deployment
DEPLOY_MODE=ecs ADMIN_EMAIL=admin@example.com ADMIN_PASSWORD=SecurePass123! ./scripts/deploy.sh

# ECS mode uses:
# - Self-hosted OIDC auth service (replaces Cognito)
# - ECS Fargate dedicated task per session (replaces AgentCore microVMs)
#   Each botId#groupJid gets its own Fargate task with warm pool for instant dispatch
# - Anthropic API (replaces Bedrock) — requires per-user API keys

Variable	Required	Default	Description
`DEPLOY_MODE`	No	`agentcore`	Deployment mode: `agentcore` (default) or `ecs` (China regions)
`ADMIN_EMAIL`	Yes	—	Email for the initial admin account
`ADMIN_PASSWORD`	Yes	—	Password for the initial admin account
`CDK_STAGE`	No	`dev`	Deployment stage name
`AWS_REGION`	No	`us-west-2`	Target AWS region (use `cn-northwest-1` or `cn-north-1` for China)
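
The required/default rules in the table above can be expressed as a small resolver. This is a sketch only: the real logic lives in the `deploy.sh` shell script, and `resolveDeployConfig` and `DeployConfig` are hypothetical names used for illustration.

```typescript
type DeployMode = "agentcore" | "ecs";

interface DeployConfig {
  mode: DeployMode;
  adminEmail: string;
  adminPassword: string;
  stage: string;
  region: string;
}

function isDeployMode(value: string): value is DeployMode {
  return value === "agentcore" || value === "ecs";
}

// Applies the defaults from the variable table and rejects missing required values.
function resolveDeployConfig(env: Record<string, string | undefined>): DeployConfig {
  const adminEmail = env.ADMIN_EMAIL;
  const adminPassword = env.ADMIN_PASSWORD;
  if (!adminEmail || !adminPassword) {
    throw new Error("ADMIN_EMAIL and ADMIN_PASSWORD are required");
  }
  const mode = env.DEPLOY_MODE ?? "agentcore";
  if (!isDeployMode(mode)) {
    throw new Error(`Unsupported DEPLOY_MODE: ${mode}`);
  }
  return {
    mode,
    adminEmail,
    adminPassword,
    stage: env.CDK_STAGE ?? "dev",
    region: env.AWS_REGION ?? "us-west-2",
  };
}
```

Called with only the two admin variables set, the resolver yields mode `agentcore`, stage `dev`, and region `us-west-2`, matching the defaults listed in the table.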

ECS Mode Parameters (CDK context)

These parameters are passed via CDK context and control the ECS dedicated task model:

Parameter	Default	Description
`minWarmTasks`	`2`	Number of pre-started idle Fargate tasks kept in warm pool for instant dispatch
`maxTasks`	`500`	Maximum total ECS agent tasks in the cluster
`idleTimeoutMinutes`	`15`	Minutes of inactivity before a dedicated task auto-stops

The warm pool eliminates cold-start latency (~30-90s) for new sessions. When a message arrives for a new session, the control plane claims a warm task instantly from the pool and replenishes it in the background. If the pool is empty, a cold-start fallback launches a new task. Each session (botId#groupJid) gets its own dedicated Fargate task; tasks self-stop after the idle timeout.
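
The claim, fallback, and replenish behaviour described above can be modelled in memory as follows. This is a minimal sketch, not the control plane's actual code: `WarmPoolDispatcher` and its methods are hypothetical, task IDs are plain strings standing in for Fargate tasks, and `replenish()` is called explicitly where the real system would refill the pool in the background.

```typescript
type DispatchSource = "existing" | "warm" | "cold";

class WarmPoolDispatcher {
  private readonly minWarmTasks: number;
  private readonly maxTasks: number;
  private readonly warm: string[] = [];
  private readonly assigned = new Map<string, string>(); // sessionKey -> taskId
  private nextId = 1;

  constructor(minWarmTasks = 2, maxTasks = 500) {
    this.minWarmTasks = minWarmTasks;
    this.maxTasks = maxTasks;
    this.replenish();
  }

  // Stand-in for starting a Fargate task.
  private launch(): string {
    return `task-${this.nextId++}`;
  }

  private totalTasks(): number {
    return this.warm.length + this.assigned.size;
  }

  // Top the pool back up to minWarmTasks without exceeding maxTasks.
  replenish(): void {
    while (this.warm.length < this.minWarmTasks && this.totalTasks() < this.maxTasks) {
      this.warm.push(this.launch());
    }
  }

  // Route a session (botId#groupJid) to its dedicated task.
  dispatch(sessionKey: string): { taskId: string; source: DispatchSource } {
    const existing = this.assigned.get(sessionKey);
    if (existing !== undefined) return { taskId: existing, source: "existing" };

    const warmTask = this.warm.shift();
    if (warmTask !== undefined) {
      this.assigned.set(sessionKey, warmTask);
      return { taskId: warmTask, source: "warm" };
    }
    // Pool is empty: fall back to a cold start if capacity allows.
    if (this.totalTasks() >= this.maxTasks) throw new Error("maxTasks reached");
    const coldTask = this.launch();
    this.assigned.set(sessionKey, coldTask);
    return { taskId: coldTask, source: "cold" };
  }

  // Models a task stopping itself after the idle timeout.
  stop(sessionKey: string): void {
    this.assigned.delete(sessionKey);
  }
}
```

The constructor defaults mirror `minWarmTasks` and `maxTasks` from the parameter table; the idle timer itself is left out and represented only by `stop()`.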

The DEPLOY_MODE=ecs flag:

Builds and pushes an additional auth-service Docker image
Passes --context mode=ecs to CDK (creates auth ECS service + agent ECS service instead of Cognito + AgentCore)
Skips AgentCore registration steps (8, 9, 9b, 10, 11)
Configures web-console for OIDC auth instead of Cognito
Seeds admin directly in DynamoDB (bcrypt hash generated via Node.js)

The deploy script runs 17 steps in order:

1. Pre-flight checks (aws, docker, node, jq)
2. npm install + build all workspaces
3. ECR login (creates repos if missing)
4. Build & push control-plane Docker image (ARM64)
5. Build & push agent-runtime Docker image (ARM64)
6. CDK deploy all 6 stacks
7. Read stack outputs (Cognito IDs, bucket names, role ARNs, ALB DNS, CDN domain)
8. Register AgentCore runtime (idempotent — skips if already exists)
9. Wait for AgentCore status READY
10. Update ECS task definition with AGENTCORE_RUNTIME_ARN env var
11. Force new ECS deployment
12. Build web-console with Cognito config injected via env vars
13. Sync web-console/dist/ to S3 frontend bucket
14. CloudFront cache invalidation
15. Smoke test (/health endpoint)
16. Seed default admin account (idempotent — skips if already exists)
17. Write AgentCore runtime ARN to SSM Parameter Store

Steps 5b, 8-11, 16-17 behave differently in ECS mode. See DEPLOY_MODE=ecs above.

Admin account: Since Cognito self-signup is disabled, Step 16 creates the initial admin user. ADMIN_EMAIL and ADMIN_PASSWORD are required env vars — the script will not start without them.

Teardown

./scripts/destroy.sh                    # default stage: dev
CDK_STAGE=prod ./scripts/destroy.sh     # specific stage

Reverse order: delete AgentCore runtime (wait for deletion) → CDK destroy all stacks → delete ECR repos.

Local Development

# Run control plane locally (pointing at deployed AWS resources)
cd control-plane
cp .env.example .env   # fill in values from CDK outputs
npm run dev

# Run web console locally
cd web-console
npm run dev            # opens http://localhost:5173

Project Structure

cloud_native_nanoclaw/
├── scripts/
│   ├── deploy.sh             # One-command full deployment (17 steps)
│   └── destroy.sh            # Reverse teardown
├── shared/src/
│   ├── types.ts              # User, Bot, Channel, Message, Task, Session...
│   ├── xml-formatter.ts      # Agent context formatting (from NanoClaw)
│   └── text-utils.ts         # Output processing
├── auth-service/src/
│   ├── server.ts             # Fastify auth service (login, refresh, admin)
│   ├── jwt.ts                # RS256 signing, JWKS endpoint
│   └── password.ts           # bcrypt password hashing
├── infra/
│   ├── bin/app.ts            # CDK app entry
│   └── lib/
│       ├── foundation-stack.ts   # VPC, S3, DynamoDB, SQS, ECR
│       ├── auth-stack.ts         # Cognito
│       ├── agent-stack.ts        # IAM Roles (ABAC)
│       ├── control-plane-stack.ts# ALB, ECS Fargate, WAF
│       ├── frontend-stack.ts     # CloudFront + S3
│       └── monitoring-stack.ts   # CloudWatch, alarms
├── control-plane/src/
│   ├── index.ts              # Fastify app + SQS consumer startup
│   ├── webhooks/             # Telegram, Discord, Slack handlers
│   ├── sqs/                  # Message dispatcher + reply consumer
│   ├── routes/api/           # REST API (bots, channels, groups, tasks)
│   ├── services/             # DynamoDB, cache, credential lookups
│   └── channels/             # Channel API clients
├── agent-runtime/src/
│   ├── server.ts             # HTTP server (/invocations, /ping)
│   ├── agent.ts              # Claude Agent SDK integration
│   ├── session.ts            # S3 session sync
│   ├── memory.ts             # Multi-layer CLAUDE.md
│   ├── scoped-credentials.ts # STS ABAC
│   ├── mcp-tools.ts          # send_message, schedule_task, etc.
│   └── mcp-server.ts         # MCP stdio server
└── web-console/src/
    ├── pages/                # Login, Dashboard, BotDetail, ChannelSetup...
    ├── lib/                  # Auth (Cognito), API client
    └── components/           # Layout

Message Flow

User sends @Bot hello in Telegram group
Telegram POST → /webhook/telegram/{bot_id} (ALB → Fargate)
Webhook handler verifies signature, stores message in DynamoDB, enqueues to SQS FIFO
SQS consumer dequeues, loads recent messages, invokes AgentCore Runtime (async fire-and-forget)
AgentCore returns accepted immediately, agent runs in background → Claude Agent SDK query()
Agent generates response, optionally uses MCP tools (schedule_task, send_message)
Final reply sent via SQS reply queue → Reply Consumer stores in DynamoDB, sends via Telegram API
User sees reply in Telegram
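
The decoupling in this flow (the webhook handler only enqueues, the agent invocation returns immediately, and the reply travels back through a separate queue) can be illustrated with an in-memory model. Everything here is hypothetical: arrays stand in for the SQS queues, a list of deferred closures stands in for the agent running in the background, and the "agent" simply echoes the text.

```typescript
interface ChatMessage {
  botId: string;
  groupJid: string;
  text: string;
}

const inboundQueue: ChatMessage[] = [];     // stands in for the inbound SQS FIFO queue
const replyQueue: ChatMessage[] = [];       // stands in for the SQS reply queue
const backgroundJobs: Array<() => void> = []; // agent work that has been accepted but not run
const delivered: string[] = [];             // what the channel API would have sent

// Webhook handler: enqueue and return, no agent work on the request path.
function handleWebhook(msg: ChatMessage): void {
  inboundQueue.push(msg);
}

// Fire-and-forget invocation: the work is deferred and "accepted" is returned at once.
function invokeAgent(msg: ChatMessage): "accepted" {
  backgroundJobs.push(() => {
    replyQueue.push({ ...msg, text: `echo: ${msg.text}` });
  });
  return "accepted";
}

// Inbound consumer: dequeue each message and hand it to the agent.
function consumeInbound(): string[] {
  const statuses: string[] = [];
  let msg: ChatMessage | undefined;
  while ((msg = inboundQueue.shift()) !== undefined) {
    statuses.push(invokeAgent(msg));
  }
  return statuses;
}

// Simulates the agent finishing its background work.
function runBackgroundJobs(): void {
  for (const job of backgroundJobs.splice(0)) job();
}

// Reply consumer: drain the reply queue and deliver to the channel.
function consumeReplies(): void {
  let reply: ChatMessage | undefined;
  while ((reply = replyQueue.shift()) !== undefined) {
    delivered.push(`${reply.groupJid}: ${reply.text}`);
  }
}
```

The point of the model is that `consumeInbound()` finishes before any reply exists, so a slow agent run never blocks the consumer from picking up the next message.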

Security

Auth: Cognito JWT on all /api/* routes (agentcore mode) or self-hosted JWKS JWT (ecs mode)
Webhooks: Per-channel signature verification (Telegram secret token, Discord Ed25519, Slack HMAC-SHA256)
Data isolation: ABAC via STS SessionTags — agents can only access their owner's S3 paths and DynamoDB records
Network: Fargate in private subnets, WAF rate limiting (2000 req/5min/IP)
Credentials: Channel tokens stored in Secrets Manager, never exposed to agents
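
To make the ABAC item concrete, the sketch below builds an STS `AssumeRole` request carrying session tags, plus an IAM statement that scopes S3 access by one of those tags. The field names (`RoleArn`, `RoleSessionName`, `DurationSeconds`, `Tags`) and the `${aws:PrincipalTag/...}` policy variable are standard AWS constructs. The tag keys `userId` and `botId`, the function names, and the bucket layout with a per-user prefix are assumptions for illustration, not details confirmed by the repository.

```typescript
interface SessionTag {
  Key: string;
  Value: string;
}

interface AssumeRoleInput {
  RoleArn: string;
  RoleSessionName: string;
  DurationSeconds: number;
  Tags: SessionTag[];
}

// Builds the AssumeRole request for one agent session.
function scopedAssumeRoleInput(roleArn: string, userId: string, botId: string): AssumeRoleInput {
  // RoleSessionName accepts only [\w+=,.@-] and at most 64 characters.
  const sessionName = `bot-${botId}`.replace(/[^\w+=,.@-]/g, "-").slice(0, 64);
  return {
    RoleArn: roleArn,
    RoleSessionName: sessionName,
    DurationSeconds: 900, // shortest lifetime AssumeRole allows
    Tags: [
      { Key: "userId", Value: userId }, // hypothetical tag key
      { Key: "botId", Value: botId },   // hypothetical tag key
    ],
  };
}

// IAM statement limiting S3 access to the prefix named by the session's userId tag.
// Assumption: objects are stored under <bucket>/<userId>/...
function tenantS3Statement(bucket: string) {
  return {
    Effect: "Allow",
    Action: ["s3:GetObject", "s3:PutObject"],
    // Plain string on purpose: IAM resolves ${aws:PrincipalTag/userId} at evaluation time.
    Resource: ["arn:aws:s3:::" + bucket + "/${aws:PrincipalTag/userId}/*"],
  };
}
```

For session tags to be accepted, the role's trust policy also has to allow `sts:TagSession` alongside `sts:AssumeRole`.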

Cost Estimate (single user)

Component	~Monthly Cost
AgentCore (30 msgs/day, 18s avg)	$0.40
Bedrock Claude tokens	$5.40
Fargate (2 tasks, 0.5 vCPU)	$30
ALB	$16
DynamoDB (on-demand)	$0.50
S3 + CloudFront	$0.60
Total (1 user)	~$53/mo
100 users (amortized)	~$8/user/mo
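
As a rough sanity check on the table, the arithmetic can be reproduced under one stated assumption: Fargate and the ALB are treated as fixed costs shared by all users, and the remaining rows are treated as per-user costs. That split is not stated in the source, and the function name is hypothetical.

```typescript
// Monthly figures from the table, in cents to avoid floating-point drift.
const monthlyCents = {
  agentCore: 40,
  bedrockTokens: 540,
  fargate: 3000,
  alb: 1600,
  dynamoDb: 50,
  s3CloudFront: 60,
};

// Assumption: Fargate and ALB are shared; everything else scales with the user count.
const fixedCents = monthlyCents.fargate + monthlyCents.alb;
const perUserCents =
  monthlyCents.agentCore +
  monthlyCents.bedrockTokens +
  monthlyCents.dynamoDb +
  monthlyCents.s3CloudFront;

// Amortized monthly cost per user, in dollars.
function monthlyCostPerUser(users: number): number {
  if (!Number.isInteger(users) || users < 1) {
    throw new RangeError("users must be a positive integer");
  }
  return (fixedCents / users + perUserCents) / 100;
}
```

Under this split, one user costs $52.90 per month, which rounds to the table's ~$53, and 100 users come to about $7.36 each, in the same range as the table's ~$8 figure. The difference presumably reflects costs that the simple split does not capture.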

Documentation

Resource	Description
📊 Architecture PPT	Visual system overview slides
📐 Cloud Architecture	Full design document with all details
📋 TODO & Roadmap	Backlog, deferred items, future work
