Prioritized list of upgrades as user load increases. Each tier assumes the previous is done.
All battles, oracle rounds, and user operations share a single pool of 10 database connections. Under load, queries queue up fast.
File: backend/src/db/connection.ts
Change: max: 10 → max: 25 — Applied
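A minimal sketch of what the changed pool options in `connection.ts` might look like, assuming node-postgres (`pg`) option names; the timeout values here are illustrative, not taken from the real file:

```typescript
// Hypothetical pool options for backend/src/db/connection.ts.
// Field names follow node-postgres (pg) Pool options.
interface PoolConfig {
  max: number;                    // max clients in the pool
  idleTimeoutMillis: number;      // release idle clients
  connectionTimeoutMillis: number; // fail fast instead of queueing forever
}

const poolConfig: PoolConfig = {
  max: 25,                        // was 10; battles, oracle rounds, and user ops all share this
  idleTimeoutMillis: 30_000,      // assumed value
  connectionTimeoutMillis: 5_000, // assumed value
};

// In the real file this would be passed to `new Pool(poolConfig)`.
```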
A Standard Render instance can handle well over 500 concurrent sockets. Bump the cap and adjust the load thresholds to match.
Env vars:
MAX_GLOBAL_CONNECTIONS=1000
LOAD_YELLOW_THRESHOLD=0.70
LOAD_RED_THRESHOLD=0.90
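A sketch of how these env vars could drive a load-tier check (the `loadTier` function and tier names are illustrative, not the actual service code):

```typescript
// Read caps/thresholds from the environment, falling back to the documented values.
const MAX_GLOBAL_CONNECTIONS = Number(process.env.MAX_GLOBAL_CONNECTIONS ?? 1000);
const LOAD_YELLOW_THRESHOLD = Number(process.env.LOAD_YELLOW_THRESHOLD ?? 0.70);
const LOAD_RED_THRESHOLD = Number(process.env.LOAD_RED_THRESHOLD ?? 0.90);

type LoadTier = 'green' | 'yellow' | 'red';

// Classify current socket load against the configured thresholds.
function loadTier(activeConnections: number): LoadTier {
  const utilization = activeConnections / MAX_GLOBAL_CONNECTIONS;
  if (utilization >= LOAD_RED_THRESHOLD) return 'red';
  if (utilization >= LOAD_YELLOW_THRESHOLD) return 'yellow';
  return 'green';
}
```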
Currently unlimited — a bot spam attack could create hundreds of battles and choke the tick loop. Add a max (e.g. 100 concurrent battles).
File: backend/src/services/clawbotBattleManager.ts
Change: Check battles.size >= 100 before creating new battles — Applied
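The guard could look like the following sketch; the helper name is hypothetical, but the `battles.size` check is the change described above:

```typescript
// Illustrative cap check for clawbotBattleManager.ts: refuse to create
// battles beyond a fixed concurrency limit so bot spam cannot choke the tick loop.
const MAX_CONCURRENT_BATTLES = 100;

function canCreateBattle(battles: Map<string, unknown>): boolean {
  return battles.size < MAX_CONCURRENT_BATTLES;
}
```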
Enables horizontal scaling (multiple backend instances behind a load balancer). Use Redis for:
- Socket.IO adapter (multi-instance pub/sub)
- Price cache (avoid duplicate Pyth calls)
- Session/rate limit state
Cost: ~$10/mo (Render Redis or Upstash)
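Wiring the Socket.IO Redis adapter is mostly configuration; a sketch using the documented `@socket.io/redis-adapter` and `redis` packages (the helper function and `REDIS_URL` env var are assumptions about this codebase):

```typescript
import { createClient } from 'redis';
import { createAdapter } from '@socket.io/redis-adapter';
import { Server } from 'socket.io';

// Attach a Redis pub/sub adapter so events broadcast across all backend instances.
async function attachRedisAdapter(io: Server): Promise<void> {
  const pubClient = createClient({ url: process.env.REDIS_URL });
  const subClient = pubClient.duplicate(); // adapter needs separate pub and sub connections
  await Promise.all([pubClient.connect(), subClient.connect()]);
  io.adapter(createAdapter(pubClient, subClient));
}
```

The same Redis instance can back the price cache and rate-limit state, so one ~$10/mo instance covers all three uses.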
Currently only SOL runs oracle rounds. Adding assets multiplies capacity since each runs independently.
File: backend/src/services/predictionService.ts
Change: Call addAsset('ETH'), addAsset('BTC') at startup. Price feeds already support them.
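A hypothetical sketch of what an `addAsset` loop in `predictionService.ts` might look like; `addAsset` is the call named above, everything else (the map, the interval) is illustrative:

```typescript
// Each asset runs its own independent round loop, so adding assets
// multiplies capacity without the loops contending with each other.
type Asset = 'SOL' | 'ETH' | 'BTC';

const activeRounds = new Map<Asset, { interval: ReturnType<typeof setInterval> }>();

function addAsset(asset: Asset): void {
  if (activeRounds.has(asset)) return; // idempotent: one loop per asset
  const interval = setInterval(() => {
    // poll the asset's price feed and advance/settle its round (elided)
  }, 1_000);
  activeRounds.set(asset, { interval });
}

// At startup:
addAsset('SOL');
addAsset('ETH');
addAsset('BTC');
```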
If you're on a shared Postgres instance, upgrade to dedicated. Watch for connection limit caps on the DB side (most shared plans cap at 20-50 connections).
With Redis in place (Tier 2), spin up a second Render instance behind a load balancer.
Requirements:
- Redis adapter for Socket.IO (so events broadcast across instances)
- Sticky sessions OR stateless battle tick design
- Shared DB (already PostgreSQL, so this works)
Move battle tick execution to a dedicated worker process. The main process handles HTTP/WebSocket, the worker handles tick loops. Prevents battle CPU from blocking socket handling.
Architecture:
Main process: HTTP + WebSocket + Oracle rounds
Worker process: Battle tick loops + bot execution
Communication: Redis pub/sub
If Vercel isn't already caching aggressively, ensure static assets (JS bundles, images, sounds) are served from CDN edge. Vercel handles this by default, but verify cache headers.
Separate the price polling into its own microservice. Publishes to Redis; all backend instances subscribe. Eliminates duplicate Pyth/Jupiter calls.
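On the subscriber side, each backend instance would read the latest published price from a local cache instead of calling Pyth/Jupiter itself. A sketch under assumed message shapes (the staleness cutoff is illustrative):

```typescript
// Price points as published by the (hypothetical) price microservice.
interface PricePoint { asset: string; price: number; publishedAt: number }

const priceCache = new Map<string, PricePoint>();

// Redis subscription handler: keep only the latest point per asset.
function onPriceMessage(raw: string): void {
  const point = JSON.parse(raw) as PricePoint;
  priceCache.set(point.asset, point);
}

// Return the cached price only if it is fresh; stale data returns null
// so the caller can decide whether to pause rounds or fall back.
function getFreshPrice(asset: string, maxAgeMs: number, now: number): number | null {
  const point = priceCache.get(asset);
  if (!point || now - point.publishedAt > maxAgeMs) return null;
  return point.price;
}
```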
Route read-heavy queries (leaderboard, stats, history) to read replicas. Writes (bets, settlements) go to primary.
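A naive router might pattern-match on the statement, as in this sketch (the pool names are assumptions; a production router should whitelist known read queries, since `WITH ... ` can hide data-modifying CTEs):

```typescript
// Illustrative read/write split: reads -> replica pool, writes -> primary pool.
type PoolName = 'primary' | 'replica';

const READ_ONLY = /^\s*(select|with)\b/i;

function routeQuery(sql: string): PoolName {
  // Caution: regex routing is a sketch only. WITH-prefixed statements can
  // modify data in Postgres, so real code should route by call site instead.
  return READ_ONLY.test(sql) ? 'replica' : 'primary';
}
```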
Assign battles to specific worker instances (shard by battle ID). Each worker handles a subset of battles, distributing CPU load evenly.
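Shard assignment only needs a stable hash of the battle ID; a minimal sketch (the hash function is illustrative, any stable hash works):

```typescript
// Map a battle ID to a worker index deterministically, so every instance
// agrees on which worker owns which battle without coordination.
function shardFor(battleId: string, workerCount: number): number {
  let hash = 0;
  for (const ch of battleId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash % workerCount;
}
```

Note that adding a worker reshuffles most assignments with plain modulo; consistent hashing avoids that if battles are long-lived.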
Move on-chain settlement (credit_winnings, transfer_to_global_vault) to a job queue with retry logic. Currently settlement is inline — if Solana RPC is slow, it blocks the round lifecycle.
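The retry half of that queue can be sketched independently of the queue library; `submit` below stands in for the on-chain call (`credit_winnings` / `transfer_to_global_vault`), and the attempt/backoff numbers are assumptions:

```typescript
// Retry a settlement submission with exponential backoff so a slow Solana RPC
// delays only this job, not the round lifecycle that enqueued it.
async function settleWithRetry(
  submit: () => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 500
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await submit();
      return attempt; // number of attempts it took
    } catch (err) {
      if (attempt === maxAttempts) throw err; // exhausted: surface for alerting
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  throw new Error('unreachable');
}
```

A job-queue library (e.g. BullMQ on the Tier 2 Redis) provides the same retry semantics plus persistence, so jobs survive a process restart.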
Track these metrics to know when to move to the next tier:
| Metric | Yellow Flag | Red Flag |
|---|---|---|
| Socket connections | >500 sustained | >800 sustained |
| DB pool utilization | >60% (15/25 active) | >80% (20/25 active) |
| Battle tick drift | >500ms late | >2s late |
| Oracle settle latency | >3s | >10s |
| Memory usage | >70% of limit | >85% of limit |
| API response time (p95) | >500ms | >2s |
Check these via:
- GET /api/health (with admin key for detailed view)
- Discord alerts (sustained red tier, settlement failures)
- Render metrics dashboard (CPU, memory, response times)
| Tier | Additional Cost | When |
|---|---|---|
| Tier 1 | $0 | Now |
| Tier 2 | ~$10-25/mo (Redis + DB upgrade) | 200+ users |
| Tier 3 | ~$30-50/mo (2nd instance + worker) | 500+ users |
| Tier 4 | ~$100-200/mo (replicas, sharding) | 2000+ users |