
DegenDome Scaling Priorities

Prioritized list of upgrades as user load increases. Each tier assumes the previous is done.


Tier 1: Quick Config Changes (do now, 10 minutes)

1. Bump DB connection pool from 10 → 25 ✅ DONE

All battles, oracle rounds, and user operations share 10 connections. Under load this queues up fast.

File: backend/src/db/connection.ts Change: max: 10 → max: 25 — Applied
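A minimal sketch of what the changed pool settings could look like, assuming the pool is built with node-postgres; field names other than `max` are illustrative defaults, not taken from the actual connection.ts:

```typescript
// Hypothetical sketch of the pool settings in backend/src/db/connection.ts,
// assuming node-postgres. Only `max` comes from this doc; the rest are
// illustrative defaults.
const poolConfig = {
  max: 25,                        // bumped from 10 so battles, oracle rounds,
                                  // and user ops stop fighting over 10 slots
  idleTimeoutMillis: 30_000,      // release idle clients after 30s
  connectionTimeoutMillis: 5_000, // fail fast instead of queueing forever
};
```

Keep `max` below the database-side connection cap (see item 6), or the extra clients will just be rejected upstream.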

2. Increase MAX_GLOBAL_CONNECTIONS to 1000

A standard Render instance can handle more than 500 sockets. Bump the cap and adjust the load thresholds to match.

Env vars:

MAX_GLOBAL_CONNECTIONS=1000
LOAD_YELLOW_THRESHOLD=0.70
LOAD_RED_THRESHOLD=0.90
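How these three env vars could combine into a load tier is sketched below; the function name `loadTier` is hypothetical, but the thresholds mirror the values above:

```typescript
// Sketch of mapping connection count to a load tier using the env vars
// above. `loadTier` is a hypothetical name, not the actual backend code.
type Tier = "green" | "yellow" | "red";

function loadTier(
  connections: number,
  max = 1000,    // MAX_GLOBAL_CONNECTIONS
  yellow = 0.7,  // LOAD_YELLOW_THRESHOLD
  red = 0.9,     // LOAD_RED_THRESHOLD
): Tier {
  const ratio = connections / max;
  if (ratio >= red) return "red";       // shed load / alert
  if (ratio >= yellow) return "yellow"; // start watching
  return "green";
}
```

With a cap of 1000, yellow kicks in at 700 connections and red at 900.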

3. Add a battle concurrency cap ✅ DONE

Currently unlimited — a bot spam attack could create hundreds of battles and choke the tick loop. Add a max (e.g. 100 concurrent battles).

File: backend/src/services/clawbotBattleManager.ts Change: Check battles.size >= 100 before creating new battles — Applied
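The guard could look like the following sketch; the `battles` map and `createBattle` function are self-contained stand-ins for the real battle manager, not its actual API:

```typescript
// Hypothetical sketch of the cap check in clawbotBattleManager.ts: refuse
// new battles once the limit is hit so bot spam can't choke the tick loop.
const MAX_CONCURRENT_BATTLES = 100;
const battles = new Map<string, object>(); // stand-in for the real battle map

function createBattle(id: string): boolean {
  if (battles.size >= MAX_CONCURRENT_BATTLES) {
    return false; // caller should surface a "server busy" error to the client
  }
  battles.set(id, {});
  return true;
}
```

Returning a rejection (rather than queueing) keeps the failure cheap: the attacker pays the request cost, the tick loop stays bounded.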


Tier 2: When You Hit ~200 Concurrent Users (1-2 days)

4. Add Redis for shared state

Enables horizontal scaling (multiple backend instances behind a load balancer). Use Redis for:

  • Socket.IO adapter (multi-instance pub/sub)
  • Price cache (avoid duplicate Pyth calls)
  • Session/rate limit state

Cost: ~$10/mo (Render Redis or Upstash)
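The price-cache use case follows the cache-aside pattern; here is an in-process sketch of it, with a Map standing in for Redis so the example is self-contained, and `fetchPythPrice` as a hypothetical stand-in for the real Pyth call:

```typescript
// Cache-aside sketch: serve a cached price while fresh, otherwise fetch
// once and cache. A Map stands in for Redis; with Redis, all backend
// instances would share this cache and duplicate Pyth calls disappear.
type Entry = { price: number; expiresAt: number };
const priceCache = new Map<string, Entry>();
const TTL_MS = 2_000; // serve cached prices for up to 2s

async function getPrice(
  asset: string,
  fetchPythPrice: (a: string) => Promise<number>, // hypothetical fetcher
): Promise<number> {
  const hit = priceCache.get(asset);
  if (hit && hit.expiresAt > Date.now()) return hit.price; // cache hit
  const price = await fetchPythPrice(asset); // one upstream call per window
  priceCache.set(asset, { price, expiresAt: Date.now() + TTL_MS });
  return price;
}
```

The TTL is the tuning knob: longer means fewer Pyth calls but staler prices at settlement time.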

5. Add more Oracle assets (ETH, BTC)

Currently only SOL runs oracle rounds. Adding assets multiplies betting capacity, since each asset's rounds run independently.

File: backend/src/services/predictionService.ts Change: Call addAsset('ETH'), addAsset('BTC') at startup. Price feeds already support them.
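A sketch of the startup registration, with `addAsset` written as a self-contained stand-in for predictionService's method (the real signature may differ):

```typescript
// Sketch of multi-asset startup. `addAsset` here is a stand-in for the
// predictionService method; each registered asset runs its own round loop.
const ORACLE_ASSETS = ["SOL", "ETH", "BTC"];
const activeRounds = new Map<string, { startedAt: number }>();

function addAsset(symbol: string): void {
  if (activeRounds.has(symbol)) return; // idempotent: never double-start
  activeRounds.set(symbol, { startedAt: Date.now() });
}

for (const asset of ORACLE_ASSETS) addAsset(asset);
```

Making registration idempotent matters once Tier 3 arrives: multiple instances can call it without spawning duplicate rounds.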

6. Upgrade PostgreSQL plan

If you're on a shared Postgres instance, upgrade to dedicated. Watch for connection limit caps on the DB side (most shared plans cap at 20-50 connections).


Tier 3: When You Hit ~500+ Concurrent Users (1 week)

7. Horizontal scaling — 2+ backend instances

With Redis in place (Tier 2), spin up a second Render instance behind a load balancer.

Requirements:

  • Redis adapter for Socket.IO (so events broadcast across instances)
  • Sticky sessions OR stateless battle tick design
  • Shared DB (already PostgreSQL, so this works)

8. Battle tick worker separation

Move battle tick execution to a dedicated worker process. The main process handles HTTP/WebSocket, the worker handles tick loops. Prevents battle CPU from blocking socket handling.

Architecture:

Main process: HTTP + WebSocket + Oracle rounds
Worker process: Battle tick loops + bot execution
Communication: Redis pub/sub
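One way to keep the main/worker split honest is to type the pub/sub messages up front; the channel name and message fields below are illustrative assumptions, not the actual protocol:

```typescript
// Hypothetical shape of the Redis pub/sub messages between the main process
// and the battle worker. Channel name and fields are illustrative.
type WorkerMessage =
  | { type: "battle:start"; battleId: string; bots: string[] }
  | { type: "battle:tick"; battleId: string; tick: number }
  | { type: "battle:end"; battleId: string; winner: string };

const CHANNEL = "degendome:battles";

// Messages are JSON on the wire; a round-trip must be lossless.
function encode(msg: WorkerMessage): string {
  return JSON.stringify(msg);
}
function decode(raw: string): WorkerMessage {
  return JSON.parse(raw) as WorkerMessage;
}
```

A discriminated union like this lets the main process exhaustively switch on `type`, so adding a new message kind becomes a compile-time error until every consumer handles it.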

9. CDN for static assets

If Vercel isn't already caching aggressively, ensure static assets (JS bundles, images, sounds) are served from CDN edge. Vercel handles this by default, but verify cache headers.


Tier 4: When You Hit ~2000+ Concurrent Users (2-4 weeks)

10. Dedicated price service

Separate the price polling into its own microservice. Publishes to Redis; all backend instances subscribe. Eliminates duplicate Pyth/Jupiter calls.

11. Database read replicas

Route read-heavy queries (leaderboard, stats, history) to read replicas. Writes (bets, settlements) go to primary.
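The routing decision can be a one-line allowlist; the pool objects and operation names below are stand-ins for illustration:

```typescript
// Sketch of read/write routing: read-heavy endpoints hit a replica pool,
// money-moving writes always hit the primary. Pools are stand-in objects.
const primary = { name: "primary" };
const replica = { name: "replica" };

const READ_ONLY = new Set(["leaderboard", "stats", "history"]);

function poolFor(operation: string) {
  // Bets and settlements must read their own writes immediately, so
  // anything not on the read-only allowlist goes to the primary.
  return READ_ONLY.has(operation) ? replica : primary;
}
```

An allowlist (default: primary) is the safe direction; a denylist that defaults to the replica would silently route new write paths through stale data.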

12. Battle sharding

Assign battles to specific worker instances (shard by battle ID). Each worker handles a subset of battles, distributing CPU load evenly.
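Sharding by battle ID only needs a stable hash; a minimal sketch, assuming worker count is known to every instance:

```typescript
// Sketch of shard assignment: a stable hash of the battle ID mod worker
// count pins each battle to one worker, spreading tick CPU evenly.
function shardFor(battleId: string, workerCount: number): number {
  let hash = 0;
  for (const ch of battleId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return hash % workerCount;
}
```

Note that plain modulo reshuffles most battles when `workerCount` changes; since battles are short-lived that is probably fine here, but consistent hashing is the upgrade path if it isn't.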

13. Queue-based settlement

Move on-chain settlement (credit_winnings, transfer_to_global_vault) to a job queue with retry logic. Currently settlement is inline — if Solana RPC is slow, it blocks the round lifecycle.
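The retry half of that queue can be sketched as exponential backoff around the chain call; `submitToChain` is a hypothetical stand-in for the real credit_winnings / transfer_to_global_vault submission, and a real deployment would run this inside a job-queue worker rather than inline:

```typescript
// Sketch of settlement with retry + exponential backoff, so a slow Solana
// RPC no longer blocks the round lifecycle. `submitToChain` is a
// hypothetical stand-in for the real on-chain call.
async function settleWithRetry(
  submitToChain: () => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await submitToChain();
      return true; // settled
    } catch {
      // backoff: 500ms, 1s, 2s, ... before the next attempt
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  return false; // exhausted: route to a dead-letter queue / manual review
}
```

The key property is that the round lifecycle only enqueues the job and moves on; the waiting happens in the worker.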


Monitoring Checklist

Track these metrics to know when to move to the next tier:

| Metric | Yellow Flag | Red Flag |
| --- | --- | --- |
| Socket connections | >500 sustained | >800 sustained |
| DB pool utilization | >60% (15/25 active) | >80% (20/25 active) |
| Battle tick drift | >500ms late | >2s late |
| Oracle settle latency | >3s | >10s |
| Memory usage | >70% of limit | >85% of limit |
| API response time (p95) | >500ms | >2s |

Check these via:

  • GET /api/health (with admin key for detailed view)
  • Discord alerts (sustained red tier, settlement failures)
  • Render metrics dashboard (CPU, memory, response times)

Cost Estimates

| Tier | Additional Cost | When |
| --- | --- | --- |
| Tier 1 | $0 | Now |
| Tier 2 | ~$10-25/mo (Redis + DB upgrade) | 200+ users |
| Tier 3 | ~$30-50/mo (2nd instance + worker) | 500+ users |
| Tier 4 | ~$100-200/mo (replicas, sharding) | 2000+ users |