- System Architecture
- Folder Structure
- Technology Stack
- Component Design Reasoning
- Request Flow Logic
- Prerequisites & Output
- Deployment Guide
- API Endpoints
The core design goal is to enforce scalable, distributed rate limits across multiple stateless gateway instances, using a centralized in-memory datastore to track token state without blocking the critical request proxy path.
The repository is divided into discrete services, keeping concerns cleanly separated:
```
SAP02 - Rate Limiter/
├── .env                      # Environment variables configuration
├── docker-compose.yml        # Production-ready compose configuration
├── LICENSE
├── README.md                 # System documentation
├── run_tests.sh              # Shell script for running burst and load balance tests
├── backend/                  # Simulated business logic servers
│   ├── app.py                # Mock API returning container replica hostnames
│   ├── Dockerfile            # Backend containerization strategy
│   └── requirements.txt      # Python dependencies
├── docs/                     # Image assets
│   ├── docker compose ps.png
│   ├── SAP02-Architecture.png
│   └── test output.png
├── nginx/                    # Reverse proxy configuration
│   └── nginx.conf            # Load balancing strategy and Docker DNS resolution
├── rateguard/                # The core Python API gateway & middleware
│   ├── app.py                # Main Flask application and routing
│   ├── config.py             # Environment configurations
│   ├── Dockerfile            # Gateway containerization strategy
│   ├── middleware.py         # Token Bucket algorithm and Redis state logic
│   ├── proxy.py              # Downstream request forwarding
│   ├── requirements.txt      # Python dependencies
│   └── token_bucket.lua      # Atomic Redis Lua script for rate limiting
└── tests/                    # Analytics and validation scripts
    ├── burst_test.py         # Validates the token bucket's burst capacity
    └── load_balance_test.py  # Confirms NGINX round-robin distribution
```
- Python (Flask): Lightweight WSGI web framework used for HTTP interception and reverse-proxy processing.
- Gunicorn: Production-grade WSGI HTTP server running the Flask applications with multiple concurrent workers to handle heavy load.
- Requests: HTTP library used to forward client payloads to the backend endpoints.
- Redis: Fast in-memory key-value store selected for its atomic operations; it acts as the single source of truth for rate-limit configuration and token counts.
- NGINX: High-performance asynchronous edge proxy that resolves Docker container DNS to load-balance client traffic across dynamically scaling gateway replicas.
- Docker Compose: Orchestrates the infrastructure, wiring the services together on the internal Compose network and scaling gateways via the `deploy: replicas` setting.
The system uses the classic Token Bucket algorithm to absorb burst traffic while capping the sustained request rate. Tokens are replenished lazily on each request, calculated from the elapsed time since the last refill, so no background timer is needed to smooth out usage spikes.
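The lazy-refill idea can be sketched as a small in-memory bucket. This is an illustrative model only (the class and parameter names are hypothetical; the repo's real logic lives in `middleware.py` and `token_bucket.lua` against Redis):

```python
import time

class TokenBucket:
    """Illustrative in-memory token bucket; not the repo's actual code."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens regenerated per second
        self.tokens = capacity            # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Lazy refill: add tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]
print(results)  # the first 5 burst requests pass, the rest are denied
```

A burst of 7 back-to-back calls drains the 5-token capacity; the two extra calls are rejected until the 1 token/second refill catches up.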
RateGuard gateways are completely stateless. Each replica processes inbound connections independently and delegates all rate-limit tracking to a centralized Redis cache. This avoids per-replica state divergence and ensures rate limits are applied consistently regardless of which gateway replica receives the request.
By decoupling the state (Redis) from the compute (RateGuard), the system supports horizontal scaling. New gateway replicas can be spun up dynamically via Docker Compose, and NGINX's internal DNS resolution automatically begins routing traffic to them.
To prevent users from exploiting race conditions during concurrent request bursts, RateGuard uses atomic Redis Lua scripts. This guarantees that no two concurrent requests can read and deduct the same token simultaneously, ensuring strict rate-limit enforcement at scale.
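A plausible sketch of what `token_bucket.lua` might contain is shown below, embedded the way `redis-py` typically loads it. The actual script in the repo may differ; this is a hedged reconstruction of the pattern, not the repo's code:

```python
# Sketch of an atomic token-bucket script. Because Redis executes Lua
# scripts atomically, the read-refill-deduct-write sequence cannot
# interleave with another request's evaluation.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last   = tonumber(bucket[2]) or now

-- Lazy refill, capped at capacity
tokens = math.min(capacity, tokens + (now - last) * rate)

local allowed = 0
if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 60)
return allowed
"""

# Usage (requires a running Redis and the redis-py package):
#   import redis, time
#   r = redis.Redis()
#   check = r.register_script(TOKEN_BUCKET_LUA)
#   allowed = check(keys=["rate_limit:alice:/api/data"], args=[5, 1.0, time.time()])
```

Registering the script once and invoking it by SHA keeps the critical section entirely inside Redis, which is what makes multi-replica enforcement safe.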
A lightweight Flask @before_request middleware intercepts incoming traffic to validate token availability. If approved, the payload is transparently forwarded downstream via the proxy utility; if denied, the request is halted immediately at the gateway level.
The system enforces strict HTTP response standards: denied actions immediately return an HTTP 429 Too Many Requests error, whereas successful requests pass through to the backend, returning the backend's JSON payload along with diagnostic replica metadata.
High availability is achieved through a Fail-Open strategy. If the Redis cache goes offline or times out, the gateway catches the exception and permits traffic to pass through. This prioritizes continuous user access over strict rate-limit enforcement during partial outages.
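The fail-open behaviour reduces to a small try/except around the Redis check. The function names here are illustrative stand-ins, not the repo's actual code:

```python
# Fail-open sketch: if the Redis check raises, let the request through
# rather than rejecting all traffic during an outage.
def is_allowed(check_bucket, *args):
    """check_bucket is any callable that consults Redis and returns a bool."""
    try:
        return check_bucket(*args)
    except (ConnectionError, TimeoutError):
        # Redis outage: prioritise availability over strict enforcement.
        return True

def redis_down(*_):
    raise ConnectionError("redis unreachable")

print(is_allowed(redis_down))  # → True: traffic passes during the outage
```

The trade-off is deliberate: during a Redis outage clients may briefly exceed their limits, but no legitimate traffic is dropped.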
As the single entry point, NGINX proxies external traffic across the internal Docker network, using round-robin distribution to balance connections across all active RateGuard gateway replicas.
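Round-robin distribution simply rotates through the replica pool, one connection at a time. A minimal Python model (the replica names are hypothetical; the real rotation happens inside NGINX):

```python
from itertools import cycle

# Each new connection goes to the next replica in a fixed rotation.
replicas = ["rateguard_1", "rateguard_2", "rateguard_3"]
rotation = cycle(replicas)

assignments = [next(rotation) for _ in range(6)]
print(assignments)  # → each replica receives exactly 2 of the 6 connections
```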
Custom Python CLI scripts are provided to simulate concurrent traffic bursts and sequential load distribution. These scripts validate that the RateGuard token bucket correctly throttles excess requests (HTTP 429) and that NGINX successfully round-robins traffic across all backend replicas.
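The burst scenario can be sketched without the running stack. This dependency-free model (constants and names are illustrative, not the repo's code) fires concurrent "requests" at a shared bucket and counts 200s versus 429s, which is the property `burst_test.py` checks against the live gateway:

```python
import threading

CAPACITY = 10
tokens = CAPACITY
lock = threading.Lock()
statuses = []

def hit():
    global tokens
    with lock:                      # stands in for the atomic Lua script
        if tokens >= 1:
            tokens -= 1
            statuses.append(200)    # request allowed
        else:
            statuses.append(429)    # request throttled

# Fire 25 concurrent requests at a bucket with capacity 10.
threads = [threading.Thread(target=hit) for _ in range(25)]
for t in threads: t.start()
for t in threads: t.join()

print(statuses.count(200), statuses.count(429))  # → 10 15
```

Exactly `CAPACITY` requests succeed regardless of thread interleaving, because the check-and-decrement is serialized, mirroring what the Lua script guarantees in Redis.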
```mermaid
sequenceDiagram
    participant User as Client
    participant Nginx as NGINX Load Balancer
    participant Gateway as RateGuard Gateway
    participant Redis as Redis Cache
    participant Backend as Backend Service
    User->>Nginx: 1. HTTP Request (e.g., GET /api/data)
    Nginx->>Gateway: 2. Round-robin to gateway replica
    Gateway->>Gateway: 3. Extract identifier (IP/API key) & endpoint
    Gateway->>Redis: 4. Look up bucket (rate_limit:{user}:{path})
    Redis-->>Gateway: Return current tokens & last refill
    Gateway->>Gateway: Calculate token refill from elapsed time
    alt Tokens < 1 (Rate Limited)
        Gateway-->>User: 5. Return HTTP 429 Too Many Requests immediately
    else Tokens >= 1 (Allowed)
        Gateway->>Redis: 5. Persist updated token count & timestamp
        Gateway->>Backend: 6. Forward request downstream
        Backend-->>Gateway: 7. Logic processed & response generated
        Gateway-->>Gateway: 8. Middleware logs metrics (optional)
        Gateway-->>User: 9. Return HTTP 200 with JSON payload
    end
```
The Client sends an initial HTTP request to a target path (e.g., GET /api/data).
NGINX receives the external request and selects a RateGuard gateway instance using internal Docker DNS round-robin resolution.
The selected RateGuard instance intercepts the request, extracting the client identifier (the client IP parsed from headers, or an API key) along with the requested endpoint.
The `check_rate_limit` middleware then runs:
- It builds the bucket key: `rate_limit:{user_id}:{endpoint}`.
- It queries Redis for the bucket state (token count & last-refill timestamp).
- It calculates how many tokens to regenerate from the elapsed time delta.
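The key scheme described above can be captured in a one-line helper. The function name is hypothetical; the actual helper in `middleware.py` may be named differently:

```python
# Hypothetical helper mirroring the rate_limit:{user_id}:{endpoint} scheme.
# Scoping the bucket per user AND per endpoint means limits on one route
# cannot starve a client's access to another route.
def build_bucket_key(user_id: str, endpoint: str) -> str:
    return f"rate_limit:{user_id}:{endpoint}"

print(build_bucket_key("alice", "/api/data"))  # → rate_limit:alice:/api/data
```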
The gateway branches on the newly calculated token count:
- If capacity is exhausted (< 1 token): RateGuard short-circuits the proxy chain and immediately returns a `429 Too Many Requests` JSON response.
- If authorized (≥ 1 token): the gateway decrements exactly one token and forwards the request downstream.
The authorized request is forwarded to a backend service replica, which processes it and generates the application payload (e.g., {"message": "Hello from backend 2"}).
The backend returns the JSON response to the originating RateGuard gateway, where the middleware can capture it for metrics logging or validation.
RateGuard then relays the identical HTTP response back to the original client with an HTTP 200 OK status.
In parallel, Redis has already persisted the decremented token count, ready for the next request in the sequence.
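The forwarding leg of this flow can be demonstrated end to end with only the standard library (the repo's `proxy.py` uses the Requests package instead; the handler, payload, and port below are stand-ins, not the repo's actual code):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# A mock backend that answers like the repo's backend/app.py might.
class MockBackend(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"message": "Hello from backend 2"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):   # silence per-request logging
        pass

# Bind to an ephemeral port and serve in the background.
server = ThreadingHTTPServer(("127.0.0.1", 0), MockBackend)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The gateway forwards downstream and relays the identical payload upstream.
with urlopen(f"http://127.0.0.1:{server.server_address[1]}/api/info") as resp:
    payload = json.load(resp)
server.shutdown()
print(payload)  # → {'message': 'Hello from backend 2'}
```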
To run the stack, your local environment requires:
- Docker Engine (v24.0 or newer)
- Docker Compose (v2.0 or newer)
- Port `8080` free of existing bindings
The entire stack provisions out of the box with Docker Compose:
- Clone the repository locally.
- Verify the Docker Engine is running.
- Build and start the containers from the repository root:

```
docker compose up --build -d
```

- Access the backend API through the edge proxy:

```
http://localhost:8080/api/info
```
Simulates aggressive concurrent requests to test the gateway's burst limit.

```
python tests/burst_test.py
```

Outputs a color-coded log marking which threads claimed tokens and which requests were throttled.
Sends sequential requests to observe how routing rotates responses across the backend nodes.

```
python tests/load_balance_test.py
```

Prints the distinct hostnames returned by the Docker backend replicas, validating the traffic distribution.
Below are the screenshots and outputs captured during development and testing.


