CodeArena is a distributed, real-time competitive programming platform designed for high concurrency and fault tolerance. It enables users to participate in live coding contests, solving algorithmic problems against the clock.
Core Problem: Evaluating untrusted user-submitted code in real-time, under heavy load, while maintaining a synchronized live leaderboard for thousands of concurrent participants.
Non-Trivial Aspects:
- Safe Code Execution: Running untrusted code requires isolated sandboxing (Docker).
- Race Conditions: Handling concurrent submissions and score updates without data corruption.
- Real-Time Consistency: Ensuring the leaderboard reflects the true state of the contest instantaneously across all connected clients.
- Distributed State: Managing state across API nodes, Workers, Redis, and Postgres.
The system is composed of decoupled microservices to ensure scalability and isolation of concerns.
```
[ User Browser ]
      | (HTTP / SSE)
      v
[ API Service ]  <-- Authenticates, Validates, Enqueues
      |
      +--------------+
      |              |
      v              v
[ PostgreSQL ]   [ Redis ]
(Source of Truth) (Queue, PubSub, Live State)
                     ^
                     |
             [ Worker Service ]
  (Claims Jobs, Executes in Docker, Scores)
```
Roles:
- API Service: Handles HTTP requests, authentication (JWT), and real-time updates via Server-Sent Events (SSE). It acts as the gatekeeper, pushing valid submissions to the job queue.
- Worker Service: A stateless consumer that claims jobs from Redis, executes code in ephemeral Docker containers, calculates scores, and persists results.
- Redis: In-memory store used for the job queue (`submissions`), distributed locking (`processing:*`), idempotency checks (`processed:*`), real-time event streaming (`events:*`), and live ranking (Sorted Sets).
- PostgreSQL: The persistent source of truth for Users, Contests, Problems, Submissions, and historical Leaderboard Events.
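As a minimal sketch of the API-side bridging role: a worker progress message pulled off Redis pub/sub has to be framed for the browser's `EventSource`. The helper name `formatSseEvent` is illustrative, not part of the actual codebase.

```javascript
// Hypothetical helper: frame a pub/sub payload as an SSE event.
// An SSE frame is "event: <name>\ndata: <json>\n\n"; the blank line
// terminates the frame so the browser's EventSource fires a handler.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Example: relaying a worker progress message to the client.
const frame = formatSseEvent('TEST_RESULT', { test: 3, total: 10, passed: true });
console.log(frame);
```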
- Submission: User POSTs code to `/api/submit`. The API validates the request (auth, file size) and the existence of the contest/problem.
- Enqueue: The API pushes a JSON job payload onto the Redis list `submissions`.
- Claim: A worker pops the job (`BRPOP`) and attempts to acquire a distributed lock (`SET processing:<id> ... NX`).
- Idempotency Check: The worker checks whether `processed:<id>` exists. If so, it acks and drops the duplicate.
- Execution: The worker spins up a Docker container (python/node), mounts the user's code and test inputs, and runs the solution with strict timeouts.
- Scoring: Output is compared against expected results. Penalties are calculated based on wrong attempts.
- Persist & PubSub:
  - Results are saved to Postgres (`submission_results`, `contest_penalties`).
  - Live events (`TEST_RESULT`, `FINISHED`) are published to Redis PubSub for the frontend.
  - The leaderboard ZSET in Redis is updated (`ZINCRBY`).
- Completion: The job is marked `processed:<id>` and the distributed lock is released.
```mermaid
sequenceDiagram
    participant User
    participant API
    participant Redis
    participant Worker
    participant PG as PostgreSQL
    User->>API: POST /submit (code)
    API->>PG: Validate & Create Submission Record
    API->>Redis: LPUSH "submissions" (Job Payload)
    API-->>User: 202 Accepted (Pending)
    loop Polling / SSE
        User->>API: Listening for updates...
    end
    Worker->>Redis: BRPOP "submissions"
    Redis-->>Worker: Job Payload
    Worker->>Redis: SET "processing:id" NX (Acquire Lock)
    activate Worker
    Worker->>Worker: Run Docker Container (Sandboxed)
    Worker->>Redis: PUBLISH "events:id" (Test Results)
    Redis-->>API: Stream to User
    Worker->>Worker: Calculate Score & Penalties
    Worker->>PG: INSERT submission_results
    Worker->>PG: INSERT contest_penalties (if incorrect)
    alt If Valid & Not Frozen
        Worker->>Redis: ZINCRBY "leaderboard:id" (Update Rank)
        Worker->>PG: INSERT leaderboard_events
    end
    Worker->>Redis: PUBLISH "events:id" (Finished)
    Worker->>Redis: DEL "processing:id"
    deactivate Worker
```
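The claim -> lock -> idempotency -> execute flow above can be sketched in a few lines. This is an illustrative sketch with an in-memory stand-in for Redis; the function and object names (`claimAndRun`, `store`) are assumptions, not the real worker's API.

```javascript
// In-memory stand-in for the Redis primitives the worker uses.
const store = {
  queue: [],        // stands in for the "submissions" list
  keys: new Map(),  // stands in for SET/GET keys
  setNx(key) {      // SET key 1 NX -> true only if the key was absent
    if (this.keys.has(key)) return false;
    this.keys.set(key, 1);
    return true;
  },
};

function claimAndRun(executeFn) {
  const job = store.queue.shift();                            // BRPOP equivalent
  if (!job) return 'empty';
  if (!store.setNx(`processing:${job.id}`)) return 'locked';  // lock held elsewhere
  if (store.keys.has(`processed:${job.id}`)) {                // idempotency tombstone
    store.keys.delete(`processing:${job.id}`);
    return 'duplicate';
  }
  const verdict = executeFn(job);            // would run inside a Docker sandbox
  store.keys.set(`processed:${job.id}`, 1);  // mark done (tombstone)
  store.keys.delete(`processing:${job.id}`); // release the lock
  return verdict;
}

// A duplicated queue message is dropped on the second claim:
store.queue.push({ id: 42 }, { id: 42 });
const first = claimAndRun(() => 'ACCEPTED');
const second = claimAndRun(() => 'ACCEPTED');
console.log(first, second); // ACCEPTED duplicate
```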
Redis is the high-performance backbone enabling real-time features.
- Job Queue (`submissions` list): Buffers incoming submissions. `BRPOP` lets workers block until work is available, avoiding idle CPU polling loops.
- Processing Locks (`processing:<id>` key): Prevent multiple workers from executing the same submission if a queue message is duplicated. Implemented via `SET resource val NX EX 60`.
- Idempotency (`processed:<id>` key): Acts as a tombstone for completed jobs, ensuring "exactly-once" processing effects even with "at-least-once" message delivery.
- Pub/Sub (`events:<id>` channel): Enables decoupling. The worker publishes granular progress (e.g., "Test Case 3/10 Passed"), which the API service streams to the specific user via SSE.
- Live Leaderboard (`leaderboard:<contest_id>` sorted set): Leverages Redis `ZSET` commands (`ZREVRANGE`, `ZINCRBY`) to provide O(log N) rank queries, essential for live dashboards with thousands of users.
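The `SET ... NX EX` lock semantics can be modeled in a few lines. This is a sketch, not the real implementation: it replaces Redis with an in-memory map, and `now` is passed explicitly so expiry is observable without waiting.

```javascript
// Model of SET key val NX EX <ttl>: NX refuses if a live key exists,
// EX gives the lock a self-expiring TTL so a crashed worker cannot
// hold it forever.
const locks = new Map(); // key -> expiry timestamp (ms)

function acquireLock(key, ttlSec, now = Date.now()) {
  const expiry = locks.get(key);
  if (expiry !== undefined && expiry > now) return false; // NX: key still live
  locks.set(key, now + ttlSec * 1000);                    // EX: auto-expire
  return true;
}

const t0 = 1_000_000;
const a = acquireLock('processing:7', 60, t0);          // first worker wins
const b = acquireLock('processing:7', 60, t0 + 1000);   // second worker is blocked
const c = acquireLock('processing:7', 60, t0 + 61_000); // TTL elapsed after a crash
console.log(a, b, c); // true false true
```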
While Redis handles transient state and speed, Postgres ensures data durability and auditability.
Key Schema:
- `contests`: Defines start/end/freeze times.
- `submissions`: The raw, immutable record of user code.
- `submission_results`: Derived outcome (verdict, score, execution time).
- `contest_penalties`: Explicit tracking of wrong attempts per user/contest for tie-breaking logic.
- `leaderboard_events`: An append-only log of every score change. This allows reconstructing the leaderboard state at any point in time (time-travel debugging).
Constraints:
- Foreign Keys ensure data integrity (e.g., a result cannot exist without a submission).
- `ON CONFLICT` clauses handle race conditions during concurrent inserts for penalties or results.
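A hypothetical penalty upsert illustrates the pattern. The column names here (`contest_id`, `user_id`, `problem_id`, `wrong_attempts`) are assumptions for illustration, not taken from the actual schema:

```sql
-- ON CONFLICT makes the increment atomic inside Postgres, so two
-- concurrent wrong submissions cannot lose an update the way a
-- read-modify-write from application code could.
INSERT INTO contest_penalties (contest_id, user_id, problem_id, wrong_attempts)
VALUES ($1, $2, $3, 1)
ON CONFLICT (contest_id, user_id, problem_id)
DO UPDATE SET wrong_attempts = contest_penalties.wrong_attempts + 1;
```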
- Duplicate Job Delivery: Handled via the `processed:<id>` Redis key. If a job is re-delivered, the worker checks this key and skips execution.
- Worker Crashes: If a worker crashes mid-execution, the `processing:<id>` lock eventually expires (TTL). The job remains in the queue (if using reliable queues) or can be re-queued. The current implementation relies on Redis persistence for the queue state.
- Concurrency Control: Database updates use atomic increments and upserts (`ON CONFLICT DO UPDATE`) to safely increment penalties or update scores without read-modify-write race conditions.
- Leaderboard Consistency: The Redis leaderboard is updated after the Postgres transaction succeeds, ensuring the visible rank reflects the durable state.
Lifecycle:
- RUNNING: Normal submission and scoring. Leaderboard updates are live.
- FROZEN: Submissions are accepted and scored, but the public Redis leaderboard is not updated, hiding the final standings until the post-contest reveal.
- ENDED: New submissions are rejected (or marked as practice).
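The state gate reduces to two time comparisons. A minimal sketch, assuming illustrative field names (`freezeAt`, `endsAt`) rather than the real schema:

```javascript
// Derive the contest phase from the clock. Leaderboard writes are
// allowed only while RUNNING; FROZEN still scores submissions but
// suppresses public rank updates.
function contestState(contest, now) {
  if (now >= contest.endsAt) return 'ENDED';
  if (now >= contest.freezeAt) return 'FROZEN';
  return 'RUNNING';
}

const c = { freezeAt: 90, endsAt: 120 }; // minutes from contest start
const s1 = contestState(c, 30);
const s2 = contestState(c, 100);
const s3 = contestState(c, 130);
console.log(s1, s2, s3); // RUNNING FROZEN ENDED
```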
Scoring Model:
- Score = sum of points for accepted problems.
- Tie-breaker = penalty count (wrong submissions).
- Partial scoring logic awards credit for passing specific test cases.
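The ranking rule (score descending, penalties ascending on ties) is a two-key comparator. A sketch with an assumed row shape:

```javascript
// Order rows by score DESC, then penalty ASC as the tie-breaker.
function compareRows(a, b) {
  if (b.score !== a.score) return b.score - a.score; // higher score first
  return a.penalty - b.penalty;                      // fewer penalties wins ties
}

const rows = [
  { user: 'ada', score: 300, penalty: 2 },
  { user: 'bob', score: 300, penalty: 1 },
  { user: 'cyd', score: 400, penalty: 5 },
];
const ranked = rows.sort(compareRows).map(r => r.user);
console.log(ranked); // [ 'cyd', 'bob', 'ada' ]
```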
Why Redis?
- Calculating ranks on the fly in SQL (`ORDER BY score DESC, penalty ASC`) is expensive (O(N log N)) for every page view.
- Redis ZSETs maintain order automatically, allowing O(log N) retrieval of top users or a specific user's rank.

Why Postgres Events?
- The `leaderboard_events` table stores each delta (e.g., +100 points). This creates an audit log. If Redis data is lost, the entire leaderboard can be replayed/rebuilt from this table.
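The rebuild-from-events idea can be sketched as a fold over the delta log. The event shape (`{ userId, delta }`) is assumed for illustration:

```javascript
// Replay leaderboard_events deltas to reconstruct the leaderboard,
// equivalent to re-running ZINCRBY for every logged score change.
function rebuildLeaderboard(events) {
  const scores = new Map();
  for (const { userId, delta } of events) {
    scores.set(userId, (scores.get(userId) ?? 0) + delta);
  }
  // Equivalent of ZREVRANGE: highest score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const events = [
  { userId: 'u1', delta: 100 },
  { userId: 'u2', delta: 100 },
  { userId: 'u1', delta: 50 },
];
const board = rebuildLeaderboard(events);
console.log(board); // [ [ 'u1', 150 ], [ 'u2', 100 ] ]
```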
Services are containerized for consistent behavior across dev and production.
Components:
- `api`: Node.js/Express
- `worker`: Node.js + Docker client (spawns sibling containers)
- `postgres` & `redis`: Standard official images

Setup:
- Initialize: `./setup.sh` (checks deps, starts DBs, applies schema, seeds data).
- Build & Run: `docker-compose up --build`.
- Distributed Systems: Decoupled producer-consumer architecture.
- Concurrency Patterns: Distributed locking and idempotent processing.
- Database Design: Normalized schema with history tables for auditability.
- Real-Time UX: Bridging backend events to frontend streams (SSE).
- Reliability Engineering: Designing for failure (crashes, race conditions).
- Kubernetes (K8s) Deployment: Better orchestration of worker monitoring and auto-scaling based on queue depth.
- AST-Based Plagiarism Detection: Integrating tools like `Moss` or `JPlag` as a separate analysis pipeline on the `submissions` table.
- Multi-Region Workers: Deploying workers closer to users (edge) to reduce code upload latency, synchronized via global Redis.
- More Language Runtimes: Supporting additional languages (Java, C++, Rust, Go) by adding new Docker images for the executor.