
CodeArena

1. Project Overview

CodeArena is a distributed, real-time competitive programming platform designed for high concurrency and fault tolerance. It enables users to participate in live coding contests, solving algorithmic problems against the clock.

Core Problem: Evaluating untrusted user-submitted code in real-time, under heavy load, while maintaining a synchronized live leaderboard for thousands of concurrent participants.

Non-Trivial Aspects:

  • Safe Code Execution: Running untrusted code requires isolated sandboxing (Docker).
  • Race Conditions: Handling concurrent submissions and score updates without data corruption.
  • Real-Time Consistency: Ensuring the leaderboard reflects the true state of the contest instantaneously across all connected clients.
  • Distributed State: Managing state across API nodes, Workers, Redis, and Postgres.

2. High-Level Architecture

The system is composed of decoupled microservices to ensure scalability and isolation of concerns.

```
[ User Browser ]
       | (HTTP / SSE)
       v
[   API Service   ]  <-- Authenticates, Validates, Enqueues
       |
       +--------------+
       |              |
       v              v
[ PostgreSQL ]    [ Redis ]
(Source of Truth) (Queue, PubSub, Live State)
                      ^
                      |
[ Worker Service ] ---+
(Claims Jobs, Executes in Docker, Scores)
```

Roles:

  • API Service: Handles HTTP requests, authentication (JWT), and real-time updates via Server-Sent Events (SSE). It acts as the gatekeeper, pushing valid submissions to the job queue.
  • Worker Service: A stateless consumer that claims jobs from Redis, executes code in ephemeral Docker containers, calculates scores, and persists results.
  • Redis: In-memory data store used for the job queue (submissions), distributed locking (processing:*), idempotency checks (processed:*), real-time event streaming (events:*), and live ranking (Sorted Sets).
  • PostgreSQL: The persistent source of truth for Users, Contests, Problems, Submissions, and historical Leaderboard Events.
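The API's gatekeeper role boils down to serializing a job and pushing it onto the queue. A minimal sketch in Node.js; the payload field names and the node-redis `lPush` call in the comment are illustrative assumptions, not the actual schema:

```javascript
// Sketch: shape of the job payload the API enqueues after validation.
// Field names here are assumptions for illustration.
function buildJobPayload({ submissionId, userId, contestId, problemId, language, code }) {
  return JSON.stringify({
    submissionId,
    userId,
    contestId,
    problemId,
    language,          // e.g. "python" or "node"
    code,              // raw source, size-checked by the API
    enqueuedAt: Date.now(),
  });
}

// The API would then enqueue it, e.g. with node-redis v4:
//   await redis.lPush('submissions', buildJobPayload(job));
```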

3. Execution Flow (End-to-End)

  1. Submission: User POSTs code to /api/submit. API validates the request (auth, file size) and existence of the contest/problem.
  2. Enqueue: API pushes a job payload JSON to the Redis List submissions.
  3. Claim: An available Worker pops the job (BRPOP) and attempts to acquire a distributed lock (SET processing:<id> ... NX).
  4. Idempotency Check: Worker checks if processed:<id> exists. If so, it acks and drops the duplicate.
  5. Execution: Worker spins up a Docker container (python/node), mounts the user's code and test inputs, and runs the solution with strict timeouts.
  6. Scoring: Output is compared against expected results. Penalties are calculated based on wrong attempts.
  7. Persist & PubSub:
    • Results are saved to Postgres (submission_results, contest_penalties).
    • Live events (TEST_RESULT, FINISHED) are published to Redis PubSub for the frontend.
    • Leaderboard ZSET in Redis is updated (ZINCRBY).
  8. Completion: Job is marked processed:<id> and the distributed lock is released.
```mermaid
sequenceDiagram
    participant User
    participant API
    participant Redis
    participant Worker
    participant PG as PostgreSQL

    User->>API: POST /submit (code)
    API->>PG: Validate & Create Submission Record
    API->>Redis: LPUSH "submissions" (Job Payload)
    API-->>User: 202 Accepted (Pending)

    loop Polling / SSE
        User->>API: Listening for updates...
    end

    Worker->>Redis: BRPOP "submissions"
    Redis-->>Worker: Job Payload

    Worker->>Redis: SET "processing:id" NX (Acquire Lock)
    activate Worker

    Worker->>Worker: Run Docker Container (Sandboxed)
    Worker->>Redis: PUBLISH "events:id" (Test Results)
    Redis-->>API: Stream to User

    Worker->>Worker: Calculate Score & Penalties

    Worker->>PG: INSERT submission_results
    Worker->>PG: INSERT contest_penalties (if incorrect)

    alt If Valid & Not Frozen
        Worker->>Redis: ZINCRBY "leaderboard:id" (Update Rank)
        Worker->>PG: INSERT leaderboard_events
    end

    Worker->>Redis: PUBLISH "events:id" (Finished)
    Worker->>Redis: DEL "processing:id"
    deactivate Worker
```
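The claim/lock/idempotency sequence (steps 3-8) can be sketched as a single worker iteration. This is a minimal sketch, assuming a client with node-redis v4 style method names (`brPop`, `set` with `NX`/`EX` options, `exists`, `publish`, `del`); `executeAndScore` stands in for the Docker sandbox run and scoring:

```javascript
// One worker iteration: claim a job, lock it, skip duplicates,
// execute, publish, mark processed, release the lock.
async function processNext(redis, executeAndScore) {
  // Step 3: block until a job is available on the "submissions" list.
  const popped = await redis.brPop('submissions', 0);
  const job = JSON.parse(popped.element);
  const id = job.submissionId;

  // Step 3: distributed lock. NX = "only if not already set";
  // EX 60 bounds the lock lifetime in case this worker crashes.
  const locked = await redis.set(`processing:${id}`, '1', { NX: true, EX: 60 });
  if (locked === null) return 'locked-elsewhere';

  try {
    // Step 4: a processed:<id> tombstone means a duplicate delivery.
    if (await redis.exists(`processed:${id}`)) return 'duplicate';

    // Steps 5-7: sandboxed execution, scoring, persistence (elided here).
    const result = await executeAndScore(job);
    await redis.publish(`events:${id}`, JSON.stringify({ type: 'FINISHED', result }));

    // Step 8: mark the job processed before releasing the lock.
    await redis.set(`processed:${id}`, '1');
    return 'done';
  } finally {
    await redis.del(`processing:${id}`);
  }
}
```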

4. Redis Usage (Deep Dive)

Redis is the high-performance backbone enabling real-time features.

  • Job Queue (submissions List):

    • Used for buffering incoming submissions. BRPOP allows workers to block until work is available, minimizing CPU idle loops.
  • Processing Locks (processing:<id> Key):

    • Prevents multiple workers from executing the same submission if a queue message is duplicated. Implemented via SET resource val NX EX 60.
  • Idempotency (processed:<id> Key):

    • Acts as a tombstone for completed jobs, ensuring "exactly-once" processing effects even with "at-least-once" message delivery.
  • Pub/Sub (events:<id> Channel):

    • Enables decoupling. The Worker publishes granular progress (e.g., "Test Case 3/10 Passed") which the API Service streams to the specific user via SSE.
  • Live Leaderboard (leaderboard:<contest_id> Sorted Set):

    • Leverages Redis ZSET commands (ZREVRANGE, ZINCRBY) to provide O(log(N)) rank queries, essential for live dashboards with thousands of users.
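The Pub/Sub-to-SSE decoupling above can be sketched from the API side. A minimal sketch assuming an Express-style handler and a node-redis v4 subscriber (`subscribe(channel, listener)`); the route parameter name is an assumption:

```javascript
// Sketch: bridge worker-published Redis events to one browser via SSE.
// `subscriber` is a dedicated Redis connection in subscribe mode.
function sseHandler(subscriber) {
  return async (req, res) => {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });
    const channel = `events:${req.params.submissionId}`;
    // Forward each published worker event to this client as an SSE frame.
    const onMessage = (message) => res.write(`data: ${message}\n\n`);
    await subscriber.subscribe(channel, onMessage);
    // Stop forwarding when the browser disconnects.
    req.on('close', () => subscriber.unsubscribe(channel));
  };
}
```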

5. PostgreSQL as Source of Truth

While Redis handles transient state and speed, Postgres ensures data durability and auditability.

Key Schema:

  • contests: Defines start/end/freeze times.
  • submissions: The raw immutable record of user code.
  • submission_results: Derived outcome (Verdict, Score, Execution Time).
  • contest_penalties: Explicit tracking of wrong attempts per user/contest for tie-breaking logic.
  • leaderboard_events: An append-only log of every score change. This allows reconstructing the leaderboard state at any point in time (Time Travel debugging).

Constraints:

  • Foreign Keys ensure data integrity (e.g., a result cannot exist without a submission).
  • ON CONFLICT clauses handle race conditions during concurrent inserts for penalties or results.
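The ON CONFLICT pattern can be sketched for the penalty counter; the table and column names below are assumptions. The conflict target turns a concurrent second insert into an atomic increment, avoiding a read-modify-write race:

```javascript
// Sketch of the penalty upsert: insert a first wrong attempt, or atomically
// bump the counter if the row already exists. Schema names are assumptions.
const PENALTY_UPSERT = `
  INSERT INTO contest_penalties (contest_id, user_id, problem_id, wrong_attempts)
  VALUES ($1, $2, $3, 1)
  ON CONFLICT (contest_id, user_id, problem_id)
  DO UPDATE SET wrong_attempts = contest_penalties.wrong_attempts + 1
`;

// With the `pg` client this would run as:
//   await pool.query(PENALTY_UPSERT, [contestId, userId, problemId]);
```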

6. Reliability & Safety Guarantees

  • Duplicate Job Delivery:

    • Handled via the processed:<id> Redis key. If a job is re-delivered, the worker checks this key and skips execution.
  • Worker Crashes:

    • If a worker crashes mid-execution, the processing:<id> lock eventually expires (TTL). The job remains in the queue (if using reliable queues) or can be re-queued. Current implementation relies on Redis persistence for the queue state.
  • Concurrency Control:

    • Database updates use atomic increments and upserts (ON CONFLICT DO UPDATE) to safely increment penalties or update scores without read-modify-write race conditions.
  • Leaderboard Consistency:

    • The Redis leaderboard is updated after the Postgres transaction succeeds, ensuring the visual rank reflects the durable state.
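The "Postgres first, Redis second" ordering can be sketched as one function. This is a minimal sketch, assuming `pg`-style `query` and node-redis v4 `zIncrBy` methods; the SQL and field names are illustrative:

```javascript
// Sketch: the public ranking only moves once the durable write has committed.
// If the transaction fails, Redis is never touched.
async function persistThenRank(pg, redis, r) {
  await pg.query('BEGIN');
  try {
    await pg.query(
      'INSERT INTO submission_results (submission_id, verdict, score) VALUES ($1, $2, $3)',
      [r.submissionId, r.verdict, r.score]
    );
    await pg.query('COMMIT');
  } catch (err) {
    await pg.query('ROLLBACK'); // nothing published on failure
    throw err;
  }
  // Only after COMMIT does the visible leaderboard change.
  await redis.zIncrBy(`leaderboard:${r.contestId}`, r.score, r.userId);
}
```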

7. Contest Logic

  • Lifecycle:

    • RUNNING: Normal submission and scoring. Leaderboard updates are live.
    • FROZEN: Submissions are accepted and scored, but the public Redis Leaderboard is not updated. This hides the final standings until the post-contest reveal.
    • ENDED: New submissions are rejected (or marked as practice).
  • Scoring Model:

    • Score = Sum of points for accepted problems.
    • Tie-Breaker = Penalties (Wrong submission count).
    • Partial Scoring logic handles passing specific test cases for partial credit.
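The scoring model above reduces to a two-key sort: score descending, then penalties ascending as the tie-breaker. A minimal sketch; the row field names are assumptions:

```javascript
// Sketch of the standings order: higher score wins, ties broken by
// fewer penalties (wrong-submission count).
function rankStandings(rows) {
  // rows: [{ user, score, penalties }]
  return [...rows].sort(
    (a, b) => b.score - a.score || a.penalties - b.penalties
  );
}
```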

8. Leaderboard Design

  • Why Redis?

    • Calculating ranks on the fly in SQL (ORDER BY score DESC, penalty ASC) is expensive (O(N log N)) for every page view.
    • Redis ZSETs maintain order automatically, allowing O(log N) retrieval of top users or a specific user's rank.
  • Why Postgres Events?

    • The leaderboard_events table stores the delta (+100 points). This creates an audit log. If Redis data is lost, the entire leaderboard can be replayed/rebuilt from this table.
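The replay idea can be sketched as a fold over the event log; the row shape (`userId`, `delta`) is an assumption:

```javascript
// Sketch: rebuild per-user totals by replaying leaderboard_events deltas.
// Each resulting entry would then be written back to the Redis ZSET
// (e.g. via ZADD or ZINCRBY).
function replayEvents(events) {
  const scores = new Map();
  for (const { userId, delta } of events) {
    scores.set(userId, (scores.get(userId) ?? 0) + delta);
  }
  return scores;
}
```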

9. Containerization & Local Setup

Services are containerized for consistent behavior across dev and production.

Components:

  • api: Node.js/Express
  • worker: Node.js + Docker client (spawns sibling containers)
  • postgres & redis: Standard official images

Setup:

  1. Initialize: ./setup.sh (Checks deps, starts DBs, applies schema, seeds data).
  2. Build & Run: docker-compose up --build.

10. What This Project Demonstrates

  • Distributed Systems: Decoupled producer-consumer architecture.
  • Concurrency Patterns: Distributed locking and idempotent processing.
  • Database Design: Normalized schema with history tables for auditability.
  • Real-Time UX: Bridging backend events to frontend streams (SSE).
  • Reliability Engineering: Designing for failure (crashes, race conditions).

11. Future Improvements

  • Kubernetes (K8s) Deployment: Better orchestration of worker monitoring and auto-scaling based on queue depth.
  • AST-Based Plagiarism Detection: Integrating tools like Moss or JPlag as a separate analysis pipeline on the submissions table.
  • Multi-Region Workers: Deploying workers closer to users (Edge) to reduce code upload latency, synchronized via global Redis.
  • Additional Language Runtimes: Adding more execution runtimes (Java, C++, Rust, Go) by adding new Docker images for the executor.
