Background Job Runner#60

Merged
charlieroth merged 6 commits into main from 20-background-job-runner
Aug 24, 2025
@charlieroth (Owner) commented Aug 24, 2025

Background Job Runner Implementation

Summary

This PR implements a complete background job runner system that enables reliable processing of deferred work outside the HTTP request path. The system provides configurable concurrency, automatic retry logic with exponential backoff, graceful shutdown, and comprehensive observability.

Addresses #20

Amp Thread: https://ampcode.com/threads/T-5b8f4b05-3ae3-4b3f-8c71-b3ad64e36d21

Changes Made

  • Feature/functionality changes
  • Bug fixes
  • Refactoring
  • Documentation updates
  • Test additions/improvements
  • Infrastructure/CI changes

Detailed Changes

  • Database Schema: Added comprehensive jobs table with state management, visibility timeouts, retry tracking, and worker coordination
  • Job Handler System: Implemented trait-based job handlers with dynamic registry for different job types
  • Worker Supervisor: Created async worker process with configurable concurrency, polling, and graceful shutdown
  • Retry Logic: Implemented exponential backoff with jitter (30s base, 2^attempt growth, ±30% jitter)
  • Concurrency Control: Added semaphore-based limiting and visibility timeouts to prevent duplicate processing
  • Example Implementation: Created ExampleJobHandler and demo program for testing
  • Comprehensive Testing: Added unit tests for backoff/registry and integration tests for full job lifecycle
  • Documentation: Updated AGENTS.md with new worker commands and usage instructions
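
The retry parameters listed above (30s base, 2^attempt growth, ±30% jitter) can be sketched as a small pure function. This is a minimal illustration, not the code from this PR: the name `backoff_delay`, the 0-based `attempt` numbering, the exponent cap, and passing the jitter sample in as a parameter (rather than drawing it from an RNG inside the function) are all assumptions made to keep the example self-contained.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before the retry that follows attempt number
/// `attempt` (assumed 0-based). `jitter_unit` is a sample in [-1.0, 1.0]
/// that a caller would normally draw from a random number generator.
pub fn backoff_delay(base_secs: u64, attempt: u32, jitter_unit: f64) -> Duration {
    // Cap the exponent so 2^attempt cannot overflow; the cap is an assumption.
    let exp = attempt.min(20);
    let raw = base_secs.saturating_mul(1u64 << exp) as f64;

    // Treat non-finite samples as "no jitter", otherwise clamp to [-1, 1].
    let j = if jitter_unit.is_finite() {
        jitter_unit.clamp(-1.0, 1.0)
    } else {
        0.0
    };

    // Scale by a factor in [0.7, 1.3], i.e. ±30% around the raw delay.
    Duration::from_secs_f64(raw * (1.0 + 0.3 * j))
}
```

With a 30s base and no jitter, attempts 0, 1, 2, 3 give 30s, 60s, 120s, 240s; jitter then moves each value by at most 30% in either direction, so retries from many failed jobs do not all land at the same instant.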

Testing

  • All existing tests pass (make test)
  • New tests added for new functionality
  • Manual testing completed
  • Edge cases considered and tested

Test Commands Run

# Full test suite
make test

# Specific job runner tests
cargo test jobs::backoff::tests
cargo test jobs::registry::tests  
cargo test --test job_integration

# Build verification
cargo build --bin worker --example job_runner_demo

Test Coverage Added:

  • Unit tests: backoff algorithm edge cases, job registry functionality
  • Integration tests: job enqueue/fetch, state transitions, retry logic, visibility timeouts, concurrent processing
  • 6 comprehensive integration tests covering full job lifecycle
  • All tests run against real PostgreSQL database using #[sqlx::test]

Code Quality

  • Code follows project style guidelines (make fmt)
  • No linting errors (make lint)
  • Full check passes (make check)
  • Code is well-documented where necessary
  • No security vulnerabilities introduced

Database Changes

  • No database changes
  • Migration scripts included
  • make prepare run after schema changes
  • Backward compatibility maintained

Migration Details:

  • New migration: 20250824104647_update_jobs_table.up.sql
  • Updates existing job_status enum: 'done' → 'succeeded'
  • Adds columns: payload, max_attempts, backoff_seconds, visibility_till, reserved_by, created_at, updated_at
  • Converts kind from enum to text for extensibility
  • Adds indexes for efficient job polling and management
  • Includes proper rollback in down migration
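
The `visibility_till` and `reserved_by` columns suggest a claim protocol in which a worker reserves a job for a limited time, and the job becomes claimable again if that time passes without completion. The sketch below models that idea in memory only. It is an assumption about the semantics, not the query used in this PR: `JobRow`, `claim_next`, and the use of plain integer timestamps are hypothetical, and the real implementation would perform the claim atomically in PostgreSQL.

```rust
/// Hypothetical in-memory stand-in for a row of the jobs table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JobRow {
    pub id: u64,
    /// Unix seconds until which the job is hidden from other workers;
    /// `None` means it has never been claimed.
    pub visibility_till: Option<u64>,
    /// Identifier of the worker holding the reservation, if any.
    pub reserved_by: Option<String>,
}

/// Claim the first job that is unreserved or whose reservation has expired.
/// Returns the claimed job id, or `None` if nothing is claimable at `now`.
pub fn claim_next(
    jobs: &mut [JobRow],
    worker_id: &str,
    now: u64,
    timeout_secs: u64,
) -> Option<u64> {
    let job = jobs
        .iter_mut()
        .find(|j| j.visibility_till.map_or(true, |t| t <= now))?;
    // Hide the job from other workers until the timeout elapses.
    job.visibility_till = Some(now + timeout_secs);
    job.reserved_by = Some(worker_id.to_string());
    Some(job.id)
}
```

In this model a second worker polling during the timeout window sees nothing to claim, while a worker polling after the window can take over a job whose original worker crashed.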

Breaking Changes

  • No breaking changes
  • Breaking changes documented below

Breaking Changes Details

Database Migration Required:

  • Must run make db-migrate before starting workers
  • Changes job_status enum values (renames 'done' to 'succeeded')
  • Updates job table structure with new required columns

Job Entity Changes:

  • Job entity updated with new fields matching database schema
  • Old job enum types removed in favor of flexible string-based job kinds

Deployment Notes

  • No special deployment considerations
  • Environment variables need to be updated
  • Dependencies need to be updated
  • Special deployment steps required (documented below)

Special Deployment Steps

  1. Database Migration: Run make db-migrate to apply schema changes

  2. Worker Configuration: Optional environment variables for worker tuning:

    • WORKER_CONCURRENCY=4 (default: 4)
    • WORKER_POLL_INTERVAL_MS=1000 (default: 1000ms)
    • WORKER_VISIBILITY_TIMEOUT_SECS=300 (default: 5 minutes)
    • WORKER_BASE_BACKOFF_SECS=30 (default: 30s)
  3. Worker Deployment:

    • Start worker process: cargo run --bin worker
    • Can run multiple worker instances for horizontal scaling
    • Each worker gets unique UUID for coordination
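
One way the worker tuning variables above could be read, with the listed defaults applied when a variable is unset or unparsable, is sketched here. The struct `WorkerConfig` and its methods are hypothetical names for illustration; only the variable names and default values come from this description. The lookup is passed in as a closure so the parsing logic can be exercised without touching the process environment.

```rust
use std::time::Duration;

/// Hypothetical container for the worker settings listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkerConfig {
    pub concurrency: usize,
    pub poll_interval: Duration,
    pub visibility_timeout: Duration,
    pub base_backoff: Duration,
}

impl WorkerConfig {
    /// Build a config from any key lookup, falling back to the documented
    /// default when a key is missing or does not parse as an integer.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: u64| -> u64 {
            lookup(key)
                .and_then(|v| v.trim().parse().ok())
                .unwrap_or(default)
        };
        WorkerConfig {
            concurrency: get("WORKER_CONCURRENCY", 4) as usize,
            poll_interval: Duration::from_millis(get("WORKER_POLL_INTERVAL_MS", 1000)),
            visibility_timeout: Duration::from_secs(get("WORKER_VISIBILITY_TIMEOUT_SECS", 300)),
            base_backoff: Duration::from_secs(get("WORKER_BASE_BACKOFF_SECS", 30)),
        }
    }

    /// Read the same settings from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Silently falling back to defaults on a malformed value is a design choice of this sketch; a stricter variant could return an error instead so that typos in deployment configuration are caught at startup.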

Documentation

  • No documentation changes needed
  • README updated
  • API documentation updated
  • Contributing guidelines updated
  • Other documentation updated (specify below)

Documentation Updates:

  • Updated AGENTS.md with new commands:
    • cargo run --bin worker (starts background job processor)
    • cargo run --example job_runner_demo (enqueues example jobs)

Architecture Overview

Core Components:

graph TD
    A[Producer<br/>API/cron] --> B[(jobs table<br/>PostgreSQL)]
    B --> C[Worker Pool N<br/>1 task = 1 job]
    B --> D[Fetcher<br/>polling]
    D --> C
    
    classDef producer fill:#2d3748,stroke:#4a5568,color:#fff
    classDef database fill:#1a365d,stroke:#2b6cb0,color:#fff  
    classDef worker fill:#2d5016,stroke:#38a169,color:#fff
    classDef fetcher fill:#744210,stroke:#d69e2e,color:#fff
    
    class A producer
    class B database
    class C worker
    class D fetcher

Job State Machine:

stateDiagram-v2
    [*] --> queued
    queued --> running : worker picks up job
    running --> succeeded : job completes successfully
    running --> failed_retry : job fails (attempts < max)
    failed_retry --> queued : retry with backoff
    running --> failed : job fails (attempts >= max)
    failed --> [*]
    succeeded --> [*]
    
    state failed_retry {
        [*] --> calculating_backoff
        calculating_backoff --> scheduled_retry
        scheduled_retry --> [*]
    }
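
The transitions in the state diagram can be expressed as a small function over an enum. This is a simplified sketch under stated assumptions: the names `JobState`, `Outcome`, and `after_run` are hypothetical, `attempts` is assumed to count the run that just finished, and the intermediate `failed_retry` step is collapsed into a direct return to `Queued` (the backoff delay would be applied when rescheduling).

```rust
/// Hypothetical mirror of the persisted job states in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    Queued,
    Running,
    Succeeded,
    Failed,
}

/// Result reported by a handler for one run of a job.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Failure,
}

/// State a running job moves to once its handler returns.
/// `attempts` is assumed to include the run that just finished.
pub fn after_run(outcome: Outcome, attempts: u32, max_attempts: u32) -> JobState {
    match outcome {
        Outcome::Success => JobState::Succeeded,
        // Attempts remain: requeue, to be retried after the backoff delay.
        Outcome::Failure if attempts < max_attempts => JobState::Queued,
        // Attempts exhausted: terminal failure.
        Outcome::Failure => JobState::Failed,
    }
}
```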

Key Features:

  • Reliability: Visibility timeouts prevent duplicate work, ACID transactions ensure consistency
  • Scalability: Configurable concurrency, horizontal scaling via multiple worker instances
  • Observability: Structured logging with correlation IDs, job lifecycle tracking
  • Flexibility: Trait-based handlers allow easy extension for new job types
  • Resilience: Exponential backoff with jitter, configurable max attempts, graceful shutdown
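
To illustrate how trait-based handlers with a registry allow new job types to be added without touching the dispatch code, here is a minimal synchronous sketch. It is not the API from this PR: the trait shape, the `JobRegistry` methods, the string payload, and `EchoHandler` are all hypothetical, and the real handlers are presumably async.

```rust
use std::collections::HashMap;

/// Hypothetical handler trait: one implementation per job kind.
pub trait JobHandler {
    /// The job kind string this handler is registered under.
    fn kind(&self) -> &'static str;
    /// Process one job payload; an `Err` would trigger the retry path.
    fn handle(&self, payload: &str) -> Result<(), String>;
}

/// Hypothetical registry mapping job kind strings to handlers.
pub struct JobRegistry {
    handlers: HashMap<&'static str, Box<dyn JobHandler>>,
}

impl JobRegistry {
    pub fn new() -> Self {
        JobRegistry { handlers: HashMap::new() }
    }

    /// Register a handler under the kind it reports.
    pub fn register(&mut self, handler: Box<dyn JobHandler>) {
        self.handlers.insert(handler.kind(), handler);
    }

    /// Look up the handler for `kind` and run it on `payload`.
    pub fn dispatch(&self, kind: &str, payload: &str) -> Result<(), String> {
        match self.handlers.get(kind) {
            Some(handler) => handler.handle(payload),
            None => Err(format!("no handler registered for job kind '{kind}'")),
        }
    }
}

/// Illustrative handler that accepts any non-empty payload.
pub struct EchoHandler;

impl JobHandler for EchoHandler {
    fn kind(&self) -> &'static str {
        "echo"
    }

    fn handle(&self, payload: &str) -> Result<(), String> {
        if payload.is_empty() {
            Err("empty payload".to_string())
        } else {
            Ok(())
        }
    }
}
```

Adding a job type in this model means implementing `JobHandler` and calling `register` once at startup; because kinds are plain strings, no central enum has to change.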

Reviewer Checklist

  • Code review completed
  • Architecture/design approved
  • Security considerations reviewed
  • Performance impact assessed
  • Documentation reviewed

Additional Notes

Future Extensibility:

  • Job registry designed for easy addition of new job types
  • Worker configuration supports runtime tuning without code changes
  • Database schema supports job scheduling, deduplication, and priority queues
  • Metrics integration points available via structured logging

Production Readiness:

  • Comprehensive error handling and logging
  • Graceful shutdown drains in-flight work
  • Database connection pooling and proper resource management
  • Full test coverage including edge cases and concurrent scenarios

Testing the Implementation:

  1. Start database: make db-up && make db-migrate
  2. Enqueue demo jobs: cargo run --example job_runner_demo
  3. Start worker: cargo run --bin worker
  4. Monitor logs to see job processing in action

Adds a production-ready job runner that executes deferred work outside of
the request path.

Key components:
- DB migration: new `jobs` table with state, timestamps, visibility timeout
- Trait-based job handlers with registry for dynamic dispatch
- Async worker supervisor supporting configurable concurrency and graceful
  shutdown
- Exponential back-off w/ jitter for automatic retries
- Visibility-timeout guard to avoid duplicate processing
- Structured logs across the entire job lifecycle
- Unit + integration tests for back-off, registry, state transitions,
  retries, and concurrency control
- Docs: AGENTS.md now lists `cargo run --bin worker` usage

Breaking change:
- Requires running `make db-migrate` before starting workers

Usage:
- Start worker: `cargo run --bin worker`
- Demo jobs: `cargo run --example job_runner_demo`

@charlieroth linked an issue Aug 24, 2025 that may be closed by this pull request
@charlieroth self-assigned this Aug 24, 2025

@charlieroth (Owner, Author) commented

Merging with failing tests; they will be fixed in another commit. The JWT tests seem to be failing randomly.

@charlieroth merged commit b9f9ebb into main Aug 24, 2025
0 of 2 checks passed