
baddonkey/pdfdiff-turbo


PDFDiff Turbo

This tool compares two PDF sets by rendering pages and performing pixel-level image diffs, highlighting even subtle visual changes across versions. It was built in relaxed “vibe coding” sessions using VS Code together with GitHub Copilot to streamline development.

PDFDiff Turbo overview

Comparison Report (PDF)

Features

  • Pixel-level visual diff: OpenCV-based comparison highlights even subtle changes
  • Interactive magnifier: Toggle zoom lens to inspect differences in detail (2.5x magnification)
  • Responsive viewer: PDFs scale dynamically to fit browser window
  • Real-time updates: WebSocket connections show live job progress
  • Navigation controls: Jump between differences with Prev/Next buttons
  • Side-by-side comparison: View Set A and Set B PDFs with synchronized scrolling
  • Job management: Track multiple comparison jobs with status updates

Architecture

  • API: FastAPI (async SQLAlchemy)
  • Worker: Celery tasks with PyMuPDF rendering + OpenCV diff
  • Storage: Shared Docker volume mounted at /data
  • UIs: Angular (viewer + admin) served by Nginx
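
As a rough illustration of the worker's diff step, the sketch below compares two already-rendered pages held as grayscale arrays. It uses plain NumPy as a stand-in for the OpenCV calls the worker relies on, and skips the PyMuPDF rendering entirely. The function name diff_pages, the default threshold, and the returned fields are assumptions for illustration, not the project's code.

```python
import numpy as np


def diff_pages(page_a: np.ndarray, page_b: np.ndarray, threshold: int = 25):
    """Compare two grayscale pages (uint8 arrays of the same shape).

    Returns (diff_score, bbox). diff_score is the fraction of pixels whose
    absolute difference exceeds `threshold`; bbox is (x0, y0, x1, y1) around
    all changed pixels, or None when nothing changed.
    """
    if page_a.shape != page_b.shape:
        raise ValueError("pages must have the same size to be diffed")

    # Widen to int16 so the subtraction cannot wrap around in uint8.
    delta = np.abs(page_a.astype(np.int16) - page_b.astype(np.int16))
    mask = delta > threshold

    if not mask.any():
        return 0.0, None

    ys, xs = np.nonzero(mask)
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
    return float(mask.mean()), bbox


# Two blank 100x80 "pages"; page B has a 10x5 dark rectangle added.
a = np.full((80, 100), 255, dtype=np.uint8)
b = a.copy()
b[20:25, 30:40] = 0

score, bbox = diff_pages(a, b)
print(score, bbox)
```

The bounding box is the kind of region an overlay could highlight; how the real worker turns differences into an overlay is not described here.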

Quick Start (Docker)

  1. Start stack:
    • docker compose up --build
  2. Run migrations:
    • docker compose exec api alembic upgrade head
  3. Seed users:
    • docker compose exec api python -m app.seed.seed_users

Defaults (override with env vars below):

  • Admin: admin@example.com / admin123
  • User: user@example.com / user123

Environment Variables

API/Worker:

  • DATABASE_URL (default set in docker-compose)
  • CELERY_BROKER_URL
  • CELERY_RESULT_BACKEND
  • JWT_SECRET
  • RENDER_DPI
  • DIFF_THRESHOLD
  • SEED_ADMIN_EMAIL, SEED_ADMIN_PASSWORD
  • SEED_USER_EMAIL, SEED_USER_PASSWORD
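
One way to read these variables in Python is sketched below. The seed defaults come from the Quick Start section; everything else is an assumption: the Settings class, the load_settings helper, the choice of which variables are treated as required, and the placeholder defaults for RENDER_DPI and DIFF_THRESHOLD (the real defaults and value types are not stated here).

```python
import os
from dataclasses import dataclass

# Variables this sketch treats as mandatory (an assumption).
REQUIRED = ("DATABASE_URL", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND", "JWT_SECRET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    celery_broker_url: str
    celery_result_backend: str
    jwt_secret: str
    render_dpi: int
    diff_threshold: float
    seed_admin_email: str
    seed_admin_password: str
    seed_user_email: str
    seed_user_password: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        celery_broker_url=env["CELERY_BROKER_URL"],
        celery_result_backend=env["CELERY_RESULT_BACKEND"],
        jwt_secret=env["JWT_SECRET"],
        render_dpi=int(env.get("RENDER_DPI", "150")),  # placeholder default
        diff_threshold=float(env.get("DIFF_THRESHOLD", "0.01")),  # placeholder default
        seed_admin_email=env.get("SEED_ADMIN_EMAIL", "admin@example.com"),
        seed_admin_password=env.get("SEED_ADMIN_PASSWORD", "admin123"),
        seed_user_email=env.get("SEED_USER_EMAIL", "user@example.com"),
        seed_user_password=env.get("SEED_USER_PASSWORD", "user123"),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests, for example load_settings({"DATABASE_URL": "...", ...}).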

Endpoints (Core)

  • Auth: /auth/register, /auth/login, /auth/refresh, /auth/logout, /auth/me
  • Jobs: /jobs, /jobs/{job_id}/upload, /jobs/{job_id}/start
  • Job status: /jobs/{job_id}
  • Files/pages: /jobs/{job_id}/files, /jobs/{job_id}/files/{file_id}/pages
  • Artifacts: /jobs/{job_id}/files/{file_id}/pages/{page_index}/overlay
  • PDF stream: /jobs/{job_id}/files/{file_id}/content?set=A|B
  • Admin: /admin/jobs, /admin/jobs/{job_id}/cancel, /admin/users, /admin/users/{user_id}
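
The paths above can be assembled with a few small helpers, sketched here. The helper names and BASE_URL are hypothetical; HTTP methods, request bodies, and authentication headers are not listed in this README, so the sketch only builds URLs and sends nothing.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # hypothetical; use your deployment's API address


def _path(*segments) -> str:
    # Percent-encode each segment so IDs cannot break the path structure.
    return "/" + "/".join(quote(str(s), safe="") for s in segments)


def job_status_url(job_id, base=BASE_URL) -> str:
    return base + _path("jobs", job_id)


def overlay_url(job_id, file_id, page_index, base=BASE_URL) -> str:
    return base + _path("jobs", job_id, "files", file_id, "pages", page_index, "overlay")


def pdf_content_url(job_id, file_id, pdf_set, base=BASE_URL) -> str:
    # The README documents exactly two values for the `set` query parameter.
    if pdf_set not in ("A", "B"):
        raise ValueError("pdf_set must be 'A' or 'B'")
    query = urlencode({"set": pdf_set})
    return base + _path("jobs", job_id, "files", file_id, "content") + "?" + query


print(pdf_content_url("j1", "f1", "A"))
```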

UIs

Scaling Notes

  • Increase worker concurrency: update celery worker --concurrency=N.
  • Add more worker replicas in docker compose or orchestration.
  • Use a dedicated RabbitMQ and Postgres for production.
  • Consider GPU-enabled workers for faster rendering if available.
  • Move /data to a networked volume with high throughput for large PDFs.

Data Layout

  • Uploaded files: /data/jobs/{job_id}/setA|setB/...
  • Overlays: /data/jobs/{job_id}/artifacts/{file_id}/page_{page_index}.svg
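
For scripts that need to locate files on the shared volume, the layout above maps onto pathlib as in the sketch below. The helper names are hypothetical, and the file names inside setA and setB are left open because the README elides them.

```python
from pathlib import PurePosixPath

DATA_ROOT = PurePosixPath("/data")  # shared Docker volume mount point


def upload_dir(job_id, pdf_set) -> PurePosixPath:
    """Directory holding the uploaded PDFs of one set ("A" or "B")."""
    if pdf_set not in ("A", "B"):
        raise ValueError("pdf_set must be 'A' or 'B'")
    return DATA_ROOT / "jobs" / str(job_id) / f"set{pdf_set}"


def overlay_path(job_id, file_id, page_index) -> PurePosixPath:
    """SVG overlay generated for one page of one file."""
    return DATA_ROOT / "jobs" / str(job_id) / "artifacts" / str(file_id) / f"page_{page_index}.svg"


print(upload_dir("42", "A"))
print(overlay_path("42", "f7", 3))
```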

Notes

  • Pages whose sizes differ between Set A and Set B are marked incompatible_size and have diff_score=null.
  • Missing pages are tracked per file and per page.
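
The rules in these notes could be expressed roughly as follows. Only incompatible_size and the null diff_score come from this README; the PageResult class, the classify_pages helper, and the "missing" and "comparable" status names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


@dataclass
class PageResult:
    page_index: int
    status: str
    diff_score: Optional[float] = None  # stays None (null) unless a diff is computed


def classify_pages(sizes_a: List[Size], sizes_b: List[Size]) -> List[PageResult]:
    """Classify each page of one file, given its page sizes in Set A and Set B."""
    results = []
    for i in range(max(len(sizes_a), len(sizes_b))):
        a = sizes_a[i] if i < len(sizes_a) else None
        b = sizes_b[i] if i < len(sizes_b) else None
        if a is None or b is None:
            status = "missing"  # hypothetical name: page exists in only one set
        elif a != b:
            status = "incompatible_size"  # no pixel diff is possible
        else:
            status = "comparable"  # hypothetical name: eligible for a pixel diff
        results.append(PageResult(page_index=i, status=status))
    return results


pages = classify_pages([(800, 1000), (800, 1000), (800, 1000)], [(800, 1000), (600, 1000)])
print([p.status for p in pages])
```

In this sketch a comparable page would have its diff_score filled in by the pixel diff, while incompatible_size pages keep diff_score as None.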
