A FastAPI-based REST API service that provides OCR (Optical Character Recognition) for images, RAG (Retrieval-Augmented Generation) for PDF documents, and Speech-to-Text conversion for audio files. Built with PaddleOCR, LangChain, OpenAI Whisper, Qdrant vector search, and asynchronous job processing via Dramatiq and Redis.
Live Demo: https://kingsley-api.name.ng/docs
- Image to Text Conversion: Upload images and extract text using OCR
- RAG with PDF: Upload PDFs and query them using Retrieval-Augmented Generation with vector search
- Sound to Text Conversion: Upload audio files and transcribe them using the OpenAI Whisper model
- Background Job Processing: Heavy tasks are processed asynchronously via Dramatiq and Redis
- Authentication System: JWT-based authentication with PostgreSQL
- User Management: Register, login, email verification, and token refresh
- Protected Routes: Secure endpoints with token-based authentication
- Docker Support: Fully containerized with Docker and Docker Compose
- Auto-reload: Development mode with automatic file watching and reloading
- Interactive API Docs: Swagger UI documentation at /docs
- Custom Error Pages: Custom 404 error page for invalid routes
- Vector Search: Qdrant vector database for semantic search and RAG
- Framework: FastAPI
- Database: PostgreSQL (with SQLAlchemy ORM)
- Vector Database: Qdrant
- Queue/Cache: Redis with Dramatiq
- Authentication: JWT (PyJWT), bcrypt for password hashing
- OCR Engine: PaddleOCR
- Speech-to-Text: OpenAI Whisper (via Hugging Face Transformers)
- RAG Framework: LangChain (with OpenAI embeddings)
- LLM Options: Multiple cloud models (OpenAI GPT, Google Gemini, DeepSeek) or Ollama (local LLM)
- ML Framework: PyTorch
- Python Version: 3.11
- Container: Docker & Docker Compose
- Docker and Docker Compose installed
- (Optional) Python 3.11+ for local development
- Clone the repository (if applicable):
    git clone https://github.com/kingrocfella/image-to-text-app
    cd image-to-text-app
- Create environment file:
    cp .env.example .env
- Start the service:
    docker-compose up --build
- Access the API:
  - API Base URL: http://localhost:8000
  - API Documentation: http://localhost:8000/docs
  - Health Check: http://localhost:8000/health
  - PostgreSQL: postgresql://localhost:5432
  - Qdrant Dashboard: http://localhost:6333/dashboard
  - Ollama API: http://localhost:11434
  - Redis: redis://localhost:6379
- Configure environment variables:
  - Make sure you update your .env with your own values.
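One way to confirm that .env has been filled in is to compare it against .env.example. The sketch below is not part of the repository: `parse_env` and `unset_keys` are hypothetical helper names, and it assumes both files use plain KEY=VALUE lines with optional # comments.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as URLs stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(example_text: str, env_text: str) -> List[str]:
    """Return keys from the example file that are missing or empty in .env."""
    example = parse_env(example_text)
    actual = parse_env(env_text)
    return [key for key in example if not actual.get(key)]
```

Reading both files with `pathlib.Path(".env.example").read_text()` and `pathlib.Path(".env").read_text()` and passing the results to `unset_keys` would list any variables that still need a value before running `docker-compose up --build`.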
image-to-text-app/
├── app/
│   ├── __init__.py  # App package initialization
│   ├── main.py  # FastAPI application entry point
│   ├── database/
│   │   ├── __init__.py  # Database module exports
│   │   ├── postgres.py  # PostgreSQL connection and configuration
│   │   ├── postgres_models.py  # Database models (User, TokenBlacklist, PDFRequest)
│   │   └── redis.py  # Redis broker and results backend configuration
│   ├── dependencies/
│   │   ├── __init__.py  # Dependencies module exports
│   │   └── dependencies.py  # FastAPI dependencies for authentication
│   ├── middleware/
│   │   ├── __init__.py  # Middleware module exports
│   │   └── logging_middleware.py  # Request/response logging middleware
│   ├── queues/
│   │   ├── __init__.py  # Queue module exports
│   │   ├── job_queue.py  # Dramatiq actors and job enqueuing functions
│   │   └── job_status.py  # Unified job status checking utilities
│   ├── workers/
│   │   ├── __init__.py  # Workers module exports
│   │   ├── image_worker.py  # Image-to-text processing worker
│   │   ├── sound_worker.py  # Sound-to-text processing worker (Whisper)
│   │   └── rag_worker.py  # RAG PDF processing worker
│   ├── schemas/
│   │   ├── __init__.py  # Schemas module exports
│   │   ├── schemas.py  # Pydantic models for API responses
│   │   └── auth_schemas.py  # Authentication request/response models
│   ├── utils/
│   │   ├── __init__.py  # Utils module exports
│   │   ├── auth_utils.py  # Authentication utilities (JWT, password hashing)
│   │   ├── file_utils.py  # File utilities (temp file cleanup)
│   │   ├── logger.py  # Logging configuration
│   │   ├── utils.py  # Utility functions for image/audio validation
│   │   ├── rag_cloudmodel_response.py  # RAG utilities for cloud models
│   │   ├── rag_ollama_response.py  # RAG utilities for Ollama integration
│   │   ├── rag_vectorstore.py  # Vector store utilities (load/process PDFs)
│   │   └── constants.py  # Model constants and configuration
│   ├── routes/
│   │   ├── __init__.py  # Router initialization
│   │   ├── health.py  # Health check endpoint
│   │   ├── auth.py  # Authentication endpoints
│   │   ├── image_to_text.py  # Image to text conversion endpoint
│   │   ├── sound_to_text.py  # Sound to text conversion endpoint
│   │   ├── rag_with_pdf.py  # RAG with PDF endpoint
│   │   └── jobs.py  # Unified job status endpoint
│   └── templates/
│       └── NotFound.html  # 404 error page
├── tests/
│   ├── __init__.py  # Tests package initialization
│   ├── conftest.py  # Pytest fixtures and configuration
│   ├── test_auth.py  # Authentication route tests
│   ├── test_image_to_text.py  # Image conversion route tests
│   ├── test_sound_to_text.py  # Sound to text conversion route tests
│   ├── test_rag_with_pdf.py  # RAG with PDF route tests
│   └── test_jobs.py  # Unified job status endpoint tests
├── logs/  # Application logs directory
├── Dockerfile  # Docker image configuration
├── docker-compose.yml  # Docker Compose configuration (development)
├── docker-compose.prod.yml  # Docker Compose configuration (production)
├── requirements.txt  # Python dependencies
├── pytest.ini  # Pytest configuration
├── pyproject.toml  # Python project configuration
├── pyrightconfig.json  # Pyright type checker configuration
├── .gitignore  # Git ignore rules
└── README.md  # This file
The docker-compose.yml includes:
- PostgreSQL database service with health checks
- Qdrant vector database service with health checks
- Redis service for job queuing and results storage
- Ollama service for local LLM inference
- Web service with hot-reload for development
- Worker service for background job processing (Dramatiq)
- Volume mounting for hot-reload during development
- Shared volume for temporary files between web and worker services
- Environment variable loading from .env file
- Persistent data volumes for PostgreSQL, Qdrant, Redis, and Ollama
The docker-compose.prod.yml includes:
- PostgreSQL database service
- Qdrant vector database service
- Redis service for job queuing and results storage
- Ollama service for local LLM inference
- Web service without hot-reload
- Worker service for background job processing (Dramatiq)
- Persistent data volumes for all services
- Services wait for dependencies to be healthy before starting
- 400 Bad Request: Invalid model, missing required parameters, or validation errors
- 403 Forbidden: Unauthorized access or unverified email
- 404 Not Found: Custom HTML error page for invalid routes or request ID not found
- 500 Internal Server Error: Standard FastAPI error responses
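A client can turn these status codes into readable messages. The following is a minimal client-side sketch, not code from this project: `ERROR_MEANINGS` and `explain_status` are hypothetical names, and the descriptions simply restate the list above.

```python
# Hypothetical client-side lookup; the meanings restate the documented error list.
ERROR_MEANINGS = {
    400: "Bad Request: invalid model, missing required parameters, or validation errors",
    403: "Forbidden: unauthorized access or unverified email",
    404: "Not Found: invalid route or request ID not found",
    500: "Internal Server Error",
}


def explain_status(status_code: int) -> str:
    """Return a short description for a response status code."""
    if 200 <= status_code < 300:
        return "Success"
    return ERROR_MEANINGS.get(status_code, f"Unexpected status {status_code}")
```

Because invalid routes return an HTML page rather than JSON, a client should check the status code before trying to parse a 404 response body as JSON.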
Key dependencies include:
- fastapi - Web framework
- sqlalchemy - SQL toolkit and ORM
- asyncpg - Async PostgreSQL driver
- PyJWT - JWT token handling
- passlib[bcrypt] - Password hashing
- python-jose - JWT encoding/decoding
- email-validator - Email validation
- paddleocr - OCR engine
- transformers - Hugging Face Transformers library (for Whisper model)
- torch - PyTorch for ML models
- librosa - Audio analysis library
- soundfile - Audio file I/O library
- numpy - Numerical computing library
- uvicorn - ASGI server
- pydantic - Data validation
- langchain - RAG framework
- langchain-openai - OpenAI embeddings integration
- langchain-qdrant - Qdrant vector store integration
- qdrant-client - Qdrant Python client
- ollama - Ollama Python client for local LLM inference
- pypdf - PDF processing library
- openai - OpenAI Python client (for GPT, Gemini, and DeepSeek models)
- dramatiq[redis] - Background task queue
- redis - Redis Python client
See requirements.txt for the complete list.
- After login, you receive both access_token and refresh_token
- Use access_token in the Authorization header for protected routes: Authorization: Bearer <access_token>
- When the access token expires, use /auth/refresh to get a new access token
- Call /auth/logout to invalidate tokens when logging out
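The token flow can be sketched from the client side using only the standard library. This sketch builds requests without sending them. The `Authorization: Bearer` header and the /auth/refresh path come from this README; the POST method and the `refresh_token` JSON field for the refresh call are assumptions, so check /docs for the actual request schema. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local API base URL from the setup steps


def auth_headers(access_token: str) -> dict:
    """Build the Authorization header expected by protected routes."""
    return {"Authorization": f"Bearer {access_token}"}


def build_protected_request(path: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request to a protected route."""
    return urllib.request.Request(f"{BASE_URL}{path}", headers=auth_headers(access_token))


def build_refresh_request(refresh_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a token refresh request.

    Assumption: the refresh token is posted as a JSON field named
    "refresh_token"; the real endpoint may expect a different shape.
    """
    body = json.dumps({"refresh_token": refresh_token}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/refresh",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request object to `urllib.request.urlopen` sends it once the services are running.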
- /convert/image/text - Requires valid access token and verified email
- /convert/sound/text - Requires valid access token and verified email
- /pdf/get/response - Requires valid access token and verified email
- /job/{message_id} - Requires valid access token and verified email
- /auth/refresh - Requires valid refresh token
- /auth/logout - Requires valid access token
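Since the conversion endpoints queue their work and /job/{message_id} reports on it, a client would likely poll the job endpoint until the job finishes. The sketch below shows one possible polling loop. The `status` field and its `completed` and `failed` values are assumptions about the response format, not documented behavior, and `wait_for_job` is a hypothetical helper. The HTTP call is injected as `fetch_status` so the loop itself stays independent of any client library.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; adjust to the values the API actually returns.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], Dict],
    message_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_status(message_id) until the job reaches a terminal state.

    fetch_status is expected to call GET /job/{message_id} with a valid
    access token and return the decoded JSON payload as a dict.
    """
    for attempt in range(max_attempts):
        payload = fetch_status(message_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"Job {message_id} did not finish after {max_attempts} attempts")
```

Capping the number of attempts prevents the client from waiting forever if a worker is down.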
The project includes comprehensive tests using pytest and pytest-asyncio for async testing.
# Run all tests
pytest
# Run with coverage
pytest --cov=app --cov-report=html
# Run specific test file
pytest tests/test_auth.py
# Run with verbose output
pytest -v
Test files:
- tests/conftest.py - Pytest fixtures and configuration
- tests/test_auth.py - Authentication route tests
- tests/test_image_to_text.py - Image conversion route tests
- tests/test_sound_to_text.py - Sound to text conversion route tests
- tests/test_rag_with_pdf.py - RAG with PDF route tests
- tests/test_jobs.py - Unified job status endpoint tests
Tests cover:
- User registration and authentication
- Token refresh and logout
- Protected route access
- Image-to-text job queueing
- Sound-to-text job queueing
- RAG with PDF job queueing
- Unified job status checking
- Error handling
- Fork the repository
- Create a feature branch
- Make your changes
- Test your changes
- Submit a pull request