An automated end-to-end CAPTCHA recognition system that evolved from traditional machine learning to state-of-the-art deep learning (CRNN+CTC). This project documents how OCR accuracy improved from 44% to over 98%.
This project underwent three major phases of technical improvement, followed by an expansion to image-based CAPTCHAs:
- Method: Character segmentation (Otsu's thresholding) + HOG Feature Extraction + LinearSVC.
- Limitation: Highly sensitive to segmentation errors; struggles with overlapping characters.
- Accuracy: ~44%.
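
The thresholding step in this first phase can be illustrated with a small, dependency-free version of Otsu's method. This is a minimal sketch for explanation only, not the project's code: it assumes 8-bit grayscale pixels given as a flat list, and the function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance."""
    # Build the 256-bin intensity histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]          # pixels at or below this level
        if weight_bg == 0:
            continue                      # no background pixels yet
        weight_fg = total - weight_bg     # pixels above this level
        if weight_fg == 0:
            break                         # everything is background from here on
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor of 1 / total**2).
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

Because segmentation depends entirely on this binary mask, touching or overlapping characters end up in the same connected region, which is the sensitivity noted in the limitation above.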
- Method: Improved segmentation + 4-layer CNN for character classification.
- Link: Original CNN Research
- Accuracy: ~91% (Character level).
- Method: CNN (Feature extraction) + RNN (BLSTM for sequence) + CTC Loss (Alignment).
- Advantage: No explicit segmentation required. Recognizes the entire 5-character sequence as a whole.
- Accuracy: 98.13% (Word Accuracy) / 98.88% (Char Accuracy).
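
To show why CTC removes the need for explicit segmentation, here is a minimal sketch of greedy CTC decoding. It assumes the network has already produced one class index per time step (the per-frame argmax), that index 0 is the blank symbol, and that index `i` maps to `charset[i - 1]`. These conventions and the function name are illustrative assumptions, not taken from the project's code.

```python
def ctc_greedy_decode(frame_ids, charset, blank=0):
    """Collapse repeated frame labels, then drop blanks.

    frame_ids: per-time-step class indices (argmax of the model output).
    charset:   string of characters; class index i maps to charset[i - 1].
    blank:     index reserved for the CTC blank symbol (assumed to be 0).
    """
    decoded = []
    previous = blank
    for idx in frame_ids:
        # Emit a character only when the label changes and is not blank.
        if idx != blank and idx != previous:
            decoded.append(charset[idx - 1])
        previous = idx
    return "".join(decoded)


# A blank between two identical labels keeps both characters.
example = ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0, 0, 3], "abc")
```

In the example, the two runs of label `1` are separated by a blank, so the decoder emits the character twice; without the blank they would collapse into a single character.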
- Goal: Expand beyond text recognition to object-classification-based challenges.
- Method: EfficientNet-B1 backbone for 3x3 grid image classification.
- Integration: Unified both Text and Image CAPTCHA into a single FastAPI serving platform.
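
One plausible way to handle a 3x3 grid challenge is to split the image into nine tiles and classify each tile independently. The sketch below assumes that approach; the README does not say how the EfficientNet-B1 model consumes the grid. The image is a plain nested list, and `classify` is a hypothetical stand-in for the real model.

```python
def split_into_grid(image, rows=3, cols=3):
    """Split a 2D image (list of rows) into rows*cols tiles, in row-major order."""
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // rows, width // cols
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = [
                row[c * tile_w:(c + 1) * tile_w]
                for row in image[r * tile_h:(r + 1) * tile_h]
            ]
            tiles.append(tile)
    return tiles


def select_tiles(image, classify, target_label):
    """Return indices (0-8) of tiles whose predicted label equals target_label."""
    return [
        index
        for index, tile in enumerate(split_into_grid(image))
        if classify(tile) == target_label
    ]
```

Any callable that maps a tile to a label can be passed as `classify`, so the same helper works with a stub during testing and with a real model wrapper in production.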
The system is designed as a modular FastAPI application that serves multiple types of CAPTCHA models through a unified inference pipeline.
graph TD
A[Client Request] --> B[FastAPI Gateway]
B --> C{Captcha Type?}
C -- "v1 (Text)" --> D[CRNN + CTC Model]
C -- "v2 (Image)" --> E[EfficientNet-B1 Model]
D --> F[Unified Response]
E --> F
- Inference Service: Separate service layers for `v1` and `v2`, sharing consistent preprocessing and logging logic.
- Scalable Serving: Models and preprocessing scripts are encapsulated for seamless deployment across local and cloud environments.
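
The routing shown in the diagram can be sketched as a plain dispatch table, independent of the web framework. Everything here is hypothetical: the function names, the response fields, and the placeholder predictors are assumptions for illustration and do not reflect the project's actual FastAPI routes.

```python
def predict_v1(image_bytes):
    """Placeholder for the text CAPTCHA service (CRNN + CTC)."""
    return {"text": "<decoded text>"}


def predict_v2(image_bytes):
    """Placeholder for the image CAPTCHA service (EfficientNet-B1)."""
    return {"selected_tiles": []}


# Map each CAPTCHA type to its service function.
PREDICTORS = {"v1": predict_v1, "v2": predict_v2}


def handle_request(captcha_type, image_bytes):
    """Route a request to the matching predictor and wrap the result uniformly."""
    predictor = PREDICTORS.get(captcha_type)
    if predictor is None:
        return {"ok": False, "type": captcha_type, "error": "unsupported captcha type"}
    return {"ok": True, "type": captcha_type, "result": predictor(image_bytes)}
```

Keeping the response envelope identical for both model types means the client only needs to branch on the `result` payload, and a new model type can be added by registering one more entry in the table.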
| Metric | Value | Description |
|---|---|---|
| Word Accuracy | 98.13% | Entire 5-char sequence match (Exact) |
| Char Accuracy | 99.63% | Individual character prediction success |
| Precision | 0.99 | Share of positive predictions that are correct |
| F1-Score | 0.99 | Harmonic mean of precision and recall |
| CTC Loss | 0.0578 | CTC loss value at convergence (lower is better) |
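
The difference between word and character accuracy can be made concrete with a short sketch. It assumes predictions and targets have equal length and are compared position by position, which fits fixed-length 5-character CAPTCHAs; the README does not state how the reported figures were actually computed.

```python
def word_accuracy(predictions, targets):
    """Fraction of samples where the whole predicted string matches exactly."""
    matches = sum(p == t for p, t in zip(predictions, targets))
    return matches / len(targets)


def char_accuracy(predictions, targets):
    """Fraction of character positions predicted correctly (position-wise)."""
    correct = total = 0
    for p, t in zip(predictions, targets):
        total += len(t)
        correct += sum(pc == tc for pc, tc in zip(p, t))
    return correct / total


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A single wrong character fails the whole word but costs only one of five character positions, which is why word accuracy is always the stricter of the two numbers.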
- Deep Learning: PyTorch, TorchInfo
- Backend: FastAPI (Python 3.13)
- Tracking: MLflow (Experiment Logging & Metric Archiving)
- Frontend: Glassmorphism UI (Tailwind CSS, Vanilla JS)
- Verification: 5-Fold Cross Validation for robust evaluation
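
The 5-fold verification scheme can be illustrated with a dependency-free index splitter. This is a sketch of the general technique under assumed defaults (shuffled indices, fixed seed); the helper name is hypothetical and the project's own script may split the data differently.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # deterministic shuffle
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

Each sample appears in the validation set exactly once, so averaging the metric over the five folds gives an estimate that does not depend on a single lucky split.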
Captcha/
├── app/ # FastAPI Web Server
│ ├── services/ # OCR & Metadata Logic
│ ├── routers/ # API Endpoints
│ └── templates/ # Dashboard (UI)
├── models/ # Best Weights (.pth) & Metadata (.json)
├── scripts/ # Training, Tuning, & Integration Scripts
├── notebooks/ # Research (Jupyter Notebooks)
└── requirements.txt # Dependency mapping
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
./.venv/bin/python -m uvicorn app.main:app --reload --port 80

The project is configured for automated deployment to AWS EC2 via GitHub Actions and SSH. Both Text and Image CAPTCHA models are bundled into a single Docker image for unified serving.
- Integrated Image: A single Dockerfile handles model weights, metadata, and dependencies for all versions.
- Automated CD: A push to the `main` branch triggers the GitHub Actions pipeline.
- Serving Environment: The workflow pushes the image to Docker Hub and restarts the container on EC2 port 80.
To ensure consistent inference between development and production, we use a unified deployment script for both model versions.
- Standardized Inference: All models use a `RecaptchaPredictor` pattern (as seen in `scripts/inference.py`) to wrap model loading and preprocessing.
- Metadata-Driven Dashboard: `generate_v1_metadata.py` and `generate_v2_metadata.py` extract performance metrics, which are then used to dynamically populate the web dashboard at runtime.
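
The idea of wrapping model loading and preprocessing behind one object can be sketched as follows. This is not the interface of the project's `RecaptchaPredictor`, which is not shown here; the class name `PredictorSketch`, its constructor arguments, and the lazy-loading behavior are all assumptions chosen for illustration.

```python
class PredictorSketch:
    """Hypothetical wrapper that loads a model once and applies preprocessing."""

    def __init__(self, load_model, preprocess):
        self._load_model = load_model    # callable returning a model callable
        self._preprocess = preprocess    # callable turning raw input into model input
        self._model = None

    @property
    def model(self):
        # Load lazily on first use, then reuse the same instance.
        if self._model is None:
            self._model = self._load_model()
        return self._model

    def predict(self, raw_input):
        """Preprocess the raw input and run it through the cached model."""
        return self.model(self._preprocess(raw_input))
```

Because loading and preprocessing are injected, the same wrapper shape can serve both model versions, and development and production run the identical code path.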
# General training
python scripts/train_crnn_ctc.py
# Robust verification
python scripts/train_crnn_kfold.py

Developed as part of an Advanced AI OCR Portfolio. Latest metrics exported via `export_metadata.py`.



