VedJoshi/CAPTCHA

CAPTCHA Recognition with CRNN

A deep learning system for recognizing alphanumeric CAPTCHA images of variable lengths using a Convolutional Recurrent Neural Network (CRNN) architecture with CTC loss.

📊 Performance (on the test dataset, N=2000)

  • Exact Match Accuracy: 63.4%
  • Character Error Rate (CER): 0.100
  • Model Architecture: CNN + Bidirectional LSTM with CTC decoding

Note: CER is based on the Levenshtein distance, i.e. the minimum number of insertions, deletions, and substitutions needed to turn the output string into the correct string. Character Error Rate = total number of edits divided by the total number of characters.
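
The two metrics above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in `src.evaluate`; the function names are hypothetical, and it assumes "total number of characters" means the characters of the ground-truth strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(predictions, targets):
    """Total edits over all pairs divided by total ground-truth characters (assumed)."""
    total_edits = sum(levenshtein(p, t) for p, t in zip(predictions, targets))
    total_chars = sum(len(t) for t in targets)
    return total_edits / total_chars if total_chars else 0.0


def exact_match_accuracy(predictions, targets):
    """Fraction of predictions identical to their target string."""
    pairs = list(zip(predictions, targets))
    return sum(p == t for p, t in pairs) / len(pairs) if pairs else 0.0
```

For example, with predictions `["ab1d", "xyz"]` and targets `["abcd", "xyz"]`, one substitution over seven target characters gives a CER of 1/7, while exact match accuracy is 0.5. This shows how a moderate exact-match score can coexist with a low CER: a single wrong character fails the whole CAPTCHA.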

[Figure: Predictions]

🏗️ Architecture

Core Components

  1. CNN Feature Extractor

    • Residual blocks for deep feature learning
    • 4-channel input (RGB + Sobel edge detection)
    • Multiple Conv + Pooling layers
    • Output: 256-channel feature maps
  2. Recurrent Sequence Modeling

    • 2-layer Bidirectional LSTM (256 hidden units)
    • Sequence modeling of variable-length CAPTCHAs
    • Dropout for regularization
  3. CTC Decoding

    • Connectionist Temporal Classification loss
    • Greedy decoding for inference
    • Supports variable-length output sequences
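
To illustrate the greedy decoding step listed above, here is a minimal pure-Python sketch. It assumes class index 0 is the CTC blank and that the remaining indices map onto a hypothetical `ALPHABET` string; the repository's actual label mapping and tensor types may differ.

```python
BLANK = 0
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed character set, for illustration only


def greedy_ctc_decode(scores, alphabet=ALPHABET, blank=BLANK):
    """Decode a T x C list of per-frame class scores into a string.

    Greedy CTC decoding: take the arg-max class per frame, collapse
    consecutive repeats, then drop blanks.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in scores]
    chars = []
    prev = None
    for idx in best_path:
        # Emit only when the class changes and is not the blank symbol.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx - 1])  # index 0 is reserved for blank
        prev = idx
    return "".join(chars)
```

A blank between two identical classes is what lets CTC emit a doubled character: the frame path `a a <blank> a b b <blank>` decodes to `aab`, which is also why the output length can vary independently of the number of frames.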

Key Features

  • Data Augmentation: Color jitter, affine transforms, blurring
  • Edge Enhancement: Sobel edge detection as additional channel
  • Image Preprocessing: Noise removal, contrast enhancement, aspect-preserving resize
  • Batch Processing: Handles variable-width images via padding
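
The edge-enhancement feature above (a Sobel edge map appended to RGB to form a 4-channel input) can be sketched without any dependencies. This is only an illustration under stated assumptions: images are nested lists of `(r, g, b)` tuples in [0, 1], grayscale uses the common 0.299/0.587/0.114 luma weights, borders are replicated, and the edge map is scaled by its maximum. The real pipeline most likely uses an array library and may make different choices.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(gray):
    """Gradient magnitude of an H x W grayscale image (list of lists)."""
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Replicate border pixels (an assumption; other padding modes exist).
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    v = px(y + dy, x + dx)
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            out[y][x] = math.hypot(gx, gy)
    return out


def add_edge_channel(rgb):
    """Return an H x W image of (r, g, b, edge) tuples, edge scaled to [0, 1]."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]
    edges = sobel_magnitude(gray)
    peak = max(max(row) for row in edges)
    if peak < 1e-8:  # flat image: avoid amplifying rounding noise
        peak = 1.0
    return [
        [(r, g, b, e / peak) for (r, g, b), e in zip(rgb_row, edge_row)]
        for rgb_row, edge_row in zip(rgb, edges)
    ]
```

On an image whose left half is black and right half is white, the fourth channel is close to 1 along the vertical boundary and 0 elsewhere, which gives the CNN an explicit stroke-outline signal alongside colour.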

🚀 Installation

Prerequisites

  • Python 3.10
  • CUDA-capable GPU (recommended)

Install Dependencies (conda 🐍)

conda env create -f environment.yml

Install Dependencies (pip or Docker 🐳)

pip install -r requirements.txt

Instructions to run:

Training script

python -m src.train

Inference script (runs inference on all test images, N=2000)

python -m src.evaluate

Predict 1 image

python -m src.predict path/to/captcha.png

Visualise augmentation:

python -m src.visualise_aug

⚙️ Configuration

Use config.yaml to adjust hyperparameters.

Failed approaches

  • Beam search decoding did not improve results and slowed computation significantly, given Python's execution speed.
  • Mapping to HSV space instead of RGB did not yield significant improvements. It was initially tested for segmenting overlapping characters with different colours.
  • Adding squeeze-and-excitation blocks did not yield a significant improvement over the plain ResNet. Refer to https://medium.com/@tahasamavati/squeeze-and-excitation-explained-387b5981f249
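
For context on the beam search item above, here is a minimal pure-Python sketch of CTC prefix beam search. It is not the implementation that was tried in this repository; the function name, the `beam_width` default, and the blank index 0 are assumptions. The nested loops over beams and classes at every frame hint at why it is much slower than greedy decoding when written in Python.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_beam_search(log_probs, beam_width=4, blank=0):
    """Return (best label sequence, its log-probability) for T x C log-probs."""
    # prefix -> (log p of paths ending in blank, log p of paths ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for c, lp in enumerate(frame):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (c,)
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == c:
                    # Repeated label: it extends the prefix only after a blank...
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the same prefix.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Keep only the most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, best_scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(*best_scores)
```

Beam search only differs from greedy decoding when probability mass is spread over several alignments of the same string. With two frames of `[blank: 0.6, a: 0.4]`, greedy decoding returns the empty string (probability 0.36), while the prefix search returns `a` (0.24 + 0.24 + 0.16 = 0.64). If the model's per-frame predictions are confident, the two decoders usually agree, which is consistent with seeing no accuracy gain.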

References

  • An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition (Shi et al., 2015)
  • Deep Residual Learning for Image Recognition (He et al., 2015)

Image source

https://drive.google.com/drive/folders/1JikBA_bt7HwUYge73WuohRibamdsBTcC?usp=drive_link
