Bhashini-IITJ/IndicPhotoOCR

IndicPhotoOCR

A Comprehensive Toolkit for Scene Text Recognition in Indian Languages


Welcome to IndicPhotoOCR! ⚡ We've built a fast, robust, and comprehensive scene text recognition toolkit designed to detect text, identify its script, and recognize it across 11 Indian languages (plus English).

Supported Languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, and English (with Urdu and Meitei in the pipeline!).

It is designed to handle the unique scripts and complex structures of Indian languages. With the latest v2 upgrades, it runs up to 5x faster out of the box and adds built-in Batch Inference and Confidence Scoring! 🔥


✨ What's New & Exciting?

  • 🚀 Lightning-Fast Caching: The pipeline now keeps loaded models cached in GPU memory instead of reloading them, cutting the runtime of sequential scans by ~80% (a ~5x speedup) out of the box!
  • Batch Inference Engine: Got an image with hundreds of words? Just pass batch_size=32 to ocr() to recognize the detected word regions in batches instead of one at a time, cutting execution time even further.
  • 🎯 Confidence Scores: The public APIs can now optionally return the model's confidence score for each prediction, so you can filter out low-certainty predictions or compute metrics.
  • 🛡️ Atomic & Self-Contained: Models are downloaded automatically and atomically, so an interrupted download cannot leave a corrupted file behind, and model paths are resolved as platform-independent absolute paths, so you can run the toolkit from any working directory.
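The caching idea in the first bullet can be sketched with Python's `functools.lru_cache`: each model is loaded once per (name, device) pair and the same object is reused on later calls. This is a minimal illustration, not IndicPhotoOCR's actual implementation; `load_model`, `recognise_many`, and the dictionary standing in for real weights are all hypothetical.

```python
import functools

# Counts how many times the (hypothetical) expensive load actually runs.
LOAD_CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def load_model(name: str, device: str) -> dict:
    """Hypothetical loader: stands in for reading weights and moving them to `device`."""
    LOAD_CALLS["count"] += 1
    return {"name": name, "device": device}


def recognise_many(words, lang, device="cuda:0"):
    # Only the first call for a given (name, device) pair runs load_model's body;
    # every later call gets the cached object back immediately.
    model = load_model(f"recognizer-{lang}", device)
    return [f"{model['name']}:{w}" for w in words]


recognise_many(["a", "b"], "hindi")
recognise_many(["c"], "hindi")
print(LOAD_CALLS["count"])  # 1: the Hindi model was loaded only once
```

Because `lru_cache` keys on the arguments, switching to another language or device triggers exactly one new load, while repeated scans with the same settings skip loading entirely.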
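The "atomic download" bullet describes a common pattern: write into a temporary file next to the destination and only rename it into place once the download is complete. The sketch below shows that pattern with the standard library only; `atomic_download` is a hypothetical helper and we do not claim this is how IndicPhotoOCR implements it.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def atomic_download(url: str, dest: str) -> Path:
    """Download `url` to `dest` so that `dest` is either absent or complete."""
    dest_path = Path(dest).resolve()  # absolute path, independent of the working directory
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    if dest_path.exists():
        return dest_path  # already downloaded: nothing to do
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dest_path.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
        os.replace(tmp_name, dest_path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Delete the partial file so an interrupted download leaves no corrupt model behind.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return dest_path
```

If the process is killed mid-download, only a stray `.part` file can remain; the destination path never holds a half-written model, so the next run simply downloads again.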

📅 Updates Timeline

[April 2026]: Added the Batch Inference Engine, model caching, and neural confidence scores, resulting in a ~5x speedup.
[August 2025]: Project page created.
[April 2025]: Documentation page created using Sphinx.
[March 2025]: Hugging Face demo extended to support 12 languages.
[February 2025]: Added option to choose between trilingual and 12-class script identification models.
[February 2025]: Added recognition models for Malayalam and Kannada.
[January 2025]: Added ViT-based script identification models.
[January 2025]: Demo available on Hugging Face Spaces.
At launch, the demo supported scene images containing bilingual Hindi and English text.
[December 2024]: Added TextBPN++ to the detection module.
[November 2024]: Code available on Google Colab.
[November 2024]: Added support for 10 languages in the recognition module.
[September 2024]: Repository created.


📦 Quick Installation

We recommend creating a virtual environment before installing:

conda create -n indicphotoocr python=3.9 -y
conda activate indicphotoocr

git clone https://github.com/Bhashini-IITJ/IndicPhotoOCR.git
cd IndicPhotoOCR
chmod +x setup.sh
./setup.sh
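The examples that follow pass device="cuda:0", which assumes a CUDA GPU is present. If you want the same script to also run on a CPU-only machine, a small helper like the one below can pick the device at runtime. It is a sketch under two assumptions this README does not confirm: that the toolkit runs on PyTorch (installed by setup.sh), and that OCR(device="cpu") is accepted. `pick_device` is a hypothetical name.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is installed by setup.sh
    except ImportError:
        return "cpu"
    # Assumption: OCR(device="cpu") is a valid fallback for IndicPhotoOCR.
    return preferred if torch.cuda.is_available() else "cpu"


print(pick_device())  # "cuda:0" on a GPU machine, "cpu" otherwise
```

You would then write, for example, ocr_system = OCR(verbose=False, identifier_lang="auto", device=pick_device()).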

💡 How to Use

Using IndicPhotoOCR is incredibly simple. You can execute the entire End-to-End Scene Text Recognition pipeline (Detection ➡️ Identification ➡️ Recognition) with just three lines of Python!

💥 End-to-End Pipeline (Fastest Method)

from IndicPhotoOCR.ocr import OCR

# Initialize the OCR Engine
ocr_system = OCR(verbose=False, identifier_lang="auto", device="cuda:0")

# Boom! Run the whole pipeline in a single call
results = ocr_system.ocr("test_images/image_141.jpg")

# The output is a list of text lines, where each line is a list of words ordered left to right!
# Example Output:
# [
#    ["राजीव", "चौक", "मेट्रो", "स्टेशन"],   <-- Line 1
#    ["Rajiv", "Chowk", "Metro", "Station"]  <-- Line 2
# ]

# 🔥 PRO-TIP: For images with many words, recognize the word crops in batches of 32!
fast_results = ocr_system.ocr("test_images/image_141.jpg", batch_size=32)
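If you just need plain text, the nested output can be flattened with two `str.join` calls. This sketch assumes `results` has exactly the list-of-lines, list-of-words shape shown in the example output above; `lines_to_text` is a hypothetical helper, not part of the IndicPhotoOCR API.

```python
from typing import List


def lines_to_text(results: List[List[str]]) -> str:
    """Join the nested OCR output (lines of words) into newline-separated text."""
    return "\n".join(" ".join(words) for words in results)


example = [
    ["राजीव", "चौक", "मेट्रो", "स्टेशन"],
    ["Rajiv", "Chowk", "Metro", "Station"],
]
print(lines_to_text(example))
```

This prints the two sign lines one below the other, with words separated by single spaces.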
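Conceptually, batch_size=32 groups the detected word crops into chunks so the recognizer runs one forward pass per chunk rather than one per word. The sketch below shows only that chunking step, assuming batching works this way; the model call is omitted, and `batched` and `crops` are hypothetical. Python 3.12 ships `itertools.batched` (which yields tuples), but the recommended environment here is Python 3.9, so the helper is written out.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 70 hypothetical word crops with batch_size=32 -> batches of 32, 32 and 6.
crops = [f"crop_{i}.jpg" for i in range(70)]
print([len(b) for b in batched(crops, 32)])  # [32, 32, 6]
```

Larger batches usually mean fewer, larger GPU calls, at the cost of more GPU memory per call.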

🎯 Modular Execution (Advanced)

If you do not want to run the entire pipeline at once, you can hook into individual modules manually:

1. Text Detection Module: Extract the coordinates of all bounding boxes containing text in an image.
from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(verbose=True, device="cuda:0")

# Get raw bounding box detections
detections = ocr_system.detect("test_images/image_141.jpg")

# Optional: Visualize and save the detected bounding boxes
ocr_system.visualize_detection("test_images/image_141.jpg", detections)
# Saves an image with boxes drawn over it
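The identification and recognition modules take a cropped word image, so a typical next step is to turn each detection into a crop box. The exact format returned by detect() is not documented here; this sketch assumes each detection is a polygon given as a list of [x, y] points. `polygon_to_bbox` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def polygon_to_bbox(points: Sequence[Sequence[float]]) -> Tuple[int, int, int, int]:
    """Convert a polygon [[x, y], ...] into an axis-aligned (left, top, right, bottom) box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Floor the minima and ceil the maxima so the box fully contains the polygon.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


print(polygon_to_bbox([[10, 20], [110.5, 18], [112, 60.2], [9.7, 62]]))  # (9, 18, 112, 62)
```

The (left, top, right, bottom) tuple matches the box argument of Pillow's Image.crop, so Image.open(path).crop(polygon_to_bbox(poly)).save("cropped_word.jpg") would produce the kind of cropped word image used in the next two modules.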
2. Script Identification Module: Take a single cropped image of a word and predict which language it is written in.
from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(verbose=True, identifier_lang="auto", device="cuda:0")

# Identify script of a cropped word
lang = ocr_system.identify("test_images/cropped_word.jpg")
print(lang)
# Output: 'hindi'
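When a sign yields many word crops, you may want a single language label for the whole set, for example to pick one recognizer. This sketch only calls identify(path) as shown above and takes a majority vote with `collections.Counter`; `dominant_script` is a hypothetical helper, and ties go to the label encountered first, since `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter
from typing import Iterable


def dominant_script(ocr_system, crop_paths: Iterable[str]) -> str:
    """Return the most frequent language label that ocr_system.identify assigns to the crops."""
    counts = Counter(ocr_system.identify(path) for path in crop_paths)
    if not counts:
        raise ValueError("no crops given")
    label, _ = counts.most_common(1)[0]
    return label
```

For example, dominant_script(ocr_system, ["word_1.jpg", "word_2.jpg", "word_3.jpg"]) returns 'hindi' if at least two of the three crops are identified as Hindi (the file names are illustrative).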
3. Text Recognition Module: Extract the literal text string from a cropped word image (and optionally get its confidence score).
from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(verbose=True, device="cuda:0")

# Recognize text (default behavior: returns a string)
text = ocr_system.recognise("test_images/cropped_word.jpg", "hindi")

# Recognize text WITH a confidence score (returns a (text, score) tuple)
text, conf_score = ocr_system.recognise("test_images/cropped_word.jpg", "hindi", return_confidence=True)
print(f"Recognized: {text} | Certainty: {conf_score * 100:.2f}%")
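The identification and recognition calls above can be chained to read a set of word crops and drop uncertain predictions. This sketch uses only identify(path) and recognise(path, lang, return_confidence=True) as shown in this README; `read_confident_words` is hypothetical, the 0.5 threshold is an arbitrary example, and the score is assumed to lie in [0, 1] (as the `conf_score * 100` percentage above suggests).

```python
from typing import Iterable, List, Tuple


def read_confident_words(ocr_system, crop_paths: Iterable[str],
                         min_conf: float = 0.5) -> List[Tuple[str, str, float]]:
    """Identify each crop's language, recognise it, and keep only confident predictions."""
    kept = []
    for path in crop_paths:
        lang = ocr_system.identify(path)  # e.g. 'hindi'
        text, conf = ocr_system.recognise(path, lang, return_confidence=True)
        if conf >= min_conf:  # assumption: conf is a probability in [0, 1]
            kept.append((text, lang, conf))
    return kept
```

Raising min_conf trades recall for precision; tune it on a few labelled images from your own domain rather than relying on the example value.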

📚 Related Datasets & Citations

  • Bharat Scene Text Dataset - BSTD

🎉 Our paper has been officially accepted in IJDAR (International Journal on Document Analysis and Recognition)!

If you use IndicPhotoOCR in your research, please cite us:

@misc{ipo,
  author       = {Anik De and others},
  title        = {{I}ndic{P}hoto{O}CR: A comprehensive toolkit for {I}ndian language scene text understanding},
  howpublished = {\url{https://github.com/Bhashini-IITJ/IndicPhotoOCR/}},
  year         = 2024,
}

🤝 Project Contributors

Anik De - Tech Lead & Main Contributor
Abhirama Aditya Rathore Harshiv Shah
Sagar Agarwal Rajeev Yadav Pravin Kumar
Anand Mishra - Project Investigator

🙏 Acknowledgements

📬 Contact us

For any queries, please contact us at:
