GitHub - Bhashini-IITJ/IndicPhotoOCR: Comprehensive Scene Text Recognition Toolkit across 11 Indian Languages

A Comprehensive Toolkit for Scene Text Recognition in Indian Languages

Welcome to IndicPhotoOCR! ⚡ We've built an extremely fast, robust, and comprehensive scene text recognition toolkit designed for detecting, identifying, and recognizing text across 11 Indian languages (plus English).

Supported Languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, and English (with Urdu and Meitei in the pipeline!).

It is expertly crafted to handle the unique scripts and complex structures of Indian languages. And with our latest v2 upgrades, it runs up to 5x faster natively, with built-in support for Batch Inference and precision Confidence Scoring! 🔥

✨ What's New & Exciting?

🚀 Lightning Fast Caching: The pipeline now caches models in GPU memory, cutting the time for sequential scans by ~80% (about a 5x speedup) out of the box!
⚡ Batch Inference Engine: Got an image with hundreds of words? Just pass batch_size=32 into the core engine to process bounding boxes concurrently, slashing execution times even further.
🎯 Confidence Scores: Our public APIs now optionally expose exact neural network confidence probabilities so you can reliably filter out low-certainty predictions or calculate metrics.
🛡️ Atomic & Self-Contained: Models are downloaded automatically and atomically, so an interrupted download does not leave a corrupted file, and the toolkit uses system-agnostic absolute paths so you can run it from anywhere.
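
To make the batch_size idea concrete, here is a minimal, library-independent sketch of how a list of cropped word regions might be grouped into fixed-size batches. The helper name `iter_batches` is hypothetical, and this is not IndicPhotoOCR's internal implementation; it only illustrates what a batch size of 32 means for an image with many detected words.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Stand-ins for 70 cropped word images; a real pipeline would hold image arrays.
crops = [f"crop_{i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(crops, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

The last batch is simply shorter when the number of crops is not a multiple of the batch size.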

📅 Updates Timeline

[April 2026]: Added Batch Inference Engine, Model Caching, and Neural Confidence Scores resulting in ~5x speedup.
[August 2025]: Project page created.
[April 2025]: Documentation page created using Sphinx.
[March 2025]: Support for the Hugging Face demo extended to 12 languages.
[February 2025]: Added option to choose between tri-lingual and 12-class script identification models.
[February 2025]: Added recognition models for Malayalam and Kannada.
[January 2025]: Added ViT-based script identification models.
[January 2025]: Demo available in a Hugging Face Space.
Currently, the demo supports scene images containing bilingual Hindi and English text.
[December 2024]: Detection Module: TextBPN++ added.
[November 2024]: Code available at Google Colab.
[November 2024]: Added support for 10 languages in the recognition module.
[September 2024]: Repository created.

📦 Quick Installation

We recommend creating a virtual environment before installing:

conda create -n indicphotoocr python=3.9 -y
conda activate indicphotoocr

git clone https://github.com/Bhashini-IITJ/IndicPhotoOCR.git
cd IndicPhotoOCR
chmod +x setup.sh
./setup.sh

💡 How to Use

Using IndicPhotoOCR is incredibly simple. You can execute the entire End-to-End Scene Text Recognition pipeline (Detection ➡️ Identification ➡️ Recognition) with just three lines of Python!

💥 End-to-End Pipeline (Fastest Method)

from IndicPhotoOCR.ocr import OCR

# Initialize the OCR Engine
ocr_system = OCR(verbose=False, identifier_lang="auto", device="cuda:0")

# Boom! Run the whole pipeline natively
results = ocr_system.ocr("test_images/image_141.jpg")

# The output is a structured list of lines (paragraphs), where each line is a list of words sequentially ordered left-to-right!
# Example Output:
# [
#    ["राजीव", "चौक", "मेट्रो", "स्टेशन"],   <-- Line 1
#    ["Rajiv", "Chowk", "Metro", "Station"]  <-- Line 2
# ]

# 🔥 PRO-TIP: Process very large images with thousands of words concurrently!
fast_results = ocr_system.ocr("test_images/image_141.jpg", batch_size=32)
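
Because the result is a list of lines, each holding a list of words, turning it back into readable text takes two joins. The sketch below assumes the output shape shown in the example comment above and uses a hypothetical helper name, `lines_to_text`; it does not call the library.

```python
from typing import List


def lines_to_text(lines: List[List[str]]) -> str:
    """Join words with spaces and lines with newlines (hypothetical helper)."""
    return "\n".join(" ".join(words) for words in lines)


# Sample data copied from the example output above.
sample_results = [
    ["राजीव", "चौक", "मेट्रो", "स्टेशन"],
    ["Rajiv", "Chowk", "Metro", "Station"],
]
text = lines_to_text(sample_results)
print(text)
```

With the real toolkit, the value returned by ocr_system.ocr(...) would take the place of `sample_results`, provided it has the same nested-list shape.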

🎯 Modular Execution (Advanced)

If you do not want to run the entire pipeline at once, you can hook into individual modules manually:

1. Text Detection Module

Extract coordinates of all bounding boxes containing text in an image.

from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(verbose=True, device="cuda:0")

# Get raw bounding box detections
detections = ocr_system.detect("test_images/image_141.jpg")

# Optional: Visualize and save the detected bounding boxes
ocr_system.visualize_detection("test_images/image_141.jpg", detections)
# Saves an image with boxes drawn over it

2. Script Identification Module

Take a single, cropped image of a word and predict what language it is written in.

from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(verbose=True, identifier_lang="auto", device="cuda:0")

# Identify script of a cropped word
lang = ocr_system.identify("test_images/cropped_word.jpg")
print(lang)
# Output: 'hindi'

3. Text Recognition Module

Extract the literal text string from a cropped word image (and optionally get its confidence score).

from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(verbose=True, device="cuda:0")

# Recognize text (old behavior, returns string)
text = ocr_system.recognise("test_images/cropped_word.jpg", "hindi")

# Recognize text WITH Confidence Score (new behavior)
text, conf_score = ocr_system.recognise("test_images/cropped_word.jpg", "hindi", return_confidence=True)
print(f"Recognized: {text} | Certainty: {conf_score * 100:.2f}%")

📚 Related Datasets & Citations

Bharat Scene Text Dataset - BSTD

🎉 Our paper has been officially accepted in IJDAR (International Journal on Document Analysis and Recognition)!

If you use IndicPhotoOCR in your research, please cite us:

@misc{ipo,
  author = {Anik De et al.},
  title      = {{I}ndic{P}hoto{O}CR: A comprehensive toolkit for {I}ndian language scene text understanding},
  howpublished = {\url{https://github.com/Bhashini-IITJ/IndicPhotoOCR/}},
  year         = 2024,
}

🤝 Project Contributors


Anik De - Tech Lead & Main Contributor
Abhirama, Aditya Rathore, Harshiv Shah, Sagar Agarwal, Rajeev Yadav, Pravin Kumar
Anand Mishra - Project Investigator

🙏 Acknowledgements

Text Recognition: PARseq
Text Detection: TextBPN++ Original Repository
EAST Re-implementation: EAST Repository
National Language Translation Mission: Bhashini

📬 Contact us

For any queries, please contact us at:

Anik De

