Commit 49dbc2a

docs: rewrite README with comprehensive feature showcase
- Remove Portuguese section (English-only for GitHub)
- Add detailed PDF editor features (merge, reorder, rotate, delete, create from images)
- Add complete OCR engine capabilities (130+ languages, 4 precision levels)
- Add image preprocessing pipeline details (6-mode perspective correction, deskew, dewarp)
- Add export format table (PDF, PDF/A, custom quality, text, ODF with 4 modes)
- Add screen capture and batch processing sections
- Add dependency table and architecture overview
- Add badges (license, Python, GTK4)
1 parent 18a18db commit 49dbc2a

1 file changed

Lines changed: 174 additions & 82 deletions

README.md

@@ -1,48 +1,108 @@
1+
<div align="center">
2+
13
# BigOcrPDF
24

3-
Add OCR to your PDF documents to make them searchable — powered by **RapidOCR PP-OCRv5**.
4-
Modern GTK4 + Libadwaita interface for BigLinux and Arch-based distributions.
5+
**The complete OCR toolkit for Linux — turn scanned PDFs and images into searchable, editable documents.**
6+
7+
[![License: GPL-3.0](https://img.shields.io/badge/License-GPL%203.0-blue.svg)](LICENSE)
8+
[![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-3776AB.svg)](https://python.org)
9+
[![GTK4 + Libadwaita](https://img.shields.io/badge/GTK4-Libadwaita-4A86CF.svg)](https://gnome.org)
10+
11+
</div>
12+
13+
---
14+
15+
BigOcrPDF is a powerful, all-in-one OCR application that adds searchable text layers to scanned PDFs, extracts text from images, and provides a full-featured PDF editor — all from a modern, native Linux interface.
16+
17+
## Why BigOcrPDF?
18+
19+
- **AI-Powered OCR** — Uses **RapidOCR PP-OCRv5** with OpenVINO hardware acceleration for fast, accurate text recognition across **130+ languages**
20+
- **Edit, Merge & Organize PDFs** — Reorder pages, rotate, delete, and combine multiple PDFs and images into a single document
21+
- **Smart Preprocessing** — Automatic perspective correction, deskew, dewarping, and illumination normalization — even photos of documents come out clean
22+
- **Multiple Export Formats** — Searchable PDF, PDF/A-2b archival, plain text, and ODF/ODT with layout-aware formatting
23+
- **Screen Capture OCR** — Select any region on screen and instantly extract text
24+
- **Batch Processing** — Process dozens of files at once with checkpoint/resume support
25+
- **File Manager Integration** — Right-click any PDF or image to OCR it directly
26+
27+
---
28+
29+
## Key Features
530

6-
## Features
31+
### PDF Editor
32+
33+
Manage your documents before and after OCR — no need for a separate tool.
34+
35+
- **Drag-and-drop page reordering** with thumbnail previews
36+
- **Rotate pages** left or right in 90° increments
37+
- **Delete pages** you don't need
38+
- **Merge files** — combine pages from multiple PDFs and images into one document
39+
- **Create PDFs from images** — import JPEG, PNG, TIFF, WebP, RAW photos, and more
40+
- **EXIF-aware import** — automatically applies correct orientation from camera metadata
41+
- **Zoom control** — 50% to 200% thumbnail scaling
42+
- **Select pages for OCR** — choose exactly which pages to process
743

844
### OCR Engine
9-
- **RapidOCR PP-OCRv5** AI models for state-of-the-art text recognition
10-
- **27 languages** including Latin, Chinese, Japanese, Korean, Arabic, Cyrillic, Devanagari, and more
11-
- **Parallel processing** with multi-core CPU utilization for batch jobs
12-
- **BiDi text support** via `fribidi` for right-to-left scripts (Arabic, Hebrew)
13-
14-
### Image Processing
15-
- **Auto deskew** — automatic rotation correction for scanned pages
16-
- **Orientation detection** — auto-detect and fix 90°/180°/270° rotations
17-
- **Perspective correction** — straighten photographed documents
18-
- **Quality preservation** — auto-detect original JPEG quality to avoid recompression
19-
20-
### Output Formats
21-
- **PDF with OCR layer** — searchable PDF preserving the original layout
22-
- **PDF/A-2b** — archival format with JPEG 2000 compression
23-
- **Text export** — auto-save extracted text to `.txt` files
24-
- **ODF export** — export to LibreOffice/OpenDocument format
25-
26-
### Image OCR Mode (bigocrimage)
27-
- **Screen capture** — select a region to extract text instantly
28-
- **Image file OCR** — open any image and extract text
29-
- **Drag and drop** support
30-
31-
### User Interface
32-
- **GTK4 + Libadwaita** — clean, accessible design following GNOME HIG
33-
- **Adw.StatusPage** welcome and loading screens
34-
- **Toast notifications** for non-intrusive feedback
35-
- **Before/After comparison** — track file size changes
36-
- **Processing history** — view statistics of processed files
37-
- **20+ languages** for the UI (translations via gettext)
38-
39-
## System Requirements
40-
41-
- **Python** 3.10+
42-
- **GTK4** and **Libadwaita**
43-
- **poppler-utils** (`pdfimages`, `pdftoppm`, `pdfinfo`) for PDF image extraction
44-
- **ghostscript** — PDF/A-2b conversion
45-
- **fribidi** — BiDi text reordering for Arabic/Hebrew OCR
45+
46+
State-of-the-art text recognition powered by deep learning.
47+
48+
- **RapidOCR PP-OCRv5** models with OpenVINO inference (ONNX fallback)
49+
- **130+ languages** across 12 script families: Latin, Chinese, Japanese, Korean, Arabic, Cyrillic, Greek, Devanagari, Tamil, Telugu, Thai, and more
50+
- **4 precision levels** — from fast to very precise, tunable per job
51+
- **Parallel processing** — multi-core batch OCR with automatic worker scaling
52+
- **Invisible text layer** — preserves original page appearance while adding searchable text
53+
- **Smart detection** — auto-identifies image-only vs. mixed-content PDFs
54+
- **Re-OCR support** — replace existing text layers with improved recognition
55+
- **Right-to-left text** — full BiDi support for Arabic and Hebrew via `fribidi`
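
The "automatic worker scaling" bullet above could be approximated as follows. This is only a sketch of one plausible policy, not the project's actual logic: the function name `pick_worker_count` and the `reserve` parameter are hypothetical and do not come from the BigOcrPDF source.

```python
import os
from typing import Optional


def pick_worker_count(num_pages: int, cpu_count: Optional[int] = None, reserve: int = 1) -> int:
    """Return how many OCR workers to start for a batch (hypothetical policy).

    Assumption: keep `reserve` cores free for the GUI, and never start
    more workers than there are pages to process.
    """
    if num_pages <= 0:
        return 0
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    usable = max(1, cpus - reserve)  # always allow at least one worker
    return min(usable, num_pages)
```

Capping the count at the number of pages avoids spawning idle workers for small jobs, while the reserved core keeps the interface responsive during large ones.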
56+
57+
### Image Preprocessing
58+
59+
Automatically clean up scans and photos before OCR for maximum accuracy.
60+
61+
- **Perspective correction** — 6-mode cascade that straightens photographed documents
62+
- **Auto deskew** — fixes tilted scans using morphological analysis + Hough transform
63+
- **Baseline dewarp** — per-line polynomial fitting to flatten curved text
64+
- **Orientation detection** — auto-correct 90°/180°/270° rotations
65+
- **Illumination normalization** — even out uneven lighting
66+
- **Scanner effect** — LAB-space background normalization
67+
- **Denoising** — bilateral filter and Non-Local Means
68+
- **All toggles individually controllable** from the settings page
69+
70+
### Export Options
71+
72+
Get your text out in the format you need.
73+
74+
| Format | Description |
75+
|--------|-------------|
76+
| **Searchable PDF** | Original pages with invisible OCR text layer |
77+
| **PDF/A-2b** | ISO archival standard with JPEG 2000 compression |
78+
| **Custom Quality PDF** | Choose JPEG quality: 30%, 50%, 70%, 85%, or 95% |
79+
| **Plain Text (.txt)** | Extracted text from all pages |
80+
| **ODF/ODT** | 4 modes: formatted + images, images + simple text, formatted text only, or plain text |
81+
82+
ODF export includes **layout analysis**: automatic paragraph/heading detection, table detection, image embedding, and proper page breaks.
83+
84+
### Screen Capture & Image OCR
85+
86+
Extract text from anything on your screen.
87+
88+
- **Region capture** — select an area and get the text instantly
89+
- **Works with**: Spectacle (KDE), GNOME Screenshot, Flameshot
90+
- **Open any image** — JPEG, PNG, WebP, TIFF, RAW formats (CR2, DNG, NEF, ARW, and more)
91+
- **Copy to clipboard** with one click
92+
- **Standalone mode** — run `bigocrimage` for a dedicated image OCR window
93+
94+
### Batch Processing & Session Management
95+
96+
Handle large workloads efficiently.
97+
98+
- **Multi-file queue** — add files via drag-and-drop or file chooser
99+
- **Checkpoint/resume** — interrupted sessions automatically resume on next launch
100+
- **Processing history** — tracks file sizes, page counts, processing time, and success/failure
101+
- **Cancel anytime** with clean cleanup
102+
- **Auto-split output** — configurable maximum file size (10MB–100MB)
103+
- **Results page** with per-file statistics, text viewer, and export actions
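
As an illustration of how checkpoint/resume might work, the sketch below persists a file queue to JSON and reloads it on startup. The function names and the JSON layout are assumptions made for this example; the README does not describe how `checkpoint_manager.py` stores its state.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, pending: list, done: list) -> None:
    """Write the queue state atomically (hypothetical on-disk format)."""
    payload = {"pending": pending, "done": done}
    # Write to a temporary file in the same directory, then rename it over
    # the target, so an interrupted write never leaves a truncated checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> dict:
    """Return the saved queue state, or an empty state if none exists."""
    if not path.exists():
        return {"pending": [], "done": []}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

On the next launch, anything still listed under `pending` would be re-queued, and files under `done` would be skipped.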
104+
105+
---
46106

47107
## Installation
48108

@@ -60,13 +120,31 @@ cd bigocrpdf
60120
pip install -e .
61121
```
62122

123+
#### Dependencies
124+
125+
| Package | Purpose |
126+
|---------|---------|
127+
| `python >= 3.10` | Runtime |
128+
| `gtk4`, `libadwaita` | User interface |
129+
| `python-rapidocr-pp-ocrv5` | OCR engine |
130+
| `python-rapidocr-openvino` | Hardware-accelerated inference |
131+
| `poppler-utils` | PDF image extraction (`pdfimages`, `pdftoppm`, `pdfinfo`) |
132+
| `ghostscript` | PDF/A-2b conversion |
133+
| `python-opencv` | Image preprocessing |
134+
| `python-numpy` | Array operations |
135+
| `python-pillow` | Image format support |
136+
| `python-odfpy` | ODF/ODT export |
137+
| `fribidi` | BiDi text reordering (Arabic, Hebrew) |
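
A quick way to check that the external command-line tools from the table are installed is sketched below. It assumes the Ghostscript executable is named `gs`, as on most Linux distributions; the helper name `find_missing_tools` is hypothetical and not part of BigOcrPDF.

```python
import shutil

# Executables named in the dependency table; "gs" is assumed to be the
# Ghostscript binary name on the target distribution.
REQUIRED_TOOLS = ("pdfimages", "pdftoppm", "pdfinfo", "gs")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal callers can ignore it.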
138+
139+
---
140+
63141
## Usage
64142

65-
### GUI Application
143+
### GUI
66144

67145
```bash
68-
bigocrpdf # Start the main PDF OCR interface
69-
bigocrimage # Start the Image OCR window
146+
bigocrpdf # PDF OCR interface
147+
bigocrimage # Image OCR window
70148
```
71149

72150
### Command Line
@@ -75,58 +153,72 @@ bigocrimage # Start the Image OCR window
75153
bigocrpdf [OPTIONS] [FILES...]
76154
77155
Options:
78-
-v, --version Print version information and exit
79-
-d, --debug Enable debug mode
80-
--verbose Enable verbose output
81-
--image-mode Start in image OCR mode
82-
FILES PDF or image files to process
156+
-v, --version Show version and exit
157+
-d, --debug Enable debug logging
158+
--verbose Verbose output
159+
--image-mode Launch in image OCR mode
160+
FILES PDF or image files to open
83161
```
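
For readers who want to see how such an option set maps onto code, here is a minimal `argparse` sketch that mirrors the options listed above. It is an illustration only: the real `bigocrpdf` entry point may parse its arguments differently, and the version string is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="bigocrpdf")
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (version placeholder)",
                        help="Show version and exit")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Enable debug logging")
    parser.add_argument("--verbose", action="store_true",
                        help="Verbose output")
    parser.add_argument("--image-mode", action="store_true",
                        help="Launch in image OCR mode")
    parser.add_argument("files", nargs="*", metavar="FILES",
                        help="PDF or image files to open")
    return parser
```

With this sketch, `build_parser().parse_args(["--image-mode", "scan.png"])` yields a namespace with `image_mode` set to `True` and `files` equal to `["scan.png"]`.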
84162

85-
### Context Menu Integration
163+
### File Manager Integration
86164

87-
Right-click on PDF files in your file manager and select **OCR PDF**.
88-
Right-click on image files and select **Extract text from image (OCR)**.
165+
- **Right-click a PDF** → *Recognize text in scanned PDF (OCR)*
166+
- **Right-click an image** → *Extract text from image (OCR)*
167+
- **KDE Dolphin** context menu integration included
89168

90-
### Screen Capture OCR
169+
### Screen Capture
91170

92-
Press **Print Screen**, select a region, then export to **Extract text from image (OCR)**.
171+
Press **Print Screen** → select a region → export to **Extract text from image (OCR)**.
93172

94-
## Project Structure
173+
---
95174

96-
```
97-
src/bigocrpdf/
98-
├── application.py # Adw.Application entry point
99-
├── window.py # Main PDF OCR window
100-
├── config.py # Constants and configuration
101-
├── services/ # Business logic (OCR, capture, export)
102-
│ ├── processor.py # OCR engine interface
103-
│ ├── screen_capture.py # Screen capture + image OCR
104-
│ ├── export_service.py # PDF/text/ODF export
105-
│ └── rapidocr_service/ # RapidOCR PP-OCRv5 integration
106-
├── ui/ # Presentation layer (GTK4 widgets)
107-
│ ├── image_ocr_window.py # Standalone image OCR window
108-
│ ├── settings_page.py # Settings page
109-
│ └── pdf_editor/ # PDF page editor
110-
└── utils/ # Pure Python helpers
111-
├── i18n.py # Internationalization
112-
├── odf_exporter.py # ODF document generation
113-
└── pdf_utils.py # PDF manipulation utilities
114-
```
175+
## Interface
115176

116-
## License
177+
### UI Highlights
117178

118-
GPL-3.0-or-later
179+
- **GTK4 + Libadwaita** — clean, modern design following GNOME Human Interface Guidelines
180+
- **Multi-page wizard** — Settings → Processing → Results
181+
- **Toast notifications** — non-intrusive status feedback
182+
- **Before/After comparison** — track file size changes after OCR
183+
- **Window size persistence** — remembers your preferred dimensions
184+
- **28 UI languages** — Bulgarian, Chinese, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Croatian, Hungarian, Icelandic, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian
119185

120186
---
121187

122-
# PT-BR
123-
124-
OCR for PDF and image files, integrated into the system.
188+
## Architecture
125189

126-
Scanned PDF files cannot be searched, and their text cannot be copied. On BigLinux, just right-click the file and choose the OCR option; a new file with these capabilities will be created.
190+
```
191+
src/bigocrpdf/
192+
├── application.py # Adw.Application entry point
193+
├── window.py # Main PDF OCR window
194+
├── config.py # Constants and configuration
195+
├── services/
196+
│ ├── processor.py # OCR engine interface
197+
│ ├── screen_capture.py # Screen capture + image OCR
198+
│ ├── export_service.py # PDF/text/ODF export
199+
│ ├── contour_analysis.py # Document contour detection
200+
│ ├── perspective_correction.py
201+
│ └── rapidocr_service/ # RapidOCR PP-OCRv5 integration
202+
│ ├── engine.py # Singleton OCR engine
203+
│ ├── ocr_worker.py # Subprocess OCR worker
204+
│ ├── preprocessor.py # Image preprocessing pipeline
205+
│ ├── rotation.py # Orientation detection
206+
│ └── ...
207+
├── ui/
208+
│ ├── image_ocr_window.py # Standalone image OCR
209+
│ ├── settings_page.py # OCR settings
210+
│ ├── conclusion_page.py # Results & export
211+
│ ├── pdf_editor/ # PDF page editor
212+
│ └── ...
213+
└── utils/
214+
├── odf_exporter.py # ODF document generation
215+
├── layout_analyzer.py # Document structure detection
216+
├── checkpoint_manager.py # Session resume support
217+
└── ...
218+
```
127219

128-
To process several PDF files, select them all and use the OCR option once.
220+
---
129221

130-
You can also extract text from an image file: right-click it and choose **"Extract text from image (OCR)"**.
222+
## License
131223

132-
You can also use it directly from the screenshot tool: press **Print Screen**, use the **"Rectangular Region"** tool, select the region containing the text, then click **"Export"** → **"Extract text from image (OCR)"**.
224+
[GPL-3.0-or-later](LICENSE)
