BigOcrPDF is a powerful, all-in-one OCR application that adds searchable text layers to scanned PDFs, extracts text from images, and provides a full-featured PDF editor — all from a modern, native Linux interface.
+
+## Why BigOcrPDF?
+
+- **AI-Powered OCR** — Uses **RapidOCR PP-OCRv5** with OpenVINO hardware acceleration for fast, accurate text recognition across **130+ languages**
+- **Edit, Merge & Organize PDFs** — Reorder pages, rotate, delete, and combine multiple PDFs and images into a single document
+- **Smart Preprocessing** — Automatic perspective correction, deskew, dewarping, and illumination normalization — even photos of documents come out clean
+- **Multiple Export Formats** — Searchable PDF, PDF/A-2b archival, plain text, and ODF/ODT with layout-aware formatting
+- **Screen Capture OCR** — Select any region on screen and instantly extract text
+- **Batch Processing** — Process dozens of files at once with checkpoint/resume support
+- **File Manager Integration** — Right-click any PDF or image to OCR it directly
+
+---
+
+## Key Features

-## Features
+### PDF Editor
+
+Manage your documents before and after OCR — no need for a separate tool.
+
+- **Drag-and-drop page reordering** with thumbnail previews
+- **Rotate pages** left or right in 90° increments
+- **Delete pages** you don't need
+- **Merge files** — combine pages from multiple PDFs and images into one document
+- **Create PDFs from images** — import JPEG, PNG, TIFF, WebP, RAW photos, and more
+- **EXIF-aware import** — automatically applies correct orientation from camera metadata
+- **Zoom control** — 50% to 200% thumbnail scaling
+- **Select pages for OCR** — choose exactly which pages to process

### OCR Engine
-- **RapidOCR PP-OCRv5** AI models for state-of-the-art text recognition
-- **27 languages** including Latin, Chinese, Japanese, Korean, Arabic, Cyrillic, Devanagari, and more
-- **Parallel processing** with multi-core CPU utilization for batch jobs
-- **BiDi text support** via `fribidi` for right-to-left scripts (Arabic, Hebrew)
-
-### Image Processing
-- **Auto deskew** — automatic rotation correction for scanned pages
-- **Orientation detection** — auto-detect and fix 90°/180°/270° rotations
OCR for PDF and image files, integrated into the system.
+## Architecture

-Scanned PDF files cannot be searched, and their text cannot be copied. In BigLinux, just right-click the file and use the OCR option; a new file with these capabilities will be created.
├── checkpoint_manager.py # Session resume support
+└── ...
+```

-If you need to run the procedure on several PDF files, select them all and use the OCR option once.
+---

-You can also extract the text from an image file: right-click it and use the option **"Extrair texto da imagem (OCR)"** ("Extract text from image (OCR)").
+## License

-You can also use it directly from the screenshot tool: press **Print Screen**, use the **"Região Retangular"** ("Rectangular Region") tool, select the region containing the text, then click **"Exportar"** → **"Extrair o texto da imagem (OCR)"** ("Export" → "Extract the text from the image (OCR)").