Commit 8de3d16

Massive refactoring with a huge number of features, yes, that description doesn't help much.
1 parent a3c4a76 commit 8de3d16

166 files changed

Lines changed: 111027 additions & 73148 deletions

README.md

Lines changed: 27 additions & 6 deletions
@@ -33,13 +33,18 @@ BigOcrPDF is a powerful, all-in-one OCR application that adds searchable text la
 Manage your documents before and after OCR — no need for a separate tool.

 - **Drag-and-drop page reordering** with thumbnail previews
-- **Rotate pages** left or right in 90° increments
+- **Rotate & flip pages** left, right, horizontal, and vertical
 - **Delete pages** you don't need
 - **Merge files** — combine pages from multiple PDFs and images into one document
 - **Create PDFs from images** — import JPEG, PNG, TIFF, WebP, RAW photos, and more
 - **EXIF-aware import** — automatically applies correct orientation from camera metadata
-- **Zoom control** — 50% to 200% thumbnail scaling
+- **Zoom control** — 50% to 200% thumbnail scaling with keyboard shortcuts
 - **Select pages for OCR** — choose exactly which pages to process
+- **Context menu** — right-click any page to save as image or PDF
+- **Compress PDF** — reduce file size with configurable quality and DPI
+- **Split PDF** — by page count or target file size
+- **Undo support** — revert page operations with Ctrl+Z
+- **Window size persistence** — remembers your preferred dimensions

 ### OCR Engine

@@ -65,7 +70,8 @@ Automatically clean up scans and photos before OCR for maximum accuracy.
 - **Illumination normalization** — even out uneven lighting
 - **Scanner effect** — LAB-space background normalization
 - **Denoising** — bilateral filter and Non-Local Means
-- **All toggles individually controllable** from the settings page
+- **Enhance embedded images** — apply corrections to images inside mixed-content pages
+- **All toggles individually controllable** from educational settings dialogs with visual illustrations

 ### Export Options

@@ -76,6 +82,7 @@ Get your text out in the format you need.
 | **Searchable PDF** | Original pages with invisible OCR text layer |
 | **PDF/A-2b** | ISO archival standard with metadata injection (preserves original images) |
 | **Custom Quality PDF** | Choose JPEG quality: 30%, 50%, 70%, 85%, or 95% |
+| **Black & White (JBIG2)** | Pure black-and-white output using JBIG2 — the most compact format for text-only documents |
 | **Plain Text (.txt)** | Extracted text from all pages |
 | **ODF/ODT** ⚠️ | 4 modes: formatted + images, images + simple text, formatted text only, or plain text *(experimental — formatting quality may vary)* |

@@ -95,7 +102,8 @@ Extract text from anything on your screen.
 
 Handle large workloads efficiently.

-- **Multi-file queue** — add files via drag-and-drop or file chooser
+- **Multi-file queue** — add files via drag-and-drop or file chooser, with grid and list views
+- **File information** — right-click any file to view PDF metadata, fonts, images, and attachments
 - **Checkpoint/resume** — interrupted sessions automatically resume on next launch
 - **Processing history** — tracks file sizes, page counts, processing time, and success/failure
 - **Cancel anytime** with clean cleanup
@@ -172,10 +180,14 @@ Press **Print Screen** → select a region → export to **Extract text from ima
 
 - **GTK4 + Libadwaita** — clean, modern design following GNOME Human Interface Guidelines
 - **Multi-page wizard** — Settings → Processing → Results
+- **Educational dialogs** — image corrections, output, and advanced settings with SVG illustrations explaining each option
+- **Grid / List view toggle** — switch between compact grid and detailed list in the file queue
+- **Context menus** — right-click files in the queue or pages in the editor for quick actions
 - **Toast notifications** — non-intrusive status feedback
 - **Before/After comparison** — track file size changes after OCR
-- **Window size persistence** — remembers your preferred dimensions
-- **28 UI languages** — Bulgarian, Chinese, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Croatian, Hungarian, Icelandic, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian
+- **Window size persistence** — remembers your preferred dimensions for all windows
+- **Keyboard shortcuts** — comprehensive shortcuts for all major actions
+- **28 UI languages** — Bulgarian, Chinese, Czech, Croatian, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian

 ---

@@ -219,6 +231,15 @@ graph TD
 
 ---

+## Quality & Testing
+
+- **311 automated tests** covering OCR pipeline, PDF operations, export, preprocessing, editor logic, and utilities
+- **100% i18n coverage** — all 28 languages fully translated (604 strings each)
+- **Ruff-enforced** code style and linting
+- **WCAG 2.1 Level AA** accessibility considerations
+
+---
+
 ## License

 [GPL-3.0-or-later](LICENSE)
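
The README change above lists a "Split PDF" feature that works "by page count or target file size". The commit page does not show how BigOcrPDF implements it, so the following is only a minimal sketch of the page-count case. The function name `split_by_page_count` and its parameters are hypothetical and are not taken from the project.

```python
def split_by_page_count(total_pages: int, pages_per_part: int) -> list[range]:
    """Return 0-based page ranges, each covering at most pages_per_part pages.

    Hypothetical helper for illustration; not BigOcrPDF's actual code.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if pages_per_part < 1:
        raise ValueError("pages_per_part must be at least 1")
    # Step through the document in fixed-size strides; the last part may be shorter.
    return [
        range(start, min(start + pages_per_part, total_pages))
        for start in range(0, total_pages, pages_per_part)
    ]


if __name__ == "__main__":
    # A 5-page document split into parts of 2 pages each.
    for part in split_by_page_count(5, 2):
        print(list(part))
```

Each returned range could then be handed to whatever PDF library writes the output files. Splitting by target file size would need the per-page byte sizes and is not covered here.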

default.nix

Lines changed: 36 additions & 22 deletions
@@ -5,61 +5,75 @@
   pkg-config,
   wrapGAppsHook4,
   gobject-introspection,
-  tesseract,
-  ocrmypdf,
   poppler_utils,
+  ghostscript,
+  fribidi,
+  jbig2enc ? null,
 }:

 python3Packages.buildPythonApplication {
   pname = "bigocrpdf";
-  version = "2.0.0";
+  version = "3.0.0";

   src = ./.;

   pyproject = true;

   build-system = with python3Packages; [ setuptools wheel ];
-
+
   dependencies = with python3Packages; [
     pygobject3
     pycairo
-    ocrmypdf
+    rapidocr
+    pikepdf
+    reportlab
+    opencv4
+    pillow
+    numpy
+    scipy
+    odfpy
   ];

   nativeBuildInputs = [
     pkg-config
     wrapGAppsHook4
     gobject-introspection
   ];
-
+
   buildInputs = [
     gtk4
     libadwaita
-    tesseract
-    ocrmypdf
-    poppler_utils # For pdfinfo
-  ];
+    poppler_utils
+    ghostscript
+    fribidi
+  ] ++ (if jbig2enc != null then [ jbig2enc ] else []);

   postInstall = ''
-    # Install desktop file and icons
+    # Install desktop files
     mkdir -p $out/share/applications
-    mkdir -p $out/share/icons/hicolor/scalable/apps
-
-    cp $src/bigocrpdf/usr/share/applications/*.desktop $out/share/applications/ || true
-    cp -r $src/bigocrpdf/usr/share/icons/hicolor/* $out/share/icons/hicolor/ || true
-
-    # Install KDE service menu
+    cp $src/usr/share/applications/*.desktop $out/share/applications/ || true
+
+    # Install icons
+    mkdir -p $out/share/icons
+    cp -r $src/usr/share/icons/* $out/share/icons/ || true
+
+    # Install service menus
     mkdir -p $out/share/kio/servicemenus
-    cp $src/bigocrpdf/usr/share/kio/servicemenus/*.desktop $out/share/kio/servicemenus/ || true
-
+    cp $src/usr/share/kio/servicemenus/*.desktop $out/share/kio/servicemenus/ || true
+
     # Install locale files
-    cp -r $src/bigocrpdf/usr/share/locale $out/share/ || true
+    mkdir -p $out/share/locale
+    cp -r $src/usr/share/locale/* $out/share/locale/ || true
+
+    # Install bin wrappers
+    mkdir -p $out/bin
+    cp $src/usr/bin/* $out/bin/ || true
   '';

   meta = {
-    description = "Add OCR to your PDF documents to make them searchable";
+    description = "OCR toolkit for Linux — searchable PDFs, image OCR, PDF editor";
     homepage = "https://github.com/biglinux/bigocrpdf";
-    license = "GPL-3.0";
+    license = "GPL-3.0-or-later";
     mainProgram = "bigocrpdf";
   };
 }
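
Because `default.nix` now declares `jbig2enc ? null` and only adds it to `buildInputs` when it is non-null, a build without it is possible, and the application would presumably need to check for the encoder at runtime. The commit page does not show that check, so this is a rough sketch of one way to do it. It assumes the encoder is exposed as an executable named `jbig2` on `PATH`, and the function name `jbig2_available` is hypothetical.

```python
import shutil
from typing import Callable, Optional


def jbig2_available(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Report whether a JBIG2 encoder executable can be found.

    Assumption: the encoder binary is called "jbig2". The lookup function is
    injectable so the check can be exercised without touching the real PATH.
    """
    return which("jbig2") is not None


if __name__ == "__main__":
    if jbig2_available():
        print("JBIG2 encoder found: Black & White export can be offered")
    else:
        print("JBIG2 encoder missing: hide or disable Black & White export")
```

Passing the lookup function as a parameter keeps the check easy to test, which fits the pytest-based setup this commit adds to `flake.nix`.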

flake.nix

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,8 +25,8 @@
2525
packages = with pkgs; [
2626
python3
2727
python3Packages.pip
28+
python3Packages.pytest
2829
ruff
29-
tesseract
3030
];
3131
};
3232
}
