An ebook format converter — convert various ebook formats to EPUB.

| Source Format | Extension | Method | Layout Preservation |
|---|---|---|---|
| MOBI (KF8) | .mobi | Extract embedded EPUB | Near lossless |
| AZW3 | .azw3, .azw | Extract embedded EPUB | Near lossless |
| PDF (text-based) | .pdf | Page-by-page extraction & rebuild | Good |

Requires Python 3.12+. uv is recommended:

    git clone <repo-url>
    cd ebook_convert
    uv sync

Or with pip:

    pip install -e .

Basic usage:

    # Output to the same directory with .epub extension
    ebook-convert book.mobi
    ebook-convert book.azw3
    ebook-convert document.pdf

    # Specify output path
    ebook-convert book.mobi -o ~/Books/output.epub

    # Convert all supported files in a directory
    ebook-convert ./my-books/

If installed with uv, run the command through `uv run`:

    uv run ebook-convert book.mobi

The converter uses KindleUnpack to unpack Kindle files. KF8 files (most AZW3 and newer MOBI files) contain a full EPUB structure internally, so CSS styles, images, and fonts are fully preserved.

For legacy MOBI files that only contain HTML, the converter rebuilds the EPUB from extracted HTML/CSS/images, preserving original styles and chapter structure.
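
The two Kindle paths described here (embedded EPUB versus HTML rebuild) could be told apart by looking at what the unpacker produces. The following is a minimal sketch, not the project's actual code: `classify_unpacked` and `unpack` are hypothetical names, and it assumes the `mobi` package's `extract` function returns a temporary directory plus the path of the main extracted file, as that package's README describes.

```python
from pathlib import Path


def classify_unpacked(extracted_path):
    """Decide how to build the EPUB from the file the unpacker produced.

    Hypothetical helper: the action names are illustrative only.
    """
    suffix = Path(extracted_path).suffix.lower()
    if suffix == ".epub":
        return "copy"       # KF8: an embedded EPUB, copy it to the output path
    if suffix in {".html", ".htm"}:
        return "rebuild"    # legacy MOBI: rebuild the EPUB from HTML/CSS/images
    return "unsupported"    # anything else (for example an embedded PDF)


def unpack(input_path):
    """Unpack a Kindle file and report which path applies."""
    import mobi  # third-party package, imported lazily so the sketch loads without it

    temp_dir, extracted_path = mobi.extract(str(input_path))
    return temp_dir, extracted_path, classify_unpacked(extracted_path)
```

Keeping the classification separate from the unpacking step makes the decision easy to test without a real Kindle file.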
PDF is a fixed-layout format. Converting to reflowable EPUB involves:
- Text content: Fully extracted with bold, italic, and other styles preserved
- Heading detection: Automatically inferred from font size statistics (text set larger than the body size is treated as a heading)
- Image positioning: Sorted by page coordinates and inserted between corresponding text blocks, preserving relative text-image relationships
- Paragraph layout: 2em text indent, justified alignment, 1.8 line height
- Fonts: Prefers CJK serif fonts (Noto Serif CJK, Source Han Serif, etc.)
Note: Text cannot be extracted from scanned (image-only) PDFs. OCR is not currently supported.
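
The heading detection and image positioning rules listed above can be illustrated with a small, self-contained sketch. It is not the converter's implementation: `TextBlock`, `ImageBlock`, `body_font_size`, `rebuild_page`, and the `tolerance` parameter are all hypothetical, and it assumes each block already carries a font size and top-left page coordinates with `y` growing downward.

```python
from collections import Counter
from dataclasses import dataclass
from html import escape


@dataclass
class TextBlock:
    text: str
    font_size: float  # dominant font size of the block, in points
    x: float
    y: float          # top-left corner; y is assumed to grow downward


@dataclass
class ImageBlock:
    name: str
    x: float
    y: float


def body_font_size(text_blocks):
    """Return the font size that covers the most characters (0.0 if no text)."""
    weights = Counter()
    for block in text_blocks:
        weights[round(block.font_size, 1)] += len(block.text)
    if not weights:
        return 0.0
    return weights.most_common(1)[0][0]


def rebuild_page(text_blocks, image_blocks, tolerance=0.5):
    """Order blocks top to bottom and tag each as heading, paragraph, or image."""
    body = body_font_size(text_blocks)
    ordered = sorted(
        list(text_blocks) + list(image_blocks),
        key=lambda block: (block.y, block.x),
    )
    elements = []
    for block in ordered:
        if isinstance(block, ImageBlock):
            elements.append(f'<img src="{escape(block.name, quote=True)}"/>')
        elif block.font_size > body + tolerance:
            elements.append(f"<h2>{escape(block.text)}</h2>")
        else:
            elements.append(f"<p>{escape(block.text)}</p>")
    return elements


page = rebuild_page(
    [
        TextBlock("Chapter 1", 18.0, 72, 80),
        TextBlock("It was a dark night.", 11.0, 72, 120),
        TextBlock("The rain fell.", 11.0, 72, 400),
    ],
    [ImageBlock("fig1.png", 72, 200)],
)
print(page)
```

Weighting each font size by character count keeps a page with many short headings from skewing the body size estimate; a real converter would likely gather these statistics across the whole document instead of a single page.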

    src/ebook_convert/
    ├── cli.py            # CLI entry point (click)
    ├── converter.py      # Conversion dispatcher
    └── converters/
        ├── base.py       # Base converter class
        ├── mobi.py       # MOBI → EPUB
        ├── azw3.py       # AZW3 → EPUB
        └── pdf.py        # PDF → EPUB

Subclass BaseConverter and implement the convert method:

    from ebook_convert.converters.base import BaseConverter


    class TxtConverter(BaseConverter):
        supported_extensions = [".txt"]

        def convert(self, input_path, output_path):
            # conversion logic
            ...

Then register it in the _CONVERTERS list in converter.py.
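
To show how a registered converter might be picked up, here is a rough sketch of an extension-based dispatcher. The real `converter.py` may work differently: `find_converter` is a hypothetical name, the `BaseConverter` below is a stand-in so the example runs on its own, and only `_CONVERTERS`, `supported_extensions`, and `convert` come from the text above.

```python
from pathlib import Path


class BaseConverter:
    """Stand-in for ebook_convert.converters.base.BaseConverter."""

    supported_extensions = []

    def convert(self, input_path, output_path):
        raise NotImplementedError


class TxtConverter(BaseConverter):
    supported_extensions = [".txt"]

    def convert(self, input_path, output_path):
        # Placeholder only: a real converter would write an EPUB here.
        return f"{input_path} -> {output_path}"


# Registry of converter classes, checked in order.
_CONVERTERS = [TxtConverter]


def find_converter(input_path):
    """Return an instance of the first converter that handles the file extension."""
    suffix = Path(input_path).suffix.lower()
    for converter_cls in _CONVERTERS:
        if suffix in converter_cls.supported_extensions:
            return converter_cls()
    raise ValueError(f"Unsupported format: {suffix or input_path}")


converter = find_converter("notes.txt")
print(converter.convert("notes.txt", "notes.epub"))
```

With a registry like this, adding a format only touches the new converter module and the list, not the CLI.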
- click — CLI framework
- mobi — Kindle format unpacking (based on KindleUnpack)
- ebooklib — EPUB read/write
- PyMuPDF — PDF parsing
License: MIT