cosmolei/ebook_convert

ebook-convert

An ebook format converter — convert various ebook formats to EPUB.

Chinese documentation (中文文档)

Supported Formats

Source Format      Extension(s)   Method                              Layout Preservation
MOBI (KF8)         .mobi          Extract embedded EPUB               Near lossless
AZW3               .azw3, .azw    Extract embedded EPUB               Near lossless
PDF (text-based)   .pdf           Page-by-page extraction & rebuild   Good

Installation

Requires Python 3.12+. uv is recommended:

git clone <repo-url>
cd ebook_convert

uv sync

Or with pip:

pip install -e .

Usage

Single File

# Output to the same directory with .epub extension
ebook-convert book.mobi
ebook-convert book.azw3
ebook-convert document.pdf

# Specify output path
ebook-convert book.mobi -o ~/Books/output.epub

Batch Convert

# Convert all supported files in a directory
ebook-convert ./my-books/
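The default output location and the batch mode can be pictured with a small sketch. It assumes a non-recursive directory scan (whether the real CLI descends into subfolders is not stated here), and `SUPPORTED_EXTENSIONS`, `default_output`, and `collect_inputs` are hypothetical names, not the project's code.

```python
from pathlib import Path

# Extensions from the Supported Formats table (hypothetical constant name).
SUPPORTED_EXTENSIONS = {".mobi", ".azw3", ".azw", ".pdf"}


def default_output(input_path: Path) -> Path:
    """Same directory and file stem, with the extension swapped for .epub."""
    return input_path.with_suffix(".epub")


def collect_inputs(directory: Path) -> list[Path]:
    """Supported files directly inside `directory` (no recursion assumed), sorted."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `for src in collect_inputs(Path("./my-books")): print(src, "->", default_output(src))` lists each file that a batch run would pick up alongside its target EPUB path. Matching on the lower-cased suffix means `BOOK.PDF` is picked up as well.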

Run with uv (no install needed)

uv run ebook-convert book.mobi

Conversion Details

MOBI / AZW3

Uses KindleUnpack (via the mobi package) to unpack Kindle files. KF8-format files (most AZW3 files and newer MOBI files) contain a complete EPUB structure internally, so CSS styles, images, and fonts are fully preserved.

For legacy MOBI files that contain only HTML, the converter rebuilds the EPUB from the extracted HTML, CSS, and images, preserving the original styles and chapter structure.
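As a rough illustration of this two-way split: as we understand the `mobi` package's README, `mobi.extract(path)` returns a temporary directory plus the path of the main extracted file, which is an EPUB for KF8 books and HTML (or PDF) for older ones. The sketch below branches on that result. `classify_extracted`, `convert_kindle`, and the `rebuild` callback are hypothetical names, not the project's actual functions.

```python
import shutil
from collections.abc import Callable
from pathlib import Path


def classify_extracted(filepath: str) -> str:
    """Map the file reported by mobi.extract() to a next step (hypothetical helper)."""
    suffix = Path(filepath).suffix.lower()
    if suffix == ".epub":
        return "copy"       # KF8: an EPUB is already embedded, copy it out as-is
    if suffix in {".html", ".htm"}:
        return "rebuild"    # legacy MOBI: build an EPUB from extracted HTML/CSS/images
    return "unsupported"    # e.g. a PDF payload


def convert_kindle(
    input_path: str,
    output_path: str,
    rebuild: Callable[[Path, str], None],
) -> None:
    """Unpack a Kindle file and write an EPUB to output_path.

    `rebuild(html_dir, output_path)` is a hypothetical callback that assembles
    an EPUB from the legacy HTML output.
    """
    import mobi  # the `mobi` package (based on KindleUnpack); imported lazily

    tempdir, filepath = mobi.extract(input_path)
    try:
        action = classify_extracted(filepath)
        if action == "copy":
            shutil.copyfile(filepath, output_path)
        elif action == "rebuild":
            rebuild(Path(filepath).parent, output_path)
        else:
            raise ValueError(f"Unsupported Kindle payload: {filepath}")
    finally:
        # mobi.extract() unpacks into a temporary directory; clean it up.
        shutil.rmtree(tempdir, ignore_errors=True)
```

Keeping the decision in a pure function makes the KF8-versus-legacy branching easy to test without real Kindle files.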

PDF

PDF is a fixed-layout format. Converting to reflowable EPUB involves:

  • Text content: Fully extracted, with bold, italic, and other styles preserved
  • Heading detection: Inferred automatically from font size statistics (text set larger than the body font size is treated as a heading)
  • Image positioning: Images are sorted by page coordinates and inserted between the neighboring text blocks, so text and images keep their relative order
  • Paragraph layout: 2em text indent, justified alignment, 1.8 line height
  • Fonts: Prefers CJK serif fonts (Noto Serif CJK, Source Han Serif, etc.)
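The heading and image-placement rules above can be sketched on a simplified page model. This is an illustration under assumptions, not the project's code: `Block`, `body_font_size`, `page_to_xhtml`, and the 1.15 heading ratio are hypothetical, and "body size" is taken to mean the font size that covers the most characters. In a PyMuPDF-based converter the positions, sizes, and text would typically come from `page.get_text("dict")`, whose blocks carry a `bbox` and whose text spans carry a `size`.

```python
import html
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical, simplified page block (not a PyMuPDF type)."""
    kind: str          # "text" or "image"
    x0: float          # left edge in page coordinates
    y0: float          # top edge in page coordinates (y grows downward)
    text: str = ""     # text content, for text blocks
    size: float = 0.0  # font size in points, for text blocks
    src: str = ""      # image file name inside the EPUB, for image blocks


def body_font_size(blocks: list[Block]) -> float:
    """Return the font size that covers the most characters (taken as body text)."""
    chars_per_size: Counter[float] = Counter()
    for b in blocks:
        if b.kind == "text":
            chars_per_size[round(b.size, 1)] += len(b.text)
    return chars_per_size.most_common(1)[0][0] if chars_per_size else 0.0


def page_to_xhtml(blocks: list[Block], heading_ratio: float = 1.15) -> str:
    """Emit headings, paragraphs, and images in top-to-bottom reading order."""
    body = body_font_size(blocks)
    parts = []
    # Sort by vertical, then horizontal position so each image lands
    # between the text blocks printed above and below it.
    for b in sorted(blocks, key=lambda blk: (blk.y0, blk.x0)):
        if b.kind == "image":
            parts.append(f'<img src="{html.escape(b.src)}" alt=""/>')
        elif body and b.size > body * heading_ratio:
            # Noticeably larger than body text: treat as a heading.
            parts.append(f"<h2>{html.escape(b.text)}</h2>")
        else:
            parts.append(f"<p>{html.escape(b.text)}</p>")
    return "\n".join(parts)
```

Sorting on `(y0, x0)` is the simplest possible reading order; multi-column pages would need the blocks grouped by column first. Bold and italic spans, and the paragraph CSS listed above, are left out of this sketch.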

Note: Text cannot be extracted from scanned (image-only) PDFs; OCR is not currently supported.
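Since scanned PDFs yield no text, a converter might want to detect them up front and warn the user. A minimal heuristic, assuming that a page from which PyMuPDF's `page.get_text("text")` returns almost nothing is image-only; the 20-character threshold and the helper names `looks_image_only` and `is_probably_scanned` are hypothetical, not part of the project.

```python
def looks_image_only(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """True if every page yields fewer than `min_chars_per_page` non-whitespace characters."""
    if not page_texts:
        return False
    return all(len("".join(t.split())) < min_chars_per_page for t in page_texts)


def is_probably_scanned(pdf_path: str, min_chars_per_page: int = 20) -> bool:
    """Open the PDF with PyMuPDF and apply the heuristic above."""
    import pymupdf  # recent PyMuPDF releases; older ones are imported as `fitz`

    with pymupdf.open(pdf_path) as doc:
        page_texts = [page.get_text("text") for page in doc]
    return looks_image_only(page_texts, min_chars_per_page)
```

Requiring every page to be nearly empty avoids flagging text PDFs that merely contain a few full-page illustrations.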

Project Structure

src/ebook_convert/
├── cli.py              # CLI entry point (click)
├── converter.py        # Conversion dispatcher
└── converters/
    ├── base.py         # Base converter class
    ├── mobi.py         # MOBI → EPUB
    ├── azw3.py         # AZW3 → EPUB
    └── pdf.py          # PDF → EPUB
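A minimal sketch of how the dispatcher in `converter.py` might choose a converter by file extension. It assumes `_CONVERTERS` (the list mentioned under Adding New Formats) holds converter classes and that matching uses the lower-cased suffix. The stand-in `BaseConverter`, the class names `MobiConverter`, `Azw3Converter`, and `PdfConverter`, and `get_converter` are hypothetical; the extensions come from the Supported Formats table.

```python
from pathlib import Path


class BaseConverter:
    """Stand-in for ebook_convert.converters.base.BaseConverter."""
    supported_extensions: list[str] = []

    def convert(self, input_path: Path, output_path: Path) -> None:
        raise NotImplementedError


# Hypothetical class names; extensions taken from the Supported Formats table.
class MobiConverter(BaseConverter):
    supported_extensions = [".mobi"]


class Azw3Converter(BaseConverter):
    supported_extensions = [".azw3", ".azw"]


class PdfConverter(BaseConverter):
    supported_extensions = [".pdf"]


_CONVERTERS: list[type[BaseConverter]] = [MobiConverter, Azw3Converter, PdfConverter]


def get_converter(input_path: Path) -> BaseConverter:
    """Instantiate the first registered converter that claims the file's extension."""
    suffix = input_path.suffix.lower()
    for converter_cls in _CONVERTERS:
        if suffix in converter_cls.supported_extensions:
            return converter_cls()
    raise ValueError(f"Unsupported format: {input_path.name}")
```

With a first-match loop, the order of `_CONVERTERS` decides which converter wins if two ever claim the same extension.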

Adding New Formats

Subclass BaseConverter and implement the convert method:

from ebook_convert.converters.base import BaseConverter

class TxtConverter(BaseConverter):
    supported_extensions = [".txt"]

    def convert(self, input_path, output_path):
        # Read input_path and write the resulting EPUB to output_path.
        ...

Then register it in the _CONVERTERS list in converter.py.
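For illustration, the `TxtConverter` stub above could be fleshed out with ebooklib roughly as follows. This is a hypothetical sketch, not a format the project supports: it assumes UTF-8 input, treats blank-line-separated blocks as paragraphs, puts the whole file into a single chapter, and hard-codes the language as `en`. The fallback `BaseConverter` and the helper `paragraphs_to_xhtml` exist only so the snippet is self-contained.

```python
import html
import uuid
from pathlib import Path

try:
    from ebook_convert.converters.base import BaseConverter
except ImportError:
    # Fallback stand-in so the sketch runs outside the project.
    class BaseConverter:
        supported_extensions: list[str] = []


def paragraphs_to_xhtml(text: str) -> str:
    """Turn blank-line-separated blocks of text into escaped <p> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


class TxtConverter(BaseConverter):
    supported_extensions = [".txt"]

    def convert(self, input_path, output_path):
        from ebooklib import epub  # imported lazily; only needed when converting

        src = Path(input_path)
        book = epub.EpubBook()
        book.set_identifier(f"urn:uuid:{uuid.uuid4()}")
        book.set_title(src.stem)
        book.set_language("en")  # assumption: a real converter might detect this

        # The whole file becomes one chapter (assumption: no chapter detection).
        chapter = epub.EpubHtml(title=src.stem, file_name="text.xhtml", lang="en")
        body = paragraphs_to_xhtml(src.read_text(encoding="utf-8"))
        chapter.content = f"<h1>{html.escape(src.stem)}</h1>\n{body}"
        book.add_item(chapter)

        # Navigation documents plus reading order.
        book.toc = [chapter]
        book.add_item(epub.EpubNcx())
        book.add_item(epub.EpubNav())
        book.spine = ["nav", chapter]
        epub.write_epub(str(output_path), book, {})
```

Escaping every paragraph with `html.escape` keeps characters such as `&` and `<` in plain text from breaking the XHTML.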

Dependencies

  • click — CLI framework
  • mobi — Kindle format unpacking (based on KindleUnpack)
  • ebooklib — EPUB read/write
  • PyMuPDF — PDF parsing

License

MIT
