This document describes how Papyrix handles images in EPUB content.
EPUB HTML → ChapterHtmlSlimParser → ImageConverter → BMP Cache → GfxRenderer
- HTML Parsing: Detects <img> tags, extracts src and alt attributes
- Data URI Stripping: Removes embedded base64 images before XML parsing (prevents OOM)
- Image Extraction: Extracts image from EPUB ZIP to temp file
- Conversion: Converts JPEG/PNG to BMP format
- Caching: Stores converted BMP on SD card
- Rendering: Displays image centered on page
- JPEG (.jpg, .jpeg) — Baseline only (see below)
- PNG (.png) — Transparency rendered as opaque
- BMP (.bmp) — Direct display, no conversion needed
Format detection is case-insensitive.
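A minimal sketch of case-insensitive detection by file extension. The `ImageFormat` enum and `detectImageFormat` function are hypothetical names used for illustration; they are not taken from the Papyrix source.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum; the real type names in Papyrix may differ.
enum class ImageFormat { Jpeg, Png, Bmp, Unsupported };

// Classifies an image path by its extension, ignoring letter case.
ImageFormat detectImageFormat(const std::string& path) {
  const std::size_t dot = path.find_last_of('.');
  if (dot == std::string::npos) return ImageFormat::Unsupported;

  // Lowercase the extension so "JPG", "Jpg", and "jpg" compare equal.
  std::string ext = path.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  if (ext == "jpg" || ext == "jpeg") return ImageFormat::Jpeg;
  if (ext == "png") return ImageFormat::Png;
  if (ext == "bmp") return ImageFormat::Bmp;
  return ImageFormat::Unsupported;
}
```

With this sketch, `detectImageFormat("OEBPS/images/Cover.JPG")` yields `ImageFormat::Jpeg`, while a `.gif` or extensionless path yields `ImageFormat::Unsupported`.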
The picojpeg decoder supports:
- Baseline DCT (SOF0) — Standard single-pass JPEG
- Extended sequential DCT (SOF1) — Extended baseline
Not supported (displays as placeholder):
- Progressive DCT (SOF2) — Multi-pass progressive JPEG
- Arithmetic coding (SOF9, SOF10) — Rarely used
Progressive JPEGs are detected by scanning for SOF markers before decoding. If detected, the image is skipped and shows a placeholder instead.
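The SOF scan can be sketched as a walk over the JPEG segment headers that stops at the first start-of-frame marker. This is an illustration only: `JpegKind` and `classifyJpeg` are hypothetical names, and the sketch works on an in-memory buffer, whereas the firmware may read the header from a file.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
enum class JpegKind { Supported, Unsupported, Invalid };

// Walks JPEG segments and classifies the file by its first SOF marker.
JpegKind classifyJpeg(const uint8_t* data, std::size_t size) {
  // Every JPEG starts with the SOI marker FF D8.
  if (size < 4 || data[0] != 0xFF || data[1] != 0xD8) return JpegKind::Invalid;

  std::size_t pos = 2;
  while (pos + 4 <= size) {
    if (data[pos] != 0xFF) return JpegKind::Invalid;
    const uint8_t marker = data[pos + 1];

    if (marker == 0xFF) { ++pos; continue; }  // fill byte before a marker

    // SOF0 (baseline) and SOF1 (extended sequential) are decodable.
    if (marker == 0xC0 || marker == 0xC1) return JpegKind::Supported;

    // Remaining SOF markers cover progressive, lossless, and arithmetic modes.
    // C4 (DHT), C8 (reserved), and CC (DAC) are not frame markers.
    const bool isOtherSof = marker >= 0xC2 && marker <= 0xCF &&
                            marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    if (isOtherSof) return JpegKind::Unsupported;

    // Reaching scan data or end-of-image without a SOF means a broken file.
    if (marker == 0xDA || marker == 0xD9) return JpegKind::Invalid;

    // Standalone markers (TEM, RST0..RST7) carry no length field.
    if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) { pos += 2; continue; }

    // All other segments: 2-byte big-endian length that includes itself.
    const std::size_t len =
        (static_cast<std::size_t>(data[pos + 2]) << 8) | data[pos + 3];
    if (len < 2) return JpegKind::Invalid;
    pos += 2 + len;
  }
  return JpegKind::Invalid;  // ran out of data before finding a SOF marker
}
```

Because the walk stops at the first SOF marker, it only touches header bytes and never decodes pixel data.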
To convert progressive JPEGs to baseline, use tools like ImageMagick:
    convert progressive.jpg -interlace none baseline.jpg

- Max parse width: 2048px — Memory limit during decoding
- Max parse height: 3072px — Memory limit during decoding
- Max render height: viewport — Images taller than half the viewport get a dedicated page
- Min dimension: 20px — Images <20px in width or height are skipped as decorative
- Min free heap: 8KB — Parsing aborts if free heap drops below this
Images exceeding viewport width are scaled down proportionally while maintaining aspect ratio.
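The minimum-dimension check and the proportional downscale can be sketched as follows. `Size`, `isDecorative`, and `fitToWidth` are hypothetical names, and integer truncation is an assumption; the actual rounding used by the renderer is not documented here.

```cpp
#include <cstdint>

// Hypothetical helper type for this sketch.
struct Size {
  uint32_t width;
  uint32_t height;
};

constexpr uint32_t kMinDimension = 20;  // pixels, from the limits above

// True for images that would be skipped as decorative.
bool isDecorative(Size image) {
  return image.width < kMinDimension || image.height < kMinDimension;
}

// Scales an image down to the viewport width, preserving aspect ratio.
// Images that already fit are returned unchanged (no upscaling).
Size fitToWidth(Size image, uint32_t viewportWidth) {
  if (image.width == 0 || image.width <= viewportWidth) return image;

  // 64-bit intermediate avoids overflow in height * viewportWidth.
  uint64_t scaledHeight =
      static_cast<uint64_t>(image.height) * viewportWidth / image.width;
  if (scaledHeight == 0) scaledHeight = 1;  // keep at least one row
  return Size{viewportWidth, static_cast<uint32_t>(scaledHeight)};
}
```

For example, assuming a 480px-wide viewport, a 960x1200 image becomes 480x600, while a 300x400 image is left as it is.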
Images are rendered when all conditions are met:
- showImages setting is enabled
- Source path is valid and non-empty
- Source is not a data URI
- Format is supported (JPEG/PNG/BMP)
- File exists in EPUB archive
- Conversion succeeds
- Sufficient memory available (≥8KB free)
- Fewer than 3 consecutive failures in current chapter
- Unsupported format — Not JPEG/PNG/BMP (e.g. GIF, SVG, WebP, TIFF). Detected by file extension before any processing.
- Tiny decorative images — Width or height <20px (e.g. 1px-tall JPEG line separators, small spacer PNGs, decorative borders). These are invisible on e-paper and would only waste vertical space.
- showImages disabled — User preference
- Empty/malformed source — Invalid HTML
- Data URI source — Memory protection (see below)
- Progressive/arithmetic JPEG — picojpeg limitation
- File not found — Missing from EPUB archive
- Conversion failure — Corrupt file or I/O error
- Insufficient memory — <8KB free heap
- Failure rate limit — ≥3 consecutive failures
Some EPUBs embed images as base64 data URIs:
    <img src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD..." />

These can be 1MB+ of text and cause out-of-memory crashes during XML parsing. The expat XML parser must allocate memory to store the entire attribute value.
The DataUriStripper pre-processes HTML buffers before the XML parser sees them:
- Scans for src="data: patterns (case-insensitive, handles single/double quotes)
- Replaces the data URI with src="#" in-place
- Handles patterns that span buffer boundaries (streaming-safe)
This prevents memory allocation for embedded image data while preserving the document structure.
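A simplified sketch of the in-place replacement, assuming the whole document fits in one buffer. It does not handle patterns that span buffer boundaries, which the real DataUriStripper does; `stripDataUris` and `matchesAt` are hypothetical names and not the actual interface.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive comparison of buf[pos..] against a lowercase literal.
static bool matchesAt(const std::string& buf, std::size_t pos,
                      const char* lowerLiteral) {
  for (std::size_t i = 0; lowerLiteral[i] != '\0'; ++i) {
    if (pos + i >= buf.size()) return false;
    if (std::tolower(static_cast<unsigned char>(buf[pos + i])) != lowerLiteral[i]) {
      return false;
    }
  }
  return true;
}

// Rewrites every src="data:..." or src='data:...' value to "#" in place.
// Returns the number of data URIs stripped.
std::size_t stripDataUris(std::string& buf) {
  std::size_t read = 0, write = 0, stripped = 0;
  while (read < buf.size()) {
    if (matchesAt(buf, read, "src=") && read + 4 < buf.size()) {
      const char quote = buf[read + 4];
      if ((quote == '"' || quote == '\'') && matchesAt(buf, read + 5, "data:")) {
        const std::size_t close = buf.find(quote, read + 5);
        if (close != std::string::npos) {
          // Keep `src=` and the opening quote, then emit `#` and the closing quote.
          for (std::size_t i = 0; i < 5; ++i) buf[write++] = buf[read + i];
          buf[write++] = '#';
          buf[write++] = quote;
          read = close + 1;  // skip the whole base64 payload
          ++stripped;
          continue;
        }
      }
    }
    buf[write++] = buf[read++];  // ordinary byte: compact toward the front
  }
  buf.resize(write);
  return stripped;
}
```

The replacement is always shorter than the original value, so the write index never passes the read index and no second buffer is needed.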
- lib/Epub/Epub/parsers/DataUriStripper.h — Header with interface
- lib/Epub/Epub/parsers/DataUriStripper.cpp — Implementation
Images are cached to SD card under /.papyrix/epub_<hash>/images/:
.papyrix/
└── epub_12345678/
    └── images/
        ├── a1b2c3d4.bmp      # Converted image
        ├── e5f6a7b8.bmp      # Another converted image
        └── c9d0e1f2.failed   # Failed conversion marker
Cache filenames use FNV-1a hash of the resolved image path:
- Input: Full path within EPUB (e.g., OEBPS/images/cover.jpg)
- Output: 8-character hex hash (e.g., a1b2c3d4.bmp)
This ensures:
- Same image referenced multiple times is cached once
- No path character escaping needed
- Fixed-length filenames
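A sketch of the filename derivation, assuming the 32-bit FNV-1a variant (an 8-character hex name implies a 32-bit hash). `fnv1a32` and `cacheFileName` are hypothetical names for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Standard 32-bit FNV-1a: XOR each byte into the hash, then multiply by the prime.
uint32_t fnv1a32(const std::string& s) {
  uint32_t hash = 2166136261u;  // FNV offset basis
  for (unsigned char c : s) {
    hash ^= c;
    hash *= 16777619u;  // FNV prime
  }
  return hash;
}

// Builds the cache filename: 8 lowercase hex digits plus the .bmp extension.
std::string cacheFileName(const std::string& resolvedPath) {
  char name[16];
  std::snprintf(name, sizeof(name), "%08x.bmp",
                static_cast<unsigned>(fnv1a32(resolvedPath)));
  return name;
}
```

The `%08x` format pads with leading zeros, so every filename is exactly 12 characters long regardless of the hash value.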
When image conversion fails, a .failed marker file is created:
- Prevents re-attempting conversion on subsequent loads
- Contains no data (empty file)
- Cleared when book cache is cleared
To prevent a corrupt EPUB from causing excessive delays, image processing implements failure rate limiting:
- Threshold: 3 consecutive failures
- Scope: Per chapter (resets when moving to new spine item)
- Behavior: After threshold reached, remaining images in chapter display as placeholders
This ensures that a few corrupt images don't prevent reading the rest of the chapter.
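The rate limiting described above can be sketched as a small counter. `ImageFailureLimiter` is a hypothetical class, and resetting the count on a successful conversion is an assumption drawn from the word "consecutive".

```cpp
#include <cstdint>

// Hypothetical per-chapter failure tracker.
class ImageFailureLimiter {
 public:
  static constexpr uint8_t kMaxConsecutiveFailures = 3;

  // Call when moving to a new spine item.
  void onChapterStart() { consecutiveFailures_ = 0; }

  // Assumption: a successful conversion breaks the failure streak.
  void onSuccess() { consecutiveFailures_ = 0; }

  // Saturates at the threshold so the counter cannot overflow.
  void onFailure() {
    if (consecutiveFailures_ < kMaxConsecutiveFailures) ++consecutiveFailures_;
  }

  // True once the threshold is reached; remaining images become placeholders.
  bool shouldSkipImages() const {
    return consecutiveFailures_ >= kMaxConsecutiveFailures;
  }

 private:
  uint8_t consecutiveFailures_ = 0;
};
```

A caller would check `shouldSkipImages()` before each conversion attempt and report the outcome through `onSuccess()` or `onFailure()`.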
Before processing each image:
- Check heap_caps_get_largest_free_block(MALLOC_CAP_8BIT)
- If < 8KB, skip image and show placeholder
- Log warning for diagnostics
Image extraction uses a temporary file on SD card:
- Extract from ZIP to temp file
- Convert temp file to BMP
- Delete temp file
- Cache BMP result
This avoids holding the entire source image in RAM.
Settings > Display > Show Images
- On (default): Images are rendered inline
- Off: All images display as [Image: alt-text] placeholders
Disabling images:
- Reduces memory usage
- Speeds up page rendering
- Useful for text-heavy reading
- Check that Settings > Display > Show Images is enabled
- Verify image format is JPEG/PNG/BMP
- Check SD card has free space for cache
- Try clearing book cache (Settings > Cleanup > Clear Book Cache)
- First load converts images (slower)
- Subsequent loads use cache (faster)
- Consider disabling images for faster reading
- Large images may exceed available RAM
- Try a different EPUB with smaller images
- Use xteink-epub-optimizer to resize images