
Add PaddleOCR-VL-1.5 to the model registry #24

Closed
davanstrien wants to merge 1 commit into main from feat/add-paddleocr-vl-1-5


@davanstrien

Owner

What

Adds PaddleOCR-VL-1.5 (PaddlePaddle/PaddleOCR-VL-1.5, 0.9B) to MODEL_REGISTRY, completing the PaddleOCR-VL pair alongside 1.6 (added in #21).

Why

Follow-up to #21, which added 1.6. Unlike 1.6 (Qwen3.5/flashinfer → needs the prebuilt vllm/vllm-openai image on a100), the 1.5 script uses transformers batch inference (no vLLM/flashinfer), so it's a standard drop-in: default uv-script image on l4x1. #21 itself flagged 1.5 as an easy drop-in follow-up.

Changes

  • run.py — register paddleocr-vl-1.5 as a standard model (l4x1, no image/python/env). Script default --task-mode is ocr (markdown), directly comparable to the other OCR models.
  • test_run.py — registry count 9→10; guard that 1.5 is standard (l4x1, no image-mode kwargs).
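As a rough illustration of the two changes above, here is a minimal sketch of what a standard registry entry and its guard test might look like. The `ModelConfig` dataclass, its field names, the `is_image_mode` property, and the `check_standard` helper are assumptions made for illustration; the actual schema in run.py and the assertions in test_run.py may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical registry entry; field names are assumptions, not run.py's real schema."""

    model_id: str
    flavor: str = "l4x1"          # default hardware for standard models
    image: Optional[str] = None   # set only for image-mode models (e.g. 1.6)
    python: Optional[str] = None  # interpreter override, image-mode only
    env: dict = field(default_factory=dict)

    @property
    def is_image_mode(self) -> bool:
        # A model is image-mode when it needs a prebuilt container image.
        return self.image is not None


# Stand-in for MODEL_REGISTRY with only the entry this PR adds.
MODEL_REGISTRY = {
    "paddleocr-vl-1.5": ModelConfig(model_id="PaddlePaddle/PaddleOCR-VL-1.5"),
}


def check_standard(name: str) -> ModelConfig:
    """Guard: the named model must run on l4x1 with no image-mode kwargs."""
    cfg = MODEL_REGISTRY[name]
    assert cfg.flavor == "l4x1"
    assert cfg.image is None and cfg.python is None and not cfg.env
    return cfg


cfg = check_standard("paddleocr-vl-1.5")
```

Leaving the image-mode fields at their defaults is what makes the entry "standard" in this sketch, so the guard only needs to check that none of them were set.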

Verification

  • uv run ruff check src/ tests/ — clean
  • uv run --extra viewer pytest tests/ -q → 253 passed
  • ocr-bench run … --list-models → paddleocr-vl-1.5 shows l4x1 (not image-mode)
  • ocr-bench run … --models paddleocr-vl-1.5 paddleocr-vl-1.6 --dry-run → 1.5 standard image, 1.6 image-mode
  • Smoke test on HF Jobs against davanstrien/encyclopaedia-britannica-1771 → results added as a comment below

🤖 Generated with Claude Code

1.5 uses transformers batch inference (no vLLM/flashinfer), so it runs on
the default uv-script image on l4x1 — no image-mode config needed, unlike 1.6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@davanstrien

Owner Author

Closing — smoke-tested on HF Jobs against davanstrien/encyclopaedia-britannica-1771 and PaddleOCR-VL-1.5 errors on every page:

[OCR ERROR: 'PaddleOCRVLImageProcessor' object has no attribute 'min_pixels']

The paddleocr-vl-1.5.py uv-script (uv-scripts/ocr) reads processor.image_processor.min_pixels, which transformers ≥5.0 removed (the script pins transformers>=5.0.0). The registry wiring in this PR is correct, but the underlying model script is non-functional on the current image, so adding it would only record OCR errors.
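One possible script-side workaround, sketched below, is to read the pixel bounds defensively rather than assuming the attribute exists. This is not the upstream fix. The helper name `get_pixel_bounds` and the default values are placeholders, and the fallback to a `size` dict with `shortest_edge`/`longest_edge` keys is an assumption borrowed from Qwen2-VL-style processors that has not been verified for `PaddleOCRVLImageProcessor`. The example uses plain stand-in objects so it runs without transformers installed.

```python
from types import SimpleNamespace

# Placeholder defaults for illustration only; not values from the model card.
DEFAULT_MIN_PIXELS = 256 * 28 * 28
DEFAULT_MAX_PIXELS = 1280 * 28 * 28


def get_pixel_bounds(image_processor):
    """Return (min_pixels, max_pixels) without assuming either attribute exists."""
    min_pixels = getattr(image_processor, "min_pixels", None)
    max_pixels = getattr(image_processor, "max_pixels", None)

    # Assumed fallback: some processors keep the bounds in a `size` mapping.
    size = getattr(image_processor, "size", None)
    if not isinstance(size, dict):
        size = {}

    if min_pixels is None:
        min_pixels = size.get("shortest_edge", DEFAULT_MIN_PIXELS)
    if max_pixels is None:
        max_pixels = size.get("longest_edge", DEFAULT_MAX_PIXELS)
    return min_pixels, max_pixels


# Stand-ins for an old-style processor, a new-style processor, and one with neither.
old_style = SimpleNamespace(min_pixels=100, max_pixels=200)
new_style = SimpleNamespace(size={"shortest_edge": 300, "longest_edge": 400})
bare = SimpleNamespace()

bounds_old = get_pixel_bounds(old_style)
bounds_new = get_pixel_bounds(new_style)
bounds_bare = get_pixel_bounds(bare)
```

Using `getattr` with a default avoids the `AttributeError` seen in the smoke test, but whether the fallback values produce sensible OCR output would still need checking against the real processor.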

PaddleOCR-VL-1.6 (merged in #21) is verified working on Jobs and covers PaddleOCR in the bench. Can revisit 1.5 if/when the script is fixed upstream.

@davanstrien davanstrien deleted the feat/add-paddleocr-vl-1-5 branch June 26, 2026 14:57