Add Mistral Document AI for document parsing by r-dh · Pull Request #175 · superlinear-ai/raglite

r-dh · 2026-01-23T15:34:37Z

This PR is a work in progress and should not be merged as is.

Implementing Mistral involves several architectural decisions, which I want to make explicit and open for discussion.

document_processor: MistralOCRConfig | None is a concrete type. If a second processor (Azure Document Intelligence) is added later, this becomes a growing union type and the dispatcher becomes an if/elif chain. A Protocol with a process(doc_path) -> str method would be more extensible. I decided the concrete type is acceptable for now with one implementation, but it could accumulate tech debt if not addressed later.
I'm currently discarding the image data entirely. Users get text descriptions but lose the ability to embed or reference the original images. I think this makes sense as a default for RAG, since we only process text, but it is currently a silent, non-configurable choice.
Batch Inference is not implemented yet, although it could greatly reduce costs for large projects. The API call also has no retry logic.
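
To make the first and third points concrete, here is a minimal sketch of what a Protocol-based processor could look like, with retries layered on top as a wrapper. Everything in it is hypothetical and not part of this PR or of raglite: the names DocumentProcessor, PlainTextProcessor, RetryingProcessor and parse_document, the use of Path for doc_path, and the backoff parameters are all assumptions for illustration. A real Mistral processor would implement the same process method and could be wrapped the same way.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class DocumentProcessor(Protocol):
    """Hypothetical interface: any object with process(doc_path) -> str qualifies."""

    def process(self, doc_path: Path) -> str: ...


@dataclass
class PlainTextProcessor:
    """Stand-in processor for illustration only; reads the file as UTF-8 text."""

    def process(self, doc_path: Path) -> str:
        return doc_path.read_text(encoding="utf-8")


@dataclass
class RetryingProcessor:
    """Wraps any processor and retries failed calls with exponential backoff."""

    inner: DocumentProcessor
    max_attempts: int = 3
    base_delay: float = 0.5  # seconds; doubled after each failed attempt
    sleep: Callable[[float], None] = time.sleep  # injectable so tests need not wait

    def process(self, doc_path: Path) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return self.inner.process(doc_path)
            except Exception:
                # Re-raise on the final attempt, otherwise back off and retry.
                if attempt == self.max_attempts:
                    raise
                self.sleep(self.base_delay * 2 ** (attempt - 1))
        raise ValueError("max_attempts must be at least 1")


def parse_document(doc_path: Path, processor: DocumentProcessor | None = None) -> str:
    """Dispatch to whichever processor is configured; no if/elif chain needed."""
    processor = processor or PlainTextProcessor()
    return processor.process(doc_path)
```

With this shape, adding a second backend means adding one class rather than widening a union type, and the retry policy stays independent of any specific API client. A production version would likely catch only transient error types instead of Exception.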

Additionally, the tests for Mistral are currently weak and need to be improved before this can be merged.

emilradix · 2026-01-26T15:31:49Z

@r-dh I marked this as a draft, since I understood from your description that it is not yet ready to be merged.

r-dh force-pushed the rd-mistral-ocr branch from 3656755 to b8b91c9 Compare January 23, 2026 15:44

emilradix marked this pull request as draft January 26, 2026 15:31

r-dh added 2 commits January 27, 2026 11:01

feat: support Mistral Document AI for document processing

5aeab61

chore: clean up code

cd86253

r-dh force-pushed the rd-mistral-ocr branch from b8b91c9 to cd86253 Compare January 27, 2026 10:02

r-dh added 4 commits February 2, 2026 10:03

feat: expose Mistral model explicitly, clarify dependency on import e…

2e35ff1

…rror

feat: add logging for unsupported file types in document processing

8978754

refactor: streamline MistralOCR tests

1a95596

chore: specify type for metadata to satisfy mypy

3d48ccf
