Draft
Collaborator
@r-dh I marked this as a draft, since I understood from your description that it is not yet ready to be merged.
This PR is a work in progress and is not to be merged as is.
Implementing Mistral requires a number of architectural decisions. I want to make them explicit here and invite discussion.
document_processor: MistralOCRConfig | None is a concrete type. If a second processor (for example Azure Document Intelligence) is added later, this becomes a growing union type and the dispatcher becomes an if/elif chain. A Protocol with a process(doc_path) -> str method would be more extensible. I decided the concrete type is acceptable for now with a single implementation, but it could accumulate tech debt if not addressed later.
I'm currently discarding the image data entirely. Users get text descriptions but lose the ability to embed or reference the original images. I think this makes sense as a default for RAG, since we only process text, but right now it is a silent, non-configurable choice.
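To make the Protocol alternative concrete, here is a minimal sketch of what it could look like. Only the process(doc_path) -> str method comes from the description above; the names DocumentProcessor, MistralOCRProcessor, AzureDocumentIntelligenceProcessor and extract_text are hypothetical, the processor bodies are placeholders rather than real API calls, and typing doc_path as a Path is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class DocumentProcessor(Protocol):
    """Anything that can turn a document on disk into plain text."""

    def process(self, doc_path: Path) -> str: ...


class MistralOCRProcessor:
    """Hypothetical stand-in; a real version would call the Mistral OCR API."""

    def process(self, doc_path: Path) -> str:
        return f"[mistral-ocr] text of {doc_path.name}"


class AzureDocumentIntelligenceProcessor:
    """Hypothetical second backend, added without touching the dispatcher."""

    def process(self, doc_path: Path) -> str:
        return f"[azure-di] text of {doc_path.name}"


def extract_text(doc_path: Path, processor: DocumentProcessor | None = None) -> str:
    """Dispatch on the protocol instead of an if/elif chain over config types."""
    if processor is None:
        # Assumed fallback for this sketch: treat the file as plain text.
        return doc_path.read_text(encoding="utf-8")
    return processor.process(doc_path)
```

With this shape, the field would be typed as DocumentProcessor | None, and adding a backend means adding a class rather than widening a union and extending the dispatcher.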
Batch inference is not implemented yet; adding it could greatly reduce costs for large projects. The API call also has no retry logic.
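As a starting point for the missing retry logic, the following is a rough sketch of a generic retry wrapper with exponential backoff and jitter. It is not tied to the Mistral client: call_with_retry and TransientAPIError are hypothetical names, and which real errors should count as transient (rate limits, timeouts, 5xx responses) is an assumption left open here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limits, 5xx)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientAPIError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The OCR request would be wrapped in a zero-argument callable and passed as fn. The injectable sleep parameter exists so tests can run without actually waiting.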
Additionally, the tests for Mistral are currently weak and need to be improved before this can be merged.