Refactor codebase to support multimodal inputs (image, video, audio) #109
Open
Schema:
- documents: add media_type ENUM and mime_type, make document_text nullable
- chat_share_documents: same nullable/media_type changes
- Add message_attachments table for per-message media files

Backend:
- db.py: add_document now accepts media_type/mime_type; add add_message_attachment and get_message_attachments helpers
- documents/handler.py: detect MIME type of each uploaded file and route images/video/audio to binary-safe path (skipping Tika text extraction); text documents use the existing Tika pipeline
- agents/config.py: add OPENAI/ANTHROPIC vision model names, ENABLE_MULTIMODAL flag, and size limits for each media type
- agents/reactive_agent.py: _initialize_llm accepts a `vision` flag; process_query_stream accepts optional media_attachments list and switches to the vision-capable model when attachments are present
- app.py: extract _parse_message_request() to handle both JSON and multipart/form-data bodies; pass media_attachments through to agent

Frontend:
- FileUpload.js: extend default accepted types to include images, video, and audio; add getMediaCategory helper; show image thumbnails via object-URL previews; revoke URLs on remove; category badge in file list
- RequestConfig.js: add postFormData() for multipart uploads (omits Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
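For reviewers, the MIME-based routing described for documents/handler.py could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `classify_upload` and `should_skip_text_extraction` are hypothetical names, the `"document"` category label is an assumed value for the media_type ENUM, and detection here relies on the file extension via the standard-library `mimetypes` module, whereas the actual handler may inspect file contents instead.

```python
import mimetypes
from typing import Tuple

# Top-level MIME types that bypass text extraction (assumed to mirror the
# image/video/audio split described in this PR).
BINARY_MEDIA_TYPES = ("image", "video", "audio")


def classify_upload(filename: str) -> Tuple[str, str]:
    """Return (media_type, mime_type) for an uploaded file name.

    Hypothetical helper: media_type is "image", "video", "audio", or
    "document" (an assumed label for everything else).
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown extension: fall back to a generic binary MIME type.
        mime_type = "application/octet-stream"
    top_level = mime_type.split("/", 1)[0]
    media_type = top_level if top_level in BINARY_MEDIA_TYPES else "document"
    return media_type, mime_type


def should_skip_text_extraction(filename: str) -> bool:
    """True when the file should take the binary-safe path, not Tika."""
    media_type, _mime_type = classify_upload(filename)
    return media_type in BINARY_MEDIA_TYPES
```

With a helper like this, the upload loop would call `should_skip_text_extraction(name)` once per file and send only the remaining files through the existing Tika pipeline.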
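The model switch described for agents/config.py and agents/reactive_agent.py can be sketched in a similar way. Everything named here is an assumption made for illustration: `MultimodalConfig`, `select_model`, and `within_size_limit` are hypothetical, the model names are placeholders, and the byte limits are invented values rather than the limits this PR actually sets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MultimodalConfig:
    """Hypothetical stand-in for the settings added to agents/config.py."""
    enable_multimodal: bool = True
    text_model: str = "example-text-model"      # placeholder name
    vision_model: str = "example-vision-model"  # placeholder name
    # Placeholder per-media-type size limits in bytes (invented values).
    max_bytes: Dict[str, int] = field(default_factory=lambda: {
        "image": 10 * 1024 * 1024,
        "audio": 25 * 1024 * 1024,
        "video": 100 * 1024 * 1024,
    })


def select_model(config: MultimodalConfig,
                 media_attachments: Optional[List[dict]] = None) -> str:
    """Pick the vision model only when multimodal is on and media is attached."""
    if config.enable_multimodal and media_attachments:
        return config.vision_model
    return config.text_model


def within_size_limit(config: MultimodalConfig,
                      media_type: str, size_bytes: int) -> bool:
    """Reject media types with no configured limit and files over the limit."""
    limit = config.max_bytes.get(media_type)
    return limit is not None and size_bytes <= limit
```

Treating both `None` and an empty list as "no attachments" keeps text-only requests on the existing model, so callers that never pass media_attachments see no change in behavior.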