Schema:
- documents: add media_type ENUM and mime_type; make document_text nullable
- chat_share_documents: same nullable/media_type changes
- Add message_attachments table for per-message media files

Backend:
- db.py: add_document now accepts media_type/mime_type; add add_message_attachment and get_message_attachments helpers
- documents/handler.py: detect the MIME type of each uploaded file and route images/video/audio to a binary-safe path (skipping Tika text extraction); text documents use the existing Tika pipeline
- agents/config.py: add OPENAI/ANTHROPIC vision model names, an ENABLE_MULTIMODAL flag, and size limits for each media type
- agents/reactive_agent.py: _initialize_llm accepts a `vision` flag; process_query_stream accepts an optional media_attachments list and switches to the vision-capable model when attachments are present
- app.py: extract _parse_message_request() to handle both JSON and multipart/form-data bodies; pass media_attachments through to the agent

Frontend:
- FileUpload.js: extend the default accepted types to include images, video, and audio; add a getMediaCategory helper; show image thumbnails via object-URL previews; revoke URLs on remove; add a category badge to the file list
- RequestConfig.js: add postFormData() for multipart uploads (omits Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
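The MIME-based routing described for documents/handler.py could be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the function names (`get_media_category`, `route_upload`) and the size-limit values are hypothetical stand-ins for the helpers and agents/config.py limits the commit adds.

```python
import mimetypes

# Hypothetical per-category size limits, standing in for agents/config.py values
MAX_BYTES = {
    "image": 10 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
    "audio": 25 * 1024 * 1024,
}

def get_media_category(filename: str) -> str:
    """Map a filename to a coarse media category via its guessed MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "text"  # unknown types fall back to the text pipeline
    prefix = mime.split("/", 1)[0]
    return prefix if prefix in ("image", "video", "audio") else "text"

def route_upload(filename: str, size: int) -> str:
    """Return which ingestion path a file takes.

    'binary' means the binary-safe path that skips Tika text extraction;
    'tika' means the existing text-document pipeline.
    """
    category = get_media_category(filename)
    if category == "text":
        return "tika"
    if size > MAX_BYTES[category]:
        raise ValueError(f"{category} file exceeds {MAX_BYTES[category]} bytes")
    return "binary"
```

The key design point is that routing happens before any text extraction, so large media files never reach Tika at all.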
Frontend (Chatbot.js):
- Add a paperclip/image button (faImage) next to the document upload button; it triggers a hidden <input accept="image/*"> for picking one or more images
- pendingAttachments state holds staged images (File + preview object URL)
- An attachment preview strip above the textarea shows thumbnails with X-remove buttons; object URLs are revoked after sending or on remove
- Placeholder text adapts to "Ask about your image..." when images are staged
- The send button activates when attachments are present, even with no text
- sendToAPI builds a multipart/form-data request via postFormData() when attachments are present; it falls back to the existing JSON path otherwise
- User message bubbles render image thumbnails above the text content

Backend (reactive_agent.py):
- Import HumanMessage from langchain_core.messages
- Add _describe_images(): calls the vision LLM (gpt-4o / claude-3-5-sonnet) once per image using the provider-specific content-block format and returns a combined description string
- process_query_stream(): if media_attachments are present, call _describe_images() and append the image context to the query before invoking the agent; the original query text is still what gets stored in the database

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
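The "provider-specific content-block format" that _describe_images() uses can be sketched as follows. This is an illustrative helper, not the PR's code: the function name `image_block` is hypothetical, and it mirrors the publicly documented image formats of the OpenAI chat and Anthropic messages APIs (OpenAI takes an `image_url` entry with a data: URI; Anthropic takes a `source` block with `type: "base64"`).

```python
import base64

def image_block(provider: str, mime_type: str, data: bytes) -> dict:
    """Build one image content block in the given provider's format.

    provider: "openai" or anything else (treated as Anthropic-style here).
    """
    b64 = base64.b64encode(data).decode("ascii")
    if provider == "openai":
        # OpenAI: image_url entry carrying a data: URI
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{b64}"},
        }
    # Anthropic: source block with explicit base64 type and media_type
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": mime_type, "data": b64},
    }
```

With LangChain, a block like this would be combined with a text block into a single `HumanMessage(content=[...])`, which is consistent with the commit importing HumanMessage from langchain_core.messages.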
Summary
Builds on the multimodal scaffolding PR to deliver the first real user-facing feature: attaching images directly in the chat input and having the AI describe and reason about them.
Frontend (Chatbot.js)
- The hidden image input accepts image/jpeg, png, gif, webp, and bmp, allowing multiple selections
- sendToAPI builds a multipart/form-data request via the new postFormData() helper; the existing JSON path is used otherwise (zero regression for text-only messages)

Backend (reactive_agent.py)
- A _describe_images() method calls the vision LLM (GPT-4o / Claude 3.5 Sonnet) once per image attachment using the correct provider-specific content-block format: an image_url with data:<mime>;base64,... for OpenAI, or a source block with type: base64 for Anthropic
- process_query_stream() calls _describe_images() when media_attachments are present and appends the resulting descriptions to the query before handing it to the ReAct agent; the original text is still what gets saved to the DB

Test plan
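The query-augmentation step in the summary above could be sketched like this. The function name `augment_query` and the exact formatting of the image context are hypothetical; the point being illustrated is that the agent receives the augmented text while the unmodified user query is what gets persisted.

```python
def augment_query(query: str, descriptions: list[str]) -> tuple[str, str]:
    """Combine vision-model image descriptions with the user's query.

    Returns (query_for_agent, query_for_db): the agent sees the image
    context appended; the database stores only the original text.
    """
    if not descriptions:
        return query, query
    context = "\n\n".join(
        f"[Image {i + 1}]: {desc}" for i, desc in enumerate(descriptions)
    )
    # Attachments with no accompanying text still produce a usable prompt
    base = query if query else "Describe the attached images."
    return f"{base}\n\nAttached image context:\n{context}", query
```

Keeping the stored query free of generated image descriptions means chat history and sharing features behave exactly as they did for text-only messages.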
https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu