
feat: image attachments in chat messages #110

Open
nv78 wants to merge 2 commits into main from claude/image-chat-attachments

nv78 commented Mar 24, 2026

Summary

Builds on the multimodal scaffolding PR to deliver the first real user-facing feature: attaching images directly in the chat input and having the AI describe and reason about them.

Frontend (Chatbot.js)

  • New image attach button (📷) sits beside the document-upload button; clicking it opens a native file picker filtered to common image types (image/jpeg,png,gif,webp,bmp), allowing multiple selections
  • Pending attachment preview strip appears above the textarea — thumbnail per image with an ✕ button to remove before sending; object URLs are revoked on remove and after the message is sent (no memory leaks)
  • Textarea placeholder adapts to "Ask about your image..." when images are staged
  • Send button activates even if the text field is empty, as long as at least one image is attached
  • When attachments are present sendToAPI builds a multipart/form-data request via the new postFormData() helper; existing JSON path is used otherwise (zero regression for text-only messages)
  • User message bubbles display image thumbnails above the text content

Backend (reactive_agent.py)

  • New _describe_images() method calls the vision LLM (GPT-4o / Claude 3.5 Sonnet) once per image attachment using the correct provider-specific content-block format:
    • OpenAI: image_url with data:<mime>;base64,...
    • Anthropic: source block with type: base64
  • process_query_stream() calls _describe_images() when media_attachments are present and appends the resulting descriptions to the query before handing it to the ReAct agent — the original text is still what gets saved to the DB
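
The two provider-specific content-block shapes listed above can be sketched as follows. This is a minimal illustration, not the PR's actual `_describe_images()` code: the helper names `build_image_block` and `build_message_content` are hypothetical, and only the dictionary layouts follow the publicly documented OpenAI and Anthropic message formats.

```python
import base64


def build_image_block(provider: str, mime_type: str, image_bytes: bytes) -> dict:
    """Return one image content block in the shape the given provider expects.

    Hypothetical helper for illustration; not taken from reactive_agent.py.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if provider == "openai":
        # OpenAI: the image travels as a data URL inside an image_url block.
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
        }
    if provider == "anthropic":
        # Anthropic: the image travels as a base64 source block.
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": mime_type, "data": encoded},
        }
    raise ValueError(f"unsupported provider: {provider}")


def build_message_content(provider: str, prompt: str, mime_type: str,
                          image_bytes: bytes) -> list:
    """Pair a text prompt with a single image block (one vision call per image)."""
    return [
        {"type": "text", "text": prompt},
        build_image_block(provider, mime_type, image_bytes),
    ]
```

A list like the one returned by `build_message_content` is the kind of value that would typically be passed as the `content` of a LangChain `HumanMessage`, though how the PR wires this up is not shown here.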

Test plan

  • Click the image button → file picker opens, limited to image files
  • Select 1-3 images → thumbnails appear in the preview strip
  • Click ✕ on a thumbnail → it disappears
  • Send with images and text → user bubble shows thumbnails + text; assistant replies with image-aware answer
  • Send with images and no text → message still sends
  • Send text-only → no regression, JSON path used (check network tab)
  • Reload page → no dangling object URLs (memory leak check)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu

claude added 2 commits March 24, 2026 12:02
Schema:
- documents: add media_type ENUM and mime_type, make document_text nullable
- chat_share_documents: same nullable/media_type changes
- Add message_attachments table for per-message media files

Backend:
- db.py: add_document now accepts media_type/mime_type; add
  add_message_attachment and get_message_attachments helpers
- documents/handler.py: detect MIME type of each uploaded file and
  route images/video/audio to binary-safe path (skipping Tika text
  extraction); text documents use the existing Tika pipeline
- agents/config.py: add OPENAI/ANTHROPIC vision model names,
  ENABLE_MULTIMODAL flag, and size limits for each media type
- agents/reactive_agent.py: _initialize_llm accepts a `vision` flag;
  process_query_stream accepts optional media_attachments list and
  switches to the vision-capable model when attachments are present
- app.py: extract _parse_message_request() to handle both JSON and
  multipart/form-data bodies; pass media_attachments through to agent
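
A rough sketch of the MIME-based routing described for documents/handler.py, assuming detection falls back to the filename extension when no MIME type is declared. The function names, the `MAX_BYTES` values, and the returned dictionary shape are all hypothetical; the actual size limits live in agents/config.py and are not stated in this PR.

```python
import mimetypes
from typing import Optional, Tuple

# Hypothetical per-category size limits in bytes; the real values are not given here.
MAX_BYTES = {
    "image": 10 * 1024 * 1024,
    "audio": 25 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
}


def get_media_category(filename: str, declared_mime: Optional[str] = None) -> Tuple[str, str]:
    """Return (category, mime_type), where category is image, video, audio, or document."""
    mime = declared_mime or mimetypes.guess_type(filename)[0] or "application/octet-stream"
    top_level = mime.split("/", 1)[0]
    category = top_level if top_level in ("image", "video", "audio") else "document"
    return category, mime


def route_upload(filename: str, size_bytes: int, declared_mime: Optional[str] = None) -> dict:
    """Decide whether a file takes the binary-safe path or the text-extraction path."""
    category, mime = get_media_category(filename, declared_mime)
    limit = MAX_BYTES.get(category)
    if limit is not None and size_bytes > limit:
        raise ValueError(f"{filename} exceeds the {category} size limit")
    # Media files skip text extraction; everything else goes through Tika.
    pipeline = "tika" if category == "document" else "binary"
    return {"media_type": category, "mime_type": mime, "pipeline": pipeline}
```

Deriving the category from the top-level MIME type keeps the routing decision in one place, so the same value can populate both the `media_type` and `mime_type` columns mentioned in the schema changes.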

Frontend:
- FileUpload.js: extend default accepted types to include images, video,
  and audio; add getMediaCategory helper; show image thumbnails via
  object-URL previews; revoke URLs on remove; category badge in file list
- RequestConfig.js: add postFormData() for multipart uploads (omits
  Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu

Frontend (Chatbot.js):
- Add paperclip/image button (faImage) next to the document upload button;
  triggers a hidden <input accept="image/*"> for picking one or more images
- pendingAttachments state holds staged images (File + preview object URL)
- Attachment preview strip above the textarea shows thumbnails with X-remove
  buttons; object URLs are revoked after sending or on remove
- Placeholder text adapts to "Ask about your image..." when images are staged
- Send button activates when attachments are present even with no text
- sendToAPI builds a multipart/form-data request via postFormData() when
  attachments are present; falls back to the existing JSON path otherwise
- User message bubbles render image thumbnails above the text content

Backend (reactive_agent.py):
- Import HumanMessage from langchain_core.messages
- Add _describe_images(): calls the vision LLM (gpt-4o / claude-3-5-sonnet)
  once per image using the provider-specific content-block format and returns
  a combined description string
- process_query_stream(): if media_attachments present, call _describe_images()
  and append the image context to the query before invoking the agent;
  original query text is still what gets stored in the database
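
The query-augmentation step above might look roughly like this. It is a sketch under assumptions: `build_agent_query`, the "Image context" label, and the fallback prompt for image-only messages are hypothetical, since the commit message does not specify the exact wording.

```python
def build_agent_query(original_query: str, descriptions: list[str]) -> str:
    """Append image descriptions to the text handed to the agent.

    The caller keeps `original_query` unchanged for database storage and
    passes only the returned string to the agent.
    """
    if not descriptions:
        # Text-only message: nothing to append.
        return original_query
    numbered = "\n".join(
        f"[Image {index}] {description.strip()}"
        for index, description in enumerate(descriptions, start=1)
    )
    # Assumed fallback so an image-only message still gives the agent a task.
    text = original_query.strip() or "Describe the attached image(s)."
    return f"{text}\n\nImage context:\n{numbered}"
```

Returning a new string rather than mutating the query keeps the stored message identical to what the user typed, which matches the behaviour described in the commit message.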

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu