Schema:
- documents: add media_type ENUM and mime_type; make document_text nullable
- chat_share_documents: same nullable/media_type changes
- Add message_attachments table for per-message media files

Backend:
- db.py: add_document now accepts media_type/mime_type; add add_message_attachment and get_message_attachments helpers
- documents/handler.py: detect the MIME type of each uploaded file and route images/video/audio to a binary-safe path (skipping Tika text extraction); text documents use the existing Tika pipeline
- agents/config.py: add OPENAI/ANTHROPIC vision model names, an ENABLE_MULTIMODAL flag, and size limits for each media type
- agents/reactive_agent.py: _initialize_llm accepts a `vision` flag; process_query_stream accepts an optional media_attachments list and switches to the vision-capable model when attachments are present
- app.py: extract _parse_message_request() to handle both JSON and multipart/form-data bodies; pass media_attachments through to the agent

Frontend:
- FileUpload.js: extend the default accepted types to include images, video, and audio; add a getMediaCategory helper; show image thumbnails via object-URL previews; revoke URLs on remove; add a category badge to the file list
- RequestConfig.js: add postFormData() for multipart uploads (omits Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
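The MIME-based routing described for documents/handler.py could be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the function names (`get_media_category`, `route_upload`) and the size-limit values are hypothetical stand-ins for the helpers and agents/config.py limits the commit adds.

```python
import mimetypes

# Hypothetical per-category size limits, standing in for agents/config.py values
MAX_BYTES = {
    "image": 10 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
    "audio": 25 * 1024 * 1024,
}

def get_media_category(filename: str) -> str:
    """Map a filename to a coarse media category via its guessed MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return "text"  # unknown types fall back to the text pipeline
    prefix = mime.split("/", 1)[0]
    return prefix if prefix in ("image", "video", "audio") else "text"

def route_upload(filename: str, size: int) -> str:
    """Return which ingestion path a file takes.

    'binary' means the binary-safe path that skips Tika text extraction;
    'tika' means the existing text-document pipeline.
    """
    category = get_media_category(filename)
    if category == "text":
        return "tika"
    if size > MAX_BYTES[category]:
        raise ValueError(f"{category} file exceeds {MAX_BYTES[category]} bytes")
    return "binary"
```

The key design point is that routing happens before any text extraction, so large media files never reach Tika at all.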
Frontend (Chatbot.js):
- Add a paperclip/image button (faImage) next to the document upload button; it triggers a hidden <input accept="image/*"> for picking one or more images
- pendingAttachments state holds staged images (File + preview object URL)
- An attachment preview strip above the textarea shows thumbnails with X-remove buttons; object URLs are revoked after sending or on remove
- Placeholder text adapts to "Ask about your image..." when images are staged
- The send button activates when attachments are present, even with no text
- sendToAPI builds a multipart/form-data request via postFormData() when attachments are present; it falls back to the existing JSON path otherwise
- User message bubbles render image thumbnails above the text content

Backend (reactive_agent.py):
- Import HumanMessage from langchain_core.messages
- Add _describe_images(): calls the vision LLM (gpt-4o / claude-3-5-sonnet) once per image using the provider-specific content-block format and returns a combined description string
- process_query_stream(): if media_attachments are present, call _describe_images() and append the image context to the query before invoking the agent; the original query text is still what gets stored in the database

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
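The "provider-specific content-block format" that _describe_images() uses can be sketched as follows. This is an illustrative helper, not the PR's code: the function name `image_block` is hypothetical, and it mirrors the publicly documented image formats of the OpenAI chat and Anthropic messages APIs (OpenAI takes an `image_url` entry with a data: URI; Anthropic takes a `source` block with `type: "base64"`).

```python
import base64

def image_block(provider: str, mime_type: str, data: bytes) -> dict:
    """Build one image content block in the given provider's format.

    provider: "openai" or anything else (treated as Anthropic-style here).
    """
    b64 = base64.b64encode(data).decode("ascii")
    if provider == "openai":
        # OpenAI: image_url entry carrying a data: URI
        return {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{b64}"},
        }
    # Anthropic: source block with explicit base64 type and media_type
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": mime_type, "data": b64},
    }
```

With LangChain, a block like this would be combined with a text block into a single `HumanMessage(content=[...])`, which is consistent with the commit importing HumanMessage from langchain_core.messages.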
Summary
Builds on the multimodal scaffolding PR to deliver the first real user-facing feature: attaching images directly in the chat input and having the AI describe and reason about them.
Frontend (Chatbot.js)
- The hidden image input accepts image/jpeg, png, gif, webp, and bmp, allowing multiple selections
- sendToAPI builds a multipart/form-data request via the new postFormData() helper; the existing JSON path is used otherwise (zero regression for text-only messages)

Backend (reactive_agent.py)
- A _describe_images() method calls the vision LLM (GPT-4o / Claude 3.5 Sonnet) once per image attachment using the correct provider-specific content-block format: an image_url with data:<mime>;base64,... for OpenAI, or a source block with type: base64 for Anthropic
- process_query_stream() calls _describe_images() when media_attachments are present and appends the resulting descriptions to the query before handing it to the ReAct agent; the original text is still what gets saved to the DB

Test plan
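The query-augmentation step in the summary above could be sketched like this. The function name `augment_query` and the exact formatting of the image context are hypothetical; the point being illustrated is that the agent receives the augmented text while the unmodified user query is what gets persisted.

```python
def augment_query(query: str, descriptions: list[str]) -> tuple[str, str]:
    """Combine vision-model image descriptions with the user's query.

    Returns (query_for_agent, query_for_db): the agent sees the image
    context appended; the database stores only the original text.
    """
    if not descriptions:
        return query, query
    context = "\n\n".join(
        f"[Image {i + 1}]: {desc}" for i, desc in enumerate(descriptions)
    )
    # Attachments with no accompanying text still produce a usable prompt
    base = query if query else "Describe the attached images."
    return f"{base}\n\nAttached image context:\n{context}", query
```

Keeping the stored query free of generated image descriptions means chat history and sharing features behave exactly as they did for text-only messages.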
https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu