Refactor codebase to support multimodal inputs (image, video, audio) by nv78 · Pull Request #109 · anote-ai/Autonomous-Intelligence

nv78 · 2026-03-24T18:15:51Z

Schema:

- documents: add media_type ENUM and mime_type, make document_text nullable
- chat_share_documents: same nullable/media_type changes
- Add message_attachments table for per-message media files
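
A rough sketch of what these schema changes could look like, for reviewers who want something concrete to poke at. Assumptions: every column other than media_type, mime_type, and document_text is hypothetical, and the sketch uses SQLite (via Python's sqlite3) only so it runs standalone. SQLite has no ENUM type, so a CHECK constraint stands in for it; the real migration may differ.

```python
import sqlite3

# Hypothetical columns: the PR only names the tables and the
# media_type / mime_type / document_text fields.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    document_name TEXT NOT NULL,
    document_text TEXT,  -- nullable: binary media has no extracted text
    media_type TEXT NOT NULL DEFAULT 'text'
        CHECK (media_type IN ('text', 'image', 'video', 'audio')),
    mime_type TEXT
);
CREATE TABLE message_attachments (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    media_type TEXT NOT NULL
        CHECK (media_type IN ('image', 'video', 'audio')),
    mime_type TEXT NOT NULL,
    storage_path TEXT NOT NULL
);
"""

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

With document_text nullable, an image row can be stored without any extracted text, and the CHECK constraint rejects media types outside the allowed set.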

Backend:

- db.py: add_document now accepts media_type/mime_type; add
  add_message_attachment and get_message_attachments helpers
- documents/handler.py: detect MIME type of each uploaded file and
  route images/video/audio to binary-safe path (skipping Tika text
  extraction); text documents use the existing Tika pipeline
- agents/config.py: add OPENAI/ANTHROPIC vision model names,
  ENABLE_MULTIMODAL flag, and size limits for each media type
- agents/reactive_agent.py: _initialize_llm accepts a `vision` flag;
  process_query_stream accepts optional media_attachments list and
  switches to the vision-capable model when attachments are present
- app.py: extract _parse_message_request() to handle both JSON and
  multipart/form-data bodies; pass media_attachments through to agent
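
The routing and model-switching behaviour described above can be sketched roughly as follows. This is not the PR's code: get_media_category, route_upload, select_model, the model names, and the size limits are all hypothetical, and MIME detection here guesses from the file extension via the standard-library mimetypes module, which may not be how documents/handler.py detects it.

```python
import mimetypes

# Illustrative stand-ins for values the PR places in agents/config.py.
# The actual names and limits are not given in the description.
ENABLE_MULTIMODAL = True
TEXT_MODEL = "text-model-placeholder"
VISION_MODEL = "vision-model-placeholder"
MAX_BYTES = {
    "image": 10 * 1024 * 1024,
    "audio": 25 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
}


def get_media_category(filename):
    """Return (category, mime_type); non-media files fall back to 'text'."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type:
        top_level = mime_type.split("/", 1)[0]
        if top_level in MAX_BYTES:
            return top_level, mime_type
    return "text", mime_type


def route_upload(filename, data):
    """Decide whether a file goes through text extraction or is kept as binary."""
    category, mime_type = get_media_category(filename)
    if category == "text":
        # Text documents keep using the existing extraction pipeline.
        return {"path": "tika", "media_type": "text", "mime_type": mime_type}
    if not ENABLE_MULTIMODAL:
        raise ValueError("multimodal uploads are disabled")
    if len(data) > MAX_BYTES[category]:
        raise ValueError(f"{category} upload exceeds the size limit")
    # Media is stored as-is, skipping text extraction.
    return {"path": "binary", "media_type": category, "mime_type": mime_type}


def select_model(media_attachments=None):
    """Use the vision-capable model only when attachments are present."""
    use_vision = ENABLE_MULTIMODAL and bool(media_attachments)
    return VISION_MODEL if use_vision else TEXT_MODEL
```

For example, route_upload("photo.png", data) takes the binary path while route_upload("notes.pdf", data) stays on the text path, and select_model() falls back to the text model whenever the attachment list is empty or missing.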

Frontend:

- FileUpload.js: extend default accepted types to include images, video,
  and audio; add getMediaCategory helper; show image thumbnails via
  object-URL previews; revoke URLs on remove; category badge in file list
- RequestConfig.js: add postFormData() for multipart uploads (omits
  Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu

nv78 wants to merge 1 commit into main from claude/add-multimodal-support-QBQca
