Refactor codebase to support multimodal inputs (image, video, audio) #109

Open
nv78 wants to merge 1 commit into main from claude/add-multimodal-support-QBQca

nv78 commented Mar 24, 2026

Schema:

  • documents: add media_type ENUM and mime_type, make document_text nullable
  • chat_share_documents: same nullable/media_type changes
  • Add message_attachments table for per-message media files
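
The schema changes above can be sketched roughly as follows. This is an illustration only: it uses an in-memory SQLite database so it runs anywhere, and since SQLite has no ENUM type, a CHECK constraint stands in for it. The ENUM values, `document_name`, `message_id`, and `storage_key` are assumptions; the PR names only `media_type`, `mime_type`, `document_text`, and the `message_attachments` table.

```python
import sqlite3

# Hypothetical schema sketch. Only media_type, mime_type, document_text and
# the message_attachments table name come from the PR; everything else
# (other columns, the allowed media_type values) is assumed for illustration.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    document_name TEXT NOT NULL,
    document_text TEXT,              -- nullable: binary media has no extracted text
    media_type TEXT NOT NULL DEFAULT 'text'
        CHECK (media_type IN ('text', 'image', 'video', 'audio')),
    mime_type TEXT
);
CREATE TABLE message_attachments (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    media_type TEXT NOT NULL
        CHECK (media_type IN ('image', 'video', 'audio')),
    mime_type TEXT NOT NULL,
    storage_key TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with the sketched tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build_demo_db()
# An image document is stored with no text, which the nullable column allows.
conn.execute(
    "INSERT INTO documents (document_name, document_text, media_type, mime_type) "
    "VALUES (?, ?, ?, ?)",
    ("photo.png", None, "image", "image/png"),
)
row = conn.execute("SELECT document_text, media_type FROM documents").fetchone()
```

Making `document_text` nullable is what lets image, video, and audio rows exist without going through text extraction first.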

Backend:

  • db.py: add_document now accepts media_type/mime_type; new
    add_message_attachment and get_message_attachments helpers
  • documents/handler.py: detect MIME type of each uploaded file and
    route images/video/audio to binary-safe path (skipping Tika text
    extraction); text documents use the existing Tika pipeline
  • agents/config.py: add OPENAI/ANTHROPIC vision model names,
    ENABLE_MULTIMODAL flag, and size limits for each media type
  • agents/reactive_agent.py: _initialize_llm accepts a vision flag;
    process_query_stream accepts optional media_attachments list and
    switches to the vision-capable model when attachments are present
  • app.py: extract _parse_message_request() to handle both JSON and
    multipart/form-data bodies; pass media_attachments through to agent
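
A minimal sketch of the routing and model-selection logic described in the backend notes, under several assumptions. The PR does not say how MIME types are detected, so this guesses from the file extension with the standard-library `mimetypes` module. The size limits, model identifiers, and the function names `detect_media`, `route_upload`, and `select_model` are hypothetical and are not taken from `documents/handler.py` or `agents/config.py`.

```python
import mimetypes

# Hypothetical per-category size limits in bytes. The PR adds limits to
# agents/config.py but does not state the values.
MAX_UPLOAD_BYTES = {
    "image": 10 * 1024 * 1024,
    "audio": 25 * 1024 * 1024,
    "video": 100 * 1024 * 1024,
}

# Placeholder model identifiers, not the names used in agents/config.py.
TEXT_MODEL = "text-model-placeholder"
VISION_MODEL = "vision-model-placeholder"

BINARY_CATEGORIES = ("image", "video", "audio")


def detect_media(filename):
    """Return (category, mime_type), guessed from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    top_level = mime_type.split("/", 1)[0] if mime_type else ""
    category = top_level if top_level in BINARY_CATEGORIES else "text"
    return category, mime_type


def route_upload(filename, size_bytes):
    """Pick a processing pipeline for one uploaded file."""
    category, mime_type = detect_media(filename)
    if category == "text":
        # Text documents keep using text extraction (Tika in the PR).
        return {"pipeline": "tika", "media_type": category, "mime_type": mime_type}
    if size_bytes > MAX_UPLOAD_BYTES[category]:
        raise ValueError(f"{filename}: {category} upload exceeds size limit")
    # Binary media is stored as-is, with no text extraction.
    return {"pipeline": "binary", "media_type": category, "mime_type": mime_type}


def select_model(media_attachments, enable_multimodal=True):
    """Use the vision-capable model only when attachments are present."""
    if enable_multimodal and media_attachments:
        return VISION_MODEL
    return TEXT_MODEL
```

For example, `route_upload("photo.png", 1024)` takes the binary path, while `route_upload("notes.pdf", 1024)` falls through to the text pipeline. Extension-based guessing is easy to spoof, so a real handler would probably also inspect the file contents.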

Frontend:

  • FileUpload.js: extend default accepted types to include images, video,
    and audio; add getMediaCategory helper; show image thumbnails via
    object-URL previews; revoke URLs on remove; show a category badge in the file list
  • RequestConfig.js: add postFormData() for multipart uploads (omits
    Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
