feat: 7 new agent tools + parallel execution + smart routing#119
Open
feat: 7 new agent tools + parallel execution + smart routing#119
Conversation
Schema: - documents: add media_type ENUM and mime_type, make document_text nullable - chat_share_documents: same nullable/media_type changes - Add message_attachments table for per-message media files Backend: - db.py: add_document now accepts media_type/mime_type; add add_message_attachment and get_message_attachments helpers - documents/handler.py: detect MIME type of each uploaded file and route images/video/audio to binary-safe path (skipping Tika text extraction); text documents use the existing Tika pipeline - agents/config.py: add OPENAI/ANTHROPIC vision model names, ENABLE_MULTIMODAL flag, and size limits for each media type - agents/reactive_agent.py: _initialize_llm accepts a `vision` flag; process_query_stream accepts optional media_attachments list and switches to the vision-capable model when attachments are present - app.py: extract _parse_message_request() to handle both JSON and multipart/form-data bodies; pass media_attachments through to agent Frontend: - FileUpload.js: extend default accepted types to include images, video, and audio; add getMediaCategory helper; show image thumbnails via object-URL previews; revoke URLs on remove; category badge in file list - RequestConfig.js: add postFormData() for multipart uploads (omits Content-Type so the browser sets the correct boundary) https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
- New AutonomousDocumentAgent in backend/agents/autonomous_agent.py
- Native Anthropic tool_use agentic loop with extended thinking support
(claude-3-7+ models) and graceful fallback for other models
- Native OpenAI function-calling agentic loop for model_type=0
- Tools: retrieve_documents, list_documents, get_chat_history, run_python
- run_python enables real code execution for data analysis tasks
- Streams thinking blocks as {type:"thinking"} + backward-compat llm_reasoning events
- Streams tool_start / tool_end events per iteration
- Replaces sequential LangGraph workflow with a true reason→act→observe loop
- app.py now instantiates AutonomousDocumentAgent instead of ReactiveDocumentAgent
https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
…ug fixes Critical bug fix: - Extended thinking was incorrectly enabled for claude-3-5-sonnet (only claude-3-7+ supports it); fix: check "claude-3-7" only, with betas header New capabilities: - search_web tool: DuckDuckGo HTML search (no API key required), returns titles + snippets + URLs for up to 10 results - fetch_url tool: requests + BeautifulSoup to extract readable text from any public web page (boilerplate removed, truncated to 6k chars) Streaming improvements: - Anthropic path now uses client.messages.stream() for real-time text_token and thinking_delta events (users see tokens as they arrive) - OpenAI path uses streaming=True for real-time text_token events - Both paths accumulate tool_use input JSON deltas correctly Robustness: - Context window pruning (_prune_messages): keeps initial message + last 4 tool-result rounds to prevent context overflow on long agent loops - Tool output truncation: capped at 8000 chars with clear truncation notice - Non-streaming fallback if thinking-enabled call fails - Retry without thinking on Anthropic API errors (e.g. unsupported model) Improved system prompt: - 10 explicit numbered principles for tool ordering and citation behavior - Instructs agent to do multi-step tasks (search → fetch → summarize) https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
Backend (autonomous_agent.py): - New create_note tool: agent can save summaries/lists as searchable documents (indexed immediately via the RAG pipeline) - New retrieve_documents_multi tool: parallel multi-query retrieval with deduplication — better for complex cross-document questions - Overhauled system prompt with a 4-step decision framework: Orient → Retrieve → Synthesize → Persist, with explicit tool-selection rules Frontend: - messageUtils.js: handle text_token events → text appears token-by-token as the agent streams; handle thinking events → added to reasoning steps - messageUtils.js: complete event preserves streamed content if answer field is empty (prevents content flash) - ThinkingIndicator.js: new "thinking" step type with yellow accent color (distinct from regular reasoning steps) for extended thinking blocks https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
New tools (15 total): - think: internal reasoning scratchpad before complex actions - send_email: SMTP email via MAIL_USERNAME/MAIL_PASSWORD env vars - create_calendar_invite: ICS file + Google Calendar URL generation - extract_structured_data: AI-powered JSON extraction from raw text - generate_chart: matplotlib bar/line/pie/scatter/histogram → inline PNG - translate_text: LLM-based translation to any language - call_webhook: HTTP POST to external URLs (Zapier, Slack, REST APIs) Orchestration improvements: - Parallel tool execution via ThreadPoolExecutor (up to 4 concurrent) - Comprehensive routing system prompt with explicit tool decision tree - think tool hidden from UI (internal scratchpad, no tool_start event) - chart_generated SSE event emitted per chart for real-time rendering Frontend: - messageUtils.js handles chart_generated and complete.charts events - Chatbot.js renders inline chart images (streaming + final state) - Charts shown during streaming and preserved in final message https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Expands the autonomous agent from 8 → 15 tools with parallel execution and a comprehensive routing system prompt.
New Tools
thinksend_emailMAIL_USERNAME/MAIL_PASSWORDenv vars)create_calendar_invite.icsfile + Google Calendar URLextract_structured_datagenerate_charttranslate_textcall_webhookOrchestration Improvements
ThreadPoolExecutor(up to 4 concurrent calls per iteration)thinkcalls are silent (notool_start/tool_endUI events, just internal reasoning)chart_generatedSSE event emitted per chart for real-time streaming displayFrontend
messageUtils.js— handleschart_generatedandcomplete.chartsSSE eventsChatbot.js— renders inline chart images during streaming and in final messageTest plan
thinkthensend_emailtranslate_texttool_startevents fire simultaneously)retrieve_documentsflow unchangedhttps://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu