
feat: 7 new agent tools + parallel execution + smart routing #119

Open
nv78 wants to merge 5 commits into main from claude/autonomous-agent-improvements

Conversation

nv78 (Member) commented Mar 24, 2026

Summary

Expands the autonomous agent from 8 → 15 tools with parallel execution and a comprehensive routing system prompt.

New Tools

| Tool | What it does |
| --- | --- |
| think | Internal reasoning scratchpad; the agent plans before acting (hidden from UI) |
| send_email | Send email via SMTP (MAIL_USERNAME/MAIL_PASSWORD env vars) |
| create_calendar_invite | Generates .ics file + Google Calendar URL |
| extract_structured_data | AI-powered JSON extraction from any raw text |
| generate_chart | matplotlib bar/line/pie/scatter/histogram → inline PNG image |
| translate_text | LLM translation to any language |
| call_webhook | HTTP POST to Zapier, Slack, Make, or any REST endpoint |
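
For reviewers who want a feel for what create_calendar_invite returns, here is a minimal sketch of building an .ics payload and a Google Calendar template URL with only the standard library. The function name `build_calendar_invite`, its parameters, and the returned keys are hypothetical and not taken from this PR. The sketch assumes a timezone-aware start time and does not escape commas, semicolons, or newlines in text fields, which a real implementation would need to do.

```python
import uuid
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def _fmt(dt: datetime) -> str:
    """Format an aware datetime as an iCalendar UTC timestamp (YYYYMMDDTHHMMSSZ)."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_calendar_invite(title: str, start: datetime,
                          duration_minutes: int = 60,
                          description: str = "") -> dict:
    """Return ICS text and a Google Calendar URL (hypothetical helper)."""
    end = start + timedelta(minutes=duration_minutes)
    ics_lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//calendar-invite-sketch//EN",  # placeholder product id
        "BEGIN:VEVENT",
        f"UID:{uuid.uuid4()}@example.invalid",
        f"DTSTAMP:{_fmt(datetime.now(timezone.utc))}",
        f"DTSTART:{_fmt(start)}",
        f"DTEND:{_fmt(end)}",
        f"SUMMARY:{title}",
        f"DESCRIPTION:{description}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    query = urlencode({
        "action": "TEMPLATE",
        "text": title,
        "dates": f"{_fmt(start)}/{_fmt(end)}",
        "details": description,
    })
    return {
        # iCalendar content lines are separated by CRLF
        "ics": "\r\n".join(ics_lines) + "\r\n",
        "google_calendar_url": f"https://calendar.google.com/calendar/render?{query}",
    }
```

Usage: `build_calendar_invite("Team meeting", datetime(2026, 3, 31, 14, 0, tzinfo=timezone.utc))` yields a dict whose `ics` value can be saved as a .ics file and whose `google_calendar_url` opens a pre-filled event form.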

Orchestration Improvements

  • Parallel tool execution via ThreadPoolExecutor (up to 4 concurrent calls per iteration)
  • Comprehensive routing prompt — explicit decision tree mapping query types to the right tool(s)
  • think calls are silent (no tool_start/tool_end UI events, just internal reasoning)
  • chart_generated SSE event emitted per chart for real-time streaming display
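
To illustrate the parallel execution described above, the following is a rough sketch of how tool calls from one iteration could be fanned out with ThreadPoolExecutor. It is not the code in autonomous_agent.py: the call shape (`id`, `name`, `input`), the `registry` dict, and the helper names are assumptions made for illustration. Only the 4-worker cap and the silent think tool come from this PR.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_TOOLS = 4      # "up to 4 concurrent calls per iteration"
SILENT_TOOLS = {"think"}    # think emits no tool_start/tool_end UI events


def should_emit_ui_events(tool_name: str) -> bool:
    """Return False for tools that run silently (hypothetical helper)."""
    return tool_name not in SILENT_TOOLS


def run_tools_in_parallel(tool_calls, registry, max_workers=MAX_PARALLEL_TOOLS):
    """Run each call concurrently and return results in request order.

    tool_calls: list of {"id": str, "name": str, "input": dict} (assumed shape)
    registry:   dict mapping tool name -> callable (assumed)
    """
    def run_one(call):
        result = {"id": call["id"], "name": call["name"], "output": None, "error": None}
        try:
            # Unknown tool names raise KeyError here and are reported like any other failure.
            result["output"] = registry[call["name"]](**call["input"])
        except Exception as exc:
            result["error"] = f"{type(exc).__name__}: {exc}"
        return result

    if not tool_calls:
        return []
    workers = min(max_workers, len(tool_calls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even when calls finish out of order.
        return list(pool.map(run_one, tool_calls))
```

Capturing exceptions per call means one failing tool does not discard the results of the others, so each failure can be returned to the model as an ordinary tool result.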

Frontend

  • messageUtils.js — handles chart_generated and complete.charts SSE events
  • Chatbot.js — renders inline chart images during streaming and in final message
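
For context on what the frontend consumes, here is a hedged server-side sketch of serializing a chart_generated event as a Server-Sent Events frame. Only the event type name comes from this PR. The `title` and `image` fields, the data-URL encoding, and the helper names are assumptions, and the real payload may differ.

```python
import base64
import json


def sse_event(payload: dict) -> str:
    """Serialize a payload as one SSE frame: a data line followed by a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


def chart_generated_event(png_bytes: bytes, title: str = "") -> str:
    """Build a chart_generated frame carrying the PNG as a data URL (assumed shape)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return sse_event({
        "type": "chart_generated",
        "title": title,                                 # hypothetical field
        "image": f"data:image/png;base64,{encoded}",    # hypothetical field
    })
```

A data URL can be assigned directly to an `<img>` element's `src`, which is one way inline rendering during streaming could work without a separate image endpoint.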

Test plan

  • Ask "send an email to test@example.com about our Q1 results" — should call think then send_email
  • Ask "create a calendar invite for a team meeting next Tuesday at 2pm" — should return Google Calendar URL + ICS
  • Ask "translate this paragraph to French: ..." — should call translate_text
  • Ask "make a bar chart of our quarterly revenue: Q1=100k, Q2=150k, Q3=120k, Q4=200k" — chart should render inline
  • Ask "extract the company name, date, and total from this invoice: ..." — should return JSON
  • Ask a complex multi-part question — verify multiple tools run in parallel (check tool_start events fire simultaneously)
  • Ask a question about uploaded documents — verify existing retrieve_documents flow unchanged

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu

claude added 5 commits March 24, 2026 12:02
Schema:
- documents: add media_type ENUM and mime_type, make document_text nullable
- chat_share_documents: same nullable/media_type changes
- Add message_attachments table for per-message media files

Backend:
- db.py: add_document now accepts media_type/mime_type; add
  add_message_attachment and get_message_attachments helpers
- documents/handler.py: detect MIME type of each uploaded file and
  route images/video/audio to binary-safe path (skipping Tika text
  extraction); text documents use the existing Tika pipeline
- agents/config.py: add OPENAI/ANTHROPIC vision model names,
  ENABLE_MULTIMODAL flag, and size limits for each media type
- agents/reactive_agent.py: _initialize_llm accepts a `vision` flag;
  process_query_stream accepts optional media_attachments list and
  switches to the vision-capable model when attachments are present
- app.py: extract _parse_message_request() to handle both JSON and
  multipart/form-data bodies; pass media_attachments through to agent
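
The MIME-based routing in documents/handler.py can be pictured with the sketch below. It guesses the type from the filename with the standard-library mimetypes module, which may not be how the handler actually detects types. The function names and the use of "text" as the category for non-media documents are assumptions.

```python
import mimetypes

BINARY_MEDIA_TYPES = ("image", "video", "audio")


def classify_upload(filename: str) -> tuple:
    """Return (media_type, mime_type) for an uploaded file (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    mime_type = mime_type or "application/octet-stream"
    top_level = mime_type.split("/", 1)[0]
    # Images, video, and audio take the binary-safe path; everything else is
    # treated as a text document.
    media_type = top_level if top_level in BINARY_MEDIA_TYPES else "text"
    return media_type, mime_type


def needs_text_extraction(media_type: str) -> bool:
    """Only text documents go through the text-extraction pipeline."""
    return media_type == "text"
```

Guessing from the extension is cheap but trusts the client-supplied filename, so a production handler would likely also inspect the file contents.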

Frontend:
- FileUpload.js: extend default accepted types to include images, video,
  and audio; add getMediaCategory helper; show image thumbnails via
  object-URL previews; revoke URLs on remove; category badge in file list
- RequestConfig.js: add postFormData() for multipart uploads (omits
  Content-Type so the browser sets the correct boundary)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
- New AutonomousDocumentAgent in backend/agents/autonomous_agent.py
- Native Anthropic tool_use agentic loop with extended thinking support
  (claude-3-7+ models) and graceful fallback for other models
- Native OpenAI function-calling agentic loop for model_type=0
- Tools: retrieve_documents, list_documents, get_chat_history, run_python
- run_python enables real code execution for data analysis tasks
- Streams thinking blocks as {type:"thinking"} + backward-compat llm_reasoning events
- Streams tool_start / tool_end events per iteration
- Replaces sequential LangGraph workflow with a true reason→act→observe loop
- app.py now instantiates AutonomousDocumentAgent instead of ReactiveDocumentAgent

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
…ug fixes

Critical bug fix:
- Extended thinking was incorrectly enabled for claude-3-5-sonnet (only
  claude-3-7+ supports it); fix: check "claude-3-7" only, with betas header

New capabilities:
- search_web tool: DuckDuckGo HTML search (no API key required), returns
  titles + snippets + URLs for up to 10 results
- fetch_url tool: requests + BeautifulSoup to extract readable text from
  any public web page (boilerplate removed, truncated to 6k chars)

Streaming improvements:
- Anthropic path now uses client.messages.stream() for real-time text_token
  and thinking_delta events (users see tokens as they arrive)
- OpenAI path uses streaming=True for real-time text_token events
- Both paths accumulate tool_use input JSON deltas correctly

Robustness:
- Context window pruning (_prune_messages): keeps initial message + last 4
  tool-result rounds to prevent context overflow on long agent loops
- Tool output truncation: capped at 8000 chars with clear truncation notice
- Non-streaming fallback if thinking-enabled call fails
- Retry without thinking on Anthropic API errors (e.g. unsupported model)
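
As a rough illustration of the pruning and truncation rules above, the sketch below keeps the first message plus the last four rounds and caps tool output at 8000 characters. It is not the actual `_prune_messages`: it assumes messages are dicts with a `role` key and that every round begins with an assistant message, and the wording of the truncation notice is invented.

```python
MAX_TOOL_OUTPUT_CHARS = 8000
KEEP_ROUNDS = 4


def prune_messages(messages, keep_rounds=KEEP_ROUNDS):
    """Keep the initial message plus the last `keep_rounds` tool rounds.

    Assumes each round starts with an assistant message (the tool call)
    followed by one or more messages carrying the tool results.
    """
    if not messages:
        return []
    first, rest = messages[0], messages[1:]
    round_starts = [i for i, m in enumerate(rest) if m["role"] == "assistant"]
    if len(round_starts) <= keep_rounds:
        return list(messages)
    cut = round_starts[-keep_rounds]  # index where the oldest kept round begins
    return [first] + rest[cut:]


def truncate_tool_output(text, limit=MAX_TOOL_OUTPUT_CHARS):
    """Cap tool output at `limit` characters and append a notice if cut."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return text[:limit] + f"\n[output truncated: {omitted} characters omitted]"
```

Cutting at a round boundary, not at a fixed message count, keeps each retained tool call paired with its result, which tool-calling APIs generally require.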

Improved system prompt:
- 10 explicit numbered principles for tool ordering and citation behavior
- Instructs agent to do multi-step tasks (search → fetch → summarize)

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
Backend (autonomous_agent.py):
- New create_note tool: agent can save summaries/lists as searchable
  documents (indexed immediately via the RAG pipeline)
- New retrieve_documents_multi tool: parallel multi-query retrieval with
  deduplication — better for complex cross-document questions
- Overhauled system prompt with a 4-step decision framework: Orient →
  Retrieve → Synthesize → Persist, with explicit tool-selection rules
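
The idea behind retrieve_documents_multi, running several queries in parallel and then deduplicating, can be sketched as follows. The `retrieve` callable, the hit shape (`chunk_id`, `text`, `score`), and the keep-the-highest-score rule are assumptions for illustration and may not match the tool's real behavior.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_multi(queries, retrieve, max_workers=4):
    """Run `retrieve(query)` for each query concurrently and merge the hits.

    retrieve: callable returning a list of {"chunk_id", "text", "score"} dicts
              (hypothetical interface).
    """
    if not queries:
        return []
    workers = min(max_workers, len(queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_query_hits = list(pool.map(retrieve, queries))

    # Deduplicate by chunk id, keeping the best-scoring copy of each chunk.
    best = {}
    for hits in per_query_hits:
        for hit in hits:
            seen = best.get(hit["chunk_id"])
            if seen is None or hit["score"] > seen["score"]:
                best[hit["chunk_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

Deduplicating before the results reach the model avoids spending context on the same chunk when several sub-queries surface it.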

Frontend:
- messageUtils.js: handle text_token events → text appears token-by-token
  as the agent streams; handle thinking events → added to reasoning steps
- messageUtils.js: complete event preserves streamed content if answer
  field is empty (prevents content flash)
- ThinkingIndicator.js: new "thinking" step type with yellow accent color
  (distinct from regular reasoning steps) for extended thinking blocks

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu
New tools (7 added, 15 total):
- think: internal reasoning scratchpad before complex actions
- send_email: SMTP email via MAIL_USERNAME/MAIL_PASSWORD env vars
- create_calendar_invite: ICS file + Google Calendar URL generation
- extract_structured_data: AI-powered JSON extraction from raw text
- generate_chart: matplotlib bar/line/pie/scatter/histogram → inline PNG
- translate_text: LLM-based translation to any language
- call_webhook: HTTP POST to external URLs (Zapier, Slack, REST APIs)

Orchestration improvements:
- Parallel tool execution via ThreadPoolExecutor (up to 4 concurrent)
- Comprehensive routing system prompt with explicit tool decision tree
- think tool hidden from UI (internal scratchpad, no tool_start event)
- chart_generated SSE event emitted per chart for real-time rendering

Frontend:
- messageUtils.js handles chart_generated and complete.charts events
- Chatbot.js renders inline chart images (streaming + final state)
- Charts shown during streaming and preserved in final message

https://claude.ai/code/session_01C9mHttiQ4ZAaBbQecVV7uu