## Problem or Motivation
Current tests cover utilities (sprint, timezone, mentions) and DB queries, but nothing tests the core agent loop pipeline: message in → Claude API call → tool execution → response out. This is where breakage is most likely and most painful.
## Proposed Solution
Add integration tests for the agent loop that mock the Anthropic SDK and MCP tool layer:
- Mock `@anthropic-ai/sdk` — return canned Claude responses (text-only, with tool calls, multi-round)
- Mock `tool-registry` — return canned tool results without real MCP servers
- Mock grammy's `Api` — capture typing indicators and verify they're sent
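A hand-rolled version of the first two mocks can be sketched as follows. The response and client shapes loosely mirror the Anthropic Messages API, but all names here (`makeMockClient`, `makeMockToolRegistry`) are illustrative stand-ins, not the project's actual test helpers:

```typescript
// Shapes loosely mirroring the Anthropic Messages API (assumed for this sketch).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface CannedMessage {
  stop_reason: "end_turn" | "tool_use";
  content: ContentBlock[];
}

// Mock Claude client: serves canned responses in order and records each request,
// so tests can assert on how many API rounds occurred and what was sent.
function makeMockClient(responses: CannedMessage[]) {
  const calls: unknown[] = [];
  return {
    calls,
    messages: {
      create: async (req: unknown): Promise<CannedMessage> => {
        calls.push(req);
        const next = responses.shift();
        if (!next) throw new Error("mock exhausted: unexpected extra API call");
        return next;
      },
    },
  };
}

// Mock tool registry: canned results keyed by tool name, no real MCP servers.
function makeMockToolRegistry(results: Record<string, unknown>) {
  return {
    execute: async (name: string, _input: unknown): Promise<unknown> => {
      if (!(name in results)) throw new Error(`unknown tool: ${name}`);
      return results[name];
    },
  };
}
```

In a vitest/jest suite these factories would typically back a module mock of the real SDK import; the "mock exhausted" throw doubles as a free assertion that the loop makes exactly the expected number of API calls.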
## Test Scenarios
- Simple text response (no tools)
- Single tool call round (Claude calls a tool, gets result, responds)
- Multi-round tool calls (2-3 rounds of tool usage)
- Tool execution error handling (tool throws, agent recovers)
- Auth failure handling (401 from Claude API)
- Max tool rounds exceeded (10 rounds hit)
- System-initiated messages (no tools available)
- Image attachment handling
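To make the scenarios above concrete, here is a minimal, self-contained loop driver plus the "single tool call round" case run against it. `runAgentLoop`, the ten-round default, and the message shapes are assumptions standing in for the real agent loop, chosen only to show how canned responses exercise each path:

```typescript
// Illustrative message shapes (assumed, mirroring the sketch's mock client).
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface Turn { stop_reason: "end_turn" | "tool_use"; content: Block[] }

interface Deps {
  createMessage: (history: unknown[]) => Promise<Turn>; // mocked Claude call
  executeTool: (name: string, input: unknown) => Promise<unknown>; // mocked registry
  maxRounds?: number; // default mirrors the 10-round cap above
}

// Hypothetical agent loop: call Claude, run any requested tools, feed results
// back, and stop on end_turn or when the round cap is exceeded.
async function runAgentLoop(userText: string, deps: Deps): Promise<string> {
  const history: unknown[] = [{ role: "user", content: userText }];
  const max = deps.maxRounds ?? 10;
  for (let round = 0; round <= max; round++) {
    const turn = await deps.createMessage(history);
    history.push({ role: "assistant", content: turn.content });
    if (turn.stop_reason !== "tool_use") {
      const text = turn.content.find(
        (b): b is Extract<Block, { type: "text" }> => b.type === "text"
      );
      return text?.text ?? "";
    }
    for (const b of turn.content) {
      if (b.type !== "tool_use") continue;
      let result: unknown;
      try {
        result = await deps.executeTool(b.name, b.input);
      } catch (err) {
        // Tool error handling scenario: surface the failure to the model
        // instead of crashing the loop.
        result = { error: String(err) };
      }
      history.push({
        role: "user",
        content: [{ type: "tool_result", tool_use_id: b.id, content: result }],
      });
    }
  }
  throw new Error(`max tool rounds (${max}) exceeded`);
}
```

The single-round scenario then feeds two canned turns (one `tool_use`, one `end_turn`) and asserts the loop made exactly two API calls; the max-rounds scenario feeds only `tool_use` turns and asserts the loop throws.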
## Acceptance Criteria
- New test file: `src/agent/agent-loop.integration.test.ts`