forge-orchestrator Testing Guide

This document lists the implemented features and explains how to test each one manually.

What Was Implemented

Core Components

Component	File	Description
Settings	`src/forge_orchestrator/settings.py`	Configuration via environment variables
CLI	`src/forge_orchestrator/cli.py`	Typer CLI with `info` and `serve` commands
Logging	`src/forge_orchestrator/logging.py`	Structlog JSON logging
SSE Models	`src/forge_orchestrator/models/messages.py`	TokenEvent, ThinkingEvent, ToolCallEvent, etc.
Conversation Models	`src/forge_orchestrator/models/conversation.py`	Message, Conversation, TokenUsage
Storage	`src/forge_orchestrator/storage.py`	JSON file persistence for conversations
MCP Client	`src/forge_orchestrator/mcp_client.py`	Armory connection wrapper
Orchestrator	`src/forge_orchestrator/orchestrator.py`	Pydantic AI agent loop with streaming
Conversation Manager	`src/forge_orchestrator/conversation.py`	High-level conversation operations
Server	`src/forge_orchestrator/server.py`	FastAPI + SSE endpoints

API Endpoints

Method	Path	Description
GET	`/health`	Health check
GET	`/tools`	List available MCP tools
POST	`/tools/refresh`	Refresh tools from Armory
POST	`/conversations`	Create new conversation
GET	`/conversations/{id}`	Get conversation state
DELETE	`/conversations/{id}`	Delete conversation
POST	`/conversations/{id}/messages`	Send message (returns 202)
GET	`/conversations/{id}/stream`	SSE stream for responses
POST	`/conversations/{id}/cancel`	Cancel active generation
DELETE	`/conversations/{id}/messages/{n}`	Delete messages from index N
PATCH	`/conversations/{id}/system-prompt`	Update system prompt
PATCH	`/conversations/{id}/model`	Update model

Prerequisites

cd /MyWork/Projects/agentic-forge/forge-orchestrator

# Install dependencies
uv sync

# Run automated tests first (should all pass)
uv run pytest -v

Manual Testing

1. CLI Commands

Test: Show version and configuration

uv run orchestrator info

Expected output:

Version: 0.1.0
Configuration settings displayed in a table
Conversations directory path shown

Test: Show help

uv run orchestrator --help
uv run orchestrator serve --help

2. Server in Mock Mode

Start the server in mock mode (no API keys needed):

ORCHESTRATOR_MOCK_LLM=true uv run orchestrator serve

Expected: Server starts on http://0.0.0.0:8001

Test: Health Check

curl http://localhost:8001/health

Expected:

{"status":"healthy","armory_connected":false,"active_runs":0}
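
The expected payload above can also be checked in a script rather than by eye. The following is a minimal sketch, not part of the project: `health_problems` is a hypothetical helper, and it assumes only the three fields shown in the expected output.

```python
import json


def health_problems(body: str, expect_armory: bool = False) -> list[str]:
    """Return a list of mismatches between a /health body and the expected state.

    Hypothetical helper: it checks only the three fields shown in the
    expected output above. An empty list means the response looks right.
    """
    payload = json.loads(body)
    problems: list[str] = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}")
    if payload.get("armory_connected") != expect_armory:
        problems.append("armory_connected differs from expectation")
    if payload.get("active_runs") != 0:
        problems.append("active_runs is not 0")
    return problems


# Body copied from the expected output above (mock mode, no Armory).
sample = '{"status":"healthy","armory_connected":false,"active_runs":0}'
print(health_problems(sample))
```

Pass `expect_armory=True` when the server was started with Armory running.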

Test: List Tools (empty in mock mode)

curl http://localhost:8001/tools

Expected:

{"tools":[]}

3. Conversation CRUD

Test: Create Conversation

curl -X POST http://localhost:8001/conversations \
  -H "Content-Type: application/json" \
  -d '{"model": "test-model", "system_prompt": "You are a helpful assistant."}'

Expected: Returns conversation object with metadata.id

Save the conversation ID for subsequent tests:

CONV_ID="<id from response>"

Test: Get Conversation

curl http://localhost:8001/conversations/$CONV_ID

Expected: Returns full conversation with metadata and empty messages array

Test: Update System Prompt

curl -X PATCH "http://localhost:8001/conversations/$CONV_ID/system-prompt" \
  -H "Content-Type: application/json" \
  -d '{"content": "You are a pirate assistant."}'

Expected: Returns updated conversation, system_prompt_history should have 1 entry

Test: Update Model

curl -X PATCH "http://localhost:8001/conversations/$CONV_ID/model" \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-sonnet-4"}'

Expected: Returns updated conversation with new model

4. SSE Streaming (Mock Mode)

Test: Basic Message Stream

curl -N "http://localhost:8001/conversations/$CONV_ID/stream?message=Hello"

Expected SSE events:

event: thinking - Thinking event
event: token (one or more) - Token events carrying the cumulative text so far
event: complete - Complete event with usage stats
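
To inspect the stream in a script rather than reading raw curl output, the text can be split into events. The following is a minimal sketch of a parser for the standard SSE wire format (`event:` and `data:` fields, blank line between events). The JSON field names in the sample are hypothetical, since this guide does not show the orchestrator's real payloads.

```python
from dataclasses import dataclass


@dataclass
class SSEEvent:
    event: str
    data: str


def parse_sse(text: str) -> list[SSEEvent]:
    """Split raw SSE text into events; a blank line terminates each event."""
    events: list[SSEEvent] = []
    event_type = "message"  # SSE default when no "event:" field is present
    data_lines: list[str] = []
    for line in text.splitlines() + [""]:  # the extra "" flushes a trailing event
        if line == "":
            if data_lines:
                events.append(SSEEvent(event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used for heartbeats
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # The SSE format drops one optional space after the colon.
            data_lines.append(line[len("data:"):].removeprefix(" "))
    return events


# Hypothetical payloads: only the event names come from this guide.
sample = (
    "event: thinking\n"
    'data: {"content": "..."}\n'
    "\n"
    "event: token\n"
    'data: {"cumulative": "Hel"}\n'
    "\n"
    "event: token\n"
    'data: {"cumulative": "Hello"}\n'
    "\n"
    "event: complete\n"
    'data: {"usage": {}}\n'
    "\n"
)

for ev in parse_sse(sample):
    print(ev.event, ev.data)
```

Saving the curl output to a file and passing its contents to `parse_sse` gives a list that is easy to assert against.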

Test: Weather Tool Call (Mock)

curl -N "http://localhost:8001/conversations/$CONV_ID/stream?message=What%27s%20the%20weather%3F"

Expected SSE events:

event: thinking
event: tool_call with status "pending"
event: tool_call with status "executing"
event: tool_result with mock weather data
event: token (one or more)
event: complete
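
The order of the events listed above can be checked mechanically. The sketch below assumes the event names have already been extracted from the stream and that each tool_call event has been labelled with its status as `tool_call:<status>`. That label format is a convention invented for this example, not something the orchestrator emits. Any heartbeat or ping events are assumed to have been filtered out beforehand.

```python
import re

# Expected order from the list above: token may repeat, every other event occurs once.
EXPECTED_FLOW = re.compile(
    r"thinking,tool_call:pending,tool_call:executing,tool_result,(token,)+complete"
)


def is_expected_tool_flow(labels: list[str]) -> bool:
    """Return True if the labelled events match the mock weather tool flow."""
    return EXPECTED_FLOW.fullmatch(",".join(labels)) is not None


good = [
    "thinking",
    "tool_call:pending",
    "tool_call:executing",
    "tool_result",
    "token",
    "token",
    "complete",
]
print(is_expected_tool_flow(good))
```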

5. Message Management

Test: Send Message via POST

curl -X POST "http://localhost:8001/conversations/$CONV_ID/messages" \
  -H "Content-Type: application/json" \
  -d '{"content": "Remember this test message."}'

Expected: 202 Accepted; the response can then be read from the conversation's /stream endpoint
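
The same request can be built from Python with the standard library. This is a minimal sketch: `build_message_request` is a hypothetical helper that only constructs the request, so it can be inspected without a running server. The path and JSON body mirror the curl command above.

```python
import json
from urllib.request import Request


def build_message_request(base_url: str, conv_id: str, content: str) -> Request:
    """Build (but do not send) the POST request for a new message."""
    body = json.dumps({"content": content}).encode("utf-8")
    return Request(
        f"{base_url}/conversations/{conv_id}/messages",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "abc123" is a placeholder conversation ID.
req = build_message_request("http://localhost:8001", "abc123", "Remember this test message.")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and check that the response status is 202.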

Test: Get Conversation with Messages

curl http://localhost:8001/conversations/$CONV_ID

Expected: Messages array should have user and assistant messages

Test: Delete Messages from Index

# First, check current message count
curl http://localhost:8001/conversations/$CONV_ID | jq '.metadata.message_count'

# Delete from index 2 onwards (indices are zero-based, so the first 2 messages are kept)
curl -X DELETE "http://localhost:8001/conversations/$CONV_ID/messages/2"

Expected: Returns truncated conversation
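
The truncation rule can be illustrated with a plain list. This is a sketch of the semantics described in the comment above (deleting from index N keeps the first N messages), assuming zero-based indices. `delete_from_index` is a hypothetical function, not the server's implementation.

```python
def delete_from_index(messages: list[str], n: int) -> list[str]:
    """Return the messages that remain after deleting index n and everything after it."""
    if n < 0:
        raise ValueError("index must be non-negative")
    return messages[:n]


# Placeholder history: two user/assistant exchanges.
history = ["user 1", "assistant 1", "user 2", "assistant 2"]
print(delete_from_index(history, 2))
```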

6. Cancellation

Test: Cancel Generation

# In terminal 1: Start a stream
curl -N "http://localhost:8001/conversations/$CONV_ID/stream?message=Tell%20me%20a%20long%20story"

# In terminal 2: Cancel it
curl -X POST "http://localhost:8001/conversations/$CONV_ID/cancel"

Expected: Stream should receive event: error with code "CANCELLED"

7. Delete Conversation

curl -X DELETE http://localhost:8001/conversations/$CONV_ID

Expected:

{"deleted":true}

8. Error Handling

Test: Get Non-existent Conversation

curl http://localhost:8001/conversations/non-existent-id

Expected: 404 Not Found

Test: Stream Non-existent Conversation

curl -N "http://localhost:8001/conversations/non-existent-id/stream?message=Hello"

Expected: SSE error event with code "CONVERSATION_NOT_FOUND"

Testing with Real LLM (OpenRouter)

Prerequisites

Get an API key from OpenRouter

Start Armory (optional, for tool access):

cd /MyWork/Projects/agentic-forge/forge-armory
uv run armory serve

Start Server with OpenRouter

export OPENROUTER_API_KEY="your-api-key-here"
uv run orchestrator serve

Test Real Streaming

# Create conversation
CONV_ID=$(curl -s -X POST http://localhost:8001/conversations \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-sonnet-4"}' | jq -r '.metadata.id')

# Stream a response
curl -N "http://localhost:8001/conversations/$CONV_ID/stream?message=Hello%2C%20who%20are%20you%3F"

Expected: Real streaming tokens from Claude
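
The message in the stream URL must be percent-encoded, as in the curl commands in this guide. A minimal sketch using the standard library follows. `stream_url` is a hypothetical helper, and the conversation ID in the example is a placeholder.

```python
from urllib.parse import quote


def stream_url(base_url: str, conv_id: str, message: str) -> str:
    """Build the /stream URL with the message percent-encoded for the query string."""
    # safe="" also encodes "/" so the message cannot alter the URL structure.
    return (
        f"{base_url}/conversations/{quote(conv_id, safe='')}"
        f"/stream?message={quote(message, safe='')}"
    )


print(stream_url("http://localhost:8001", "abc123", "Hello, who are you?"))
```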

Testing with Armory (MCP Tools)

Prerequisites

Start Armory with MCP servers configured
Set ORCHESTRATOR_ARMORY_URL if not default

export ORCHESTRATOR_ARMORY_URL="http://localhost:8080/mcp"
export OPENROUTER_API_KEY="your-api-key"
uv run orchestrator serve

Test Tool Listing

curl http://localhost:8001/tools

Expected: List of tools from connected MCP servers

Test Tool Refresh

curl -X POST http://localhost:8001/tools/refresh

Expected: Updated tool list

Automated Test Suite

# Run all tests
uv run pytest -v

# Run with coverage
uv run pytest --cov=forge_orchestrator --cov-report=term-missing

# Run specific test file
uv run pytest tests/test_server.py -v

# Run specific test
uv run pytest tests/test_orchestrator.py::TestAgentOrchestrator::test_run_stream_mock_basic -v

Current Test Coverage

Module	Tests
test_models.py	16 tests (SSE events, conversation models)
test_storage.py	10 tests (CRUD, atomic writes, metadata)
test_orchestrator.py	7 tests (mock streaming, cancellation)
test_server.py	15 tests (REST endpoints)
Total	48 tests

Known Limitations

No WebSocket support - Uses SSE (by design per spec)
No database - Uses JSON files (by design for simplicity)
Tool results not persisted - Tool calls shown in stream but not saved to conversation messages yet
No authentication - Add auth middleware for production
Single node only - No distributed session support

Environment Variables

Variable	Default	Description
`ORCHESTRATOR_HOST`	0.0.0.0	Server host
`ORCHESTRATOR_PORT`	8001	Server port
`ORCHESTRATOR_ARMORY_URL`	http://localhost:8080/mcp	Armory MCP endpoint
`ORCHESTRATOR_DEFAULT_MODEL`	anthropic/claude-sonnet-4	Default LLM model
`ORCHESTRATOR_CONVERSATIONS_DIR`	~/.forge/conversations	Storage directory
`ORCHESTRATOR_MOCK_LLM`	false	Enable mock mode
`ORCHESTRATOR_SHOW_THINKING`	true	Show thinking events
`ORCHESTRATOR_HEARTBEAT_INTERVAL`	15	Ping interval (seconds)
`ORCHESTRATOR_TOOL_TIMEOUT_WARNING`	30	Tool timeout warning (seconds)
`OPENROUTER_API_KEY`	(required unless mock mode is enabled)	OpenRouter API key
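
To illustrate how the defaults in the table combine with overrides, here is a minimal sketch that reads a few of these variables from a mapping. It is not the project's `settings.py`. The `Settings` class, the `load_settings` function, and the rule for parsing booleans are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    armory_url: str
    mock_llm: bool
    heartbeat_interval: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read ORCHESTRATOR_* variables, falling back to the defaults in the table."""

    def get(name: str, default: str) -> str:
        return env.get(f"ORCHESTRATOR_{name}", default)

    return Settings(
        host=get("HOST", "0.0.0.0"),
        port=int(get("PORT", "8001")),
        armory_url=get("ARMORY_URL", "http://localhost:8080/mcp"),
        # Assumption: these spellings count as true; the real parser may differ.
        mock_llm=get("MOCK_LLM", "false").lower() in {"1", "true", "yes"},
        heartbeat_interval=int(get("HEARTBEAT_INTERVAL", "15")),
    )


print(load_settings({"ORCHESTRATOR_MOCK_LLM": "true"}))
```

Passing an explicit mapping instead of `os.environ` keeps the example independent of the current shell environment.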

Quick Test Script

Create a file test_manual.sh:

#!/bin/bash
# Stop on errors, unset variables, and failures inside pipelines
set -euo pipefail

BASE_URL="http://localhost:8001"

echo "=== Health Check ==="
curl -s "$BASE_URL/health" | jq .

echo -e "\n=== Create Conversation ==="
CONV=$(curl -s -X POST "$BASE_URL/conversations" \
  -H "Content-Type: application/json" \
  -d '{"model": "test-model", "system_prompt": "You are helpful."}')
echo "$CONV" | jq .
# Extract the ID for the remaining requests
CONV_ID=$(echo "$CONV" | jq -r '.metadata.id')

echo -e "\n=== Get Conversation ==="
curl -s "$BASE_URL/conversations/$CONV_ID" | jq .

echo -e "\n=== Update System Prompt ==="
curl -s -X PATCH "$BASE_URL/conversations/$CONV_ID/system-prompt" \
  -H "Content-Type: application/json" \
  -d '{"content": "You are a pirate."}' | jq '.metadata.system_prompt'

echo -e "\n=== Stream Message ==="
echo "Streaming response (Ctrl+C to stop)..."
curl -N "$BASE_URL/conversations/$CONV_ID/stream?message=Hello"

echo -e "\n\n=== Delete Conversation ==="
curl -s -X DELETE "$BASE_URL/conversations/$CONV_ID" | jq .

echo -e "\n=== Done ==="

Run with:

chmod +x test_manual.sh
./test_manual.sh

forge-ui Testing Guide

What Was Implemented

UI Components

Component	File	Description
App	`src/App.vue`	Main application wrapper
HeaderBar	`src/components/HeaderBar.vue`	Logo, connection status, toggles
WelcomeScreen	`src/components/WelcomeScreen.vue`	New chat, model select, import
ChatView	`src/views/ChatView.vue`	Main chat interface
MessageList	`src/components/MessageList.vue`	Scrollable message history
MessageBubble	`src/components/MessageBubble.vue`	Individual message with markdown
StreamingMessage	`src/components/StreamingMessage.vue`	Live streaming response
ToolCallCard	`src/components/ToolCallCard.vue`	Collapsible tool execution display
ChatInput	`src/components/ChatInput.vue`	Multi-line input with send/stop
ModelSelector	`src/components/ModelSelector.vue`	Model dropdown
SystemPromptEditor	`src/components/SystemPromptEditor.vue`	System prompt dialog
DebugPanel	`src/components/DebugPanel.vue`	Raw SSE event viewer

Composables

Composable	File	Description
useTheme	`src/composables/useTheme.ts`	Dark/light mode toggle
useSSE	`src/composables/useSSE.ts`	SSE connection management
useConversation	`src/composables/useConversation.ts`	Global conversation state

Prerequisites

cd /MyWork/Projects/agentic-forge/forge-ui

# Install dependencies (using Bun)
bun install

# Type check
bun run type-check

# Build (optional - for production)
bun run build

Running the UI

Development Mode

cd /MyWork/Projects/agentic-forge/forge-ui
bun run dev

Expected: Vite dev server starts on http://localhost:5173

With Orchestrator (Mock Mode)

In separate terminals:

# Terminal 1: Start orchestrator in mock mode
cd /MyWork/Projects/agentic-forge/forge-orchestrator
ORCHESTRATOR_MOCK_LLM=true uv run orchestrator serve

# Terminal 2: Start UI
cd /MyWork/Projects/agentic-forge/forge-ui
bun run dev

Manual UI Testing

1. Welcome Screen

Open http://localhost:5173
Expected: Welcome screen with:
- "New Chat" button
- Model selector dropdown
- System prompt textarea (optional)
- "Import Conversation" button

2. Create New Conversation

Select a model from dropdown
Optionally enter a system prompt
Click "New Chat"

Expected: Redirects to chat view with empty message list

3. Send a Message

Type a message in the input area
Click the Send button and wait for the orchestrator's mock response

Expected:

User message appears on the right (blue bubble)
Typing indicator shows
Streaming response appears
Assistant message appears on the left

4. Streaming Response

With mock mode:

Type "Hello" and send
Watch the streaming tokens appear

Expected:

"Generating..." indicator in the assistant bubble
Text streams in word by word
Complete event finalizes the message

5. Tool Call Display

With mock mode:

Type "What's the weather?" and send

Expected:

ToolCallCard appears showing weather__get_current_weather
Card shows "pending" then "executing" status
Card shows result with mock weather data
Response mentions the weather

6. Dark/Light Mode Toggle

Click the sun/moon icon in the header

Expected: UI switches between dark and light themes

7. Basic/Advanced View Toggle

Click "Basic" / "Advanced" toggle in header

In Advanced View, you should see:

Token usage per message
Tool call latency (e.g., "150ms")
Full tool names with server prefix
Model name per message
Debug panel toggle button
Refresh tools button

8. Debug Panel (Advanced View)

Enable Advanced view
Click the code icon to show debug panel

Expected:

Side panel opens on the right
Shows raw SSE events as JSON
Events color-coded by type
Copy button for each event

9. System Prompt Editor

Click the edit icon in the chat toolbar
Modify the system prompt
Click Save

Expected:

Dialog shows current prompt
Shows version history (if any)
Saving updates the conversation

10. Model Switching

Click the model dropdown in the header
Select a different model

Expected: Model updates immediately

11. Message Deletion (Advanced View)

Enable Advanced view
Hover over a message
Click the trash icon

Expected: A confirmation dialog appears; after confirming, the message and all following messages are deleted

12. Conversation Export

Click the download icon in chat toolbar

Expected: Downloads conversation-{id}.json file

13. Stop Generation

Send a message
Wait 2-3 seconds for stop button to appear
Click the stop button (or press Escape)

Expected: Generation stops, partial response discarded

14. Connection Status

Look at the header bar

Expected:

Green dot when connected to orchestrator
Red dot when disconnected
Status text shows "Connected" or "Disconnected"

15. Draft Preservation

Type something in the input (don't send)
Refresh the page

Expected: The draft text is restored after the refresh (it is preserved in localStorage)

Environment Variables (UI)

Create .env file in forge-ui/:

VITE_API_URL=http://localhost:8001

Full Integration Test

Start all services:

# Terminal 1: Armory (optional, for real tools)
cd /MyWork/Projects/agentic-forge/forge-armory
uv run armory serve

# Terminal 2: Orchestrator
cd /MyWork/Projects/agentic-forge/forge-orchestrator
export OPENROUTER_API_KEY="your-key"
uv run orchestrator serve

# Terminal 3: UI
cd /MyWork/Projects/agentic-forge/forge-ui
bun run dev

Open http://localhost:5173
Create a new conversation
Send messages and observe real LLM responses
Test tool calls if Armory is connected

Known UI Limitations

No conversation persistence - Conversations not loaded from storage on refresh
No conversation list - Only one conversation visible at a time
No responsive mobile layout - Optimized for desktop
Large bundle size - Could benefit from code splitting
No keyboard navigation - Accessibility improvements needed
