A Node.js CLI tool that tests Microsoft Foundry Model Router behavior across two supported runtime paths and compares routing decisions, latency, and reliability.
| Path | SDK / API | Endpoint pattern |
|---|---|---|
| AOAI + Chat Completions | OpenAI JS SDK | `https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>` |
| Foundry Project + Chat Completions | OpenAI JS SDK (separate client) | `https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>` |
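Under the hood, a request on either path is a standard Chat Completions call against a Model Router deployment. A minimal sketch using the OpenAI JS SDK's `AzureOpenAI` client (the endpoint, deployment, and API version are placeholders — substitute your own values):

```js
import { AzureOpenAI } from "openai";

// Placeholder values — use the endpoint, key, and deployment from your .env.
const client = new AzureOpenAI({
  endpoint: "https://<resource>.cognitiveservices.azure.com",
  apiKey: process.env.AOAI_API_KEY,
  apiVersion: "2024-05-01-preview",
  deployment: "model-router",
});

const res = await client.chat.completions.create({
  model: "model-router", // the deployment name, not a concrete model
  messages: [{ role: "user", content: "Say hello." }],
});

// `res.model` reports which underlying model the router actually selected.
console.log(res.model, res.choices[0].message.content);
```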
The tool sends a configurable set of prompts (echo, summarize, code, reasoning) through both paths, logs every response to JSONL, and prints a summary with:
- Per-path / per-category latency stats (p50 / p95)
- Timeout and error rates
- Model-choice distribution per prompt category
- A side-by-side diff showing where the same prompt routed to different underlying models between paths
A dedicated `--repro408` mode targets the specific 408 timeout issue observed on certain prompts.
The tool includes a built-in web dashboard for interactive testing and result visualization.
- Dashboard Overview — Configure runs, view prompt sets, and monitor progress in real time.
- Model Comparison — See which models were selected by each routing path for every prompt.
- Latency Charts — Compare p50 and p95 latencies across the Chat Completions and Project Responses paths.
- Error Analysis — Drill into error distributions and detailed error messages per request.
- Live Feed — Real-time streaming of results as they come in.
- Log Viewer — Browse and inspect historical JSONL log files with parsed table views.
- Mobile Responsive — The UI adapts to smaller screens for use on tablets and phones.
RouteLens sends configurable prompts through two distinct Azure AI runtime paths and compares routing decisions, latency, and reliability. The Matrix Runner dispatches prompts to both the Chat Completions Client (OpenAI JS SDK → AOAI endpoint) and the Project Responses Client (@azure/ai-projects → Foundry endpoint). Both paths converge at Microsoft Foundry Model Router, which intelligently selects the optimal backend model from 18 supported models across OpenAI, Meta (Llama), Anthropic (Claude), xAI (Grok), and DeepSeek. Results are logged to JSONL files and rendered in the web dashboard.
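A minimal sketch of that dispatch pattern (function and field names here are illustrative, not the actual `src/matrix.js` exports): every client exposes a send function with the same result shape, so the runner treats both paths uniformly.

```js
// Illustrative only — the real src/matrix.js may differ.
// `clients` maps a path name to a send(prompt) function; each send()
// resolves to the same shape: { ok, status, latencyMs, model, usage }.
async function runMatrix(prompts, clients, runs = 3) {
  const results = [];
  for (const prompt of prompts) {
    for (const [pathName, send] of Object.entries(clients)) {
      for (let run = 1; run <= runs; run++) {
        const r = await send(prompt);
        results.push({ path: pathName, promptId: prompt.id, category: prompt.category, run, ...r });
      }
    }
  }
  return results;
}
```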
- Node.js 18+ (LTS recommended)
- An Azure subscription with a Foundry project in East US 2
- Model Router deployed in your Foundry project (see How to use Model Router)
- An API key from your Azure OpenAI / Foundry resource (find it under Keys and Endpoint in the Azure portal)
```bash
# 1. Clone & install
git clone https://github.com/leestott/modelrouter-routelens/
cd modelrouter-routelens
npm install
# 2. Configure
cp .env.example .env
# Edit .env with your endpoints — see "Configuration" below
# 3. Launch the web dashboard
npm run ui
# Open http://localhost:3000
# 4. Or run from the CLI
npm run run:matrix
npm run run:repro408
```

Copy `.env.example` to `.env` and fill in:
| Variable | Description |
|---|---|
| `FOUNDRY_PROJECT_ENDPOINT` | Foundry project endpoint. Format: `https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>`. Do NOT include `/chat/completions` or `?api-version`. |
| `FOUNDRY_MODEL_DEPLOYMENT` | Deployment name for Model Router in your Foundry project (default: `model-router`) |
| `AOAI_BASE_URL` | Azure OpenAI base URL. Format: `https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>`. Do NOT include `/chat/completions` or `?api-version`. |
| `AOAI_DEPLOYMENT` | Deployment name for Model Router via Chat Completions (default: `model-router`) |
| `AOAI_API_KEY` | API key from your Azure OpenAI / Foundry resource |
| `AOAI_API_VERSION` | Azure OpenAI API version (default: `2024-05-01-preview`) |
| `REQUEST_TIMEOUT_MS` | Per-request timeout in milliseconds (default: `60000`) |
| `RETRY_MAX` | Max retries for transient errors (default: `3`) |
| `RETRY_BACKOFF_MS` | Base backoff between retries in milliseconds (default: `1000`) |
| `RUNS` | Number of times each prompt is sent per path (default: `3`) |
| `CONCURRENCY` | Max concurrent requests (default: `2`) |
| `UI_PORT` | Web dashboard port (default: `3000`) |
| `PROMPTS_FILE` | Optional path to a JSON file overriding the default prompts |
| `LOG_DIR` | Directory for JSONL logs (default: `./logs`) |
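A filled-in `.env` might look like this (all values are placeholders):

```ini
FOUNDRY_PROJECT_ENDPOINT=https://my-resource.cognitiveservices.azure.com/openai/deployments/model-router
FOUNDRY_MODEL_DEPLOYMENT=model-router
AOAI_BASE_URL=https://my-resource.cognitiveservices.azure.com/openai/deployments/model-router
AOAI_DEPLOYMENT=model-router
AOAI_API_KEY=<your-api-key>
AOAI_API_VERSION=2024-05-01-preview
REQUEST_TIMEOUT_MS=60000
RETRY_MAX=3
RETRY_BACKOFF_MS=1000
RUNS=3
CONCURRENCY=2
UI_PORT=3000
LOG_DIR=./logs
```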
- Go to ai.azure.com.
- Open your Foundry project.
- Under Deployments, find your Model Router deployment.
- Click on the deployment → copy the Target URI (format: `https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>`).
- Use this URL for both `FOUNDRY_PROJECT_ENDPOINT` and `AOAI_BASE_URL` (they can be the same).
- Copy your API key from the Keys and Endpoint page → `AOAI_API_KEY`.
- Note the API version shown in the sample code → `AOAI_API_VERSION`.
```bash
npm run ui
# Open http://localhost:3000
```

The dashboard is a single-page dark-themed application with a sidebar on the left and a tabbed results panel on the right.
The interface is split into two columns:
| Area | Position | Purpose |
|---|---|---|
| Header bar | Top | App title, run-status indicator, and request-progress counter |
| Sidebar | Left (320 px) | Configuration status, run controls, prompt set, and run history |
| Main panel | Right | Six tabs: Summary, Model Comparison, Latency, Errors, Live Feed, Logs |
On screens narrower than 900 px the layout collapses to a single column.
The header spans the full width and contains:
- App title — "RouteLens".
- Status badge — a colour-coded dot with a label showing the current state:
- Grey / Idle — no test is running.
- Orange / Running (pulsing) — a test is in progress, with the mode name (e.g. "matrix — running").
- Green / Complete — the last test finished successfully.
- Red / Error — the last test encountered a fatal error.
- Progress counter — appears during a run, showing `completed / total` requests.
Displays the connection status read from the server at startup:
- Foundry Endpoint — green dot if `FOUNDRY_PROJECT_ENDPOINT` is configured, red if missing.
- AOAI Endpoint — green dot if `AOAI_BASE_URL` is configured, red if missing.
- API Key — green dot if `AOAI_API_KEY` is configured, red if missing.
- Timeout & retries — the current `REQUEST_TIMEOUT_MS` and `RETRY_MAX` values.
This card lets you confirm at a glance that the tool can reach both API paths before starting a run.
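The card is populated from a small status payload the server derives from `.env` at startup. A plausible shape (field names are illustrative — the real `server.js` may differ):

```json
{
  "foundryEndpoint": true,
  "aoaiEndpoint": true,
  "apiKey": true,
  "timeoutMs": 60000,
  "retryMax": 3
}
```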
Where you launch tests:
- Runs per prompt — number input (default 3). How many times each prompt is sent per API path.
- Concurrency — number input (default 2). Maximum parallel requests.
- ▶ Run Matrix button — starts the full prompt × path test matrix.
- ⚡ Repro 408 button — starts the targeted 408-timeout diagnostic.
- Progress bar — appears once a run starts; fills from left to right as requests complete.
- Progress text — e.g. "12 of 24 requests".
Both buttons are disabled while a run is in flight to prevent overlapping tests.
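Concurrency here is a simple worker-pool cap on in-flight requests. A minimal limiter in the spirit of what `src/utils.js` provides (the actual helper may differ):

```js
// Run an array of task functions with at most `limit` in flight at once.
// Results are returned in the original task order.
async function runWithConcurrency(tasks, limit) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}
```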
Lists every prompt that will be sent during a matrix run. Each entry shows:
- Prompt ID — a short identifier such as `echo-hello` or `code-fibonacci`.
- Category chip — colour-coded label (echo, summarize, code, reasoning).
- Prompt text — the full text, truncated with an ellipsis (hover for the full text).

The set is loaded from `src/prompts.js` (or from the custom file specified by `PROMPTS_FILE`).
Shows the most recent completed runs (up to 20) with:
- Mode — `matrix` or `repro408`.
- Status chip — `complete` or `error`.
completeorerror. - Timestamp — when the run started.
- Result count — total number of request results.
A KPI dashboard with 9 metric cards followed by a detailed table with one row per path × category combination:
KPI Cards:
| Card | Description |
|---|---|
| Success Rate | % of requests returning 200 |
| Avg TPS | Average tokens per second (total) |
| Avg Gen TPS | Average generation tokens/sec |
| Peak TPS | Best single-request throughput |
| Fastest Response | Lowest latency success + which prompt/path |
| p50 Latency | Median across all requests |
| p95 Latency | 95th percentile |
| Most Reliable | Path with highest success rate |
| Total Tokens | Sum of all tokens consumed |
Detail Table:
| Column | Description |
|---|---|
| Path | chat_completions or project_responses |
| Category | Prompt category (echo, summarize, code, reasoning) |
| Total | Number of requests sent |
| OK | Number of successful (2xx) responses |
| Err | Number of failures (shown in red if > 0) |
| Rate | Success rate (color-coded: green ≥95%, orange ≥50%, red <50%) |
| p50 | Median latency (ms) |
| p95 | 95th-percentile latency (ms) |
| Min | Fastest response (ms) |
| TPS | Average tokens per second |
| Gen TPS | Average generation tokens per second |
| Models | Colour-coded chips showing which underlying models the router chose |
This is the first place to look for an overview of routing behaviour and reliability.
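The p50/p95 figures are plain percentiles over per-request latencies. A minimal sketch of the computation (the helper name is hypothetical — see `src/report.js` for the real implementation):

```js
// Linear-interpolation percentile over an array of latencies (ms).
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const idx = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

const latencies = [234, 310, 402, 511, 1234];
console.log(percentile(latencies, 50)); // 402 (median)
console.log(percentile(latencies, 95)); // 1089.4 (tail)
```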
A row-per-prompt table comparing which models each path selected:
| Column | Description |
|---|---|
| Prompt ID | The prompt identifier |
| Chat Completions Models | Blue chips for models chosen via the AOAI path |
| Project Responses Models | Purple chips for models chosen via the Foundry path |
| Match? | ✓ Match (green) when both paths picked the same model(s); ✗ DIFFER (red) when they diverged |
Divergences are the most interesting result — they reveal when the same prompt is routed to different backend models depending on the API surface.
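The Match? column reduces to a set comparison per prompt ID. A minimal sketch (names are illustrative, not the dashboard's actual code):

```js
// Group results by promptId, then compare the set of models each path chose.
function compareModelChoices(results) {
  const byPrompt = new Map();
  for (const r of results) {
    if (!r.ok) continue;
    const entry =
      byPrompt.get(r.promptId) ??
      { chat_completions: new Set(), project_responses: new Set() };
    entry[r.path]?.add(r.model);
    byPrompt.set(r.promptId, entry);
  }
  return [...byPrompt].map(([promptId, { chat_completions: a, project_responses: b }]) => ({
    promptId,
    match: a.size === b.size && [...a].every((m) => b.has(m)),
  }));
}
```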
Two horizontal bar charts grouped by path · category:
- p50 Latency — median response time. Bars are colour-coded: blue for Chat Completions, purple for Project Responses.
- p95 Latency — tail latency. Same colour scheme.
Bars are proportional to the slowest p95 in the current run so you can visually compare across categories and paths.
If any requests failed, this tab shows:
- Error Distribution chart — a red bar chart showing error counts grouped by `path · HTTP status` (e.g. "project_responses · 408").
- Error Details table — up to the first 50 errors, with columns for Prompt, Path, Status, Latency, and the error message (truncated; hover for the full text).
When all requests succeed, the tab shows "No errors — all requests succeeded".
A monospace, autoscrolling log of every request as it completes:
```
✓ chat_completions   echo       echo-hello    234 ms  s=200  [gpt-4o-2024-08-06]
✗ project_responses  reasoning  reason-logic    — ms  s=408
```
Each line is colour-coded green (success) or red (failure) and includes path, category, prompt ID, latency, HTTP status, and the model name.
A file browser for the logs/ directory:
- Lists all `.jsonl` log files with a View button.
- Clicking View opens a parsed table of every JSON line in that file, with columns: Time, Path, Prompt, Status, Latency, Model.
- A ← Back button returns to the file list.
```bash
# Full test matrix (default)
node src/index.js
node src/index.js --mode matrix
# 408 repro diagnostic
node src/index.js --repro408
# Override runs and concurrency
node src/index.js --runs 10 --concurrency 4
# npm scripts
npm start # = --mode matrix
npm run run:matrix # = --mode matrix
npm run run:repro408 # = --repro408
npm run ui           # = web dashboard
```

| Flag | Description |
|---|---|
| `--mode matrix` | Run the full prompt × path test matrix (default) |
| `--repro408` | Run the 408 timeout diagnostic only |
| `--runs N` | Override the number of runs per prompt |
| `--concurrency N` | Override the max concurrent requests |
| `--help` | Show help |
Each request is logged as a JSON line in `logs/`:

```json
{
"_ts": "2026-03-13T22:15:00.123Z",
"path": "chat_completions",
"ok": true,
"status": 200,
"latencyMs": 1234,
"model": "gpt-4o-2024-08-06",
"usage": { "prompt_tokens": 25, "completion_tokens": 80, "total_tokens": 105 },
"responseId": "chatcmpl-abc123",
"content": "Here are three bullets…",
"promptId": "summarize-repro",
"category": "summarize",
"run": 1,
"tags": ["repro"]
}
```

After all requests complete, the tool prints:
- Per-path / per-category table — count, success rate, p50/p95 latency, model distribution
- Error breakdown — count by path × HTTP status
- Model-choice comparison — for each prompt, which models each path chose, with divergences highlighted
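Because each JSONL record is standalone JSON, the logs are easy to post-process outside the tool. For example, recomputing per-path success rates from a log file (the file name is illustrative):

```js
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Stream a JSONL log line by line and tally successes per path.
const counts = {};
const rl = createInterface({ input: createReadStream("./logs/matrix-run.jsonl") });
for await (const line of rl) {
  if (!line.trim()) continue;
  const rec = JSON.parse(line);
  const c = (counts[rec.path] ??= { ok: 0, total: 0 });
  c.total += 1;
  if (rec.ok) c.ok += 1;
}
for (const [path, { ok, total }] of Object.entries(counts)) {
  console.log(`${path}: ${ok}/${total} (${((ok / total) * 100).toFixed(1)}% ok)`);
}
```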
Create a JSON file:
```json
[
{
"id": "my-prompt-1",
"category": "summarize",
"text": "Summarize the theory of relativity in one sentence.",
"tags": ["custom"]
}
]
```

Set `PROMPTS_FILE=./my-prompts.json` in `.env` or pass the path. The file must be a JSON array of objects with at least `id`, `category`, and `text`.
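A minimal loader that enforces those requirements (the real `src/prompts.js` may validate differently):

```js
import { readFileSync } from "node:fs";

// Parse PROMPTS_FILE and fail fast on malformed entries.
function loadPrompts(file) {
  const prompts = JSON.parse(readFileSync(file, "utf8"));
  if (!Array.isArray(prompts)) throw new Error("PROMPTS_FILE must be a JSON array");
  for (const p of prompts) {
    if (!p.id || !p.category || !p.text) {
      throw new Error(`Prompt is missing id/category/text: ${JSON.stringify(p)}`);
    }
  }
  return prompts;
}
```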
- Add prompts: Edit `src/prompts.js` (the `DEFAULT_PROMPTS` array) or use a custom JSON file.
- Add categories: Just use a new `category` string — the reporting groups dynamically.
- Add a new API path: Create a new client in `src/clients/`, export a `send*()` function matching the same return shape, and wire it into `src/matrix.js` (see the skeleton below).
- Add new diagnostic modes: Follow the pattern in `src/repro408.js` — create a new runner and register it in the `switch` in `src/index.js`.
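A hypothetical skeleton for such a client — `callYourApi` is a placeholder for whatever transport the new path uses. Returning the shared result shape keeps `matrix.js` and `report.js` unchanged:

```js
// src/clients/myNewPath.js (hypothetical)
export async function sendMyNewPath(prompt, { timeoutMs = 60000 } = {}) {
  const started = Date.now();
  try {
    // Placeholder transport — substitute your own SDK or fetch logic.
    const res = await callYourApi(prompt.text, { timeoutMs });
    return {
      ok: true,
      status: 200,
      latencyMs: Date.now() - started,
      model: res.model,
      usage: res.usage,
      content: res.content,
    };
  } catch (err) {
    return {
      ok: false,
      status: err.status ?? 0,
      latencyMs: Date.now() - started,
      error: String(err),
    };
  }
}
```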
These are hypotheses, not confirmed behavior. Model Router is a dynamic system whose routing decisions are not fully documented publicly.
- Different API surface → different eligible model set. The Chat Completions API and the Responses API may expose different sets of underlying models to the router. If a model doesn't support the Responses API schema, it may be excluded from the eligible pool on that path, causing the router to pick a different backend. (Model Router concepts)
- Prompt serialization differences. Both paths use the Chat Completions API with `messages: [{ role: "user", content: "…" }]`, but they hit different endpoint URLs. Internal routing or load balancing may differ between the `/openai/v1/` and `/openai/deployments/` surfaces.
- Effective context window constraints. Model Router's effective context window is limited by the smallest underlying model in its pool. If the eligible pools differ between paths, the context-window gate may trigger differently. (Model Router concepts)
- Regional capacity / load balancing. At any given moment, the router may factor in current load or quota across backend models. Two requests arriving milliseconds apart via different paths could see different capacity snapshots.
- Routing mode configuration. If the two deployments have different routing modes (Balanced / Cost / Quality), model selection will differ by design. Verify both deployments use the same mode in the Azure portal.
- How to use Model Router
- Model Router concepts
- Foundry SDK endpoints overview
- Azure OpenAI REST API reference
```
routelens/
├── .env.example # Environment template
├── .gitignore
├── package.json
├── README.md
├── public/
│ └── index.html # Web dashboard (single-page app)
└── src/
├── index.js # CLI entry point
├── config.js # .env loader & validation
├── prompts.js # Default prompt set & loader
├── logger.js # JSONL file logger
├── utils.js # Retry, concurrency, timing helpers
├── matrix.js # Full test matrix runner
├── repro408.js # 408 diagnostic runner
├── report.js # Console summary reporting
├── server.js # HTTP server for the web UI
└── clients/
├── chatCompletions.js # AOAI + Chat Completions path
└── projectResponses.js # Foundry Project + Responses path
```

MIT


