
RouteLens


A Node.js CLI tool that tests Microsoft Foundry Model Router behavior across two supported runtime paths and compares routing decisions, latency, and reliability.

What it does

| Path | SDK / API | Endpoint pattern |
| --- | --- | --- |
| AOAI + Chat Completions | OpenAI JS SDK | https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment> |
| Foundry Project + Chat Completions | OpenAI JS SDK (separate client) | https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment> |

The tool sends a configurable set of prompts (echo, summarize, code, reasoning) through both paths, logs every response to JSONL, and prints a summary with:

  • Per-path / per-category latency stats (p50 / p95)
  • Timeout and error rates
  • Model-choice distribution per prompt category
  • A side-by-side diff showing where the same prompt routed to different underlying models between paths

A dedicated --repro408 mode targets the specific 408 timeout issue observed on certain prompts.
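
Request cutoffs and retries are governed by REQUEST_TIMEOUT_MS, RETRY_MAX, and RETRY_BACKOFF_MS (see Configuration below). As a rough sketch of those semantics (an illustration, not RouteLens's actual implementation), a per-request timeout with retries on transient failures can be built from Node 18's built-in fetch and AbortController:

// Sketch only: illustrates timeout + retry semantics, not the repo's actual code.
// Assumes Node 18+ (global fetch, AbortController).
async function sendWithTimeout(url, body, opts = {}) {
  const { timeoutMs = 60000, retryMax = 3, backoffMs = 1000 } = opts;
  let lastError;
  for (let attempt = 0; attempt <= retryMax; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (res.ok) return await res.json();
      if (![408, 429, 500, 502, 503].includes(res.status)) {
        throw new Error(`HTTP ${res.status}`); // non-transient: caught below and rethrown
      }
      lastError = new Error(`HTTP ${res.status}`); // transient status: retry after backoff
    } catch (err) {
      if (err.name !== "AbortError") throw err; // give up on non-timeout errors
      lastError = err; // request timed out: eligible for retry
    } finally {
      clearTimeout(timer);
    }
    await new Promise((r) => setTimeout(r, backoffMs)); // base backoff between attempts
  }
  throw lastError;
}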

Web UI

The tool includes a built-in web dashboard for interactive testing and result visualization.

Dashboard Overview — Configure runs, view prompt sets, and monitor progress in real time.

Model Comparison — See which models were selected by each routing path for every prompt.

Latency Charts — Compare p50 and p95 latencies across the Chat Completions and Project Responses paths.

Error Analysis — Drill into error distributions and detailed error messages per request.

Live Feed — Real-time streaming of results as they come in.

Log Viewer — Browse and inspect historical JSONL log files with parsed table views.

Mobile Responsive — The UI adapts to smaller screens for use on tablets and phones.

Architecture


RouteLens sends configurable prompts through two distinct Azure AI runtime paths and compares routing decisions, latency, and reliability. The Matrix Runner dispatches prompts to both the Chat Completions Client (OpenAI JS SDK → AOAI endpoint) and the Project Responses Client (@azure/ai-projects → Foundry endpoint). Both paths converge at Microsoft Foundry Model Router, which intelligently selects the optimal backend model from 18 supported models across OpenAI, Meta (Llama), Anthropic (Claude), xAI (Grok), and DeepSeek. Results are logged to JSONL files and rendered in the web dashboard.
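
The table under "What it does" shows both paths driven through the OpenAI JS SDK against different base URLs. A minimal sketch of that dual-client setup, assuming the environment variables described under Configuration (the api-key header and api-version query parameter are the standard call shape for Azure OpenAI deployment endpoints):

// Sketch: two OpenAI JS SDK clients, one per endpoint. See src/clients/ for the real code.
// Save as an ES module (.mjs) and run on Node 18+.
import OpenAI from "openai";

const makeClient = (baseURL) =>
  new OpenAI({
    baseURL, // https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>
    apiKey: process.env.AOAI_API_KEY,
    defaultQuery: { "api-version": process.env.AOAI_API_VERSION ?? "2024-05-01-preview" },
    defaultHeaders: { "api-key": process.env.AOAI_API_KEY }, // Azure expects api-key, not Bearer
  });

const clients = {
  chat_completions: makeClient(process.env.AOAI_BASE_URL),
  project_responses: makeClient(process.env.FOUNDRY_PROJECT_ENDPOINT),
};

// Both clients send the same Chat Completions payload; the router picks the backend.
for (const [path, client] of Object.entries(clients)) {
  const res = await client.chat.completions.create({
    model: process.env.AOAI_DEPLOYMENT ?? "model-router",
    messages: [{ role: "user", content: "Say hello." }],
  });
  console.log(path, res.model); // res.model reveals which underlying model was selected
}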


Prerequisites

  • Node.js 18+ (LTS recommended)
  • An Azure subscription with a Foundry project in East US 2
  • Model Router deployed in your Foundry project (see How to use Model Router)
  • An API key from your Azure OpenAI / Foundry resource (find it under Keys and Endpoint in the Azure portal)

Quick start

# 1. Clone & install
git clone https://github.com/leestott/modelrouter-routelens/
cd modelrouter-routelens
npm install

# 2. Configure
cp .env.example .env
#    Edit .env with your endpoints — see "Configuration" below

# 3. Launch the web dashboard
npm run ui
#    Open http://localhost:3000

# 4. Or run from the CLI
npm run run:matrix
npm run run:repro408

Configuration

Copy .env.example to .env and fill in:

| Variable | Description |
| --- | --- |
| FOUNDRY_PROJECT_ENDPOINT | Foundry project endpoint. Format: https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>. Do NOT include /chat/completions or ?api-version |
| FOUNDRY_MODEL_DEPLOYMENT | Deployment name for Model Router in your Foundry project (default: model-router) |
| AOAI_BASE_URL | Azure OpenAI base URL. Format: https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>. Do NOT include /chat/completions or ?api-version |
| AOAI_DEPLOYMENT | Deployment name for Model Router via Chat Completions (default: model-router) |
| AOAI_API_KEY | API key from your Azure OpenAI / Foundry resource |
| AOAI_API_VERSION | Azure OpenAI API version (default: 2024-05-01-preview) |
| REQUEST_TIMEOUT_MS | Per-request timeout in ms (default: 60000) |
| RETRY_MAX | Max retries for transient errors (default: 3) |
| RETRY_BACKOFF_MS | Base backoff between retries in ms (default: 1000) |
| RUNS | Number of times each prompt is sent per path (default: 3) |
| CONCURRENCY | Max concurrent requests (default: 2) |
| UI_PORT | Web dashboard port (default: 3000) |
| PROMPTS_FILE | Optional path to a JSON file overriding the default prompts |
| LOG_DIR | Directory for JSONL logs (default: ./logs) |
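
A filled-in .env might look like this (all values are placeholders; substitute your own resource name and key):

# Example .env: placeholder values only
FOUNDRY_PROJECT_ENDPOINT=https://my-resource.cognitiveservices.azure.com/openai/deployments/model-router
AOAI_BASE_URL=https://my-resource.cognitiveservices.azure.com/openai/deployments/model-router
FOUNDRY_MODEL_DEPLOYMENT=model-router
AOAI_DEPLOYMENT=model-router
AOAI_API_KEY=<your-api-key>
AOAI_API_VERSION=2024-05-01-preview
REQUEST_TIMEOUT_MS=60000
RETRY_MAX=3
RETRY_BACKOFF_MS=1000
RUNS=3
CONCURRENCY=2
UI_PORT=3000
LOG_DIR=./logs
# PROMPTS_FILE=./my-prompts.json   (optional)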

Finding your endpoints

  1. Go to ai.azure.com
  2. Open your Foundry project
  3. Under Deployments, find your Model Router deployment
  4. Click on the deployment → copy the Target URI (format: https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>)
  5. Use this URL for both FOUNDRY_PROJECT_ENDPOINT and AOAI_BASE_URL (they can be the same)
  6. Copy your API key from the Keys and Endpoint page → AOAI_API_KEY
  7. Note the API version shown in the sample code → AOAI_API_VERSION
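
Before a full run you can sanity-check the endpoint and key with a one-off request. This sketch uses Node 18's built-in fetch; the /chat/completions suffix and api-version query string are appended manually here, since the configured URLs must not already contain them:

// check-endpoint.mjs: hypothetical sanity check. Run with the .env values exported
// in your shell, e.g. `node check-endpoint.mjs` on Node 18+.
const url = `${process.env.AOAI_BASE_URL}/chat/completions?api-version=${
  process.env.AOAI_API_VERSION ?? "2024-05-01-preview"
}`;

const res = await fetch(url, {
  method: "POST",
  headers: { "api-key": process.env.AOAI_API_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: "ping" }] }),
});

console.log("status:", res.status); // expect 200
const data = await res.json();
console.log("routed to:", data.model); // the backend model the router chose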

Usage

Web UI (recommended)

npm run ui
# Open http://localhost:3000

The dashboard is a single-page dark-themed application with a sidebar on the left and a tabbed results panel on the right.



UI Layout

The interface is split into two columns:

| Area | Position | Purpose |
| --- | --- | --- |
| Header bar | Top | App title, run-status indicator, and request-progress counter |
| Sidebar | Left (320 px) | Configuration status, run controls, prompt set, and run history |
| Main panel | Right | Six tabs: Summary, Model Comparison, Latency, Errors, Live Feed, Logs |

On screens narrower than 900 px the layout collapses to a single column.


Header Bar

The header spans the full width and contains:

  • App title — "RouteLens".
  • Status badge — a colour-coded dot with a label showing the current state:
    • Grey / Idle — no test is running.
    • Orange / Running (pulsing) — a test is in progress, with the mode name (e.g. "matrix — running").
    • Green / Complete — the last test finished successfully.
    • Red / Error — the last test encountered a fatal error.
  • Progress counter — appears during a run, showing completed / total requests.

Sidebar — Configuration Card

Displays the connection status read from the server at startup:

  • Foundry Endpoint — green dot if FOUNDRY_PROJECT_ENDPOINT is configured, red if missing.
  • AOAI Endpoint — green dot if AOAI_BASE_URL is configured, red if missing.
  • API Key — green dot if AOAI_API_KEY is configured, red if missing.
  • Timeout & retries — the current REQUEST_TIMEOUT_MS and RETRY_MAX values.

This card lets you confirm at a glance that the tool can reach both API paths before starting a run.


Sidebar — Run Controls Card

Where you launch tests:

  • Runs per prompt — number input (default 3). How many times each prompt is sent per API path.
  • Concurrency — number input (default 2). Maximum parallel requests (see the sketch after this card).
  • ▶ Run Matrix button — starts the full prompt × path test matrix.
  • ⚡ Repro 408 button — starts the targeted 408-timeout diagnostic.
  • Progress bar — appears once a run starts; fills from left to right as requests complete.
  • Progress text — e.g. "12 of 24 requests".

Both buttons are disabled while a run is in flight to prevent overlapping tests.
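
Concurrency behaves like a simple promise pool: at most that many requests are in flight at once. A minimal sketch of the mechanic (the repo's own helper lives in src/utils.js and may differ):

// Sketch of a promise pool: run async task functions with at most `limit` in flight.
async function runPool(tasks, limit = 2) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // safe claim: single-threaded, no await between read and increment
      results[i] = await tasks[i]().catch((err) => ({ error: err.message }));
    }
  }
  await Promise.all(Array.from({ length: limit }, () => worker()));
  return results;
}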


Sidebar — Prompt Set Card

Lists every prompt that will be sent during a matrix run. Each entry shows:

  • Prompt ID — a short identifier such as echo-hello or code-fibonacci.
  • Category chip — colour-coded label (echo, summarize, code, reasoning).
  • Prompt text — the full text, truncated with ellipsis (hover for full text).

The set is loaded from src/prompts.js (or from the custom file specified by PROMPTS_FILE).


Sidebar — Run History Card

Shows the most recent completed runs (up to 20) with:

  • Mode — matrix or repro408.
  • Status chip — complete or error.
  • Timestamp — when the run started.
  • Result count — total number of request results.

Main Panel Tabs

1. Summary

A KPI dashboard with 9 metric cards followed by a detailed table with one row per path × category combination:

KPI Cards:

| Card | Description |
| --- | --- |
| Success Rate | % of requests returning 200 |
| Avg TPS | Average tokens per second (total) |
| Avg Gen TPS | Average generation tokens per second |
| Peak TPS | Best single-request throughput |
| Fastest Response | Lowest-latency success, plus which prompt/path |
| p50 Latency | Median latency across all requests |
| p95 Latency | 95th-percentile latency |
| Most Reliable | Path with the highest success rate |
| Total Tokens | Sum of all tokens consumed |

Detail Table:

| Column | Description |
| --- | --- |
| Path | chat_completions or project_responses |
| Category | Prompt category (echo, summarize, code, reasoning) |
| Total | Number of requests sent |
| OK | Number of successful (2xx) responses |
| Err | Number of failures (shown in red if > 0) |
| Rate | Success rate (colour-coded: green ≥ 95%, orange ≥ 50%, red < 50%) |
| p50 | Median latency (ms) |
| p95 | 95th-percentile latency (ms) |
| Min | Fastest response (ms) |
| TPS | Average tokens per second |
| Gen TPS | Average generation tokens per second |
| Models | Colour-coded chips showing which underlying models the router chose |

This is the first place to look for an overview of routing behaviour and reliability.

2. Model Comparison

A row-per-prompt table comparing which models each path selected:

| Column | Description |
| --- | --- |
| Prompt ID | The prompt identifier |
| Chat Completions Models | Blue chips for models chosen via the AOAI path |
| Project Responses Models | Purple chips for models chosen via the Foundry path |
| Match? | ✓ Match (green) when both paths picked the same model(s); ✗ DIFFER (red) when they diverged |

Divergences are the most interesting result — they reveal when the same prompt is routed to different backend models depending on the API surface.

3. Latency

Two horizontal bar charts grouped by path · category:

  • p50 Latency — median response time. Bars are colour-coded: blue for Chat Completions, purple for Project Responses.
  • p95 Latency — tail latency. Same colour scheme.

Bars are proportional to the slowest p95 in the current run so you can visually compare across categories and paths.

4. Errors

If any requests failed, this tab shows:

  • Error Distribution chart — a red bar chart showing error counts grouped by path · HTTP status (e.g. "project_responses · 408").
  • Error Details table — up to the first 50 errors with columns for Prompt, Path, Status, Latency, and the error message (truncated, hover for full text).

When all requests succeed, the tab shows "No errors — all requests succeeded".

5. Live Feed

A monospace, autoscrolling log of every request as it completes:

✓  chat_completions   echo        echo-hello       234 ms  s=200  [gpt-4o-2024-08-06]
✗  project_responses  reasoning   reason-logic     —  ms   s=408

Each line is colour-coded green (success) or red (failure) and includes path, category, prompt ID, latency, HTTP status, and the model name.

6. Logs

A file browser for the logs/ directory:


  • Lists all .jsonl log files with a View button.
  • Clicking View opens a parsed table of every JSON line in that file with columns: Time, Path, Prompt, Status, Latency, Model.
  • A ← Back button returns to the file list.


CLI

# Full test matrix (default)
node src/index.js
node src/index.js --mode matrix

# 408 repro diagnostic
node src/index.js --repro408

# Override runs and concurrency
node src/index.js --runs 10 --concurrency 4

# npm scripts
npm start              # = --mode matrix
npm run run:matrix     # = --mode matrix
npm run run:repro408   # = --repro408
npm run ui             # = web dashboard

CLI options

| Flag | Description |
| --- | --- |
| --mode matrix | Run the full prompt × path test matrix (default) |
| --repro408 | Run the 408 timeout diagnostic only |
| --runs N | Override the number of runs per prompt |
| --concurrency N | Override max concurrent requests |
| --help | Show help |

Output

JSONL logs

Each request is logged as a JSON line in logs/:

{
  "_ts": "2026-03-13T22:15:00.123Z",
  "path": "chat_completions",
  "ok": true,
  "status": 200,
  "latencyMs": 1234,
  "model": "gpt-4o-2024-08-06",
  "usage": { "prompt_tokens": 25, "completion_tokens": 80, "total_tokens": 105 },
  "responseId": "chatcmpl-abc123",
  "content": "Here are three bullets…",
  "promptId": "summarize-repro",
  "category": "summarize",
  "run": 1,
  "tags": ["repro"]
}
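
Because every line is standalone JSON, the logs are easy to post-process outside the tool. For example, a throwaway script (hypothetical, not part of the repo) can recompute per-path p50/p95 from any log file:

// analyze-log.mjs: usage `node analyze-log.mjs logs/<file>.jsonl` (Node 18+)
import { readFileSync } from "node:fs";

const byPath = {};
for (const line of readFileSync(process.argv[2], "utf8").trim().split("\n")) {
  const r = JSON.parse(line);
  if (r.ok) (byPath[r.path] ??= []).push(r.latencyMs); // successes only
}

// Nearest-rank percentile over a sorted array.
const pct = (sorted, p) =>
  sorted[Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length))];

for (const [path, lat] of Object.entries(byPath)) {
  lat.sort((a, b) => a - b);
  console.log(`${path}: n=${lat.length} p50=${pct(lat, 50)}ms p95=${pct(lat, 95)}ms`);
}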

Console report

After all requests complete, the tool prints:

  1. Per-path / per-category table — count, success rate, p50/p95 latency, model distribution
  2. Error breakdown — count by path × HTTP status
  3. Model-choice comparison — for each prompt, which models each path chose, with divergences highlighted

Custom prompt sets

Create a JSON file:

[
  {
    "id": "my-prompt-1",
    "category": "summarize",
    "text": "Summarize the theory of relativity in one sentence.",
    "tags": ["custom"]
  }
]

Set PROMPTS_FILE=./my-prompts.json in .env or pass the path. The file must be a JSON array of objects with at least id, category, and text.
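
A sketch of the validation this implies (a hypothetical helper; the real loader lives in src/prompts.js):

// Hypothetical loader: enforce "JSON array of objects with at least id, category, and text".
import { readFileSync } from "node:fs";

function loadPrompts(file) {
  const prompts = JSON.parse(readFileSync(file, "utf8"));
  if (!Array.isArray(prompts)) throw new Error(`${file}: expected a JSON array`);
  for (const [i, p] of prompts.entries()) {
    for (const key of ["id", "category", "text"]) {
      if (typeof p[key] !== "string") {
        throw new Error(`${file}[${i}]: missing or non-string "${key}"`);
      }
    }
  }
  return prompts;
}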

Extending with new probes

  1. Add prompts: Edit src/prompts.js (the DEFAULT_PROMPTS array) or use a custom JSON file.
  2. Add categories: Just use a new category string — the reporting groups dynamically.
  3. Add a new API path: Create a new client in src/clients/, export a send*() function matching the same return shape (see the sketch after this list), and wire it into src/matrix.js.
  4. Add new diagnostic modes: Follow the pattern in src/repro408.js — create a new runner and register it in the switch in src/index.js.
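
For step 3, a client skeleton might look like the following. The return shape is inferred from the JSONL log fields shown under Output; confirm it against the existing clients in src/clients/ before wiring anything into src/matrix.js. callYourApi is a placeholder for whatever transport the new path uses.

// src/clients/myNewPath.js (hypothetical skeleton for a third path)
export async function sendMyNewPath(prompt, config) {
  const started = Date.now();
  try {
    const res = await callYourApi(prompt.text, config); // placeholder transport
    return {
      ok: true,
      status: 200,
      latencyMs: Date.now() - started,
      model: res.model,
      usage: res.usage,
      content: res.content,
    };
  } catch (err) {
    return {
      ok: false,
      status: err.status ?? 0,
      latencyMs: Date.now() - started,
      error: err.message,
    };
  }
}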

Hypotheses: Why different models may be chosen between paths

These are hypotheses, not confirmed behavior. Model Router is a dynamic system whose routing decisions are not fully documented publicly.

  1. Different API surface → different eligible model set. The Chat Completions API and the Responses API may expose different sets of underlying models to the router. If a model doesn't support the Responses API schema, it may be excluded from the eligible pool on that path, causing the router to pick a different backend. (Model Router concepts)

  2. Prompt serialization differences. Both paths use the Chat Completions API with messages: [{role:"user", content:"…"}], but they hit different endpoint URLs. Internal routing or load balancing may differ between the /openai/v1/ and /openai/deployments/ surfaces.

  3. Effective context window constraints. Model Router's effective context window is limited by the smallest underlying model in its pool. If the eligible pools differ between paths, the context-window gate may trigger differently. (Model Router concepts)

  4. Regional capacity / load balancing. At any given moment, the router may factor in current load or quota across backend models. Two requests arriving milliseconds apart via different paths could see different capacity snapshots.

  5. Routing mode configuration. If the two deployments have different routing modes (Balanced / Cost / Quality), model selection will differ by design. Verify both deployments use the same mode in the Azure portal.



Project structure

routelens/
├── .env.example              # Environment template
├── .gitignore
├── package.json
├── README.md
├── public/
│   └── index.html            # Web dashboard (single-page app)
└── src/
    ├── index.js              # CLI entry point
    ├── config.js             # .env loader & validation
    ├── prompts.js            # Default prompt set & loader
    ├── logger.js             # JSONL file logger
    ├── utils.js              # Retry, concurrency, timing helpers
    ├── matrix.js             # Full test matrix runner
    ├── repro408.js           # 408 diagnostic runner
    ├── report.js             # Console summary reporting
    ├── server.js             # HTTP server for the web UI
    └── clients/
        ├── chatCompletions.js    # AOAI + Chat Completions path
        └── projectResponses.js   # Foundry Project + Responses path

License

MIT
