
RouteLens


A Node.js CLI tool that tests Microsoft Foundry Model Router behavior across two supported runtime paths and compares routing decisions, latency, and reliability.

What it does

| Path | SDK / API | Endpoint pattern |
| --- | --- | --- |
| AOAI + Chat Completions | OpenAI JS SDK | https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment> |
| Foundry Project + Chat Completions | OpenAI JS SDK (separate client) | https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment> |

The tool sends a configurable set of prompts (echo, summarize, code, reasoning) through both paths, logs every response to JSONL, and prints a summary with:

  • Per-path / per-category latency stats (p50 / p95)
  • Timeout and error rates
  • Model-choice distribution per prompt category
  • A side-by-side diff showing where the same prompt routed to different underlying models between paths

A dedicated --repro408 mode targets the specific 408 timeout issue observed on certain prompts.
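
Request cutoffs and retries are governed by REQUEST_TIMEOUT_MS, RETRY_MAX, and RETRY_BACKOFF_MS (see Configuration below). As a rough sketch of those semantics (an illustration, not RouteLens's actual implementation), a per-request timeout with retries on transient failures can be built from Node 18's built-in fetch and AbortController:

// Sketch only: illustrates timeout + retry semantics, not the repo's actual code.
// Assumes Node 18+ (global fetch, AbortController).
async function sendWithTimeout(url, body, opts = {}) {
  const { timeoutMs = 60000, retryMax = 3, backoffMs = 1000 } = opts;
  let lastError;
  for (let attempt = 0; attempt <= retryMax; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (res.ok) return await res.json();
      if (![408, 429, 500, 502, 503].includes(res.status)) {
        throw new Error(`HTTP ${res.status}`); // non-transient: caught below and rethrown
      }
      lastError = new Error(`HTTP ${res.status}`); // transient status: retry after backoff
    } catch (err) {
      if (err.name !== "AbortError") throw err; // give up on non-timeout errors
      lastError = err; // request timed out: eligible for retry
    } finally {
      clearTimeout(timer);
    }
    await new Promise((r) => setTimeout(r, backoffMs)); // base backoff between attempts
  }
  throw lastError;
}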

Web UI

The tool includes a built-in web dashboard for interactive testing and result visualization.

Dashboard Overview — Configure runs, view prompt sets, and monitor progress in real time.

Model Comparison — See which models were selected by each routing path for every prompt.

Latency Charts — Compare p50 and p95 latencies across the Chat Completions and Project Responses paths.

Error Analysis — Drill into error distributions and detailed error messages per request.

Live Feed — Real-time streaming of results as they come in.

Log Viewer — Browse and inspect historical JSONL log files with parsed table views.

Mobile Responsive — The UI adapts to smaller screens for use on tablets and phones.

Architecture


RouteLens sends configurable prompts through two distinct Azure AI runtime paths and compares routing decisions, latency, and reliability. The Matrix Runner dispatches prompts to both the Chat Completions Client (OpenAI JS SDK → AOAI endpoint) and the Project Responses Client (@azure/ai-projects → Foundry endpoint). Both paths converge at Microsoft Foundry Model Router, which intelligently selects the optimal backend model from 18 supported models across OpenAI, Meta (Llama), Anthropic (Claude), xAI (Grok), and DeepSeek. Results are logged to JSONL files and rendered in the web dashboard.
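
The table under "What it does" shows both paths driven through the OpenAI JS SDK against different base URLs. A minimal sketch of that dual-client setup, assuming the environment variables described under Configuration (the api-key header and api-version query parameter are the standard call shape for Azure OpenAI deployment endpoints):

// Sketch: two OpenAI JS SDK clients, one per endpoint. See src/clients/ for the real code.
// Save as an ES module (.mjs) and run on Node 18+.
import OpenAI from "openai";

const makeClient = (baseURL) =>
  new OpenAI({
    baseURL, // https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>
    apiKey: process.env.AOAI_API_KEY,
    defaultQuery: { "api-version": process.env.AOAI_API_VERSION ?? "2024-05-01-preview" },
    defaultHeaders: { "api-key": process.env.AOAI_API_KEY }, // Azure expects api-key, not Bearer
  });

const clients = {
  chat_completions: makeClient(process.env.AOAI_BASE_URL),
  project_responses: makeClient(process.env.FOUNDRY_PROJECT_ENDPOINT),
};

// Both clients send the same Chat Completions payload; the router picks the backend.
for (const [path, client] of Object.entries(clients)) {
  const res = await client.chat.completions.create({
    model: process.env.AOAI_DEPLOYMENT ?? "model-router",
    messages: [{ role: "user", content: "Say hello." }],
  });
  console.log(path, res.model); // res.model reveals which underlying model was selected
}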


Prerequisites

  • Node.js 18+ (LTS recommended)
  • An Azure subscription with a Foundry project in East US 2
  • Model Router deployed in your Foundry project (see How to use Model Router)
  • An API key from your Azure OpenAI / Foundry resource (find it under Keys and Endpoint in the Azure portal)

Quick start

# 1. Clone & install
git clone https://github.com/leestott/modelrouter-routelens/
cd modelrouter-routelens
npm install

# 2. Configure
cp .env.example .env
#    Edit .env with your endpoints — see "Configuration" below

# 3. Launch the web dashboard
npm run ui
#    Open http://localhost:3000

# 4. Or run from the CLI
npm run run:matrix
npm run run:repro408

Configuration

Copy .env.example to .env and fill in:

| Variable | Description |
| --- | --- |
| FOUNDRY_PROJECT_ENDPOINT | Foundry project endpoint. Format: https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>. Do NOT include /chat/completions or ?api-version |
| FOUNDRY_MODEL_DEPLOYMENT | Deployment name for Model Router in your Foundry project (default: model-router) |
| AOAI_BASE_URL | Azure OpenAI base URL. Format: https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>. Do NOT include /chat/completions or ?api-version |
| AOAI_DEPLOYMENT | Deployment name for Model Router via Chat Completions (default: model-router) |
| AOAI_API_KEY | API key from your Azure OpenAI / Foundry resource |
| AOAI_API_VERSION | Azure OpenAI API version (default: 2024-05-01-preview) |
| REQUEST_TIMEOUT_MS | Per-request timeout in ms (default: 60000) |
| RETRY_MAX | Max retries for transient errors (default: 3) |
| RETRY_BACKOFF_MS | Base backoff between retries in ms (default: 1000) |
| RUNS | Number of times each prompt is sent per path (default: 3) |
| CONCURRENCY | Max concurrent requests (default: 2) |
| UI_PORT | Web dashboard port (default: 3000) |
| PROMPTS_FILE | Optional path to a JSON file overriding the default prompts |
| LOG_DIR | Directory for JSONL logs (default: ./logs) |
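
A filled-in .env might look like this (all values are placeholders; substitute your own resource name and key):

# Example .env: placeholder values only
FOUNDRY_PROJECT_ENDPOINT=https://my-resource.cognitiveservices.azure.com/openai/deployments/model-router
AOAI_BASE_URL=https://my-resource.cognitiveservices.azure.com/openai/deployments/model-router
FOUNDRY_MODEL_DEPLOYMENT=model-router
AOAI_DEPLOYMENT=model-router
AOAI_API_KEY=<your-api-key>
AOAI_API_VERSION=2024-05-01-preview
REQUEST_TIMEOUT_MS=60000
RETRY_MAX=3
RETRY_BACKOFF_MS=1000
RUNS=3
CONCURRENCY=2
UI_PORT=3000
LOG_DIR=./logs
# PROMPTS_FILE=./my-prompts.json   (optional)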

Finding your endpoints

  1. Go to ai.azure.com
  2. Open your Foundry project
  3. Under Deployments, find your Model Router deployment
  4. Click on the deployment → copy the Target URI (format: https://<resource>.cognitiveservices.azure.com/openai/deployments/<deployment>)
  5. Use this URL for both FOUNDRY_PROJECT_ENDPOINT and AOAI_BASE_URL (they can be the same)
  6. Copy your API key from the Keys and Endpoint page → AOAI_API_KEY
  7. Note the API version shown in the sample code → AOAI_API_VERSION
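
Before a full run you can sanity-check the endpoint and key with a one-off request. This sketch uses Node 18's built-in fetch; the /chat/completions suffix and api-version query string are appended manually here, since the configured URLs must not already contain them:

// check-endpoint.mjs: hypothetical sanity check. Run with the .env values exported
// in your shell, e.g. `node check-endpoint.mjs` on Node 18+.
const url = `${process.env.AOAI_BASE_URL}/chat/completions?api-version=${
  process.env.AOAI_API_VERSION ?? "2024-05-01-preview"
}`;

const res = await fetch(url, {
  method: "POST",
  headers: { "api-key": process.env.AOAI_API_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({ messages: [{ role: "user", content: "ping" }] }),
});

console.log("status:", res.status); // expect 200
const data = await res.json();
console.log("routed to:", data.model); // the backend model the router chose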

Usage

Web UI (recommended)

npm run ui
# Open http://localhost:3000

The dashboard is a single-page dark-themed application with a sidebar on the left and a tabbed results panel on the right.



UI Layout

The interface is split into two columns:

| Area | Position | Purpose |
| --- | --- | --- |
| Header bar | Top | App title, run-status indicator, and request-progress counter |
| Sidebar | Left (320 px) | Configuration status, run controls, prompt set, and run history |
| Main panel | Right | Six tabs: Summary, Model Comparison, Latency, Errors, Live Feed, Logs |

On screens narrower than 900 px the layout collapses to a single column.


Header Bar

The header spans the full width and contains:

  • App title — "RouteLens".
  • Status badge — a colour-coded dot with a label showing the current state:
    • Grey / Idle — no test is running.
    • Orange / Running (pulsing) — a test is in progress, with the mode name (e.g. "matrix — running").
    • Green / Complete — the last test finished successfully.
    • Red / Error — the last test encountered a fatal error.
  • Progress counter — appears during a run, showing completed / total requests.

Sidebar — Configuration Card

Displays the connection status read from the server at startup:

  • Foundry Endpoint — green dot if FOUNDRY_PROJECT_ENDPOINT is configured, red if missing.
  • AOAI Endpoint — green dot if AOAI_BASE_URL is configured, red if missing.
  • API Key — green dot if AOAI_API_KEY is configured, red if missing.
  • Timeout & retries — the current REQUEST_TIMEOUT_MS and RETRY_MAX values.

This card lets you confirm at a glance that the tool can reach both API paths before starting a run.


Sidebar — Run Controls Card

Where you launch tests:

  • Runs per prompt — number input (default 3). How many times each prompt is sent per API path.
  • Concurrency — number input (default 2). Maximum parallel requests (see the sketch after this card).
  • ▶ Run Matrix button — starts the full prompt × path test matrix.
  • ⚡ Repro 408 button — starts the targeted 408-timeout diagnostic.
  • Progress bar — appears once a run starts; fills from left to right as requests complete.
  • Progress text — e.g. "12 of 24 requests".

Both buttons are disabled while a run is in flight to prevent overlapping tests.
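
Concurrency behaves like a simple promise pool: at most that many requests are in flight at once. A minimal sketch of the mechanic (the repo's own helper lives in src/utils.js and may differ):

// Sketch of a promise pool: run async task functions with at most `limit` in flight.
async function runPool(tasks, limit = 2) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // safe claim: single-threaded, no await between read and increment
      results[i] = await tasks[i]().catch((err) => ({ error: err.message }));
    }
  }
  await Promise.all(Array.from({ length: limit }, () => worker()));
  return results;
}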


Sidebar — Prompt Set Card

Lists every prompt that will be sent during a matrix run. Each entry shows:

  • Prompt ID — a short identifier such as echo-hello or code-fibonacci.
  • Category chip — colour-coded label (echo, summarize, code, reasoning).
  • Prompt text — the full text, truncated with ellipsis (hover for full text).

The set is loaded from src/prompts.js (or from the custom file specified by PROMPTS_FILE).


Sidebar — Run History Card

Shows the most recent completed runs (up to 20) with:

  • Mode — matrix or repro408.
  • Status chip — complete or error.
  • Timestamp — when the run started.
  • Result count — total number of request results.

Main Panel Tabs

1. Summary

A KPI dashboard with 9 metric cards followed by a detailed table with one row per path × category combination:

KPI Cards:

| Card | Description |
| --- | --- |
| Success Rate | % of requests returning 200 |
| Avg TPS | Average tokens per second (total) |
| Avg Gen TPS | Average generation tokens per second |
| Peak TPS | Best single-request throughput |
| Fastest Response | Lowest-latency success, plus which prompt/path |
| p50 Latency | Median latency across all requests |
| p95 Latency | 95th-percentile latency |
| Most Reliable | Path with the highest success rate |
| Total Tokens | Sum of all tokens consumed |

Detail Table:

| Column | Description |
| --- | --- |
| Path | chat_completions or project_responses |
| Category | Prompt category (echo, summarize, code, reasoning) |
| Total | Number of requests sent |
| OK | Number of successful (2xx) responses |
| Err | Number of failures (shown in red if > 0) |
| Rate | Success rate (colour-coded: green ≥ 95%, orange ≥ 50%, red < 50%) |
| p50 | Median latency (ms) |
| p95 | 95th-percentile latency (ms) |
| Min | Fastest response (ms) |
| TPS | Average tokens per second |
| Gen TPS | Average generation tokens per second |
| Models | Colour-coded chips showing which underlying models the router chose |

This is the first place to look for an overview of routing behaviour and reliability.

2. Model Comparison

A row-per-prompt table comparing which models each path selected:

| Column | Description |
| --- | --- |
| Prompt ID | The prompt identifier |
| Chat Completions Models | Blue chips for models chosen via the AOAI path |
| Project Responses Models | Purple chips for models chosen via the Foundry path |
| Match? | ✓ Match (green) when both paths picked the same model(s); ✗ DIFFER (red) when they diverged |

Divergences are the most interesting result — they reveal when the same prompt is routed to different backend models depending on the API surface.

3. Latency

Two horizontal bar charts grouped by path · category:

  • p50 Latency — median response time. Bars are colour-coded: blue for Chat Completions, purple for Project Responses.
  • p95 Latency — tail latency. Same colour scheme.

Bars are proportional to the slowest p95 in the current run so you can visually compare across categories and paths.

4. Errors

If any requests failed, this tab shows:

  • Error Distribution chart — a red bar chart showing error counts grouped by path · HTTP status (e.g. "project_responses · 408").
  • Error Details table — up to the first 50 errors with columns for Prompt, Path, Status, Latency, and the error message (truncated, hover for full text).

When all requests succeed, the tab shows "No errors — all requests succeeded".

5. Live Feed

A monospace, autoscrolling log of every request as it completes:

✓  chat_completions   echo        echo-hello       234 ms  s=200  [gpt-4o-2024-08-06]
✗  project_responses  reasoning   reason-logic     —  ms   s=408

Each line is colour-coded green (success) or red (failure) and includes path, category, prompt ID, latency, HTTP status, and the model name.

6. Logs

A file browser for the logs/ directory:


  • Lists all .jsonl log files with a View button.
  • Clicking View opens a parsed table of every JSON line in that file with columns: Time, Path, Prompt, Status, Latency, Model.
  • A ← Back button returns to the file list.


CLI

# Full test matrix (default)
node src/index.js
node src/index.js --mode matrix

# 408 repro diagnostic
node src/index.js --repro408

# Override runs and concurrency
node src/index.js --runs 10 --concurrency 4

# npm scripts
npm start              # = --mode matrix
npm run run:matrix     # = --mode matrix
npm run run:repro408   # = --repro408
npm run ui             # = web dashboard

CLI options

| Flag | Description |
| --- | --- |
| --mode matrix | Run the full prompt × path test matrix (default) |
| --repro408 | Run the 408 timeout diagnostic only |
| --runs N | Override the number of runs per prompt |
| --concurrency N | Override max concurrent requests |
| --help | Show help |

Output

JSONL logs

Each request is logged as a JSON line in logs/:

{
  "_ts": "2026-03-13T22:15:00.123Z",
  "path": "chat_completions",
  "ok": true,
  "status": 200,
  "latencyMs": 1234,
  "model": "gpt-4o-2024-08-06",
  "usage": { "prompt_tokens": 25, "completion_tokens": 80, "total_tokens": 105 },
  "responseId": "chatcmpl-abc123",
  "content": "Here are three bullets…",
  "promptId": "summarize-repro",
  "category": "summarize",
  "run": 1,
  "tags": ["repro"]
}
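
Because every line is standalone JSON, the logs are easy to post-process outside the tool. For example, a throwaway script (hypothetical, not part of the repo) can recompute per-path p50/p95 from any log file:

// analyze-log.mjs: usage `node analyze-log.mjs logs/<file>.jsonl` (Node 18+)
import { readFileSync } from "node:fs";

const byPath = {};
for (const line of readFileSync(process.argv[2], "utf8").trim().split("\n")) {
  const r = JSON.parse(line);
  if (r.ok) (byPath[r.path] ??= []).push(r.latencyMs); // successes only
}

// Nearest-rank percentile over a sorted array.
const pct = (sorted, p) =>
  sorted[Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length))];

for (const [path, lat] of Object.entries(byPath)) {
  lat.sort((a, b) => a - b);
  console.log(`${path}: n=${lat.length} p50=${pct(lat, 50)}ms p95=${pct(lat, 95)}ms`);
}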

Console report

After all requests complete, the tool prints:

  1. Per-path / per-category table — count, success rate, p50/p95 latency, model distribution
  2. Error breakdown — count by path × HTTP status
  3. Model-choice comparison — for each prompt, which models each path chose, with divergences highlighted

Custom prompt sets

Create a JSON file:

[
  {
    "id": "my-prompt-1",
    "category": "summarize",
    "text": "Summarize the theory of relativity in one sentence.",
    "tags": ["custom"]
  }
]

Set PROMPTS_FILE=./my-prompts.json in .env or pass the path. The file must be a JSON array of objects with at least id, category, and text.
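
A sketch of the validation this implies (a hypothetical helper; the real loader lives in src/prompts.js):

// Hypothetical loader: enforce "JSON array of objects with at least id, category, and text".
import { readFileSync } from "node:fs";

function loadPrompts(file) {
  const prompts = JSON.parse(readFileSync(file, "utf8"));
  if (!Array.isArray(prompts)) throw new Error(`${file}: expected a JSON array`);
  for (const [i, p] of prompts.entries()) {
    for (const key of ["id", "category", "text"]) {
      if (typeof p[key] !== "string") {
        throw new Error(`${file}[${i}]: missing or non-string "${key}"`);
      }
    }
  }
  return prompts;
}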

Extending with new probes

  1. Add prompts: Edit src/prompts.js (the DEFAULT_PROMPTS array) or use a custom JSON file.
  2. Add categories: Just use a new category string — the reporting groups dynamically.
  3. Add a new API path: Create a new client in src/clients/, export a send*() function matching the same return shape (see the sketch after this list), and wire it into src/matrix.js.
  4. Add new diagnostic modes: Follow the pattern in src/repro408.js — create a new runner and register it in the switch in src/index.js.
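
For step 3, a client skeleton might look like the following. The return shape is inferred from the JSONL log fields shown under Output; confirm it against the existing clients in src/clients/ before wiring anything into src/matrix.js. callYourApi is a placeholder for whatever transport the new path uses.

// src/clients/myNewPath.js (hypothetical skeleton for a third path)
export async function sendMyNewPath(prompt, config) {
  const started = Date.now();
  try {
    const res = await callYourApi(prompt.text, config); // placeholder transport
    return {
      ok: true,
      status: 200,
      latencyMs: Date.now() - started,
      model: res.model,
      usage: res.usage,
      content: res.content,
    };
  } catch (err) {
    return {
      ok: false,
      status: err.status ?? 0,
      latencyMs: Date.now() - started,
      error: err.message,
    };
  }
}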

Hypotheses: Why different models may be chosen between paths

These are hypotheses, not confirmed behavior. Model Router is a dynamic system whose routing decisions are not fully documented publicly.

  1. Different API surface → different eligible model set. The Chat Completions API and the Responses API may expose different sets of underlying models to the router. If a model doesn't support the Responses API schema, it may be excluded from the eligible pool on that path, causing the router to pick a different backend. (Model Router concepts)

  2. Prompt serialization differences. Both paths use the Chat Completions API with messages: [{role:"user", content:"…"}], but they hit different endpoint URLs. Internal routing or load balancing may differ between the /openai/v1/ and /openai/deployments/ surfaces.

  3. Effective context window constraints. Model Router's effective context window is limited by the smallest underlying model in its pool. If the eligible pools differ between paths, the context-window gate may trigger differently. (Model Router concepts)

  4. Regional capacity / load balancing. At any given moment, the router may factor in current load or quota across backend models. Two requests arriving milliseconds apart via different paths could see different capacity snapshots.

  5. Routing mode configuration. If the two deployments have different routing modes (Balanced / Cost / Quality), model selection will differ by design. Verify both deployments use the same mode in the Azure portal.



Project structure

routelens/
├── .env.example              # Environment template
├── .gitignore
├── package.json
├── README.md
├── public/
│   └── index.html            # Web dashboard (single-page app)
└── src/
    ├── index.js              # CLI entry point
    ├── config.js             # .env loader & validation
    ├── prompts.js            # Default prompt set & loader
    ├── logger.js             # JSONL file logger
    ├── utils.js              # Retry, concurrency, timing helpers
    ├── matrix.js             # Full test matrix runner
    ├── repro408.js           # 408 diagnostic runner
    ├── report.js             # Console summary reporting
    ├── server.js             # HTTP server for the web UI
    └── clients/
        ├── chatCompletions.js    # AOAI + Chat Completions path
        └── projectResponses.js   # Foundry Project + Responses path

License

MIT
