62 changes: 62 additions & 0 deletions docs/runbooks/redacted-route-quality-review.md
@@ -0,0 +1,62 @@
# Redacted route quality review runbook

Use this workflow to review route decisions without exposing raw prompts or private logs.

## 1) Collect only redacted samples

- Input must be JSONL, one object per line.
- Keep only redacted text snippets suitable for internal sharing.
- Every line must include `"redacted": true`.
- Do **not** include raw conversation logs, credentials, tokens, or user identifiers.
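The collection rules above can be sketched as a small filter that keeps only records explicitly marked `"redacted": true`. This is a hypothetical helper for illustration, not part of the repo's tooling; the real collection pipeline may differ.

```python
import json


def filter_redacted(lines):
    """Yield only JSONL records explicitly marked redacted:true.

    Illustrative sketch: anything without the flag (or with it set to
    false) is silently dropped rather than shared.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("redacted") is True:
            yield record


samples = list(filter_redacted([
    '{"text":"[REDACTED] ok","expect":"fast","redacted":true}',
    '{"text":"raw prompt, do not share","expect":"fast","redacted":false}',
]))
# only the first record survives the filter
```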

## 2) Required JSONL fields

Each sample line must include:

- `text` (string): redacted prompt text
- `expect` (string): expected **route_id**
- `redacted` (boolean): must be `true`

Optional fields:

- `source` (string)
- `note` (string)

Example:

```json
{"text":"[REDACTED] payment flow timed out in prod","expect":"strong","redacted":true,"source":"incident_review"}
```

> `expect` must be a configured `route_id` (for example `fast`, `strong`), **not** a deployment `target_model` name.
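A per-line validator for the schema above might look like the following sketch. The `ALLOWED_ROUTE_IDS` set is an assumption standing in for whatever `config/routes.yaml` defines; the field checks mirror the required-field list.

```python
import json

# Assumed route ids; in practice these come from config/routes.yaml.
ALLOWED_ROUTE_IDS = {"fast", "strong"}

REQUIRED_FIELDS = {"text": str, "expect": str, "redacted": bool}


def validate_sample(line: str) -> dict:
    """Parse one JSONL line and enforce the required-field contract."""
    sample = json.loads(line)
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(sample.get(field), expected_type):
            raise ValueError(f"{field!r} must be a {expected_type.__name__}")
    if sample["redacted"] is not True:
        raise ValueError('"redacted" must be true')
    if sample["expect"] not in ALLOWED_ROUTE_IDS:
        raise ValueError(
            f'"expect" must be a configured route_id, got {sample["expect"]!r}'
        )
    return sample


sample = validate_sample(
    '{"text":"[REDACTED] payment flow timed out in prod",'
    '"expect":"strong","redacted":true}'
)
```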

## 3) Import with route-config validation

Convert JSONL to eval YAML and validate expected routes against your active route config:

```bash
uv run python scripts/import_review_samples.py \
--input tests/samples/redacted_review_fixture.jsonl \
--output /tmp/redacted_review_cases.yaml \
--routes config/routes.yaml
```

If a sample's `expect` value is a deployment `target_model` name rather than a configured `route_id`, the import fails with a route-id validation error.
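The route-id check can be illustrated as follows. The parsed structure of `config/routes.yaml` is an assumption here (a list of entries each carrying a `route_id` and a `target_model`); the actual schema and the import script's internals may differ.

```python
# Stand-in for the parsed contents of config/routes.yaml (structure assumed).
routes_config = {
    "routes": [
        {"route_id": "fast", "target_model": "small-model-v1"},
        {"route_id": "strong", "target_model": "large-model-v1"},
    ]
}

allowed_route_ids = {r["route_id"] for r in routes_config["routes"]}
target_models = {r["target_model"] for r in routes_config["routes"]}


def check_expect(expect: str) -> None:
    """Fail fast when `expect` names a target_model instead of a route_id."""
    if expect in target_models:
        raise ValueError(
            f"{expect!r} is a target_model, not a route_id; "
            f"use one of {sorted(allowed_route_ids)}"
        )
    if expect not in allowed_route_ids:
        raise ValueError(f"unknown route_id {expect!r}")
```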

## 4) Run `review_decisions` against the decision endpoint

Point the review script at the sidecar decision endpoint:

```bash
uv run python scripts/review_decisions.py \
--cases /tmp/redacted_review_cases.yaml \
--endpoint http://127.0.0.1:8080/v1/route/decision \
--routes config/routes.yaml
```

## 5) Interpret PASS/FAIL safely

- `PASS`: returned `route_id` matches expected `route_id`.
- `FAIL`: returned `route_id` differs from expected `route_id`.
- Use route-level aggregates and mismatch counts for audits.
- Do not copy raw prompts into tickets; reference sample IDs/notes instead.
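The route-level aggregation described above can be sketched like this. The input shape (pairs of expected and returned `route_id`) is an assumption; `review_decisions.py` may report results differently. Note that the summary carries counts only, never prompt text.

```python
from collections import Counter


def summarize(results):
    """Aggregate (expected_route_id, returned_route_id) pairs per route.

    Returns pass/fail counts keyed by expected route, suitable for an
    audit ticket; no prompt text is included.
    """
    summary = {}
    for expected, returned in results:
        counts = summary.setdefault(expected, Counter())
        counts["PASS" if returned == expected else "FAIL"] += 1
    return {route: dict(c) for route, c in summary.items()}


report = summarize([
    ("strong", "strong"),
    ("strong", "fast"),
    ("fast", "fast"),
])
# {'strong': {'PASS': 1, 'FAIL': 1}, 'fast': {'PASS': 1}}
```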
1 change: 1 addition & 0 deletions tests/samples/redacted_review_fixture.jsonl
@@ -0,0 +1 @@
{"text":"[REDACTED] checkout incident summary","expect":"strong","redacted":true,"source":"synthetic_fixture","note":"synthetic sample for tooling checks"}
22 changes: 22 additions & 0 deletions tests/test_import_review_samples.py
@@ -1,6 +1,8 @@
from __future__ import annotations

import json
from pathlib import Path

import pytest
import yaml

@@ -197,3 +199,23 @@ def test_main_invalid_unredacted_input_fails_before_writing(tmp_path, monkeypatc
    with pytest.raises(ReviewSampleError, match="redacted=true"):
        import_review_samples.main()
    assert not output_path.exists()


def test_redacted_fixture_is_redacted_and_uses_route_id_expectation():
    fixture_path = Path("tests/samples/redacted_review_fixture.jsonl")
    line = fixture_path.read_text(encoding="utf-8").strip()
    sample = json.loads(line)

    assert sample["redacted"] is True
    assert sample["expect"] == "strong"
    assert sample["expect"] not in {"cheap-router", "pro-router", "free-probe-router"}


def test_redacted_fixture_converts_with_route_validation():
    fixture_path = Path("tests/samples/redacted_review_fixture.jsonl")
    raw_lines = fixture_path.read_text(encoding="utf-8").splitlines()

    result = convert_review_samples(raw_lines, allowed_route_ids={"fast", "strong", "experimental"})

    assert result["cases"][0]["expect"] == "strong"
    assert result["cases"][0]["source"] == "production_review:synthetic_fixture"