diff --git a/docs/runbooks/redacted-route-quality-review.md b/docs/runbooks/redacted-route-quality-review.md
new file mode 100644
index 0000000..76575b0
--- /dev/null
+++ b/docs/runbooks/redacted-route-quality-review.md
@@ -0,0 +1,62 @@
+# Redacted route quality review runbook
+
+Use this workflow to review route decisions without exposing raw prompts or private logs.
+
+## 1) Collect only redacted samples
+
+- Input must be JSONL, one object per line.
+- Keep only redacted text snippets suitable for internal sharing.
+- Every line must include `"redacted": true`.
+- Do **not** include raw conversation logs, credentials, tokens, or user identifiers.
+
+## 2) Required JSONL fields
+
+Each sample line must include:
+
+- `text` (string): redacted prompt text
+- `expect` (string): expected **route_id**
+- `redacted` (boolean): must be `true`
+
+Optional fields:
+
+- `source` (string)
+- `note` (string)
+
+Example:
+
+```json
+{"text":"[REDACTED] payment flow timed out in prod","expect":"strong","redacted":true,"source":"incident_review"}
+```
+
+> `expect` must be a configured `route_id` (for example `fast`, `strong`), **not** a deployment `target_model` name.
+
+## 3) Import with route-config validation
+
+Convert JSONL to eval YAML and validate expected routes against your active route config:
+
+```bash
+uv run python scripts/import_review_samples.py \
+  --input tests/samples/redacted_review_fixture.jsonl \
+  --output /tmp/redacted_review_cases.yaml \
+  --routes config/routes.yaml
+```
+
+If a sample uses a `target_model` in `expect`, import fails with a route-id validation error.
+
+## 4) Run `review_decisions` against the decision endpoint
+
+Point the review script at the sidecar decision endpoint:
+
+```bash
+uv run python scripts/review_decisions.py \
+  --cases /tmp/redacted_review_cases.yaml \
+  --endpoint http://127.0.0.1:8080/v1/route/decision \
+  --routes config/routes.yaml
+```
+
+## 5) Interpret PASS/FAIL safely
+
+- `PASS`: returned `route_id` matches expected `route_id`.
+- `FAIL`: returned `route_id` differs from expected `route_id`.
+- Use route-level aggregates and mismatch counts for audits.
+- Do not copy raw prompts into tickets; reference sample IDs/notes instead.
diff --git a/tests/samples/redacted_review_fixture.jsonl b/tests/samples/redacted_review_fixture.jsonl
new file mode 100644
index 0000000..73da234
--- /dev/null
+++ b/tests/samples/redacted_review_fixture.jsonl
@@ -0,0 +1 @@
+{"text":"[REDACTED] checkout incident summary","expect":"strong","redacted":true,"source":"synthetic_fixture","note":"synthetic sample for tooling checks"}
diff --git a/tests/test_import_review_samples.py b/tests/test_import_review_samples.py
index d2a5004..7099fb0 100644
--- a/tests/test_import_review_samples.py
+++ b/tests/test_import_review_samples.py
@@ -1,6 +1,8 @@
 from __future__ import annotations
 
 import json
+from pathlib import Path
+
 import pytest
 import yaml
 
@@ -197,3 +199,23 @@ def test_main_invalid_unredacted_input_fails_before_writing(tmp_path, monkeypatc
     with pytest.raises(ReviewSampleError, match="redacted=true"):
         import_review_samples.main()
     assert not output_path.exists()
+
+
+def test_redacted_fixture_is_redacted_and_uses_route_id_expectation():
+    fixture_path = Path(__file__).parent / "samples" / "redacted_review_fixture.jsonl"
+    line = fixture_path.read_text(encoding="utf-8").strip()
+    sample = json.loads(line)
+
+    assert sample["redacted"] is True
+    assert sample["expect"] == "strong"
+    assert sample["expect"] not in {"cheap-router", "pro-router", "free-probe-router"}
+
+
+def test_redacted_fixture_converts_with_route_validation():
+    fixture_path = Path(__file__).parent / "samples" / "redacted_review_fixture.jsonl"
+    raw_lines = fixture_path.read_text(encoding="utf-8").splitlines()
+
+    result = convert_review_samples(raw_lines, allowed_route_ids={"fast", "strong", "experimental"})
+
+    assert result["cases"][0]["expect"] == "strong"
+    assert result["cases"][0]["source"] == "production_review:synthetic_fixture"
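The per-line rules in sections 1 to 3 of the runbook (required `text`/`expect`/`redacted` fields, `redacted` exactly `true`, `expect` limited to configured route IDs) can be sketched as a small validator. This is a minimal sketch under stated assumptions, not the code in `scripts/import_review_samples.py`: `validate_review_line` and `SampleError` are hypothetical names, and the allowed route-ID set is passed in directly (the test uses `fast`, `strong`, `experimental`) instead of being read from `config/routes.yaml`.

```python
from __future__ import annotations

import json

# Field rules taken from the runbook; the dict layout itself is an assumption.
REQUIRED_FIELDS = {"text": str, "expect": str, "redacted": bool}
OPTIONAL_FIELDS = {"source": str, "note": str}


class SampleError(ValueError):
    """Hypothetical error type for a JSONL review line that breaks the runbook rules."""


def validate_review_line(line: str, allowed_route_ids: set[str]) -> dict:
    """Hypothetical helper: parse one JSONL line and enforce field and route-id rules."""
    try:
        sample = json.loads(line)
    except json.JSONDecodeError as exc:
        raise SampleError(f"invalid JSON: {exc}") from exc
    if not isinstance(sample, dict):
        raise SampleError("each line must be a JSON object")
    # Required fields must be present with the right JSON type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(sample.get(field), expected_type):
            raise SampleError(f"missing or non-{expected_type.__name__} field: {field}")
    # Optional fields are type-checked only when present.
    for field, expected_type in OPTIONAL_FIELDS.items():
        if field in sample and not isinstance(sample[field], expected_type):
            raise SampleError(f"optional field {field} must be a {expected_type.__name__}")
    # `redacted` must be exactly true, not merely truthy.
    if sample["redacted"] is not True:
        raise SampleError("every line must include redacted=true")
    # `expect` must name a configured route_id, never a target_model.
    if sample["expect"] not in allowed_route_ids:
        raise SampleError(f"expect={sample['expect']!r} is not a configured route_id")
    return sample
```

The strict type check on `redacted` rejects look-alikes such as `"true"` or `1`, which a plain truthiness test would accept.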
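The PASS/FAIL rules in section 5 of the runbook reduce to a per-route tally that contains route IDs only, so it can be shared without prompt text. This sketch assumes each reviewed case has already been reduced to an expected and a returned `route_id`; `Decision` and `summarize_by_route` are hypothetical helpers, not part of `scripts/review_decisions.py`, and the `misrouted_to:` keys are just one illustrative way to report where mismatches went.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical record: the expected route_id and the route_id the endpoint returned."""

    expected_route_id: str
    returned_route_id: str

    @property
    def verdict(self) -> str:
        # PASS only when the returned route_id matches the expected route_id.
        return "PASS" if self.returned_route_id == self.expected_route_id else "FAIL"


def summarize_by_route(decisions: list[Decision]) -> dict[str, dict[str, int]]:
    """Hypothetical helper: aggregate verdicts per expected route_id (no prompt text)."""
    per_route: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for d in decisions:
        counts = per_route[d.expected_route_id]
        counts[d.verdict] += 1
        if d.verdict == "FAIL":
            # Record the route a mismatched case was sent to, for audits.
            counts[f"misrouted_to:{d.returned_route_id}"] += 1
    return {route: dict(counts) for route, counts in per_route.items()}
```

For example, three cases where one `strong` sample was routed to `fast` summarize to `{"strong": {"PASS": 1, "FAIL": 1, "misrouted_to:fast": 1}, "fast": {"PASS": 1}}`.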