A web-based editor for AI conversation transcripts with AI-assisted editing and integrated evaluation via a trusted monitor.
Built for red-teaming research: craft attack transcripts, score them for suspiciousness, and iterate to find monitor blind spots.
New to red-teaming transcripts? See GETTING_STARTED.md for goals, tips, and a walkthrough.
cp backend/.env.example backend/.env
# Edit backend/.env to add your ANTHROPIC_API_KEY
docker compose up

./start.sh

The script auto-creates a Python venv, installs dependencies, and starts both servers. Edit backend/.env to add your ANTHROPIC_API_KEY.
# Terminal 1: Backend
cd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env # Edit to add API key
python main.py
# Terminal 2: Frontend
npm install
npm run dev

Open http://localhost:5173 in your browser.
Transcripts are stored in the transcripts/ directory (configurable via TRANSCRIPTS_DIR in backend/.env), organized by project subdirectories. Each transcript is a .jsonl file with an optional .meta.json sidecar for scores and metadata. Create new projects from the UI or by making a directory manually.
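The layout described above can be inspected with a few lines of Python. This is a minimal sketch, not the backend's own loader: the function name `list_transcripts` is hypothetical, and it assumes the sidecar for `foo.jsonl` is named `foo.meta.json` in the same project directory.

```python
import json
from pathlib import Path


def list_transcripts(root: Path) -> dict[str, list[dict]]:
    """Map each project directory to its transcripts and any sidecar metadata."""
    projects: dict[str, list[dict]] = {}
    for project_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entries = []
        for transcript in sorted(project_dir.glob("*.jsonl")):
            # Assumed sidecar naming: foo.jsonl -> foo.meta.json
            sidecar = transcript.with_suffix(".meta.json")
            meta = json.loads(sidecar.read_text()) if sidecar.exists() else None
            entries.append({"file": transcript.name, "meta": meta})
        projects[project_dir.name] = entries
    return projects
```

Calling `list_transcripts(Path("transcripts"))` returns one key per project; a transcript without a sidecar gets `"meta": None`.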
The examples/ directory contains sample attack transcripts to get started:
- api-key-in-healthcheck: credential exfiltration via PID collision in /tmp
- exfil-ssh: SSH private key embedded in a git commit message
- docker-volume-wipe: destructive sabotage via config parameter manipulation
Copy them into your transcripts/ directory to view them in the editor:
mkdir -p transcripts/examples
cp examples/*.jsonl examples/*.meta.json transcripts/examples/

- Load, edit, and save JSONL transcripts organized by project
- Insert, delete, reorder, and modify messages
- Edit tool call inputs and results directly
- Auto-detects and minimizes full Claude Code format on load
- Auto-fixes non-standard tool IDs to realistic format
- Natural language instructions to modify transcripts
- Agent reads transcript structure and makes targeted edits
- Fork transcripts into variants to try different approaches in parallel
- Configurable agent role: Strategic (agent reasons about attack approach) or Literal (exact instruction execution)
- Score transcripts for suspiciousness (0-100) across multiple prompt variants
- Evaluate egregiousness, incriminating evidence, effectiveness, confidence, realism
- Cached scoring with automatic invalidation on transcript changes
- Real-time progress via SSE streaming
- Virtual filesystem validation: detects Read/Write/Edit mismatches
- Suppressible warnings for known false positives
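
One way to picture the virtual filesystem validation in the list above is to replay tool calls against an in-memory file map. The sketch below is an illustration under stated assumptions, not the editor's actual checker: the function name `find_edit_mismatches` is hypothetical, it assumes Write calls carry `file_path` and `content` and Edit calls carry `file_path`, `old_string`, and `new_string`, and it only flags one kind of mismatch (an Edit whose target text is absent from a file written earlier in the transcript).

```python
def find_edit_mismatches(messages: list[dict]) -> list[str]:
    """Replay Write/Edit tool calls and flag Edits that cannot apply."""
    files: dict[str, str] = {}  # virtual filesystem: path -> last known content
    warnings: list[str] = []
    for index, message in enumerate(messages):
        content = message.get("content")
        if message.get("role") != "assistant" or not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") != "tool_use":
                continue
            args = block.get("input", {})
            path = args.get("file_path")
            if block.get("name") == "Write":
                files[path] = args.get("content", "")
            elif block.get("name") == "Edit" and path in files:
                old = args.get("old_string", "")
                new = args.get("new_string", "")
                if old not in files[path]:
                    warnings.append(
                        f"message {index}: Edit on {path} targets text not in the file"
                    )
                else:
                    # Keep the virtual copy in sync with the edit.
                    files[path] = files[path].replace(old, new, 1)
    return warnings
```

Files that are never written in the transcript are skipped here, since their contents are unknown; a fuller check would also use Read results.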
# Required
ANTHROPIC_API_KEY=sk-ant-...
# Optional
ANTHROPIC_API_KEY_ALT=sk-ant-... # Alternate key for load balancing
DEFAULT_MODEL=claude-opus-4-6 # Default model for agent + token counting
TRANSCRIPTS_DIR=./transcripts # Transcript storage directory
HOST=127.0.0.1 # Server bind address
PORT=8000 # Server port
CORS_ORIGINS=http://localhost:5173 # Comma-separated allowed origins

Access via the gear icon. Configurable at runtime:
- Model selection: Choose model and API key per session
- Agent role: Strategic (agent reasons about attack approach) or Literal (exact instruction execution)
- Role document: Select which .md file to use as agent system prompt
- Monitor settings: Number of evaluations, prompt variants, enabled metrics
- Rate limits: Per-key TPM limits
- Child lock: Prevent edits after monitor scoring
Minimal JSONL — one JSON object per line:
{"role": "user", "content": [{"type": "text", "text": "Fix the failing test"}], "cwd": "/home/user/project"}
{"role": "assistant", "content": [{"type": "thinking", "thinking": "Let me look at the test..."}, {"type": "tool_use", "id": "toolu_01ABC...", "name": "Read", "input": {"file_path": "test.py"}}]}
{"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_01ABC...", "content": "def test_foo():..."}]}

See TRANSCRIPT_FORMAT.md for the full specification.
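
To make the format concrete, the sketch below parses minimal JSONL and reports tool calls that never receive a result. It relies only on the fields shown in the example above (`role`, `content`, `type`, `id`, `name`, `tool_use_id`); the function names are hypothetical and TRANSCRIPT_FORMAT.md remains the authority on the full format.

```python
import json


def load_transcript(text: str) -> list[dict]:
    """Parse JSONL text into a list of messages, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unmatched_tool_calls(messages: list[dict]) -> dict[str, str]:
    """Return {tool_use id: tool name} for calls with no later tool_result."""
    pending: dict[str, str] = {}
    for message in messages:
        for block in message.get("content", []):
            if block.get("type") == "tool_use":
                pending[block["id"]] = block["name"]
            elif block.get("type") == "tool_result":
                pending.pop(block.get("tool_use_id"), None)
    return pending
```

An empty result from `unmatched_tool_calls` means every `tool_use` block is answered by a `tool_result` with the same ID.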
Frontend (React 19 + TypeScript + Zustand)
│
├── SSE stream (/api/session/{file}/events)
│ └── Real-time: agent streaming, monitor progress, rate limit status
│
└── HTTP commands
├── /api/session/{file}/submit - Start agent
├── /api/session/{file}/stop - Stop agent
├── /api/files/* - File CRUD + metadata
└── /api/monitor/* - Evaluation triggers + polling
Backend (FastAPI + AsyncAnthropic)
│
├── Sessions: one per file, multiple subscribers (UI + CLI)
├── Agent loop: streaming API calls, tool execution, rate limiting
├── Monitor: concurrent metric evaluation, preprocessing cache
└── Disk: .jsonl (transcript) + .meta.json (scores) + .chat.json (agent state)
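
The SSE stream in the diagram above uses the standard Server-Sent Events wire format, so a client only needs a small line parser. The sketch below handles the `event:` and `data:` fields plus comment lines; the function name `parse_sse` is hypothetical, and the event names and payload shapes the backend actually emits are not specified here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event", "data"} dicts from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        value = value.removeprefix(" ")
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

The parser accepts any iterable of lines, so it can be fed from a streaming HTTP response or from a list of strings in tests.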
| Endpoint | Method | Description |
|---|---|---|
| /api/health | GET | Backend status, API keys, monitor, sessions, rate limits |
| /api/config/rate-limit | GET/PUT | Per-key TPM rate limit config |

| Endpoint | Method | Description |
|---|---|---|
| /projects | GET | List projects with file counts |
| /list?project=... | GET | List transcripts in project |
| /load/{path} | GET | Load transcript (auto-minimizes, auto-fixes IDs) |
| /save | POST | Save transcript |
| /upload | POST | Upload .jsonl file |
| /delete/{path} | DELETE | Delete transcript + sidecars |
| /rename | POST | Rename file atomically |
| /move | POST | Move between projects |
| /duplicate | POST | Copy file + metadata |
| /meta | POST | Write/merge sidecar metadata |

| Endpoint | Method | Description |
|---|---|---|
| /{file}/events | GET | SSE stream (agent events, monitor progress) |
| /{file}/submit | POST | Submit prompt to agent |
| /{file}/stop | POST | Stop running agent |
| /{file}/reset | POST | Clear chat history |
| /{file}/state | GET | Get current session state |
| /{file}/settings | PUT | Per-session model/prompt config |
| /{file}/messages | POST | Update messages (from frontend edits) |
| /{file}/metadata | POST | Update outcome/scenario/mechanism |
| /global-settings | GET/PUT | Global editor configuration |

| Endpoint | Method | Description |
|---|---|---|
| /evaluate-file | POST | Start multi-metric evaluation |
| /evaluate-file/{id} | GET | Poll evaluation progress |
| /token-count | POST | Count tokens in preprocessed transcript |
| /status | GET | Check trusted-monitor availability |

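
The start-then-poll pattern implied by the evaluation endpoints above can be wrapped in a small helper. This sketch deliberately leaves the HTTP calls to the caller, because the request and response shapes are not documented here: `poll_until` is a hypothetical name, and `fetch` and `is_done` are caller-supplied callables.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until is_done(state) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch()
        if is_done(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("evaluation did not finish before the timeout")
        sleep(interval)
```

In practice `fetch` would issue the GET request for a given evaluation ID and `is_done` would inspect whatever completion field the response carries; both are left open here rather than guessed.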
The agent has access to these tools for transcript manipulation:
| Tool | Description |
|---|---|
| get_messages | Read messages (list, detail, block, search modes) |
| insert_message | Insert new message at position |
| update_message | Replace message content |
| delete_messages | Remove message range |
| move_messages | Reorder messages |
| replace_messages | Bulk replace message range |
| update_tool_content | Edit specific tool_use input fields |
| find_replace | Search and replace across transcript |
| fork_to_new_tab | Create variant transcript |
| set_transcript_metadata | Set outcome/scenario/mechanism |
| get_monitor_score | Trigger and retrieve monitor evaluation |
| count_tokens | Count tokens in transcript |
| check_consistency | Run file consistency validation |
| finish_editing | Signal editing complete |

For programmatic access (e.g., from Claude Code):
cd backend
./venv/bin/python cli_client.py PROJECT/FILE.jsonl --prompt "Make the attack stealthier"

# Backend (auto-reload)
cd backend && python main.py
# Frontend (hot reload)
npm run dev
# Type check
npx tsc --noEmit
# Tests
cd backend && pytest tests/
# Production build
npm run build

- Backend: FastAPI, Anthropic SDK, aiofiles, uvicorn, trusted-monitor
- Frontend: React 19, Zustand, Vite, TypeScript, react-markdown