quantifylabs · May 31, 2026
diff --git a/benchmarks/injection/.gitignore b/benchmarks/injection/.gitignore
@@ -0,0 +1,7 @@
+# LLM response cache — never committed (may be large; re-derivable from APIs).
+cache/
+
+# HuggingFace dataset cache, if a local one is created here.
+.hf_cache/
+
+# results/ IS committed (results.json + error_analysis.md are deliverables).
diff --git a/benchmarks/injection/README.md b/benchmarks/injection/README.md
@@ -0,0 +1,99 @@
+# Aegis injection-detection benchmark
+
+A reproducible, **honest** benchmark that evaluates the Aegis four-stage content-security
+pipeline (`server/content_security.py`) as a prompt-injection / memory-poisoning **detector**,
+against established baselines, with full confusion-matrix metrics and a per-stage ablation.
+
+This measures Aegis in its actual threat model: **detecting injection/poisoning in content being
+written to memory**. It is *not* an LLM-jailbreak-defense benchmark. The headline numbers,
+ablation, latency comparison, and limitations live in
+[`docs/security/benchmark.md`](../../docs/security/benchmark.md).
+
+## What it measures
+
+Every system is wrapped as `predict(text) -> bool` and scored on **both** malicious and benign
+corpora, reported as a full confusion matrix → **precision, recall, F1, FPR, accuracy**, plus
+**median per-item latency** and **bootstrapped 95% CIs** (1000 resamples, seed=42).
+
+**Systems:** `no_protection`, `naive_regex`, `protectai_deberta`, `llm_guard`,
+`llm_judge_openai`, `llm_judge_anthropic`, `aegis_stages_1_3`, `aegis_stages_1_4_openai`,
+`aegis_stages_1_4_anthropic`.
+
+**Datasets:** `deepset/prompt-injections` (direct), `InjecAgent` (indirect, 250 sampled),
+`benign_public` (dolly, 750), `benign_synth` (750 templated memory entries).
+
+## Setup
+
+```bash
+# from the repo root (aegis-memory-main/)
+python -m venv .venv-bench && source .venv-bench/bin/activate   # Windows: .venv-bench\Scripts\Activate.ps1
+pip install -r benchmarks/injection/requirements.txt
+```
+
+`torch` and `transformers` are large downloads (CPU wheels; expect a few minutes). If `llm-guard`
+cannot co-resolve with the pinned `transformers`/`torch`, install it in a separate venv or skip
+it; the benchmark marks `llm_guard` as `not_run` and proceeds.
+
+### API keys
+
+`llm_judge_*` and Aegis `aegis_stages_1_4_*` call paid APIs. Keys are read from the environment
+or `aegis-memory-main/.env` **only** (never hardcoded):
+
+```
+OPENAI_API_KEY=sk-...
+ANTHROPIC_API_KEY=sk-ant-...
+```
+
+If a key is absent, that system is reported `not_run` (the run continues). Responses are cached
+under `cache/` keyed by `(system_id, model_id, sha256(prompt))`, so **re-runs never re-bill**.
+
+## Run
+
+```bash
+# Smoke test (20 items/dataset) — validates wiring end to end:
+python benchmarks/injection/run_benchmark.py --limit 20
+
+# Full run:
+python benchmarks/injection/run_benchmark.py
+
+# Subsets:
+python benchmarks/injection/run_benchmark.py --systems aegis_stages_1_3,naive_regex
+python benchmarks/injection/run_benchmark.py --datasets deepset,benign_synth
+```
+
+### Expected runtime (CPU-only laptop, full corpora)
+
+| System | Runtime / cost |
+|---|---|
+| `no_protection`, `naive_regex`, `aegis_stages_1_3` | seconds (deterministic) |
+| `protectai_deberta`, `llm_guard` | a few minutes (CPU inference) |
+| `llm_judge_*`, `aegis_stages_1_4_*` | API-bound; ~$1–2 total once, then cache-served |
+
+## Outputs
+
+- `results/results.json` — full machine-readable results: every system × dataset, confusion
+  matrices, P/R/F1/FPR/accuracy, latencies, bootstrap CIs, the Aegis stage ablation, dataset
+  revisions, model versions, seed, timestamp, cache stats.
+- `results/error_analysis.md` — false negatives (missed injections, categorized) + a sample of
+  false positives (benign flagged).
+- `cache/` — LLM response cache (git-ignored).
+
+## Files
+
+| File | Purpose |
+|---|---|
+| `datasets.py` | 4 dataset loaders, pinned revisions, graceful missing-source handling |
+| `systems.py` | `predict(text)->bool` adapters, response cache, per-stage attribution |
+| `metrics.py` | confusion matrix, P/R/F1/FPR/accuracy, bootstrap CIs, stage ablation |
+| `run_benchmark.py` | orchestrator: loads `.env`, runs systems × datasets, writes results |
+| `_paths.py` | puts `server/` + repo root on `sys.path` (mirrors `tests/conftest.py`) |
+
+## Reproducibility notes
+
+- All subsampling uses **seed 42**; exact counts and resolved dataset revisions are recorded in
+  `results.json`.
+- `aegis_stages_1_4_*` forces Stage 4 on every item via `trust_level="untrusted"` so the ablation
+  can measure Stage 4's standalone contribution. **Production gates Stage 4 conditionally** — this
+  is a measurement choice, stated in `results.json["meta"]` and the writeup.
+- Detection logic is **never reimplemented**: Aegis systems call the real
+  `ContentSecurityScanner.scan` / `.scan_async` from `server/content_security.py`.
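The README says responses are cached under `cache/`, keyed by `(system_id, model_id, sha256(prompt))`, so re-runs never re-bill. A minimal sketch of that keying follows. The `ResponseCache` class, its `get_or_call` method, and the in-memory dict are assumptions for illustration only; the real cache lives in `systems.py` (not shown in this part of the diff) and persists to disk.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class ResponseCache:
    """Hypothetical in-memory cache keyed by (system_id, model_id, sha256(prompt))."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str, str], str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(system_id: str, model_id: str, prompt: str) -> tuple[str, str, str]:
        # Hash the prompt so the key stays short and never stores raw prompt text.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return (system_id, model_id, digest)

    def get_or_call(
        self,
        system_id: str,
        model_id: str,
        prompt: str,
        call_api: Callable[[str], str],
    ) -> str:
        """Return the cached response, or call the API once and remember the result."""
        k = self.key(system_id, model_id, prompt)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        response = call_api(prompt)
        self._store[k] = response
        return response
```

Because `model_id` is part of the key, switching judge models would produce a miss rather than silently reusing another model's answer.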
diff --git a/benchmarks/injection/__init__.py b/benchmarks/injection/__init__.py
@@ -0,0 +1,9 @@
+"""Research-grade prompt-injection detection benchmark for Aegis Memory.
+
+Evaluates the Aegis four-stage content-security pipeline
+(``server/content_security.py``) as a prompt-injection / memory-poisoning
+detector, against established baselines, with full confusion-matrix metrics
+and a per-stage ablation.
+
+See ``README.md`` for how to reproduce.
+"""
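The docstring and README promise full confusion-matrix metrics with bootstrapped 95% CIs. A minimal sketch of how those could be computed from boolean labels and predictions is below. The function names, the zero-denominator convention (returning 0.0), and the percentile bootstrap are assumptions for illustration; the actual `metrics.py` is not shown here and may differ.

```python
from __future__ import annotations

import random
from typing import Sequence


def confusion_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> dict[str, float]:
    """Precision, recall, F1, FPR and accuracy, where True means 'malicious'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    def ratio(num: float, den: float) -> float:
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


def bootstrap_ci(
    y_true: Sequence[bool],
    y_pred: Sequence[bool],
    metric: str,
    n_resamples: int = 1000,
    seed: int = 42,
    alpha: float = 0.05,
) -> tuple[float, float]:
    """Percentile bootstrap CI: resample items with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = confusion_metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        stats.append(sample[metric])
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

Scoring on both malicious and benign items matters here: recall needs positives, while FPR is only defined when the corpus contains benign examples.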
diff --git a/benchmarks/injection/_paths.py b/benchmarks/injection/_paths.py
@@ -0,0 +1,27 @@
+"""Import-path bootstrap for the injection benchmark.
+
+The Aegis server modules use *bare* imports (``from content_security import
+...``) and expect ``<repo>/server`` on ``sys.path`` (see ``tests/conftest.py``).
+The ``aegis_memory`` package lives at the repo root. Importing this module
+makes both importable without installing the server, so the benchmark can call
+the real ``ContentSecurityScanner`` rather than reimplementing detection logic.
+"""
+
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+# parents[0] is benchmarks/injection, parents[1] is benchmarks, parents[2] is the repo root.
+REPO_ROOT = Path(__file__).resolve().parents[2]
+SERVER_DIR = REPO_ROOT / "server"
+
+
+def ensure_paths() -> None:
+    """Prepend repo root and server/ to sys.path (idempotent)."""
+    for p in (str(SERVER_DIR), str(REPO_ROOT)):
+        if p not in sys.path:
+            sys.path.insert(0, p)
+
+
+ensure_paths()
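One detail of `ensure_paths()` is worth spelling out: because each missing entry is inserted at index 0 and `server/` is handled first, the repo root ends up ahead of `server/` on the search path. The sketch below mirrors that loop on a plain list so the resulting order and the idempotence are easy to check. `prepend_missing` and the `/repo` paths are hypothetical stand-ins, not part of the diff.

```python
from __future__ import annotations


def prepend_missing(search_path: list[str], entries: tuple[str, ...]) -> list[str]:
    """Mirror the ensure_paths() loop on a copy: insert each missing entry at index 0."""
    result = list(search_path)
    for entry in entries:
        if entry not in result:
            result.insert(0, entry)
    return result


# Same iteration order as ensure_paths(): server/ first, then the repo root.
entries = ("/repo/server", "/repo")
once = prepend_missing(["/usr/lib/python3"], entries)
twice = prepend_missing(once, entries)

print(once)           # repo root first, then server/, then the pre-existing entry
print(twice == once)  # a second call changes nothing
```

If a module name ever existed in both locations, this order means the repo-root copy would be imported first.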