diff --git a/.github/workflows/pip-audit.yml b/.github/workflows/pip-audit.yml new file mode 100644 index 0000000..74a38a7 --- /dev/null +++ b/.github/workflows/pip-audit.yml @@ -0,0 +1,35 @@ +name: pip-audit (shipped deps) + +# Audits ONLY the shipped-library dependency surface (server/requirements.txt and the +# pyproject.toml core + [server] deps) against the OSV database. Benchmark/dev-only deps +# (benchmarks/injection/requirements.txt) are intentionally NOT gated here — their residual +# advisories are triaged and accepted in docs/security/vuln-triage.md. This job is blocking: +# it fails the PR on any NEW vulnerability reachable by library users. + +on: + pull_request: + branches: [main] + push: + branches: [main] + +permissions: + contents: read + +jobs: + audit: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1 + - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0 + with: + python-version: "3.12" + - name: Install pip-audit + run: pip install pip-audit + - name: Audit server/requirements.txt + run: python -m pip_audit -r server/requirements.txt + - name: Audit installed shipped package (pyproject core + [server]) + # Resolves the real shipped tree from pyproject.toml so transitive deps not pinned in + # server/requirements.txt are covered too. No --ignore-vuln: this surface is clean today. + run: | + pip install -e ".[server]" + python -m pip_audit diff --git a/README.md b/README.md index 85c3f8a..b018608 100644 --- a/README.md +++ b/README.md @@ -307,6 +307,19 @@ Benchmarked on 8 vCPU / 7.6 GB RAM (Intel 13th Gen), 1000 memories, Docker Compo > Query tail latency (p95/p99) is dominated by the external OpenAI embedding call, not Aegis or PostgreSQL. Write and vote operations that skip embedding are consistently under 100ms at p50. 
+## Security benchmark + +Does the [4-stage content security pipeline](#built-for-a-world-where-agents-get-compromised) actually catch prompt injection? We measured it as a detector against five baselines (DeBERTa, LLM Guard, an LLM judge, and more) on labelled injection and benign corpora, with full confusion-matrix metrics, a per-stage ablation, and an honest error analysis. **The false-positive rate is reported next to recall everywhere**, because a blocker that flags everything is useless. + +| Aegis configuration | Recall | FPR | Median latency | +|-----------------------------------------|------:|-----:|---------------:| +| Stages 1–3 (deterministic, no API call) | 0.14 | 0.00 | 46 µs | +| Stages 1–4 (+ LLM classifier) | 0.67 | 0.00 | 1.2 s | + +> `deepset/prompt-injections`, direct injection (N=662). The free deterministic core adds **zero** false positives here and only 1 across 1,500 benign memory snippets; the optional LLM stage trades ~1 s of latency for a 4.6× recall gain. Stage 2 (PII) contributes ~0 to injection recall by design: it targets a different threat category. + +→ **Full results, ablation, baselines, latency, and limitations: [`docs/security/benchmark.md`](docs/security/benchmark.md)** · reproduce with `python benchmarks/injection/run_benchmark.py`. + ## Deployment ### Docker Compose diff --git a/SECURITY.md b/SECURITY.md index a87e203..ed546d6 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -28,3 +28,11 @@ Include: affected version, reproduction steps, and impact assessment. For deeper security architecture (4-stage content pipeline, HMAC-SHA256 integrity, OWASP 4-tier trust hierarchy), see [docs/guides/security.mdx](docs/guides/security.mdx). + +## Dependency Vulnerabilities + +The shipped-library dependency surface is audited against OSV in CI +([`.github/workflows/pip-audit.yml`](.github/workflows/pip-audit.yml)) and currently reports +**zero known vulnerabilities**. 
For the full triage — including the benchmark/dev-only residual +that is documented and accepted (never shipped to PyPI users) — see +[docs/security/vuln-triage.md](docs/security/vuln-triage.md). diff --git a/benchmarks/injection/requirements.txt b/benchmarks/injection/requirements.txt index f1f87fa..2854681 100644 --- a/benchmarks/injection/requirements.txt +++ b/benchmarks/injection/requirements.txt @@ -8,11 +8,11 @@ # --- Core (always needed) --- numpy>=1.26,<2.0 # metrics + bootstrap resampling -python-dotenv==1.0.1 # load OPENAI_/ANTHROPIC_ keys from aegis-memory-main/.env +python-dotenv==1.2.2 # load OPENAI_/ANTHROPIC_ keys; >=1.2.2 clears CVE-2026-28684 # --- Datasets --- datasets==2.19.1 # deepset/prompt-injections, databricks-dolly-15k -huggingface-hub==0.23.4 +huggingface-hub==0.30.2 # >=0.30 required by transformers>=4.53; datasets 2.19.1 allows it requests>=2.31.0 # InjecAgent raw fetch (best-effort) # --- ML baseline: protectai_deberta AND framework baseline: llm_guard --- @@ -21,11 +21,16 @@ requests>=2.31.0 # InjecAgent raw fetch (best-effort) # (deberta-v3 text-classification works across this transformers range too). # CPU torch wheel is large (~200MB); install takes a few minutes. # IMPORTANT: transformers 5.x breaks llm-guard 0.3.x (import error), and -# llm-guard 0.3.15 requires torch>=2.4 — so cap transformers<5 and let llm-guard -# pull a compatible torch. deberta-v3 text-classification works in this range. +# llm-guard 0.3.15 requires torch>=2.4 and transformers>=4.43.4 — so cap +# transformers<5 and let llm-guard pull a compatible torch. deberta-v3 +# text-classification works in this range. +# Security floor: >=4.53.0 clears every transformers advisory that has a fix +# below 5.x (CVE-2024-12720, CVE-2025-1194/3263/3264/3777/3933/5197/6051/6638/6921, +# PYSEC-2024-227/228/229, PYSEC-2025-40). The remaining advisories have no <5 fix +# and are documented as accepted benchmark-only risk in docs/security/vuln-triage.md. 
torch>=2.4 -transformers>=4.41,<5 -sentencepiece==0.2.0 # deberta-v3 tokenizer needs this +transformers>=4.53.0,<5 +sentencepiece==0.2.1 # deberta-v3 tokenizer needs this; >=0.2.1 clears CVE-2026-1260 llm-guard==0.3.15 # If the resolver still cannot satisfy llm-guard on your platform, drop it and # rerun — the benchmark marks `llm_guard` as "not run" and proceeds. diff --git a/docs/security/vuln-triage.md b/docs/security/vuln-triage.md new file mode 100644 index 0000000..d4facd2 --- /dev/null +++ b/docs/security/vuln-triage.md @@ -0,0 +1,112 @@ +# Dependency vulnerability triage + +_Audited with `pip-audit` 2.10.0 (OSV) on 2026-06-02. Ground truth for this PR; the OpenSSF +Scorecard viewer refreshes on its own schedule after merge._ + +## Headline + +**Zero known vulnerabilities in the shipped library.** Every advisory OSV reports for this repo +lives in **benchmark-only dev tooling** (`benchmarks/injection/requirements.txt`), which is never +installed by people who `pip install aegis-memory`. Before this PR those advisories spanned +**3 distinct packages (28 advisory instances)**; after conservative bumps the residual is +**1 package (9 advisories), all in `transformers`, with no fix available below the major version +that breaks the benchmark's `llm-guard` dependency.** + +| Surface | Manifest | Before | After | +|---|---|--:|--:| +| Shipped library | `server/requirements.txt` | 0 | 0 | +| Shipped library | `pyproject.toml` (core + `[server]`) | 0 | 0 | +| Benchmark / dev-only | `benchmarks/injection/requirements.txt` | 3 pkgs / 28 | **1 pkg / 9** | + +The shipped surface was already clean thanks to the transitive security floors in +`server/requirements.txt` (`idna>=3.15`, `pygments>=2.20.0`, `tqdm>=4.66.3`). It is now also +gated in CI by [`.github/workflows/pip-audit.yml`](../../.github/workflows/pip-audit.yml) so a new +shipped-dependency vulnerability fails the build. + +> **Note on the Scorecard count.** The public viewer has shown ~53 OSV advisories. 
That number +> counts *every advisory ID* across the fuller tree Scorecard resolves — including the duplicate +> IDs `pip-audit` also emits (e.g. `PYSEC-2024-227/228/229` were each listed twice) and the +> `PYSEC-2025-211..218` cluster, which is **one package**, not eight. The number that actually +> matters is **distinct shipped-dependency packages needing a fix: zero.** + +## Manifests scanned + +| Manifest | Role | +|---|---| +| `server/requirements.txt` | Shipped library runtime deps (PyPI install surface) | +| `pyproject.toml` (`dependencies`, `[server]`) | Shipped library / server extra | +| `benchmarks/injection/requirements.txt` | Benchmark-only dev tooling (transformers, torch, datasets, llm-guard, …) — not shipped | + +No `setup.py`, `poetry.lock`, or other lockfiles exist in the repo. + +## Triage table (one row per distinct package) + +| Package | Version (before → after) | Manifest | Advisories (grouped) | Fix available | Safe bump? | Action | +|---|---|---|---|---|---|---| +| `python-dotenv` | `1.0.1` → `1.2.2` | benchmark-only | CVE-2026-28684 | yes (`1.2.2`) | yes — API-compatible | **Bumped** | +| `sentencepiece` | `0.2.0` → `0.2.1` | benchmark-only | CVE-2026-1260 | yes (`0.2.1`) | yes — patch; deberta-v3 tokenizer unaffected | **Bumped** | +| `transformers` | `4.46.3` → `4.53.3` (floor `>=4.41,<5` → `>=4.53.0,<5`) | benchmark-only | 14 with a `<5` fix · 8 no-fix (`PYSEC-2025-211..218`) · 1 needing 5.x (`CVE-2026-1839`) | partial | bump to highest `<5`; rest unbumpable | **Bumped (partial)** + residual documented below | +| `huggingface-hub` | `0.23.4` → `0.30.2` | benchmark-only | none (compat bump) | n/a | yes — required by `transformers>=4.53`; `datasets==2.19.1` allows it | **Bumped (to satisfy transformers)** | + +### transformers advisories cleared by the `>=4.53.0` floor (14) + +`PYSEC-2024-227`, `PYSEC-2024-228`, `PYSEC-2024-229` (4.48.0) · `PYSEC-2025-40` (4.49.0) · +`CVE-2024-12720` (4.48.0) · `CVE-2025-1194` (4.50.0) · `CVE-2025-3263`, 
`CVE-2025-3264` (4.51.0) · +`CVE-2025-3777`, `CVE-2025-3933` (4.52.1) · `CVE-2025-5197`, `CVE-2025-6638`, `CVE-2025-6051`, +`CVE-2025-6921` (4.53.0). + +## Known unfixable / accepted residual + +All residual is **benchmark-only** dev tooling in `transformers 4.53.3`. It is **not reachable by +library users** — `transformers` is not a dependency of `aegis-memory` or its `[server]` extra; it +is installed only by someone running the injection benchmark in an isolated venv. Risk to shipped +users: **none.** + +| Advisory | Why it can't be bumped | Reachability | +|---|---|---| +| `PYSEC-2025-211` | No fixed version published in OSV (no `<5` patch) | benchmark-only | +| `PYSEC-2025-212` | No fixed version published in OSV | benchmark-only | +| `PYSEC-2025-213` | No fixed version published in OSV | benchmark-only | +| `PYSEC-2025-214` | No fixed version published in OSV | benchmark-only | +| `PYSEC-2025-215` | No fixed version published in OSV | benchmark-only | +| `PYSEC-2025-216` | No fixed version published in OSV | benchmark-only | +| `PYSEC-2025-217` | No fixed version published in OSV | benchmark-only | +| `PYSEC-2025-218` | No fixed version published in OSV | benchmark-only | +| `CVE-2026-1839` | Fix only in `5.0.0rc3`; `transformers 5.x` breaks `llm-guard 0.3.15` (the benchmark's `<5` ceiling) | benchmark-only | + +### Deliberate ignore list + +If/when `pip-audit` is run over the benchmark manifest in tooling, the residual is suppressed +*explicitly* (a reviewed decision, not an oversight): + +``` +python -m pip_audit -r benchmarks/injection/requirements.txt ` + --ignore-vuln PYSEC-2025-211 --ignore-vuln PYSEC-2025-212 ` + --ignore-vuln PYSEC-2025-213 --ignore-vuln PYSEC-2025-214 ` + --ignore-vuln PYSEC-2025-215 --ignore-vuln PYSEC-2025-216 ` + --ignore-vuln PYSEC-2025-217 --ignore-vuln PYSEC-2025-218 ` + --ignore-vuln CVE-2026-1839 +``` + +The shipped-deps CI job (`.github/workflows/pip-audit.yml`) needs **no** ignore list — that surface +is clean — and 
intentionally does **not** audit the benchmark manifest, so the accepted residual +above never blocks a merge. + +## Proposal for the maintainer (not done in this PR) + +Every advisory `pip-audit` reports for this repo comes from benchmark-only tooling. To make attribution +unambiguous, the benchmark extras could be moved into an isolated optional-dependency group, e.g. +`[project.optional-dependencies] benchmark = [...]` in `pyproject.toml`, installed via +`pip install "aegis-memory[benchmark]"`. This is **clarity of attribution**, not concealment: +Scorecard may still scan any manifest in the repo. Flagged here for a maintainer decision; the +dependency layout is intentionally **not** restructured in this PR. + +## Verification performed + +1. `python -m pip_audit -r server/requirements.txt` → `No known vulnerabilities found`. +2. `python -m pip_audit` over the `pyproject.toml` core and `[server]` resolved tree → `No known vulnerabilities found`. +3. `python -m pip_audit -r benchmarks/injection/requirements.txt` → 9 advisories, all of them the documented + `transformers` residual above (down from 28 across 3 packages). +4. `python -m pytest tests/` → 493 passed, 2 skipped (the only errors are `asyncpg` connection + failures from tests that need a live Postgres, which CI provides via its `postgres` service; + they are unrelated to the dependency bumps, which touch no shipped code).
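
The deliberate ignore list in `docs/security/vuln-triage.md` could also be kept as data and turned into a `pip-audit` invocation programmatically. The following is a minimal sketch, not part of this PR: the names `RESIDUAL_ADVISORIES` and `build_audit_command` are hypothetical, and it assumes only the `-r` and `--ignore-vuln` options that the triage document already uses.

```python
import shlex
import sys

# Accepted benchmark-only residual, copied from the triage table:
# PYSEC-2025-211 through PYSEC-2025-218, plus CVE-2026-1839.
RESIDUAL_ADVISORIES = [f"PYSEC-2025-{n}" for n in range(211, 219)] + ["CVE-2026-1839"]


def build_audit_command(manifest, ignored):
    """Return the argv for a pip-audit run that suppresses the given advisory IDs."""
    command = [sys.executable, "-m", "pip_audit", "-r", manifest]
    for advisory_id in ignored:
        # One --ignore-vuln flag per advisory, mirroring the documented command.
        command += ["--ignore-vuln", advisory_id]
    return command


if __name__ == "__main__":
    argv = build_audit_command("benchmarks/injection/requirements.txt", RESIDUAL_ADVISORIES)
    # Print the command instead of running it; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Building the command as an argument list sidesteps shell-specific line continuation, so the same ignore list would work from PowerShell, bash, or a CI step.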