35 changes: 35 additions & 0 deletions .github/workflows/pip-audit.yml
@@ -0,0 +1,35 @@
name: pip-audit (shipped deps)

# Audits ONLY the shipped-library dependency surface (server/requirements.txt and the
# pyproject.toml core + [server] deps) against the OSV database. Benchmark/dev-only deps
# (benchmarks/injection/requirements.txt) are intentionally NOT gated here — their residual
# advisories are triaged and accepted in docs/security/vuln-triage.md. This job is blocking:
# it fails the PR on any NEW vulnerability reachable by library users.

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

permissions:
  contents: read

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4.3.1
      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: "3.12"
      - name: Install pip-audit
        run: pip install pip-audit
      - name: Audit server/requirements.txt
        run: python -m pip_audit -r server/requirements.txt
      - name: Audit installed shipped package (pyproject core + [server])
        # Resolves the real shipped tree from pyproject.toml so transitive deps not pinned in
        # server/requirements.txt are covered too. No --ignore-vuln: this surface is clean today.
        run: |
          pip install -e ".[server]"
          python -m pip_audit
Comment on lines +34 to +35
P2: Audit only the shipped dependency set

When a CVE is reported in pip-audit itself or one of its runner-only dependencies, this no-argument python -m pip_audit will fail the job even though aegis-memory[server] is still clean. The pip-audit CLI documents the no-input form as auditing the current Python environment, and this workflow installs pip-audit into that same environment just before this step, so the new blocking gate is broader than the shipped dependency surface it claims to enforce.
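One way to picture the scoping issue raised here is to separate "everything installed in the environment" from "everything reachable from the shipped package". The sketch below is illustrative only: the package names, advisory IDs, and the `REQUIRES` graph are hypothetical placeholders, not the project's real dependency tree, and the workflow itself does not contain this logic.

```python
from collections import deque


def shipped_closure(root: str, requires: dict[str, list[str]]) -> set[str]:
    """Return the root package plus everything reachable from it."""
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in requires.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def blocking_findings(findings, root, requires):
    """Keep only (package, advisory) findings inside the shipped closure."""
    shipped = shipped_closure(root, requires)
    return [finding for finding in findings if finding[0] in shipped]


# Hypothetical environment: the auditor and the shipped package share one venv.
REQUIRES = {
    "shipped-package": ["shipped-dep-a"],
    "shipped-dep-a": ["shipped-dep-b"],
    "auditor-tool": ["auditor-only-dep"],
}
FINDINGS = [
    ("auditor-only-dep", "EXAMPLE-0001"),  # would fail an environment-wide audit
    ("shipped-dep-b", "EXAMPLE-0002"),     # genuinely affects library users
]

print(blocking_findings(FINDINGS, "shipped-package", REQUIRES))
```

Only the second finding survives the filter, which is the behaviour the gate's description promises. In practice the same effect could be achieved by installing the auditor in a separate environment from the package under audit.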


13 changes: 13 additions & 0 deletions README.md
@@ -307,6 +307,19 @@ Benchmarked on 8 vCPU / 7.6 GB RAM (Intel 13th Gen), 1000 memories, Docker Compo

> Query tail latency (p95/p99) is dominated by the external OpenAI embedding call, not Aegis or PostgreSQL. Write and vote operations that skip embedding are consistently under 100ms at p50.

## Security benchmark

Does the [4-stage content security pipeline](#built-for-a-world-where-agents-get-compromised) actually catch prompt injection? We measured it as a detector against five baselines (DeBERTa, LLM Guard, an LLM judge, and more) on labelled injection + benign corpora — with full confusion-matrix metrics, a per-stage ablation, and an honest error analysis. **The false-positive rate is reported next to recall everywhere** — a blocker that flags everything is useless.

| Aegis configuration | Recall | FPR | Median latency |
|-----------------------------------------|------:|-----:|---------------:|
| Stages 1–3 (deterministic, no API call) | 0.14 | 0.00 | 46 µs |
| Stages 1–4 (+ LLM classifier) | 0.67 | 0.00 | 1.2 s |

> `deepset/prompt-injections`, direct injection (N=662). The free deterministic core adds **zero** false positives here and a single false positive across 1,500 benign memory snippets; the optional LLM stage trades ~1s of latency for a 4.6× recall gain. Stage 2 (PII) contributes ~0 to injection recall by design, since PII is a different threat category.

→ **Full results, ablation, baselines, latency, and limitations: [`docs/security/benchmark.md`](docs/security/benchmark.md)** · reproduce with `python benchmarks/injection/run_benchmark.py`.
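For readers unfamiliar with the two headline metrics, the sketch below shows how recall and false-positive rate fall out of confusion-matrix counts. The `Confusion` class is a hypothetical helper written for this illustration, not code from `benchmarks/injection/`; the only figure taken from the text above is the single false positive across 1,500 benign snippets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # injections correctly flagged
    fp: int  # benign samples wrongly flagged
    fn: int  # injections missed
    tn: int  # benign samples correctly passed

    @property
    def recall(self) -> float:
        """Share of real injections that were caught."""
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    @property
    def fpr(self) -> float:
        """Share of benign samples that were wrongly flagged."""
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0


# Benign-only corpus: 1 false positive out of 1,500 snippets.
benign = Confusion(tp=0, fp=1, fn=0, tn=1499)
print(f"FPR = {benign.fpr:.4f}")
```

At two decimal places this FPR rounds to 0.00, which is why a table can show 0.00 even when a corpus contains one false positive.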

## Deployment

### Docker Compose
8 changes: 8 additions & 0 deletions SECURITY.md
@@ -28,3 +28,11 @@ Include: affected version, reproduction steps, and impact assessment.

For deeper security architecture (4-stage content pipeline, HMAC-SHA256 integrity,
OWASP 4-tier trust hierarchy), see [docs/guides/security.mdx](docs/guides/security.mdx).

## Dependency Vulnerabilities

The shipped-library dependency surface is audited against OSV in CI
([`.github/workflows/pip-audit.yml`](.github/workflows/pip-audit.yml)) and currently reports
**zero known vulnerabilities**. For the full triage — including the benchmark/dev-only residual
that is documented and accepted (never shipped to PyPI users) — see
[docs/security/vuln-triage.md](docs/security/vuln-triage.md).
17 changes: 11 additions & 6 deletions benchmarks/injection/requirements.txt
@@ -8,11 +8,11 @@

# --- Core (always needed) ---
numpy>=1.26,<2.0 # metrics + bootstrap resampling
-python-dotenv==1.0.1 # load OPENAI_/ANTHROPIC_ keys from aegis-memory-main/.env
+python-dotenv==1.2.2 # load OPENAI_/ANTHROPIC_ keys; >=1.2.2 clears CVE-2026-28684

# --- Datasets ---
datasets==2.19.1 # deepset/prompt-injections, databricks-dolly-15k
-huggingface-hub==0.23.4
+huggingface-hub==0.30.2 # >=0.30 required by transformers>=4.53; datasets 2.19.1 allows it
requests>=2.31.0 # InjecAgent raw fetch (best-effort)

# --- ML baseline: protectai_deberta AND framework baseline: llm_guard ---
@@ -21,11 +21,16 @@ requests>=2.31.0 # InjecAgent raw fetch (best-effort)
# (deberta-v3 text-classification works across this transformers range too).
# CPU torch wheel is large (~200MB); install takes a few minutes.
# IMPORTANT: transformers 5.x breaks llm-guard 0.3.x (import error), and
-# llm-guard 0.3.15 requires torch>=2.4 — so cap transformers<5 and let llm-guard
-# pull a compatible torch. deberta-v3 text-classification works in this range.
+# llm-guard 0.3.15 requires torch>=2.4 and transformers>=4.43.4 — so cap
+# transformers<5 and let llm-guard pull a compatible torch. deberta-v3
+# text-classification works in this range.
+# Security floor: >=4.53.0 clears every transformers advisory that has a fix
+# below 5.x (CVE-2024-12720, CVE-2025-1194/3263/3264/3777/3933/5197/6051/6638/6921,
+# PYSEC-2024-227/228/229, PYSEC-2025-40). The remaining advisories have no <5 fix
+# and are documented as accepted benchmark-only risk in docs/security/vuln-triage.md.
torch>=2.4
-transformers>=4.41,<5
-sentencepiece==0.2.0 # deberta-v3 tokenizer needs this
+transformers>=4.53.0,<5
+sentencepiece==0.2.1 # deberta-v3 tokenizer needs this; >=0.2.1 clears CVE-2026-1260
llm-guard==0.3.15
# If the resolver still cannot satisfy llm-guard on your platform, drop it and
# rerun — the benchmark marks `llm_guard` as "not run" and proceeds.
112 changes: 112 additions & 0 deletions docs/security/vuln-triage.md
@@ -0,0 +1,112 @@
# Dependency vulnerability triage

_Audited with `pip-audit` 2.10.0 (OSV) on 2026-06-02. Ground truth for this PR; the OpenSSF
Scorecard viewer refreshes on its own schedule after merge._

## Headline

**Zero known vulnerabilities in the shipped library.** Every advisory OSV reports for this repo
lives in **benchmark-only dev tooling** (`benchmarks/injection/requirements.txt`), which is never
installed by people who `pip install aegis-memory`. Before this PR those advisories spanned
**3 distinct packages (28 advisory instances)**; after conservative bumps the residual is
**1 package (9 advisories), all in `transformers`, with no fix available below the major version
that breaks the benchmark's `llm-guard` dependency.**

| Surface | Manifest | Before | After |
|---|---|--:|--:|
| Shipped library | `server/requirements.txt` | 0 | 0 |
| Shipped library | `pyproject.toml` (core + `[server]`) | 0 | 0 |
| Benchmark / dev-only | `benchmarks/injection/requirements.txt` | 3 pkgs / 28 | **1 pkg / 9** |

The shipped surface was already clean thanks to the transitive security floors in
`server/requirements.txt` (`idna>=3.15`, `pygments>=2.20.0`, `tqdm>=4.66.3`). It is now also
gated in CI by [`.github/workflows/pip-audit.yml`](../../.github/workflows/pip-audit.yml) so a new
shipped-dependency vulnerability fails the build.

> **Note on the Scorecard count.** The public viewer has shown ~53 OSV advisories. That number
> counts *every advisory ID* across the fuller tree Scorecard resolves — including the duplicate
> IDs `pip-audit` also emits (e.g. `PYSEC-2024-227/228/229` were each listed twice) and the
> `PYSEC-2025-211..218` cluster, which is **one package**, not eight. The number that actually
> matters is **distinct shipped-dependency packages needing a fix: zero.**
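The difference between advisory instances, unique advisory IDs, and distinct packages can be made concrete with a small counting sketch. The findings list below is a trimmed sample built from IDs named in this document (with the three `PYSEC-2024-22x` entries duplicated, as the note describes); it is not the full audit output, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict


def summarize(findings):
    """Return (instances, unique advisories, advisories per package)."""
    unique = set(findings)  # drops exact (package, advisory) repeats
    per_package = defaultdict(set)
    for package, advisory in unique:
        per_package[package].add(advisory)
    counts = {package: len(ids) for package, ids in per_package.items()}
    return len(findings), len(unique), counts


# Trimmed sample: three transformers advisories reported twice each.
SAMPLE = [
    ("transformers", "PYSEC-2024-227"),
    ("transformers", "PYSEC-2024-227"),
    ("transformers", "PYSEC-2024-228"),
    ("transformers", "PYSEC-2024-228"),
    ("transformers", "PYSEC-2024-229"),
    ("transformers", "PYSEC-2024-229"),
    ("python-dotenv", "CVE-2026-28684"),
    ("sentencepiece", "CVE-2026-1260"),
]

instances, unique_ids, per_package = summarize(SAMPLE)
print(instances, unique_ids, len(per_package))
```

For this sample the three numbers are 8 instances, 5 unique advisories, and 3 packages, which shows how a raw instance count can overstate the number of packages that need attention.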

## Manifests scanned

| Manifest | Role |
|---|---|
| `server/requirements.txt` | Shipped library runtime deps (PyPI install surface) |
| `pyproject.toml` (`dependencies`, `[server]`) | Shipped library / server extra |
| `benchmarks/injection/requirements.txt` | Benchmark-only dev tooling (transformers, torch, datasets, llm-guard, …) — not shipped |

No `setup.py`, `poetry.lock`, or other lockfiles exist in the repo.

## Triage table (one row per distinct package)

| Package | Version (before → after) | Manifest | Advisories (grouped) | Fix available | Safe bump? | Action |
|---|---|---|---|---|---|---|
| `python-dotenv` | `1.0.1` → `1.2.2` | benchmark-only | CVE-2026-28684 | yes (`1.2.2`) | yes — API-compatible | **Bumped** |
| `sentencepiece` | `0.2.0` → `0.2.1` | benchmark-only | CVE-2026-1260 | yes (`0.2.1`) | yes — patch; deberta-v3 tokenizer unaffected | **Bumped** |
| `transformers` | `4.46.3` → `4.53.3` (floor `>=4.41,<5` → `>=4.53.0,<5`) | benchmark-only | 14 with a `<5` fix · 8 no-fix (`PYSEC-2025-211..218`) · 1 needing 5.x (`CVE-2026-1839`) | partial | bump to highest `<5`; rest unbumpable | **Bumped (partial)** + residual documented below |
| `huggingface-hub` | `0.23.4` → `0.30.2` | benchmark-only | none (compat bump) | n/a | yes — required by `transformers>=4.53`; `datasets==2.19.1` allows it | **Bumped (to satisfy transformers)** |

### transformers advisories cleared by the `>=4.53.0` floor (14)

`PYSEC-2024-227`, `PYSEC-2024-228`, `PYSEC-2024-229` (4.48.0) · `PYSEC-2025-40` (4.49.0) ·
`CVE-2024-12720` (4.48.0) · `CVE-2025-1194` (4.50.0) · `CVE-2025-3263`, `CVE-2025-3264` (4.51.0) ·
`CVE-2025-3777`, `CVE-2025-3933` (4.52.1) · `CVE-2025-5197`, `CVE-2025-6638`, `CVE-2025-6051`,
`CVE-2025-6921` (4.53.0).
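The floor logic can be checked mechanically: an advisory is cleared when its fixed-in version is at or below the chosen floor. The sketch below uses the fixed-in versions listed above; `parse` is a deliberately simple helper that only handles plain numeric releases (it would not handle a pre-release such as `5.0.0rc3`, where a full version parser like `packaging.version.Version` would be the safer choice).

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '4.52.1' into (4, 52, 1); numeric release segments only."""
    return tuple(int(part) for part in version.split("."))


# Advisory -> first fixed transformers release, as listed in this section.
FIXED_IN = {
    "PYSEC-2024-227": "4.48.0",
    "PYSEC-2024-228": "4.48.0",
    "PYSEC-2024-229": "4.48.0",
    "PYSEC-2025-40": "4.49.0",
    "CVE-2024-12720": "4.48.0",
    "CVE-2025-1194": "4.50.0",
    "CVE-2025-3263": "4.51.0",
    "CVE-2025-3264": "4.51.0",
    "CVE-2025-3777": "4.52.1",
    "CVE-2025-3933": "4.52.1",
    "CVE-2025-5197": "4.53.0",
    "CVE-2025-6638": "4.53.0",
    "CVE-2025-6051": "4.53.0",
    "CVE-2025-6921": "4.53.0",
}


def cleared_by(floor: str, fixed_in: dict[str, str]) -> list[str]:
    """Advisories whose fix is already included at the given version floor."""
    return sorted(adv for adv, fix in fixed_in.items() if parse(fix) <= parse(floor))


print(len(cleared_by("4.53.0", FIXED_IN)))  # all 14 listed advisories
print(len(cleared_by("4.50.0", FIXED_IN)))  # a lower floor clears fewer
```

A floor of `4.50.0` would clear only the six advisories fixed in 4.48.0 through 4.50.0, which is why the floor is set at `4.53.0`.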

## Known unfixable / accepted residual

All residual is **benchmark-only** dev tooling in `transformers 4.53.3`. It is **not reachable by
library users** — `transformers` is not a dependency of `aegis-memory` or its `[server]` extra; it
is installed only by someone running the injection benchmark in an isolated venv. Risk to shipped
users: **none.**

| Advisory | Why it can't be bumped | Reachability |
|---|---|---|
| `PYSEC-2025-211` | No fixed version published in OSV (no `<5` patch) | benchmark-only |
| `PYSEC-2025-212` | No fixed version published in OSV | benchmark-only |
| `PYSEC-2025-213` | No fixed version published in OSV | benchmark-only |
| `PYSEC-2025-214` | No fixed version published in OSV | benchmark-only |
| `PYSEC-2025-215` | No fixed version published in OSV | benchmark-only |
| `PYSEC-2025-216` | No fixed version published in OSV | benchmark-only |
| `PYSEC-2025-217` | No fixed version published in OSV | benchmark-only |
| `PYSEC-2025-218` | No fixed version published in OSV | benchmark-only |
| `CVE-2026-1839` | Fix only in `5.0.0rc3`; `transformers 5.x` breaks `llm-guard 0.3.15` (the benchmark's `<5` ceiling) | benchmark-only |

### Deliberate ignore list

If/when `pip-audit` is run over the benchmark manifest in tooling, the residual is suppressed
*explicitly* (a reviewed decision, not an oversight):

```powershell
python -m pip_audit -r benchmarks/injection/requirements.txt `
  --ignore-vuln PYSEC-2025-211 --ignore-vuln PYSEC-2025-212 `
  --ignore-vuln PYSEC-2025-213 --ignore-vuln PYSEC-2025-214 `
  --ignore-vuln PYSEC-2025-215 --ignore-vuln PYSEC-2025-216 `
  --ignore-vuln PYSEC-2025-217 --ignore-vuln PYSEC-2025-218 `
  --ignore-vuln CVE-2026-1839
```

The shipped-deps CI job (`.github/workflows/pip-audit.yml`) needs **no** ignore list — that surface
is clean — and intentionally does **not** audit the benchmark manifest, so the accepted residual
above never blocks a merge.
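If the ignore list ever needs to live in tooling rather than a hand-typed command, one option is to keep the accepted IDs in a single reviewed list and generate the arguments from it. This is a sketch under that assumption; `ACCEPTED_RESIDUAL` and `audit_command` are hypothetical names, and only the `-r` and `--ignore-vuln` flags already shown above are used.

```python
import shlex

# Accepted benchmark-only residual, copied from the table in this document.
ACCEPTED_RESIDUAL = [f"PYSEC-2025-{n}" for n in range(211, 219)] + ["CVE-2026-1839"]


def audit_command(manifest: str, ignored: list[str]) -> list[str]:
    """Build the pip-audit argv with one --ignore-vuln pair per accepted ID."""
    command = ["python", "-m", "pip_audit", "-r", manifest]
    for advisory in ignored:
        command += ["--ignore-vuln", advisory]
    return command


command = audit_command("benchmarks/injection/requirements.txt", ACCEPTED_RESIDUAL)
print(shlex.join(command))
```

Passing the list as an argv sequence (for example to `subprocess.run`) avoids shell-specific line-continuation syntax, and any change to the accepted set shows up as a one-line diff in review.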

## Proposal for the maintainer (not done in this PR)

A large majority of OSV signal for this repo comes from benchmark-only tooling. To make attribution
unambiguous, the benchmark extras could be moved into an isolated optional-dependency group, e.g.
`[project.optional-dependencies] benchmark = [...]` in `pyproject.toml`, installed via
`pip install aegis-memory[benchmark]`. This is **clarity of attribution**, not concealment —
Scorecard may still scan any manifest in the repo. Flagged here for a maintainer decision; the
dependency layout is intentionally **not** restructured in this PR.

## Verification performed

1. `python -m pip_audit -r server/requirements.txt` → `No known vulnerabilities found`.
2. `python -m pip_audit` over the `pyproject.toml` core + `[server]` resolved tree → `No known vulnerabilities found`.
3. `python -m pip_audit -r benchmarks/injection/requirements.txt` → 9 advisories, all the documented
`transformers` residual above (down from 28 across 3 packages).
4. `python -m pytest tests/` → 493 passed, 2 skipped (the only errors are `asyncpg` connection
failures from tests that need a live Postgres, which CI provides via its `postgres` service;
unrelated to the dependency bumps, which touch no shipped code).