Merged
30 commits
814f451
chore: add mission infrastructure for multi-signal correlation (v1.3)
lance0 Mar 19, 2026
f813cea
feat: add migration 007 (signal_groups) and CorrelationConfig infrast…
lance0 Mar 19, 2026
833c0f7
feat: add core correlation engine with signal group management, weigh…
lance0 Mar 19, 2026
f6e6109
feat: integrate correlation engine into event ingestion flow
lance0 Mar 19, 2026
f502efa
feat: add signal groups API endpoints (GET /v1/signal-groups, GET /v1…
lance0 Mar 19, 2026
47834ba
docs: update documentation for correlation engine (API, config, chang…
lance0 Mar 19, 2026
12b8984
chore: add scrutiny validation for correlation-engine milestone
lance0 Mar 19, 2026
104bb7e
fix: resolve signal group only after mitigation is confirmed, add uni…
lance0 Mar 19, 2026
0d9fb08
chore: add user testing validation for correlation-engine milestone (…
lance0 Mar 19, 2026
633f246
feat: add Alertmanager webhook adapter (POST /v1/signals/alertmanager)
lance0 Mar 19, 2026
d52f43f
feat: add FastNetMon webhook adapter (POST /v1/signals/fastnetmon)
lance0 Mar 19, 2026
309ccfe
feat: add correlation config API endpoints (GET/PUT /v1/config/correl…
lance0 Mar 19, 2026
3148638
feat: add signal adapter E2E tests and FastNetMon API docs
lance0 Mar 19, 2026
c462f93
chore: add scrutiny validation for signal-adapters milestone (4/4 rev…
lance0 Mar 19, 2026
24f94a0
chore: add user testing validation for signal-adapters milestone (20/…
lance0 Mar 19, 2026
12a6552
chore: add user testing validation round 2 for signal-adapters milest…
lance0 Mar 19, 2026
2c14447
fix: handle concurrent signal group inserts and clean up list endpoin…
lance0 Mar 19, 2026
076219b
chore: add scrutiny validation for misc-fixes-1 milestone (1/1 review…
lance0 Mar 19, 2026
2125400
chore: add user testing validation for misc-fixes-1 milestone (0 asse…
lance0 Mar 19, 2026
d23a53b
feat: add Correlation dashboard page with Signals, Groups, and Config…
lance0 Mar 19, 2026
c24514d
feat: add signal group detail page at /correlation/groups/[id]
lance0 Mar 19, 2026
4dfa953
feat: add correlation section to mitigation detail page
lance0 Mar 19, 2026
58ccb43
chore: add scrutiny validation for correlation-dashboard milestone (3…
lance0 Mar 19, 2026
dfac191
fix: address correlation-dashboard scrutiny review issues
lance0 Mar 19, 2026
9608e7e
chore: add user testing validation for correlation-dashboard mileston…
lance0 Mar 19, 2026
515f45f
chore: remove .factory from git tracking and add to .gitignore
lance0 Mar 19, 2026
4302df8
docs: update all documentation for multi-signal correlation feature
lance0 Mar 19, 2026
f597c4e
docs: add upgrade guide, deployment docs for correlation and signal a…
lance0 Mar 20, 2026
e387bda
fix: make configs mount writable by default in docker-compose
lance0 Mar 20, 2026
6924851
chore: track default correlation.yaml config, gitignore .bak files
lance0 Mar 20, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
lab/clab-*/
.claude/
.factory/
*.yaml.bak
36 changes: 24 additions & 12 deletions AGENTS.md
@@ -45,6 +45,7 @@ src/
├── auth/ # AuthBackend (axum-login), mode-aware auth (none/bearer/credentials/mtls)
├── bgp/ # FlowSpecAnnouncer trait, GoBGP gRPC client, mock
├── config/ # Settings, Inventory, Playbooks (YAML parsing)
├── correlation/ # Multi-signal correlation engine (config, engine, signal groups)
├── db/ # PostgreSQL repository with sqlx + MockRepository for testing
├── domain/ # Core types: AttackEvent, Mitigation, FlowSpecRule
├── guardrails/ # Validation, quotas, safelist protection
@@ -70,7 +71,8 @@ frontend/
│ │ ├── audit-log/ # Audit trail
│ │ ├── config/ # Settings (JSON) + Playbooks (cards) + hot-reload
│ │ ├── admin/ # Tabbed: System Status, Safelist CRUD, User management
│ │ └── ip-history/ # IP history timeline with search
│ │ ├── ip-history/ # IP history timeline with search
│ │ └── correlation/ # Correlation dashboard (Signals, Groups, Config tabs) + group detail
│ ├── login/ # Login page (outside auth guard)
│ ├── globals.css # Light + dark theme variables
│ └── layout.tsx # Root layout with ThemeProvider + Toaster
@@ -92,17 +94,18 @@ frontend/
├── vitest.config.ts # Vitest config (jsdom, react plugin, @ alias)
└── vitest.setup.ts # jest-dom matchers

configs/ # prefixd.yaml, inventory.yaml, playbooks.yaml, nginx.conf, gobgp.conf
configs/ # prefixd.yaml, inventory.yaml, playbooks.yaml, correlation.yaml, nginx.conf, gobgp.conf
docs/
├── api.md # Full API reference with examples
├── deployment.md # Docker + nginx deployment guide
└── adr/ # 17 Architecture Decision Records (001-017)
├── configuration.md # Full configuration reference
└── adr/ # 19 Architecture Decision Records (001-019)
grafana/ # Prometheus config, Grafana provisioning, dashboard JSON
tests/
├── integration.rs # 44 integration tests (health, config, mitigations, events, filters, bulk withdraw, cursor pagination, bulk acknowledge, per-dest routing, preferences, event batch, incident reports)
├── integration_e2e.rs # 6 end-to-end tests (ignored without Docker)
├── integration.rs # 99 integration tests (health, config, mitigations, events, filters, bulk withdraw, cursor pagination, bulk acknowledge, per-dest routing, preferences, event batch, incident reports, signal groups, correlation, signal adapters)
├── integration_e2e.rs # 9 end-to-end tests (ignored without Docker)
├── integration_gobgp.rs # 8 tests (GoBGP integration, ignored without GoBGP)
└── integration_postgres.rs # 9 integration tests (Postgres-backed flows)
└── integration_postgres.rs # 16 integration tests (Postgres-backed flows, signal groups)
```

## Key Design Decisions
@@ -119,7 +122,7 @@ tests/
10. **Route-group auth guard** - Next.js `(dashboard)/layout.tsx` wraps all protected pages
11. **Mode-aware auth** - `none`/`bearer`/`credentials`/`mtls` with role checks on protected endpoints

See `docs/adr/` for all 17 Architecture Decision Records.
See `docs/adr/` for all 19 Architecture Decision Records.

## API Endpoints

@@ -157,12 +160,19 @@ See `docs/adr/` for all 17 Architecture Decision Records.
- `GET/POST /v1/operators` - User management (admin only)
- `DELETE /v1/operators/{id}` - Delete user (admin only)
- `PUT /v1/operators/{id}/password` - Change password (admin only)
- `GET /v1/signal-groups` - List signal groups (with pagination, status/vector/date filters)
- `GET /v1/signal-groups/{id}` - Signal group detail with contributing events
- `POST /v1/signals/alertmanager` - Alertmanager webhook adapter (v4 payload)
- `POST /v1/signals/fastnetmon` - FastNetMon webhook adapter (native JSON)
- `GET /v1/config/correlation` - Correlation config (admin, secrets redacted)
- `PUT /v1/config/correlation` - Update correlation config (admin only, writes YAML + hot-reload)

## Data Flow

1. **Event Ingestion** (`POST /v1/events`)
- Validate input, check duplicates
- Lookup IP context from inventory
- Correlate signals (if `correlation.enabled`): find/create signal group, check corroboration
- Evaluate playbook for vector
- Check guardrails (TTL, /32, quotas, safelist)
- Create or extend mitigation
@@ -184,10 +194,10 @@ See `docs/adr/` for all 17 Architecture Decision Records.
## Testing

```bash
# Backend unit tests (126 tests)
# Backend unit tests (179 tests)
cargo test

# All backend tests including integration (179 runnable: 126 unit + 44 integration + 9 postgres; 14 ignored requiring GoBGP/Docker)
# All backend tests including integration (294 runnable: 179 unit + 99 integration + 16 postgres; 17 ignored requiring GoBGP/Docker)
cargo test --features test-utils

# Lint
@@ -210,6 +220,7 @@ cargo run -- --config ./configs
- `configs/prefixd.yaml` - Main daemon config
- `configs/inventory.yaml` - Customer/service/IP mapping
- `configs/playbooks.yaml` - Vector → action policies
- `configs/correlation.yaml` - Correlation engine config (sources, weights, thresholds)
- `configs/nginx.conf` - Reverse proxy config
- `configs/gobgp.conf` - GoBGP BGP config

@@ -243,11 +254,12 @@ Completed:
- Nginx reverse proxy (single-origin deployment)
- ErrorBoundary wrapping all dashboard pages
- Cross-entity navigation (command palette → detail pages, event↔mitigation linking, audit log → mitigations, clickable stat cards)
- 17 Architecture Decision Records
- Multi-signal correlation engine with signal groups, Alertmanager and FastNetMon adapters
- 19 Architecture Decision Records
- CLI tool (prefixdctl) for all API operations
- OpenAPI spec with utoipa annotations
- 126 backend unit tests + 53 integration tests (+ 14 ignored requiring GoBGP/Docker)
- Vitest + Testing Library frontend test infrastructure (34 tests)
- 179 backend unit tests + 99 integration + 16 postgres tests (+ 17 ignored requiring GoBGP/Docker)
- Vitest + Testing Library frontend test infrastructure (64 tests)

## Code Conventions

23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,29 @@ All notable changes to prefixd will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **Multi-signal correlation engine** — Time-windowed grouping of related attack events by (victim_ip, vector) from multiple detection sources. Configurable source weights, corroboration thresholds, and per-playbook overrides. When `correlation.enabled` is true, events are grouped into signal groups and mitigation only triggers when corroboration requirements are met (configurable `min_sources` and `confidence_threshold`). Single-source behavior is preserved with `min_sources=1` (backward compatible). See [ADR 018](docs/adr/018-multi-signal-correlation-engine.md).
- **Signal groups API** — `GET /v1/signal-groups` (list with cursor pagination, status/vector/date filters) and `GET /v1/signal-groups/{id}` (detail with contributing events, source weights, and confidence). Both endpoints require authentication.
- **Correlation context on mitigations** — `GET /v1/mitigations` and `GET /v1/mitigations/{id}` responses include a `correlation` field for correlated mitigations, containing signal_group_id, derived_confidence, source_count, corroboration_met, contributing_sources, and a human-readable explanation.
- **Correlation engine metrics** — `prefixd_signal_groups_total`, `prefixd_signal_group_sources`, `prefixd_correlation_confidence`, `prefixd_corroboration_met_total`, `prefixd_corroboration_timeout_total` Prometheus counters and histograms.
- **Signal group expiry** — Reconciliation loop expires open signal groups whose time window has elapsed, transitioning them to `expired` status.
- **Database migration 007** — `signal_groups` and `signal_group_events` tables, `mitigations.signal_group_id` nullable FK column with indexes.
- **Correlation configuration** — New `correlation` section in `prefixd.yaml` with `enabled`, `window_seconds`, `min_sources`, `confidence_threshold`, `sources` (per-source weight/type), and `default_weight`. Per-playbook `correlation` overrides in `playbooks.yaml`. Hot-reloadable via `POST /v1/config/reload`.
- **Alertmanager webhook adapter** — `POST /v1/signals/alertmanager` accepts Alertmanager v4 webhook payloads. Maps labels/annotations to attack event fields (vector, victim_ip, bps/pps, severity→confidence). Handles batched alerts with per-alert results, resolved alerts (→ withdraw), fingerprint dedup. Returns 400 for malformed payloads (Alertmanager won't retry 4xx). See [ADR 019](docs/adr/019-signal-adapter-architecture.md).
- **FastNetMon webhook adapter** — `POST /v1/signals/fastnetmon` accepts FastNetMon's native JSON notify payload. Classifies attack vector from traffic breakdown (UDP/SYN/ICMP/TCP), maps action type to confidence (ban=0.9, partial_block=0.7, alert=0.5, configurable), uses `attack_uuid` for dedup. Returns `EventResponse` shape for script compatibility.
- **Correlation config API** — `GET /v1/config/correlation` (secrets redacted) and `PUT /v1/config/correlation` (admin only, validates, writes YAML, hot-reloads). Correlation config reloaded alongside inventory/playbooks/alerting on `POST /v1/config/reload`.
- **Signal adapter E2E tests** — 3 end-to-end tests in `tests/integration_e2e.rs` verifying full-stack signal adapter flows through real Postgres and GoBGP: Alertmanager→signal group→mitigation, FastNetMon→signal group→mitigation, multi-source corroboration (FastNetMon + Alertmanager → same group → mitigation with FlowSpec in RIB). Marked `#[ignore]` by default (require Docker).

### Changed

- Backend unit tests increased from 126 to 179 (correlation engine, config parsing, corroboration, explainability, signal adapters)
- Integration tests increased from 44 to 99 (signal group CRUD, correlation flow, concurrent event handling, Alertmanager adapter, FastNetMon adapter, correlation config API)
- Postgres integration tests increased from 9 to 16 (signal group operations)
- Frontend tests increased from 34 to 67 (correlation dashboard, signal group detail, mitigation detail correlation)

## [0.13.0] - 2026-03-19

### Added
71 changes: 69 additions & 2 deletions FEATURES.md
@@ -21,8 +21,9 @@ Comprehensive list of prefixd capabilities.
Any system that can POST JSON works. Tested with:

- **FastNetMon Community** - Notify script integration ([setup guide](docs/detectors/fastnetmon.md))
- **Prometheus/Alertmanager** - Via webhook receiver
- **Custom scripts** - Simple curl calls
- **FastNetMon** (native webhook) - `POST /v1/signals/fastnetmon` accepts FastNetMon's JSON payload directly, classifies vector from traffic breakdown, configurable confidence mapping
- **Prometheus/Alertmanager** - `POST /v1/signals/alertmanager` accepts Alertmanager v4 webhook payloads, maps labels/annotations to event fields, handles batched alerts with per-alert results
- **Custom scripts** - Simple curl calls to `POST /v1/events`
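
The "configurable confidence mapping" for the FastNetMon adapter can be pictured as a default table with operator overrides. The sketch below is illustrative only: the defaults (`ban`=0.9, `partial_block`=0.7, `alert`=0.5) are the values given in the changelog entry for this adapter, while the function name `fastnetmon_confidence`, the shape of the overrides map, and the choice to return `None` for unknown actions are assumptions, not prefixd's actual code.

```rust
use std::collections::HashMap;

/// Maps a FastNetMon action type to an event confidence in [0.0, 1.0].
///
/// Hypothetical helper: `overrides` stands in for the per-source
/// `confidence_mapping` config and takes precedence over the defaults.
pub fn fastnetmon_confidence(action: &str, overrides: &HashMap<String, f64>) -> Option<f64> {
    // Operator-supplied values win; clamp so a typo cannot exceed 1.0.
    if let Some(value) = overrides.get(action).copied() {
        return Some(value.clamp(0.0, 1.0));
    }
    // Documented defaults for the three known action types.
    match action {
        "ban" => Some(0.9),
        "partial_block" => Some(0.7),
        "alert" => Some(0.5),
        // Assumption: unknown actions yield no confidence rather than a guess.
        _ => None,
    }
}
```

With an empty overrides map, `fastnetmon_confidence("ban", &overrides)` yields `Some(0.9)`; inserting `"alert" -> 0.65` into the map changes only the `alert` result.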

### Event Schema

@@ -241,6 +242,72 @@ Real-time events pushed to dashboard:

---

## Multi-Signal Correlation

Combine weak signals from multiple detectors into high-confidence mitigation decisions. Example: FastNetMon reports a UDP flood at 0.6 confidence + Alertmanager fires a bandwidth alert = corroborated high-confidence mitigation.

### Signal Groups

Events targeting the same (victim_ip, vector) within a time window are grouped into a **signal group**. Each group tracks:

- Contributing events from multiple sources
- Derived confidence (weighted by source reliability)
- Corroboration status (whether the `min_sources` threshold is met)
- Source breakdown with per-source confidence and weight
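
As a rough illustration of the grouping rule, the sketch below keeps one open group per (victim_ip, vector) key and starts a new group once the window has elapsed. It is an in-memory approximation under stated assumptions: the `Signal`, `SignalGroup`, and `Grouper` types are hypothetical, the window is assumed to be anchored at the group's first event, and the real engine persists groups in Postgres rather than in a `HashMap`.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical, simplified signal; the real event type carries more fields.
#[derive(Debug, Clone)]
pub struct Signal {
    pub source: String,
    pub victim_ip: IpAddr,
    pub vector: String,
    pub confidence: f64,
    /// Observation time in seconds since the Unix epoch.
    pub observed_at: u64,
}

/// Hypothetical in-memory signal group.
#[derive(Debug)]
pub struct SignalGroup {
    pub opened_at: u64,
    pub signals: Vec<Signal>,
}

/// Groups signals by (victim_ip, vector) within a time window.
#[derive(Default)]
pub struct Grouper {
    open: HashMap<(IpAddr, String), SignalGroup>,
    closed: Vec<SignalGroup>,
}

impl Grouper {
    /// Adds `signal` to the open group for its key, closing the previous
    /// group first if it is older than `window_seconds`.
    pub fn ingest(&mut self, signal: Signal, window_seconds: u64) -> &SignalGroup {
        let key = (signal.victim_ip, signal.vector.clone());
        let now = signal.observed_at;
        // Assumption: the window is measured from the group's first event.
        let expired = self
            .open
            .get(&key)
            .map_or(false, |g| now.saturating_sub(g.opened_at) > window_seconds);
        if expired {
            if let Some(old) = self.open.remove(&key) {
                self.closed.push(old);
            }
        }
        let group = self.open.entry(key).or_insert_with(|| SignalGroup {
            opened_at: now,
            signals: Vec::new(),
        });
        group.signals.push(signal);
        group
    }

    /// Number of groups whose window has elapsed.
    pub fn closed_len(&self) -> usize {
        self.closed.len()
    }
}
```

Two signals for the same IP and vector arriving 60 seconds apart with a 120-second window land in one group; a third arriving 300 seconds after the first opens a new group.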

### Corroboration Model

- **min_sources** - Minimum number of distinct sources required before mitigation triggers (default: 1 for backward compatibility)
- **confidence_threshold** - Minimum derived confidence to trigger mitigation
- **Per-playbook overrides** - Different thresholds per attack vector
- **Time-windowed** - Signal groups expire after `window_seconds` if corroboration is not met
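
The corroboration check described above can be sketched as a small predicate. This is a minimal illustration, assuming that both conditions must hold, that comparisons are inclusive, and that a per-playbook override replaces only the fields it sets; `Thresholds`, `PlaybookOverride`, `effective`, and `corroboration_met` are hypothetical names, not prefixd's API.

```rust
use std::collections::HashSet;

/// Global corroboration thresholds (mirrors the config keys of the same name).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Thresholds {
    pub min_sources: usize,
    pub confidence_threshold: f64,
}

/// Hypothetical per-playbook override; unset fields fall back to the global value.
#[derive(Debug, Clone, Copy, Default)]
pub struct PlaybookOverride {
    pub min_sources: Option<usize>,
    pub confidence_threshold: Option<f64>,
}

/// Resolves the thresholds that apply to one playbook.
pub fn effective(global: Thresholds, ov: Option<PlaybookOverride>) -> Thresholds {
    let ov = ov.unwrap_or_default();
    Thresholds {
        min_sources: ov.min_sources.unwrap_or(global.min_sources),
        confidence_threshold: ov
            .confidence_threshold
            .unwrap_or(global.confidence_threshold),
    }
}

/// Returns true when enough distinct sources have reported and the derived
/// confidence reaches the threshold. Repeated reports from one source count once.
pub fn corroboration_met(sources: &[&str], derived_confidence: f64, t: Thresholds) -> bool {
    let distinct: HashSet<&str> = sources.iter().copied().collect();
    distinct.len() >= t.min_sources && derived_confidence >= t.confidence_threshold
}
```

With `min_sources: 2`, two reports from `fastnetmon` alone do not corroborate, however high the confidence; one report each from `fastnetmon` and `alertmanager` does, provided the derived confidence reaches the threshold.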

### Source Weighting

Each detection source is assigned a weight reflecting its reliability:

```yaml
correlation:
  enabled: true
  window_seconds: 120
  min_sources: 2
  confidence_threshold: 0.7
  sources:
    fastnetmon:
      weight: 1.0
      type: detector
    alertmanager:
      weight: 0.8
      type: alert
  default_weight: 0.5
```
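
To make the role of `weight` and `default_weight` concrete, the sketch below looks up a source's weight with a fallback and combines per-source confidences into one value. The weighted mean is an assumption chosen for illustration; this document does not state the formula prefixd uses for derived confidence, and `Weights`, `weight_for`, and `derived_confidence` are hypothetical names.

```rust
use std::collections::HashMap;

/// Source weights as configured under `correlation.sources`.
pub struct Weights {
    pub sources: HashMap<String, f64>,
    pub default_weight: f64,
}

impl Weights {
    /// Weight for a named source, falling back to `default_weight`
    /// when the source is not listed in the config.
    pub fn weight_for(&self, source: &str) -> f64 {
        self.sources
            .get(source)
            .copied()
            .unwrap_or(self.default_weight)
    }
}

/// Weighted mean of per-source confidences, clamped to [0.0, 1.0].
///
/// ASSUMPTION: a weighted mean is used here only as a plausible example;
/// the actual derivation may differ. Returns `None` when there are no
/// signals or the total weight is zero.
pub fn derived_confidence(weights: &Weights, signals: &[(&str, f64)]) -> Option<f64> {
    let total: f64 = signals.iter().map(|(s, _)| weights.weight_for(s)).sum();
    if total <= 0.0 {
        return None;
    }
    let weighted: f64 = signals
        .iter()
        .map(|(s, c)| weights.weight_for(s) * c)
        .sum();
    Some((weighted / total).clamp(0.0, 1.0))
}
```

Under this assumed formula and the weights shown above, a FastNetMon signal at 0.6 and an Alertmanager signal at 0.9 combine to (1.0 × 0.6 + 0.8 × 0.9) / 1.8 ≈ 0.73, which clears a `confidence_threshold` of 0.7.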

### Explainability

Every correlated mitigation includes a `correlation` field explaining the decision:

- Signal group ID and contributing sources
- Per-source confidence and weight
- Whether corroboration was met
- Human-readable explanation string

### Signal Groups API

- `GET /v1/signal-groups` - List groups with cursor pagination, status/vector/date filters
- `GET /v1/signal-groups/{id}` - Detail with contributing events, source weights, and confidence
- `GET /v1/config/correlation` - Current correlation config (secrets redacted)
- `PUT /v1/config/correlation` - Update config (admin only, validates, writes YAML, hot-reloads)

### Correlation Dashboard

- **Signals tab** - Recent events with source, confidence, and group assignment
- **Groups tab** - Signal groups with status, source count, confidence, corroboration status
- **Config tab** - Visual correlation config editor with source weights
- **Group detail page** - Contributing events, source breakdown, timeline
- **Mitigation detail integration** - Correlation context section on mitigated IPs

---

## Inventory

### Customer/IP Mapping
27 changes: 18 additions & 9 deletions README.md
@@ -150,8 +150,9 @@ playbooks:

### 3. Connect a detector

Point your detector at prefixd's API:
Point your detector at prefixd's API. Three integration paths:

**Generic events API** (any detector):
```bash
curl -X POST http://localhost/v1/events \
-H "Content-Type: application/json" \
@@ -165,6 +166,12 @@ curl -X POST http://localhost/v1/events \
}'
```

**Native adapters** (zero-config signal translation):
- **Alertmanager** → `POST /v1/signals/alertmanager` — maps labels/annotations to events
- **FastNetMon** → `POST /v1/signals/fastnetmon` — accepts native JSON payload

With [multi-signal correlation](docs/configuration.md#correlation) enabled, events from multiple detectors targeting the same IP are grouped and corroborated before triggering mitigation.

See [FastNetMon Integration](docs/detectors/fastnetmon.md) for a complete setup guide.

### 4. Peer with your routers
@@ -177,7 +184,8 @@ Configure GoBGP neighbors in `configs/gobgp.conf` and set up FlowSpec import pol

| Category | What it does |
|----------|--------------|
| **Signal Ingestion** | HTTP API accepts attack events from any detector |
| **Signal Ingestion** | HTTP API + native Alertmanager and FastNetMon webhook adapters |
| **Multi-Signal Correlation** | Time-windowed grouping of events from multiple detectors with source weighting and corroboration |
| **Policy Engine** | YAML playbooks define per-vector responses with escalation |
| **Guardrails** | Quotas, safelist, /32-only enforcement, mandatory TTLs |
| **BGP FlowSpec** | Announces via GoBGP (traffic-rate, discard actions) |
@@ -192,13 +200,14 @@ Configure GoBGP neighbors in `configs/gobgp.conf` and set up FlowSpec import pol

## How It Works

1. **Detector sends event** → `POST /v1/events` with victim IP, vector, confidence
1. **Detector sends event** → `POST /v1/events`, `/v1/signals/alertmanager`, or `/v1/signals/fastnetmon`
2. **Inventory lookup** → Find customer/service owning the IP
3. **Playbook match** → Determine action (police/discard) based on vector
4. **Guardrails check** → Validate quotas, safelist, prefix length
5. **FlowSpec announce** → Send rule to GoBGP via gRPC
6. **Router enforcement** → Traffic filtered at line rate
7. **Auto-expiry** → Rule withdrawn when TTL expires
3. **Signal correlation** → Group related signals by (victim_ip, vector), check corroboration
4. **Playbook match** → Determine action (police/discard) based on vector
5. **Guardrails check** → Validate quotas, safelist, prefix length
6. **FlowSpec announce** → Send rule to GoBGP via gRPC
7. **Router enforcement** → Traffic filtered at line rate
8. **Auto-expiry** → Rule withdrawn when TTL expires

**Fail-open design:** If prefixd dies, mitigations auto-expire. No permanent rules, no stuck state.

@@ -263,7 +272,7 @@ Current version: **v0.13.0**

- **Issues:** [GitHub Issues](https://github.com/lance0/prefixd/issues)
- **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md)
- **Architecture Decision Records:** [docs/adr/](docs/adr/)
- **Architecture Decision Records:** [docs/adr/](docs/adr/) (19 ADRs)

---

18 changes: 9 additions & 9 deletions ROADMAP.md
@@ -274,25 +274,25 @@ Replace the GoBGP container dependency with [rustbgpd](https://github.com/lance0

Example: FastNetMon says UDP flood at 0.6 confidence + router CPU spiking + host conntrack exhaustion = **high-confidence mitigation**.

### Signal Adapters (start with one)
### Signal Adapters

- [ ] Prometheus/Alertmanager adapter (metric queries, webhook receiver) — most universal, many operators already have this
- [ ] Enhanced FastNetMon adapter (configurable confidence mapping) — common pairing for self-hosted
- [x] Prometheus/Alertmanager adapter (`POST /v1/signals/alertmanager` webhook receiver) — maps labels/annotations to attack events, handles batched alerts
- [x] FastNetMon webhook adapter (`POST /v1/signals/fastnetmon`) — classifies vectors from traffic breakdown, configurable confidence mapping
- [ ] Router telemetry adapter (JTI, gNMI)

### Correlation Engine

- [ ] Time-windowed event grouping
- [ ] Source weighting and reliability scoring
- [ ] Corroboration requirements ("require 2+ sources")
- [ ] Correlation explainability (`why` details in API/UI for each mitigation decision)
- [x] Time-windowed event grouping
- [x] Source weighting and reliability scoring
- [x] Corroboration requirements ("require 2+ sources")
- [x] Correlation explainability (`why` details in API/UI for each mitigation decision)
- [ ] Replay mode for tuning (simulate historical incidents without announcing FlowSpec rules)

### Confidence Model

- [ ] Derived confidence from traffic patterns
- [x] Derived confidence from traffic patterns
- [ ] Confidence decay over time
- [ ] Per-playbook thresholds
- [x] Per-playbook thresholds

---

18 changes: 18 additions & 0 deletions configs/correlation.yaml
@@ -0,0 +1,18 @@
enabled: true
window_seconds: 300
min_sources: 1
confidence_threshold: 0.5
sources:
  dashboard:
    weight: 1.0
    type: manual
    confidence_mapping: {}
  fastnetmon:
    weight: 1.0
    type: detector
    confidence_mapping: {}
  alertmanager:
    weight: 0.8
    type: telemetry
    confidence_mapping: {}
default_weight: 1.0
16 changes: 16 additions & 0 deletions configs/prefixd.yaml
@@ -61,6 +61,22 @@ safelist:
- "10.0.0.0/8"
- "192.168.0.0/16"

correlation:
  enabled: true
  window_seconds: 300
  min_sources: 1
  confidence_threshold: 0.5
  sources:
    fastnetmon:
      weight: 1.0
      type: detector
    alertmanager:
      weight: 0.8
      type: telemetry
    dashboard:
      weight: 1.0
      type: manual

shutdown:
drain_timeout_seconds: 30
preserve_announcements: true
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -17,7 +17,7 @@ services:
expose:
- "8080"
volumes:
- ./configs:/etc/prefixd:ro
- ./configs:/etc/prefixd
- prefixd-data:/data
environment:
- RUST_LOG=info