diff --git a/docs/5-integrations/extensions/limacharlie/vulnerability-reporting.md b/docs/5-integrations/extensions/limacharlie/vulnerability-reporting.md
index 1fffe6b84..cc9756980 100644
--- a/docs/5-integrations/extensions/limacharlie/vulnerability-reporting.md
+++ b/docs/5-integrations/extensions/limacharlie/vulnerability-reporting.md
@@ -2,7 +2,7 @@
 
 The Vulnerability Reporting extension (`ext-vulnerability-reporting`) collects per-endpoint software inventories, resolves them against the LimaCharlie CVE database, enriches each finding with CISA KEV and FIRST EPSS data, scores them with environment-aware risk, tracks per-finding resolutions across rescans, and surfaces the results in the LimaCharlie web app and via the extension API.
 
-It is the first consumer of the canonical [`lc:asset:*` tag namespace](../../../2-sensors-deployment/asset-tags.md): asset criticality, exposure, environment, owner, and compliance tags are read directly off the sensors and used to prioritize findings, scope filters, and parameterize remediation SLAs.
+It is the first consumer of the canonical [`lc:asset:*` tag namespace](../../../2-sensors-deployment/asset-tags.md): asset criticality, exposure, environment, owner, and compliance tags are read directly off the sensors and used to prioritize findings and scope filters.
 
 ## What it does
 
@@ -10,7 +10,7 @@ It is the first consumer of the canonical [`lc:asset:*` tag namespace](../../../
 2. **CVE resolution.** Inventories are sent to `cve.limacharlie.io`, which maps each `(package_name, package_version)` pair to the set of CVEs that affect it.
 3. **Enrichment.** Each CVE is joined against CISA KEV and FIRST EPSS via `cve.limacharlie.io/enrich`. KEV / EPSS / criticality multiplier are folded into a 0-100 [LC Risk](#lc-risk) score that is persisted on every finding row.
 4. **Resolutions.** Every finding is implicitly **open** unless an operator records a resolution: `mitigated`, `accepted`, or `false_positive`. Resolutions are keyed by a deterministic [fingerprint](#finding-fingerprint) so they survive rescans.
-5. **Daily scans.** A per-org daily tick runs four jobs: KEV-match emission, SLA-breach warning, open-finding snapshot for the burndown tile, and EPSS-percentile snapshot for the per-CVE history sparkline.
+5. **Daily scans.** A per-org daily tick runs three jobs: KEV-match emission, open-finding snapshot for the burndown tile, and EPSS-percentile snapshot for the per-CVE history sparkline.
 6. **Surfacing.** Findings are exposed via the LimaCharlie web app's Vulnerabilities page (KPI strip, trend tiles, filter chip-bar, KEV/EPSS columns, LC Risk score, lifecycle chips, CVE / asset detail pages, exec / compliance / remediation reports) and via the extension API ([API Actions](#api-actions)).
 
 The extension is stateless aside from the per-org Spanner-backed tables (`vuln_reports`, `vuln_finding_state`, `vuln_daily_snapshots`, `vuln_epss_history`, plus rollup tables) and a small `org_value` keyed at `ext_vuln_kev_known_set`.
@@ -39,7 +39,6 @@ The configuration is edited on the extension page in the LimaCharlie web app. Al
 | Field | Type | Default | Description |
 |-------|------|---------|-------------|
 | `scan_mode` | enum | `scheduled` | One of `scheduled`, `manual`, `all`. See [Setup](#setup). |
-| `sla_windows_hours` | object | `{"critical":168,"high":720,"medium":2160,"low":4320}` | Per-criticality remediation deadlines in hours. Keys must be one of `critical` / `high` / `medium` / `low`; values must be positive integers. Unset keys fall back to the defaults (1 week / 30 days / 90 days / 180 days, matching common NIST and FedRAMP baselines). |
 | `criticality_tag_overrides` | object | `{}` | Map of `{your-tag → canonical-bucket}` for organizations that already run their own asset-tag taxonomy. See [Asset Metadata](#asset-metadata). |
 
 ### Example
@@ -47,12 +46,6 @@ The configuration is edited on the extension page in the LimaCharlie web app. Al
 ```json
 {
   "scan_mode": "scheduled",
-  "sla_windows_hours": {
-    "critical": 24,
-    "high": 168,
-    "medium": 720,
-    "low": 2160
-  },
   "criticality_tag_overrides": {
     "crown-jewel": "critical",
     "tier-1": "high",
@@ -61,13 +54,13 @@ The configuration is edited on the extension page in the LimaCharlie web app. Al
 }
 ```
 
-`criticality_tag_overrides` is consulted only when a sensor carries no canonical `lc:asset:criticality:*` tag. Explicit canonical tags always win, so an organization can migrate gradually. Override values must be canonical buckets; any other value is rejected at write time. `sla_windows_hours` keys must be canonical buckets and values must be positive integers; partial maps are valid (unset keys fall back to defaults).
+`criticality_tag_overrides` is consulted only when a sensor carries no canonical `lc:asset:criticality:*` tag. Explicit canonical tags always win, so an organization can migrate gradually. Override values must be canonical buckets; any other value is rejected at write time.
 
 ## Asset metadata
 
 The extension reads sensor tags in the [`lc:asset:*` namespace](../../../2-sensors-deployment/asset-tags.md) and uses them to:
 
-- **Prioritize findings.** `lc:asset:criticality:*` is the multiplier in the LC Risk score and the source for the per-criticality SLA window.
+- **Prioritize findings.** `lc:asset:criticality:*` is the multiplier in the LC Risk score.
 - **Scope filters.** `lc:asset:env:*` and `lc:asset:exposure:*` populate filter chips on the Vulnerabilities page.
 - **Surface compliance views.** `lc:asset:compliance:*` is multi-value; an asset can carry several regimes.
 - **Route assignments.** `lc:asset:owner:*` is exposed on the asset detail page so downstream workflows (Cases, Outputs to Slack/Jira/etc.) have the routing target available.
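Review note: the precedence rule kept by the hunk above (canonical `lc:asset:criticality:*` tag first, `criticality_tag_overrides` only as a fallback, override values limited to canonical buckets) can be sketched as below. This is a minimal illustration and not the extension's code. The function names are hypothetical, and skipping malformed canonical tags, taking the first matching override, and returning `None` when nothing matches are all assumptions.

```python
from typing import Iterable, Mapping, Optional

CANONICAL_BUCKETS = ("critical", "high", "medium", "low")
CRITICALITY_PREFIX = "lc:asset:criticality:"


def validate_overrides(overrides: Mapping[str, str]) -> None:
    """Write-time check: every override value must be a canonical bucket."""
    for tag, bucket in overrides.items():
        if bucket not in CANONICAL_BUCKETS:
            raise ValueError(f"override {tag!r} maps to non-canonical bucket {bucket!r}")


def resolve_criticality(tags: Iterable[str], overrides: Mapping[str, str]) -> Optional[str]:
    """Return the criticality bucket for a sensor, or None if none applies."""
    tags = list(tags)
    # 1. An explicit canonical tag always wins.
    for tag in tags:
        if tag.startswith(CRITICALITY_PREFIX):
            bucket = tag[len(CRITICALITY_PREFIX):]
            if bucket in CANONICAL_BUCKETS:  # assumption: malformed values are skipped
                return bucket
    # 2. Overrides are consulted only when no canonical tag was found.
    for tag in tags:
        if tag in overrides:
            return overrides[tag]
    return None  # assumption: no default bucket is applied here
```

With the example configuration, a sensor tagged only `crown-jewel` resolves to `critical`, while a sensor that also carries `lc:asset:criticality:low` resolves to `low`.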
@@ -96,12 +89,12 @@ Tags with malformed values for the closed-set fields (`criticality`, `exposure`,
 
 A finding has exactly one of two postures: **open** (the default — there is no resolution row) or **resolved** (a row exists in `vuln_finding_state` carrying one of three resolutions). Resolutions are keyed by a [fingerprint](#finding-fingerprint), so they survive rescans.
 
-| Posture | `resolution` | Description | Counts against SLA |
-|---------|--------------|-------------|--------------------|
-| open | — (no row) | New finding. Implicit; nothing is persisted. | Yes |
-| resolved | `mitigated` | Compensating control in place; finding is no longer counted as exploitable. Sets `resolved_at` (used by MTTR). | No |
-| resolved | `accepted` | Risk has been formally accepted as an exception, optionally with an `expires_at`. | No, until `expires_at` lapses |
-| resolved | `false_positive` | Confirmed not applicable (resolver mis-mapped the package, etc.). | No |
+| Posture | `resolution` | Description |
+|---------|--------------|-------------|
+| open | — (no row) | New finding. Implicit; nothing is persisted. |
+| resolved | `mitigated` | Compensating control in place; finding is no longer counted as exploitable. Sets `resolved_at` (used by MTTR). |
+| resolved | `accepted` | Risk has been formally accepted as an exception, optionally with an `expires_at`. Lapses back into the open count when `expires_at` is in the past. |
+| resolved | `false_positive` | Confirmed not applicable (resolver mis-mapped the package, etc.). |
 
 The resolution row carries six columns: `resolution`, `expires_at`, `case_number`, `resolved_at`, `resolved_by`, `updated_at`. There is no per-finding audit log — re-running `set_finding_resolution` overwrites in place. To **reopen** a finding, call `set_finding_resolution` with `resolution: null`; this deletes the row.
 
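Review note: the revised table folds the old "Counts against SLA" column into a single rule, where a finding counts as open when it has no resolution row or when its `accepted` row has an `expires_at` in the past. A minimal sketch of that rule follows, assuming the row is available as a dict keyed by the column names listed above and that `expires_at` is an RFC3339 string. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC3339 timestamp; a trailing 'Z' is normalized for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def counts_as_open(row: Optional[Mapping[str, Any]], now: datetime) -> bool:
    """True when the finding belongs in the open count at time `now`."""
    if row is None:
        return True  # no resolution row: implicitly open
    if row.get("resolution") == "accepted":
        expires_at = row.get("expires_at")
        # A lapsed acceptance counts as open again; the row itself is not mutated.
        if expires_at and parse_rfc3339(expires_at) < now:
            return True
    return False  # mitigated, false_positive, or a still-valid acceptance


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
lapsed = {"resolution": "accepted", "expires_at": "2024-12-31T00:00:00Z"}
print(counts_as_open(lapsed, now))
```

An acceptance without `expires_at` never lapses, so it stays out of the open count until the row is changed or deleted.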
@@ -219,14 +212,13 @@ A bucket with zero findings is still written on a quiet day so the burndown spar
 
 ### Daily Update tick
 
-The platform scheduler (`legion_extension_manager` / `legion_scheduler`'s `ext-update-event` cron) fires `EventTypes.Update` once per subscribed org per day, spread across 24h via `MultiplexOID`. The handler runs four scans sequentially with an independent 60-second timeout per scan; one scan's failure does not suppress the others.
+The platform scheduler (`legion_extension_manager` / `legion_scheduler`'s `ext-update-event` cron) fires `EventTypes.Update` once per subscribed org per day, spread across 24h via `MultiplexOID`. The handler runs three scans sequentially with an independent 10-minute timeout per scan; one scan's failure does not suppress the others.
 
 | Order | Scan | Output |
 |-------|------|--------|
 | 1 | `kev_match` | Emits `vuln_finding.kev_match` for CVEs that just entered KEV AND for which the org still has open findings. Diffed against an `org_value` "previously-known KEV set". |
-| 2 | `sla_breach` | Emits `vuln_finding.sla_breach_warning` for findings whose age > `sla_window - 3 days` AND that have no `vuln_finding_state` row, or whose row is a lapsed acceptance. |
-| 3 | `daily_snapshot` | Writes the per-severity open / KEV counts for today (see [Daily snapshots](#daily-snapshots)). |
-| 4 | `epss_history` | Writes one EPSS row per distinct org CVE for today (see [EPSS history](#epss-history-90-day-series)). |
+| 2 | `daily_snapshot` | Writes the per-severity open / KEV counts for today (see [Daily snapshots](#daily-snapshots)). |
+| 3 | `epss_history` | Writes one EPSS row per distinct org CVE for today (see [EPSS history](#epss-history-90-day-series)). |
 
 The handler also re-reconciles D&R rules (idempotent) so a config change picks up on the next tick without requiring a manual re-subscribe.
 
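Review note: the control flow described in the hunk above (three scans run in order, each under its own 10-minute timeout, with one failure not suppressing the rest) could look roughly like the sketch below. It is an illustration only and makes no claim about how the real handler is written. `run_daily_tick`, the `Scan` alias, and the result strings are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Scan = Callable[[], Awaitable[None]]


async def run_daily_tick(
    scans: List[Tuple[str, Scan]], timeout_s: float = 600.0
) -> Dict[str, str]:
    """Run scans one after another, each with an independent timeout."""
    results: Dict[str, str] = {}
    for name, scan in scans:
        try:
            # The timeout applies to this scan only, not to the tick as a whole.
            await asyncio.wait_for(scan(), timeout=timeout_s)
            results[name] = "ok"
        except asyncio.TimeoutError:
            results[name] = "timeout"
        except Exception as exc:
            # A failing scan is recorded and the loop moves on to the next one.
            results[name] = f"error: {exc}"
    return results
```

Calling it with `[("kev_match", ...), ("daily_snapshot", ...), ("epss_history", ...)]` preserves the documented order, and the returned dict shows which scans completed.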
@@ -281,6 +273,7 @@ The full request and response schemas live in the extension's `requestSchema()`
 | `scan_packages` | Trigger an out-of-band `os_packages` scan against a specific sensor. |
 | `set_finding_resolution` | Set or clear a finding's resolution. Pass `resolution: null` to reopen (delete the row). |
 | `bulk_set_finding_resolution` | Apply a resolution change across up to 100 findings in one call. |
+| `reset_asset_findings` | Wipe every stored finding for one sensor (for reformat / reimage / decommission). Org-scope fingerprints that were only on this sensor fire `vuln_finding.closed`. |
 
 ### Internal action
 
@@ -600,6 +593,22 @@ Trigger an out-of-band scan for one sensor:
 
 Returns immediately; the scan completes asynchronously when the sensor reports back via the ingest D&R rule.
 
+#### `reset_asset_findings`
+
+Wipe every stored finding for one sensor. Use when the host has been reformatted, reimaged, or decommissioned and the existing findings no longer reflect reality. The next legitimate package scan repopulates findings from scratch.
+
+```json
+{ "sid": "550e8400-..." }
+```
+
+Response carries the number of org-scope fingerprints that the reset cleared from the org entirely (one `vuln_finding.closed` event fires per cleared fingerprint):
+
+```json
+{ "data": { "sid": "550e8400-...", "closed": 17 } }
+```
+
+Side effect: `vuln_endpoint_scans.last_scan_at` is stamped to the reset time, matching the semantic "operator declared this asset clean at this time". The next real package scan overwrites it.
+
 ## Events emitted
 
 The extension emits the following events through LimaCharlie's standard webhook adapter. Customers route them via Outputs to Jira, Slack, Cases, PagerDuty, etc.
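Review note: the `closed` count in the new `reset_asset_findings` response covers only fingerprints that no other sensor still holds. The sketch below models that semantic over an in-memory map of sensor ID to fingerprints. It is a toy model built on assumptions, not the Spanner-backed implementation, and the function name and data shape are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reset_asset_findings_model(
    findings_by_sid: Dict[str, Set[str]], sid: str
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Drop one sensor's findings and list fingerprints now gone org-wide."""
    removed = findings_by_sid.get(sid, set())
    # Everything except the reset sensor survives untouched.
    remaining = {s: set(fps) for s, fps in findings_by_sid.items() if s != sid}
    still_present: Set[str] = set().union(*remaining.values())
    # Only fingerprints held by no other sensor would fire vuln_finding.closed.
    closed = sorted(removed - still_present)
    return remaining, closed


state = {"sid-a": {"fp1", "fp2"}, "sid-b": {"fp2", "fp3"}}
remaining, closed = reset_asset_findings_model(state, "sid-a")
response = {"data": {"sid": "sid-a", "closed": len(closed)}}
print(response)
```

Here `fp2` is still present on `sid-b`, so only `fp1` counts toward `closed`.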
@@ -607,9 +616,9 @@ The extension emits the following events through LimaCharlie's standard webhook
 | Event | When fired | Notable fields |
 |-------|-----------|----------------|
 | `vuln_finding.created` | A new finding lands for an asset (rescan write path detected a new `(oid, fingerprint)` tuple). | `cve`, `severity`, `score`, `sid`, `hostname`, `kev`, `epss`, `first_seen` |
+| `vuln_finding.closed` | The last sensor holding `(cve, normalized_package_name)` cleared it on a rescan, so the org-scope fingerprint is gone. Also fires per cleared fingerprint when `reset_asset_findings` wipes a host. | `cve`, `severity`, `score`, `sid`, `hostname`, `fingerprint` |
 | `vuln_finding.kev_match` | A CVE just entered CISA KEV AND the org still has at least one open finding for it. | `cve`, `kev`, `epss` |
-| `vuln_finding.resolution_changed` | `set_finding_resolution` / `bulk_set_finding_resolution` succeeded (including reopens, where the carried `resolution` is `null`). | `fingerprint`, `scope`, `resolution`, `expires_at`, `case_number`, `resolved_at`, `resolved_by` |
-| `vuln_finding.sla_breach_warning` | Finding age > `sla_window - 3 days` AND there is no active resolution row (or the row is a lapsed acceptance). Fires once per finding per daily tick. | `cve`, `severity`, `sid`, `hostname`, `first_seen`, `extra.criticality`, `extra.days_to_deadline` |
+| `vuln_finding.state_changed` | `set_finding_resolution` / `bulk_set_finding_resolution` succeeded (including reopens, where the embedded resolution row carries `scope` + `fingerprint` + `updated_at` and the resolution-related fields are nil). | `fingerprint`, embedded `resolution` row (`scope`, `resolution`, `expires_at`, `case_number`, `resolved_at`, `resolved_by`, `updated_at`) |
 
 Every event carries `event_type`, `oid`, and an optional `fingerprint`. Event delivery is best-effort: a failed webhook is logged at warn level and does not roll back the underlying state mutation.
 
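Review note: a downstream consumer of the four events in the revised table might dispatch on `event_type`, following the typical wiring this diff adds under "Integrating with downstream Outputs". The sketch below is a hypothetical receiver-side helper. The destination labels are placeholders, and it assumes `event_type` carries the event name exactly as written in the table and that `severity` uses lowercase bucket names.

```python
from typing import Any, Dict, List, Mapping

# Destination labels are placeholders for whatever Outputs an org has configured.
ROUTES: Dict[str, List[str]] = {
    "vuln_finding.kev_match": ["slack", "pager"],
    "vuln_finding.created": ["ticketing"],
    "vuln_finding.state_changed": ["siem"],
    "vuln_finding.closed": ["ticketing", "siem"],
}


def route_event(event: Mapping[str, Any], created_severity: str = "critical") -> List[str]:
    """Return the destinations for one event; unknown event types go nowhere."""
    event_type = event.get("event_type")
    # `created` is high-volume on first scan, so it is filtered by severity here.
    if event_type == "vuln_finding.created" and event.get("severity") != created_severity:
        return []
    return list(ROUTES.get(event_type, []))


print(route_event({"event_type": "vuln_finding.closed", "oid": "example-oid"}))
```

Because delivery is best-effort, a consumer built this way should tolerate missed events rather than treat the feed as a complete log.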
@@ -673,7 +682,7 @@ Concrete operator playbooks. Each workflow is a numbered sequence; substitute `<
     - Affected hosts (`query_cve_vuln_hosts`) — what asset criticality / exposure mix?
 3. From the CVE detail page click **Run a hunt** — the deeplink seeds an LCQL hunt with the CVE context for live investigation.
 4. Decide:
-    - If a compensating control is in place → **Set resolution → mitigated**. `resolved_at` is stamped and the finding stops counting against SLA.
+    - If a compensating control is in place → **Set resolution → mitigated**. `resolved_at` is stamped and the finding drops out of the open count.
     - If the operator is going to actively patch → leave the finding as `open` (no resolution row); the burndown sparkline tracks remediation by attrition (the rescan removes the row when the patch lands).
     - If business has formally accepted the risk → **Set resolution → accepted** with an optional `expires_at`.
 
@@ -689,7 +698,7 @@ When a finding cannot be patched in time and the business formally accepts the r
 
 1. CVE row → **Set resolution → accepted**.
 2. Optionally set `expires_at` (RFC3339, in the future). An accepted resolution without an `expires_at` never lapses.
-3. The finding stops counting against the SLA until `expires_at`. When `expires_at` is in the past the UI derives a **lapsed acceptance** signal at read time — the row renders with the same urgency as an open finding so the operator knows to revisit.
+3. The finding drops out of the open count until `expires_at`. When `expires_at` is in the past the UI derives a **lapsed acceptance** signal at read time — the row renders with the same urgency as an open finding so the operator knows to revisit, and the daily snapshot counts it back as open.
 4. To extend, call `set_finding_resolution` again with a new `expires_at` (the row is upserted in place; `resolved_at` and `resolved_by` are refreshed).
 5. To formally close, transition to `mitigated` once the patch lands. To reopen, pass `resolution: null` (deletes the row).
 
@@ -747,7 +756,7 @@ curl -s -X POST "$LC_API/v1/extension/request/ext-vulnerability-reporting" \
 | Patching is in flight | (leave as `open`; the rescan removes the row when the patch lands) |
 | Compensating control blocks exploitation; finding no longer counts as exploitable | `mitigated` |
 | Patch lands, rescan confirms gone | (no action — the row drops out of `vuln_reports` on the next scan) |
-| Cannot patch by the SLA window, business-accepted exception | `accepted` (optional `expires_at`) |
+| Cannot patch in the desired window, business-accepted exception | `accepted` (optional `expires_at`) |
 | Resolver false positive (wrong product / wrong version) | `false_positive` |
 
 Do not use `mitigated` for "patch in progress" — `resolved_at` is stamped on entry, which would skew MTTR. Leave the finding open until the patch lands or a compensating control is documented.
@@ -771,28 +780,28 @@ Drive these via `limacharlie tag mass-add` keyed off existing infrastructure tag
 
 CVSS severity is environment-blind: a CVSS 9.8 critical scores the same on a dev laptop and on the customer-facing API gateway. LC Risk corrects for that by multiplying in the asset-criticality bucket — a `low` host caps roughly half the score, and a `critical` host inflates it by 60%. Always sort by LC Risk first; fall back to CVSS only when explaining the score externally.
 
-### SLA configuration
+### Tracking remediation deadlines
 
-Defaults match common compliance baselines (NIST SP 800-40 Rev. 4, FedRAMP Continuous Monitoring): 1 week / 30 days / 90 days / 180 days for critical / high / medium / low. Tighten as needed:
+The extension does not enforce a built-in remediation SLA — there is no SLA-window configuration, no per-criticality deadline persisted on findings, and no SLA-breach event. The signals that **are** available for cadence-tracking are:
 
-```json
-{ "sla_windows_hours": { "critical": 24, "high": 72 } }
-```
+- `vuln_finding.created` — for stamping a target deadline at ingest in your ticketing system.
+- `first_seen_at` (returned on every finding row) — the basis any external SLA calculation should key off.
+- `vuln_finding.closed` and `mitigated`-resolution `vuln_finding.state_changed` events — for closing the loop and computing MTTR externally.
 
-Partial maps are fine — unset keys keep their default. The `sla_breach_warning` event fires when remaining time falls under 3 days; orgs that want a longer warning horizon should tighten the SLA itself rather than rely on the warning window.
+If you need deadline alerts, wire the per-criticality clock in your downstream system (Jira / Linear / PagerDuty) using `first_seen_at` + `criticality` from the event payload.
 
 ### "Near-real-time" expectations
 
-The daily Update tick is per-org, spread across 24h. KEV-match alerts, SLA-breach warnings, and snapshot writes can lag by up to a full day for any one org. This is deliberate — the cron's load is steady rather than spiking. If you need sub-hour KEV detection, subscribe to the upstream CISA RSS feed in addition to this extension.
+The daily Update tick is per-org, spread across 24h. KEV-match alerts and snapshot writes can lag by up to a full day for any one org. This is deliberate — the cron's load is steady rather than spiking. If you need sub-hour KEV detection, subscribe to the upstream CISA RSS feed in addition to this extension.
 
 ### Integrating with downstream Outputs
 
-Route the four `vuln_finding.*` events to your existing alerting pipeline. A typical wiring:
+Route the `vuln_finding.*` events to your existing alerting pipeline. A typical wiring:
 
 - `vuln_finding.kev_match` → Slack `#vuln-priority` + page on-call.
-- `vuln_finding.sla_breach_warning` → ticketing system (Jira, Linear). Use `extra.days_to_deadline` to escalate.
-- `vuln_finding.resolution_changed` → SIEM. Each event carries the full resolution row, so a downstream consumer can rebuild the change feed without a follow-up read.
-- `vuln_finding.created` → optional; high-volume on first scan, often filtered to `severity=critical` only.
+- `vuln_finding.created` → ticketing system (Jira, Linear) so each new finding gets a tracked owner. High-volume on first scan; often filtered to `severity=critical` only.
+- `vuln_finding.state_changed` → SIEM. Each event carries the full resolution row, so a downstream consumer can rebuild the change feed without a follow-up read.
+- `vuln_finding.closed` → ticketing system (auto-resolve the matching ticket) and SIEM.
 
 ## Glossary
 
@@ -806,7 +815,6 @@ Route the four `vuln_finding.*` events to your existing alerting pipeline. A typ
 | **EPSS** | FIRST.org's Exploit Prediction Scoring System. Per-CVE probability + percentile of in-the-wild exploitation in the next 30 days. |
 | **LC Risk** | LimaCharlie's 0-100 environment-aware risk score. Persisted per-finding; see [LC Risk](#lc-risk). |
 | **MTTR** | Mean Time To Remediation. Computed from `(resolved_at - first_seen_at)` per severity bucket. |
-| **SLA** | Service Level Agreement. The per-criticality remediation deadline configured via `sla_windows_hours`. |
 | **Criticality** | Asset importance bucket (`critical`/`high`/`medium`/`low`). Source: `lc:asset:criticality:*` or the configured override map. |
 | **Exposure** | Network reachability bucket (`internet-facing`/`dmz`/`internal`). Source: `lc:asset:exposure:*`. |
 | **Env** | Environment bucket (`prod`/`staging`/`dev`/`test`). Source: `lc:asset:env:*`. |
@@ -814,9 +822,8 @@ Route the four `vuln_finding.*` events to your existing alerting pipeline. A typ
 | **Scope** | `org` (per-package, applies to every host) or `host` (per-package-per-sensor). Resolution precedence: host beats org. |
 | **Resolution** | Replaces the older `state` term. A finding is either implicitly **open** (no row) or **resolved** with `resolution ∈ { mitigated, accepted, false_positive }`. See [Lifecycle states](#lifecycle-states). |
 | **Lapsed acceptance** | An `accepted` resolution whose `expires_at` is in the past. Derived in the UI as `resolution === 'accepted' && expires_at < now`; the row is **not** mutated. |
-| **Daily Update tick** | Per-org per-day cron firing the four daily scans. Spread across 24h. See [Daily Update tick](#daily-update-tick). |
+| **Daily Update tick** | Per-org per-day cron firing the three daily scans. Spread across 24h. See [Daily Update tick](#daily-update-tick). |
 | **KEV match** | The event fired when a CVE just entered KEV AND the org still has an open finding for it. |
-| **SLA breach warning** | The event fired when a finding's age > `sla_window - 3 days` AND there is no active resolution row (or the row is a lapsed acceptance). |
 | **Case number** | Optional integer on a resolution row reserved for upcoming ext-cases linkage. Plumbed through the API today; not surfaced in the UI yet. |
 
 ## Reachability (deferred)