diff --git a/docs/concepts/architecture/design/vulnerability-analysis.md b/docs/concepts/architecture/design/vulnerability-analysis.md index 684b824b..f445ea34 100644 --- a/docs/concepts/architecture/design/vulnerability-analysis.md +++ b/docs/concepts/architecture/design/vulnerability-analysis.md @@ -3,7 +3,7 @@ ## Overview The vulnerability analysis system identifies known vulnerabilities in a project's components -by coordinating multiple vulnerability analyzers via a [durable workflow](durable-execution.md). +by coordinating a fleet of vulnerability analyzers via a [durable workflow](durable-execution.md). ## Granularity @@ -23,30 +23,36 @@ The unit-of-work being a project further has the following, non-functional benef ## Triggers -Vulnerability analysis can be triggered in three ways: +Three triggers can start vulnerability analysis. Each trigger creates a run +of the outer `analyze-project` workflow, which calls the `vuln-analysis` workflow +described in this document, followed by project policy evaluation and metrics update. -| Trigger | Workflow Instance ID | Concurrency Key | Priority | -|:-----------|:----------------------------------------|:------------------------------|:------------| -| Scheduled | `scheduled-vuln-analysis:` | `vuln-analysis:` | 0 (default) | -| BOM upload | *(none)* | `vuln-analysis:` | 50 | -| Manual | `manual-vuln-analysis:` | `vuln-analysis:` | 75 | +| Trigger | Workflow instance ID | Concurrency key | Priority | +|:-----------|:--------------------------------------------------|:--------------------------------|:------------| +| Scheduled | `analyze-project-scheduled:` | `analyze-project:` | 0 (default) | +| BOM upload | `analyze-project:bom-upload:` | `analyze-project:` | 50 | +| Manual | `analyze-project-manual:` | `analyze-project:` | 75 | -All triggers share the same concurrency key pattern, serializing analysis runs per project -regardless of how they were initiated. Only one run per project can be active at a time. 
-This prevents data races and excessive resource utilization. +All triggers share the same concurrency key pattern, which serializes analysis runs per project +regardless of the trigger. Only one run per project can be active at a time. +This prevents data races and excessive resource use. -BOM uploads intentionally omit an instance ID. This ensures that every upload -results in an analysis, even if multiple uploads occur in quick succession. -Scheduled and manual triggers use instance IDs to deduplicate concurrent requests -for the same project. +BOM upload instance IDs include the per-upload token, so every upload results in a +separate run even when uploads arrive in quick succession for the same project. +Scheduled and manual triggers use project-scoped instance IDs to deduplicate concurrent +requests for the same project. -Higher priority values are processed first, so manual triggers (75) take precedence +The engine processes higher priority values first, so manual triggers (75) take precedence over BOM uploads (50), which take precedence over scheduled runs (0). +When `analyze-project` invokes `vuln-analysis`, it sets the nested workflow's concurrency +key to `vuln-analysis:`. This isolates the vulnerability analysis stage from +the surrounding policy and metrics stages. + Refer to the [durable execution](durable-execution.md) documentation for details -on how concurrency keys, instance IDs, and priorities are enforced by the engine. +on how the engine enforces concurrency keys, instance IDs, and priorities. -## Workflow Execution +## Workflow execution The `vuln-analysis` workflow orchestrates the full analysis lifecycle: @@ -82,22 +88,22 @@ sequenceDiagram ### Preparation -| Activity | Task Queue | +| Activity | Task queue | |:------------------------|:-----------| | `prepare-vuln-analysis` | `default` | -1. Determines which analyzers are applicable for the project by querying all enabled +1. 
Determines which analyzers are applicable for the project by querying all active analyzer instances for their requirements. 2. Aggregates requirements across all analyzers and assembles a CycloneDX BOM containing the project's components with the necessary data (CPEs, PURLs, etc.). 3. Stores the BOM to file storage. -If no analyzers are enabled, or the project has no analyzable components, +If no analyzers are active, or the project has no analyzable components, the workflow terminates early. -### Analyzer Invocation +### Analyzer invocation -| Activity | Task Queue | +| Activity | Task queue | |:-----------------------|:----------------| | `invoke-vuln-analyzer` | `vuln-analyses` | @@ -110,9 +116,9 @@ the workflow terminates early. Each analyzer invocation is a separate activity, enabling independent retries and concurrent execution across analyzers. -### Analyzer Result Reconciliation +### Analyzer result reconciliation -| Activity | Task Queue | +| Activity | Task queue | |:----------------------------------|:--------------------------------| | `reconcile-vuln-analysis-results` | `vuln-analysis-reconciliations` | @@ -123,194 +129,71 @@ Reconciliation performs the following operations, in order: 1. Merging duplicate vulnerability reports across VDRs. 2. Synchronizing vulnerabilities to the database (that is, creating or updating them). -3. Synchronizing vulnerability alias assertions (see [Alias Synchronization](#alias-synchronization)). -4. Creating and soft-deleting finding attributions (see ADR-013). +3. Synchronizing vulnerability alias assertions (see [Alias synchronization](#alias-synchronization)). +4. Creating and soft-deleting finding attributions (see [ADR-013][adr-013]). 5. Evaluating vulnerability policies for active findings. 6. Emitting notifications (see [Notifications](#notifications)). -All database changes are batched and committed in a single transaction. +The activity batches all database changes and commits them in a single transaction. 
This ensures that changes are atomic, and the activity is idempotent. -### File Deletion +### File deletion -| Activity | Task Queue | +| Activity | Task queue | |:---------------|:-----------| | `delete-files` | `default` | Deletes the BOM file, all VDR files, and the context file (if present) from file storage. -## Analyzer Extension Point - -Analyzers are pluggable extension points. Their API surface consists of the following interfaces: - -???- abstract "VulnAnalyzerFactory" - ```java linenums="1" - package org.dependencytrack.vulnanalysis.api; - - import org.dependencytrack.plugin.api.ExtensionFactory; - - import java.util.EnumSet; - - public interface VulnAnalyzerFactory extends ExtensionFactory { - - /** - * @return Whether the analyzer is enabled. - */ - boolean isEnabled(); - - /** - * Declares which component data the analyzer needs to perform its analysis. - *

- * For example, an analyzer that queries the NVD by CPE would return - * {@link VulnAnalyzerRequirement#COMPONENT_CPE}. - *

- * Requirements are aggregated across all enabled analyzers. The resulting BOM passed to - * {@link VulnAnalyzer#analyze(org.cyclonedx.proto.v1_6.Bom)} may thus contain more - * data than any single analyzer requested. Requirements are satisfied on a best-effort basis, - * and components provided to analyzers may lack the requested fields. - *

- * Note that group, name, and version is always provided for all components. - * - * @return Requirements for this analyzer. - */ - EnumSet analyzerRequirements(); - - } - ``` - -???- abstract "VulnAnalyzer" - ```java linenums="1" - package org.dependencytrack.vulnanalysis.api; - - import org.cyclonedx.proto.v1_6.Bom; - import org.cyclonedx.proto.v1_6.VulnerabilityAffects; - import org.dependencytrack.plugin.api.ExtensionPoint; - import org.dependencytrack.plugin.api.ExtensionPointSpec; - - @ExtensionPointSpec(name = "vuln-analyzer", required = false) - public interface VulnAnalyzer extends ExtensionPoint { - - /** - * Analyzes the given BOM for vulnerabilities. - * - *

Input

- *

- * The input is a CycloneDX BOM representing a project's components. - * Components MAY have the fields indicated by the analyzer's - * {@link VulnAnalyzerFactory#analyzerRequirements()}, but this is not guaranteed. - * Components can have more or fewer fields. It is the responsibility of the analyzer - * to determine which components it can work with and which it should ignore. - *

- * Components may include a {@code dependencytrack:internal:is-internal-component} property. - * When present, the component is internal and its data MUST NOT be sent to external services. - * The mere presence of the property is suffices, the value is irrelevant. Example: - *

{@code
-       * {
-       *   "components": [
-       *     {
-       *       "bomRef": "ab84cf35-82a1-4341-a70f-0e8c9138e3c4",
-       *       "type": "CLASSIFICATION_LIBRARY",
-       *       "name": "acme-lib",
-       *       "version": "1.0.0",
-       *       "purl": "pkg:maven/com.acme/acme-lib@1.0.0",
-       *       "properties": [
-       *         {
-       *           "name": "dependencytrack:internal:is-internal-component"
-       *         }
-       *       ]
-       *     },
-       *     {
-       *       "bomRef": "cd72ef49-93b2-4452-b81e-1a9249fce4b5",
-       *       "type": "CLASSIFICATION_LIBRARY",
-       *       "name": "jackson-databind",
-       *       "version": "2.18.0",
-       *       "purl": "pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.18.0",
-       *       "cpe": "cpe:2.3:a:fasterxml:jackson-databind:2.18.0:*:*:*:*:*:*:*"
-       *     }
-       *   ]
-       * }
-       * }
- * - *

Output

- *

- * The output is a CycloneDX VDR. It must contain {@code vulnerabilities} with - * {@link VulnerabilityAffects} entries referencing affected components via their {@code bomRef}. - * BOM refs should be treated as opaque strings, and analyzers should not make assumptions - * about their format. Example: - *

{@code
-       * {
-       *   "vulnerabilities": [
-       *     {
-       *       "id": "CVE-2024-1234",
-       *       "source": {
-       *         "name": "NVD",
-       *         "url": "https://nvd.nist.gov/"
-       *       },
-       *       "affects": [
-       *         {
-       *           "ref": "cd72ef49-93b2-4452-b81e-1a9249fce4b5"
-       *         }
-       *       ]
-       *     }
-       *   ]
-       * }
-       * }
- *

- * Vulnerabilities MAY include a {@code dependency-track:vuln:reference-url} property, - * containing a URL that links to the analyzer-specific advisory or issue page for the - * vulnerability. Example: - *

{@code
-       * {
-       *   "vulnerabilities": [
-       *     {
-       *       "id": "CVE-2024-1234",
-       *       "source": {
-       *         "name": "NVD"
-       *       },
-       *       "properties": [
-       *         {
-       *           "name": "dependency-track:vuln:reference-url",
-       *           "value": "https://security.snyk.io/vuln/SNYK-JAVA-EXAMPLE-1234"
-       *         }
-       *       ],
-       *       "affects": [
-       *         {
-       *           "ref": "cd72ef49-93b2-4452-b81e-1a9249fce4b5"
-       *         }
-       *       ]
-       *     }
-       *   ]
-       * }
-       * }
- * - * @param bom the CycloneDX BOM to analyze. - * @return A CycloneDX VDR containing discovered vulnerabilities. - */ - Bom analyze(Bom bom); - - } - ``` - -## File Storage - -The BOM and VDR files produced during analysis are stored via the `FileStorage` mechanism, +## Analyzer extension point + +Analyzers are pluggable via the [extensibility](extensibility.md) system. The API surface consists of two interfaces: + +* **`VulnAnalyzerFactory`**: Creates analyzer instances. Reports whether the analyzer is active, + and declares the component data the analyzer needs via a `VulnAnalyzerRequirement` set + (for example, `COMPONENT_CPE`, `COMPONENT_PURL`). +* **`VulnAnalyzer`**: Receives a CycloneDX BOM representing a project's components, + and returns a CycloneDX VDR describing the vulnerabilities found. + +The preparation step aggregates requirements across all active analyzers, so the BOM +each analyzer receives may carry more data than that analyzer requested. Requirements are +best-effort and components may lack the requested fields. Group, name, and version are always +present. Each analyzer decides which components it can work with and ignores the rest. + +In the returned VDR, `vulnerabilities[].affects[].ref` entries reference affected components +by their `bomRef`. Analyzers must treat `bomRef` values as opaque strings. + +### Internal components + +Components may carry a `dependencytrack:internal:is-internal-component` property. When present, +the component is internal and analyzers must not send its data to external services. The presence of +the property is enough. Its value is irrelevant. + +### Advisory reference URLs + +Vulnerabilities in the VDR may include a `dependency-track:vuln:reference-url` property pointing +to the analyzer-specific advisory or issue page for that vulnerability. + +## File storage + +The BOM and VDR files produced during analysis go through the `FileStorage` mechanism, which abstracts over an underlying storage backend. 
-File storage was chosen because: +The system uses file storage because: -* In-memory storage is not an option, as workflow execution may span multiple application nodes. +* In-memory storage is not an option, as workflow execution may span more than one node. * The respective artifacts are arbitrarily large. * Passing them directly via workflow and activity arguments would bloat workflow history. * Storing them as blobs in the database would strain the database with excessive I/O. -The `FileMetadata` returned by the storage provider is passed between activities, +The `FileMetadata` returned by the storage provider flows between activities, decoupling the workflow from storage specifics. -### File Naming +### File naming Files follow a deterministic naming scheme scoped to the workflow run: -| File | Name Pattern | +| File | Name pattern | |:--------|:---------------------------------------------------------| | BOM | `vuln-analysis//bom.proto` | | VDR | `vuln-analysis//vdr_.proto` | @@ -318,122 +201,121 @@ Files follow a deterministic naming scheme scoped to the workflow run: ## Resiliency -### Activity Retries +### Activity retries Analyzer invocations use the following retry policy: -| Parameter | Value | -|:---------------------|:------| -| Initial delay | 5 s | -| Delay multiplier | 2x | -| Randomization factor | 0.3 | -| Max delay | 5 m | -| Max attempts | 5 | +| Parameter | Value | +|:---------------------|:-----------| +| Initial delay | 5 s | +| Delay multiplier | 2x | +| Randomization factor | 0.3 | +| Max delay | 1 min | +| Max attempts | 5 | -This yields a maximum retry window of roughly 2-3 minutes before an analyzer -invocation is considered permanently failed. +This yields a retry window of roughly 2-3 minutes before an analyzer +invocation counts as permanently failed. 
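The retry arithmetic above can be checked with a minimal sketch. It uses the values from the added table (5 s initial delay, 2x multiplier, 1 min cap) and leaves out the randomization factor. Whether "max attempts" counts the first invocation is not stated here, so the number of delays is a parameter. `RetryWindowSketch` and its methods are hypothetical names for illustration, not part of the engine's API.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class RetryWindowSketch {

    // Nominal (un-jittered) delays: initial * multiplier^i, capped at maxDelay.
    static List<Duration> nominalDelays(Duration initial, double multiplier,
                                        Duration maxDelay, int delayCount) {
        List<Duration> delays = new ArrayList<>();
        double millis = initial.toMillis();
        for (int i = 0; i < delayCount; i++) {
            delays.add(Duration.ofMillis((long) Math.min(millis, maxDelay.toMillis())));
            millis *= multiplier;
        }
        return delays;
    }

    // Sum of all delays, i.e. the time spent waiting between attempts.
    static Duration total(List<Duration> delays) {
        return delays.stream().reduce(Duration.ZERO, Duration::plus);
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(5);
        Duration cap = Duration.ofMinutes(1);
        // Assumption: 5 attempts with 4 waits in between, or 5 waits if the
        // first invocation is not counted as an attempt.
        for (int delayCount : new int[] {4, 5}) {
            Duration sum = total(nominalDelays(initial, 2.0, cap, delayCount));
            System.out.println(delayCount + " delays: " + sum.getSeconds() + " s");
        }
    }
}
```

Under these assumptions the nominal waiting time is 75 s for four delays and 135 s for five. Jitter and the run time of each attempt come on top of that.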
-### Graceful Failure Handling +### Graceful failure handling When an analyzer fails (even after exhausting retries), the workflow catches the `ActivityFailureException` and records the analyzer as failed. The workflow continues -with results from successful analyzers. During reconciliation, findings attributed -to failed analyzers are preserved. Their attributions are not deleted, since the -absence of a result does not imply the absence of a vulnerability. +with results from successful analyzers. During reconciliation, the activity keeps +findings attributed to failed analyzers and does not delete their attributions, since +the absence of a result does not imply the absence of a vulnerability. -An `ANALYZER_ERROR` notification is emitted for each failed analyzer. +The workflow emits an `ANALYZER_ERROR` notification for each failed analyzer. -### File Cleanup +### File cleanup The `deleteFiles` step runs in both the success path and the error path (via catch + rethrow), -ensuring BOM and VDR files are cleaned up regardless of outcome. +which cleans up BOM and VDR files regardless of outcome. ## Reconciliation -During reconciliation, VDRs from all successful analyzers are processed, -and findings synchronized with the database. +During reconciliation, the activity processes VDRs from all successful analyzers and +synchronizes findings with the database. -### Vulnerability Merge +### Vulnerability merge -When multiple analyzers report the same vulnerability (identified by source and vulnerability ID), -only one report is used for synchronization. VDRs are processed in alphabetical +When more than one analyzer reports the same vulnerability (identified by source and vulnerability ID), +the activity uses only one report for synchronization. It processes VDRs in alphabetical analyzer name order, and the first report wins. 
The exception is when a later report carries a pre-populated database ID (only set by the internal analyzer), in which case it takes precedence, since the ID allows skipping database lookups during synchronization. -### Vulnerability Synchronization +### Vulnerability synchronization -Reported vulnerabilities are synchronized to the database using a three-tier strategy: +The activity synchronizes reported vulnerabilities to the database using a three-tier strategy: 1. Pre-populated IDs: Vulnerabilities whose database ID is already known (from the internal - analyzer) are returned directly without any database query. -2. Read-only resolution: Vulnerabilities that cannot be updated by analyzers (see - [Vulnerability Updates](#vulnerability-updates)) are resolved via a `SELECT` query, avoiding - the exclusive row locks that an upsert would acquire. -3. Upsert: Remaining vulnerabilities are inserted or updated via - `INSERT ... ON CONFLICT DO UPDATE`. + analyzer) are returned directly without any database query. +2. Read-only resolution: Vulnerabilities that analyzers cannot update (see + [Vulnerability updates](#vulnerability-updates)) are resolved via a `SELECT` query, which avoids + the exclusive row locks that an upsert would take. +3. Upsert: Remaining vulnerabilities go through `INSERT ... ON CONFLICT DO UPDATE`. -Vulnerabilities are processed in batches of 100. +The activity processes vulnerabilities in batches of 100. -### Vulnerability Updates +### Vulnerability updates An analyzer can update a vulnerability's data if: -* It is the authoritative source for that vulnerability type +* It serves as the authoritative source for that vulnerability type (for example, `oss-index` for `OSSINDEX`, `snyk` for `SNYK`), or -* The vulnerability source can be mirrored (for example, `NVD`, `GITHUB`, `OSV`), - and mirroring for that source is disabled. 
+* The vulnerability source supports mirroring (for example, `NVD`, `GITHUB`, `OSV`), + and an operator turned off mirroring for that source. -Vulnerabilities from the `INTERNAL` source are never modified by analyzers. +Analyzers never change vulnerabilities from the `INTERNAL` source. -In addition to the source-level checks, the upsert enforces two row-level guards: +Beyond the source-level checks, the upsert enforces two row-level guards: * Temporal: the incoming `UPDATED` timestamp must be strictly newer than the existing one. This prevents older data from overwriting newer data. -* Idempotency: all mutable fields are compared via `IS DISTINCT FROM`. - If the incoming data is identical to what is already stored, the row is not written. +* Idempotency: the upsert compares all mutable fields via `IS DISTINCT FROM`. + If the incoming data matches what's already stored, the row is not written. -### Alias Synchronization +### Alias synchronization -After vulnerability synchronization, alias assertions reported by each analyzer are -synchronized to the database. +After vulnerability synchronization, the activity synchronizes alias assertions reported +by each analyzer to the database. -Refer to ADR-014 for details on the alias schema and synchronization algorithm. +Refer to [ADR-014][adr-014] for details on the alias schema and synchronization algorithm. -### Finding Attributions +### Finding attributions Each finding (component and vulnerability pair) tracks which analyzers reported it via attributions. During reconciliation: -1. New attributions are created for newly reported findings. -2. Stale attributions are soft-deleted when an analyzer that previously reported - a finding no longer does. Attributions from *failed* analyzers are preserved. -3. A finding is considered active as long as it has at least one non-deleted attribution. +1. The activity creates new attributions for newly reported findings. +2. 
The activity soft-deletes stale attributions when an analyzer that previously reported + a finding no longer does. Attributions from *failed* analyzers stay intact. +3. A finding counts as active as long as it has at least one non-deleted attribution. -Refer to ADR-013 for further details on the attribution mechanism. +Refer to [ADR-013][adr-013] for further details on the attribution mechanism. -### Policy Evaluation +### Policy evaluation -After findings are reconciled, vulnerability policies are evaluated against +After the activity reconciles findings, it evaluates vulnerability policies against all active findings. Policy results can automatically set analysis states (for example, suppression), and any resulting audit changes emit notifications. -When a policy that previously matched a finding no longer does, the automatically -applied analysis state is reset to defaults. +When a policy that previously matched a finding no longer does, the automatically +applied analysis state resets to defaults. !!! tip - Policy analyses are applied atomically with finding reconciliation. - A newly identified finding can therefore be immediately suppressed, + The activity applies policy analyses atomically with finding reconciliation. + A newly identified finding can thus get suppressed immediately, without ever showing up in time series metrics, or triggering a `NEW_VULNERABILITY` notification. ### Notifications -The following notifications can be emitted during reconciliation: +Reconciliation can emit the following notifications: * `NEW_VULNERABILITY`: For each newly created finding, and for findings that become - active again after previously being inactive (see [Finding Attributions](#finding-attributions)). -* `VULNERABILITY_RETRACTED`: When a finding becomes inactive, that is, all its attributions - have been soft-deleted and no analyzer reports it anymore. + active again after going inactive (see [Finding attributions](#finding-attributions)). 
+* `VULNERABILITY_RETRACTED`: When a finding becomes inactive, that is, all its attributions + have been soft-deleted and no analyzer reports it anymore. * `NEW_VULNERABLE_DEPENDENCY`: When a BOM upload introduces new components that have existing vulnerabilities. The BOM upload trigger stores a context file containing the IDs of newly added components. During reconciliation, if the context file is present, @@ -442,27 +324,17 @@ The following notifications can be emitted during reconciliation: suppression of an existing finding. * `ANALYZER_ERROR`: For each analyzer that failed during invocation. -All notifications are emitted within the same transaction that persists findings, +The activity emits all notifications within the same transaction that persists findings, following the [transactional outbox](notifications.md) pattern. -## Post-Workflow Notifications - -After the workflow run completes (regardless of outcome), a `PROJECT_VULN_ANALYSIS_COMPLETE` -notification is emitted. For successful runs, it includes the full list of active findings for the project. -For failed runs, it includes a failed status indicator. - -This notification is emitted by a [durable execution engine](durable-execution.md) event listener. -It fires once for each batch of completed workflow runs. +## Post-workflow event listeners -!!! note - Two additional event listeners exist for backward compatibility and are scheduled - for removal: +A backward-compatibility event listener reacts to completed workflow runs. The team plans to remove it: - * **Delayed BOM processed notification**: Optionally defers the `BOM_PROCESSED` - notification until after vulnerability analysis completes. Only applies to analyses - that were triggered by a BOM upload. - * **Legacy workflow step completer**: Bridges new durable execution engine workflow runs to the - legacy workflow state tracking system, - and dispatches downstream policy evaluation and metrics update events. 
+* **Delayed BOM processed notification**: When [`dt.tmp.delay-bom-processed-notification`](../../../reference/configuration/properties.md#dttmpdelay-bom-processed-notification) + is on, defers the `BOM_PROCESSED` notification until after vulnerability analysis + completes. Only applies to analyses initiated by a BOM upload. [Trivy]: https://trivy.dev/ +[adr-013]: https://github.com/DependencyTrack/dependency-track/tree/main/docs/adr/013-finding-status.md +[adr-014]: https://github.com/DependencyTrack/dependency-track/tree/main/docs/adr/014-new-alias-schema.md
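
The merge rule from the "Vulnerability merge" section (alphabetical analyzer order, first report wins, unless a later report carries a pre-populated database ID) can be sketched as follows. This is an illustrative sketch only: the `Report` record, the `merge` method, and the analyzer names in `main` are hypothetical stand-ins, not the actual reconciliation code or its types.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VulnMergeSketch {

    // Hypothetical, simplified stand-in for one vulnerability report in a VDR.
    // databaseId is null unless the reporting analyzer pre-populated it.
    record Report(String source, String vulnId, Long databaseId, String analyzer) {}

    static Map<String, Report> merge(Map<String, List<Report>> reportsByAnalyzer) {
        Map<String, Report> merged = new LinkedHashMap<>();
        // TreeMap iterates analyzer names in natural String order, which
        // approximates the alphabetical order described in the text.
        for (var entry : new TreeMap<>(reportsByAnalyzer).entrySet()) {
            for (Report report : entry.getValue()) {
                // A vulnerability is identified by source and vulnerability ID.
                String key = report.source() + "|" + report.vulnId();
                Report existing = merged.get(key);
                if (existing == null) {
                    merged.put(key, report); // first report wins
                } else if (existing.databaseId() == null && report.databaseId() != null) {
                    merged.put(key, report); // later report with a known ID takes precedence
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<Report>> input = new HashMap<>();
        input.put("b-analyzer", List.of(new Report("NVD", "CVE-2024-1234", null, "b-analyzer")));
        input.put("a-analyzer", List.of(new Report("NVD", "CVE-2024-1234", null, "a-analyzer")));
        System.out.println(merge(input).get("NVD|CVE-2024-1234").analyzer());
    }
}
```

With the sample input in `main`, both reports lack a database ID, so the report from `a-analyzer` is kept because it sorts first.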