From 5a2cfec608100c9cdcc07a553056d04bcc9c3f0e Mon Sep 17 00:00:00 2001 From: Paul Osinski Date: Wed, 25 Feb 2026 16:38:36 -0500 Subject: [PATCH 1/5] edit dedupe reimport docs --- .../import_intro/import_vs_reimport.md | 10 ++++- .../OS__deduplication_tuning.md | 9 ++++- .../PRO__deduplication_tuning.md | 17 ++++++++- .../about_deduplication.md | 37 ++++++++++++++++--- .../avoid_excess_duplicates.md | 2 +- 5 files changed, 64 insertions(+), 11 deletions(-) diff --git a/docs/content/import_data/import_intro/import_vs_reimport.md b/docs/content/import_data/import_intro/import_vs_reimport.md index 8b21ce9d2a2..a00cf012e8d 100644 --- a/docs/content/import_data/import_intro/import_vs_reimport.md +++ b/docs/content/import_data/import_intro/import_vs_reimport.md @@ -1,5 +1,5 @@ --- -title: "Import vs Reimport" +title: "Reimport" description: "Learn how to import data manually, through the API, or via a connector" weight: 2 aliases: @@ -80,7 +80,13 @@ This header indicates the actions taken by an Import/Reimport. * **\# left untouched shows the count of Open Findings which were unchanged by a Reimport (because they also existed in the incoming report).** * **\#** **reactivated** shows any Closed Findings which were reopened by an incoming Reimport. -## Reimport via API \- special note +## Reimport Deduplication + +Reimport decides whether an incoming item matches an existing Finding using **[Reimport Deduplication](/triage_findings/finding_deduplication/about_deduplication/)** settings. This is separate from “Same Tool Deduplication” and “Cross Tool Deduplication,” which operate after Findings exist. + +If you are seeing Reimport close old Findings and create new Findings when only a minor attribute changes (for example, a line number shift), tune **Reimport Deduplication** for that tool to use stable identifiers that ignore those attributes (such as Unique ID From Tool). 
+ +## Reimport via API - special note Note that the /reimport API endpoint can both **extend an existing Test** (apply the method in this article) **or create a new Test** with new data \- an initial call to `/import`, or setting up a Test in advance is not required. diff --git a/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md b/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md index 82968cb87f5..49d2fec33ea 100644 --- a/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md +++ b/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md @@ -1,5 +1,5 @@ --- -title: "Deduplication Tuning" +title: "Deduplication Tuning (Open Source)" description: "Configure deduplication in DefectDojo Open Source: algorithms, hash fields, endpoints, and service" weight: 5 audience: opensource @@ -106,6 +106,10 @@ Notes: ## After changing deduplication settings +After changing algorithms or Hash computation, you will need to **recompute hashes** for the affected parser/test type before the new matching behavior will apply consistently across existing data. + +Note: Recomputing hashes can be lead to on large instances. Plan maintenance windows accordingly. + - Changes to dedupe configuration (e.g., `HASHCODE_FIELDS_PER_SCANNER`, `HASH_CODE_FIELDS_ALWAYS`, `DEDUPLICATION_ALGORITHM_PER_PARSER`) are not applied retroactively automatically. To re-evaluate existing findings you must run the management command below. Run inside the uwsgi container. 
Example (hash codes only, no dedupe): @@ -141,3 +145,6 @@ To help troubleshooting deduplication use the following tools: ![Unique ID from Tool and Hash Code on the View Finding page](images/hash_code_id_field.png) ![Unique ID from Tool and Hash Code on the Finding List Status Column](images/hash_code_status_column.png) + +In Open Source, + diff --git a/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md b/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md index a3cbfa6dd03..b7fc60495a2 100644 --- a/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md +++ b/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md @@ -1,11 +1,12 @@ --- -title: "Deduplication Tuning" +title: "Deduplication Tuning (Pro)" description: "Configure how DefectDojo identifies and manages duplicate findings" weight: 4 audience: pro aliases: - /en/working_with_findings/finding_deduplication/tune_deduplication --- + Deduplication Tuning is a DefectDojo Pro feature that gives you fine-grained control over how findings are deduplicated, allowing you to optimize duplicate detection for your specific security testing workflow. ## Deduplication Settings @@ -41,6 +42,8 @@ Uses a combination of selected fields to generate a unique hash. When selected, #### Unique ID From Tool Leverages the security tool's own internal identifier for findings, ensuring perfect deduplication when the scanner provides reliable unique IDs. +This algorithm can be useful when working with SAST scanners, or situations where a Finding can "move around" in source code as development progresses. + #### Unique ID From Tool or Hash Code Attempts to use the tool's unique ID first, then falls back to the hash code if no unique ID is available. This provides the most flexible deduplication option. 
@@ -60,7 +63,11 @@ Unlike Same Tool Deduplication, Cross Tool Deduplication only supports the Hash ## Reimport Deduplication -Reimport Deduplication Settings are specifically designed for reimporting data using Universal Parsers or the Generic Parser. +**⚠️ Reimport processes can completely discard Findings before they are recorded. This can lead to data loss if set incorrectly, so Reimport Deduplication settings should be adjusted with caution.** + +Reimport Deduplication Settings can be used to set an algorithm for Universal Parsers, or for a Generic Findings Import Parser. + +Reimport Deduplication cannot be adjusted for other tools by default. Users who want to adjust the Reimport Deduplication algorithm for other tools in their instance should reach out to [DefectDojo Support](mailto:support@defectdojo.com) for assistance. ![image](images/reimport_deduplication.png) @@ -74,6 +81,8 @@ The same three algorithm options are available for Reimport Deduplication as for - Unique ID From Tool - Unique ID From Tool or Hash Code +Reimport can completely discard Findings before they are recorded, so Reimport Deduplication settings should be adjusted with caution. + ## Deduplication Best Practices For optimal results with Deduplication Tuning: @@ -85,3 +94,7 @@ For optimal results with Deduplication Tuning: - **Avoid overly broad deduplication**: Cross-tool deduplication with too few hash fields may result in false duplicates By tuning deduplication settings to your specific tools, you can significantly reduce duplicate noise. + +## Locked Findings + +Whenever Deduplication Settings are changed for a given tool, Deduplication hashes will need to be re-calculated for that tool across the entire DefectDojo instance. During this process, Findings of this tool will be "locked", and their Deduplication Algorithm cannot not be changed again until the recalculation is complete. 
\ No newline at end of file diff --git a/docs/content/triage_findings/finding_deduplication/about_deduplication.md b/docs/content/triage_findings/finding_deduplication/about_deduplication.md index 5e18a8c21cb..da338a8bb12 100644 --- a/docs/content/triage_findings/finding_deduplication/about_deduplication.md +++ b/docs/content/triage_findings/finding_deduplication/about_deduplication.md @@ -26,13 +26,29 @@ By default, these Tests would need to be nested under the same Product for Dedup Duplicate Findings are set as Inactive by default. This does not mean the Duplicate Finding itself is Inactive. Rather, this is so that your team only has a single active Finding to work on and remediate, with the implication being that once the original Finding is Mitigated, the Duplicates will also be Mitigated. -## Deduplication vs Reimport +## Reimport Deduplication -Deduplication and Reimport are similar processes but they have a key difference: +Deduplication and Reimport are similar processes, but they use different algorithms to identify Finding matches. -* When you Reimport to a Test, the Reimport process looks at incoming Findings, **filters and** **discards any matches**. Those matches will never be created as Findings or Finding Duplicates. -* Deduplication is applied 'passively' on Findings that have already been created. It will identify duplicates in scope and **label them**, but it will not delete or discard the Finding unless 'Delete Deduplicate Findings' is enabled. -* The 'reimport' action of discarding a Finding always happens before deduplication; DefectDojo **cannot deduplicate Findings that are never created** as a result of Reimport's filtering. +* When you Reimport to a Test, the Reimport process looks at incoming Findings, **compares hash codes, and then discards any matches**. Those matches will never be created as Findings or Finding Duplicates. + +However, any Findings that remain after Reimport Deduplication are still subject to Same-Tool Deduplication. 
So if you use a narrower scope for Same-Tool Deduplication, you can end up with Duplicates within a Reimport pipeline. + +### Example + +Here's a tool with a Reimport Deduplication algorithm which is different from the Same-Tool Deduplication algorithm. + +| Deduplication Algorithm | Hash Code Fields | +| ----- | ---- | +| Reimport | Title, CWE, Severity, Description, Line Number | +| Same-Tool | Title, CWE, Severity, Description | + +Let's say you had a Finding in DefectDojo with a given line number. You re-scanned your environment and the line number of that vulnerability changed. You reimport to the same Test. Here's what will happen during reimport and deduplication: + +* During Reimport, the Finding will not be matched to any Findings that already exist, because the line number is different. So a new Finding will be created in the Test. +* After Reimport is complete, the Same-Tool Deduplication algorithm will run. Same-Tool Deduplication does not consider line number in this configuration, so the new Finding will be labelled as a duplicate. + +Reimport can completely discard Findings before they are recorded, so Reimport Deduplication settings should be adjusted with caution. ## When are duplicates appropriate? @@ -119,3 +135,14 @@ For example, let’s say that you had your Maximum Duplicates field set to ‘1 ### Applying this setting Applying **Delete Deduplicate Findings** will begin a deletion process immediately. This setting can be applied on the **System Settings** page. See Enabling Deduplication for more information. + +## Troubleshooting Deduplication + +Sometimes, Deduplication does not work as expected. Here are some examples of ways that Deduplication might not be working correctly, along with possible solutions. 
+ +| What you see | Most likely cause | What to tune | +| --- | --- | --- | +| Reimport closes an old Finding and creates a new one when only the line number changed | Reimport matching uses unstable fields (for example, line number) | Reimport Deduplication (prefer stable IDs or stable hash fields) | +| Multiple Findings are created in the same Test that you believe should be duplicates | Deduplication matching is not configured for that tool or scope | Same Tool Deduplication (and consider “Delete Deduplicate Findings” behavior) | +| Duplicates are created across different tools | Cross-tool matching is disabled or too strict | Cross Tool Deduplication (Pro only) (hash-based matching) | +| Excess duplicates of the same Finding are being created, across Tests | Asset Hierarchy is not set up correctly | [Consider Reimport for continual testing](/triage_findings/finding_deduplication/avoid_excess_duplicates/) | diff --git a/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md b/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md index b30a187e098..dfc07d3c740 100644 --- a/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md +++ b/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md @@ -5,7 +5,7 @@ weight: 4 aliases: - /en/working_with_findings/finding_deduplication/avoiding_duplicates_via_reimport --- -One of DefectDojo’s strengths is that the data model can accommodate many different use\-cases and applications. You’ll likely change your approach as you master the software and discover ways to optimize your workflow. +One of DefectDojo’s strengths is that the data model can accommodate many different use-cases and applications. You’ll likely change your approach as you master the software and discover ways to optimize your workflow. By default, DefectDojo does not delete any duplicate Findings that are created. 
Each Finding is considered to be a separate instance of a vulnerability. So in this case, **Duplicate Findings** can be an indicator that a process change is required to your workflow. From 6ca29bbaa6356c9b4b0795e6ccbc4a1f7d2dc38d Mon Sep 17 00:00:00 2001 From: Paul Osinski <42211303+paulOsinski@users.noreply.github.com> Date: Thu, 26 Feb 2026 15:28:13 -0500 Subject: [PATCH 2/5] Update docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md Co-authored-by: Cody Maffucci <46459665+Maffooch@users.noreply.github.com> --- .../finding_deduplication/OS__deduplication_tuning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md b/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md index 49d2fec33ea..08b7baed4c6 100644 --- a/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md +++ b/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md @@ -108,7 +108,7 @@ Notes: After changing algorithms or Hash computation, you will need to **recompute hashes** for the affected parser/test type before the new matching behavior will apply consistently across existing data. -Note: Recomputing hashes can be lead to on large instances. Plan maintenance windows accordingly. +Note: Recomputing hashes can lead to long wait times on large instances. Plan maintenance windows accordingly. - Changes to dedupe configuration (e.g., `HASHCODE_FIELDS_PER_SCANNER`, `HASH_CODE_FIELDS_ALWAYS`, `DEDUPLICATION_ALGORITHM_PER_PARSER`) are not applied retroactively automatically. To re-evaluate existing findings you must run the management command below. 
From 8bd130504c518687960f19213b4683d06102b460 Mon Sep 17 00:00:00 2001 From: Paul Osinski Date: Thu, 26 Feb 2026 15:30:27 -0500 Subject: [PATCH 3/5] change article name and update links --- docs/content/get_started/about/faq.md | 2 +- docs/content/get_started/common_use_cases/common_use_cases.md | 2 +- .../import_intro/{import_vs_reimport.md => reimport.md} | 0 .../finding_deduplication/avoid_excess_duplicates.md | 2 +- 4 files changed, 3 insertions(+), 3 deletions(-) rename docs/content/import_data/import_intro/{import_vs_reimport.md => reimport.md} (100%) diff --git a/docs/content/get_started/about/faq.md b/docs/content/get_started/about/faq.md index 9132248fd89..f242818a709 100644 --- a/docs/content/get_started/about/faq.md +++ b/docs/content/get_started/about/faq.md @@ -69,7 +69,7 @@ If you're looking to add a new tool to your suite, we have a list of recommended There are two different methods to import a single report from a security tool: - **Import** handles the report as a single point-in-time record. Importing a report creates a Test containing the resulting Findings. -- **[Reimport](/import_data/import_intro/import_vs_reimport/)** is used to update an existing Test with a new set of results. If you have a more open-ended approach to your testing process, you can continuously Reimport the latest version of your report to an existing Test. DefectDojo will compare the results of the incoming report to your existing data, record any changes, and then adjust the Findings in the Test to match the latest report. +- **[Reimport](/import_data/import_intro/reimport/)** is used to update an existing Test with a new set of results. If you have a more open-ended approach to your testing process, you can continuously Reimport the latest version of your report to an existing Test. DefectDojo will compare the results of the incoming report to your existing data, record any changes, and then adjust the Findings in the Test to match the latest report. 
To understand the difference, it’s helpful to think of Import as recording a single instance of a scan event, and Reimport as updating a continual record of scanning. diff --git a/docs/content/get_started/common_use_cases/common_use_cases.md b/docs/content/get_started/common_use_cases/common_use_cases.md index 0565035ddfd..14f68145752 100644 --- a/docs/content/get_started/common_use_cases/common_use_cases.md +++ b/docs/content/get_started/common_use_cases/common_use_cases.md @@ -38,7 +38,7 @@ Each of these report categories can be handled by a separate Engagement, with a ![image](images/example_product_hierarchy_bigcorp.png) - If a Product has a CI/CD pipeline, all of the results from that pipeline can be continually imported into a single open-ended Engagement. Each tool used will create a separate Test within the CI/CD Engagement, which can be continuously updated with new data. -(See our guide to [Reimport](/import_data/import_intro/import_vs_reimport/)) +(See our guide to [Reimport](/import_data/import_intro/reimport/)) - Each Pen Test effort can have a separate Engagement created to contain all of the results: e.g. "Q1 Pen Test 2024," "Q2 Pen Test 2024," etc. - BigCorp will likely want to run their own mock PCI audit so that they're prepared for the real thing. The results of those audits can also be stored as a separate Engagement. 
diff --git a/docs/content/import_data/import_intro/import_vs_reimport.md b/docs/content/import_data/import_intro/reimport.md similarity index 100% rename from docs/content/import_data/import_intro/import_vs_reimport.md rename to docs/content/import_data/import_intro/reimport.md diff --git a/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md b/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md index dfc07d3c740..4c06b90484e 100644 --- a/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md +++ b/docs/content/triage_findings/finding_deduplication/avoid_excess_duplicates.md @@ -46,7 +46,7 @@ DefectDojo has two methods for importing test data to create Findings: **Import* Each time you import new vulnerability reports into DefectDojo, those reports will be stored in a Test object. A Test object can be created by a user ahead of time to hold a future **Import**. If a user wants to import data without specifying a Test destination, a new Test will be created to store the incoming report. -Tests are flexible objects, and although they can only hold one *kind* of report, they can handle multiple instances of that same report through the **Reimport** method. To learn more about Reimport, see our **[article](/import_data/import_intro/import_vs_reimport/)** on this topic. +Tests are flexible objects, and although they can only hold one *kind* of report, they can handle multiple instances of that same report through the **Reimport** method. To learn more about Reimport, see our **[article](/import_data/import_intro/reimport/)** on this topic. 
## Using Reimport for continual Tests From 2117a814b3b42ce65264963423b8f9ddbf3e9bd6 Mon Sep 17 00:00:00 2001 From: Paul Osinski <42211303+paulOsinski@users.noreply.github.com> Date: Fri, 27 Feb 2026 16:09:35 -0500 Subject: [PATCH 4/5] Update docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md Co-authored-by: Cody Maffucci <46459665+Maffooch@users.noreply.github.com> --- .../finding_deduplication/PRO__deduplication_tuning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md b/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md index b7fc60495a2..998b44bb7d4 100644 --- a/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md +++ b/docs/content/triage_findings/finding_deduplication/PRO__deduplication_tuning.md @@ -97,4 +97,4 @@ By tuning deduplication settings to your specific tools, you can significantly r ## Locked Findings -Whenever Deduplication Settings are changed for a given tool, Deduplication hashes will need to be re-calculated for that tool across the entire DefectDojo instance. During this process, Findings of this tool will be "locked", and their Deduplication Algorithm cannot not be changed again until the recalculation is complete. \ No newline at end of file +Whenever Deduplication Settings are changed for a given tool, Deduplication hashes are re-calculated for that tool across the entire DefectDojo instance. 
\ No newline at end of file From 295dc73a7c7279481c0eb525cc1271733cc611e8 Mon Sep 17 00:00:00 2001 From: Paul Osinski Date: Fri, 27 Feb 2026 16:13:49 -0500 Subject: [PATCH 5/5] remove weird line --- .../finding_deduplication/OS__deduplication_tuning.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md b/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md index 08b7baed4c6..b4aa8c7d543 100644 --- a/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md +++ b/docs/content/triage_findings/finding_deduplication/OS__deduplication_tuning.md @@ -145,6 +145,3 @@ To help troubleshooting deduplication use the following tools: ![Unique ID from Tool and Hash Code on the View Finding page](images/hash_code_id_field.png) ![Unique ID from Tool and Hash Code on the Finding List Status Column](images/hash_code_status_column.png) - -In Open Source, -