fix: redact finding description, remediation and proof before DB insert#363
fix: redact finding description, remediation and proof before DB insert#363Srejoye wants to merge 8 commits into
Conversation
utksh1
left a comment
There was a problem hiding this comment.
Thanks — this is close, but I need changes before merge. The findings rows are redacted, but the same parsed findings are still written to tasks.structured_json before redaction, so secrets can still be persisted and exposed through task-result paths. Please redact the parsed structure before any DB write, keep findings/report persistence using that redacted copy, and add a regression that verifies structured_json does not contain the secret either.
|
Thanks for following up. Clarifying the change request so it is actionable: Why this is blocked: What to do next:
|
4967efc to
1b1b1a5
Compare
|
Hi @utksh1, I've addressed the feedback. The fix now redacts all findings before any DB write — |
utksh1
left a comment
There was a problem hiding this comment.
Re-reviewed latest state. The branch is now conflicting, and the earlier data-leak blocker still needs verification against current main: redact the parsed structure before any structured_json/write path, not only findings rows, and keep a regression proving structured_json never contains the secret.
968358d to
11bdcb2
Compare
|
Donee |
|
Re-reviewed after the latest push. Still blocked: redaction must happen before every persistence path, including tasks.structured_json, not only the findings rows. Please add a regression proving structured_json never contains the secret. |
…re DB insert
redact(output) was called on the raw CLI text file (line 302) but the parsed findings written to the findings table were never redacted. finding['description'], finding.get('remediation'), and finding.get('proof') were inserted into the DB as-is, meaning:
- Secrets in finding descriptions persisted in the DB indefinitely
- /api/v1/findings JSON API returned unredacted content
- SARIF exports, CSV exports, and direct DB reads all exposed secrets
reporting.py::_normalize_finding() does call redact() but only at PDF/HTML export time — the DB row itself was never clean.
redact_dict() already existed in redaction.py for exactly this purpose but was never called on findings before storage.
Fix:
- Add redact_dict to the import from .redaction in executor.py
- Call finding = redact_dict(finding) before the INSERT in both _upsert_findings_and_report and _upsert_findings_and_report_from_scanner
Adds testing/backend/unit/test_findings_redaction.py with 6 tests:
- redact_dict redacts AWS key in description and remediation
- clean findings pass through unchanged
- nested metadata dict is walked recursively
- None proof does not raise
- missing keys handled gracefully
- integration test: INSERT path verified to not store raw secrets in DB
…s insert The previous fix redacted individual finding rows before INSERT but the same unredacted parsed dict was written to tasks.structured_json first. Secrets were still exposed through the task-result API path. Fix: build redacted_findings list before any DB write and replace parsed['findings'] with it so that both the structured_json column on the tasks table and the findings table rows use the clean copy. Same fix applied to _upsert_findings_and_report_from_scanner. Adds regression assertion to test_upsert_findings_redacts_description_ before_insert that verifies tasks.structured_json does not contain the secret either.
a5c9f40 to
5404397
Compare
|
Hi @utksh1, I've rebased cleanly onto current main. The fix is working correctly on both persistence paths: In The regression is in All CI checks were green on the previous push. Please take another look. Thanks! |
|
Re-reviewed after the latest push. Still blocked: redaction must happen before every DB persistence path, including tasks.structured_json. Please add a regression proving structured_json never contains the secret, not only findings rows. |
Secrets embedded in finding fields were reaching the database through two unguarded paths: 1. _upsert_findings_and_report wrote tasks.structured_json from the raw parsed dict before any redaction occurred, then inserted unredacted finding rows. 2. _upsert_findings_and_report_from_scanner never called redact_dict at all and never wrote structured_json. Fix: in both functions, build the redacted findings list first, replace parsed['findings'] / result['findings'] with that clean copy, then use it for every write — structured_json and the findings table rows both receive the same already-redacted data. Also adds the missing structured_json write to the scanner path. Adds test_findings_redaction.py (6 tests) including an integration regression that asserts structured_json never contains the raw secret.
Problem
executor.pycalledredact(output)on the raw CLI text file (line 302) but the parsed findings written to thefindingstable were never redacted:reporting.py::_normalize_finding()does callredact()— but only at PDF/HTML export time. The DB row, the/api/v1/findingsJSON API, SARIF exports, CSV exports, and any direct DB reads all returned raw unredacted content.redact_dict()already existed inredaction.pyfor exactly this purpose — it was just never called before storage.Fix
backend/secuscan/executor.pyredact_dictto the import from.redactionfinding = redact_dict(finding)before the INSERT in both_upsert_findings_and_reportand_upsert_findings_and_report_from_scannerTests
New file
testing/backend/unit/test_findings_redaction.py— 6 tests:test_redact_dict_redacts_aws_key_in_description[REDACTED]test_redact_dict_leaves_clean_finding_unchangedtest_redact_dict_handles_nested_metadatatest_redact_dict_handles_none_proofNoneproof does not raisetest_redact_dict_handles_missing_keys_gracefullytest_upsert_findings_redacts_description_before_insertNo breaking changes
redact_dictuses existing infrastructure. Non-secret content passes through unchanged. No API or schema changes.Closes #306