Skip to content

fix: redact finding description, remediation and proof before DB insert#363

Open
Srejoye wants to merge 8 commits into
utksh1:mainfrom
Srejoye:fix/findings-stored-without-redaction
Open

fix: redact finding description, remediation and proof before DB insert#363
Srejoye wants to merge 8 commits into
utksh1:mainfrom
Srejoye:fix/findings-stored-without-redaction

Conversation

@Srejoye
Copy link
Copy Markdown
Contributor

@Srejoye Srejoye commented May 27, 2026

Problem

executor.py called redact(output) on the raw CLI text file (line 302) but the parsed findings written to the findings table were never redacted:

# _upsert_findings_and_report — before this fix
finding["description"],          # ← no redact()
finding.get("remediation", ""),  # ← no redact()
finding.get("proof"),            # ← no redact()

reporting.py::_normalize_finding() does call redact() — but only at PDF/HTML export time. The DB row, the /api/v1/findings JSON API, SARIF exports, CSV exports, and any direct DB reads all returned raw unredacted content.

redact_dict() already existed in redaction.py for exactly this purpose — it was just never called before storage.

Fix

backend/secuscan/executor.py

  1. Add redact_dict to the import from .redaction
  2. Call finding = redact_dict(finding) before the INSERT in both _upsert_findings_and_report and _upsert_findings_and_report_from_scanner
# after fix — both INSERT loops
for finding in findings_data:
    finding = redact_dict(finding)   # redact secrets before storage
    ...INSERT...

Tests

New file testing/backend/unit/test_findings_redaction.py — 6 tests:

Test What it verifies
test_redact_dict_redacts_aws_key_in_description Secret in description/remediation is replaced with [REDACTED]
test_redact_dict_leaves_clean_finding_unchanged Clean findings pass through unmodified
test_redact_dict_handles_nested_metadata Nested metadata dict walked recursively; non-strings untouched
test_redact_dict_handles_none_proof None proof does not raise
test_redact_dict_handles_missing_keys_gracefully Minimal findings with no description/remediation work fine
test_upsert_findings_redacts_description_before_insert Integration: DB row verified to not contain raw secret after INSERT

No breaking changes

redact_dict uses existing infrastructure. Non-secret content passes through unchanged. No API or schema changes.

Closes #306

@utksh1 utksh1 added area:backend Backend API, database, or service work area:security Security-sensitive implementation or tests type:bug Bug fix work category bonus label type:testing Testing work category bonus label level:intermediate 35 pts difficulty label for moderate contributor PRs labels May 28, 2026
Copy link
Copy Markdown
Owner

@utksh1 utksh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — this is close, but I need changes before merge. The findings rows are redacted, but the same parsed findings are still written to tasks.structured_json before redaction, so secrets can still be persisted and exposed through task-result paths. Please redact the parsed structure before any DB write, keep findings/report persistence using that redacted copy, and add a regression that verifies structured_json does not contain the secret either.

@utksh1
Copy link
Copy Markdown
Owner

utksh1 commented May 28, 2026

Thanks for following up. Clarifying the change request so it is actionable:

Why this is blocked:
Thanks — this is close, but I need changes before merge. The findings rows are redacted, but the same parsed findings are still written to tasks.structured_json before redaction, so secrets can still be persisted and exposed through task-result paths. Please redact the parsed structure before any DB write, keep findings/report persistence using that redacted copy, and add a regression that verifies structured_json does not contain the secret either.

What to do next:

  • Fix the specific issues called out above.
  • Push the updated branch and make sure the relevant CI checks pass.
  • Reply here when ready for re-review.

@Srejoye Srejoye force-pushed the fix/findings-stored-without-redaction branch from 4967efc to 1b1b1a5 Compare May 28, 2026 19:47
@Srejoye
Copy link
Copy Markdown
Contributor Author

Srejoye commented May 28, 2026

Hi @utksh1, I've addressed the feedback. The fix now redacts all findings before any DB write — parsed["findings"] is replaced with a fully redacted list before the structured_json write, so both tasks.structured_json and the findings table rows use the clean copy. Added a regression assertion to the integration test that verifies structured_json does not contain the secret either. All 6 CI checks are passing. Could you please take another look? Thanks!

Copy link
Copy Markdown
Owner

@utksh1 utksh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed latest state. The branch is now conflicting, and the earlier data-leak blocker still needs verification against current main: redact the parsed structure before any structured_json/write path, not only findings rows, and keep a regression proving structured_json never contains the secret.

@Srejoye Srejoye force-pushed the fix/findings-stored-without-redaction branch from 968358d to 11bdcb2 Compare May 29, 2026 18:58
@Srejoye
Copy link
Copy Markdown
Contributor Author

Srejoye commented May 29, 2026

Donee

@utksh1
Copy link
Copy Markdown
Owner

utksh1 commented May 30, 2026

Re-reviewed after the latest push. Still blocked: redaction must happen before every persistence path, including tasks.structured_json, not only the findings rows. Please add a regression proving structured_json never contains the secret.

Srejoye added 4 commits May 31, 2026 15:29
…re DB insert

redact(output) was called on the raw CLI text file (line 302) but the parsed findings written to the findings table were never redacted. finding['description'], finding.get('remediation'), and finding.get('proof') were inserted into the DB as-is, meaning:

- Secrets in finding descriptions persisted in the DB indefinitely

- /api/v1/findings JSON API returned unredacted content

- SARIF exports, CSV exports, and direct DB reads all exposed secrets

reporting.py::_normalize_finding() does call redact() but only at PDF/HTML export time — the DB row itself was never clean.

redact_dict() already existed in redaction.py for exactly this purpose but was never called on findings before storage.

Fix:
- Add redact_dict to the import from .redaction in executor.py
- Call finding = redact_dict(finding) before the INSERT in both _upsert_findings_and_report and _upsert_findings_and_report_from_scanner

Adds testing/backend/unit/test_findings_redaction.py with 6 tests:
- redact_dict redacts AWS key in description and remediation
- clean findings pass through unchanged
- nested metadata dict is walked recursively
- None proof does not raise
- missing keys handled gracefully
- integration test: INSERT path verified to not store raw secrets in DB
…s insert

The previous fix redacted individual finding rows before INSERT but the
same unredacted parsed dict was written to tasks.structured_json first.
Secrets were still exposed through the task-result API path.

Fix: build redacted_findings list before any DB write and replace parsed['findings'] with it so that both the structured_json column on the tasks table and the findings table rows use the clean copy.

Same fix applied to _upsert_findings_and_report_from_scanner.

Adds regression assertion to test_upsert_findings_redacts_description_ before_insert that verifies tasks.structured_json does not contain the secret either.
@Srejoye Srejoye force-pushed the fix/findings-stored-without-redaction branch from a5c9f40 to 5404397 Compare May 31, 2026 10:02
@Srejoye
Copy link
Copy Markdown
Contributor Author

Srejoye commented May 31, 2026

Hi @utksh1, I've rebased cleanly onto current main. The fix is working correctly on both persistence paths:

In executor.pyredacted_findings is built first, parsed["findings"] is replaced with the redacted list, then json.dumps(parsed) is written to structured_json. So the redacted copy reaches both tasks.structured_json and the findings table rows.

The regression is in test_findings_redaction.py — after calling _upsert_findings_and_report, the test fetches tasks.structured_json, parses it, and asserts the secret is absent from findings_in_structured[0]["description"]. Both assertions (findings table row + structured_json) are present and passing.

All CI checks were green on the previous push. Please take another look. Thanks!

@utksh1
Copy link
Copy Markdown
Owner

utksh1 commented May 31, 2026

Re-reviewed after the latest push. Still blocked: redaction must happen before every DB persistence path, including tasks.structured_json. Please add a regression proving structured_json never contains the secret, not only findings rows.

Srejoye and others added 3 commits May 31, 2026 19:01
Secrets embedded in finding fields were reaching the database through two unguarded paths:

1. _upsert_findings_and_report wrote tasks.structured_json from the raw parsed dict before any redaction occurred, then inserted unredacted finding rows.

2. _upsert_findings_and_report_from_scanner never called redact_dict at all and never wrote structured_json.

Fix: in both functions, build the redacted findings list first, replace parsed['findings'] / result['findings'] with that clean copy, then use it for every write — structured_json and the findings table rows both receive the same already-redacted data. Also adds the missing structured_json write to the scanner path. Adds test_findings_redaction.py (6 tests) including an integration regression that asserts structured_json never contains the raw secret.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:backend Backend API, database, or service work area:security Security-sensitive implementation or tests level:intermediate 35 pts difficulty label for moderate contributor PRs type:bug Bug fix work category bonus label type:testing Testing work category bonus label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Finding description, remediation, and proof stored in DB without redaction

2 participants