
Split SARIF broadcasts into a deduplicated finding pool#506

Open
frabert wants to merge 1 commit into main from frabert/split-sarifs

Conversation


@frabert frabert commented Apr 8, 2026

Extract individual findings from SARIF results into a separate Redis pool with fingerprint-based deduplication, and switch seed-gen vuln discovery to consume findings instead of raw SARIF broadcasts. Findings are formatted as compact XML hints in prompts, and sampling probability now scales with pool size.

CC @reytchison

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@hbrodin hbrodin left a comment


Review: 7 findings (5 P2, 2 P3)

The PR's core design is solid — extracting structured findings from raw SARIF and deduplicating via fingerprints is a good simplification for the seed-gen pipeline. The issues below are mostly around atomicity of the Redis dedup pattern, a consistency bug in the backward-compat fallback, and missing test coverage for the new methods.

See inline comments for details.

```python
if self.redis.sismember(seen_key, finding.fingerprint):
    continue
self.redis.rpush(finding_key, finding.model_dump_json())
self.redis.sadd(seen_key, finding.fingerprint)
```

P2 — Race condition: non-atomic dedup allows duplicate findings.

The `sismember` → `rpush` → `sadd` sequence is three independent Redis commands. Two concurrent store() calls for the same task_id can both pass the sismember check before either executes sadd, causing the same finding to be rpushed twice.

The rest of the codebase uses redis.pipeline() for atomic multi-key writes (e.g. maps.py:67, task_registry.py:99, sets.py:121).

Simplest fix — use sadd return value as the atomic guard:

```python
for finding in findings:
    if not self.redis.sadd(seen_key, finding.fingerprint):
        continue
    self.redis.rpush(finding_key, finding.model_dump_json())
    added += 1
```

sadd returns 1 if the member was added (new), 0 if it already existed. This eliminates the TOCTOU window in a single atomic check and also fixes the operation ordering (the guard is set before the list push, so a crash between the two can't leave an unguarded duplicate).
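To make the guard concrete, here is a minimal, self-contained sketch — `FakeRedis` is a stand-in that only emulates the SADD/RPUSH return-value semantics the fix relies on, not the project's real client or key names:

```python
# Sketch: emulate Redis SADD's return value (1 = newly added, 0 = already
# present) with a plain set, to show how a single call closes the TOCTOU
# window. FakeRedis and the key names are illustrative stand-ins.

class FakeRedis:
    def __init__(self):
        self.sets: dict[str, set] = {}
        self.lists: dict[str, list] = {}

    def sadd(self, key: str, member: str) -> int:
        s = self.sets.setdefault(key, set())
        if member in s:
            return 0
        s.add(member)
        return 1

    def rpush(self, key: str, value: str) -> int:
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])


def store_findings(redis, task_id, findings):
    seen_key = f"findings_seen:{task_id}"
    finding_key = f"findings:{task_id}"
    added = 0
    for fingerprint, payload in findings:
        # sadd is the atomic guard: 0 means the fingerprint was already seen.
        if not redis.sadd(seen_key, fingerprint):
            continue
        redis.rpush(finding_key, payload)
        added += 1
    return added


r = FakeRedis()
batch = [("fp-1", "{...}"), ("fp-2", "{...}"), ("fp-1", "{...}")]
print(store_findings(r, "t1", batch))  # 2 — duplicate fp-1 skipped
print(store_findings(r, "t1", batch))  # 0 — everything already seen
```

Against a real Redis the same pattern holds because SADD itself performs the membership test and the insert in one server-side operation.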

```python
if all_findings:
    self._store_findings(task_id, all_findings)

return all_findings
```

P2 — Fallback returns un-deduplicated findings.

The primary path (line 220) returns from the deduplicated Redis pool, but this fallback returns the raw all_findings list. If two stored SARIFs contain findings with the same fingerprint, the first call returns duplicates (while all subsequent calls return the deduplicated pool). This inconsistency propagates to sample_findings(), inflating both the probability calculation (len(self.findings)) and the sampling pool.

Suggested fix — re-read from Redis after storing, or deduplicate before returning:

```python
if all_findings:
    self._store_findings(task_id, all_findings)
    # Return from the deduplicated pool to match the primary path
    return self.get_findings_by_task_id(task_id)

return []
```

(This is safe from infinite recursion because _store_findings populates the finding key, so the recursive call hits the primary if finding_list: path.)

```python
if added > 0:
    logger.info("Added %d new findings for task %s (total pool: %d)", added, task_id, added)
```

P2 — Log message says "total pool" but reports the batch count.

The third format arg is added again, not the actual pool size. If 2 findings are added to a pool that already has 10, the log says total pool: 2.

Suggested change:

```diff
- logger.info("Added %d new findings for task %s (total pool: %d)", added, task_id, added)
+ logger.info("Added %d new findings for task %s (total pool: %d)", added, task_id, self.redis.llen(finding_key))
```

```python
if pool_size == 0:
    return []

probability = min(1.0, self.MIN_FINDING_PROBABILITY + 0.14 * pool_size)
```

P3 — Docstring/formula mismatch; magic constant.

The docstring says probability starts at MIN_FINDING_PROBABILITY (0.3) for 1 finding, but 0.3 + 0.14 * 1 = 0.44. The formula reaches 1.0 at pool_size=5 as intended, but the base case doesn't match the docs.

If the intent is p=0.3 at 1 finding and p=1.0 at 5 findings:

```python
scale = (1.0 - self.MIN_FINDING_PROBABILITY) / 4  # 0.175
probability = min(1.0, self.MIN_FINDING_PROBABILITY + scale * (pool_size - 1))
```

Deriving the coefficient from the constants also eliminates the magic 0.14.
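A quick standalone sanity check of the derived formula — `SATURATION_POOL_SIZE` is a name introduced here for the stated 5-finding saturation point, and `MIN_FINDING_PROBABILITY = 0.3` comes from the review text:

```python
# Sketch verifying the derived-coefficient formula: p = 0.3 at pool_size 1,
# p = 1.0 at pool_size 5, clamped thereafter.
MIN_FINDING_PROBABILITY = 0.3
SATURATION_POOL_SIZE = 5  # pool size at which probability should reach 1.0


def finding_probability(pool_size: int) -> float:
    if pool_size == 0:
        return 0.0
    scale = (1.0 - MIN_FINDING_PROBABILITY) / (SATURATION_POOL_SIZE - 1)  # 0.175
    return min(1.0, MIN_FINDING_PROBABILITY + scale * (pool_size - 1))


for n in range(7):
    print(n, round(finding_probability(n), 3))
```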

```python
logger.info("Using %d SARIFs for challenge %s", len(self.sarifs), self.package_name)
return self.sarifs
return []

def sample_findings(self) -> list[Finding]:
```

P2 — No test coverage for sample_findings().

This method has non-trivial logic (probability scaling, MAX_FINDINGS cap, random.sample) but zero tests. Both vuln_discovery test fixtures pass findings=[], so sample_findings() always returns [] in tests.

Edge cases worth covering: empty pool → [], pool_size=1 (probability=0.44), pool_size > MAX_FINDINGS (capped sampling), pool_size=5+ (probability clamped to 1.0). Use unittest.mock.patch('random.random') and patch('random.sample') to make the tests deterministic.
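A hedged sketch of what such tests could look like — the `VulnDiscovery` class below is a stand-in that only mirrors the shape described above (probability gate, `MAX_FINDINGS` cap, `random.sample`), not the PR's actual implementation, and the cap value is illustrative:

```python
# Sketch: deterministic tests for a sample_findings-shaped method by
# patching the random module. VulnDiscovery here is a hypothetical stand-in.
import random
from unittest.mock import patch


class VulnDiscovery:
    MIN_FINDING_PROBABILITY = 0.3
    MAX_FINDINGS = 3  # illustrative cap, not the PR's value

    def __init__(self, findings):
        self.findings = findings

    def sample_findings(self):
        pool_size = len(self.findings)
        if pool_size == 0:
            return []
        probability = min(1.0, self.MIN_FINDING_PROBABILITY + 0.14 * pool_size)
        if random.random() > probability:
            return []
        return random.sample(self.findings, min(pool_size, self.MAX_FINDINGS))


def test_empty_pool():
    assert VulnDiscovery([]).sample_findings() == []


def test_probability_gate_deterministic():
    vd = VulnDiscovery(["f1"])
    with patch("random.random", return_value=0.99):  # above p=0.44 -> skip
        assert vd.sample_findings() == []
    with patch("random.random", return_value=0.0):   # below p -> sample
        assert vd.sample_findings() == ["f1"]


def test_cap_at_max_findings():
    vd = VulnDiscovery([f"f{i}" for i in range(10)])  # p clamps to 1.0
    with patch("random.random", return_value=0.0):
        assert len(vd.sample_findings()) == vd.MAX_FINDINGS


test_empty_pool()
test_probability_gate_deterministic()
test_cap_at_max_findings()
```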

```python
def format_sarif_hints(self) -> str:
    """Format SARIF hints for prompts"""
    if not self.sarifs:

def format_finding_hints(self) -> str:
```

P2 — No test coverage for format_finding_hints().

This method builds structured XML from Finding objects and is called by both Full and Delta tasks' _gather_context and _analyze_bug prompts. Since test fixtures pass findings=[], this always returns "" in tests — the XML formatting, multi-line vs single-line display, and field interpolation are never exercised.
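As a sketch of the cases worth pinning down — the XML attribute names and the `Finding` fields below are hypothetical, since the real schema isn't shown in this hunk; only the empty-pool, single-line vs multi-line, and field-interpolation cases mirror the review's concerns:

```python
# Sketch: a stand-in format_finding_hints illustrating the untested branches
# (empty pool -> "", single-line vs multi-line locations, interpolation).
# The Finding fields and XML shape are assumptions, not the PR's code.
class Finding:
    def __init__(self, rule_id, path, start_line, end_line):
        self.rule_id, self.path = rule_id, path
        self.start_line, self.end_line = start_line, end_line


def format_finding_hints(findings):
    if not findings:
        return ""
    hints = []
    for f in findings:
        lines = (str(f.start_line) if f.start_line == f.end_line
                 else f"{f.start_line}-{f.end_line}")
        hints.append(f'<finding rule="{f.rule_id}" file="{f.path}" lines="{lines}"/>')
    return "\n".join(hints)


assert format_finding_hints([]) == ""
out = format_finding_hints([Finding("CWE-787", "src/parse.c", 42, 42),
                            Finding("CWE-416", "src/free.c", 10, 14)])
assert 'lines="42"' in out and 'lines="10-14"' in out
```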

```python
assert findings[1].rule_id == "CWE-787"


def test_findings_deduplication(sarif_store, sarif_with_findings):
```

P3 — Dedup test only covers same-SARIF case.

This stores the identical SARIF broadcast twice, but the primary real-world dedup scenario is two different SARIFs containing findings with the same fingerprint (same rule+location, different sarif_id). Consider adding a test with a second SARIF broadcast that has an overlapping finding to verify cross-SARIF deduplication.
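A minimal illustration of the cross-SARIF case — the fingerprint construction (rule + location, excluding `sarif_id`) and the helper names are assumptions about the PR's fixtures, not its code:

```python
# Sketch: two different SARIF broadcasts report the same defect; fingerprints
# built from rule + location (not sarif_id) collide, so the pool keeps one.
def make_finding(rule_id, path, line, sarif_id):
    return {"rule_id": rule_id,
            "fingerprint": f"{rule_id}:{path}:{line}",  # sarif_id excluded
            "sarif_id": sarif_id}


def dedup(findings):
    seen, pool = set(), []
    for f in findings:
        if f["fingerprint"] in seen:
            continue
        seen.add(f["fingerprint"])
        pool.append(f)
    return pool


sarif_a = [make_finding("CWE-787", "src/parse.c", 42, "sarif-a")]
sarif_b = [make_finding("CWE-787", "src/parse.c", 42, "sarif-b"),  # overlaps a
           make_finding("CWE-416", "src/free.c", 10, "sarif-b")]
pool = dedup(sarif_a + sarif_b)
assert len(pool) == 2                    # overlapping finding stored once
assert pool[0]["sarif_id"] == "sarif-a"  # first broadcast wins
```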
