Split SARIF broadcasts into a deduplicated finding pool#506
Extract individual findings from SARIF results into a separate Redis pool with fingerprint-based deduplication, and switch seed-gen vuln discovery to consume findings instead of raw SARIF broadcasts. Findings are formatted as compact XML hints in prompts, and sampling probability now scales with pool size. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hbrodin
left a comment
Review: 7 findings (5 P2, 2 P3)
The PR's core design is solid — extracting structured findings from raw SARIF and deduplicating via fingerprints is a good simplification for the seed-gen pipeline. The issues below are mostly around atomicity of the Redis dedup pattern, a consistency bug in the backward-compat fallback, and missing test coverage for the new methods.
See inline comments for details.
```python
if self.redis.sismember(seen_key, finding.fingerprint):
    continue
self.redis.rpush(finding_key, finding.model_dump_json())
self.redis.sadd(seen_key, finding.fingerprint)
```
P2 — Race condition: non-atomic dedup allows duplicate findings.
The sismember → rpush → sadd sequence is three independent Redis commands. Two concurrent store() calls for the same task_id can both pass the sismember check before either executes sadd, causing the same finding to be rpushed twice.
The rest of the codebase uses redis.pipeline() for atomic multi-key writes (e.g. maps.py:67, task_registry.py:99, sets.py:121).
Simplest fix — use the `sadd` return value as the atomic guard:

```python
for finding in findings:
    if not self.redis.sadd(seen_key, finding.fingerprint):
        continue
    self.redis.rpush(finding_key, finding.model_dump_json())
    added += 1
```

`sadd` returns 1 if the member was added (new), 0 if it already existed. This eliminates the TOCTOU window in a single atomic check and also fixes the operation ordering (the guard is set before the list push, so a crash between the two can't leave an unguarded duplicate).
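To make the return-value semantics concrete, here is the guard pattern run against a tiny in-memory stand-in for the two Redis commands (`FakeRedis`, `store_findings`, and the key/fingerprint values are illustrative only; redis-py's real `sadd` has the same 0/1 return behavior):

```python
class FakeRedis:
    """Minimal in-memory stand-in for the two Redis commands used here."""

    def __init__(self):
        self.sets: dict[str, set] = {}
        self.lists: dict[str, list] = {}

    def sadd(self, key, member) -> int:
        s = self.sets.setdefault(key, set())
        if member in s:
            return 0  # already present — matches redis-py semantics
        s.add(member)
        return 1

    def rpush(self, key, value) -> int:
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])


def store_findings(r, seen_key, finding_key, fingerprints) -> int:
    added = 0
    for fp in fingerprints:
        # sadd is the single atomic check-and-set: 0 means duplicate, skip
        if not r.sadd(seen_key, fp):
            continue
        r.rpush(finding_key, fp)
        added += 1
    return added


r = FakeRedis()
print(store_findings(r, "seen", "pool", ["a", "b", "a"]))  # 2 (second "a" deduped)
print(store_findings(r, "seen", "pool", ["b", "c"]))       # 1 ("b" deduped)
print(r.lists["pool"])                                     # ['a', 'b', 'c']
```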
```python
if all_findings:
    self._store_findings(task_id, all_findings)

return all_findings
```
P2 — Fallback returns un-deduplicated findings.
The primary path (line 220) returns from the deduplicated Redis pool, but this fallback returns the raw all_findings list. If two stored SARIFs contain findings with the same fingerprint, the first call returns duplicates (while all subsequent calls return the deduplicated pool). This inconsistency propagates to sample_findings(), inflating both the probability calculation (len(self.findings)) and the sampling pool.
Suggested fix — re-read from Redis after storing, or deduplicate before returning:

```python
if all_findings:
    self._store_findings(task_id, all_findings)
    # Return from the deduplicated pool to match the primary path
    return self.get_findings_by_task_id(task_id)
return []
```

(This is safe from infinite recursion because `_store_findings` populates the finding key, so the recursive call hits the primary `if finding_list:` path.)
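If the extra Redis round-trip on the fallback path is undesirable, an order-preserving local dedup would also keep the two paths consistent. A sketch — the `Finding` dataclass and `dedup_by_fingerprint` helper are hypothetical stand-ins; only the `fingerprint` attribute comes from the PR:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Minimal stand-in for the PR's Finding model."""
    fingerprint: str
    rule_id: str


def dedup_by_fingerprint(findings):
    """Keep the first finding per fingerprint, preserving insertion order."""
    seen = {}
    for f in findings:
        seen.setdefault(f.fingerprint, f)  # first occurrence wins
    return list(seen.values())


pool = [
    Finding("fp1", "CWE-79"),
    Finding("fp2", "CWE-787"),
    Finding("fp1", "CWE-79"),  # cross-SARIF duplicate
]
print([f.fingerprint for f in dedup_by_fingerprint(pool)])  # ['fp1', 'fp2']
```

This relies on dicts preserving insertion order (guaranteed since Python 3.7), so the returned list matches what the deduplicated Redis pool would contain.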
| """ | ||
| # Get all SARIF keys in Redis | ||
| if added > 0: | ||
| logger.info("Added %d new findings for task %s (total pool: %d)", added, task_id, added) |
P2 — Log message says "total pool" but reports the batch count.
The third format arg is added again, not the actual pool size. If 2 findings are added to a pool that already has 10, the log says total pool: 2.
| logger.info("Added %d new findings for task %s (total pool: %d)", added, task_id, added) | |
| logger.info("Added %d new findings for task %s (total pool: %d)", added, task_id, self.redis.llen(finding_key)) |
```python
if pool_size == 0:
    return []

probability = min(1.0, self.MIN_FINDING_PROBABILITY + 0.14 * pool_size)
```
P3 — Docstring/formula mismatch; magic constant.
The docstring says probability starts at MIN_FINDING_PROBABILITY (0.3) for 1 finding, but 0.3 + 0.14 * 1 = 0.44. The formula reaches 1.0 at pool_size=5 as intended, but the base case doesn't match the docs.
If the intent is p=0.3 at 1 finding and p=1.0 at 5 findings:

```python
scale = (1.0 - self.MIN_FINDING_PROBABILITY) / 4  # 0.175
probability = min(1.0, self.MIN_FINDING_PROBABILITY + scale * (pool_size - 1))
```

Deriving the coefficient from the constants also eliminates the magic 0.14.
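A quick numeric check of the derived formula, assuming `MIN_FINDING_PROBABILITY` is 0.3 as the docstring states:

```python
MIN_FINDING_PROBABILITY = 0.3  # value stated in the docstring
scale = (1.0 - MIN_FINDING_PROBABILITY) / 4  # 0.175


def probability(pool_size: int) -> float:
    """Linear ramp from 0.3 at pool_size=1 to 1.0 at pool_size=5, clamped."""
    return min(1.0, MIN_FINDING_PROBABILITY + scale * (pool_size - 1))


for n in (1, 3, 5, 10):
    print(n, round(probability(n), 3))  # 1 → 0.3, 3 → 0.65, 5 → 1.0, 10 → 1.0 (clamped)
```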
| logger.info("Using %d SARIFs for challenge %s", len(self.sarifs), self.package_name) | ||
| return self.sarifs | ||
| return [] | ||
| def sample_findings(self) -> list[Finding]: |
P2 — No test coverage for sample_findings().
This method has non-trivial logic (probability scaling, MAX_FINDINGS cap, random.sample) but zero tests. Both vuln_discovery test fixtures pass findings=[], so sample_findings() always returns [] in tests.
Edge cases worth covering: empty pool → [], pool_size=1 (probability=0.44), pool_size > MAX_FINDINGS (capped sampling), pool_size=5+ (probability clamped to 1.0). Use unittest.mock.patch('random.random') and patch('random.sample') to make the tests deterministic.
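A sketch of what those deterministic tests could look like, run against a minimal stand-in for the sampling logic (the real method lives on the store class; `MAX_FINDINGS = 3` and the free-function signature are assumptions here):

```python
import random
from unittest import mock

# Stand-in constants and logic mirroring the review's description of
# sample_findings(); the PR's actual values and signature may differ.
MIN_FINDING_PROBABILITY = 0.3
MAX_FINDINGS = 3  # hypothetical cap


def sample_findings(findings):
    pool_size = len(findings)
    if pool_size == 0:
        return []
    probability = min(1.0, MIN_FINDING_PROBABILITY + 0.14 * pool_size)
    if random.random() > probability:
        return []  # probabilistic skip
    return random.sample(findings, min(pool_size, MAX_FINDINGS))


# Pin both sources of randomness to make the tests deterministic.
with mock.patch("random.random", return_value=0.0), \
     mock.patch("random.sample", side_effect=lambda pop, k: list(pop)[:k]):
    assert sample_findings([]) == []                       # empty pool
    assert sample_findings(["f1"]) == ["f1"]               # pool_size=1 sampled
    assert sample_findings(["f1", "f2", "f3", "f4"]) == ["f1", "f2", "f3"]  # capped

with mock.patch("random.random", return_value=0.99):
    # probability at pool_size=1 is 0.44, so a 0.99 roll skips sampling
    assert sample_findings(["f1"]) == []
```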
```python
def format_sarif_hints(self) -> str:
    """Format SARIF hints for prompts"""
    if not self.sarifs:

def format_finding_hints(self) -> str:
```
P2 — No test coverage for format_finding_hints().
This method builds structured XML from Finding objects and is called by both Full and Delta tasks' _gather_context and _analyze_bug prompts. Since test fixtures pass findings=[], this always returns "" in tests — the XML formatting, multi-line vs single-line display, and field interpolation are never exercised.
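A test here only needs a handful of string assertions. The sketch below uses a hypothetical hint shape and `Finding` stand-in — the PR's actual tags and fields may differ — but it shows the kind of coverage worth adding:

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Minimal stand-in for the PR's Finding model."""
    rule_id: str
    file: str
    line: int
    message: str


# Hypothetical shape of the compact XML hints; only illustrates
# what the assertions in a real test would check.
def format_finding_hints(findings):
    if not findings:
        return ""
    parts = ["<findings>"]
    for f in findings:
        parts.append(
            f'  <finding rule="{f.rule_id}" file="{f.file}" line="{f.line}">'
            f"{f.message}</finding>"
        )
    parts.append("</findings>")
    return "\n".join(parts)


hints = format_finding_hints([Finding("CWE-787", "src/parse.c", 42, "OOB write")])
assert hints.startswith("<findings>") and "CWE-787" in hints
assert format_finding_hints([]) == ""  # empty pool yields no hint block
```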
```python
assert findings[1].rule_id == "CWE-787"


def test_findings_deduplication(sarif_store, sarif_with_findings):
```
P3 — Dedup test only covers same-SARIF case.
This stores the identical SARIF broadcast twice, but the primary real-world dedup scenario is two different SARIFs containing findings with the same fingerprint (same rule+location, different sarif_id). Consider adding a test with a second SARIF broadcast that has an overlapping finding to verify cross-SARIF deduplication.
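The scenario is easy to express once fingerprints ignore `sarif_id`. The sketch below uses a hypothetical fingerprint scheme (rule + location, hashed) to show why findings from two distinct broadcasts should collapse to one pool entry:

```python
import hashlib


# Hypothetical fingerprint scheme (rule id + file + line, sarif_id excluded);
# the PR's actual fingerprint function may differ.
def fingerprint(rule_id: str, file: str, line: int) -> str:
    return hashlib.sha256(f"{rule_id}:{file}:{line}".encode()).hexdigest()


# Two findings from *different* SARIF broadcasts, same rule + location:
a = {"sarif_id": "sarif-1", "fp": fingerprint("CWE-787", "src/parse.c", 42)}
b = {"sarif_id": "sarif-2", "fp": fingerprint("CWE-787", "src/parse.c", 42)}
assert a["fp"] == b["fp"]  # cross-SARIF duplicates share a fingerprint

# A different location must not collide:
c = {"sarif_id": "sarif-2", "fp": fingerprint("CWE-787", "src/parse.c", 43)}
assert a["fp"] != c["fp"]
```

A cross-SARIF test would store both broadcasts and assert the pool length is 2 (one shared finding deduplicated, one unique).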
CC @reytchison