Skip to content

[FEAT][RACE DETECTOR] Add effect-based race classification and concrete HB solver#345

Open
mark14wu wants to merge 5 commits intoadd_atomic_metadata_plumbing_and_return__bec4from
add_effect_based_race_classification_and_28bc
Open

[FEAT][RACE DETECTOR] Add effect-based race classification and concrete HB solver#345
mark14wu wants to merge 5 commits intoadd_atomic_metadata_plumbing_and_return__bec4from
add_effect_based_race_classification_and_28bc

Conversation

@mark14wu
Copy link
Copy Markdown
Collaborator

@mark14wu mark14wu commented Mar 17, 2026

Add effect-based race classification and concrete HB solver

Replace hard-coded _classify_race with effects_at_addr for mixed
atomic/plain pairs (e.g. failed CAS = read-only). Add HBSolver using
graph reachability over po+sw edges to suppress races proven ordered
by release/acquire synchronization. Integrate into detect_races with
caching and legacy_atomic exclusion.

PR chain

  1. [FEAT][RACE DETECTOR] Add atomic metadata plumbing and return value capture #344
  2. 👉 [FEAT][RACE DETECTOR] Add effect-based race classification and concrete HB solver #345 👈 YOU ARE HERE
  3. [FEAT][RACE DETECTOR] Add symbolic path HB solver #346
  4. [FEAT][RACE DETECTOR] Add atomic sync examples and end-to-end tests #347

⚠️⚠️ Please do not click the green "merge" button unless you know what
you're doing. This PR is part of a chain of PRs, and clicking the merge
button will not merge it into master. ⚠️⚠️

@mark14wu mark14wu changed the title Add effect-based race classification and concrete HB solver [FEAT][RACE DETECTOR] PR2: Add effect-based race classification and concrete HB solver Mar 17, 2026
@mark14wu mark14wu changed the title [FEAT][RACE DETECTOR] PR2: Add effect-based race classification and concrete HB solver [FEAT][RACE DETECTOR] Add effect-based race classification and concrete HB solver Mar 17, 2026
mark14wu and others added 2 commits March 18, 2026 15:53
Replace hard-coded _classify_race with effects_at_addr for mixed
atomic/plain pairs (e.g. failed CAS = read-only). Add HBSolver using
graph reachability over po+sw edges to suppress races proven ordered
by release/acquire synchronization. Integrate into detect_races with
caching and legacy_atomic exclusion.

git-pr-chain: add_effect_based_race_classification_and_28bc
sw edges were created for any release/acquire pair on the same address,
over-suppressing races when the acquire didn't observe the release's
write. Now sw edges require reads-from proof: the acquire's atomic_old
must match the value written by the release.

Also updates RaceType enum from RW/WAW to RAW/WAR/WAW for finer-grained
classification, and adds reads-from helper functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mark14wu mark14wu force-pushed the add_effect_based_race_classification_and_28bc branch from 6de4922 to fbb62ce Compare March 18, 2026 20:14
mark14wu and others added 3 commits March 18, 2026 19:38
- _written_value_at_addr now computes the actual written value for RMW ops
  (e.g., add(5) with old=10 → written=15) instead of returning the operand
- sw edges are only created when the written value uniquely identifies the
  writer, preventing unsound edges when multiple writers produce the same value
- Add _rmw_op_suffix and _compute_rmw_written helpers for all RMW op types
- Add 3 regression tests: ABA ambiguity, add written value, operand mismatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ambiguity scan in check_race_possible() only skipped the candidate
writer (wi) but not the reader (ri). For a polling CAS like cas(1,1),
_written_value_at_addr() returns the reader's own writeback value,
which the scan treated as a third-party same-value writer, marking it
ambiguous and blocking the sw edge. This broke producer-consumer
patterns using CAS polling on the acquire side.

Fix: extract _has_ambiguous_writer() helper that skips both wi and ri,
and reorder to check _reads_from first as a cheap gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CTA scope is block-local and cannot synchronize across blocks, but
_detect_global_barrier_addresses(), _is_barrier_atomic(), and
_detect_symbolic_barrier_keys() only checked op kind + address, not
scope. This caused CTA-scoped add+cas at a shared address to be
misidentified as a global barrier, incorrectly splitting epochs and
altering the race candidate set.

Fix: filter to scope in ("gpu", "sys", None) in all three functions,
consistent with the HB solver's _scope_ok() gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant