Problem Summary
The current BP decoder implementation shows only 2% improvement over a simple syndrome-parity baseline decoder, which is far below expected performance based on published research. After analyzing the code and comparing against recent literature, I've identified fundamental architectural issues that prevent the decoder from working properly.
Expected vs. Actual Performance
What We Should See (from Literature)
Based on recent 2024 research on BP decoders for surface codes:
What We Actually See
At p=0.03 (3% physical error):
| Decoder | LER | Precision | Recall | F1 |
|---|---|---|---|---|
| Syndrome-parity baseline | 50.6% | 35.6% | 50.8% | 0.418 |
| BP Decoder | 49.6% | 36.1% | 50.3% | 0.421 |
| Improvement | 2.0% | - | - | - |
This is broken! At 3% physical error, we're getting ~50% logical error rate, which means the decoder is barely better than random guessing.
Root Cause Analysis
I've identified three fundamental architectural issues:
Issue 1: Wrong Factor Graph Structure
Current UAI Model (from `dem_to_uai()`):

```python
# Lines 174-182 in dem.py
n_detectors = dem.num_detectors    # 24 for d=3
lines.append(str(n_detectors))     # Variables = DETECTORS
lines.append(str(len(errors)))     # Factors = ERROR MECHANISMS
```
This creates:
- Variables = Detectors (binary: fired=1, not fired=0)
- Factors = Error mechanisms (one factor per DEM error)
The Problem:
- This models P(detectors | errors), the forward model
- For decoding, we need P(errors | syndrome), the inverse model
- Setting detectors as "evidence" makes no sense—we already KNOW the detector states from the syndrome!
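The mismatch is easy to see on a toy example. The 3×4 parity-check matrix `H` below is hypothetical (not taken from `dem.py`), but it shows why the forward direction has nothing left to infer while the inverse direction does:

```python
import numpy as np
from itertools import product

# Hypothetical toy parity-check matrix H (detectors x errors):
# H[i, j] = 1 means error mechanism j flips detector i.
H = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=np.uint8)

# Forward direction (what the current graph encodes): given the errors,
# the detector pattern is fully determined -- nothing is left to infer.
e = np.array([0, 1, 0, 0], dtype=np.uint8)
syndrome = (H @ e) % 2                      # -> array([1, 1, 0])

# Inverse direction (what decoding needs): given only the syndrome,
# several error configurations remain consistent, and the decoder's
# job is to weigh them by their prior probabilities.
consistent = [cand for cand in product([0, 1], repeat=H.shape[1])
              if np.array_equal((H @ np.array(cand)) % 2, syndrome)]
# Here both (0, 1, 0, 0) and (1, 0, 1, 1) explain the syndrome.
```

Enumerating configurations only works at toy scale, of course; BP exists precisely to do this weighting implicitly on the full graph.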
Issue 2: Inference on the Wrong Quantity
What the code does:
```python
# examples/tanner_graph_walkthrough.py, line 438
evidence = {det_idx + 1: int(syndrome[det_idx]) for det_idx in range(len(syndrome))}
bp_ev = apply_evidence(bp, evidence)
state, info = belief_propagate(bp_ev, ...)
marginals = compute_marginals(state, bp_ev)  # Marginals over DETECTORS!
```
The Problem:
- After setting evidence, BP computes marginals over detector variables
- But we already know the detector values (from the syndrome)!
- What we don't know is which errors occurred
- The marginals over detectors are useless for decoding
Issue 3: Decoding Logic Outside BP
Current approach:
```python
# Lines 457-499 in examples/tanner_graph_walkthrough.py
# Build error likelihood scores based on syndrome
for det_idx in syndrome_active:
    errors_for_detector = np.where(H[det_idx, :] == 1)[0]
    error_likelihoods[errors_for_detector] *= 2.0  # Ad-hoc boost

# Apply parity heuristic
if syndrome_weight % 2 == 1:
    flip_likelihood_avg *= 3.0  # Ad-hoc multiplier
```
The Problem:
- The "decoding" happens outside BP using ad-hoc heuristics
- This completely bypasses the power of belief propagation!
- The 2% improvement comes from these heuristics, not from BP
- It's essentially a fancy wrapper around syndrome-parity decoding
What a Proper BP Decoder Should Do
A correct BP decoder for surface codes should:
1. Correct Factor Graph
- Variables = error mechanisms (binary: occurred=1, didn't occur=0)
- Factors = parity-check constraints (one per detector)
- Evidence = the observed syndrome s
2. Proper Inference
- Enforce the syndrome constraints s_i = ⊕_{j: H[i,j]=1} e_j (parity checks)
- Run BP to compute marginal probabilities over errors
- Decode by finding most likely error configuration
3. Maximum Likelihood Decoding
```
error_config = argmax_e P(e | s)
             = argmax_e P(s | e) · P(e)    [Bayes' rule; P(s) is constant]
             = argmax_e [∏_i δ(s_i = ⊕_j H[i,j] e_j)] · [∏_j p_j^{e_j} (1-p_j)^{1-e_j}]
```
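This constrained inference can be sketched in a few dozen lines of sum-product BP in the log-likelihood-ratio domain. This is a minimal illustration, not the repo's API: the name `bp_syndrome_decode`, the scalar error probability `p`, and the dense-matrix message layout are all assumptions made for the sketch.

```python
import numpy as np

def bp_syndrome_decode(H, syndrome, p, iters=50):
    """Sum-product BP on the Tanner graph of H (illustrative sketch).

    Variables are error mechanisms, factors are the parity checks
    s_i = XOR_j H[i,j] e_j.  Messages are log-likelihood ratios
    log P(e_j = 0) / P(e_j = 1); a negative final LLR reads "error occurred".
    """
    m, n = H.shape
    prior = np.log((1 - p) / p)                # prior LLR per error (p scalar)
    msg_vc = np.where(H == 1, prior, 0.0)      # variable -> check messages
    msg_cv = np.zeros((m, n))                  # check -> variable messages
    sign = (-1.0) ** syndrome                  # fired checks flip the sign
    e_hat = np.zeros(n, dtype=np.uint8)
    for _ in range(iters):
        # Check -> variable: tanh rule over the *other* incident variables;
        # the observed syndrome bit enters only through the sign.
        t = np.where(H == 1, np.tanh(np.clip(msg_vc, -30, 30) / 2.0), 1.0)
        for i in range(m):
            row_prod = np.prod(t[i])
            for j in np.flatnonzero(H[i]):
                extrinsic = np.clip(row_prod / t[i, j], -0.999999, 0.999999)
                msg_cv[i, j] = sign[i] * 2.0 * np.arctanh(extrinsic)
        # Variable -> check: prior plus all *other* incoming check messages.
        total = prior + msg_cv.sum(axis=0)
        msg_vc = np.where(H == 1, total[None, :] - msg_cv, 0.0)
        e_hat = (total < 0).astype(np.uint8)
        if np.array_equal((H @ e_hat) % 2, syndrome):
            break                              # hard decision satisfies s
    return e_hat
```

On a tree-shaped Tanner graph this is exact; on the loopy graphs of surface codes plain BP is approximate and degenerate, which is exactly why the OSD post-processing and the modern variants listed under Option 3 exist.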
Proposed Solutions
Option 1: Build Correct Factor Graph
Rewrite `dem_to_uai()` to create:
- Variables: error mechanisms (286 binary variables for d=3, r=3)
- Factors: parity constraints (24 factors, one per detector)
- Each factor enforces `detector_i = XOR(errors affecting detector i)`
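For small detector degrees, such a parity factor can be written out as an explicit table, which is the natural form for a UAI model. A minimal sketch (the helper name `parity_factor_table` is an assumption, not an existing function in the repo):

```python
import numpy as np
from itertools import product

def parity_factor_table(k, s_bit):
    """Dense truth table for one detector's parity factor (illustrative).

    The factor touches the k error variables incident on a detector and
    gives weight 1.0 to assignments whose XOR equals the observed syndrome
    bit s_bit, weight 0.0 otherwise -- a hard delta constraint.
    """
    table = np.zeros((2,) * k)
    for assignment in product([0, 1], repeat=k):
        if sum(assignment) % 2 == s_bit:
            table[assignment] = 1.0
    return table
```

The dense table is exponential in the detector degree, so production decoders keep parity checks implicit and evaluate them analytically inside BP (the tanh rule); the table form is mainly useful for emitting small UAI factors and for unit-testing the graph construction.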
Option 2: Use Existing Tools
Consider using proven implementations:
- PyMatching (GitHub) - MWPM decoder, ~1% threshold
- LDPC (bp_osd) - BP+OSD implementation
- Stim built-in decoders for benchmarking
Option 3: Implement Modern BP Variants
From the 2024 literature:
- EWAInit-BP: Adaptive prior probability updates
- MBP: Momentum-based belief propagation
- BlockBP: Outperforms MWPM by >10x in some regimes
Impact
This affects:
- The `pytorch_bp` module used for actual decoding
References
- Improved Belief Propagation Decoding on Surface Codes (2024)
- Quantum Error Correction Below the Surface Code Threshold
- Error Correction Zoo: Surface Code
- PyMatching: MWPM Decoder
- BP-OSD Decoder
Next Steps
I propose:
- Document the problem (this issue)
- Add a warning to the current tutorial about its performance limitations
- Design the correct factor graph structure
- Implement a proper BP decoder
- Benchmark against PyMatching to validate
Should I proceed with designing the corrected architecture?
Labels: bug, enhancement, research, decoder
Priority: High - affects core functionality