perf: C++ hot-path optimizations (LUT encoding, buffer reuse, misc) by aviezerl · Pull Request #83 · tanaylab/misha

aviezerl · 2026-04-03T08:07:26Z

Summary

- Replace switch-based DNA base encoding with compile-time lookup tables in DnaPSSM and GseqString, eliminating branch mispredictions in per-base hot loops
- Replace string-based mode comparisons with enum dispatch in the gseq.pwm inner loop
- Reuse the DP buffer across calls in PWMEditDistanceScorer::compute_with_indels
- Reuse the bin_vals buffer in GenomeTrackFixedBin instead of reallocating it per chromosome
- GIntervals::range(chromid): use the chrom map to visit only the k intervals on that chromosome, O(k), instead of scanning all n intervals, O(n)
- GenomeTrackSmooth: replace modulo with a conditional increment in the per-sample path
- GenomeTrackArrays: bulk-read array values in a single call instead of 2×N separate reads

Test plan

- All 17,801 tests pass (alutil::tst(parallel=TRUE))
- Clean compilation with no warnings

Replace all switch-statement character encoding in the PWM scoring hot loops with static 256-entry lookup tables (BASE_ENCODE, COMPLEMENT_ENCODE, NEUTRAL_CHAR). This eliminates the indirect branches, which on random DNA sequences are predicted correctly only ~25% of the time.

Key changes:
- Add a DnaLookupTables struct with the three static lookup tables in DnaPSSM.h
- Replace all encode()/get_log_prob(char) calls with direct table lookups
- Fix the double switch in calc_like_rc (a complement switch followed by an encode switch)
- Fix a latent bug: 'case h' was used instead of 'case g' in the integrate_energy RC path

Benchmarks (50K sequences x 400bp, bidirectional):
- gseq.pwm max mode: 15.5s -> 0.76s (20x speedup)
- gseq.pwm lse mode: 16.6s -> 1.88s (8.8x speedup)
- vtrack forward: 35.5s -> 2.48s (14.3x speedup)
- vtrack bidirect: 42.3s -> 2.68s (15.8x speedup)

All 1220 PWM tests pass. All 17800 tests pass.
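A minimal sketch of the lookup-table approach, assuming C++17 and an A=0, C=1, G=2, T=3 encoding with a sentinel for every other byte. The encoding, the `DnaLutSketch` holder, `Pssm`, and both scoring functions are illustrative, not the code in DnaPSSM.h. The tables are filled by `constexpr` functions, so they exist at compile time, and the reverse-strand loop does complement and encode in one lookup, which is the idea behind the calc_like_rc fix:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel for bytes that are not A/C/G/T (e.g. 'N').
constexpr std::uint8_t kInvalidBase = 4;

// Assumed encoding for this sketch only: A=0, C=1, G=2, T=3, case-insensitive.
constexpr std::array<std::uint8_t, 256> make_base_encode() {
    std::array<std::uint8_t, 256> t{};
    for (auto &v : t) v = kInvalidBase;
    t['A'] = t['a'] = 0;
    t['C'] = t['c'] = 1;
    t['G'] = t['g'] = 2;
    t['T'] = t['t'] = 3;
    return t;
}

// Maps a base straight to the index of its complement (A<->T, C<->G),
// so reverse-strand scoring needs one lookup instead of two switches.
constexpr std::array<std::uint8_t, 256> make_complement_encode() {
    std::array<std::uint8_t, 256> t{};
    for (auto &v : t) v = kInvalidBase;
    t['A'] = t['a'] = 3;
    t['C'] = t['c'] = 2;
    t['G'] = t['g'] = 1;
    t['T'] = t['t'] = 0;
    return t;
}

// Hypothetical holder; the PR's DnaLookupTables also has a NEUTRAL_CHAR
// table, which is omitted here.
struct DnaLutSketch {
    static constexpr std::array<std::uint8_t, 256> BASE_ENCODE = make_base_encode();
    static constexpr std::array<std::uint8_t, 256> COMPLEMENT_ENCODE = make_complement_encode();
};

static_assert(DnaLutSketch::BASE_ENCODE['G'] == 2, "tables are built at compile time");

// Log-probabilities per motif position, indexed by encoded base.
using Pssm = std::vector<std::array<float, 4>>;

// Forward-strand score of the window starting at seq (length = motif length).
inline float score_forward(const char *seq, const Pssm &logp) {
    float s = 0.0f;
    for (std::size_t i = 0; i < logp.size(); ++i) {
        const std::uint8_t b = DnaLutSketch::BASE_ENCODE[static_cast<unsigned char>(seq[i])];
        // This check almost always goes the same way on real sequence, so it
        // predicts well, unlike the 4-way switch it replaces.
        if (b != kInvalidBase) s += logp[i][b];
    }
    return s;
}

// Reverse-complement score: motif position i pairs with the complement
// of seq[n - 1 - i].
inline float score_revcomp(const char *seq, const Pssm &logp) {
    const std::size_t n = logp.size();
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t b =
            DnaLutSketch::COMPLEMENT_ENCODE[static_cast<unsigned char>(seq[n - 1 - i])];
        if (b != kInvalidBase) s += logp[i][b];
    }
    return s;
}
```

The cast to `unsigned char` before indexing matters: where `char` is signed, bytes above 127 would otherwise become negative indices.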

Convert PWM scoring mode dispatch from per-position string comparisons (mode == "lse", "max", etc.) to a pre-parsed enum (PwmMode). The string is parsed once at function entry, then integer comparison is used in the inner loop. Applied to both gap and non-gap code paths.
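To illustrate the parse-once pattern, here is a sketch restricted to the two modes named above. `combine_scores` and the `LSE`/`MAX` enumerators are hypothetical, and the real `PwmMode` may define more values. The string comparison runs once; the inner loop switches on an integer whose value never changes inside the loop, so that branch is trivially predictable:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative subset of modes; the real PwmMode enum may define others.
enum class PwmMode { LSE, MAX };

// Parse the user-facing mode string once, at function entry.
inline PwmMode parse_pwm_mode(const std::string &mode) {
    if (mode == "lse") return PwmMode::LSE;
    if (mode == "max") return PwmMode::MAX;
    throw std::invalid_argument("unknown PWM mode: " + mode);
}

// Numerically stable log(exp(a) + exp(b)), treating -inf as an empty sum.
inline double log_add(double a, double b) {
    const double neg_inf = -std::numeric_limits<double>::infinity();
    if (a == neg_inf) return b;
    if (b == neg_inf) return a;
    const double m = std::max(a, b);
    return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// Combine per-position scores; the loop switches on an enum instead of
// comparing strings at every position.
inline double combine_scores(const std::vector<double> &scores, const std::string &mode_str) {
    const PwmMode mode = parse_pwm_mode(mode_str);
    double acc = -std::numeric_limits<double>::infinity();
    for (double s : scores) {
        switch (mode) {
        case PwmMode::LSE: acc = log_add(acc, s); break;
        case PwmMode::MAX: acc = std::max(acc, s); break;
        }
    }
    return acc;
}
```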

Replace per-window heap allocation of the 3D DP table with a reusable member buffer (m_dp_buffer). The buffer grows to the maximum needed size and is reused across subsequent windows, eliminating thousands of malloc/free cycles per interval.
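A sketch of the grow-only buffer pattern, assuming the 3D DP table is stored as one flat row-major array. `DpScratchScorer`, `prepare` and `at` are hypothetical names, and the edit-distance recurrence itself is omitted:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical scorer showing only the buffer-reuse pattern; the actual
// PWMEditDistanceScorer recurrence is not reproduced here.
class DpScratchScorer {
public:
    // Prepare a d0 x d1 x d2 table for the next window. The member buffer is
    // grown only when a window needs more cells than any earlier one.
    void prepare(std::size_t d0, std::size_t d1, std::size_t d2) {
        m_d1 = d1;
        m_d2 = d2;
        const std::size_t needed = d0 * d1 * d2;
        if (m_dp_buffer.size() < needed) {
            m_dp_buffer.resize(needed);
        }
        // Reset only the cells this window uses.
        std::fill(m_dp_buffer.begin(),
                  m_dp_buffer.begin() + static_cast<std::ptrdiff_t>(needed), kInf);
    }

    // Row-major access into the flat buffer.
    float &at(std::size_t i, std::size_t j, std::size_t k) {
        return m_dp_buffer[(i * m_d1 + j) * m_d2 + k];
    }

    std::size_t allocated_cells() const { return m_dp_buffer.size(); }
    const float *data() const { return m_dp_buffer.data(); }

private:
    static constexpr float kInf = std::numeric_limits<float>::infinity();
    std::vector<float> m_dp_buffer;  // reused across windows
    std::size_t m_d1 = 0;
    std::size_t m_d2 = 0;
};
```

One trade-off of keeping the buffer as a member: it holds on to its peak size, and a scorer instance can no longer be shared safely between threads, so each thread would need its own scorer.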

Replace per-call vector<float> allocations for bin reads with a reusable member scratch buffer (m_scratch_bin_vals). Eliminates heap allocations in the read_interval hot path for FixedBin tracks.
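The same idea applied to bin reads, as a sketch: `FixedBinTrackSketch` is a hypothetical in-memory stand-in for the real track class, and the mean-over-bins reduction is only an example of what read_interval might compute. What matters is that `m_scratch_bin_vals` is resized on each call rather than constructed anew:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed-bin track: bin values live in memory
// instead of in a track file, so read_bins() is just a copy.
class FixedBinTrackSketch {
public:
    FixedBinTrackSketch(std::vector<float> bins, std::int64_t bin_size)
        : m_bins(std::move(bins)), m_bin_size(bin_size) {}

    // Mean of the non-NaN bins overlapping [start, end).
    float read_interval(std::int64_t start, std::int64_t end) {
        const std::int64_t sbin = start / m_bin_size;
        const std::int64_t ebin = (end + m_bin_size - 1) / m_bin_size;
        const std::size_t n = static_cast<std::size_t>(ebin - sbin);
        assert(static_cast<std::size_t>(ebin) <= m_bins.size());
        // resize() never shrinks capacity, so once the scratch buffer has
        // seen the largest interval, later calls do not allocate.
        m_scratch_bin_vals.resize(n);
        read_bins(static_cast<std::size_t>(sbin), n, m_scratch_bin_vals.data());

        double sum = 0.0;
        std::size_t count = 0;
        for (float v : m_scratch_bin_vals) {
            if (!std::isnan(v)) { sum += v; ++count; }
        }
        return count ? static_cast<float>(sum / count)
                     : std::numeric_limits<float>::quiet_NaN();
    }

    const float *scratch_data() const { return m_scratch_bin_vals.data(); }

private:
    void read_bins(std::size_t first, std::size_t n, float *out) const {
        std::copy_n(m_bins.begin() + static_cast<std::ptrdiff_t>(first), n, out);
    }

    std::vector<float> m_bins;
    std::int64_t m_bin_size;
    std::vector<float> m_scratch_bin_vals;  // reused across calls
};
```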

- GIntervals::range(chromid): use m_chrom2itr to iterate only the intervals of the requested chromosome, O(k), instead of scanning all intervals, O(n)
- GenomeTrackSmooth: replace modulo with a conditional increment in the per-sample hot path (avoids an integer division per sample)
- GenomeTrackArrays: bulk-read the ArrayVal array in one call instead of 2*N separate BufferedFile::read() calls (two per element)
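The three changes above can be sketched in one file. All names are hypothetical: `IntervalsSketch` uses a prefix-sum offset table in place of `m_chrom2itr` and assumes the intervals are sorted by chromosome; `MovingSum` stands in for the smoothing window; and `read_array_vals` uses `std::fread` as a stand-in for BufferedFile::read(), with an assumed two-field record layout that must match the file byte for byte:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// (1) Per-chromosome range. Assumes intervals are sorted by chromid; the
// offset table is a hypothetical stand-in for m_chrom2itr.
struct Interval {
    int chromid;
    std::int64_t start;
    std::int64_t end;
};

class IntervalsSketch {
public:
    IntervalsSketch(std::vector<Interval> sorted, int num_chroms)
        : m_intervals(std::move(sorted)),
          m_chrom_start(static_cast<std::size_t>(num_chroms) + 1, 0) {
        // Built once in O(n): count per chromosome, then prefix-sum.
        for (const Interval &iv : m_intervals) ++m_chrom_start[iv.chromid + 1];
        for (int c = 0; c < num_chroms; ++c) m_chrom_start[c + 1] += m_chrom_start[c];
    }

    // O(1) to locate; iterating the result costs O(k) for k intervals on chromid.
    std::pair<const Interval *, const Interval *> range(int chromid) const {
        const Interval *base = m_intervals.data();
        return {base + m_chrom_start[chromid], base + m_chrom_start[chromid + 1]};
    }

private:
    std::vector<Interval> m_intervals;
    std::vector<std::size_t> m_chrom_start;  // size num_chroms + 1
};

// (2) Moving sum over a ring buffer: the slot index wraps with a compare
// instead of `% window`, avoiding an integer division per sample.
class MovingSum {
public:
    explicit MovingSum(std::size_t window) : m_buf(window, 0.0) { assert(window > 0); }

    double push(double x) {
        m_sum += x - m_buf[m_pos];
        m_buf[m_pos] = x;
        if (++m_pos == m_buf.size()) m_pos = 0;  // instead of (m_pos + 1) % window
        return m_sum;
    }

private:
    std::vector<double> m_buf;
    std::size_t m_pos = 0;
    double m_sum = 0.0;
};

// (3) Bulk read: one fread for all records instead of two reads per element.
// Hypothetical record layout; valid only if the file uses exactly this
// layout and byte order.
struct ArrayValSketch {
    float val;
    std::uint32_t idx;
};
static_assert(sizeof(ArrayValSketch) == 8, "record must have no padding");

inline bool read_array_vals(std::FILE *f, std::size_t n, std::vector<ArrayValSketch> &out) {
    out.resize(n);
    return std::fread(out.data(), sizeof(ArrayValSketch), n, f) == n;
}
```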

aviezerl added 6 commits April 3, 2026 09:44

perf: reuse bin_vals buffer in GenomeTrackFixedBin

a90e4c5


docs: add NEWS.md entries for PWM optimization and case 'h' bug fix

fed7239

aviezerl merged commit 149d2f8 into master Apr 3, 2026
4 of 5 checks passed

aviezerl deleted the perf/pwm-lut-encoding branch April 3, 2026 08:27

aviezerl merged 6 commits into master from perf/pwm-lut-encoding

aviezerl commented Apr 3, 2026
