perf: C++ hot-path optimizations (LUT encoding, buffer reuse, misc) #83
Merged
Replace all switch-statement character encoding in PWM scoring hot loops with static 256-entry lookup tables (BASE_ENCODE, COMPLEMENT_ENCODE, NEUTRAL_CHAR). This eliminates indirect branches whose prediction accuracy is only ~25% on random DNA sequences.

Key changes:
- Add DnaLookupTables struct with three static lookup tables in DnaPSSM.h
- Replace all encode()/get_log_prob(char) calls with direct table lookups
- Fix double-switch in calc_like_rc (complement switch + encode switch)
- Fix latent bug: 'case h' instead of 'case g' in integrate_energy RC path

Benchmarks (50K sequences x 400bp, bidirectional):
- gseq.pwm max mode: 15.5s -> 0.76s (20x speedup)
- gseq.pwm lse mode: 16.6s -> 1.88s (8.8x speedup)
- vtrack forward: 35.5s -> 2.48s (14.3x speedup)
- vtrack bidirect: 42.3s -> 2.68s (15.8x speedup)

All 1220 PWM tests pass. All 17800 tests pass.
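For readers unfamiliar with the technique, here is a minimal sketch of table-based base encoding. It is not the code from this PR: `EncodeTable`, `make_table`, `encode_base`, `encode_complement`, `score_forward` and `score_reverse` are hypothetical names, and skipping non-ACGT characters is an assumption made only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative only: these names are hypothetical, not the PR's DnaLookupTables API.
struct EncodeTable {
    std::int8_t v[256];
};

// Build a 256-entry table: A/C/G/T (either case) map to the given codes,
// every other byte maps to -1.
constexpr EncodeTable make_table(int a, int c, int g, int t) {
    EncodeTable tbl{};
    for (int i = 0; i < 256; ++i) tbl.v[i] = -1;
    tbl.v['A'] = tbl.v['a'] = static_cast<std::int8_t>(a);
    tbl.v['C'] = tbl.v['c'] = static_cast<std::int8_t>(c);
    tbl.v['G'] = tbl.v['g'] = static_cast<std::int8_t>(g);
    tbl.v['T'] = tbl.v['t'] = static_cast<std::int8_t>(t);
    return tbl;
}

constexpr EncodeTable kBaseEncode = make_table(0, 1, 2, 3);
// Complement and encode in one lookup: A->T(3), C->G(2), G->C(1), T->A(0).
constexpr EncodeTable kComplementEncode = make_table(3, 2, 1, 0);

inline int encode_base(char ch) {
    return kBaseEncode.v[static_cast<unsigned char>(ch)];
}

inline int encode_complement(char ch) {
    return kComplementEncode.v[static_cast<unsigned char>(ch)];
}

using PwmRow = std::array<float, 4>;  // per-position scores for A, C, G, T

// Forward-strand score of the window starting at seq[0].
// Assumption: non-ACGT characters contribute nothing.
inline float score_forward(const std::vector<PwmRow>& pwm, const std::string& seq) {
    assert(seq.size() >= pwm.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < pwm.size(); ++i) {
        const int b = encode_base(seq[i]);
        if (b >= 0) total += pwm[i][static_cast<std::size_t>(b)];
    }
    return total;
}

// Reverse-complement score of the same window: motif position i reads the
// complement of seq[n - 1 - i], using a single table lookup per base.
inline float score_reverse(const std::vector<PwmRow>& pwm, const std::string& seq) {
    const std::size_t n = pwm.size();
    assert(seq.size() >= n);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const int b = encode_complement(seq[n - 1 - i]);
        if (b >= 0) total += pwm[i][static_cast<std::size_t>(b)];
    }
    return total;
}
```

The remaining `b >= 0` check is a two-way branch that is almost always taken on clean sequence, unlike a four-way switch on the base itself.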
Convert PWM scoring mode dispatch from per-position string comparisons (mode == "lse", "max", etc.) to a pre-parsed enum (PwmMode). The string is parsed once at function entry, then integer comparison is used in the inner loop. Applied to both gap and non-gap code paths.
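A rough sketch of the parse-once pattern described above. Only the `PwmMode` name and the "lse"/"max" strings come from this PR; the enumerators, `parse_mode`, `log_add` and `aggregate` are hypothetical and assume the mode arrives as a `std::string`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Enumerator names are assumptions; the PR only names the enum itself.
enum class PwmMode { Lse, Max };

// Parse the mode string once, at function entry.
inline PwmMode parse_mode(const std::string& s) {
    if (s == "lse") return PwmMode::Lse;
    if (s == "max") return PwmMode::Max;
    throw std::invalid_argument("unknown PWM mode: " + s);
}

// Numerically stable log(exp(a) + exp(b)).
inline double log_add(double a, double b) {
    const double hi = std::max(a, b);
    const double lo = std::min(a, b);
    if (std::isinf(hi) && hi < 0) return hi;  // both inputs are -inf
    return hi + std::log1p(std::exp(lo - hi));
}

// Combine per-position scores. The inner loop switches on an integer enum
// instead of comparing strings at every position.
inline double aggregate(const std::vector<double>& scores, const std::string& mode_str) {
    const PwmMode mode = parse_mode(mode_str);  // one string comparison pass
    double acc = -std::numeric_limits<double>::infinity();
    for (double s : scores) {
        switch (mode) {
            case PwmMode::Max: acc = std::max(acc, s); break;
            case PwmMode::Lse: acc = log_add(acc, s); break;
        }
    }
    return acc;
}
```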
Replace per-window heap allocation of the 3D DP table with a reusable member buffer (m_dp_buffer). The buffer grows to the maximum needed size and is reused across subsequent windows, eliminating thousands of malloc/free cycles per interval.
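The grow-only buffer idea can be sketched as follows. `DpScratch`, `acquire`, `at` and `capacity` are hypothetical names for illustration; only the member name `m_dp_buffer` is taken from this PR, and the flat row-major layout and zero-fill are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper around a reusable 3D DP table stored as a flat vector.
class DpScratch {
public:
    // Return zeroed storage for a d0 x d1 x d2 table. The vector only grows;
    // smaller requests reuse the existing allocation.
    float* acquire(std::size_t d0, std::size_t d1, std::size_t d2) {
        assert(d0 > 0 && d1 > 0 && d2 > 0);
        const std::size_t n = d0 * d1 * d2;
        if (n > m_dp_buffer.size()) m_dp_buffer.resize(n);
        std::fill_n(m_dp_buffer.begin(), n, 0.0f);  // clear only the part in use
        m_d1 = d1;
        m_d2 = d2;
        return m_dp_buffer.data();
    }

    // Row-major access into the table set up by the latest acquire().
    float& at(std::size_t i, std::size_t j, std::size_t k) {
        return m_dp_buffer[(i * m_d1 + j) * m_d2 + k];
    }

    std::size_t capacity() const { return m_dp_buffer.size(); }

private:
    std::vector<float> m_dp_buffer;
    std::size_t m_d1 = 0;
    std::size_t m_d2 = 0;
};
```

A scorer would keep one such object as a member and call `acquire` once per window, so a heap allocation happens only when a window needs a larger table than any seen before.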
Replace per-call vector<float> allocations for bin reads with a reusable member scratch buffer (m_scratch_bin_vals). Eliminates heap allocations in the read_interval hot path for FixedBin tracks.
- GIntervals::range(chromid): use m_chrom2itr for O(k) per-chrom iteration instead of scanning all intervals O(n)
- GenomeTrackSmooth: replace modulo with conditional increment in per-sample hot path (avoids integer division)
- GenomeTrackArrays: bulk-read ArrayVal array in one call instead of 2*N separate BufferedFile::read() calls per element
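As an illustration of the modulo-to-conditional-increment change, here is a small sketch on a ring buffer. `next_index` and `moving_sum` are hypothetical helpers, not code from GenomeTrackSmooth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Advance a circular index without '%': one compare instead of an
// integer division per sample.
inline std::size_t next_index(std::size_t i, std::size_t n) {
    ++i;
    return i == n ? 0 : i;
}

// Moving sum over the last `window` samples, kept in a ring buffer.
inline std::vector<double> moving_sum(const std::vector<double>& x, std::size_t window) {
    assert(window > 0);
    std::vector<double> ring(window, 0.0);
    std::vector<double> out;
    out.reserve(x.size());
    double sum = 0.0;
    std::size_t pos = 0;
    for (double v : x) {
        sum += v - ring[pos];  // add the new sample, drop the one it overwrites
        ring[pos] = v;
        pos = next_index(pos, window);
        out.push_back(sum);
    }
    return out;
}
```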
Summary
- Lookup-table encoding in `DnaPSSM` and `GseqString`: eliminates branch mispredictions in per-base hot loops
- Pre-parsed enum mode dispatch in the `gseq.pwm` inner loop
- Reusable DP buffer in `PWMEditDistanceScorer::compute_with_indels`
- Reuse the `bin_vals` buffer in `GenomeTrackFixedBin` instead of reallocating per chromosome
- `GIntervals::range(chromid)`: use chrom map for O(k) instead of scanning all intervals O(n)
- `GenomeTrackSmooth`: replace modulo with conditional increment in per-sample path
- `GenomeTrackArrays`: bulk-read array values in single call instead of 2×N separate reads

Test plan
alutil::tst(parallel=TRUE))