⚡ Bolt: Remove HashMap allocation in repetition penalty#3506
⚡ Bolt: Remove HashMap allocation in repetition penalty#3506EffortlessSteven wants to merge 2 commits intomainfrom
Conversation
- Replaced `HashMap` in `SamplingStrategy` with a `Vec<u32>` - Eliminated per-token memory allocation in `apply_repetition_penalty` - Used sliding window with `drain` to prevent memory growth
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Added disk cleanup step (removing unused runtimes and pruning docker images) to all heavy Ubuntu jobs in `property-tests.yml`. - Set `CARGO_PROFILE_TEST_DEBUG="0"` to reduce the size of generated test binaries.
💡 What
Replaced the
HashMap<usize, usize>used for tracking repetition counts inSamplingStrategywith a pre-allocated chronologicalVec<u32>. Changedapply_repetition_penaltyto use this history slice directly instead of dynamically generating a new vector usingflat_mapandrepeat_non every token.🎯 Why
Token generation is the tightest hot loop in LLM inference. Allocating new vectors and performing
HashMaplookups/updates for every sampled token introduces significant overhead. The underlying mathematics ofbitnet_logits::apply_repetition_penaltyapply the exponential penalty for every token present in the history slice, so passing a chronologically accurateVec<u32>has identical mathematical behavior without requiringHashMapaccounting or per-token vector materialization.📊 Impact
HashMapinsertion per generated token.Vec<u32>allocation and expansion loop per generated token.drain), improving semantic stability over the previous hardclear()approach.🔬 Measurement
Verified the logic using existing tests, which passed immediately after applying changes.
Run:
cargo test -p bitnet-inference --lib generation::samplingPR created automatically by Jules for task 9867502972988120625 started by @EffortlessSteven