perf(shader): cache current_cycle in ldst_unit::cycle()#339
Open
eunseo9311 wants to merge 1 commit into gpgpu-sim:dev from
Hoist the repeated expression m_core->get_gpu()->gpu_sim_cycle + m_core->get_gpu()->gpu_tot_sim_cycle into a single local variable at the top of ldst_unit::cycle(). This eliminates 5 redundant pointer-chase + addition pairs in a per-SM per-cycle hot path. The value is constant within a single cycle invocation so the substitution is semantically identical. No simulation output change — purely a micro-optimization that reduces overhead in the load/store unit pipeline loop.
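The hoist described above can be illustrated with a small standalone sketch. This is not the code from `shader.cc`: `gpu_model`, `core_model`, `sum_recomputed`, `sum_hoisted`, and `check_equivalent` are hypothetical stand-ins introduced only for illustration, and only the two counters and the `get_gpu()` accessor named in the description are modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for the simulator object that owns the counters.
struct gpu_model {
  unsigned long long gpu_sim_cycle = 0;
  unsigned long long gpu_tot_sim_cycle = 0;
};

// Hypothetical stand-in for the core that the load/store unit points to.
struct core_model {
  gpu_model *gpu;
  gpu_model *get_gpu() const { return gpu; }
};

// Before: the sum is rebuilt through the pointer chain at every use site.
unsigned long long sum_recomputed(const core_model *m_core, int uses) {
  unsigned long long acc = 0;
  for (int i = 0; i < uses; ++i) {
    acc += m_core->get_gpu()->gpu_sim_cycle +
           m_core->get_gpu()->gpu_tot_sim_cycle;
  }
  return acc;
}

// After: the sum is computed once and the local is reused at every use site.
unsigned long long sum_hoisted(const core_model *m_core, int uses) {
  const unsigned long long current_cycle =
      m_core->get_gpu()->gpu_sim_cycle + m_core->get_gpu()->gpu_tot_sim_cycle;
  unsigned long long acc = 0;
  for (int i = 0; i < uses; ++i) {
    acc += current_cycle;
  }
  return acc;
}

// Both forms must agree as long as the counters do not change mid-call.
void check_equivalent(const core_model *m_core, int uses) {
  assert(sum_recomputed(m_core, uses) == sum_hoisted(m_core, uses));
}
```

The two forms are interchangeable only while neither counter is modified during the call, which is the invariant the description relies on for `ldst_unit::cycle()`.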
Summary
Cache the repeated `gpu_sim_cycle + gpu_tot_sim_cycle` computation into a single local variable in `ldst_unit::cycle()`, eliminating redundant pointer dereferences in a per-SM per-cycle hot path.
Motivation
`ldst_unit::cycle()` is called every cycle for every SM's load/store unit. The expression `m_core->get_gpu()->gpu_sim_cycle + m_core->get_gpu()->gpu_tot_sim_cycle` appeared 6 times within the function body, each requiring two pointer chases (m_core→gpu→member) and an addition. The value is constant within a single `cycle()` invocation.
What changed
Added `const unsigned long long current_cycle` at the top of `ldst_unit::cycle()`
File modified:
`src/gpgpu-sim/shader.cc`, `ldst_unit::cycle()` (line 2834)
Impact
Test plan
`source setup_environment release && make -j`
`rodinia_2.0-ft/configs.gtx1080ti.yml`