perf(shader): cache current_cycle in ldst_unit::cycle() #339

Open
eunseo9311 wants to merge 1 commit into gpgpu-sim:dev from eunseo9311:perf/ldst-cache-current-cycle

eunseo9311 commented Mar 25, 2026

Summary

Cache the repeated gpu_sim_cycle + gpu_tot_sim_cycle computation into a single local variable in ldst_unit::cycle(), eliminating redundant pointer dereferences in a per-SM per-cycle hot path.

Motivation

ldst_unit::cycle() is called every cycle for every SM's load/store unit. The expression m_core->get_gpu()->gpu_sim_cycle + m_core->get_gpu()->gpu_tot_sim_cycle appeared 6 times within the function body, each requiring two pointer chases (m_core→gpu→member) and an addition. The value is constant within a single cycle() invocation.

What changed

  • Hoisted the expression into const unsigned long long current_cycle at the top of ldst_unit::cycle()
  • Replaced all 6 occurrences with the local variable
  • Net result: -12 lines, +8 lines (cleaner and fewer redundant dereferences)

File modified: src/gpgpu-sim/shader.cc, ldst_unit::cycle() (line 2834)
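
A minimal sketch of the hoisting pattern described above. It is not the actual diff: `MockGpu`, `MockCore`, `MockLdstUnit`, `cycle_before()`, `cycle_after()`, and `outputs_match()` are hypothetical stand-ins that mirror only the `m_core->get_gpu()->gpu_sim_cycle + m_core->get_gpu()->gpu_tot_sim_cycle` access shape, and the loop of six uses stands in for the six call sites inside `ldst_unit::cycle()`.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the real GPGPU-Sim classes.
struct MockGpu {
  unsigned long long gpu_sim_cycle = 0;
  unsigned long long gpu_tot_sim_cycle = 0;
};

struct MockCore {
  MockGpu* gpu;
  MockGpu* get_gpu() const { return gpu; }
};

struct MockLdstUnit {
  MockCore* m_core;

  // Before: the sum is re-derived through m_core at each of the six use sites.
  unsigned long long cycle_before() const {
    unsigned long long acc = 0;
    for (int use = 0; use < 6; ++use) {
      acc += m_core->get_gpu()->gpu_sim_cycle +
             m_core->get_gpu()->gpu_tot_sim_cycle;
    }
    return acc;
  }

  // After: the sum is computed once and reused, as in the PR's current_cycle.
  unsigned long long cycle_after() const {
    const unsigned long long current_cycle =
        m_core->get_gpu()->gpu_sim_cycle +
        m_core->get_gpu()->gpu_tot_sim_cycle;
    unsigned long long acc = 0;
    for (int use = 0; use < 6; ++use) {
      acc += current_cycle;
    }
    return acc;
  }
};

// Assumed check: both variants agree while the counters stay fixed
// for the duration of one invocation.
inline bool outputs_match() {
  MockGpu gpu;
  gpu.gpu_sim_cycle = 100;
  gpu.gpu_tot_sim_cycle = 2500;
  MockCore core{&gpu};
  MockLdstUnit unit{&core};
  return unit.cycle_before() == unit.cycle_after();
}
```

The two variants are interchangeable only under the assumption stated in the Motivation section, namely that neither counter changes during a single `cycle()` invocation; if something inside the function body advanced either counter, the cached value would go stale.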

Impact

  • No simulation output change — semantically identical substitution
  • Micro-optimization reducing pointer-chase overhead in the load/store unit pipeline
  • Improved readability of the response FIFO handling logic

Test plan

  • Build with source setup_environment release && make -j
  • Run Rodinia 2.0 hotspot on GTX1080Ti config — verify identical simulation output
  • Docker regression: rodinia_2.0-ft/configs.gtx1080ti.yml

Hoist the repeated expression
  m_core->get_gpu()->gpu_sim_cycle + m_core->get_gpu()->gpu_tot_sim_cycle
into a single local variable at the top of ldst_unit::cycle().

This eliminates 5 redundant pointer-chase + addition pairs in a
per-SM per-cycle hot path. The value is constant within a single
cycle invocation so the substitution is semantically identical.

No simulation output change — purely a micro-optimization that
reduces overhead in the load/store unit pipeline loop.

eunseo9311 force-pushed the perf/ldst-cache-current-cycle branch from fb12c6f to 2f042b7 on March 25, 2026 02:06