refactor(shader): consolidate 20 inc*_stat methods into inc_fu_stat #341

Open
eunseo9311 wants to merge 1 commit into gpgpu-sim:dev from eunseo9311:refactor/consolidate-inc-fu-stat
Summary

Consolidate 20 near-identical power statistics methods in shader_core_ctx into a single parameterized implementation. No call sites were modified and simulation output is unchanged.

Motivation

shader_core_ctx contained 20 methods (incialu_stat, incfpalu_stat, incsqrt_stat, etc.) that all followed the same three-step pattern:

  1. Compute active_count * latency
  2. Optionally add inactive-lane overhead via inactive_lanes_accesses_sfu() or inactive_lanes_accesses_nonsfu(), depending on the execution unit type; the overhead is added only when gpgpu_clock_gated_lanes is false
  3. Optionally update m_active_exu_threads and m_active_exu_warps

The only differences across all 20 methods were which counter to write, which lane overhead function to call, and whether to update the EXU counters — making this a clear consolidation opportunity.
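
For readers unfamiliar with the code, the three-step pattern can be sketched as follows. This is a reconstruction from the description above, not a copy of the removed code: `mock_config`, `mock_stats`, and `legacy_core` are hypothetical stand-ins, the counter name is assumed from the `m_num_*_acesses` pattern, and the inactive-lane helper is a placeholder stub rather than the simulator's formula.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the simulator's config and stats objects.
struct mock_config {
  bool gpgpu_clock_gated_lanes;
};

struct mock_stats {
  // Counter name assumed from the m_num_*_acesses pattern.
  std::vector<double> m_num_ialu_acesses;
  std::vector<unsigned> m_active_exu_threads;
  std::vector<unsigned> m_active_exu_warps;
  explicit mock_stats(unsigned n_cores)
      : m_num_ialu_acesses(n_cores, 0.0),
        m_active_exu_threads(n_cores, 0),
        m_active_exu_warps(n_cores, 0) {}
};

// Placeholder overhead model: NOT the simulator's formula, only a stub
// so that the sketch compiles and produces checkable numbers.
inline double inactive_lanes_accesses_nonsfu(unsigned active_count,
                                             double latency) {
  assert(active_count <= 32);  // assumes a 32-lane warp
  return (32 - active_count) * latency;
}

struct legacy_core {
  unsigned m_sid;
  const mock_config *m_config;
  mock_stats *m_stats;

  // One of the 20 near-identical methods, written out in the old style.
  void incialu_stat(unsigned active_count, double latency) {
    // Step 1: base cost of the active lanes.
    double access = (double)active_count * latency;
    // Step 2: inactive-lane overhead, only when lanes are not clock-gated.
    if (!m_config->gpgpu_clock_gated_lanes)
      access += inactive_lanes_accesses_nonsfu(active_count, latency);
    m_stats->m_num_ialu_acesses[m_sid] += access;
    // Step 3: execution-unit occupancy counters.
    m_stats->m_active_exu_threads[m_sid] += active_count;
    m_stats->m_active_exu_warps[m_sid]++;
  }
};
```

Each of the other methods would differ only in the counter written, the helper called in step 2, and whether step 3 is present.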

Changes

shader.h

  • Added enum class lane_model { SFU, NON_SFU, NONE } to express the three execution unit categories
  • Replaced each method body (~12 lines each) with a single-line delegation to inc_fu_stat()
  • Existing public API is fully preserved — zero call site changes required

shader.cc

  • Added a 16-line implementation of shader_core_ctx::inc_fu_stat():
void shader_core_ctx::inc_fu_stat(double *counter, unsigned active_count,
                                   double latency, lane_model model,
                                   bool update_exu) {
  double access = (double)active_count * latency;
  if (model != lane_model::NONE && !m_config->gpgpu_clock_gated_lanes) {
    access += (model == lane_model::SFU)
                  ? inactive_lanes_accesses_sfu(active_count, latency)
                  : inactive_lanes_accesses_nonsfu(active_count, latency);
  }
  counter[m_sid] += access;
  if (update_exu) {
    m_stats->m_active_exu_threads[m_sid] += active_count;
    m_stats->m_active_exu_warps[m_sid]++;
  }
}
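
With that implementation in place, the shader.h side might look roughly like the sketch below, showing one wrapper per pattern. It is a self-contained mock, not the PR's header: `mock_config`, `mock_stats`, and `mock_core` are hypothetical, the counter names are assumed from the `m_num_*_acesses` pattern, and the two inactive-lane helpers are placeholder stubs.

```cpp
#include <cassert>
#include <vector>

enum class lane_model { SFU, NON_SFU, NONE };

// Hypothetical stand-ins for the simulator's config and stats objects.
struct mock_config {
  bool gpgpu_clock_gated_lanes;
};

struct mock_stats {
  // Counter names assumed from the m_num_*_acesses pattern.
  std::vector<double> m_num_ialu_acesses, m_num_sqrt_acesses,
      m_num_mem_acesses, m_num_sp_acesses;
  std::vector<unsigned> m_active_exu_threads, m_active_exu_warps;
  explicit mock_stats(unsigned n_cores)
      : m_num_ialu_acesses(n_cores, 0.0), m_num_sqrt_acesses(n_cores, 0.0),
        m_num_mem_acesses(n_cores, 0.0), m_num_sp_acesses(n_cores, 0.0),
        m_active_exu_threads(n_cores, 0), m_active_exu_warps(n_cores, 0) {}
};

struct mock_core {
  unsigned m_sid;
  const mock_config *m_config;
  mock_stats *m_stats;

  // Placeholder overhead models: NOT the simulator's formulas.
  double inactive_lanes_accesses_sfu(unsigned active_count,
                                     double latency) const {
    assert(active_count <= 32);  // assumes a 32-lane warp
    return 2.0 * (32 - active_count) * latency;
  }
  double inactive_lanes_accesses_nonsfu(unsigned active_count,
                                        double latency) const {
    assert(active_count <= 32);
    return (32 - active_count) * latency;
  }

  // Same logic as the inc_fu_stat() shown above.
  void inc_fu_stat(double *counter, unsigned active_count, double latency,
                   lane_model model, bool update_exu) {
    double access = (double)active_count * latency;
    if (model != lane_model::NONE && !m_config->gpgpu_clock_gated_lanes) {
      access += (model == lane_model::SFU)
                    ? inactive_lanes_accesses_sfu(active_count, latency)
                    : inactive_lanes_accesses_nonsfu(active_count, latency);
    }
    counter[m_sid] += access;
    if (update_exu) {
      m_stats->m_active_exu_threads[m_sid] += active_count;
      m_stats->m_active_exu_warps[m_sid]++;
    }
  }

  // One-line delegations; signatures match the old methods, so callers
  // do not change.
  void incialu_stat(unsigned n, double lat) {  // pattern A
    inc_fu_stat(m_stats->m_num_ialu_acesses.data(), n, lat,
                lane_model::NON_SFU, true);
  }
  void incsqrt_stat(unsigned n, double lat) {  // pattern B
    inc_fu_stat(m_stats->m_num_sqrt_acesses.data(), n, lat,
                lane_model::SFU, true);
  }
  void incmem_stat(unsigned n, double lat) {  // pattern C
    inc_fu_stat(m_stats->m_num_mem_acesses.data(), n, lat,
                lane_model::NON_SFU, false);
  }
  void incsp_stat(unsigned n, double lat) {  // pattern D
    inc_fu_stat(m_stats->m_num_sp_acesses.data(), n, lat,
                lane_model::NONE, false);
  }
};
```

Keeping the wrappers, rather than exposing inc_fu_stat() to callers, is what allows the zero-call-site-change claim: each call site still names the functional unit it is charging.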

Pattern classification

All 20 methods were verified to fall into exactly four patterns:

| Pattern | lane_model | update_exu | Methods | Count |
|---------|------------|------------|---------|-------|
| A | NON_SFU | true | incialu, incimul, incimul24, incfpalu, incfpmul, incdpalu, incdpmul | 7 |
| B | SFU | true | incimul32, incidiv, incfpdiv, incdpdiv, incsqrt, inclog, incexp, incsin, inctensor, inctex | 10 |
| C | NON_SFU | false | incmem | 1 |
| D | NONE | false | incsfu, incsp | 2 |

Stats

  • -233 / +47 lines (net: -186 lines)
  • 0 call site changes
  • All counter types confirmed double * across all 20 methods

Testing

Verified with Rodinia 2.0 hotspot on GTX1080Ti config. All m_num_*_acesses values in gpgpusim.*.log are identical before and after the change.

source setup_environment release && make -j
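
Beyond the log comparison, a unit-level equivalence check is one way to gain extra confidence. The sketch below is hypothetical and not part of this PR: it compares a longhand pattern-B body (reconstructed from the description, with placeholder overhead stubs) against the consolidated logic over every active-lane count and both clock-gating settings. Because the longhand form adds terms to the counter in a different order, the counters are compared with a small tolerance rather than bit-for-bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

enum class lane_model { SFU, NON_SFU, NONE };

// Hypothetical flattened view of the per-core state touched by one method.
struct core_state {
  bool clock_gated_lanes;
  double counter;
  unsigned exu_threads;
  unsigned exu_warps;
};

// Placeholder overhead models: NOT the simulator's formulas.
double stub_inactive_sfu(unsigned active, double latency) {
  assert(active <= 32);  // assumes a 32-lane warp
  return 2.0 * (32 - active) * latency;
}
double stub_inactive_nonsfu(unsigned active, double latency) {
  assert(active <= 32);
  return (32 - active) * latency;
}

// Longhand pattern-B body in the old if/else style (a reconstruction).
void legacy_pattern_b(core_state &s, unsigned active, double latency) {
  if (!s.clock_gated_lanes) {
    s.counter = s.counter + (double)active * latency +
                stub_inactive_sfu(active, latency);
  } else {
    s.counter = s.counter + (double)active * latency;
  }
  s.exu_threads += active;
  s.exu_warps++;
}

// Same control flow as inc_fu_stat(), applied to the flattened state.
void consolidated(core_state &s, unsigned active, double latency,
                  lane_model model, bool update_exu) {
  double access = (double)active * latency;
  if (model != lane_model::NONE && !s.clock_gated_lanes) {
    access += (model == lane_model::SFU)
                  ? stub_inactive_sfu(active, latency)
                  : stub_inactive_nonsfu(active, latency);
  }
  s.counter += access;
  if (update_exu) {
    s.exu_threads += active;
    s.exu_warps++;
  }
}

bool nearly_equal(double a, double b) {
  return std::fabs(a - b) <= 1e-9 * std::max(1.0, std::fabs(a));
}

// Sweeps both gating settings, a few latencies, and 0..32 active lanes.
bool pattern_b_matches() {
  const bool gating[] = {false, true};
  const double latencies[] = {1.0, 4.0, 0.5};
  for (bool gated : gating) {
    core_state old_s{gated, 0.0, 0, 0};
    core_state new_s{gated, 0.0, 0, 0};
    for (double lat : latencies) {
      for (unsigned active = 0; active <= 32; ++active) {
        legacy_pattern_b(old_s, active, lat);
        consolidated(new_s, active, lat, lane_model::SFU, true);
        if (!nearly_equal(old_s.counter, new_s.counter) ||
            old_s.exu_threads != new_s.exu_threads ||
            old_s.exu_warps != new_s.exu_warps)
          return false;
      }
    }
  }
  return true;
}
```

The same sweep could be repeated for patterns A, C, and D by swapping in the matching longhand body and the corresponding lane_model and update_exu arguments.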

All 20 functional-unit statistics methods in shader_core_ctx shared
the same pattern: accumulate (active_count * latency) into a counter,
optionally add inactive-lane overhead (SFU or non-SFU model), and
optionally update m_active_exu_threads/warps.

Introduce a single inc_fu_stat(counter, active_count, latency,
lane_model, update_exu) that captures all four variants:
  A) NON_SFU + exu update  (7 methods: ialu, imul, imul24, fpalu, ...)
  B) SFU    + exu update  (10 methods: idiv, fpdiv, sqrt, sin, ...)
  C) NON_SFU + no exu      (1 method:  mem)
  D) NONE   + no exu      (2 methods: sfu, sp)

Each original method is retained as a one-line inline delegating to
inc_fu_stat, so all call sites remain unchanged.

Net reduction: ~186 lines removed from shader.h.
No functional change — identical computation for all patterns.