Skip the bmk scores that were not run during the partial runtime of SPEC #20

Open
zephyrols wants to merge 1 commit into shinezyy:graphmaker from zephyrols:skip_no_run_bmk

Conversation

@zephyrols

Slightly modified the score statistics code so that, when only a subset of the SPEC benchmarks has been run, scores are calculated only for the benchmarks that were actually executed.

Copilot AI review requested due to automatic review settings March 25, 2026 10:44

Copilot AI left a comment

Pull request overview

Updates SPEC score aggregation to better handle partial benchmark runs by attempting to skip scoring for benchmarks that were not executed.

Changes:

  • Adds a guard to skip scoring when a benchmark has time == 0.
  • Computes overall geometric mean from a filtered list of “valid” (non-zero) per-benchmark scores.

Comment on lines 172 to +179
  score[bmk]['time'] = float(weighted_df.loc[bmk, 'time'])
  score[bmk]['ref_time'] = float(reftime_js[bmk])
+ if score[bmk]['time'] == 0:
+     warnings.warn(f'{bmk} has 0 time, skip scoring')
+     print(f'{bmk} has 0 time, skip scoring')
+     score[bmk]['score'] = 0
+     score[bmk]['coverage'] = 0
+     continue

Copilot AI Mar 25, 2026

weighted_df is reindexed to the full SPEC list earlier, so benchmarks that were not run will typically have NaN time (not 0). float(np.nan) yields nan, so this branch won’t trigger and you’ll compute ref_time / nan (propagating NaNs into downstream stats). Consider treating missing/non-finite times as “not run” (e.g., pd.isna / np.isfinite checks) and skipping scoring for those rows too.

Suggested change
- score[bmk]['time'] = float(weighted_df.loc[bmk, 'time'])
- score[bmk]['ref_time'] = float(reftime_js[bmk])
- if score[bmk]['time'] == 0:
-     warnings.warn(f'{bmk} has 0 time, skip scoring')
-     print(f'{bmk} has 0 time, skip scoring')
-     score[bmk]['score'] = 0
-     score[bmk]['coverage'] = 0
-     continue
+ raw_time = weighted_df.loc[bmk, 'time']
+ # Treat missing or non-finite times as "not run" and skip scoring
+ if pd.isna(raw_time) or not np.isfinite(float(raw_time)) or float(raw_time) == 0:
+     warnings.warn(f'{bmk} has invalid or zero time ({raw_time}), skip scoring')
+     print(f'{bmk} has invalid or zero time ({raw_time}), skip scoring')
+     score[bmk]['time'] = float(raw_time) if not pd.isna(raw_time) else float('nan')
+     score[bmk]['ref_time'] = float(reftime_js[bmk])
+     score[bmk]['score'] = 0
+     score[bmk]['coverage'] = 0
+     continue
+ score[bmk]['time'] = float(raw_time)
+ score[bmk]['ref_time'] = float(reftime_js[bmk])
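
To illustrate the point about reindexing, here is a minimal sketch of why an `== 0` guard does not catch a benchmark that was never run. The benchmark names, the times, the reference time, and the `was_run` helper are all hypothetical; only the general pandas behaviour (labels added by `reindex` are filled with NaN) is relied on.

```python
import math

import pandas as pd

# Hypothetical data: only two of three benchmarks produced a time.
weighted_df = pd.DataFrame({'time': [120.0, 80.0]}, index=['bmk_a', 'bmk_b'])

# Reindexing to the full list adds the missing row and fills it with NaN.
full_list = ['bmk_a', 'bmk_b', 'bmk_c']
weighted_df = weighted_df.reindex(full_list)

raw_time = weighted_df.loc['bmk_c', 'time']
time_is_zero = float(raw_time) == 0        # NaN never compares equal to 0
time_is_missing = bool(pd.isna(raw_time))  # this is the check that fires
naive_score = 1000.0 / float(raw_time)     # hypothetical ref_time / NaN


def was_run(raw_time) -> bool:
    """Hypothetical helper: True only for a finite, positive time."""
    if pd.isna(raw_time):
        return False
    t = float(raw_time)
    return math.isfinite(t) and t > 0
```

In this sketch `time_is_zero` is false while `time_is_missing` is true, and `naive_score` comes out as NaN, which is the value that would otherwise flow into the later statistics. Note that `was_run` also rejects negative times, which is slightly stricter than the `== 0` test in the suggestion.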

Comment on lines +174 to +179
+ if score[bmk]['time'] == 0:
+     warnings.warn(f'{bmk} has 0 time, skip scoring')
+     print(f'{bmk} has 0 time, skip scoring')
+     score[bmk]['score'] = 0
+     score[bmk]['coverage'] = 0
+     continue

Copilot AI Mar 25, 2026

Marking skipped benchmarks with score = 0 will still include them in the later intdf['score'] / fpdf['score'] geometric means (lines 208+), producing 0 rather than the mean over executed benchmarks. To align with the PR goal (“only calculate executed benchmarks”), it’s better to exclude skipped entries from those per-suite means (e.g., filter out non-positive/non-finite scores or drop the rows entirely).
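
One possible way to exclude skipped rows from a per-suite mean is sketched below. It assumes `intdf` is a DataFrame with a numeric `score` column, which is inferred from the `intdf['score']` expression above and not confirmed by the diff; `suite_geomean` is a hypothetical helper, and the geometric mean is computed directly in the log domain so the sketch does not depend on whichever `geometric_mean` the script imports.

```python
import numpy as np
import pandas as pd

# Hypothetical per-suite table: bmk_c was skipped (score 0), bmk_d is missing (NaN).
intdf = pd.DataFrame(
    {'score': [4.0, 9.0, 0.0, np.nan]},
    index=['bmk_a', 'bmk_b', 'bmk_c', 'bmk_d'],
)


def suite_geomean(df: pd.DataFrame) -> float:
    """Geometric mean over executed benchmarks only (finite, positive scores)."""
    scores = df['score'].astype(float)
    executed = scores[np.isfinite(scores) & (scores > 0)]
    if executed.empty:
        # Mirrors the PR's convention of reporting 0 when nothing was scored.
        return 0.0
    return float(np.exp(np.log(executed).mean()))


int_mean = suite_geomean(intdf)
```

With the hypothetical data above, only the scores 4.0 and 9.0 contribute, so the mean is taken over two benchmarks instead of collapsing to 0 or NaN.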

Comment on lines +182 to +186
+ valid_scores = [x[1]['score'] for x in score.items() if x[1]['score'] != 0]
  score['mean'] = {
      'time':0,
      'ref_time':0,
-     'score': geometric_mean([x[1]['score'] for x in score.items()]),
+     'score': geometric_mean(valid_scores) if valid_scores else 0,

Copilot AI Mar 25, 2026

valid_scores currently filters only != 0, which will still include NaN scores (since nan != 0 is True) and can make geometric_mean(valid_scores) return NaN. Filter for finite, positive values instead so the overall mean remains well-defined when some benchmarks are missing.
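
A minimal, standard-library-only sketch of the stricter filter follows. The score values are made up, and `finite_positive` and `geomean` are hypothetical helpers standing in for whatever the script actually uses.

```python
import math


def finite_positive(values):
    """Keep only scores that are finite and strictly positive."""
    return [v for v in values if math.isfinite(v) and v > 0]


def geomean(values):
    """Geometric mean of positive numbers, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical scores: one skipped benchmark (0) and one missing benchmark (NaN).
scores = [4.0, 0, float('nan'), 9.0]

# The `!= 0` filter drops the zero but lets NaN through, because nan != 0 is True.
nonzero_only = [s for s in scores if s != 0]

valid_scores = finite_positive(scores)
overall = geomean(valid_scores) if valid_scores else 0
```

Here `nonzero_only` still contains the NaN entry, while `valid_scores` keeps only 4.0 and 9.0, so `overall` stays well-defined.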
