9 changes: 8 additions & 1 deletion simpoint_cpt/compute_weighted.py
@@ -171,12 +171,19 @@ def compute_weighted_metrics(csv_path: str, js_path: str, out_csv: str, args):
         print(weighted_df.loc[bmk])
         score[bmk]['time'] = float(weighted_df.loc[bmk, 'time'])
         score[bmk]['ref_time'] = float(reftime_js[bmk])
+        if score[bmk]['time'] == 0:
+            warnings.warn(f'{bmk} has 0 time, skip scoring')
+            print(f'{bmk} has 0 time, skip scoring')
+            score[bmk]['score'] = 0
+            score[bmk]['coverage'] = 0
+            continue
Comment on lines 172 to +179
Copilot AI Mar 25, 2026

weighted_df is reindexed to the full SPEC list earlier, so benchmarks that were not run will typically have NaN time (not 0). float(np.nan) yields nan, so this branch won’t trigger and you’ll compute ref_time / nan (propagating NaNs into downstream stats). Consider treating missing/non-finite times as “not run” (e.g., pd.isna / np.isfinite checks) and skipping scoring for those rows too.

Suggested change
-        score[bmk]['time'] = float(weighted_df.loc[bmk, 'time'])
-        score[bmk]['ref_time'] = float(reftime_js[bmk])
-        if score[bmk]['time'] == 0:
-            warnings.warn(f'{bmk} has 0 time, skip scoring')
-            print(f'{bmk} has 0 time, skip scoring')
-            score[bmk]['score'] = 0
-            score[bmk]['coverage'] = 0
-            continue
+        raw_time = weighted_df.loc[bmk, 'time']
+        # Treat missing or non-finite times as "not run" and skip scoring
+        if pd.isna(raw_time) or not np.isfinite(float(raw_time)) or float(raw_time) == 0:
+            warnings.warn(f'{bmk} has invalid or zero time ({raw_time}), skip scoring')
+            print(f'{bmk} has invalid or zero time ({raw_time}), skip scoring')
+            score[bmk]['time'] = float(raw_time) if not pd.isna(raw_time) else float('nan')
+            score[bmk]['ref_time'] = float(reftime_js[bmk])
+            score[bmk]['score'] = 0
+            score[bmk]['coverage'] = 0
+            continue
+        score[bmk]['time'] = float(raw_time)
+        score[bmk]['ref_time'] = float(reftime_js[bmk])
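
The NaN pitfall behind this suggestion can be reproduced with the standard library alone. The sketch below uses a hypothetical helper, `is_usable_time`, which is not part of `compute_weighted.py`; it only illustrates why an `== 0` check misses benchmarks that were not run. Unlike the suggested change, it also rejects negative times, which is an extra assumption.

```python
import math


def is_usable_time(raw_time) -> bool:
    """Return True only for finite, strictly positive times.

    Hypothetical helper for illustration; not part of compute_weighted.py.
    """
    try:
        t = float(raw_time)
    except (TypeError, ValueError):
        return False  # None, empty strings, and similar count as "not run"
    return math.isfinite(t) and t > 0


nan_time = float('nan')

# NaN never compares equal to anything, so the `== 0` guard does not fire.
zero_check_fires = (nan_time == 0)

# Dividing by NaN yields NaN, which then leaks into downstream statistics.
leaked_score = 100.0 / nan_time
```

A guard built on such a predicate would skip scoring for zero, missing, and infinite times in one place.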

Comment on lines +174 to +179
Copilot AI Mar 25, 2026

Marking skipped benchmarks with score = 0 will still include them in the later intdf['score'] / fpdf['score'] geometric means (lines 208+), producing 0 rather than the mean over executed benchmarks. To align with the PR goal (“only calculate executed benchmarks”), it’s better to exclude skipped entries from those per-suite means (e.g., filter out non-positive/non-finite scores or drop the rows entirely).
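
To see why a placeholder score of 0 distorts a suite mean, consider the small sketch below. The `geomean` helper and the sample scores are made up for illustration, and the script's own `geometric_mean` may behave differently on zeros (some implementations raise an error instead of returning 0).

```python
import math


def geomean(values):
    """n-th root of the product; illustrative stand-in, not the script's geometric_mean."""
    values = list(values)
    if not values:
        return None  # no executed benchmarks: leave the mean undefined
    return math.prod(values) ** (1.0 / len(values))


# Made-up per-suite scores; 0.0 marks a benchmark that was skipped.
suite_scores = [2.0, 8.0, 0.0]

# A single zero factor forces the whole product, and so the mean, to zero.
mean_with_skipped = geomean(suite_scores)

# Dropping skipped entries gives the mean over executed benchmarks only.
executed = [s for s in suite_scores if math.isfinite(s) and s > 0]
mean_executed = geomean(executed)
```

Applying the same filter to the `intdf['score']` and `fpdf['score']` columns before averaging would keep the per-suite means consistent with the overall mean.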

         score[bmk]['score'] = score[bmk]['ref_time'] / score[bmk]['time']
         score[bmk]['coverage'] = weighted_df.loc[bmk, 'coverage']
+    valid_scores = [x[1]['score'] for x in score.items() if x[1]['score'] != 0]
     score['mean'] = {
         'time':0,
         'ref_time':0,
-        'score': geometric_mean([x[1]['score'] for x in score.items()]),
+        'score': geometric_mean(valid_scores) if valid_scores else 0,
Comment on lines +182 to +186
Copilot AI Mar 25, 2026

valid_scores currently filters only != 0, which will still include NaN scores (since nan != 0 is True) and can make geometric_mean(valid_scores) return NaN. Filter for finite, positive values instead so the overall mean remains well-defined when some benchmarks are missing.
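
A quick check of the comparison semantics, together with a filter that keeps only finite, positive scores, might look like the following. The `score` dictionary is invented sample data shaped like the one in the diff, and `statistics.geometric_mean` is an assumption standing in for whatever `geometric_mean` the script imports.

```python
import math
from statistics import geometric_mean

# Invented sample data: 'b' was skipped with score 0, 'c' was never run.
score = {
    'a': {'score': 2.0},
    'b': {'score': 0},
    'c': {'score': float('nan')},
    'd': {'score': 8.0},
}

# The `!= 0` filter from the diff: NaN passes because nan != 0 is True.
loose = [v['score'] for v in score.values() if v['score'] != 0]

# Stricter filter: finite and strictly positive.
valid_scores = [
    v['score'] for v in score.values()
    if math.isfinite(v['score']) and v['score'] > 0
]

# Same fallback as the diff when nothing usable remains.
mean_score = geometric_mean(valid_scores) if valid_scores else 0
```

Here `loose` still carries the NaN entry, while `valid_scores` contains only the two executed benchmarks.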

         'coverage':0
     }
     score_col = ['time','ref_time','score','coverage']