ci: gate releases on benchmark regressions#429
Conversation
Greptile SummaryThis PR adds a release benchmark gate to the existing CI workflow: committed versioned snapshots in
Confidence Score: 4/5Safe to merge; the gate is additive and will not affect any release until a benchmark snapshot is committed. The core comparison logic is well-tested and the workflow wiring looks correct. Two minor issues exist: an argument-order dependency in run-release-benchmark.ts where --version after --out silently discards the custom path, and the threshold column in the CI markdown table omitting the unit from the absolute value. Neither affects the gate's ability to block real regressions in normal usage. scripts/run-release-benchmark.ts (--version/--out ordering) and scripts/compare-release-benchmarks.ts (formatThreshold unit display) Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[push v* tag / workflow_dispatch] --> B[release-benchmark-gate]
B --> C{bench:release:compare}
C -- head file missing --> D[❌ Throw: missing head snapshot]
C -- no previous baseline --> E[❌ Throw: no baseline]
C -- comparison done --> F{comparison.failed?}
F -- No --> G[✅ Gate passes]
F -- Yes + push event --> H[❌ Job fails — release blocked]
F -- Yes + workflow_dispatch\nallow_benchmark_regression=true --> I{reason provided?}
I -- No --> J[❌ Require reason step fails]
I -- Yes --> K[✅ Override accepted, summary updated]
G --> L[build-binaries]
K --> L
L --> M[stage-release]
M --> N[publish]
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
scripts/run-release-benchmark.ts:41-52
**`--version` silently clobbers an explicit `--out`**
When `--version` is parsed, `options.out` is unconditionally overwritten with the computed release path. If a caller passes `--out custom/path.json --version 0.16.0`, the `--out` value is discarded and the file goes to `benchmarks/release/bench-0.16.0.json` instead. Only the reverse order (`--version` first, then `--out`) works correctly. A simple guard — only update `options.out` from `--version` when `--out` has not yet been explicitly set — would fix the ordering dependency.
### Issue 2 of 2
scripts/compare-release-benchmarks.ts:282-292
**Absolute threshold displayed without its unit**
`formatThreshold` formats the absolute part with `formatNumber` but never appends the metric unit, so a timing threshold renders as `+15% and +5.00` instead of `+15% and +5ms`, and the memory threshold becomes `+20% and +8388608.0` instead of `+20% and +8MiB`. Reviewers scanning the CI summary table won't immediately know whether the absolute value is in milliseconds or raw bytes, which defeats the column's purpose.
Reviews (1): Last reviewed commit: "ci: gate releases on benchmark regressio..." | Re-trigger Greptile |
| index += 1; | ||
| continue; | ||
| } | ||
|
|
||
| if (arg === "--samples") { | ||
| options.samples = Number(readArgValue(args, index)); | ||
| index += 1; | ||
| continue; | ||
| } | ||
|
|
||
| if (arg === "--out") { | ||
| options.out = path.resolve(readArgValue(args, index)); |
There was a problem hiding this comment.
--version silently clobbers an explicit --out
When --version is parsed, options.out is unconditionally overwritten with the computed release path. If a caller passes --out custom/path.json --version 0.16.0, the --out value is discarded and the file goes to benchmarks/release/bench-0.16.0.json instead. Only the reverse order (--version first, then --out) works correctly. A simple guard — only update options.out from --version when --out has not yet been explicitly set — would fix the ordering dependency.
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/run-release-benchmark.ts
Line: 41-52
Comment:
**`--version` silently clobbers an explicit `--out`**
When `--version` is parsed, `options.out` is unconditionally overwritten with the computed release path. If a caller passes `--out custom/path.json --version 0.16.0`, the `--out` value is discarded and the file goes to `benchmarks/release/bench-0.16.0.json` instead. Only the reverse order (`--version` first, then `--out`) works correctly. A simple guard — only update `options.out` from `--version` when `--out` has not yet been explicitly set — would fix the ordering dependency.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed by preserving an explicitly provided --out path even when --version appears later, and added a regression test for that argument ordering.
Responded by Pi using OpenAI GPT-5.
| return unit === "bytes" ? "B" : unit; | ||
| } | ||
|
|
||
| function formatThreshold(threshold: BenchmarkThreshold | undefined) { | ||
| if (!threshold) { | ||
| return "—"; | ||
| } | ||
|
|
||
| return `+${((threshold.maxRegressionRatio - 1) * 100).toFixed(0)}% and +${formatNumber( | ||
| threshold.minAbsoluteRegression, | ||
| )}`; |
There was a problem hiding this comment.
Absolute threshold displayed without its unit
formatThreshold formats the absolute part with formatNumber but never appends the metric unit, so a timing threshold renders as +15% and +5.00 instead of +15% and +5ms, and the memory threshold becomes +20% and +8388608.0 instead of +20% and +8MiB. Reviewers scanning the CI summary table won't immediately know whether the absolute value is in milliseconds or raw bytes, which defeats the column's purpose.
Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/compare-release-benchmarks.ts
Line: 282-292
Comment:
**Absolute threshold displayed without its unit**
`formatThreshold` formats the absolute part with `formatNumber` but never appends the metric unit, so a timing threshold renders as `+15% and +5.00` instead of `+15% and +5ms`, and the memory threshold becomes `+20% and +8388608.0` instead of `+20% and +8MiB`. Reviewers scanning the CI summary table won't immediately know whether the absolute value is in milliseconds or raw bytes, which defeats the column's purpose.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed by rendering threshold absolute values with units in the markdown summary: milliseconds for timing thresholds and MiB for byte thresholds, with test coverage.
Responded by Pi using OpenAI GPT-5.
f6a6772 to
cb87d6d
Compare
Summary
bench:releaseandbench:release:comparescripts for versioned benchmark snapshots inbenchmarks/release/bench-<version>.json.Notes
This PR does not commit generated baseline snapshots. Before the next release, backfill at least one prior stable release benchmark and commit the new release snapshot with the release-prep change.
Tests
bun run typecheckbun run format:checkbun run lintbun test ./scripts/compare-release-benchmarks.test.tsbun test ./scripts ./benchmarks/libbun testThis PR description was generated by Pi using OpenAI GPT-5