feat: add CI performance benchmark suite#384
Conversation
Greptile SummaryThis PR introduces a comprehensive CI benchmark suite covering startup, patch parsing, render/layout planning, highlight prefetch, large-stream rendering, memory, and optional competitor comparisons. It adds a shared runner (
Confidence Score: 3/5Safe to merge after the two functional fixes; the pagination bug will silently flood active PRs with duplicate benchmark comments from day one, and the misclassification in classifyMetric embeds incorrect metadata in every JSON artifact. The pagination bug in comment-pr.ts will create a new benchmark comment on every CI run for any PR with more than 100 existing comments, rather than updating the existing one — a silent but immediately visible problem for busy PRs. The classifyMetric ordering issue means every competitor timing metric is written to JSON artifacts as comparable: true with a regression threshold, which is only papered over by a defensive name-check in compareMetric; if that guard is ever simplified the CI gate for competitor metrics would break unexpectedly. benchmarks/lib/benchmark-result.ts (classifyMetric branch ordering) and benchmarks/comment-pr.ts (comment pagination) need fixes before the benchmark infrastructure can be relied on. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[push / pull_request / workflow_dispatch] --> B[Checkout full history]
B --> C[bun install]
C --> D[Run head benchmarks\nbun run bench --samples N --out head.json]
D --> E{PR event?}
E -- No --> F[Publish step summary\nhead.txt]
E -- Yes --> G[git worktree add origin/main\ncp benchmarks → base worktree\nbun run benchmarks/run.ts --out base.json]
G --> H[Compare head vs base\nbench:compare → comparison.json + summary.md]
H --> I[Publish step summary\nsummary.md]
I --> J[Comment benchmark summary on PR\nbench:comment-pr --body summary.md]
J --> K{compare step failed?}
K -- Yes --> L[exit 1 — block merge]
K -- No --> M[Upload artifacts\nbenchmark-results/*]
F --> M
L --> M
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
benchmarks/lib/benchmark-result.ts:68-98
The `startsWith("competitor_")` guard is unreachable dead code: every competitor timing metric ends with `_ms` and is caught by the first branch, giving it `comparable: true` with a 15% regression threshold. The intent of `comparable: false` for competitors is only saved by the `metric.name.includes("competitor_")` fall-through in `compareMetric`. If that guard is ever removed or refactored, competitor regressions would start failing CI. The fix is to check the competitor prefix before the generic `_ms` check.
```suggestion
if (name.startsWith("competitor_")) {
return { unit: "ms", comparable: false };
}
if (name.endsWith("_ms")) {
return {
unit: "ms",
comparable: true,
threshold: { maxRegressionRatio: 1.15, minAbsoluteRegression: 5 },
};
}
if (
name.startsWith("is_") ||
name.endsWith("_ready_before_move") ||
name.endsWith("_available")
) {
return { unit: "boolean", comparable: false };
}
if (name.includes("rss") || name.includes("heap")) {
return {
unit: "bytes",
comparable: true,
threshold: { maxRegressionRatio: 1.2, minAbsoluteRegression: 8 * 1024 * 1024 },
};
}
if (name.endsWith("_bytes")) {
return { unit: "bytes", comparable: false };
}
```
### Issue 2 of 3
benchmarks/comment-pr.ts:62-65
The comment search is capped at 100 with no pagination. On any active PR with more than 100 comments the marker lookup silently returns nothing, so every subsequent benchmark run posts a brand-new comment instead of updating the existing one.
```suggestion
async function fetchAllComments(
repository: string,
pullRequestNumber: number,
): Promise<Array<{ id: number; body?: string }>> {
const all: Array<{ id: number; body?: string }> = [];
let page = 1;
while (true) {
const batch = (await githubRequest(
`/repos/${repository}/issues/${pullRequestNumber}/comments?per_page=100&page=${page}`,
)) as Array<{ id: number; body?: string }>;
all.push(...batch);
if (batch.length < 100) break;
page += 1;
}
return all;
}
const comments = await fetchAllComments(repository, pullRequestNumber);
const existing = comments.find((comment) => comment.body?.includes(marker));
```
### Issue 3 of 3
benchmarks/lib/benchmark-result.ts:16-18
The `informational` field is declared in `BenchmarkMetricResult` but `aggregateMetric` never sets it, so it is always `undefined` in every serialized artifact and in every comparison row. The `metric.informational` branch in `compareMetric` is therefore dead code — it can never flip a metric to `"informational"` status. Either populate the field in `aggregateMetric` or remove it from the interface to avoid misleading future readers.
```suggestion
comparable: boolean;
source: string;
```
Reviews (1): Last reviewed commit: "feat: add CI performance benchmark suite" | Re-trigger Greptile |
| if (name.endsWith("_ms")) { | ||
| return { | ||
| unit: "ms", | ||
| comparable: true, | ||
| threshold: { maxRegressionRatio: 1.15, minAbsoluteRegression: 5 }, | ||
| }; | ||
| } | ||
|
|
||
| if ( | ||
| name.startsWith("is_") || | ||
| name.endsWith("_ready_before_move") || | ||
| name.endsWith("_available") | ||
| ) { | ||
| return { unit: "boolean", comparable: false }; | ||
| } | ||
|
|
||
| if (name.includes("rss") || name.includes("heap")) { | ||
| return { | ||
| unit: "bytes", | ||
| comparable: true, | ||
| threshold: { maxRegressionRatio: 1.2, minAbsoluteRegression: 8 * 1024 * 1024 }, | ||
| }; | ||
| } | ||
|
|
||
| if (name.endsWith("_bytes")) { | ||
| return { unit: "bytes", comparable: false }; | ||
| } | ||
|
|
||
| if (name.startsWith("competitor_")) { | ||
| return { unit: "ms", comparable: false }; | ||
| } |
There was a problem hiding this comment.
The
startsWith("competitor_") guard is unreachable dead code: every competitor timing metric ends with _ms and is caught by the first branch, giving it comparable: true with a 15% regression threshold. The intent of comparable: false for competitors is only saved by the metric.name.includes("competitor_") fall-through in compareMetric. If that guard is ever removed or refactored, competitor regressions would start failing CI. The fix is to check the competitor prefix before the generic _ms check.
| if (name.endsWith("_ms")) { | |
| return { | |
| unit: "ms", | |
| comparable: true, | |
| threshold: { maxRegressionRatio: 1.15, minAbsoluteRegression: 5 }, | |
| }; | |
| } | |
| if ( | |
| name.startsWith("is_") || | |
| name.endsWith("_ready_before_move") || | |
| name.endsWith("_available") | |
| ) { | |
| return { unit: "boolean", comparable: false }; | |
| } | |
| if (name.includes("rss") || name.includes("heap")) { | |
| return { | |
| unit: "bytes", | |
| comparable: true, | |
| threshold: { maxRegressionRatio: 1.2, minAbsoluteRegression: 8 * 1024 * 1024 }, | |
| }; | |
| } | |
| if (name.endsWith("_bytes")) { | |
| return { unit: "bytes", comparable: false }; | |
| } | |
| if (name.startsWith("competitor_")) { | |
| return { unit: "ms", comparable: false }; | |
| } | |
| if (name.startsWith("competitor_")) { | |
| return { unit: "ms", comparable: false }; | |
| } | |
| if (name.endsWith("_ms")) { | |
| return { | |
| unit: "ms", | |
| comparable: true, | |
| threshold: { maxRegressionRatio: 1.15, minAbsoluteRegression: 5 }, | |
| }; | |
| } | |
| if ( | |
| name.startsWith("is_") || | |
| name.endsWith("_ready_before_move") || | |
| name.endsWith("_available") | |
| ) { | |
| return { unit: "boolean", comparable: false }; | |
| } | |
| if (name.includes("rss") || name.includes("heap")) { | |
| return { | |
| unit: "bytes", | |
| comparable: true, | |
| threshold: { maxRegressionRatio: 1.2, minAbsoluteRegression: 8 * 1024 * 1024 }, | |
| }; | |
| } | |
| if (name.endsWith("_bytes")) { | |
| return { unit: "bytes", comparable: false }; | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: benchmarks/lib/benchmark-result.ts
Line: 68-98
Comment:
The `startsWith("competitor_")` guard is unreachable dead code: every competitor timing metric ends with `_ms` and is caught by the first branch, giving it `comparable: true` with a 15% regression threshold. The intent of `comparable: false` for competitors is only saved by the `metric.name.includes("competitor_")` fall-through in `compareMetric`. If that guard is ever removed or refactored, competitor regressions would start failing CI. The fix is to check the competitor prefix before the generic `_ms` check.
```suggestion
if (name.startsWith("competitor_")) {
return { unit: "ms", comparable: false };
}
if (name.endsWith("_ms")) {
return {
unit: "ms",
comparable: true,
threshold: { maxRegressionRatio: 1.15, minAbsoluteRegression: 5 },
};
}
if (
name.startsWith("is_") ||
name.endsWith("_ready_before_move") ||
name.endsWith("_available")
) {
return { unit: "boolean", comparable: false };
}
if (name.includes("rss") || name.includes("heap")) {
return {
unit: "bytes",
comparable: true,
threshold: { maxRegressionRatio: 1.2, minAbsoluteRegression: 8 * 1024 * 1024 },
};
}
if (name.endsWith("_bytes")) {
return { unit: "bytes", comparable: false };
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed — competitor metrics are now classified before generic _ms metrics, so optional competitor timings stay non-comparable without relying on the comparison fallback.
This comment was generated by Pi using OpenAI GPT-5
| const comments = (await githubRequest( | ||
| `/repos/${repository}/issues/${pullRequestNumber}/comments?per_page=100`, | ||
| )) as Array<{ id: number; body?: string }>; | ||
| const existing = comments.find((comment) => comment.body?.includes(marker)); |
There was a problem hiding this comment.
The comment search is capped at 100 with no pagination. On any active PR with more than 100 comments the marker lookup silently returns nothing, so every subsequent benchmark run posts a brand-new comment instead of updating the existing one.
| const comments = (await githubRequest( | |
| `/repos/${repository}/issues/${pullRequestNumber}/comments?per_page=100`, | |
| )) as Array<{ id: number; body?: string }>; | |
| const existing = comments.find((comment) => comment.body?.includes(marker)); | |
| async function fetchAllComments( | |
| repository: string, | |
| pullRequestNumber: number, | |
| ): Promise<Array<{ id: number; body?: string }>> { | |
| const all: Array<{ id: number; body?: string }> = []; | |
| let page = 1; | |
| while (true) { | |
| const batch = (await githubRequest( | |
| `/repos/${repository}/issues/${pullRequestNumber}/comments?per_page=100&page=${page}`, | |
| )) as Array<{ id: number; body?: string }>; | |
| all.push(...batch); | |
| if (batch.length < 100) break; | |
| page += 1; | |
| } | |
| return all; | |
| } | |
| const comments = await fetchAllComments(repository, pullRequestNumber); | |
| const existing = comments.find((comment) => comment.body?.includes(marker)); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: benchmarks/comment-pr.ts
Line: 62-65
Comment:
The comment search is capped at 100 with no pagination. On any active PR with more than 100 comments the marker lookup silently returns nothing, so every subsequent benchmark run posts a brand-new comment instead of updating the existing one.
```suggestion
async function fetchAllComments(
repository: string,
pullRequestNumber: number,
): Promise<Array<{ id: number; body?: string }>> {
const all: Array<{ id: number; body?: string }> = [];
let page = 1;
while (true) {
const batch = (await githubRequest(
`/repos/${repository}/issues/${pullRequestNumber}/comments?per_page=100&page=${page}`,
)) as Array<{ id: number; body?: string }>;
all.push(...batch);
if (batch.length < 100) break;
page += 1;
}
return all;
}
const comments = await fetchAllComments(repository, pullRequestNumber);
const existing = comments.find((comment) => comment.body?.includes(marker));
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed — the PR comment updater now paginates issue comments until all pages are fetched before searching for the benchmark marker.
This comment was generated by Pi using OpenAI GPT-5
| comparable: boolean; | ||
| informational?: boolean; | ||
| source: string; |
There was a problem hiding this comment.
The
informational field is declared in BenchmarkMetricResult but aggregateMetric never sets it, so it is always undefined in every serialized artifact and in every comparison row. The metric.informational branch in compareMetric is therefore dead code — it can never flip a metric to "informational" status. Either populate the field in aggregateMetric or remove it from the interface to avoid misleading future readers.
| comparable: boolean; | |
| informational?: boolean; | |
| source: string; | |
| comparable: boolean; | |
| source: string; |
Prompt To Fix With AI
This is a comment left during a code review.
Path: benchmarks/lib/benchmark-result.ts
Line: 16-18
Comment:
The `informational` field is declared in `BenchmarkMetricResult` but `aggregateMetric` never sets it, so it is always `undefined` in every serialized artifact and in every comparison row. The `metric.informational` branch in `compareMetric` is therefore dead code — it can never flip a metric to `"informational"` status. Either populate the field in `aggregateMetric` or remove it from the interface to avoid misleading future readers.
```suggestion
comparable: boolean;
source: string;
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed — removed the unused informational field and the dead comparison branch so informational status is derived from comparable: false.
This comment was generated by Pi using OpenAI GPT-5
Hunk benchmark results❌ One or more benchmarks regressed beyond the configured threshold. Base: Key Hunk benchmarks
20 additional comparable Hunk metrics were checked but hidden to keep this comment readable. See the workflow artifacts for full JSON and text output. Raw JSON and text logs are available in the benchmark workflow artifacts. |
|
Closing this for now. The benchmark harness is useful, but running it as PR CI is still too noisy/heavy; we'll keep the idea as local benchmark coverage instead.\n\nThis comment was generated by Pi using OpenAI GPT-5 |
Summary
Validation
bun run format:checkbun run typecheckbun run lintbun testbun run bench -- --samples 3 --include-competitors --out benchmarks/results/local.jsonThis PR description was generated by Pi using OpenAI GPT-5