Merged
5 changes: 3 additions & 2 deletions .github/workflows/benchmark.yml
@@ -200,8 +200,9 @@ jobs:

embedding-benchmark:
runs-on: ubuntu-latest
-      # 7 models x 20 min each = 140 min worst-case + ~30 min setup/npm-wait headroom
-      timeout-minutes: 195
+      # 7 models x 30 min each = 210 min worst-case; symbols are sampled to 1500 so
+      # typical runtime is ~23 min/model ≈ 160 min + setup headroom
+      timeout-minutes: 240
if: >-
github.event_name == 'workflow_dispatch' ||
(github.event.workflow_run.conclusion == 'success' &&
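The budget in this hunk can be sanity-checked with a few lines of arithmetic. This is a standalone sketch, not code from the PR; the constants mirror the workflow comment and the per-model timeout discussed below:

```typescript
// Standalone sketch: verify the CI budget math from the workflow comment.
const models = 7;
const worstPerModelMin = 30;   // per-model TIMEOUT_MS in embedding-benchmark.ts
const typicalPerModelMin = 23; // ~18 min embed + ~5 min search after sampling
const timeoutMinutes = 240;    // job-level timeout-minutes in benchmark.yml

const worstCaseMin = models * worstPerModelMin;         // 210 min worst case
const typicalMin = models * typicalPerModelMin;         // 161 min, the "~160 min" estimate
const setupHeadroomMin = timeoutMinutes - worstCaseMin; // headroom left for setup

console.log(worstCaseMin, typicalMin, setupHeadroomMin); // 210 161 30
```

Even in the worst case, 30 minutes remain for checkout, npm install, and DB indexing before the job-level timeout fires.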
34 changes: 31 additions & 3 deletions scripts/embedding-benchmark.ts
@@ -18,6 +18,12 @@ import { resolveBenchmarkSource, srcImport } from './lib/bench-config.js';
import { forkWorker } from './lib/fork-engine.js';

const MODEL_WORKER_KEY = '__BENCH_MODEL__';
/**
* Cap symbol count so CI stays under the per-model timeout.
* At ~1500 symbols on a CPU-only runner, search evaluation takes ~5 min;
* embedding all DB symbols takes ~18 min — ~23 min total, within the 30-min timeout.
*/
const MAX_SYMBOLS = 1500;
Comment on lines +22 to +26
Contributor

P2 Stale timing estimates in JSDoc

The comment says "embed + search takes ~12-15 min per model" and "well within the 45-min timeout", but TIMEOUT_MS is 1_800_000 ms = 30 min, not 45 min. The TIMEOUT_MS comment also separately estimates embed at ~18 min and search at ~5 min (~23 min total), which contradicts the 12-15 min claim here. Because buildEmbeddings runs over all DB symbols (~7,128), not just the sampled 1,500, the 23-min estimate is more plausible.

Suggested change
- * Cap symbol count so CI stays under the per-model timeout.
- * At ~1500 symbols on a CPU-only runner, embed + search takes ~12-15 min
- * per model — well within the 45-min timeout with headroom to spare.
- */
-const MAX_SYMBOLS = 1500;
+/**
+ * Cap symbol count so CI stays under the per-model timeout.
+ * At ~1500 symbols on a CPU-only runner, search evaluation takes ~5 min;
+ * embedding all DB symbols takes ~18 min — ~23 min total, within the 30-min timeout.
+ */

Contributor Author


Fixed in e7b1008 — updated the JSDoc to match the actual TIMEOUT_MS (30 min) and the realistic timing breakdown (~18 min embed + ~5 min search = ~23 min total). The stale "12-15 min" and "45-min timeout" references are gone.


const __dirname = path.dirname(fileURLToPath(import.meta.url));
const root = path.resolve(__dirname, '..');
@@ -66,12 +72,34 @@ if (process.env[MODEL_WORKER_KEY]) {
return symbols;
}

/**
* Deterministic shuffle using a simple seeded PRNG (mulberry32).
* Keeps results reproducible across runs while sampling fairly.
*/
function seededShuffle<T>(arr: T[], seed: number): T[] {
const out = arr.slice();
let s = seed | 0;
for (let i = out.length - 1; i > 0; i--) {
s = (s + 0x6d2b79f5) | 0;
let t = Math.imul(s ^ (s >>> 15), 1 | s);
t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
const r = ((t ^ (t >>> 14)) >>> 0) / 4294967296;
const j = Math.floor(r * (i + 1));
[out[i], out[j]] = [out[j], out[i]];
}
return out;
}
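The function above is deterministic by construction, which is what makes the CI sample reproducible. A quick standalone check (a sketch, not part of the PR — it just reproduces the function) confirms that the same seed yields the same order and that the input array is left untouched:

```typescript
// Sketch reproducing the PR's mulberry32-based seeded Fisher-Yates shuffle.
function seededShuffle<T>(arr: T[], seed: number): T[] {
  const out = arr.slice(); // copy so the caller's array is never mutated
  let s = seed | 0;
  for (let i = out.length - 1; i > 0; i--) {
    // mulberry32 step: advance the state, then temper it into a float in [0, 1)
    s = (s + 0x6d2b79f5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    const r = ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    const j = Math.floor(r * (i + 1)); // pick a swap index in [0, i]
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

const ids = Array.from({ length: 10 }, (_, i) => i);
const a = seededShuffle(ids, 42);
const b = seededShuffle(ids, 42);
console.log(JSON.stringify(a) === JSON.stringify(b)); // true: same seed, same order
console.log(ids[0] === 0 && ids[9] === 9);            // true: input not mutated
```

Because the order is stable for a fixed seed, `seededShuffle(symbols, 42).slice(0, MAX_SYMBOLS)` selects the same 1500 symbols on every run, so benchmark scores stay comparable across models and across CI runs.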

// Redirect console.log to stderr so only JSON goes to stdout
const origLog = console.log;
console.log = (...args) => console.error(...args);

-const symbols = loadSymbols();
-console.error(` [${modelKey}] Loaded ${symbols.length} symbols`);
+let symbols = loadSymbols();
+if (symbols.length > MAX_SYMBOLS) {
+  console.error(` [${modelKey}] Sampling ${MAX_SYMBOLS} of ${symbols.length} symbols (deterministic seed=42)`);
+  symbols = seededShuffle(symbols, 42).slice(0, MAX_SYMBOLS);
+}
+console.error(` [${modelKey}] Benchmarking ${symbols.length} symbols`);

const embedStart = performance.now();
await buildEmbeddings(root, modelKey, dbPath, { strategy: 'structured' });
@@ -125,7 +153,7 @@ const dbPath = path.join(root, '.codegraph', 'graph.db');

const { MODELS } = await import(srcImport(srcDir, 'domain/search/index.js'));

-const TIMEOUT_MS = 1_800_000; // 30 min — CPU-only CI runners need ~20 min per model for 6k+ symbols
+const TIMEOUT_MS = 1_800_000; // 30 min — with symbol sampling, embed (~18 min) + search (~5 min) fits comfortably
const hasHfToken = !!process.env.HF_TOKEN;
const modelKeys = Object.keys(MODELS);
const results = {};