
added benchmarks #79

Merged

wpak-ai merged 3 commits into cppalliance:main from jonathanMLDev:feat/basic-benchmark-harness on May 14, 2026

Conversation

@jonathanMLDev
Collaborator

@jonathanMLDev jonathanMLDev commented May 13, 2026

Pull Request

Added

benchmarks/latency.ts

  • setLogLevel('ERROR') at startup so hybrid logInfo lines do not flood stderr during runs.
  • query_no_rerank / query_with_rerank: real PineconeClient with ensureIndexes + searchIndex stubs (dense/sparse distinguished by index object identity) and optional rerankResults stub for the rerank path.
  • guided_query_end_to_end: setPineconeClient() with a minimal mock (query, count, listNamespacesWithMetadata, etc.), namespace cache primed once, handler captured via a small registerTool stub on a fake McpServer, then preferred_tool: 'query_fast' with a list-style user_query so routing + suggestion + formatQueryResultRows run every iteration.
  • list_namespaces_cache_miss / list_namespaces_cache_hit: invalidateNamespacesCache() + getNamespacesWithCache() vs warm cache only.
  • runBenchmark: 10 warmup + 200 measured iterations timed with performance.now(), percentiles p50 / p95 / p99 reported to 4 decimal places so sub-ms rows are not all zeros (a sketch of this loop appears after this list).
  • ASCII table to stdout and benchmarks/baseline.json (metadata + results array).
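
For orientation, here is a minimal sketch of the warmup-then-measure loop, percentile reporting, and baseline write described above. It is not the code in benchmarks/latency.ts; the scenario name, result shape, and console.table output are illustrative stand-ins for the real scenarios and the ASCII table.

```typescript
// Minimal sketch of the measurement loop; the scenario, result shape, and
// console.table output are illustrative, not the exact contents of benchmarks/latency.ts.
import { writeFileSync } from 'node:fs';
import { performance } from 'node:perf_hooks';

const WARMUP = 10;
const ITERATIONS = 200;

interface BenchmarkResult {
  name: string;
  p50: number;
  p95: number;
  p99: number;
  min: number;
  max: number;
}

// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
  return sorted[idx];
}

async function runBenchmark(name: string, fn: () => Promise<unknown>): Promise<BenchmarkResult> {
  for (let i = 0; i < WARMUP; i++) await fn(); // warmup iterations are not measured
  const samples: number[] = [];
  for (let i = 0; i < ITERATIONS; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  return {
    name,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    min: Math.min(...samples),
    max: Math.max(...samples),
  };
}

async function main(): Promise<void> {
  // A trivial stand-in scenario; the real script runs the mocked query paths here.
  const results = [await runBenchmark('noop_example', async () => {})];

  // Four decimal places so sub-millisecond rows are not rendered as zeros.
  console.table(
    results.map((r) => ({
      name: r.name,
      p50_ms: r.p50.toFixed(4),
      p95_ms: r.p95.toFixed(4),
      p99_ms: r.p99.toFixed(4),
    })),
  );

  writeFileSync(
    'benchmarks/baseline.json',
    JSON.stringify(
      {
        generated_at: new Date().toISOString(),
        node: process.version,
        warmup: WARMUP,
        iterations: ITERATIONS,
        results,
      },
      null,
      2,
    ) + '\n',
  );
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```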

Updated

package.json

  • Script: "benchmark": "tsx benchmarks/latency.ts" (and package.json formatted with Prettier).

Added benchmarks/baseline.json

  • Generated from npm run benchmark on this machine (includes node version in the JSON).

Updated README.md

  • New ### Benchmarks subsection under Development with run instructions and how to diff the baseline.

npm run typecheck, npm run lint, npm run build, and npm test all pass. npm run ci still fails on npm run format:check because many existing src/**/*.ts files already fail Prettier in this repo; that was not changed as part of this work.

Closes #74

Summary by CodeRabbit

  • Documentation

    • Added a "Benchmarks" section with instructions to run local latency benchmarks and interpret p50/p95/p99 metrics.
  • New Features

    • Added a local benchmark harness to measure server-side latency for hybrid queries, reranking, guided-query flows, and cache hit/miss scenarios; runs without external API keys, prints stats, and writes baseline results for regression tracking.
  • Chores

    • Added a run script to execute the benchmarks.


@jonathanMLDev jonathanMLDev self-assigned this May 13, 2026
@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 315c6934-28b1-409a-8c68-4b0f833fb893

📥 Commits

Reviewing files that changed from the base of the PR and between f9edba6 and c0421cb.

📒 Files selected for processing (1)
  • benchmarks/latency.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • benchmarks/latency.ts

📝 Walkthrough

Walkthrough

Adds a local benchmark harness (benchmarks/latency.ts) that runs synthetic, mocked Pinecone scenarios (hybrid query with/without reranking, guided-query end-to-end, namespace cache miss/hit), prints p50/p95/p99/min/max latencies, writes benchmarks/baseline.json, adds npm run benchmark, and documents usage in README.md.

Changes

Benchmark harness and orchestration

  • Types, configuration, and utilities (benchmarks/latency.ts, package.json): adds the benchmark entry, warmup/iteration/top-k config, the BenchmarkResult shape, synthetic hit generators, percentile/min/max computation utilities, and the new npm script benchmark.
  • Timing and reporting (benchmarks/latency.ts): runBenchmark performs warmup and timed iterations using performance.now() and computes p50/p95/p99/min/max; formatTable renders an ASCII table of results, and results are saved as JSON.
  • Bench Pinecone client and reranker doubles (benchmarks/latency.ts): buildQueryBenchClient creates a PineconeClient-shaped test double by overriding ensureIndexes, searchIndex (dense/sparse synthetic responses), and rerankResults (deterministic reranking); see the sketch after this list.
  • MCP/guided-query capture and namespace mocks (benchmarks/latency.ts): captureGuidedQueryHandler registers tools on a mocked MCP server and extracts the guided_query handler; createBenchPineconeMock implements mocked listNamespaces/count behavior to exercise cache hit/miss scenarios (also sketched below).
  • Scenario orchestration and baseline write (benchmarks/latency.ts): main runs the benchmark scenarios (hybrid no-rerank, hybrid with rerank, guided-query end-to-end, namespace cache miss/hit), prints the formatted table, writes benchmarks/baseline.json with metadata (generated_at, node version, warmup/iterations) and results, and installs top-level error handling.
  • Docs (README.md): adds a “Benchmarks” section describing npm run benchmark, the mocked-Pinecone approach, the p50/p95/p99 outputs, and how to compare against benchmarks/baseline.json.
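
To make the "Bench Pinecone client and reranker doubles" row concrete, a rough sketch of the identity-dispatch approach follows. The Hit and SearchableIndexLike shapes, the helper names, and the cloning detail are assumptions for illustration; the repo's real PineconeClient and SearchableIndex types are not reproduced here.

```typescript
// Sketch of a dense/sparse search double dispatched on index object identity.
// The interfaces below are simplified stand-ins, not the repo's real types.
interface Hit {
  id: string;
  score: number;
  fields: Record<string, unknown>;
}

interface SearchableIndexLike {
  kind?: 'dense' | 'sparse';
}

function syntheticHits(prefix: string, count: number, baseScore: number): Hit[] {
  return Array.from({ length: count }, (_, i) => ({
    id: `${prefix}-${i}`,
    score: baseScore - i * 0.01,
    fields: { text: `${prefix} synthetic chunk ${i}` },
  }));
}

function buildBenchSearch(topK: number) {
  const denseIndexRef: SearchableIndexLike = {};
  const sparseIndexRef: SearchableIndexLike = {};
  const denseHits = syntheticHits('dense', topK, 0.95);
  const sparseHits = syntheticHits('sparse', topK, 0.9);

  // Return fresh copies on each call so iterations never observe mutated state
  // (the concern raised in the review comment further down this page).
  const clone = (hits: Hit[]): Hit[] => hits.map((h) => ({ ...h, fields: { ...h.fields } }));

  const searchIndex = async (index: SearchableIndexLike): Promise<Hit[]> => {
    if (index === denseIndexRef) return clone(denseHits);
    if (index === sparseIndexRef) return clone(sparseHits);
    return [];
  };

  // Deterministic "reranker": a stable in-memory sort by score, no network call.
  const rerankResults = async (hits: Hit[]): Promise<Hit[]> =>
    [...hits].sort((a, b) => b.score - a.score);

  return { denseIndexRef, sparseIndexRef, searchIndex, rerankResults };
}
```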
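Likewise, the handler-capture trick from the "MCP/guided-query capture and namespace mocks" row can be pictured as below. The registerTool signature and the FakeToolRegistry shape are hypothetical stand-ins, not the MCP SDK's actual interface.

```typescript
// Sketch of capturing a tool handler from a fake server object. The shapes here
// are hypothetical; the real benchmark stubs whatever surface the repo's code uses.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface FakeToolRegistry {
  registerTool(name: string, config: unknown, handler: ToolHandler): void;
}

function captureHandler(
  registerAll: (server: FakeToolRegistry) => void,
  toolName: string,
): ToolHandler {
  let captured: ToolHandler | undefined;
  const fakeServer: FakeToolRegistry = {
    registerTool(name, _config, handler) {
      if (name === toolName) captured = handler;
    },
  };
  registerAll(fakeServer); // run the real tool registration against the stub
  if (!captured) throw new Error(`tool "${toolName}" was never registered`);
  return captured;
}

// Usage sketch: each measured iteration then calls the captured handler directly,
// e.g. handler({ preferred_tool: 'query_fast', user_query: 'list the widgets' }).
```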

Sequence Diagram

sequenceDiagram
  participant Dev as CLI / Developer
  participant Bench as Benchmark Script
  participant MCP as MCP Tool Registry / guided_query
  participant Pine as Pinecone Mock Client
  participant FS as Filesystem

  Dev->>Bench: run `npm run benchmark`
  Bench->>Pine: install bench Pinecone test double (search, rerank, namespaces)
  Bench->>MCP: register tools / capture `guided_query` handler
  Bench->>Bench: warmup scenarios
  loop for each scenario & iteration
    Bench->>MCP: invoke guided_query (when applicable)
    MCP->>Pine: query / rerank / list namespaces
    Pine-->>MCP: synthetic responses
    MCP-->>Bench: tool response
  end
  Bench->>FS: write `benchmarks/baseline.json`
  Bench-->>Dev: print ASCII table with p50/p95/p99/min/max

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • wpak-ai

Poem

🐰 I timed each hop, from query to land,
Mocked Pinecone answers lent a steady hand,
P50, p95, p99 in a tidy band,
Baseline saved at my little command,
A rabbit claps for performance, nimble and grand.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): the title 'added benchmarks' is partially related but overly broad; it refers to a real aspect of the change but lacks specificity about the main purpose (measuring server latency overhead). Resolution: consider a more specific title such as 'Add benchmark harness to measure server latency'.

✅ Passed checks (3 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): the pull request implements all key requirements from issue #74: a benchmark script measuring query/reranking/guided_query/cache scenarios, mocked Pinecone responses, structured output with p50/p95/p99 latencies, an npm run benchmark command, a committed baseline.json, and README documentation.
  • Out of Scope Changes check (✅ Passed): all changes are directly aligned with issue #74 objectives: benchmarks/latency.ts implements the harness, package.json adds the npm script, README documents usage, and baseline.json provides the baseline file.


@codecov

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@2b05008). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #79   +/-   ##
=======================================
  Coverage        ?   66.01%           
=======================================
  Files           ?       33           
  Lines           ?     1121           
  Branches        ?      359           
=======================================
  Hits            ?      740           
  Misses          ?      381           
  Partials        ?        0           


@jonathanMLDev
Collaborator Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 14, 2026

✅ Actions performed

Full review triggered.

@jonathanMLDev jonathanMLDev requested a review from wpak-ai May 14, 2026 16:06
@wpak-ai
Collaborator

wpak-ai commented May 14, 2026

@coderabbitai resume

@coderabbitai

coderabbitai Bot commented May 14, 2026

✅ Actions performed

Reviews resumed.

@wpak-ai
Collaborator

wpak-ai commented May 14, 2026

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 14, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
benchmarks/latency.ts (1)

142-145: ⚡ Quick win

Return fresh hit objects per call to avoid benchmark state bleed.

Line 142–Line 145 currently return shared arrays/objects. If downstream code mutates results, later iterations can measure mutated state instead of a clean run.

Proposed fix
 function buildQueryBenchClient(): PineconeClientBenchDouble {
   const denseHits = syntheticHits('dense', TOP_K, 0.95);
   const sparseHits = syntheticHits('sparse', TOP_K, 0.9);
+  const cloneHits = (hits: PineconeHit[]): PineconeHit[] =>
+    hits.map((h) => ({
+      ...h,
+      fields: { ...h.fields },
+    }));
   const denseIndexRef = {} as SearchableIndex;
   const sparseIndexRef = {} as SearchableIndex;
@@
   client.searchIndex = async (index) => {
-    if (index === denseIndexRef) return denseHits;
-    if (index === sparseIndexRef) return sparseHits;
+    if (index === denseIndexRef) return cloneHits(denseHits);
+    if (index === sparseIndexRef) return cloneHits(sparseHits);
     return [];
   };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/latency.ts` around lines 142 - 145, The searchIndex stub returns
shared arrays/objects (denseHits/sparseHits) which can be mutated across
iterations; update client.searchIndex so it returns fresh copies per call (e.g.,
create a new array and clone each hit object) for the denseIndexRef and
sparseIndexRef branches and return a new empty array for the default path to
avoid benchmark state bleed; locate the client.searchIndex closure and replace
the direct returns of denseHits/sparseHits with code that maps/duplicates each
hit (referencing denseIndexRef, sparseIndexRef, denseHits, sparseHits, and
client.searchIndex).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/latency.ts`:
- Around line 113-120: The table is converting numeric latency values via
String(...) which discards trailing zeros; update the rendering inside the
line([...]) call so r.p50, r.p95, r.p99, r.min, and r.max are formatted with
four decimal places (e.g. use Number(...).toFixed(4) or ensure the values are
numbers and call toFixed(4)) before converting to strings; keep the name slicing
(r.name.slice(...)) unchanged and apply this formatting only to the numeric
latency fields so the table consistently shows four decimal places.

---

Nitpick comments:
In `@benchmarks/latency.ts`:
- Around line 142-145: The searchIndex stub returns shared arrays/objects
(denseHits/sparseHits) which can be mutated across iterations; update
client.searchIndex so it returns fresh copies per call (e.g., create a new array
and clone each hit object) for the denseIndexRef and sparseIndexRef branches and
return a new empty array for the default path to avoid benchmark state bleed;
locate the client.searchIndex closure and replace the direct returns of
denseHits/sparseHits with code that maps/duplicates each hit (referencing
denseIndexRef, sparseIndexRef, denseHits, sparseHits, and client.searchIndex).
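
For concreteness, the fix described in the inline comment above boils down to formatting the numeric latency columns before joining them into the row, roughly as follows. The line() helper, the column widths, and the field names are assumptions rather than the benchmark's actual table code.

```typescript
// Hypothetical table-row rendering: format latency columns with four decimals
// so sub-millisecond values are not truncated to "0".
interface Row {
  name: string;
  p50: number;
  p95: number;
  p99: number;
  min: number;
  max: number;
}

function line(cells: string[]): string {
  return '| ' + cells.map((c) => c.padStart(10)).join(' | ') + ' |';
}

function renderRow(r: Row): string {
  return line([
    r.name.slice(0, 30),
    r.p50.toFixed(4),
    r.p95.toFixed(4),
    r.p99.toFixed(4),
    r.min.toFixed(4),
    r.max.toFixed(4),
  ]);
}
```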

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08440cb8-8b58-4953-a02f-4fb5c85e7f4e

📥 Commits

Reviewing files that changed from the base of the PR and between c07279f and f9edba6.

📒 Files selected for processing (3)
  • README.md
  • benchmarks/latency.ts
  • package.json

@wpak-ai wpak-ai merged commit b961f3d into cppalliance:main May 14, 2026
12 checks passed


Development

Successfully merging this pull request may close these issues.

Add Basic Benchmark Harness

2 participants