Improve internal/net/grpc/pool performance by kpango · Pull Request #3489 · vdaas/vald

kpango · 2026-02-24T08:38:03Z

This PR improves the performance and efficiency of the gRPC connection pool.

Key changes:

Refactored getHealthyConn to load connection slots atomically once and iterate over the local slice, reducing repeated atomic load overhead.
Refactored IsHealthy to inline loop logic, avoiding closure allocation and function call overhead.
Refactored loop to use local slice length for iteration bounds, avoiding repeated poolSize atomic loads.
Optimized pool struct layout to group frequently accessed atomic fields (connSlots, currentIndex, poolSize, closing) together to improve cache locality and minimize padding.

Benchmarks show significant improvement in high contention scenarios (~26% reduction in ns/op) and modest improvements in other scenarios.
Verified with go test -bench -race.

PR created automatically by Jules for task 3793426715706888587 started by @kpango

Signed-off-by: kpango <kpango@vdaas.org>

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

google-labs-jules · 2026-02-24T08:38:05Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-02-24T08:38:11Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)

main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 028457b6-6c53-47c2-b4a8-887b98806625

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch improve-grpc-pool-perf-3793426715706888587

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can use your project's `golangci-lint` configuration to improve the quality of Go code reviews.

Add a configuration file to your project to customize how CodeRabbit runs golangci-lint.

vdaas-ci · 2026-02-24T08:38:12Z

[CHATOPS:HELP] ChatOps commands.

🙆‍♀️ /approve - approve
🍱 /format - format codes and add licenses
✅ /gen-test - generate test codes
🏷️ /label - add labels
🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

cloudflare-workers-and-pages · 2026-02-24T08:38:23Z

Deploying vald with Cloudflare Pages

Latest commit:	`a703874`
Status:	✅ Deploy successful!
Preview URL:	https://a76734b3.vald.pages.dev
Branch Preview URL:	https://improve-grpc-pool-perf-37934.vald.pages.dev

View logs

codecov · 2026-02-24T08:50:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.38%. Comparing base (9e5e903) to head (a703874).
⚠️ Report is 1 commits behind head on fix-grpc-broadcast-latency-17105619334351837761.

Additional details and impacted files

@@                                 Coverage Diff                                  @@
##           fix-grpc-broadcast-latency-17105619334351837761    #3489       +/-   ##
====================================================================================
+ Coverage                                            26.77%   41.38%   +14.60%     
====================================================================================
  Files                                                  527      317      -210     
  Lines                                                48959    19084    -29875     
====================================================================================
- Hits                                                 13110     7897     -5213     
+ Misses                                               34874    10765    -24109     
+ Partials                                               975      422      -553

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…rmance - Introduce `dials` field to `pool` struct to track active dialing per slot, implementing singleflight logic to prevent duplicate dials. - Refactor `refreshConn` to check `dials` state and return early if a dial is already in progress. - Offload connection closure (`pc.Close`) to the pool's global `errGroup` with a detached context, preventing it from blocking the critical path in `getHealthyConn`. - Update `init` and `grow` to manage the `dials` slice lifecycle. - Verify performance with benchmarks, showing maintained improvements and stability. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

Signed-off-by: kpango <kpango@vdaas.org>

… atomic loading (#3494) * perf(grpc/pool): Add GetHealthyConn benchmark and optimize tight loops This commit adds a benchmark for the `GetHealthyConn` function which is responsible for retrieving healthy gRPC connections. Based on pprof analysis, the iteration logic inside `getHealthyConn` and `loop` frequently invoked `p.load(idx)` which triggers an atomic load on `connSlots` inside of a loop. The optimization caches the evaluation of `p.getSlots()` ahead of time which drops CPU utilization in tightly contended scenarios and improves overall pool loop efficiency. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> * perf(grpc/pool): Add getHealthyConn benchmark and optimize atomic loads This commit introduces rigorous tests validating the benchmark performance of `getHealthyConn` inside the `grpc/pool` library. Based on the pprof analysis from these benchmarks, a major CPU bottleneck was identified originating from the internal calls invoking repeated dynamic atomic slot reading. By directly pulling slot bounds evaluation (`slen` & `p.getSlots()`) upfront outside of iterative loops and reducing the reliance of modulo evaluation by simple variable boundaries (`idx = idx % sz`), performance increased drastically, particularly under parallel benchmarking contexts reducing operation overhead times by 25%. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> * perf(grpc/pool): optimize atomic index logic in getHealthyConn Following a thorough evaluation of high-contention benchmark scenarios, I further optimized the pool loops. We observed that the initial CPU overhead generated by looping index loads using `% sz` evaluation produced negligible performance penalties when strictly decoupled from dynamic length properties. Thus, I refined the loop boundaries to avoid calling dynamic atomic index references. Performance improvements reduced `BenchmarkParallel_GetHealthyConn` latency by an additional 20%+ dropping `ns/op` overhead execution metrics when scaling thread sizes under highly congested loops. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

…mprove-grpc-pool-perf-3793426715706888587 Signed-off-by: Yusuke Kato <kpango@vdaas.org>

kpango and others added 2 commits February 20, 2026 14:24

fix(grpc): resolve broadcast concurrency and tune connection pool

24ca7a8

Signed-off-by: kpango <kpango@vdaas.org>

Refactor internal/net/grpc/pool for better performance

fe042ca

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

github-actions Bot added area/internal actions/build/amd64 Build AMD64 Docker Images size/XXL priority/low labels Feb 24, 2026

kpango and others added 2 commits February 24, 2026 08:52

fix(grpc): resolve broadcast concurrency and tune connection pool

ffcc224

Signed-off-by: kpango <kpango@vdaas.org>

kpango force-pushed the fix-grpc-broadcast-latency-17105619334351837761 branch from 24ca7a8 to ffcc224 Compare March 13, 2026 03:47

kpango and others added 2 commits March 14, 2026 09:40

Merge branch 'fix-grpc-broadcast-latency-17105619334351837761' into i…

a703874

…mprove-grpc-pool-perf-3793426715706888587 Signed-off-by: Yusuke Kato <kpango@vdaas.org>

kpango force-pushed the fix-grpc-broadcast-latency-17105619334351837761 branch 4 times, most recently from dea3cb1 to 402d9d5 Compare March 14, 2026 02:44

kpango closed this Mar 14, 2026

kpango deleted the improve-grpc-pool-perf-3793426715706888587 branch March 14, 2026 02:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve internal/net/grpc/pool performance#3489

Improve internal/net/grpc/pool performance#3489
kpango wants to merge 6 commits intofix-grpc-broadcast-latency-17105619334351837761from
improve-grpc-pool-perf-3793426715706888587

kpango commented Feb 24, 2026

Uh oh!

google-labs-jules Bot commented Feb 24, 2026

Uh oh!

coderabbitai Bot commented Feb 24, 2026 •

edited

Loading

Review skipped

Uh oh!

vdaas-ci commented Feb 24, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Feb 24, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Feb 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kpango commented Feb 24, 2026

Uh oh!

google-labs-jules Bot commented Feb 24, 2026

Uh oh!

coderabbitai Bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

vdaas-ci commented Feb 24, 2026

Uh oh!

cloudflare-workers-and-pages Bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying vald with Cloudflare Pages

Uh oh!

codecov Bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Feb 24, 2026 •

edited

Loading

cloudflare-workers-and-pages Bot commented Feb 24, 2026 •

edited

Loading

codecov Bot commented Feb 24, 2026 •

edited

Loading