Skip to content

Improve internal/net/grpc/pool performance#3489

Closed
kpango wants to merge 6 commits intofix-grpc-broadcast-latency-17105619334351837761from
improve-grpc-pool-perf-3793426715706888587
Closed

Improve internal/net/grpc/pool performance#3489
kpango wants to merge 6 commits intofix-grpc-broadcast-latency-17105619334351837761from
improve-grpc-pool-perf-3793426715706888587

Conversation

@kpango
Copy link
Copy Markdown
Collaborator

@kpango kpango commented Feb 24, 2026

This PR improves the performance and efficiency of the gRPC connection pool.

Key changes:

  • Refactored getHealthyConn to load connection slots atomically once and iterate over the local slice, reducing repeated atomic load overhead.
  • Refactored IsHealthy to inline loop logic, avoiding closure allocation and function call overhead.
  • Refactored loop to use local slice length for iteration bounds, avoiding repeated poolSize atomic loads.
  • Optimized pool struct layout to group frequently accessed atomic fields (connSlots, currentIndex, poolSize, closing) together to improve cache locality and minimize padding.

Benchmarks show significant improvement in high contention scenarios (~26% reduction in ns/op) and modest improvements in other scenarios.
Verified with go test -bench -race.


PR created automatically by Jules for task 3793426715706888587 started by @kpango

kpango and others added 2 commits February 20, 2026 14:24
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 24, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (1)
  • main

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 028457b6-6c53-47c2-b4a8-887b98806625

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch improve-grpc-pool-perf-3793426715706888587
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use your project's `golangci-lint` configuration to improve the quality of Go code reviews.

Add a configuration file to your project to customize how CodeRabbit runs golangci-lint.

@vdaas-ci
Copy link
Copy Markdown
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Feb 24, 2026

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: a703874
Status: ✅  Deploy successful!
Preview URL: https://a76734b3.vald.pages.dev
Branch Preview URL: https://improve-grpc-pool-perf-37934.vald.pages.dev

View logs

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.38%. Comparing base (9e5e903) to head (a703874).
⚠️ Report is 1 commits behind head on fix-grpc-broadcast-latency-17105619334351837761.

Additional details and impacted files
@@                                 Coverage Diff                                  @@
##           fix-grpc-broadcast-latency-17105619334351837761    #3489       +/-   ##
====================================================================================
+ Coverage                                            26.77%   41.38%   +14.60%     
====================================================================================
  Files                                                  527      317      -210     
  Lines                                                48959    19084    -29875     
====================================================================================
- Hits                                                 13110     7897     -5213     
+ Misses                                               34874    10765    -24109     
+ Partials                                               975      422      -553     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kpango and others added 2 commits February 24, 2026 08:52
…rmance

- Introduce `dials` field to `pool` struct to track active dialing per slot, implementing singleflight logic to prevent duplicate dials.
- Refactor `refreshConn` to check `dials` state and return early if a dial is already in progress.
- Offload connection closure (`pc.Close`) to the pool's global `errGroup` with a detached context, preventing it from blocking the critical path in `getHealthyConn`.
- Update `init` and `grow` to manage the `dials` slice lifecycle.
- Verify performance with benchmarks, showing maintained improvements and stability.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@kpango kpango force-pushed the fix-grpc-broadcast-latency-17105619334351837761 branch from 24ca7a8 to ffcc224 Compare March 13, 2026 03:47
kpango and others added 2 commits March 14, 2026 09:40
… atomic loading (#3494)

* perf(grpc/pool): Add GetHealthyConn benchmark and optimize tight loops

This commit adds a benchmark for the `GetHealthyConn` function which is
responsible for retrieving healthy gRPC connections. Based on pprof
analysis, the iteration logic inside `getHealthyConn` and `loop`
frequently invoked `p.load(idx)` which triggers an atomic load on
`connSlots` inside of a loop. The optimization caches the evaluation
of `p.getSlots()` ahead of time which drops CPU utilization in
tightly contended scenarios and improves overall pool loop efficiency.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* perf(grpc/pool): Add getHealthyConn benchmark and optimize atomic loads

This commit introduces rigorous tests validating the benchmark performance of `getHealthyConn` inside the `grpc/pool` library. Based on the pprof analysis from these benchmarks, a major CPU bottleneck was identified originating from the internal calls invoking repeated dynamic atomic slot reading.

By directly pulling slot bounds evaluation (`slen` & `p.getSlots()`) upfront outside of iterative loops and reducing the reliance of modulo evaluation by simple variable boundaries (`idx = idx % sz`), performance increased drastically, particularly under parallel benchmarking contexts reducing operation overhead times by 25%.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* perf(grpc/pool): optimize atomic index logic in getHealthyConn

Following a thorough evaluation of high-contention benchmark
scenarios, I further optimized the pool loops. We observed
that the initial CPU overhead generated by looping index loads
using `% sz` evaluation produced negligible performance penalties
when strictly decoupled from dynamic length properties.
Thus, I refined the loop boundaries to avoid calling dynamic atomic
index references.

Performance improvements reduced `BenchmarkParallel_GetHealthyConn`
latency by an additional 20%+ dropping `ns/op` overhead execution metrics
when scaling thread sizes under highly congested loops.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
…mprove-grpc-pool-perf-3793426715706888587

Signed-off-by: Yusuke Kato <kpango@vdaas.org>
@kpango kpango force-pushed the fix-grpc-broadcast-latency-17105619334351837761 branch 4 times, most recently from dea3cb1 to 402d9d5 Compare March 14, 2026 02:44
@kpango kpango closed this Mar 14, 2026
@kpango kpango deleted the improve-grpc-pool-perf-3793426715706888587 branch March 14, 2026 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants