feat(gateway): add gateway-wide grpc request rate limiting by alangou · Pull Request #1566 · NVIDIA/OpenShell

alangou · 2026-05-26T16:09:24Z

Summary

Adds an optional gateway-wide gRPC request rate limit. Operators can cap the number of gRPC API requests allowed per time window. The limit is applied only to gRPC API traffic after protocol multiplexing; health, metrics, and local sandbox-service HTTP routes remain outside this control. Both the request count and the window must be positive to enable the limit.

Related Issue

Part of OS-76. The default sandbox CPU/memory limit work originally scoped under OS-76 has been dropped from this PR per maintainer feedback (see review below) and will be revisited separately, so this PR intentionally narrows to the gRPC rate-limiting portion.

Changes

openshell-core config: added grpc_rate_limit_requests and grpc_rate_limit_window_secs fields, a with_grpc_rate_limit(...) builder, and a grpc_rate_limit() accessor that only returns a limit when both values are positive (None/0 disables it).
openshell-server CLI: added --grpc-rate-limit-requests / --grpc-rate-limit-window-seconds flags (with OPENSHELL_GRPC_RATE_LIMIT_* env vars) plus validation that rejects enabling only one of the pair; setting either value to 0 disables the limit.
openshell-server config file: added matching grpc_rate_limit_requests / grpc_rate_limit_window_seconds TOML fields, with CLI flags overriding file values.
openshell-server multiplex: added a fixed-window GrpcRateLimiter (shared via Arc<Mutex>) and a GrpcRateLimitService tower layer applied only to the gRPC service after multiplexing. Over-limit requests return gRPC RESOURCE_EXHAUSTED (status 8) without reaching the inner service. The limiter is built from config once and stored in shared server state.
Docs: documented the limit in architecture/gateway.md and docs/reference/gateway-config.mdx.

Testing

mise run pre-commit passes
Unit tests added/updated
E2E tests added/updated (if applicable)

Additional targeted checks run during development:

mise exec -- cargo test -p openshell-core grpc_rate_limit --lib
mise exec -- cargo test -p openshell-server grpc_rate_limit --lib
mise exec -- cargo clippy -p openshell-server --all-targets -- -D warnings

Scope and Follow-ups

This PR implements a gateway-wide gRPC limit only. Per-user and per-sandbox buckets are intentionally out of scope; follow-up work can add identity-aware limits if a single global bucket proves insufficient.
Default sandbox CPU/memory limits are no longer part of this PR (dropped per maintainer feedback) and are deferred to separate follow-up work.
The Helm chart does not yet expose grpc_rate_limit_requests / grpc_rate_limit_window_seconds in deploy/helm/openshell/templates/gateway-config.yaml. This is consistent with other gateway-only fields (e.g. ssh_session_ttl_secs) but should be added if Helm-based configuration of the limit is required.

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Architecture docs updated (if applicable)

copy-pr-bot · 2026-05-26T16:09:28Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

github-actions · 2026-05-27T15:10:43Z

Label test:e2e applied for 8f89ad2. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

johntmyers · 2026-06-04T00:18:55Z

@alangou confirming this should only scope down to grpc limits

johntmyers · 2026-06-04T00:24:00Z

gator-agent

PR Review Status

Re-check after author update: I re-evaluated latest head 57ce3b3b28156348b49fcc3fd01fb4b42c37d5eb after @alangou's June 5 comment that the PR was updated according to reviews. The PR remains project-valid gateway work and is now correctly scoped to gateway-wide gRPC request rate limiting.

Review findings still blocking:

crates/openshell-server/src/multiplex.rs: the rate-limit wrapper still delegates poll_ready directly to the inner gRPC service. When the limiter is exhausted and the inner service is under backpressure, over-limit requests can still wait on inner readiness before returning RESOURCE_EXHAUSTED. Please add a non-consuming capacity/window rollover check for poll_ready and cover the exhausted-limiter plus pending-inner-service case.
deploy/helm/openshell/templates/gateway-config.yaml: Helm still silently omits the rate limit when exactly one of server.grpcRateLimit.requests or server.grpcRateLimit.windowSeconds is positive, and the Helm test currently asserts that omission behavior. This still diverges from CLI/config-file validation and can make operators believe protection is enabled when it is not. Please fail the render for exactly-one-positive partial configuration and update the test to assert that failure.

Docs: Fern gateway config docs are updated under docs/reference/gateway-config.mdx; no navigation change appears required. Helm README/docs are also updated.

Tests: test:e2e remains the right required test label for this gateway/runtime behavior. Current required checks are green for this head, including Branch Checks, Helm Lint, and E2E. No test:e2e-gpu or test:e2e-kubernetes need is apparent from this diff alone.

Next state: gator:in-review

alangou · 2026-06-05T12:58:20Z

@johntmyers the PR has been updated according to the reviews

johntmyers · 2026-06-06T00:35:13Z

gator-agent

Re-check After Author Update

I re-evaluated latest head 57ce3b3b28156348b49fcc3fd01fb4b42c37d5eb after @alangou's June 5 comment that the PR was updated according to reviews. That author response is acknowledged here as new timeline activity.

The PR remains project-valid gateway work, but the prior blocking findings still appear unresolved:

crates/openshell-server/src/multiplex.rs: poll_ready still delegates directly to the inner gRPC service, so an exhausted limiter can still wait on inner readiness under backpressure before returning RESOURCE_EXHAUSTED. Please add a non-consuming capacity/window rollover check for poll_ready and cover the exhausted-limiter plus pending-inner-service case.
deploy/helm/openshell/templates/gateway-config.yaml: Helm still silently omits the rate limit when exactly one of server.grpcRateLimit.requests or server.grpcRateLimit.windowSeconds is positive, and the Helm test currently asserts that omission behavior. Please fail the render for exactly-one-positive partial configuration and update the test to assert that failure.

Checks are green for this head, including Branch Checks, Helm Lint, and E2E. Next state remains gator:in-review pending author changes.

johntmyers · 2026-06-08T16:58:31Z

gator-agent

Re-check After Author Update

I re-evaluated latest head 41deebc1fc83517cba2dbc1749a5c68a0cee8cc6 after @alangou's June 8 force-push updating the PR after the prior gator feedback.

Disposition: partially resolved.

Resolved items:

crates/openshell-server/src/multiplex.rs: the limiter now has a non-consuming has_capacity() path and poll_ready returns ready when the limiter is already exhausted, so over-limit requests no longer wait on inner-service backpressure before returning RESOURCE_EXHAUSTED.
deploy/helm/openshell/templates/gateway-config.yaml: Helm now fails rendering when exactly one of server.grpcRateLimit.requests or server.grpcRateLimit.windowSeconds is positive, and the Helm tests assert both partial-configuration failures.

Remaining blocking item:

crates/openshell-server/src/multiplex.rs: the updated poll_ready/call path still has a Tower readiness edge case. If poll_ready short-circuits because the limiter is exhausted, but the rate-limit window rolls over before call, call can re-check allow() and forward to self.inner.call(req) without the inner service having reported readiness. Please record that poll_ready short-circuited an exhausted limiter for that service clone and force the following call to return RESOURCE_EXHAUSTED, then add a regression test that exhausts the limiter, observes ready against a pending inner service, advances/rewinds the window before call, and verifies the request is still rejected without calling the inner service.

Non-blocking note:

The validation error in crates/openshell-server/src/cli.rs currently names only CLI flags; it would be clearer for TOML users if it also mentioned grpc_rate_limit_requests and grpc_rate_limit_window_seconds.

Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary.

Checks: Branch Checks, Helm Lint, DCO, and E2E are green for this head, and test:e2e remains the right test label for this gateway/runtime behavior.

Next state: gator:in-review

johntmyers · 2026-06-08T17:17:02Z

gator-agent

PR Review Status

I ran the required independent review for latest head 41deebc1fc83517cba2dbc1749a5c68a0cee8cc6. No new human response has arrived after the June 8 gator disposition, so this is an updated review-status comment rather than a response disposition.

Validation: this remains project-valid gateway work scoped to gateway-wide gRPC request rate limiting.

Review findings still blocking:

crates/openshell-server/src/multiplex.rs: the Tower readiness issue remains. poll_ready can return Ready because the limiter is exhausted, but call re-checks allow(). If the rate-limit window rolls over between poll_ready and call, the request can be forwarded to self.inner.call(req) without the inner service having reported readiness. Please record synthetic readiness for the service clone when poll_ready short-circuits an exhausted limiter, force the matching call to return RESOURCE_EXHAUSTED, and add the regression test described in the prior gator comment.

Non-blocking review notes:

deploy/helm/openshell/templates/gateway-config.yaml: Helm currently treats negative server.grpcRateLimit.requests / windowSeconds values as disabled because the template only checks positive values. It would be better to fail render for negative values so Helm behavior stays aligned with the Rust u64 CLI/TOML configuration boundary.
crates/openshell-server/src/cli.rs: the validation error would be clearer for TOML users if it also mentioned grpc_rate_limit_requests and grpc_rate_limit_window_seconds.

Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary.

Checks: Branch Checks, Helm Lint, DCO, and E2E are green for this head. test:e2e remains the right test label for this gateway/runtime behavior.

Next state: gator:in-review

Signed-off-by: Adrien Langou <alangou@nvidia.com>

johntmyers · 2026-06-09T15:04:32Z

gator-agent

Re-check After Author Update

I re-evaluated latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4 after @alangou's June 9 force-push updating the PR after the prior gator feedback.

Disposition: review feedback resolved.

Remaining items:

No blocking code-review items remain. The independent review found no Critical or Warning issues on this head, and the prior Tower readiness finding in crates/openshell-server/src/multiplex.rs is resolved by the synthetic-readiness guard plus rollover regression coverage.
Required CI is not green: OpenShell / Helm Lint failed in the Helm unit-test dependency fetch with Docker Hub toomanyrequests while pulling bitnamicharts/postgresql:18.6.7. This looks infrastructure-related rather than PR-caused. I attempted to rerun the failed workflow, but GitHub reported that run 27211967022 cannot be rerun.

Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary.

Checks: Branch Checks, DCO, and E2E are green for this head. Helm Lint is blocked by the registry rate-limit failure above.

Next state: gator:blocked pending a fresh/allowed Helm Lint run or runner-side registry/cache fix.

johntmyers · 2026-06-09T17:57:30Z

gator-agent

Blocker Re-check

I re-checked the gator:blocked state for latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4.

Disposition: blocker resolved for now.

The prior Helm Lint failure was caused by Docker Hub returning 429 toomanyrequests while fetching bitnamicharts/postgresql:18.6.7, not by a Helm unit-test assertion. I reran the failed Helm Lint job, and GitHub accepted the rerun. The new Helm Lint attempt is queued as job 80391271575 on run 27211967022.

Remaining items:

Wait for Helm Lint attempt 2 and the required OpenShell / Helm Lint gate to complete.

Next state: gator:watch-pipeline

johntmyers · 2026-06-09T18:17:04Z

gator-agent

Blocked

I re-checked the Helm Lint rerun for latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4.

Disposition: blocked by CI infrastructure.

The second Helm Lint attempt failed before chart linting or Helm unit tests ran. The failed log shows helm dependency build could not fetch oci://registry-1.docker.io/bitnamicharts/postgresql:18.6.7 because Docker Hub returned 429 toomanyrequests for the unauthenticated pull rate limit. Branch Checks, DCO, and E2E remain green; no blocking code-review findings remain.

Next action: a maintainer or CI owner needs to rerun Helm Lint after the Docker Hub rate limit clears, or update the runner/chart dependency path to use an authenticated or mirrored chart source.

Next state: gator:blocked

TaylorMutch · 2026-06-09T20:17:20Z

Reran helm and it passed, I am working on a fix over in #1844 to resolve the helm lint issues

johntmyers · 2026-06-09T20:42:08Z

gator-agent

Re-check After Maintainer Update

I re-evaluated latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4 after @TaylorMutch's June 9 comment that Helm was rerun and passed, followed by maintainer approval on this head.

Disposition: blocker resolved.

Remaining items:

No blocking code-review items remain.
Required gates are green: OpenShell / Branch Checks, OpenShell / Helm Lint, OpenShell / E2E, and OpenShell / GPU E2E.
test:e2e is applied and the core E2E gate is green; test:e2e-gpu is not applied and the GPU gate reports success for that state.

Next state: gator:approval-needed

Human maintainer approval or merge decision is now required.

elezar · 2026-06-11T11:51:06Z

+    /// When paired with [`Self::grpc_rate_limit_window_secs`], positive values
+    /// enable gateway-wide gRPC request rate limiting. `None` or `0` disables
+    /// the limit.
+    pub grpc_rate_limit_requests: Option<u64>,


One question: Why not a nested type?

elezar · 2026-06-11T11:52:54Z

+grpc_rate_limit_requests = 120
+grpc_rate_limit_window_seconds = 60


What about:

Suggested change

grpc_rate_limit_requests = 120

grpc_rate_limit_window_seconds = 60

[openshell.gateway.grpc_rate_limit]

requests = 120

window_seconds = 60

(or are single config options simpler to handle in the agentic sense?)

elezar · 2026-06-11T11:54:05Z

+| server.grpcRateLimit.requests | int | `0` | Maximum gRPC requests allowed per window. Must be positive (alongside windowSeconds) to enable rate limiting; 0 (default) disables it. |
+| server.grpcRateLimit.windowSeconds | int | `0` | gRPC rate-limit window length in seconds. Must be positive (alongside requests) to enable rate limiting; 0 (default) disables it. |


Note that the server we use already has the grpcRateLimit as a single addressable concept.

alangou marked this pull request as ready for review May 26, 2026 17:01

alangou requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 26, 2026 17:01

johntmyers added topic:security Security issues test:e2e Requires end-to-end coverage labels May 27, 2026

mrunalp reviewed May 29, 2026

View reviewed changes

Comment thread crates/openshell-core/src/config.rs Outdated

johntmyers added the gator:blocked Gator is blocked by process or repository gates label Jun 4, 2026

alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from 8f89ad2 to ee1a5b9 Compare June 5, 2026 09:59

alangou changed the title ~~feat(gateway): add resource defaults and grpc rate limiting~~ feat(gateway): add gateway-wide grpc request rate limiting Jun 5, 2026

alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from ee1a5b9 to 57ce3b3 Compare June 5, 2026 12:45

alangou requested a review from mrunalp June 5, 2026 12:56

johntmyers added gator:in-review Gator is reviewing or awaiting PR review feedback and removed gator:blocked Gator is blocked by process or repository gates labels Jun 5, 2026

alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from 57ce3b3 to 41deebc Compare June 8, 2026 14:34

feat(server): add grpc rate limiting gateway-wide

6a2df66

Signed-off-by: Adrien Langou <alangou@nvidia.com>

alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from 41deebc to 6a2df66 Compare June 9, 2026 14:08

johntmyers added gator:blocked Gator is blocked by process or repository gates and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 9, 2026

johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status gator:blocked Gator is blocked by process or repository gates and removed gator:blocked Gator is blocked by process or repository gates gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 9, 2026

TaylorMutch approved these changes Jun 9, 2026

View reviewed changes

johntmyers added gator:approval-needed Gator completed review; maintainer approval needed and removed gator:blocked Gator is blocked by process or repository gates labels Jun 9, 2026

alangou requested a review from TaylorMutch June 10, 2026 14:32

alangou merged commit b6c87a7 into NVIDIA:main Jun 11, 2026
53 of 59 checks passed

elezar reviewed Jun 11, 2026

View reviewed changes

alangou mentioned this pull request Jun 11, 2026

fix(server): use public tonic body type in gRPC rate limiter #1872

Merged

6 tasks

		grpc_rate_limit_requests = 120
		grpc_rate_limit_window_seconds = 60

		\| server.grpcRateLimit.requests \| int \| `0` \| Maximum gRPC requests allowed per window. Must be positive (alongside windowSeconds) to enable rate limiting; 0 (default) disables it. \|
		\| server.grpcRateLimit.windowSeconds \| int \| `0` \| gRPC rate-limit window length in seconds. Must be positive (alongside requests) to enable rate limiting; 0 (default) disables it. \|

Conversation

alangou commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Testing

Scope and Follow-ups

Checklist

Uh oh!

copy-pr-bot Bot commented May 26, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

johntmyers commented Jun 4, 2026

Uh oh!

johntmyers commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Status

Uh oh!

alangou commented Jun 5, 2026

Uh oh!

johntmyers commented Jun 6, 2026

Re-check After Author Update

Uh oh!

johntmyers commented Jun 8, 2026

Re-check After Author Update

Uh oh!

johntmyers commented Jun 8, 2026

PR Review Status

Uh oh!

johntmyers commented Jun 9, 2026

Re-check After Author Update

Uh oh!

johntmyers commented Jun 9, 2026

Blocker Re-check

Uh oh!

johntmyers commented Jun 9, 2026

Blocked

Uh oh!

TaylorMutch commented Jun 9, 2026

Uh oh!

johntmyers commented Jun 9, 2026

Re-check After Maintainer Update

Uh oh!

Uh oh!

elezar Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

elezar Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

elezar Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

alangou commented May 26, 2026 •

edited

Loading

johntmyers commented Jun 4, 2026 •

edited

Loading