Skip to content

feat(gateway): add gateway-wide grpc request rate limiting#1566

Merged
alangou merged 1 commit into
NVIDIA:mainfrom
alangou:alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting
Jun 11, 2026
Merged

feat(gateway): add gateway-wide grpc request rate limiting#1566
alangou merged 1 commit into
NVIDIA:mainfrom
alangou:alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting

Conversation

@alangou

@alangou alangou commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds an optional gateway-wide gRPC request rate limit. Operators can cap the number of gRPC API requests allowed per time window. The limit is applied only to gRPC API traffic after protocol multiplexing; health, metrics, and local sandbox-service HTTP routes remain outside this control. Both the request count and the window must be positive to enable the limit.

Related Issue

Part of OS-76. The default sandbox CPU/memory limit work originally scoped under OS-76 has been dropped from this PR per maintainer feedback (see review below) and will be revisited separately, so this PR intentionally narrows to the gRPC rate-limiting portion.

Changes

  • openshell-core config: added grpc_rate_limit_requests and grpc_rate_limit_window_secs fields, a with_grpc_rate_limit(...) builder, and a grpc_rate_limit() accessor that only returns a limit when both values are positive (None/0 disables it).
  • openshell-server CLI: added --grpc-rate-limit-requests / --grpc-rate-limit-window-seconds flags (with OPENSHELL_GRPC_RATE_LIMIT_* env vars) plus validation that rejects enabling only one of the pair; setting either value to 0 disables the limit.
  • openshell-server config file: added matching grpc_rate_limit_requests / grpc_rate_limit_window_seconds TOML fields, with CLI flags overriding file values.
  • openshell-server multiplex: added a fixed-window GrpcRateLimiter (shared via Arc<Mutex>) and a GrpcRateLimitService tower layer applied only to the gRPC service after multiplexing. Over-limit requests return gRPC RESOURCE_EXHAUSTED (status 8) without reaching the inner service. The limiter is built from config once and stored in shared server state.
  • Docs: documented the limit in architecture/gateway.md and docs/reference/gateway-config.mdx.

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Additional targeted checks run during development:

  • mise exec -- cargo test -p openshell-core grpc_rate_limit --lib
  • mise exec -- cargo test -p openshell-server grpc_rate_limit --lib
  • mise exec -- cargo clippy -p openshell-server --all-targets -- -D warnings

Scope and Follow-ups

  • This PR implements a gateway-wide gRPC limit only. Per-user and per-sandbox buckets are intentionally out of scope; follow-up work can add identity-aware limits if a single global bucket proves insufficient.
  • Default sandbox CPU/memory limits are no longer part of this PR (dropped per maintainer feedback) and are deferred to separate follow-up work.
  • The Helm chart does not yet expose grpc_rate_limit_requests / grpc_rate_limit_window_seconds in deploy/helm/openshell/templates/gateway-config.yaml. This is consistent with other gateway-only fields (e.g. ssh_session_ttl_secs) but should be added if Helm-based configuration of the limit is required.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@copy-pr-bot

copy-pr-bot Bot commented May 26, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@alangou alangou marked this pull request as ready for review May 26, 2026 17:01
@johntmyers johntmyers added topic:security Security issues test:e2e Requires end-to-end coverage labels May 27, 2026
@github-actions

Copy link
Copy Markdown

Label test:e2e applied for 8f89ad2. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

Comment thread crates/openshell-core/src/config.rs Outdated
@johntmyers

Copy link
Copy Markdown
Collaborator

@alangou confirming this should only scope down to grpc limits

@johntmyers

johntmyers commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

Re-check after author update: I re-evaluated latest head 57ce3b3b28156348b49fcc3fd01fb4b42c37d5eb after @alangou's June 5 comment that the PR was updated according to reviews. The PR remains project-valid gateway work and is now correctly scoped to gateway-wide gRPC request rate limiting.

Review findings still blocking:

  • crates/openshell-server/src/multiplex.rs: the rate-limit wrapper still delegates poll_ready directly to the inner gRPC service. When the limiter is exhausted and the inner service is under backpressure, over-limit requests can still wait on inner readiness before returning RESOURCE_EXHAUSTED. Please add a non-consuming capacity/window rollover check for poll_ready and cover the exhausted-limiter plus pending-inner-service case.
  • deploy/helm/openshell/templates/gateway-config.yaml: Helm still silently omits the rate limit when exactly one of server.grpcRateLimit.requests or server.grpcRateLimit.windowSeconds is positive, and the Helm test currently asserts that omission behavior. This still diverges from CLI/config-file validation and can make operators believe protection is enabled when it is not. Please fail the render for exactly-one-positive partial configuration and update the test to assert that failure.

Docs: Fern gateway config docs are updated under docs/reference/gateway-config.mdx; no navigation change appears required. Helm README/docs are also updated.

Tests: test:e2e remains the right required test label for this gateway/runtime behavior. Current required checks are green for this head, including Branch Checks, Helm Lint, and E2E. No test:e2e-gpu or test:e2e-kubernetes need is apparent from this diff alone.

Next state: gator:in-review

@johntmyers johntmyers added the gator:blocked Gator is blocked by process or repository gates label Jun 4, 2026
@alangou alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from 8f89ad2 to ee1a5b9 Compare June 5, 2026 09:59
@alangou alangou changed the title feat(gateway): add resource defaults and grpc rate limiting feat(gateway): add gateway-wide grpc request rate limiting Jun 5, 2026
@alangou alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from ee1a5b9 to 57ce3b3 Compare June 5, 2026 12:45
@alangou alangou requested a review from mrunalp June 5, 2026 12:56
@alangou

alangou commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

@johntmyers the PR has been updated according to the reviews

@johntmyers johntmyers added gator:in-review Gator is reviewing or awaiting PR review feedback and removed gator:blocked Gator is blocked by process or repository gates labels Jun 5, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 57ce3b3b28156348b49fcc3fd01fb4b42c37d5eb after @alangou's June 5 comment that the PR was updated according to reviews. That author response is acknowledged here as new timeline activity.

The PR remains project-valid gateway work, but the prior blocking findings still appear unresolved:

  • crates/openshell-server/src/multiplex.rs: poll_ready still delegates directly to the inner gRPC service, so an exhausted limiter can still wait on inner readiness under backpressure before returning RESOURCE_EXHAUSTED. Please add a non-consuming capacity/window rollover check for poll_ready and cover the exhausted-limiter plus pending-inner-service case.
  • deploy/helm/openshell/templates/gateway-config.yaml: Helm still silently omits the rate limit when exactly one of server.grpcRateLimit.requests or server.grpcRateLimit.windowSeconds is positive, and the Helm test currently asserts that omission behavior. Please fail the render for exactly-one-positive partial configuration and update the test to assert that failure.

Checks are green for this head, including Branch Checks, Helm Lint, and E2E. Next state remains gator:in-review pending author changes.

@alangou alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from 57ce3b3 to 41deebc Compare June 8, 2026 14:34
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 41deebc1fc83517cba2dbc1749a5c68a0cee8cc6 after @alangou's June 8 force-push updating the PR after the prior gator feedback.

Disposition: partially resolved.

Resolved items:

  • crates/openshell-server/src/multiplex.rs: the limiter now has a non-consuming has_capacity() path and poll_ready returns ready when the limiter is already exhausted, so over-limit requests no longer wait on inner-service backpressure before returning RESOURCE_EXHAUSTED.
  • deploy/helm/openshell/templates/gateway-config.yaml: Helm now fails rendering when exactly one of server.grpcRateLimit.requests or server.grpcRateLimit.windowSeconds is positive, and the Helm tests assert both partial-configuration failures.

Remaining blocking item:

  • crates/openshell-server/src/multiplex.rs: the updated poll_ready/call path still has a Tower readiness edge case. If poll_ready short-circuits because the limiter is exhausted, but the rate-limit window rolls over before call, call can re-check allow() and forward to self.inner.call(req) without the inner service having reported readiness. Please record that poll_ready short-circuited an exhausted limiter for that service clone and force the following call to return RESOURCE_EXHAUSTED, then add a regression test that exhausts the limiter, observes ready against a pending inner service, advances/rewinds the window before call, and verifies the request is still rejected without calling the inner service.

Non-blocking note:

  • The validation error in crates/openshell-server/src/cli.rs currently names only CLI flags; it would be clearer for TOML users if it also mentioned grpc_rate_limit_requests and grpc_rate_limit_window_seconds.

Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary.

Checks: Branch Checks, Helm Lint, DCO, and E2E are green for this head, and test:e2e remains the right test label for this gateway/runtime behavior.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

I ran the required independent review for latest head 41deebc1fc83517cba2dbc1749a5c68a0cee8cc6. No new human response has arrived after the June 8 gator disposition, so this is an updated review-status comment rather than a response disposition.

Validation: this remains project-valid gateway work scoped to gateway-wide gRPC request rate limiting.

Review findings still blocking:

  • crates/openshell-server/src/multiplex.rs: the Tower readiness issue remains. poll_ready can return Ready because the limiter is exhausted, but call re-checks allow(). If the rate-limit window rolls over between poll_ready and call, the request can be forwarded to self.inner.call(req) without the inner service having reported readiness. Please record synthetic readiness for the service clone when poll_ready short-circuits an exhausted limiter, force the matching call to return RESOURCE_EXHAUSTED, and add the regression test described in the prior gator comment.

Non-blocking review notes:

  • deploy/helm/openshell/templates/gateway-config.yaml: Helm currently treats negative server.grpcRateLimit.requests / windowSeconds values as disabled because the template only checks positive values. It would be better to fail render for negative values so Helm behavior stays aligned with the Rust u64 CLI/TOML configuration boundary.
  • crates/openshell-server/src/cli.rs: the validation error would be clearer for TOML users if it also mentioned grpc_rate_limit_requests and grpc_rate_limit_window_seconds.

Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary.

Checks: Branch Checks, Helm Lint, DCO, and E2E are green for this head. test:e2e remains the right test label for this gateway/runtime behavior.

Next state: gator:in-review

Signed-off-by: Adrien Langou <alangou@nvidia.com>
@alangou alangou force-pushed the alangou/os-76-fsr-025-add-default-cgroup-resource-limits-and-rate-limiting branch from 41deebc to 6a2df66 Compare June 9, 2026 14:08
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4 after @alangou's June 9 force-push updating the PR after the prior gator feedback.

Disposition: review feedback resolved.

Remaining items:

  • No blocking code-review items remain. The independent review found no Critical or Warning issues on this head, and the prior Tower readiness finding in crates/openshell-server/src/multiplex.rs is resolved by the synthetic-readiness guard plus rollover regression coverage.
  • Required CI is not green: OpenShell / Helm Lint failed in the Helm unit-test dependency fetch with Docker Hub toomanyrequests while pulling bitnamicharts/postgresql:18.6.7. This looks infrastructure-related rather than PR-caused. I attempted to rerun the failed workflow, but GitHub reported that run 27211967022 cannot be rerun.

Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary.

Checks: Branch Checks, DCO, and E2E are green for this head. Helm Lint is blocked by the registry rate-limit failure above.

Next state: gator:blocked pending a fresh/allowed Helm Lint run or runner-side registry/cache fix.

@johntmyers johntmyers added gator:blocked Gator is blocked by process or repository gates and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 9, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Blocker Re-check

I re-checked the gator:blocked state for latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4.

Disposition: blocker resolved for now.

The prior Helm Lint failure was caused by Docker Hub returning 429 toomanyrequests while fetching bitnamicharts/postgresql:18.6.7, not by a Helm unit-test assertion. I reran the failed Helm Lint job, and GitHub accepted the rerun. The new Helm Lint attempt is queued as job 80391271575 on run 27211967022.

Remaining items:

  • Wait for Helm Lint attempt 2 and the required OpenShell / Helm Lint gate to complete.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status gator:blocked Gator is blocked by process or repository gates and removed gator:blocked Gator is blocked by process or repository gates gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 9, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Blocked

I re-checked the Helm Lint rerun for latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4.

Disposition: blocked by CI infrastructure.

The second Helm Lint attempt failed before chart linting or Helm unit tests ran. The failed log shows helm dependency build could not fetch oci://registry-1.docker.io/bitnamicharts/postgresql:18.6.7 because Docker Hub returned 429 toomanyrequests for the unauthenticated pull rate limit. Branch Checks, DCO, and E2E remain green; no blocking code-review findings remain.

Next action: a maintainer or CI owner needs to rerun Helm Lint after the Docker Hub rate limit clears, or update the runner/chart dependency path to use an authenticated or mirrored chart source.

Next state: gator:blocked

@TaylorMutch

Copy link
Copy Markdown
Collaborator

Reran helm and it passed, I am working on a fix over in #1844 to resolve the helm lint issues

@johntmyers johntmyers added gator:approval-needed Gator completed review; maintainer approval needed and removed gator:blocked Gator is blocked by process or repository gates labels Jun 9, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Maintainer Update

I re-evaluated latest head 6a2df66680e9ba4268c6d97190bebc8429cdd6e4 after @TaylorMutch's June 9 comment that Helm was rerun and passed, followed by maintainer approval on this head.

Disposition: blocker resolved.

Remaining items:

  • No blocking code-review items remain.
  • Required gates are green: OpenShell / Branch Checks, OpenShell / Helm Lint, OpenShell / E2E, and OpenShell / GPU E2E.
  • test:e2e is applied and the core E2E gate is green; test:e2e-gpu is not applied and the GPU gate reports success for that state.

Next state: gator:approval-needed

Human maintainer approval or merge decision is now required.

@alangou alangou requested a review from TaylorMutch June 10, 2026 14:32
@alangou alangou merged commit b6c87a7 into NVIDIA:main Jun 11, 2026
53 of 59 checks passed
/// When paired with [`Self::grpc_rate_limit_window_secs`], positive values
/// enable gateway-wide gRPC request rate limiting. `None` or `0` disables
/// the limit.
pub grpc_rate_limit_requests: Option<u64>,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question: Why not a nested type?

Comment on lines +356 to +357
grpc_rate_limit_requests = 120
grpc_rate_limit_window_seconds = 60

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

Suggested change
grpc_rate_limit_requests = 120
grpc_rate_limit_window_seconds = 60
[openshell.gateway.grpc_rate_limit]
requests = 120
window_seconds = 60

(or are single config options simpler to handle in the agentic sense?)

Comment on lines +197 to +198
| server.grpcRateLimit.requests | int | `0` | Maximum gRPC requests allowed per window. Must be positive (alongside windowSeconds) to enable rate limiting; 0 (default) disables it. |
| server.grpcRateLimit.windowSeconds | int | `0` | gRPC rate-limit window length in seconds. Must be positive (alongside requests) to enable rate limiting; 0 (default) disables it. |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the server we use already has the grpcRateLimit as a single addressable concept.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:approval-needed Gator completed review; maintainer approval needed test:e2e Requires end-to-end coverage topic:security Security issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants