feat(gateway): add gateway-wide grpc request rate limiting#1566
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Label |
|
@alangou confirming this should only scope down to grpc limits |
PR Review StatusRe-check after author update: I re-evaluated latest head Review findings still blocking:
Docs: Fern gateway config docs are updated under Tests: Next state: |
8f89ad2 to
ee1a5b9
Compare
ee1a5b9 to
57ce3b3
Compare
|
@johntmyers the PR has been updated according to the reviews |
Re-check After Author UpdateI re-evaluated latest head The PR remains project-valid gateway work, but the prior blocking findings still appear unresolved:
Checks are green for this head, including Branch Checks, Helm Lint, and E2E. Next state remains |
57ce3b3 to
41deebc
Compare
Re-check After Author UpdateI re-evaluated latest head Disposition: partially resolved. Resolved items:
Remaining blocking item:
Non-blocking note:
Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary. Checks: Branch Checks, Helm Lint, DCO, and E2E are green for this head, and Next state: |
PR Review StatusI ran the required independent review for latest head Validation: this remains project-valid gateway work scoped to gateway-wide gRPC request rate limiting. Review findings still blocking:
Non-blocking review notes:
Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary. Checks: Branch Checks, Helm Lint, DCO, and E2E are green for this head. Next state: |
Signed-off-by: Adrien Langou <alangou@nvidia.com>
41deebc to
6a2df66
Compare
Re-check After Author UpdateI re-evaluated latest head Disposition: review feedback resolved. Remaining items:
Docs: Fern gateway config docs and Helm README are updated; no docs navigation change appears necessary. Checks: Branch Checks, DCO, and E2E are green for this head. Helm Lint is blocked by the registry rate-limit failure above. Next state: |
Blocker Re-checkI re-checked the Disposition: blocker resolved for now. The prior Helm Lint failure was caused by Docker Hub returning Remaining items:
Next state: |
BlockedI re-checked the Helm Lint rerun for latest head Disposition: blocked by CI infrastructure. The second Helm Lint attempt failed before chart linting or Helm unit tests ran. The failed log shows Next action: a maintainer or CI owner needs to rerun Helm Lint after the Docker Hub rate limit clears, or update the runner/chart dependency path to use an authenticated or mirrored chart source. Next state: |
|
Reran helm and it passed, I am working on a fix over in #1844 to resolve the helm lint issues |
Re-check After Maintainer UpdateI re-evaluated latest head Disposition: blocker resolved. Remaining items:
Next state: Human maintainer approval or merge decision is now required. |
| /// When paired with [`Self::grpc_rate_limit_window_secs`], positive values | ||
| /// enable gateway-wide gRPC request rate limiting. `None` or `0` disables | ||
| /// the limit. | ||
| pub grpc_rate_limit_requests: Option<u64>, |
There was a problem hiding this comment.
One question: Why not a nested type?
| grpc_rate_limit_requests = 120 | ||
| grpc_rate_limit_window_seconds = 60 |
There was a problem hiding this comment.
What about:
| grpc_rate_limit_requests = 120 | |
| grpc_rate_limit_window_seconds = 60 | |
| [openshell.gateway.grpc_rate_limit] | |
| requests = 120 | |
| window_seconds = 60 |
(or are single config options simpler to handle in the agentic sense?)
| | server.grpcRateLimit.requests | int | `0` | Maximum gRPC requests allowed per window. Must be positive (alongside windowSeconds) to enable rate limiting; 0 (default) disables it. | | ||
| | server.grpcRateLimit.windowSeconds | int | `0` | gRPC rate-limit window length in seconds. Must be positive (alongside requests) to enable rate limiting; 0 (default) disables it. | |
There was a problem hiding this comment.
Note that the server we use already has the grpcRateLimit as a single addressable concept.
Summary
Adds an optional gateway-wide gRPC request rate limit. Operators can cap the number of gRPC API requests allowed per time window. The limit is applied only to gRPC API traffic after protocol multiplexing; health, metrics, and local sandbox-service HTTP routes remain outside this control. Both the request count and the window must be positive to enable the limit.
Related Issue
Part of OS-76. The default sandbox CPU/memory limit work originally scoped under OS-76 has been dropped from this PR per maintainer feedback (see review below) and will be revisited separately, so this PR intentionally narrows to the gRPC rate-limiting portion.
Changes
openshell-coreconfig: addedgrpc_rate_limit_requestsandgrpc_rate_limit_window_secsfields, awith_grpc_rate_limit(...)builder, and agrpc_rate_limit()accessor that only returns a limit when both values are positive (None/0disables it).openshell-serverCLI: added--grpc-rate-limit-requests/--grpc-rate-limit-window-secondsflags (withOPENSHELL_GRPC_RATE_LIMIT_*env vars) plus validation that rejects enabling only one of the pair; setting either value to0disables the limit.openshell-serverconfig file: added matchinggrpc_rate_limit_requests/grpc_rate_limit_window_secondsTOML fields, with CLI flags overriding file values.openshell-servermultiplex: added a fixed-windowGrpcRateLimiter(shared viaArc<Mutex>) and aGrpcRateLimitServicetower layer applied only to the gRPC service after multiplexing. Over-limit requests return gRPCRESOURCE_EXHAUSTED(status 8) without reaching the inner service. The limiter is built from config once and stored in shared server state.architecture/gateway.mdanddocs/reference/gateway-config.mdx.Testing
mise run pre-commitpassesAdditional targeted checks run during development:
mise exec -- cargo test -p openshell-core grpc_rate_limit --libmise exec -- cargo test -p openshell-server grpc_rate_limit --libmise exec -- cargo clippy -p openshell-server --all-targets -- -D warningsScope and Follow-ups
grpc_rate_limit_requests/grpc_rate_limit_window_secondsindeploy/helm/openshell/templates/gateway-config.yaml. This is consistent with other gateway-only fields (e.g.ssh_session_ttl_secs) but should be added if Helm-based configuration of the limit is required.Checklist