feat: end-to-end Keycloak authentication, RBAC, and Python SDK integration#367
feat: end-to-end Keycloak authentication, RBAC, and Python SDK integration#367mahil-2040 wants to merge 14 commits into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request introduces a Keycloak deployment, service, and realm configuration to the Helm chart for external user authentication. The reviewer identified several critical security and configuration issues: wildcard redirect URIs and web origins on the confidential SDK client should be configurable; the admin password should be sourced from a Kubernetes Secret instead of plaintext; production mode lacks support for external databases and proxy/HTTPS settings; SSL requirements should be dynamically enabled; and global image pull secrets need to be propagated to the Keycloak pod template.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an optional Keycloak deployment to the Helm base chart to support external user authentication, including default values, a Deployment/Service, and a realm-import ConfigMap.
Changes:
- Introduces
keycloakconfiguration block in chart values. - Adds Keycloak Deployment + Service template gated by
keycloak.enabled. - Adds a realm JSON import ConfigMap template for bootstrapping clients/roles.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| manifests/charts/base/values.yaml | Adds configurable values for enabling and running a Keycloak instance |
| manifests/charts/base/templates/keycloak/keycloak.yaml | Adds Deployment/Service templates for Keycloak |
| manifests/charts/base/templates/keycloak/keycloak-realm.yaml | Adds ConfigMap for importing an initial realm configuration |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #367 +/- ##
===========================================
+ Coverage 47.57% 58.05% +10.48%
===========================================
Files 30 37 +7
Lines 2819 3474 +655
===========================================
+ Hits 1341 2017 +676
+ Misses 1338 1247 -91
- Partials 140 210 +70
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
hzxuzhonghu
left a comment
There was a problem hiding this comment.
before adding a bunch of yamls, can you first add a docs exmplain about the proposal and the workflow
@hzxuzhonghu @acsoto |
|
Either is ok |
- Declarative Realm Import: Created a ConfigMap with the realm JSON defining clients (agentcube-sdk, agentcube-router, workloadmanager, agentcube-admin) and role hierarchy (sandbox:invoke -> sandbox:manage -> admin). - Deployment & Service: Added Keycloak Deployment and Service manifests with management-port health probes and a 300s startupProbe window. - Helm Integration: Added keycloak configuration values to values.yaml, gated behind 'keycloak.enabled: false' to ensure zero impact by default. Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
- Parameterized SDK redirect URIs with secure defaults, added support for secretKeyRef for admin/database credentials, and made sslRequired enforcement dynamic based on devMode. - Added configuration blocks for external databases (PostgreSQL/MySQL) and reverse proxy settings (headers, hostname). - Propagated global imagePullSecrets, decoupled Service targetPort by using the named 'http' port, and dynamically derived the realm import filename from Helm values. Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
- Added fail-fast validation for database and proxy settings in production mode - Decoupled credentials into chart-managed Secrets with required-value checks - Added restricted securityContext for Keycloak pod and container - Migrated realm configuration to Keycloak 26 default roles and client secrets Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
5f5d385 to
9f8f5df
Compare
Okay i will continue in this one only for this time. |
- Require explicit Keycloak admin and client credentials - Enforced production Secret and external DB validation - Fixed Secret key references and quote realm import filename Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
|
@mahil-2040 I donot see your design |
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
hzxuzhonghu
left a comment
There was a problem hiding this comment.
I would also suggest separating keycloack as a separate addon, not place it in agentcube chart
…sign - Extracted Keycloak into a standalone addon chart (manifests/charts/addons/keycloak) to ensure the core AgentCube chart remains identity-provider agnostic. - Added HA replica support and wildcard URI security validation to the Keycloak addon. - Replaced keycloak.enabled in the base chart with generic OIDC configuration (router.oidc.issuerUrl, audience, and a required rolesClaim for dynamic role extraction). Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Yeah, This is a great suggestion, it makes sense to keep the core agentcube chart lightweight and identity-provider agnostic! I have made the following changes to implement this:
|
- Quoted OIDC flags in router deployment to prevent YAML parsing errors - Fixed inaccurate existingSecret comment in values.yaml - Upgraded Keycloak addon Chart.yaml to apiVersion v2 (Helm 3) - Added validation to block scaling replicas when using embedded H2 database (devMode) Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
|
@hzxuzhonghu @acsoto Does the design look good? Should I start implementing now? |
…values Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
- Implemented provider-agnostic OIDC authentication with JWKS caching and strict JWT algorithm validation. - Added RBAC middleware to enforce configurable role-based access control. - Enhanced SessionManager to securely forward verified external user identities to WorkloadManager. Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
…dentity header Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
…nforcement - Enforced resource-level access control (RLAC) in backend and reject router JWTs missing sub claim. - Added Python SDK pluggable auth providers and routed data plane traffic via Router proxy. - Wired Keycloak realm imports, client secrets, and service account roles dynamically. - Added OIDC E2E tests Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 41 out of 42 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (3)
sdk-python/agentcube/auth.py:1
- Two concrete failure modes here: (1) if
access_tokenis missing, this raisesKeyErroror later returns"", which can silently produceAuthorization: Bearerand hard-to-debug 401s; raise a clear exception when the token field is missing/empty. (2) ifexpires_inis less than the refresh buffer (30s),_expires_atbecomes “in the past” and everyget_token()call will refetch; clamp the computed expiry so it never goes backward (e.g., ensure at least a small positive TTL).
test/e2e/run_e2e.sh:1 - Token acquisition has no validation/retries: if the port-forward isn’t ready or Keycloak returns an error page,
jq -rcan producenull/empty, and the suite will proceed with invalid tokens (causing confusing downstream failures). Add a small retry loop (or explicit HTTP status checks) and fail fast if either token is empty/null.
sdk-python/tests/test_auth.py:1 timeis imported but not used in this test file. Removing it keeps the test module clean and avoids linter noise.
…d fixed python lint issue Signed-off-by: Mahil Patel <mahilpatel0808@gmail.com>
|
@acsoto @hzxuzhonghu Please review this!! |
| oidcIssuerURL = flag.String("oidc-issuer-url", "", "OIDC provider issuer URL, enables external auth when set") | ||
| oidcAudience = flag.String("oidc-audience", "agentcube-api", "Expected audience (aud) claim in the access token") | ||
| oidcRolesClaim = flag.String("oidc-roles-claim", "", "JSON path to roles array in the JWT (e.g., realm_access.roles)") | ||
| oidcRequiredRole = flag.String("oidc-required-role", "", "Role required to access the API (e.g., sandbox:invoke)") |
There was a problem hiding this comment.
Prefer update to
oidc-issuer-url -> jwt-issuer-url
oidc-roles-claim -> jwt-role-claim
oidc-required-role -> jwt-required-role, this may need to be deleted, once we support authz, we can put iy in the api
|
|
||
| ## Before you begin | ||
|
|
||
| 1. Follow the [Getting Started](../getting-started.md) guide to install |
There was a problem hiding this comment.
Following the getstarted, it will not install spire as you showed in L33-L36
|
|
||
| | Item | Details | | ||
| |---|---| | ||
| | **Roles** | `sandbox:invoke` (default for all users), `sandbox:manage` (inherits invoke), `admin` (inherits manage) | |
There was a problem hiding this comment.
elaborate a little bit more, what is the permission of each role
| existingSecretServiceKey: "client-service-secret" | ||
| existingSecretRouterKey: "client-router-secret" | ||
| existingSecretAdminKey: "client-admin-secret" | ||
| service: | ||
| secret: "" | ||
| router: | ||
| secret: "" | ||
| admin: | ||
| secret: "" |
There was a problem hiding this comment.
please explain what's these secret are, and how to setup them
| ```bash | ||
| curl -s -w "\nHTTP Status: %{http_code}\n" \ | ||
| -H "Authorization: Bearer $KEYCLOAK_TOKEN" \ | ||
| http://localhost:8081/v1/namespaces/default/agent-runtimes/sample-agent/invocations/ |
There was a problem hiding this comment.
I doubt this is not actually run at all, the invocation is not complete
|
|
||
| | Client ID | Type | Purpose | | ||
| |-----------|------|---------| | ||
| | `agentcube-service` | Confidential (`client_credentials`) | External backend applications and automated scripts using the Python SDK (Machine-to-Machine / M2M) | |
There was a problem hiding this comment.
confusing,
does it mean caller to agentcube-service?
Or agentcube-service is the client?
| | `agentcube-service` | Confidential (`client_credentials`) | External backend applications and automated scripts using the Python SDK (Machine-to-Machine / M2M) | | ||
| | `agentcube-sdk` | Public (`authorization_code` + PKCE) | Human end users interacting via the CLI or browser (Interactive) | | ||
| | `agentcube-router` | Confidential (`client_credentials`) | Internal Router service identity | | ||
| | `agentcube-admin` | Confidential (`client_credentials`) | Administrative operations | |
There was a problem hiding this comment.
all these client-id need to be explained
| existingSecret: "" # Name of a pre-existing Secret containing admin credentials | ||
| existingSecretKey: "admin-password" # Key within the Secret holding the password |
There was a problem hiding this comment.
existingSecret is a bad name? Can you imagine how a user know the meaning of it from this name?
hzxuzhonghu
left a comment
There was a problem hiding this comment.
I have not read the implement, please address these comments first and help me understand some concepts. Thank you
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces end-to-end external user authentication and authorization using Keycloak as the identity provider. It expands the initial scope to include the complete flow in a single PR: Keycloak deployment, Router integration, Resource-Level Access Control (RLAC), Python SDK support, and the formal design proposal.
1. Design Proposal
docs/design/keycloak-proposal.mddetailing the architecture, request flow, and implementation decisions for the external authentication layer.2. Keycloak Deployment
agentcuberealm, role hierarchy (sandbox:invoke→sandbox:manage→admin), and OAuth2 clients.3. Router OIDC Integration & RBAC
go-oidcwith automatic JWKS key caching.sandbox:invokerealm role on all/v1/*endpoints.--enable-external-auth(default: off, zero impact).4. Downstream Identity Forwarding & RLAC
sub) in internal PicoD JWT claims.X-AgentCube-User-Idheader to WorkloadManager.agentcube.io/owner: <user_id>label.OwnerIDbefore proxying requests.5. Python SDK Auth Integration
AuthProviderprotocol).ServiceAccountAuthforclient_credentialsflow with automatic token refresh.TokenAuthfor static tokens.6. Testing
Which issue(s) this PR fixes:
Fixes Part of #243
Special notes for your reviewer:
keycloak.enabled=true. Existing deployments are unaffected.github.com/coreos/go-oidc/v3(used by Kubernetes itself).Does this PR introduce a user-facing change?:
yes