Add TLS support#953
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds gateway→upstream TLS with two modes (ephemeral self-signed or cert-manager-managed), a new SSLConfig utility for uvicorn, service configs wired to pass TLS args, Helm templates to provision/mount certs and configure Envoy upstream TLS, and docs/values updates exposing TLS parameters. ChangesGateway Upstream TLS Implementation
Sequence DiagramsequenceDiagram
participant Client
participant GatewayEnvoy as Gateway (Envoy)
participant CertMgr as cert-manager
participant Upstream as Upstream Service
rect rgba(100, 150, 200, 0.5)
Note over CertMgr,GatewayEnvoy: Provisioning Phase
alt Ephemeral mode
GatewayEnvoy->>GatewayEnvoy: mint ephemeral cert/key at startup
GatewayEnvoy->>GatewayEnvoy: write tls.crt/tls.key to temp dir
else Cert-manager mode
GatewayEnvoy->>CertMgr: emit Certificate/Issuer resources (Helm)
CertMgr->>CertMgr: sign CA and per-upstream certs
CertMgr->>GatewayEnvoy: create secrets (osmo-*-tls)
end
end
rect rgba(150, 150, 100, 0.5)
Note over Client,Upstream: Request Phase
Client->>GatewayEnvoy: HTTP request (UI) / proxied call
GatewayEnvoy->>GatewayEnvoy: load TLS material (disk or SDS)
GatewayEnvoy->>Upstream: establish TLS connection (SNI set)
Upstream->>GatewayEnvoy: respond over TLS
GatewayEnvoy->>Client: return response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
deployments/charts/service/templates/router-service.yaml (1)
185-198: 💤 Low valueRouter uses helper for probe TLS scheme; minor inconsistency with other services.
The router service uses
osmo.upstream-probe-yamlhelper for all probes, which automatically injectsscheme: HTTPSwhen TLS is enabled. This is more DRY than the inline conditionals used in agent-service.yaml and logger-service.yaml. Consider standardizing on the helper approach across all services for consistency.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/charts/service/templates/router-service.yaml` around lines 185 - 198, The router templates use the osmo.upstream-probe-yaml helper for livenessProbe, startupProbe, and readinessProbe (.Values.services.router.*) which injects scheme: HTTPS when TLS is enabled; update the agent-service and logger-service probe blocks to replace their inline TLS/HTTPS conditionals with the same osmo.upstream-probe-yaml include so all services consistently use the helper (find uses in agent-service.yaml and logger-service.yaml and swap the conditional probe blocks for the include "osmo.upstream-probe-yaml" (dict "probe" . "context" $) | nindent X matching the surrounding indentation).deployments/charts/service/templates/api-service.yaml (1)
216-240: 💤 Low valueMixed probe TLS handling approach within same file.
The
livenessProbe(line 217) uses theosmo.upstream-probe-yamlhelper, whilestartupProbe(lines 225-227) andreadinessProbe(lines 238-240) use inline scheme conditionals. This inconsistency within the same file could cause maintenance confusion. Consider standardizing on one approach.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/charts/service/templates/api-service.yaml` around lines 216 - 240, The file mixes probe rendering approaches: livenessProbe uses the osmo.upstream-probe-yaml helper while startupProbe and readinessProbe use inline TLS conditionals; standardize by switching startupProbe and readinessProbe to use the same helper (osmo.upstream-probe-yaml) and pass their probe configs and context via the same dict pattern used for liveness (e.g. dict "probe" .Values.services.service.startupProbe "context" . and dict "probe" .Values.services.service.readinessProbe "context" .), and remove the inline {{- if .Values.gateway.tls.enabled }} scheme conditionals so TLS is handled consistently by the helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@deployments/charts/service/templates/api-service.yaml`:
- Around line 216-240: The file mixes probe rendering approaches: livenessProbe
uses the osmo.upstream-probe-yaml helper while startupProbe and readinessProbe
use inline TLS conditionals; standardize by switching startupProbe and
readinessProbe to use the same helper (osmo.upstream-probe-yaml) and pass their
probe configs and context via the same dict pattern used for liveness (e.g. dict
"probe" .Values.services.service.startupProbe "context" . and dict "probe"
.Values.services.service.readinessProbe "context" .), and remove the inline {{-
if .Values.gateway.tls.enabled }} scheme conditionals so TLS is handled
consistently by the helper.
In `@deployments/charts/service/templates/router-service.yaml`:
- Around line 185-198: The router templates use the osmo.upstream-probe-yaml
helper for livenessProbe, startupProbe, and readinessProbe
(.Values.services.router.*) which injects scheme: HTTPS when TLS is enabled;
update the agent-service and logger-service probe blocks to replace their inline
TLS/HTTPS conditionals with the same osmo.upstream-probe-yaml include so all
services consistently use the helper (find uses in agent-service.yaml and
logger-service.yaml and swap the conditional probe blocks for the include
"osmo.upstream-probe-yaml" (dict "probe" . "context" $) | nindent X matching the
surrounding indentation).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6edceb57-5e95-4ea5-b318-a1922bfabe23
📒 Files selected for processing (15)
deployments/charts/service/README.mddeployments/charts/service/templates/_gateway-envoy-config.tpldeployments/charts/service/templates/_gateway-helpers.tpldeployments/charts/service/templates/agent-service.yamldeployments/charts/service/templates/api-service.yamldeployments/charts/service/templates/gateway-tls.yamldeployments/charts/service/templates/logger-service.yamldeployments/charts/service/templates/router-service.yamldeployments/charts/service/values.yamlsrc/service/agent/agent_service.pysrc/service/core/service.pysrc/service/core/workflow/objects.pysrc/service/logger/logger.pysrc/service/router/router.pysrc/utils/static_config.py
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #953 +/- ##
==========================================
+ Coverage 42.93% 43.02% +0.09%
==========================================
Files 218 219 +1
Lines 28457 28506 +49
Branches 4255 4259 +4
==========================================
+ Hits 12218 12266 +48
- Misses 15598 15599 +1
Partials 641 641
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
📖 Docs preview: https://d3in15bfzp49i0.cloudfront.net/953/index.html |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
deployments/charts/service/templates/_gateway-envoy-config.tpl (2)
73-81:⚠️ Potential issue | 🟠 MajorFix CA bundle mounting for external
issuerRefconfigurations.When
gateway.tls.certManager.issuerRefis set to use an external issuer, the chart does not create the{{ $gwName }}-ca-tlssecret, butdeployments/charts/service/templates/gateway.yamlstill unconditionally mounts this secret at/etc/gateway-tls. The SDS config in_gateway-envoy-config.tplthen references/etc/gateway-tls/ca.crt, which will not exist at runtime. Either:
- Condition the volume mount in
gateway.yamlon whetherissuerRefis unset, OR- Document that external issuers must pre-create the
{{ $gwName }}-ca-tlssecret, OR- Extend the templates to fetch and mount the CA bundle from external issuers automatically.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/charts/service/templates/_gateway-envoy-config.tpl` around lines 73 - 81, The chart mounts a CA secret at /etc/gateway-tls unconditionally while _gateway-envoy-config.tpl expects /etc/gateway-tls/ca.crt, which breaks when gateway.tls.certManager.issuerRef points to an external issuer (no {{ $gwName }}-ca-tls created). Fix by conditioning the volume/secret mount in gateway.yaml on the certManager issuerRef being unset (or on gateway.tls.certManager.enabled && not set issuerRef) so the {{ $gwName }}-ca-tls secret is only mounted when the chart created it; update the template logic that controls the mount (the volume and volumeMount entries related to /etc/gateway-tls and the secret named {{ $gwName }}-ca-tls) to match the same condition used in _gateway-envoy-config.tpl (which references /etc/gateway-tls/ca.crt).
581-590:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd SAN validation to the cert-manager upstream TLS contexts.
These blocks only establish CA trust. In Envoy,
trusted_cadoes not verify the server identity by itself, andsnionly controls the ClientHello value; without SAN validation, any cert signed by the same CA can impersonateosmo-service,osmo-router,osmo-agent,osmo-logger, or the internal JWKS endpoint. Addauto_sni_san_validation: truein each cert-manager branch to validate that the upstream certificate includes a DNS SAN matching the SNI value.Fix pattern
transport_socket: name: envoy.transport_sockets.tls typed_config: "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext sni: {{ $gw.upstreams.service.host }} + auto_sni_san_validation: true common_tls_context: validation_context_sds_secret_config: name: upstream_ca sds_config: path_config_source: path: /var/config/sds_upstream_ca.yamlApply the same change to the router (619-628), agent (680-689), logger (717-726), and internal JWKS (841-850) clusters.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/charts/service/templates/_gateway-envoy-config.tpl` around lines 581 - 590, The cert-manager TLS blocks currently only set validation_context_sds_secret_config (which establishes CA trust) but do not enforce SAN validation, so add auto_sni_san_validation: true under each validation_context_sds_secret_config in the cert-manager branches to ensure the upstream certificate's DNS SAN matches the SNI; specifically update the block that contains sni: {{ $gw.upstreams.service.host }} and common_tls_context -> validation_context_sds_secret_config (for the service cluster) and apply the same change to the router, agent, logger and internal JWKS cert-manager branches that mirror this pattern (the blocks referencing upstream_ca and sds_config/path_config_source).
🧹 Nitpick comments (1)
src/utils/static_config.py (1)
94-137: 💤 Low valueConsider adding file existence validation for cert-manager mode.
When
ssl_keyfileandssl_certfileare provided (cert-manager mode), there's no validation that these files exist before returning them. If the cert-manager Secret hasn't been mounted yet, uvicorn will fail with a potentially unclear error.🛡️ Optional: Add existence check
def uvicorn_ssl_kwargs(self) -> Dict[str, Any]: """Return uvicorn keyword args for TLS, or an empty dict if TLS is off.""" if self.ssl_self_signed: keyfile, certfile = _mint_ephemeral_self_signed() return {'ssl_keyfile': keyfile, 'ssl_certfile': certfile} if self.ssl_keyfile and self.ssl_certfile: + if not os.path.isfile(self.ssl_keyfile): + raise FileNotFoundError(f"SSL key file not found: {self.ssl_keyfile}") + if not os.path.isfile(self.ssl_certfile): + raise FileNotFoundError(f"SSL cert file not found: {self.ssl_certfile}") return {'ssl_keyfile': self.ssl_keyfile, 'ssl_certfile': self.ssl_certfile} return {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/static_config.py` around lines 94 - 137, The code never validates externally provided TLS file paths (ssl_keyfile, ssl_certfile) before uvicorn uses them; add an existence check where the config returns/chooses these files (i.e., the branch that does NOT call _mint_ephemeral_self_signed) to verify os.path.exists(ssl_keyfile) and os.path.exists(ssl_certfile) and raise a clear exception or log+exit if either is missing; also ensure any return values (keyfile_path, certfile_path) are only returned after similar existence checks when using cert-manager-mounted files so uvicorn fails fast with a helpful message.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@deployments/charts/service/templates/_gateway-envoy-config.tpl`:
- Around line 73-81: The chart mounts a CA secret at /etc/gateway-tls
unconditionally while _gateway-envoy-config.tpl expects /etc/gateway-tls/ca.crt,
which breaks when gateway.tls.certManager.issuerRef points to an external issuer
(no {{ $gwName }}-ca-tls created). Fix by conditioning the volume/secret mount
in gateway.yaml on the certManager issuerRef being unset (or on
gateway.tls.certManager.enabled && not set issuerRef) so the {{ $gwName
}}-ca-tls secret is only mounted when the chart created it; update the template
logic that controls the mount (the volume and volumeMount entries related to
/etc/gateway-tls and the secret named {{ $gwName }}-ca-tls) to match the same
condition used in _gateway-envoy-config.tpl (which references
/etc/gateway-tls/ca.crt).
- Around line 581-590: The cert-manager TLS blocks currently only set
validation_context_sds_secret_config (which establishes CA trust) but do not
enforce SAN validation, so add auto_sni_san_validation: true under each
validation_context_sds_secret_config in the cert-manager branches to ensure the
upstream certificate's DNS SAN matches the SNI; specifically update the block
that contains sni: {{ $gw.upstreams.service.host }} and common_tls_context ->
validation_context_sds_secret_config (for the service cluster) and apply the
same change to the router, agent, logger and internal JWKS cert-manager branches
that mirror this pattern (the blocks referencing upstream_ca and
sds_config/path_config_source).
---
Nitpick comments:
In `@src/utils/static_config.py`:
- Around line 94-137: The code never validates externally provided TLS file
paths (ssl_keyfile, ssl_certfile) before uvicorn uses them; add an existence
check where the config returns/chooses these files (i.e., the branch that does
NOT call _mint_ephemeral_self_signed) to verify os.path.exists(ssl_keyfile) and
os.path.exists(ssl_certfile) and raise a clear exception or log+exit if either
is missing; also ensure any return values (keyfile_path, certfile_path) are only
returned after similar existence checks when using cert-manager-mounted files so
uvicorn fails fast with a helpful message.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 70718958-2b94-4ff3-9206-fec9e65a28bd
📒 Files selected for processing (11)
deployments/charts/service/README.mddeployments/charts/service/templates/_gateway-envoy-config.tpldeployments/charts/service/templates/_gateway-helpers.tpldeployments/charts/service/templates/agent-service.yamldeployments/charts/service/templates/api-service.yamldeployments/charts/service/templates/gateway-tls.yamldeployments/charts/service/templates/gateway.yamldeployments/charts/service/templates/logger-service.yamldeployments/charts/service/templates/router-service.yamldeployments/charts/service/values.yamlsrc/utils/static_config.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/utils/ssl_config.py`:
- Around line 76-83: In uvicorn_ssl_kwargs, validate TLS configuration and fail
fast: if ssl_self_signed is True and either ssl_keyfile or ssl_certfile is also
set, raise a configuration error; if exactly one of ssl_keyfile or ssl_certfile
is set (missing the other), raise a configuration error instead of returning {}.
Keep the existing successful branches (minting self-signed certs when
ssl_self_signed is True, or returning provided ssl_keyfile/ssl_certfile) but
perform these checks first to prevent silent fallback to HTTP.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4029fc36-520c-41a8-8291-fd05a68f69af
📒 Files selected for processing (10)
src/service/agent/BUILDsrc/service/agent/agent_service.pysrc/service/core/workflow/BUILDsrc/service/core/workflow/objects.pysrc/service/logger/BUILDsrc/service/logger/logger.pysrc/service/router/BUILDsrc/service/router/router.pysrc/utils/BUILDsrc/utils/ssl_config.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/service/agent/agent_service.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
deployments/charts/service/templates/_gateway-envoy-config.tpl (1)
581-598: ⚡ Quick winConsider extracting repeated upstream TLS config to a helper template.
This TLS configuration block (SNI, tls_params, conditional validation_context) is duplicated verbatim across the service, router, agent, logger, and JWKS clusters. Given the existing pattern of helper templates in
_gateway-helpers.tpl(e.g.,osmo.upstream-tls-args,osmo.upstream-tls-volume), this would fit well as a reusable helper.♻️ Example helper template
Add to
_gateway-helpers.tpl:{{- define "osmo.upstream-tls-context" -}} {{- /* Pass dict with: context, host */ -}} sni: {{ .host }} common_tls_context: tls_params: tls_minimum_protocol_version: TLSv1_2 tls_maximum_protocol_version: TLSv1_3 {{- if .context.Values.gateway.tls.certManager.enabled }} validation_context_sds_secret_config: name: upstream_ca sds_config: path_config_source: path: /var/config/sds_upstream_ca.yaml watched_directory: path: /var/config {{- end }} {{- end }}Then usage in each cluster:
transport_socket: name: envoy.transport_sockets.tls typed_config: "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext {{- include "osmo.upstream-tls-context" (dict "context" $ "host" $gw.upstreams.service.host) | nindent 10 }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deployments/charts/service/templates/_gateway-envoy-config.tpl` around lines 581 - 598, The TLS block (sni, common_tls_context including tls_params and conditional validation_context_sds_secret_config) is duplicated across multiple clusters; extract it into a helper (e.g., define "osmo.upstream-tls-context" in _gateway-helpers.tpl that accepts a dict with context and host) and replace the inlined blocks in templates like _gateway-envoy-config.tpl (where the current sni / common_tls_context appears) with an include of that helper (pass the rendering context and $gw.upstreams.service.host so the helper can reference .context.Values.gateway.tls.certManager.enabled to conditionally emit validation_context_sds_secret_config). Ensure the existing transport_socket/typed_config wrappers continue to surround the helper include so typing and indentation are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@deployments/charts/service/templates/_gateway-envoy-config.tpl`:
- Around line 581-598: The TLS block (sni, common_tls_context including
tls_params and conditional validation_context_sds_secret_config) is duplicated
across multiple clusters; extract it into a helper (e.g., define
"osmo.upstream-tls-context" in _gateway-helpers.tpl that accepts a dict with
context and host) and replace the inlined blocks in templates like
_gateway-envoy-config.tpl (where the current sni / common_tls_context appears)
with an include of that helper (pass the rendering context and
$gw.upstreams.service.host so the helper can reference
.context.Values.gateway.tls.certManager.enabled to conditionally emit
validation_context_sds_secret_config). Ensure the existing
transport_socket/typed_config wrappers continue to surround the helper include
so typing and indentation are preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fbdfff58-b9d2-41ac-a8e9-ffe3a6902692
📒 Files selected for processing (1)
deployments/charts/service/templates/_gateway-envoy-config.tpl
Issue #None
Checklist
Summary by CodeRabbit
New Features
Configuration
Documentation