Skip to content

feat: add check to verify backend connection in readiness probe#919

Open
nancynh wants to merge 1 commit into
mainfrom
backend-check
Open

feat: add check to verify backend connection in readiness probe#919
nancynh wants to merge 1 commit into
mainfrom
backend-check

Conversation

@nancynh
Copy link
Copy Markdown
Collaborator

@nancynh nancynh commented Apr 10, 2026

This PR adds an optional flag to set that will continuously check that a connection can be established to the server before returning a ready status in the health probe.

Fixes #909.

@nancynh nancynh force-pushed the backend-check branch 6 times, most recently from 85ba4ae to fec9183 Compare April 10, 2026 23:28
@nancynh nancynh marked this pull request as ready for review April 10, 2026 23:32
@nancynh nancynh requested a review from a team as a code owner April 10, 2026 23:32
@nancynh nancynh assigned enocom and unassigned rhatgadkar-goog Apr 10, 2026
Comment thread internal/healthcheck/healthcheck.go Outdated

if c.withBackendCheck {
var lastErr error
for {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip the for loop as Cloud Run (or K8s) will continue to invoke this endpoint at a regular interval. See https://docs.cloud.google.com/run/docs/configuring/healthchecks#readiness-probes (especially the "period" parameter).

Since these checks will result in prematurely closed database connections, what do you think about extending the metadata exchange to indicate this is a health probe (and so don't need to initiate a connection to the db)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the loop.

Fine with possibly integrating this with metadata exchange though that'll need some more thinking of how we go about that first...

Comment thread internal/healthcheck/healthcheck.go Outdated
@nancynh nancynh force-pushed the backend-check branch 3 times, most recently from d395472 to 474e7ea Compare April 16, 2026 23:24
@nancynh nancynh requested a review from enocom April 16, 2026 23:30
Copy link
Copy Markdown
Member

@enocom enocom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. I think we should double-check with our Cloud Run friends if we have the correct understanding of Direct VPC Egress setup times.

Comment thread cmd/root.go Outdated
@nancynh nancynh force-pushed the backend-check branch 2 times, most recently from 79a1957 to f539078 Compare April 22, 2026 23:19
@nancynh
Copy link
Copy Markdown
Collaborator Author

nancynh commented Apr 23, 2026

Discussed this a bit offline - moved the check to be in the startup probe instead in consideration of https://github.com/learnk8s/kubernetes-production-best-practices/blob/master/application-development.md#apps-are-independent.

Gonna hold off on merging this PR though to see what the Direct VPC Egress folks say first

@nancynh nancynh requested a review from enocom April 23, 2026 17:19
Copy link
Copy Markdown
Member

@enocom enocom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to manually test this to confirm it works as intended. Otherwise LGTM.

@enocom
Copy link
Copy Markdown
Member

enocom commented May 19, 2026

Tested this manually with a newly created PSC instance. Here’s my service config (note the new flag and the startup probe):

# Cloud Run service: hello-alloydb app + AlloyDB Auth Proxy sidecar.
#
# This file uses shell-style placeholders ($PROJECT_ID, $REGION,
# $CLUSTER, $INSTANCE). The README's deploy step expands them with
# `sed` before piping to `gcloud run services replace`.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-alloydb-psc
spec:
  template:
    metadata:
      annotations:
        # gen2 execution environment is required for sidecar containers.
        run.googleapis.com/execution-environment: gen2
        # Direct VPC egress lets the proxy reach AlloyDB's private IP
        # without a Serverless VPC Access connector.
        run.googleapis.com/network-interfaces: '[{"network":"default","subnetwork":"default"}]'
        run.googleapis.com/vpc-access-egress: private-ranges-only
        run.googleapis.com/container-dependencies: '{"app":["alloydb-auth-proxy"]}'
    spec:
      serviceAccountName: hello-alloydb@$PROJECT_ID.iam.gserviceaccount.com
      containers:
        # The demo app. Waits for the proxy to be ready, then queries
        # AlloyDB through it on 127.0.0.1:5432.
        - name: app
          image: gcr.io/enocom-experiments-304623/hello-alloydb:latest
          ports:
            - containerPort: 8080
          env:
            # IAM principal of the service account, with the
            # ".gserviceaccount.com" suffix stripped (AlloyDB convention).
            - name: DB_USER
              value: hello-alloydb@$PROJECT_ID.iam

        # The AlloyDB Auth Proxy as a sidecar. It authenticates with
        # Google Cloud using this service's IAM identity, requests an
        # OAuth token for IAM database authn (--auto-iam-authn), and
        # exposes a local TCP listener that the app connects to.
        - name: alloydb-auth-proxy
          image: gcr.io/$PROJECT_ID/alloydb-auth-proxy:latest
          args:
            - "--port=5432"
            - "--auto-iam-authn"
            - "--health-check"
            - "--http-address=0.0.0.0"
            - "--http-port=9090"
            - "--with-backend-check"
            - "--psc"
            - "projects/$PROJECT_ID/locations/$REGION/clusters/$CLUSTER/instances/$INSTANCE"
          startupProbe:
            httpGet:
              path: /startup
              port: 9090
            periodSeconds: 1
            failureThreshold: 30

Then in the Cloud Run service logs, I see the probe running 4 times before it succeeds and then launches my app container:

image

Comment thread internal/proxy/proxy.go
// ExitZeroOnSigterm exits with 0 exit code when Sigterm received
ExitZeroOnSigterm bool

// WithBackendCheck enables a backend connection check in the readiness
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/readiness/startup/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Option to verify backend connectivity before health check endpoints return 200

3 participants