fix: route metadata server GCP token requests through hub client#184

Closed
scion-gteam[bot] wants to merge 2 commits into main from scion/gcp-auth-fix

@scion-gteam scion-gteam Bot commented Jun 8, 2026

Summary

  • The metadata server called the Hub's /api/v1/agent/gcp-token endpoint directly over HTTP with Authorization: Bearer <app-token>. That conflicts with OIDC transport auth (IAP/Cloud Run) and differs from the header the hub client uses (X-Scion-Agent-Token).
  • Added callback-based delegation so the metadata server routes GCP token fetches through the hub client, which already applies the correct OIDC transport and auth headers.
  • This fixes GCP auth failures on resumed agents, where sciontool doctor reports "metadata server returned 502: token generation failed" even though scion auth is valid.

Changes

  • pkg/sciontool/metadata/server.go: Added FetchGCPToken and FetchGCPIdentityToken callbacks to Config; when set, these are used instead of direct HTTP calls. Exported GCPAccessTokenResponse type.
  • pkg/sciontool/hub/client.go: Added FetchGCPToken() and FetchGCPIdentityToken() methods that call the Hub using X-Scion-Agent-Token auth and the OIDC transport layer
  • cmd/sciontool/commands/init.go: Wired the metadata server's callbacks to the hub client using late-binding closures (hub client is created after the metadata server starts)

Test plan

  • go build ./... passes
  • go test ./pkg/sciontool/metadata/... passes (direct HTTP fallback still works)
  • go test ./pkg/sciontool/hub/... passes
  • go test ./cmd/sciontool/... passes
  • Deploy and verify sciontool doctor shows [ OK ] GCP access token retrievable on a resumed agent

@scion-gteam scion-gteam Bot force-pushed the scion/gcp-auth-fix branch 3 times, most recently from 38fa4df to 2640353 Compare June 8, 2026 21:05
Colocated docker agents ran with --network=host, making the per-agent
metadata server (127.0.0.1:18380) and telemetry OTLP receiver (:4317)
host-global singletons. Only the first agent could bind them; concurrent
or resumed agents got 'address already in use' -> sciontool doctor 502.
Host networking also leaks GCP SA identity across agents.

Point colocated docker agents at the public Caddy domain so each runs in
its own netns under bridge networking:

- ResolveDockerNetworking: add SCION_FORCE_HOST_NETWORK escape hatch;
  add DockerSupportsHostGateway capability probe (Engine >= 20.10).
- startRuntimeBroker: ContainerHubEndpoint autocompute prefers the public
  domain for colocated docker; falls back to host.docker.internal (host
  networking) when force-host is set, host-gateway is unsupported, or no
  public domain is configured (warns in the latter two cases).
- applyContainerBridgeOverride: use a public-domain ContainerHubEndpoint
  wholesale instead of grafting the localhost port (e.g. :8080) onto it.
- gce-start-hub.sh: export SCION_SERVER_BASE_URL=https://${HUB_DOMAIN}
  so the broker dispatches agents to the domain.

Scope is confined to docker + colocated; kubernetes, cloud run, podman,
and remote-hub agents are unaffected. Reverting is a one-flag rollback
(SCION_FORCE_HOST_NETWORK=1) with no redeploy.
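
The fallback rules in this commit message can be read as a small decision function. The sketch below is one way to express them, under stated assumptions: the type and function names, the order in which the conditions are checked, the warning strings, and the host.docker.internal port (8080, borrowed from the localhost example above) are all illustrative and not taken from the actual code.

```go
package main

import "fmt"

// inputs gathers the three conditions the commit message lists.
type inputs struct {
	ForceHost            bool   // SCION_FORCE_HOST_NETWORK is set
	HostGatewaySupported bool   // result of the Engine >= 20.10 probe
	PublicDomain         string // empty when no public domain is configured
}

// decision is a hypothetical result type for this sketch.
type decision struct {
	HostNetwork bool   // true: host networking; false: bridge networking
	HubEndpoint string // endpoint handed to the agent container
	Warning     string // non-empty when the fallback should be logged
}

// assumedHostEndpoint is a placeholder; the real port and scheme may differ.
const assumedHostEndpoint = "http://host.docker.internal:8080"

// resolve prefers the public domain and falls back to host networking.
// Only the two involuntary fallbacks carry a warning.
func resolve(in inputs) decision {
	switch {
	case in.ForceHost:
		// Explicit escape hatch: no warning needed.
		return decision{HostNetwork: true, HubEndpoint: assumedHostEndpoint}
	case !in.HostGatewaySupported:
		return decision{
			HostNetwork: true,
			HubEndpoint: assumedHostEndpoint,
			Warning:     "docker host-gateway unsupported; using host networking",
		}
	case in.PublicDomain == "":
		return decision{
			HostNetwork: true,
			HubEndpoint: assumedHostEndpoint,
			Warning:     "no public domain configured; using host networking",
		}
	default:
		// Use the public-domain endpoint wholesale, without grafting a port onto it.
		return decision{HubEndpoint: "https://" + in.PublicDomain}
	}
}

func main() {
	cases := []inputs{
		{HostGatewaySupported: true, PublicDomain: "hub.example.com"},
		{ForceHost: true, HostGatewaySupported: true, PublicDomain: "hub.example.com"},
		{HostGatewaySupported: false, PublicDomain: "hub.example.com"},
		{HostGatewaySupported: true},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %+v\n", c, resolve(c))
	}
}
```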
@scion-gteam scion-gteam Bot force-pushed the scion/gcp-auth-fix branch from 6c94700 to bfd9964 Compare June 8, 2026 23:09
- DockerSupportsHostGateway: bound the 'docker version' probe with a
  5s timeout so an unresponsive daemon can't hang server startup.
- parseDockerServerVersion: scan line-by-line and tolerate a leading
  v/V prefix so daemon warnings or prefixed versions don't defeat the
  probe.
- add tests for v/V prefix, surrounding whitespace, and warning-prefixed
  multi-line output.
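
A tolerant version parser and a time-bounded probe along these lines might look like the sketch below. It is illustrative only: the function names, return shapes, and the exact docker invocation are assumptions, not the PR's code. The --format flag of docker version and exec.CommandContext are real, but whether the actual probe uses them is not stated in the commit message.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// parseServerVersion scans output line by line and returns the first
// major.minor pair it finds, tolerating whitespace and a leading v/V.
func parseServerVersion(out string) (major, minor int, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		line = strings.TrimPrefix(strings.TrimPrefix(line, "v"), "V")
		parts := strings.SplitN(line, ".", 3)
		if len(parts) < 2 {
			continue // e.g. a daemon warning or an empty line
		}
		maj, err1 := strconv.Atoi(parts[0])
		mnr, err2 := strconv.Atoi(parts[1])
		if err1 != nil || err2 != nil {
			continue // not a version line
		}
		return maj, mnr, true
	}
	return 0, 0, false
}

// supportsHostGateway reports whether the version is at least 20.10.
func supportsHostGateway(major, minor int) bool {
	return major > 20 || (major == 20 && minor >= 10)
}

// probeHostGateway runs 'docker version' with a 5s bound so an unresponsive
// daemon cannot hang startup. Any failure is treated as "unsupported".
func probeHostGateway(parent context.Context) bool {
	ctx, cancel := context.WithTimeout(parent, 5*time.Second)
	defer cancel()
	out, err := exec.CommandContext(ctx, "docker", "version",
		"--format", "{{.Server.Version}}").Output()
	if err != nil {
		return false
	}
	major, minor, ok := parseServerVersion(string(out))
	return ok && supportsHostGateway(major, minor)
}

func main() {
	samples := []string{
		"24.0.7",
		"  v20.10.5  ",
		"WARNING: daemon reported a problem\nV19.03.8\n",
		"not a version",
	}
	for _, s := range samples {
		major, minor, ok := parseServerVersion(s)
		fmt.Printf("%q -> parsed=%v supported=%v\n", s, ok, ok && supportsHostGateway(major, minor))
	}
}
```

Treating probe failures as "unsupported" keeps the check conservative: per the fallback rules above, the broker then stays on host networking rather than assuming a capability it could not confirm.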
@ptone ptone closed this Jun 9, 2026

ptone commented Jun 9, 2026

Owner

Merged upstream — PR GoogleCloudPlatform#371
