feat(sandbox): add GCE metadata emulator for Google Cloud #1763
All contributors have signed the DCO ✍️ ✅
I have read the DCO document and I hereby sign the DCO.
Force-pushed cb2e254 to 28d626a
/ok to test 28d626a
Single source of truth for GCP naming: env var aliases, provider config keys, token search order, and Vertex-specific env vars. Consumed by openshell-server, openshell-providers, and openshell-sandbox.
- Add google_cloud.rs with the metadata emulator host and loopback address
- Define PROJECT_ID, REGION, and SERVICE_ACCOUNT_EMAIL env var aliases
- Add provider config key constants for gcp provider implementations
- Define TOKEN_ENV_KEYS search order (SA token takes priority over ADC)
- Add Vertex-specific env vars for Goose and Claude Code SDK integration
- Add STATIC_CONFIG_KEYS as the union of all alias arrays for env resolution
- Export the module via openshell-core lib.rs
Signed-off-by: Robert Sturla <rsturla@redhat.com>
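A minimal Python sketch of the alias-resolution idea in this commit, assuming each alias array is checked in order and the first non-empty variable wins. The alias names and token keys below are illustrative assumptions, not the actual contents of google_cloud.rs; only the shape (ordered aliases, SA token before ADC, a de-duplicated union) comes from the commit message.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical alias arrays; the real constants live in google_cloud.rs
# and may use different variable names.
PROJECT_ID = ("GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT", "CLOUDSDK_CORE_PROJECT")
REGION = ("GOOGLE_CLOUD_REGION", "GOOGLE_CLOUD_LOCATION")
SERVICE_ACCOUNT_EMAIL = ("GOOGLE_SERVICE_ACCOUNT_EMAIL",)

# Hypothetical token search order: the service-account token is checked
# before the ADC token, so it wins when both are present.
TOKEN_ENV_KEYS = ("GCP_SA_TOKEN", "GCP_ADC_TOKEN")

# Union of all alias arrays, keeping first-seen order and dropping duplicates.
STATIC_CONFIG_KEYS = tuple(dict.fromkeys(PROJECT_ID + REGION + SERVICE_ACCOUNT_EMAIL))


def first_set(env: Mapping[str, str], keys: Tuple[str, ...]) -> Optional[Tuple[str, str]]:
    """Return (key, value) for the first key in `keys` with a non-empty value."""
    for key in keys:
        value = env.get(key)
        if value:
            return key, value
    return None


env = {"GCLOUD_PROJECT": "demo-project", "GCP_ADC_TOKEN": "adc", "GCP_SA_TOKEN": "sa"}
print(first_set(env, PROJECT_ID))      # ('GCLOUD_PROJECT', 'demo-project')
print(first_set(env, TOKEN_ENV_KEYS))  # ('GCP_SA_TOKEN', 'sa')
```

Keeping the ordered aliases in one place is what lets the server, providers, and sandbox crates agree on which variable wins.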
Add GoogleCloudProvider and VertexProvider, which implement inject_env to project GCP config (project ID, region, SA email, metadata host) into sandbox environment variables. Replace the inline Vertex AI env injection in the server with registry-based inject_env dispatch.
Also add the google-cloud.yaml provider profile with SA JWT and ADC OAuth2 credential refresh flows.
Signed-off-by: Robert Sturla <rsturla@redhat.com>
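A rough Python sketch of what an inject_env projection might look like, assuming provider config arrives as a flat key/value map. The config keys and most env var names are hypothetical; the Vertex variables are the ones Claude Code documents for Vertex AI (CLAUDE_CODE_USE_VERTEX, ANTHROPIC_VERTEX_PROJECT_ID, CLOUD_ML_REGION), but whether VertexProvider sets exactly these is an assumption.

```python
from typing import Dict, Mapping

# Loopback address of the metadata emulator described in this PR.
METADATA_HOST = "127.0.0.1:8174"

# Hypothetical mapping: provider config key -> sandbox env var.
GOOGLE_CLOUD_KEYS = {
    "project_id": "GOOGLE_CLOUD_PROJECT",
    "region": "GOOGLE_CLOUD_REGION",
    "service_account_email": "GOOGLE_SERVICE_ACCOUNT_EMAIL",
}


def google_cloud_env(config: Mapping[str, str]) -> Dict[str, str]:
    """Sketch of a google-cloud inject_env: always point SDKs at the emulator."""
    env = {"GCE_METADATA_HOST": METADATA_HOST}
    for key, var in GOOGLE_CLOUD_KEYS.items():
        value = config.get(key)
        if value:
            env[var] = value
    return env


def vertex_env(config: Mapping[str, str]) -> Dict[str, str]:
    """Sketch of a vertex inject_env layered on top of the google-cloud one."""
    env = google_cloud_env(config)
    # Assumed Claude Code Vertex settings; other tools (e.g. Goose) read different names.
    env["CLAUDE_CODE_USE_VERTEX"] = "1"
    project = config.get("project_id")
    if project:
        env["ANTHROPIC_VERTEX_PROJECT_ID"] = project
    region = config.get("region")
    if region:
        env["CLOUD_ML_REGION"] = region
    return env


print(vertex_env({"project_id": "demo-project", "region": "us-east5"}))
```

Note that no token appears in the projected environment: credentials reach the SDKs only through the metadata emulator.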
Force-pushed 28d626a to abde14b
Force-pushed to fix the lint issues; "mise run pre-commit" now passes locally again. The previous E2E run is still in progress here: https://github.com/NVIDIA/OpenShell/actions/runs/26982657920
Edit:
This is great! One nitpick though, I think
Tested with:
```
$ env UV_CACHE_DIR=/tmp/uv-cache uv run --with google-auth --with requests python /tmp/verify_gce_metadata_ip.py
ping_127_no_port False
ping_127_with_port True
```
Here's the contents of:
```python
import os
import threading
import http.server
import socketserver

import google.auth.compute_engine._metadata as metadata
from google.auth.transport.requests import Request


class MetadataHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Metadata-Flavor", "Google")
        self.end_headers()

    def log_message(self, *args):
        pass


def main():
    server = socketserver.TCPServer(("127.0.0.1", 8174), MetadataHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        request = Request()
        os.environ["GCE_METADATA_IP"] = "127.0.0.1"
        print("ping_127_no_port", metadata.ping(request, timeout=1, retry_count=1))
        os.environ["GCE_METADATA_IP"] = "127.0.0.1:8174"
        print("ping_127_with_port", metadata.ping(request, timeout=1, retry_count=1))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    main()
```
Add a loopback HTTP server on 127.0.0.1:8174 inside the sandbox network namespace that emulates the GCE instance metadata API. GCP client SDKs discover it via GCE_METADATA_HOST and obtain credential placeholders that the proxy resolves to real tokens at egress.
- Add metadata_server module with MetadataHandler trait and netns-aware TCP binding via std::thread (not spawn_blocking) to avoid tokio pool namespace contamination
- Add google_cloud_metadata module implementing the GCE metadata API subset (token, project-id, email, scopes, service-accounts)
- Add child_env_resolved() and gcp_token_response() to ProviderCredentialState for GCP-aware credential projection
- Wire metadata server into sandbox lifecycle before SSH handler
- Collapse multi-line HTTP response format string into single line
Signed-off-by: Robert Sturla <rsturla@redhat.com>
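A minimal Python sketch of the metadata subset listed above (token, project-id, email, scopes, service-accounts), assuming the standard GCE /computeMetadata/v1/ paths. The project ID, email, and placeholder token string are made-up values, the service-accounts listing is simplified, and the real emulator is a Rust MetadataHandler bound inside the sandbox netns, which this sketch does not attempt. The point it illustrates is that the token endpoint hands back a placeholder rather than a real credential.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Made-up values; in the PR these come from the provider config.
PROJECT_ID = "demo-project"
SA_EMAIL = "sandbox@demo-project.iam.gserviceaccount.com"
SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]
# Hypothetical placeholder; the proxy would swap it for a real token at egress.
TOKEN_PLACEHOLDER = "openshell-placeholder-token"

SA = "/computeMetadata/v1/instance/service-accounts/default/"
ROUTES = {
    "/computeMetadata/v1/project/project-id": ("text/plain", PROJECT_ID),
    "/computeMetadata/v1/instance/service-accounts/": ("text/plain", "default/\n"),
    SA + "email": ("text/plain", SA_EMAIL),
    SA + "scopes": ("text/plain", "\n".join(SCOPES)),
    SA + "token": ("application/json", json.dumps(
        {"access_token": TOKEN_PLACEHOLDER, "expires_in": 3599, "token_type": "Bearer"})),
}


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Like real GCE, refuse requests that lack the Metadata-Flavor header.
        if self.headers.get("Metadata-Flavor") != "Google":
            self._reply(403, "text/plain", "missing Metadata-Flavor header")
            return
        path = self.path.split("?", 1)[0]  # ignore query strings such as ?scopes=
        if path in ROUTES:
            self._reply(200, *ROUTES[path])
        else:
            self._reply(404, "text/plain", "not found")

    def _reply(self, status, content_type, body):
        data = body.encode()
        self.send_response(status)
        self.send_header("Metadata-Flavor", "Google")
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start(host="127.0.0.1", port=8174):
    """Serve the emulator from a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), MetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get(port, path, flavor=True):
    """GET a metadata path, bypassing any HTTP(S)_PROXY settings."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(f"http://127.0.0.1:{port}{path}")
    if flavor:
        req.add_header("Metadata-Flavor", "Google")
    with opener.open(req, timeout=2) as resp:
        return resp.status, resp.read().decode()


if __name__ == "__main__":
    server = start(port=0)  # port 0 picks a free port for this demo
    print(get(server.server_address[1], SA + "token"))
    server.shutdown()
    server.server_close()
```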
Force-pushed abde14b to 009f28a
Awesome spot! Ran through your reproducer and confirmed it now works as expected.
Force-pushed 009f28a to 2d776a7
Document the google-cloud provider setup for ADC and service account flows, injected environment variables, metadata emulator behavior, and network policy configuration for GCP APIs.
Signed-off-by: Robert Sturla <rsturla@redhat.com>
Widen --from-gcloud-adc to accept google-cloud providers. The ADC credential key is derived from the provider profile rather than hardcoded per type, so future GCP provider types get ADC support by declaring the right refresh metadata in their profile YAML.
Add ProviderTypeProfile::adc_credential() to find the ADC-compatible credential from a profile's refresh metadata.
Remove unused VERTEX_AI_ADC_TOKEN_KEY and GCP_ADC_TOKEN_KEY constants.
Signed-off-by: Robert Sturla <rsturla@redhat.com>
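A small Python sketch of the lookup that ProviderTypeProfile::adc_credential() is described as performing: scan a profile's credentials and return the one whose refresh metadata marks it as ADC-compatible. The profile schema, credential keys, and the "oauth2_refresh_token" flow tag are all hypothetical stand-ins for whatever google-cloud.yaml actually declares.

```python
from typing import Any, Mapping, Optional

# Hypothetical refresh-flow tag marking an ADC (OAuth2 refresh token) credential.
ADC_FLOW = "oauth2_refresh_token"

# Hypothetical parsed profile; the real google-cloud.yaml schema may differ.
GOOGLE_CLOUD_PROFILE = {
    "type": "google-cloud",
    "credentials": [
        {"key": "GCP_SA_KEY", "refresh": {"flow": "service_account_jwt"}},
        {"key": "GCP_ADC", "refresh": {"flow": ADC_FLOW}},
    ],
}


def adc_credential(profile: Mapping[str, Any]) -> Optional[str]:
    """Return the key of the first credential whose refresh metadata is ADC-style."""
    for cred in profile.get("credentials", []):
        refresh = cred.get("refresh") or {}
        if refresh.get("flow") == ADC_FLOW:
            return cred["key"]
    return None


print(adc_credential(GOOGLE_CLOUD_PROFILE))  # GCP_ADC
```

Because the lookup reads profile data, a new GCP provider type would only need an ADC-tagged credential in its YAML for --from-gcloud-adc to pick it up, with no new per-type constant.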
Force-pushed 2d776a7 to aada3da
One thing I'd like to consider here is OpenShell including something like https://github.com/LobsterTrap/llmproxy by default, i.e. an inference endpoint that always appears OpenResponses-compatible to the tooling inside the sandbox. It wouldn't work with Claude Code (AFAIK, not without hacks), but it would be nice to remove the need to handle inference provider auth entirely for every tool that can speak OpenResponses. It's a heavier hammer, though. Of course, many non-local deployments will probably end up wanting some kind of proxy anyway to handle observability etc., and there are various existing, more heavyweight options in that space.
Thanks, these changes work for me standalone following Adam's instructions. Will the vertex provider stay, or will it be stripped out? It might be confusing if it exists but doesn't work with CC.
Summary
Right now you can't use Google Cloud APIs (Vertex AI, Cloud Storage, BigQuery, Drive, Maps, etc.) from inside a sandbox. GCP SDKs expect a metadata server to be running and query it for tokens, but there's no metadata server in the sandbox, so they fail before any API call is even attempted.
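For context, a simplified Python model of how google-auth locates the metadata server, as we understand its private google.auth.compute_engine._metadata module (details can change between versions, and other language SDKs differ). It uses GCE_METADATA_HOST for metadata reads and, separately, GCE_METADATA_IP for the "am I on GCE?" ping, each falling back to a well-known default. The function name metadata_roots is hypothetical.

```python
from typing import Mapping, Tuple

DEFAULT_HOST = "metadata.google.internal"
DEFAULT_IP = "169.254.169.254"


def metadata_roots(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (metadata_root, ping_root), modeled loosely on google-auth.

    metadata_root serves regular metadata reads; ping_root is what the
    on-GCE detection ping hits. Both accept host or host:port values.
    """
    host = env.get("GCE_METADATA_HOST") or DEFAULT_HOST
    ip = env.get("GCE_METADATA_IP") or DEFAULT_IP
    return f"http://{host}/computeMetadata/v1/", f"http://{ip}"


print(metadata_roots({"GCE_METADATA_HOST": "127.0.0.1:8174"}))
# ('http://127.0.0.1:8174/computeMetadata/v1/', 'http://169.254.169.254')
```

With only GCE_METADATA_HOST set, the ping root still points at the link-local address, which matches the reproducer in the review thread: the ping only succeeds when GCE_METADATA_IP carries the emulator's port.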
Go's metadata client makes this worse. It dials the metadata IP directly over TCP, bypassing HTTP_PROXY entirely, so the sandbox proxy never even sees the request. Additionally, there's no way to override this from within the SDK config.
This PR adds a google-cloud provider type and a GCE metadata emulator running on loopback (127.0.0.1:8174) inside the sandbox network namespace. GCP SDKs find it via GCE_METADATA_HOST, get credential placeholders back, and include those in their API calls. The proxy resolves placeholders to real tokens at egress. The sandbox process never holds a real credential.
Related Issue
Closes #1706
Changes
- google_cloud module with shared constants and loopback address
- google_cloud_metadata module implementing GCE metadata API
- metadata_server module with MetadataHandler trait for provider-agnostic loopback server lifecycle
- child_env_resolved() and gcp_token_response() for GCP-aware credential state
- std::thread::spawn + setns (not spawn_blocking) to avoid tokio thread pool namespace contamination
- google-cloud.yaml provider profile and credentials documentation
Testing
- mise run pre-commit passes
Checklist