fix: devcontainer build network resilience (apt mirrors, timeout, retries) #901
Open
simple-agent-manager[bot] wants to merge 6 commits into main
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… retries)
- Thread cloud provider through cloud-init to VM agent via PROVIDER env var
- Add provider-specific apt mirror script (/etc/sam/apt-mirror-config.sh) that maps Hetzner → mirror.hetzner.com for container apt operations
- Add apt retry config (3 retries, 30s timeout) on host via cloud-init
- Add configurable devcontainer build timeout (default: 15min) to prevent indefinite hangs when apt/network is degraded
- VM agent injects apt mirror config into containers before package installs
- Validate provider field in cloud-init variable validation
Fixes intermittent devcontainer build failures caused by containers using archive.ubuntu.com instead of Hetzner's local mirror, combined with no build timeout causing 30min+ hangs during Ubuntu repo outages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…leak)
- CRITICAL: Eliminate sh -c outer shell in injectAptMirrorConfig — use
exec.Command("docker","exec",...) with containerID as a direct argument
to prevent shell injection. Mirror hostname is resolved in Go and
validated with a hostname regex before use.
- HIGH: Call buildCancel() immediately after cmd.CombinedOutput() returns
instead of relying solely on defer, releasing the timer goroutine before
the potentially long fallback build runs.
- MEDIUM: Log findDevcontainerID failure at slog.Debug level at both call
sites so silent skips are diagnosable.
- MEDIUM: Add // Non-fatal comment to second call site for consistency.
- LOW: Downgrade devcontainerBuildContext log from Info to Debug.
- LOW: Document zero-means-disabled contract in devcontainerBuildContext.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
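A minimal sketch of the shell-free pattern the CRITICAL item describes, not the PR's actual code. The helper name mirrorExecArgs, the hostname regex, the sed expression, and the /etc/apt/sources.list path are all assumptions made for illustration; only exec.Command("docker", "exec", ...) with the container ID as a direct argument and the hostname validation step come from the commit message.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// hostnameRe accepts dot-separated DNS labels made of letters, digits and hyphens.
// Illustrative only; the regex used in the PR is not shown in this conversation.
var hostnameRe = regexp.MustCompile(
	`^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$`)

// mirrorExecArgs (hypothetical helper) builds the argv for `docker exec`.
// Every value is a separate argument, so no shell ever parses containerID
// or mirrorHost.
func mirrorExecArgs(containerID, mirrorHost string) ([]string, error) {
	if !hostnameRe.MatchString(mirrorHost) {
		return nil, fmt.Errorf("invalid mirror hostname %q", mirrorHost)
	}
	// The validated hostname cannot contain '|', so it is safe inside the
	// sed expression. The target file is an assumption; newer Ubuntu images
	// keep their sources elsewhere.
	sedExpr := "s|archive.ubuntu.com|" + mirrorHost + "|g"
	return []string{"exec", containerID, "sed", "-i", "-e", sedExpr, "/etc/apt/sources.list"}, nil
}

func main() {
	args, err := mirrorExecArgs("abc123", "mirror.hetzner.com")
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	// Build the command without running it; cmd.Run() would execute it.
	cmd := exec.Command("docker", args...)
	fmt.Println(cmd.Args)
}
```

Because the container ID travels as its own argv element, a value such as `$(reboot)` is passed to docker literally instead of being expanded by a shell.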
…silience
Apt retry config (Acquire::Retries, timeouts) is now injected into ALL containers regardless of provider, not just Hetzner. Mirror replacement remains provider-specific. Adds TestInjectAptRetryConfig test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
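For illustration, the retry settings could be rendered as an apt.conf fragment along these lines. This is a sketch under assumptions: renderAptRetryConf is a hypothetical name, the values 3 and 30 are the host-side numbers from the earlier commit (the container values are not stated here), and the drop-in path mentioned in the comment is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// renderAptRetryConf (hypothetical helper) returns apt.conf content that
// retries failed downloads and bounds HTTP/HTTPS connection timeouts.
// Such a fragment would typically be written to a drop-in file such as
// /etc/apt/apt.conf.d/80-retries (path assumed, not taken from the PR).
func renderAptRetryConf(retries, timeoutSec int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Acquire::Retries \"%d\";\n", retries)
	fmt.Fprintf(&b, "Acquire::http::Timeout \"%d\";\n", timeoutSec)
	fmt.Fprintf(&b, "Acquire::https::Timeout \"%d\";\n", timeoutSec)
	return b.String()
}

func main() {
	fmt.Print(renderAptRetryConf(3, 30))
}
```

Keeping the fragment free of any mirror hostname is what lets it apply to every provider, while mirror replacement stays a separate, provider-specific step.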
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Summary
- archive.ubuntu.com being slow/unreachable from Hetzner containers (discovered during Ubuntu DDoS/outage on 2026-05-05)
- archive.ubuntu.com instead of the provider's local mirror (mirror.hetzner.com), causing timeouts through Docker bridge NAT
Validation
- pnpm lint
- pnpm typecheck
- pnpm test (139 cloud-init tests, all Go tests pass)
- pnpm build
Staging Verification (REQUIRED for all code changes, merge-blocking)
- app.sammy.party, dashboard renders, navigation works
Staging Verification Evidence
UI Compliance Checklist (Required for UI changes)
N/A: no UI changes
End-to-End Verification (Required for multi-component changes)
Data Flow Trace
- apps/api/src/services/nodes.ts: provisionNode() passes provider: targetProvider to generateCloudInit()
- packages/cloud-init/src/generate.ts: generateCloudInit() substitutes {{ provider }} in template
- Environment=PROVIDER=hetzner
- packages/vm-agent/internal/config/config.go: Load() parses PROVIDER and DEVCONTAINER_BUILD_TIMEOUT
- bootstrap.go: bootstrapWorkspace() finds container, calls injectAptRetryConfig() then injectAptMirrorConfig()
- devcontainerBuildContext() wraps build command with 15m deadline
Untested Gaps
Full VM provisioning flow cannot be automated without Hetzner credentials on staging. The individual components are proven via unit/integration tests.
Specialist Review Evidence
Exceptions (If any)
Agent Preflight (Required)
Classification
External References
Codebase Impact Analysis
- packages/cloud-init/: template, generation, types, validation
- packages/vm-agent/: config, bootstrap (container setup)
- apps/api/: node provisioning (passes provider field)
Documentation & Specs
N/A: internal infrastructure resilience improvements, no user-facing doc changes needed.
Constitution & Risk Check
DEVCONTAINER_BUILD_TIMEOUT, provider from env var PROVIDER.