
Stabilize WPT Server Infrastructure (Bridge Networking, Port Alignment & State Migration) #84

Merged
jcscottiii merged 3 commits into master from jcscottiii/fix-cert-3
Apr 3, 2026

@jcscottiii
Collaborator

@jcscottiii jcscottiii commented Mar 30, 2026

This change stabilizes the WPT server cloud-native deployment by resolving persistent health check failures and unprivileged port binding conflicts, and by migrating Terraform state to GCS.

Key Changes

WPT Server Deployment (cloud-init & Bridge Networking)

  • Reverted from --net=host to bridge networking: Removed --net=host and added explicit port mappings (-p 80:8000, -p 443:8443, etc.). The Docker daemon, which runs as root on the host, binds the privileged ports and forwards traffic to the unprivileged WPT process inside the container.
  • Security Hardening (Unprivileged Execution):
    • Hardened the WPT server environment by removing root execution and sudo dependencies.
    • Created dedicated wpt-server (UID 1000) and wpt-sync (UID 1001) users.
    • Moved all file paths and git operations into the users' home directories instead of /root.
    • Unified supervisord process execution without sudo.
  • Fixed binding failures: Set server_host: "0.0.0.0" in wpt-config.json.template so that WPT listens on all interfaces and no longer depends on hostname resolution.
  • Improved recovery: Added autorestart=true in supervisord.conf and RestartSec=5s in systemd to allow WPT to recover from transient failures after reboots.
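
As a rough illustration of why binding to "0.0.0.0" avoids hostname resolution, the sketch below opens a listening socket on all IPv4 interfaces without performing any name lookup. The function name `bind_all_interfaces` is hypothetical, and this is not WPT's own server code; it only demonstrates the socket-level behaviour that the `server_host` setting relies on.

```python
import socket


def bind_all_interfaces(port: int = 0) -> socket.socket:
    """Open a listening TCP socket on every IPv4 interface.

    Binding to the literal address "0.0.0.0" needs no hostname lookup,
    unlike binding to a name. port=0 asks the OS to pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen()
    return sock


if __name__ == "__main__":
    with bind_all_interfaces() as s:
        host, port = s.getsockname()
        print(f"listening on {host}:{port}")
```

Inside the container, a socket bound this way is reachable through Docker's bridge port mappings regardless of what the container's hostname resolves to.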

Infrastructure Alignment & Load Balancing

  • Aligned ports with WPT configuration: Updated variables.tf to match what WPT actually uses internally for secondary services:
    • ws on port 8001
    • wss on port 8002
    • http2 on port 8003
  • Modernized the instance_group_manager named ports to use standard bridge networking.
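
A minimal sketch of the alignment check described above, assuming the named ports and the WPT ports are each available as a simple name-to-port mapping. The port values come from this description; `WPT_SECONDARY_PORTS` and `find_port_mismatches` are hypothetical names and do not exist in the repository.

```python
from typing import Dict, Optional, Tuple

# Secondary service ports as listed in the PR description.
WPT_SECONDARY_PORTS: Dict[str, int] = {"ws": 8001, "wss": 8002, "http2": 8003}


def find_port_mismatches(
    named_ports: Dict[str, int],
    wpt_ports: Dict[str, int],
) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {service: (named_port, wpt_port)} for every service whose
    named port is missing or differs from the port WPT uses."""
    return {
        name: (named_ports.get(name), port)
        for name, port in wpt_ports.items()
        if named_ports.get(name) != port
    }


if __name__ == "__main__":
    # A deliberately stale mapping to show what a mismatch looks like.
    stale = {"ws": 8001, "wss": 8443}
    print(find_port_mismatches(stale, WPT_SECONDARY_PORTS))
```

An empty result means the load balancer's named ports and the WPT configuration agree.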

Decommission Legacy Instances

  • Removed the deprecated container-startup-agent instances (wpt_servers manager and associated template) from compute.tf since the new cloud-init instances are now fully HEALTHY.

Terraform State Migration

  • Configured the gcs remote backend in versions.tf to store state in gs://wpt-live-app-tfstate/terraform/state/default.tfstate.
  • The local terraform.tfstate has been deleted and added to .gitignore.

Verification Results

Running gcloud compute target-pools get-health wpt-tot-app-load-balancing shows:

healthStatus:
- healthState: HEALTHY
  instance: wpt-tot-app-cloud-init-n8l1
  ports: 80, 443, 8000, 8001, 8002, 8003, 8443

All traffic is flowing correctly to the modern cloud-init instances!

…e to GCS

Hardened the WPT server environment by removing root execution and sudo dependencies.
Migrated Terraform state to GCS and removed legacy Container VM agent overrides.

Infrastructure (Terraform):
- Created GCS bucket `gs://wpt-live-app-tfstate` with versioning for state storage.
- Switched instance templates to use a dedicated Service Account (`wpt-tot-app-sa`).
- Added IAM role `roles/storage.objectViewer` to the SA for certificate retrieval.
- Migrated local state to `gs://wpt-live-app-tfstate/terraform/state`.
- Retired legacy Container VM agent in favor of standard cloud-init.

Web Server (Docker/Supervisor):
- Created `wpt-server` (UID 1000) and `wpt-sync` (UID 1001) users.
- Granted `CAP_NET_BIND_SERVICE` to python3.10 to allow unprivileged binding to ports 80/443.
- Shifted all file paths and `git` operations inside `/home/wpt-sync` instead of `/root`.
- Unified supervisord process execution without `sudo`.
Collaborator

@DanielRyanSmith DanielRyanSmith left a comment

Gemini findings (I'm not certain of their reliability):

1. Logic Error: Unnecessary Privilege Grants & Sysctl Changes

The commit message states that CAP_NET_BIND_SERVICE was granted to python3.10 (in wpt-server-tot.Dockerfile) and sysctl net.ipv4.ip_unprivileged_port_start=80 was added (in cloud-init.yaml) to allow unprivileged binding to ports 80 and 443.
This is unnecessary. Because the deployment uses Docker's bridge networking with port mapping (-p 80:8000 -p 443:8443), the Docker daemon (which runs as root on the host) is what actually binds to the privileged ports 80 and 443. The Python WPT process inside the container only ever binds to 8000 and 8443 (as defined in wpt-config.json.template), which are unprivileged ports. The setcap and sysctl commands should be entirely removed as they solve a problem that doesn't exist under this architecture.
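
The reasoning in finding 1 can be sketched as a small check, assuming mappings in the simple `host:container` form used above (Docker also accepts longer forms with an IP or protocol, which this sketch does not handle). The names `parse_port_mapping` and `container_needs_bind_privilege` are hypothetical, and 1024 is the usual Linux default boundary for privileged ports.

```python
from typing import List, Tuple

# Linux default: binding to ports below this value needs extra privilege.
PRIVILEGED_PORT_LIMIT = 1024


def parse_port_mapping(mapping: str) -> Tuple[int, int]:
    """Split a Docker-style "host:container" mapping into two ints."""
    host, container = mapping.split(":")
    return int(host), int(container)


def container_needs_bind_privilege(mappings: List[str]) -> bool:
    """Return True if any container-side port is in the privileged range.

    Only the container-side port matters for the process inside the
    container; the host-side port is bound by the Docker daemon.
    """
    return any(
        parse_port_mapping(m)[1] < PRIVILEGED_PORT_LIMIT for m in mappings
    )


if __name__ == "__main__":
    print(container_needs_bind_privilege(["80:8000", "443:8443"]))
```

With the mappings from this PR the container-side ports are 8000 and 8443, so the check reports that no bind privilege is needed inside the container.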

2. Security/Logic Flaw: World-Readable Private Keys

In src/fetch-certs.py, the code introduces a change that explicitly makes the SSL private key world-readable:

os.chmod('{}/fullchain.pem'.format(outdir), 0o644)
os.chmod('{}/privkey.pem'.format(outdir), 0o644)

Private keys (privkey.pem) should never be 0644. Since the wpt-server user was properly added to the wpt-sync group in the Dockerfile, the permissions only need to be 0640 (read access for owner and group) to allow the server to read the certificate without making it world-readable.
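
A possible shape for the fix, as a sketch only: the helper names `restrict_key_permissions` and `is_world_readable` are hypothetical and not taken from src/fetch-certs.py. Keeping fullchain.pem at 0644 is an assumption based on it holding only public certificates; the suggested 0640 is applied to privkey.pem.

```python
import os
import stat


def restrict_key_permissions(outdir: str) -> None:
    """Apply 0644 to the certificate chain and 0640 to the private key."""
    # The chain contains only public certificates (assumption), so it can
    # stay world-readable.
    os.chmod(os.path.join(outdir, "fullchain.pem"), 0o644)
    # Owner read/write, group read, no access for others.
    os.chmod(os.path.join(outdir, "privkey.pem"), 0o640)


def is_world_readable(path: str) -> bool:
    """Return True if the 'others' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)
```

This relies on the wpt-server user being in the wpt-sync group, as noted above, so that group read access is enough for the server to load the key.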

@jcscottiii
Collaborator Author

This should be good to go again. @DanielRyanSmith

Collaborator

@DanielRyanSmith DanielRyanSmith left a comment

Thanks for taking care of this 😊

@jcscottiii jcscottiii merged commit 8c3d066 into master Apr 3, 2026
1 check passed
@jcscottiii jcscottiii deleted the jcscottiii/fix-cert-3 branch April 3, 2026 18:52