Stabilize WPT Server Infrastructure (Bridge Networking, Port Alignment & State Migration) #84
…e to GCS

Hardened the WPT server environment by removing root execution and sudo dependencies. Migrated Terraform state to GCS and removed legacy Container VM agent overrides.

Infrastructure (Terraform):
- Created GCS bucket `gs://wpt-live-app-tfstate` with versioning for state storage.
- Switched instance templates to use a dedicated Service Account (`wpt-tot-app-sa`).
- Added IAM role `roles/storage.objectViewer` to the SA for certificate retrieval.
- Migrated local state to `gs://wpt-live-app-tfstate/terraform/state`.
- Retired legacy Container VM agent in favor of standard cloud-init.

Web Server (Docker/Supervisor):
- Created `wpt-server` (UID 1000) and `wpt-sync` (UID 1001) users.
- Granted `CAP_NET_BIND_SERVICE` to python3.10 to allow unprivileged binding to ports 80/443.
- Shifted all file paths and `git` operations inside `/home/wpt-sync` instead of `/root`.
- Unified supervisord process execution without `sudo`.
DanielRyanSmith
left a comment
Gemini findings (I'm not certain of their reliability):
1. Logic Error: Unnecessary Privilege Grants & Sysctl Changes
The commit message states that CAP_NET_BIND_SERVICE was granted to python3.10 (in wpt-server-tot.Dockerfile) and sysctl net.ipv4.ip_unprivileged_port_start=80 was added (in cloud-init.yaml) to allow unprivileged binding to ports 80 and 443.
This is unnecessary. Because the deployment uses Docker's bridge networking with port mapping (-p 80:8000 -p 443:8443), the Docker daemon (which runs as root on the host) is what actually binds to the privileged ports 80 and 443. The Python WPT process inside the container only ever binds to 8000 and 8443 (as defined in wpt-config.json.template), which are unprivileged ports. The setcap and sysctl commands should be entirely removed as they solve a problem that doesn't exist under this architecture.
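A minimal sketch of the reasoning in finding 1, assuming the usual Linux default where ports below 1024 are privileged. The helper names (`parse_mapping`, `needs_bind_privilege`) are hypothetical and do not come from the PR; only the two mappings are taken from the comment above.

```python
# Ports below this value are privileged by the usual Linux default
# (net.ipv4.ip_unprivileged_port_start = 1024). Assumed here, not taken from the PR.
PRIVILEGED_PORT_LIMIT = 1024


def parse_mapping(mapping):
    """Split a Docker '-p HOST:CONTAINER' value into (host_port, container_port)."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def needs_bind_privilege(port):
    """Return True if binding this port requires elevated privileges."""
    return port < PRIVILEGED_PORT_LIMIT


# The two mappings quoted in the review comment.
for mapping in ("80:8000", "443:8443"):
    host_port, container_port = parse_mapping(mapping)
    print(
        f"host {host_port}: privileged={needs_bind_privilege(host_port)}, "
        f"container {container_port}: privileged={needs_bind_privilege(container_port)}"
    )
```

Under that assumption, only the host side of each mapping is privileged, which is the side the Docker daemon binds.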
2. Security/Logic Flaw: World-Readable Private Keys
In src/fetch-certs.py, the code introduces a change that explicitly makes the SSL private key world-readable:
os.chmod('{}/fullchain.pem'.format(outdir), 0o644)
os.chmod('{}/privkey.pem'.format(outdir), 0o644)
Private keys (privkey.pem) should never be 0644. Since the wpt-server user was properly added to the wpt-sync group in the Dockerfile, the permissions only need to be 0640 (read access for owner and group) to allow the server to read the certificate without making it world-readable.
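One way the suggested fix could look, as a sketch rather than the actual change made to `src/fetch-certs.py`. The function name `set_cert_permissions` is hypothetical, and the temporary directory only stands in for the real `outdir`.

```python
import os
import stat
import tempfile


def set_cert_permissions(outdir):
    """Keep the certificate chain public but limit the key to owner and group."""
    os.chmod(os.path.join(outdir, "fullchain.pem"), 0o644)
    # 0o640: owner read/write, group read, no access for others.
    os.chmod(os.path.join(outdir, "privkey.pem"), 0o640)


def file_mode(path):
    """Return only the permission bits of a file."""
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as outdir:
    for name in ("fullchain.pem", "privkey.pem"):
        open(os.path.join(outdir, name), "w").close()
    set_cert_permissions(outdir)
    for name in ("fullchain.pem", "privkey.pem"):
        print(name, oct(file_mode(os.path.join(outdir, name))))
```

With group read access, a server process whose user belongs to the key owner's group can still load the key.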
This should be good to go again. @DanielRyanSmith
DanielRyanSmith
left a comment
Thanks for taking care of this 😊
This change stabilizes the WPT server cloud-native deployment by resolving persistent health check failures, unprivileged port binding conflicts, and migrating state to GCS.
Key Changes
WPT Server Deployment (cloud-init & Bridge Networking)
- Switched from `--net=host` to bridge networking: Removed `--net=host` and added explicit port mappings (`-p 80:8000`, `-p 443:8443`, etc.) to let the Docker daemon (running as root on the host) bind to privileged ports and forward traffic to the unprivileged WPT process inside the container.
- Removed root execution and `sudo` dependencies.
- Created `wpt-server` (UID 1000) and `wpt-sync` (UID 1001) users.
- Shifted all file paths and `git` operations inside home directories instead of `/root`.
- Unified `supervisord` process execution without `sudo`.
- Set `server_host: "0.0.0.0"` in `wpt-config.json.template` to force WPT to listen on all interfaces and ignore hostname resolution issues.
- Added `autorestart=true` in `supervisord.conf` and `RestartSec=5s` in systemd to allow WPT to recover from transient failures after reboots.

Infrastructure Alignment & Load Balancing
- Updated `variables.tf` to match what WPT actually uses internally for secondary services:
  - `ws` on port `8001`
  - `wss` on port `8002`
  - `http2` on port `8003`
- Updated `instance_group_manager` named ports to use standard bridge networking.

Decommission Legacy Instances
- Removed the `container-startup-agent` instances (`wpt_servers` manager and associated template) from `compute.tf` since the new `cloud-init` instances are now fully `HEALTHY`.

Terraform State Migration
- Configured a `gcs` remote backend in `versions.tf` to store state in `gs://wpt-live-app-tfstate/terraform/state/default.tfstate`.
- The local `terraform.tfstate` has been deleted (and ignored in `.gitignore`).

Verification Results
Running `gcloud compute target-pools get-health wpt-tot-app-load-balancing` shows that all traffic is flowing correctly to the modern cloud-init instances.
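The port alignment described under Infrastructure Alignment & Load Balancing comes down to keeping the load balancer's named ports in agreement with the ports WPT listens on. A rough sketch of such a consistency check follows. The three service ports are the ones listed in this description; the function name `find_mismatches` and the sample `named_ports` value are hypothetical and do not reflect the actual contents of `variables.tf`.

```python
# Ports WPT uses internally for secondary services, as listed in this PR description.
WPT_INTERNAL_PORTS = {"ws": 8001, "wss": 8002, "http2": 8003}


def find_mismatches(named_ports, expected=WPT_INTERNAL_PORTS):
    """Return {service: (expected_port, actual_port)} for every disagreement.

    A service missing from named_ports is reported with actual_port None.
    """
    mismatches = {}
    for service, port in expected.items():
        actual = named_ports.get(service)
        if actual != port:
            mismatches[service] = (port, actual)
    return mismatches


# Hypothetical named-port set that forgot the http2 entry.
named_ports = {"ws": 8001, "wss": 8002}
print(find_mismatches(named_ports))
```

An empty result means the named ports and the WPT configuration agree.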