From dd675bc2855963e06fbe8f7e79c18a1fd77cd4c9 Mon Sep 17 00:00:00 2001
From: Boidushya
Date: Mon, 4 May 2026 20:13:13 +0530
Subject: [PATCH] infra: swap infisical for self-hosted keep

---
 infra/README.md                               | 38 ++++++++++------
 infra/bootstrap.sh                            |  3 +-
 infra/files/caddy/Caddyfile                   |  8 ++++
 infra/files/infisical/agent.yaml              | 23 ----------
 infra/files/infisical/infisical-agent.service | 17 -------
 infra/files/infisical/lyrics-api-reload       | 29 ------------
 infra/files/infisical/lyrics-api.env.tpl      |  5 ---
 infra/files/keep/keep.service                 | 25 +++++++++++
 infra/phases/03-caddy.sh                      |  1 +
 infra/phases/05-infisical.sh                  | 45 -------------------
 infra/phases/05-keep.sh                       | 33 ++++++++++++++
 infra/secrets.env.example                     |  7 +--
 12 files changed, 93 insertions(+), 141 deletions(-)
 delete mode 100644 infra/files/infisical/agent.yaml
 delete mode 100644 infra/files/infisical/infisical-agent.service
 delete mode 100755 infra/files/infisical/lyrics-api-reload
 delete mode 100644 infra/files/infisical/lyrics-api.env.tpl
 create mode 100644 infra/files/keep/keep.service
 delete mode 100755 infra/phases/05-infisical.sh
 create mode 100755 infra/phases/05-keep.sh

diff --git a/infra/README.md b/infra/README.md
index 0cfb54c..02a7852 100644
--- a/infra/README.md
+++ b/infra/README.md
@@ -9,7 +9,8 @@ Everything needed to stand up the server lives here. If the Hetzner box dies or
 | `caddy` | Reverse proxy with TLS via Cloudflare DNS-01 | 80/443 |
 | `lyrics-api` | The Go API | localhost:8080 (proxied) |
 | `lyrics-api@.service` | Per-PR preview environments | localhost:9000+PR |
-| `infisical-agent` | Syncs prod secrets from Infisical Cloud into `/etc/lyrics-api.env` and restarts the API on change | n/a |
+| `keep` | Self-hosted secrets manager (the prod source of truth) | localhost:4339 (proxied) |
+| `keep-agent-lyrics-api-prod.timer` | Pulls prod secrets from keep into `/etc/lyrics-api.env` every 60s and restarts the API on change | n/a |
 | `beszel-agent` | System metrics, reports to a hub | localhost:45876 |
 | `logdy` | Browser log viewer for `lyrics-api` journal | localhost:8888 (proxied) |
 | Backup scripts | Daily `cache.db` dump, off-site upload to Backblaze B2 | cron |
@@ -25,11 +26,13 @@ Off the box, before you run bootstrap:
 - DNS records pre-pointed at the box (see "Manual steps" below)
 - A Cloudflare API token scoped to Zone:DNS:Edit on the parent zone of your domains
 - A Backblaze B2 application key with read+write on the backups bucket
-- An Infisical Cloud machine identity (Universal Auth) with read access to the `prod` env
 - A Beszel hub somewhere reachable, with an agent KEY/TOKEN pair generated for this host
 - `secrets.env` populated next to `bootstrap.sh` (template: `secrets.env.example`)
 
-The compiled `lyrics-api-go` binary is optional at bootstrap time. Phase 04 installs the systemd unit either way and waits to start it until both the binary and `/etc/lyrics-api.env` exist.
+Two binaries are optional at bootstrap time and installed via deferred-start:
+
+- **`lyrics-api-go`**. Phase 04 installs the systemd unit and waits to start it until both the binary at `/opt/lyrics-api/lyrics-api-go` and `/etc/lyrics-api.env` exist.
+- **`keep`**. Phase 05 installs the systemd unit, the data dir, and the `keep` system user. Build keep from the sibling repo (see [`SELF_HOSTING.md`](https://github.com/boidushya/keep/blob/main/SELF_HOSTING.md)), `scp` the binary to `/usr/local/bin/keep`, then `systemctl enable --now keep` to bring it up.
 
 ## Running
 
@@ -48,9 +51,11 @@ Logs go to `/var/log/bli-bootstrap.log`. Phases are idempotent: re-running recon
 These happen outside the box and stay manual:
 
 - **Provision the Hetzner instance** with `hcloud server create --type cax21 --image ubuntu-24.04 --location hel1 ...`
-- **DNS records in Cloudflare** for the four hostnames in `secrets.env` plus the preview wildcard, all proxied (orange cloud)
-- **Infisical secrets** in the project's `prod` env. The agent only syncs; it does not create.
+- **DNS records in Cloudflare** for the five hostnames in `secrets.env` (primary, staging, logs, metrics, keep) plus the preview wildcard, all proxied (orange cloud)
 - **Beszel hub** running somewhere reachable, with an agent slot for this host. The hub UI hands you the KEY/TOKEN pair for `secrets.env`.
+- **keep first-run setup**. After phase 05 puts keep up at `https://$KEEP_DOMAIN`, browse to it and complete `/setup`: pick a master password (save it to a password manager), scan TOTP, save the 8 recovery codes offline. Then create project `lyrics-api`, env `prod`, bulk-import the env via the .env paste UI.
+- **keep agent token for lyrics-api**. From keep's UI, mint a token for `lyrics-api/prod` with `OUTPUT=/etc/lyrics-api.env`, `RELOAD_CMD="systemctl restart lyrics-api"`, and `REQUIRED_KEYS` set to every key in your env. Paste the bootstrap install command keep generates in a root shell on the box.
+- **Post-reboot unseal**. keep restarts sealed every time the host boots; SSH in and log into `https://$KEEP_DOMAIN` once to unseal, otherwise the keep-agent stays stuck on `503` and `/etc/lyrics-api.env` will not refresh. `lyrics-api` keeps running on the last-good env, so this is a "secrets won't roll until you log in" issue, not an outage.
 - **`cache.db` restore** from B2 if you're rebuilding after a loss. Separate process: `rclone copy b2:lyrics-api-backups/daily/ /var/lib/lyrics-api/data/cache.db`.
 
 ## Deploying the lyrics-api binary
@@ -60,32 +65,35 @@ The IaC installs the systemd unit but does not ship the Go binary. Two ways to g
 1. **From CI** (the path used in prod): GitHub Actions builds, `scp`s to `/opt/lyrics-api/lyrics-api-go`, then `systemctl restart lyrics-api`.
 2. **From source on the box**: `git clone`, `go build -o /opt/lyrics-api/lyrics-api-go .`, `chown deploy:deploy`, `systemctl restart lyrics-api`.
 
-Either way, `infisical-agent` writes `/etc/lyrics-api.env` once it starts, which is what unblocks the first `lyrics-api` start.
+Either way, the keep-agent timer writes `/etc/lyrics-api.env` once keep is unsealed and the agent token is valid. That write is what unblocks the first `lyrics-api` start.
 
 ## Security model
 
-Most secrets live in Infisical Cloud and sync read-only to the box. The exceptions, all kept off the world-readable systemd config:
+Prod secrets live in keep's encrypted SQLite (`/var/lib/keep/keep.db`). Each value is age-encrypted under a master key wrapped by your Argon2id-derived master password. keep starts sealed; you unseal it from the web UI with your password + TOTP. After that, the keep-agent on this host pulls `/render`, checks `REQUIRED_KEYS`, atomically swaps `/etc/lyrics-api.env`, and restarts `lyrics-api`.
+
+The other secrets, all kept off the world-readable systemd config:
 
 - `CF_API_TOKEN` is in `/etc/caddy.env` (mode 600, root:caddy)
 - `B2_*` is in `/home/deploy/.config/rclone/rclone.conf` (mode 600, deploy:deploy)
 - `LOGDY_UI_PASS` is in `/etc/logdy.env` (mode 640, root:deploy)
-- The Infisical `client-secret` is at `/etc/infisical-agent/client-secret` (mode 600, root)
+- The keep agent token sits inside `/usr/local/bin/keep-agent-lyrics-api-prod.sh` (mode 755, root) as a quoted bash variable. Anyone with shell on the box can read the agent token; the blast radius is read-only access to the prod env in keep.
 
 `BESZEL_AGENT_TOKEN` is the one wart: it sits in `Environment=` lines on a mode-644 unit, since that's how the upstream installer ships it. The blast radius if leaked is impersonating the agent to the hub, which sends fake metrics but does not grant credentials back. The same EnvironmentFile pattern Caddy uses would close it; it is not done yet.
 
-`lyrics-api.service` itself runs as `deploy` with `ProtectSystem=strict`, `ProtectHome=true`, `PrivateTmp=true`, `NoNewPrivileges=true`. UFW restricts inbound traffic to 22/80/443. `fail2ban` watches `sshd`.
+`lyrics-api.service` runs as `deploy` with `ProtectSystem=strict`, `ProtectHome=true`, `PrivateTmp=true`, `NoNewPrivileges=true`. `keep.service` runs as a dedicated `keep` user with the same hardening. UFW restricts inbound traffic to 22/80/443. `fail2ban` watches `sshd`.
 
 ## Verification after bootstrap
 
 Substitute your own hostnames from `secrets.env` for `$PRIMARY_DOMAIN` and `$LOGS_DOMAIN`.
 
 ```bash
-systemctl is-active caddy lyrics-api infisical-agent beszel-agent logdy fail2ban
+systemctl is-active caddy lyrics-api keep keep-agent-lyrics-api-prod.timer beszel-agent logdy fail2ban
 ls -l /etc/caddy.env                     # -rw------- root caddy
 sudo -u nobody cat /etc/caddy.env        # permission denied
 curl -sI https://$PRIMARY_DOMAIN/health  # 200
 curl -sI https://$LOGS_DOMAIN/           # 200 (then 401 on actual UI without auth)
-journalctl -u infisical-agent -n 20      # successful auth + sync
+curl -sI https://$KEEP_DOMAIN/           # 405 (keep only allows GET on /), proves TLS+proxy
+journalctl -u keep-agent-lyrics-api-prod.service -n 20  # cycles every 60s, "[keep-agent] reloaded" only when secrets change
 ```
 
 ## Selective rebuild scenarios
@@ -97,13 +105,15 @@ journalctl -u infisical-agent -n 20 # successful auth + sync
 | Backup schedule changed | edit `files/backups/crontab.fragment`, then `sudo ./bootstrap.sh --phase 08` |
 | Logdy version bump | bump `LOGDY_VERSION` in `secrets.env`, then `sudo ./bootstrap.sh --phase 07` |
 | Beszel agent token rotated | update `BESZEL_AGENT_TOKEN` in `secrets.env`, then `sudo ./bootstrap.sh --phase 06` |
+| keep binary upgraded | rebuild from sibling repo, scp to `/usr/local/bin/keep`, then `sudo systemctl restart keep` (and unseal via UI) |
+| keep agent token rotated | revoke old token + mint new one in keep UI, paste the new install command on the box |
 
 ## What's intentionally not here
 
 - **Provisioning** (`hcloud server create`). One command, varies per provider, not worth scripting.
 - **DNS records**. Lives in Cloudflare; the UI is fine.
-- **The `lyrics-api-go` binary**. Shipped from CI, not infra.
-- **A self-hosted Infisical instance**. 600MB resident is more than the project warrants.
+- **The `lyrics-api-go` and `keep` binaries**. Both shipped via build + scp, not infra. Keep is a sibling repo at `github.com/boidushya/keep`.
+- **keep first-run setup and token minting**. Master password, TOTP, recovery codes, agent token bootstrap; all interactive in the keep UI by design.
 - **`cache.db` restoration**. Separate runbook, depends on which B2 snapshot you want.
 
 ## GitHub Actions configuration
@@ -124,4 +134,4 @@ Repository **variables** (Settings > Secrets and variables > Actions > Variables
 1. Read `/var/log/bli-bootstrap.log` for the failing phase
 2. Re-run just that phase with `--phase NN`
 3. If the failure is upstream (apt repo down, a GitHub release moved), the phase script is the source of truth. Open it, fix it, re-run.
-4. For infisical-agent issues, `journalctl -t infisical-agent -n 50` shows the reload script's logger output.
+4. For keep-agent issues, `journalctl -u keep-agent-lyrics-api-prod.service -n 50` shows curl exit codes and the `[keep-agent]` log line on rewrite. A `503` body in the curl output means keep is sealed and you need to unseal it via the UI.
diff --git a/infra/bootstrap.sh b/infra/bootstrap.sh
index eaf7cbd..bf33dbf 100755
--- a/infra/bootstrap.sh
+++ b/infra/bootstrap.sh
@@ -27,11 +27,10 @@ set -a; source "$SECRETS"; set +a
 
 REQUIRED_VARS=(
   CF_API_TOKEN ACME_EMAIL
-  INFISICAL_CLIENT_ID INFISICAL_CLIENT_SECRET INFISICAL_PROJECT_ID INFISICAL_ENV
   B2_KEY_ID B2_APP_KEY B2_BUCKET
   BESZEL_HUB_URL BESZEL_AGENT_KEY BESZEL_AGENT_TOKEN BESZEL_AGENT_PORT
   LOGDY_UI_PASS LOGDY_VERSION
-  PUBLIC_IP PRIMARY_DOMAIN STAGING_DOMAIN LOGS_DOMAIN METRICS_DOMAIN PREVIEW_WILDCARD
+  PUBLIC_IP PRIMARY_DOMAIN STAGING_DOMAIN LOGS_DOMAIN METRICS_DOMAIN KEEP_DOMAIN PREVIEW_WILDCARD
 )
 missing=()
 for v in "${REQUIRED_VARS[@]}"; do
diff --git a/infra/files/caddy/Caddyfile b/infra/files/caddy/Caddyfile
index 952bfe6..fa4822b 100644
--- a/infra/files/caddy/Caddyfile
+++ b/infra/files/caddy/Caddyfile
@@ -48,6 +48,14 @@ __LOGS_DOMAIN__ {
     reverse_proxy localhost:8888
 }
 
+__KEEP_DOMAIN__ {
+    tls {
+        dns cloudflare {env.CF_API_TOKEN}
+    }
+    import security_headers
+    reverse_proxy localhost:4339
+}
+
 __PREVIEW_WILDCARD__ {
     tls {
         dns cloudflare {env.CF_API_TOKEN}
diff --git a/infra/files/infisical/agent.yaml b/infra/files/infisical/agent.yaml
deleted file mode 100644
index f6390a6..0000000
--- a/infra/files/infisical/agent.yaml
+++ /dev/null
@@ -1,23 +0,0 @@
-infisical:
-  address: "https://app.infisical.com"
-
-auth:
-  type: "universal-auth"
-  config:
-    client-id: "/etc/infisical-agent/client-id"
-    client-secret: "/etc/infisical-agent/client-secret"
-    remove_client_secret_on_read: false
-
-sinks:
-  - type: "file"
-    config:
-      path: "/etc/infisical-agent/access-token"
-
-templates:
-  - source-path: "/etc/infisical-agent/lyrics-api.env.tpl"
-    destination-path: "/etc/lyrics-api.env.staging"
-    config:
-      polling-interval: 60s
-      execute:
-        timeout: 30
-        command: "/usr/local/bin/lyrics-api-reload"
diff --git a/infra/files/infisical/infisical-agent.service b/infra/files/infisical/infisical-agent.service
deleted file mode 100644
index 774ea9d..0000000
--- a/infra/files/infisical/infisical-agent.service
+++ /dev/null
@@ -1,17 +0,0 @@
-[Unit]
-Description=Infisical Agent (secrets sync)
-After=network-online.target
-Wants=network-online.target
-
-[Service]
-Type=simple
-User=root
-ExecStart=/usr/bin/infisical agent --config /etc/infisical-agent/agent.yaml
-Restart=on-failure
-RestartSec=10
-MemoryMax=256M
-NoNewPrivileges=true
-PrivateTmp=true
-
-[Install]
-WantedBy=multi-user.target
diff --git a/infra/files/infisical/lyrics-api-reload b/infra/files/infisical/lyrics-api-reload
deleted file mode 100755
index a84495c..0000000
--- a/infra/files/infisical/lyrics-api-reload
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/bin/bash
-set -euo pipefail
-
-STAGING="/etc/lyrics-api.env.staging"
-REAL="/etc/lyrics-api.env"
-
-if [ ! -s "$STAGING" ]; then
-  logger -t infisical-agent "ERROR: staged env is empty, aborting"
-  exit 1
-fi
-
-if ! grep -q '^TTML_MEDIA_USER_TOKENS=' "$STAGING"; then
-  logger -t infisical-agent "ERROR: TTML_MEDIA_USER_TOKENS missing in staged env, aborting"
-  exit 1
-fi
-
-if [ -f "$REAL" ] && cmp -s "$STAGING" "$REAL"; then
-  logger -t infisical-agent "no change in env, skipping restart"
-  rm -f "$STAGING"
-  exit 0
-fi
-
-[ -f "$REAL" ] && cp "$REAL" "${REAL}.bak"
-mv "$STAGING" "$REAL"
-chmod 640 "$REAL"
-chown root:deploy "$REAL"
-
-systemctl restart lyrics-api
-logger -t infisical-agent "lyrics-api restarted after secret change"
diff --git a/infra/files/infisical/lyrics-api.env.tpl b/infra/files/infisical/lyrics-api.env.tpl
deleted file mode 100644
index d505d2c..0000000
--- a/infra/files/infisical/lyrics-api.env.tpl
+++ /dev/null
@@ -1,5 +0,0 @@
-{{- with listSecrets "__INFISICAL_PROJECT_ID__" "__INFISICAL_ENV__" "/" }}
-{{- range . }}
-{{ .Key }}={{ .Value }}
-{{- end }}
-{{- end }}
diff --git a/infra/files/keep/keep.service b/infra/files/keep/keep.service
new file mode 100644
index 0000000..1ea44ce
--- /dev/null
+++ b/infra/files/keep/keep.service
@@ -0,0 +1,25 @@
+[Unit]
+Description=keep secrets manager
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=keep
+Group=keep
+WorkingDirectory=/var/lib/keep
+ExecStart=/usr/local/bin/keep
+Environment=KEEP_DB_PATH=/var/lib/keep/keep.db
+Environment=KEEP_PUBLIC_URL=https://__KEEP_DOMAIN__
+Environment=KEEP_LISTEN=:4339
+Restart=on-failure
+RestartSec=5
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+PrivateTmp=true
+ReadWritePaths=/var/lib/keep
+MemoryMax=256M
+
+[Install]
+WantedBy=multi-user.target
diff --git a/infra/phases/03-caddy.sh b/infra/phases/03-caddy.sh
index b6c6711..1e8ce4e 100755
--- a/infra/phases/03-caddy.sh
+++ b/infra/phases/03-caddy.sh
@@ -22,6 +22,7 @@ sed -e "s|__ACME_EMAIL__|${ACME_EMAIL}|g" \
     -e "s|__STAGING_DOMAIN__|${STAGING_DOMAIN}|g" \
     -e "s|__LOGS_DOMAIN__|${LOGS_DOMAIN}|g" \
     -e "s|__METRICS_DOMAIN__|${METRICS_DOMAIN}|g" \
+    -e "s|__KEEP_DOMAIN__|${KEEP_DOMAIN}|g" \
     -e "s|__PREVIEW_WILDCARD__|${PREVIEW_WILDCARD}|g" \
     "$INFRA_DIR/files/caddy/Caddyfile" > /etc/caddy/Caddyfile
 chmod 644 /etc/caddy/Caddyfile
diff --git a/infra/phases/05-infisical.sh b/infra/phases/05-infisical.sh
deleted file mode 100755
index e2184d7..0000000
--- a/infra/phases/05-infisical.sh
+++ /dev/null
@@ -1,45 +0,0 @@
-#!/bin/bash
-set -euo pipefail
-
-# Install Infisical CLI from official apt repo
-if ! command -v infisical > /dev/null; then
-  curl -1sLf 'https://artifacts-cli.infisical.com/setup.deb.sh' | bash
-  apt-get install -y infisical
-fi
-
-install -d -m 755 /etc/infisical-agent
-
-# Auth files (mode 600, root only)
-printf '%s' "$INFISICAL_CLIENT_ID" > /etc/infisical-agent/client-id
-printf '%s' "$INFISICAL_CLIENT_SECRET" > /etc/infisical-agent/client-secret
-chmod 600 /etc/infisical-agent/client-id /etc/infisical-agent/client-secret
-chown root:root /etc/infisical-agent/client-id /etc/infisical-agent/client-secret
-
-# Agent config
-install -m 644 -o root -g root "$INFRA_DIR/files/infisical/agent.yaml" /etc/infisical-agent/agent.yaml
-
-# Template - substitute project id + env from secrets.env
-sed -e "s|__INFISICAL_PROJECT_ID__|${INFISICAL_PROJECT_ID}|g" \
-    -e "s|__INFISICAL_ENV__|${INFISICAL_ENV}|g" \
-    "$INFRA_DIR/files/infisical/lyrics-api.env.tpl" > /etc/infisical-agent/lyrics-api.env.tpl
-chmod 644 /etc/infisical-agent/lyrics-api.env.tpl
-
-# Reload script + systemd unit
-install -m 755 -o root -g root \
-  "$INFRA_DIR/files/infisical/lyrics-api-reload" \
-  /usr/local/bin/lyrics-api-reload
-
-install -m 644 -o root -g root \
-  "$INFRA_DIR/files/infisical/infisical-agent.service" \
-  /etc/systemd/system/infisical-agent.service
-
-systemctl daemon-reload
-systemctl enable --now infisical-agent
-
-# Wait briefly for first sync, then trigger reload script if staging file exists
-sleep 5
-if [ -s /etc/lyrics-api.env.staging ]; then
-  /usr/local/bin/lyrics-api-reload || echo "WARN: initial reload failed, check journalctl -t infisical-agent"
-fi
-
-echo "OK: infisical-agent running and synced"
diff --git a/infra/phases/05-keep.sh b/infra/phases/05-keep.sh
new file mode 100755
index 0000000..0fb83bf
--- /dev/null
+++ b/infra/phases/05-keep.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+set -euo pipefail
+
+# keep system user + data dir
+if ! id keep >/dev/null 2>&1; then
+  useradd -r -s /usr/sbin/nologin -d /var/lib/keep keep
+fi
+install -d -o keep -g keep -m 0750 /var/lib/keep
+
+# systemd unit (substitute KEEP_DOMAIN into KEEP_PUBLIC_URL)
+sed -e "s|__KEEP_DOMAIN__|${KEEP_DOMAIN}|g" \
+    "$INFRA_DIR/files/keep/keep.service" > /etc/systemd/system/keep.service
+chmod 644 /etc/systemd/system/keep.service
+systemctl daemon-reload
+
+# Defer start until the binary exists. Build from sibling repo and scp /usr/local/bin/keep separately.
+if [ -x /usr/local/bin/keep ]; then
+  systemctl enable --now keep
+  echo "OK: keep enabled and running"
+else
+  echo "OK: keep unit installed (deferred start: /usr/local/bin/keep not present)"
+  echo "    Build keep, scp to /usr/local/bin/keep on this box, then: systemctl enable --now keep"
+fi
+
+# First-run setup, all in the keep UI:
+#   1. Browse to https://${KEEP_DOMAIN}/setup. Set master password, scan TOTP, save the 8 recovery codes.
+#   2. Create project lyrics-api / env prod, bulk-import secrets via the .env paste UI.
+#   3. Mint a token for lyrics-api/prod with OUTPUT=/etc/lyrics-api.env, RELOAD_CMD="systemctl restart lyrics-api",
+#      and REQUIRED_KEYS = every key currently in /etc/lyrics-api.env.
+#   4. Paste the bootstrap install command keep generates in a root shell on this box.
+#
+# Reminder: keep starts sealed on every host boot. Log into the UI once after a reboot to unseal it,
+# otherwise the keep-agent gets 503s and /etc/lyrics-api.env stops refreshing.
diff --git a/infra/secrets.env.example b/infra/secrets.env.example
index 5ca4501..612fbf3 100644
--- a/infra/secrets.env.example
+++ b/infra/secrets.env.example
@@ -6,12 +6,6 @@
 CF_API_TOKEN=
 ACME_EMAIL=
 
-# === Infisical (machine identity for agent that syncs /etc/lyrics-api.env) ===
-INFISICAL_CLIENT_ID=
-INFISICAL_CLIENT_SECRET=
-INFISICAL_PROJECT_ID=
-INFISICAL_ENV=prod
-
 # === Backblaze B2 (rclone remote for off-site backups) ===
 B2_KEY_ID=
 B2_APP_KEY=
@@ -32,6 +26,7 @@ PRIMARY_DOMAIN=
 STAGING_DOMAIN=
 LOGS_DOMAIN=
 METRICS_DOMAIN=
+KEEP_DOMAIN=
 PREVIEW_WILDCARD=
 
 # === Pinned versions (bump deliberately, not silently) ===
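
The README in this patch describes the keep-agent cycle as: pull `/render`, check `REQUIRED_KEYS`, atomically swap `/etc/lyrics-api.env`, and restart `lyrics-api` only when secrets change. The script keep generates is not part of this patch, so the following is only a minimal sketch of the check-and-swap half of that cycle. The function names `has_required_keys` and `swap_if_changed` and the comma-separated key list are assumptions made for illustration, not keep's actual interface.

```bash
#!/bin/bash
# Hypothetical sketch: these helpers are NOT taken from keep's generated agent script.

# Succeeds only if every key in the comma-separated list $2 has a KEY= line in file $1.
has_required_keys() {
  local file="$1" keys="$2" key
  local IFS=','
  for key in $keys; do
    grep -q "^${key}=" "$file" || return 1
  done
}

# Replaces $2 with $1 when their contents differ.
# Returns 0 if a swap happened, 1 if the staged file was identical (and is discarded).
swap_if_changed() {
  local staged="$1" live="$2"
  if [ -f "$live" ] && cmp -s "$staged" "$live"; then
    rm -f "$staged"
    return 1
  fi
  # mv is a single rename(2) when both paths are on the same filesystem,
  # so readers see either the old file or the new one, never a partial write.
  mv -f "$staged" "$live"
}
```

A cycle built on these would write the fetched env to a staging file in the same directory as the target, refuse to continue if `has_required_keys` fails, and then run something like `swap_if_changed "$staged" /etc/lyrics-api.env && systemctl restart lyrics-api`, so the restart happens only on a real change. This mirrors the empty-file, required-key, and `cmp -s` guards of the removed `lyrics-api-reload` script.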