Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 24 additions & 14 deletions infra/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@ Everything needed to stand up the server lives here. If the Hetzner box dies or
| `caddy` | Reverse proxy with TLS via Cloudflare DNS-01 | 80/443 |
| `lyrics-api` | The Go API | localhost:8080 (proxied) |
| `lyrics-api@.service` | Per-PR preview environments | localhost:9000+PR |
| `infisical-agent` | Syncs prod secrets from Infisical Cloud into `/etc/lyrics-api.env` and restarts the API on change | n/a |
| `keep` | Self-hosted secrets manager (the prod source of truth) | localhost:4339 (proxied) |
| `keep-agent-lyrics-api-prod.timer` | Pulls prod secrets from keep into `/etc/lyrics-api.env` every 60s and restarts the API on change | n/a |
| `beszel-agent` | System metrics, reports to a hub | localhost:45876 |
| `logdy` | Browser log viewer for `lyrics-api` journal | localhost:8888 (proxied) |
| Backup scripts | Daily `cache.db` dump, off-site upload to Backblaze B2 | cron |
Expand All @@ -25,11 +26,13 @@ Off the box, before you run bootstrap:
- DNS records pre-pointed at the box (see "Manual steps" below)
- A Cloudflare API token scoped to Zone:DNS:Edit on the parent zone of your domains
- A Backblaze B2 application key with read+write on the backups bucket
- An Infisical Cloud machine identity (Universal Auth) with read access to the `prod` env
- A Beszel hub somewhere reachable, with an agent KEY/TOKEN pair generated for this host
- `secrets.env` populated next to `bootstrap.sh` (template: `secrets.env.example`)

The compiled `lyrics-api-go` binary is optional at bootstrap time. Phase 04 installs the systemd unit either way and waits to start it until both the binary and `/etc/lyrics-api.env` exist.
Two binaries are optional at bootstrap time and installed via deferred-start:

- **`lyrics-api-go`**. Phase 04 installs the systemd unit and waits to start it until both the binary at `/opt/lyrics-api/lyrics-api-go` and `/etc/lyrics-api.env` exist.
- **`keep`**. Phase 05 installs the systemd unit, the data dir, and the `keep` system user. Build keep from the sibling repo (see [`SELF_HOSTING.md`](https://github.com/boidushya/keep/blob/main/SELF_HOSTING.md)), `scp` the binary to `/usr/local/bin/keep`, then `systemctl enable --now keep` to bring it up.

## Running

Expand All @@ -48,9 +51,11 @@ Logs go to `/var/log/bli-bootstrap.log`. Phases are idempotent: re-running recon
These happen outside the box and stay manual:

- **Provision the Hetzner instance** with `hcloud server create --type cax21 --image ubuntu-24.04 --location hel1 ...`
- **DNS records in Cloudflare** for the four hostnames in `secrets.env` plus the preview wildcard, all proxied (orange cloud)
- **Infisical secrets** in the project's `prod` env. The agent only syncs; it does not create.
- **DNS records in Cloudflare** for the five hostnames in `secrets.env` (primary, staging, logs, metrics, keep) plus the preview wildcard, all proxied (orange cloud)
- **Beszel hub** running somewhere reachable, with an agent slot for this host. The hub UI hands you the KEY/TOKEN pair for `secrets.env`.
- **keep first-run setup**. After phase 05 puts keep up at `https://$KEEP_DOMAIN`, browse to it and complete `/setup`: pick a master password (save it to a password manager), scan TOTP, save the 8 recovery codes offline. Then create project `lyrics-api`, env `prod`, bulk-import the env via the .env paste UI.
- **keep agent token for lyrics-api**. From keep's UI, mint a token for `lyrics-api/prod` with `OUTPUT=/etc/lyrics-api.env`, `RELOAD_CMD="systemctl restart lyrics-api"`, and `REQUIRED_KEYS` set to every key in your env. Paste the bootstrap install command keep generates in a root shell on the box.
- **Post-reboot unseal**. keep restarts sealed every time the host boots; SSH in and log into `https://$KEEP_DOMAIN` once to unseal, otherwise the keep-agent stays stuck on `503` and `/etc/lyrics-api.env` will not refresh. `lyrics-api` keeps running on the last-good env, so this is a "secrets won't roll until you log in" issue, not an outage.
- **`cache.db` restore** from B2 if you're rebuilding after a loss. Separate process: `rclone copy b2:lyrics-api-backups/daily/<file> /var/lib/lyrics-api/data/cache.db`.

## Deploying the lyrics-api binary
Expand All @@ -60,32 +65,35 @@ The IaC installs the systemd unit but does not ship the Go binary. Two ways to g
1. **From CI** (the path used in prod): GitHub Actions builds, `scp`s to `/opt/lyrics-api/lyrics-api-go`, then `systemctl restart lyrics-api`.
2. **From source on the box**: `git clone`, `go build -o /opt/lyrics-api/lyrics-api-go .`, `chown deploy:deploy`, `systemctl restart lyrics-api`.

Either way, `infisical-agent` writes `/etc/lyrics-api.env` once it starts, which is what unblocks the first `lyrics-api` start.
Either way, the keep-agent timer writes `/etc/lyrics-api.env` once keep is unsealed and the agent token is valid. That write is what unblocks the first `lyrics-api` start.

## Security model

Most secrets live in Infisical Cloud and sync read-only to the box. The exceptions, all kept off the world-readable systemd config:
Prod secrets live in keep's encrypted SQLite (`/var/lib/keep/keep.db`). Each value is age-encrypted under a master key wrapped by your Argon2id-derived master password. keep starts sealed; you unseal it from the web UI with your password + TOTP. After that, the keep-agent on this host pulls `/render`, checks `REQUIRED_KEYS`, atomically swaps `/etc/lyrics-api.env`, and restarts `lyrics-api`.

The other secrets, all kept off the world-readable systemd config:

- `CF_API_TOKEN` is in `/etc/caddy.env` (mode 600, root:caddy)
- `B2_*` is in `/home/deploy/.config/rclone/rclone.conf` (mode 600, deploy:deploy)
- `LOGDY_UI_PASS` is in `/etc/logdy.env` (mode 640, root:deploy)
- The Infisical `client-secret` is at `/etc/infisical-agent/client-secret` (mode 600, root)
- The keep agent token sits inside `/usr/local/bin/keep-agent-lyrics-api-prod.sh` (mode 755, root) as a quoted bash variable. Anyone with shell on the box can read the agent token; the blast radius is read-only access to the prod env in keep.

`BESZEL_AGENT_TOKEN` is the one wart: it sits in `Environment=` lines on a mode-644 unit, since that's how the upstream installer ships it. The blast radius if leaked is impersonating the agent to the hub, which sends fake metrics but does not grant credentials back. The same EnvironmentFile pattern Caddy uses would close it; it is not done yet.

`lyrics-api.service` itself runs as `deploy` with `ProtectSystem=strict`, `ProtectHome=true`, `PrivateTmp=true`, `NoNewPrivileges=true`. UFW restricts inbound traffic to 22/80/443. `fail2ban` watches `sshd`.
`lyrics-api.service` runs as `deploy` with `ProtectSystem=strict`, `ProtectHome=true`, `PrivateTmp=true`, `NoNewPrivileges=true`. `keep.service` runs as a dedicated `keep` user with the same hardening. UFW restricts inbound traffic to 22/80/443. `fail2ban` watches `sshd`.

## Verification after bootstrap

Substitute your own hostnames from `secrets.env` for `$PRIMARY_DOMAIN` and `$LOGS_DOMAIN`.

```bash
systemctl is-active caddy lyrics-api infisical-agent beszel-agent logdy fail2ban
systemctl is-active caddy lyrics-api keep keep-agent-lyrics-api-prod.timer beszel-agent logdy fail2ban
ls -l /etc/caddy.env # -rw------- root caddy
sudo -u nobody cat /etc/caddy.env # permission denied
curl -sI https://$PRIMARY_DOMAIN/health # 200
curl -sI https://$LOGS_DOMAIN/ # 200 (then 401 on actual UI without auth)
journalctl -u infisical-agent -n 20 # successful auth + sync
curl -sI https://$KEEP_DOMAIN/ # 405 (keep only allows GET on /), proves TLS+proxy
journalctl -u keep-agent-lyrics-api-prod.service -n 20 # cycles every 60s, "[keep-agent] reloaded" only when secrets change
```

## Selective rebuild scenarios
Expand All @@ -97,13 +105,15 @@ journalctl -u infisical-agent -n 20 # successful auth + sync
| Backup schedule changed | edit `files/backups/crontab.fragment`, then `sudo ./bootstrap.sh --phase 08` |
| Logdy version bump | bump `LOGDY_VERSION` in `secrets.env`, then `sudo ./bootstrap.sh --phase 07` |
| Beszel agent token rotated | update `BESZEL_AGENT_TOKEN` in `secrets.env`, then `sudo ./bootstrap.sh --phase 06` |
| keep binary upgraded | rebuild from sibling repo, scp to `/usr/local/bin/keep`, then `sudo systemctl restart keep` (and unseal via UI) |
| keep agent token rotated | revoke old token + mint new one in keep UI, paste the new install command on the box |

## What's intentionally not here

- **Provisioning** (`hcloud server create`). One command, varies per provider, not worth scripting.
- **DNS records**. Lives in Cloudflare; the UI is fine.
- **The `lyrics-api-go` binary**. Shipped from CI, not infra.
- **A self-hosted Infisical instance**. 600MB resident is more than the project warrants.
- **The `lyrics-api-go` and `keep` binaries**. Both shipped via build + scp, not infra. Keep is a sibling repo at `github.com/boidushya/keep`.
- **keep first-run setup and token minting**. Master password, TOTP, recovery codes, agent token bootstrap; all interactive in the keep UI by design.
- **`cache.db` restoration**. Separate runbook, depends on which B2 snapshot you want.

## GitHub Actions configuration
Expand All @@ -124,4 +134,4 @@ Repository **variables** (Settings > Secrets and variables > Actions > Variables
1. Read `/var/log/bli-bootstrap.log` for the failing phase
2. Re-run just that phase with `--phase NN`
3. If the failure is upstream (apt repo down, a GitHub release moved), the phase script is the source of truth. Open it, fix it, re-run.
4. For infisical-agent issues, `journalctl -t infisical-agent -n 50` shows the reload script's logger output.
4. For keep-agent issues, `journalctl -u keep-agent-lyrics-api-prod.service -n 50` shows curl exit codes and the `[keep-agent]` log line on rewrite. A `503` body in the curl output means keep is sealed and you need to unseal it via the UI.
3 changes: 1 addition & 2 deletions infra/bootstrap.sh
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,10 @@ set -a; source "$SECRETS"; set +a

REQUIRED_VARS=(
CF_API_TOKEN ACME_EMAIL
INFISICAL_CLIENT_ID INFISICAL_CLIENT_SECRET INFISICAL_PROJECT_ID INFISICAL_ENV
B2_KEY_ID B2_APP_KEY B2_BUCKET
BESZEL_HUB_URL BESZEL_AGENT_KEY BESZEL_AGENT_TOKEN BESZEL_AGENT_PORT
LOGDY_UI_PASS LOGDY_VERSION
PUBLIC_IP PRIMARY_DOMAIN STAGING_DOMAIN LOGS_DOMAIN METRICS_DOMAIN PREVIEW_WILDCARD
PUBLIC_IP PRIMARY_DOMAIN STAGING_DOMAIN LOGS_DOMAIN METRICS_DOMAIN KEEP_DOMAIN PREVIEW_WILDCARD
)
missing=()
for v in "${REQUIRED_VARS[@]}"; do
Expand Down
8 changes: 8 additions & 0 deletions infra/files/caddy/Caddyfile
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,14 @@ __LOGS_DOMAIN__ {
reverse_proxy localhost:8888
}

__KEEP_DOMAIN__ {
tls {
dns cloudflare {env.CF_API_TOKEN}
}
import security_headers
reverse_proxy localhost:4339
}

__PREVIEW_WILDCARD__ {
tls {
dns cloudflare {env.CF_API_TOKEN}
Expand Down
23 changes: 0 additions & 23 deletions infra/files/infisical/agent.yaml

This file was deleted.

17 changes: 0 additions & 17 deletions infra/files/infisical/infisical-agent.service

This file was deleted.

29 changes: 0 additions & 29 deletions infra/files/infisical/lyrics-api-reload

This file was deleted.

5 changes: 0 additions & 5 deletions infra/files/infisical/lyrics-api.env.tpl

This file was deleted.

25 changes: 25 additions & 0 deletions infra/files/keep/keep.service
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
[Unit]
Description=keep secrets manager
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=keep
Group=keep
WorkingDirectory=/var/lib/keep
ExecStart=/usr/local/bin/keep
Environment=KEEP_DB_PATH=/var/lib/keep/keep.db
Environment=KEEP_PUBLIC_URL=https://__KEEP_DOMAIN__
Environment=KEEP_LISTEN=:4339
Restart=on-failure
RestartSec=5
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/var/lib/keep
MemoryMax=256M

[Install]
WantedBy=multi-user.target
1 change: 1 addition & 0 deletions infra/phases/03-caddy.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ sed -e "s|__ACME_EMAIL__|${ACME_EMAIL}|g" \
-e "s|__STAGING_DOMAIN__|${STAGING_DOMAIN}|g" \
-e "s|__LOGS_DOMAIN__|${LOGS_DOMAIN}|g" \
-e "s|__METRICS_DOMAIN__|${METRICS_DOMAIN}|g" \
-e "s|__KEEP_DOMAIN__|${KEEP_DOMAIN}|g" \
-e "s|__PREVIEW_WILDCARD__|${PREVIEW_WILDCARD}|g" \
"$INFRA_DIR/files/caddy/Caddyfile" > /etc/caddy/Caddyfile
chmod 644 /etc/caddy/Caddyfile
Expand Down
45 changes: 0 additions & 45 deletions infra/phases/05-infisical.sh

This file was deleted.

33 changes: 33 additions & 0 deletions infra/phases/05-keep.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#!/bin/bash
set -euo pipefail

# keep system user + data dir
if ! id keep >/dev/null 2>&1; then
useradd -r -s /usr/sbin/nologin -d /var/lib/keep keep
fi
install -d -o keep -g keep -m 0750 /var/lib/keep

# systemd unit (substitute KEEP_DOMAIN into KEEP_PUBLIC_URL)
sed -e "s|__KEEP_DOMAIN__|${KEEP_DOMAIN}|g" \
"$INFRA_DIR/files/keep/keep.service" > /etc/systemd/system/keep.service
chmod 644 /etc/systemd/system/keep.service
systemctl daemon-reload

# Defer start until the binary exists. Build from sibling repo and scp /usr/local/bin/keep separately.
if [ -x /usr/local/bin/keep ]; then
systemctl enable --now keep
echo "OK: keep enabled and running"
else
echo "OK: keep unit installed (deferred start: /usr/local/bin/keep not present)"
echo " Build keep, scp to /usr/local/bin/keep on this box, then: systemctl enable --now keep"
fi

# First-run setup, all in the keep UI:
# 1. Browse to https://${KEEP_DOMAIN}/setup. Set master password, scan TOTP, save the 8 recovery codes.
# 2. Create project lyrics-api / env prod, bulk-import secrets via the .env paste UI.
# 3. Mint a token for lyrics-api/prod with OUTPUT=/etc/lyrics-api.env, RELOAD_CMD="systemctl restart lyrics-api",
# and REQUIRED_KEYS = every key currently in /etc/lyrics-api.env.
# 4. Paste the bootstrap install command keep generates in a root shell on this box.
#
# Reminder: keep starts sealed on every host boot. Log into the UI once after a reboot to unseal it,
# otherwise the keep-agent gets 503s and /etc/lyrics-api.env stops refreshing.
7 changes: 1 addition & 6 deletions infra/secrets.env.example
Original file line number Diff line number Diff line change
Expand Up @@ -6,12 +6,6 @@
CF_API_TOKEN=
ACME_EMAIL=

# === Infisical (machine identity for agent that syncs /etc/lyrics-api.env) ===
INFISICAL_CLIENT_ID=
INFISICAL_CLIENT_SECRET=
INFISICAL_PROJECT_ID=
INFISICAL_ENV=prod

# === Backblaze B2 (rclone remote for off-site backups) ===
B2_KEY_ID=
B2_APP_KEY=
Expand All @@ -32,6 +26,7 @@ PRIMARY_DOMAIN=
STAGING_DOMAIN=
LOGS_DOMAIN=
METRICS_DOMAIN=
KEEP_DOMAIN=
PREVIEW_WILDCARD=

# === Pinned versions (bump deliberately, not silently) ===
Expand Down
Loading