This project uses a GitHub Actions CI/CD pipeline that automatically deploys to a remote server over SSH after the CI workflow passes.
```
Push to main/develop
        ↓
CI workflow (lint, Pyright, tests)
        ↓ on success
Deploy workflow
        ↓
SSH into server → pull latest code → restart containers
```
The `develop` branch deploys to the staging environment; the `main` branch deploys to the production environment.
The deploy workflow uses GitHub Environments so that each branch targets the right server. The required secrets (`SSH_HOST`, `SSH_USER`, `SSH_PRIVATE_KEY`) and the optional ones (`SSH_PORT`, defaulting to 22, and `SSH_KEY_PASSPHRASE`) are environment-scoped: set them per environment (production / staging), not as prefixed repository secrets (e.g. `PROD_*` / `DEV_*`).
Go to Settings → Environments and create two environments:
- `production` — used when the deploy is triggered from the `main` branch.
- `staging` — used when the deploy is triggered from the `develop` branch.
For production, it is recommended to enable Required reviewers to add a manual approval gate before each production deploy.
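As an illustration of how the branch-to-environment mapping can work (a sketch only; the actual `deploy.yml` in this repo may differ), a deploy job can pick the GitHub Environment from the triggering branch, which makes Actions inject that environment's `SSH_*` secrets:

```yaml
# Hypothetical deploy job: selects the environment by branch name, so the
# environment-scoped secrets for that server are injected automatically.
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.ref_name == 'main' && 'production' || 'staging' }}
    steps:
      - name: Deploy over SSH
        run: echo "deploying to ${{ secrets.SSH_HOST }}"
```

With Required reviewers enabled on `production`, this job pauses for approval whenever `github.ref_name` is `main`.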
In each environment (production and staging), add the following Environment secrets (same names in both; different values per server):
| Secret | Description |
|---|---|
| `SSH_HOST` | IP address or hostname of the server |
| `SSH_USER` | SSH username for the deploy account on the server |
| `SSH_PRIVATE_KEY` | SSH private key (full content, including header/footer) |
| `SSH_PORT` | SSH port (optional, defaults to 22) |
| `SSH_KEY_PASSPHRASE` | Passphrase for the SSH private key (optional; only if the key is passphrase-protected) |
GitHub injects the correct set based on the branch: main → production environment secrets, develop → staging environment secrets.
These can stay as Repository secrets (Settings → Secrets and variables → Actions) if you use them:
| Secret | Description |
|---|---|
| `DEPLOY_SCRIPT_TOKEN` | Token to authenticate downloading a custom `deploy.sh`. Required only if `DEPLOY_SCRIPT_URL` is set and needs authentication. |
| `DEPLOY_SCRIPT_URL` | Override the deploy script URL. Defaults to `deploy.sh` at the current commit SHA. |
Install these once on each server before the first deploy:
```shell
sudo apt update && sudo apt install -y git make
```

Docker and Docker Compose are also required. Refer to the official Docker docs for installation.
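Before the first deploy, a quick check like the following (a sketch; adjust the tool list to your setup) reports anything missing:

```shell
# Report missing prerequisite tools (prints nothing when all are present).
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Echo the names of missing tools, if any, without the leading space.
  echo "${missing# }"
}

check_tools git make docker
```

Run it as the deploy user so the result reflects that account's `PATH`.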
The account that runs git pull on the server (same as SSH_USER in GitHub Actions) needs a key on the server that GitHub accepts for git@github.com:YOUR_GITHUB_ORG/boost-data-collector.git. This is separate from the SSH_PRIVATE_KEY GitHub stores to log into the server.
- Install the private key under a dedicated name (example: `~/.ssh/id_ed25519_github`) and the matching `.pub` next to it. You can copy it from your workstation with `scp` (path to the key on your machine → `user@server:~/.ssh/...`).
- Configure `~/.ssh/config`:

  ```
  Host github.com
      HostName github.com
      User git
      IdentityFile ~/.ssh/id_ed25519_github
      IdentitiesOnly yes
  ```

- Restrict permissions: `chmod 700 ~/.ssh`, `chmod 600 ~/.ssh/id_ed25519_github ~/.ssh/config`, `chmod 644 ~/.ssh/id_ed25519_github.pub`.
- Verify: `ssh -T git@github.com` (expect a GitHub success / username message).
The deploy script does not manage secrets. It also requires an empty or non-existent deploy directory for the first clone: git clone in deploy.sh fails if the target directory already exists and is not a git repo. So create .env after the first clone, not before.
Step 1 — Trigger the first deploy
Push to main or develop (or re-run the Deploy workflow). The script will clone the repo into /opt/boost-data-collector and then exit with an error because .env is missing.
Step 2 — Add .env on the server
SSH into the server as the same user GitHub Actions uses (the account named in SSH_USER) and create the file inside the cloned directory:
```shell
cd /opt/boost-data-collector
cp .env.example .env
# edit .env
cp config/boost_collector_schedule.yaml.example config/boost_collector_schedule.yaml
# edit config/boost_collector_schedule.yaml
```

Use `.env.example` and the YAML example as references for required variables and schedule entries.
If you create or edit .env with sudo (e.g. sudo nano), the file is often owned by root with mode 600. Docker Compose reads .env as the user running make build / make up (your deploy user), which causes permission denied. Fix ownership after saving:
```shell
sudo chown YOUR_DEPLOY_USER:YOUR_DEPLOY_USER /opt/boost-data-collector/.env
sudo chmod 600 /opt/boost-data-collector/.env
```

(Replace `YOUR_DEPLOY_USER` with the same Unix account as `SSH_USER`.)
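To confirm the fix took effect, a small helper like this (assumes GNU coreutils `stat`, as on Debian/Ubuntu) prints owner and mode:

```shell
# show_mode: print "<owner> <octal-mode>" for a file (GNU coreutils stat).
show_mode() {
  stat -c '%U %a' "$1"
}

# For the app's .env you want the deploy user and mode 600, e.g.:
# show_mode /opt/boost-data-collector/.env
```

On macOS or BSD the equivalent would be `stat -f '%Su %Lp'`.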
Step 3 — Run deploy again
Re-run the Deploy workflow (or push again). The script will see the existing repo and .env, and complete successfully.
On your local machine, generate a dedicated deploy key:
```shell
ssh-keygen -t ed25519 -C "deploy" -f ~/.ssh/deploy_key -N ""
```

Copy the public key to the server:

```shell
ssh-copy-id -i ~/.ssh/deploy_key.pub user@your-server
```

Add the private key content (`~/.ssh/deploy_key`) as the `SSH_PRIVATE_KEY` secret in the production or staging environment, depending on which server you use.
This matches a common production/staging layout for this repo:
- On the host: PostgreSQL (package install), nginx (TLS + reverse proxy).
- In Docker Compose: `web` (Gunicorn), `celery_worker`, `celery_beat`, `redis`, `selenium`. The bundled `db` service is commented out in `docker-compose.yml`; the app uses `DATABASE_URL` to reach PostgreSQL on the host.
Compose already sets `extra_hosts: host.docker.internal:host-gateway` on app containers, so `DATABASE_URL` can use the host `host.docker.internal` (see `.env.example`). `DATABASE_URL` is required in `.env` for Docker Compose; there is no fallback to a bundled `db` service while that service stays commented out.
Use a private bucket name you control (placeholder: your-backup-bucket).
- Upload artifacts manually at first (e.g. workspace zip + DB dump).
- Prefer `gcloud storage` over legacy `gsutil` for copies; it is typically much faster on large objects.
Example: copy from bucket to the VM, then unpack (paths are illustrative):
```shell
gcloud storage cp "gs://your-backup-bucket/workspace-2026-03-24.zip" .
gcloud storage cp "gs://your-backup-bucket/app-database-2026-03-25.dump" .
unzip workspace-2026-03-24.zip -d /path/to/workspace-parent
```

Sync workspace back to the bucket (first full sync can take a long time; later syncs are incremental):

```shell
gcloud storage rsync /opt/boost-data-collector/workspace gs://your-backup-bucket/workspace --recursive
```

The deploy user (the same Unix account as `SSH_USER`) should own the app tree under `/opt/boost-data-collector` so `git`, `make`, and Docker Compose can run without `sudo`.
If you are not relying on the first CI deploy to create the directory, prepare the tree manually (after Server SSH key for GitHub is working), for example:
```shell
sudo mkdir -p /opt/boost-data-collector
sudo chown -R "$USER:$USER" /opt/boost-data-collector
cd /opt/boost-data-collector
git clone -b develop git@github.com:YOUR_GITHUB_ORG/boost-data-collector.git .
```

(`develop` matches the staging branch described above; use `main` for a production-only checkout if you prefer.)
If the tree was created as root, normalize ownership:
```shell
sudo chown -R YOUR_DEPLOY_USER:YOUR_DEPLOY_USER /opt/boost-data-collector
```

Docker bind mounts for `staticfiles` (and optionally a host workspace directory if you use one) must be readable/writable by the UID used inside the image for the app process (commonly 1000). Example:

```shell
mkdir -p /opt/boost-data-collector/staticfiles
sudo chown -R 1000:1000 /opt/boost-data-collector/staticfiles
# If using a host workspace directory mounted into the container:
sudo chown -R 1000:1000 /opt/boost-data-collector/workspace
sudo chmod -R u+rwX /opt/boost-data-collector/workspace
```

Install PostgreSQL on the server (the version should be compatible with your dumps and Django; this project is tested with recent PostgreSQL releases).
Create the application role and database once (use a strong password; the example below uses a placeholder):
```shell
sudo -u postgres psql
```

In the psql session (as superuser, e.g. `postgres`):

```sql
CREATE ROLE bdc WITH LOGIN PASSWORD 'REPLACE_WITH_STRONG_PASSWORD';
-- equivalent: CREATE USER bdc WITH PASSWORD 'REPLACE_WITH_STRONG_PASSWORD';
DROP DATABASE IF EXISTS boost_dashboard;
CREATE DATABASE boost_dashboard OWNER bdc;
GRANT ALL PRIVILEGES ON DATABASE boost_dashboard TO bdc;
GRANT CREATE, USAGE ON SCHEMA public TO bdc;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO bdc;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO bdc;
CREATE ROLE app_readonly NOLOGIN;
```

The app role does not need `CREATEDB`; the database is created explicitly above. If you need a separate migration-only user with broader rights, create that role separately.
If your dump replays GRANT … TO app_readonly, the app_readonly role must exist before restore (as above).
With host.docker.internal in DATABASE_URL (see docker-compose.yml extra_hosts), Postgres still sees the client as the container’s bridge IP (e.g. 172.20.0.4), not 127.0.0.1, so host … 127.0.0.1/32 rules do not match that traffic. pg_hba.conf lines match database, user, and client address; without a matching line you may see:
```
FATAL:  no pg_hba.conf entry for host "172.20.0.4", user "bdc", database "boost_dashboard", ...
```

Add something like (adjust user/database names and auth method to match your install; `scram-sha-256` is typical):

```
host    boost_dashboard    bdc    172.16.0.0/12    scram-sha-256
```
172.16.0.0/12 covers 172.16.0.0–172.31.255.255, which includes common Docker bridge subnets. To narrow to one subnet (e.g. only 172.20.*), use 172.20.0.0/16 instead.
Reload Postgres after editing:
```shell
sudo systemctl reload postgresql
```

If the client negotiates SSL but the server does not use TLS for this path, append `?sslmode=disable` (or `?sslmode=prefer`) to `DATABASE_URL` (see `.env.example`).
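For example, a `.env` line with SSL disabled for the host-gateway path might look like this (illustrative credentials; see `.env.example` for the canonical form):

```
DATABASE_URL=postgres://bdc:REPLACE_WITH_STRONG_PASSWORD@host.docker.internal:5432/boost_dashboard?sslmode=disable
```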
Backups produced with `pg_dump -Fc` are not plain SQL. Restore with `pg_restore`, not `psql -f`. The file may still be named `*.sql` even though it is custom format; choose the tool based on how the dump was produced, not on the file extension.
Where to put the dump file
- Whoever runs `pg_restore` must be able to read the dump: the `postgres` OS user when using `sudo -u postgres …`, or the account you use when restoring as `bdc` over `127.0.0.1` below. Do not make backups world-readable (e.g. `644` on a shared host): other local users could read sensitive data. Use least privilege instead: `chmod 600` (or tighter) and `chown` the file to match the restoring account (`postgres` vs your login). Prefer a private directory (your `$HOME`, a root-only path such as `/root/…`, or `/var/lib/postgresql/…` when only `postgres` should read it) over `/tmp/` unless you keep `600` and tight ownership there too.
- To avoid `postgres` needing filesystem access to the path, you can stream from an account that already owns a `600` dump: `cat /path/to/dump | sudo -u postgres pg_restore -h /var/run/postgresql -p 5432 -U postgres --verbose --clean --if-exists -d boost_dashboard -`. Use the same idea without `sudo` when you run `pg_restore` as `bdc`.
Prefer Unix socket for superuser restore
sudo -u postgres pg_restore -h localhost … often fails because TCP to localhost uses pg_hba.conf password rules, not peer auth. Connect via the socket instead (omit -h or use -h /var/run/postgresql):
```shell
sudo -u postgres pg_restore -h /var/run/postgresql -p 5432 -U postgres \
  --verbose --clean --if-exists \
  -d boost_dashboard \
  /var/lib/postgresql/boost-data-collector-db-2026-03-25.dump
```

Alternative: restore as bdc over TCP (127.0.0.1)
If pg_hba.conf allows bdc from 127.0.0.1/32 (password / scram-sha-256), you can run pg_restore as a normal user instead of sudo -u postgres:
```shell
pg_restore -h 127.0.0.1 -U bdc -d boost_dashboard --verbose ~/bdc-20260514.dump
```

Adjust the dump path as needed; set ownership and `chmod 600` on the file to match whoever runs `pg_restore`. You are prompted for `bdc`'s password unless you use `PGPASSWORD` or `~/.pgpass`. Add `--clean --if-exists` if you want the restore to drop existing objects first (match your dump and risk tolerance).
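For non-interactive restores, a `~/.pgpass` entry avoids the password prompt. The file must have mode `0600` and uses the format `hostname:port:database:username:password`; the password below is a placeholder:

```
127.0.0.1:5432:boost_dashboard:bdc:REPLACE_WITH_STRONG_PASSWORD
```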
After restore — privileges for the app user: If you restored as postgres (or another superuser), public tables may remain owned by postgres; DB owner bdc still has no table rights until you GRANT (empty role_table_grants for bdc is expected). Grants to other roles in the dump (e.g. app_readonly) do not apply to bdc. If you restored as bdc and objects are already owned by bdc, you may not need the grants—verify if the app reports permission errors.
- As superuser: `\c boost_dashboard`, then run (adjust role name if not `bdc`):

  ```sql
  GRANT USAGE ON SCHEMA public TO bdc;
  GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO bdc;
  GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO bdc;
  GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO bdc;
  ```

- Check: `pg_tables` / `information_schema` are per database — always `\c boost_dashboard` before querying. After grants, `SELECT COUNT(*) FROM information_schema.role_table_grants WHERE grantee = 'bdc' AND table_schema = 'public'` should be large (on the order of hundreds for a full app schema).
Repeat step 1 after any future restore that leaves table ownership on postgres.
Set DATABASE_URL (and any DB_* overrides) so containers reach the host database, for example:
```
DATABASE_URL=postgres://bdc:REPLACE_WITH_STRONG_PASSWORD@host.docker.internal:5432/boost_dashboard
```

Also set production-safe values for:
- `ALLOWED_HOSTS` — your public hostname(s).
- `CSRF_TRUSTED_ORIGINS` — `https://your.hostname` entries for HTTPS sites.
- `USE_X_FORWARDED_HOST=True` and `USE_TLS_PROXY_HEADERS=True` when Django sits behind nginx terminating TLS (see comments in `config/settings.py`).
- `STATIC_URL` / `FORCE_SCRIPT_NAME` — only if you serve the app under a URL prefix; see the optional nginx subsection below.
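Put together, the relevant `.env` lines might look like this (hostnames are placeholders; check `config/settings.py` and `.env.example` for the authoritative variable names):

```
ALLOWED_HOSTS=app.example.com
CSRF_TRUSTED_ORIGINS=https://app.example.com
USE_X_FORWARDED_HOST=True
USE_TLS_PROXY_HEADERS=True
```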
From /opt/boost-data-collector as the deploy user:
```shell
make build   # first build is slower; later builds are incremental
make up
make health  # optional smoke checks (see Makefile)
```

CI uses the same targets in a fixed order (`make down` first); see Deploy Script Behavior.
docker-compose.yml publishes Gunicorn on 127.0.0.1:8000 only. Terminate TLS on the host with nginx and proxy to that port.
The following example assumes the app is served at the domain root (https://app.example.com/) and static files at /static/.
```nginx
upstream boost_collector_app {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 443 ssl http2;
    server_name app.example.com;

    ssl_certificate     /etc/letsencrypt/live/app.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app.example.com/privkey.pem;

    client_max_body_size 100M;

    location / {
        proxy_pass http://boost_collector_app;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_read_timeout 300s;
    }

    location /static/ {
        alias /opt/boost-data-collector/staticfiles/;
    }
}
```

If the app must live under a prefix (e.g. `https://app.example.com/boost-data-collector/`), set in `.env`:
- `FORCE_SCRIPT_NAME=/boost-data-collector` (no trailing slash)
- `STATIC_URL=/boost-data-collector/static/` (must end with `/`)
Example nginx (adjust the prefix string to match your FORCE_SCRIPT_NAME):
```nginx
upstream boost_collector_app {
    server 127.0.0.1:8000;
    keepalive 32;
}

server {
    listen 443 ssl http2;
    server_name app.example.com;

    ssl_certificate     /etc/letsencrypt/live/app.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/app.example.com/privkey.pem;

    client_max_body_size 100M;

    location /boost-data-collector/static/ {
        alias /opt/boost-data-collector/staticfiles/;
    }

    location /boost-data-collector/ {
        proxy_pass http://boost_collector_app/;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_redirect off;
        proxy_read_timeout 300s;
    }
}
```

Reload nginx after testing the config (`sudo nginx -t && sudo systemctl reload nginx`).
Selenium (port 4444) is bound to 127.0.0.1:4444 in Compose so it is not exposed on the public interface. Access from your laptop via SSH port forwarding if you need the hub from outside the VM:
```shell
ssh -L 4444:127.0.0.1:4444 YOUR_DEPLOY_USER@YOUR_SERVER_HOST
```

The deploy script (`.github/workflows/deploy-script/deploy.sh`) runs on the remote server and does the following:
- Validates `REPO_URL` and `BRANCH` are set.
- Checks that `git` and `make` are installed.
- If the repo already exists — fetches and hard-resets to `origin/<branch>`.
- If the repo does not exist — clones it fresh.
- Checks for `.env` in the deploy directory (`$DEPLOY_DIR/.env`). If it is missing, the script exits with an error. Create `.env` after the first clone using the two-step process.
- Stops existing containers (`make down`).
- Builds and starts the stack (`make build && make up`).
- Waits for `make health` to succeed (see `Makefile`).
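The clone-or-update step above can be sketched as follows (an illustration of the logic, not the literal contents of `deploy.sh`):

```shell
# Reset an existing checkout to origin/<branch>, or clone fresh otherwise.
# Note: reset --hard discards any local tracked changes, matching the
# deploy script's "server state follows the branch" behavior.
sync_repo() {
  repo_url="$1"; branch="$2"; deploy_dir="$3"
  if [ -d "$deploy_dir/.git" ]; then
    git -C "$deploy_dir" fetch origin "$branch"
    git -C "$deploy_dir" reset --hard "origin/$branch"
  else
    git clone -b "$branch" "$repo_url" "$deploy_dir"
  fi
}
```

Because of the `reset --hard`, anything edited inside the tracked tree is lost on the next deploy; untracked files such as `.env` survive.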
By default the repo is cloned into /opt/boost-data-collector. To use a different path, set DEPLOY_DIR as an environment variable on the server or pass it via the envs: parameter in deploy.yml.
When secrets or config values change, SSH into the server and edit the file directly:
```shell
nano /opt/boost-data-collector/.env
```

If you saved with `sudo`, fix ownership and mode for the deploy user as in Step 2 — Add `.env` on the server.
Then restart the containers to pick up the new values:
```shell
cd /opt/boost-data-collector && make down && make up
```

`make health` runs a small set of checks from the Makefile (Django `check --database default` inside `web`, `redis-cli PING`, Selenium `/status` from inside the `selenium` container, and `docker compose ps` for Celery services). It does not fully prove Celery broker connectivity or every cross-service path. Treat it as a quick smoke test after deploy.
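If you script restarts yourself and want to wait for the stack to come back, a small retry wrapper like this (a generic sketch; the deploy script has its own wait logic) does the job:

```shell
# Retry a command up to N times, one second apart; succeed on the first pass.
wait_healthy() {
  attempts="$1"; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example usage on the server: wait_healthy 30 make health
```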
For more detail see Docker.md and the health target in the Makefile.