This guide covers PostgreSQL database setup, migrations, backups, and maintenance.
Ubuntu/Debian:
sudo apt update
sudo apt install postgresql-15 postgresql-contrib
CentOS/RHEL:
sudo dnf install postgresql15-server postgresql15-contrib
sudo postgresql-15-setup initdb
sudo systemctl enable postgresql-15
sudo systemctl start postgresql-15
Docker/Podman:
podman run -d \
--name postgres \
-e POSTGRES_PASSWORD=your-password \
-e POSTGRES_DB=imagemanager \
-v postgres-data:/var/lib/postgresql/data \
-p 5432:5432 \
postgres:15
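To confirm the container is up and accepting connections, a quick check from the host (container name as above):
podman exec -it postgres psql -U postgres -d imagemanager -c "SELECT version();"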
Create database and user:
# Connect as postgres user
sudo -u postgres psql
-- Create database
CREATE DATABASE imagemanager;
-- Create user
CREATE USER imagemanager_user WITH PASSWORD 'secure-password';
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE imagemanager TO imagemanager_user;
-- PostgreSQL 15 no longer grants CREATE on the public schema to ordinary users
\c imagemanager
GRANT ALL ON SCHEMA public TO imagemanager_user;
-- Exit
\q
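It's worth verifying the new role can actually connect before moving on:
psql -h localhost -U imagemanager_user -d imagemanager -c "\conninfo"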
Configure postgresql.conf for network access:
# Edit postgresql.conf
sudo nano /etc/postgresql/15/main/postgresql.conf
# Listen on all interfaces (or specific IP)
listen_addresses = '*'
# Adjust connection limits
max_connections = 100
Configure pg_hba.conf for authentication:
# Edit pg_hba.conf
sudo nano /etc/postgresql/15/main/pg_hba.conf
# Add entries
# TYPE DATABASE USER ADDRESS METHOD
# scram-sha-256 matches the PostgreSQL 15 default password_encryption
host imagemanager imagemanager_user 10.0.0.0/8 scram-sha-256
host imagemanager imagemanager_user 172.16.0.0/12 scram-sha-256
host imagemanager imagemanager_user 192.168.0.0/16 scram-sha-256
Restart PostgreSQL:
sudo systemctl restart postgresql
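After the restart, a connection test from an application host confirms the networking and authentication changes (host address is a placeholder):
psql "postgresql://imagemanager_user:secure-password@<db-host>:5432/imagemanager" -c "\conninfo"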
CRITICAL: Migrations are NEVER run automatically. Running them manually prevents:
- Accidental schema changes in production
- Race conditions with multiple application instances
- Unexpected downtime during deployments
- Loss of control over when schema changes occur
Always run migrations manually before starting/updating the application:
cd backend
source .venv/bin/activate
alembic upgrade head
Docker/Podman:
podman run --rm \
-e DATABASE_URL="postgresql+asyncpg://user:pass@host:5432/db" \
vista:latest \
alembic upgrade head
Kubernetes:
kubectl run alembic-migrate \
--image=vista:latest \
--restart=Never \
--env="DATABASE_URL=postgresql+asyncpg://user:pass@postgres:5432/db" \
-- alembic upgrade head
# Check logs
kubectl logs alembic-migrate
# Clean up
kubectl delete pod alembic-migrate
After modifying models in backend/core/models.py:
cd backend
alembic revision --autogenerate -m "describe your changes"
ALWAYS review the generated migration file before applying:
# View generated migration
cat alembic/versions/<revision>_*.py
# Review changes carefully:
# - Ensure no data loss
# - Check column types
# - Verify constraints
# Confirm indexes
Apply after review:
alembic upgrade head
Common migration commands:
# Show current database version
alembic current
# View migration history
alembic history --verbose
# Upgrade to latest
alembic upgrade head
# Upgrade to specific revision
alembic upgrade <revision>
# Downgrade one migration
alembic downgrade -1
# Downgrade to specific revision
alembic downgrade <revision>
# Mark database as current (use cautiously!)
alembic stamp head
# Generate SQL for migration (don't apply)
alembic upgrade head --sql
Migration best practices:
- Always back up before migrations
- Test in staging first
- Review auto-generated migrations
- Never modify applied migrations
- Coordinate with deployments
- Document complex migrations
- Plan for rollback scenarios
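As a sketch, a pre-migration deploy step that applies these practices (paths and names taken from the examples in this guide):
#!/bin/bash
set -euo pipefail
# Back up first; abort the deploy if the backup fails
pg_dump -h localhost -U postgres -d imagemanager \
  -F custom -f "pre-migration-$(date +%Y%m%d-%H%M%S).dump"
# Then migrate
cd backend
source .venv/bin/activate
alembic upgrade head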
Adding a column:
def upgrade():
    op.add_column('projects',
        sa.Column('new_field', sa.String(255), nullable=True)
    )

def downgrade():
    op.drop_column('projects', 'new_field')
Adding an index:
def upgrade():
    op.create_index('ix_projects_name', 'projects', ['name'])

def downgrade():
    op.drop_index('ix_projects_name', table_name='projects')
Adding NOT NULL constraint:
def upgrade():
    # First add column as nullable
    op.add_column('projects', sa.Column('status', sa.String(50), nullable=True))
    # Set default value for existing rows
    op.execute("UPDATE projects SET status = 'active' WHERE status IS NULL")
    # Then make it NOT NULL
    op.alter_column('projects', 'status', nullable=False)

def downgrade():
    op.drop_column('projects', 'status')
Issue: Autogenerate detects no changes
# Ensure models are imported in backend/core/models.py
# Check that Base is imported in alembic/env.py
Issue: Migration fails with data loss warning
# Review the migration carefully
# Add data migration logic if needed
# Test on staging database first
Issue: Database out of sync with migrations
# Check current version
alembic current
# If needed, stamp to correct version
alembic stamp <revision>
# Then continue with normal migrations
alembic upgrade head
Implement the 3-2-1 backup rule:
- 3 copies of data
- 2 different media types
- 1 off-site copy
Full database backup:
pg_dump -h localhost -p 5432 -U postgres -d imagemanager \
-F custom -f backup-$(date +%Y%m%d-%H%M%S).dump
SQL format backup:
pg_dump -h localhost -p 5432 -U postgres -d imagemanager \
| gzip > backup-$(date +%Y%m%d-%H%M%S).sql.gz
Schema only:
pg_dump -h localhost -p 5432 -U postgres -d imagemanager \
--schema-only -f schema-$(date +%Y%m%d).sql
Data only:
pg_dump -h localhost -p 5432 -U postgres -d imagemanager \
--data-only -f data-$(date +%Y%m%d).sql
Create backup script:
#!/bin/bash
# /usr/local/bin/backup-imagemanager.sh
set -euo pipefail
BACKUP_DIR=/backups/imagemanager
DATE=$(date +%Y%m%d-%H%M%S)
RETENTION_DAYS=30
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Database backup
pg_dump -h localhost -U postgres imagemanager \
  -F custom -f "$BACKUP_DIR/db-$DATE.dump"
# Compress and upload to S3
tar -czf "$BACKUP_DIR/backup-$DATE.tar.gz" -C "$BACKUP_DIR" "db-$DATE.dump"
aws s3 cp "$BACKUP_DIR/backup-$DATE.tar.gz" s3://backups/imagemanager/
# Cleanup old local backups
find "$BACKUP_DIR" -name "*.dump" -mtime +"$RETENTION_DAYS" -delete
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +"$RETENTION_DAYS" -delete
# Log backup
echo "$(date): Backup completed: backup-$DATE.tar.gz" >> /var/log/backup.log
Make executable and schedule:
chmod +x /usr/local/bin/backup-imagemanager.sh
# Add to crontab (daily at 2 AM)
crontab -e
0 2 * * * /usr/local/bin/backup-imagemanager.sh
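Backups are only as good as their restores; a periodic integrity check is cheap (file name assumed):
pg_restore --list backup.dump > /dev/null && echo "backup readable"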
For point-in-time recovery, enable WAL archiving:
# Edit postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/wal_archive/%f'
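archive_command fails if the target directory is missing or unwritable, so create it up front:
sudo -u postgres mkdir -p /var/lib/postgresql/wal_archive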
Restore from custom format:
pg_restore -h localhost -p 5432 -U postgres -d imagemanager \
--clean --if-exists backup.dump
Restore from SQL:
# Drop and recreate database
dropdb -h localhost -U postgres imagemanager
createdb -h localhost -U postgres imagemanager
# Restore
gunzip -c backup.sql.gz | psql -h localhost -U postgres -d imagemanager
Point-in-time recovery:
# Stop PostgreSQL
sudo systemctl stop postgresql
# Restore the base backup into the data directory
# (a file-system-level copy from pg_basebackup, not a pg_dump file)
sudo -u postgres tar -xf base-backup.tar -C /var/lib/postgresql/15/main
# Configure recovery targets (PostgreSQL 12+ reads these from postgresql.conf
# plus a recovery.signal file; recovery.conf is no longer supported)
sudo tee -a /etc/postgresql/15/main/postgresql.conf <<EOF
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'
recovery_target_time = '2024-01-15 10:30:00'
EOF
sudo -u postgres touch /var/lib/postgresql/15/main/recovery.signal
# Start PostgreSQL (will enter recovery mode)
sudo systemctl start postgresql
Regular maintenance prevents bloat and updates statistics:
# Manual vacuum
psql -h localhost -U postgres -d imagemanager -c "VACUUM ANALYZE;"
# Verbose output
psql -h localhost -U postgres -d imagemanager -c "VACUUM VERBOSE ANALYZE;"
# Full vacuum (requires exclusive lock)
psql -h localhost -U postgres -d imagemanager -c "VACUUM FULL;"
Autovacuum configuration (postgresql.conf):
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
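To confirm autovacuum is keeping up, check when tables were last vacuumed and analyzed:
psql -h localhost -U postgres -d imagemanager \
  -c "SELECT relname, last_autovacuum, last_autoanalyze FROM pg_stat_user_tables ORDER BY last_autovacuum NULLS FIRST;"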
Rebuild indexes to remove bloat:
# Reindex single table
psql -h localhost -U postgres -d imagemanager -c "REINDEX TABLE projects;"
# Reindex entire database
psql -h localhost -U postgres -d imagemanager -c "REINDEX DATABASE imagemanager;"
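Plain REINDEX takes locks that block writes; on PostgreSQL 12+ the CONCURRENTLY variant avoids that at the cost of more total work:
psql -h localhost -U postgres -d imagemanager -c "REINDEX TABLE CONCURRENTLY projects;"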
Check database, table, and index sizes:
-- Database size
SELECT pg_size_pretty(pg_database_size('imagemanager'));
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
-- Index sizes
SELECT
indexname,
tablename,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Show slow queries
SELECT
pid,
now() - query_start AS duration,
query
FROM pg_stat_activity
WHERE state = 'active'
AND now() - query_start > interval '5 seconds'
ORDER BY duration DESC;
-- Explain query plan
EXPLAIN ANALYZE SELECT * FROM projects WHERE name = 'test';
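For ongoing slow-query analysis, the pg_stat_statements extension (shipped with contrib) aggregates statistics across all queries; a minimal setup sketch:
# postgresql.conf — requires a restart
shared_preload_libraries = 'pg_stat_statements'
# Then, once per database:
psql -U postgres -d imagemanager -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
# Top 10 queries by mean execution time
psql -U postgres -d imagemanager \
  -c "SELECT calls, round(mean_exec_time::numeric, 1) AS mean_ms, query FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"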
For production, use PgBouncer to manage connections:
sudo apt install pgbouncer
# /etc/pgbouncer/pgbouncer.ini
[databases]
imagemanager = host=localhost port=5432 dbname=imagemanager
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
# /etc/pgbouncer/userlist.txt
"imagemanager_user" "md5hash_of_password"
# Generate password hash
echo -n "passwordimagemanager_user" | md5sumUpdate DATABASE_URL:
DATABASE_URL=postgresql+asyncpg://user:pass@localhost:6432/imagemanager
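A quick way to confirm the pool is working end to end:
psql -h 127.0.0.1 -p 6432 -U imagemanager_user -d imagemanager -c "SELECT 1;"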
Monitor these database metrics:
- Connection count
- Query performance (slow queries)
- Database size growth
- Cache hit ratio
- Transaction rate
- Replication lag (if using replicas)
-- Active connections
SELECT count(*) FROM pg_stat_activity;
-- Cache hit ratio (should be > 90%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit)::float / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM pg_statio_user_tables;
-- Database age (vacuum needed if > 200M)
SELECT datname, age(datfrozenxid) FROM pg_database;
-- Lock conflicts
SELECT * FROM pg_locks WHERE NOT granted;
Primary server (postgresql.conf):
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB
Create replication user:
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'password';Allow replication (pg_hba.conf):
host replication replicator <replica-ip>/32 scram-sha-256
Standby server:
# Create standby from primary
sudo -u postgres pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/15/main -P -R
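Once the standby is started, replication status can be verified on the primary:
psql -h primary-host -U postgres -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"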
Promote standby to primary:
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/15/main
Performance tuning (postgresql.conf):
# Memory settings (adjust for your server)
shared_buffers = 2GB # 25% of RAM
effective_cache_size = 6GB # 50-75% of RAM
maintenance_work_mem = 512MB
work_mem = 64MB
# Checkpoint settings
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
# Planner settings
random_page_cost = 1.1 # For SSD
effective_io_concurrency = 200 # For SSD
# Logging
log_min_duration_statement = 1000 # Log queries > 1 second
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
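Note that shared_buffers and wal_buffers only take effect after a restart (most of the other settings above apply on reload):
sudo systemctl restart postgresql
psql -U postgres -c "SHOW shared_buffers;"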
Create indexes for frequently queried columns:
-- Commonly queried fields
CREATE INDEX idx_projects_meta_group ON projects(meta_group_id);
CREATE INDEX idx_images_project ON data_instances(project_id);
CREATE INDEX idx_images_deleted ON data_instances(deleted_at) WHERE deleted_at IS NULL;
CREATE INDEX idx_classifications_image ON image_classifications(image_id);
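After the indexes have been in place for a while, it's worth checking they are actually used (zero scans suggests a candidate for removal):
psql -h localhost -U postgres -d imagemanager \
  -c "SELECT relname, indexrelname, idx_scan FROM pg_stat_user_indexes ORDER BY idx_scan;"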