Production Deployment Guide

🚀 Deploying Xtractor to Production

Pre-Deployment Checklist

📋 Production Configuration

1. Update .env for Production

FLASK_ENV=production
PORT=80
DATABASE_URL=sqlite:///./data/xtractor.db
SECRET_KEY=your_secure_random_key_here

Generate a secure key:

import secrets
print(secrets.token_hex(32))

2. Update app.py for Production

if __name__ == '__main__':
    app.run(
        host='0.0.0.0',
        port=int(os.getenv('PORT', 5000)),
        debug=False,  # NEVER True in production
        threaded=True
    )

3. Production Dependencies

Consider adding to requirements.txt:

gunicorn==21.2.0          # Production WSGI server
python-dotenv==1.0.0      # Environment variables
prometheus-flask==0.22.0  # Monitoring
Flask-Cors==4.0.0         # CORS support

🔒 Security Hardening

1. Add Security Headers

Update app/routes.py:

@app.after_request
def set_security_headers(response):
    response.headers['X-Content-Type-Options'] = 'nosniff'
    response.headers['X-Frame-Options'] = 'SAMEORIGIN'
    response.headers['X-XSS-Protection'] = '1; mode=block'
    response.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
    return response

2. Input Validation

from werkzeug.utils import secure_filename

def validate_filename(filename):
    """Validate and secure filename"""
    filename = secure_filename(filename)
    if not filename or len(filename) > 100:
        raise ValueError("Invalid filename")
    return filename

3. Rate Limiting

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"]
)

@bp.route('/api/upload', methods=['POST'])
@limiter.limit("10 per hour")
def upload_pdf():
    # ...

🚀 Deployment Options

Option 1: Gunicorn + Nginx

Install Gunicorn

pip install gunicorn

Create gunicorn_config.py

bind = "127.0.0.1:5000"
workers = 4
worker_class = "sync"
timeout = 120
keepalive = 2

Start with Gunicorn

gunicorn --config gunicorn_config.py app:app

Nginx Configuration

upstream xtractor {
    server 127.0.0.1:5000;
}

server {
    listen 80;
    server_name your_domain.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    server_name your_domain.com;

    # SSL configuration
    ssl_certificate /etc/ssl/certs/your_cert.crt;
    ssl_certificate_key /etc/ssl/private/your_key.key;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000" always;
    add_header X-Content-Type-Options "nosniff" always;

    # Proxy settings
    location / {
        proxy_pass http://xtractor;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # File upload handling
    client_max_body_size 50M;
}

Option 2: Docker Deployment

Create Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create necessary directories
RUN mkdir -p uploads extracted_data data logs

# Expose port
EXPOSE 5000

# Run application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Create docker-compose.yml

version: "3.8"

services:
  xtractor:
    build: .
    ports:
      - "5000:5000"
    volumes:
      - ./uploads:/app/uploads
      - ./extracted_data:/app/extracted_data
      - ./data:/app/data
    environment:
      - FLASK_ENV=production
      - DATABASE_URL=sqlite:///./data/xtractor.db
    restart: unless-stopped

  nginx:
    image: nginx:latest
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - xtractor
    restart: unless-stopped

Build and Run

docker-compose up -d

Option 3: Cloud Platforms

Heroku Deployment

Create Procfile:

web: gunicorn app:app

Deploy:

heroku login
heroku create xtractor-app
git push heroku main

AWS Elastic Beanstalk

pip install awsebcli

eb init -p python-3.11 xtractor-app
eb create xtractor-env
eb deploy

📊 Monitoring and Logging

1. Application Logging

import logging
from logging.handlers import RotatingFileHandler

if not app.debug:
    if not os.path.exists('logs'):
        os.mkdir('logs')

    file_handler = RotatingFileHandler('logs/xtractor.log',
                                       maxBytes=10240000,
                                       backupCount=10)
    file_handler.setFormatter(logging.Formatter(
        '%(asctime)s %(levelname)s: %(message)s [in %(pathname)s:%(lineno)d]'
    ))
    file_handler.setLevel(logging.INFO)
    app.logger.addHandler(file_handler)
    app.logger.setLevel(logging.INFO)

2. Error Tracking (Sentry)

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn="your_sentry_dsn",
    integrations=[FlaskIntegration()]
)

3. Performance Monitoring

from flask_talisman import Talisman

Talisman(app)  # Security headers

🗄️ Database Management

Backup Strategy

# Automated daily backup
0 2 * * * sqlite3 /path/to/data/xtractor.db ".backup '/backups/xtractor_$(date +\%Y\%m\%d).db'"

Migration Management

# If using Alembic for migrations
alembic upgrade head

⚡ Performance Optimization

1. Database Indexing

Already implemented in models.py, but verify:

state_name = Column(String(100), index=True)
lga_code = Column(String(20), index=True)

2. Query Optimization

Use eager loading for related objects:

from sqlalchemy.orm import joinedload

states = db.query(State).options(joinedload(State.lgas)).all()

3. Caching

from flask_caching import Cache

cache = Cache(app, config={'CACHE_TYPE': 'simple'})

@bp.route('/api/states')
@cache.cached(timeout=300)
def get_states():
    # ...

📈 Scaling Strategies

1. Database Optimization

Use PostgreSQL instead of SQLite for production
Add read replicas
Implement connection pooling

2. Application Scaling

Run multiple Gunicorn workers
Use load balancer (Nginx, HAProxy)
Implement message queue (Celery)

3. Caching Layer

Add Redis for session storage
Cache API responses
Cache database queries

Example Production Setup

Load Balancer (HAProxy)
    ├── Gunicorn Worker 1
    ├── Gunicorn Worker 2
    ├── Gunicorn Worker 3
    └── Gunicorn Worker 4
         ↓
    PostgreSQL Database
    Redis Cache

🔄 Continuous Deployment

GitHub Actions Example

Create .github/workflows/deploy.yml:

name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run tests
        run: python test_setup.py

      - name: Deploy to server
        run: |
          ssh user@server 'cd /app && git pull && systemctl restart xtractor'

🚨 Disaster Recovery

1. Regular Backups

# Weekly backup script
#!/bin/bash
BACKUP_DIR="/backups/xtractor"
mkdir -p $BACKUP_DIR
sqlite3 /app/data/xtractor.db ".backup '$BACKUP_DIR/xtractor_$(date +%Y%m%d_%H%M%S).db'"
# Keep only last 30 days
find $BACKUP_DIR -name "*.db" -mtime +30 -delete

2. Restore Database

# Restore from backup
sqlite3 /app/data/xtractor.db ".restore '/backups/xtractor/xtractor_20251108_000000.db'"

3. Health Checks

@bp.route('/health')
def health_check():
    try:
        db = get_db_session()
        db.query(State).first()
        db.close()
        return jsonify({"status": "healthy"}), 200
    except Exception as e:
        return jsonify({"status": "unhealthy", "error": str(e)}), 500

✅ Post-Deployment Verification

Application accessible from public domain
SSL certificate valid
File uploads working
Database queries returning correct data
API endpoints responsive
Error logging configured
Backup system active
Monitoring dashboard setup
Performance acceptable (<200ms response time)

📞 Support and Maintenance

Regular Tasks

Weekly: Review logs for errors
Monthly: Database optimization
Quarterly: Security updates
Annually: Full system audit

Maintenance Window

Schedule maintenance at low-traffic times (e.g., Sunday 2-4 AM)

Last Updated: November 2025
Version: 1.0.0 Production Ready

FilesExpand file tree

PRODUCTION_GUIDE.md

Latest commit

History