This guide covers monitoring, observability, and Site Reliability Engineering (SRE) practices for the Fast Scriptures application.
Fast Scriptures requires robust monitoring to ensure reliable performance for scripture readers worldwide. The sections below cover everything from basic health checks to advanced SRE practices.
Location: backend/app/main.py
Endpoint: /health
Features:
- Database connection warm-up
- Volume count verification
- Warm-up status reporting
- Detailed health information
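As a sketch, the handler behind this endpoint could assemble its payload like the following. Note that `check_database` and `count_volumes` are hypothetical stand-ins for the real database layer in backend/app/main.py:

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for the real database layer.
def check_database() -> bool:
    return True

def count_volumes() -> int:
    return 5

def build_health_response(warmed_up: bool) -> dict:
    """Assemble the /health payload in the shape shown below."""
    db_ok = check_database()
    return {
        "status": "healthy" if db_ok else "unhealthy",
        "warmed_up": warmed_up,
        "database": "connected" if db_ok else "disconnected",
        "volumes_count": count_volumes(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```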
Response Format:
{
"status": "healthy",
"warmed_up": true,
"database": "connected",
"volumes_count": 5,
"timestamp": "2025-01-05T00:00:00Z"
}

Location: .github/workflows/synthetic-monitoring.yml
Schedule: Every 15 minutes
Features:
- 5-minute warm-up period for cold starts
- Core endpoint testing
- Performance metrics collection
- Response validation
- Manual trigger support
GitHub Actions Limitations:
- Maximum job time: 6 hours (free tier)
- Cron frequency: Minimum 5 minutes
- Resource usage: 2,000 minutes/month (free tier)
- Our setup: 15-minute intervals = 96 runs/day (~2,880 runs/month); with the 5-minute warm-up each run takes 5+ minutes, so billable usage can exceed the 2,000-minute free tier on private repositories (public repositories are not billed for standard runners)
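The run-count arithmetic above is easy to verify; the per-run minute costs below are assumptions (1 minute for a bare run, ~6 minutes once the 5-minute warm-up is included), not measured values:

```python
runs_per_day = 24 * 60 // 15        # one run every 15 minutes
runs_per_month = runs_per_day * 30  # assuming a 30-day month

# Assumed per-run durations, not measured values:
minutes_1_min_runs = runs_per_month * 1   # bare run
minutes_6_min_runs = runs_per_month * 6   # run including the 5-minute warm-up

print(runs_per_day, runs_per_month, minutes_1_min_runs, minutes_6_min_runs)
```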
Location: scripts/monitor.py
Features:
- Local testing capabilities
- Configurable warm-up time
- Performance metrics
- JSON output support
- Command-line interface
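A sketch of how such a command-line interface can be wired with argparse; the flag names mirror the usage examples in this guide, but the real scripts/monitor.py may organize this differently:

```python
import argparse

def parse_args(argv=None):
    # Flag names mirror the usage examples in this guide;
    # the actual scripts/monitor.py may differ internally.
    parser = argparse.ArgumentParser(
        description="Synthetic monitoring for Fast Scriptures")
    parser.add_argument("--url", default="http://localhost:8000",
                        help="base URL of the API under test")
    parser.add_argument("--warm-up", default="true", choices=["true", "false"],
                        help="whether to wait out a warm-up period first")
    parser.add_argument("--wait", type=int, default=5,
                        help="warm-up time in minutes")
    parser.add_argument("--output", help="write JSON results to this file")
    return parser.parse_args(argv)
```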
1. Start the backend server:

   cd backend
   uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
2. Run the monitoring script:

   # Basic monitoring with 5-minute warm-up
   python scripts/monitor.py

   # Test without warm-up
   python scripts/monitor.py --warm-up false

   # Custom warm-up time (2 minutes)
   python scripts/monitor.py --wait 2

   # Test against production URL
   python scripts/monitor.py --url https://scriptures-fast-api.onrender.com

   # Save results to file
   python scripts/monitor.py --output results.json
1. Update the API URL in .github/workflows/synthetic-monitoring.yml:

   echo "API_URL=https://scriptures-fast-api.onrender.com" >> $GITHUB_ENV

2. Enable GitHub Actions in your repository settings
3. Monitor the workflow:
- Go to Actions tab in GitHub
- Check "Synthetic Monitoring" workflow
- Runs every 15 minutes automatically
- Problem: Render free tier scales down after inactivity
- Impact: First request can take 30+ seconds
- Solution: Proactive warm-up strategy
- Initial Request: Triggers cold start
- 5-minute Wait: Allows full warm-up
- Testing: All endpoints tested
- Validation: Response validation
- Metrics: Performance data collection
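A minimal sketch of how a probe result could be classified against the 30-second cold-start rule used in this guide; `classify_response` is a hypothetical helper, not code from the repository:

```python
COLD_START_THRESHOLD_S = 30.0  # response times above this indicate a cold start

def classify_response(status_code: int, elapsed_s: float) -> str:
    """Label one probe for the metrics report."""
    if status_code != 200:
        return "failure"
    if elapsed_s > COLD_START_THRESHOLD_S:
        return "cold_start"
    return "ok"
```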
# In .github/workflows/synthetic-monitoring.yml
sleep 300 # 5 minutes
# Change to: sleep 180  # 3 minutes

✅ Health Check Endpoint
- Location: /health endpoint in FastAPI
- Returns database status and system health
- Used by GitHub Actions for monitoring
✅ GitHub Actions Synthetic Monitoring
- Automated testing every 15 minutes
- Tests critical API endpoints
- Validates response times and content
- Handles cold start scenarios
✅ Manual Monitoring Script
- scripts/monitor.py for local testing
- Configurable warm-up periods
- JSON output for analysis
- Production endpoint testing
- API Response Times - For health, search, and random endpoints
- HTTP Status Codes - Success/failure rates
- Cold Start Detection - Response times > 30 seconds
- Endpoint Availability - Core functionality validation
- Response Time Metrics - Detailed timing for each endpoint
- Database Connectivity - Health check validation
- Performance Trends - JSON output for analysis
- Production vs Local - Comparative testing
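For JSON output, each probe can be serialized from a small record like the following (an illustrative shape; the real script's schema may differ):

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ProbeResult:
    # Illustrative record; the actual scripts/monitor.py schema may differ.
    endpoint: str
    status_code: int
    response_time_s: float
    cold_start: bool

result = ProbeResult(endpoint="health", status_code=200,
                     response_time_s=1.42, cold_start=False)
print(json.dumps(asdict(result)))
```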
Application Performance Monitoring (APM)
- Consider New Relic, DataDog, or Sentry for detailed performance insights
- Add custom metrics for scripture-specific events (searches, random clicks)
- Implement user analytics for popular books/chapters
Advanced Alerting
- Email/Slack notifications for service outages
- Performance threshold alerts
- Database connectivity monitoring
Business Intelligence
- Usage pattern analysis
- Popular scripture tracking
- Search term analytics
- Health Check: < 2 seconds when warm (cold starts tracked separately, below)
- Scripture Search: < 5 seconds
- Random Scripture: < 3 seconds
- Cold Start Recovery: < 30 seconds
- API Availability: > 99% (monitored via GitHub Actions)
- Critical Endpoints: Health, search, random scripture
- Acceptable Downtime: Render free tier limitations acknowledged
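These targets can be encoded as per-endpoint budgets so a monitoring script can flag violations (a sketch; the thresholds are copied from the targets above):

```python
# Response-time budgets in seconds, taken from the performance targets above.
BUDGETS_S = {"health": 2.0, "search": 5.0, "random": 3.0}

def within_budget(endpoint: str, elapsed_s: float) -> bool:
    # Unknown endpoints fall back to the 30-second cold-start recovery limit.
    return elapsed_s <= BUDGETS_S.get(endpoint, 30.0)
```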
When synthetic monitoring fails, GitHub provides:
- Email notifications to repository maintainers
- GitHub UI indicators on the Actions tab
- Workflow failure badges in repository
Currently relies on:
- Manual script execution for detailed diagnostics
- GitHub Actions history for trend analysis
- Render service logs for infrastructure issues
# Example: Add Slack/Discord notifications (not implemented)
- name: Notify on failure
if: failure()
run: |
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"Scripture App monitoring failed!"}' \
$SLACK_WEBHOOK_URL

- Daily active users
- Total scripture reads
- System uptime
- User satisfaction metrics
- Revenue impact (if applicable)
- API performance metrics
- Database health
- Frontend performance
- Error rates and trends
- Infrastructure costs
- Most read scriptures
- Search patterns
- Feature usage
- User journey flows
- Geographic usage patterns
-- Average response time by endpoint
SELECT endpoint, AVG(response_time)
FROM monitoring_metrics
GROUP BY endpoint
-- Cold start detection
SELECT COUNT(*)
FROM monitoring_metrics
WHERE response_time > 30
-- Popular scripture tracking
SELECT book, chapter, COUNT(*) as reads
FROM scripture_access_log
GROUP BY book, chapter
ORDER BY reads DESC

# Add to FastAPI for request tracing
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
tracer = trace.get_tracer(__name__)
@app.get("/api/scriptures/search")
async def search_scriptures(q: str):
    with tracer.start_as_current_span("search_scriptures") as span:
        span.set_attribute("query", q)
        # Search logic here
        return results

# Structured logging
import structlog
logger = structlog.get_logger()
from fastapi import Request

@app.get("/api/scriptures/random")
async def random_scripture(request: Request):
    logger.info(
        "random_scripture_requested",
        user_agent=request.headers.get("user-agent"),
        ip=request.client.host,
    )

@app.get("/metrics")
async def custom_metrics():
    return {
        "scripture_reads_today": get_daily_reads(),
        "popular_books": get_popular_books(),
        "search_queries_count": get_search_count(),
        "user_sessions_active": get_active_sessions(),
    }

- ✅ Health check endpoint (/health)
- ✅ GitHub Actions synthetic monitoring (15-minute intervals)
- ✅ Local monitoring script (scripts/monitor.py)
- ✅ Cold start detection and warm-up strategy
- ✅ Basic performance tracking (response times, status codes)
- ✅ Production endpoint testing
- 🔮 Error tracking with Sentry or similar
- 🔮 Real-time dashboards with Grafana or New Relic
- 🔮 Custom business metrics (user behavior, popular scriptures)
- 🔮 Advanced alerting (Slack/Discord notifications)
- 🔮 Log aggregation for better debugging
- 🔮 Performance budgets and SLO tracking
1. Cold Start Timeouts:
- Increase timeout values in monitoring
- Extend warm-up period
- Consider upgrading to paid hosting tier
- Implement keep-alive strategies
2. GitHub Actions Failures:
- Check API URL is correct
- Verify endpoint availability
- Review workflow logs
- Ensure proper authentication
3. Local Script Issues:
- Install requests: pip install requests
- Check backend is running
- Verify URL accessibility
- Review firewall settings
# Test health endpoint manually
curl -v http://localhost:8000/health
# Check GitHub Actions logs
# Go to Actions tab in GitHub repository
# Test monitoring script with verbose output
python scripts/monitor.py --url http://localhost:8000 --verbose
# Check database connectivity
cd backend && python -c "from app.services.database import get_database_connection; print(get_database_connection())"

- API endpoint response times (health, search, random)
- Basic availability (HTTP status codes)
- Cold start detection (response times > 30s)
- Database connectivity (via health check)
- Production vs development comparison
- Detailed error rates by endpoint
- Memory and CPU usage tracking
- User interaction metrics and analytics
- Frontend performance (Core Web Vitals)
- Business metrics (search patterns, popular books)
- Security event tracking
- Database query performance
- Google SRE Book - SRE best practices
- Sentry Documentation - Error tracking
- New Relic APM - Application monitoring
Last Updated: January 2025
Maintained By: DevOps Team
Review Schedule: Monthly
Next Review: February 2025
Quick Links: Documentation Index | Developer Guide | API Standards | Deployment Guide