Real-time deployment monitoring with AI-powered risk analysis
A streaming system that monitors GitHub Actions deployments, detects post-deployment risks using live metrics and logs, and generates AI-powered explanations using RAG (Retrieval-Augmented Generation) with Google Gemini.
- Real-time GitHub Webhook Ingestion - Receives deployment events from GitHub Actions
- Pathway Streaming Pipeline - Processes events in a live, incremental dataflow
- Live Metrics Integration - Queries Prometheus for error rates and latency
- Log Analysis - Fetches recent logs from Loki for context
- Risk Score Computation - Calculates deployment risk in real-time
- RAG + LLM Explanations - Uses ChromaDB + Gemini to generate contextual explanations
- Intelligent Alerts - Sends enriched Slack notifications with AI insights
- Full Observability - Prometheus, Grafana, Loki dashboard
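
To make the RAG step concrete, here is a minimal retrieve-then-prompt sketch. It is not the project's code: a naive keyword-overlap ranking stands in for the ChromaDB similarity search, the runbook snippets are invented, and `retrieve` and `build_prompt` are hypothetical names. The returned prompt is the kind of text that would be handed to Gemini.

```python
import re

# Invented runbook snippets; the real system would store documents in ChromaDB.
DOCS = [
    "High error rate after deploy: check recent exceptions in Loki and consider rollback.",
    "Latency regression: compare p95 latency before and after the deployment.",
    "Database migrations: verify schema changes were applied before traffic shifts.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k docs sharing the most words with the query.

    Toy stand-in for a vector similarity search.
    """
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(event_summary: str, context: list[str]) -> str:
    """Combine retrieved context and the event summary into one LLM prompt."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "You are a deployment risk assistant.\n"
        f"Relevant documentation:\n{context_block}\n\n"
        f"Deployment event:\n{event_summary}\n\n"
        "Explain the likely issue and what to investigate first."
    )


summary = "error rate spiked after deploy"
prompt = build_prompt(summary, retrieve(summary, DOCS, k=1))
```

Keeping retrieval and prompt assembly as separate functions makes it easy to swap the toy ranking for a real vector store without touching the prompt template.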
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  GitHub Actions  │────▶│ FastAPI Webhook  │────▶│    JSONL File    │
└──────────────────┘     └──────────────────┘     └────────┬─────────┘
                                                           │
┌──────────────────────────────────────────────────────────▼─────────┐
│                     PATHWAY STREAMING PIPELINE                     │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ UDF 1: Parse Event + Fetch Prometheus Metrics + Loki Logs      │ │
│ └───────────────────────────────┬────────────────────────────────┘ │
│                                 ▼                                  │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ UDF 2: Compute Risk Score                                      │ │
│ └───────────────────────────────┬────────────────────────────────┘ │
│                                 ▼                                  │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ UDF 3: RAG + Gemini LLM Explanation (high-risk only)           │ │
│ │        - ChromaDB retrieves relevant docs                      │ │
│ │        - Gemini generates contextual explanation               │ │
│ └───────────────────────────────┬────────────────────────────────┘ │
│                                 ▼                                  │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT: Alert with AI Explanation                              │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────┬──────────────────────────────────┘
                                  │
           ┌──────────────────────┼──────────────────────┐
           ▼                      ▼                      ▼
    ┌──────────────┐       ┌──────────────┐       ┌──────────────┐
    │    Slack     │       │  PostgreSQL  │       │  Dashboard   │
    │    Alert     │       │ (Incidents)  │       │   (React)    │
    └──────────────┘       └──────────────┘       └──────────────┘
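
The three pipeline stages can be sketched as plain Python functions. This is only an illustration of the flow (parse, score, explain when high-risk), not the Pathway UDFs themselves. The weights, the saturation points, the `HIGH_RISK_THRESHOLD` value, and all function names are assumptions, so the scores will not match the ones the project produces. The payload fields (`workflow_run.conclusion`, `workflow_run.head_sha`, `repository.full_name`) follow GitHub's `workflow_run` webhook event.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HIGH_RISK_THRESHOLD = 70.0  # assumed cutoff, not taken from the project


@dataclass
class DeployEvent:
    repo: str
    commit: str
    status: str
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    latency_ms: float


def parse_event(payload: dict, error_rate: float, latency_ms: float) -> DeployEvent:
    """Stage 1: extract the needed fields and attach the fetched metrics."""
    run = payload.get("workflow_run", {})
    return DeployEvent(
        repo=payload.get("repository", {}).get("full_name", "unknown"),
        commit=(run.get("head_sha") or "")[:7],
        status=run.get("conclusion") or "unknown",
        error_rate=error_rate,
        latency_ms=latency_ms,
    )


def compute_risk(event: DeployEvent) -> float:
    """Stage 2: illustrative weighted score in the range 0 to 100."""
    status_part = 40.0 if event.status == "failure" else 0.0
    error_part = min(event.error_rate / 0.10, 1.0) * 40.0      # saturates at 10% errors
    latency_part = min(event.latency_ms / 1000.0, 1.0) * 20.0  # saturates at 1 second
    return round(status_part + error_part + latency_part, 1)


def explain_if_high_risk(
    event: DeployEvent,
    score: float,
    explain: Callable[[DeployEvent, float], str],
) -> Optional[str]:
    """Stage 3: call the costly explainer only for high-risk events."""
    if score < HIGH_RISK_THRESHOLD:
        return None
    return explain(event, score)
```

Gating the explainer on the score keeps LLM calls, and their latency and cost, limited to the deployments that actually need attention.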
Prerequisites:
- Docker and Docker Compose
- A Gemini API key (free from aistudio.google.com)
git clone https://github.com/yourname/post-deploy-sentry.git
cd post-deploy-sentry
# Copy and edit environment
cp .env.example .env
# Add your GOOGLE_API_KEY to .env
docker-compose up -d
# Watch the Pathway stream
docker logs -f post-deploy-sentry-pathway
# Option A: Send webhook to API
curl -X POST http://localhost:8000/webhook/github \
-H "Content-Type: application/json" \
-H "X-GitHub-Event: workflow_run" \
-d @data/sample_webhook.json
# Option B: Append directly to JSONL (Pathway watches this)
cat data/sample_webhook.json >> data/deploy_events.jsonl

======================================================================
DEPLOYMENT RISK ALERT WITH AI EXPLANATION
======================================================================
🔴 *HIGH RISK DEPLOYMENT DETECTED*
*Repository:* `myorg/production-api`
*Commit:* `abc123d`
*Status:* failure
*Risk Score:* *72.5* / 100
*Error Rate:* 8.50%
*Latency:* 450ms
---
*AI Analysis:*
**Likely Issue:** The deployment failed with elevated error rates, suggesting
the new code may have introduced breaking changes.
**Investigate First:**
- Check GitHub Actions logs for the failure reason
- Review error logs in Loki for specific exceptions
- Compare metrics before/after deployment
**Recommended Action:** Roll back to the previous version immediately while
investigating the root cause.
======================================================================
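
An alert like the one above can be assembled from a handful of fields. The sketch below is an assumption about how such a message could be built, not the project's formatter; `format_alert` and `slack_payload` are hypothetical names. It relies only on the fact that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json

RED_CIRCLE = "\U0001F534"  # the red circle emoji used in the alert header


def format_alert(
    repo: str,
    commit: str,
    status: str,
    risk_score: float,
    error_rate: float,  # fraction, e.g. 0.085 for 8.5%
    latency_ms: float,
    analysis: str,
) -> str:
    """Render the alert using Slack's *bold* and `code` markup."""
    lines = [
        f"{RED_CIRCLE} *HIGH RISK DEPLOYMENT DETECTED*",
        f"*Repository:* `{repo}`",
        f"*Commit:* `{commit}`",
        f"*Status:* {status}",
        f"*Risk Score:* *{risk_score:.1f}* / 100",
        f"*Error Rate:* {error_rate * 100:.2f}%",
        f"*Latency:* {latency_ms:.0f}ms",
        "---",
        "*AI Analysis:*",
        analysis,
    ]
    return "\n".join(lines)


def slack_payload(text: str) -> bytes:
    """Encode the message as the JSON body of an incoming-webhook POST."""
    return json.dumps({"text": text}).encode("utf-8")
```

The bytes returned by `slack_payload` would be sent with a `Content-Type: application/json` POST to the configured Slack webhook URL.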
post-deploy-sentry/
├── backend/                    # FastAPI application
│   └── app/
│       ├── main.py             # Webhook endpoint
│       ├── logs.py             # Loki integration
│       ├── projects.py         # External project monitoring
│       └── insights.py         # Real-time insights
│
├── worker/                     # Background workers
│   ├── pathway_stream.py       # MAIN: Pathway pipeline with RAG
│   ├── rag/                    # RAG module
│   │   ├── vector_store.py     # ChromaDB integration
│   │   ├── document_loader.py  # Sample docs loader
│   │   └── llm_explainer.py    # Gemini LLM integration
│   ├── worker.py               # Redis-based worker (legacy)
│   └── risk_score.py           # Risk computation
│
├── RagMod/frontend/            # React dashboard
│   └── src/
│       └── pages/
│           ├── Dashboard.jsx
│           ├── LiveLogs.jsx
│           ├── ProjectSetup.jsx
│           └── ProjectInsights.jsx
│
├── sample_app/                 # Demo app for generating metrics
├── config/                     # Prometheus, Loki, Promtail configs
├── data/                       # JSONL events (Pathway watches this)
└── docker-compose.yml          # Full stack orchestration
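
Because the pipeline watches a JSONL file in `data/`, every event must occupy exactly one line. Appending a pretty-printed JSON file with `cat` would spread one event over several lines, so a small helper that serializes compactly can be safer. The sketch below is an assumption about how events could be written, with hypothetical function names; it is not the project's webhook handler.

```python
import json
from pathlib import Path


def append_event(event: dict, path: Path) -> None:
    """Append one event as a single compact JSON line."""
    line = json.dumps(event, separators=(",", ":"))
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def read_events(path: Path) -> list[dict]:
    """Read all events back, skipping blank lines."""
    text = path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, `append_event(json.loads(Path("data/sample_webhook.json").read_text()), Path("data/deploy_events.jsonl"))` would add the sample event as one line regardless of how the sample file is formatted.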
| Endpoint | Method | Description |
|---|---|---|
| `/webhook/github` | POST | Receive GitHub workflow events |
| `/health` | GET | Service health check |
| `/logs/live` | GET | Live logs from Loki |
| `/projects` | GET/POST | Manage monitored projects |
| `/projects/{id}/insights` | GET | Real-time project insights |
| `/ingest/{token}` | POST | External log ingestion webhook |
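
The `/webhook/github` endpoint can also be called from Python instead of `curl`. This is a minimal sketch using only the standard library; `build_webhook_request` is a hypothetical helper, and the base URL assumes the backend is running locally on port 8000.

```python
import json
import urllib.request


def build_webhook_request(
    payload: dict,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/webhook/github",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-GitHub-Event": "workflow_run",
        },
        method="POST",
    )


# To actually send it (requires the backend to be running):
#   req = build_webhook_request({"action": "completed"})
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(resp.status)
```

Building the request separately from sending it keeps the helper testable without a running server.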
# 1. Start the system
docker-compose up -d
# 2. Wait for services (30 seconds)
sleep 30
# 3. Generate some traffic to create metrics
curl http://localhost:8001/slow
curl http://localhost:8001/error
curl http://localhost:8001/error
# 4. Trigger a deployment event
curl -X POST http://localhost:8000/webhook/github \
-H "Content-Type: application/json" \
-H "X-GitHub-Event: workflow_run" \
-d @data/sample_webhook.json
# 5. Watch Pathway process and generate AI explanation
docker logs -f post-deploy-sentry-pathway
# 6. Open dashboard
# Visit http://localhost:5173

| Service | URL |
|---|---|
| FastAPI Backend | http://localhost:8000 |
| React Dashboard | http://localhost:5173 |
| Sample App | http://localhost:8001 |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Loki (query) | http://localhost:3100 |
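
Once the stack is up, the Prometheus instance listed above can be queried through its HTTP API at `/api/v1/query`. The sketch below shows one way to fetch a single metric value with the standard library. The function names are hypothetical, and any PromQL expression you pass depends on the metric names your application exports, which this README does not specify.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def first_value(response: dict, default: float = 0.0) -> float:
    """Extract the first sample from an instant-vector response."""
    if response.get("status") != "success":
        return default
    result = response.get("data", {}).get("result", [])
    if not result:
        return default
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def query_prometheus(
    promql: str,
    base_url: str = PROMETHEUS_URL,
    timeout: float = 5.0,
) -> float:
    """Run an instant query and return the first value (needs a live server)."""
    with urllib.request.urlopen(build_query_url(promql, base_url), timeout=timeout) as resp:
        return first_value(json.load(resp))
```

Returning a default when the result is empty avoids failing on a freshly started stack that has not scraped any samples yet.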