Kshitij-KS/Project-Oversight


Post-Deployment Sentry 🛡️

Real-time deployment monitoring with AI-powered risk analysis

A streaming system that monitors GitHub Actions deployments, detects post-deployment risks using live metrics and logs, and generates AI-powered explanations using RAG (Retrieval-Augmented Generation) with Google Gemini.

Key Features

  • Real-time GitHub Webhook Ingestion - Receives deployment events from GitHub Actions
  • Pathway Streaming Pipeline - Processes events in a live, incremental dataflow
  • Live Metrics Integration - Queries Prometheus for error rates and latency
  • Log Analysis - Fetches recent logs from Loki for context
  • Risk Score Computation - Calculates deployment risk in real-time
  • RAG + LLM Explanations - Uses ChromaDB + Gemini to generate contextual explanations
  • Intelligent Alerts - Sends enriched Slack notifications with AI insights
  • Full Observability - Prometheus, Grafana, Loki dashboard
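The "Live Metrics Integration" feature can be illustrated with Prometheus' standard instant-query HTTP API (`/api/v1/query`). This is a minimal sketch, not the project's actual code: the helper names `first_sample_value` and `query_prometheus` are hypothetical, and the base URL is taken from the Observability URLs listed in this README.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable at the URL listed in this README.
PROMETHEUS_URL = "http://localhost:9090"


def first_sample_value(response: dict, default: float = 0.0) -> float:
    """Return the first sample of an instant-query response as a float.

    Falls back to `default` when the query failed or matched no series.
    """
    if response.get("status") != "success":
        return default
    result = response.get("data", {}).get("result", [])
    if not result:
        return default
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "string"]}.
    return float(result[0]["value"][1])


def query_prometheus(promql: str, base_url: str = PROMETHEUS_URL) -> float:
    """Run an instant query against the Prometheus HTTP API (hypothetical helper)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return first_sample_value(json.load(resp))
```

The PromQL expression passed to `query_prometheus` depends on the metric names the monitored app exports, which this README does not list, so any concrete expression would be an assumption.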

Architecture

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  GitHub Actions  │────▶│  FastAPI Webhook │────▶│   JSONL File     │
└──────────────────┘     └──────────────────┘     └────────┬─────────┘
                                                           │
                         ┌─────────────────────────────────▼──────────────────────────────────┐
                         │                    PATHWAY STREAMING PIPELINE                      │
                         │  ┌──────────────────────────────────────────────────────────────┐  │
                         │  │  UDF 1: Parse Event + Fetch Prometheus Metrics + Loki Logs   │  │
                         │  └──────────────────────────────┬───────────────────────────────┘  │
                         │                                 ▼                                  │
                         │  ┌──────────────────────────────────────────────────────────────┐  │
                         │  │  UDF 2: Compute Risk Score                                   │  │
                         │  └──────────────────────────────┬───────────────────────────────┘  │
                         │                                 ▼                                  │
                         │  ┌──────────────────────────────────────────────────────────────┐  │
                         │  │  UDF 3: RAG + Gemini LLM Explanation (high-risk only)        │  │
                         │  │         - ChromaDB retrieves relevant docs                   │  │
                         │  │         - Gemini generates contextual explanation            │  │
                         │  └──────────────────────────────┬───────────────────────────────┘  │
                         │                                 ▼                                  │
                         │  ┌──────────────────────────────────────────────────────────────┐  │
                         │  │  OUTPUT: Alert with AI Explanation                           │  │
                         │  └──────────────────────────────────────────────────────────────┘  │
                         └────────────────────────────────────────────────────────────────────┘
                                                           │
                    ┌──────────────────────────────────────┼──────────────────────────────────┐
                    ▼                                      ▼                                  ▼
           ┌──────────────┐                       ┌──────────────┐                   ┌──────────────┐
           │    Slack     │                       │  PostgreSQL  │                   │  Dashboard   │
           │    Alert     │                       │  (Incidents) │                   │   (React)    │
           └──────────────┘                       └──────────────┘                   └──────────────┘
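The "Compute Risk Score" stage and the "high-risk only" gate in front of the LLM step can be sketched as follows. The README does not document the real formula, so the weights, the saturation points, and the threshold here are invented purely for illustration, and the result is not meant to reproduce the scores the project emits.

```python
from dataclasses import dataclass

# Hypothetical cut-off above which the RAG + LLM explanation would be requested.
HIGH_RISK_THRESHOLD = 70.0


@dataclass
class DeploymentSignal:
    status: str        # workflow conclusion, e.g. "success" or "failure"
    error_rate: float  # fraction of failed requests, 0.0 to 1.0
    latency_ms: float  # observed latency in milliseconds


def compute_risk_score(signal: DeploymentSignal) -> float:
    """Combine status, error rate and latency into a 0-100 score (illustrative weights)."""
    status_points = 40.0 if signal.status == "failure" else 0.0
    # Error rate saturates at 10% and contributes up to 40 points.
    error_points = min(signal.error_rate / 0.10, 1.0) * 40.0
    # Latency saturates at 1000 ms and contributes up to 20 points.
    latency_points = min(signal.latency_ms / 1000.0, 1.0) * 20.0
    return round(min(status_points + error_points + latency_points, 100.0), 1)


def needs_explanation(score: float) -> bool:
    """Only high-risk deployments are passed on to the (costlier) LLM step."""
    return score >= HIGH_RISK_THRESHOLD
```

Gating the explanation step on the score keeps LLM calls off the common path where deployments succeed without anomalies.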

Quick Start

Prerequisites

1. Clone and Configure

git clone https://github.com/yourname/post-deploy-sentry.git
cd post-deploy-sentry

# Copy and edit environment
cp .env.example .env
# Add your GOOGLE_API_KEY to .env

2. Start All Services

docker-compose up -d

3. View Logs

# Watch the Pathway stream
docker logs -f post-deploy-sentry-pathway

4. Trigger a Test Event

# Option A: Send webhook to API
curl -X POST http://localhost:8000/webhook/github \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: workflow_run" \
  -d @data/sample_webhook.json

# Option B: Append directly to JSONL (Pathway watches this)
cat data/sample_webhook.json >> data/deploy_events.jsonl
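One caveat with Option B: a JSONL file expects exactly one JSON object per line, so appending with `cat` only works if `data/sample_webhook.json` is already a single line ending in a newline. If the sample file is pretty-printed, a small helper along these lines could compact it first. This is a sketch; `to_jsonl_line` and `append_event` are hypothetical names, not part of the repository.

```python
import json
from pathlib import Path


def to_jsonl_line(json_text: str) -> str:
    """Re-serialize a JSON document as one compact line terminated by a newline."""
    payload = json.loads(json_text)
    return json.dumps(payload, separators=(",", ":")) + "\n"


def append_event(sample_path: str, events_path: str) -> None:
    """Append the sample event to the JSONL file as a single line."""
    line = to_jsonl_line(Path(sample_path).read_text(encoding="utf-8"))
    with Path(events_path).open("a", encoding="utf-8") as f:
        f.write(line)


# Example call, using the paths from the commands above:
# append_event("data/sample_webhook.json", "data/deploy_events.jsonl")
```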

5. Watch the Magic!

======================================================================
DEPLOYMENT RISK ALERT WITH AI EXPLANATION
======================================================================
🔴 *HIGH RISK DEPLOYMENT DETECTED*

*Repository:* `myorg/production-api`
*Commit:* `abc123d`
*Status:* failure
*Risk Score:* *72.5* / 100
*Error Rate:* 8.50%
*Latency:* 450ms

---
*AI Analysis:*
**Likely Issue:** The deployment failed with elevated error rates, suggesting
the new code may have introduced breaking changes.

**Investigate First:**
- Check GitHub Actions logs for the failure reason
- Review error logs in Loki for specific exceptions
- Compare metrics before/after deployment

**Recommended Action:** Roll back to the previous version immediately while
investigating the root cause.
======================================================================
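The alert text above uses Slack's `*bold*` and backtick markup. A message in that shape could be assembled as in the sketch below, assuming delivery through a Slack incoming webhook, which accepts a JSON body with a `text` field. The function name and its parameters are hypothetical and only mirror the fields shown in the sample output.

```python
import json


def build_slack_payload(repo: str, commit: str, status: str, risk_score: float,
                        error_rate: float, latency_ms: float, analysis: str) -> str:
    """Build the JSON body for a Slack incoming webhook (illustrative layout)."""
    text = "\n".join([
        "🔴 *HIGH RISK DEPLOYMENT DETECTED*",
        "",
        f"*Repository:* `{repo}`",
        f"*Commit:* `{commit}`",
        f"*Status:* {status}",
        f"*Risk Score:* *{risk_score:.1f}* / 100",
        f"*Error Rate:* {error_rate:.2%}",  # 0.085 is rendered as 8.50%
        f"*Latency:* {latency_ms:.0f}ms",
        "",
        "---",
        "*AI Analysis:*",
        analysis,
    ])
    return json.dumps({"text": text})
```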

Project Structure

post-deploy-sentry/
├── backend/                  # FastAPI application
│   └── app/
│       ├── main.py          # Webhook endpoint
│       ├── logs.py          # Loki integration
│       ├── projects.py      # External project monitoring
│       └── insights.py      # Real-time insights
│
├── worker/                   # Background workers
│   ├── pathway_stream.py    # 🌟 MAIN: Pathway pipeline with RAG
│   ├── rag/                 # RAG module
│   │   ├── vector_store.py  # ChromaDB integration
│   │   ├── document_loader.py # Sample docs loader
│   │   └── llm_explainer.py # Gemini LLM integration
│   ├── worker.py            # Redis-based worker (legacy)
│   └── risk_score.py        # Risk computation
│
├── RagMod/frontend/         # React dashboard
│   └── src/
│       ├── pages/
│       │   ├── Dashboard.jsx
│       │   ├── LiveLogs.jsx
│       │   ├── ProjectSetup.jsx
│       │   └── ProjectInsights.jsx
│
├── sample_app/              # Demo app for generating metrics
├── config/                  # Prometheus, Loki, Promtail configs
├── data/                    # JSONL events (Pathway watches this)
└── docker-compose.yml       # Full stack orchestration

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /webhook/github | POST | Receive GitHub workflow events |
| /health | GET | Service health check |
| /logs/live | GET | Live logs from Loki |
| /projects | GET/POST | Manage monitored projects |
| /projects/{id}/insights | GET | Real-time project insights |
| /ingest/{token} | POST | External log ingestion webhook |
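As an illustration of what `/webhook/github` has to do with an incoming request, the sketch below extracts the repository, short commit SHA and conclusion from a GitHub `workflow_run` payload. The field names (`action`, `workflow_run.head_sha`, `workflow_run.conclusion`, `repository.full_name`) come from GitHub's webhook payload format. The function itself, and the choice to ignore everything except completed runs, are assumptions rather than the project's confirmed behavior.

```python
from typing import Optional


def parse_workflow_run(event_type: str, payload: dict) -> Optional[dict]:
    """Reduce a GitHub webhook delivery to the fields the pipeline needs.

    `event_type` is the value of the X-GitHub-Event header. Returns None for
    deliveries that are not completed workflow runs (assumed filter).
    """
    if event_type != "workflow_run" or payload.get("action") != "completed":
        return None
    run = payload.get("workflow_run", {})
    return {
        "repository": payload.get("repository", {}).get("full_name"),
        "commit": (run.get("head_sha") or "")[:7],  # short SHA, as in the sample alert
        "status": run.get("conclusion"),            # e.g. "success" or "failure"
    }
```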

Demo Script

# 1. Start the system
docker-compose up -d

# 2. Wait for services (30 seconds)
sleep 30

# 3. Generate some traffic to create metrics
curl http://localhost:8001/slow
curl http://localhost:8001/error
curl http://localhost:8001/error

# 4. Trigger a deployment event
curl -X POST http://localhost:8000/webhook/github \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: workflow_run" \
  -d @data/sample_webhook.json

# 5. Watch Pathway process and generate AI explanation
docker logs -f post-deploy-sentry-pathway

# 6. Open dashboard
# Visit http://localhost:5173
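The fixed `sleep 30` in step 2 can be too short on a slow machine and needlessly long on a fast one. An alternative is to poll the `/health` endpoint listed under API Endpoints until it responds. The sketch below assumes that endpoint returns HTTP 200 once the backend is ready; the helper names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://localhost:8000/health") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until(check: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `check` repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example: block until the backend is up, or give up after a minute.
# ready = wait_until(is_healthy)
```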

📊 Observability URLs

| Service | URL |
|---|---|
| FastAPI Backend | http://localhost:8000 |
| React Dashboard | http://localhost:5173 |
| Sample App | http://localhost:8001 |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
| Loki (query) | http://localhost:3100 |