
jasstt/failure-forensics


Failure Forensics

Production AI pipeline monitoring: root cause detection, anomaly alerts, a regression guard, and Gemini-powered recommendations.

Installation

pip install failure-forensics

Quick Start

from failure_forensics import trace

@trace(step="retrieval", version="v1")
def my_retrieval_function(query):
    # your code here
    pass
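The Quick Start doesn't show what `@trace` actually records. As a rough mental model only, here is a hypothetical `trace_sketch` decorator that times each call and appends one JSON line per step to a log file. The decorator name, the `log_path` parameter, and the record fields (`step`, `version`, `ts`, `status`, `error`, `latency_ms`) are assumptions, not the package's real schema.

```python
import functools
import json
import time
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical default path, chosen to match data/logs/requests.jsonl in this repo.
LOG_PATH = Path("data/logs/requests.jsonl")


def trace_sketch(step, version, log_path=LOG_PATH):
    """Hypothetical stand-in for @trace: appends one JSONL record per call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {
                "step": step,
                "version": version,
                "ts": datetime.now(timezone.utc).isoformat(),
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = "error"
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise  # let the pipeline see the original failure
            finally:
                # Runs on success and on failure, so every call is logged.
                record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
                log_path.parent.mkdir(parents=True, exist_ok=True)
                with log_path.open("a", encoding="utf-8") as f:
                    f.write(json.dumps(record) + "\n")
        return wrapper
    return decorator
```

Re-raising inside the wrapper keeps tracing transparent: the pipeline behaves exactly as before, and the `finally` clause guarantees a log line for both outcomes.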

Features 🔬


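A self-hosted, zero-cost LLM pipeline observability tool that gives you root cause detection, anomaly alerts, A/B reporting, and a live terminal dashboard. The core pipeline runs entirely on your machine and never sends your data to a third-party service; only the optional Gemini-powered analysis layers call an external API.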


🆚 Why Not LangSmith or Braintrust?

                         Failure Forensics      LangSmith      Braintrust
Cost                     Free                   Paid tiers     Paid tiers
Data privacy             Stays on your machine  Sent to cloud  Sent to cloud
Customization            Full control           Limited        Limited
Slack alerts             Built-in               Premium only   Premium only
A/B reporting            Built-in               Basic          Basic
Circuit breaker / trend  Built-in               ❌              ❌

Failure Forensics is designed for teams who need production-grade observability without vendor lock-in.


✨ What It Does

Every pipeline run passes through a structured logging and analysis layer:

Pipeline Step  →  logger.py  →  requests.jsonl
                                     ↓
                    ┌────────────────┴────────────────┐
                    │                                 │
              forensics.py                       pattern.py
          (root cause detection)          (time series + anomaly)
                    │                                 │
              versioning.py                      baseline.py
           (v1 vs v2 comparison)            (7-day moving average)
                    │                                 │
               ab_report.py                      alerts.py
            (A/B comparison table)          (Slack / console alert)
                    └────────────────┬────────────────┘
                                     ↓
                              dashboard.py
                         (ASCII terminal dashboard)
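To make the first analysis stage concrete (roughly what `pattern.py` is described as doing), here is a sketch that turns JSONL step records into a per-day failure rate. It assumes a hypothetical record shape with `run_id`, an ISO-8601 `ts`, and `status` set to `"error"` on failure, and counts a run as failed if any of its steps errored; the real log schema may differ.

```python
import json
from collections import defaultdict


def daily_failure_rates(lines):
    """Return {YYYY-MM-DD: failure_rate}; a run fails if any of its steps errored.

    Assumed (hypothetical) record shape:
    {"run_id": ..., "ts": "2026-06-03T10:00:00", "status": "ok" | "error"}
    """
    run_day = {}
    run_failed = defaultdict(bool)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the log
        rec = json.loads(line)
        run_id = rec["run_id"]
        run_day.setdefault(run_id, rec["ts"][:10])  # date of the run's first step
        run_failed[run_id] |= rec["status"] == "error"
    totals, failures = defaultdict(int), defaultdict(int)
    for run_id, day in run_day.items():
        totals[day] += 1
        failures[day] += run_failed[run_id]  # bool counts as 0 or 1
    return {day: failures[day] / totals[day] for day in sorted(totals)}
```

In practice you would pass an open file handle such as `open("data/logs/requests.jsonl")` as `lines`.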

📁 Project Structure

failure-forensics/
├── src/
│   ├── logger.py          # Logs every pipeline step to JSONL
│   ├── forensics.py       # Root cause detection (5 categories)
│   ├── pattern.py         # Time-series failure rate + anomaly detection
│   ├── baseline.py        # 7-day moving average + trend (IMPROVING/STABLE/DEGRADING)
│   ├── alerts.py          # Slack webhook + console alerts
│   ├── versioning.py      # Per-version failure rate stats
│   ├── ab_report.py       # A/B comparison report (table + JSON)
│   └── dashboard.py       # ASCII bar chart terminal dashboard
├── data/
│   └── logs/
│       └── requests.jsonl # All pipeline logs (gitignored)
├── tests/
│   └── test_forensics.py  # 8 unit tests
├── config.py              # Thresholds, Slack URL, step limits
├── main.py                # 5-scenario demo runner
├── simulate.py            # Realistic test data generator (100 runs, anomaly day)
└── requirements.txt

🚀 Getting Started

1. Clone & Install

git clone https://github.com/jasstt/failure-forensics.git
cd failure-forensics
pip install -r requirements.txt

2. (Optional) Configure Slack Alerts

Edit config.py:

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

If left empty, all alerts print to the console.
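A minimal sketch of the "Slack if configured, otherwise console" behavior. It assumes Slack's incoming-webhook JSON payload with a `text` field and uses the standard library's `urllib.request` so it runs without extra packages (the project itself lists `requests` for this). The `send_alert` name and its return values are illustrative, not the `alerts.py` API.

```python
import json
import urllib.request


def send_alert(message, webhook_url=""):
    """Hypothetical helper: post to Slack when a webhook is set, else print."""
    if not webhook_url:
        # Mirrors config.py: an empty SLACK_WEBHOOK_URL means console output.
        print(f"[ALERT] {message}")
        return "console"
    body = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()
    return "slack"
```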

3. Run the Full Demo

python main.py

This runs 5 scenarios:

  1. Simulation: generates 100 realistic pipeline runs (2 prompt versions, anomaly day)
  2. Root cause analysis: detects the failing step and assigns a category
  3. 7-day pattern report: failure rate per day + step breakdown + anomaly check
  4. A/B report: prompt_v1 vs prompt_v2 with a per-step improvement table
  5. Terminal dashboard: live ASCII bar charts, trend, top 5 failed runs
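The A/B scenario comes down to comparing per-version failure rates and reporting the difference in percentage points (pp). A sketch, assuming one hypothetical `(version, failed)` pair per run; the function names are illustrative, not the `versioning.py` / `ab_report.py` API.

```python
from collections import Counter


def failure_rates_by_version(runs):
    """runs: iterable of (version, failed) pairs -> {version: failure_rate}."""
    totals, failures = Counter(), Counter()
    for version, failed in runs:
        totals[version] += 1
        failures[version] += int(failed)
    return {v: failures[v] / totals[v] for v in totals}


def improvement_pp(rates, baseline, candidate):
    """Percentage-point drop in failure rate from baseline to candidate."""
    return round((rates[baseline] - rates[candidate]) * 100, 1)
```

With illustrative counts of 45/200 failures for prompt_v1 and 22/200 for prompt_v2, the rates are 22.5% and 11.0%, and `improvement_pp` returns 11.5.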

4. Run Unit Tests

python tests/test_forensics.py
python tests/test_advanced.py

🚀 Advanced Features (New in v2)

Layer  Feature                          Technology
1      Automatic recommendation engine  Rule-based
2      AI-assisted failure analysis     Gemini 2.5 Pro
3      Automatic eval set expansion     Frequency analysis
4      Prompt optimization explanation  Gemini 2.5 Pro
5      Regression guard                 Baseline comparison

Scenario 6: Regression Guard. Before a new prompt (v3) is deployed, it automatically runs a regression check against the baseline:

REGRESSION CHECK - v3
Baseline (v2): 11.0% failure rate
New (v3):      24.5% failure rate
Delta: +13.5pp → REGRESSION_DETECTED ❌
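The README reports both a REGRESSION_DETECTED outcome (+13.5pp) and a WARNING outcome (+6pp), which suggests a tiered check. A sketch of such a check, assuming hypothetical cutoffs of 5pp for WARNING and 10pp for REGRESSION_DETECTED; the real thresholds aren't documented here.

```python
def regression_check(baseline_rate, candidate_rate, warn_pp=5.0, fail_pp=10.0):
    """Classify a candidate prompt by its failure-rate delta vs. the baseline.

    Rates are fractions (0.11 = 11%); warn_pp and fail_pp are hypothetical
    thresholds in percentage points.
    """
    delta_pp = round((candidate_rate - baseline_rate) * 100, 1)
    if delta_pp > fail_pp:
        status = "REGRESSION_DETECTED"
    elif delta_pp > warn_pp:
        status = "WARNING"
    else:
        status = "PASS"
    return delta_pp, status
```

For the v2 → v3 example, `regression_check(0.110, 0.245)` returns `(13.5, "REGRESSION_DETECTED")`.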

Test Results

Layer                 Test                                Result
1: Recommender        Category → recommendation mapping   ✅ PASS
2: LLM Analyzer       Gemini fallback                     ✅ PASS
3: Eval Collector     Duplicate prevention                ✅ PASS
4: Prompt Optimizer   A/B explanation (v2: +10pp)         ✅ PASS
5: Regression Guard   DETECTED and PASS scenarios         ✅ PASS

Key Results

  • A/B: prompt_v2 improves on v1 by 10pp
  • Regression Guard: blocked the v3 deploy with a WARNING at a +6pp delta
  • Eval Collector: automatically collected 5 new eval candidates
  • LLM Analyzer: falls back seamlessly to rule-based analysis when Gemini is disabled
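Layer 3 (eval set expansion by frequency analysis, with duplicate prevention) could work roughly like this: count how often each failing query recurs, promote the frequent ones, and skip anything already in the eval set. The function name, the `min_count` cutoff, and the case/whitespace normalization are assumptions for illustration.

```python
from collections import Counter


def collect_eval_candidates(failed_queries, existing_evals, min_count=2):
    """Hypothetical collector: new eval candidates, most frequent first."""
    def normalize(q):
        # Case- and whitespace-insensitive comparison for duplicate detection.
        return " ".join(q.lower().split())

    seen = {normalize(q) for q in existing_evals}
    counts = Counter(normalize(q) for q in failed_queries)
    return [
        query
        for query, n in counts.most_common()
        if n >= min_count and query not in seen  # frequent and not a duplicate
    ]
```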

📊 Results

Feature                 Result
Unit tests              8/8 PASS ✅ (test_forensics.py)
Root cause categories   5 types (RETRIEVAL_QUALITY, RERANKER_FAILURE, LLM_HALLUCINATION, CITATION_MISS, API_ERROR)
Anomaly detection       20pp delta threshold: flags when today's failure rate exceeds the 7-day average by more than 20 percentage points
A/B comparison          v2: 11.5pp improvement over v1 (22.5% → 11.0% failure rate)
Trend analysis          IMPROVING / STABLE / DEGRADING based on the 7-day moving average
Slack integration       Webhook ready; fires on rate threshold, anomaly, or 3 consecutive failures
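The README doesn't say exactly how IMPROVING / STABLE / DEGRADING is derived from the moving average. One plausible sketch: compare the mean failure rate of the last 7 days with the 7 days before it, using a hypothetical `tolerance` band of 2pp to call small changes STABLE. Returning STABLE when there is less than two windows of history is also an assumption.

```python
def trend(daily_rates, window=7, tolerance=0.02):
    """Hypothetical trend rule: latest window average vs. the previous window.

    daily_rates: failure rates as fractions, oldest first.
    Returns (label, latest_average).
    """
    if len(daily_rates) < 2 * window:
        # Not enough history for two full windows; treat as stable.
        latest = sum(daily_rates[-window:]) / max(1, len(daily_rates[-window:]))
        return "STABLE", latest
    recent = sum(daily_rates[-window:]) / window
    previous = sum(daily_rates[-2 * window:-window]) / window
    delta = recent - previous
    if delta < -tolerance:
        return "IMPROVING", recent  # failure rate is falling
    if delta > tolerance:
        return "DEGRADING", recent  # failure rate is rising
    return "STABLE", recent
```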

⚙️ Configuration (config.py)

Parameter                       Default            Description
FAILURE_RATE_THRESHOLD          0.25               Alert fires above this failure rate
ANOMALY_THRESHOLD               0.20               Flag if today exceeds the 7-day average by this delta
SLACK_WEBHOOK_URL               ""                 Empty = console output
CONSECUTIVE_FAILURE_THRESHOLD   3                  Alert after N consecutive step failures
STEP_THRESHOLDS                 (see config.py)    Per-step maximum acceptable failure rate
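How the three alert triggers could combine, using the default values from the table above. This is a sketch under assumptions: the `alert_reasons` name is hypothetical, the anomaly check is read as an absolute difference in rate units (0.20 = 20pp), and `recent_step_statuses` is an assumed list of `"ok"` / `"error"` values for one step, oldest first.

```python
FAILURE_RATE_THRESHOLD = 0.25
ANOMALY_THRESHOLD = 0.20
CONSECUTIVE_FAILURE_THRESHOLD = 3


def alert_reasons(today_rate, seven_day_avg, recent_step_statuses):
    """Hypothetical check: list of alert conditions that fire (empty = no alert)."""
    reasons = []
    if today_rate > FAILURE_RATE_THRESHOLD:
        reasons.append("rate_threshold")
    # Anomaly: today's rate sits more than ANOMALY_THRESHOLD above the 7-day average.
    if today_rate - seven_day_avg > ANOMALY_THRESHOLD:
        reasons.append("anomaly")
    # Length of the trailing run of consecutive failures.
    streak = 0
    for status in reversed(recent_step_statuses):
        if status != "error":
            break
        streak += 1
    if streak >= CONSECUTIVE_FAILURE_THRESHOLD:
        reasons.append("consecutive_failures")
    return reasons
```

Under this reading, a 27.3% day against a 16.2% average trips only the rate threshold: it is above 25%, but only 11.1pp above the average.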

🧪 Root Cause Categories

Category            Trigger
RETRIEVAL_QUALITY   Retrieval step fails: no results or low score
RERANKER_FAILURE    Reranker can't parse the LLM response or times out
LLM_HALLUCINATION   Generation returns an empty or uncited response
CITATION_MISS       Answer produced but no source citations found
API_ERROR           Timeout, 429 rate limit, 503 service unavailable
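A rule-based sketch of mapping a failure to one of the five categories. The step names come from the dashboard below; the rule order, the error-text regex, and the `classify_root_cause` signature are assumptions, not the logic in `forensics.py`. Reranker failures are checked first because the table above files reranker timeouts under RERANKER_FAILURE rather than API_ERROR.

```python
import re

# Assumed patterns for infrastructure errors: timeouts, HTTP 429 and 503.
API_ERROR_PATTERN = re.compile(r"timeout|timed out|\b429\b|\b503\b", re.IGNORECASE)


def classify_root_cause(failed_step, error="", has_citations=True):
    """Hypothetical classifier: returns a category name, or None if no rule matches."""
    if failed_step == "reranking":
        return "RERANKER_FAILURE"  # parse errors and timeouts alike
    if API_ERROR_PATTERN.search(error):
        return "API_ERROR"
    if failed_step == "retrieval":
        return "RETRIEVAL_QUALITY"
    if failed_step == "generation":
        return "LLM_HALLUCINATION"
    if failed_step == "citation" or not has_citations:
        return "CITATION_MISS"
    return None
```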

📈 Terminal Dashboard (Sample Output)

═════════════════════════════════════════════════════════════
  🔬  FAILURE FORENSICS - Terminal Dashboard
═════════════════════════════════════════════════════════════

  📅 FAILURE RATE, LAST 7 DAYS
  2026-06-03  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 13.0%
  2026-06-07  [████████░░░░░░░░░░░░░░░░░░░░░░] 27.3% ⚠️
  2026-06-10  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 12.0%

  🔍 FAILURES BY STEP
  retrieval     [███████░░░░░░░░░░░░░] 38.0%  (38/100 failed)
  reranking     [██░░░░░░░░░░░░░░░░░░] 13.0%  (13/100 failed)
  generation    [██░░░░░░░░░░░░░░░░░░] 10.0%  (10/100 failed)
  citation      [█░░░░░░░░░░░░░░░░░░░]  6.0%  (6/100 failed)

  ⚡ ANOMALY: ✅ Normal: Today (12.0%) ≈ 7-day avg. (16.2%)
  📊 TREND: ➡️  STABLE - Moving avg: 16.0%
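The bar widths in the sample are consistent with truncating `rate × width` to an integer (30 cells per day, 20 per step: 27.3% gives 8 cells, 38.0% gives 7). A sketch under that assumption; the function names are hypothetical, and the ⚠️ flag is assumed to follow FAILURE_RATE_THRESHOLD (0.25).

```python
def ascii_bar(rate, width=30):
    """Render a failure rate (0.0-1.0) as a fixed-width bar, truncating partial cells."""
    filled = max(0, min(width, int(rate * width)))  # clamp out-of-range rates
    return "[" + "█" * filled + "░" * (width - filled) + "]"


def dashboard_row(label, rate, width=30, warn_at=0.25):
    """One dashboard line; flagged with a warning sign above warn_at (assumed 0.25)."""
    flag = " \u26a0\ufe0f" if rate > warn_at else ""  # ⚠️ emoji
    return f"  {label:<12}  {ascii_bar(rate, width)} {rate * 100:4.1f}%{flag}"
```

For example, `dashboard_row("2026-06-07", 0.273)` produces an 8-cell bar followed by `27.3% ⚠️`.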

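🛠 Technologies Used

  • Python standard library: json, collections, datetime, threading
  • requests: Slack webhook HTTP calls
  • python-dotenv: environment variable management

No heavy dependencies and no cloud services. The core pipeline needs no API keys; only the optional Gemini-powered layers (AI-assisted failure analysis and prompt optimization explanations) require Gemini access, and the LLM Analyzer falls back to rule-based analysis without it.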


🔭 Roadmap

  • FastAPI REST endpoint for remote log ingestion
  • HTML report export
  • PostgreSQL backend for large-scale log storage
  • Multi-pipeline support (compare RAG vs fine-tuned model)
  • Email alerts as alternative to Slack
