Failure Forensics

Production AI pipeline monitoring — root cause detection, anomaly alerts, regression guard, and Gemini-powered recommendations.

Installation

pip install failure-forensics

Quick Start

from failure_forensics import trace

@trace(step="retrieval", version="v1")
def my_retrieval_function(query):
    # your code here
    pass
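
For orientation, a tracing decorator of this kind can be approximated as follows. This is a minimal sketch, not the package's implementation: the name `trace_sketch`, the record fields (`status`, `error`, `duration_ms`), and the `LOG_PATH` location are all assumptions made for illustration.

```python
import functools
import json
import tempfile
import time
from pathlib import Path

# Hypothetical log location; the real package may write elsewhere.
LOG_PATH = Path(tempfile.gettempdir()) / "ff_sketch_requests.jsonl"


def trace_sketch(step, version):
    """Illustrative stand-in for a tracing decorator (not the library's code)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {"step": step, "version": version, "status": "ok", "error": None}
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                record["status"] = "failed"
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise  # re-raise so callers still see the failure
            finally:
                # Append one JSON object per line, whether the call succeeded or not.
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)
                with LOG_PATH.open("a", encoding="utf-8") as fh:
                    fh.write(json.dumps(record) + "\n")
        return wrapper
    return decorator


@trace_sketch(step="retrieval", version="v1")
def my_retrieval_function(query):
    if not query:
        raise ValueError("empty query")
    return [f"doc for {query}"]
```

Writing the record in `finally` means failed calls are logged too, which is what any downstream root cause analysis would need.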

Features 🔬

A self-hosted, zero-cost LLM pipeline observability tool that gives you root cause detection, anomaly alerts, A/B reporting, and a live terminal dashboard — without sending your data to any third-party service.

🆚 Why Not LangSmith or Braintrust?

	Failure Forensics	LangSmith	Braintrust
Cost	Free	Paid tiers	Paid tiers
Data privacy	Stays on your machine	Sent to cloud	Sent to cloud
Customization	Full control	Limited	Limited
Slack alerts	Built-in	Premium only	Premium only
A/B reporting	Built-in	Basic	Basic
Circuit breaker / trend	Built-in	❌	❌

Failure Forensics is designed for teams who need production-grade observability without vendor lock-in.

✨ What It Does

Every pipeline run passes through a structured logging and analysis layer:

Pipeline Step  →  logger.py  →  requests.jsonl
                                     ↓
                    ┌────────────────┴────────────────┐
                    │                                 │
              forensics.py                       pattern.py
          (root cause detection)          (time series + anomaly)
                    │                                 │
              versioning.py                      baseline.py
           (v1 vs v2 comparison)            (7-day moving average)
                    │                                 │
               ab_report.py                      alerts.py
            (A/B comparison table)          (Slack / console alert)
                    └────────────────┬────────────────┘
                                     ↓
                              dashboard.py
                         (ASCII terminal dashboard)

📁 Project Structure

failure-forensics/
├── src/
│   ├── logger.py          # Logs every pipeline step to JSONL
│   ├── forensics.py       # Root cause detection (5 categories)
│   ├── pattern.py         # Time-series failure rate + anomaly detection
│   ├── baseline.py        # 7-day moving average + trend (IMPROVING/STABLE/DEGRADING)
│   ├── alerts.py          # Slack webhook + console alerts
│   ├── versioning.py      # Per-version failure rate stats
│   ├── ab_report.py       # A/B comparison report (table + JSON)
│   └── dashboard.py       # ASCII bar chart terminal dashboard
├── data/
│   └── logs/
│       └── requests.jsonl # All pipeline logs (gitignored)
├── tests/
│   └── test_forensics.py  # 8 unit tests
├── config.py              # Thresholds, Slack URL, step limits
├── main.py                # 5-scenario demo runner
├── simulate.py            # Realistic test data generator (100 runs, anomaly day)
└── requirements.txt

🚀 Getting Started

1. Clone & Install

git clone https://github.com/jasstt/failure-forensics.git
cd failure-forensics
pip install -r requirements.txt

2. (Optional) Configure Slack Alerts

Edit config.py:

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

If left empty, all alerts print to the console.
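
The console fallback described above can be sketched like this. The helper name `send_alert` and its return value are hypothetical. The project lists `requests` for webhook calls; this sketch uses the standard library's `urllib` only to stay dependency-free.

```python
import json
import urllib.request

SLACK_WEBHOOK_URL = ""  # same name as in config.py; empty means console output


def send_alert(message, webhook_url=SLACK_WEBHOOK_URL):
    """Post to Slack when a webhook is configured, otherwise print.

    Hypothetical helper; returns the channel used so callers can log it.
    """
    if not webhook_url:
        print(f"[ALERT] {message}")
        return "console"
    # Slack incoming webhooks accept a JSON body with a "text" field.
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()
    return "slack"


send_alert("retrieval failure rate above threshold")
```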

3. Run the Full Demo

python main.py

This runs 5 scenarios:

1. Simulation: generates 100 realistic pipeline runs (2 prompt versions, anomaly day)
2. Root cause analysis: detects the failing step and assigns a category
3. 7-day pattern report: failure rate per day + step breakdown + anomaly check
4. A/B report: prompt_v1 vs prompt_v2 with per-step improvement table
5. Terminal dashboard: live ASCII bar charts, trend, top 5 failed runs

4. Run Unit Tests

python tests/test_forensics.py
python tests/test_advanced.py

🚀 Advanced Features (New in v2)

Layer	Feature	Technology
1	Automatic recommendation engine	Rule-based
2	AI-assisted failure analysis	Gemini 2.5 Pro
3	Automatic eval set expansion	Frequency analysis
4	Prompt optimization explanation	Gemini 2.5 Pro
5	Regression guard	Baseline comparison

Scenario 6: Regression Guard

Runs an automatic regression check before a new prompt (v3) is deployed:

REGRESSION CHECK — v3
Baseline (v2): 11.0% failure rate
New (v3):      24.5% failure rate
Delta: +13.5pp → REGRESSION_DETECTED ❌
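
The check above amounts to comparing two failure rates and bucketing the difference. A minimal sketch, assuming a 5pp warning threshold and a 10pp failure threshold (both values and the function name `regression_check` are illustrative, not taken from the project):

```python
def regression_check(baseline_rate, candidate_rate, warn_pp=5.0, fail_pp=10.0):
    """Compare failure rates given in percent; thresholds are illustrative."""
    delta_pp = round(candidate_rate - baseline_rate, 1)
    if delta_pp >= fail_pp:
        verdict = "REGRESSION_DETECTED"
    elif delta_pp >= warn_pp:
        verdict = "WARNING"
    else:
        verdict = "PASS"
    return delta_pp, verdict


delta, verdict = regression_check(baseline_rate=11.0, candidate_rate=24.5)
print(f"Delta: {delta:+.1f}pp -> {verdict}")
```

Expressing the delta in percentage points rather than as a relative change keeps the threshold meaningful when the baseline rate is small.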

Test Results

Layer	Test	Result
1: Recommender	Category → recommendation mapping	✅ PASS
2: LLM Analyzer	Gemini fallback	✅ PASS
3: Eval Collector	Duplicate prevention	✅ PASS
4: Prompt Optimizer	A/B explanation (v2: +10pp)	✅ PASS
5: Regression Guard	DETECTED + PASS scenarios	✅ PASS

Key Results

A/B: prompt_v2 improves on v1 by 10pp
Regression Guard: blocked the v3 deploy as a WARNING with a +6pp delta
Eval Collector: automatically collected 5 new eval candidates
LLM Analyzer: falls back cleanly to the rule-based engine when Gemini is unavailable

📊 Results

Feature	Result
Unit Tests	8/8 PASS ✅
Root cause categories	5 types (RETRIEVAL_QUALITY, RERANKER_FAILURE, LLM_HALLUCINATION, CITATION_MISS, API_ERROR)
Anomaly detection	20pp delta threshold: flags when today's failure rate exceeds the 7-day average by more than 20 percentage points
A/B comparison	v2: 11.5pp improvement over v1 (22.5% → 11.0% failure rate)
Trend analysis	IMPROVING / STABLE / DEGRADING based on 7-day moving average
Slack integration	Webhook ready — fires on rate threshold, anomaly, or 3 consecutive failures
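
The A/B figure in the table is a difference of per-version failure rates. A minimal sketch of that arithmetic follows; the record fields `version` and `status` are assumed, and the run counts are synthetic, chosen only to reproduce the 22.5% and 11.0% rates:

```python
from collections import defaultdict


def failure_rates_by_version(runs):
    """Return {version: failure rate in percent} from run records."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        totals[run["version"]] += 1
        if run["status"] == "failed":
            failures[run["version"]] += 1
    return {v: 100.0 * failures[v] / totals[v] for v in totals}


# Synthetic runs chosen to mirror the rates in the table above.
runs = (
    [{"version": "v1", "status": "failed"}] * 45
    + [{"version": "v1", "status": "ok"}] * 155
    + [{"version": "v2", "status": "failed"}] * 22
    + [{"version": "v2", "status": "ok"}] * 178
)
rates = failure_rates_by_version(runs)
improvement_pp = rates["v1"] - rates["v2"]
print(f"v1 {rates['v1']:.1f}% -> v2 {rates['v2']:.1f}% ({improvement_pp:.1f}pp improvement)")
```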

⚙️ Configuration (`config.py`)

Parameter	Default	Description
`FAILURE_RATE_THRESHOLD`	`0.25`	Alert fires above this failure rate
`ANOMALY_THRESHOLD`	`0.20`	Flag if today exceeds 7-day avg by this delta
`SLACK_WEBHOOK_URL`	`""`	Empty = console output
`CONSECUTIVE_FAILURE_THRESHOLD`	`3`	Alert after N consecutive step failures
`STEP_THRESHOLDS`	see config	Per-step max acceptable failure rate
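
One way these thresholds could be applied is sketched below. The two constants reuse the defaults from the table; the function names, the half-window trend rule, and the `tolerance` value are assumptions, not the project's actual logic.

```python
FAILURE_RATE_THRESHOLD = 0.25
ANOMALY_THRESHOLD = 0.20


def moving_average(daily_rates, window=7):
    """Mean of the last `window` daily failure rates."""
    recent = daily_rates[-window:]
    return sum(recent) / len(recent)


def is_anomaly(today_rate, history, threshold=ANOMALY_THRESHOLD):
    """True when today exceeds the moving average of prior days by more than threshold."""
    return today_rate - moving_average(history) > threshold


def alerts_for(today_rate, history):
    """Collect the reasons an alert would fire for today's rate."""
    reasons = []
    if today_rate > FAILURE_RATE_THRESHOLD:
        reasons.append("rate_threshold")
    if is_anomaly(today_rate, history):
        reasons.append("anomaly")
    return reasons


def classify_trend(daily_rates, tolerance=0.02):
    """Compare the newer half of the window with the older half (assumed rule)."""
    mid = len(daily_rates) // 2
    older, newer = daily_rates[:mid], daily_rates[mid:]
    diff = sum(newer) / len(newer) - sum(older) / len(older)
    if diff > tolerance:
        return "DEGRADING"
    if diff < -tolerance:
        return "IMPROVING"
    return "STABLE"


history = [0.10, 0.12, 0.11, 0.13, 0.12, 0.10, 0.12]
print(alerts_for(0.40, history), classify_trend(history))
```

Rates here are fractions (0.25 means 25%), matching the defaults in the table, so the anomaly threshold of 0.20 corresponds to 20 percentage points.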

🧪 Root Cause Categories

Category	Trigger
`RETRIEVAL_QUALITY`	Retrieval step fails — no results, low score
`RERANKER_FAILURE`	Reranker can't parse LLM response or times out
`LLM_HALLUCINATION`	Generation returns empty or uncited response
`CITATION_MISS`	Answer produced but no source citations found
`API_ERROR`	Timeout, 429 rate limit, 503 service unavailable
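
A rule-based mapping along the lines of the table might look like the sketch below. The step names are taken from the dashboard output; the string-matching rules, the function name `classify_root_cause`, and the `UNKNOWN` fallback are illustrative guesses rather than the project's implementation.

```python
def classify_root_cause(step, error=""):
    """Map a failed step and its error text to a root cause category."""
    text = error.lower()
    # Rate limits and service outages can hit any step.
    if "429" in text or "503" in text:
        return "API_ERROR"
    # Per the table, a reranker timeout counts as a reranker failure.
    if "timeout" in text and step != "reranking":
        return "API_ERROR"
    by_step = {
        "retrieval": "RETRIEVAL_QUALITY",
        "reranking": "RERANKER_FAILURE",
        "generation": "LLM_HALLUCINATION",
        "citation": "CITATION_MISS",
    }
    # UNKNOWN is a fallback added for this sketch only.
    return by_step.get(step, "UNKNOWN")


print(classify_root_cause("reranking", "timeout after 5s"))
```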

📈 Terminal Dashboard (Sample Output)

═════════════════════════════════════════════════════════════
  🔬  FAILURE FORENSICS — Terminal Dashboard
═════════════════════════════════════════════════════════════

  📅 FAILURE RATE CHART FOR THE LAST 7 DAYS
  2026-06-03  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 13.0%
  2026-06-07  [████████░░░░░░░░░░░░░░░░░░░░░░] 27.3% ⚠️
  2026-06-10  [███░░░░░░░░░░░░░░░░░░░░░░░░░░░] 12.0%

  🔍 FAILURE BREAKDOWN BY STEP
  retrieval     [███████░░░░░░░░░░░░░] 38.0%  (38/100 failed)
  reranking     [██░░░░░░░░░░░░░░░░░░] 13.0%  (13/100 failed)
  generation    [██░░░░░░░░░░░░░░░░░░] 10.0%  (10/100 failed)
  citation      [█░░░░░░░░░░░░░░░░░░░]  6.0%  (6/100 failed)

  ⚡ ANOMALY: ✅ Normal: Today (12.0%) ≈ 7d avg (16.2%)
  📊 TREND: ➡️  STABLE - Moving Avg: 16.0%
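
Bars like those in the sample output can be produced with a few lines of string arithmetic. In this sketch the function name `render_bar` is hypothetical, and truncating partial cells is an assumption, chosen because it matches the bar lengths shown above.

```python
def render_bar(percent, width=30, fill="█", empty="░"):
    """Return a fixed-width bar; partial cells are truncated, not rounded."""
    filled = max(0, min(width, int(percent / 100 * width)))
    return "[" + fill * filled + empty * (width - filled) + "]"


for day, pct in [("2026-06-03", 13.0), ("2026-06-07", 27.3), ("2026-06-10", 12.0)]:
    print(f"  {day}  {render_bar(pct)} {pct:.1f}%")
```

Clamping `filled` to the range 0 to `width` keeps the bar a constant length even if a rate outside 0 to 100 slips through.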

🛠 Technologies Used

Python standard library — json, collections, datetime, threading
requests — Slack webhook HTTP calls
python-dotenv — Environment variable management

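No heavy dependencies. No cloud. No API keys required for the core features; the Gemini-backed layers fall back to rule-based analysis when Gemini is unavailable.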

🔭 Roadmap

FastAPI REST endpoint for remote log ingestion
HTML report export
PostgreSQL backend for large-scale log storage
Multi-pipeline support (compare RAG vs fine-tuned model)
Email alerts as alternative to Slack
