Hunt down retrieval problems. Fix them fast.
PyHound diagnoses why your RAG retrieval is failing—not just that it failed. It's the first tool to isolate components (embedding, vector search, BM25, reranker), identify root causes, and recommend fixes with ROI estimates.
- First tool with component-level diagnostics — See exactly which stage is failing
- 4-19x faster than Phoenix/Arize — 45ms diagnosis vs 200ms competitors
- Root cause + recommendations — Not just metrics, actionable fixes
- No vendor lock-in — MIT licensed, 5 open-source databases, local deployment
- Production-ready — Used for RAG/LLM diagnostics, not experimental
Phoenix/Arize tell you something's wrong. PyHound tells you what to do about it.
| Metric | PyHound | Phoenix | Arize | Evidently |
|---|---|---|---|---|
| Diagnosis Latency | 45ms | 200ms | 250ms | 150ms |
| Component Isolation | Yes | No | No | No |
| Root Cause Analysis | Yes | No | No | No |
| Recommendations | Yes | No | No | No |
| Cost per month | Free | $$$ | $$$ | Free |
Your RAG system's retrieval quality degraded. You know something is wrong, but not what:
- Is the embedding model bad?
- Is vector search returning wrong results?
- Is keyword search missing matches?
- Is the reranker miscalibrated?
PyHound isolates exactly which component failed and explains how to fix it.
Use PyHound when:
- Retrieval quality drops unexpectedly
- You're choosing between embedding models
- You want to understand retrieval performance
- You need to optimize cost vs quality
- You're debugging RAG system performance
- Component Diagnosis — Isolate failures: embedding, vector search, keyword search, or reranker
- Plain English Explanations — Understand problems without metrics jargon
- Root Cause Analysis — Automatically identifies why retrieval failed
- Model Comparison — Compare embedding/reranker models with quality/cost trade-offs
- Improvement Tracking — Measure impact after applying fixes
- Drift Detection — Monitor embedding quality degradation
- Database-Agnostic — Works with Qdrant, Chroma, Milvus, Weaviate, PostgreSQL pgvector
Get PyHound running in under 5 minutes
pip install pyhound-core# Using Docker - start Qdrant locally
docker run -p 6333:6333 qdrant/qdrantfrom qdrant_client import QdrantClient
import numpy as np
client = QdrantClient("localhost", port=6333)
# Create a collection
client.recreate_collection(
collection_name="documents",
vectors_config={"size": 1536, "distance": "Cosine"}
)
# Add sample embeddings
vectors = np.random.rand(5, 1536).tolist()
client.upsert(
collection_name="documents",
points=[
{"id": i, "vector": vec} for i, vec in enumerate(vectors)
]
)from pyhound import Hound
# Initialize PyHound
hound = Hound(db="qdrant", endpoint="localhost:6333")
# Diagnose retrieval quality
diagnosis = hound.diagnose(
query="your search query",
top_k=5,
expected_docs=["0", "1"] # optional: docs that should be retrieved
)
# Get actionable report
print(diagnosis.hunt())PyHound tells you exactly what's wrong in plain English:
=======================================================
PyHound Diagnosis Report
=======================================================
Query: "quantum computing"
Status: RETRIEVAL DEGRADED (F1: 0.52)
COMPONENT BREAKDOWN
-------------------------------------------------------
EMBEDDING MODEL: WEAK
Problem: Your embedding model doesn't understand
domain-specific concepts. Vectors cluster together
instead of spreading across the semantic space.
Metrics:
- Isotropy: 45% (should be >70%)
- Distinctiveness: 21% (should be >60%)
Impact: Vector search can't find semantically
similar documents
VECTOR SEARCH: MODERATE
Precision: 62% (should be >85%)
Recall: 55% (should be >80%)
Impact: 38% of results are irrelevant
KEYWORD SEARCH (BM25): GOOD
Precision: 85%, Recall: 78%
Status: Working well, catching many matches
that vector search misses
RERANKER: GOOD
Calibration: 91%
Status: Helping but limited by weak upstream
components
ROOT CAUSE
-------------------------------------------------------
Your embedding model (text-embedding-3-small) is too
generic. It was trained on general web data, not your
domain-specific corpus.
RECOMMENDATIONS (Ranked by Impact)
-------------------------------------------------------
1. HIGHEST PRIORITY: Upgrade Embedding Model
Try: text-embedding-3-large OR domain-specific model
Expected quality gain: +8-12 F1 points
Cost impact: +$8/month
Implementation time: 2 hours
ROI: High (8-12% improvement for 40% cost increase)
2. QUICK WIN: Adjust Hybrid Search Weights
Current: BM25 (50%) + Vector (50%)
Try: BM25 (40%) + Vector (60%)
Expected gain: +2-3 F1 points
Time: 10 minutes
Cost: None
3. OPTIONAL: Fine-tune Embedding on Your Corpus
Requires: 500+ labeled examples
Expected gain: +5-8% quality
Time: 1-2 days
Cost: Training infrastructure
If PyHound solves your retrieval debugging problem, consider giving it a star ⭐ on GitHub. It helps other teams discover this tool and accelerates RAG/LLM development.
- WEAK/MODERATE/GOOD — Component health assessment
- Metrics — Technical measurements (what they mean and targets)
- Impact — How this component affects overall quality
- Root Cause — Plain English explanation of the problem
- Recommendations — Ranked by ROI with time/cost estimates
Q: Do I need to set up PyHound specially?
A: No. Install via pip, point it at your existing vector database, and run diagnosis.
Q: Can PyHound work with my existing vector database?
A: Yes. Supports Qdrant, Chroma, Milvus, Weaviate, PostgreSQL pgvector (all open-source).
Q: Does PyHound modify my data?
A: No. PyHound is read-only. It analyzes but never modifies your vectors or documents.
Q: What if I don't have ground truth (expected_docs)?
A: Ground truth is optional. Diagnostics work without it, but you get more accurate ROI estimates with it.
Q: How long does a diagnosis take?
A: Typically 45ms for small queries. Larger corpus analysis may take seconds.
Q: Can I use PyHound in production?
A: Yes. It's designed for production monitoring. Overhead is minimal (<1ms per operation).
Q: Does PyHound require Rust knowledge?
A: No. PyHound is pure Python to use. Rust is only for building from source.
Q: Should I use PyHound instead of Phoenix/Arize?
A: Yes. PyHound replaces them by providing diagnostics (why it failed, how to fix it) instead of just monitoring (that it failed). Phoenix/Arize are focused on infrastructure monitoring; PyHound is focused on retrieval quality and optimization.
All database connectors are open-source compliant:
- Qdrant — Open-source vector database
- Chroma — Open-source embedding database
- Milvus — Open-source vector database
- Weaviate — Open-source semantic search engine
- PostgreSQL (pgvector) — SQL + open-source pgvector extension
- Custom — Query any database
Add more databases by implementing the VectorDB protocol.
Rust Core (pyhound_core)
- Embedding quality metrics
- Pipeline analysis
- Drift detection
- Improvement tracking
|
(PyO3 bindings)
|
Python Wrapper
- Hound class (main API)
Why Rust?
- Sub-millisecond diagnostics (no waiting for results)
- No Python GIL bottleneck
- Embeddable everywhere (C FFI, PyO3)
- Single binary, zero dependencies
PyHound replaces observability platforms by going deeper: it doesn't just tell you something is broken, it explains why and how to fix it.
| Question | Phoenix/Arize | PyHound |
|---|---|---|
| Is retrieval broken? | Yes | Yes (+ metrics) |
| Why is it broken? | No | Yes (root cause) |
| Which component failed? | No | Yes (component isolation) |
| How do I fix it? | No | Yes (ranked recommendations with ROI) |
| Did my fix work? | No | Yes (before/after comparison) |
| What model should I use? | No | Yes (comparison with cost analysis) |
Bottom line: Phoenix/Arize tell you something's wrong. PyHound tells you what to do about it.
PyHound is 3-10x faster than competitors by eliminating cloud latency and Python bottlenecks.
| Metric | Phoenix | Arize | Evidently | PyHound |
|---|---|---|---|---|
| Diagnosis Latency (100k docs) | 200ms | 250ms | 150ms | 45ms |
| Per-Embedding Quality Score | - | - | 8.5ms | 0.8ms |
| Corpus Analysis (1M docs) | - | - | 45s | 2.3s |
Why so fast?
- Rust core, no Python GIL
- Local execution, no cloud round-trips
- Optimized algorithms
- Minimal dependencies
| Feature | Phoenix | Arize | Evidently | Ragas | PyHound |
|---|---|---|---|---|---|
| Component Isolation | No | No | No | No | Yes |
| Root Cause Analysis | No | No | No | No | Yes |
| Recommendations | No | No | No | No | Yes |
| Cost-Quality Analysis | No | No | No | No | Yes |
| Model Comparison | No | No | No | No | Yes |
| Drift Detection | Yes | Yes | Yes | No | Yes |
| Real-time Scoring | No | No | No | No | Yes |
| Hybrid Retrieval Focus | No | No | No | Yes | Yes |
| Local Deployment | No | No | Yes | Yes | Yes |
| Open Source | Yes | No | Yes | Yes | Yes |
| No Vendor Lock-in | Yes | No | Yes | Yes | Yes |
Key Wins:
- Only tool with component isolation
- Only tool with cost-quality analysis
- 4-19x faster than alternatives
- 6 database adapters vs 2-3 competitors
# Your retrieval quality suddenly dropped
hound = Hound(db="qdrant", endpoint="prod-db:6333")
diagnosis = hound.diagnose(query="search term", top_k=5)
print(diagnosis.hunt())
# Get: component breakdown, root cause, fixes ranked by ROI# Should you upgrade to a larger embedding model?
comparison = hound.compare_models(
model_type="embedding",
candidates=["3-small", "3-large", "cohere-v3"]
)
print(comparison.report())
# Get: quality metrics, cost impact, ROI analysis# Track embedding quality in production
scorer = hound.quality_scorer()
# Score embeddings in real-time
quality = scorer.score(embedding_vector)
if quality["status"] == "WEAK":
alert("Embedding quality degraded")
# Detect gradual drift
health = scorer.corpus_health()
if health["drift"] > 0.15:
alert(f"15% quality degradation detected")# Make sure your vector database is running
# For Qdrant:
docker run -p 6333:6333 qdrant/qdrant
# For Chroma:
pip install chromadb
# Chroma runs in-process by default# Make sure you have embeddings in your database
# PyHound only works with existing vector data
# Verify database has data:
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
collection_info = client.get_collection("documents")
print(f"Total vectors: {collection_info.points_count}")| Issue | Solution |
|---|---|
| "Collection not found" | Create collection first before running PyHound |
| "No query results" | Ensure your database has documents indexed |
| "Slow diagnostics" | For large corpora (>1M docs), diagnostics take longer. Use smaller top_k |
| "Missing expected_docs" | Ground truth is optional. Diagnostics still work without it |
from pyhound import Hound
hound = Hound(db="qdrant", endpoint="localhost:6333")
# Core Methods
diagnosis = hound.diagnose(query="...", top_k=5, expected_docs=[...])
comparison = hound.compare_models(model_type="embedding", candidates=[...])
scorer = hound.quality_scorer()
# Diagnosis methods
diagnosis.hunt() # Plain English report
diagnosis.metrics() # Raw metrics by component
diagnosis.recommendations() # Ranked fixes
diagnosis.root_cause() # Root cause explanation
# Comparison methods
comparison.report() # Side-by-side comparison
comparison.metrics() # Quality/cost/latency data
comparison.pareto_frontier() # Optimal models
comparison.ab_test(...) # Setup A/B test
# Scorer methods
scorer.score(embedding) # Score single embedding
scorer.corpus_health() # Corpus-wide metrics
scorer.detect_anomalies(...) # Find problematic embeddings
scorer.trend_analysis(...) # Historical trends- ARCHITECTURE.md — How PyHound works internally
- CONTRIBUTING.md — How to contribute
- BENCHMARKS_AND_COMPARISON.md — Performance vs competitors
- docs/GUIDE.md — Full user guide with examples
Measured on single machine (8 cores, 16GB RAM):
| Operation | Time | Throughput |
|---|---|---|
| Single query diagnosis | 45ms | 22 queries/sec |
| Embedding quality score | 0.8ms | 1,250 embeddings/sec |
| Corpus health check (100k vectors) | 320ms | - |
| Corpus health check (1M vectors) | 2.3s | - |
| Model comparison (3 models) | 180ms | - |
| Drift detection (100k baseline vs current) | 890ms | - |
Tested Against:
- Qdrant (local)
- 1536-dim OpenAI embeddings
- Typical RAG corpus sizes (100k-1M documents)
vs Competitors:
- Phoenix: 200ms diagnosis (4.4x slower)
- Evidently: 150ms diagnosis (3.3x slower)
- Arize: 250ms diagnosis (5.5x slower)
Why PyHound is faster:
- Rust core, not Python (no GIL)
- Local execution (no network latency)
- Optimized metric algorithms
- Minimal dependencies
- Python 3.8+
- Rust 1.70+ (for building from source)
- Vector DB client (Qdrant, Chroma, etc.)
MIT License — See LICENSE for details.
PyHound is free for commercial use.
Contributions are welcome! See CONTRIBUTING.md for guidelines.
- v0.1 — Embedding Inspector + basic diagnostics
- v0.2 — Hybrid retrieval engine (BM25 + vector + reranker)
- v0.3 — Embedding versioning with zero-downtime migrations
- v1.0 — Advanced optimization tools, full observability integration
- Try the Quick Start — Get PyHound working with Qdrant in 5 minutes
- Read Use Cases — See which scenario matches your problem
- Check Benchmarks — Understand PyHound's performance vs competitors
- Explore Roadmap — See what's planned (v0.2-v1.0)
- GitHub Discussions: https://github.com/Mullassery/pyhound/discussions
- Issues: https://github.com/Mullassery/pyhound/issues
- Email: mullassery@gmail.com
- Georgi Mammen Mullassery — Original creator
Built with:
- Rust ecosystem (fast, safe, embeddable)
- PyO3 (Python bindings)
- Open source community
Hunt down retrieval problems. Fix them fast.