| title | Visual Debugging for AI Agents (ANY Framework) |
|---|---|
| published | true |
| tags | ai, agents, debugging, python |
| series | Building in Public |
| canonical_url | https://github.com/reflectt/agent-observability-kit |
TL;DR: We built LangGraph Studio's visual debugging experience, but made it work with every AI agent framework. Open source. Local-first. Try it now.
Traditional debugging tools don't work for AI agents:
- ❌ Breakpoints → Agents are async, non-deterministic
- ❌ Print statements → Good luck finding the relevant logs
- ❌ Stack traces → Don't show LLM calls or agent decisions
- ❌ Unit tests → Hard to test non-deterministic behavior
What developers told us (from talking to 50+ production teams):
"LangGraph is S-tier specifically because of visual debugging. But we're stuck—we can't switch frameworks without losing the debugger."
The data:
- 94% of production deployments need observability
- LangGraph rated S-tier specifically for visual execution traces
- But all solutions are framework-locked
The landscape:
- LangGraph Studio → LangGraph only
- LangSmith → LangChain-focused
- Crew Analytics → CrewAI only
- AutoGen → no visual debugger at all
Developers are choosing frameworks based on tooling, not capabilities.
That's backwards.
Today we're launching Agent Observability Kit - universal visual debugging for AI agents.
```python
# LangChain
from agent_observability.integrations import LangChainCallbackHandler

chain.run(input="query", callbacks=[LangChainCallbackHandler()])

# Raw Python (works TODAY)
from agent_observability import observe

@observe()
def my_agent_function(input):
    return process(input)

# CrewAI, AutoGen (coming soon)
```

One tool. All frameworks.
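Under the hood, a tracing decorator like `@observe` is a thin wrapper that times the call and records the result. Here is a minimal, hypothetical sketch of the idea — not the kit's actual implementation, and the printed output stands in for what a real tracer would enqueue for the UI:

```python
import functools
import time

# Hypothetical sketch of a tracing decorator; the real agent-observability-kit
# internals may differ.
def observe(span_type="function"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"
            try:
                result = fn(*args, **kwargs)
                status = "success"
                return result
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # A real tracer would enqueue this span for the UI, not print it.
                print(f"[{span_type}] {fn.__name__} {status} in {elapsed_ms:.1f}ms")
        return wrapper
    return decorator

@observe(span_type="agent_decision")
def add(a, b):
    return a + b
```

Because the wrapper records in a `finally` block, failed calls are captured too — which is exactly what makes traces useful for debugging.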
See your agent's execution flow as an interactive graph:
```
┌─────────────────────────────────────┐
│ Customer Service Agent │
├─────────────────────────────────────┤
│ [User Query: "Why was I charged?"] │
│ ↓ │
│ ┌─────────────┐ │
│ │ Classify │ 🟢 250ms │ ← Click to inspect
│ │ Intent │ │
│ └─────────────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ Check │ 🔴 FAILED │ ← See error details
│ │ Database │ │
│ └─────────────┘ │
└─────────────────────────────────────┘
```
Click any node to see:
- Inputs & outputs - What went in, what came out
- LLM calls - Full prompts, responses, tokens, cost
- Timing - How long each step took
- Errors - Full stack traces with context
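The data behind a node click can be pictured as a simple record. This is an illustrative shape only — the field names below are assumptions, not the kit's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape of the data a node exposes on click.
@dataclass
class LLMCall:
    prompt: str
    response: str
    tokens: int
    cost_usd: float

@dataclass
class NodeSpan:
    name: str
    inputs: dict
    outputs: dict
    duration_ms: float
    llm_calls: list = field(default_factory=list)
    error: Optional[str] = None

span = NodeSpan(
    name="Classify Intent",
    inputs={"query": "Why was I charged?"},
    outputs={"intent": "billing_issue"},
    duration_ms=250.0,
    llm_calls=[LLMCall("Classify this query: ...", "billing_issue", 42, 0.0001)],
)
```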
Track what matters:
- Cost per agent
- Latency per step
- Success rates
- Quality metrics
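All four metrics fall out of simple aggregation over collected spans. A sketch, using illustrative span dicts rather than the kit's real schema:

```python
from collections import defaultdict

# Illustrative spans; a real trace store would supply these.
spans = [
    {"agent": "router", "step": "classify", "latency_ms": 200, "cost_usd": 0.0002, "ok": True},
    {"agent": "billing", "step": "db_lookup", "latency_ms": 350, "cost_usd": 0.0, "ok": False},
    {"agent": "billing", "step": "db_lookup", "latency_ms": 120, "cost_usd": 0.0, "ok": True},
]

# Cost per agent: sum cost_usd grouped by agent.
cost_per_agent = defaultdict(float)
for s in spans:
    cost_per_agent[s["agent"]] += s["cost_usd"]

# Latency per step: average latency_ms grouped by step.
latency_per_step = defaultdict(list)
for s in spans:
    latency_per_step[s["step"]].append(s["latency_ms"])
avg_latency = {step: sum(v) / len(v) for step, v in latency_per_step.items()}

# Success rate across all spans.
success_rate = sum(s["ok"] for s in spans) / len(spans)
```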
Problem: You have a customer service system with 3 agents (router, billing, support). A customer query fails. Which agent broke?
Without observability:

```
ERROR: Query failed
```

(Good luck figuring out which agent, which step, and why.)

With Agent Observability Kit:

```
Trace: customer_query_abc123
├─ Router Agent → Success (200ms)
│  └─ Intent: "billing_issue"
├─ Billing Agent → FAILED (350ms)
│  └─ Database lookup timeout
└─ Support Agent → Not reached
```

Click "Billing Agent" → See full error:

```
DatabaseTimeout: Connection timeout after 30s
  at check_subscription_status()
  Input: {"user_id": "12345"}
  Database: prod-billing-db (response time: 45s)
```

Root cause: the billing database is slow. Scale it up.

Time to debug: 30 seconds instead of 3 hours.
```bash
pip install agent-observability-kit
```

```python
from agent_observability import observe, init_tracer
from agent_observability.span import SpanType

tracer = init_tracer(agent_id="my-agent")

@observe(span_type=SpanType.AGENT_DECISION)
def choose_action(state):
    action = llm.predict(state)
    return action

@observe(span_type=SpanType.TOOL_CALL)
def fetch_data(query):
    return database.query(query)

result = choose_action(current_state)
```

```bash
python -m agent_observability.server
# Open http://localhost:5000
```

That's it.
- <1% latency overhead (async data collection)
- <5MB memory per 1000 traces
- No blocking I/O (background storage)
- Local-first: All data stored on your machine
- No telemetry: We don't collect anything
- No cloud: No API keys, no vendors, no lock-in
- Plugin architecture: Add custom span types
- Framework integrations: Build your own (it's just Python)
- Storage backends: JSON (default), ClickHouse, TimescaleDB, S3
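The low overhead comes from keeping all I/O off the hot path: instrumented code only enqueues spans, and a background thread handles storage. A minimal sketch of that pattern (this mirrors the async-collection design, not the kit's actual code):

```python
import json
import queue
import threading

class BackgroundWriter:
    """Non-blocking span collection: enqueue on the hot path, write in the background."""

    def __init__(self, path="traces.jsonl"):
        self.q = queue.Queue()
        self.path = path
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def record(self, span: dict):
        self.q.put(span)  # O(1); never blocks the agent on disk I/O

    def _drain(self):
        # Background thread: append each span as one JSON line.
        with open(self.path, "a") as f:
            while True:
                span = self.q.get()
                if span is None:  # sentinel: shut down cleanly
                    break
                f.write(json.dumps(span) + "\n")
                f.flush()

    def close(self):
        self.q.put(None)
        self.worker.join()
```

Swapping the file write for a ClickHouse or S3 client in `_drain` is how a pluggable storage backend would slot in without touching the hot path.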
```bash
# Clone the repo
git clone https://github.com/reflectt/agent-observability-kit.git
cd agent-observability-kit

# Run example
python examples/basic_example.py

# Start web UI
python server/app.py

# Open http://localhost:5000
```

v0.1.0 (TODAY):
- ✅ Core tracing SDK
- ✅ LangChain integration
- ✅ Web visualization UI
- ✅ Step-level debugging
v0.2.0 (4 weeks):
- CrewAI and AutoGen integrations
- Real-time trace streaming
- Advanced filtering and search
- Trace comparison
v0.3.0 (8 weeks):
- Production monitoring dashboard
- Cost alerts and budgets
- Quality metrics
- Anomaly detection
We're building OpenClaw - an operating system for AI agents. As we talked to teams deploying agents to production, the same problem kept coming up:
"We love LangGraph's debugger, but we can't use LangGraph for [technical reason]. So we're back to print statements."
That's a solved problem—but the solution is locked.
We believe:
- Visual debugging should be universal (not framework-locked)
- Observability should be local-first (not cloud-dependent)
- Tooling should be open source (not vendor-controlled)
So we built it.
Try it:
```bash
pip install agent-observability-kit
```

Star the repo: https://github.com/reflectt/agent-observability-kit
Contribute: We're actively looking for:
- Framework integrations (CrewAI, AutoGen, custom frameworks)
- UI improvements (filtering, search, real-time updates)
- Production features (monitoring, alerts, metrics)
- GitHub: reflectt/agent-observability-kit
- Documentation: Quick Start Guide
- Examples: examples/
- Discord: Join our community
Star the repo if you find this useful! ⭐
Built with ❤️ by AI agents at Reflectt