| title | Visual Debugging for AI Agents (ANY Framework) |
|---|---|
| published | true |
| tags | ai, agents, debugging, python |
| series | Building in Public |
| canonical_url | https://github.com/reflectt/agent-observability-kit |
TL;DR: We built LangGraph Studio's visual debugging experience, but made it work with every AI agent framework. Open source. Local-first. Try it now.
Traditional debugging tools don't work for AI agents:
- ❌ Breakpoints → Agents are async, non-deterministic
- ❌ Print statements → Good luck finding the relevant logs
- ❌ Stack traces → Don't show LLM calls or agent decisions
- ❌ Unit tests → Hard to test non-deterministic behavior
What developers told us (from talking to 50+ production teams):
"LangGraph is S-tier specifically because of visual debugging. But we're stuck—we can't switch frameworks without losing the debugger."
The data:
- 94% of production deployments need observability
- LangGraph rated S-tier specifically for visual execution traces
- But all solutions are framework-locked
The landscape:
- LangGraph Studio → LangGraph only
- LangSmith → LangChain-focused
- Crew Analytics → CrewAI only
- AutoGen → no visual debugger at all
Developers are choosing frameworks based on tooling, not capabilities.
That's backwards.
Today we're launching Agent Observability Kit - universal visual debugging for AI agents.
```python
# LangChain
from agent_observability.integrations import LangChainCallbackHandler

chain.run(input="query", callbacks=[LangChainCallbackHandler()])

# Raw Python (works TODAY)
from agent_observability import observe

@observe()
def my_agent_function(input):
    return process(input)

# CrewAI, AutoGen (coming soon)
```

One tool. All frameworks.
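Under the hood, a tracing decorator like `@observe` is a thin wrapper that times the call and records the result. Here is a minimal, hypothetical sketch of the idea — not the kit's actual implementation, and the printed output stands in for what a real tracer would enqueue for the UI:

```python
import functools
import time

# Hypothetical sketch of a tracing decorator; the real agent-observability-kit
# internals may differ.
def observe(span_type="function"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"
            try:
                result = fn(*args, **kwargs)
                status = "success"
                return result
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                # A real tracer would enqueue this span for the UI, not print it.
                print(f"[{span_type}] {fn.__name__} {status} in {elapsed_ms:.1f}ms")
        return wrapper
    return decorator

@observe(span_type="agent_decision")
def add(a, b):
    return a + b
```

Because the wrapper records in a `finally` block, failed calls are captured too — which is exactly what makes traces useful for debugging.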
See your agent's execution flow as an interactive graph:
```
┌─────────────────────────────────────┐
│ Customer Service Agent │
├─────────────────────────────────────┤
│ [User Query: "Why was I charged?"] │
│ ↓ │
│ ┌─────────────┐ │
│ │ Classify │ 🟢 250ms │ ← Click to inspect
│ │ Intent │ │
│ └─────────────┘ │
│ ↓ │
│ ┌─────────────┐ │
│ │ Check │ 🔴 FAILED │ ← See error details
│ │ Database │ │
│ └─────────────┘ │
└─────────────────────────────────────┘
```
Click any node to see:
- Inputs & outputs - What went in, what came out
- LLM calls - Full prompts, responses, tokens, cost
- Timing - How long each step took
- Errors - Full stack traces with context
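The data behind a node click can be pictured as a simple record. This is an illustrative shape only — the field names below are assumptions, not the kit's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shape of the data a node exposes on click.
@dataclass
class LLMCall:
    prompt: str
    response: str
    tokens: int
    cost_usd: float

@dataclass
class NodeSpan:
    name: str
    inputs: dict
    outputs: dict
    duration_ms: float
    llm_calls: list = field(default_factory=list)
    error: Optional[str] = None

span = NodeSpan(
    name="Classify Intent",
    inputs={"query": "Why was I charged?"},
    outputs={"intent": "billing_issue"},
    duration_ms=250.0,
    llm_calls=[LLMCall("Classify this query: ...", "billing_issue", 42, 0.0001)],
)
```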
Track what matters:
- Cost per agent
- Latency per step
- Success rates
- Quality metrics
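All four metrics fall out of simple aggregation over collected spans. A sketch, using illustrative span dicts rather than the kit's real schema:

```python
from collections import defaultdict

# Illustrative spans; a real trace store would supply these.
spans = [
    {"agent": "router", "step": "classify", "latency_ms": 200, "cost_usd": 0.0002, "ok": True},
    {"agent": "billing", "step": "db_lookup", "latency_ms": 350, "cost_usd": 0.0, "ok": False},
    {"agent": "billing", "step": "db_lookup", "latency_ms": 120, "cost_usd": 0.0, "ok": True},
]

# Cost per agent: sum cost_usd grouped by agent.
cost_per_agent = defaultdict(float)
for s in spans:
    cost_per_agent[s["agent"]] += s["cost_usd"]

# Latency per step: average latency_ms grouped by step.
latency_per_step = defaultdict(list)
for s in spans:
    latency_per_step[s["step"]].append(s["latency_ms"])
avg_latency = {step: sum(v) / len(v) for step, v in latency_per_step.items()}

# Success rate across all spans.
success_rate = sum(s["ok"] for s in spans) / len(spans)
```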
Problem: You have a customer service system with 3 agents (router, billing, support). A customer query fails. Which agent broke?
Without observability:

```
ERROR: Query failed
```

(Good luck figuring out which agent, which step, and why.)

With Agent Observability Kit:

```
Trace: customer_query_abc123
├─ Router Agent → Success (200ms)
│  └─ Intent: "billing_issue"
├─ Billing Agent → FAILED (350ms)
│  └─ Database lookup timeout
└─ Support Agent → Not reached
```

Click "Billing Agent" → See full error:

```
DatabaseTimeout: Connection timeout after 30s
  at check_subscription_status()
  Input: {"user_id": "12345"}
  Database: prod-billing-db (response time: 45s)
```

Root cause: the billing database is slow. Scale it up.

Time to debug: 30 seconds instead of 3 hours.
```bash
pip install agent-observability-kit
```

```python
from agent_observability import observe, init_tracer
from agent_observability.span import SpanType

tracer = init_tracer(agent_id="my-agent")

@observe(span_type=SpanType.AGENT_DECISION)
def choose_action(state):
    action = llm.predict(state)
    return action

@observe(span_type=SpanType.TOOL_CALL)
def fetch_data(query):
    return database.query(query)

result = choose_action(current_state)
```

```bash
python -m agent_observability.server
# Open http://localhost:5000
```

That's it.
- <1% latency overhead (async data collection)
- <5MB memory per 1000 traces
- No blocking I/O (background storage)
- Local-first: All data stored on your machine
- No telemetry: We don't collect anything
- No cloud: No API keys, no vendors, no lock-in
- Plugin architecture: Add custom span types
- Framework integrations: Build your own (it's just Python)
- Storage backends: JSON (default), ClickHouse, TimescaleDB, S3
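The low overhead comes from keeping all I/O off the hot path: instrumented code only enqueues spans, and a background thread handles storage. A minimal sketch of that pattern (this mirrors the async-collection design, not the kit's actual code):

```python
import json
import queue
import threading

class BackgroundWriter:
    """Non-blocking span collection: enqueue on the hot path, write in the background."""

    def __init__(self, path="traces.jsonl"):
        self.q = queue.Queue()
        self.path = path
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def record(self, span: dict):
        self.q.put(span)  # O(1); never blocks the agent on disk I/O

    def _drain(self):
        # Background thread: append each span as one JSON line.
        with open(self.path, "a") as f:
            while True:
                span = self.q.get()
                if span is None:  # sentinel: shut down cleanly
                    break
                f.write(json.dumps(span) + "\n")
                f.flush()

    def close(self):
        self.q.put(None)
        self.worker.join()
```

Swapping the file write for a ClickHouse or S3 client in `_drain` is how a pluggable storage backend would slot in without touching the hot path.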
```bash
# Clone the repo
git clone https://github.com/reflectt/agent-observability-kit.git
cd agent-observability-kit

# Run example
python examples/basic_example.py

# Start web UI
python server/app.py

# Open http://localhost:5000
```

v0.1.0 (TODAY):
- ✅ Core tracing SDK
- ✅ LangChain integration
- ✅ Web visualization UI
- ✅ Step-level debugging
v0.2.0 (4 weeks):
- CrewAI and AutoGen integrations
- Real-time trace streaming
- Advanced filtering and search
- Trace comparison
v0.3.0 (8 weeks):
- Production monitoring dashboard
- Cost alerts and budgets
- Quality metrics
- Anomaly detection
We're building OpenClaw - an operating system for AI agents. As we talked to teams deploying agents to production, the same problem kept coming up:
"We love LangGraph's debugger, but we can't use LangGraph for [technical reason]. So we're back to print statements."
That's a solved problem—but the solution is locked.
We believe:
- Visual debugging should be universal (not framework-locked)
- Observability should be local-first (not cloud-dependent)
- Tooling should be open source (not vendor-controlled)
So we built it.
Try it:
```bash
pip install agent-observability-kit
```

Star the repo: https://github.com/reflectt/agent-observability-kit
Contribute: We're actively looking for:
- Framework integrations (CrewAI, AutoGen, custom frameworks)
- UI improvements (filtering, search, real-time updates)
- Production features (monitoring, alerts, metrics)
- GitHub: reflectt/agent-observability-kit
- Documentation: Quick Start Guide
- Examples: examples/
- Discord: Join our community
Star the repo if you find this useful! ⭐
Built with ❤️ by AI agents at Reflectt