📼 Agent VCR

ACID transactions, time-travel debugging, and zero-cost replay for AI agents.

The only tool that rolls back the filesystem — not just the state object.

📖 Docs · 🚀 Examples · 🛡️ Sentinel · 📊 Benchmarks

pip install ai-agent-vcr

No API keys. No cloud. No vendor lock-in. Works with LangGraph, CrewAI, or raw Python.

Observability tools show you what happened. Agent VCR lets you undo it.

❌ Without Agent VCR

Agent fails at step 8 of 10
         ↓
You patch the code
         ↓
Re-run ALL 10 steps from scratch
         ↓
$0.04 + 2 minutes wasted
         ↓
Repeat for every bug

✅ With Agent VCR

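from agent_vcr import VCRPlayer
from agent_vcr.models import ResumeConfig

player = VCRPlayer.load("run.vcr")

# Jump to step 8 (frames are zero-indexed) and see what went wrong
state = player.goto_frame(7)

# Fix it and resume from frame 7; frames 0-6 are not re-run
player.resume(agent, config=ResumeConfig(
    from_frame=7,
    state_overrides={"prompt": "fixed"}
))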

✨ Features

⏮️ Time Travel: jump to any step. Full state snapshot at every node. Inspect input, output, and diffs.
✏️ Edit & Resume: fix a prompt, patch a tool output, or inject context, then resume from that point. No re-runs.
🌿 Session Forking: fork from any frame. Create parallel runs and compare how fixes change downstream behavior.
👻 Ghost Replay: save successful runs. Replay the same task instantly with zero tokens and zero cost.
🔒 ACID Transactions: `BEGIN / SAVEPOINT / ROLLBACK / COMMIT` backed by git. Rollback deletes files from disk.
🛡️ Sentinel Guardian: real-time AST analysis catches duplicate functions and complexity spikes, and makes the agent self-correct.
🖥️ TUI Debugger: `vcr-tui` in your terminal. Navigate frames, edit state, diff, and resume, all keyboard-driven.
📡 Live Dashboard: `vcr-server` → `localhost:8000`. WebSocket streaming, session browser, DAG visualization.
⚡ <5ms Overhead: mean recording overhead under 5ms per frame, enforced by a benchmark in CI on every commit.

Quick Start

Record

from agent_vcr import VCRRecorder

recorder = VCRRecorder()
recorder.start_session("my_run")

# Your existing agent code, unchanged; just snapshot the input before each step
state = {"query": "build a REST API"}

input_state = dict(state)
state = planner(state)          # step 1
recorder.record_step("planner", input_state, state)

input_state = dict(state)
state = coder(state)            # step 2
recorder.record_step("coder", input_state, state)

recorder.save()                 # → .vcr/my_run.vcr

Or use the context manager — never lose frames even if the agent crashes:

with VCRRecorder() as recorder:
    recorder.start_session("my_run")
    # ... your agent code ...
# auto-saved on exit
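The crash-safety guarantee comes from Python's `__exit__` running even when the body raises. As a minimal sketch of that pattern (the `TinyRecorder` class is hypothetical and is not Agent VCR's implementation), a recorder can buffer frames in memory, write them out as JSONL on exit, and then let the agent's exception propagate:

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class TinyRecorder:
    """Hypothetical minimal recorder illustrating save-on-exit; not the agent_vcr API."""

    def __init__(self, output_dir: str = ".vcr") -> None:
        self.output_dir = Path(output_dir)
        self.session_id: str | None = None
        self.frames: list[dict] = []

    def start_session(self, session_id: str) -> None:
        self.session_id = session_id

    def record_step(self, node_name: str, input_state: dict, output_state: dict) -> None:
        # Shallow-copy both states so later in-place mutation doesn't alter the frame.
        self.frames.append({
            "node_name": node_name,
            "input_state": dict(input_state),
            "output_state": dict(output_state),
        })

    def save(self) -> Path:
        self.output_dir.mkdir(parents=True, exist_ok=True)
        path = self.output_dir / f"{self.session_id}.vcr"
        with path.open("w", encoding="utf-8") as f:
            # One JSON object per line: a session header, then one line per frame.
            f.write(json.dumps({"type": "session", "data": {"session_id": self.session_id}}) + "\n")
            for frame in self.frames:
                f.write(json.dumps({"type": "frame", "data": frame}) + "\n")
        return path

    def __enter__(self) -> "TinyRecorder":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit AND when the agent raises, so recorded frames survive a crash.
        if self.session_id is not None:
            self.save()
        return False  # never swallow the agent's exception


out = tempfile.mkdtemp()
try:
    with TinyRecorder(out) as rec:
        rec.start_session("crashy_run")
        rec.record_step("planner", {"query": "q"}, {"query": "q", "plan": "p"})
        raise RuntimeError("agent crashed at step 2")
except RuntimeError:
    pass

print(len((Path(out) / "crashy_run.vcr").read_text().splitlines()))  # 2
```

Returning `False` from `__exit__` is the important design choice: the frames are persisted, but the crash is still visible to the caller.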

Rewind & Fix

from agent_vcr import VCRPlayer
from agent_vcr.models import ResumeConfig

player = VCRPlayer.load(".vcr/my_run.vcr")

# Inspect any step
print(player.goto_frame(0))     # {'query': 'build a REST API', ...}
print(player.goto_frame(1))     # {'plan': '...', 'steps': [...], ...}
print(player.get_errors())      # see what failed

# Diff two frames
diff = player.compare_frames(0, 1)
# {'added': {'plan': ...}, 'modified': {'query': ...}, ...}

# Fix and resume from step 1 with a different plan
player.resume(
    agent_callable=coder,
    config=ResumeConfig(
        from_frame=1,
        state_overrides={"plan": "use FastAPI instead of Flask"}
    )
)
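The `added` / `removed` / `modified` buckets returned by `compare_frames` can be read as a key-level diff of two state dicts. A minimal sketch, assuming a shallow comparison of top-level keys only (the real method may diff nested values or shape `modified` entries differently; `compare_states` is a hypothetical name):

```python
def compare_states(before: dict, after: dict) -> dict:
    """Hypothetical shallow diff of two state dicts (top-level keys only)."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    modified = {
        k: {"old": before[k], "new": after[k]}
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "modified": modified}


frame0 = {"query": "build a REST API"}
frame1 = {"query": "build a REST API", "plan": "use Flask"}
print(compare_states(frame0, frame1))
# {'added': {'plan': 'use Flask'}, 'removed': {}, 'modified': {}}
```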

Integrations

LangGraph

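from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import VCRLangGraph

class MyState(TypedDict, total=False):
    query: str
    plan: str
    code: str

graph = StateGraph(MyState)
graph.add_node("planner", planner_node)
graph.add_node("coder", coder_node)
graph.add_edge(START, "planner")   # LangGraph needs an entry point
graph.add_edge("planner", "coder")
graph.add_edge("coder", END)       # ...and a finish point

recorder = VCRRecorder()
graph = VCRLangGraph(recorder).wrap_graph(graph)  # one line

result = graph.invoke({"query": "Build a todo app"})
recorder.save()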

CrewAI

from crewai import Crew
from agent_vcr import VCRRecorder
from agent_vcr.integrations.crewai import VCRCrewAI

recorder = VCRRecorder()
recorder.start_session("crew_run")

crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
result = VCRCrewAI(recorder).kickoff(crew)

recorder.save()

Install extras:

pip install "ai-agent-vcr[crewai]"
pip install "ai-agent-vcr[langgraph]"

Raw Python (decorator)

from agent_vcr import VCRRecorder
from agent_vcr.integrations.langgraph import vcr_record

recorder = VCRRecorder()

@vcr_record(recorder, node_name="research_step")
def research(state: dict) -> dict:
    return {"findings": search(state["query"])}
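Conceptually, a recording decorator snapshots the input, times the call, and stores the output. A minimal sketch of that idea (the `record_node` decorator and the list-based frame sink are hypothetical stand-ins, not the behavior of `vcr_record` itself; this version records only successful calls):

```python
import copy
import functools
import time
from typing import Callable


def record_node(frames: list, node_name: str) -> Callable:
    """Hypothetical stand-in for a @vcr_record-style decorator; appends frames to a list."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            input_state = copy.deepcopy(state)  # snapshot before the node can mutate it
            start = time.perf_counter()
            output = fn(state)
            latency_ms = (time.perf_counter() - start) * 1000
            frames.append({
                "node_name": node_name,
                "input_state": input_state,
                "output_state": copy.deepcopy(output),
                "metadata": {"latency_ms": latency_ms},
            })
            return output
        return wrapper
    return decorator


frames: list = []


@record_node(frames, node_name="research_step")
def research(state: dict) -> dict:
    return {"findings": f"results for {state['query']}"}


research({"query": "vector databases"})
print(frames[0]["node_name"], frames[0]["output_state"])
# research_step {'findings': 'results for vector databases'}
```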

🔒 ACID Transactions

Databases solved the partial-failure problem 40 years ago. Agents have the exact same problem — when your agent fails mid-run, you don't just have bad in-memory state. You have files written to disk that shouldn't exist.

Current tools only roll back state objects. The filesystem stays polluted.

Agent VCR wraps agent execution in real transactional semantics:

from agent_vcr import VCRRecorder
from agent_vcr.integrations.openhands import ACIDWorkspace

recorder = VCRRecorder()
acid = ACIDWorkspace("/my/workspace", recorder=recorder)

acid.begin(session_id="task-001")        # isolated git branch
acid.savepoint(state, node_name="coder") # checkpoint state + filesystem
acid.savepoint(state, node_name="tester")

# Agent writes bad code in a later step: roll back to frame 1
acid.rollback(to_frame_index=1)
# git reset --hard: files committed after frame 1 are removed from disk, not just hidden

acid.commit()                            # merge clean branch into main

BEGIN → isolated git branch per agent session. Parallel agents can't clobber each other.
SAVEPOINT → checkpoints both VCR state AND filesystem. Every frame has a matching git commit.
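ROLLBACK → git reset --hard to the savepoint's commit. Files your agent hallucinated and committed after it are physically deleted. (On its own, git reset --hard leaves untracked files in place; removing those also takes git clean.)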
COMMIT → clean merge back into main.

python examples/acid_golden_run.py
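To see why a filesystem rollback differs from a state rollback, here is a minimal sketch of the SAVEPOINT / ROLLBACK idea. It deliberately swaps git for plain directory copies via `shutil` so it runs anywhere. The `DirSavepoints` class is hypothetical and is not how `ACIDWorkspace` is implemented:

```python
import shutil
import tempfile
from pathlib import Path


class DirSavepoints:
    """Hypothetical savepoint/rollback for a directory, using copies instead of git."""

    def __init__(self, workspace: Path) -> None:
        self.workspace = Path(workspace)
        self._store = Path(tempfile.mkdtemp(prefix="savepoints-"))
        self._snapshots: list = []

    def savepoint(self) -> int:
        # Copy the whole workspace; the returned index plays the role of a frame index.
        dest = self._store / f"sp{len(self._snapshots)}"
        shutil.copytree(self.workspace, dest)
        self._snapshots.append(dest)
        return len(self._snapshots) - 1

    def rollback(self, to_index: int) -> None:
        # Replace the workspace wholesale, so every file created after the savepoint
        # is removed from disk, whether or not anything ever tracked it.
        shutil.rmtree(self.workspace)
        shutil.copytree(self._snapshots[to_index], self.workspace)
        # Later savepoints describe a timeline that no longer exists; drop them.
        for later in self._snapshots[to_index + 1:]:
            shutil.rmtree(later)
        del self._snapshots[to_index + 1:]


ws = Path(tempfile.mkdtemp()) / "workspace"
ws.mkdir()
(ws / "app.py").write_text("print('ok')\n")

sp = DirSavepoints(ws)
clean = sp.savepoint()
(ws / "hallucinated.py").write_text("import does_not_exist\n")
sp.rollback(clean)
print(sorted(p.name for p in ws.iterdir()))  # ['app.py']
```

Full copies cost time and disk proportional to the workspace size at every savepoint; git's content-addressed storage avoids that, which is a natural reason to build the real feature on git.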

👻 Ghost Replay — Never Pay for the Same Task Twice

When your agent succeeds, save the entire execution as a replayable ghost run. Next time you hit the same task, replay it instantly — zero LLM calls, zero tokens, zero cost.

from agent_vcr.golden_cache import GoldenRunCache

cache = GoldenRunCache()

# After a successful run:
cache.save_golden_run("Build a REST API with JWT auth", recorder)

# Next time — instant, $0.00:
outputs, ledger = cache.replay("Build a REST API with JWT auth")
print(ledger)
# CostLedger(saved=100% | $0.0123 | 4,100 tokens | 2,349ms)

The CostLedger tracks original vs replay: tokens, dollars, milliseconds, and reduction percentage. The demo shows it live:

python examples/acid_golden_run.py

RUN 1: Original            RUN 2: Ghost Replay
Tokens:    4,100           Tokens:    0
Cost:    $0.0123           Cost:    $0.00
Latency: 2,350ms           Latency:  1ms

💰 Savings: 100% · $0.0123 · 4,100 tokens · 2,349ms
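The core of a ghost run is a lookup keyed by the task, plus a ledger comparing the original run with the replay. A minimal in-memory sketch follows. The whitespace/case normalization in `fingerprint`, and the `TinyGhostCache` and `Ledger` names, are assumptions for illustration, not how `GoldenRunCache` fingerprints tasks:

```python
from __future__ import annotations

import hashlib
import time
from dataclasses import dataclass
from typing import Any


def fingerprint(task: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace before hashing.
    normalized = " ".join(task.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


@dataclass
class Ledger:
    original_tokens: int
    original_cost_usd: float
    original_ms: float
    replay_ms: float
    replay_cost_usd: float = 0.0  # a replay makes no LLM calls

    @property
    def saved_pct(self) -> float:
        if self.original_cost_usd == 0:
            return 0.0
        return 100.0 * (self.original_cost_usd - self.replay_cost_usd) / self.original_cost_usd

    @property
    def saved_ms(self) -> float:
        return self.original_ms - self.replay_ms


class TinyGhostCache:
    """Hypothetical in-memory cache keyed by task fingerprint; not the GoldenRunCache API."""

    def __init__(self) -> None:
        self._runs: dict[str, dict[str, Any]] = {}

    def save(self, task: str, outputs: list, tokens: int, cost_usd: float, latency_ms: float) -> str:
        key = fingerprint(task)
        self._runs[key] = {"outputs": outputs, "tokens": tokens, "cost": cost_usd, "ms": latency_ms}
        return key

    def replay(self, task: str) -> tuple[list, Ledger] | None:
        start = time.perf_counter()
        run = self._runs.get(fingerprint(task))
        if run is None:
            return None  # cache miss: the caller must run the agent for real
        outputs = list(run["outputs"])
        replay_ms = (time.perf_counter() - start) * 1000
        return outputs, Ledger(run["tokens"], run["cost"], run["ms"], replay_ms)


cache = TinyGhostCache()
cache.save("Build a REST API with JWT auth", ["plan", "code", "tests"], 4100, 0.0123, 2350.0)
outputs, ledger = cache.replay("build a REST API   with JWT auth")
print(outputs, f"{ledger.saved_pct:.0f}%")  # ['plan', 'code', 'tests'] 100%
```

Replaying recorded outputs is only sound when the task and its environment really are the same; how aggressively to normalize the key is a trade-off between hit rate and the risk of replaying a run for a subtly different task.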

🖥 TUI Debugger

Run the terminal debugger on any recorded session:

vcr-tui .vcr/my_run.vcr

┌──────────────────────────────────────────────────────────┐
│ 📼 Agent VCR TUI              Session: my_run · 8 frames │
├──────────────────────────────────────────────────────────┤
│ ▶ Frame 0  │ planner     │  100ms │ ●                    │
│   Frame 1  │ researcher  │  250ms │ ●                    │
│   Frame 2  │ coder       │  480ms │ ✗ ERROR              │
│   Frame 3  │ tester      │   80ms │ ●                    │
├──────────────────────────────────────────────────────────┤
│  State at frame 0:                                       │
│  { "query": "build a todo app",                          │
│    "context": "...",                                     │
│    "plan": null }                                        │
├──────────────────────────────────────────────────────────┤
│ ← → navigate  │ e edit  │ d diff  │ r resume  │ q quit   │
└──────────────────────────────────────────────────────────┘

Keybindings:

← → — navigate frames
e — edit state inline (opens editor, saves on exit)
d — diff current frame vs previous
r — resume from current frame
f — fork current frame to new session
q — quit

📊 DAG Visualization

See your agent's full execution graph — forks, parallel branches, error paths:

vcr-server .vcr/
# Open localhost:8000

The dashboard renders your session as a DAG:

original_run ────────────────────────────────────────────► [done]
               │ frame 3
               ╰──► fork_v1 ──► [coder] ──► [tester] ──► [done]
               │
               ╰──► fork_v2 ──► [coder] ──► [done]

Every fork is a branch node
Error frames shown in red
Click any node to inspect full state
Live WebSocket streaming for in-progress sessions

🛡️ OpenHands Sentinel

"Code is cheap now. Good code is not." — Graham Neubig, OpenHands Chief Scientist

Sentinel watches every file an AI agent writes and catches quality violations in real time — before the agent moves on.

from openhands_sentinel import Sentinel
from agent_vcr import VCRRecorder

recorder = VCRRecorder()
sentinel = Sentinel(recorder=recorder)
sentinel.attach(runtime.event_stream)  # 3 lines, auto-intercepts every file write

python examples/sentinel_demo.py

STEP 1: Agent writes auth/utils.py
🛡️ SENTINEL: auth/utils.py — CLEAN ✓

STEP 2: Agent writes handlers.py
🛡️ SENTINEL: VIOLATIONS DETECTED!
  CRITICAL  hash_password() already exists in auth/utils.py:8 — reuse it
  CRITICAL  handle_auth_request() is 109 lines (max 40) — break it up
  CRITICAL  Cyclomatic complexity 32 (max 8) — simplify
  WARNING   9 parameters (max 5) — use a config object

STEP 3: Agent self-corrects
🛡️ SENTINEL: handlers.py — CLEAN ✓ All issues resolved!

📼 Audit trail: .vcr/sentinel-demo.vcr

Or scan any directory standalone:

sentinel scan ./my-ai-project

Without Sentinel	With Sentinel
Agent writes bad code	Agent writes bad code
Human reviews PR	Sentinel catches in <10ms
Human rejects PR	Agent self-corrects
Agent rewrites	(already done)
Human reviews again	Zero human time
Cost: 2× LLM + human hours	Cost: 1 extra LLM call
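The kinds of violations in the demo can be approximated with Python's standard-library `ast` module. This is a minimal sketch, not Sentinel's implementation. The thresholds are copied from the demo output, the complexity count is a simplified McCabe-style approximation, and `check_source` and `complexity` are hypothetical names:

```python
import ast

# Thresholds taken from the demo output above; Sentinel's real defaults may differ.
MAX_LINES, MAX_COMPLEXITY, MAX_PARAMS = 40, 8, 5

_DECISIONS = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler,
              ast.IfExp, ast.comprehension)


def complexity(fn: ast.AST) -> int:
    # McCabe-style approximation: 1 + one per branch point, + extra operands of and/or.
    # Note: ast.walk also descends into nested functions, which this sketch counts too.
    score = 1
    for node in ast.walk(fn):
        if isinstance(node, _DECISIONS):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
    return score


def check_source(path: str, source: str, seen: dict) -> list:
    """Return violation strings; `seen` maps function name -> (path, line) across files."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        name = node.name
        if name in seen and seen[name][0] != path:
            where = f"{seen[name][0]}:{seen[name][1]}"
            issues.append(f"CRITICAL {name}() already exists in {where}, reuse it")
        seen.setdefault(name, (path, node.lineno))
        length = node.end_lineno - node.lineno + 1
        if length > MAX_LINES:
            issues.append(f"CRITICAL {name}() is {length} lines (max {MAX_LINES})")
        cc = complexity(node)
        if cc > MAX_COMPLEXITY:
            issues.append(f"CRITICAL {name}() cyclomatic complexity {cc} (max {MAX_COMPLEXITY})")
        a = node.args
        n_params = len(a.posonlyargs) + len(a.args) + len(a.kwonlyargs)
        if n_params > MAX_PARAMS:
            issues.append(f"WARNING {name}() has {n_params} parameters (max {MAX_PARAMS})")
    return issues


seen: dict = {}
print(check_source("auth/utils.py", "def hash_password(p):\n    return p\n", seen))  # []
print(check_source("handlers.py", "def hash_password(pw):\n    return pw[::-1]\n", seen))
# ['CRITICAL hash_password() already exists in auth/utils.py:1, reuse it']
```

Matching duplicates purely by name is a heuristic: methods that legitimately share a name across classes would also be flagged, so a real guard needs more context than this sketch uses.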

"Why not just use LangGraph's built-in time-travel?"

Great question. LangGraph's checkpointer persists graph state at every super-step and lets you inspect/replay from any checkpoint. If you're 100% LangGraph and only need state inspection, it's a solid built-in.

Agent VCR exists because state checkpoints aren't enough:

	LangGraph Checkpointer	Agent VCR
Checkpoint in-memory state	✅	✅
Rollback files on disk (`git reset --hard`)	❌	✅
Ghost Replay (zero tokens, zero cost)	❌	✅
Sentinel (real-time AST quality guard)	❌	✅
Works with CrewAI, raw Python, any framework	❌ LangGraph only	✅
JSONL format (git-diffable, streamable)	❌ Opaque persistence	✅
Session forking with parallel comparison	❌	✅

When your agent writes files to disk — code, configs, data — and then fails, LangGraph's checkpointer rolls back the state object but the files stay. Agent VCR's ACID workspace runs git reset --hard and physically deletes the hallucinated files. That's the difference between "debugger" and "undo."

How It Compares

Honest note: LangSmith, LangFuse, and Arize Phoenix are excellent observability platforms with large teams and production deployments. Agent VCR is not an observability tool — it's an intervention tool. They show you what happened. We let you change it. The categories overlap on tracing but diverge on everything else.

Capability	📼 Agent VCR	LangSmith	LangFuse	Arize Phoenix
Record execution traces	✅	✅	✅	✅
Production-grade dashboards	Basic (local)	✅ Best-in-class	✅	✅
Eval / scoring pipelines	❌	✅	✅	✅
Cost & latency analytics	✅ (per-session)	✅	✅	✅
↓ What only Agent VCR does ↓
Time-travel to any step	✅	❌	❌	❌
Edit state & resume mid-chain	✅	❌	❌	❌
Fork from any frame	✅	❌	❌	❌
ACID filesystem rollback	✅	❌	❌	❌
Ghost Replay (zero-token re-runs)	✅	❌	❌	❌
Sentinel (real-time code guardian)	✅	❌	❌	❌
Terminal TUI debugger	✅	❌	❌	❌
Fully local / self-hosted	✅	❌ (Cloud)	✅	✅
Framework-agnostic	✅	⚠️ Best w/ LangChain	✅	✅

TL;DR: Use LangSmith/LangFuse/Phoenix for production observability and evals. Use Agent VCR when you need to actually intervene — fix a broken run without re-running it, replay a successful run for free, or rollback filesystem damage from a rogue agent.

API Reference

`VCRRecorder`

recorder = VCRRecorder(
    output_dir=".vcr",     # where to save sessions
    auto_save=True,        # flush frames to disk as you go
    diff_mode=False,       # also store state diffs (jsonpatch)
)

recorder.start_session(session_id="my_run", tags=["prod"])
recorder.record_step(node_name, input_state, output_state, metadata)
recorder.record_llm_call(node_name, prompt, response, tokens, cost_usd)
recorder.record_tool_call(node_name, tool_name, args, result)
recorder.record_error(node_name, input_state, error)
recorder.save() -> Path
recorder.fork(from_frame=3) -> VCRRecorder  # branch from a frame

# Context manager — auto-saves on exit
with VCRRecorder() as r:
    r.start_session("run")
    ...

`VCRPlayer`

player = VCRPlayer.load(".vcr/my_run.vcr")
player = VCRPlayer.load_by_id("my_run")

player.goto_frame(index)           # → dict (output state at frame N)
player.get_frame(index)            # → Frame object
player.get_input_state(index)      # → dict (input state at frame N)
player.list_nodes()                # → ['planner', 'coder', ...]
player.get_errors()                # → [Frame, ...]
player.compare_frames(a, b)        # → {'added': {}, 'removed': {}, 'modified': {}}
player.get_total_latency()         # → float (ms)
player.get_total_tokens()          # → int
player.get_total_cost()            # → float (USD)

player.resume(
    agent_callable,                # your agent function
    config=ResumeConfig(
        from_frame=7,              # rewind to BEFORE step 7 ran
        state_overrides={"k": "v"},  # apply these before re-running
        mode=ResumeMode.FORK,      # FORK | REPLAY | MOCK
    )
) -> str                           # new session ID
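The "rewind to BEFORE step N ran" semantics of `from_frame` can be sketched in a few lines: take the input state frame N originally received, apply the overrides, and call the agent again. This is a hypothetical illustration (`resume_from` and the plain-dict frames are assumptions); the real `resume` additionally records the new frames and returns a new session ID, which this sketch does not do:

```python
from typing import Callable


def resume_from(frames: list, from_frame: int, overrides: dict,
                agent: Callable[[dict], dict]) -> dict:
    """Hypothetical sketch of 'rewind to BEFORE frame N ran, patch, re-run'."""
    # Start from the INPUT state that frame N originally received...
    state = dict(frames[from_frame]["input_state"])
    # ...apply the caller's edits on top...
    state.update(overrides)
    # ...and re-run from there. Frames 0..N-1 are reused, not re-executed.
    return agent(state)


frames = [
    {"node_name": "planner", "input_state": {"query": "todo app"},
     "output_state": {"query": "todo app", "plan": "use Flask"}},
    {"node_name": "coder", "input_state": {"query": "todo app", "plan": "use Flask"},
     "output_state": {"code": "flask code"}},
]


def coder(state: dict) -> dict:
    return {"code": f"code following: {state['plan']}"}


print(resume_from(frames, 1, {"plan": "use FastAPI"}, coder))
# {'code': 'code following: use FastAPI'}
```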

`ACIDWorkspace`

acid = ACIDWorkspace("/workspace", recorder=recorder)
acid.begin(session_id="task-001")
acid.savepoint(state, node_name="coder")
acid.rollback(to_frame_index=2)    # git reset --hard
acid.commit()                      # merge to main

`GoldenRunCache` (Ghost Replay)

from agent_vcr.golden_cache import GoldenRunCache

cache = GoldenRunCache(cache_dir=".vcr/golden")
cache.save_golden_run(task_description, recorder) -> str  # fingerprint
cache.replay(task_description)    -> (outputs, CostLedger)
cache.invalidate(task_description) -> bool
cache.list_runs()                  -> list[dict]

Examples

# Basic recording and playback
python examples/basic_usage.py

# Time-travel: rewind, edit state, resume (with assertion)
python examples/time_travel_demo.py

# LangGraph auto-instrumentation
python examples/langgraph_integration.py

# ACID transactions + Ghost Replay (most impressive demo)
python examples/acid_golden_run.py

# OpenHands Sentinel: agent self-correction live
python examples/sentinel_demo.py

# Async recording
python examples/async_example.py

Storage Format

Sessions are plain JSONL — one JSON object per line:

{"type": "session", "data": {"session_id": "my_run", "created_at": "2024-01-01T00:00:00Z", ...}}
{"type": "frame", "data": {"node_name": "planner", "input_state": {...}, "output_state": {...}, "metadata": {"latency_ms": 120}}}
{"type": "frame", "data": {"node_name": "coder", ...}}

Human-readable — open in any text editor
Git-diffable — review agent state changes in PRs
Append-only — no rewrites, safe for concurrent agents
Streamable — parse line-by-line, no full-file load required
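Because each line is a complete JSON object, a session can be consumed with a generator that never holds the whole file in memory. A minimal sketch using only the standard library, assuming the record layout shown above (`iter_frames` is a hypothetical helper, not part of Agent VCR); with a real session you would pass `open(".vcr/my_run.vcr", encoding="utf-8")` instead of the in-memory sample:

```python
import io
import json
from typing import Iterator, TextIO


def iter_frames(fp: TextIO) -> Iterator[dict]:
    """Yield frame payloads one line at a time; never loads the whole file."""
    for line in fp:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("type") == "frame":
            yield record["data"]


sample = io.StringIO(
    '{"type": "session", "data": {"session_id": "my_run"}}\n'
    '{"type": "frame", "data": {"node_name": "planner", "metadata": {"latency_ms": 120}}}\n'
    '{"type": "frame", "data": {"node_name": "coder", "metadata": {"latency_ms": 480}}}\n'
)
total_ms = sum(f["metadata"]["latency_ms"] for f in iter_frames(sample))
print(total_ms)  # 600
```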

Performance

Recording overhead is benchmarked in CI on every commit. The benchmark suite enforces hard limits — CI fails if any threshold is exceeded.

Reproduce locally:

pip install -e ".[dev]"
pytest tests/benchmarks/ -v --benchmark-only --benchmark-columns="min,max,mean,stddev,rounds"

Benchmark	Threshold	What it measures
`test_benchmark_recorder_overhead`	<5ms mean per frame	Time to serialize and buffer one state snapshot
`test_benchmark_file_write_speed`	>1,000 frames/sec	Sustained write throughput (10K frames)
`test_benchmark_load_speed`	<500ms	Load a 10,000-frame session from disk
`test_benchmark_goto_frame`	<1ms	Random-access time-travel to any frame
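For a quick, informal feel for per-frame serialization cost on your own state shapes, outside the pytest-benchmark suite, the standard library's `timeit` is enough. This measures only `json.dumps` of a hypothetical state dict. That serialization dominates recording time is an assumption, and this is not a reproduction of the CI benchmark:

```python
import json
import timeit

# A hypothetical, moderately sized agent state; substitute your own.
state = {
    "query": "build a REST API",
    "plan": ["design schema", "write handlers", "add JWT auth"] * 10,
    "context": "x" * 5_000,
}

runs = 1_000
seconds = timeit.timeit(lambda: json.dumps(state), number=runs)
mean_ms = seconds / runs * 1000
print(f"mean json.dumps per frame: {mean_ms:.4f} ms")
```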

These are real pytest-benchmark tests with assertions. If they regress, CI breaks. Historical results are published at ixchio.github.io/agent-vcr/dev/bench/.

Roadmap

Contributing

git clone https://github.com/ixchio/agent-vcr.git
cd agent-vcr
pip install -e ".[dev,tui]"
pytest tests/unit/ -v

See CONTRIBUTING.md for guidelines.

License

MIT — see LICENSE.

📼

Observability shows you what happened. Agent VCR lets you undo it.

pip install ai-agent-vcr

⭐ Star on GitHub · 📦 PyPI · 📖 Docs

Built with 🤍 by ixchio · MIT License
