diff --git a/README.md b/README.md
index ba8184c..29458cb 100644
--- a/README.md
+++ b/README.md
@@ -1,33 +1,96 @@
 # contextweaver
 
-> Dynamic context management for tool-using AI agents.
+> Phase-specific, budget-aware context compilation for tool-using AI agents.
 
-contextweaver solves the **context window problem**: as tool catalogs grow and
-conversations accumulate history, naive concatenation blows past token limits.
-contextweaver provides **phase-specific budgeted context compilation**, a
-**context firewall** for large tool outputs, **result envelopes** with
-structured fact extraction, and **bounded-choice routing** over large tool
-catalogs via DAG + beam search.
+**500+ tests passing · zero runtime dependencies · deterministic output · Python ≥ 3.10**
 
-## Features
+---
 
-- **Context Engine** — seven-stage pipeline that compiles a phase-aware,
-  budget-constrained prompt from the event log.
-- **Context Firewall** — intercepts large tool outputs, stores raw data
-  out-of-band, and injects compact summaries.
-- **Routing Engine** — navigates catalogs of 100+ tools via a bounded DAG
-  so the LLM only sees a focused shortlist.
-- **Protocol Adapters** — first-class adapters for MCP and A2A protocols.
-- **Zero Dependencies** — pure Python ≥ 3.10, stdlib only.
-- **Deterministic** — identical inputs always produce identical outputs.
+## The Problem
 
-## Installation
+Imagine a tool-using agent with a 100-tool catalog and a 50-turn conversation history.
+At each step the agent must answer four questions:
+
+1. **Route** — which tool should I call?
+2. **Call** — what arguments?
+3. **Interpret** — what did it return?
+4. **Answer** — how do I respond to the user?
+
+**Naive approach A — concatenate everything:**
+
+```
+100 tool schemas (≈50k tokens) + 50 turns (≈30k tokens) = 80k tokens
+Token limit: 8k → 10× overflow
+```
+
+**Naive approach B — cherry-pick manually:**
+
+```
+Pick 10 tools, last 5 turns → lose dependency chains
+Agent hallucinates tool calls, repeats questions, forgets context
+```
+
+**contextweaver approach — phase-specific budgeted compilation:**
+
+```
+Route phase: 5 tool cards (≈500 tokens), no full schemas
+Answer phase: 3 relevant turns + dependency closure (≈2k tokens)
+Result: 2.5k tokens, complete context, deterministic
+```
+
+See [`examples/before_after.py`](examples/before_after.py) for a runnable side-by-side comparison.
+
+---
+
+## How contextweaver Solves It
+
+contextweaver provides two cooperating engines:
+
+```
+                ┌────────────────────────────┐
+  Events ──────>│      Context Engine        │──> ContextPack (prompt)
+                │  candidates → closure →    │
+                │  sensitivity → firewall →  │
+                │  score → dedup → select →  │
+                │  render                    │
+                └────────────────────────────┘
+                           ▲ facts / episodes
+                ┌──────────┴─────────────────┐
+  Tools ───────>│      Routing Engine        │──> ChoiceCards
+                │  Catalog → TreeBuilder →   │
+                │  ChoiceGraph → Router      │
+                └────────────────────────────┘
+```
+
+**Context Engine** — eight-stage pipeline:
+
+1. **generate_candidates** — pull phase-relevant events from the log for this request.
+2. **dependency_closure** — if a selected item has a `parent_id`, include the parent automatically.
+3. **sensitivity_filter** — drop or redact items at or above the configured sensitivity floor.
+4. **apply_firewall** — tool results are stored out-of-band; large outputs are summarized/truncated before prompt assembly.
+5. **score_candidates** — rank by recency, tag match, kind priority, and token cost.
+6. **deduplicate_candidates** — remove near-duplicates using Jaccard similarity.
+7. **select_and_pack** — greedily pack highest-scoring items into the phase token budget.
+8. **render_context** — assemble final prompt string with `BuildStats` metadata.
+
+**Routing Engine** — four-stage pipeline:
+
+1. **Catalog** — register and manage `SelectableItem` objects.
+2. **TreeBuilder** — convert a flat catalog into a bounded `ChoiceGraph` DAG.
+3. **Router** — beam-search over the graph; deterministic tie-breaking by ID.
+4. **ChoiceCards** — compact, LLM-friendly cards (never includes full schemas).
+
+---
+
+## Quickstart
+
+### Install
 
 ```bash
 pip install contextweaver
 ```
 
-Or install from source:
+Or from source:
 
 ```bash
 git clone https://github.com/dgenio/contextweaver.git
@@ -35,7 +98,7 @@ cd contextweaver
 pip install -e ".[dev]"
 ```
 
-## Quick start
+### Minimal agent loop
 
 ```python
 from contextweaver.context.manager import ContextManager
@@ -43,24 +106,25 @@ from contextweaver.types import ContextItem, ItemKind, Phase
 
 mgr = ContextManager()
 mgr.ingest(ContextItem(id="u1", kind=ItemKind.user_turn, text="How many users?"))
-mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call, text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
-mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result, text="count: 1042", parent_id="tc1"))
+mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call,
+                       text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
+mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result,
+                       text="count: 1042", parent_id="tc1"))
 
 pack = mgr.build_sync(phase=Phase.answer, query="user count")
-print(pack.prompt)  # budget-aware compiled context
-print(pack.stats)  # what was kept, dropped, deduplicated
+print(pack.prompt)   # budget-aware compiled context
+print(pack.stats)    # what was kept, dropped, deduplicated
 ```
 
-## Routing large tool catalogs
+### Route a large tool catalog
 
 ```python
 from contextweaver.routing.catalog import Catalog, load_catalog_json
 from contextweaver.routing.tree import TreeBuilder
 from contextweaver.routing.router import Router
 
-items = load_catalog_json("catalog.json")
 catalog = Catalog()
-for item in items:
+for item in load_catalog_json("catalog.json"):
     catalog.register(item)
 
 graph = TreeBuilder(max_children=10).build(catalog.all())
@@ -69,14 +133,58 @@ result = router.route("send a reminder email about unpaid invoices")
 print(result.candidate_ids)
 ```
 
+---
+
+## Framework Integrations
+
+| Framework | Guide | Use Case |
+|---|---|---|
+| MCP | [Guide](docs/integration_mcp.md) | Tool conversion, session loading, firewall |
+| A2A | [Guide](docs/integration_a2a.md) | Agent cards, multi-agent sessions |
+| LlamaIndex | Guide (coming soon) | RAG + tools with budget control |
+| OpenAI Agents SDK | Guide (coming soon) | Function-calling agents with routing |
+| Google ADK | Guide (coming soon) | Gemini tool-use with context budgets |
+| LangChain / LangGraph | Guide (coming soon) | Chain + graph agents with firewall |
+
+---
+
+## Why Trust contextweaver?
+
+| Proof point | Detail |
+|---|---|
+| **500+ tests passing** | Context pipeline, routing engine, firewall, adapters, CLI, sensitivity enforcement |
+| **Zero runtime dependencies** | Stdlib-only, Python ≥ 3.10. Works with any LLM provider. No vendor lock-in. |
+| **Deterministic** | Tie-break by ID, sorted keys. Identical inputs always produce identical outputs. |
+| **Protocol-based stores** | `EventLog`, `ArtifactStore`, `EpisodicStore`, `FactStore` are `typing.Protocol` interfaces — swap any backend. |
+| **MCP + A2A adapters** | First-class support for both emerging agentic standards. |
+| **`BuildStats` transparency** | Every context build reports exactly what was kept, dropped, deduplicated, and why. |
+
+---
+
+## Core Concepts
+
+| Concept | Description |
+|---|---|
+| `ContextItem` | Atomic event log entry: user turn, agent message, tool call, tool result, fact, plan state. |
+| `Phase` | `route` / `call` / `interpret` / `answer` — each with its own token budget. |
+| `ContextFirewall` | Intercepts tool results: stores raw bytes out-of-band, injects compact summary (with truncation for large outputs). |
+| `ChoiceGraph` | Bounded DAG over the tool catalog. Router beam-searches it; LLM sees only a focused shortlist. |
+| `ResultEnvelope` | Structured tool output: summary + extracted facts + artifact handles + views. |
+| `BuildStats` | Per-build diagnostics: candidate count, included/dropped counts, token usage, drop reasons. |
+
+See [`docs/concepts.md`](docs/concepts.md) for the full glossary and
+[`docs/architecture.md`](docs/architecture.md) for pipeline detail and design rationale.
+
+---
+
 ## CLI
 
 contextweaver ships with a CLI for quick experimentation:
 
 ```bash
-contextweaver demo  # end-to-end demonstration
-contextweaver init  # scaffold config + sample catalog
-contextweaver build --catalog c.json --out g.json  # build routing graph
+contextweaver demo                                   # end-to-end demonstration
+contextweaver init                                   # scaffold config + sample catalog
+contextweaver build --catalog c.json --out g.json    # build routing graph
 contextweaver route --graph g.json --query "send email"
 contextweaver print-tree --graph g.json
 contextweaver ingest --events session.jsonl --out session.json
@@ -94,18 +202,11 @@ contextweaver replay --session session.json --phase answer
 | `mcp_adapter_demo.py` | MCP adapter: tool conversion, session loading, firewall |
 | `a2a_adapter_demo.py` | A2A adapter: agent cards, multi-agent sessions |
 
-Run all examples:
-
 ```bash
-make example
+make example  # run all examples
 ```
 
-## Documentation
-
-- [Architecture](docs/architecture.md) — package layout, pipeline stages, design principles
-- [Concepts](docs/concepts.md) — ContextItem, phases, firewall, ChoiceGraph, etc.
-- [MCP Integration](docs/integration_mcp.md) — adapter functions, JSONL format, end-to-end example
-- [A2A Integration](docs/integration_a2a.md) — adapter functions, multi-agent sessions
+---
 
 ## Development
 
@@ -119,6 +220,23 @@ make demo # run the built-in demo
 make ci # all of the above
 ```
 
+See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions.
+
+---
+
+## Roadmap
+
+| Milestone | Status | Highlights |
+|---|---|---|
+| **v0.1 — Foundation** | ✅ complete | Context Engine, Routing Engine, MCP + A2A adapters, CLI, sensitivity enforcement, logging |
+| **v0.2 — Integrations** | 🚧 in progress | Framework integration guides (LlamaIndex, OpenAI Agents SDK, Google ADK, LangChain) |
+| **v0.3 — Tooling** | 📋 planned | DAG visualization, merge compression, LLM-assisted labeler |
+| **Future** | 📋 planned | Context versioning, distributed stores, multi-agent coordination |
+
+See [CHANGELOG.md](CHANGELOG.md) for the detailed release history.
+
+---
+
 ## License
 
 Apache-2.0
diff --git a/docs/architecture.md b/docs/architecture.md
index a6a2359..c2682d9 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -8,9 +8,10 @@ the "context window problem" for tool-using AI agents.
 ```
                ┌────────────────────────────┐
  Events ─────>│      Context Engine        │──> ContextPack (prompt)
-               │  candidates → score →      │
-               │  dedup → select → firewall │
-               │  → prompt                  │
+               │  candidates → closure →    │
+               │  sensitivity → firewall →  │
+               │  score → dedup → select →  │
+               │  render                    │
                └────────────────────────────┘
                           ▲ facts / episodes
                ┌──────────┴─────────────────┐
@@ -43,15 +44,15 @@ the "context window problem" for tool-using AI agents.
 The Context Engine compiles a phase-aware, budget-constrained prompt from the
 event log. The pipeline has eight stages:
 
-1. **generate_candidates** — pull events from the event log and inject
-   episodic memory and facts into the candidate pool.
+1. **generate_candidates** — pull phase-relevant events from the event log
+   into the initial candidate pool.
 2. **dependency_closure** — if a selected item has a `parent_id`, bring the
    parent along even if it scored lower.
 3. **sensitivity_filter** — drop or redact items whose `sensitivity` level
    meets or exceeds `ContextPolicy.sensitivity_floor`.
-4. **apply_firewall** — large tool results (above threshold) are
-   summarised; the raw output is stored in the ArtifactStore and replaced
-   with a compact reference + summary.
+4. **apply_firewall** — tool results are stored out-of-band in the
+   ArtifactStore and replaced with summarized/truncated text for prompt
+   assembly.
 5. **score_candidates** — rank candidates by recency, tag match, kind
    priority, and token cost.
 6. **deduplicate_candidates** — remove near-duplicate items using Jaccard
diff --git a/docs/concepts.md b/docs/concepts.md
index 9731c12..96d53c2 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -46,13 +46,14 @@ Key fields: `id`, `kind`, `name`, `description`, `tags`, `namespace`,
 
 ## Context Firewall
 
-The context firewall prevents large tool outputs from consuming the
-entire token budget. When a tool result exceeds the configured threshold
-(default 2 000 characters), the firewall:
+The context firewall intercepts `tool_result` items before raw output
+reaches the prompt. It stores the raw output in the `ArtifactStore`,
+replaces the prompt-facing text with a compact summary, and prevents
+large tool outputs from consuming the entire token budget. In practice:
 
 1. Stores the raw output in the `ArtifactStore`.
 2. Generates a compact summary using the `Summarizer`.
-3. Extracts structured facts for the `FactStore`.
+3. Extracts structured facts into the `ResultEnvelope`.
 4. Replaces the original item text with a summary + artifact reference.
 
 ## Result Envelope