dgenio · Mar 20, 2026
diff --git a/README.md b/README.md
@@ -1,66 +1,130 @@
 # contextweaver
 
-> Dynamic context management for tool-using AI agents.
+> Phase-specific, budget-aware context compilation for tool-using AI agents.
 
-contextweaver solves the **context window problem**: as tool catalogs grow and
-conversations accumulate history, naive concatenation blows past token limits.
-contextweaver provides **phase-specific budgeted context compilation**, a
-**context firewall** for large tool outputs, **result envelopes** with
-structured fact extraction, and **bounded-choice routing** over large tool
-catalogs via DAG + beam search.
+**500+ tests passing · zero runtime dependencies · deterministic output · Python ≥ 3.10**
 
-## Features
+---
 
-- **Context Engine** — seven-stage pipeline that compiles a phase-aware,
-  budget-constrained prompt from the event log.
-- **Context Firewall** — intercepts large tool outputs, stores raw data
-  out-of-band, and injects compact summaries.
-- **Routing Engine** — navigates catalogs of 100+ tools via a bounded DAG
-  so the LLM only sees a focused shortlist.
-- **Protocol Adapters** — first-class adapters for MCP and A2A protocols.
-- **Zero Dependencies** — pure Python ≥ 3.10, stdlib only.
-- **Deterministic** — identical inputs always produce identical outputs.
+## The Problem
 
-## Installation
+Imagine a tool-using agent with a 100-tool catalog and a 50-turn conversation history.
+At each step the agent must answer four questions:
+
+1. **Route** — which tool should I call?
+2. **Call** — what arguments?
+3. **Interpret** — what did it return?
+4. **Answer** — how do I respond to the user?
+
+**Naive approach A — concatenate everything:**
+
+```
+100 tool schemas (≈50k tokens) + 50 turns (≈30k tokens) = 80k tokens
+Token limit: 8k → 10× overflow
+```
+
+**Naive approach B — cherry-pick manually:**
+
+```
+Pick 10 tools, last 5 turns → lose dependency chains
+Agent hallucinates tool calls, repeats questions, forgets context
+```
+
+**contextweaver approach — phase-specific budgeted compilation:**
+
+```
+Route phase:  5 tool cards (≈500 tokens), no full schemas
+Answer phase: 3 relevant turns + dependency closure (≈2k tokens)
+Result:       2.5k tokens, complete context, deterministic
+```
+
+See [`examples/before_after.py`](examples/before_after.py) for a runnable side-by-side comparison.
+
+---
+
+## How contextweaver Solves It
+
+contextweaver provides two cooperating engines:
+
+```
+                ┌────────────────────────────┐
+  Events ──────>│      Context Engine         │──> ContextPack (prompt)
+                │  candidates → closure →     │
+                │  sensitivity → firewall →   │
+                │  score → dedup → select →   │
+                │  render                     │
+                └────────────────────────────┘
+                           ▲ facts / episodes
+                ┌──────────┴─────────────────┐
+  Tools ───────>│      Routing Engine         │──> ChoiceCards
+                │  Catalog → TreeBuilder →    │
+                │  ChoiceGraph → Router       │
+                └────────────────────────────┘
+```
+
+**Context Engine** — eight-stage pipeline:
+
+1. **generate_candidates** — pull phase-relevant events from the log for this request.
+2. **dependency_closure** — if a candidate item has a `parent_id`, include its parent automatically.
+3. **sensitivity_filter** — drop or redact items at or above the configured sensitivity floor.
+4. **apply_firewall** — tool results are stored out-of-band; large outputs are summarized/truncated before prompt assembly.
+5. **score_candidates** — rank by recency, tag match, kind priority, and token cost.
+6. **deduplicate_candidates** — remove near-duplicates using Jaccard similarity.
+7. **select_and_pack** — greedily pack highest-scoring items into the phase token budget.
+8. **render_context** — assemble final prompt string with `BuildStats` metadata.
+
+**Routing Engine** — four-stage pipeline:
+
+1. **Catalog** — register and manage `SelectableItem` objects.
+2. **TreeBuilder** — convert a flat catalog into a bounded `ChoiceGraph` DAG.
+3. **Router** — beam-search over the graph; deterministic tie-breaking by ID.
+4. **ChoiceCards** — compact, LLM-friendly cards that never include full schemas.
+
+---
+
+## Quickstart
+
+### Install
 
 ```bash
 pip install contextweaver
 ```
 
-Or install from source:
+Or from source:
 
 ```bash
 git clone https://github.com/dgenio/contextweaver.git
 cd contextweaver
 pip install -e ".[dev]"
 ```
 
-## Quick start
+### Minimal agent loop
 
 ```python
 from contextweaver.context.manager import ContextManager
 from contextweaver.types import ContextItem, ItemKind, Phase
 
 mgr = ContextManager()
 mgr.ingest(ContextItem(id="u1", kind=ItemKind.user_turn, text="How many users?"))
-mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call, text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
-mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result, text="count: 1042", parent_id="tc1"))
+mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call,
+                       text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
+mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result,
+                       text="count: 1042", parent_id="tc1"))
 
 pack = mgr.build_sync(phase=Phase.answer, query="user count")
-print(pack.prompt)       # budget-aware compiled context
-print(pack.stats)        # what was kept, dropped, deduplicated
+print(pack.prompt)   # budget-aware compiled context
+print(pack.stats)    # what was kept, dropped, deduplicated
 ```
 
-## Routing large tool catalogs
+### Route a large tool catalog
 
 ```python
 from contextweaver.routing.catalog import Catalog, load_catalog_json
 from contextweaver.routing.tree import TreeBuilder
 from contextweaver.routing.router import Router
 
-items = load_catalog_json("catalog.json")
 catalog = Catalog()
-for item in items:
+for item in load_catalog_json("catalog.json"):
     catalog.register(item)
 
 graph = TreeBuilder(max_children=10).build(catalog.all())
@@ -69,14 +133,58 @@ result = router.route("send a reminder email about unpaid invoices")
 print(result.candidate_ids)
 ```
 
+---
+
+## Framework Integrations
+
+| Framework | Guide | Use Case |
+|---|---|---|
+| MCP | [Guide](docs/integration_mcp.md) | Tool conversion, session loading, firewall |
+| A2A | [Guide](docs/integration_a2a.md) | Agent cards, multi-agent sessions |
+| LlamaIndex | Guide (coming soon) | RAG + tools with budget control |
+| OpenAI Agents SDK | Guide (coming soon) | Function-calling agents with routing |
+| Google ADK | Guide (coming soon) | Gemini tool-use with context budgets |
+| LangChain / LangGraph | Guide (coming soon) | Chain + graph agents with firewall |
+
+---
+
+## Why Trust contextweaver?
+
+| Proof point | Detail |
+|---|---|
+| **500+ tests passing** | Context pipeline, routing engine, firewall, adapters, CLI, sensitivity enforcement |
+| **Zero runtime dependencies** | Stdlib-only, Python ≥ 3.10. Works with any LLM provider. No vendor lock-in. |
+| **Deterministic** | Tie-break by ID, sorted keys. Identical inputs always produce identical outputs. |
+| **Protocol-based stores** | `EventLog`, `ArtifactStore`, `EpisodicStore`, `FactStore` are `typing.Protocol` interfaces — swap any backend. |
+| **MCP + A2A adapters** | First-class support for both emerging agentic standards. |
+| **`BuildStats` transparency** | Every context build reports exactly what was kept, dropped, deduplicated, and why. |
+
+---
+
+## Core Concepts
+
+| Concept | Description |
+|---|---|
+| `ContextItem` | Atomic event log entry: user turn, agent message, tool call, tool result, fact, plan state. |
+| `Phase` | `route` / `call` / `interpret` / `answer` — each with its own token budget. |
+| `ContextFirewall` | Intercepts tool results: stores raw bytes out-of-band, injects compact summary (with truncation for large outputs). |
+| `ChoiceGraph` | Bounded DAG over the tool catalog. Router beam-searches it; LLM sees only a focused shortlist. |
+| `ResultEnvelope` | Structured tool output: summary + extracted facts + artifact handles + views. |
+| `BuildStats` | Per-build diagnostics: candidate count, included/dropped counts, token usage, drop reasons. |
+
+See [`docs/concepts.md`](docs/concepts.md) for the full glossary and
+[`docs/architecture.md`](docs/architecture.md) for pipeline detail and design rationale.
+
+---
+
 ## CLI
 
 contextweaver ships with a CLI for quick experimentation:
 
 ```bash
-contextweaver demo                          # end-to-end demonstration
-contextweaver init                          # scaffold config + sample catalog
-contextweaver build --catalog c.json --out g.json  # build routing graph
+contextweaver demo                                    # end-to-end demonstration
+contextweaver init                                    # scaffold config + sample catalog
+contextweaver build --catalog c.json --out g.json    # build routing graph
 contextweaver route --graph g.json --query "send email"
 contextweaver print-tree --graph g.json
 contextweaver ingest --events session.jsonl --out session.json
@@ -94,18 +202,11 @@ contextweaver replay --session session.json --phase answer
 | `mcp_adapter_demo.py` | MCP adapter: tool conversion, session loading, firewall |
 | `a2a_adapter_demo.py` | A2A adapter: agent cards, multi-agent sessions |
 
-Run all examples:
-
 ```bash
-make example
+make example   # run all examples
 ```
 
-## Documentation
-
-- [Architecture](docs/architecture.md) — package layout, pipeline stages, design principles
-- [Concepts](docs/concepts.md) — ContextItem, phases, firewall, ChoiceGraph, etc.
-- [MCP Integration](docs/integration_mcp.md) — adapter functions, JSONL format, end-to-end example
-- [A2A Integration](docs/integration_a2a.md) — adapter functions, multi-agent sessions
+---
 
 ## Development
 
@@ -119,6 +220,23 @@ make demo     # run the built-in demo
 make ci       # all of the above
 ```
 
+See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions.
+
+---
+
+## Roadmap
+
+| Milestone | Status | Highlights |
+|---|---|---|
+| **v0.1 — Foundation** | ✅ complete | Context Engine, Routing Engine, MCP + A2A adapters, CLI, sensitivity enforcement, logging |
+| **v0.2 — Integrations** | 🚧 in progress | Framework integration guides (LlamaIndex, OpenAI Agents SDK, Google ADK, LangChain) |
+| **v0.3 — Tooling** | 📋 planned | DAG visualization, merge compression, LLM-assisted labeler |
+| **Future** | 📋 planned | Context versioning, distributed stores, multi-agent coordination |
+
+See [CHANGELOG.md](CHANGELOG.md) for the detailed release history.
+
+---
+
 ## License
 
 Apache-2.0
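
The README hunk above describes the Routing Engine as a beam search over a bounded `ChoiceGraph` with deterministic tie-breaking by ID. The following is a minimal sketch of that idea only. The toy graph, the keyword-overlap scorer, and the names `GRAPH`, `score`, and `route` are illustrative assumptions, not contextweaver's actual API.

```python
from __future__ import annotations

# Hypothetical toy graph: node id -> (label text, child ids). Leaves are tools.
GRAPH: dict[str, tuple[str, list[str]]] = {
    "root": ("all tools", ["billing", "comms"]),
    "billing": ("billing invoices payments", ["invoice_list", "invoice_remind"]),
    "comms": ("communication email chat", ["email_send", "chat_post"]),
    "invoice_list": ("list unpaid invoices", []),
    "invoice_remind": ("send reminder email for unpaid invoices", []),
    "email_send": ("send an email", []),
    "chat_post": ("post a chat message", []),
}


def score(query: str, label: str) -> float:
    """Assumed scorer: fraction of query words that appear in the label."""
    q = set(query.lower().split())
    return len(q & set(label.lower().split())) / len(q) if q else 0.0


def route(query: str, beam_width: int = 2, top_k: int = 3) -> list[str]:
    """Beam-search from the root; return up to top_k leaf ids, best first."""
    frontier = ["root"]
    leaves: list[tuple[float, str]] = []
    while frontier:
        # A set handles DAG nodes that are reachable from several parents.
        children = {c for node in frontier for c in GRAPH[node][1]}
        # Sort by score descending, then by id ascending, so ties are stable.
        ranked = sorted(children, key=lambda c: (-score(query, GRAPH[c][0]), c))
        kept = ranked[:beam_width]
        leaves += [(score(query, GRAPH[c][0]), c) for c in kept if not GRAPH[c][1]]
        frontier = [c for c in kept if GRAPH[c][1]]
    leaves.sort(key=lambda pair: (-pair[0], pair[1]))
    return [leaf_id for _, leaf_id in leaves[:top_k]]
```

Because every sort key ends in the node id, two nodes with equal scores always come out in the same order, which is what makes repeated calls return identical shortlists.
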
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -8,9 +8,10 @@ the "context window problem" for tool-using AI agents.
 ```
                ┌────────────────────────────┐
   Events ─────>│      Context Engine         │──> ContextPack (prompt)
-               │  candidates → score →       │
-               │  dedup → select → firewall  │
-               │  → prompt                   │
+               │  candidates → closure →     │
+               │  sensitivity → firewall →   │
+               │  score → dedup → select →   │
+               │  render                     │
                └────────────────────────────┘
                           ▲ facts / episodes
                ┌──────────┴─────────────────┐
@@ -43,15 +44,15 @@ the "context window problem" for tool-using AI agents.
 The Context Engine compiles a phase-aware, budget-constrained prompt from
 the event log. The pipeline has eight stages:
 
-1. **generate_candidates** — pull events from the event log and inject
-   episodic memory and facts into the candidate pool.
+1. **generate_candidates** — pull phase-relevant events from the event log
+   into the initial candidate pool.
 2. **dependency_closure** — if a selected item has a `parent_id`, bring
    the parent along even if it scored lower.
 3. **sensitivity_filter** — drop or redact items whose `sensitivity`
    level meets or exceeds `ContextPolicy.sensitivity_floor`.
-4. **apply_firewall** — large tool results (above threshold) are
-   summarised; the raw output is stored in the ArtifactStore and replaced
-   with a compact reference + summary.
+4. **apply_firewall** — tool results are stored out-of-band in the
+   ArtifactStore and replaced with summarized/truncated text for prompt
+   assembly.
 5. **score_candidates** — rank candidates by recency, tag match, kind
    priority, and token cost.
 6. **deduplicate_candidates** — remove near-duplicate items using Jaccard

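Three of the pipeline stages listed in the `docs/architecture.md` hunk above (dependency closure, Jaccard deduplication, greedy packing) can be sketched in a few lines. This is a simplified illustration under stated assumptions: `Item` is a hypothetical stand-in for the library's `ContextItem`, token counts are approximated by word counts, and the 0.8 similarity threshold is invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a context item; not the library's ContextItem."""

    id: str
    text: str
    score: float
    parent_id: str | None = None

    @property
    def tokens(self) -> int:
        # Crude assumption: one token per whitespace-separated word.
        return len(self.text.split())


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def with_parents(chosen: list[Item], pool: dict[str, Item]) -> list[Item]:
    """Dependency closure: add each item's ancestors by following parent_id."""
    out: dict[str, Item] = {}
    for item in chosen:
        node: Item | None = item
        while node is not None and node.id not in out:
            out[node.id] = node
            node = pool.get(node.parent_id) if node.parent_id else None
    return list(out.values())


def dedup(items: list[Item], threshold: float = 0.8) -> list[Item]:
    """Keep the higher-scoring item of any pair at or above the threshold."""
    kept: list[Item] = []
    for item in sorted(items, key=lambda i: (-i.score, i.id)):
        if all(jaccard(item.text, k.text) < threshold for k in kept):
            kept.append(item)
    return kept


def pack(items: list[Item], budget: int) -> list[Item]:
    """Greedy packing: take items best-first while they fit the token budget."""
    selected: list[Item] = []
    used = 0
    for item in sorted(items, key=lambda i: (-i.score, i.id)):
        if used + item.tokens <= budget:
            selected.append(item)
            used += item.tokens
    return selected
```

Sorting by `(-score, id)` in both `dedup` and `pack` mirrors the determinism claim in the README: equal scores fall back to the item ID, so the same inputs always yield the same selection.
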
diff --git a/docs/concepts.md b/docs/concepts.md
@@ -46,13 +46,14 @@ Key fields: `id`, `kind`, `name`, `description`, `tags`, `namespace`,
 
 ## Context Firewall
 
-The context firewall prevents large tool outputs from consuming the
-entire token budget. When a tool result exceeds the configured threshold
-(default 2 000 characters), the firewall:
+The context firewall intercepts `tool_result` items before raw output
+reaches the prompt. It stores the raw output in the `ArtifactStore`,
+replaces the prompt-facing text with a compact summary, and prevents
+large tool outputs from consuming the entire token budget. In practice:
 
 1. Stores the raw output in the `ArtifactStore`.
 2. Generates a compact summary using the `Summarizer`.
-3. Extracts structured facts for the `FactStore`.
+3. Extracts structured facts into the `ResultEnvelope`.
 4. Replaces the original item text with a summary + artifact reference.
 
 ## Result Envelope
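
The Context Firewall steps in the `docs/concepts.md` hunk above (store the raw output, summarize it, replace the item text with summary plus artifact reference) can be sketched as follows. Everything here is an assumption made for illustration: `InMemoryArtifactStore`, its `put`/`get` methods, the `artifact:` handle format, and truncation as the summarizer. The real `ArtifactStore` and `Summarizer` interfaces may look different, and fact extraction into the `ResultEnvelope` is left out.

```python
from __future__ import annotations

import hashlib


class InMemoryArtifactStore:
    """Hypothetical store; the real ArtifactStore is a Protocol with its own API."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # A content-addressed handle keeps the output deterministic.
        handle = "artifact:" + hashlib.sha256(data).hexdigest()[:12]
        self._blobs[handle] = data
        return handle

    def get(self, handle: str) -> bytes:
        return self._blobs[handle]


def firewall(raw: str, store: InMemoryArtifactStore, max_chars: int = 80) -> str:
    """Store raw output out-of-band; return a summary plus artifact reference."""
    handle = store.put(raw.encode("utf-8"))
    # Assumed summarizer: plain truncation. A real Summarizer may be smarter.
    summary = raw if len(raw) <= max_chars else raw[:max_chars].rstrip() + "..."
    return f"{summary} [{handle}]"
```

The prompt-facing text stays bounded by `max_chars` plus the handle, while the full output remains retrievable through `store.get(handle)`.
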