From 53dbd6eb26d369de2be2adfeda76fdec7e65c996 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 20 Mar 2026 05:45:43 +0000
Subject: [PATCH 1/7] Initial plan

From af66230c8a6456e62f4a69770d85e6b2d4c3446b Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Fri, 20 Mar 2026 05:49:45 +0000
Subject: [PATCH 2/7] =?UTF-8?q?docs:=20rewrite=20README=20with=20compellin?=
 =?UTF-8?q?g=20problem=20=E2=86=92=20solution=20narrative?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: dgenio <12731907+dgenio@users.noreply.github.com>
---
 README.md | 196 +++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 157 insertions(+), 39 deletions(-)

diff --git a/README.md b/README.md
index ba8184c..752a867 100644
--- a/README.md
+++ b/README.md
@@ -1,33 +1,96 @@
 # contextweaver
 
-> Dynamic context management for tool-using AI agents.
+> Phase-specific, budget-aware context compilation for tool-using AI agents.
 
-contextweaver solves the **context window problem**: as tool catalogs grow and
-conversations accumulate history, naive concatenation blows past token limits.
-contextweaver provides **phase-specific budgeted context compilation**, a
-**context firewall** for large tool outputs, **result envelopes** with
-structured fact extraction, and **bounded-choice routing** over large tool
-catalogs via DAG + beam search.
+**536 tests passing · zero runtime dependencies · deterministic output · Python ≥ 3.10**
 
-## Features
+---
 
-- **Context Engine** — seven-stage pipeline that compiles a phase-aware,
-  budget-constrained prompt from the event log.
-- **Context Firewall** — intercepts large tool outputs, stores raw data
-  out-of-band, and injects compact summaries.
-- **Routing Engine** — navigates catalogs of 100+ tools via a bounded DAG
-  so the LLM only sees a focused shortlist.
-- **Protocol Adapters** — first-class adapters for MCP and A2A protocols.
-- **Zero Dependencies** — pure Python ≥ 3.10, stdlib only.
-- **Deterministic** — identical inputs always produce identical outputs.
+## The Problem
 
-## Installation
+Imagine a tool-using agent with a 100-tool catalog and a 50-turn conversation history.
+At each step the agent must answer four questions:
+
+1. **Route** — which tool should I call?
+2. **Call** — what arguments?
+3. **Interpret** — what did it return?
+4. **Answer** — how do I respond to the user?
+
+**Naive approach A — concatenate everything:**
+
+```
+100 tool schemas (≈50k tokens) + 50 turns (≈30k tokens) = 80k tokens
+Token limit: 8k → 10× overflow
+```
+
+**Naive approach B — cherry-pick manually:**
+
+```
+Pick 10 tools, last 5 turns → lose dependency chains
+Agent hallucinates tool calls, repeats questions, forgets context
+```
+
+**contextweaver approach — phase-specific budgeted compilation:**
+
+```
+Route phase: 5 tool cards (≈500 tokens), no full schemas
+Answer phase: 3 relevant turns + dependency closure (≈2k tokens)
+Result: 2.5k tokens, complete context, deterministic
+```
+
+See [`examples/before_after.py`](examples/before_after.py) for a runnable side-by-side comparison.
+
+---
+
+## How contextweaver Solves It
+
+contextweaver provides two cooperating engines:
+
+```
+                ┌────────────────────────────┐
+  Events ──────>│      Context Engine        │──> ContextPack (prompt)
+                │  candidates → closure →    │
+                │  sensitivity → firewall →  │
+                │  score → dedup → select →  │
+                │  render                    │
+                └────────────────────────────┘
+                           ▲ facts / episodes
+                ┌──────────┴─────────────────┐
+  Tools ───────>│      Routing Engine        │──> ChoiceCards
+                │  Catalog → TreeBuilder →   │
+                │  ChoiceGraph → Router      │
+                └────────────────────────────┘
+```
+
+**Context Engine** — eight-stage pipeline:
+
+1. **generate_candidates** — pull events from the log; inject episodic memory and facts.
+2. **dependency_closure** — if a selected item has a `parent_id`, include the parent automatically.
+3. **sensitivity_filter** — drop or redact items at or above the configured sensitivity floor.
+4. **apply_firewall** — large tool outputs are summarised; raw bytes move to `ArtifactStore`.
+5. **score_candidates** — rank by recency, tag match, kind priority, and token cost.
+6. **deduplicate_candidates** — remove near-duplicates using Jaccard similarity.
+7. **select_and_pack** — greedily pack highest-scoring items into the phase token budget.
+8. **render_context** — assemble final prompt string with `BuildStats` metadata.
+
+**Routing Engine** — four-stage pipeline:
+
+1. **Catalog** — register and manage `SelectableItem` objects.
+2. **TreeBuilder** — convert a flat catalog into a bounded `ChoiceGraph` DAG.
+3. **Router** — beam-search over the graph; deterministic tie-breaking by ID.
+4. **ChoiceCards** — compact, LLM-friendly cards (never includes full schemas).
+
+---
+
+## Quickstart
+
+### Install
 
 ```bash
 pip install contextweaver
 ```
 
-Or install from source:
+Or from source:
 
 ```bash
 git clone https://github.com/dgenio/contextweaver.git
@@ -35,7 +98,7 @@ cd contextweaver
 pip install -e ".[dev]"
 ```
 
-## Quick start
+### Minimal agent loop
 
 ```python
 from contextweaver.context.manager import ContextManager
@@ -43,24 +106,25 @@ from contextweaver.types import ContextItem, ItemKind, Phase
 
 mgr = ContextManager()
 mgr.ingest(ContextItem(id="u1", kind=ItemKind.user_turn, text="How many users?"))
-mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call, text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
-mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result, text="count: 1042", parent_id="tc1"))
+mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call,
+                       text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
+mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result,
+                       text="count: 1042", parent_id="tc1"))
 
 pack = mgr.build_sync(phase=Phase.answer, query="user count")
-print(pack.prompt)  # budget-aware compiled context
-print(pack.stats)   # what was kept, dropped, deduplicated
+print(pack.prompt)   # budget-aware compiled context
+print(pack.stats)    # what was kept, dropped, deduplicated
 ```
 
-## Routing large tool catalogs
+### Route a large tool catalog
 
 ```python
 from contextweaver.routing.catalog import Catalog, load_catalog_json
 from contextweaver.routing.tree import TreeBuilder
 from contextweaver.routing.router import Router
 
-items = load_catalog_json("catalog.json")
 catalog = Catalog()
-for item in items:
+for item in load_catalog_json("catalog.json"):
     catalog.register(item)
 
 graph = TreeBuilder(max_children=10).build(catalog.all())
@@ -69,14 +133,58 @@ result = router.route("send a reminder email about unpaid invoices")
 print(result.candidate_ids)
 ```
 
+---
+
+## Framework Integrations
+
+| Framework | Guide | Use Case |
+|---|---|---|
+| MCP | [Guide](docs/integration_mcp.md) | Tool conversion, session loading, firewall |
+| A2A | [Guide](docs/integration_a2a.md) | Agent cards, multi-agent sessions |
+| LlamaIndex | Guide (coming soon) | RAG + tools with budget control |
+| OpenAI Agents SDK | Guide (coming soon) | Function-calling agents with routing |
+| Google ADK | Guide (coming soon) | Gemini tool-use with context budgets |
+| LangChain / LangGraph | Guide (coming soon) | Chain + graph agents with firewall |
+
+---
+
+## Why Trust contextweaver?
+
+| Proof point | Detail |
+|---|---|
+| **536 tests passing** | Context pipeline, routing engine, firewall, adapters, CLI, sensitivity enforcement |
+| **Zero runtime dependencies** | Stdlib-only, Python ≥ 3.10. Works with any LLM provider. No vendor lock-in. |
+| **Deterministic** | Tie-break by ID, sorted keys. Identical inputs always produce identical outputs. |
+| **Protocol-based stores** | `EventLog`, `ArtifactStore`, `EpisodicStore`, `FactStore` are `typing.Protocol` interfaces — swap any backend. |
+| **MCP + A2A adapters** | First-class support for both emerging agentic standards. |
+| **`BuildStats` transparency** | Every context build reports exactly what was kept, dropped, deduplicated, and why. |
+
+---
+
+## Core Concepts
+
+| Concept | Description |
+|---|---|
+| `ContextItem` | Atomic event log entry: user turn, agent message, tool call, tool result, fact, plan state. |
+| `Phase` | `route` / `call` / `interpret` / `answer` — each with its own token budget. |
+| `ContextFirewall` | Intercepts large tool outputs: stores raw bytes out-of-band, injects compact summary. |
+| `ChoiceGraph` | Bounded DAG over the tool catalog. Router beam-searches it; LLM sees only a focused shortlist. |
+| `ResultEnvelope` | Structured tool output: summary + extracted facts + artifact handles + views. |
+| `BuildStats` | Per-build diagnostics: candidate count, included/dropped counts, token usage, drop reasons. |
+
+See [`docs/concepts.md`](docs/concepts.md) for the full glossary and
+[`docs/architecture.md`](docs/architecture.md) for pipeline detail and design rationale.
+
+---
+
 ## CLI
 
 contextweaver ships with a CLI for quick experimentation:
 
 ```bash
-contextweaver demo                                 # end-to-end demonstration
-contextweaver init                                 # scaffold config + sample catalog
-contextweaver build --catalog c.json --out g.json  # build routing graph
+contextweaver demo                                  # end-to-end demonstration
+contextweaver init                                  # scaffold config + sample catalog
+contextweaver build --catalog c.json --out g.json   # build routing graph
 contextweaver route --graph g.json --query "send email"
 contextweaver print-tree --graph g.json
 contextweaver ingest --events session.jsonl --out session.json
@@ -94,18 +202,11 @@ contextweaver replay --session session.json --phase answer
 | `mcp_adapter_demo.py` | MCP adapter: tool conversion, session loading, firewall |
 | `a2a_adapter_demo.py` | A2A adapter: agent cards, multi-agent sessions |
 
-Run all examples:
-
 ```bash
-make example
+make example  # run all examples
 ```
 
-## Documentation
-
-- [Architecture](docs/architecture.md) — package layout, pipeline stages, design principles
-- [Concepts](docs/concepts.md) — ContextItem, phases, firewall, ChoiceGraph, etc.
-- [MCP Integration](docs/integration_mcp.md) — adapter functions, JSONL format, end-to-end example
-- [A2A Integration](docs/integration_a2a.md) — adapter functions, multi-agent sessions
+---
 
 ## Development
 
@@ -119,6 +220,23 @@ make demo # run the built-in demo
 make ci # all of the above
 ```
 
+See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions.
+
+---
+
+## Roadmap
+
+| Milestone | Status | Highlights |
+|---|---|---|
+| **v0.1 — Foundation** | ✅ complete | Context Engine, Routing Engine, MCP + A2A adapters, CLI, sensitivity enforcement, logging |
+| **v0.2 — Integrations** | 🚧 in progress | Framework integration guides (LlamaIndex, OpenAI Agents SDK, Google ADK, LangChain) |
+| **v0.3 — Tooling** | 📋 planned | DAG visualization, merge compression, LLM-assisted labeler |
+| **Future** | 📋 planned | Context versioning, distributed stores, multi-agent coordination |
+
+See [CHANGELOG.md](CHANGELOG.md) for the detailed release history.
+
+---
+
 ## License
 
 Apache-2.0

From c896cb7bc4d04e3b68d5c846fbbc967284a85dbc Mon Sep 17 00:00:00 2001
From: Diogo Andre Passagem Santos
Date: Fri, 20 Mar 2026 06:32:59 +0000
Subject: [PATCH 3/7] docs: align stage-1 pipeline wording with implementation

---
 README.md            | 2 +-
 docs/architecture.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 752a867..e5b0c7d 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ contextweaver provides two cooperating engines:
 
 **Context Engine** — eight-stage pipeline:
 
-1. **generate_candidates** — pull events from the log; inject episodic memory and facts.
+1. **generate_candidates** — pull phase-relevant events from the log for this request.
 2. **dependency_closure** — if a selected item has a `parent_id`, include the parent automatically.
 3. **sensitivity_filter** — drop or redact items at or above the configured sensitivity floor.
 4. **apply_firewall** — large tool outputs are summarised; raw bytes move to `ArtifactStore`.
diff --git a/docs/architecture.md b/docs/architecture.md
index a6a2359..6c924fa 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -43,8 +43,8 @@ the "context window problem" for tool-using AI agents.
 The Context Engine compiles a phase-aware, budget-constrained prompt from
 the event log. The pipeline has eight stages:
 
-1. **generate_candidates** — pull events from the event log and inject
-   episodic memory and facts into the candidate pool.
+1. **generate_candidates** — pull phase-relevant events from the event log
+   into the initial candidate pool.
 2. **dependency_closure** — if a selected item has a `parent_id`, bring
    the parent along even if it scored lower.
 3. **sensitivity_filter** — drop or redact items whose `sensitivity`

From 599a17f96c0afb87e3bac81ca290003193b1216d Mon Sep 17 00:00:00 2001
From: Diogo Andre Passagem Santos
Date: Fri, 20 Mar 2026 06:35:27 +0000
Subject: [PATCH 4/7] docs: align firewall wording with implementation

---
 README.md            | 4 ++--
 docs/architecture.md | 6 +++---
 docs/concepts.md     | 7 ++++---
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index e5b0c7d..5259d30 100644
--- a/README.md
+++ b/README.md
@@ -67,7 +67,7 @@ contextweaver provides two cooperating engines:
 1. **generate_candidates** — pull phase-relevant events from the log for this request.
 2. **dependency_closure** — if a selected item has a `parent_id`, include the parent automatically.
 3. **sensitivity_filter** — drop or redact items at or above the configured sensitivity floor.
-4. **apply_firewall** — large tool outputs are summarised; raw bytes move to `ArtifactStore`.
+4. **apply_firewall** — tool results are stored out-of-band; large outputs are summarized/truncated before prompt assembly.
 5. **score_candidates** — rank by recency, tag match, kind priority, and token cost.
 6. **deduplicate_candidates** — remove near-duplicates using Jaccard similarity.
 7. **select_and_pack** — greedily pack highest-scoring items into the phase token budget.
@@ -167,7 +167,7 @@ print(result.candidate_ids)
 |---|---|
 | `ContextItem` | Atomic event log entry: user turn, agent message, tool call, tool result, fact, plan state. |
 | `Phase` | `route` / `call` / `interpret` / `answer` — each with its own token budget. |
-| `ContextFirewall` | Intercepts large tool outputs: stores raw bytes out-of-band, injects compact summary. |
+| `ContextFirewall` | Intercepts tool results: stores raw bytes out-of-band, injects compact summary (with truncation for large outputs). |
 | `ChoiceGraph` | Bounded DAG over the tool catalog. Router beam-searches it; LLM sees only a focused shortlist. |
 | `ResultEnvelope` | Structured tool output: summary + extracted facts + artifact handles + views. |
 | `BuildStats` | Per-build diagnostics: candidate count, included/dropped counts, token usage, drop reasons. |
diff --git a/docs/architecture.md b/docs/architecture.md
index 6c924fa..d49ee89 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -49,9 +49,9 @@ the event log. The pipeline has eight stages:
    the parent along even if it scored lower.
 3. **sensitivity_filter** — drop or redact items whose `sensitivity`
    level meets or exceeds `ContextPolicy.sensitivity_floor`.
-4. **apply_firewall** — large tool results (above threshold) are
-   summarised; the raw output is stored in the ArtifactStore and replaced
-   with a compact reference + summary.
+4. **apply_firewall** — tool results are stored out-of-band in the
+   ArtifactStore and replaced with summarized/truncated text for prompt
+   assembly.
 5. **score_candidates** — rank candidates by recency, tag match, kind
    priority, and token cost.
 6. **deduplicate_candidates** — remove near-duplicate items using Jaccard
diff --git a/docs/concepts.md b/docs/concepts.md
index 9731c12..e9337c9 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -46,9 +46,10 @@ Key fields: `id`, `kind`, `name`, `description`, `tags`, `namespace`,
 
 ## Context Firewall
 
-The context firewall prevents large tool outputs from consuming the
-entire token budget. When a tool result exceeds the configured threshold
-(default 2 000 characters), the firewall:
+The context firewall intercepts `tool_result` items before raw output
+reaches the prompt. It stores the raw output in the `ArtifactStore`,
+replaces the prompt-facing text with a compact summary, and prevents
+large tool outputs from consuming the entire token budget. In practice:
 
 1. Stores the raw output in the `ArtifactStore`.
 2. Generates a compact summary using the `Summarizer`.

From 0e26d53dd2ab22f632b1c2851749aef281509b4d Mon Sep 17 00:00:00 2001
From: Diogo Andre Passagem Santos
Date: Fri, 20 Mar 2026 06:36:54 +0000
Subject: [PATCH 5/7] docs: make README test count resilient

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 5259d30..29458cb 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 > Phase-specific, budget-aware context compilation for tool-using AI agents.
 
-**536 tests passing · zero runtime dependencies · deterministic output · Python ≥ 3.10**
+**500+ tests passing · zero runtime dependencies · deterministic output · Python ≥ 3.10**
 
 ---
 
@@ -152,7 +152,7 @@ print(result.candidate_ids)
 
 | Proof point | Detail |
 |---|---|
-| **536 tests passing** | Context pipeline, routing engine, firewall, adapters, CLI, sensitivity enforcement |
+| **500+ tests passing** | Context pipeline, routing engine, firewall, adapters, CLI, sensitivity enforcement |
 | **Zero runtime dependencies** | Stdlib-only, Python ≥ 3.10. Works with any LLM provider. No vendor lock-in. |
 | **Deterministic** | Tie-break by ID, sorted keys. Identical inputs always produce identical outputs. |
 | **Protocol-based stores** | `EventLog`, `ArtifactStore`, `EpisodicStore`, `FactStore` are `typing.Protocol` interfaces — swap any backend. |

From 2053c96178afd633343d3212c1bd170c063f77fc Mon Sep 17 00:00:00 2001
From: Diogo Andre Passagem Santos
Date: Fri, 20 Mar 2026 06:44:31 +0000
Subject: [PATCH 6/7] docs: fix architecture pipeline diagram order

---
 docs/architecture.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/architecture.md b/docs/architecture.md
index d49ee89..c2682d9 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -8,9 +8,10 @@ the "context window problem" for tool-using AI agents.
 ```
                ┌────────────────────────────┐
  Events ─────>│      Context Engine        │──> ContextPack (prompt)
-               │  candidates → score →      │
-               │  dedup → select → firewall │
-               │  → prompt                  │
+               │  candidates → closure →    │
+               │  sensitivity → firewall →  │
+               │  score → dedup → select →  │
+               │  render                    │
                └────────────────────────────┘
                           ▲ facts / episodes
                ┌──────────┴─────────────────┐

From cd1727dd9431e308dce2c63b0535875a6b7e8a48 Mon Sep 17 00:00:00 2001
From: Diogo Andre Passagem Santos
Date: Fri, 20 Mar 2026 06:45:36 +0000
Subject: [PATCH 7/7] docs: clarify firewall fact extraction output

---
 docs/concepts.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/concepts.md b/docs/concepts.md
index e9337c9..96d53c2 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -53,7 +53,7 @@ large tool outputs from consuming the entire token budget. In practice:
 
 1. Stores the raw output in the `ArtifactStore`.
 2. Generates a compact summary using the `Summarizer`.
-3. Extracts structured facts for the `FactStore`.
+3. Extracts structured facts into the `ResultEnvelope`.
 4. Replaces the original item text with a summary + artifact reference.
 
 ## Result Envelope