196 changes: 157 additions & 39 deletions README.md
@@ -1,66 +1,130 @@
# contextweaver

> Dynamic context management for tool-using AI agents.
> Phase-specific, budget-aware context compilation for tool-using AI agents.

contextweaver solves the **context window problem**: as tool catalogs grow and
conversations accumulate history, naive concatenation blows past token limits.
contextweaver provides **phase-specific budgeted context compilation**, a
**context firewall** for large tool outputs, **result envelopes** with
structured fact extraction, and **bounded-choice routing** over large tool
catalogs via DAG + beam search.
**500+ tests passing · zero runtime dependencies · deterministic output · Python ≥ 3.10**

## Features
---

- **Context Engine** — seven-stage pipeline that compiles a phase-aware,
budget-constrained prompt from the event log.
- **Context Firewall** — intercepts large tool outputs, stores raw data
out-of-band, and injects compact summaries.
- **Routing Engine** — navigates catalogs of 100+ tools via a bounded DAG
so the LLM only sees a focused shortlist.
- **Protocol Adapters** — first-class adapters for MCP and A2A protocols.
- **Zero Dependencies** — pure Python ≥ 3.10, stdlib only.
- **Deterministic** — identical inputs always produce identical outputs.
## The Problem

## Installation
Imagine a tool-using agent with a 100-tool catalog and a 50-turn conversation history.
At each step the agent must answer four questions:

1. **Route** — which tool should I call?
2. **Call** — what arguments?
3. **Interpret** — what did it return?
4. **Answer** — how do I respond to the user?

**Naive approach A — concatenate everything:**

```
100 tool schemas (≈50k tokens) + 50 turns (≈30k tokens) = 80k tokens
Token limit: 8k → 10× overflow
```

**Naive approach B — cherry-pick manually:**

```
Pick 10 tools, last 5 turns → lose dependency chains
Agent hallucinates tool calls, repeats questions, forgets context
```

**contextweaver approach — phase-specific budgeted compilation:**

```
Route phase: 5 tool cards (≈500 tokens), no full schemas
Answer phase: 3 relevant turns + dependency closure (≈2k tokens)
Result: 2.5k tokens, complete context, deterministic
```
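The arithmetic behind this comparison can be checked in a few lines. This is only a sketch using the rough token estimates quoted above; the variable names are illustrative and are not part of contextweaver.

```python
# Rough token estimates taken from the comparison above (illustrative only).
TOKEN_LIMIT = 8_000

# Naive approach A: every tool schema plus every turn.
naive_total = 50_000 + 30_000
overflow_factor = naive_total / TOKEN_LIMIT

# Phase-specific figures quoted above: 5 tool cards plus 3 turns with closure.
compiled_total = 500 + 2_000

print(f"naive: {naive_total} tokens, {overflow_factor:.0f}x the limit")
print(f"compiled: {compiled_total} tokens, fits: {compiled_total <= TOKEN_LIMIT}")
```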

See [`examples/before_after.py`](examples/before_after.py) for a runnable side-by-side comparison.

---

## How contextweaver Solves It

contextweaver provides two cooperating engines:

```
┌────────────────────────────┐
Events ──────>│ Context Engine │──> ContextPack (prompt)
│ candidates → closure → │
│ sensitivity → firewall → │
│ score → dedup → select → │
│ render │
└────────────────────────────┘
▲ facts / episodes
┌──────────┴─────────────────┐
Tools ───────>│ Routing Engine │──> ChoiceCards
│ Catalog → TreeBuilder → │
│ ChoiceGraph → Router │
└────────────────────────────┘
```

**Context Engine** — eight-stage pipeline:

1. **generate_candidates** — pull phase-relevant events from the log for this request.
2. **dependency_closure** — if a selected item has a `parent_id`, include the parent automatically.
3. **sensitivity_filter** — drop or redact items at or above the configured sensitivity floor.
4. **apply_firewall** — tool results are stored out-of-band; large outputs are summarized/truncated before prompt assembly.
5. **score_candidates** — rank by recency, tag match, kind priority, and token cost.
6. **deduplicate_candidates** — remove near-duplicates using Jaccard similarity.
7. **select_and_pack** — greedily pack highest-scoring items into the phase token budget.
8. **render_context** — assemble final prompt string with `BuildStats` metadata.
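
Stages 6 and 7 can be illustrated with a minimal, self-contained sketch. It assumes word-set Jaccard similarity and a simple greedy fill; the `Candidate` class, the 0.8 threshold, and the function names are hypothetical stand-ins, not contextweaver's actual types or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical stand-in for a scored context item.
    id: str
    text: str
    score: float
    tokens: int


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(cands, threshold=0.8):
    """Keep the higher-scoring item of any near-duplicate pair."""
    kept = []
    # Sort by score descending, then by ID so ties resolve identically.
    for c in sorted(cands, key=lambda c: (-c.score, c.id)):
        if all(jaccard(c.text, k.text) < threshold for k in kept):
            kept.append(c)
    return kept


def select_and_pack(cands, budget):
    """Greedily take the best-scoring items that still fit the budget."""
    chosen, used = [], 0
    for c in sorted(cands, key=lambda c: (-c.score, c.id)):
        if used + c.tokens <= budget:
            chosen.append(c)
            used += c.tokens
    return chosen, used


pool = [
    Candidate("u1", "how many users signed up", 0.9, 6),
    Candidate("u2", "how many users signed up", 0.5, 6),  # duplicate of u1
    Candidate("tr1", "count: 1042", 0.8, 4),
    Candidate("a1", "a long unrelated agent message", 0.3, 20),
]
chosen, used = select_and_pack(deduplicate(pool), budget=12)
print([c.id for c in chosen], used)  # ['u1', 'tr1'] 10
```

Sorting on `(-score, id)` in both steps is one way to get the determinism the README promises: equal scores always resolve in the same order.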

**Routing Engine** — four-stage pipeline:

1. **Catalog** — register and manage `SelectableItem` objects.
2. **TreeBuilder** — convert a flat catalog into a bounded `ChoiceGraph` DAG.
3. **Router** — beam-search over the graph; deterministic tie-breaking by ID.
4. **ChoiceCards** — compact, LLM-friendly cards that never include full schemas.
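
The routing idea can be sketched as a toy beam search with ID-based tie-breaking. Everything here is an assumption made for illustration: the `GRAPH` and `LABELS` data, the word-overlap `score` function, and `beam_route` itself are hypothetical and do not reflect contextweaver's internal scoring or its `Router` API.

```python
# Hypothetical graph: internal node -> children; nodes without an entry are tools.
GRAPH = {
    "root": ["billing", "email"],
    "billing": ["invoice_list", "invoice_send"],
    "email": ["email_send", "email_search"],
}
LABELS = {
    "billing": "billing invoices payments",
    "email": "email messages reminders",
    "invoice_list": "list unpaid invoices",
    "invoice_send": "send invoice to customer",
    "email_send": "send email reminder",
    "email_search": "search email inbox",
}


def score(query: str, node: str) -> int:
    """Toy relevance: number of query words that appear in the node label."""
    return len(set(query.lower().split()) & set(LABELS[node].split()))


def beam_route(query: str, beam_width: int = 2, top_k: int = 2) -> list:
    """Expand the best `beam_width` internal nodes per level; return top tool IDs."""
    frontier, leaves = ["root"], set()
    while frontier:
        # Collect children as a set so a node with two parents is visited once.
        children = {c for n in frontier for c in GRAPH.get(n, [])}
        # Sort by score descending, then by ID so ties resolve identically.
        ranked = sorted(children, key=lambda c: (-score(query, c), c))
        leaves.update(c for c in ranked if c not in GRAPH)
        frontier = [c for c in ranked if c in GRAPH][:beam_width]
    return sorted(leaves, key=lambda c: (-score(query, c), c))[:top_k]


print(beam_route("send a reminder email about unpaid invoices"))
# ['email_send', 'invoice_list']
```

With `beam_width=1` the search follows only `billing` (it ties with `email` on score and wins on ID), which shows how a narrow beam trades recall for a smaller shortlist.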

---

## Quickstart

### Install

```bash
pip install contextweaver
```

Or install from source:
Or from source:

```bash
git clone https://github.com/dgenio/contextweaver.git
cd contextweaver
pip install -e ".[dev]"
```

## Quick start
### Minimal agent loop

```python
from contextweaver.context.manager import ContextManager
from contextweaver.types import ContextItem, ItemKind, Phase

mgr = ContextManager()
mgr.ingest(ContextItem(id="u1", kind=ItemKind.user_turn, text="How many users?"))
mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call, text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result, text="count: 1042", parent_id="tc1"))
mgr.ingest(ContextItem(id="tc1", kind=ItemKind.tool_call,
                       text="db_query('SELECT COUNT(*) FROM users')", parent_id="u1"))
mgr.ingest(ContextItem(id="tr1", kind=ItemKind.tool_result,
                       text="count: 1042", parent_id="tc1"))

pack = mgr.build_sync(phase=Phase.answer, query="user count")
print(pack.prompt) # budget-aware compiled context
print(pack.stats) # what was kept, dropped, deduplicated
print(pack.prompt) # budget-aware compiled context
print(pack.stats) # what was kept, dropped, deduplicated
```

## Routing large tool catalogs
### Route a large tool catalog

```python
from contextweaver.routing.catalog import Catalog, load_catalog_json
from contextweaver.routing.tree import TreeBuilder
from contextweaver.routing.router import Router

items = load_catalog_json("catalog.json")
catalog = Catalog()
for item in items:
for item in load_catalog_json("catalog.json"):
    catalog.register(item)

graph = TreeBuilder(max_children=10).build(catalog.all())
@@ -69,14 +133,58 @@ result = router.route("send a reminder email about unpaid invoices")
print(result.candidate_ids)
```

---

## Framework Integrations

| Framework | Guide | Use Case |
|---|---|---|
| MCP | [Guide](docs/integration_mcp.md) | Tool conversion, session loading, firewall |
| A2A | [Guide](docs/integration_a2a.md) | Agent cards, multi-agent sessions |
| LlamaIndex | Guide (coming soon) | RAG + tools with budget control |
| OpenAI Agents SDK | Guide (coming soon) | Function-calling agents with routing |
| Google ADK | Guide (coming soon) | Gemini tool-use with context budgets |
| LangChain / LangGraph | Guide (coming soon) | Chain + graph agents with firewall |

---

## Why Trust contextweaver?

| Proof point | Detail |
|---|---|
| **500+ tests passing** | Context pipeline, routing engine, firewall, adapters, CLI, sensitivity enforcement |
| **Zero runtime dependencies** | Stdlib-only, Python ≥ 3.10. Works with any LLM provider. No vendor lock-in. |
| **Deterministic** | Tie-break by ID, sorted keys. Identical inputs always produce identical outputs. |
| **Protocol-based stores** | `EventLog`, `ArtifactStore`, `EpisodicStore`, `FactStore` are `typing.Protocol` interfaces — swap any backend. |
| **MCP + A2A adapters** | First-class support for both emerging agentic standards. |
| **`BuildStats` transparency** | Every context build reports exactly what was kept, dropped, deduplicated, and why. |

---

## Core Concepts

| Concept | Description |
|---|---|
| `ContextItem` | Atomic event log entry: user turn, agent message, tool call, tool result, fact, plan state. |
| `Phase` | `route` / `call` / `interpret` / `answer` — each with its own token budget. |
| `ContextFirewall` | Intercepts tool results: stores raw bytes out-of-band, injects compact summary (with truncation for large outputs). |
| `ChoiceGraph` | Bounded DAG over the tool catalog. Router beam-searches it; LLM sees only a focused shortlist. |
| `ResultEnvelope` | Structured tool output: summary + extracted facts + artifact handles + views. |
| `BuildStats` | Per-build diagnostics: candidate count, included/dropped counts, token usage, drop reasons. |

See [`docs/concepts.md`](docs/concepts.md) for the full glossary and
[`docs/architecture.md`](docs/architecture.md) for pipeline detail and design rationale.

---

## CLI

contextweaver ships with a CLI for quick experimentation:

```bash
contextweaver demo # end-to-end demonstration
contextweaver init # scaffold config + sample catalog
contextweaver build --catalog c.json --out g.json # build routing graph
contextweaver demo # end-to-end demonstration
contextweaver init # scaffold config + sample catalog
contextweaver build --catalog c.json --out g.json # build routing graph
contextweaver route --graph g.json --query "send email"
contextweaver print-tree --graph g.json
contextweaver ingest --events session.jsonl --out session.json
@@ -94,18 +202,11 @@ contextweaver replay --session session.json --phase answer
| `mcp_adapter_demo.py` | MCP adapter: tool conversion, session loading, firewall |
| `a2a_adapter_demo.py` | A2A adapter: agent cards, multi-agent sessions |

Run all examples:

```bash
make example
make example # run all examples
```

## Documentation

- [Architecture](docs/architecture.md) — package layout, pipeline stages, design principles
- [Concepts](docs/concepts.md) — ContextItem, phases, firewall, ChoiceGraph, etc.
- [MCP Integration](docs/integration_mcp.md) — adapter functions, JSONL format, end-to-end example
- [A2A Integration](docs/integration_a2a.md) — adapter functions, multi-agent sessions
---

## Development

@@ -119,6 +220,23 @@ make demo # run the built-in demo
make ci # all of the above
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions.

---

## Roadmap

| Milestone | Status | Highlights |
|---|---|---|
| **v0.1 — Foundation** | ✅ complete | Context Engine, Routing Engine, MCP + A2A adapters, CLI, sensitivity enforcement, logging |
| **v0.2 — Integrations** | 🚧 in progress | Framework integration guides (LlamaIndex, OpenAI Agents SDK, Google ADK, LangChain) |
| **v0.3 — Tooling** | 📋 planned | DAG visualization, merge compression, LLM-assisted labeler |
| **Future** | 📋 planned | Context versioning, distributed stores, multi-agent coordination |

See [CHANGELOG.md](CHANGELOG.md) for the detailed release history.

---

## License

Apache-2.0
17 changes: 9 additions & 8 deletions docs/architecture.md
@@ -8,9 +8,10 @@ the "context window problem" for tool-using AI agents.
```
┌────────────────────────────┐
Events ─────>│ Context Engine │──> ContextPack (prompt)
│ candidates → score → │
│ dedup → select → firewall │
│ → prompt │
│ candidates → closure → │
│ sensitivity → firewall → │
│ score → dedup → select → │
│ render │
└────────────────────────────┘
▲ facts / episodes
┌──────────┴─────────────────┐
@@ -43,15 +44,15 @@ the "context window problem" for tool-using AI agents.
The Context Engine compiles a phase-aware, budget-constrained prompt from
the event log. The pipeline has eight stages:

1. **generate_candidates** — pull events from the event log and inject
episodic memory and facts into the candidate pool.
1. **generate_candidates** — pull phase-relevant events from the event log
into the initial candidate pool.
2. **dependency_closure** — if a selected item has a `parent_id`, bring
the parent along even if it scored lower.
3. **sensitivity_filter** — drop or redact items whose `sensitivity`
level meets or exceeds `ContextPolicy.sensitivity_floor`.
4. **apply_firewall** — large tool results (above threshold) are
summarised; the raw output is stored in the ArtifactStore and replaced
with a compact reference + summary.
4. **apply_firewall** — tool results are stored out-of-band in the
ArtifactStore and replaced with summarized/truncated text for prompt
assembly.
5. **score_candidates** — rank candidates by recency, tag match, kind
priority, and token cost.
6. **deduplicate_candidates** — remove near-duplicate items using Jaccard
9 changes: 5 additions & 4 deletions docs/concepts.md
@@ -46,13 +46,14 @@ Key fields: `id`, `kind`, `name`, `description`, `tags`, `namespace`,

## Context Firewall

The context firewall prevents large tool outputs from consuming the
entire token budget. When a tool result exceeds the configured threshold
(default 2 000 characters), the firewall:
The context firewall intercepts `tool_result` items before raw output
reaches the prompt. It stores the raw output in the `ArtifactStore`,
replaces the prompt-facing text with a compact summary, and prevents
large tool outputs from consuming the entire token budget. In practice:

1. Stores the raw output in the `ArtifactStore`.
2. Generates a compact summary using the `Summarizer`.
3. Extracts structured facts for the `FactStore`.
3. Extracts structured facts into the `ResultEnvelope`.
4. Replaces the original item text with a summary + artifact reference.
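
The store-then-replace flow in the steps above can be sketched as follows. This is a hedged illustration only: the in-memory `artifact_store` dict, the `firewall` function, the hash-based handle format, and the truncation "summary" are all hypothetical and stand in for the real `ArtifactStore` and `Summarizer`. Fact extraction (step 3) is omitted.

```python
import hashlib

# Hypothetical in-memory stand-in for an artifact store: handle -> raw bytes.
artifact_store = {}


def firewall(tool_output: str, max_chars: int = 200) -> str:
    """Store raw output out-of-band and return compact prompt-facing text."""
    raw = tool_output.encode("utf-8")
    # Content-derived handle, so the same output always maps to the same key.
    handle = "artifact:" + hashlib.sha256(raw).hexdigest()[:12]
    artifact_store[handle] = raw

    # Naive "summary": plain truncation. A real summarizer would do better.
    if len(tool_output) <= max_chars:
        summary = tool_output
    else:
        summary = tool_output[:max_chars] + "..."
    return f"{summary} [ref: {handle}]"


big_result = "row\n" * 5_000  # 20,000 characters of raw tool output
prompt_text = firewall(big_result)
print(len(big_result), "->", len(prompt_text))
```

The prompt only ever sees the short text plus a reference, while the full output stays retrievable from the store by its handle.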

## Result Envelope