Compression Infrastructure For Agent Runtimes And Retrieval Stacks
turboagents is a single Python package for TurboQuant-style KV-cache and vector
compression. It is designed to sit underneath existing AI systems, not replace
them. If you already have an agent framework, a local inference stack, or a
RAG pipeline, TurboAgents gives you a way to add compression, reranking, and
benchmarking without rebuilding the rest of your application.
Use the links above as the fastest route through the project. Start with the website if you want the product overview, the public docs if you want the package detail, Getting Started if you want the first commands, Benchmarks if you want the current numbers, and the SuperOptiX integration page if you want the end-to-end application story.
Most AI stacks do not need another agent framework. They need the memory and retrieval layer underneath their existing agents to stop getting in the way.
turboagents is aimed at that layer:
- compress KV-cache payloads so local and server-side inference can hold more context
- compress vector payloads so retrieval systems can store and rerank more cheaply
- benchmark the quality, latency, and recall tradeoffs explicitly instead of hiding them
- integrate with runtimes and vector backends teams already use
| Area | Current State |
|---|---|
| Quant core | Fast Walsh-Hadamard rotation, PolarQuant-style angle/radius stage, seeded QJL-style residual sketch, binary payloads |
| Engines | MLX wrapper, llama.cpp wrapper, experimental vLLM wrapper/plugin scaffold |
| Retrieval | Chroma, FAISS, LanceDB, SurrealDB, and pgvector client adapters |
| Benchmarks | Synthetic CLI, benchmark matrix, MLX sweep, adapter matrix, minimal Needle harness |
| Packaging | uv-first local workflow, docs, CI, release workflow, PyPI package |
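The quant core's first stage is a fast Walsh-Hadamard rotation. As an illustration of the underlying operation, here is a minimal pure-Python sketch of the in-place FWHT (an illustrative implementation, not the turboagents kernel):

```python
def fwht(vec):
    """In-place fast Walsh-Hadamard transform.

    Length must be a power of two; runs in O(n log n) butterfly passes.
    Applying it twice and dividing by len(vec) recovers the input,
    since the Hadamard matrix H satisfies H @ H = n * I.
    """
    n = len(vec)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = vec[j], vec[j + h]
                vec[j], vec[j + h] = a + b, a - b
        h *= 2
    return vec
```

The transform is its own inverse up to a factor of `n`, which is what makes a rotation-then-quantize pipeline cheap to undo at dequantization time.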
The package is already useful in three common situations. If you are running local agents, the MLX and llama.cpp wrappers give you a clean way to script and inspect runtime paths. If you are running retrieval, the TurboRAG adapters let you keep Chroma, FAISS, LanceDB, SurrealDB, or pgvector in place while adding a compressed rerank layer. If you are still evaluating fit, the built-in benchmarks give you a narrow and repeatable way to measure payload size, reconstruction quality, and retrieval agreement before you change application code.
The benchmark story is also real, not just conceptual. Chroma and FAISS both
held recall@10 = 1.0 on the validated adapter sweep, pgvector reached
recall@10 = 0.896875 at 4.0 bits, and the current MLX 3B run showed 3.5
bits as the best quality and throughput tradeoff in that configuration. The
long-context story is intentionally narrower: the minimal Needle harness shows
early-position retrieval, but not robust mid- or late-position recall.
That is the right way to read this project today. TurboAgents is ready to use as compression infrastructure and benchmark tooling. It is not yet making broad claims about long-context quality or production-native kernels.
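The recall@10 figures above compare a compressed retrieval path against exact search over the same corpus. A minimal sketch of that metric, assuming both sides return ranked document IDs (an illustrative helper, not the turboagents benchmark code):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k that survives in the approximate top-k.

    approx_ids: ranked IDs from the compressed / reranked index
    exact_ids:  ranked IDs from exact (uncompressed) search
    """
    exact_top = set(exact_ids[:k])
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / min(k, len(exact_top))
```

A recall@10 of 1.0 therefore means the compressed path returned exactly the same top-10 set as exact search, while 0.896875 means roughly nine of every ten exact hits survived.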
Install the package with uv:
```
uv add turboagents
```

Install with useful extras:

```
uv add "turboagents[mlx]"
uv add "turboagents[rag]"
uv add "turboagents[all]"
```

Try the CLI first:

```
turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
```

TurboAgents stays framework-agnostic, but the first full reference integration is now in SuperOptiX.
That matters because the validated story is not limited to package-level tests. It also includes real SuperOptiX retrieval paths using TurboAgents under framework runtimes.
- `turboagents-chroma` is wired into SuperOptiX and covered by focused runtime tests
- `turboagents-lancedb` is validated through the real `rag_lancedb_demo` flow
- `turboagents-surrealdb` is validated through the real SuperOptiX OpenAI Agents and Pydantic AI demo flows
If you want the end-to-end integration story, start here after installing TurboAgents:
- SuperOptiX integration guide: https://superagenticai.github.io/superoptix/guides/turboagents-integration/
- SuperOptiX LanceDB demo: https://superagenticai.github.io/superoptix/examples/agents/rag-lancedb-demo/
- SuperOptiX SurrealDB frameworks guide: https://superagenticai.github.io/superoptix/examples/agents/surrealdb-frameworks-demo/
turboagents is not an agent framework. It is the compression layer you put
under existing AI agents, inference engines, and RAG stacks so they can:
- hold longer contexts
- use less KV-cache memory
- store more embeddings at lower cost
- benchmark quality and memory tradeoffs explicitly
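The KV-cache saving behind the first two points is straightforward arithmetic: every cached key/value element shrinks from 16 bits to the quantized width. A back-of-the-envelope sketch (the model shape below is an assumption for illustration, not a measured turboagents figure):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits):
    """Approximate KV-cache size: keys and values for every layer,
    at `bits` bits per element. Ignores the per-block scale/zero-point
    overhead that real quantized payloads carry."""
    elements = 2 * layers * kv_heads * head_dim * seq_len  # 2 = K and V
    return elements * bits / 8

# Assumed 3B-class shape: 28 layers, 8 KV heads, head_dim 128, 32k tokens.
fp16 = kv_cache_bytes(28, 8, 128, 32768, 16)
q3_5 = kv_cache_bytes(28, 8, 128, 32768, 3.5)
print(f"fp16: {fp16 / 2**30:.2f} GiB, 3.5-bit: {q3_5 / 2**30:.2f} GiB")
```

At the same memory budget, the 16 / 3.5 ≈ 4.6x ratio is what lets an engine hold several times more context, which is why sub-4-bit widths are the interesting regime.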
Think of it as:
- `TurboQuant` for real systems
- `TurboRAG` for vector retrieval stacks
- adapters and tooling around existing engines instead of a replacement for them
turboagents is for teams and developers who already have:
- AI agents that hit memory limits on long prompts
- RAG systems with large embedding stores
- inference stacks built on MLX, llama.cpp, vLLM, Chroma, FAISS, LanceDB, SurrealDB, or pgvector
- agent frameworks that need compression infrastructure, not another framework
Most users approach TurboAgents in one of three ways.
If you already have an agent system, keep the agent layer and use turboagents
to improve the inference or memory layer under it.
Examples:
- use `turboagents.engines.mlx` for MLX-based local agents
- use `turboagents.engines.llamacpp` to build llama.cpp runtime commands
- use `turboagents.engines.vllm` as an experimental runtime wrapper
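The engine wrappers are command and runtime builders rather than new inference engines. A hypothetical sketch of the llama.cpp pattern (`build_llamacpp_cmd` is an illustrative helper, not the turboagents API; `-m`, `-c`, `-n`, and `-p` are standard llama.cpp CLI flags):

```python
def build_llamacpp_cmd(model_path, prompt, ctx=4096, n_predict=256):
    """Build an argv list for llama.cpp's CLI.

    Hypothetical helper for illustration; the real
    turboagents.engines.llamacpp surface may differ.
    """
    return [
        "llama-cli",
        "-m", model_path,        # path to the GGUF model
        "-c", str(ctx),          # context window in tokens
        "-n", str(n_predict),    # tokens to generate
        "-p", prompt,
    ]
```

Building an argv list instead of a shell string keeps the command inspectable and safe to hand to `subprocess.run` without quoting issues.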
If you already have retrieval, keep your current application logic and add TurboRAG where vectors are stored or searched.
Examples:
- use `TurboFAISS` when you want a local FAISS-backed retrieval path
- use `TurboChroma` when you want Chroma candidate search plus TurboAgents rerank
- use `TurboLanceDB` or `TurboSurrealDB` when you want a sidecar/rerank integration
- use `TurboPgvector` when your application already depends on PostgreSQL
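These adapters share one sidecar pattern: the backend does coarse candidate search, and a rerank step re-scores those candidates. A generic sketch of that loop (plain cosine rerank for illustration; the actual TurboAgents adapters score against compressed payloads):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query, candidates, top_k=3):
    """candidates: (doc_id, vector) pairs from the backend's coarse
    search; returns IDs re-ordered by similarity to the query."""
    scored = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]
```

Because the rerank step only sees the candidate set, the backend's index, schema, and application logic stay untouched.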
If you are still evaluating whether TurboQuant-style compression makes sense for your stack, use the CLI first:
```
turboagents doctor
turboagents bench kv
turboagents bench rag
turboagents compress
```
That gives you a way to validate fit before deeper integration work.
TurboAgents now includes a Chroma adapter aligned to chromadb 1.5.5.
The right integration model is:
- `Context-1` handles search policy and context management
- TurboAgents handles compressed retrieval and rerank
- Chroma retrieves candidates while TurboAgents reranks or compresses the working set under that loop
Latest validated benchmark work:
| Surface | Result |
|---|---|
| Chroma | recall@1 = 1.0, recall@10 = 1.0 across the tested sweep in the local adapter benchmark |
| MLX sweep | 3.5 bits was the best current quality/performance tradeoff on mlx-community/Llama-3.2-3B-Instruct-4bit |
| FAISS | recall@1 = 1.0, recall@10 = 1.0 across the tested sweep |
| LanceDB | recall@10 landed in the 0.70 to 0.75 range on medium-rag |
| pgvector | recall@10 improved monotonically up to 0.896875 at 4.0 bits |
| Needle | exact match held for insertion fraction 0.1, but failed at 0.5 and 0.9 |
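The monotonic pgvector trend is what uniform quantization predicts: each extra bit halves the grid step, which shrinks reconstruction error and preserves more of the exact ranking. A toy sketch of that relationship (illustrative, not the TurboQuant codec):

```python
def quantize(values, bits):
    """Uniform scalar quantization to 2**bits levels over [min, max].

    Returns dequantized values; more bits => finer grid => lower error.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if levels else 0.0
    return [lo + round((v - lo) / step) * step if step else lo for v in values]

def mse(a, b):
    """Mean squared reconstruction error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Running `mse(vals, quantize(vals, b))` for b = 2, 4, 6 on any non-degenerate input shows strictly decreasing error, mirroring the recall curve up to 4.0 bits in the table.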
If you want the full numbers and command paths, see docs/benchmarks.md.
For the shortest path through the public docs:
- docs/getting-started.md for install and first commands
- docs/adapters.md for backend-specific retrieval surfaces
- docs/examples.md for runnable local examples
- docs/benchmarks.md for validated benchmark numbers
- docs/architecture.md for the runtime and retrieval layout
```
uv add turboagents
```

Optional extras:

```
uv add "turboagents[mlx]"
uv add "turboagents[vllm]"
uv add "turboagents[rag]"
uv add "turboagents[all]"
```

For local development in this repository:
```
uv sync
uv sync --extra rag
```

```
turboagents doctor
turboagents bench kv --format json
turboagents bench rag --format markdown
turboagents bench paper
turboagents compress --input vectors.npy --output vectors.npz --head-dim 128
turboagents serve --backend mlx --model mlx-community/Qwen3-0.6B-4bit --dry-run
turboagents serve --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --dry-run
```

```
python3 examples/quickstart.py
python3 examples/bench_profiles.py
python3 examples/faiss_turborag.py
python3 examples/chroma_turborag.py
python3 examples/mlx_server_dry_run.py
```

Common local commands:
```
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run python -m pytest -q
uv run mkdocs serve -f mkdocs.local.yml
uv build
```

Benchmark harness commands:

```
uv sync --extra rag --extra mlx
uv run python scripts/run_benchmark_matrix.py --output-dir benchmark-results/$(date +%Y%m%d-%H%M%S)
uv run python scripts/benchmark_needle.py --model mlx-community/Llama-3.2-3B-Instruct-4bit --context-tokens 2048 4096 8192 --output benchmark-results/needle-$(date +%Y%m%d-%H%M%S).json
```

Community and project health files:
See ATTRIBUTION.md. This repository is not affiliated with Google Research.
