Plexus

Pool the VRAM of several consumer GPUs to run an LLM that none of them could run alone.

Plexus is a distributed LLM inference mesh written in Rust. The goal is to let a handful of heterogeneous GPUs — different vendors, different VRAM sizes — act as a single, larger pool, so you can run a model that wouldn't fit on any one of them.

⚠️ Plexus is in early development. It does not run real distributed inference yet. The pieces that exist today are a single-machine walking skeleton and a CPU-only model forward pass. See Project Status before evaluating it. The roadmap below describes the intended system, most of which is not built.

What is Plexus?

A single 8 GB GPU can't load a 24 GB model. Plexus aims to make three 8 GB GPUs cooperate as one ~24 GB pool so the model can run. When inference is limited by memory capacity, the model is split across machines (tensor / pipeline parallelism), with each machine holding part of the weights and KV cache.
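As a rough illustration of the pooling arithmetic, the sketch below assigns whole layers to devices in proportion to their usable VRAM and reports when the pool is too small. It is not Plexus code: `Device`, `plan_layers`, the uniform per-layer size, and the per-device `reserve_mib` headroom are all assumptions made for the example.

```rust
/// Hypothetical description of one GPU in the pool (not a Plexus type).
#[derive(Debug, Clone, Copy)]
struct Device {
    vram_mib: u64,
}

/// Plans how many layers each device holds, in proportion to usable VRAM.
/// Returns `None` when the pool cannot hold all layers.
///
/// Sketch assumptions: every layer needs `layer_mib`, and `reserve_mib` is
/// held back on each device for KV cache, activations, and the driver.
fn plan_layers(
    devices: &[Device],
    n_layers: u64,
    layer_mib: u64,
    reserve_mib: u64,
) -> Option<Vec<u64>> {
    assert!(layer_mib > 0, "layer size must be non-zero");
    if n_layers == 0 {
        return Some(vec![0; devices.len()]);
    }

    // Capacity of each device, measured in whole layers.
    let caps: Vec<u64> = devices
        .iter()
        .map(|d| d.vram_mib.saturating_sub(reserve_mib) / layer_mib)
        .collect();
    let total: u64 = caps.iter().sum();
    if total < n_layers {
        return None; // the pooled VRAM is too small for this model
    }

    // Proportional share, rounded down. It never exceeds a device's
    // capacity because n_layers <= total.
    let mut plan: Vec<u64> = caps.iter().map(|c| n_layers * c / total).collect();

    // Hand out the layers lost to rounding, one at a time, to devices
    // that still have room.
    let mut leftover = n_layers - plan.iter().sum::<u64>();
    while leftover > 0 {
        for (assigned, cap) in plan.iter_mut().zip(&caps) {
            if leftover > 0 && *assigned < *cap {
                *assigned += 1;
                leftover -= 1;
            }
        }
    }
    Some(plan)
}
```

With three 8192 MiB devices, 32 layers of 700 MiB each, and no reserve, `plan_layers` returns `Some(vec![11, 11, 10])`; reserving 1024 MiB per device makes it return `None`. The raw pool size therefore overstates what can actually be loaded once headroom is accounted for.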

The longer-term vision is a vendor-neutral pool that mixes NVIDIA, Apple Silicon, AMD, Intel, and CPU nodes, with optional verifiability so you can trade latency for trust on a per-request basis. None of that cross-vendor / verifiable functionality exists yet — it is the direction, not the current state.

Who it's for: homelab and small-lab operators who already own a few budget GPUs and want to run larger models than a single card allows.

Project Status

Plexus is pre-alpha: roughly 5.8k lines of Rust across 10 crates, with ~2.1k lines of tests. Here is an honest breakdown of what is implemented versus what is only designed:

| Area | State | Notes |
| --- | --- | --- |
| Workspace, crates, CI, license/policy docs | ✅ Done | 10-crate Cargo workspace, lefthook gates, GitLab CI (shadow) |
| HTTP gateway with OpenAI-shaped `/v1/chat/completions` | ✅ Works | Request/response shape + `X-Plexus-Trust` header parsing |
| Worker gRPC service + ring forward | ⚠️ Placeholder | Returns a BLAKE3 hash of the input, not a real generation |
| Single-machine "walking skeleton" on Kubernetes | ✅ Works | Verified on a local `kind` cluster (gateway + 3 worker pods) |
| Layer-by-layer partitioning (even split across N workers) | ✅ Works | Pure arithmetic, unit-tested |
| Llama 3.1 8B CPU forward pass | ⚠️ Partial | Real attention/RoPE/RMSNorm/SwiGLU ops, deterministic, tested on a tiny config; uses a placeholder tokenizer (`cl100k_base`, not the real Llama vocab) |
| GPU kernels (CUDA/Metal/ROCm/SYCL) | ❌ Not started | `plexus-kernel` is an enum + TODO; the GPU backend stub `panic!`s |
| Heterogeneous multi-GPU pooling | ❌ Design only | The core value proposition — not implemented |
| Multi-node LAN cluster, public swarm, TEE attestation | ❌ Design only | Plans exist under `docs/superpowers/plans/` |
| Anthropic / Ollama / MCP API surfaces | ❌ Not started | Only the OpenAI shape exists today |

In short: today Plexus can stand up a gateway and worker pods that pass tensors around and echo a placeholder, and it can run a small Llama forward pass on CPU. It cannot yet run a real model split across real GPUs. If you're looking for working distributed inference right now, see exo or petals.
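The "layer-by-layer partitioning (even split across N workers)" row in the status table is pure arithmetic, so it can be sketched in a few lines. This is an illustrative version, not the `plexus-graph` implementation: the function name `even_split` and the choice to give any remainder layers to the earliest workers are assumptions.

```rust
use std::ops::Range;

/// Splits `n_layers` into `n_workers` contiguous ranges whose sizes differ
/// by at most one. When the division is not exact, the earliest workers
/// receive one extra layer each (an assumption of this sketch).
fn even_split(n_layers: usize, n_workers: usize) -> Vec<Range<usize>> {
    assert!(n_workers > 0, "need at least one worker");
    let base = n_layers / n_workers;
    let extra = n_layers % n_workers;

    let mut start = 0;
    (0..n_workers)
        .map(|i| {
            // The first `extra` workers take `base + 1` layers.
            let len = base + usize::from(i < extra);
            let range = start..start + len;
            start += len;
            range
        })
        .collect()
}
```

For the 32 transformer layers of Llama 3.1 8B and the three workers of the walking skeleton, `even_split(32, 3)` yields `0..11`, `11..22`, and `22..32`.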

Why Plexus (design goals)

| Goal | Intent |
| --- | --- |
| Heterogeneous pooling | Treat mixed-vendor GPUs (NVIDIA / Apple / AMD / Intel / CPU) as one logical pool |
| Native Rust runtime | Implement the inference stack directly (no Candle / vLLM / llama.cpp dependency) |
| Optional verifiability | Per-request trust levels (`fast` / `verified` / `attested`) — design stage |
| Permissive open source | MIT licensed; no token, NFT, or paywalled "enterprise edition" |
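The per-request trust levels are carried in the `X-Plexus-Trust` header, which the gateway already parses. A minimal sketch of how such a header value could map onto the three levels follows. The `TrustLevel` enum, the `trust_from_header` helper, the case-insensitive matching, and the fallback to `fast` when the header is absent are all assumptions for illustration; the gateway's actual behavior may differ.

```rust
use std::str::FromStr;

/// Hypothetical trust levels mirroring `fast` / `verified` / `attested`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustLevel {
    Fast,
    Verified,
    Attested,
}

impl FromStr for TrustLevel {
    type Err = String;

    /// Parses a header value, ignoring surrounding whitespace and ASCII case
    /// (both are assumptions of this sketch).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "fast" => Ok(Self::Fast),
            "verified" => Ok(Self::Verified),
            "attested" => Ok(Self::Attested),
            other => Err(format!("unknown trust level: {other:?}")),
        }
    }
}

/// Resolves the trust level from an optional `X-Plexus-Trust` value.
/// A missing header falls back to `Fast` here; that default is assumed.
fn trust_from_header(value: Option<&str>) -> Result<TrustLevel, String> {
    match value {
        Some(v) => v.parse(),
        None => Ok(TrustLevel::Fast),
    }
}
```

Rejecting unknown values with an error, rather than silently downgrading to `fast`, keeps a mistyped header from weakening the trust a caller asked for.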

Explicitly not goals

❌ Yet another router that wraps existing inference engines — Plexus implements its own.
❌ Crypto token / NFT / payment mechanism.
❌ Single-vendor lock-in.
❌ A Python inference path (Rust only).

Architecture

Plexus is organized as five layers, from GPU kernels up to the API gateway:

┌─────────────────────────────────────────────────────┐
│  Layer 4 — API & Gateway (OpenAI today; more planned)│
├─────────────────────────────────────────────────────┤
│  Layer 3 — Native inference runtime (Llama, CPU)     │
├─────────────────────────────────────────────────────┤
│  Layer 2 — Compute graph & scheduler (TP/PP) [design]│
├─────────────────────────────────────────────────────┤
│  Layer 1 — Distributed tensor & collective ops       │
├─────────────────────────────────────────────────────┤
│  Layer 0 — Kernel backend (CUDA/Metal/...) [planned] │
└─────────────────────────────────────────────────────┘

Full design (intended system, ~15 sections): docs/architecture/2026-05-22-plexus-design.md.
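To give a feel for what lives in Layer 3, here is a minimal CPU sketch of RMSNorm, one of the ops listed for the Llama forward pass. It follows the standard definition (divide by the root mean square, then scale by a learned weight) and is not taken from `plexus-runtime`; the function name and signature are assumptions.

```rust
/// RMSNorm over one vector: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
///
/// `weight` is the learned per-channel scale and `eps` guards against
/// division by zero for all-zero inputs.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "input and weight must match");
    assert!(!x.is_empty(), "input must not be empty");

    // Mean of squares across the hidden dimension.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();

    // Normalize, then apply the per-channel scale.
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

Unlike LayerNorm, this op subtracts no mean and adds no bias, so the whole computation is a single reduction followed by an elementwise multiply.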

Workspace layout

plexus/
├── crates/
│   ├── plexus-core/        # error types, shared primitives
│   ├── plexus-tensor/      # tensor + device abstraction
│   ├── plexus-graph/       # shard / partition logic
│   ├── plexus-runtime/     # native model code (Llama CPU forward)
│   ├── plexus-gateway/     # OpenAI-shaped HTTP API
│   ├── plexus-worker/      # gRPC worker (placeholder forward)
│   ├── plexus-kernel/      # GPU backend enum (kernels: planned)
│   ├── plexus-verifier/    # verification primitives (planned)
│   ├── plexus-telemetry/   # metrics scaffolding
│   └── plexus-cli/         # `plexus` binary (serve / worker)
├── proto/                  # gRPC protobuf (inference.proto)
├── deploy/                 # Dockerfile + Kubernetes manifests
├── docs/                   # architecture, ADRs, operations, plans
└── tests/                  # integration / determinism / conformance / perf

Building

Requires the Rust toolchain pinned in rust-toolchain.toml (1.95.0); rustup fetches it automatically.

# build everything
cargo build --workspace

# run the test suite (unit + integration + e2e walking skeleton)
cargo test --workspace

# lint (Plexus uses a strict clippy gate — see CLAUDE.md / STYLE.md)
cargo clippy --workspace --all-targets

# run the gateway locally (binds to 127.0.0.1 by default)
cargo run -p plexus-cli -- serve --port 8080 --model test

# health check
curl localhost:8080/health        # -> {"status":"ok","version":"0.1.0"}

The CPU backend works without any GPU, so the current code can be built and tested on any machine.

The curl … install.sh one-liner and the heterogeneous-pool / swarm CLI flags described in the design doc are not available yet; they describe the target UX.

Roadmap

Phase-based, no fixed dates. [x] done, [~] partial, [ ] not started.

| Phase | State | Scope |
| --- | --- | --- |
| 0 — Foundation | `[~]` | Scaffold, license/policy layer, single-machine walking skeleton |
| 1 — Single GPU + real model | `[~]` | CPU Llama forward done; GPU backends + real tokenizer pending |
| 2 — Heterogeneous pool ⭐ | `[ ]` | Mixed-vendor GPUs as one pool (the core wedge) |
| 3 — LAN multi-node cluster | `[ ]` | libp2p + pipeline / cross-node tensor parallel |
| 4 — Public swarm + verifiability | `[ ]` | Spot-check / dispute |
| 5 — TEE attestation | `[ ]` | Confidential-compute backends |
| 6 — API + multimodal | `[ ]` | Additional API surfaces, more model families |
| 7 — v1.0 launch | `[ ]` | Security audit, broader worker testing |

Detailed phase plans live under docs/superpowers/plans/ and docs/architecture/2026-05-22-plexus-design.md §10.

Contributing

Early-stage projects benefit most from focused contributions. See CONTRIBUTING.md for the full process. In short:

DCO sign-off on every commit (`git commit -s`)
Conventional Commits for messages
Open PRs against the `dev` branch
Include the output of `cargo test --workspace` and `cargo clippy --workspace` in the PR

Please also read the Code of Conduct.

Prior art & inspiration

Plexus is independent work, but it draws on ideas from:

exo-explore/exo — LAN clusters + tensor parallelism
bigscience-workshop/petals — global swarm + block partitioning
Folding@home — non-monetary contribution model

License

Code: MIT
Docs: CC-BY-SA 4.0
