docs(readme): add openinfer (Rust) as a DFlash backend by xiaguan · Pull Request #141 · z-lab/dflash

xiaguan · 2026-06-24T16:58:57Z

Hi DFlash team — thanks for the work on DFlash, and for releasing the Qwen3-4B/8B -b16 drafters.

We added native DFlash speculative decoding to openinfer, a from-scratch LLM inference engine in pure Rust + CUDA (no PyTorch). It runs the z-lab/Qwen3-{4B,8B}-DFlash-b16 drafters behind a --dflash-draft-model-path flag; the single-stream decode speedup comes out to 1.82× on an RTX 5070 Ti and 1.56× on an RTX 5090.

This PR adds a short openinfer (Rust) entry to the Quick Start, after the existing backends, so DFlash users on NVIDIA GPUs have the option. Happy to adjust the wording or placement however you'd prefer — and thanks again for making DFlash easy to integrate.

openinfer (https://github.com/openinfer-project/openinfer) is a pure Rust + CUDA inference engine with native DFlash support for Qwen3-4B/8B, using the z-lab/Qwen3-{4B,8B}-DFlash-b16 drafters. Adds a short Quick Start entry alongside the existing backends.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

xiaguan force-pushed the docs/openinfer-backend branch from 6fb4cbd to 150648f on June 24, 2026 17:00

xiaguan wants to merge 1 commit into z-lab:main from xiaguan:docs/openinfer-backend