CLI for building and testing DFlash-style speculative decoding draft models.
Updated Jun 2, 2026 - Python
[...] multiple tokens, and a verifier filters them using the main model’s confidence. Focuses on speed–accuracy tradeoffs, visualization, and modular design for easy benchmarking and research.
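The verifier step described above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: the function name `accept_prefix`, the `threshold` parameter, and the "keep the longest confident prefix" rule are all assumptions. Standard speculative decoding typically uses a rejection-sampling test instead of a fixed threshold.

```python
from typing import Sequence


def accept_prefix(draft_tokens: Sequence[int],
                  target_probs: Sequence[float],
                  threshold: float = 0.5) -> list[int]:
    """Keep the longest prefix of drafted tokens the main model is confident in.

    draft_tokens: token ids proposed by the draft model (hypothetical input).
    target_probs: probability the main model assigns to each drafted token.
    threshold:    assumed confidence cutoff; not taken from the project.
    """
    if len(draft_tokens) != len(target_probs):
        raise ValueError("one probability per drafted token is required")
    accepted: list[int] = []
    for token, prob in zip(draft_tokens, target_probs):
        if prob < threshold:
            break  # the first rejected token invalidates everything after it
        accepted.append(token)
    return accepted


drafted = [11, 42, 7, 99]
confidence = [0.91, 0.64, 0.30, 0.88]
print(accept_prefix(drafted, confidence))  # [11, 42]
```

Raising `threshold` trades speed for accuracy: fewer drafted tokens survive each step, but the ones that do are closer to what the main model would have produced on its own.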
Cross-vocabulary speculative decoding: a CPU-verifiable reference implementation and acceptance-length (tau) measurement harness.
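As a rough sketch of what an acceptance-length measurement might look like, the snippet below assumes tau is the mean number of drafted tokens accepted per verification step under greedy verification. The helper names are hypothetical, the drafted tokens are assumed to be already mapped into the target model's vocabulary, and some papers also count the extra token the target model emits after each step.

```python
from typing import Sequence


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading drafted tokens that match the target model's greedy tokens."""
    count = 0
    for drafted, expected in zip(draft, target):
        if drafted != expected:
            break
        count += 1
    return count


def mean_acceptance_length(steps: Sequence[tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average accepted length over all verification steps (assumed definition of tau)."""
    if not steps:
        raise ValueError("at least one verification step is required")
    lengths = [accepted_length(draft, target) for draft, target in steps]
    return sum(lengths) / len(lengths)


# Each pair is (drafted tokens, target model's greedy tokens) for one step.
steps = [
    ([1, 2, 3], [1, 2, 9]),            # 2 accepted
    ([4, 5], [4, 5]),                  # 2 accepted
    ([6], [7]),                        # 0 accepted
    ([8, 9, 10, 11], [8, 9, 10, 11]),  # 4 accepted
]
print(mean_acceptance_length(steps))  # 2.0
```

Because this only compares token ids, it needs no accelerator and can run on a CPU.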