From dbdb2492098bb9c502f281843e718d3bef56920f Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto <mudler@localai.io>
Date: Sat, 6 Jun 2026 10:33:15 +0000
Subject: [PATCH] docs: add a supported-models table with links

Lists all 11 published models (the 10 Parakeet checkpoints plus the new
multilingual streaming nemotron-3.5-asr-streaming-0.6b) with their type, size,
notes, and a link to each NVIDIA source, plus a pointer to the GGUF collection
repo and docs/parity.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 README.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/README.md b/README.md
index 027695e..a7cd0f3 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,26 @@ It also runs circles around whisper.cpp on the same audio: the 110M Parakeet is
 
 ---
 
+## Supported models
+
+Every model below is validated at WER 0 against NeMo and published as GGUF (f16, q8_0, q6_k, q5_k, q4_k) in the single collection repo [mudler/parakeet-cpp-gguf](https://huggingface.co/mudler/parakeet-cpp-gguf). Convert any of them yourself with `scripts/convert_parakeet_to_gguf.py`. The per-model parity matrix is in [docs/parity.md](docs/parity.md).
+
+| Model | Type | Size | Notes | Source |
+| ----- | ---- | ---- | ----- | ------ |
+| [parakeet-tdt_ctc-110m](https://huggingface.co/nvidia/parakeet-tdt_ctc-110m) | hybrid TDT+CTC | 110M | English, the small anchor checkpoint | NVIDIA |
+| [parakeet-ctc-0.6b](https://huggingface.co/nvidia/parakeet-ctc-0.6b) | CTC | 0.6B | English | NVIDIA |
+| [parakeet-rnnt-0.6b](https://huggingface.co/nvidia/parakeet-rnnt-0.6b) | RNNT | 0.6B | English | NVIDIA |
+| [parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) | TDT | 0.6B | English | NVIDIA |
+| [parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) | TDT | 0.6B | multilingual (25 European languages) | NVIDIA |
+| [parakeet-ctc-1.1b](https://huggingface.co/nvidia/parakeet-ctc-1.1b) | CTC | 1.1B | English | NVIDIA |
+| [parakeet-rnnt-1.1b](https://huggingface.co/nvidia/parakeet-rnnt-1.1b) | RNNT | 1.1B | English | NVIDIA |
+| [parakeet-tdt-1.1b](https://huggingface.co/nvidia/parakeet-tdt-1.1b) | TDT | 1.1B | English | NVIDIA |
+| [parakeet-tdt_ctc-1.1b](https://huggingface.co/nvidia/parakeet-tdt_ctc-1.1b) | hybrid TDT+CTC | 1.1B | English | NVIDIA |
+| [parakeet_realtime_eou_120m-v1](https://huggingface.co/nvidia/parakeet_realtime_eou_120m-v1) | RNNT, streaming | 120M | cache-aware streaming with end-of-utterance detection (`--stream`) | NVIDIA |
+| [nemotron-3.5-asr-streaming-0.6b](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b) | RNNT, streaming | 0.6B | multilingual (40+ locales), prompt-conditioned; supports both offline and cache-aware streaming. Pick a language with `--lang` (default `auto`). License: OpenMDW-1.1 | NVIDIA |
+
+---
+
 ## Performance
 
 parakeet.cpp is faster than NeMo's PyTorch runtime on every Parakeet model, on both CPU and GPU, and the transcripts come out byte-identical (WER 0 vs NeMo). Full methodology, all 10 models, quantization tradeoffs, and plots are in [`benchmarks/BENCHMARK.md`](benchmarks/BENCHMARK.md).