Local Voice Quality Enhancement — a compact neural model for joint acoustic echo cancellation (AEC), noise suppression, and dereverberation of 16 kHz speech, designed to run on commodity CPUs in real time.
- Two sizes, chosen by CPU budget:
  - v1.3 (current): 4.8 M parameters (~19 MB F32), ~3.3 ms per 16 ms frame on Zen4 (4 threads), ≈4.7× realtime.
  - v1.2: 1.3 M parameters (~5 MB F32), ~1.6 ms per 16 ms frame on Zen4 (4 threads), ≈9.7× realtime.
- Causal, streaming: 256-sample hop, 16 ms algorithmic latency
- F32 reference inference in C++ via GGML; PyTorch reference included for verification and research
Try it: https://huggingface.co/spaces/LocalAI-io/LocalVQE-demo.
LocalVQE is a derivative of DeepVQE (Indenbom et al., Interspeech 2023) — smaller, GGML-native, and tuned for streaming CPU inference.
Picture a video call from a laptop. Your microphone picks up three things alongside your voice:
- The remote participant's voice, played back through your speakers and caught again by your mic — this is the echo. Without cancellation they hear themselves a fraction of a second later.
- Your own voice bouncing off walls, desk, and monitor before reaching the mic — this is reverberation, the "tunnel" or "bathroom" sound that makes you feel far away from the listener.
- A fan, keyboard clatter, a dog barking, or traffic outside — plain background noise.
LocalVQE removes all three in a single causal pass, frame by frame, on the CPU, so only your voice reaches the far end.
Hand-tuned DSP pipelines (NLMS/AP/Kalman AEC, Wiener/spectral-subtraction NS, MCRA noise tracking, RLS dereverb) can run in tens of microseconds per frame and remain a strong baseline when the acoustic path is benign. LocalVQE is interesting when you want:
- Robustness to non-linear echo paths (small loudspeakers, handheld devices, plastic laptop chassis) where linear AEC leaves residual echo.
- Non-stationary noise suppression (babble, keyboards, fans changing speed) that energy-based noise estimators struggle with.
- One model, many conditions — no per-device tuning of step sizes, forgetting factors, or VAD thresholds.
- A single deterministic causal pass — no double-talk detector, no adaptation state that can diverge.
The trade-off is CPU: a classical stack might cost ~0.1 ms/frame, whereas LocalVQE costs roughly 1.6 ms (v1.2) to 3.3 ms (v1.3) per frame on a 4-thread Zen4 desktop. On anything larger than a microcontroller that is still a modest fraction of the 16 ms real-time budget.
Microsoft never released DeepVQE: no weights, no reference implementation, no streaming runtime. Before starting LocalVQE we re-implemented it from the paper as a GGML graph at richiejp/deepvqe-ggml (the full-width ~7.5 M-parameter version). LocalVQE is the same idea rebuilt for streaming CPU inference and published in two sizes: a 1.3 M-parameter compact build (v1.2, ~5 MB F32) for tight CPU budgets, and a 4.8 M-parameter wider build (v1.3, ~19 MB F32) that filters noise better on some clips at ~2× the per-hop cost. Both are small enough to run in real time on commodity CPUs.
Pre-trained weights are published on Hugging Face at LocalAI-io/LocalVQE:
| File | Description |
|---|---|
| localvqe-v1.3-4.8M-f32.gguf | F32 GGUF: what the C++ engine loads (current default). |
| localvqe-v1.3-4.8M.pt | PyTorch checkpoint for verification, ablation, and downstream research. |
| localvqe-v1.2-1.3M-f32.gguf | Compact alternative: same architecture family, ~1/2 the cost per hop. |
| localvqe-v1.2-1.3M.pt | PyTorch checkpoint for the compact variant. |
| localvqe-v1.1-1.3M-f32.gguf | Older release. |
| localvqe-v1-1.3M-f32.gguf | Original release. |
The current release is v1.3. It widens the encoder/decoder
(mic channels [2,112,32,104,96,152], far [2,64,32], bottleneck
256) and trains from scratch under a noise-floor-aware loss recipe.
On doubletalk it filters noise better than v1.2 (deg MOS +0.25 on
the stratified dev sample, with stronger ERLE). On far-end-only
echo it cancels harder, but the residual rates rougher in AECMOS;
some users will prefer v1.2's gentler trade-off on far-end
single-talk (FE-ST) scenes. v1.2 stays on the repo as the
small/fast option (~1/2 the per-hop cost). v1.3 keeps the 1024 ms
echo-search window that v1.2 introduced.
Per-hop, 16 kHz / 256-sample hop → 16 ms budget. Each hop is a full
ggml_backend_graph_compute. Run any of these locally with the
bench-run cmake target — see Benchmark below. 30
iters × 625 hops/iter = 18 750 hops per row.
| Hardware | Backend | Threads | Hop p50 | Hop p99 | RT factor |
|---|---|---|---|---|---|
| Ryzen 9 7900 (Zen4 desktop) | CPU | 1 | 9.73 ms | 14.48 ms | 1.58× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 2 | 5.41 ms | 5.62 ms | 2.95× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 4 | 3.21 ms | 3.42 ms | 4.97× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 8 | 3.47 ms | 3.80 ms | 4.59× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 16 | 3.79 ms | 4.06 ms | 4.19× |
| Ryzen 9 7900 + RADV iGPU (Raphael) | Vulkan | — | 8.71 ms | 9.15 ms | 1.83× |
| Ryzen 9 7900 + RTX 5070 Ti (dGPU) | Vulkan | — | 2.57 ms | 4.21 ms | 6.07× |
The wider model is ~2× the per-hop cost of v1.2 in matching configurations. The dGPU (RTX 5070 Ti) ends up the fastest option for v1.3, by ~1.25× over the 4-thread CPU. The 1-thread case is the slowest: still real-time (RT 1.58×) but with little margin (p99 of 14.48 ms against a 16 ms budget), so low-core or power-constrained devices should use v1.2 instead. Re-runs on other CPUs (Apple M4, Alder Lake, mobile Zen3+) will be published as we collect them. Until then the v1.2 sweep below is representative of the scaling shape, and we expect roughly the same ~2× multiplier to apply to v1.3.
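The budget arithmetic behind the tables can be checked in a few lines. This is a minimal sketch and not part of the repository: the function names are hypothetical, and the RT factor is computed here from a single hop time, whereas the bench tool may derive its published figure from a different statistic (for example the mean), so the table values need not match exactly.

```python
def hop_budget_ms(sample_rate_hz: int, hop_samples: int) -> float:
    """Wall-clock duration of one hop of audio, in milliseconds."""
    return 1000.0 * hop_samples / sample_rate_hz


def rt_factor(budget_ms: float, compute_ms: float) -> float:
    """How many times faster than real time one hop is processed."""
    return budget_ms / compute_ms


budget = hop_budget_ms(16_000, 256)  # 16 kHz input, 256-sample hop
total_hops = 30 * 625                # iterations x hops per iteration

print(f"budget per hop: {budget:.1f} ms")
print(f"hops per table row: {total_hops}")
# Example input: the v1.3 4-thread Zen4 p50 of 3.21 ms from the table above.
print(f"RT factor at 3.21 ms/hop: {rt_factor(budget, 3.21):.2f}x")
```

An RT factor below 1.0 would mean the hop takes longer to compute than the audio it covers, so the stream would fall behind.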
| Hardware | Backend | Threads | Hop p50 | Hop p99 | Hop max | RT factor |
|---|---|---|---|---|---|---|
| Ryzen 9 7900 (Zen4 desktop) | CPU | 1 | 4.28 ms | 4.85 ms | 6.23 ms | 3.72× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 2 | 2.59 ms | 3.80 ms | 3.81 ms | 6.09× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 4 | 1.65 ms | 2.91 ms | 4.57 ms | 8.90× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 8 | 1.93 ms | 2.41 ms | 6.91 ms ‡ | 8.22× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 16 | 2.09 ms | 2.22 ms | 6.43 ms ‡ | 7.69× |
| Ryzen 9 7900 + RADV iGPU (Raphael) | Vulkan | — | 6.10 ms | 6.53 ms | 6.24 ms | 2.61× |
| Ryzen 9 7900 + RTX 5070 Ti (dGPU) | Vulkan | — | 1.96 ms | 3.64 ms | 5.42 ms | 7.85× |
| Ryzen 7 6800U (Zen3+ laptop) | CPU | 1 | 4.69 ms | 6.08 ms | 19.31 ms ‡ | 3.37× |
| Ryzen 7 6800U (Zen3+ laptop) | CPU | 4 | 2.11 ms | 2.77 ms | 4.90 ms | 7.44× |
| Ryzen 7 6800U (Zen3+ laptop) | CPU | 8 | 1.94 ms | 2.60 ms | 5.52 ms | 7.94× |
| Ryzen 7 6800U + RADV iGPU (Rembrandt) | Vulkan | — | 9.84 ms | 14.75 ms | 20.87 ms ‡ | 1.53× |
The wider echo-search window v1.2 introduced (1024 ms vs v1.1's 512 ms) costs ~20–25 % per-hop on CPU vs v1.1.
| Hardware | Backend | Threads | Hop p50 | Hop p99 | Hop max | RT factor |
|---|---|---|---|---|---|---|
| Ryzen 9 7900 (Zen4 desktop) | CPU | 1 | 3.40 ms | 3.57 ms | 5.06 ms | 4.7× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 2 | 2.07 ms | 2.25 ms | 3.65 ms | 7.7× |
| Ryzen 9 7900 (Zen4 desktop) | CPU | 4 | 1.32 ms | 1.57 ms | 6.91 ms ‡ | 12.0× |
| Ryzen 9 7900 + RADV iGPU (Raphael) | Vulkan | — | 4.43 ms | 4.62 ms | 5.07 ms | 3.60× |
| Ryzen 9 7900 + RTX 5070 Ti (dGPU) | Vulkan | — | 1.79 ms | 3.41 ms | 4.14 ms | 8.63× |
| Apple M4 (4P + 6E, macOS 25.3) | CPU | 1 | 2.98 ms | 3.16 ms | 19.11 ms ‡ | 5.4× |
| Apple M4 (4P + 6E, macOS 25.3) | CPU | 2 | 1.82 ms | 1.93 ms | 3.17 ms | 8.8× |
| Apple M4 (4P + 6E, macOS 25.3) | CPU | 4 | 1.11 ms | 1.81 ms | 10.41 ms ‡ | 14.4× |
| Core i5-14500 (Alder Lake-S) | CPU | 1 | 3.25 ms | 3.53 ms | 6.73 ms | 4.93× |
| Core i5-14500 (Alder Lake-S) | CPU | 2 | 2.55 ms | 2.81 ms | 5.20 ms | 6.23× |
| Core i5-14500 (Alder Lake-S) | CPU | 3 | 2.26 ms | 3.09 ms | 3.85 ms | 7.06× |
| Core i5-14500 (Alder Lake-S) | CPU | 4 | 2.02 ms | 2.89 ms | 3.59 ms | 7.79× |
| Core i5-14500 + Arc A770 (dGPU) | Vulkan | — | 10.90 ms | 12.00 ms | 13.38 ms | 1.48× |
| Core i5-14500 + UHD 770 (iGPU) | Vulkan | — | 9.02 ms | 11.77 ms | 17.93 ms | 1.74× |
Adding cores hits diminishing returns quickly: even the wider v1.3
graph is small enough that thread-launch and synchronisation
overhead start to dominate beyond ≈4 threads on these CPUs. The
Zen4 sweeps show it plainly on both versions — the 1→4 thread step
gives a 2.59× speedup on v1.2 and a 3.03× speedup on v1.3, but
4→8 is a regression on both and 8→16 worse still. The 6800U mobile
Zen3+ on v1.2 agrees: 1→4 is a 2.21× speedup, 4→8 only buys another
7%. The library's default thread count is min(4, sched_getaffinity) —
auto-capped at 4 with respect for taskset, cgroup, and VM CPU
limits, so over-subscription doesn't happen on resource-constrained
hosts. Pass a non-zero value to localvqe_options_set_threads to
override.
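The default-thread policy described above can be sketched in Python as follows. This only mirrors the stated rule, min(4, CPUs available to the process); it is not the library's code, and the function name `default_threads` is hypothetical.

```python
import os


def default_threads(cap: int = 4) -> int:
    """min(cap, CPUs this process may run on), never less than 1."""
    try:
        # Linux: reflects taskset and cpuset restrictions on this process.
        allowed = len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (e.g. macOS, Windows).
        allowed = os.cpu_count() or 1
    return max(1, min(cap, allowed))


print(default_threads())
```

Running the script under `taskset -c 0` on Linux should report 1, because the affinity mask then contains a single CPU.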
‡ Outliers are single hops early in the first iteration (cold caches); p99 is representative of steady-state.
Vulkan p50/p95/p99 are typically tight, but worst-case single-hop latency on a shared desktop is sensitive to external GPU clients (display compositor, browser). On a dedicated embedded device with no compositor contending for the queue, expect the quieter end of the range.
The bench binary prints the top-10 slowest hops with
(iteration, hop-in-iteration) coordinates so you can check whether
outliers cluster at post-localvqe_reset() boundaries (cold path)
or scatter through the stream (external contention). In practice we
see the latter.
bench reports process RSS from /proc/self/status alongside the
internal allocator accounting from --profile. The numbers are
essentially thread-count-invariant — both 1 and 16 threads land on
the same peak within a few hundred KiB — so one row per model
suffices.
| Model | Post-load delta ¹ | Peak RSS (VmHWM) ² | Internal total resident ³ |
|---|---|---|---|
| v1.3 (4.8 M) | +24.4 MiB | 34.1 MiB | 23.0 MiB |
| v1.2 (1.3 M) | +10.0 MiB | 19.6 MiB | 8.7 MiB |
¹ RSS added by localvqe_new_with_options + CPU backend init, on
top of the ~7 MiB binary/libs baseline measured by bench itself.
This is the portable "working set the model brings" number; the
absolute peak will depend on your host process baseline.
² VmHWM after warmup + sustained streaming on a Zen4 desktop
(Ryzen 9 7900). v1.3 is ~1.75× v1.2 in RSS terms despite carrying
~3.7× more parameters — activation, history-scratch, and per-frame
history buffers don't scale with channel width the way the weight
buffer does.
³ Backend-internal accounting from bench --profile: the sum of
the weights buffer, activation buffer (gallocr), and one-shot
history scratch. Excludes the double-buffered history-tensor swap
pages (already counted in the activation buffer for the read side).
For GPU backends (Vulkan), RSS understates real usage — VRAM isn't
visible in /proc/self/status. Use bench --profile on the GPU
build to read the same weights/activation/scratch breakdown from
the backend-internal allocator.
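To reproduce the RSS side of these measurements for your own host process, the two relevant fields can be read from the status file directly. The sketch below is an assumption about how one might do this in Python, not the code bench uses; the function names are hypothetical and the sample text is made up purely to show the parsing.

```python
import re


def parse_status_kib(status_text: str) -> dict:
    """Extract VmRSS and VmHWM (reported in kB) from /proc/<pid>/status text."""
    fields = {}
    for key in ("VmRSS", "VmHWM"):
        match = re.search(rf"^{key}:\s+(\d+)\s+kB", status_text, re.MULTILINE)
        if match:
            fields[key] = int(match.group(1))
    return fields


def read_self_status() -> dict:
    """Linux only: read the current process's own status file."""
    with open("/proc/self/status") as fh:
        return parse_status_kib(fh.read())


# Illustrative input only; these numbers are invented for the example.
sample = "Name:\tdemo\nVmHWM:\t    2048 kB\nVmRSS:\t    1024 kB\n"
usage = parse_status_kib(sample)
print({key: round(kib / 1024, 1) for key, kib in usage.items()})
```

Sampling `read_self_status()` before and after loading the model gives a post-load delta comparable to footnote ¹, relative to your own baseline.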
Full 800-clip eval on the ICASSP 2022 AEC Challenge blind test set (real recordings, not synthetic mixes):
v1.3 (current, 4.8 M):
| Scenario | n | AECMOS echo ↑ | AECMOS deg ↑ | blind ERLE ↑ | DNSMOS OVRL ↑ |
|---|---|---|---|---|---|
| doubletalk | 115 | 4.73 | 2.62 | 8.5 dB | 2.89 |
| doubletalk-with-movement | 185 | 4.67 | 2.43 | 8.3 dB | 2.85 |
| farend-singletalk | 107 | 3.69 | 4.83 | 50.9 dB | 1.94 |
| farend-singletalk-with-movement | 193 | 3.88 | 4.98 | 49.9 dB | 1.96 |
| nearend-singletalk | 200 | 5.00 | 4.18 | 2.4 dB | 3.17 |
v1.2 (compact alternative, 1.3 M):
| Scenario | n | AECMOS echo ↑ | AECMOS deg ↑ | blind ERLE ↑ | DNSMOS OVRL ↑ |
|---|---|---|---|---|---|
| doubletalk | 115 | 4.72 | 2.37 | 8.4 dB | 2.83 |
| doubletalk-with-movement | 185 | 4.65 | 2.30 | 8.1 dB | 2.79 |
| farend-singletalk | 107 | 3.78 | 4.91 | 45.7 dB | 1.80 |
| farend-singletalk-with-movement | 193 | 4.12 | 4.96 | 40.6 dB | 1.75 |
| nearend-singletalk | 200 | 5.00 | 4.16 | 2.1 dB | 3.17 |
v1.3 vs v1.2 deltas (same 800-clip set, same eval pipeline):
- Doubletalk deg MOS +0.25, dt-with-movement deg MOS +0.13 — the wider model + noise-floor-aware loss recipe noticeably reduces perceived speech degradation when both talkers are active. The primary v1.3 release goal.
- FE-ST-with-movement ERLE +9.3 dB, FE-ST ERLE +5.2 dB: v1.3 cancels far-end echo substantially harder. AECMOS echo MOS falls by 0.24 / 0.09 on the same scenes: the residual after cancellation rates rougher on AECMOS's perceptual scale even though there is numerically less of it. Some users will prefer v1.2's gentler trade-off on far-end-only scenes.
- Nearend-singletalk identical within noise (deg +0.02, OVRL +0.00) — wider capacity doesn't help (or hurt) when there's nothing to cancel.
- DNSMOS OVRL is up 0.06–0.21 on the doubletalk and far-end scenarios and unchanged on near-end single-talk: by DNS metrics the wider model is rated cleaner or equal on every scenario.
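The deltas in this list can be recomputed from the two result tables. The sketch below copies the table values verbatim and subtracts them; the variable and function names are hypothetical and exist only for this example.

```python
# Columns: AECMOS echo, AECMOS deg, blind ERLE (dB), DNSMOS OVRL.
# Values copied from the v1.3 and v1.2 tables above.
V13 = {
    "doubletalk": (4.73, 2.62, 8.5, 2.89),
    "doubletalk-with-movement": (4.67, 2.43, 8.3, 2.85),
    "farend-singletalk": (3.69, 4.83, 50.9, 1.94),
    "farend-singletalk-with-movement": (3.88, 4.98, 49.9, 1.96),
    "nearend-singletalk": (5.00, 4.18, 2.4, 3.17),
}
V12 = {
    "doubletalk": (4.72, 2.37, 8.4, 2.83),
    "doubletalk-with-movement": (4.65, 2.30, 8.1, 2.79),
    "farend-singletalk": (3.78, 4.91, 45.7, 1.80),
    "farend-singletalk-with-movement": (4.12, 4.96, 40.6, 1.75),
    "nearend-singletalk": (5.00, 4.16, 2.1, 3.17),
}
METRICS = ("echo", "deg", "erle_db", "ovrl")


def deltas(new: dict, old: dict) -> dict:
    """Per-scenario, per-metric difference new - old, rounded to 2 decimals."""
    return {
        scenario: {
            metric: round(n - o, 2)
            for metric, n, o in zip(METRICS, new[scenario], old[scenario])
        }
        for scenario in new
    }


d = deltas(V13, V12)
for scenario, row in d.items():
    print(f"{scenario:34s} {row}")
```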
For the original v1.2-vs-v1.1 release notes (the previous headline: echo MOS +0.80 / +0.72 on FE-ST and FE-ST-with-movement, near-end deg MOS +0.11), see the v1.2 git tag.
- AECMOS (Purin et al., ICASSP 2022) is Microsoft's non-intrusive AEC quality predictor. "Echo" rates how well echo was removed; "degradation" rates how clean the resulting speech is. 1–5 MOS scale, higher is better.
- Blind ERLE is 10·log10(E[mic²] / E[enh²]). Only meaningful on far-end single-talk where the input is echo-only; on scenes with active near-end speech it understates echo removal because both numerator and denominator are dominated by speech.
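The blind ERLE definition translates directly into code. The following is a minimal pure-Python sketch of the formula as written above, not the evaluation pipeline used for the tables; the function name and the `eps` guard are assumptions added for the example.

```python
import math


def blind_erle_db(mic, enh, eps=1e-12):
    """10*log10(E[mic^2] / E[enh^2]) over two equal-length sample sequences."""
    if len(mic) != len(enh) or not mic:
        raise ValueError("mic and enh must be non-empty and the same length")
    mic_power = sum(x * x for x in mic) / len(mic)
    enh_power = sum(x * x for x in enh) / len(enh)
    # eps guards against log(0) when the enhanced signal is digital silence.
    return 10.0 * math.log10((mic_power + eps) / (enh_power + eps))


# Toy far-end single-talk case: the "echo" is attenuated to 1 % of its amplitude.
mic = [math.sin(0.01 * n) for n in range(16_000)]
enh = [0.01 * x for x in mic]
print(f"{blind_erle_db(mic, enh):.1f} dB")
```

Because the measure compares total energy only, any near-end speech that survives enhancement raises the denominator and lowers the figure, which is why the doubletalk rows report single-digit values.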
PyTorch checkpoint integrity (SHA256):
22d3e2f33bb8b25ec1c6a928cfb741bb631d45bae2b3759684818b101c95878e localvqe-v1.3-4.8M.pt
ff6885e7c8d7d29a8ce963303dcd668ae0f2a7bdafae28631292fe6f06f7cd77 localvqe-v1.2-1.3M.pt
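A downloaded checkpoint can be checked against these digests with the standard library. This is a small sketch assuming the files sit in the current directory under their published names; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Digests copied from the list above.
EXPECTED_SHA256 = {
    "localvqe-v1.3-4.8M.pt":
        "22d3e2f33bb8b25ec1c6a928cfb741bb631d45bae2b3759684818b101c95878e",
    "localvqe-v1.2-1.3M.pt":
        "ff6885e7c8d7d29a8ce963303dcd668ae0f2a7bdafae28631292fe6f06f7cd77",
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large checkpoints need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True only if the file name is known and its digest matches."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


for name in EXPECTED_SHA256:
    if Path(name).exists():
        print(name, "OK" if verify(name) else "MISMATCH")
```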
ggml/ C++ streaming inference (GGML graph, CLI, C API, tests)
pytorch/ PyTorch reference implementation (model definition only)
obs-plugin/ OBS Studio audio filter wrapping liblocalvqe.so
CITATION.cff
LICENSE
flake.nix
Requires CMake ≥ 3.20 and a C++17 compiler. A Nix flake is provided:
git clone --recursive https://github.com/localai-org/LocalVQE.git
cd LocalVQE
# With Nix:
nix develop
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release
cmake --build ggml/build -j$(nproc)
# Without Nix — install cmake, gcc/clang, pkg-config, libsndfile, then:
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release
cmake --build ggml/build -j$(nproc)

Binaries land in ggml/build/bin/. The CPU build produces multiple
libggml-cpu-*.so variants (SSE4.2 / AVX2 / AVX-512) selected at runtime.
Keep the binaries and .so files together.
Add -DLOCALVQE_VULKAN=ON to the configure step. This composes with the
CPU build — an additional libggml-vulkan.so is produced in
ggml/build/bin/ and the runtime loader picks it up when a Vulkan ICD is
present, otherwise it falls back to the CPU variants.
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_VULKAN=ON
cmake --build ggml/build -j$(nproc)

The Nix flake's dev shell already includes vulkan-loader,
vulkan-headers, and shaderc. Without Nix, install the equivalents
from your distro (Debian: libvulkan-dev vulkan-headers glslc/shaderc).
./ggml/build/bin/localvqe localvqe-v1.2-1.3M-f32.gguf \
    --in-wav mic.wav ref.wav \
    --out-wav enhanced.wav

Expects 16 kHz mono PCM for both mic and far-end reference.
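Before running the CLI it can help to confirm that both inputs meet the 16 kHz mono requirement. The sketch below uses Python's standard `wave` module, which reads integer PCM WAV only; the function name `check_wav` is hypothetical and this check is not part of the LocalVQE tooling.

```python
import os
import wave


def check_wav(path, expected_rate=16_000):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate}"
            )
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems


# File names taken from the CLI example above; skipped if not present.
for name in ("mic.wav", "ref.wav"):
    if os.path.exists(name):
        issues = check_wav(name)
        print(name, "ok" if not issues else "; ".join(issues))
```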
The bench-run cmake target is the turnkey path: it builds bench,
downloads the released F32 model and a doubletalk mic/ref WAV pair from
HuggingFace into ggml/build/bench_assets/, and runs the benchmark.
# Configure once (Vulkan optional but recommended for GPU runs)
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_VULKAN=ON
# Discover backends + device indices
cmake --build ggml/build --target bench-list-devices
# Run on the default backend (CPU device 0, 10 iterations)
cmake --build ggml/build --target bench-run

To pick a specific backend or device, set the cache variables at configure time and rebuild the target:
# Vulkan device 0 (e.g. dGPU) with 30 iterations
cmake -S ggml -B ggml/build -DBENCH_BACKEND=Vulkan -DBENCH_DEVICE=0 -DBENCH_ITERS=30
cmake --build ggml/build --target bench-run
# Vulkan device 1 (e.g. iGPU)
cmake -S ggml -B ggml/build -DBENCH_DEVICE=1
cmake --build ggml/build --target bench-run

Sweeping every backend/device on the box is just a shell loop over the
indices bench-list-devices printed:
for dev in 0 1; do
    cmake -S ggml -B ggml/build -DBENCH_BACKEND=Vulkan -DBENCH_DEVICE=$dev
    cmake --build ggml/build --target bench-run
done

Or invoke the binary directly against your own WAV pair:
./ggml/build/bin/bench localvqe-v1.2-1.3M-f32.gguf \
    --backend Vulkan --device 0 \
    --in-wav mic.wav ref.wav --iters 10 --profile

cmake -S ggml -B ggml/build -DLOCALVQE_BUILD_SHARED=ON
cmake --build ggml/build -j$(nproc)

Produces liblocalvqe.so with the API in ggml/localvqe_api.h. See
ggml/example_purego_test.go for a Go / purego integration.
ggml/tests/test_regression.cpp is an end-to-end check: it runs
localvqe_process_f32 on a fixed seeded input through each published
.gguf and compares against a committed reference output, mirroring
the PyTorch suite under pytorch/tests/. Build, fetch both released
GGUFs from HuggingFace, and run via CTest:
cmake --build ggml/build --target test_regression regression-assets
ctest --test-dir ggml/build --output-on-failure

regression-assets reuses the same SHA256-verified download path as
bench-assets. Missing GGUFs make the corresponding test entry SKIP
rather than fail, so CI without network access still runs cleanly.
To refresh a reference output after an intentional graph change:
python ggml/tests/regenerate_fixtures.py \
    --gguf ggml/build/bench_assets/localvqe-v1.2-1.3M-f32.gguf

Calibrated Q4_K / Q8_0 weights are not yet published. The quantize
tool in the C++ build can produce GGUF variants from the F32 reference
for experimentation:
./ggml/build/bin/quantize localvqe-v1.2-1.3M-f32.gguf localvqe-v1.2-1.3M-q8.gguf Q8_0

Expect end-to-end quality loss until proper per-tensor selection and calibration have been worked through.
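To give a feel for where the quantisation error comes from, here is a simplified Python sketch of Q8_0-style block quantisation: each block of 32 weights shares one scale of max|x| / 127 and stores signed 8-bit integers. It is an illustration only and not the quantize tool's code. It assumes the scale is kept in full precision (GGML stores it as fp16) and uses Python's rounding, so results will differ slightly from a real Q8_0 file.

```python
import math


def quantize_q8_0(values, block_size=32):
    """Per-block symmetric 8-bit quantisation: scale = max|x| / 127."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(x) for x in block) / 127.0
        if scale == 0.0:
            quants = [0] * len(block)  # all-zero block
        else:
            quants = [round(x / scale) for x in block]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from (scale, int8 list) blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


# Synthetic "weights" purely for demonstration.
weights = [0.1 * math.sin(n) for n in range(64)]
blocks = quantize_q8_0(weights)
restored = dequantize_q8_0(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs round-trip error: {max_err:.6f}")
```

The per-weight error is bounded by half the block scale, so a single large weight in a block coarsens the resolution of the other 31, which is one reason per-tensor selection matters.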
obs-plugin/ wraps liblocalvqe.so as an OBS Studio audio source
filter. Once installed it appears as "LocalVQE (AEC + Noise +
Dereverb)" in any audio source's filter list. The bundled v1.3 GGUF
is preselected on first use, so noise suppression and dereverberation
work out of the box; AEC additionally requires picking a reference
source — typically "Desktop Audio" — so the model knows what's
playing through the speakers.
The flake provides a dedicated dev shell with libobs alongside the parent build deps:
nix develop .#obs-plugin
# Parent library (shared); the plugin links against it.
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_BUILD_SHARED=ON
cmake --build ggml/build -j$(nproc)
# Stage the bundled GGUF so the plugin's default-model resolver finds it.
cmake --build ggml/build --target regression-assets
cp ggml/build/bench_assets/localvqe-v1.3-4.8M-f32.gguf obs-plugin/data/
# Plugin
cmake -S obs-plugin -B obs-plugin/build -DCMAKE_BUILD_TYPE=Release
cmake --build obs-plugin/build -j$(nproc)
cmake --install obs-plugin/build

The install copies the plugin .so along with liblocalvqe.so and
every libggml-cpu-*.so variant into
~/.config/obs-studio/plugins/obs-localvqe/, so the tree is
self-contained — no LD_LIBRARY_PATH, no system-wide install of
LocalVQE required. Pass
-DOBS_PLUGIN_DESTINATION=/usr/lib/x86_64-linux-gnu/obs-plugins/obs-localvqe
to the plugin's configure step for a system-wide install instead.
Restart OBS, right-click any audio source → Filters → Add → LocalVQE.
| Property | Default | Notes |
|---|---|---|
| Model (.gguf) | bundled | Auto-resolved to data/localvqe-v1.3-4.8M-f32.gguf if staged; otherwise browse to a path. |
| Inference threads | 4 | Sweet spot on Zen4 (see the benchmark table). Changing this rebuilds the model ctx. |
| Residual noise gate | off | Mutes hops below an RMS threshold; cleans up quiet model residual during silence. |
| Gate threshold (dBFS) | -45 | Only used when the gate is on. -45 mutes the typical -60 dBFS residual but preserves speech. |
| Reference source | (none) | For AEC: pick the OBS source feeding your speakers (usually "Desktop Audio"). Off → NS + dereverb only. |
Without a reference, the AEC head sees silence and contributes nothing — the filter still runs noise suppression and dereverberation on the mic alone. With a reference, the plugin time-aligns it to the mic queue via OBS timestamps; the model's AlignBlock then absorbs the remaining speaker→mic acoustic delay (up to ~1 s on v1.2 and v1.3).
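The residual noise gate in the properties table can be pictured as a per-hop RMS test. The sketch below is an assumption about the general technique, not the plugin's implementation: it hard-mutes a hop whose RMS falls below the threshold, with no hysteresis, hold time, or fade, any of which the real filter may use. Function names are hypothetical.

```python
import math


def rms_dbfs(hop, floor_db=-120.0):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not hop:
        return floor_db
    rms = math.sqrt(sum(x * x for x in hop) / len(hop))
    return 20.0 * math.log10(rms) if rms > 0.0 else floor_db


def gate_hop(hop, threshold_dbfs=-45.0):
    """Mute the whole hop when its RMS is below the threshold."""
    if rms_dbfs(hop) < threshold_dbfs:
        return [0.0] * len(hop)
    return list(hop)


residual = [0.001] * 256  # about -60 dBFS, like the quiet residual in the table
speech = [0.1] * 256      # about -20 dBFS
print(rms_dbfs(residual), gate_hop(residual)[:3])
print(rms_dbfs(speech), gate_hop(speech)[:3])
```

With the default of -45 dBFS the first hop is muted and the second passes through unchanged, matching the behaviour the table describes.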
Tested on Linux. macOS uses the same POSIX dladdr path and is
expected to work unchanged. The Windows path is implemented (via
GetModuleHandleEx + GetModuleFileName in ensure_backends_loaded)
but is currently unverified — please open an issue if you hit a
problem there.
pytorch/ contains the model definition used to train and export the
weights. It's provided for verification, ablation, and downstream research
— not for end-user inference, which should go through the GGML build.
cd pytorch
pip install -r requirements.txt
python -c "
import yaml, torch
from localvqe.model import LocalVQE
cfg = yaml.safe_load(open('configs/default.yaml'))
model = LocalVQE(**cfg['model'], n_freqs=cfg['audio']['n_freqs'])
print(sum(p.numel() for p in model.parameters()))
"

If you use LocalVQE in academic work, please cite the repository via the
CITATION.cff file at the root — GitHub renders a "Cite this repository"
button that produces APA and BibTeX entries automatically.
For a DOI, we recommend citing a specific release via Zenodo, which mints a DOI per GitHub release. Please also cite the upstream DeepVQE paper:
@inproceedings{indenbom2023deepvqe,
title = {DeepVQE: Real Time Deep Voice Quality Enhancement for Joint
Acoustic Echo Cancellation, Noise Suppression and Dereverberation},
author = {Indenbom, Evgenii and Beltr{\'a}n, Nicolae-C{\u{a}}t{\u{a}}lin
and Chernov, Mykola and Aichner, Robert},
booktitle = {Interspeech},
year = {2023},
doi = {10.21437/Interspeech.2023-2176}
}

Published weights are trained on data from the ICASSP 2023 Deep Noise Suppression Challenge (Microsoft, CC BY 4.0) and fine-tuned on the ICASSP 2022/2023 Acoustic Echo Cancellation Challenge.
Training data was filtered by DNSMOS perceived-quality scores, which can misclassify distressed speech (screaming, crying) as noise. LocalVQE may attenuate or distort such signals and must not be relied upon for emergency call or safety-critical applications.
Apache License 2.0 — see LICENSE.