Skip to content

localai-org/LocalVQE

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

47 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LocalVQE

Open in Spaces Model on HF

Local Voice Quality Enhancement — a compact neural model for joint acoustic echo cancellation (AEC), noise suppression, and dereverberation of 16 kHz speech, designed to run on commodity CPUs in real time.

  • Two sizes — choose by CPU budget:
    • v1.3 (current) — 4.8 M parameters (~19 MB F32), ~3.3 ms per 16 ms frame on Zen4 (4 threads), ≈4.7× realtime.
    • v1.2 — 1.3 M parameters (~5 MB F32), ~1.6 ms per 16 ms frame on Zen4 (4 threads), ≈9.7× realtime.
  • Causal, streaming: 256-sample hop, 16 ms algorithmic latency
  • F32 reference inference in C++ via GGML; PyTorch reference included for verification and research

Try it: https://huggingface.co/spaces/LocalAI-io/LocalVQE-demo.

LocalVQE is a derivative of DeepVQE (Indenbom et al., Interspeech 2023) — smaller, GGML-native, and tuned for streaming CPU inference.

A concrete example

Picture a video call from a laptop. Your microphone picks up three things alongside your voice:

  1. The remote participant's voice, played back through your speakers and caught again by your mic — this is the echo. Without cancellation they hear themselves a fraction of a second later.
  2. Your own voice bouncing off walls, desk, and monitor before reaching the mic — this is reverberation, the "tunnel" or "bathroom" sound that makes you feel far away from the listener.
  3. A fan, keyboard clatter, a dog barking, or traffic outside — plain background noise.

LocalVQE removes all three in a single causal pass, frame by frame, on the CPU, so only your voice reaches the far end.

Why this, and not a classical AEC/NS stack?

Hand-tuned DSP pipelines (NLMS/AP/Kalman AEC, Wiener/spectral-subtraction NS, MCRA noise tracking, RLS dereverb) can run in tens of microseconds per frame and remain a strong baseline when the acoustic path is benign. LocalVQE is interesting when you want:

  • Robustness to non-linear echo paths (small loudspeakers, handheld devices, plastic laptop chassis) where linear AEC leaves residual echo.
  • Non-stationary noise suppression (babble, keyboards, fans changing speed) that energy-based noise estimators struggle with.
  • One model, many conditions — no per-device tuning of step sizes, forgetting factors, or VAD thresholds.
  • A single deterministic causal pass — no double-talk detector, no adaptation state that can diverge.

The trade-off is CPU: a classical stack might cost ~0.1 ms/frame, LocalVQE ~1–2 ms/frame. On anything larger than a microcontroller that's still a small fraction of a real-time budget.

Why this, and not DeepVQE?

Microsoft never released DeepVQE — no weights, no reference implementation, no streaming runtime. We re-implemented it from the paper as a GGML graph at richiejp/deepvqe-ggml (the full-width ~7.5 M-parameter version) before starting LocalVQE. LocalVQE is the same idea rebuilt for streaming CPU inference, and published in two sizes: a 1.3 M-parameter compact build (v1.2, ~5 MB F32) for tight CPU budgets, and a 4.8 M-parameter wider build (v1.3, ~19 MB F32) that filters noise better on some clips at ~2× the per-hop cost. Both are small enough to run real time on commodity CPUs.

Model Weights

Pre-trained weights are published on Hugging Face at LocalAI-io/LocalVQE:

File Description
localvqe-v1.3-4.8M-f32.gguf F32 GGUF — what the C++ engine loads (current default).
localvqe-v1.3-4.8M.pt PyTorch checkpoint — for verification, ablation, and downstream research.
localvqe-v1.2-1.3M-f32.gguf Compact alternative — same architecture family, ~1/4 the cost per hop.
localvqe-v1.2-1.3M.pt PyTorch checkpoint for the compact variant.
localvqe-v1.1-1.3M-f32.gguf Older release.
localvqe-v1-1.3M-f32.gguf Original release.

The current release is v1.3. It widens the encoder/decoder (mic channels [2,112,32,104,96,152], far [2,64,32], bottleneck 256) and trains from scratch under a noise-floor-aware loss recipe. On doubletalk it filters noise better than v1.2 (deg MOS +0.25 on the stratified dev sample, with stronger ERLE). On far-end-only echo it cancels harder but the residual rates rougher in AECMOS — some users will prefer v1.2's gentler trade-off on FE-ST scenes. v1.2 stays on the repo as the small/fast option (~1/4 the per-hop cost). Both reuse v1.2's 1024 ms echo-search window.

Streaming latency

Per-hop, 16 kHz / 256-sample hop → 16 ms budget. Each hop is a full ggml_backend_graph_compute. Run any of these locally with the bench-run cmake target — see Benchmark below. 30 iters × 625 hops/iter = 18 750 hops per row.

v1.3 (current — 4.8 M, wider encoder/decoder, bn 256)

Hardware Backend Threads Hop p50 Hop p99 RT factor
Ryzen 9 7900 (Zen4 desktop) CPU 1 9.73 ms 14.48 ms 1.58×
Ryzen 9 7900 (Zen4 desktop) CPU 2 5.41 ms 5.62 ms 2.95×
Ryzen 9 7900 (Zen4 desktop) CPU 4 3.21 ms 3.42 ms 4.97×
Ryzen 9 7900 (Zen4 desktop) CPU 8 3.47 ms 3.80 ms 4.59×
Ryzen 9 7900 (Zen4 desktop) CPU 16 3.79 ms 4.06 ms 4.19×
Ryzen 9 7900 + RADV iGPU (Raphael) Vulkan 8.71 ms 9.15 ms 1.83×
Ryzen 9 7900 + RTX 5070 Ti (dGPU) Vulkan 2.57 ms 4.21 ms 6.07×

The wider model is ~2× the per-hop cost of v1.2 in matching configurations — the dGPU (RTX 5070 Ti) ends up the fastest option for v1.3 by ~1.25× vs 4-thread CPU. The 1-thread case is the worst, still real-time (RT 1.58×) but with little margin; running v1.3 on a low-core / power-constrained device should use v1.2 instead. Re-runs on other CPUs (Apple M4, Alder Lake, mobile Zen3+) will be published as we collect them — until then the v1.2 sweep below is representative shape-wise and expects roughly the same ~2× multiplier.

v1.2 (compact alternative — 1.3 M, 1024 ms echo-search window)

Hardware Backend Threads Hop p50 Hop p99 Hop max RT factor
Ryzen 9 7900 (Zen4 desktop) CPU 1 4.28 ms 4.85 ms 6.23 ms 3.72×
Ryzen 9 7900 (Zen4 desktop) CPU 2 2.59 ms 3.80 ms 3.81 ms 6.09×
Ryzen 9 7900 (Zen4 desktop) CPU 4 1.65 ms 2.91 ms 4.57 ms 8.90×
Ryzen 9 7900 (Zen4 desktop) CPU 8 1.93 ms 2.41 ms 6.91 ms ‡ 8.22×
Ryzen 9 7900 (Zen4 desktop) CPU 16 2.09 ms 2.22 ms 6.43 ms ‡ 7.69×
Ryzen 9 7900 + RADV iGPU (Raphael) Vulkan 6.10 ms 6.53 ms 6.24 ms 2.61×
Ryzen 9 7900 + RTX 5070 Ti (dGPU) Vulkan 1.96 ms 3.64 ms 5.42 ms 7.85×
Ryzen 7 6800U (Zen3+ laptop) CPU 1 4.69 ms 6.08 ms 19.31 ms ‡ 3.37×
Ryzen 7 6800U (Zen3+ laptop) CPU 4 2.11 ms 2.77 ms 4.90 ms 7.44×
Ryzen 7 6800U (Zen3+ laptop) CPU 8 1.94 ms 2.60 ms 5.52 ms 7.94×
Ryzen 7 6800U + RADV iGPU (Rembrandt) Vulkan 9.84 ms 14.75 ms 20.87 ms ‡ 1.53×

The wider echo-search window v1.2 introduced (1024 ms vs v1.1's 512 ms) costs ~20–25 % per-hop on CPU vs v1.1.

v1.1 (previous — 512 ms echo-search window)

Hardware Backend Threads Hop p50 Hop p99 Hop max RT factor
Ryzen 9 7900 (Zen4 desktop) CPU 1 3.40 ms 3.57 ms 5.06 ms 4.7×
Ryzen 9 7900 (Zen4 desktop) CPU 2 2.07 ms 2.25 ms 3.65 ms 7.7×
Ryzen 9 7900 (Zen4 desktop) CPU 4 1.32 ms 1.57 ms 6.91 ms ‡ 12.0×
Ryzen 9 7900 + RADV iGPU (Raphael) Vulkan 4.43 ms 4.62 ms 5.07 ms 3.60×
Ryzen 9 7900 + RTX 5070 Ti (dGPU) Vulkan 1.79 ms 3.41 ms 4.14 ms 8.63×
Apple M4 (4P + 6E, macOS 25.3) CPU 1 2.98 ms 3.16 ms 19.11 ms ‡ 5.4×
Apple M4 (4P + 6E, macOS 25.3) CPU 2 1.82 ms 1.93 ms 3.17 ms 8.8×
Apple M4 (4P + 6E, macOS 25.3) CPU 4 1.11 ms 1.81 ms 10.41 ms ‡ 14.4×
Core i5-14500 (Alder Lake-S) CPU 1 3.25 ms 3.53 ms 6.73 ms 4.93×
Core i5-14500 (Alder Lake-S) CPU 2 2.55 ms 2.81 ms 5.20 ms 6.23×
Core i5-14500 (Alder Lake-S) CPU 3 2.26 ms 3.09 ms 3.85 ms 7.06×
Core i5-14500 (Alder Lake-S) CPU 4 2.02 ms 2.89 ms 3.59 ms 7.79×
Core i5-14500 + Arc A770 (dGPU) Vulkan 10.90 ms 12.00 ms 13.38 ms 1.48×
Core i5-14500 + UHD 770 (iGPU) Vulkan 9.02 ms 11.77 ms 17.93 ms 1.74×

Adding cores hits diminishing returns quickly: even the wider v1.3 graph is small enough that thread-launch and synchronisation overhead start to dominate beyond ≈4 threads on these CPUs. The Zen4 sweeps show it plainly on both versions — the 1→4 thread step gives a 2.59× speedup on v1.2 and a 3.03× speedup on v1.3, but 4→8 is a regression on both and 8→16 worse still. The 6800U mobile Zen3+ on v1.2 agrees: 1→4 is a 2.21× speedup, 4→8 only buys another 7%. The library's default thread count is min(4, sched_getaffinity) — auto-capped at 4 with respect for taskset, cgroup, and VM CPU limits, so over-subscription doesn't happen on resource-constrained hosts. Pass a non-zero value to localvqe_options_set_threads to override.

‡ Outliers are single hops early in the first iteration (cold caches); p99 is representative of steady-state.

Vulkan p50/p95/p99 are typically tight, but worst-case single-hop latency on a shared desktop is sensitive to external GPU clients (display compositor, browser). On a dedicated embedded device with no compositor contending for the queue, expect the quieter end of the range.

The bench binary prints the top-10 slowest hops with (iteration, hop-in-iteration) coordinates so you can check whether outliers cluster at post-localvqe_reset() boundaries (cold path) or scatter through the stream (external contention). In practice we see the latter.

Memory footprint (CPU)

bench reports process RSS from /proc/self/status alongside the internal allocator accounting from --profile. The numbers are essentially thread-count-invariant — both 1 and 16 threads land on the same peak within a few hundred KiB — so one row per model suffices.

Model Post-load delta ¹ Peak RSS (VmHWM) ² Internal total resident ³
v1.3 (4.8 M) +24.4 MiB 34.1 MiB 23.0 MiB
v1.2 (1.3 M) +10.0 MiB 19.6 MiB 8.7 MiB

¹ RSS added by localvqe_new_with_options + CPU backend init, on top of the ~7 MiB binary/libs baseline measured by bench itself. This is the portable "working set the model brings" number; the absolute peak will depend on your host process baseline.

² VmHWM after warmup + sustained streaming on a Zen4 desktop (Ryzen 9 7900). v1.3 is ~1.75× v1.2 in RSS terms despite carrying ~3.7× more parameters — activation, history-scratch, and per-frame history buffers don't scale with channel width the way the weight buffer does.

³ Backend-internal accounting from bench --profile: the sum of the weights buffer, activation buffer (gallocr), and one-shot history scratch. Excludes the double-buffered history-tensor swap pages (already counted in the activation buffer for the read side).

For GPU backends (Vulkan), RSS understates real usage — VRAM isn't visible in /proc/self/status. Use bench --profile on the GPU build to read the same weights/activation/scratch breakdown from the backend-internal allocator.

Validation Results

Full 800-clip eval on the ICASSP 2022 AEC Challenge blind test set (real recordings, not synthetic mixes):

v1.3 (current, 4.8 M):

Scenario n AECMOS echo ↑ AECMOS deg ↑ blind ERLE ↑ DNSMOS OVRL ↑
doubletalk 115 4.73 2.62 8.5 dB 2.89
doubletalk-with-movement 185 4.67 2.43 8.3 dB 2.85
farend-singletalk 107 3.69 4.83 50.9 dB 1.94
farend-singletalk-with-movement 193 3.88 4.98 49.9 dB 1.96
nearend-singletalk 200 5.00 4.18 2.4 dB 3.17

v1.2 (compact alternative, 1.3 M):

Scenario n AECMOS echo ↑ AECMOS deg ↑ blind ERLE ↑ DNSMOS OVRL ↑
doubletalk 115 4.72 2.37 8.4 dB 2.83
doubletalk-with-movement 185 4.65 2.30 8.1 dB 2.79
farend-singletalk 107 3.78 4.91 45.7 dB 1.80
farend-singletalk-with-movement 193 4.12 4.96 40.6 dB 1.75
nearend-singletalk 200 5.00 4.16 2.1 dB 3.17

v1.3 vs v1.2 deltas (same 800-clip set, same eval pipeline):

  • Doubletalk deg MOS +0.25, dt-with-movement deg MOS +0.13 — the wider model + noise-floor-aware loss recipe noticeably reduces perceived speech degradation when both talkers are active. The primary v1.3 release goal.
  • FE-ST-with-movement ERLE +9.3 dB, FE-ST ERLE +5.2 dB — v1.3 cancels far-end echo substantially harder. AECMOS echo MOS drops −0.24 / −0.09 at the same time: the residual after cancellation rates rougher on AECMOS's perceptual scale even though there's numerically less of it. Some users will prefer v1.2's gentler trade-off on far-end-only scenes.
  • Nearend-singletalk identical within noise (deg +0.02, OVRL +0.00) — wider capacity doesn't help (or hurt) when there's nothing to cancel.
  • DNSMOS OVRL is up 0.04–0.21 across all scenarios — the wider model produces consistently cleaner-rated output by DNS metrics.

For the original v1.2-vs-v1.1 release notes (the previous headline: echo MOS +0.80 / +0.72 on FE-ST and FE-ST-with-movement, near-end deg MOS +0.11), see the v1.2 git tag.

  • AECMOS (Purin et al., ICASSP 2022) is Microsoft's non-intrusive AEC quality predictor. "Echo" rates how well echo was removed; "degradation" rates how clean the resulting speech is. 1–5 MOS scale, higher is better.
  • Blind ERLE is 10·log10(E[mic²] / E[enh²]). Only meaningful on far-end single-talk where the input is echo-only; on scenes with active near-end speech it understates echo removal because both numerator and denominator are dominated by speech.

PyTorch checkpoint integrity (SHA256):

22d3e2f33bb8b25ec1c6a928cfb741bb631d45bae2b3759684818b101c95878e  localvqe-v1.3-4.8M.pt
ff6885e7c8d7d29a8ce963303dcd668ae0f2a7bdafae28631292fe6f06f7cd77  localvqe-v1.2-1.3M.pt

Repository Layout

ggml/        C++ streaming inference (GGML graph, CLI, C API, tests)
pytorch/     PyTorch reference implementation (model definition only)
obs-plugin/  OBS Studio audio filter wrapping liblocalvqe.so
CITATION.cff
LICENSE
flake.nix

Building the C++ Inference Engine

Requires CMake ≥ 3.20 and a C++17 compiler. A Nix flake is provided:

git clone --recursive https://github.com/localai-org/LocalVQE.git
cd LocalVQE

# With Nix:
nix develop
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release
cmake --build ggml/build -j$(nproc)

# Without Nix — install cmake, gcc/clang, pkg-config, libsndfile, then:
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release
cmake --build ggml/build -j$(nproc)

Binaries land in ggml/build/bin/. The CPU build produces multiple libggml-cpu-*.so variants (SSE4.2 / AVX2 / AVX-512) selected at runtime. Keep the binaries and .so files together.

Vulkan backend (embedded / integrated-GPU targets)

Add -DLOCALVQE_VULKAN=ON to the configure step. This composes with the CPU build — an additional libggml-vulkan.so is produced in ggml/build/bin/ and the runtime loader picks it up when a Vulkan ICD is present, otherwise it falls back to the CPU variants.

cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_VULKAN=ON
cmake --build ggml/build -j$(nproc)

The Nix flake's dev shell already includes vulkan-loader, vulkan-headers, and shaderc. Without Nix, install the equivalents from your distro (Debian: libvulkan-dev vulkan-headers glslc/shaderc).

Running Inference

CLI

./ggml/build/bin/localvqe localvqe-v1.2-1.3M-f32.gguf \
    --in-wav mic.wav ref.wav \
    --out-wav enhanced.wav

Expects 16 kHz mono PCM for both mic and far-end reference.

Benchmark

The bench-run cmake target is the turnkey path: it builds bench, downloads the released F32 model and a doubletalk mic/ref WAV pair from HuggingFace into ggml/build/bench_assets/, and runs the benchmark.

# Configure once (Vulkan optional but recommended for GPU runs)
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_VULKAN=ON

# Discover backends + device indices
cmake --build ggml/build --target bench-list-devices

# Run on the default backend (CPU device 0, 10 iterations)
cmake --build ggml/build --target bench-run

To pick a specific backend or device, set the cache variables at configure time and rebuild the target:

# Vulkan device 0 (e.g. dGPU) with 30 iterations
cmake -S ggml -B ggml/build -DBENCH_BACKEND=Vulkan -DBENCH_DEVICE=0 -DBENCH_ITERS=30
cmake --build ggml/build --target bench-run

# Vulkan device 1 (e.g. iGPU)
cmake -S ggml -B ggml/build -DBENCH_DEVICE=1
cmake --build ggml/build --target bench-run

Sweeping every backend/device on the box is just a shell loop over the indices bench-list-devices printed:

for dev in 0 1; do
    cmake -S ggml -B ggml/build -DBENCH_BACKEND=Vulkan -DBENCH_DEVICE=$dev
    cmake --build ggml/build --target bench-run
done

Or invoke the binary directly against your own WAV pair:

./ggml/build/bin/bench localvqe-v1.2-1.3M-f32.gguf \
    --backend Vulkan --device 0 \
    --in-wav mic.wav ref.wav --iters 10 --profile

Shared Library (C API)

cmake -S ggml -B ggml/build -DLOCALVQE_BUILD_SHARED=ON
cmake --build ggml/build -j$(nproc)

Produces liblocalvqe.so with the API in ggml/localvqe_api.h. See ggml/example_purego_test.go for a Go / purego integration.

Regression test

ggml/tests/test_regression.cpp is an end-to-end check: it runs localvqe_process_f32 on a fixed seeded input through each published .gguf and compares against a committed reference output, mirroring the PyTorch suite under pytorch/tests/. Build, fetch both released GGUFs from HuggingFace, and run via CTest:

cmake --build ggml/build --target test_regression regression-assets
ctest --test-dir ggml/build --output-on-failure

regression-assets reuses the same SHA256-verified download path as bench-assets. Missing GGUFs make the corresponding test entry SKIP rather than fail, so CI without network access still runs cleanly.

To refresh a reference output after an intentional graph change:

python ggml/tests/regenerate_fixtures.py \
    --gguf ggml/build/bench_assets/localvqe-v1.2-1.3M-f32.gguf

Quantizing (experimental)

Calibrated Q4_K / Q8_0 weights are not yet published. The quantize tool in the C++ build can produce GGUF variants from the F32 reference for experimentation:

./ggml/build/bin/quantize localvqe-v1.2-1.3M-f32.gguf localvqe-v1.2-1.3M-q8.gguf Q8_0

Expect end-to-end quality loss until proper per-tensor selection and calibration have been worked through.

OBS Studio Plugin

obs-plugin/ wraps liblocalvqe.so as an OBS Studio audio source filter. Once installed it appears as "LocalVQE (AEC + Noise + Dereverb)" in any audio source's filter list. The bundled v1.3 GGUF is preselected on first use, so noise suppression and dereverberation work out of the box; AEC additionally requires picking a reference source — typically "Desktop Audio" — so the model knows what's playing through the speakers.

The flake provides a dedicated dev shell with libobs alongside the parent build deps:

nix develop .#obs-plugin

# Parent library (shared); the plugin links against it.
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_BUILD_SHARED=ON
cmake --build ggml/build -j$(nproc)

# Stage the bundled GGUF so the plugin's default-model resolver finds it.
cmake --build ggml/build --target regression-assets
cp ggml/build/bench_assets/localvqe-v1.3-4.8M-f32.gguf obs-plugin/data/

# Plugin
cmake -S obs-plugin -B obs-plugin/build -DCMAKE_BUILD_TYPE=Release
cmake --build obs-plugin/build -j$(nproc)
cmake --install obs-plugin/build

The install copies the plugin .so along with liblocalvqe.so and every libggml-cpu-*.so variant into ~/.config/obs-studio/plugins/obs-localvqe/, so the tree is self-contained — no LD_LIBRARY_PATH, no system-wide install of LocalVQE required. Pass -DOBS_PLUGIN_DESTINATION=/usr/lib/x86_64-linux-gnu/obs-plugins/obs-localvqe to the plugin's configure step for a system-wide install instead.

Restart OBS, right-click any audio source → Filters → Add → LocalVQE.

Property Default Notes
Model (.gguf) bundled Auto-resolved to data/localvqe-v1.3-4.8M-f32.gguf if staged; otherwise browse to a path.
Inference threads 4 Sweet spot on Zen4 (see the benchmark table). Changing this rebuilds the model ctx.
Residual noise gate off Mutes hops below an RMS threshold; cleans up quiet model residual during silence.
Gate threshold (dBFS) -45 Only used when the gate is on. -45 mutes the typical -60 dBFS residual but preserves speech.
Reference source (none) For AEC: pick the OBS source feeding your speakers (usually "Desktop Audio"). Off → NS + dereverb only.

Without a reference, the AEC head sees silence and contributes nothing — the filter still runs noise suppression and dereverberation on the mic alone. With a reference, the plugin time-aligns it to the mic queue via OBS timestamps; the model's AlignBlock then absorbs the remaining speaker→mic acoustic delay (up to ~1 s on v1.2 and v1.3).

Tested on Linux. macOS uses the same POSIX dladdr path and is expected to work unchanged. The Windows path is implemented (via GetModuleHandleEx + GetModuleFileName in ensure_backends_loaded) but is currently unverified — please open an issue if you hit a problem there.

PyTorch Reference

pytorch/ contains the model definition used to train and export the weights. It's provided for verification, ablation, and downstream research — not for end-user inference, which should go through the GGML build.

cd pytorch
pip install -r requirements.txt
python -c "
import yaml, torch
from localvqe.model import LocalVQE
cfg = yaml.safe_load(open('configs/default.yaml'))
model = LocalVQE(**cfg['model'], n_freqs=cfg['audio']['n_freqs'])
print(sum(p.numel() for p in model.parameters()))
"

Citing LocalVQE

If you use LocalVQE in academic work, please cite the repository via the CITATION.cff file at the root — GitHub renders a "Cite this repository" button that produces APA and BibTeX entries automatically.

For a DOI, we recommend citing a specific release via Zenodo, which mints a DOI per GitHub release. Please also cite the upstream DeepVQE paper:

@inproceedings{indenbom2023deepvqe,
  title     = {DeepVQE: Real Time Deep Voice Quality Enhancement for Joint
               Acoustic Echo Cancellation, Noise Suppression and Dereverberation},
  author    = {Indenbom, Evgenii and Beltr{\'a}n, Nicolae-C{\u{a}}t{\u{a}}lin
               and Chernov, Mykola and Aichner, Robert},
  booktitle = {Interspeech},
  year      = {2023},
  doi       = {10.21437/Interspeech.2023-2176}
}

Dataset Attribution

Published weights are trained on data from the ICASSP 2023 Deep Noise Suppression Challenge (Microsoft, CC BY 4.0) and fine-tuned on the ICASSP 2022/2023 Acoustic Echo Cancellation Challenge.

Safety Note

Training data was filtered by DNSMOS perceived-quality scores, which can misclassify distressed speech (screaming, crying) as noise. LocalVQE may attenuate or distort such signals and must not be relied upon for emergency call or safety-critical applications.

License

Apache License 2.0 — see LICENSE.

About

Lean neural real-time acoustic echo cancellation with soft delay estimation - GGML and PyTorch inference

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors