
Commit b06f578

docs: add Voice Activity Detector guide
1 parent 4545194

3 files changed: 112 additions (+), 2 deletions (-)

docs/_sidebar.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -26,6 +26,8 @@
 - Monitoring and Debugging
   - [RTC Stats](guide/rtc_stats.md)
   - [Logging](guide/logging.md)
+- Utilities
+  - [Voice Activity Detector](guide/voice_activity_detector.md)
 
 - [**Build Notes**](build.md)
 - [**Changelog**](changelog.md)
```

docs/guide/overview.md

Lines changed: 6 additions & 2 deletions

```diff
@@ -26,14 +26,18 @@ This section provides detailed guides for various features of the webrtc-java library
 
 - [Data Channels](guide/data_channels.md) - Sending and receiving arbitrary data between peers
 
+## Networking and ICE
+
+- [Port Allocator Config](guide/port_allocator_config.md) - Restrict ICE port ranges and control candidate gathering behavior
+
 ## Monitoring and Debugging
 
 - [RTC Stats](guide/rtc_stats.md) - Monitoring connection quality and performance
 - [Logging](guide/logging.md) - Configuring and using the logging system
 
-## Networking and ICE
+## Utilities
 
-- [Port Allocator Config](guide/port_allocator_config.md) - Restrict ICE port ranges and control candidate gathering behavior
+- [Voice Activity Detector](guide/voice_activity_detector.md) - Detect speech activity in PCM audio streams
 
 ## Additional Resources
 
```
docs/guide/voice_activity_detector.md (new file)

Lines changed: 104 additions & 0 deletions

# Voice Activity Detector

The Voice Activity Detector (VAD) helps you determine when speech is present in an audio stream. It analyzes short chunks of PCM audio and returns the probability that the chunk contains voice.

This can be used to:

- Drive push-to-talk or auto-mute logic
- Skip encoding/sending silence to save bandwidth
- Trigger UI indicators when the user is speaking

API: `dev.onvoid.webrtc.media.audio.VoiceActivityDetector`

## Overview

`VoiceActivityDetector` exposes a minimal API:

- `process(byte[] audio, int samplesPerChannel, int sampleRate)`: Analyze one audio frame.
- `getLastVoiceProbability()`: Retrieve the probability (0.0–1.0) that the last processed frame contained voice.
- `dispose()`: Release native resources. Always call this when done.

Internally, the VAD uses a native implementation optimized for real-time analysis. The class itself does not perform resampling or channel mixing, so provide audio matching the given `sampleRate` and expected format.
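
Since `process` takes a raw sample count rather than a frame duration, it can help to see the buffer-sizing arithmetic spelled out. A minimal sketch (the variable names are illustrative, not part of the library API):

```java
// Buffer sizing for one 10 ms frame of 16-bit mono PCM
// at each sample rate the VAD accepts.
int frameMs = 10;

for (int sampleRate : new int[] { 8000, 16000, 32000, 48000 }) {
    int samplesPerChannel = sampleRate * frameMs / 1000;
    int bytesPerFrame = samplesPerChannel * 2; // 2 bytes per 16-bit sample

    System.out.println(sampleRate + " Hz -> " + samplesPerChannel
            + " samples, " + bytesPerFrame + " bytes per frame");
}
```

At 16 kHz this yields the 160-sample, 320-byte frames used in the usage examples.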

## Audio format expectations

- PCM signed 16-bit little-endian (the typical `byte[]` layout produced by microphone capture via this library)
- Mono is recommended; if you have stereo, downmix to mono before calling `process`, or pass the samples-per-channel count accordingly
- Frame size: commonly 10 ms per call (e.g., 160 samples at 16 kHz)
- Supported sample rates: 8 kHz, 16 kHz, 32 kHz, and 48 kHz (use one of these for best results)
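
If your capture source is stereo, one way to downmix before calling `process` is to average each left/right sample pair. A sketch under the 16-bit little-endian assumption above (the `downmixToMono` helper is illustrative, not part of the library):

```java
// Illustrative helper: downmix interleaved 16-bit little-endian stereo PCM
// to mono by averaging each left/right sample pair.
static byte[] downmixToMono(byte[] stereo) {
    byte[] mono = new byte[stereo.length / 2];

    for (int in = 0, out = 0; in + 3 < stereo.length; in += 4, out += 2) {
        short left  = (short) ((stereo[in + 1] << 8) | (stereo[in] & 0xFF));
        short right = (short) ((stereo[in + 3] << 8) | (stereo[in + 2] & 0xFF));
        short mixed = (short) ((left + right) / 2);

        mono[out]     = (byte) mixed;        // low byte first (little-endian)
        mono[out + 1] = (byte) (mixed >> 8); // then high byte
    }
    return mono;
}
```

Averaging halves the worst-case amplitude, which also guards against overflow when both channels are near full scale.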

## Basic usage

```java
import dev.onvoid.webrtc.media.audio.VoiceActivityDetector;

// Create the detector
VoiceActivityDetector vad = new VoiceActivityDetector();

try {
    // Example parameters
    int sampleRate = 16000;  // 16 kHz
    int frameMs = 10;        // 10 ms frames
    int samplesPerChannel = sampleRate * frameMs / 1000; // 160 samples

    // audioFrame must contain 16-bit PCM data for one frame (mono)
    byte[] audioFrame = new byte[samplesPerChannel * 2]; // 2 bytes per sample

    // Fill audioFrame from your audio source here
    // ...

    // Analyze the frame
    vad.process(audioFrame, samplesPerChannel, sampleRate);

    // Query the probability of voice in the last frame
    float prob = vad.getLastVoiceProbability(); // 0.0 .. 1.0

    boolean isSpeaking = prob >= 0.5f; // choose a threshold that works for your app
}
finally {
    // Always release resources
    vad.dispose();
}
```

## Continuous processing loop

```java
VoiceActivityDetector vad = new VoiceActivityDetector();

try {
    int sampleRate = 16000;
    int frameMs = 10;
    int samplesPerChannel = sampleRate * frameMs / 1000; // 160 samples
    byte[] audioFrame = new byte[samplesPerChannel * 2];

    while (running) { // 'running' is your app's capture-loop flag
        // Read a PCM frame from your capture pipeline into audioFrame
        // ...

        vad.process(audioFrame, samplesPerChannel, sampleRate);
        float prob = vad.getLastVoiceProbability();

        if (prob > 0.8f) {
            // High confidence of speech
            // e.g., enable a VU meter, unmute, or mark the active speaker
        }
        else {
            // Likely silence or noise
        }
    }
}
finally {
    vad.dispose();
}
```

## Tips and best practices

- Threshold selection: Start with a threshold between 0.5 and 0.8 and tune it for your environment.
- Frame size consistency: Use the same frame duration and sample rate across calls.
- Resource management: The VAD holds native resources; make sure `dispose()` is always called.
- Preprocessing: Consider running `AudioProcessing` (noise suppression, gain control) before the VAD for improved robustness in noisy environments. See the Audio Processing guide.
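
Per-frame probabilities can flicker around a single threshold when a speaker pauses mid-sentence. One common remedy is to smooth the probability and use separate attack/release thresholds; the `SpeechGate` class below is an illustrative sketch, not a library type, and its constants assume 10 ms frames:

```java
// Illustrative gate: exponentially smooth the per-frame voice probability
// and use separate attack/release thresholds so brief dips or spikes
// do not toggle the speaking state.
class SpeechGate {

    private double smoothed;
    private boolean speaking;

    boolean update(float frameProbability) {
        smoothed = 0.9 * smoothed + 0.1 * frameProbability;

        if (speaking && smoothed < 0.4) {
            speaking = false; // release: require a clear, sustained drop
        }
        else if (!speaking && smoothed > 0.7) {
            speaking = true;  // attack: require sustained speech
        }
        return speaking;
    }
}
```

Feed `getLastVoiceProbability()` into `update` once per processed frame; with 10 ms frames the smoothing above reacts on the order of 100 ms, usually fast enough for UI indicators.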

## Related guides

- [Audio Processing](guide/audio_processing.md)
