
❓ Questions / Help / Support #310

@bil21071

Description


Summary

I’m seeing very high false-positive rates when deploying custom wake-word models
(“hilfe” and “adele”) trained with openWakeWord’s 2nd training approach.
Recall is good, but the false-positive rate is too high for on-device use: at least one false trigger per hour.


Environment

  • Python: 3.10+
  • openWakeWord: 0.6.0
  • Hardware:
    • Training: VM
    • Inference: Samsung Galaxy Watch

Training Details

  • Training approach: 2nd approach from the official training notebook
    (training_models.ipynb)
  • Model architecture:
    • Model layer size: 256
  • Input shapes:
    • (6, 96)
    • (3, 96)
  • Key hyperparameters:
    • recall_weight = 0.2

Dataset

Positive samples (original + augmented)

  • “hilfe”: ~88,000 samples
  • “adele”: 83,417 samples

Negative samples (~7000 hours total)

  • Room impulse responses (RIRs)
  • Mozilla Common Voice
  • TuDa German speech
  • Birds / environmental sounds
  • Radio, info content (German & English)

Observed Behavior

  • Recall: high (≈0.85 and ≈0.77 for the two models, respectively)
  • False positives: High
    • Frequent triggers on unrelated speech and background audio
    • Example: random German radio or conversational speech activates “hilfe” / “adele”
    • Inference runs through a custom pipeline
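One generic post-filter that often suppresses this kind of spurious trigger is to require the score to stay above threshold for several consecutive frames, then enforce a refractory period before the next trigger. The sketch below is illustrative only: the function name and parameters are hypothetical, not openWakeWord API, and it assumes you already have a stream of per-frame scores in [0, 1] (e.g. from `model.predict`).

```python
# Hypothetical post-filter over per-frame wake-word scores.
# Not part of openWakeWord; a generic debouncing sketch.

def detect(scores, threshold=0.5, patience=3, refractory=20):
    """Fire only after `patience` consecutive frames >= threshold,
    then ignore the next `refractory` frames (cooldown)."""
    triggers = []
    streak = 0     # consecutive frames at/above threshold
    cooldown = 0   # remaining frames to ignore after a trigger
    for i, score in enumerate(scores):
        if cooldown > 0:
            cooldown -= 1
            continue
        if score >= threshold:
            streak += 1
            if streak >= patience:
                triggers.append(i)
                streak = 0
                cooldown = refractory
        else:
            streak = 0
    return triggers

# A single-frame spike never fires with patience=3:
print(detect([0.0, 0.9, 0.0, 0.0]))        # → []
# Three sustained frames fire once, then the cooldown holds:
print(detect([0.6, 0.7, 0.8, 0.9]))        # → [2]
```

Raising `patience` trades a little recall (the wake word must be held slightly longer) for a large cut in one-frame false positives; `refractory` prevents a single utterance from counting as several triggers.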

Expected Behavior

  • FP rate similar to pre-trained openWakeWord models
    (e.g. < 1 false trigger per hour on negative-only audio)
  • Maintain good recall on positives

Question / Help Requested

What is the recommended way to reduce false positives without sacrificing recall?

Any guidance would be greatly appreciated — this is currently blocking on-device deployment.

Thanks!
