
TFLite export (openWakeWord-compatible) + ONNX quantization fix #82

Merged
pham-tuan-binh merged 2 commits into main from feat/tflite-export on Jun 8, 2026

@pham-tuan-binh

Collaborator

Summary

Adds a tflite export format alongside ONNX, producing artifacts that openWakeWord can load directly. It also fixes a pre-existing crash in quantize_onnx that was discovered while testing.

run_export stays the single entry point and gains a format argument; ONNX is always produced (it is the TFLite conversion source), so eval and the full run pipeline are unaffected.

What's included

Format-aware export

  • config: new ExportFormat enum + output_format field (defaults to onnx)
  • cli: --format/-f flag on export (precedence: flag > config)
  • run_export(config, quantize=False, format=None) dispatches onnx/tflite
  • rename export_classifier → export_onnx for parity with export_tflite
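A minimal sketch of the format resolution and dispatch described above. It assumes a string-valued enum and uses callables as stand-ins for the real export_onnx/export_tflite, whose actual signatures are not shown in this PR. resolve_format and run_export_sketch are hypothetical names.

```python
from enum import Enum
from typing import Callable, List, Optional


class ExportFormat(str, Enum):
    # Mirrors the two formats named in this PR; the real enum may differ.
    ONNX = "onnx"
    TFLITE = "tflite"


def resolve_format(cli_flag: Optional[str], config_value: str = "onnx") -> ExportFormat:
    """Precedence: --format/-f flag > config output_format (which defaults to onnx)."""
    return ExportFormat(cli_flag if cli_flag is not None else config_value)


def run_export_sketch(
    config_value: str,
    export_onnx: Callable[[], str],
    export_tflite: Callable[[str], str],
    format: Optional[str] = None,
) -> List[str]:
    """Return the artifact paths produced for the resolved format."""
    fmt = resolve_format(format, config_value)
    # ONNX is always produced: it is the deliverable for `onnx` and the
    # conversion source for `tflite`, so `eval` keeps finding it either way.
    onnx_path = export_onnx()
    artifacts = [onnx_path]
    if fmt is ExportFormat.TFLITE:
        artifacts.append(export_tflite(onnx_path))
    return artifacts
```

Because the ONNX step is unconditional, choosing tflite only ever adds an artifact; it never removes the one eval depends on.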

TFLite via export/tflite.py (onnx2tf: ONNX → TF SavedModel → TFLite), pinned to the openWakeWord runtime contract:

  • static (1, 16, 96) input (openWakeWord never calls resize_tensor_input; keep_shape_absolutely_input_names stops onnx2tf transposing it to (1, 96, 16))
  • (1, 1) output
  • builtin TFLite ops only (LiteRT interpreter has no Flex delegate)
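The shape part of this contract can be checked mechanically. The sketch below assumes the detail dicts returned by a TFLite/LiteRT Interpreter's get_input_details() and get_output_details(), which carry "shape" and, in recent versions, "shape_signature" (where -1 marks a dynamic dimension). contract_violations is a hypothetical helper, not part of export/tflite.py, and it does not cover the builtin-ops-only rule.

```python
from typing import Any, List, Mapping, Sequence, Tuple

# openWakeWord runtime contract as stated in this PR.
EXPECTED_INPUT_SHAPE = (1, 16, 96)
EXPECTED_OUTPUT_SHAPE = (1, 1)


def _static_shape(detail: Mapping[str, Any]) -> Tuple[int, ...]:
    return tuple(int(d) for d in detail["shape"])


def contract_violations(
    input_details: Sequence[Mapping[str, Any]],
    output_details: Sequence[Mapping[str, Any]],
) -> List[str]:
    """Return human-readable problems; an empty list means the IO contract holds."""
    problems: List[str] = []
    if len(input_details) != 1:
        problems.append(f"expected 1 input, got {len(input_details)}")
    else:
        inp = input_details[0]
        if _static_shape(inp) != EXPECTED_INPUT_SHAPE:
            # e.g. (1, 96, 16) if onnx2tf transposed the input
            problems.append(f"input shape {_static_shape(inp)} != {EXPECTED_INPUT_SHAPE}")
        # openWakeWord never calls resize_tensor_input, so no dim may be dynamic.
        signature = inp.get("shape_signature")
        if signature is not None and any(int(d) < 0 for d in signature):
            problems.append(f"input has dynamic dims: {tuple(int(d) for d in signature)}")
    if len(output_details) != 1:
        problems.append(f"expected 1 output, got {len(output_details)}")
    elif _static_shape(output_details[0]) != EXPECTED_OUTPUT_SHAPE:
        problems.append(f"output shape {_static_shape(output_details[0])} != {EXPECTED_OUTPUT_SHAPE}")
    return problems
```

With a real interpreter, call it after allocate_tensors(), passing interp.get_input_details() and interp.get_output_details().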

Head support (verified): dnn converts bit-exact vs ONNX/PyTorch. conv_attention and rnn cannot convert to builtin-op TFLite (attention emits an unsupported constant; LSTM needs the Flex delegate) — they now raise NotImplementedError before any work, with an actionable message. The conversion itself is also wrapped to surface a clear RuntimeError instead of an onnx2tf stack trace.
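A sketch of the fail-fast guard and the error wrapping. It assumes heads are identified by the string names used in this PR. ensure_tflite_compatible and convert_with_clear_errors are hypothetical names, and the message wording is illustrative.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Heads this PR reports as converting to builtin-op TFLite.
TFLITE_SUPPORTED_HEADS = frozenset({"dnn"})
# Failure reasons as described in this PR.
_UNSUPPORTED_REASONS = {
    "conv_attention": "attention emits a constant onnx2tf cannot lower to builtin ops",
    "rnn": "LSTM needs the Flex delegate, which the LiteRT interpreter lacks",
}


def ensure_tflite_compatible(head: str) -> None:
    """Raise before any conversion work if `head` cannot produce builtin-op TFLite."""
    if head in TFLITE_SUPPORTED_HEADS:
        return
    reason = _UNSUPPORTED_REASONS.get(head, "not verified for TFLite export")
    raise NotImplementedError(
        f"head {head!r} cannot be exported to openWakeWord TFLite ({reason}); "
        f"use --format onnx or one of {sorted(TFLITE_SUPPORTED_HEADS)}"
    )


def convert_with_clear_errors(convert: Callable[[], T]) -> T:
    """Run a conversion, replacing a deep converter traceback with one clear error."""
    try:
        return convert()
    except Exception as exc:
        # Chain the original so the onnx2tf details stay available via __cause__.
        raise RuntimeError(f"ONNX -> TFLite conversion failed: {exc}") from exc
```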

quantize_onnx fix (pre-existing bug, all heads)
The torch dynamo ONNX exporter emits value_info for weight initializers. When ORT's dynamic quantizer rewrites Gemm → MatMul, it transposes those weights in place but leaves their value_info stale, so its strict shape-inference pass failed with ShapeInferenceError: Inferred shape ... differ in dimension 0: (1536) vs (32). Fix: drop the initializer value_info before quantizing. It is redundant, since those shapes can be inferred from the tensors themselves.
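A sketch of this fix, assuming the onnx and onnxruntime packages. strip_initializer_value_info and quantize_onnx_copy are hypothetical names, and weight_type=QuantType.QInt8 is an assumption because the PR does not state the quantized type. Following the later commit, the cleanup runs on an in-memory copy so the source ONNX file is not mutated.

```python
import os
import tempfile


def strip_initializer_value_info(graph) -> int:
    """Drop value_info entries whose names match weight initializers.

    Works on an onnx.GraphProto (or anything exposing the same two repeated
    fields). Returns how many entries were removed.
    """
    init_names = {init.name for init in graph.initializer}
    stale = [vi for vi in graph.value_info if vi.name in init_names]
    for vi in stale:
        graph.value_info.remove(vi)
    return len(stale)


def quantize_onnx_copy(src: str, dst: str) -> None:
    """Dynamically quantize `src` into `dst` without touching `src` on disk."""
    import onnx
    from onnxruntime.quantization import QuantType, quantize_dynamic

    model = onnx.load(src)  # in-memory copy; the file at `src` is left as-is
    strip_initializer_value_info(model.graph)
    with tempfile.TemporaryDirectory() as tmp:
        cleaned = os.path.join(tmp, "cleaned.onnx")
        onnx.save(model, cleaned)
        # Assumed weight type; the PR does not say which one is used.
        quantize_dynamic(cleaned, dst, weight_type=QuantType.QInt8)
```

Only initializer entries are dropped. value_info for intermediate activations is kept, because the quantizer does not rewrite those.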

Packaging / docs / tests

  • optional tflite extra (onnx2tf, tensorflow, tf-keras, onnx-graphsurgeon, sng4onnx, psutil, ai-edge-litert — onnx2tf under-declares several of these)
  • docs/export-and-inference.md + README updated
  • tests/test_export.py: ONNX IO contract, quantize regression, fast guard for unsupported heads, and a real TFLite parity test (auto-skips without the extra)

Usage

```bash
pip install livekit-wakeword[tflite]
livekit-wakeword export configs/prod.yaml --format tflite
```

Testing

```
69 passed, 1 skipped
```
The real ONNX→TFLite conversion + parity test passes locally with the tflite extra installed; it skips in CI (which runs --extra train --extra export only).

Notes

  • Only the dnn head currently produces openWakeWord-compatible TFLite. conv_attention (the default/flagship head) does not convert via onnx2tf yet — a follow-up could try onnx2tf param-replacement for the attention op or ai-edge-torch (direct PyTorch→LiteRT).

…zation

Add a format-aware export path while keeping run_export as the entry point.

- config: new ExportFormat enum + output_format field (defaults to onnx)
- cli: --format/-f flag on `export`; precedence is flag > config
- run_export(config, quantize, format): dispatches onnx/tflite; ONNX is always
  produced (it is the TFLite conversion source) so `eval` keeps working
- export/tflite.py: ONNX -> TFLite via onnx2tf, pinned to the openWakeWord
  contract (static (1,16,96) input via keep_shape_absolutely_input_names,
  (1,1) output, builtin TFLite ops only). dnn verified bit-exact vs ONNX;
  conv_attention/rnn fail fast with NotImplementedError before any work
- rename export_classifier -> export_onnx for parity with export_tflite
- fix quantize_onnx: strip stale initializer value_info emitted by the torch
  dynamo exporter, which broke ORT's Gemm->MatMul shape-inference pass
- add optional `tflite` extra, tests (test_export.py), and docs

Tests: 69 passed, 1 skipped (real TFLite conversion skips without the extra).

- run: validate tflite + head compatibility before step 1 so an unsupported
  head (e.g. the default conv_attention) fails fast instead of after training
- quantize_onnx: strip stale initializer value_info on a temp copy so the
  source ONNX deliverable is no longer mutated as a side effect
- tests: assert the source ONNX survives quantization, and cover
  --quantize --format tflite
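A sketch of the ordering change in run: every preflight check executes before the first pipeline step, so nothing expensive starts for a config that cannot be exported. run_pipeline, missing_modules, and the step names used in testing are hypothetical. Checking that the tflite extra's modules are importable is an assumption about what "validate tflite" covers.

```python
import importlib.util
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def missing_modules(names: Sequence[str]) -> List[str]:
    """Top-level modules that cannot be imported (e.g. an uninstalled extra)."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def run_pipeline(steps: Sequence[Step], preflight: Sequence[Callable[[], None]]) -> List[str]:
    """Run all preflight checks, then each step in order; return completed step names."""
    # Checks run before step 1, so e.g. conv_attention with --format tflite
    # fails immediately instead of after training has finished.
    for check in preflight:
        check()
    completed: List[str] = []
    for name, step in steps:
        step()
        completed.append(name)
    return completed
```

A preflight check for the tflite format might raise when missing_modules(["onnx2tf", "tensorflow"]) is non-empty, alongside the head-compatibility check.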
@pham-tuan-binh pham-tuan-binh merged commit 431c7e4 into main Jun 8, 2026
7 checks passed
@pham-tuan-binh pham-tuan-binh deleted the feat/tflite-export branch June 8, 2026 08:22