Problem
Mitzo is mobile-first but interaction is still text-only. When walking, driving, or doing anything with your hands occupied, you can't use it. The whole point of a mobile command center is that it's accessible anywhere — forcing keyboard input defeats that.
Proposal
Add a voice mode that enables hands-free interaction:
Input (speech-to-text)
- Tap-to-talk button or push-to-talk gesture in the chat input area
- Browser Web Speech API (SpeechRecognition) for on-device STT — no server roundtrip for transcription
- Interim results shown as the user speaks, final transcript sent as the prompt
- Fallback: if Web Speech API unavailable (some browsers), show a clear "not supported" message
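The input flow above can be sketched as a small factory that either returns a configured recognizer or null for the fallback case. This is a sketch, not Mitzo's implementation: `createRecognizer`, `onInterim`, and `onFinal` are hypothetical names, and `webkitSpeechRecognition` is the prefixed form Safari and Chrome actually ship.

```javascript
// Sketch: create a one-shot recognizer for tap-to-talk, or return null
// when the Web Speech API is unavailable (the "not supported" fallback).
function createRecognizer(onInterim, onFinal) {
  const Impl =
    globalThis.SpeechRecognition ?? globalThis.webkitSpeechRecognition;
  if (!Impl) return null; // caller shows the "not supported" message

  const rec = new Impl();
  rec.interimResults = true; // show partial transcript while speaking
  rec.continuous = false;    // one utterance per tap-to-talk press

  rec.onresult = (event) => {
    let interim = "";
    for (const result of event.results) {
      if (result.isFinal) onFinal(result[0].transcript);
      else interim += result[0].transcript;
    }
    if (interim) onInterim(interim);
  };
  return rec;
}
```

A tap-to-talk handler would call `rec.start()` on press and `rec.stop()` on release; when `createRecognizer` returns null, the UI renders the fallback message instead of the microphone button.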
Output (text-to-speech)
- Auto-read agent responses aloud when voice mode is active
- Browser SpeechSynthesis API for on-device TTS
- Stop/skip button to interrupt readback
- Only read text blocks — skip tool calls, thinking blocks, and code blocks (or summarize them: "Running 3 tools...")
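The readback rules above boil down to a pure filter over response blocks plus a thin, guarded wrapper around SpeechSynthesis. The block shapes (a `type` field with `"text"`, `"tool_call"`, etc.) are assumptions about Mitzo's message format, not its actual schema.

```javascript
// Sketch: decide what to read aloud from a list of response blocks.
// Tool calls are summarized, thinking and code blocks are skipped,
// and text blocks are spoken verbatim.
function speakableText(blocks) {
  const parts = [];
  const toolCalls = blocks.filter((b) => b.type === "tool_call").length;
  if (toolCalls > 0) {
    parts.push(`Running ${toolCalls} tool${toolCalls > 1 ? "s" : ""}...`);
  }
  for (const b of blocks) {
    if (b.type === "text") parts.push(b.text);
  }
  return parts.join(" ");
}

// Read the filtered text aloud. speechSynthesis.cancel() also backs the
// stop/skip button; the guard makes this a no-op where TTS is missing.
function speak(text) {
  if (!globalThis.speechSynthesis) return;
  speechSynthesis.cancel(); // interrupt any in-progress readback
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}
```

Keeping the filter separate from the speaking code means it can be unit-tested without a browser.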
UX considerations
- Voice mode toggle in the chat header (next to model/mode selectors)
- Visual indicator when listening (pulsing microphone icon)
- Works alongside text input — not a replacement, an addition
- Permission prompt for microphone access on first use
- Should work in iOS Safari PWA mode (the primary deployment target)
Non-goals (for now)
- Real-time streaming STT (interim results from Web Speech API are sufficient)
- Custom wake word / always-listening
- Server-side STT/TTS (keep it client-side to avoid latency and API costs)
- Voice-to-voice (Anthropic real-time audio API) — revisit when available
Technical notes
- Web Speech API is available in Safari (iOS 14.5+), Chrome, Edge. Not Firefox.
- SpeechRecognition requires a secure context (HTTPS or localhost) — Mitzo currently runs over HTTP on Tailscale, so this needs testing. iOS Safari PWA mode may also have different permissions behavior.
- No new dependencies needed — both APIs are browser-native.
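Given the availability caveats above, the voice-mode toggle should only appear when the environment can actually support it. A minimal sketch, assuming a hypothetical `voiceModeAvailable` gate; `isSecureContext` is the standard browser flag for the HTTPS-or-localhost requirement.

```javascript
// Sketch: gate the voice-mode toggle on API presence and a secure
// context (SpeechRecognition needs HTTPS or localhost, which matters
// for Mitzo's HTTP-over-Tailscale setup).
function voiceModeAvailable() {
  const hasStt = Boolean(
    globalThis.SpeechRecognition ?? globalThis.webkitSpeechRecognition
  );
  const hasTts = Boolean(globalThis.speechSynthesis);
  const secure = globalThis.isSecureContext === true;
  return hasStt && hasTts && secure;
}
```

Running this check once at startup keeps the fallback logic in one place: if it returns false, the toggle is hidden and the "not supported" message is shown on any voice-related entry point.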
Priority
Medium-high. This is a UX multiplier for the mobile-first use case.