feat: two-way voice — TTS replies for voice messages #59

@krazyuniks

Summary

When a user sends a Telegram voice message, CCBot transcribes it via Whisper and forwards the text to Claude Code. The response should come back as both a text message and a voice message (OGG/Opus via Telegram sendVoice), enabling hands-free two-way voice conversations.

Current State

The TTS module and bot integration are implemented and all tests pass (243/243). What remains is deployment: configuring the OpenAI API key on hetzner1, restarting the service, and testing manually in Telegram.

Architecture

Phone (voice) → Telegram → Whisper STT → text → Claude Code
Claude Code → text → Telegram (text message)
                  → OpenAI TTS → OGG/Opus → Telegram (voice message)

src/ccbot/tts.py

Mirrors the transcribe.py pattern: a lazy singleton httpx client that shares config.openai_api_key and config.openai_base_url.

  • Model: tts-1
  • Voice: nova
  • Format: opus (Telegram native OGG/Opus)
  • Timeout: 60s
  • synthesize_speech(text) -> bytes — returns raw OGG/Opus audio
  • should_skip_tts(text) -> bool — skips when text exceeds TTS_MAX_CHARS (4096) or contains a fenced code block with 4+ lines
  • close_client() — shutdown cleanup
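
A minimal sketch of the skip rule and the request shape described above, not the actual tts.py code. It assumes that "4+ lines" counts the lines between the fences and that config.openai_base_url already ends in /v1. The helper name build_speech_request and the constant CODE_BLOCK_MIN_LINES are hypothetical; the payload fields follow the OpenAI speech endpoint.

```python
import re
from typing import Dict, Tuple

TTS_MAX_CHARS = 4096       # limit stated in this issue
CODE_BLOCK_MIN_LINES = 4   # hypothetical name for the "4+ lines" threshold

FENCE = "`" * 3            # built programmatically to avoid a literal fence here
_FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL
)


def should_skip_tts(text: str) -> bool:
    """Return True when a reply should stay text-only."""
    if len(text) > TTS_MAX_CHARS:
        return True
    # Assumption: count only the lines between the opening and closing fence.
    for match in _FENCED_BLOCK.finditer(text):
        if len(match.group(1).splitlines()) >= CODE_BLOCK_MIN_LINES:
            return True
    return False


def build_speech_request(text: str, base_url: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: URL and JSON body for a speech synthesis request."""
    url = base_url.rstrip("/") + "/audio/speech"
    payload = {
        "model": "tts-1",
        "voice": "nova",
        "input": text,
        "response_format": "opus",  # OGG/Opus, which Telegram voice messages use
    }
    return url, payload
```

In the real module, synthesize_speech(text) would presumably POST this payload with the shared httpx client, a 60s timeout, and a bearer token built from config.openai_api_key, then return the response body as bytes.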

src/ccbot/bot.py

  • Voice input tracking: _last_input_voice is a set of (user_id, thread_id) keys; a key is added in voice_handler and removed in text_handler and forward_command_handler
  • Per-user toggle: _voice_enabled dict (default: True, in-memory only, resets on restart)
  • /voice command: /voice on, /voice off, or bare /voice to show status. Registered as a bot command in the menu.
  • TTS delivery: In handle_new_message, after enqueueing a complete assistant text message, the bot synthesises audio and sends it via bot.send_voice(). Errors are logged but never block text delivery.
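
The "errors are logged but never block text delivery" behaviour can be sketched as below. This is an illustration under assumptions, not the code in bot.py: deliver_voice_reply is a hypothetical name, and the synthesize and send_voice callables stand in for synthesize_speech and a bound bot.send_voice() call.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def deliver_voice_reply(
    text: str,
    synthesize: Callable[[str], Awaitable[bytes]],
    send_voice: Callable[[bytes], Awaitable[None]],
) -> bool:
    """Try to send a voice reply; return True on success and never raise."""
    try:
        audio = await synthesize(text)
        await send_voice(audio)
        return True
    except Exception:
        # The text message was enqueued before this call, so a TTS or
        # Telegram failure only costs the voice reply.
        logger.exception("TTS delivery failed; text reply is unaffected")
        return False
```

Because the function swallows ordinary exceptions, the caller can await it after enqueueing the text message without wrapping it in its own try block.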

Skip conditions (voice reply not sent)

  • Last input was not a voice message
  • User has toggled /voice off
  • No OPENAI_API_KEY configured
  • Response exceeds TTS_MAX_CHARS (4096 chars)
  • Response contains a fenced code block with 4+ lines
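
The skip conditions above amount to a single gate, sketched here as a pure function. The name wants_voice_reply and its parameters are hypothetical, and keying _voice_enabled by user_id alone is an assumption drawn from the "per-user toggle" wording. skip_text is meant to carry the result of should_skip_tts(text).

```python
from typing import Dict, Optional, Set, Tuple

Key = Tuple[int, Optional[int]]  # (user_id, thread_id)


def wants_voice_reply(
    user_id: int,
    thread_id: Optional[int],
    *,
    last_input_voice: Set[Key],
    voice_enabled: Dict[int, bool],
    api_key: Optional[str],
    skip_text: bool,
) -> bool:
    """Return True only when none of the skip conditions applies."""
    if (user_id, thread_id) not in last_input_voice:
        return False  # last input was not a voice message
    if not voice_enabled.get(user_id, True):
        return False  # user ran /voice off (default is on)
    if not api_key:
        return False  # no OPENAI_API_KEY configured
    if skip_text:
        return False  # too long, or contains a large fenced code block
    return True
```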

Tests

tests/ccbot/test_tts.py — 14 tests covering synthesis, error handling, URL handling, payload validation, skip logic, and client lifecycle.

Remaining

  • Add OPENAI_API_KEY to ~/.ccbot/.env on hetzner1
  • systemctl --user restart ccbot
  • Test: send a voice message in Telegram, verify text + voice reply
  • Test: /voice off disables voice replies, /voice on re-enables
  • Test: long or code-heavy responses are text-only
