Summary
When a user sends a Telegram voice message, CCBot transcribes it via Whisper and forwards the text to Claude Code. The response should come back as both a text message and a voice message (OGG/Opus via Telegram sendVoice), enabling hands-free two-way voice conversations.
Current State
The TTS module and bot integration are implemented and passing all tests (243/243). The remaining step is configuring the OpenAI API key on hetzner1 and restarting the service.
Architecture
Phone (voice) → Telegram → Whisper STT → text → Claude Code
Claude Code → text → Telegram (text message)
                   → OpenAI TTS → OGG/Opus → Telegram (voice message)
src/ccbot/tts.py
Mirrors the transcribe.py pattern: a lazily created singleton httpx client that shares config.openai_api_key and config.openai_base_url with transcribe.py.
- Model: tts-1
- Voice: nova
- Format: opus (Telegram native OGG/Opus)
- Timeout: 60s
- synthesize_speech(text) -> bytes: returns raw OGG/Opus audio
- should_skip_tts(text) -> bool: skips when text exceeds TTS_MAX_CHARS (4096) or contains a fenced code block with 4+ lines
- close_client(): shutdown cleanup
src/ccbot/bot.py
- Voice input tracking: _last_input_voice, a set of (user_id, thread_id) pairs; an entry is added in voice_handler and cleared in text_handler and forward_command_handler
- Per-user toggle: _voice_enabled dict (default: True, in-memory only, resets on restart)
- /voice command: /voice on, /voice off, or bare /voice to show status. Registered as a bot command in the menu.
- TTS delivery: in handle_new_message, after enqueueing a complete assistant text message, the bot synthesises audio and sends it via bot.send_voice(). Errors are logged but never block text delivery.
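The in-memory state behind these bullets might look roughly like the sketch below. Only _last_input_voice and _voice_enabled come from the description; the helper functions and the reply strings are hypothetical and stand in for logic that lives inside the real handlers.

```python
from typing import Optional

# (user_id, thread_id); thread_id is assumed to be None outside forum topics.
ConversationKey = tuple[int, Optional[int]]

_last_input_voice: set[ConversationKey] = set()
_voice_enabled: dict[int, bool] = {}  # in-memory only, lost on restart


def mark_voice_input(user_id: int, thread_id: Optional[int]) -> None:
    """What voice_handler would do: remember that the last input was voice."""
    _last_input_voice.add((user_id, thread_id))


def clear_voice_input(user_id: int, thread_id: Optional[int]) -> None:
    """What text_handler and forward_command_handler would do."""
    _last_input_voice.discard((user_id, thread_id))


def last_input_was_voice(user_id: int, thread_id: Optional[int]) -> bool:
    return (user_id, thread_id) in _last_input_voice


def is_voice_enabled(user_id: int) -> bool:
    """Voice replies default to on for users who never toggled."""
    return _voice_enabled.get(user_id, True)


def handle_voice_command(user_id: int, arg: Optional[str]) -> str:
    """Apply /voice on, /voice off, or bare /voice and return a status line."""
    if arg == "on":
        _voice_enabled[user_id] = True
    elif arg == "off":
        _voice_enabled[user_id] = False
    elif arg is not None:
        return "Usage: /voice [on|off]"
    state = "on" if is_voice_enabled(user_id) else "off"
    return f"Voice replies are {state}."
```

Using discard rather than remove keeps the text handlers safe to call when no voice input was recorded for that conversation.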
Skip conditions (voice reply not sent)
- Last input was not a voice message
- User has toggled /voice off
- No OPENAI_API_KEY configured
- Response exceeds TTS_MAX_CHARS (4096 chars)
- Response contains a fenced code block with 4+ lines
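Taken together, the skip conditions amount to a single predicate. The sketch below is one possible reading, not the shipped implementation: should_send_voice is a hypothetical wrapper, and "4+ lines" is assumed to count the lines between the opening and closing fence.

```python
import re

TTS_MAX_CHARS = 4096
MIN_CODE_BLOCK_LINES = 4

# Built from single backticks so this example can itself sit in a fenced block.
FENCE = "`" * 3
# Opening fence with optional info string, block body, closing fence.
_FENCED_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def should_skip_tts(text: str) -> bool:
    """Skip text that is too long or contains a sizeable fenced code block."""
    if len(text) > TTS_MAX_CHARS:
        return True
    for match in _FENCED_BLOCK_RE.finditer(text):
        if len(match.group(1).splitlines()) >= MIN_CODE_BLOCK_LINES:
            return True
    return False


def should_send_voice(
    last_input_was_voice: bool,
    voice_enabled: bool,
    api_key: str,
    text: str,
) -> bool:
    """Hypothetical wrapper: True only when no skip condition applies."""
    if not last_input_was_voice:
        return False
    if not voice_enabled:
        return False
    if not api_key:
        return False
    return not should_skip_tts(text)
```

Short inline snippets still get read aloud under this reading; only blocks of four or more lines suppress the voice reply.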
Tests
tests/ccbot/test_tts.py — 14 tests covering synthesis, error handling, URL handling, payload validation, skip logic, and client lifecycle.
Remaining
- Add OPENAI_API_KEY to ~/.ccbot/.env on hetzner1
- Restart the service: systemctl --user restart ccbot
- Confirm that /voice off disables voice replies and /voice on re-enables them