Voice-to-Code

Control AI Agent CLI with your voice using speech recognition.

Overview

Speak commands naturally and watch them appear in AI Agent CLI in real time. The tool uses whisper-mic for free, local speech-to-text transcription, including support for accented English.

Features:

✅ Free, unlimited usage (runs locally)
✅ Auto-pause detection (2s silence triggers transcription)
✅ Context maintained across commands (persistent AI Agent session)
✅ Visual feedback (watch text appear in tmux window)
✅ Notifications and TTS announcements (macOS: OS notification and say / Linux: notify-send and espeak-ng)
✅ Live status bar (LISTENING/SENDING/READY)
✅ Detailed logging for debugging
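The TTS announcements listed above depend on the platform. As a minimal sketch of how the speech command could be selected, assuming the tool simply shells out to say or espeak-ng (the function names tts_command and speak are hypothetical and not taken from the project's source):

```python
import platform
import shutil
import subprocess


def tts_command(text, system=None):
    """Return the argv list that would speak `text`, or None if unsupported.

    `system` defaults to platform.system(); it is a parameter only so the
    selection logic can be exercised on any machine.
    """
    system = system or platform.system()
    if system == "Darwin":   # macOS ships the `say` command
        return ["say", text]
    if system == "Linux":    # espeak-ng must be installed separately
        return ["espeak-ng", text]
    return None


def speak(text):
    """Speak `text` if a TTS binary is on PATH; return True on success."""
    command = tts_command(text)
    if command is None or shutil.which(command[0]) is None:
        return False
    return subprocess.run(command, check=False).returncode == 0
```

Calling speak("Transcription ready") returns False instead of raising when no TTS binary is installed, so a missing espeak-ng would not interrupt listening.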

Prerequisites

macOS with Homebrew, or Linux
Python 3.x
AI Agent with CLI installed and configured

The Linux instructions in the rest of this document assume Debian.

Installation

1. Install system dependencies

brew install tmux portaudio ffmpeg

or

sudo apt-get install tmux libportaudio2 portaudio19-dev ffmpeg

2. Install Python dependencies

cd voice-to-code
python3 -m venv venv
./venv/bin/pip install -r requirements.txt

3. Configure microphone gain

On macOS, open System Settings → Sound → Input:

Select your microphone
Adjust Input volume to 60-70%
Test: speak normally, meter should reach 60-70%

Higher mic gain improves transcription accuracy and prevents cutoffs.

Usage

1. Start AI Agent in tmux session

Open a terminal window and create a tmux session:

tmux new-session -s ai-voice-input "amp"

To resume an existing Amp thread:

tmux new-session -s ai-voice-input "amp threads continue T-your-thread-id"

The default session name is ai-voice-input, but you can use any name and select it from the dropdown in the GUI.

2. Launch GUI

source venv/bin/activate
python main.py

If you see the following Tk error:

import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'

Your Python installation lacks Tk support. Install the Tk bindings:

# mac
brew install python-tk
# linux
sudo apt-get install python3-tk

If you use pyenv, you may need to reinstall Python afterwards so that it is built with Tk support:

pyenv install <your-python-version>
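To confirm that the fix worked before launching the GUI, you can check whether the interpreter can find the _tkinter module. This is a small sketch using only the standard library; the helper name tk_available is hypothetical and not part of the project:

```python
import importlib.util


def tk_available():
    """Return True if this interpreter can locate the _tkinter extension."""
    return importlib.util.find_spec("_tkinter") is not None


if __name__ == "__main__":
    if tk_available():
        print("Tk support found")
    else:
        print("Tk support missing: install python-tk / python3-tk")
```

Run it with the same interpreter as the virtual environment (./venv/bin/python), since Tk support is a property of the Python build and not of the installed packages.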

3. Use voice input

Click Start → Initializes transcriber (5-10s, GUI freezes during model load)
Speak your command → Auto-transcribes after 2s pause
View in log window → See transcribed text
AI Agent responds in tmux window
Continue conversation → Speak, pause, repeat
Click Stop → Stops listening (exits within 2s)

Configuration

Edit via Settings → Preferences in GUI, or directly in config.py:

CONFIG = {
    'transcriber_type': 'whisper_mic',  # Speech-to-text implementation
    'processor_type': 'tmux',           # Where to send transcribed text
    'model': 'large',                   # Whisper model (tiny/base/small/medium/large)
    'pause_threshold': 2.0,             # Seconds of silence before ending phrase
    'listen_timeout': 2.0,              # Max seconds to wait for speech to start
    'energy_threshold': 100,            # Minimum audio energy to detect speech
    'dynamic_energy': True,             # Auto-adjust for ambient noise
    'vocalize_response': False,         # Whether AI Agent should say a summary of the response out loud (macOS: say / Linux: espeak-ng)
    'log_handler_type': 'ui',           # Log output: 'ui' or 'file'
    'debug': False,                     # Verbose logging + capture WhisperMic logs
}
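To build intuition for how energy_threshold and pause_threshold interact, the sketch below detects the end of a phrase from a sequence of audio-chunk energies. It is an illustration of the general technique only, not whisper-mic's actual implementation; the function names are hypothetical, and dynamic_energy adjustment is left out.

```python
import math


def rms_energy(samples):
    """Root-mean-square energy of a chunk of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def phrase_end_index(chunk_energies, chunk_seconds,
                     energy_threshold=100, pause_threshold=2.0):
    """Return the index of the chunk where the phrase ends, or None.

    A phrase starts at the first chunk louder than energy_threshold and
    ends once the chunks after it have stayed at or below the threshold
    for pause_threshold seconds in a row.
    """
    started = False
    silent_for = 0.0
    for i, energy in enumerate(chunk_energies):
        if energy > energy_threshold:
            started = True
            silent_for = 0.0          # speech resets the silence timer
        elif started:
            silent_for += chunk_seconds
            if silent_for >= pause_threshold:
                return i
    return None
```

With 0.5-second chunks, four consecutive quiet chunks after speech add up to the 2.0-second default pause. Raising energy_threshold makes more chunks count as silence, and raising pause_threshold requires a longer quiet run before the phrase is considered finished.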

Debug Mode:

False - Logs only: start/stop, transcribed text, errors
True - Logs all operational details + WhisperMic internal logs

Whisper models (speed vs. accuracy trade-off):

tiny - Fastest, poor accuracy
base - Fast, decent accuracy (~1-2s transcription)
small - Better accuracy (~3-4s transcription)
medium - High accuracy (~5-7s transcription)
large - Best for accented English (~7-10s transcription) - Recommended

Note: Transcription speed depends on your CPU, so the times above are approximate. On a Mac, Apple Silicon (M1/M2/M3) is much faster.

Troubleshooting

Stop button doesn't respond immediately

WhisperMic keeps retrying while it picks up background noise that is "too quiet" to transcribe, which delays the stop
Quick fix: Mute your microphone, then click Stop
Permanent fix: Increase energy_threshold via Settings (200-400) to ignore background noise

Voice cuts off mid-sentence

Increase pause_threshold to 2.5 or 3.0 seconds via Settings

Not detecting speech

Lower energy_threshold via Settings (for example, back to 200 or 100 if you raised it earlier)
Enable dynamic_energy for automatic adjustment

Poor transcription accuracy

Change model to large via Settings
Speak slower and more deliberately

No audio captured

macOS: Check mic permissions: System Settings → Privacy & Security → Microphone → Python (enable)
Linux: Make sure the microphone is configured correctly in ALSA/PulseAudio
- For PulseAudio, open pavucontrol → Input Devices tab → click the checkmark ("set as fallback") to make a microphone the default

Session not found error

Start tmux session first: tmux new-session -s ai-voice-input "amp"
Select the correct session from the dropdown in the GUI
Use the + button to add custom session names if needed
Use the - button to remove a custom session name when done
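If you want to check from a script whether the session exists before starting the GUI, tmux's has-session command exits with status 0 when the target session is found. A minimal sketch (the helper names are hypothetical and not part of the project):

```python
import shutil
import subprocess


def has_session_command(session_name):
    """Build the tmux argv that tests whether `session_name` exists."""
    return ["tmux", "has-session", "-t", session_name]


def session_exists(session_name):
    """Return True if tmux is installed and reports the session exists."""
    if shutil.which("tmux") is None:
        return False
    result = subprocess.run(
        has_session_command(session_name),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit just means "no such session"
    )
    return result.returncode == 0
```

For example, session_exists("ai-voice-input") should return True once the session from the Usage section is running.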

View logs

Logs and configuration are stored in ~/.voice-to-code/

tail -f ~/.voice-to-code/voice_input.log

Enable debug logging

Edit config.py:

'debug': True,

Logs will then include detailed operational information and WhisperMic's internal logs

How It Works

Click Start in GUI
Initializes WhisperMic model (5-10s, GUI freezes)
WhisperMic listens for speech with timeout-based polling
Detects pause (2s silence) → auto-transcribes
Text sent to tmux session via TmuxProcessor
Loop continues until Stop clicked
threading.Event signals stop → exits within 2s
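The steps above explain why Stop takes up to 2 seconds: the loop only checks the stop flag between listen attempts, and each attempt can block for up to listen_timeout. A minimal sketch of that pattern, with a queue standing in for the microphone (listen_loop, make_queue_source, and get_phrase are hypothetical names, not the project's actual code):

```python
import queue
import threading


def listen_loop(get_phrase, handle_text, stop_event, listen_timeout=2.0):
    """Poll for phrases until stop_event is set.

    get_phrase(timeout) is assumed to block for at most `timeout` seconds
    and return the transcribed text, or None if nothing was heard.
    """
    while not stop_event.is_set():
        text = get_phrase(listen_timeout)
        if text:
            handle_text(text)


def make_queue_source(phrases):
    """Wrap a queue.Queue so it behaves like a timeout-based listener."""
    def get_phrase(timeout):
        try:
            return phrases.get(timeout=timeout)
        except queue.Empty:
            return None
    return get_phrase


if __name__ == "__main__":
    phrases = queue.Queue()
    phrases.put("run the tests")
    stop_event = threading.Event()
    worker = threading.Thread(
        target=listen_loop,
        args=(make_queue_source(phrases), print, stop_event, 0.5),
    )
    worker.start()
    stop_event.set()   # what clicking Stop would do
    worker.join()      # returns within roughly one listen_timeout
```

Because the flag is only read at the top of the loop, a shorter listen_timeout makes Stop more responsive at the cost of more frequent polling.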

Technical Details:

GUI: Tkinter with modular views/models architecture
Transcription: Fully local via whisper-mic, zero cost
Threading: Init on main thread, streaming in background
Factories: Pluggable transcribers, processors, log handlers
Logging: File or UI output with stdlib bridge in debug mode
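To illustrate the factory idea, here is a sketch of how a processor_type value such as 'tmux' could be mapped to a class that forwards text to a tmux session. The registry, create_processor, and the body of TmuxProcessor are assumptions made for illustration; only the tmux send-keys command itself is standard tmux, and the project's real classes may differ.

```python
import subprocess


class TmuxProcessor:
    """Sends transcribed text to a tmux session (illustrative sketch)."""

    def __init__(self, session="ai-voice-input"):
        self.session = session

    def build_commands(self, text):
        """Return the tmux invocations: literal text, then the Enter key."""
        return [
            ["tmux", "send-keys", "-t", self.session, "-l", text],
            ["tmux", "send-keys", "-t", self.session, "Enter"],
        ]

    def process(self, text):
        for command in self.build_commands(text):
            subprocess.run(command, check=True)


# Hypothetical registry: new processors plug in by adding an entry here.
PROCESSORS = {"tmux": TmuxProcessor}


def create_processor(processor_type, **kwargs):
    """Instantiate the processor registered under `processor_type`."""
    try:
        processor_class = PROCESSORS[processor_type]
    except KeyError:
        raise ValueError(f"Unknown processor_type: {processor_type!r}") from None
    return processor_class(**kwargs)
```

The -l flag tells tmux to treat the text literally instead of interpreting words like "Enter" or "Space" as key names, which matters when the text comes from speech.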

License

MIT License

https://opensource.org/licenses/MIT
