Chatterbox TTS - M1 MacBook Air Deployment

A complete, production-ready deployment of Resemble AI's Chatterbox text-to-speech system, optimized for the M1 MacBook Air and other Apple Silicon Macs.

Features

State-of-the-art TTS: Preferred over ElevenLabs in blind side-by-side evaluations, according to Resemble AI
Voice Cloning: Clone any voice with 10-30 seconds of audio
Multilingual: Supports 20+ languages
Apple Silicon Optimized: Uses MPS acceleration for fast inference
Web Interface: Beautiful, responsive UI for easy use
REST API: Easy integration with other applications
Voice Library System: Manage multiple synthetic voices
Memory Efficient: Optimized for MacBook Air constraints
Built-in Watermarking: All audio includes imperceptible watermarks

Prerequisites

Hardware: Apple Silicon M1, M2, or M3 Mac
OS: macOS 12.3 or later (PyTorch's MPS backend is not available on earlier releases)
Memory: 8GB RAM minimum (16GB recommended)
Storage: ~5GB for models and dependencies
Python: 3.11 (will be installed via conda)
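
The hardware and OS requirements can be checked from Python before running the installer. This is a minimal sketch using only the standard library; `check_prerequisites` and `parse_version` are illustrative names, not part of this project. The default minimum of macOS 12.3 is the version PyTorch documents for its MPS backend; pass a different `min_macos` if you want another threshold.

```python
import platform


def parse_version(text):
    """Turn '12.3.1' into (12, 3, 1); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_prerequisites(machine=None, mac_version=None, min_macos=(12, 3)):
    """Return a list of human-readable problems; an empty list means OK.

    machine and mac_version default to the values reported by the
    running interpreter, and can be overridden for testing.
    """
    machine = platform.machine() if machine is None else machine
    mac_version = platform.mac_ver()[0] if mac_version is None else mac_version

    problems = []
    # Apple Silicon reports "arm64"; an x86_64 result usually means an
    # Intel Mac or a Python running under Rosetta.
    if machine != "arm64":
        problems.append(f"expected arm64, found {machine or 'unknown'}")
    if not mac_version:
        problems.append("not running on macOS")
    elif parse_version(mac_version) < min_macos:
        wanted = ".".join(str(n) for n in min_macos)
        problems.append(f"macOS {mac_version} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

An empty result does not guarantee that MPS works, only that the machine and OS look right.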

Quick Start

# Clone this repository
git clone https://github.com/mkoker/chatterbox-m1-project.git
cd chatterbox-m1-project

# Run the automated setup
chmod +x setup_m1.sh
./setup_m1.sh

# Start the server
export PYTORCH_ENABLE_MPS_FALLBACK=1
source $HOME/miniforge3/etc/profile.d/conda.sh
conda activate chatterbox
python server.py

Open http://localhost:8000 in your browser.

Installation

Automated Installation

The easiest way to get started:

./setup_m1.sh

This installs:

Miniforge (conda for ARM64)
PyTorch with MPS support
Chatterbox TTS and all dependencies
FastAPI server components

Manual Installation

If you prefer manual control:

# Install Miniforge
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"
bash Miniforge3-MacOSX-arm64.sh

# Create environment
conda create -n chatterbox python=3.11 -y
conda activate chatterbox

# Install PyTorch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cpu

# Install Chatterbox and dependencies
pip install chatterbox-tts transformers==4.46.3
pip install fastapi uvicorn python-multipart psutil
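
After a manual install it can be useful to confirm that the pinned versions actually ended up in the environment. The sketch below compares installed versions against the pins from the commands above using the standard library's `importlib.metadata`; `find_mismatches` and `PINS` are hypothetical names introduced here for illustration.

```python
from importlib import metadata

# Pins copied from the manual installation commands above.
PINS = {
    "torch": "2.4.1",
    "torchvision": "0.19.1",
    "torchaudio": "2.4.1",
    "transformers": "4.46.3",
}


def find_mismatches(pins=PINS, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Ignore local build tags such as "2.4.1+cpu" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches().items():
        print(f"{name}: expected {PINS[name]}, found {found}")
```

Run it inside the activated `chatterbox` environment; no output means every pin matched.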

Troubleshooting Installation

If you encounter issues, see PYTORCH_FIX.md for common solutions.

Usage

Web Interface

Start the server and access the web UI:

./start_server_with_fallback.sh

Then open http://localhost:8000

Python API

from chatterbox.tts import ChatterboxTTS
import torchaudio as ta

# Initialize model
model = ChatterboxTTS.from_pretrained(device="mps")

# Basic text-to-speech
wav = model.generate("Hello from Chatterbox!")
ta.save("output.wav", wav, model.sr)

# Voice cloning
wav = model.generate(
    "This is a cloned voice.",
    audio_prompt_path="reference_audio/my_voice.wav",
    exaggeration=0.6,
    cfg_weight=0.4
)
ta.save("cloned_output.wav", wav, model.sr)
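
The example above hard-codes `device="mps"`. If the same script may also run where MPS is unavailable, a small helper can choose the device at runtime. This is a sketch, not part of this project: `pick_device` is a hypothetical name, while `torch.backends.mps.is_available()` is PyTorch's own availability check.

```python
def pick_device(mps_available=None):
    """Return "mps" when the MPS backend is usable, otherwise "cpu".

    Pass mps_available explicitly to skip the PyTorch probe (useful in tests).
    """
    if mps_available is None:
        # Imported lazily so the helper itself does not require PyTorch.
        import torch
        mps_available = torch.backends.mps.is_available()
    return "mps" if mps_available else "cpu"


# Usage (downloads and loads the model, so it is left commented out here):
# from chatterbox.tts import ChatterboxTTS
# model = ChatterboxTTS.from_pretrained(device=pick_device())
```

CPU inference is expected to be noticeably slower than MPS, so treat the fallback as a convenience rather than a deployment target.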

REST API

# Generate speech
curl -X POST "http://localhost:8000/synthesize" \
     -F "text=Hello world" \
     -F "exaggeration=0.6" \
     --output speech.wav

# With voice cloning
curl -X POST "http://localhost:8000/synthesize" \
     -F "text=Hello in cloned voice" \
     -F "reference_audio=@reference_audio/voice.wav" \
     --output cloned_speech.wav

Command Line

# Using the voice library
python scripts/voice_library.py

# Direct voice cloning
python scripts/direct_voice_cloning.py

# API client example
python scripts/use_cloned_voice_api.py

API Documentation

Endpoints

Web Interface

GET / - Web UI

API

POST /synthesize - Generate speech
GET /health - Health check
POST /clear_cache - Clear model cache
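
Because model loading can take a while, a client may want to wait for `GET /health` before sending work. The sketch below polls that endpoint with the standard library. It assumes only that a healthy server answers with HTTP 200; the response body is not inspected because its format is not documented here. `wait_until_healthy` and `http_ok` are illustrative names.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(url="http://localhost:8000/health", attempts=30,
                       delay=2.0, probe=http_ok, sleep=time.sleep):
    """Poll url up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    print("server ready" if wait_until_healthy(attempts=3) else "server not reachable")
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running server.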

Interactive Docs

Visit http://localhost:8000/docs while the server is running.

Parameters

| Parameter | Type | Range | Description |
|---|---|---|---|
| `text` | string | - | Text to synthesize (max 1000 chars) |
| `language` | string | - | Language code (en, es, fr, etc.) |
| `exaggeration` | float | 0.0-1.0 | Emotion/expression intensity |
| `cfg_weight` | float | 0.0-1.0 | Speech control (lower = faster) |
| `temperature` | float | 0.1-1.0 | Creativity/variation |
| `reference_audio` | file | - | Reference audio for voice cloning |
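
Clients can check values against these ranges before sending a request. The following is a minimal client-side sketch built only from the table above; `build_form`, `RANGES`, and `MAX_TEXT_CHARS` are hypothetical names, and the server may apply its own validation and defaults.

```python
# Ranges and the text limit are copied from the parameter table.
RANGES = {
    "exaggeration": (0.0, 1.0),
    "cfg_weight": (0.0, 1.0),
    "temperature": (0.1, 1.0),
}
MAX_TEXT_CHARS = 1000


def build_form(text, **options):
    """Validate the inputs and return a dict of form fields for /synthesize.

    Options set to None are omitted so the server can use its own defaults.
    """
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} chars, limit is {MAX_TEXT_CHARS}")

    form = {"text": text}
    for name, value in options.items():
        if value is None:
            continue
        if name == "language":
            form[name] = str(value)
            continue
        if name not in RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = RANGES[name]
        value = float(value)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        form[name] = value
    return form
```

The returned dict can be passed as form data to `POST /synthesize`; `reference_audio` is a file upload and is deliberately left out of this helper.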

Example Response

{
  "status": "success",
  "audio": "binary_wav_data",
  "duration": 3.5,
  "sample_rate": 24000
}

Integration Examples

E-Learning Platform

import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

class CourseNarrator:
    def __init__(self):
        # Load the model once and reuse it for every lesson
        self.model = ChatterboxTTS.from_pretrained(device="mps")

    def narrate_lesson(self, text, output_path):
        # Generate the narration and save it at the model's sample rate
        wav = self.model.generate(text, exaggeration=0.4, cfg_weight=0.6)
        ta.save(output_path, wav, self.model.sr)

Voice Assistant

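import requests

def speak(text):
    # Send the text to the local server as form data
    response = requests.post(
        "http://localhost:8000/synthesize",
        data={"text": text},
        timeout=120,
    )
    # Stop on HTTP errors instead of saving an error body as audio
    response.raise_for_status()
    # Save the returned audio; play response.wav with any audio player
    with open("response.wav", "wb") as f:
        f.write(response.content)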

See INTEGRATION_GUIDE.md for more examples.

Troubleshooting

Common Issues

Issue: operator torchvision::nms does not exist
Fix: Run ./definitive_fix.sh

Issue: rope_scaling configuration error
Fix: Run pip install transformers==4.46.3 --force-reinstall

Issue: MPS operation not supported
Fix: Enable fallback with export PYTORCH_ENABLE_MPS_FALLBACK=1

Issue: Server won't start
Fix: Initialize conda: source $HOME/miniforge3/etc/profile.d/conda.sh

See PYTORCH_FIX.md for detailed troubleshooting.
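
When the server or a script is started from Python rather than from a shell, the MPS fallback variable can be set in code. The usual advice is to set it before `import torch`, since PyTorch may read it during initialization. A minimal sketch, with `enable_mps_fallback` as a hypothetical helper name:

```python
import os


def enable_mps_fallback(env=os.environ):
    """Set PYTORCH_ENABLE_MPS_FALLBACK=1 unless a value is already present.

    Returns the value that is in effect afterwards.
    """
    return env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


enable_mps_fallback()
# import torch  # import PyTorch only after the variable has been set
```

Using `setdefault` keeps an explicit `export PYTORCH_ENABLE_MPS_FALLBACK=0` from the shell intact.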

Project Structure

chatterbox-m1-project/
├── README.md                       # This file
├── setup_m1.sh                     # Automated setup script
├── server.py                       # Main web server
├── start_server_with_fallback.sh  # Server launcher with MPS fallback
├── requirements.txt                # Python dependencies
│
├── examples/                       # Usage examples
│   ├── basic_example.py
│   └── voice_cloning_example.py
│
├── scripts/                        # Integration scripts
│   ├── voice_library.py           # Voice management system
│   ├── api_service.py             # Standalone API service
│   ├── direct_voice_cloning.py    # Direct Python integration
│   └── use_cloned_voice_api.py    # API client example
│
├── static/                         # Web interface
│   └── index.html
│
├── reference_audio/                # Voice samples for cloning
│   └── README.md
│
├── outputs/                        # Generated audio files
│
└── docs/                          # Additional documentation
    ├── GETTING_STARTED.md
    ├── INTEGRATION_GUIDE.md
    ├── SYNTHETIC_VOICES_GUIDE.md
    └── PYTORCH_FIX.md

Performance

Expected Performance on M1 MacBook Air

Model Loading: 30-60 seconds (first time only)
Generation Speed: 2-5 seconds for short sentences
Memory Usage: ~3-4GB
Voice Cloning: Works with 10-30 second reference clips
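
These figures vary with text length and machine load, so it helps to measure on your own hardware. The sketch below times one generation call and reports the real-time factor (generation time divided by audio duration). `timed_generate` is a hypothetical helper; it assumes the returned waveform exposes a `shape` whose last axis is the sample count, which matches how `wav` is passed to `ta.save` in the Python API example.

```python
import time


def timed_generate(generate, text, sample_rate, clock=time.perf_counter):
    """Call generate(text) and return (wav, elapsed_seconds, real_time_factor).

    A real-time factor below 1.0 means audio is produced faster than it plays.
    """
    start = clock()
    wav = generate(text)
    elapsed = clock() - start
    # Samples are assumed to be on the last axis, e.g. shape (1, num_samples).
    audio_seconds = wav.shape[-1] / sample_rate
    return wav, elapsed, elapsed / audio_seconds


# Usage with a loaded model (left commented out because it needs the model):
# wav, elapsed, rtf = timed_generate(model.generate, "Hello from Chatterbox!", model.sr)
# print(f"{elapsed:.1f}s to generate, real-time factor {rtf:.2f}")
```

The first call after loading is often slower than later ones, so time a second or third call for a steadier number.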

Optimization Tips

Keep the MacBook plugged in for sustained performance
Close memory-intensive applications
Use shorter text chunks for faster generation
Models cache in memory after first load
Enable MPS fallback for unsupported operations
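
The tip about shorter text chunks can be applied with a small splitter that packs whole sentences into chunks under a character limit. This is a sketch, not code from this project: `split_text` is a hypothetical helper, the sentence boundary rule is a simple punctuation-based regex, and the default limit of 1000 mirrors the `text` limit in the parameter table.

```python
import re


def split_text(text, max_chars=1000):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio concatenated; a smaller `max_chars` trades more requests for shorter waits per request.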

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

The underlying Chatterbox TTS model is released by Resemble AI, also under the MIT License.

Acknowledgments

Resemble AI for creating Chatterbox TTS
Chatterbox Model: https://github.com/resemble-ai/chatterbox
The open-source community for PyTorch, Transformers, and FastAPI

Citation

If you use this project in your research or application, please cite:

@misc{chatterboxtts2025,
  author = {{Resemble AI}},
  title = {{Chatterbox-TTS}},
  year = {2025},
  howpublished = {\url{https://github.com/resemble-ai/chatterbox}},
  note = {GitHub repository}
}

Support

Issues: Open an issue on GitHub
Chatterbox Documentation: https://github.com/resemble-ai/chatterbox
Discord: https://discord.gg/rJq9cRJBJ6

Legal and Ethical Use

Only clone voices with explicit permission
Disclose when audio is AI-generated
All generated audio includes watermarks (PerTh)
Do not use for impersonation or fraud
Follow ethical AI guidelines

Built with ❤️ for Apple Silicon
