Audio Transcription, Slide Integration & LaTeX Notes Generation
Transform your audio recordings into structured, professional LaTeX notes instantly.
AudioTTo is a Python application designed to streamline the process of creating study notes. It takes audio recordings (lectures, meetings, etc.) and optionally PDF slides, transcribes the audio locally, and then uses Google Gemini to generate structured LaTeX documents.
- Local Transcription: Uses Faster-Whisper for fast, accurate, and private audio transcription.
- Efficient Processing: Automatically chunks audio for parallel processing, maximizing CPU usage.
- AI-Powered Notes: Leverages Google Gemini AI to synthesize transcripts into structured LaTeX notes.
- Visual Integration: Extracts images from PDF slides and embeds them directly into the notes where relevant.
- Modern UI: Includes a user-friendly web interface for easy drag-and-drop operation.
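The chunk-and-parallelize idea from the feature list can be sketched as follows. This is an illustration only, not AudioTTo's actual code: the names `chunk_ranges`, `process_chunks`, and `fake_transcribe`, as well as the 60-second chunk length, are assumptions made for the example. A real worker would run a transcription model on each audio slice instead of returning a label.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(duration_s: float, chunk_s: float = 60.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s seconds."""
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


def process_chunks(ranges, worker, threads: int = 4) -> list:
    """Run worker on every chunk in parallel; results come back in chunk order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, ranges))


def fake_transcribe(window):
    # Placeholder worker (hypothetical): a real one would transcribe the audio slice.
    start, end = window
    return f"[{start:.0f}s-{end:.0f}s]"


if __name__ == "__main__":
    print(process_chunks(chunk_ranges(150.0), fake_transcribe))
```

Because `pool.map` preserves input order, the per-chunk results can be joined directly into one transcript. Threads are used here only for brevity; whether threads or processes suit a given transcription backend is outside the scope of this sketch.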
Before you begin, ensure you have the following:
- Python 3.12 (recommended) or higher (only if running from source).
- A LaTeX Distribution installed and added to your PATH (required for PDF compilation). You can download it manually or use the included helper scripts:
  - Windows: MiKTeX (recommended) or TeX Live
    - Alternative: Run Install_MiKTeX.bat included in the folder.
  - macOS: MacTeX
    - Alternative: Run install_deps_mac.sh included in the folder (requires Homebrew).
  - Linux: TeX Live
    - Alternative: Run install_deps_linux.sh included in the folder, or run sudo apt install texlive.
- A Google Gemini API Key. You can get one from Google AI Studio.
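To confirm that a LaTeX distribution is actually reachable through PATH, a quick check along these lines may help. It is a sketch under assumptions: the README does not say which LaTeX engine AudioTTo invokes, so the engine names below (`pdflatex`, `xelatex`, `lualatex`) are common candidates rather than a confirmed requirement, and `find_tools` is a hypothetical helper name.

```python
import shutil


def find_tools(names=("pdflatex", "xelatex", "lualatex")) -> dict:
    """Map each tool name to its resolved path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    if not any(found.values()):
        # None of the assumed engine names resolved on PATH.
        print("No LaTeX engine found on PATH; install a distribution first.")
```

If every entry prints "not found" right after installing MiKTeX, MacTeX, or TeX Live, opening a new terminal usually picks up the updated PATH.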
If you downloaded the standalone executable:
- Download the latest version from the Releases page.
- Prerequisites: You still need a working LaTeX distribution installed (see Prerequisites above).
- Run:
  - Windows: Double-click AudioTTo.exe.
  - macOS: Double-click AudioTTo.app. Note: If you see a security warning, go to System Settings > Privacy & Security and allow the app.
  - Linux: Open a terminal in the folder and run ./AudioTTo
On Linux and macOS, ensure the file has execution permissions: chmod +x AudioTTo
- Clone the repository (or download the source files):
    git clone https://github.com/Manumarzo/AudioTTo.git
    cd AudioTTo
- Install dependencies:
    pip install -r requirements.txt
- Set up FFmpeg (required when running from source):
  - Create a folder named bin in the root directory.
  - Download the FFmpeg and FFprobe executables (static builds) for your OS.
  - Place ffmpeg (or ffmpeg.exe) and ffprobe (or ffprobe.exe) inside the bin folder.
  - Note: This is required because AudioTTo expects these binaries in the local bin folder.
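The FFmpeg setup above can be verified before the first run with a small check like this one. It is a sketch, not part of AudioTTo: `missing_binaries` is a hypothetical helper, and it only assumes what the steps state, namely that ffmpeg and ffprobe (with an .exe suffix on Windows) live in a bin folder in the project root.

```python
import os
import sys
from pathlib import Path


def missing_binaries(bin_dir: Path, windows: bool = sys.platform.startswith("win")) -> list[str]:
    """Return the names of the expected FFmpeg binaries that are absent or unusable in bin_dir."""
    suffix = ".exe" if windows else ""
    missing = []
    for tool in ("ffmpeg", "ffprobe"):
        candidate = bin_dir / f"{tool}{suffix}"
        if not candidate.is_file():
            missing.append(candidate.name)
        elif not windows and not os.access(candidate, os.X_OK):
            # The file exists but lacks the execute bit (fixable with chmod +x).
            missing.append(f"{candidate.name} (not executable)")
    return missing


if __name__ == "__main__":
    problems = missing_binaries(Path("bin"))
    print("FFmpeg setup looks complete." if not problems else f"Missing: {', '.join(problems)}")
```

Run it from the project root so that the relative path bin resolves to the folder created in the first step.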
AudioTTo provides both a modern Web GUI and a classic CLI.
The easiest way to use AudioTTo.
- Launch the application:
python gui_app.py
- Interact: A window will open automatically (or go to http://localhost:8000).
- Configure: Click the Settings button to enter your Gemini API Key.
- Process:
- Drag & drop your Audio file.
- (Optional) Drag & drop your Slides (PDF).
- Click Start Processing.
For automation or headless environments.
Set your API Key first:
Create a file named .env in the root directory of the project. Open it with a text editor and add your API Key:
GEMINI_API_KEY = your_actual_api_key_here
Run the script:
# Basic transcription
python AudioTTo.py lecture.wav
# With specific threads
python AudioTTo.py lecture.wav --threads 4
# With slides
python AudioTTo.py lecture.wav --slides slides.pdf
# With specific slide pages
python AudioTTo.py lecture.wav --slides slides.pdf --pages 1-15
# With slides and specific threads
python AudioTTo.py lecture.wav --slides slides.pdf --pages 1-15 --threads 4
All generated files are organized in the output/ directory:
output/
└── [Audio_Filename]/
    ├── [Audio_Filename]_trascrizione.txt   # Raw text transcript
    ├── [Audio_Filename]_appunti.tex        # Generated LaTeX source
    └── [Audio_Filename]_appunti.pdf        # Final compiled PDF
Intermediate files (chunks, noisy audio, logs) are automatically cleaned up.
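For scripted use, the CLI flags and the output layout described above can be combined in a small wrapper. This is a sketch under assumptions: `build_command` and `expected_outputs` are hypothetical helpers, and the code assumes that [Audio_Filename] is the audio file's name without its extension, which the layout suggests but does not state explicitly. Only the flags shown in the CLI examples (--slides, --pages, --threads) are used.

```python
import sys
from pathlib import Path


def build_command(audio, slides=None, pages=None, threads=None) -> list[str]:
    """Assemble an AudioTTo.py invocation from the documented CLI flags."""
    cmd = [sys.executable, "AudioTTo.py", str(audio)]
    if slides is not None:
        cmd += ["--slides", str(slides)]
        if pages is not None:
            # --pages only appears together with --slides in the examples.
            cmd += ["--pages", pages]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def expected_outputs(audio, output_root="output") -> dict:
    """Predict the output paths, assuming the folder is named after the audio file's stem."""
    stem = Path(audio).stem
    folder = Path(output_root) / stem
    return {
        "transcript": folder / f"{stem}_trascrizione.txt",
        "tex": folder / f"{stem}_appunti.tex",
        "pdf": folder / f"{stem}_appunti.pdf",
    }


if __name__ == "__main__":
    cmd = build_command("lecture.wav", slides="slides.pdf", pages="1-15", threads=4)
    print(" ".join(cmd))
    for kind, path in expected_outputs("lecture.wav").items():
        print(f"{kind}: {path}")
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the project root would execute the job; checking afterwards that the predicted PDF exists is a simple way to detect a failed LaTeX compilation.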
Contributions are welcome! Feel free to open issues or submit pull requests to improve AudioTTo.
If you find AudioTTo useful and want to support its development, consider making a small donation! Your support helps keep the project alive and improving.
This project is licensed under the MIT License.
Developed with ❤️ by Manumarzo


