stephenhsklarew/Zoom2GoogleTranscript

Zoom2GoogleTranscript

Automatically transcribe Zoom MP4 recordings to Google Docs using a local Whisper model, with Google Calendar integration and speaker identification. Zero ongoing costs: all transcription happens on your machine.

✨ Key Features

  • 🎥 Batch Processing - Transcribe multiple videos automatically
  • 📅 Calendar Integration - Automatically fetches meeting titles and attendees from Google Calendar
  • 👥 Speaker Identification - Uses calendar attendees for accurate speaker names
  • 💰 Zero Cost - Uses local Whisper AI (no API charges)
  • 📝 Google Meet Format - Creates properly formatted transcripts matching Google Meet style
  • 🔄 Progress Tracking - Rich terminal UI with progress bars
  • 🎯 Model Selection - Choose speed vs accuracy trade-off
  • 🤖 AI Speaker Diarization - Optional advanced speaker detection with pyannote.audio

🚀 Quick Start

Prerequisites

  1. Python 3.9+

  2. ffmpeg - Required for audio processing

    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg

  3. Google Cloud Project - For API access (see Setup Details)

Installation

# Clone the repository
git clone https://github.com/stephenhsklarew/Zoom2GoogleTranscript.git
cd Zoom2GoogleTranscript

# Install dependencies
pip install -r requirements.txt

# Authenticate with Google
python authenticate.py

Basic Usage

# Transcribe all videos in a folder
python video_transcriber.py /path/to/zoom/recordings

# Use a specific model
python video_transcriber.py /path/to/zoom/recordings --model medium

# Specify credentials
python video_transcriber.py /path/to/zoom/recordings --credentials token_video.pickle

📋 Output Format

The tool creates Google Docs transcripts in Google Meet format:

Dec 9, 2024
Steve/Karan/Stephen - Transcript
Attendees: karan.apatel, stephen.sklarew, steve.burden
00:00:00

karan.apatel: Hey everyone, thanks for joining...
stephen.sklarew: Great to be here. Let's discuss...
steve.burden: I'll start with the quarterly results...
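
The layout above can be reproduced with a few lines of string formatting. This is a minimal sketch and not the tool's actual code: the function name `format_transcript` and the alphabetical ordering of attendees are assumptions made for illustration.

```python
from datetime import date

def format_transcript(meeting_date, title, attendees, turns):
    """Build transcript text: date, title, attendee list, start timestamp, turns.

    ASSUMPTION: attendees are listed alphabetically, as in the sample above.
    """
    header = [
        # Avoid platform-specific strftime flags for the day without zero padding.
        f"{meeting_date.strftime('%b')} {meeting_date.day}, {meeting_date.year}",
        f"{title} - Transcript",
        f"Attendees: {', '.join(sorted(attendees))}",
        "00:00:00",
        "",
    ]
    body = [f"{speaker}: {text}" for speaker, text in turns]
    return "\n".join(header + body)

print(format_transcript(
    date(2024, 12, 9),
    "Steve/Karan/Stephen",
    ["steve.burden", "karan.apatel", "stephen.sklarew"],
    [("karan.apatel", "Hey everyone, thanks for joining...")],
))
```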

How It Works

  1. Extracts date/time from Zoom folder names (format: YYYY-MM-DD HH.MM.SS Meeting Name)
  2. Queries Google Calendar for matching events (±30 minute window)
  3. Extracts meeting details - title and attendee list
  4. Transcribes audio using Whisper AI
  5. Maps speakers to calendar attendees
  6. Creates formatted Google Doc with proper attribution
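
Steps 1 and 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the tool's implementation: the names `parse_zoom_folder` and `calendar_window` are hypothetical, and the real matching logic may differ.

```python
import re
from datetime import datetime, timedelta

# Zoom's default folder name: "YYYY-MM-DD HH.MM.SS Meeting Name"
FOLDER_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}\.\d{2}\.\d{2}) (.+)$")

def parse_zoom_folder(name):
    """Return (start_time, meeting_name), or None if the name does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    date_part, time_part, meeting_name = match.groups()
    start = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H.%M.%S")
    return start, meeting_name

def calendar_window(start, minutes=30):
    """Return the (earliest, latest) bounds of the search window around start."""
    delta = timedelta(minutes=minutes)
    return start - delta, start + delta

start, meeting = parse_zoom_folder("2024-12-09 10.31.25 Steve_Karan_Stephen")
earliest, latest = calendar_window(start)
print(meeting, earliest.isoformat(), latest.isoformat())
```

The two bounds could then serve as the `timeMin` and `timeMax` parameters of a Calendar API `events.list` request. That API expects RFC 3339 timestamps with a time zone offset, so a real implementation would attach the local time zone first.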

🎤 Speaker Identification

Method 1: Calendar-Based (Default)

Uses pause detection (>2 seconds) combined with calendar attendee names:

  • ✅ Zero setup required
  • ✅ Works immediately with calendar integration
  • ✅ Good for 2-3 person conversations
  • ⚠️ Less accurate for complex multi-speaker scenarios
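
To illustrate the idea behind pause detection, here is a minimal sketch. It assumes segments shaped like the `start`/`end`/`text` entries that Whisper returns, and it assumes speakers are rotated round-robin through the attendee list after each long pause. That rotation rule and the name `assign_speakers` are illustrative assumptions; the tool's real mapping may work differently.

```python
def assign_speakers(segments, attendees, pause_threshold=2.0):
    """Label each segment with a speaker, switching speakers after long pauses.

    ASSUMPTION: speakers rotate round-robin through `attendees`; this is a
    hypothetical rule chosen for illustration only.
    """
    if not attendees:
        attendees = ["Speaker"]
    labeled = []
    speaker_index = 0
    previous_end = None
    for segment in segments:
        # A silence longer than the threshold is treated as a change of speaker.
        if previous_end is not None and segment["start"] - previous_end > pause_threshold:
            speaker_index = (speaker_index + 1) % len(attendees)
        labeled.append((attendees[speaker_index], segment["text"].strip()))
        previous_end = segment["end"]
    return labeled

segments = [
    {"start": 0.0, "end": 3.5, "text": " Hey everyone, thanks for joining."},
    {"start": 3.9, "end": 6.0, "text": " Let's get started."},
    {"start": 8.6, "end": 11.0, "text": " Great to be here."},
]
for name, text in assign_speakers(segments, ["karan.apatel", "stephen.sklarew"]):
    print(f"{name}: {text}")
```

A rule this simple cannot tell who actually resumed speaking after a pause, which is why accuracy drops as the number of participants grows.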

Method 2: AI-Powered Diarization (Optional)

For advanced speaker detection with pyannote.audio:

  1. Get Hugging Face Token:

  2. Use with token:

    # Via environment variable (recommended)
    export HF_TOKEN=hf_your_token_here
    python video_transcriber.py /path/to/videos
    
    # Or via command line
    python video_transcriber.py /path/to/videos --hf-token hf_your_token_here
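
The two ways of supplying the token can be reconciled with a small helper. This is a sketch only: the name `resolve_hf_token` is hypothetical, and giving the command-line value precedence over `HF_TOKEN` is an assumption, not documented behavior.

```python
import os

def resolve_hf_token(cli_token=None, environ=None):
    """Return the Hugging Face token to use, or None if none was provided.

    ASSUMPTION: a token passed on the command line wins over HF_TOKEN.
    """
    environ = os.environ if environ is None else environ
    return cli_token or environ.get("HF_TOKEN") or None

token = resolve_hf_token()
if token is None:
    print("No token found: falling back to calendar-based speaker identification")
```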

Benefits of AI Diarization:

  • 🎯 More accurate speaker detection
  • 👥 Better for 3+ person meetings
  • 🔊 Analyzes voice characteristics, not just pauses
  • ✅ Still free (runs locally)

🎛️ Command Line Options

python video_transcriber.py <video_folder> [OPTIONS]

Required:
  video_folder              Path to folder containing Zoom recordings

Optional:
  --model MODEL            Whisper model: tiny, base, small, medium, large
                           (default: base)

  --no-recursive           Don't search subdirectories

  --folder-id ID           Google Drive folder ID to save documents

  --credentials PATH       Path to Google credentials file
                           (default: token_video.pickle)

  --hf-token TOKEN         Hugging Face token for speaker diarization
                           (can also use HF_TOKEN environment variable)

  --since DATE             Only process videos modified after this date
                           Format: YYYY-MM-DD or YYYY-MM-DD HH:MM:SS
                           Examples: 2024-12-01 or "2024-12-01 14:30:00"
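
The two accepted `--since` formats can be handled by trying each in turn. The sketch below is illustrative: `parse_since` and `modified_since` are hypothetical names, and comparing against the file's modification time in local time is an assumption about how the filter works.

```python
import os
from datetime import datetime

# Try the more specific format first, then the date-only form.
SINCE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")

def parse_since(value):
    """Parse a --since value in either accepted format."""
    for fmt in SINCE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Invalid --since value: {value!r}")

def modified_since(path, since):
    """Return True if the file was modified after `since` (local time)."""
    return datetime.fromtimestamp(os.path.getmtime(path)) > since

print(parse_since("2024-12-01"))
print(parse_since("2024-12-01 14:30:00"))
```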

📊 Model Comparison

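| Model  | Speed         | Accuracy | RAM  | Download Size |
|--------|---------------|----------|------|---------------|
| tiny   | ~32x realtime | Lowest   | 1GB  | ~75MB         |
| base   | ~16x realtime | Good ✅  | 1GB  | ~140MB        |
| small  | ~6x realtime  | Better   | 2GB  | ~460MB        |
| medium | ~2x realtime  | High     | 5GB  | ~1.5GB        |
| large  | ~1x realtime  | Best     | 10GB | ~3GB          |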

Recommendation: Start with the base model for a good balance of speed and quality.

Real-World Performance

MacBook Pro M1 (CPU only):

  • 30 min video with base model: ~2 minutes
  • 30 min video with medium model: ~4 minutes

🔧 Setup Details

1. Google Cloud Setup

  1. Go to https://console.cloud.google.com
  2. Create a new project
  3. Enable these APIs:
    • Google Docs API
    • Google Drive API
    • Google Calendar API (v3)
  4. Create OAuth 2.0 credentials:
    • Application type: "Desktop app"
    • Download as credentials.json
    • Place in project directory

2. Authentication

Run the authentication script once:

python authenticate.py

This will:

  • Open your browser for Google OAuth
  • Request permissions for Docs, Drive, and Calendar
  • Save credentials to token_video.pickle

The token is reused for all future transcriptions.

3. Zoom Recording Structure

For calendar integration to work, organize recordings in Zoom's default format:

Zoom/
├── 2024-12-09 10.31.25 Steve_Karan_Stephen/
│   └── video1487928882.mp4
├── 2024-12-02 15.00.18 Diane_Stephen Weekly 1_1/
│   └── video1683623283.mp4
└── ...

The folder name format YYYY-MM-DD HH.MM.SS Meeting Name is used to match calendar events.
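
Because each recording sits in its own subfolder, the scan has to descend into subdirectories unless `--no-recursive` is given. A minimal sketch of such a scan, assuming a hypothetical helper named `find_videos` (the tool's own discovery code may differ):

```python
import tempfile
from pathlib import Path

def find_videos(root, recursive=True):
    """Return sorted .mp4 paths under root; top level only when recursive is False."""
    pattern = "**/*.mp4" if recursive else "*.mp4"
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())

# Build a throwaway tree that mimics Zoom's layout, then scan it both ways.
with tempfile.TemporaryDirectory() as tmp:
    meeting = Path(tmp) / "2024-12-09 10.31.25 Steve_Karan_Stephen"
    meeting.mkdir()
    (meeting / "video1487928882.mp4").touch()
    print(len(find_videos(tmp)))                   # recursive scan finds 1 file
    print(len(find_videos(tmp, recursive=False)))  # top level only finds 0 files
```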

💡 Usage Examples

Example 1: Weekly Meeting Transcripts

# Transcribe all recordings in the Zoom folder
python video_transcriber.py ~/Documents/Zoom --model base

# Review transcripts in Google Docs
# Speaker names automatically pulled from calendar

Example 2: Client Call Archive

# Process all client recordings with better accuracy
python video_transcriber.py ~/Videos/ClientCalls \
  --model medium \
  --folder-id abc123xyz

# All transcripts organized in specific Drive folder

Example 3: Conference Recording

# Use AI speaker diarization for multi-speaker panel
# (subdirectories are searched by default; pass --no-recursive to disable)
export HF_TOKEN=hf_your_token
python video_transcriber.py ~/Conferences/2024 \
  --model medium

Example 4: Incremental Processing

# Only process videos modified after a given date
python video_transcriber.py ~/Documents/Zoom --since 2024-12-01

# Process videos from a specific date and time
python video_transcriber.py ~/Documents/Zoom --since "2024-12-01 14:30:00"

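# Useful for daily/weekly automation - only transcribe recordings from the last 7 days
# macOS/BSD date:
python video_transcriber.py ~/Documents/Zoom --since $(date -v-7d +%Y-%m-%d)
# GNU/Linux date:
python video_transcriber.py ~/Documents/Zoom --since $(date -d '7 days ago' +%Y-%m-%d)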

🔒 Security & Privacy

  • ✅ All AI processing is local - Videos are never sent to external servers
  • ✅ No OpenAI API calls - No audio or video leaves your machine; only the finished transcript is saved to your Google Drive
  • ✅ Google OAuth - Secure authentication flow
  • ✅ Minimal permissions - Only Docs/Drive/Calendar access
  • ✅ Token stored locally - credentials.json and token_video.pickle stay on your machine

🐛 Troubleshooting

"ffmpeg not found"

brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu/Debian
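
To confirm that ffmpeg is visible on your PATH before starting a long batch, a quick check with the standard library is enough. This is a sketch; `command_available` is a hypothetical helper, not part of the tool.

```python
import shutil

def command_available(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

if not command_available("ffmpeg"):
    print("ffmpeg not found: install it with your package manager first")
```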

"Calendar API has not been used"

Enable Calendar API in Google Cloud Console: https://console.cloud.google.com/apis/library/calendar-json.googleapis.com

"No calendar event found"

Check that:

  • Video folder follows Zoom naming: YYYY-MM-DD HH.MM.SS Meeting Name
  • Calendar event exists within ±30 minutes of recording time
  • Calendar API is enabled and authenticated

Slow processing

  • Use a smaller model (--model base or --model tiny)
  • A GPU is used automatically if one is available
  • Process overnight for large batches

Incorrect speaker names

  • Verify calendar event has attendees listed
  • Try AI diarization with --hf-token for better accuracy
  • Check that Zoom folder timestamp matches meeting time

📝 Cost Comparison

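| Solution                          | Cost       | Processing Speed       | Accuracy  |
|-----------------------------------|------------|------------------------|-----------|
| This Tool (Zoom2GoogleTranscript) | $0         | Local (2-10x realtime) | High      |
| Whisper API                       | $0.006/min | Very Fast              | High      |
| Google Speech-to-Text             | $0.016/min | Very Fast              | Medium    |
| Rev.ai                            | $1.50/min  | Fast                   | Very High |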

100 hours of video:

  • Zoom2GoogleTranscript: $0
  • Whisper API: $36
  • Google Speech-to-Text: $96
  • Rev.ai: $9,000
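
The totals above follow from multiplying the per-minute rate by 6,000 minutes (100 hours). A small sketch of the arithmetic, using only the rates quoted in this section:

```python
# Per-minute rates as quoted in the comparison above.
RATES_PER_MINUTE = {
    "Zoom2GoogleTranscript": 0.0,
    "Whisper API": 0.006,
    "Google Speech-to-Text": 0.016,
    "Rev.ai": 1.50,
}

def cost_for_hours(hours, rate_per_minute):
    """Return the cost in dollars, rounded to cents."""
    return round(hours * 60 * rate_per_minute, 2)

for name, rate in RATES_PER_MINUTE.items():
    print(f"{name}: ${cost_for_hours(100, rate):,.2f}")
```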

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Support for additional video formats (mov, avi, webm)
  • Parallel processing for faster batch jobs
  • Custom speaker name mapping
  • Integration with other calendar systems
  • Improved speaker diarization algorithms

📄 License

MIT License - Free for personal and commercial use.

👤 Author

Stephen Sklarew (@stephenhsklarew)

🙏 Acknowledgments

📞 Support

For issues or questions:
