This algorithm was created using Claude Opus 4.5.
Executive Summary
This document outlines the algorithm and architecture for the subsync CLI application's core feature: generating Netflix-compliant subtitles from YouTube videos. The system extracts audio from YouTube videos, transcribes the speech using AI, processes the transcription to meet professional subtitle standards, and outputs YouTube-compatible subtitle files.
Step 1: Input Validation
INPUT: youtube_url, language_code, [output_path], [quality]
PROCESS:
1. Parse URL to extract video ID
- Supported formats:
• https://www.youtube.com/watch?v=VIDEO_ID
• https://youtu.be/VIDEO_ID
• https://www.youtube.com/embed/VIDEO_ID
• https://www.youtube.com/v/VIDEO_ID
2. Validate language code against Whisper supported languages
- List of 99+ supported languages
- Auto-detection available if not specified
3. Validate output path (create if needed)
OUTPUT: Validated parameters object
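The URL shapes listed above can be handled with a small stdlib helper. `extract_video_id` is a hypothetical sketch, not the tool's actual parser; the regexes assume the standard 11-character YouTube video ID.

```python
import re

# One pattern per supported URL shape from Step 1.
_PATTERNS = [
    r"youtube\.com/watch\?v=([A-Za-z0-9_-]{11})",
    r"youtu\.be/([A-Za-z0-9_-]{11})",
    r"youtube\.com/embed/([A-Za-z0-9_-]{11})",
    r"youtube\.com/v/([A-Za-z0-9_-]{11})",
]

def extract_video_id(url: str) -> str:
    """Return the 11-character video ID, or raise ValueError."""
    for pattern in _PATTERNS:
        match = re.search(pattern, url)
        if match:
            return match.group(1)
    raise ValueError(f"Unrecognized YouTube URL: {url}")
```

Because `re.search` is used, trailing query parameters (`&t=42s`, playlists) do not break the match.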
Step 2: Video Metadata Extraction
INPUT: video_id
PROCESS (using yt-dlp):
1. Extract video information without downloading
- Title (for output filename)
- Duration (for progress estimation)
- Availability status
- Video ID confirmation
2. Check for restrictions:
- Age restriction
- Region locking
- Private/unlisted status
- Live stream (not supported)
OUTPUT: VideoMetadata {
id: str,
title: str,
duration: float,
uploader: str,
upload_date: str
}
ERRORS:
- VideoUnavailable: Video is private or deleted
- AgeRestricted: Video requires age verification
- LiveStream: Live streams not supported
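As a sketch of the restriction checks and the VideoMetadata record, the helper below operates on a plain `info` dict of the kind `yt_dlp.YoutubeDL.extract_info(url, download=False)` returns; `is_live`, `age_limit`, and `availability` are standard yt-dlp info-dict fields, while `check_restrictions` itself is a hypothetical function whose exception names mirror the ERRORS list.

```python
from dataclasses import dataclass

class VideoUnavailable(Exception): pass
class AgeRestricted(Exception): pass
class LiveStream(Exception): pass

@dataclass
class VideoMetadata:
    id: str
    title: str
    duration: float
    uploader: str
    upload_date: str

def check_restrictions(info: dict) -> VideoMetadata:
    """Validate a yt-dlp info dict and distill it into VideoMetadata."""
    if info.get("is_live"):
        raise LiveStream("Live streams are not supported")
    if info.get("age_limit", 0) >= 18:
        raise AgeRestricted("Video requires age verification")
    if info.get("availability") in ("private", "needs_auth"):
        raise VideoUnavailable("Video is private or deleted")
    return VideoMetadata(
        id=info["id"],
        title=info["title"],
        duration=float(info.get("duration") or 0.0),
        uploader=info.get("uploader", ""),
        upload_date=info.get("upload_date", ""),
    )
```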
Step 3: Audio Extraction
INPUT: VideoMetadata, temp_directory
PROCESS (using yt-dlp):
1. Configure extraction options:
- format: "bestaudio/best"
- extract_audio: True
- audio_format: "wav" # Best for Whisper
- audio_quality: 0 # Highest quality
- postprocessors: FFmpegExtractAudio
2. Download and extract audio
- Show progress callback
- Handle interruptions gracefully
3. Verify audio file integrity
- Check file exists
- Validate duration matches metadata
OUTPUT: audio_file_path (WAV format, 16kHz mono optimal)
CLEANUP: Register for deletion on completion/error
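A minimal sketch of the extraction configuration as a yt-dlp options dict: the keys used (`format`, `outtmpl`, `postprocessors`, `progress_hooks`, the `FFmpegExtractAudio` postprocessor) are real yt-dlp option names, while `build_audio_opts` is a hypothetical helper. The dict would be passed to `yt_dlp.YoutubeDL(opts)` before calling `download()`.

```python
def build_audio_opts(temp_directory: str, progress_hook=None) -> dict:
    """Build yt-dlp options matching the extraction settings above."""
    opts = {
        "format": "bestaudio/best",        # prefer the best audio-only stream
        "outtmpl": f"{temp_directory}/%(id)s.%(ext)s",
        "postprocessors": [{
            "key": "FFmpegExtractAudio",   # re-encode to WAV via ffmpeg
            "preferredcodec": "wav",
            "preferredquality": "0",       # 0 = highest quality
        }],
        "quiet": True,
    }
    if progress_hook is not None:
        opts["progress_hooks"] = [progress_hook]  # yt-dlp calls this per chunk
    return opts
```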
Step 4: Speech Transcription
INPUT: audio_file_path, language_code
PROCESS (using OpenAI Whisper):
1. Select model based on quality setting:
- "turbo": Fast, good accuracy (recommended)
- "large-v3": Best accuracy, slower
- "medium": Balanced
- "small"/"base"/"tiny": Faster, less accurate
2. Load model (cache for reuse)
3. Transcribe with options:
- language: specified or auto-detect
- task: "transcribe" (not translate)
- word_timestamps: True (granular timing)
- verbose: False
4. Extract segments with timing:
- Each segment has start, end, text
- Word-level timestamps for precise alignment
OUTPUT: TranscriptionResult {
language: str,
segments: [
{
id: int,
start: float, # seconds
end: float, # seconds
text: str,
words: [{word, start, end}, ...]
},
...
]
}
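The TranscriptionResult schema above can be mirrored with dataclasses. This is a sketch of the shape only; in the real pipeline these fields would be populated from the dict that Whisper's `transcribe(..., word_timestamps=True)` returns.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Word:
    word: str
    start: float
    end: float

@dataclass
class Segment:
    id: int
    start: float   # seconds
    end: float     # seconds
    text: str
    words: List[Word] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return self.end - self.start

@dataclass
class TranscriptionResult:
    language: str
    segments: List[Segment] = field(default_factory=list)
```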
Step 5: Netflix Compliance Processing
INPUT: TranscriptionResult
PROCESS:
FOR each segment in transcription:
1. TIMING VALIDATION:
┌─────────────────────────────────────────────┐
│ Check duration = end - start │
│ │
│ IF duration < 833ms (5/6 second): │
│ - Extend end time if no conflict │
│ - Minimum: 833ms │
│ │
│ IF duration > 7000ms (7 seconds): │
│ - Split into multiple segments │
│ - Find natural break points │
│ │
│ ENSURE gap from previous >= 83ms (2 frames) │
└─────────────────────────────────────────────┘
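The timing rules above can be sketched as a pure function over `(start, end)` pairs in seconds. Splitting over-long events is omitted here because it needs word timestamps and a break-point search; `enforce_timing` is a hypothetical helper.

```python
MIN_DURATION = 0.833   # 5/6 second: minimum event duration
MAX_DURATION = 7.0     # maximum event duration (splitting not shown)
MIN_GAP = 0.083        # 2 frames at ~24 fps

def enforce_timing(start: float, end: float, prev_end: float = 0.0):
    """Return (start, end) adjusted for the minimum duration and frame gap."""
    if prev_end:
        # Keep at least a 2-frame gap from the previous subtitle.
        start = max(start, prev_end + MIN_GAP)
    if end - start < MIN_DURATION:
        # Extend short events; a full implementation would first check
        # that this does not collide with the next subtitle's start.
        end = start + MIN_DURATION
    return start, end
```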
2. TEXT SEGMENTATION:
┌─────────────────────────────────────────────┐
│ Count characters in text │
│ │
│ IF chars <= 42: │
│ - Single line, no split needed │
│ │
│ IF 42 < chars <= 84: │
│ - Split into 2 lines │
│ - Apply line break rules (see below) │
│ │
│ IF chars > 84: │
│ - Split into multiple subtitle events │
│ - Use word timestamps for timing │
└─────────────────────────────────────────────┘
3. LINE BREAK RULES (in priority order):
┌─────────────────────────────────────────────┐
│ PREFER breaking: │
│ ✓ After punctuation marks (. , ! ? :) │
│ ✓ Before conjunctions (and, but, or) │
│ ✓ Before prepositions (in, on, at, to) │
│ │
│ AVOID breaking: │
│ ✗ Between article and noun │
│ ✗ Between adjective and noun │
│ ✗ Between first and last name │
│ ✗ Between verb and subject pronoun │
│ ✗ Between verb and auxiliary │
└─────────────────────────────────────────────┘
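The two-line split and the break preferences above can be combined into one scoring search over word boundaries: every legal split is scored, with bonuses for breaking after punctuation or before a connective and a penalty for unbalanced lines. `split_two_lines` is a hypothetical sketch, and the connective lists are illustrative, not exhaustive.

```python
MAX_LINE = 42
CONJUNCTIONS = {"and", "but", "or", "nor", "so", "yet"}
PREPOSITIONS = {"in", "on", "at", "to", "for", "with", "from"}

def split_two_lines(text: str) -> list[str]:
    """Split text into at most two lines of <= MAX_LINE characters."""
    if len(text) <= MAX_LINE:
        return [text]
    words = text.split()
    best, best_score = None, None
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > MAX_LINE or len(second) > MAX_LINE:
            continue                              # illegal split
        score = abs(len(first) - len(second))     # prefer balanced lines
        if first[-1] in ".,!?:":
            score -= 20                           # break after punctuation
        if words[i].lower() in CONJUNCTIONS | PREPOSITIONS:
            score -= 10                           # break before connectives
        if best_score is None or score < best_score:
            best, best_score = [first, second], score
    return best if best else [text]               # >84 chars: caller must split events
```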
4. READING SPEED VALIDATION:
┌─────────────────────────────────────────────┐
│ Calculate CPS = total_chars / duration_secs │
│ │
│ IF CPS > 20 (adult content): │
│ Option A: Extend duration (if space) │
│ Option B: Flag for manual review │
│ │
│ IF CPS > 17 (children's content): │
│ Apply stricter limits │
└─────────────────────────────────────────────┘
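The reading-speed check reduces to a few lines. Note that style guides differ on whether spaces count toward the character total; this sketch includes them.

```python
ADULT_CPS = 20.0   # characters per second, adult content
CHILD_CPS = 17.0   # characters per second, children's content

def reading_speed_ok(text: str, duration: float, children: bool = False) -> bool:
    """True if the subtitle can be read within its on-screen duration."""
    cps = len(text) / duration
    return cps <= (CHILD_CPS if children else ADULT_CPS)
```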
5. CREATE SUBTITLE EVENT:
- Assign sequential index (1, 2, 3...)
- Format times as HH:MM:SS,mmm
- Store processed text with line breaks
OUTPUT: List[Subtitle] ready for output
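The HH:MM:SS,mmm formatting used when creating subtitle events can be sketched as:

```python
def format_timestamp(seconds: float, decimal_sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (pass decimal_sep="." for VTT)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"
```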
Step 6: Output Generation
INPUT: List[Subtitle], output_path, format
PROCESS:
FOR SRT FORMAT:
┌────────────────────────────────────────┐
│ 1 │
│ 00:00:00,000 --> 00:00:02,500 │
│ Hello everyone and welcome │
│ to today's video. │
│ │
│ 2 │
│ 00:00:02,600 --> 00:00:05,000 │
│ Today we're going to discuss │
│ something very important. │
└────────────────────────────────────────┘
- Index number
- Timestamp line: START --> END
- Text (1-2 lines)
- Blank line separator
FOR VTT FORMAT (optional):
┌────────────────────────────────────────┐
│ WEBVTT │
│ │
│ 00:00:00.000 --> 00:00:02.500 │
│ Hello everyone and welcome │
│ to today's video. │
│ │
│ 00:00:02.600 --> 00:00:05.000 │
│ Today we're going to discuss │
│ something very important. │
└────────────────────────────────────────┘
- Header: "WEBVTT"
- Timestamp uses . instead of ,
- No index numbers (optional)
OUTPUT: Subtitle file(s) written to disk
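Since the two formats differ only in the header, the decimal separator, and the optional cue numbers, one serializer can cover both. `render` is a hypothetical sketch that assumes `subtitles` is a list of `(start, end, lines)` tuples with times in seconds.

```python
def _ts(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    millis = round(seconds * 1000)
    h, rem = divmod(millis, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

def render(subtitles, fmt: str = "srt") -> str:
    """Serialize subtitle events to SRT or VTT text."""
    sep = "," if fmt == "srt" else "."
    blocks = ["WEBVTT"] if fmt == "vtt" else []
    for index, (start, end, lines) in enumerate(subtitles, start=1):
        header = [str(index)] if fmt == "srt" else []  # VTT cue numbers optional
        blocks.append("\n".join(header
                                + [f"{_ts(start, sep)} --> {_ts(end, sep)}"]
                                + list(lines)))
    return "\n\n".join(blocks) + "\n"
```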
This document serves as the architectural blueprint for the subsync subtitle generation feature. Implementation should follow these specifications to ensure Netflix compliance and YouTube compatibility.