Transform YouTube tutorials into AI agent skills that actually work.
YouTube contains millions of programming tutorials, but this knowledge is:
- Inaccessible to AI assistants — AI agents cannot watch videos
- Time-consuming to extract — A 10-minute video takes 10 minutes to watch
- Often outdated — APIs change, libraries deprecate, best practices evolve
- Not actionable — Narrative format ("so what we're gonna do...") ≠ executable instructions
AI agent skills extend an agent's capabilities. However, creating effective skills requires understanding:
- How agents discover skills (via description matching)
- The difference between narrative and imperative content
- The current state of the technologies mentioned
Without this knowledge, generated skills are often:
- Never activated (poor descriptions)
- Unusable (incomplete instructions)
- Harmful (outdated information)
This skill automates the transformation pipeline:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ EXTRACT │────▶│ IDENTIFY │────▶│ VERIFY │────▶│ TRANSFORM │────▶│ GENERATE │
│ transcript │ │technologies │ │ up-to-date │ │ content │ │ skill │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
Pulls transcripts from YouTube using the captions API (with Whisper fallback):
python scripts/extract_youtube.py "https://youtube.com/watch?v=..."

Output: JSON with metadata (title, channel, date) and full transcript.
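To make the shape of that output concrete, here is a minimal sketch of two pieces such an extractor needs: pulling the video ID out of a URL and assembling the JSON document. It is not the code in `scripts/extract_youtube.py`; the function names, the JSON field names, and the `segments` structure (a list of dicts with a `text` key) are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return the video ID from a watch, short-link, embed, or shorts URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    video_id = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Paths of the form /embed/<id> or /shorts/<id>.
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
                video_id = parts[1]

    if not video_id:
        raise ValueError(f"Could not find a video ID in {url!r}")
    return video_id


def build_output(video_id, title, channel, date, segments):
    """Assemble metadata and the joined transcript into a JSON string.

    The field names here are hypothetical, not the script's real schema.
    """
    texts = (segment["text"].strip() for segment in segments)
    transcript = " ".join(text for text in texts if text)
    document = {
        "video_id": video_id,
        "title": title,
        "channel": channel,
        "date": date,
        "transcript": transcript,
    }
    return json.dumps(document, indent=2)
```

Keeping URL parsing separate from the network call makes the part most likely to break on unusual input easy to test offline.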
Scans transcript for technologies:
- Languages and versions
- Frameworks and libraries
- APIs and services
- CLI tools
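A keyword scan is one simple way to approximate this step. The sketch below is an assumption about how identification could work, not the skill's actual method: `KNOWN_TECHNOLOGIES` is a hypothetical, deliberately tiny watch list, and the version pattern only recognises dotted numbers that directly follow a name.

```python
import re

# Hypothetical watch list for illustration; a real list would be far larger.
KNOWN_TECHNOLOGIES = {
    "language": ["Python", "TypeScript", "Rust"],
    "framework": ["React", "Django", "Next.js"],
    "service": ["Stripe API", "GitHub Actions"],
    "cli": ["git", "docker", "npm"],
}


def find_technologies(transcript: str) -> list[dict]:
    """Return the first mention of each known technology, with its version if stated."""
    found = []
    for category, names in KNOWN_TECHNOLOGIES.items():
        for name in names:
            # Match the name as a whole token, optionally followed by a
            # version such as "3.11" or "v18".
            pattern = re.compile(
                r"(?<![\w.])" + re.escape(name) + r"(?!\w)(?:\s+v?(\d+(?:\.\d+)*))?",
                re.IGNORECASE,
            )
            match = pattern.search(transcript)
            if match:
                found.append(
                    {"name": name, "category": category, "version": match.group(1)}
                )
    return found
```

Spoken transcripts often render versions as words ("three point eleven"), so a scan like this misses some mentions and is best treated as a first pass.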
Videos get outdated. Skills must not.
Uses available tools to verify current state:
| Source | Tool | Use Case |
|---|---|---|
| Library docs | Context7 MCP[^1] | React, Node.js, Python packages |
| General search | WebSearch | Tools, services, APIs |
| Specific docs | WebFetch | Official documentation URLs |
When outdated content is detected, the user is asked how to proceed:
- Update to current version
- Keep original with warnings
- Skip the section
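The three choices above can be modelled as a small decision function. This is a sketch under stated assumptions: the `Resolution` enum, the function names, and the warning wording are hypothetical, and the version comparison only handles purely numeric dotted versions (a library such as `packaging` would be the more robust choice for real version strings).

```python
from enum import Enum
from typing import Optional


class Resolution(Enum):
    UPDATE = "update"
    KEEP_WITH_WARNING = "keep"
    SKIP = "skip"


def parse_version(text: str) -> tuple:
    """Turn '3.11' into (3, 11). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def is_outdated(mentioned: str, current: str) -> bool:
    return parse_version(mentioned) < parse_version(current)


def resolve_section(
    section: str, name: str, mentioned: str, current: str, choice: Resolution
) -> Optional[str]:
    """Apply the user's choice to a section that mentions an old version."""
    if not is_outdated(mentioned, current):
        return section
    if choice is Resolution.UPDATE:
        # Only the version label is rewritten; any commands in the section
        # would still need manual review.
        return section.replace(f"{name} {mentioned}", f"{name} {current}")
    if choice is Resolution.KEEP_WITH_WARNING:
        warning = (
            f"> Warning: the video uses {name} {mentioned}; "
            f"the current version is {current}."
        )
        return f"{warning}\n\n{section}"
    # Resolution.SKIP: signal that the section should be left out.
    return None
```

For example, `resolve_section("Install Python 3.9 first.", "Python", "3.9", "3.11", Resolution.UPDATE)` rewrites the version in place, while `Resolution.SKIP` returns `None` so the caller can omit the section.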
Converts narrative video content to actionable skill instructions:
| Video Content | Skill Content |
|---|---|
| "So basically what we're doing..." | (removed) |
| "You want to run this command..." | Fenced `bash` code block containing the command |
| "The important thing here is..." | Prerequisites section |
| "If you get this error..." | Troubleshooting section |
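The mapping in the table can be approximated with phrase rules. The sketch below is illustrative only: in the skill itself the agent performs this transformation, and the `RULES` patterns, the `classify` function, and the `"drop"` and `"unclassified"` labels are assumptions rather than part of the project.

```python
import re

DROP = "drop"
UNCLASSIFIED = "unclassified"

# Each rule pairs a phrase pattern with the skill section it feeds.
# Rules are checked in order and the first match wins.
RULES = [
    (re.compile(r"^\s*so basically\b", re.IGNORECASE), DROP),
    (re.compile(r"\byou want to run\b", re.IGNORECASE), "Commands"),
    (re.compile(r"\bthe important thing\b", re.IGNORECASE), "Prerequisites"),
    (re.compile(r"\bif you get (?:this|an) error\b", re.IGNORECASE), "Troubleshooting"),
]


def classify(sentence: str) -> str:
    """Return the target section for a sentence, DROP, or UNCLASSIFIED."""
    for pattern, section in RULES:
        if pattern.search(sentence):
            return section
    return UNCLASSIFIED


def group_by_section(sentences: list[str]) -> dict[str, list[str]]:
    """Group sentences by target section, discarding filler."""
    grouped: dict[str, list[str]] = {}
    for sentence in sentences:
        section = classify(sentence)
        if section != DROP:
            grouped.setdefault(section, []).append(sentence)
    return grouped
```

Sentences that match no rule are kept under `"unclassified"` rather than discarded, so nothing substantive is lost before review.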
Produces a SKILL.md following the standard skill format:
---
name: lowercase-with-hyphens
description: Action verb + what + when to use (max 1024 chars)
---

youtube-to-skill/ # github.com/yfe404/youtube-to-skill
├── SKILL.md # Main skill (174 lines)
│ # - Workflow overview
│ # - Tool integration (Context7, WebSearch)
│ # - Quality checklist
│
├── reference.md # Deep knowledge (289 lines)
│ # - How agents discover skills
│ # - Transformation patterns
│ # - Common mistakes
│
├── templates/
│ └── skill_template.md # Output template (94 lines)
│
├── scripts/
│ ├── extract_youtube.py # Transcript extractor (337 lines)
│ └── requirements.txt # Dependencies
│
└── examples/
    └── good_vs_bad.md # Contrasting examples (354 lines)
Total: 1,250 lines of skill content
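The SKILL.md front matter shown earlier (a lowercase hyphenated name and a description of at most 1024 characters) lends itself to a small check before the file is written. This is a sketch, not the project's code: `render_front_matter` is a hypothetical helper, and allowing digits in the name is an assumption beyond what the format line states.

```python
import re

# Lowercase words separated by single hyphens. Allowing digits is an assumption.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_DESCRIPTION_CHARS = 1024


def render_front_matter(name: str, description: str) -> str:
    """Validate the two fields and return the front-matter block."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"name must be lowercase-with-hyphens, got {name!r}")

    # Collapse whitespace so the description stays on a single line.
    description = " ".join(description.split())
    if not description:
        raise ValueError("description must not be empty")
    if len(description) > MAX_DESCRIPTION_CHARS:
        raise ValueError(
            f"description is {len(description)} characters; "
            f"the limit is {MAX_DESCRIPTION_CHARS}"
        )

    # No YAML escaping is applied here: a description containing ': ' or
    # starting with a special character would need quoting.
    return f"---\nname: {name}\ndescription: {description}\n---\n"
```

Failing loudly on a bad name or an over-long description is deliberate, since a skill with malformed front matter may never be discovered by the agent.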
# Install the skill
npx skills add yfe404/youtube-to-skill
# Install dependencies
pip install youtube-transcript-api yt-dlp

Restart your AI agent to load the skill.
Start a new session and request:
Create a skill from this YouTube video: https://youtube.com/watch?v=...
The agent will:
- Extract the transcript
- Identify technologies
- Verify current state (ask about updates)
- Ask about enrichment from official docs
- Generate the skill
- Ask where to save it
| Package | Purpose | Install |
|---|---|---|
| youtube-transcript-api | Caption extraction | pip install youtube-transcript-api |
| yt-dlp | Metadata + fallback | pip install yt-dlp |
| openai-whisper | Audio transcription (optional) | pip install openai-whisper |
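Because Whisper is optional, it can help to check which packages are importable before choosing an extraction path. The sketch below assumes the import names `youtube_transcript_api`, `yt_dlp`, and `whisper` for the three packages; the strategy labels, and the idea that the Whisper path needs `yt-dlp` to fetch audio, are assumptions for illustration rather than the script's documented behaviour.

```python
from importlib.util import find_spec

# pip package name -> importable module name
PACKAGES = {
    "youtube-transcript-api": "youtube_transcript_api",
    "yt-dlp": "yt_dlp",
    "openai-whisper": "whisper",
}


def installed(packages: dict = PACKAGES) -> dict:
    """Report which packages can be imported in this environment."""
    return {
        pip_name: find_spec(module) is not None
        for pip_name, module in packages.items()
    }


def pick_strategy(available: dict, has_captions: bool) -> str:
    """Choose 'captions', 'whisper', or 'unavailable' for one video."""
    if has_captions and available.get("youtube-transcript-api"):
        return "captions"
    # Assumption: the Whisper path needs yt-dlp to download the audio first.
    if available.get("yt-dlp") and available.get("openai-whisper"):
        return "whisper"
    return "unavailable"
```

Calling `pick_strategy(installed(), has_captions=False)` on a machine without `openai-whisper` returns `"unavailable"`, which is a reasonable point to tell the user to install the optional dependency.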
- Requires captions: Videos without captions fall back to Whisper, which is slower and needs a GPU to run at a practical speed
- English-focused: Translation available but may reduce accuracy
- Single video or playlist: Does not process channel-wide content
- Verification depends on tools: Context7 coverage varies by library
License: MIT
[^1]: Context7 MCP is a library documentation retrieval tool that provides up-to-date API references and code examples.