skills/transcribing-youtube/SKILL.md
Downloads YouTube videos, transcribes audio via OpenAI Whisper, and produces summaries stored locally. Covers yt-dlp download, audio extraction, transcription, caching, and summarization. Use when a YouTube link is shared and the user wants a transcript or summary
npx skillsauth add riccardogrin/skills transcribing-youtubeInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Download audio from YouTube videos using yt-dlp, transcribe with OpenAI Whisper API, and summarize the content.
Transcriptions and summaries are cached in a transcriptions/ folder to avoid redundant API calls.
| File | Read When |
|------|-----------|
| references/installation.md | Phase 0 reports a missing dependency (yt-dlp, ffmpeg, or API key) |
| references/troubleshooting.md | Download fails, transcription errors, codec issues, or rate limiting |
OPENAI_API_KEY, yt-dlp, ffmpeg. Phase 0 checks all three and references/installation.md covers install steps if anything is missing.
All scripts/ paths are relative to the skill directory — resolve to absolute paths before running.
- [ ] Phase 0: Verify prerequisites
- [ ] Phase 1: Check cache for existing transcription
- [ ] Phase 2: Download audio
- [ ] Phase 3: Transcribe audio
- [ ] Phase 4: Summarize transcription
- [ ] Phase 5: Save and report
Run these checks. If anything is MISSING, read references/installation.md and install it before proceeding.
python -c "
import os, shutil
from pathlib import Path
key = os.environ.get('OPENAI_API_KEY', '')
if not key:
env = Path('.env')
if env.exists():
for line in env.read_text().splitlines():
if line.strip().startswith('OPENAI_API_KEY'):
key = line.partition('=')[2].strip().strip('\"').strip(\"'\")
print('OPENAI_API_KEY:', 'OK' if key else 'MISSING')
print('yt-dlp:', 'OK' if shutil.which('yt-dlp') else 'MISSING')
print('ffmpeg:', 'OK' if shutil.which('ffmpeg') else 'MISSING')
"
Then ensure dependencies and output directory are ready:
pip install -r <skill-dir>/scripts/requirements.txt
mkdir -p transcriptions
Before downloading anything, check if this video has already been transcribed.
The download script extracts the video ID from the URL. Check for existing files:
ls transcriptions/<VIDEO_ID>_transcript.txt 2>/dev/null && echo "CACHED" || echo "NEW"
Extract the video ID from common URL formats:
https://www.youtube.com/watch?v=VIDEO_ID — the v parameterhttps://youtu.be/VIDEO_ID — the path segmenthttps://www.youtube.com/shorts/VIDEO_ID — the path after shorts/If cached:
transcriptions/<VIDEO_ID>_transcript.txttranscriptions/<VIDEO_ID>_summary.txt (if it exists)Download audio only (no video) to minimize bandwidth and storage.
python scripts/download_audio.py --url "YOUTUBE_URL" --output-dir transcriptions
The script:
<VIDEO_ID>.mp3 (or <VIDEO_ID>_chunk_001.mp3 etc. if split)Output format:
VIDEO_ID: dQw4w9WgXcQ
TITLE: Rick Astley - Never Gonna Give You Up
DURATION: 213
FILES: transcriptions/dQw4w9WgXcQ.mp3
STATUS: OK
If the video is longer than 3 hours, warn the user — transcription will be expensive and slow. Ask for confirmation before proceeding.
Transcribe the downloaded audio using OpenAI's Whisper API.
python scripts/transcribe_audio.py --input transcriptions/<VIDEO_ID>.mp3 --output transcriptions/<VIDEO_ID>_transcript.txt
The script:
whisper-1 model)--timestamps flag is passedFor chunked audio (files that were split in Phase 2):
python scripts/transcribe_audio.py --input-dir transcriptions --video-id <VIDEO_ID> --output transcriptions/<VIDEO_ID>_transcript.txt
This mode auto-discovers all <VIDEO_ID>_chunk_*.mp3 files and transcribes them in order.
Output:
WORDS: 3847
DURATION_SECONDS: 213
OUTPUT: transcriptions/dQw4w9WgXcQ_transcript.txt
STATUS: OK
After transcription completes, clean up audio files to save disk space:
rm transcriptions/<VIDEO_ID>.mp3 transcriptions/<VIDEO_ID>_chunk_*.mp3 2>/dev/null
Read the transcript file and produce a summary. This is done by you (the agent), not a script.
Audience: Write for a smart, knowledgeable reader. Don't pad with filler or restate the obvious. Focus on genuinely valuable insights, novel arguments, and actionable information. Omit trivial details, pleasantries, sponsor reads, advertisements, and anything a thoughtful person could infer from context. If the video references specific tools, websites, repos, or resources that viewers would find useful, mention them. For tutorials and how-to content, preserve specific values, thresholds, and step sequences — these are the primary value.
transcriptions/<VIDEO_ID>_transcript.txttranscriptions/<VIDEO_ID>_summary.txtSummary format:
# Video Summary: <TITLE>
**Source:** <YOUTUBE_URL>
**Duration:** <DURATION>
**Transcribed:** <DATE>
**Words:** <WORD_COUNT>
## Key Points
- Point 1
- Point 2
- ...
## Summary
<paragraphs scaled to content length>
## Notable Quotes
> "Quote 1"
> "Quote 2"
transcriptions/<VIDEO_ID>_transcript.txt — full transcripttranscriptions/<VIDEO_ID>_summary.txt — summary| Avoid | Do Instead |
|-------|------------|
| Downloading full video | Download audio only (-x flag) — much smaller |
| Sending files over 25MB to Whisper | Split into chunks before transcribing |
| Re-downloading already transcribed videos | Check cache by video ID first |
| Keeping audio files after transcription | Delete WAV files to save disk space |
| Transcribing 3+ hour videos without asking | Warn user about cost and get confirmation |
| Using youtube-dl (unmaintained) | Use yt-dlp (actively maintained fork) |
| Hardcoding API keys in scripts | Load from env var or .env file |
| Summarizing without reading the full transcript | Always read the complete transcript before summarizing |
| Generating summary in a script | Use the agent (Claude) for summarization — better quality |
This skill requires yt-dlp, openai, and ffmpeg. Phase 0 checks availability; references/installation.md handles the rest. The only manual step for the user is providing OPENAI_API_KEY.
development
Runs an adversarial code review loop that spawns independent reviewer and fixer subagents, iterating until only nitpicks remain. Scores findings by confidence, fixes real issues, and re-reviews with fresh eyes — all internal, no GitHub comments. Use when asked to review code, self-review, adversarial review, or polish code before pushing
development
Creates implementation-ready plans through discovery interviews, external research, and codebase analysis. Covers requirements, competitor research, architecture decisions, and change sequencing. Use when planning features, roadmaps, specs, or any work that needs discovery before coding
development
Generates an autonomous game design loop that iteratively expands a game concept into a comprehensive vision and implementation plan across multiple sessions. Covers mechanic exploration, system design, competitor research, and plan generation. Use when developing a game idea from seed concept to full implementation plan
testing
Generates an autonomous implementation loop that executes tasks from a plan across Claude sessions, with periodic audit passes that inject follow-up tasks. Covers loop script, prompt design, and audit cadence. Use when setting up autonomous task execution or Ralph-style iterative workflows