skills/research/video-transcript/SKILL.md
Extract video transcripts: yt-dlp subtitles to clean paragraphs.
npx skillsauth add notque/claude-code-toolkit video-transcript
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Pull a video's transcript as readable paragraphs. Two paths, in order:
Path 1 — uploader subtitles (accurate, prefer when present):
yt-dlp --skip-download --write-subs --sub-langs en --sub-format vtt \
-o '<work-dir>/%(id)s' '<URL>'
Path 2 — auto-generated captions (fallback when path 1 writes no file):
yt-dlp --skip-download --write-auto-subs --sub-langs en --sub-format vtt \
-o '<work-dir>/%(id)s' '<URL>'
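The two-path order above could be scripted roughly as follows. This is a minimal sketch, not part of the skill: `build_command` and `fetch_vtt` are hypothetical helper names, and detecting success by looking for a newly written `.vtt` file in the work directory is an assumption about how "path 1 writes no file" would be checked.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def build_command(url: str, work_dir: str, auto: bool, lang: str = "en") -> List[str]:
    """Build the yt-dlp argument list for path 1 (auto=False) or path 2 (auto=True)."""
    subs_flag = "--write-auto-subs" if auto else "--write-subs"
    return [
        "yt-dlp", "--skip-download", subs_flag,
        "--sub-langs", lang, "--sub-format", "vtt",
        "-o", f"{work_dir}/%(id)s", url,
    ]


def fetch_vtt(url: str, work_dir: str,
              run: Callable = subprocess.run) -> Optional[Path]:
    """Try uploader subtitles first, then auto-captions.

    Returns the first new .vtt file written, or None if neither path wrote one.
    The work directory is assumed to exist already.
    """
    for auto in (False, True):
        before = set(Path(work_dir).glob("*.vtt"))
        # check=False: a missing subtitle track is not treated as fatal here.
        run(build_command(url, work_dir, auto), check=False)
        new_files = sorted(set(Path(work_dir).glob("*.vtt")) - before)
        if new_files:
            return new_files[0]
    return None
```

The `run` parameter exists only so the fallback logic can be exercised without network access; in normal use the default `subprocess.run` would be left in place.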
Then clean the VTT into paragraphs:
python3 skills/research/video-transcript/scripts/vtt_to_paragraph.py <work-dir>/<id>.en.vtt
Default output is plain paragraph text, with [Music]-style cue tags stripped and the rolling repeats that auto-captions produce collapsed. Use --timestamps to add [mm:ss] markers, --keep-brackets to keep cue tags, and -o FILE to write to a file.
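To illustrate what the cleaning step involves, here is a rough sketch of one way to collapse rolling auto-caption repeats. It is not the code in vtt_to_paragraph.py, which may work differently: `vtt_to_text` is a hypothetical name, and the approach (drop any text line identical to the previously kept one) is an assumption that covers the common rolling pattern only.

```python
import re
from typing import List

TAG_RE = re.compile(r"<[^>]+>")         # inline tags such as <c> or <00:00:01.000>
BRACKET_RE = re.compile(r"\[[^\]]*\]")  # cue tags such as [Music]


def vtt_to_text(vtt: str, keep_brackets: bool = False) -> str:
    """Flatten VTT cue text into one paragraph, skipping consecutive repeats."""
    kept: List[str] = []
    seen_timing = False
    for raw in vtt.splitlines():
        if "-->" in raw:
            # Timing line: marks the start of a cue; carries no transcript text.
            seen_timing = True
            continue
        if not seen_timing:
            continue  # header block (WEBVTT, Kind:, Language:)
        line = TAG_RE.sub("", raw)
        if not keep_brackets:
            line = BRACKET_RE.sub("", line)
        line = " ".join(line.split())  # normalize whitespace
        if not line:
            continue
        if kept and kept[-1] == line:
            continue  # rolling duplicate of the previous caption line
        kept.append(line)
    return " ".join(kept)
```

A sketch like this ignores cue identifiers and NOTE blocks, and it would not catch repeats that differ by a word or two, which is the kind of case the troubleshooting notes on unusual cue formats refer to.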
For other languages, change --sub-langs (e.g. de, en.*). List what a video offers with yt-dlp --list-subs '<URL>'.
Cause: video has no subtitles or captions in the requested language.
Solution: run yt-dlp --list-subs '<URL>' and pick an available language; if none exist, report that and offer audio transcription via the markdown-converter skill on a downloaded audio file.
Cause: platform rate-limiting the host.
Solution: wait and retry with --sleep-requests 2; keep request volume low.
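The wait-and-retry advice could be wrapped in a small helper along these lines. This is a sketch under assumptions: `run_with_retry` is a hypothetical name, the attempt count and wait times are illustrative values rather than anything the skill prescribes, and a non-zero exit code is treated as a sign that a retry is worthwhile.

```python
import subprocess
import time
from typing import Callable, List


def run_with_retry(cmd: List[str], attempts: int = 3, wait_seconds: float = 30.0,
                   run: Callable = subprocess.run,
                   sleep: Callable = time.sleep) -> bool:
    """Run a yt-dlp command with --sleep-requests 2, waiting longer after each failure."""
    # Insert the throttling flag right after the program name.
    throttled = [cmd[0], "--sleep-requests", "2", *cmd[1:]]
    for attempt in range(attempts):
        result = run(throttled, check=False)
        if result.returncode == 0:
            return True
        if attempt < attempts - 1:
            # Linear back-off: 1x, 2x, ... the base wait.
            sleep(wait_seconds * (attempt + 1))
    return False
```

Keeping `attempts` small fits the advice to keep request volume low; a tight retry loop would likely make throttling worse.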
Cause: VTT came from a third path (e.g. translated captions) with cue formats the dedupe misses.
Solution: rerun the cleaner; if repeats remain, file the sample VTT alongside a fix to vtt_to_paragraph.py.