skills/baoyu-youtube-transcript/SKILL.md
Downloads YouTube video transcripts/subtitles and cover images by URL or video ID. Supports multiple languages, translation, chapters, and speaker identification. Caches raw data for fast re-formatting. Use when user asks to "get YouTube transcript", "download subtitles", "get captions", "YouTube字幕", "YouTube封面", "视频封面", "video thumbnail", "video cover image", or provides a YouTube URL and wants the transcript/subtitle text or cover image extracted.
npx skillsauth add open-fox/agents baoyu-youtube-transcriptInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly and automatically falls back to yt-dlp when YouTube blocks the direct API path.
Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.
Scripts in scripts/ subdirectory. {baseDir} = this SKILL.md's directory path. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun. Replace {baseDir} and ${BUN_X} with actual values.
| Script | Purpose |
|--------|---------|
| scripts/main.ts | Transcript download CLI |
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>
# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja
# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps
# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters
# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers
# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt
# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans
# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list
# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh
| Option | Description | Default |
|--------|-------------|---------|
| <url-or-id> | YouTube URL or video ID (multiple allowed) | Required |
| --languages <codes> | Language codes, comma-separated, in priority order | en |
| --format <fmt> | Output format: text, srt | text |
| --translate <code> | Translate to specified language code | |
| --list | List available transcripts instead of fetching | |
| --timestamps | Include [HH:MM:SS → HH:MM:SS] timestamps per paragraph | on |
| --no-timestamps | Disable timestamps | |
| --chapters | Chapter segmentation from video description | |
| --speakers | Raw transcript with metadata for speaker identification | |
| --exclude-generated | Skip auto-generated transcripts | |
| --exclude-manually-created | Skip manually created transcripts | |
| --refresh | Force re-fetch, ignore cached data | |
| -o, --output <path> | Save to specific file path | auto-generated |
| --output-dir <dir> | Base output directory | youtube-transcript |
| Variable | Description |
|----------|-------------|
| YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER | Passed to yt-dlp --cookies-from-browser during fallback, e.g. chrome, safari, firefox, or chrome:Profile 1 |
Accepts any of these as video input:
https://www.youtube.com/watch?v=dQw4w9WgXcQhttps://youtu.be/dQw4w9WgXcQhttps://www.youtube.com/embed/dQw4w9WgXcQhttps://www.youtube.com/shorts/dQw4w9WgXcQdQw4w9WgXcQ| Format | Extension | Description |
|--------|-----------|-------------|
| text | .md | Markdown with frontmatter (incl. description), title heading, summary, optional TOC/cover/timestamps/chapters/speakers |
| srt | .srt | SubRip subtitle format for video players |
youtube-transcript/
├── .index.json # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
├── meta.json # Video metadata (title, channel, description, duration, chapters, etc.)
├── transcript-raw.json # Raw transcript snippets from YouTube API (cached)
├── transcript-sentences.json # Sentence-segmented transcript (split by punctuation, merged across snippets)
├── imgs/
│ └── cover.jpg # Video thumbnail
├── transcript.md # Markdown transcript (generated from sentences)
└── transcript.srt # SRT subtitle (generated from raw snippets, if --format srt)
{channel-slug}: Channel name in kebab-case{title-full-slug}: Full video title in kebab-caseThe --list mode outputs to stdout only (no file saved).
On first fetch, the script saves:
meta.json — video metadata, chapters, cover image path, language infotranscript-raw.json — raw transcript snippets from YouTube API ({ text, start, duration }[])transcript-sentences.json — sentence-segmented transcript ({ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]), split by sentence-ending punctuation (.?!…。?! etc.), timestamps proportionally allocated by character length, CJK-aware text mergingimgs/cover.jpg — video thumbnailSubsequent runs for the same video use cached data (no network calls). Use --refresh to force re-fetch. If a different language is requested, the cache is automatically refreshed.
When YouTube returns anti-bot / blocked responses on the direct InnerTube path, the script retries with alternate client identities and then falls back to yt-dlp if available. If fallback is needed but yt-dlp is unavailable, the agent should decide how to make yt-dlp available and continue rather than pushing the installation decision to the user.
SRT output (--format srt) is generated from transcript-raw.json. Text/markdown output uses transcript-sentences.json for natural sentence boundaries.
When user provides a YouTube URL and wants the transcript:
--list first if the user hasn't specified a language, to show available options? as a glob wildcard, so an unquoted YouTube URL causes "no matches found": use 'https://www.youtube.com/watch?v=ID'--chapters --speakers for the richest output (chapters + speaker identification)--speakers mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labelsWhen user only wants a cover image or metadata, running the script with any option will also cache meta.json and imgs/cover.jpg.
When re-formatting the same video (e.g., first text then SRT), the cached data is reused — no re-fetch needed.
--chapters)The script parses chapter timestamps from the video description (e.g., 0:00 Introduction), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves as .md with a Table of Contents. No further processing needed.
If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
--speakers)Speaker identification requires AI processing. The script outputs a raw .md file containing:
After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:
.md file{baseDir}/prompts/speaker-transcript.md**Speaker Name:** labels, paragraph grouping (2-4 sentences), and [HH:MM:SS → HH:MM:SS] timestamps.md file with the processed transcript (keep the YAML frontmatter)When --speakers is used, --chapters is implied — the processed output always includes chapter segmentation.
| Error | Meaning |
|-------|---------|
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |
| bot detected | The script retries alternate clients and then yt-dlp; if fallback tooling is missing, the agent should resolve that itself, otherwise if it still fails try YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER=safari (or your browser) |
development
Comprehensive Cloudflare platform skill covering Workers, Pages, storage (KV, D1, R2), AI (Workers AI, Vectorize, Agents SDK), feature flags (Flagship), networking (Tunnel, Spectrum), security (WAF, DDoS), and infrastructure-as-code (Terraform, Pulumi). Use for any Cloudflare development task. Biases towards retrieval from Cloudflare docs over pre-trained knowledge.
tools
Send and receive transactional emails with Cloudflare Email Service (Email Sending + Email Routing). Use when building email sending (Workers binding or REST API), email routing, Agents SDK email handling, or integrating email into any app — Workers, Node.js, Python, Go, etc. Also use for email deliverability, SPF/DKIM/DMARC, wrangler email setup, MCP email tools, or when a coding agent needs to send emails. Even for simple requests like "add email to my Worker" — this skill has critical config details.
tools
Build AI agents on Cloudflare Workers using the Agents SDK. Load when creating stateful agents, durable workflows, real-time WebSocket apps, scheduled tasks, MCP servers, chat applications, voice agents, or browser automation. Covers Agent class, state management, callable RPC, Workflows, durable execution, queues, retries, observability, and React hooks. Biases towards retrieval from Cloudflare docs over pre-trained knowledge.
development
Tell the agent to zoom out and give broader context or a higher-level perspective. Use when you're unfamiliar with a section of code or need to understand how it fits into the bigger picture.