skills/video/SKILL.md
# Video & Podcast Digest Skill > Send a video/podcast link → get full transcript + structured summary ## Supported Platforms | Platform | Type | Subtitles | Whisper Transcription | |----------|------|-----------|----------------------| | YouTube | Video | ✅ | ✅ | | Bilibili | Video | ✅ | ✅ | | X/Twitter | Video | ❌ | ✅ | | Xiaoyuzhou (小宇宙) | Podcast | ❌ | ✅ | | Apple Podcasts | Podcast | ❌ | ✅ | | Direct links (mp3/mp4/m3u8) | Any | ❌ | ✅ | ## Trigger Auto-triggered when a media URL is dete
npx skillsauth add runesleo/x-reader skills/videoInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Send a video/podcast link → get full transcript + structured summary
| Platform | Type | Subtitles | Whisper Transcription | |----------|------|-----------|----------------------| | YouTube | Video | ✅ | ✅ | | Bilibili | Video | ✅ | ✅ | | X/Twitter | Video | ❌ | ✅ | | Xiaoyuzhou (小宇宙) | Podcast | ❌ | ✅ | | Apple Podcasts | Podcast | ❌ | ✅ | | Direct links (mp3/mp4/m3u8) | Any | ❌ | ✅ |
Auto-triggered when a media URL is detected:
youtube.com, youtu.bebilibili.com, b23.tvx.com, twitter.com (tweets with video)xiaoyuzhoufm.compodcasts.apple.com.mp3, .mp4, .m3u8, .m4a, .webm| URL Pattern | Type | Pipeline |
|-------------|------|----------|
| xiaoyuzhoufm.com/episode/ | Podcast | → Step 1b (Xiaoyuzhou) |
| podcasts.apple.com | Podcast | → Step 1c (Apple) |
| bilibili.com, b23.tv | Video | → Step 1d (Bilibili API) |
| .mp3, .m4a direct link | Audio | → Step 2b (direct download) |
| Other | Video | → Step 1a (subtitle extraction) |
# Clean up temp files
rm -f /tmp/media_sub*.vtt /tmp/media_audio.mp3 /tmp/media_transcript*.json /tmp/media_segment_*.mp3 2>/dev/null || true
# YouTube (prefer English, fallback Chinese)
yt-dlp --skip-download --write-auto-sub --sub-lang "en,zh-Hans" -o "/tmp/media_sub" "VIDEO_URL"
# Bilibili
yt-dlp --skip-download --write-auto-sub --sub-lang "zh-Hans,zh" -o "/tmp/media_sub" "VIDEO_URL"
Check for subtitles:
ls /tmp/media_sub*.vtt 2>/dev/null
# Extract CDN direct link from __NEXT_DATA__
# Xiaoyuzhou is a Next.js SPA, but initial HTML contains audio URL in __NEXT_DATA__
AUDIO_URL=$(curl -sL -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \
"EPISODE_URL" \
| grep -oE 'https://media\.xyzcdn\.net/[^"]+\.(m4a|mp3)' \
| head -1)
echo "Audio URL: $AUDIO_URL"
# Download audio
curl -L -o /tmp/media_audio.mp3 "$AUDIO_URL"
If curl extraction is empty (rare), fallback: use Puppeteer/browser to get rendered page and extract.
→ Step 2b (check size & transcribe)
yt-dlp -f "ba[ext=m4a]/ba/b" --extract-audio --audio-format mp3 --audio-quality 5 \
-o "/tmp/media_audio.%(ext)s" "APPLE_PODCAST_URL"
→ Step 2b (check size & transcribe)
yt-dlp returns 412 for Bilibili even with cookies. Use Bilibili's API instead:
# 1. Extract BV number from URL
BV="BV1xxxxx" # Replace with actual BV number
# 2. Get video info (title, duration, CID)
curl -s "https://api.bilibili.com/x/web-interface/view?bvid=$BV" \
-H "User-Agent: Mozilla/5.0" -H "Referer: https://www.bilibili.com/" \
| python3 -c "import json,sys; d=json.load(sys.stdin)['data']; print(f\"Title: {d['title']}\nDuration: {d['duration']}s\nCID: {d['cid']}\")"
# 3. Get audio stream URL
CID=<CID from previous step>
AUDIO_URL=$(curl -s "https://api.bilibili.com/x/player/playurl?bvid=$BV&cid=$CID&fnval=16&qn=64" \
-H "User-Agent: Mozilla/5.0" -H "Referer: https://www.bilibili.com/" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['data']['dash']['audio'][0]['baseUrl'])")
# 4. Download audio (Referer header required, otherwise 403)
curl -L -o /tmp/media_audio.m4s \
-H "User-Agent: Mozilla/5.0" -H "Referer: https://www.bilibili.com/" "$AUDIO_URL"
# 5. Convert to mp3
ffmpeg -y -i /tmp/media_audio.m4s -acodec libmp3lame -q:a 5 /tmp/media_audio.mp3
→ Step 2b (check size & transcribe)
# YouTube may need --cookies-from-browser chrome to bypass bot detection
yt-dlp --cookies-from-browser chrome -f "ba[ext=m4a]/ba/b" --extract-audio --audio-format mp3 --audio-quality 5 \
-o "/tmp/media_audio.%(ext)s" "VIDEO_URL"
FILE_SIZE=$(stat -f%z /tmp/media_audio.* 2>/dev/null || stat -c%s /tmp/media_audio.* 2>/dev/null)
echo "File size: $FILE_SIZE bytes"
Splitting large audio (>25MB):
# Get total duration
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 /tmp/media_audio.* | head -1)
# Split into 10-minute segments (keeps each under 25MB)
SEGMENT_SEC=600
SEGMENTS=$(python3 -c "import math; print(math.ceil(float('$DURATION')/$SEGMENT_SEC))")
# Cut segments
for i in $(seq 0 $((SEGMENTS-1))); do
START=$((i * SEGMENT_SEC))
ffmpeg -y -i /tmp/media_audio.* -ss $START -t $SEGMENT_SEC -acodec libmp3lame -q:a 5 \
"/tmp/media_segment_${i}.mp3" 2>/dev/null
done
→ Call Step 2c for each segment sequentially (parallel triggers Groq 524 timeout), concatenate results
Prerequisite: GROQ_API_KEY environment variable
# Check API key
if [ -z "$GROQ_API_KEY" ]; then
echo "❌ GROQ_API_KEY not set. Get one at: https://console.groq.com/keys"
exit 1
fi
# Transcribe single file (replace AUDIO_FILE with actual path)
curl -s -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
-H "Authorization: Bearer $GROQ_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F "file=@AUDIO_FILE" \
-F "model=whisper-large-v3-turbo" \
-F "response_format=verbose_json" \
-F "language=zh" \
> /tmp/media_transcript.json
# Extract plain text
python3 -c "import json; print(json.load(open('/tmp/media_transcript.json'))['text'])"
Whisper model options:
| Model | Speed | Accuracy | Use Case |
|-------|-------|----------|----------|
| whisper-large-v3-turbo | 10x realtime | High | Default choice |
| whisper-large-v3 | 5x realtime | Highest | Professional/noisy content |
Language parameter:
language=zhlanguage=enChoose output format based on media type:
Video (≤20 min):
Podcast (>20 min):
## 📺 Video Digest
**Title**: [Video Title]
**Duration**: [x minutes]
**Language**: [Chinese/English]
### Overview
[1-2 sentence summary]
### Key Points
1. [Point 1]
2. [Point 2]
...
### Notable Quotes
> "xxx" — [timestamp]
### Action Items
- [if applicable]
## 🎙️ Podcast Digest
**Show**: [Podcast Name]
**Episode**: [Episode Title]
**Duration**: [x minutes]
**Guests**: [if any]
### Overview
[2-3 sentences: who discussed what, core conclusions]
### Chapter Summary
#### 1. [Topic] (~xx:xx-xx:xx)
[2-3 sentences of core content]
#### 2. [Topic] (~xx:xx-xx:xx)
[2-3 sentences of core content]
...
### Key Points
1. [Point 1]
2. [Point 2]
...
### Notable Quotes
> "xxx"
### Action Items
- [if applicable]
| Situation | Action |
|-----------|--------|
| No subtitles + no GROQ_API_KEY | Prompt user to set API key |
| No subtitles + has API key | Auto Whisper transcription |
| Xiaoyuzhou curl extraction empty | Use Puppeteer/browser to get rendered HTML |
| Audio >25MB | ffmpeg segment (10min/segment), transcribe sequentially |
| Podcast >2 hours | Warn user about duration, confirm before proceeding |
| Groq 524 timeout | Do NOT parallelize — transcribe sequentially, sleep 5-8s between segments |
| Groq 429 rate limit | 7200s/hour limit, wait for retry-after header, then retry |
| yt-dlp Bilibili 412 | Use Bilibili API instead (Step 1d) |
| yt-dlp YouTube bot detection | Add --cookies-from-browser chrome |
| Network timeout | Retry once |
| Spotify links | Inform user: not supported (DRM protected) |
yt-dlp: video download + subtitle extractionffmpeg: audio conversion + segmentationcurl: Xiaoyuzhou audio download, Bilibili APIGROQ_API_KEY: Whisper transcription API (free at https://console.groq.com/keys)tools
# Content Analyzer Skill > Any content → structured analysis report with actionable insights ## Trigger When user sends content (URL, text, or transcript) with analysis intent: - `/analyze [URL]` - "Analyze this article" - "What are the key takeaways?" - Auto-triggered after video/podcast transcription (from video skill) ## Pipeline ### Step 1: Get Content Choose tool based on input type: | Input | Tool | |-------|------| | Tweet URL | `fetch_tweet` or Jina Reader | | Web URL | `WebFetch`
content-media
Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).
content-media
QQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
content-media
Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).