skills/whisper-transcribe/SKILL.md
Transcribe audio/video to accurate subtitles using Whisper AI, with optional translation and delivery. Supports YouTube URLs and local audio/video files. Use when: (1) a YouTube video has no subtitles, (2) auto-generated captions are inaccurate, (3) the user wants high-quality transcription, (4) the user needs translated subtitles, (5) the user wants transcripts sent to email or cloud storage. Triggers: "轉錄", "語音轉文字", "Whisper", "沒有字幕", "字幕不準", "transcribe", "speech to text", "no subtitles", "bad captions", "翻譯字幕", "translate subtitles", "寄到信箱", "上傳到雲端". Make sure to use this skill whenever the user needs transcription beyond what YouTube auto-captions provide, or when yt-search reports no subtitles available.
npx skillsauth add azuma520/youtube-to-notebooklm whisper-transcribe
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use Whisper AI to transcribe audio or video into high-accuracy subtitles (SRT), with optional translation and delivery.
- yt-dlp + ffmpeg (to download audio)
- pip install faster-whisper (requires a GPU or a usable CPU)
- pip install groq + GROQ_API_KEY

See references/setup.md for detailed installation steps.
python <skill-path>/scripts/transcribe.py "https://youtube.com/watch?v=xxx" \
-o "$TEMP/transcripts" \
--model large-v3 \
--device cuda
python <skill-path>/scripts/transcribe.py ./recording.mp3 \
-o "$TEMP/transcripts" \
--model large-v3
python <skill-path>/scripts/transcribe.py "URL_or_FILE" \
-o "$TEMP/transcripts" \
--backend groq
Output: {video_id}.srt (with timestamps) + {video_id}.txt (plain text).
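To illustrate the two output formats, here is a minimal sketch of how timed segments could be rendered as SRT and as plain text. It assumes segments arrive as `(start_seconds, end_seconds, text)` tuples; the function names are hypothetical and are not taken from `transcribe.py`.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


def segments_to_txt(segments) -> str:
    """Render the same segments as plain text, one segment per line."""
    return "\n".join(text.strip() for _, _, text in segments)
```

The SRT file keeps the timing needed for subtitles, while the TXT file drops it and is easier to read or feed into other tools.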
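| Scenario | Recommended flags |
|------|---------|
| General English video | --model large-v3 (default) |
| Chinese/Japanese content | --model large-v3 --language zh or ja |
| Quick preview | --model small --device cpu |
| No GPU | --backend groq |
| Long video (> 2 hours) | Local mode, to avoid API timeouts |
If --language is not specified, the language is detected automatically. Specifying it when the language is known improves accuracy.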
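The recommendations above can be sketched as a small flag-selection helper. This is only an illustration: `suggest_args` and its parameters are hypothetical, the rule that long videos take precedence over the no-GPU rule is an assumption, and so is passing `--language` together with `--backend groq`.

```python
def suggest_args(language=None, has_gpu=True, quick_preview=False, duration_hours=0.0):
    """Return transcribe.py flags for a scenario (hypothetical helper)."""
    if quick_preview:
        # Quick preview: small model on CPU.
        args = ["--model", "small", "--device", "cpu"]
    elif not has_gpu and duration_hours <= 2:
        # No GPU and a short enough video: use the Groq backend.
        args = ["--backend", "groq"]
    else:
        # Default, and long videos (> 2 hours): stay local to avoid API timeouts.
        args = ["--model", "large-v3"]
    if language:
        # Assumption: --language is accepted in every mode.
        args += ["--language", language]
    return args
```

For example, `suggest_args(language="ja")` yields the flags for Japanese content, and `suggest_args(has_gpu=False)` falls back to the Groq backend.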
After transcription, if the user wants a translation:
python <skill-path>/scripts/translate_srt.py "$TEMP/transcripts/VIDEO_ID.srt" \
--target zh-tw \
--engine deepl \
-o "$TEMP/transcripts/VIDEO_ID_zh.srt"
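Translation engine options:
| Engine | Quality | Speed | Cost | Setup |
|------|------|------|------|------|
| deepl | Best | Fast | Free, 500,000 characters/month | DEEPL_API_KEY |
| openai | Very good (context-aware) | Medium | Pay per use | OPENAI_API_KEY |
The Agent can also read the TXT file and translate it directly in the conversation, with no extra API required. This suits short content, or users who want to discuss the material as they read.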
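Whichever engine is used, subtitle translation has to keep cue numbers and timecodes intact and change only the text. A minimal sketch of that idea follows. It is not the actual logic of `translate_srt.py`; `translate_srt_text` is a hypothetical name, and `translate` stands in for any callable that maps a string to its translation.

```python
import re


def translate_srt_text(srt_text: str, translate) -> str:
    """Translate cue text in an SRT string while preserving numbering and timing."""
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    output = []
    for block in blocks:
        lines = block.splitlines()
        # A well-formed cue has an index line, a timing line, then text lines.
        if len(lines) < 3 or "-->" not in lines[1]:
            output.append(block)  # Leave malformed blocks untouched.
            continue
        index, timing, text_lines = lines[0], lines[1], lines[2:]
        translated = translate("\n".join(text_lines))
        output.append("\n".join([index, timing, translated]))
    return "\n\n".join(output) + "\n"
```

Translating cue by cue is the simplest approach. A context-aware engine would more likely batch several cues together, which this sketch does not attempt.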
The Agent uses whatever email tool is available (Claude Code has the Gmail MCP):
[Transcript] {video title}
rclone copy "$TEMP/transcripts/VIDEO_ID.srt" gdrive:/Transcripts/
rclone copy "$TEMP/transcripts/VIDEO_ID.txt" gdrive:/Transcripts/
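rclone must be configured first (see references/setup.md).
If the user does not have rclone, the Agent can use the Google Drive API instead, or tell the user the file path so they can upload the files themselves.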
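The rclone-or-manual fallback could look roughly like the sketch below. The `deliver` helper is hypothetical. It takes `which` and `run` as parameters only so the logic can be exercised without rclone installed; by default they are the standard `shutil.which` and `subprocess.run`.

```python
import shutil
import subprocess


def build_rclone_command(file_path, remote="gdrive:/Transcripts/"):
    """Build the rclone copy command used for cloud delivery."""
    return ["rclone", "copy", str(file_path), remote]


def deliver(file_path, remote="gdrive:/Transcripts/",
            which=shutil.which, run=subprocess.run):
    """Upload with rclone when available; otherwise report the local path."""
    if which("rclone") is None:
        # No rclone on PATH: hand the path back for a manual upload.
        return f"rclone not found; upload manually: {file_path}"
    run(build_rclone_command(file_path, remote), check=True)
    return f"uploaded {file_path} to {remote}"
```

With `check=True`, a failed upload raises `subprocess.CalledProcessError` instead of being reported as a success.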
When yt-search finds no subtitles with --list-subs, or the user says the auto-generated captions are inaccurate:
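After transcription, if the user wants to push the result to NotebookLM:
notebooklm source add "$TEMP/transcripts/VIDEO_ID.txt" --wait
Advantage of the TXT file: it comes from a Whisper transcription, so its quality is far higher than the YouTube auto-captions NotebookLM fetches on its own.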
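User: "Find me tutorial videos about AI agents"
→ yt-search searches and lists 20 videos
User: "The third one looks good, grab its subtitles"
→ yt-search reports: this video has no subtitles
User: "Then transcribe it with Whisper"
→ whisper-transcribe downloads the audio and transcribes it
→ Produces SRT + TXT
User: "Translate it into Traditional Chinese, then email it to me"
→ translate_srt.py translates
→ Sent via Gmail
User: "Also push it to NotebookLM and generate a podcast"
→ anything-to-notebooklm source add TXT
→ generate audio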
rm -rf "$TEMP/transcripts/"
large-v3; if that does not work, switch to medium or small
Prefix the command with PYTHONUTF8=1
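The model fallback (large-v3, then medium, then small) could be automated along these lines. This is a sketch under assumptions: `transcribe` is a hypothetical callable taking `(audio_path, model_name)`, and failures are assumed to surface as `RuntimeError`, which may not match how the real script reports them.

```python
def transcribe_with_fallback(audio_path, transcribe,
                             models=("large-v3", "medium", "small")):
    """Try each model in order and return (model_name, result) for the first success."""
    last_error = None
    for model in models:
        try:
            return model, transcribe(audio_path, model)
        except RuntimeError as error:
            # Assumption: a model that cannot run raises RuntimeError.
            last_error = error
    raise RuntimeError(f"all models failed for {audio_path}") from last_error
```

Returning the model name alongside the result lets the caller tell the user which model actually produced the transcript.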