skills/audio/SKILL.md
Generate audio narration of blog posts using Google Gemini TTS. Supports summary narration, full article read-aloud, and two-speaker podcast/dialogue mode with 30 voice options. Outputs MP3 with HTML5 audio embed code. Works standalone via /smart-blog-skills:audio or internally from write. Falls back gracefully when API key is not configured. Use when user says "blog audio", "narrate blog", "audio version", "text to speech", "tts", "podcast mode", "read aloud", "audio narration", "voice", "narration", "generate audio".
npx skillsauth add rainday/smart-blog-skills audioInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
| Command | What it does |
|---------|-------------|
| /smart-blog-skills:audio generate <file> | Generate audio narration of a blog post |
| /smart-blog-skills:audio voices | Show available voices with characteristics |
| /smart-blog-skills:audio setup | Check/configure API key for Gemini TTS |
run.py)GOOGLE_AI_API_KEY environment variable (same key used by image)# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json
# WRONG:
python3 scripts/generate_audio.py --text "..." # Fails without venv
Before generating audio, check for the API key:
echo $GOOGLE_AI_API_KEY
export GOOGLE_AI_API_KEY=your-key
This is the same key used by /smart-blog-skills:image -- if image generation works, audio works too."For /smart-blog-skills:audio setup:
GOOGLE_AI_API_KEY is set in environmentpython3 scripts/run.py generate_audio.py --text "Test" --dry-run --jsonFor /smart-blog-skills:audio voices:
Ask the user which voice they prefer, or recommend based on content type:
For /smart-blog-skills:audio generate <file>:
Read the file and extract:
Ask the user (or auto-select if they specified --mode):
| Mode | When to use | Output | |------|-------------|--------| | Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary | | Full | Complete read-aloud (5-15 min) | Full article as natural speech | | Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode: Write a 200-300 word spoken summary of the article. Rules:
Full mode: Strip the markdown content to clean spoken text:
Dialogue mode: Write a 2-person conversation script about the article:
[Speaker1] What's the key takeaway here?If the user chose a voice, use it. Otherwise, recommend based on mode:
Write the prepared text to a temp file, then call:
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
--text-file /tmp/blog_audio_prepared.txt \
--voice Charon \
--model flash \
--output /path/to/audio/post-slug.mp3 \
--json
# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
--text-file /tmp/blog_audio_dialogue.txt \
--voice Puck \
--voice2 Kore \
--model pro \
--output /path/to/audio/post-slug-dialogue.mp3 \
--json
Model selection:
flash (default): Fast, cheap. Good for summaries and standard narration.pro: Higher quality. Use for dialogue mode or premium content.Present the result to the user:
<audio controls preload="metadata">
<source src="audio/post-slug.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>
<audio controls preload="metadata">
<source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
[audio src="audio/post-slug.mp3"]
Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".
| Error | Resolution |
|-------|-----------|
| GOOGLE_AI_API_KEY not set | Get key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install: sudo apt install ffmpeg. Falls back to WAV output. |
| Rate limited | Wait and retry. |
| Text too long (>32k tokens) | Split into sections, generate separately |
| Unknown voice name | Run /smart-blog-skills:audio voices to see valid options |
| API key missing (internal call) | Return silently -- writing workflow continues |
documentation
Smart Blog 寫文章。從零寫一篇新的部落格文章,包含模板選擇、研究、 YouTube 影片嵌入、Humanizer 反 AI 審稿、品質檢查。 內建反幻覺驗證,繁體中文優先。 Use when user says "write blog", "寫文章", "寫部落格", "new blog post", "smart-blog write", "blog write".
testing
Translate existing blog posts into one or more target languages with SEO-optimized localization. Produces native-quality translations that preserve markdown structure, frontmatter, schema JSON-LD, image and chart embeds, and citation capsules. Localizes keywords, meta tags, numbers, dates, currencies, and quote styles per locale. Flags machine-translation artifacts for review. Run BEFORE localize: this handles language conversion; localize handles cultural adaptation after translation completes. Use when user says "translate blog", "blog translate", "uebersetzen", "traduire", "traducir", "translate post", "blog auf Deutsch", "blog en espanol".
development
Extract, suggest, and sync tags and categories for blog posts across all major CMS platforms. Supports WordPress REST API, Shopify GraphQL, Ghost Content API, Strapi REST/GraphQL, and Sanity GROQ. Generates tag suggestions from content analysis (keyword frequency, heading extraction, semantic grouping), enforces minimum post-count thresholds to prevent thin tag archives, and syncs taxonomy via authenticated API calls. Use when user says "tags", "categories", "taxonomy", "tag suggestions", "sync tags", "WordPress tags", "Shopify tags".
development
Blog strategy development including topic cluster architecture with hub-and-spoke design, audience mapping, competitive landscape analysis, AI citation surface strategy across ChatGPT/Perplexity/AI Overviews, distribution channel planning (YouTube, Reddit, review platforms for GEO), content scoring targets, measurement framework, and content differentiation through original research and first-hand experience. Use when user says "blog strategy", "content strategy", "blog positioning", "what should I blog about", "blog topics", "content pillars", "blog ideation".