skills/blog-audio/SKILL.md
Generate audio narration of blog posts using Google Gemini TTS. Supports summary narration, full article read-aloud, and two-speaker podcast/dialogue mode with 30 voice options. Outputs MP3 with HTML5 audio embed code. Works standalone via /blog audio or internally from blog-write. Falls back gracefully when API key is not configured. Use when user says "blog audio", "narrate blog", "audio version", "text to speech", "tts", "podcast mode", "read aloud", "audio narration", "voice", "narration", "generate audio".
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add agricidaniel/claude-blog blog-audio
Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
| Command | What it does |
|---------|-------------|
| /blog audio generate <file> | Generate audio narration of a blog post |
| /blog audio voices | Show available voices with characteristics |
| /blog audio setup | Check/configure API key for Gemini TTS |
- … run.py)
- GOOGLE_AI_API_KEY environment variable (same key used by blog-image)
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json
# WRONG:
python3 scripts/generate_audio.py --text "..." # Fails without venv
Before generating audio, check for the API key:
echo $GOOGLE_AI_API_KEY
export GOOGLE_AI_API_KEY=your-key
This is the same key used by /blog image: if image generation works, audio works too.
For /blog audio setup:
- GOOGLE_AI_API_KEY is set in environment
- … .mcp.json), the key is already available
- python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json
For /blog audio voices:
Load references/voices.md and present the voice catalog to the user.
Ask the user which voice they prefer, or recommend based on content type:
For /blog audio generate <file>:
Read the file and extract:
Ask the user (or auto-select if they specified --mode):
| Mode | When to use | Output |
|------|-------------|--------|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode: Write a 200-300 word spoken summary of the article. Rules:
Full mode: Strip the markdown content to clean spoken text:
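As a rough illustration of what "clean spoken text" can mean, the sketch below removes a few common markdown constructs. It is not part of the skill's scripts (Claude prepares the text itself), and both the function name `markdown_to_spoken_text` and the choice of what to strip are assumptions, not the skill's actual rule list.

```python
import re


def markdown_to_spoken_text(markdown: str) -> str:
    """Reduce markdown to plain text for read-aloud (hypothetical helper)."""
    # Drop YAML frontmatter at the very start of the file.
    text = re.sub(r"\A---\n.*?\n---\n", "", markdown, flags=re.DOTALL)
    # Drop fenced code blocks, which do not read well aloud.
    text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    # Drop images entirely, then keep only the visible text of links.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove heading markers but keep the heading words.
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    # Remove asterisk emphasis and inline-code backticks.
    text = re.sub(r"[*`]+", "", text)
    # Collapse runs of blank lines left behind by the removals.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Underscore emphasis is deliberately left alone so identifiers such as `post_slug` survive; a real cleanup pass would need more judgment than these patterns provide.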
Dialogue mode: Write a 2-person conversation script about the article:
[Speaker1] What's the key takeaway here?
If the user chose a voice, use it. Otherwise, recommend based on mode:
Write the prepared text to a temp file, then call:
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
--text-file /tmp/blog_audio_prepared.txt \
--voice Charon \
--model flash \
--output /path/to/audio/post-slug.mp3 \
--json
# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
--text-file /tmp/blog_audio_dialogue.txt \
--voice Puck \
--voice2 Kore \
--model pro \
--output /path/to/audio/post-slug-dialogue.mp3 \
--json
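If the call is assembled programmatically, the two invocations above differ only in the optional second voice and the model. A minimal sketch, assuming a hypothetical helper named `build_tts_command` and using only the flags shown above:

```python
from typing import List, Optional


def build_tts_command(
    text_file: str,
    voice: str,
    output: str,
    model: str = "flash",
    voice2: Optional[str] = None,
) -> List[str]:
    """Build the argv list for scripts/run.py (hypothetical helper)."""
    if model not in ("flash", "pro"):
        raise ValueError(f"model must be 'flash' or 'pro', got {model!r}")
    cmd = [
        "python3", "scripts/run.py", "generate_audio.py",
        "--text-file", text_file,
        "--voice", voice,
    ]
    # A second voice is only passed in dialogue mode.
    if voice2:
        cmd += ["--voice2", voice2]
    cmd += ["--model", model, "--output", output, "--json"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.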
Model selection:
- flash (default): Fast, cheap. Good for summaries and standard narration.
- pro: Higher quality. Use for dialogue mode or premium content.
Present the result to the user:
<audio controls preload="metadata">
<source src="audio/post-slug.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>
<audio controls preload="metadata">
<source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
[audio src="audio/post-slug.mp3"]
Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".
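The labelled HTML embed can be produced with a small helper. This is a sketch under assumptions: the function name `audio_embed` and the `<p>` wrapper for the label are illustrative choices, not something the skill prescribes.

```python
from html import escape


def audio_embed(src: str, label: str = "Listen to this article") -> str:
    """Return a labelled HTML5 audio player for an MP3 (hypothetical helper)."""
    # Escape the path and label so unusual characters cannot break the markup.
    safe_src = escape(src, quote=True)
    return (
        f"<p>{escape(label)}</p>\n"
        '<audio controls preload="metadata">\n'
        f'  <source src="{safe_src}" type="audio/mpeg">\n'
        "  Your browser does not support the audio element.\n"
        "</audio>"
    )
```

Calling `audio_embed("audio/post-slug.mp3", label="Audio version")` yields the same markup as the plain HTML example, preceded by the label paragraph.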
When invoked internally from blog-write:
Input:
- text: Prepared text (already cleaned by Claude)
- voice: Voice name (default: Charon)
- voice2: Second voice for dialogue (optional)
- model: flash or pro
- output_path: Where to save the file
Output:
### Audio Narration
- **Path:** /path/to/audio/post-slug.mp3
- **Duration:** 3:42
- **Voice:** Charon
- **Embed:** `<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>`
Graceful fallback: If GOOGLE_AI_API_KEY is not set, return immediately
with no error. The writing workflow continues without audio. Never block
blog-write because audio generation is unavailable.
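The fallback rule can be expressed as a guard around the generation step. A minimal sketch, assuming a hypothetical wrapper `narrate_if_configured` and a caller-supplied `generate` callable; only the `GOOGLE_AI_API_KEY` variable name comes from this skill.

```python
import os
from typing import Callable, Mapping, Optional


def narrate_if_configured(
    generate: Callable[[], dict],
    env: Mapping[str, str] = os.environ,
) -> Optional[dict]:
    """Run `generate` only when the API key is present (hypothetical wrapper)."""
    # Missing or empty key: return quietly so the writing workflow continues.
    if not env.get("GOOGLE_AI_API_KEY"):
        return None
    return generate()
```

A caller such as blog-write would treat `None` as "no audio section" and carry on, rather than surfacing an error.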
| Error | Resolution |
|-------|-----------|
| GOOGLE_AI_API_KEY not set | Get key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install: sudo apt install ffmpeg. Falls back to WAV output. |
| Rate limited | Wait and retry. Check limits at https://aistudio.google.com/rate-limit |
| Text too long (>32k tokens) | Split into sections, generate separately |
| Unknown voice name | Run /blog audio voices to see valid options |
| API error | Check key validity, model availability (preview models) |
| API key missing (internal call) | Return silently: writing workflow continues |
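For the "Text too long" case, one way to split is on paragraph boundaries under a size budget. The sketch below assumes a word count as a crude stand-in for tokens; the helper name `split_for_tts` and the default budget are illustrative, not values taken from the script.

```python
from typing import List


def split_for_tts(text: str, max_words: int = 4000) -> List[str]:
    """Group paragraphs into chunks of at most max_words words (hypothetical helper)."""
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # ignore empty paragraphs
        words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be written to its own temp file and generated separately. A single paragraph larger than the budget is kept whole as its own chunk, so very long paragraphs would need a finer split.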
Load on-demand: do NOT load all at startup:
references/voices.md: Full 30-voice catalog, recommendations by content type, dialogue pairings