
rainday/audio

Name: audio
Author: rainday

skills/audio/SKILL.md

npx skillsauth add rainday/smart-blog-skills audio

Security scan status:

Clean: Trivy, container and dependency vulnerability scanner
Clean: Semgrep, static code analysis for vulnerabilities
Clean: mcp-scan (Snyk), Model Context Protocol security validation
Skipped: Snyk (dep), open source security scanning
Skipped: Socket.dev, supply chain security analysis
Skipped: VirusTotal, multi-engine malware detection
Skipped: CrowdStrike, advanced threat intelligence
Skipped: OSV-Scanner, Open Source Vulnerability database check
Skipped: OWASP Dep-Check

Blog Audio -- Gemini TTS Narration for Blog Posts

Generate professional audio narration of blog content using Google's Gemini TTS. Three modes: summary (200-300 word spoken overview), full article read-aloud, or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.

Quick Reference

| Command | What it does |
|---------|-------------|
| /smart-blog-skills:audio generate <file> | Generate audio narration of a blog post |
| /smart-blog-skills:audio voices | Show available voices with characteristics |
| /smart-blog-skills:audio setup | Check/configure API key for Gemini TTS |

Prerequisites

Python 3.11+ (venv managed automatically by run.py)
GOOGLE_AI_API_KEY environment variable (the same key used by /smart-blog-skills:image)
FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)
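The prerequisites can be checked up front. A minimal Python sketch, assuming a hypothetical check_prerequisites helper (not part of the skill's scripts) that mirrors the three items above, including the WAV fallback when FFmpeg is absent:

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Report which prerequisites are met (hypothetical helper, not run.py itself)."""
    env = os.environ if env is None else env
    has_ffmpeg = shutil.which("ffmpeg") is not None
    return {
        "python_ok": sys.version_info >= (3, 11),
        "api_key_set": bool(env.get("GOOGLE_AI_API_KEY")),
        "ffmpeg_found": has_ffmpeg,
        # Without FFmpeg the output falls back to WAV instead of MP3.
        "output_format": "mp3" if has_ffmpeg else "wav",
    }


if __name__ == "__main__":
    print(check_prerequisites())
```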

Always Use run.py Wrapper

# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv

API Key Check (Gate Pattern)

Before generating audio, check for the API key:

echo $GOOGLE_AI_API_KEY

If set: proceed with generation
If not set: guide the user: "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey, then set it with export GOOGLE_AI_API_KEY=your-key (this is the same key used by /smart-blog-skills:image, so if image generation works, audio works too)."
When called internally (from the write skill): return silently if the key is missing. Never block the writing workflow.
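The gate can be expressed as one small function. A sketch, assuming hypothetical names api_key_gate and MissingApiKey: it returns the key when set, returns None quietly for internal calls, and otherwise raises with the setup guidance.

```python
import os

SETUP_MESSAGE = (
    "Audio generation requires a Google AI API key. "
    "Get one free at https://aistudio.google.com/apikey, "
    "then set it with export GOOGLE_AI_API_KEY=your-key"
)


class MissingApiKey(RuntimeError):
    """Raised for user-invoked calls when the key is absent (hypothetical)."""


def api_key_gate(internal=False, env=None):
    env = os.environ if env is None else env
    key = env.get("GOOGLE_AI_API_KEY", "").strip()
    if key:
        return key  # set: proceed with generation
    if internal:
        return None  # called from write: skip quietly, never block writing
    raise MissingApiKey(SETUP_MESSAGE)
```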

Setup

For /smart-blog-skills:audio setup:

Check if GOOGLE_AI_API_KEY is set in environment
If /smart-blog-skills:image is configured, the key is already available
If not, guide user to https://aistudio.google.com/apikey
Verify with a dry run: python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json

Voice Selection

For /smart-blog-skills:audio voices:

Ask the user which voice they prefer, or recommend based on content type:

Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
News/analysis: Rasalgethi (Informative) or Schedar (Even)
Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
Dialogue expert: Kore (Firm) or Charon (Informative)
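One way to encode these recommendations is a lookup table keyed by content type. In this sketch the keys and the fallback to article narration are assumptions; the voice pairs and their characteristics come from the list above.

```python
VOICE_RECOMMENDATIONS = {
    "article": [("Charon", "Informative"), ("Sadaltager", "Knowledgeable")],
    "tutorial": [("Achird", "Friendly"), ("Sulafat", "Warm")],
    "news": [("Rasalgethi", "Informative"), ("Schedar", "Even")],
    "lifestyle": [("Aoede", "Breezy"), ("Vindemiatrix", "Gentle")],
    "dialogue_host": [("Puck", "Upbeat"), ("Laomedeia", "Upbeat")],
    "dialogue_expert": [("Kore", "Firm"), ("Charon", "Informative")],
}


def recommend_voices(content_type):
    """Return (voice, characteristic) pairs; unknown types fall back to article narration."""
    return VOICE_RECOMMENDATIONS.get(content_type, VOICE_RECOMMENDATIONS["article"])


def describe(content_type):
    """Format a recommendation the way the list above phrases it."""
    return " or ".join(f"{name} ({trait})" for name, trait in recommend_voices(content_type))
```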

Generation Workflow

For /smart-blog-skills:audio generate <file>:

Step 1: Read the Blog Post

Read the file and extract:

Title (from H1 or frontmatter)
Full content (markdown body)
Approximate word count
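A minimal sketch of this extraction, assuming simple frontmatter delimited by --- lines with a title: key. A real YAML parser would be more robust. The names parse_post and read_post are hypothetical.

```python
import re
from pathlib import Path


def parse_post(text):
    """Split frontmatter from the body, then take the title from frontmatter or the first H1."""
    frontmatter, body = {}, text
    match = re.match(r"^---\s*\n(.*?)\n---\s*\n", text, re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                frontmatter[key.strip()] = value.strip().strip("\"'")
        body = text[match.end():]
    title = frontmatter.get("title")
    if not title:
        h1 = re.search(r"^#\s+(.+?)\s*$", body, re.MULTILINE)
        title = h1.group(1) if h1 else None
    # Approximate: markdown symbols count as words too.
    return {"title": title, "body": body, "word_count": len(body.split())}


def read_post(path):
    return parse_post(Path(path).read_text(encoding="utf-8"))
```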

Step 2: Choose Mode

Ask the user (or auto-select if they specified --mode):

| Mode | When to use | Output |
|------|-------------|--------|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
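To sanity-check these durations before choosing a mode, divide word count by a speaking rate. This sketch assumes about 150 spoken words per minute. That rate is an assumption, not a Gemini TTS figure, and actual pacing varies by voice.

```python
WORDS_PER_MINUTE = 150  # assumed average narration pace, not a Gemini TTS figure


def estimated_minutes(word_count, wpm=WORDS_PER_MINUTE):
    """Rough spoken duration for a given number of words."""
    return word_count / wpm


def mode_estimates(article_word_count):
    """Approximate minutes for a 200-300 word summary and for a full read-aloud."""
    return {
        "summary": (estimated_minutes(200), estimated_minutes(300)),
        "full": estimated_minutes(article_word_count),
    }
```

At that assumed pace, a 200-300 word summary lasts roughly 1.3-2 minutes, consistent with the table. A 1,500-word article read in full lasts about 10 minutes.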

Step 3: Prepare Text

CRITICAL: Claude prepares the text. The script does TTS only.

Summary mode: Write a 200-300 word spoken summary of the article. Rules:

Write as natural speech, not written text
Open with the article's key finding or answer
Cover 3-5 main takeaways
Close with actionable advice
No markdown, no "In this article...", no meta-commentary
Use conversational transitions ("Here's what matters...", "The key finding is...")
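Some of these rules can be checked mechanically before the text goes to TTS. This is a sketch of a hypothetical lint_summary helper. It flags a length outside 200-300 words, leftover markdown, and the "In this article" opener. It cannot judge whether the text sounds natural.

```python
import re

# Headings, bold, inline code, links, and bullet markers that should not reach TTS.
MARKDOWN_PATTERN = re.compile(
    r"(^#{1,6}\s|\*\*|__|`|\[[^\]]*\]\([^)]*\)|^[ \t]*[-*+][ \t])", re.MULTILINE
)


def lint_summary(text):
    """Return a list of rule violations; an empty list means these checks passed."""
    problems = []
    words = len(text.split())
    if not 200 <= words <= 300:
        problems.append(f"word count {words} is outside 200-300")
    if MARKDOWN_PATTERN.search(text):
        problems.append("contains markdown")
    if "in this article" in text.lower():
        problems.append('uses "In this article..." meta-commentary')
    return problems
```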

Full mode: Strip the markdown content to clean spoken text:

Headings become natural transitions ("Next, let's look at...")
Links become plain text (remove URLs, keep anchor text)
Images and charts: omit or briefly describe ("As the data shows...")
Code blocks: describe verbally ("The code uses a for-loop to...")
Lists: convert to natural sentences
Remove frontmatter, schema markup, HTML tags
Add a brief intro: "This is [title], published on [date]."
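A rough regex-based sketch of these full-mode rules, assuming common markdown only. It is not a complete markdown parser. Code blocks become a generic placeholder, which the writer should still replace with a real verbal description. The function name is hypothetical.

```python
import re


def _item_to_sentence(match):
    # "- Install it" -> "Install it."; avoids doubling an existing period.
    return match.group(1).rstrip(".") + "."


def markdown_to_speech(markdown, title, date):
    """Regex cleanup following the full-mode rules; covers common markdown only."""
    text = re.sub(r"\A---\n.*?\n---\n", "", markdown, flags=re.DOTALL)  # frontmatter
    text = re.sub(r"<script[^>]*>.*?</script>", "", text, flags=re.DOTALL | re.IGNORECASE)  # schema JSON-LD
    # Fenced code: placeholder to be replaced by a real verbal description.
    text = re.sub(r"`{3}.*?`{3}", "There is a code example here.", text, flags=re.DOTALL)
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)  # images and charts: omitted
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # links: keep anchor text only
    text = re.sub(r"<[^>]+>", "", text)  # remaining HTML tags
    text = re.sub(r"^#{2,6}\s+(.+)$", r"Next, let's look at \1.", text, flags=re.MULTILINE)
    text = re.sub(r"^#\s+.+$", "", text, flags=re.MULTILINE)  # H1 repeats the title in the intro
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+(.+)$", _item_to_sentence, text, flags=re.MULTILINE)
    text = re.sub(r"[*`]", "", text)  # leftover emphasis and inline-code markers
    text = re.sub(r"\s+", " ", text).strip()
    return f"This is {title}, published on {date}. {text}"
```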

Dialogue mode: Write a 2-person conversation script about the article:

Speaker1 = Host (curious, asks good questions)
Speaker2 = Expert (knowledgeable, gives clear answers)
Format each line as: [Speaker1] What's the key takeaway here?
Cover the article's main points conversationally
15-25 exchanges (roughly 3-8 minutes of audio)
Keep it natural, not stilted
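Before generation, the script format can be validated. This sketch parses "[Speaker1] ..." / "[Speaker2] ..." lines and rejects untagged ones. It assumes one exchange means a Speaker1 line answered by a Speaker2 line; the source does not define the term. The helper names are hypothetical.

```python
import re

TURN = re.compile(r"^\[(Speaker1|Speaker2)\]\s+(\S.*)$")


def parse_dialogue(script):
    """Parse '[SpeakerN] text' lines into (speaker, text) turns; raise on malformed lines."""
    turns = []
    for number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines between turns are ignored
        match = TURN.match(line.strip())
        if not match:
            raise ValueError(f"line {number} is missing a [Speaker1]/[Speaker2] tag")
        turns.append((match.group(1), match.group(2)))
    return turns


def exchange_count(turns):
    """Count Speaker1 lines immediately answered by Speaker2 (assumed meaning of 'exchange')."""
    return sum(1 for a, b in zip(turns, turns[1:]) if a[0] == "Speaker1" and b[0] == "Speaker2")
```

A script that passes parsing can then be checked against the 15-25 target with exchange_count.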

Step 4: Select Voice

If the user chose a voice, use it. Otherwise, recommend based on mode:

Summary/Full: default to Charon (Informative)
Dialogue: default to Puck (Host) + Kore (Expert)
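The defaults amount to a small rule: a user choice wins, and otherwise the mode decides. This is a sketch with a hypothetical select_voices helper. The mode keys match the --mode values summary, full, and dialogue.

```python
DEFAULT_VOICES = {
    "summary": ("Charon", None),
    "full": ("Charon", None),
    "dialogue": ("Puck", "Kore"),  # host, expert
}


def select_voices(mode, voice=None, voice2=None):
    """Return (voice, voice2); user choices override the per-mode defaults."""
    default_voice, default_voice2 = DEFAULT_VOICES[mode]
    chosen = voice or default_voice
    if mode != "dialogue":
        return chosen, None  # single-voice modes never need a second voice
    return chosen, voice2 or default_voice2
```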

Step 5: Generate Audio

Write the prepared text to a temp file, then call:

# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json

Model selection:

flash (default): Fast, cheap. Good for summaries and standard narration.
pro: Higher quality. Use for dialogue mode or premium content.
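The generation call and the model guidance can be scripted together. This sketch uses only the flags shown in the commands above (--text-file, --voice, --voice2, --model, --output, --json). It assumes it runs from the skill's root directory, so the relative scripts/run.py path resolves. The helper names are hypothetical. The field names in generate_audio.py's JSON output are not documented here, so the parsed object is returned as-is.

```python
import json
import os
import subprocess
import tempfile


def build_audio_command(text_file, output, mode, voice, voice2=None, premium=False):
    """Assemble the run.py call; dialogue mode or premium content selects the pro model."""
    model = "pro" if mode == "dialogue" or premium else "flash"
    cmd = ["python3", "scripts/run.py", "generate_audio.py",
           "--text-file", text_file, "--voice", voice]
    if voice2:
        cmd += ["--voice2", voice2]
    cmd += ["--model", model, "--output", output, "--json"]
    return cmd


def generate(prepared_text, output, mode, voice, voice2=None, premium=False):
    """Write the prepared text to a temp file, run the generator, and parse its --json output."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as handle:
        handle.write(prepared_text)
    try:
        cmd = build_audio_command(handle.name, output, mode, voice, voice2, premium)
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    finally:
        os.remove(handle.name)  # clean up the temp file even if generation fails
    # Field names in this JSON are not documented here, so return it unchanged.
    return json.loads(result.stdout)
```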

Step 6: Deliver

Present the result to the user:

File path -- where the audio was saved
Duration -- human-readable (e.g., "3:42")
Embed code -- ready-to-paste HTML5 audio tag
Cost -- estimated API cost
Placement suggestion -- where to insert the embed in the blog post
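For the human-readable duration, this sketch formats seconds as m:ss. It reads the length of a WAV file with Python's standard wave module. MP3 output would need a different tool, for example ffprobe, which ships with FFmpeg; that case is not covered here. The helper names are hypothetical.

```python
import wave


def wav_duration_seconds(path):
    """Length of a WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def format_duration(seconds):
    """Render seconds as m:ss, or h:mm:ss for files an hour or longer."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```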

Embedding Guide

Standard HTML (Hugo, Jekyll, static sites)

<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>

MDX (Next.js, Gatsby)

<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>

WordPress

[audio src="audio/post-slug.mp3"]
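The three snippets differ only in syntax, so one helper can emit whichever the target platform needs. This is a sketch. The platform keys html, mdx, and wordpress are hypothetical labels. The markup mirrors the examples above, with the src value HTML-escaped.

```python
import html


def embed_snippet(platform, src):
    """Return the player markup for a platform, mirroring the snippets above."""
    safe = html.escape(src, quote=True)
    if platform == "html":
        return (
            '<audio controls preload="metadata">\n'
            f'  <source src="{safe}" type="audio/mpeg">\n'
            "  Your browser does not support the audio element.\n"
            "</audio>"
        )
    if platform == "mdx":
        # JSX requires elements without children to be self-closed, hence "/>".
        return (
            '<audio controls preload="metadata">\n'
            f'  <source src="{safe}" type="audio/mpeg" />\n'
            "</audio>"
        )
    if platform == "wordpress":
        return f'[audio src="{safe}"]'
    raise ValueError(f"unknown platform: {platform}")
```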

Placement

Insert the audio player after the introduction (below the first H2) or at the very top of the article with a label: "Listen to this article" or "Audio version".

Error Handling

| Error | Resolution |
|-------|-----------|
| GOOGLE_AI_API_KEY not set | Get a key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install it (e.g. sudo apt install ffmpeg on Debian/Ubuntu). Output falls back to WAV. |
| Rate limited | Wait and retry. |
| Text too long (>32k tokens) | Split into sections and generate each separately |
| Unknown voice name | Run /smart-blog-skills:audio voices to see valid options |
| API key missing (internal call) | Return silently; the writing workflow continues |
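For text that is too long, this sketch groups paragraphs into chunks that stay under a token budget. Token counts are approximated at about 4 characters per token. That is a rough heuristic, not Gemini's tokenizer, so the default budget sits below the 32k limit to leave headroom. A single paragraph larger than the budget still comes out as one oversized chunk.

```python
def estimate_tokens(text, chars_per_token=4):
    """Crude token estimate; a real tokenizer count would be more accurate."""
    return -(-len(text) // chars_per_token)  # ceiling division


def split_for_tts(text, max_tokens=30_000):
    """Group paragraphs into chunks whose estimated size stays under max_tokens."""
    chunks, current = [], []
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = "\n\n".join(current + [paragraph])
        if current and estimate_tokens(candidate) > max_tokens:
            chunks.append("\n\n".join(current))  # close the chunk before it overflows
            current = [paragraph]
        else:
            current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```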


Generate audio narration of blog posts using Google Gemini TTS. Supports summary narration, full article read-aloud, and two-speaker podcast/dialogue mode with 30 voice options. Outputs MP3 with HTML5 audio embed code. Works standalone via /smart-blog-skills:audio or internally from write. Falls back gracefully when API key is not configured. Use when user says "blog audio", "narrate blog", "audio version", "text to speech", "tts", "podcast mode", "read aloud", "audio narration", "voice", "narration", "generate audio".

35 stars

development

Updated May 30, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean (Trivy, Semgrep, and mcp-scan); the other six (Snyk, Socket.dev, VirusTotal, CrowdStrike, OSV-Scanner, and OWASP Dep-Check) were skipped.

Last scanned: May 30, 2026, 5:46 AM (14.1 s, 1 file scanned)

SKILL.md frontmatter

name: audio
user-invokable: true
argument-hint: [generate|voices|setup] [file-or-text] [--mode summary|full|dialogue] [--voice name]
license: MIT


Related Skills

rainday/write

documentation

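Verified, Trusted, Community

Smart Blog article writing. Writes a new blog post from scratch, including template selection, research, YouTube video embedding, Humanizer anti-AI review, and quality checks. Built-in anti-hallucination verification; Traditional Chinese first. Use when user says "write blog", "寫文章" ("write article"), "寫部落格" ("write blog"), "new blog post", "smart-blog write", "blog write".

35 stars, SKILL.md, updated May 30, 2026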

rainday/translate

testing

Verified, Trusted, Community

Translate existing blog posts into one or more target languages with SEO-optimized localization. Produces native-quality translations that preserve markdown structure, frontmatter, schema JSON-LD, image and chart embeds, and citation capsules. Localizes keywords, meta tags, numbers, dates, currencies, and quote styles per locale. Flags machine-translation artifacts for review. Run BEFORE localize: this handles language conversion; localize handles cultural adaptation after translation completes. Use when user says "translate blog", "blog translate", "uebersetzen", "traduire", "traducir", "translate post", "blog auf Deutsch", "blog en espanol".

35 stars, SKILL.md, updated May 30, 2026

rainday/taxonomy

development

Verified, Trusted, Community

Extract, suggest, and sync tags and categories for blog posts across all major CMS platforms. Supports WordPress REST API, Shopify GraphQL, Ghost Content API, Strapi REST/GraphQL, and Sanity GROQ. Generates tag suggestions from content analysis (keyword frequency, heading extraction, semantic grouping), enforces minimum post-count thresholds to prevent thin tag archives, and syncs taxonomy via authenticated API calls. Use when user says "tags", "categories", "taxonomy", "tag suggestions", "sync tags", "WordPress tags", "Shopify tags".

35 stars, SKILL.md, updated May 30, 2026

rainday/strategy

development

Verified, Trusted, Community

Blog strategy development including topic cluster architecture with hub-and-spoke design, audience mapping, competitive landscape analysis, AI citation surface strategy across ChatGPT/Perplexity/AI Overviews, distribution channel planning (YouTube, Reddit, review platforms for GEO), content scoring targets, measurement framework, and content differentiation through original research and first-hand experience. Use when user says "blog strategy", "content strategy", "blog positioning", "what should I blog about", "blog topics", "content pillars", "blog ideation".

35 stars, SKILL.md, updated May 30, 2026


Install manually


# Clone the repo
git clone https://github.com/rainday/smart-blog-skills.git

# Copy into Claude Code skills folder (global)
cp -r smart-blog-skills/skills/audio ~/.claude/skills/


Repository

rainday/smart-blog-skills

35 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT