skills/gemini-tts/SKILL.md
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports multiple voices and streaming. Triggers on "text to speech", "TTS", "generate audio", "voice synthesis", "speak this text".

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

    npx skillsauth add akrindev/google-studio-skills gemini-tts

Generate natural-sounding speech from text using Gemini's TTS models through executable scripts with support for multiple voices and multi-speaker conversations.
Use this skill when you need to:
Purpose: Convert text to speech using Gemini TTS models
When to use:
Key parameters:
| Parameter | Description | Example |
|-----------|-------------|---------|
| text | Text to convert (required) | "Hello, world!" |
| --voice, -v | Voice name | Kore |
| --output, -o | Base name for output file | welcome |
| --output-dir | Output directory for audio | audio/ |
| --no-timestamp | Disable auto timestamp | Flag |
| --model, -m | TTS model | gemini-2.5-flash-preview-tts |
| --stream, -s | Enable streaming | Flag |
| --speakers | Multi-speaker mapping | "Joe:Kore,Jane:Puck" |
Output: WAV audio file path
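
The parameters above imply a naming scheme: the `--output-dir` directory plus the `--output` base name, with a `YYYYMMDD_HHMMSS` timestamp appended unless `--no-timestamp` is set. The sketch below illustrates that scheme. `formatTimestamp` and `buildOutputPath` are hypothetical helpers, not functions taken from `scripts/tts.js`, and the use of local time and the `audio` / `tts_output` defaults are assumptions based on the default output path this skill reports.

```javascript
const path = require('node:path');

// Format a Date as YYYYMMDD_HHMMSS using local time (assumption: the script may use UTC instead).
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return (
    `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}` +
    `_${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`
  );
}

// Hypothetical helper: combine --output-dir, --output, and --no-timestamp into a .wav path.
function buildOutputPath({
  outputDir = 'audio',      // assumed default directory
  output = 'tts_output',    // assumed default base name
  noTimestamp = false,
  now = new Date(),
} = {}) {
  const suffix = noTimestamp ? '' : `_${formatTimestamp(now)}`;
  return path.join(outputDir, `${output}${suffix}.wav`);
}

console.log(buildOutputPath({ output: 'welcome', now: new Date(2025, 0, 5, 9, 3, 7) }));
// On POSIX systems: audio/welcome_20250105_090307.wav
```

Passing `now` explicitly keeps the helper deterministic, which makes the naming logic easy to test in isolation.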

    node scripts/tts.js "Hello, world! Have a wonderful day."

Voice: Kore (default, clear and professional). Output: `audio/tts_output_YYYYMMDD_HHMMSS.wav` (auto timestamp).

    node scripts/tts.js "Welcome to our podcast about technology trends" --voice Puck --output welcome

Output: `audio/welcome_YYYYMMDD_HHMMSS.wav`

    node scripts/tts.js "TTS the following conversation:
    Joe: How's it going today?
    Jane: Not too bad, how about you?
    Joe: I'm working on a new project.
    Jane: Sounds exciting, tell me more!" --speakers "Joe:Kore,Jane:Puck" --output conversation

Output: `audio/conversation_YYYYMMDD_HHMMSS.wav`

    node scripts/tts.js "This is a very long text that would benefit from streaming..." --stream --output long-form

Output: `audio/long-form_YYYYMMDD_HHMMSS.wav`

    node scripts/tts.js "Welcome to our quarterly earnings presentation. Today we'll discuss our growth metrics and future plans." --voice Charon --output voiceover

Voice: Charon (deep, authoritative)

    node scripts/tts.js "Save to specific folder." --output-dir ./my-projects/podcasts/ --output episode1


Output: `./my-projects/podcasts/episode1_YYYYMMDD_HHMMSS.wav`

    # 1. Generate script (gemini-text skill)
    node skills/gemini-text/scripts/generate.js "Write a 2-minute podcast intro about sustainable energy"
    # 2. Generate audio (this skill)
    node scripts/tts.js "[Paste generated script]" --voice Fenrir --output podcast-intro
    # 3. Use in video or podcast

    node scripts/tts.js "Welcome to our accessible website. This audio describes our main navigation options." --voice Aoede --output accessibility

Voice: Aoede (melodic, pleasant)

    node scripts/tts.js "Chapter 1: Introduction to Quantum Computing. Let's explore the fundamental principles..." --voice Zephyr --output chapter1

Voice: Zephyr (light, airy)

    node scripts/tts.js "Fixed filename." --output my-audio --no-timestamp

Output: `audio/my-audio.wav` (no timestamp)

| Model | Quality | Speed | Best For |
|-------|---------|-------|----------|
| gemini-2.5-flash-preview-tts | Good | Fast | General use, high volume |
| gemini-2.5-pro-preview-tts | Higher | Slower | Premium content, voiceovers |

| Voice | Characteristics | Best For |
|-------|----------------|----------|
| Kore | Clear, professional | Announcements, general purpose (default) |
| Puck | Friendly, conversational | Casual content, interviews |
| Charon | Deep, authoritative | Corporate, serious content |
| Fenrir | Warm, expressive | Storytelling, narratives |
| Aoede | Melodic, pleasant | Educational, accessibility |
| Zephyr | Light, airy | Gentle content, tutorials |
| Sulafat | Neutral, balanced | Documentaries, factual content |

| Specification | Value |
|--------------|-------|
| Format | WAV (PCM) |
| Sample rate | 24000 Hz |
| Channels | 1 (mono) |
| Bit depth | 16-bit |
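
Given the format above, one second of audio takes 24000 × 1 × 2 = 48000 bytes of PCM data. The sketch below shows how raw PCM samples could be wrapped in a standard 44-byte WAV header, and how duration follows from byte length. It assumes the audio arrives as headerless little-endian PCM; `buildWavHeader` and `pcmDurationSeconds` are illustrative names, not part of this skill's scripts.

```javascript
// Build a 44-byte RIFF/WAVE header for uncompressed PCM audio.
// Defaults mirror the specification table: 24000 Hz, mono, 16-bit.
function buildWavHeader(dataLength, { sampleRate = 24000, channels = 1, bitDepth = 16 } = {}) {
  const blockAlign = channels * (bitDepth / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;     // bytes per second
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataLength, 4);     // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                 // size of the fmt chunk for PCM
  header.writeUInt16LE(1, 20);                  // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataLength, 40);         // size of the PCM payload
  return header;
}

// Duration in seconds of a raw PCM payload with the given format.
function pcmDurationSeconds(dataLength, { sampleRate = 24000, channels = 1, bitDepth = 16 } = {}) {
  return dataLength / (sampleRate * channels * (bitDepth / 8));
}

const pcm = Buffer.alloc(48000); // one second of silence at 24 kHz, mono, 16-bit
const wav = Buffer.concat([buildWavHeader(pcm.length), pcm]);
console.log(wav.length, pcmDurationSeconds(pcm.length)); // 48044 1
```

The same duration formula gives a quick sanity check on generated files: a payload of 96000 bytes should play for two seconds.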

| Limit | Type | Description |
|-------|------|-------------|
| 8,192 | Input | Maximum input text tokens |
| 16,384 | Output | Maximum output audio tokens |
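
Text that exceeds the input limit has to be split across several requests. The sketch below splits at sentence boundaries under a character budget. This is only a rough stand-in: the limits above are counted in tokens, not characters, so `maxChars` is a caller-chosen assumption, and `splitIntoChunks` is a hypothetical helper rather than something this skill provides.

```javascript
// Split text into chunks of roughly maxChars characters, breaking only at sentence ends.
// maxChars is a caller-chosen proxy for the token limit, not an exact token count.
function splitIntoChunks(text, maxChars) {
  // Each match is a run of non-terminator characters plus its trailing . ! ? and whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence; // a single sentence longer than maxChars is kept whole
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

console.log(splitIntoChunks('One. Two! Three? Four.', 10));
// [ 'One. Two!', 'Three?', 'Four.' ]
```

Each chunk could then be passed to `scripts/tts.js` as a separate invocation with its own `--output` name.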

- Use the `--speakers` parameter to map speakers to voices.
- Install dependencies: `npm install @google/genai@latest dotenv@latest`
- Mapping format: `SpeakerName:VoiceName,Speaker2:Voice2`, for example `"Joe:Kore,Jane:Puck,Host:Charon"`.
- Set an `--output` filename to avoid conflicts.

| Voice | Ideal Use Cases |
|-------|-----------------|
| Kore | Announcements, navigation, general info |
| Puck | Podcasts, interviews, casual content |
| Charon | Corporate, news, formal presentations |
| Fenrir | Audiobooks, stories, emotional content |
| Aoede | Accessibility, educational, gentle content |
| Zephyr | Tutorials, explanations, guides |
| Sulafat | Documentaries, factual presentations |
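
The `SpeakerName:VoiceName` mapping format described above could be parsed and turned into a speech configuration along these lines. `parseSpeakers` and `buildSpeechConfig` are hypothetical helpers, and the field names (`voiceConfig`, `prebuiltVoiceConfig`, `multiSpeakerVoiceConfig`, `speakerVoiceConfigs`) reflect the `@google/genai` request shape as I understand it; verify them against the current API reference before relying on them.

```javascript
// Parse a --speakers value such as "Joe:Kore,Jane:Puck" into [{ speaker, voice }, ...].
function parseSpeakers(value) {
  return value.split(',').map((pair) => {
    const [speaker, voice] = pair.split(':').map((part) => part.trim());
    if (!speaker || !voice) {
      throw new Error(`Invalid speaker mapping "${pair}"; expected SpeakerName:VoiceName`);
    }
    return { speaker, voice };
  });
}

// Assumed request shape: one prebuilt voice, or one voice per named speaker.
function buildSpeechConfig({ voice = 'Kore', speakers } = {}) {
  const voiceConfig = (voiceName) => ({ prebuiltVoiceConfig: { voiceName } });
  if (!speakers) return { voiceConfig: voiceConfig(voice) };
  return {
    multiSpeakerVoiceConfig: {
      speakerVoiceConfigs: parseSpeakers(speakers).map((s) => ({
        speaker: s.speaker,
        voiceConfig: voiceConfig(s.voice),
      })),
    },
  };
}

console.log(JSON.stringify(buildSpeechConfig({ speakers: 'Joe:Kore,Jane:Puck' }), null, 2));
```

Validating the mapping up front means a malformed value such as `"Joe"` fails with a clear message before any request is sent.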

    # Basic
    node scripts/tts.js "Your text here"
    # Custom voice
    node scripts/tts.js "Your text" --voice Puck --output audio.wav
    # Multi-speaker
    node scripts/tts.js "Joe: Hi. Jane: Hello!" --speakers "Joe:Kore,Jane:Puck"
    # Streaming
    node scripts/tts.js "Long text..." --stream --output long.wav
    # Professional
    node scripts/tts.js "Corporate announcement" --voice Charon

See `references/voices.md` for complete voice documentation.