skills/gemini-tts/SKILL.md
Generate read-aloud audio (text-to-speech) using Google Gemini TTS (gemini-3.1-flash-tts-preview). Automatically detects the mode from the input: single-speaker narration for plain text, and multi-speaker dialogue when the input has two "Name:" speaker labels. Supports 30 prebuilt voices, natural-language style control and audio tags, text or file input, and WAV output. Works with both the Gemini Developer API and Vertex AI.
Use the Python script in scripts/ to turn text into natural read-aloud audio
via Google Gemini TTS. Model: gemini-3.1-flash-tts-preview (single TTS
model — there is no Pro variant).
The mode is automatically detected from the input:
| Mode | When used | Voices |
|---|---|---|
| Single-speaker | Plain text / narration | One voice (--voice) |
| Multi-speaker | A 2-person dialogue: lines like Name: ... with exactly two distinct speakers | Two voices (--speaker) |
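The detection rule can be sketched as follows. This is a minimal illustration, not the logic in scripts/generate.py. It assumes that a speaker label is the text before the first colon on a line and that every non-empty line must carry a label. `detect_mode` is a hypothetical name.

```python
import re

# Hypothetical pattern for "Name: text", where Name contains no colon.
# Requiring non-whitespace text after the colon is an assumption of this sketch.
SPEAKER_LINE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(\S.*)$")


def detect_mode(text: str) -> str:
    """Return "multi" for a two-person dialogue, otherwise "single"."""
    lines = [line for line in text.splitlines() if line.strip()]
    speakers = set()
    for line in lines:
        match = SPEAKER_LINE.match(line)
        if match is None:
            # Any unlabeled line means the input is not a pure dialogue.
            return "single"
        speakers.add(match.group(1))
    # Multi-speaker only when exactly two distinct names appear.
    return "multi" if len(speakers) == 2 else "single"


dialogue = "Taro: How's it going today, Hanako?\nHanako: Not too bad, how about you?"
print(detect_mode(dialogue))                 # multi
print(detect_mode("Have a wonderful day!"))  # single
```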
Override the detected mode with --single / --multi.

```bash
pip install google-genai
```
Set the GEMINI_API_KEY environment variable.
Get a key at https://aistudio.google.com/apikey
```bash
export GEMINI_API_KEY="your-api-key"
```
Set GOOGLE_CLOUD_PROJECT and optionally GOOGLE_CLOUD_LOCATION.
Requires a GCP project with the Vertex AI API enabled and
Application Default Credentials configured (gcloud auth application-default login).
```bash
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1" # optional, defaults to us-central1
```
Priority: if both GOOGLE_CLOUD_PROJECT and GEMINI_API_KEY are set, Vertex AI is used.
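Below is a minimal sketch of that priority rule. It assumes the script reads credentials directly from the environment. `select_backend` is a hypothetical helper. The returned keyword arguments correspond to parameters that google-genai's `genai.Client` accepts (`vertexai`, `project`, `location`, `api_key`). The actual client call is omitted so the sketch runs without the package installed.

```python
from typing import Dict, Mapping, Tuple


def select_backend(env: Mapping[str, str]) -> Tuple[str, Dict[str, object]]:
    """Pick the API backend from environment variables.

    GOOGLE_CLOUD_PROJECT (Vertex AI) takes priority over
    GEMINI_API_KEY (Gemini Developer API).
    """
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if project:
        location = env.get("GOOGLE_CLOUD_LOCATION") or "us-central1"
        # Keyword arguments for genai.Client(vertexai=True, project=..., location=...)
        return "vertex", {"vertexai": True, "project": project, "location": location}
    api_key = env.get("GEMINI_API_KEY")
    if api_key:
        return "developer", {"api_key": api_key}
    raise RuntimeError("No API credentials found: set GEMINI_API_KEY or GOOGLE_CLOUD_PROJECT")


backend, client_kwargs = select_backend({"GEMINI_API_KEY": "k", "GOOGLE_CLOUD_PROJECT": "my-proj"})
print(backend, client_kwargs["location"])  # vertex us-central1
```

With google-genai installed, the second value could be passed as `genai.Client(**client_kwargs)`.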
| Variable | Default | Description |
|---|---|---|
| TTS_MODEL | gemini-3.1-flash-tts-preview | Force a specific model |
| AUDIO_OUTPUT_DIR | ./gemini-tts | Default output directory |
| GEMINI_TTS_NO_SSL_VERIFY | (unset) | Set to 1 / true / yes to disable SSL certificate verification |
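The following is a rough sketch of how the variables above could be resolved. The defaults and the accepted truthy values come from the table. The helper names and the `tts_YYYYMMDD_HHMMSS.wav` naming for auto-generated output paths are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODEL = "gemini-3.1-flash-tts-preview"
DEFAULT_OUTPUT_DIR = "./gemini-tts"


def env_flag(value: Optional[str]) -> bool:
    """Interpret 1 / true / yes (any case) as True; anything else as False."""
    return (value or "").strip().lower() in {"1", "true", "yes"}


def resolve_settings(env: Mapping[str, str], now: datetime) -> dict:
    """Apply the documented defaults to TTS_MODEL, AUDIO_OUTPUT_DIR and the SSL flag."""
    out_dir = Path(env.get("AUDIO_OUTPUT_DIR") or DEFAULT_OUTPUT_DIR)
    return {
        "model": env.get("TTS_MODEL") or DEFAULT_MODEL,
        # Hypothetical naming scheme for the path used when -o is omitted.
        "default_output": out_dir / f"tts_{now:%Y%m%d_%H%M%S}.wav",
        "ssl_verify": not env_flag(env.get("GEMINI_TTS_NO_SSL_VERIFY")),
    }


settings = resolve_settings({"GEMINI_TTS_NO_SSL_VERIFY": "Yes"}, datetime(2025, 1, 2, 3, 4, 5))
print(settings["model"])       # gemini-3.1-flash-tts-preview
print(settings["ssl_verify"])  # False
```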
scripts/generate.py - Text-to-speech generation

```bash
python scripts/generate.py "Have a wonderful day!" -o hello.wav
python scripts/generate.py "Welcome aboard!" --voice Puck --style cheerfully -o welcome.wav
```
You can also steer delivery inline with audio tags, e.g. [whispers], [shouting],
[excitedly], or a natural-language prefix like Say in a calm voice:.
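The following sketch shows the --style behavior described for the CLI. The style word becomes a `Say <style>:` prefix, and inline audio tags pass through unchanged as part of the text. `build_prompt` is hypothetical, and its whitespace handling is an assumption.

```python
from typing import Optional


def build_prompt(text: str, style: Optional[str] = None) -> str:
    """Prepend a natural-language style instruction, as --style does."""
    text = text.strip()
    if not text:
        raise ValueError("Input text is empty")
    if style:
        return f"Say {style.strip()}: {text}"
    # Audio tags such as [whispers] are ordinary text and are left as-is.
    return text


print(build_prompt("Welcome aboard!", "cheerfully"))
# Say cheerfully: Welcome aboard!
print(build_prompt("[whispers] Don't wake the baby."))
# [whispers] Don't wake the baby.
```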
```bash
python scripts/generate.py -f article.txt -o article.wav
```

Given dialogue.txt:

```text
Taro: How's it going today, Hanako?
Hanako: Not too bad, how about you?
```

```bash
python scripts/generate.py -f dialogue.txt -o conversation.wav
python scripts/generate.py -f dialogue.txt \
  --speaker "Taro:Kore" --speaker "Hanako:Puck" -o conversation.wav
```
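When voices are mapped explicitly, the --speaker values must match the Name: labels in the file. The sketch below parses the mappings and builds a multi-speaker speech configuration. It uses the camelCase field names from the Gemini API's REST speech-generation examples (`multiSpeakerVoiceConfig`, `speakerVoiceConfigs`, `prebuiltVoiceConfig`). This is an illustration built on assumptions: the script may use google-genai's typed config classes instead, and `parse_speakers` and `multi_speaker_config` are hypothetical names. It also covers only explicit mappings, not the case where --speaker is omitted.

```python
from typing import Dict, List


def parse_speakers(args: List[str]) -> Dict[str, str]:
    """Turn repeated --speaker "Name:Voice" values into {name: voice}."""
    mapping: Dict[str, str] = {}
    for arg in args:
        name, sep, voice = arg.partition(":")
        if not sep or not name.strip() or not voice.strip():
            raise ValueError(f"Expected Name:Voice, got {arg!r}")
        mapping[name.strip()] = voice.strip()
    return mapping


def multi_speaker_config(mapping: Dict[str, str], speakers: List[str]) -> dict:
    """Build a REST-style speechConfig for exactly two speakers."""
    if len(speakers) != 2:
        raise ValueError("Multi-speaker mode requires exactly 2 speakers")
    missing = [s for s in speakers if s not in mapping]
    if missing:
        # Names must match the "Name:" labels in the text exactly.
        raise ValueError(f"No --speaker mapping for: {', '.join(missing)}")
    return {
        "multiSpeakerVoiceConfig": {
            "speakerVoiceConfigs": [
                {"speaker": s, "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": mapping[s]}}}
                for s in speakers
            ]
        }
    }


cfg = multi_speaker_config(parse_speakers(["Taro:Kore", "Hanako:Puck"]), ["Taro", "Hanako"])
print(cfg["multiSpeakerVoiceConfig"]["speakerVoiceConfigs"][1]["voiceConfig"])
# {'prebuiltVoiceConfig': {'voiceName': 'Puck'}}
```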
```bash
python scripts/generate.py --list-voices
python scripts/generate.py "hello" --no-ssl-verify -o hello.wav
python scripts/generate.py "hello" --json -o hello.wav
```
```text
usage: generate.py [-h] [-f FILE] [-o OUTPUT] [--voice VOICE]
                   [--speaker "Name:Voice"] [--style STYLE]
                   [--temperature T] [--single] [--multi] [--list-voices]
                   [-v] [--json] [--no-ssl-verify] [text]

Arguments:
  text                  Text to read aloud (or use -f)

Options:
  -f, --file PATH       Read input text from a file
  -o, --output PATH     Output .wav file path (auto-generated if omitted)
  --voice VOICE         Voice for single-speaker mode (default: Zephyr)
  --speaker "N:V"       Multi-speaker voice mapping "Name:Voice" (repeatable)
  --style STYLE         Style prefix for single speaker (e.g. "cheerfully")
  --temperature T       Sampling temperature (default: 1.0)
  --single              Force single-speaker mode
  --multi               Force multi-speaker mode (requires 2 "Name:" speakers)
  --list-voices         List the available prebuilt voices and exit
  -v, --verbose         Show detailed output
  --json                Output result as JSON
  --no-ssl-verify       Disable SSL certificate verification
```
--single and --multi are mutually exclusive.
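The following sketch reproduces this option surface with the standard library. In `argparse`, a mutually exclusive group is what makes --single and --multi reject each other. The real parser in generate.py may differ in help text and validation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument("text", nargs="?", help="Text to read aloud (or use -f)")
    parser.add_argument("-f", "--file", help="Read input text from a file")
    parser.add_argument("-o", "--output", help="Output .wav file path")
    parser.add_argument("--voice", default="Zephyr")
    # Repeatable: each --speaker appends one "Name:Voice" string.
    parser.add_argument("--speaker", action="append", metavar="Name:Voice")
    parser.add_argument("--style")
    parser.add_argument("--temperature", type=float, default=1.0)
    # argparse exits with a usage error if both flags are given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--single", action="store_true")
    mode.add_argument("--multi", action="store_true")
    parser.add_argument("--list-voices", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--no-ssl-verify", action="store_true")
    return parser


args = build_parser().parse_args(["hello", "--speaker", "A:Kore", "--speaker", "B:Puck"])
print(args.speaker)  # ['A:Kore', 'B:Puck']
```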
Gemini TTS returns raw PCM (audio/L16;rate=24000, mono). The script wraps it
in a WAV header and saves a playable .wav file (16-bit, 24 kHz, mono).
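The wrapping step can be done with Python's standard `wave` module. This sketch assumes the returned PCM bytes are 16-bit little-endian samples, which is the byte order WAV expects, and writes them unchanged. `pcm_to_wav` is a hypothetical name.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, from audio/L16;rate=24000
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


silence = bytes(SAMPLE_RATE * SAMPLE_WIDTH)  # one second of silence
wav_bytes = pcm_to_wav(silence)
with wave.open(io.BytesIO(wav_bytes), "rb") as w:
    print(w.getframerate(), w.getnframes())  # 24000 24000
```

To save the result, write `wav_bytes` to a path opened in binary mode, or pass the path to `wave.open` directly instead of a `BytesIO`.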
30 prebuilt voices are available (e.g. Zephyr bright, Puck upbeat, Charon informative, Kore firm, Sulafat warm, Leda youthful, Enceladus breathy, Achernar soft). 70+ languages are supported — the output language follows the input text's language.
See references/voices.md for the full voice list with characteristics and a
style / audio-tag guide.
- Style prompts: prefix the text with a natural-language instruction, e.g. Say cheerfully: ... or Read this slowly and calmly: ... (or use --style cheerfully, which prepends Say cheerfully:).
- Audio tags: [whispers], [shouting], [laughs], [excitedly], [sarcastically], etc. More than 200 tags steer vocal style, pace, and delivery.
- Multi-speaker format: each line is Name: text; the speaker names must match the --speaker "Name:Voice" mappings exactly.
- Model: gemini-3.1-flash-tts-preview (no Pro variant).

| Error | Solution |
|---|---|
| google-genai package not installed | Run pip install google-genai |
| No API credentials found | Set GEMINI_API_KEY or GOOGLE_CLOUD_PROJECT |
| Input text is empty | Provide non-empty text via argument or -f |
| Multi-speaker mode requires 2 ... speakers | Use Name: labels for exactly 2 speakers, or drop --multi |
| Content blocked by safety filters | Rephrase the input (avoid impersonating real people) |
| API rate limit reached | Wait and retry |
| SSL: CERTIFICATE_VERIFY_FAILED | Use --no-ssl-verify or set GEMINI_TTS_NO_SSL_VERIFY=1 |
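For the rate-limit error, a common remedy is a retry loop with exponential backoff. The sketch below is generic and not part of the script. The `looks_rate_limited` predicate is hypothetical, because how a rate-limit error surfaces (exception type, status code) depends on the SDK version.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            attempt += 1
            # Give up on non-retryable errors or once attempts are exhausted.
            if attempt >= attempts or not is_retryable(exc):
                raise
            # Waits of roughly 2 s, 4 s, 8 s, ... plus up to 1 s of jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))


# Hypothetical predicate: treat messages mentioning HTTP 429 as rate limits.
def looks_rate_limited(exc: Exception) -> bool:
    return "429" in str(exc) or "RESOURCE_EXHAUSTED" in str(exc)
```

The `sleep` parameter is injectable so the delays can be recorded in tests instead of actually waiting.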