danishi/gemini-tts

Name: gemini-tts
Author: danishi

skills/gemini-tts/SKILL.md

npx skillsauth add danishi/claude-code-config gemini-tts


Gemini TTS - Read-Aloud Speech Skill

Use the Python script in scripts/ to turn text into natural read-aloud audio via Google Gemini TTS. Model: gemini-3.1-flash-tts-preview (single TTS model — there is no Pro variant).

The mode is automatically detected from the input:

| Mode | When used | Voices |
|---|---|---|
| Single-speaker | Plain text / narration | One voice (--voice) |
| Multi-speaker | A 2-person dialogue: lines like Name: ... with exactly two distinct speakers | Two voices (--speaker) |

Detected 2 speaker labels → multi-speaker (2 is a hard limit)
Detected 1 or 0 → single-speaker
Detected 3+ → warns and falls back to single-speaker narration
Override with --single / --multi
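The detection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual code: the label pattern (a short name at the start of a line, followed by a colon and a space) and the function name `detect_mode` are assumptions.

```python
import re

# Assumed label pattern: a short name at the start of a line, then ": ".
# The real script may use a different rule.
LABEL = re.compile(r"^\s*([^\s:][^:\n]{0,39}):\s+\S", re.MULTILINE)


def detect_mode(text: str) -> str:
    """Return "multi" for exactly two distinct speaker labels, else "single".

    Zero, one, or three-plus labels all map to "single"; per the rules above,
    the script also prints a warning in the three-plus case.
    """
    speakers = {m.group(1).strip() for m in LABEL.finditer(text)}
    return "multi" if len(speakers) == 2 else "single"
```

With this pattern, a style prefix such as `Say in a calm voice: ...` counts as a single label and therefore stays in single-speaker mode.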

Prerequisites

1. Install dependencies

pip install google-genai

2. Configure API credentials (one of the following)

Option A: Gemini Developer API (recommended for personal use)

Set the GEMINI_API_KEY environment variable. Get a key at https://aistudio.google.com/apikey

export GEMINI_API_KEY="your-api-key"

Option B: Vertex AI API (for Google Cloud users)

Set GOOGLE_CLOUD_PROJECT and optionally GOOGLE_CLOUD_LOCATION. Requires a GCP project with the Vertex AI API enabled and Application Default Credentials configured (gcloud auth application-default login).

export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"   # optional, defaults to us-central1

Priority: If both GOOGLE_CLOUD_PROJECT and GEMINI_API_KEY are set, Vertex AI is used.
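The priority rule can be expressed as a small pure function. This is a sketch under stated assumptions: `select_backend` and the shape of the returned dict are hypothetical, and only the environment variable names and the `us-central1` default come from the text above.

```python
import os
from typing import Mapping, Optional


def select_backend(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick Vertex AI when GOOGLE_CLOUD_PROJECT is set, else the Developer API."""
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if project:
        # Vertex AI wins even if GEMINI_API_KEY is also present.
        return {
            "backend": "vertex",
            "project": project,
            "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        }
    api_key = env.get("GEMINI_API_KEY")
    if api_key:
        return {"backend": "developer", "api_key": api_key}
    raise RuntimeError("No API credentials found")
```

The result could then be passed to the google-genai client, for example `genai.Client(vertexai=True, project=..., location=...)` for Vertex AI or `genai.Client(api_key=...)` for the Developer API.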

3. Optional environment variables

| Variable | Default | Description |
|---|---|---|
| TTS_MODEL | gemini-3.1-flash-tts-preview | Force a specific model |
| AUDIO_OUTPUT_DIR | ./gemini-tts | Default output directory |
| GEMINI_TTS_NO_SSL_VERIFY | (unset) | Set to 1 / true / yes to disable SSL certificate verification |
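As an illustration of how these optional variables might be read, the sketch below resolves them into a settings dict. The function name `load_settings` and the dict keys are hypothetical, and treating the SSL flag case-insensitively is an assumption; the variable names and defaults are taken from the table.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the optional environment variables to concrete settings."""
    env = os.environ if env is None else env
    # Assumption: the flag is compared case-insensitively after trimming.
    flag = env.get("GEMINI_TTS_NO_SSL_VERIFY", "").strip().lower()
    return {
        "model": env.get("TTS_MODEL", "gemini-3.1-flash-tts-preview"),
        "output_dir": env.get("AUDIO_OUTPUT_DIR", "./gemini-tts"),
        "verify_ssl": flag not in {"1", "true", "yes"},
    }
```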

Script

`scripts/generate.py` - Text-to-speech generation

Basic narration (single voice)

python scripts/generate.py "Have a wonderful day!" -o hello.wav

Choose a voice and style

python scripts/generate.py "Welcome aboard!" --voice Puck --style cheerfully -o welcome.wav

You can also steer delivery inline with audio tags, e.g. [whispers], [shouting], [excitedly], or a natural-language prefix like Say in a calm voice:.

Read text from a file (good for long input)

python scripts/generate.py -f article.txt -o article.wav

Multi-speaker dialogue (auto-detected)

Given dialogue.txt:

Taro: How's it going today, Hanako?
Hanako: Not too bad, how about you?

python scripts/generate.py -f dialogue.txt -o conversation.wav

Assign voices to speakers explicitly

python scripts/generate.py -f dialogue.txt \
  --speaker "Taro:Kore" --speaker "Hanako:Puck" -o conversation.wav

List available voices

python scripts/generate.py --list-voices

Disable SSL verification (for corporate proxies or self-signed certs)

python scripts/generate.py "hello" --no-ssl-verify -o hello.wav

JSON output (for programmatic use)

python scripts/generate.py "hello" --json -o hello.wav
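For programmatic use, the script can be driven from Python with `subprocess`. The sketch below is hedged: `build_argv` and `run_tts` are hypothetical helpers, and it assumes the JSON result is written to stdout as a single document. The fields of that JSON are not documented here, so the code only parses it and returns the result.

```python
import json
import subprocess
import sys
from typing import Dict, List, Optional


def build_argv(text: str, output: str, voice: Optional[str] = None,
               style: Optional[str] = None,
               speakers: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble a generate.py command line using only the documented flags."""
    argv = [sys.executable, "scripts/generate.py", text, "--json", "-o", output]
    if voice:
        argv += ["--voice", voice]
    if style:
        argv += ["--style", style]
    for name, speaker_voice in (speakers or {}).items():
        argv += ["--speaker", f"{name}:{speaker_voice}"]
    return argv


def run_tts(argv: List[str]):
    """Run the script and parse stdout as JSON (assumed output channel)."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

A call might look like `run_tts(build_argv("hello", "hello.wav", voice="Puck"))`; `check=True` raises `subprocess.CalledProcessError` if the script exits with a non-zero status.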

Full options

usage: generate.py [-h] [-f FILE] [-o OUTPUT] [--voice VOICE]
                   [--speaker "Name:Voice"] [--style STYLE]
                   [--temperature T] [--single] [--multi] [--list-voices]
                   [-v] [--json] [--no-ssl-verify] [text]

Arguments:
  text                Text to read aloud (or use -f)

Options:
  -f, --file PATH     Read input text from a file
  -o, --output PATH   Output .wav file path (auto-generated if omitted)
  --voice VOICE       Voice for single-speaker mode (default: Zephyr)
  --speaker "N:V"     Multi-speaker voice mapping "Name:Voice" (repeatable)
  --style STYLE       Style prefix for single speaker (e.g. "cheerfully")
  --temperature T     Sampling temperature (default: 1.0)
  --single            Force single-speaker mode
  --multi             Force multi-speaker mode (requires 2 "Name:" speakers)
  --list-voices       List the available prebuilt voices and exit
  -v, --verbose       Show detailed output
  --json              Output result as JSON
  --no-ssl-verify     Disable SSL certificate verification

--single and --multi are mutually exclusive.

Output Format

Gemini TTS returns raw PCM (audio/L16;rate=24000, mono). The script wraps it in a WAV header and saves a playable .wav file (16-bit, 24 kHz, mono).
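Wrapping raw PCM in a WAV header can be done with Python's standard `wave` module. This is a minimal sketch of the technique rather than the script's own code; the function name `pcm_to_wav_bytes` is hypothetical, and it assumes the samples are little-endian, as the WAV format expects.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples = 2 bytes each
        wav.setframerate(rate)   # 24 kHz by default
        wav.writeframes(pcm)     # assumed little-endian sample data
    return buf.getvalue()
```

Writing the returned bytes to disk, for example with `Path("hello.wav").write_bytes(...)`, gives a file that ordinary audio players can open.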

Voices

30 prebuilt voices are available (e.g. Zephyr bright, Puck upbeat, Charon informative, Kore firm, Sulafat warm, Leda youthful, Enceladus breathy, Achernar soft). 70+ languages are supported — the output language follows the input text's language.

See references/voices.md for the full voice list with characteristics and a style / audio-tag guide.

Style & Delivery Control

Natural-language prefix: Say cheerfully: ..., Read this slowly and calmly: ... (or use --style cheerfully, which prepends Say cheerfully:).
Audio tags (inline): [whispers], [shouting], [laughs], [excitedly], [sarcastically], etc. — 200+ tags steer vocal style, pace, and delivery.
Multi-speaker: label each line Name: text; the speaker names must match the --speaker "Name:Voice" mappings exactly.
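The style prefix and the exact-match rule for speaker names can be illustrated as follows. All three helper names are hypothetical; the `Say <style>:` prefix and the `Name:Voice` mapping format come from the text above.

```python
from typing import Dict, Iterable, List, Optional


def apply_style(text: str, style: Optional[str] = None) -> str:
    """Prepend "Say <style>: " when a style is given, as --style does."""
    return f"Say {style}: {text}" if style else text


def parse_speakers(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn repeated "Name:Voice" values into a name-to-voice mapping."""
    mapping: Dict[str, str] = {}
    for pair in pairs:
        name, sep, voice = pair.partition(":")
        if not sep or not name.strip() or not voice.strip():
            raise ValueError(f"expected 'Name:Voice', got {pair!r}")
        mapping[name.strip()] = voice.strip()
    return mapping


def unmapped_speakers(labels: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Return labels with no exact (case-sensitive) match in the mapping."""
    return sorted(set(labels) - set(mapping))
```

A non-empty result from `unmapped_speakers` signals a mismatch between the dialogue labels and the `--speaker` mappings, for example `taro` in the text versus `Taro` in the mapping.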

Limitations

Single TTS model: gemini-3.1-flash-tts-preview (no Pro variant).
Multi-speaker cap: at most 2 speakers per request.
No voice cloning: requests to imitate a specific real person's voice are blocked by safety filters.

Error Handling

| Error | Solution |
|---|---|
| google-genai package not installed | Run pip install google-genai |
| No API credentials found | Set GEMINI_API_KEY or GOOGLE_CLOUD_PROJECT |
| Input text is empty | Provide non-empty text via argument or -f |
| Multi-speaker mode requires 2 ... speakers | Use Name: labels for exactly 2 speakers, or drop --multi |
| Content blocked by safety filters | Rephrase the input (avoid impersonating real people) |
| API rate limit reached | Wait and retry |
| SSL: CERTIFICATE_VERIFY_FAILED | Use --no-ssl-verify or set GEMINI_TTS_NO_SSL_VERIFY=1 |

Generate read-aloud audio (text-to-speech) using Google Gemini TTS (gemini-3.1-flash-tts-preview). Automatically detects the mode from the input: single-speaker narration for plain text, and multi-speaker dialogue when the input has two "Name:" speaker labels. Supports 30 prebuilt voices, natural-language style control and audio tags, text or file input, and WAV output. Works with both the Gemini Developer API and Vertex AI.

development

Updated Jun 10, 2026

npx skillsauth add danishi/claude-code-config gemini-tts

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 10, 2026, 2:44 AM (174.9 s, 3 files scanned)

Related Skills

danishi/video-composer

tools

VerifiedTrustedCommunity

Produce rich, finished video content with React Remotion by orchestrating the repository's media-generation skills (nanobanana for images, veo for video clips, lyria for BGM, gemini-tts for narration) and composing them on a data-driven Remotion timeline. Follows an approval-gated workflow: first return a video composition plan for the user to approve, then generate assets, compose, run a multimodal self-review loop, and deliver only when the result meets the quality bar. Use when the user wants to "create a video", "make a promo / explainer / social clip", or combine images, video, music, and voiceover into one polished video.

SKILL.mdUpdated Jun 10, 2026

danishi/veo

tools

VerifiedTrustedCommunity

Generate videos using Google Gemini Veo 3.1. Defaults to the cost-effective Veo 3.1 Lite model; the premium (Veo 3.1) and Fast models are used only when explicitly requested via --pro / --fast. Supports text-to-video and image-to-video (first frame + optional last frame), 16:9 / 9:16, 720p / 1080p (4k on Pro), 4-8s clips, and 1-4 videos per request. Works with both the Gemini Developer API and Vertex AI.

SKILL.mdUpdated Jun 10, 2026

danishi/skill-packager

tools

VerifiedTrustedCommunity

Package a skill directory into a distributable `.skill` archive placed on the Desktop. Use when the user asks to "package", "bundle", "zip up", "export", "distribute", or "ship" a skill, or mentions creating a `.skill` file from `~/.claude/skills/<skill-name>/`.

SKILL.mdUpdated Jun 10, 2026

danishi/salesforce

tools

VerifiedTrustedCommunity

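A skill for working with and managing Salesforce data through the Salesforce CLI. It performs CRUD operations on accounts, opportunities, projects, and outsourcing records, and runs SOQL queries, pipeline analysis, and report generation. Use it when the user wants to query, update, or analyze Salesforce data, check or change an opportunity's stage, review the status of projects or outsourcing, produce revenue, gross-profit, or pipeline reports, look up account or deal information, or run a SOQL query. Use this skill whenever the request contains keywords such as "Salesforce", "SF", "opportunity", "account", "pipeline", "deal", "project", "outsourcing", "gross profit", "revenue", "won", "lost", "stage", or "SOQL". For any question or request concerning Salesforce, use this skill proactively even if the skill name is not mentioned explicitly.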

SKILL.mdUpdated Jun 10, 2026

Install manually

# Clone the repo
git clone https://github.com/danishi/claude-code-config.git

# Copy into Claude Code skills folder (global)
cp -r claude-code-config/skills/gemini-tts ~/.claude/skills/

Repository

danishi/claude-code-config

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
