kenneropia/text-to-voice

Name: text-to-voice
Author: kenneropia

/SKILL.md

npx skillsauth add kenneropia/text-to-voice text-to-voice

Text-to-Voice with Kyutai Pocket TTS

Convert text to natural speech using Kyutai's Pocket TTS, a lightweight 100M-parameter model that runs efficiently on CPU.

Installation

pip install pocket-tts
# or use uvx to run without installing:
uvx pocket-tts generate

Requires Python 3.10+ and PyTorch 2.5+. GPU not required.

CLI Usage

Basic Generation

# Generate with defaults (saves to ./tts_output.wav)
uvx pocket-tts generate

# Specify text
pocket-tts generate --text "Hello, this is my message."

# Specify output file location
pocket-tts generate --text "Hello" --output-path ./audio/greeting.wav

# Full example with all common options
pocket-tts generate \
  --text "Welcome to the demo." \
  --voice alba \
  --output-path ./output/welcome.wav

CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| --text | "Hello world..." | Text to convert to speech |
| --voice | alba | Voice name, local file path, or HuggingFace URL |
| --output-path | ./tts_output.wav | Where to save the generated audio file |
| --temperature | 0.7 | Generation temperature (higher = more expressive) |
| --lsd-decode-steps | 1 | Quality steps (higher = better quality, slower) |
| --eos-threshold | -4.0 | End detection threshold (lower = finish earlier) |
| --frames-after-eos | auto | Extra frames after end (each frame = 80ms) |
| --device | cpu | Device to use (cpu/cuda) |
| -q, --quiet | false | Disable logging output |
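When the CLI is driven from a script, the options in the table map naturally onto an argument list. The sketch below is one way to do that. `build_generate_command` and `run_generate` are hypothetical helpers, not part of pocket-tts, and they use only flags listed in the table.

```python
import shlex
import subprocess


def build_generate_command(text, voice="alba", output_path="./tts_output.wav",
                           temperature=None, lsd_decode_steps=None,
                           eos_threshold=None, device=None, quiet=False):
    """Build an argv list for `pocket-tts generate` (hypothetical helper)."""
    cmd = ["pocket-tts", "generate",
           "--text", text,
           "--voice", voice,
           "--output-path", output_path]
    # Optional flags are added only when set, so the CLI defaults still apply.
    optional = {
        "--temperature": temperature,
        "--lsd-decode-steps": lsd_decode_steps,
        "--eos-threshold": eos_threshold,
        "--device": device,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_generate(**options):
    """Run the CLI with the given options (requires pocket-tts on PATH)."""
    subprocess.run(build_generate_command(**options), check=True)


if __name__ == "__main__":
    cmd = build_generate_command("Welcome to the demo.",
                                 lsd_decode_steps=5, temperature=0.5)
    print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.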

Voice Selection (CLI)

# Use a pre-made voice by name
pocket-tts generate --voice alba --text "Hello"
pocket-tts generate --voice javert --text "Hello"

# Use a local audio file for voice cloning
pocket-tts generate --voice ./my_voice.wav --text "Hello"

# Use a voice from HuggingFace
pocket-tts generate --voice "hf://kyutai/tts-voices/alba-mackenna/merchant.wav" --text "Hello"

Quality Tuning (CLI)

# Higher quality (more generation steps)
pocket-tts generate --lsd-decode-steps 5 --temperature 0.5 --output-path high_quality.wav

# More expressive/varied output
pocket-tts generate --temperature 1.0 --output-path expressive.wav

# Shorter output (finishes speaking earlier): use a value below the -4.0 default
pocket-tts generate --eos-threshold -5.0 --output-path shorter.wav

Local Web Server

For quick iteration with multiple voices/texts:

uvx pocket-tts serve
# Open http://localhost:8000

Available Voices

Pre-made voices (use name directly with --voice):

| Voice | Gender | License | Description |
|-------|--------|---------|-------------|
| alba | Female | CC BY 4.0 | Casual voice |
| marius | Male | CC0 | Voice donation |
| javert | Male | CC0 | Voice donation |
| jean | Male | CC-NC | EARS dataset |
| fantine | Female | CC BY 4.0 | VCTK dataset |
| cosette | Female | CC-NC | Expresso dataset |
| eponine | Female | CC BY 4.0 | VCTK dataset |
| azelma | Female | CC BY 4.0 | VCTK dataset |
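The License column can matter when picking a voice, since the CC-NC entries appear to carry a non-commercial restriction. As a small sketch, the table can be kept as data and filtered by license. The `VOICES` dictionary and `voices_by_license` function are illustrative names, with values copied from the table above.

```python
# Voice metadata copied from the table above: name -> (gender, license).
VOICES = {
    "alba": ("Female", "CC BY 4.0"),
    "marius": ("Male", "CC0"),
    "javert": ("Male", "CC0"),
    "jean": ("Male", "CC-NC"),
    "fantine": ("Female", "CC BY 4.0"),
    "cosette": ("Female", "CC-NC"),
    "eponine": ("Female", "CC BY 4.0"),
    "azelma": ("Female", "CC BY 4.0"),
}


def voices_by_license(*licenses):
    """Return the names of voices whose license is one of `licenses`, sorted."""
    wanted = set(licenses)
    return sorted(name for name, (_, lic) in VOICES.items() if lic in wanted)


if __name__ == "__main__":
    print(voices_by_license("CC0"))  # ['javert', 'marius']
```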

Full voice catalog: https://huggingface.co/kyutai/tts-voices

For detailed voice information, see references/voices.md.

Voice Cloning

Clone any voice from an audio sample. For best results:

Use clean audio (minimal background noise)
10+ seconds recommended
Consider Adobe Podcast Enhance to clean samples

pocket-tts generate --voice ./my_recording.wav --text "Hello" --output-path cloned.wav
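Before cloning, it can help to check a recording against the 10+ second recommendation. The sketch below uses only the standard library `wave` module, so it assumes the sample is an uncompressed PCM WAV file. The function names are illustrative.

```python
import wave


def sample_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=10.0):
    """Check a clip against the 10+ second recommendation."""
    return sample_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("./my_recording.wav")` returns False for a five-second clip.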

Output Format

Sample Rate: 24kHz
Channels: Mono
Format: 16-bit PCM WAV
Default location: ./tts_output.wav
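To confirm that a generated file matches the format listed above, the WAV header can be read with the standard library. This is a minimal sketch; `describe_wav` and `matches_documented_format` are illustrative names. Note that the `wave` module reads only PCM files and raises `wave.Error` on other encodings such as floating-point WAV.

```python
import wave


def describe_wav(path):
    """Return the basic format fields of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def matches_documented_format(info):
    """True if the file is 24 kHz, mono, 16-bit, as listed above."""
    return (info["sample_rate"] == 24000
            and info["channels"] == 1
            and info["bits_per_sample"] == 16)
```

A typical check would be `matches_documented_format(describe_wav("./tts_output.wav"))`.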

Python API

For programmatic use:

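from pathlib import Path

import scipy.io.wavfile
from pocket_tts import TTSModel

tts_model = TTSModel.load_model()
voice_state = tts_model.get_state_for_audio_prompt("alba")
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to a specific location. Create the target directory first,
# because scipy does not create missing directories.
output_path = Path("./audio/output.wav")
output_path.parent.mkdir(parents=True, exist_ok=True)
scipy.io.wavfile.write(str(output_path), tts_model.sample_rate, audio.numpy())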

TTSModel.load_model()

model = TTSModel.load_model(
    variant="b6369a24",      # Model variant
    temp=0.7,                # Temperature (0.0-1.0)
    lsd_decode_steps=1,      # Generation steps
    noise_clamp=None,        # Max noise value
    eos_threshold=-4.0       # End-of-sequence threshold
)

Voice State

# Pre-made voice
voice_state = model.get_state_for_audio_prompt("alba")

# Local file
voice_state = model.get_state_for_audio_prompt("./my_voice.wav")

# HuggingFace
voice_state = model.get_state_for_audio_prompt("hf://kyutai/tts-voices/alba-mackenna/casual.wav")

Generate Audio

audio = model.generate_audio(voice_state, "Text to speak")
# Returns: torch.Tensor (1D)

Streaming

for chunk in model.generate_audio_stream(voice_state, "Long text..."):
    # Process each chunk as it's generated
    pass
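One use for the streaming loop is to write audio to disk as it arrives instead of holding the whole clip in memory. The sketch below uses only the standard library and accepts any iterable of chunks, where each chunk is a sequence of floats. It assumes the samples lie in [-1.0, 1.0], which the source does not state, and the helper names are illustrative.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Pack floats (clipped to [-1.0, 1.0]) as little-endian 16-bit PCM."""
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_stream_to_wav(chunks, path, sample_rate=24000):
    """Append each chunk to a mono 16-bit WAV file; return the frame count."""
    frames = 0
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            data = float_to_pcm16(chunk)
            wav.writeframes(data)
            frames += len(data) // 2
    return frames


# With pocket-tts this might be called as follows (assuming each streamed
# chunk is a 1D float tensor, so .tolist() yields plain Python floats):
#
#   stream = model.generate_audio_stream(voice_state, "Long text...")
#   write_stream_to_wav((c.tolist() for c in stream), "out.wav",
#                       model.sample_rate)
```

Writing 16-bit samples explicitly also keeps the file in the 16-bit PCM format, whereas `scipy.io.wavfile.write` stores a floating-point array as a floating-point WAV.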

Properties

model.sample_rate - 24000 Hz
model.device - "cpu" or "cuda"

Performance

~200ms latency to first audio chunk
~6x real-time on MacBook Air M4 CPU
Uses only 2 CPU cores
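These figures depend on hardware, so it can be worth measuring them locally. The sketch below times any chunk iterator and reports the latency to the first chunk plus a real-time factor (seconds of audio produced per second of wall-clock time). `measure_stream` is an illustrative helper, and it assumes `len(chunk)` gives the number of samples, as it would for a 1D tensor or list.

```python
import time


def measure_stream(chunks, sample_rate=24000, clock=time.perf_counter):
    """Consume a chunk iterator and time it.

    Returns (first_chunk_latency_s, realtime_factor). The latency is None
    if the iterator yields no chunks.
    """
    start = clock()
    first_latency = None
    total_samples = 0
    for chunk in chunks:
        if first_latency is None:
            first_latency = clock() - start
        total_samples += len(chunk)
    elapsed = clock() - start
    audio_seconds = total_samples / sample_rate
    realtime_factor = audio_seconds / elapsed if elapsed > 0 else float("inf")
    return first_latency, realtime_factor
```

With pocket-tts, the iterator would be `model.generate_audio_stream(voice_state, text)`, and a real-time factor near 6 would correspond to the "~6x real-time" figure above.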

Limitations

English only
No built-in pause/silence control
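One possible workaround for the missing pause control, not taken from the source, is to generate each sentence separately and join the results with stretches of silence. The sketch below shows only the joining step on plain lists of samples. `join_with_silence` is a hypothetical helper, and the 400 ms default is an arbitrary choice.

```python
def join_with_silence(segments, sample_rate=24000, pause_ms=400):
    """Concatenate sample lists, inserting `pause_ms` of silence between them."""
    gap = [0.0] * int(sample_rate * pause_ms / 1000)
    joined = []
    for index, segment in enumerate(segments):
        if index > 0:
            joined.extend(gap)  # silence only between segments, not at the ends
        joined.extend(segment)
    return joined
```

With pocket-tts, each segment could come from `model.generate_audio(voice_state, sentence).tolist()`, assuming the returned tensor is 1D as described under Generate Audio.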

Convert text to speech using Kyutai's Pocket TTS. Use when the user asks to "generate speech", "text to speech", "TTS", "convert text to audio", "voice synthesis", "generate voice", "read aloud", or "create audio from text". Supports voice cloning from audio samples and multiple pre-made voices (alba, marius, javert, jean, fantine, cosette, eponine, azelma).

data-ai

Updated May 18, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add kenneropia/text-to-voice text-to-voice

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 18, 2026, 6:47 AM (129.2s, 4 files scanned)

SKILL.md

name: text-to-voice
description: Convert text to speech using Kyutai's Pocket TTS. Use when the user asks to "generate speech", "text to speech", "TTS", "convert text to audio", "voice synthesis", "generate voice", "read aloud", or "create audio from text". Supports voice cloning from audio samples and multiple pre-made voices (alba, marius, javert, jean, fantine, cosette, eponine, azelma).
license: MIT
contributor: Aaron Adetunmbi
thanks: kyutai-labs

Install manually

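# Clone the repo
git clone https://github.com/kenneropia/text-to-voice.git

# Copy into the Claude Code skills folder (global).
# No trailing slash on the source: BSD/macOS cp would otherwise copy only the directory's contents.
mkdir -p ~/.claude/skills
cp -r text-to-voice ~/.claude/skills/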

Repository

kenneropia/text-to-voice

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT