.claude/skills/ts-elevenlabs/SKILL.md
Generate realistic speech with the ElevenLabs API. Use when a user asks to convert text to speech, clone voices, build voice-enabled apps, stream audio, or integrate ElevenLabs voice synthesis into applications.
npx skillsauth add eliferjunior/Claude elevenlabs
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are an expert in ElevenLabs, the AI voice platform for high-quality text-to-speech, voice cloning, and conversational AI. You help developers build voice-enabled applications with natural-sounding speech, custom voice creation, multilingual support, and real-time streaming TTS for voice agents, audiobooks, podcasts, and accessibility features.
# Basic TTS: generate audio from text
import os

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# Generate and save audio
audio = client.text_to_speech.convert(
    voice_id="pNInz6obpgDQGcFmaJgB",  # Premade voice ID (verify the name in your voice library)
    text="Welcome to Bright Smile Dental. How can I help you today?",
    model_id="eleven_turbo_v2_5",  # Optimized for low latency (~200ms)
    voice_settings={
        "stability": 0.6,  # Lower = more expressive, higher = more consistent
        "similarity_boost": 0.8,  # How closely to match the original voice
        "style": 0.3,  # Style exaggeration (0-1)
        "use_speaker_boost": True,  # Enhance clarity
    },
)

# Save to file: convert() yields the audio as an iterator of byte chunks
with open("greeting.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
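
The voice settings passed in the call above are plain numbers, so a typo such as `stability=6` would only surface once the request is sent. A minimal sketch of a local check follows, assuming `stability` and `similarity_boost` share the same 0 to 1 range that the comment gives for `style`. The helper name `make_voice_settings` is hypothetical and not part of the ElevenLabs SDK.

```python
def make_voice_settings(stability=0.6, similarity_boost=0.8, style=0.3,
                        use_speaker_boost=True):
    """Build a voice_settings dict, rejecting numeric values outside 0.0-1.0.

    Hypothetical helper; the 0-1 range for stability and similarity_boost
    is an assumption carried over from the `style` comment above.
    """
    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    settings["use_speaker_boost"] = bool(use_speaker_boost)
    return settings
```

The returned dict has the same shape as the `voice_settings` argument shown above, so it could be passed through unchanged, for example `voice_settings=make_voice_settings(stability=0.4)`.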
# Streaming TTS: for real-time applications
audio_stream = client.text_to_speech.convert_as_stream(
    voice_id="pNInz6obpgDQGcFmaJgB",
    text="Let me check our available appointments for next Tuesday.",
    model_id="eleven_turbo_v2_5",
    output_format="pcm_24000",  # Raw PCM for WebRTC/LiveKit
)
for chunk in audio_stream:
    # send_to_audio_output is a placeholder for your own playback or transport function
    send_to_audio_output(chunk)  # Stream directly to speaker
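
Raw PCM has no header, so chunks from the stream above cannot be opened directly in an audio player. One way to inspect them while debugging is to wrap them in a WAV container with the standard library. This is a sketch that assumes `pcm_24000` means 16-bit mono samples at 24 kHz; confirm the sample format against the ElevenLabs output-format documentation. Both function names are hypothetical.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM byte chunks in a WAV container and return the WAV bytes.

    Assumes 16-bit (2-byte) mono samples; adjust if your format differs.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()


def pcm_duration_seconds(num_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Playback duration of num_bytes of raw PCM under the same assumptions."""
    return num_bytes / (sample_rate * channels * sample_width)
```

Under those assumptions, 48,000 bytes of PCM correspond to one second of audio, which is a quick way to sanity-check that the stream is arriving at the expected rate.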
# Instant voice clone: from a single audio sample
with open("dr_smith_sample.mp3", "rb") as sample:  # Close the file once uploaded
    voice = client.voices.add(
        name="Dr. Smith",
        files=[sample],
        description="Calm, authoritative male voice for medical context",
        labels={"use_case": "voice_agent", "language": "en"},
    )
print(f"Cloned voice ID: {voice.voice_id}")

# Professional voice clone (higher quality, requires consent)
# Needs 30+ minutes of clean audio for best results
# ElevenLabs Conversational AI: fully managed voice agent
# NOTE: illustrative pseudocode. `ConversationalAI` and `process_audio` are
# placeholder names; check the SDK's conversational AI module for the actual interface.
from elevenlabs import ConversationalAI

agent = ConversationalAI(
    api_key=os.environ["ELEVENLABS_API_KEY"],
    agent_id="your-agent-id",  # Created in ElevenLabs dashboard
)

# WebSocket connection for real-time conversation
async def handle_call(websocket):
    async for audio_chunk in websocket:
        # Send caller audio to ElevenLabs
        response = await agent.process_audio(audio_chunk)
        # Send AI response audio back to caller
        await websocket.send(response.audio)
// Browser-based TTS
import { ElevenLabsClient } from "elevenlabs";

// Same environment variable name as the Python examples
const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });
const voiceId = "pNInz6obpgDQGcFmaJgB"; // Same voice ID as the Python examples

// Stream audio in browser
const response = await client.textToSpeech.convertAsStream(voiceId, {
  text: "Hello! How can I assist you?",
  model_id: "eleven_turbo_v2_5",
  output_format: "mp3_44100_128",
});

// Play audio using Web Audio API
const audioContext = new AudioContext();
const reader = response.getReader();
// ... decode and play chunks
| Model | Latency | Quality | Best For |
|-------|---------|---------|----------|
| eleven_turbo_v2_5 | ~200ms | High | Voice agents, real-time apps |
| eleven_multilingual_v2 | ~400ms | Highest | Multilingual, audiobooks |
| eleven_english_v1 | ~300ms | Good | English-only, cost-sensitive |
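
The table can be read as a simple rule: choose the highest-quality model whose latency fits the budget. The sketch below encodes that rule using the approximate figures from the table, which are not measurements. `MODEL_TABLE`, the numeric quality ranks, and `pick_model` are hypothetical names introduced for illustration.

```python
from typing import Optional

# Rows copied from the table above:
# (model_id, approximate latency in ms, quality rank where higher is better)
MODEL_TABLE = [
    ("eleven_turbo_v2_5", 200, 2),       # High
    ("eleven_multilingual_v2", 400, 3),  # Highest
    ("eleven_english_v1", 300, 1),       # Good
]


def pick_model(max_latency_ms: int) -> Optional[str]:
    """Return the highest-quality model within the latency budget, or None."""
    candidates = [row for row in MODEL_TABLE if row[1] <= max_latency_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[2])[0]
```

With these figures, a 250 ms budget selects `eleven_turbo_v2_5` and a 500 ms budget selects `eleven_multilingual_v2`. Cost and language coverage are not modelled here and would need their own columns.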
pip install elevenlabs # Python
npm install elevenlabs # Node.js
Example 1: User asks to set up elevenlabs
User: "Help me set up elevenlabs for my project"
The agent should:
Example 2: User asks to build a feature with elevenlabs
User: "Create a dashboard using elevenlabs"
The agent should:
- Use `eleven_turbo_v2_5` for real-time conversations; ~200ms latency feels instant.
- Use `convert_as_stream` instead of `convert` for voice agents; the first audio chunk arrives in ~200ms.
- Use the `pcm_24000` or `pcm_16000` output format when feeding into WebRTC/LiveKit; no decoding overhead.
- Use `<break time="0.5s"/>` in text for natural pauses; helps with phone menu options.
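
The pause tip can be applied mechanically when the text is a list of menu options. A minimal sketch follows, assuming the `<break time="..."/>` tag is written inline in the `text` argument exactly as in the tip; the helper name `join_with_pauses` is hypothetical.

```python
def join_with_pauses(phrases, pause_seconds=0.5):
    """Join phrases into one string with a break tag between each pair."""
    tag = f'<break time="{pause_seconds}s"/>'
    return f" {tag} ".join(phrase.strip() for phrase in phrases)


menu_text = join_with_pauses([
    "Press 1 for appointments.",
    "Press 2 for billing.",
])
```

Here `menu_text` holds the two options separated by a single half-second break tag and could be passed as the `text` argument of a TTS call.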