src/skills/ai-provider-elevenlabs/SKILL.md
ElevenLabs voice AI SDK patterns for TypeScript/Node.js -- text-to-speech, streaming, voice cloning, speech-to-speech, pronunciation control, and conversational AI
Quick Guide: Use the official @elevenlabs/elevenlabs-js package to interact with the ElevenLabs API. Use client.textToSpeech.convert() for full audio generation or client.textToSpeech.stream() for low-latency streaming. Voice settings (stability, similarityBoost, style) control output character. Use eleven_v3 for best quality, eleven_flash_v2_5 for lowest latency, or eleven_multilingual_v2 for stable long-form content. The SDK returns ReadableStream<Uint8Array> -- pipe it to files or HTTP responses. Use @elevenlabs/client for real-time conversational AI agents.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use @elevenlabs/elevenlabs-js for server-side TTS, voice management, and speech-to-speech -- use @elevenlabs/client only for conversational AI agents)
(You MUST never hardcode API keys -- always use environment variables via process.env.ELEVENLABS_API_KEY which the SDK reads automatically)
(You MUST consume the ReadableStream<Uint8Array> returned by convert() and stream() -- unconsumed streams leak resources)
(You MUST choose the correct model for your use case -- eleven_v3 for quality, eleven_flash_v2_5 for speed, eleven_multilingual_v2 for long-form stability)
(You MUST pass voiceId as the first positional argument to all textToSpeech methods -- it is NOT inside the options object)
</critical_requirements>
Auto-detection: ElevenLabs, elevenlabs, ElevenLabsClient, textToSpeech.convert, textToSpeech.stream, eleven_multilingual_v2, eleven_flash_v2_5, eleven_v3, speechToSpeech, voices.search, voice cloning, ELEVENLABS_API_KEY, @elevenlabs/elevenlabs-js, @elevenlabs/client, text-to-speech, TTS, voice synthesis
When to use:
Key patterns covered:
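- Text-to-speech generation (convert, stream, timestamps)
- Voice settings (stability, similarityBoost, style, speed)
- Voice discovery (voices.search, voices.get)
- Instant voice cloning (voices.ivc.create)
- Conversational AI agents (@elevenlabs/client)

When NOT to use: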
The ElevenLabs SDK (@elevenlabs/elevenlabs-js) wraps the ElevenLabs REST API with full TypeScript types, streaming support, and automatic retries.
Core principles:
- Audio comes back as ReadableStream<Uint8Array>. You pipe it to files, HTTP responses, or audio players; the SDK streams audio instead of buffering whole files in memory.
- The voice settings stability, similarityBoost, style, and speed shape every generation. Learn these four knobs well.
- Pick the model for the job: eleven_v3 for best quality, eleven_flash_v2_5 for sub-75ms latency, eleven_multilingual_v2 for stable long-form.
- Use the right package: @elevenlabs/elevenlabs-js for server-side TTS/voice management, @elevenlabs/client for browser-side conversational AI agents.

Initialize the ElevenLabs client. It auto-reads ELEVENLABS_API_KEY from the environment.
```typescript
// lib/elevenlabs.ts -- basic setup
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const client = new ElevenLabsClient();

export { client };
```
```typescript
// lib/elevenlabs.ts -- production configuration
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

const TIMEOUT_SECONDS = 60;
const MAX_RETRIES = 3;

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
  timeoutInSeconds: TIMEOUT_SECONDS,
  maxRetries: MAX_RETRIES,
});

export { client };
```
Why good: Minimal setup, env var auto-detected, named constants for production settings
```typescript
// BAD: Hardcoded API key
const client = new ElevenLabsClient({
  apiKey: "sk-1234567890abcdef",
});
```
Why bad: Hardcoded API keys are a security risk -- they leak through version control
See: examples/core.md for per-request overrides, error handling
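The SDK reads ELEVENLABS_API_KEY on its own, so a missing variable only surfaces on the first API call. Below is a minimal sketch of a fail-fast check at startup. requireApiKey is a hypothetical helper and is not part of the SDK. It takes the environment as a parameter so it can be tested without touching process.env.

```typescript
// Hypothetical helper (not part of @elevenlabs/elevenlabs-js): fail at startup when the
// key is missing, instead of on the first request.
const API_KEY_ENV_VAR = "ELEVENLABS_API_KEY";

type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  const apiKey = env[API_KEY_ENV_VAR]?.trim();
  if (!apiKey) {
    throw new Error(`${API_KEY_ENV_VAR} is not set -- export it before starting the app`);
  }
  return apiKey;
}

// Usage (in lib/elevenlabs.ts):
// const client = new ElevenLabsClient({ apiKey: requireApiKey(process.env) });
```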
Generate complete audio from text. Returns ReadableStream<Uint8Array>.
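```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as NodeReadableStream } from "node:stream/web";

const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"; // George

const audio = await client.textToSpeech.convert(VOICE_ID, {
  text: "Welcome to the application.",
  modelId: "eleven_multilingual_v2",
  outputFormat: "mp3_44100_128",
});

// Pipe to file; pipeline() waits for the write to finish and surfaces stream errors.
// The cast bridges the global ReadableStream type and Node's node:stream/web type.
await pipeline(
  Readable.fromWeb(audio as NodeReadableStream<Uint8Array>),
  createWriteStream("output.mp3"),
);
```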
Why good: voiceId as first arg (required), model and format explicit, stream piped to file without buffering
```typescript
// BAD: voiceId inside options object
const audio = await client.textToSpeech.convert({
  voiceId: VOICE_ID, // WRONG: voiceId is a positional argument
  text: "Hello",
});
```
Why bad: voiceId is the first positional argument, not an options field -- TypeScript rejects this call with a type error
See: examples/core.md for timestamps, HTTP response piping
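convertWithTimestamps() returns base64 audio plus a character-level alignment instead of a stream. Below is a minimal sketch that collapses that alignment into per-word timings, for example to drive captions. It assumes the alignment object has the fields characters, characterStartTimesSeconds and characterEndTimesSeconds. These are the camelCase form of the REST fields, so verify them against your SDK version. toWordTimings is hypothetical.

```typescript
// Assumed shape of `alignment` from convertWithTimestamps() -- verify against your SDK version.
interface CharacterAlignment {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
}

interface WordTiming {
  word: string;
  startSeconds: number;
  endSeconds: number;
}

// Hypothetical helper: group consecutive non-whitespace characters into words,
// taking the first character's start time and the last character's end time.
function toWordTimings(alignment: CharacterAlignment): WordTiming[] {
  const words: WordTiming[] = [];
  let current: WordTiming | null = null;

  for (let i = 0; i < alignment.characters.length; i++) {
    const char = alignment.characters[i];
    if (/\s/.test(char)) {
      current = null; // whitespace ends the current word
      continue;
    }
    if (current === null) {
      current = { word: "", startSeconds: alignment.characterStartTimesSeconds[i], endSeconds: 0 };
      words.push(current);
    }
    current.word += char;
    current.endSeconds = alignment.characterEndTimesSeconds[i];
  }

  return words;
}

// Usage:
// const { audioBase64, alignment } = await client.textToSpeech.convertWithTimestamps(VOICE_ID, { text });
// const audioBuffer = Buffer.from(audioBase64, "base64"); // whole clip, already in memory
// const words = alignment ? toWordTimings(alignment) : [];
```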
Stream audio for real-time playback with lower latency than convert().
```typescript
const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";
const LATENCY_OPTIMIZATION = 2;

const audioStream = await client.textToSpeech.stream(VOICE_ID, {
  text: "This streams with lower latency for real-time playback.",
  modelId: "eleven_flash_v2_5",
  optimizeStreamingLatency: LATENCY_OPTIMIZATION,
  outputFormat: "mp3_44100_128",
});

// Consume the stream
for await (const chunk of audioStream) {
  process.stdout.write(chunk); // Or pipe to audio player / HTTP response
}
```
Why good: Uses stream() for lower latency, eleven_flash_v2_5 for speed, optimizeStreamingLatency reduces first-byte time
```typescript
// BAD: Stream created but never consumed
const audioStream = await client.textToSpeech.stream(VOICE_ID, {
  text: "This audio is lost",
  modelId: "eleven_flash_v2_5",
});
// Stream never consumed -- resources leaked
```
Why bad: Unconsumed streams leak resources and the audio data is silently lost
See: examples/core.md for streaming to HTTP responses
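The practical gain of stream() over convert() shows up as time-to-first-byte. Below is a minimal sketch that reads any ReadableStream<Uint8Array>, such as the one stream() returns, to the end. It forwards each chunk to a callback and records how long the first chunk took. consumeAudioStream is a hypothetical helper that uses only the web streams API built into Node 18+.

```typescript
interface StreamStats {
  timeToFirstByteMs: number | null; // null if the stream produced no data
  totalBytes: number;
}

// Hypothetical helper: reads the stream to the end (nothing is left unconsumed),
// hands each chunk to `onChunk`, and measures time-to-first-byte from `startedAt`.
async function consumeAudioStream(
  audio: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
  startedAt: number = performance.now(),
): Promise<StreamStats> {
  const reader = audio.getReader();
  let timeToFirstByteMs: number | null = null;
  let totalBytes = 0;

  try {
    for (;;) {
      const result = await reader.read();
      if (result.done) break;
      const chunk = result.value;
      timeToFirstByteMs ??= performance.now() - startedAt;
      totalBytes += chunk.byteLength;
      onChunk(chunk);
    }
  } finally {
    reader.releaseLock();
  }

  return { timeToFirstByteMs, totalBytes };
}

// Usage: start the timer before the request so TTFB includes generation time.
// const startedAt = performance.now();
// const audioStream = await client.textToSpeech.stream(VOICE_ID, { text, modelId: "eleven_flash_v2_5" });
// const stats = await consumeAudioStream(audioStream, (chunk) => process.stdout.write(chunk), startedAt);
```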
Control voice characteristics with voiceSettings.
```typescript
const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";

const audio = await client.textToSpeech.convert(VOICE_ID, {
  text: "Emotional and expressive delivery.",
  modelId: "eleven_v3",
  voiceSettings: {
    stability: 0.3, // Lower = more expressive/variable
    similarityBoost: 0.8, // Higher = closer to original voice
    style: 0.5, // Higher = more style exaggeration
    useSpeakerBoost: true, // Enhanced speaker similarity (adds latency)
    speed: 1.0, // 0.7-1.3 range typical
  },
});
```
Why good: All settings explicit with clear purpose, stability lowered for expressive content
```typescript
// BAD: Using extreme values without understanding
const audio = await client.textToSpeech.convert(VOICE_ID, {
  text: "Extreme settings cause artifacts.",
  modelId: "eleven_v3",
  voiceSettings: {
    stability: 0.0, // Too unstable -- garbled output
    similarityBoost: 1.0, // Combined with low stability = artifacts
    style: 1.0, // Maximum exaggeration -- unnatural
  },
});
```
Why bad: Extreme values produce artifacts; stability: 0.0 with high similarityBoost is unstable. Start with defaults and adjust incrementally.
See: reference.md for voice settings ranges and recommended starting values
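"Start with defaults and adjust incrementally" can be made explicit in code. Below is a minimal sketch that applies per-request overrides on top of a baseline and clamps each value to its range. The baseline values (stability 0.5, similarityBoost 0.75, style 0, speed 1.0) are assumed starting points, so compare them with reference.md or the voice's stored settings. The speed bounds follow this guide's 0.7-1.3 guidance. buildVoiceSettings is hypothetical.

```typescript
interface VoiceSettingsInput {
  stability?: number;
  similarityBoost?: number;
  style?: number;
  speed?: number;
}

// Assumed baseline -- compare with reference.md or the voice's stored settings (voices.get()).
const BASELINE_SETTINGS: Required<VoiceSettingsInput> = {
  stability: 0.5,
  similarityBoost: 0.75,
  style: 0,
  speed: 1.0,
};

const MIN_SPEED = 0.7;
const MAX_SPEED = 1.3; // bounds from this guide's 0.7-1.3 guidance

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: overrides win over the baseline, then every value is clamped.
// `??` also covers overrides explicitly set to undefined.
function buildVoiceSettings(overrides: VoiceSettingsInput = {}): Required<VoiceSettingsInput> {
  return {
    stability: clamp(overrides.stability ?? BASELINE_SETTINGS.stability, 0, 1),
    similarityBoost: clamp(overrides.similarityBoost ?? BASELINE_SETTINGS.similarityBoost, 0, 1),
    style: clamp(overrides.style ?? BASELINE_SETTINGS.style, 0, 1),
    speed: clamp(overrides.speed ?? BASELINE_SETTINGS.speed, MIN_SPEED, MAX_SPEED),
  };
}

// Usage: voiceSettings: buildVoiceSettings({ stability: 0.35 })
```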
Find and select voices from the ElevenLabs voice library.
```typescript
// Search all available voices
const { voices } = await client.voices.search();
for (const voice of voices) {
  console.log(`${voice.name} (${voice.voiceId}) - ${voice.category}`);
}

// Get a specific voice by ID
const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";
const voice = await client.voices.get(VOICE_ID);
console.log(voice.name, voice.settings);
```
Why good: Uses voices.search() to discover available voices, voices.get() for details
See: examples/voices.md for filtering, voice cloning, speech-to-speech
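Hardcoded voice IDs are opaque. Below is a minimal sketch of resolving a voice ID by display name from the voices.search() result, failing loudly when there is no match. VoiceSummary lists only the fields the loop above already reads, and the "premade" category value in the usage line is an assumption. findVoiceId is hypothetical.

```typescript
// Minimal subset of the voice objects returned by voices.search() (fields used in the loop above).
interface VoiceSummary {
  voiceId: string;
  name?: string;
  category?: string;
}

// Hypothetical helper: case-insensitive lookup by name, optionally restricted to a category.
// Returns the first match in list order.
function findVoiceId(voices: VoiceSummary[], name: string, category?: string): string {
  const wanted = name.trim().toLowerCase();
  const match = voices.find(
    (voice) =>
      voice.name?.trim().toLowerCase() === wanted &&
      (category === undefined || voice.category === category),
  );
  if (!match) {
    throw new Error(`No voice named "${name}"${category ? ` in category "${category}"` : ""}`);
  }
  return match.voiceId;
}

// Usage:
// const { voices } = await client.voices.search();
// const voiceId = findVoiceId(voices, "George", "premade");
```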
Create an instant voice clone from audio samples.
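```typescript
import { createReadStream, createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as NodeReadableStream } from "node:stream/web";

const voice = await client.voices.ivc.create({
  name: "My Custom Voice",
  files: [createReadStream("sample1.mp3"), createReadStream("sample2.mp3")],
  removeBackgroundNoise: true,
});
console.log(`Created voice: ${voice.voiceId}`);

// Use the cloned voice for TTS, and consume the returned stream
const audio = await client.textToSpeech.convert(voice.voiceId, {
  text: "Speaking in the cloned voice.",
  modelId: "eleven_multilingual_v2",
});
await pipeline(
  Readable.fromWeb(audio as NodeReadableStream<Uint8Array>),
  createWriteStream("cloned-voice.mp3"),
);
```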
Why good: removeBackgroundNoise improves quality, multiple samples improve accuracy, immediately usable
See: examples/voices.md for professional voice cloning, sample validation
Convert speech from one voice to another while preserving emotion and cadence.
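```typescript
import { createReadStream, createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as NodeReadableStream } from "node:stream/web";

const TARGET_VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";
const convertedAudio = await client.speechToSpeech.convert(TARGET_VOICE_ID, {
  audio: createReadStream("input-speech.mp3"),
  modelId: "eleven_multilingual_sts_v2",
  // STS is a multipart upload: voice settings go as a JSON-encoded string with REST (snake_case) names
  voiceSettings: JSON.stringify({ stability: 0.5, similarity_boost: 0.75 }),
});

// Consume the returned ReadableStream<Uint8Array>
await pipeline(
  Readable.fromWeb(convertedAudio as NodeReadableStream<Uint8Array>),
  createWriteStream("converted-speech.mp3"),
);
```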
Why good: Uses STS-specific model, preserves source emotion, voice settings control output fidelity
See: examples/voices.md for streaming STS, English-only model
Catch SDK errors and handle specific failure modes.
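```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as NodeReadableStream } from "node:stream/web";

import {
  ElevenLabsError,
  ElevenLabsTimeoutError,
} from "@elevenlabs/elevenlabs-js";

const VOICE_ID = "JBFqnCBsd6RMkjVDRZzb";

try {
  const audio = await client.textToSpeech.convert(VOICE_ID, {
    text: "Hello, world.",
    modelId: "eleven_multilingual_v2",
  });
  // Consume inside the try block so a failed download is not silently ignored
  await pipeline(
    Readable.fromWeb(audio as NodeReadableStream<Uint8Array>),
    createWriteStream("hello.mp3"),
  );
} catch (error) {
  if (error instanceof ElevenLabsTimeoutError) {
    console.error("Request timed out -- increase timeoutInSeconds or retry");
  } else if (error instanceof ElevenLabsError) {
    console.error(`ElevenLabs API error: ${error.message}`);
    console.error(`Status: ${error.statusCode}`);
    console.error(`Body: ${JSON.stringify(error.body)}`);
  } else {
    throw error; // Re-throw non-ElevenLabs errors
  }
}
```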
Why good: Catches specific error types, logs status code and body for debugging, re-throws unknown errors
See: examples/core.md for stream error handling, retry patterns
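The SDK retries failed requests itself, and you can set maxRetries: 0 when you want to own the retry policy. Below is a minimal sketch of such a policy with exponential backoff. Treating 429 and transient 5xx codes as retryable is an assumption. The helper reads statusCode structurally, the same field the handler above logs from ElevenLabsError, so it needs no SDK import. withRetry is hypothetical.

```typescript
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 500;
// Assumption: rate limits and transient server errors are worth retrying; 4xx input errors are not.
const RETRYABLE_STATUS_CODES = new Set([429, 500, 502, 503, 504]);

function isRetryable(error: unknown): boolean {
  const statusCode = (error as { statusCode?: unknown } | null)?.statusCode;
  return typeof statusCode === "number" && RETRYABLE_STATUS_CODES.has(statusCode);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: retries `operation` with delays of base, 2x base, 4x base, ...
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = MAX_ATTEMPTS,
  baseDelayMs = BASE_DELAY_MS,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

// Usage (with the client created using maxRetries: 0):
// const audio = await withRetry(() => client.textToSpeech.convert(VOICE_ID, { text: "Hello." }));
```

Only the request itself is retried here. A failure while reading the returned stream happens after withRetry has resolved.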
</patterns>

```text
Best quality + expressiveness -> eleven_v3 (70+ languages)
Long-form stability -> eleven_multilingual_v2 (29 languages, 10K char limit)
Lowest latency (<75ms) -> eleven_flash_v2_5 (32 languages, 40K char limit)
English-only low latency -> eleven_flash_v2 (English only, 30K char limit)
Voice design from text prompt -> eleven_ttv_v3 (70+ languages)
```
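Text longer than a model's character limit has to be split into several requests. Below is a minimal sketch that greedily packs sentences into segments under a limit, such as the 10K characters listed for eleven_multilingual_v2. Each segment then goes to its own convert() call, ideally with previous_request_ids set for voice consistency as recommended in this section. splitForTts is hypothetical, and its sentence detection is naive ("e.g." counts as a sentence end).

```typescript
const MULTILINGUAL_V2_CHAR_LIMIT = 10_000; // from the model list above

// Hypothetical helper: greedy packing of sentences into segments of at most `maxChars`.
// A single sentence longer than the limit is hard-split as a fallback (may cut mid-word).
function splitForTts(text: string, maxChars = MULTILINGUAL_V2_CHAR_LIMIT): string[] {
  // Each match is a run of non-punctuation plus trailing punctuation, or a bare punctuation run,
  // so every character of the input lands in some sentence.
  const sentences = (text.match(/[^.!?]+[.!?]*|[.!?]+/g) ?? [])
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);

  const segments: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current) segments.push(current);
    let rest = sentence;
    while (rest.length > maxChars) {
      segments.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current) segments.push(current);
  return segments;
}
```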
- Use stream() instead of convert() for user-facing audio -- playback starts before generation completes
- Set optimizeStreamingLatency (0-4) on stream() calls -- higher values reduce latency but may affect text normalization
- Use eleven_flash_v2_5 for real-time applications -- sub-75ms latency at 50% lower cost
- Pass previous_request_ids (previousRequestIds in the TypeScript SDK) for multi-part generation -- maintains voice consistency across segments
- Use outputFormat: "pcm_16000" for server-side processing pipelines -- raw samples need no decoding step (note that it does not save bandwidth: 16-bit PCM at 16 kHz is 256 kbps, twice mp3_44100_128)

<decision_framework>
```text
What is your priority?
+-- Best quality / expressiveness -> eleven_v3
+-- Lowest latency (<75ms) -> eleven_flash_v2_5
+-- Long-form stability (audiobooks) -> eleven_multilingual_v2
+-- English-only speed -> eleven_flash_v2
+-- Voice design from text description -> eleven_ttv_v3
+-- Speech-to-speech conversion -> eleven_multilingual_sts_v2 (or eleven_english_sts_v2)
```

```text
Is the audio user-facing with real-time playback?
+-- YES -> Use stream() for progressive playback
|   +-- Need timestamps? -> streamWithTimestamps()
+-- NO -> Use convert() for complete audio
    +-- Need timestamps? -> convertWithTimestamps()
    +-- Saving to file? -> convert() and pipe to WriteStream
```

```text
What are you building?
+-- Server-side TTS, voice management, STS -> @elevenlabs/elevenlabs-js
+-- Browser conversational AI agent -> @elevenlabs/client
+-- React conversational AI agent -> @elevenlabs/react
+-- WebSocket text input streaming -> @elevenlabs/elevenlabs-js (or raw WebSocket)
```

```text
What is the audio destination?
+-- Web browser playback -> mp3_44100_128 (universal compatibility)
+-- Low-bandwidth streaming -> opus_48000_64 (smaller files)
+-- Audio processing pipeline -> pcm_16000 or pcm_44100 (raw audio)
+-- Telephony / IVR -> ulaw_8000 or alaw_8000 (legacy codecs)
+-- High-quality archival -> wav_44100 or mp3_44100_192
```
</decision_framework>
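The audio-destination branch of the framework above can be encoded as a lookup so the choice lives in one place. Below is a minimal sketch that does this. It uses only the format strings listed in that tree and takes the first option where the tree offers two. It also includes a small bandwidth calculation for raw PCM, assuming 16-bit mono samples. AudioDestination, selectOutputFormat and pcmBitrateKbps are hypothetical names.

```typescript
type AudioDestination = "browser" | "low-bandwidth" | "processing" | "telephony" | "archival";

// Mirrors the "What is the audio destination?" tree; first option wherever the tree lists two.
const OUTPUT_FORMAT_BY_DESTINATION = {
  browser: "mp3_44100_128",
  "low-bandwidth": "opus_48000_64",
  processing: "pcm_16000",
  telephony: "ulaw_8000",
  archival: "wav_44100",
} as const satisfies Record<AudioDestination, string>;

function selectOutputFormat(destination: AudioDestination) {
  return OUTPUT_FORMAT_BY_DESTINATION[destination];
}

// Raw PCM bandwidth, assuming 16-bit mono samples: sample rate x 16 bits per second.
const PCM_BITS_PER_SAMPLE = 16;

function pcmBitrateKbps(sampleRateHz: number): number {
  return (sampleRateHz * PCM_BITS_PER_SAMPLE) / 1000;
}

// Usage: outputFormat: selectOutputFormat("telephony")
```

Under that assumption, pcmBitrateKbps(16000) is 256 kbps, which is why raw PCM suits processing pipelines rather than bandwidth-constrained delivery.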
<red_flags>
High Priority Issues:
- Hardcoding API keys instead of using process.env.ELEVENLABS_API_KEY (security breach risk)
- Not consuming the stream returned by convert() or stream() (resources leaked, audio lost)
- Passing voiceId inside the options object instead of as the first positional argument (type error)
- Using eleven_turbo_v2_5 instead of eleven_flash_v2_5 (migrate to Flash models)
- Using eleven_monolingual_v1 or eleven_multilingual_v1 (use v2+ models)

Medium Priority Issues:

- Not setting timeoutInSeconds for production (default is 240 seconds -- may be too long or too short)
- Using stability: 0.0 or other extreme voice settings without testing (produces artifacts)
- Omitting optimizeStreamingLatency when streaming to users (adds unnecessary latency)
- Omitting outputFormat and relying on the default when a specific format is needed

Common Mistakes:

- Confusing @elevenlabs/elevenlabs-js (server-side TTS SDK) with @elevenlabs/client (conversational AI agents SDK) -- they serve different purposes
- Using textToSpeech.convert() for real-time playback instead of textToSpeech.stream() -- convert waits for full generation
- Not passing previous_request_ids for multi-part audio -- causes voice inconsistency between segments
- Using eleven_v3 when latency matters -- it has higher latency than Flash models

Gotchas & Edge Cases:

- The SDK retries failed requests automatically; set maxRetries: 0 if you handle retries yourself.
- convert() and stream() both return ReadableStream<Uint8Array>, but stream() starts sending data before generation completes (lower time-to-first-byte).
- convertWithTimestamps() returns { audioBase64, alignment }, NOT a stream -- the entire audio is base64-encoded.
- streamWithTimestamps() returns an SSE Stream<ChunkWithTimestamps> -- each chunk has audio data AND character timing.
- The play() helper function from the SDK requires MPV and FFmpeg installed locally -- not suitable for production servers.
- In WebSocket input streaming, chunk_length_schedule defaults to [120, 160, 250, 290] characters -- audio generation starts once the first threshold is reached.
- enable_ssml_parsing must be set as a query parameter on the WebSocket connection, not in the text message.
- The speed voice setting accepts values roughly in the 0.7-1.3 range for natural-sounding output.

</red_flags>
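The WebSocket gotchas above are easy to get wrong when connecting without the SDK. Below is a minimal sketch that builds the connection URL, with enable_ssml_parsing as a query parameter, and the opening message carrying chunk_length_schedule. Several details are assumptions to verify against the WebSocket API reference: the endpoint wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input, the model_id query parameter, and an opening message of the form { text: " ", generation_config: { chunk_length_schedule } }. Authentication is omitted, and both helper names are hypothetical.

```typescript
// Assumed endpoint -- verify against the ElevenLabs WebSocket API reference.
const WS_BASE_URL = "wss://api.elevenlabs.io/v1/text-to-speech";
const DEFAULT_CHUNK_LENGTH_SCHEDULE = [120, 160, 250, 290]; // defaults per the gotcha above

// Hypothetical helper: connection options such as enable_ssml_parsing go on the URL,
// not in the text messages sent over the socket.
function buildStreamInputUrl(voiceId: string, modelId: string, enableSsmlParsing = false): string {
  const url = new URL(`${WS_BASE_URL}/${encodeURIComponent(voiceId)}/stream-input`);
  url.searchParams.set("model_id", modelId);
  if (enableSsmlParsing) url.searchParams.set("enable_ssml_parsing", "true");
  return url.toString();
}

// Hypothetical helper: opening message (authentication omitted). Assumes the first message
// carries a single space as text plus the generation config.
function buildInitMessage(chunkLengthSchedule: number[] = DEFAULT_CHUNK_LENGTH_SCHEDULE): string {
  return JSON.stringify({
    text: " ",
    generation_config: { chunk_length_schedule: chunkLengthSchedule },
  });
}
```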
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST use @elevenlabs/elevenlabs-js for server-side TTS, voice management, and speech-to-speech -- use @elevenlabs/client only for conversational AI agents)
(You MUST never hardcode API keys -- always use environment variables via process.env.ELEVENLABS_API_KEY which the SDK reads automatically)
(You MUST consume the ReadableStream<Uint8Array> returned by convert() and stream() -- unconsumed streams leak resources)
(You MUST choose the correct model for your use case -- eleven_v3 for quality, eleven_flash_v2_5 for speed, eleven_multilingual_v2 for long-form stability)
(You MUST pass voiceId as the first positional argument to all textToSpeech methods -- it is NOT inside the options object)
Failure to follow these rules will produce broken, insecure, or degraded voice AI integrations.
</critical_reminders>