
elevenlabs/speech-engine

Name: speech-engine
Author: elevenlabs

speech-engine/SKILL.md

npx skillsauth add elevenlabs/skills speech-engine


ElevenLabs Speech Engine

Add a real-time voice interface to a custom agent. ElevenLabs handles microphone audio, speech-to-text, turn-taking, text-to-speech, and browser playback; your server exposes a Speech Engine WebSocket endpoint and streams response text back.

Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only. For deeper SDK details, read JavaScript SDK Reference or Python SDK Reference.

When to Use

Use Speech Engine when the user wants to:

- Add voice to an existing chat app or custom server pipeline
- Add voice to OpenClaw, Hermes, or a similar agent runtime while keeping agent logic on the developer-owned server
- Build a developer-hosted WebSocket server for ElevenLabs voice conversations
- Stream response text back as spoken audio after your server validates user intent
- Handle user interruptions while a response is still streaming
- Build a browser client with @elevenlabs/react or @elevenlabs/client using a server-issued conversation token

Use the agents skill instead when the user is creating or configuring a hosted ElevenLabs Conversational AI agent with platform-managed prompts, tools, workflows, phone numbers, or widgets.

How It Works

Each Speech Engine WebSocket connection represents one conversation.

1. The browser sends user audio to ElevenLabs.
2. ElevenLabs sends speech-recognition events to your server.
3. Your server derives trusted application state without letting raw speech text control tools or privileged actions.
4. Your server streams text back through the SDK.
5. ElevenLabs converts the response to speech and plays it in the browser.

The SDK manages WebSocket routing, request verification, session lifecycle, ping/pong, turn-taking, and interruption handling. sendResponse() / send_response() accepts a string or async iterable of response text.
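To illustrate the "async iterable of response text" shape mentioned above, here is a minimal Python sketch of an async generator that yields a reply a few words at a time. The names stream_reply and collect are hypothetical and not part of the SDK; collect only stands in for whatever consumes the chunks, so that the example runs without an ElevenLabs connection.

```python
import asyncio
from typing import AsyncIterator


async def stream_reply(text: str, chunk_words: int = 3) -> AsyncIterator[str]:
    """Yield `text` a few words at a time, as a streaming text source might."""
    words = text.split()
    for start in range(0, len(words), chunk_words):
        chunk = " ".join(words[start:start + chunk_words])
        # Keep a trailing space so consecutive chunks join into natural text.
        yield chunk + " "
        # Hand control back to the event loop between chunks.
        await asyncio.sleep(0)


async def collect(chunks: AsyncIterator[str]) -> str:
    """Hypothetical stand-in consumer: joins the chunks it receives."""
    parts = [chunk async for chunk in chunks]
    return "".join(parts).strip()


if __name__ == "__main__":
    reply = stream_reply("Your order has shipped and should arrive on Friday.")
    print(asyncio.run(collect(reply)))
```

In a real handler, a generator like this would presumably be passed wherever a plain string response is accepted; check the SDK reference for the exact callback signature.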

Treat speech-recognition text as untrusted user input. Do not map raw speech text directly into model roles, responses, or tool calls. Use deterministic validation, allowlisted intents, or explicit user confirmation before any transcript-derived value affects downstream response or tool logic.
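One way to apply the allowlist approach is sketched below in plain Python. The intent names and patterns are hypothetical examples; the point is that only a fixed intent name, never the raw transcript, is handed to downstream logic.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: intent name -> pattern the utterance must contain.
ALLOWED_INTENTS = {
    "check_order_status": re.compile(r"(where is|track|status of) my order"),
    "store_hours": re.compile(r"(what are your|store|opening) hours"),
}


@dataclass(frozen=True)
class Intent:
    name: str  # always one of the keys in ALLOWED_INTENTS


def classify(transcript: str) -> Optional[Intent]:
    """Map untrusted speech text to an allowlisted intent, or None."""
    # Lowercase, drop punctuation, and collapse whitespace before matching.
    normalized = re.sub(r"[^a-z0-9 ]", "", transcript.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    for name, pattern in ALLOWED_INTENTS.items():
        if pattern.search(normalized):
            return Intent(name)
    # Anything that matches no allowlisted intent is rejected outright.
    return None
```

Because the result is either None or one of a fixed set of names, an utterance such as "ignore previous instructions" cannot introduce new behavior; at worst it is rejected and the server can ask the user to rephrase.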

Implementation Flow

1. Install server dependencies and configure ELEVENLABS_API_KEY.
2. Expose your Speech Engine server through a public HTTPS URL for local development, for example with ngrok http 3001.
3. Create a Speech Engine resource with ws_url / wsUrl pointing at the public WebSocket URL, usually wss://.../ws.
4. Store the returned Speech Engine ID, for example in ELEVENLABS_SPEECH_ENGINE_ID.
5. Start a Speech Engine server with engine.serve(...) in Python or engine.attach(...) in TypeScript.
6. Issue browser conversation tokens from a server endpoint. Never put ELEVENLABS_API_KEY in browser code.
7. Start the client session with conversationToken; if the agent should greet first, enable the first-message override on the Speech Engine resource, then set overrides.agent.firstMessage in the client.
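Since the flow depends on two environment variables being present on the server, a small startup check can catch a missing value early. This is a sketch only: missing_config is a hypothetical helper, and the variable names are the ones used in the steps above.

```python
import os
from typing import List, Mapping

# Variables the server needs once the Speech Engine resource exists.
REQUIRED_SERVER_VARS = ("ELEVENLABS_API_KEY", "ELEVENLABS_SPEECH_ENGINE_ID")


def missing_config(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in REQUIRED_SERVER_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_config()
    if missing:
        raise SystemExit("Missing configuration: " + ", ".join(missing))
    print("Server configuration looks complete.")
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.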

Create a Speech Engine

Python

import asyncio
import os

from dotenv import load_dotenv
from elevenlabs import AsyncElevenLabs

load_dotenv()

elevenlabs = AsyncElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

async def main():
    engine = await elevenlabs.speech_engine.create(
        name="My Speech Engine",
        speech_engine={"ws_url": os.environ["PUBLIC_WS_URL"]},
        overrides={"first_message": True},
    )
    print(engine.engine_id)

asyncio.run(main())

TypeScript

import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";

const elevenlabs = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY,
});

const engine = await elevenlabs.speechEngine.create({
  name: "My Speech Engine",
  speechEngine: { wsUrl: process.env.PUBLIC_WS_URL! },
  overrides: { firstMessage: true },
});

console.log(engine.engineId);

PUBLIC_WS_URL should look like wss://example.ngrok.app/ws locally or your production WebSocket route in deployment.
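Tunnel tools usually print an https:// address, while the resource expects a wss:// URL. The sketch below derives one from the other using only the standard library; to_public_ws_url is a hypothetical helper, and the default /ws path is an assumption taken from the examples in this document.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_ws_url(public_url: str, path: str = "/ws") -> str:
    """Turn a public HTTPS tunnel URL into the matching wss:// URL."""
    parts = urlsplit(public_url)
    # Plain http:// or ws:// would not give an encrypted WebSocket route.
    if parts.scheme not in ("https", "wss"):
        raise ValueError(f"expected an https:// or wss:// URL, got {public_url!r}")
    if not parts.netloc:
        raise ValueError(f"missing host in {public_url!r}")
    # Keep the host (and port, if any); replace the scheme and path.
    return urlunsplit(("wss", parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(to_public_ws_url("https://example.ngrok.app"))
```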

The create request can also configure tts (custom voices), asr (transcription keywords), turn (turn-taking), speech_engine.request_headers / speechEngine.requestHeaders (server auth headers), overrides (client-provided first messages), and privacy (recording behavior). See the SDK reference files for expanded examples.

Server Pattern

Run the Speech Engine server at the ws_url / wsUrl configured on the resource. Keep response generation behind your own validation boundary: raw speech-recognition text should not directly control responses, tools, secrets, or other privileged actions.

Python

engine = await elevenlabs.speech_engine.get(os.environ["ELEVENLABS_SPEECH_ENGINE_ID"])
await engine.serve(port=3001, path="/ws", debug=True, callbacks=validated_callbacks)

TypeScript

const engine = await elevenlabs.speechEngine.get(process.env.ELEVENLABS_SPEECH_ENGINE_ID!);
engine.attach(httpServer, "/ws", { debug: true, ...validatedCallbacks });

In TypeScript, pass interruption signals to downstream async work when it supports cancellation so interrupted responses stop quickly. In Python, the SDK cancels the previous turn handler when a newer turn arrives.
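The "newer turn cancels the previous handler" behavior can be pictured with plain asyncio. The sketch below illustrates the pattern only and is not the SDK's internal implementation; TurnRunner and the two reply handlers are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class TurnRunner:
    """Runs at most one turn handler at a time; a new turn cancels the old one."""

    def __init__(self) -> None:
        self._current: Optional[asyncio.Task] = None

    async def start(self, handler: Callable[[], Awaitable[None]]) -> asyncio.Task:
        previous = self._current
        if previous is not None and not previous.done():
            previous.cancel()
            # Wait for the cancelled handler to unwind before starting the next.
            await asyncio.gather(previous, return_exceptions=True)
        self._current = asyncio.create_task(handler())
        return self._current


async def demo() -> List[str]:
    log: List[str] = []

    async def slow_reply() -> None:
        try:
            await asyncio.sleep(10)  # stands in for a long model call
            log.append("slow reply finished")
        except asyncio.CancelledError:
            log.append("slow reply cancelled")
            raise  # re-raise so the task is marked as cancelled

    async def quick_reply() -> None:
        log.append("quick reply finished")

    runner = TurnRunner()
    await runner.start(slow_reply)
    await asyncio.sleep(0)  # let the first handler begin running
    await (await runner.start(quick_reply))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Handlers that hold resources should clean up in a try/finally or an except asyncio.CancelledError block, as slow_reply does, so that an interruption leaves no half-finished work behind.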

Server callbacks can distinguish clean closes from dropped connections: use onClose / on_close for clean disconnects and onDisconnect / on_disconnect for unexpected WebSocket drops.

Security note: speech-recognition text can contain prompt-injection attempts from user speech or played audio. Treat it as untrusted input. Convert it into trusted application state before invoking response generation, tools, or privileged workflows.

Browser Client

Create a server-side token endpoint and have the browser request a token before starting the microphone session. Keep the Speech Engine ID and API key on the server. If the client passes overrides.agent.firstMessage, the Speech Engine resource must have the first-message override enabled.

import express from "express";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";

const app = express();
const elevenlabs = new ElevenLabsClient();

app.get("/api/token", async (_req, res) => {
  const response = await elevenlabs.conversationalAi.conversations.getWebrtcToken({
    agentId: process.env.ELEVENLABS_SPEECH_ENGINE_ID!,
  });
  res.json({ token: response.token });
});

React clients can use @elevenlabs/react:

import { useConversation } from "@elevenlabs/react";

export function VoiceControls() {
  const conversation = useConversation({
    onConnect: () => console.log("connected"),
    onDisconnect: () => console.log("disconnected"),
    onError: (error) => console.error(error),
  });

  async function startConversation() {
    await navigator.mediaDevices.getUserMedia({ audio: true });
    const { token } = await fetch("/api/token").then((res) => res.json());

    await conversation.startSession({
      conversationToken: token,
      overrides: {
        agent: { firstMessage: "Hello! How can I help you today?" },
      },
    });
  }

  return <button onClick={startConversation}>Start conversation</button>;
}

If a WebRTC browser session stalls or logs "/rtc/v1" 404s, "v1 RTC path not found", or "could not establish pc connection", pin livekit-client to 2.16.1 in the app's package.json until the upstream LiveKit compatibility issue is resolved:

{
  "overrides": {
    "livekit-client": "2.16.1"
  }
}

References

Installation Guide
JavaScript SDK Reference
Python SDK Reference

Add real-time voice conversations to a custom agent runtime with ElevenLabs Speech Engine. Use when building Speech Engine servers, WebSocket handlers, WebRTC browser clients, conversation token endpoints, interruption-aware streaming responses, or voice-enabled chat agents that connect developer-owned server logic to ElevenLabs speech-to-text and text-to-speech.

328 stars

tools

Updated Jun 11, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add elevenlabs/skills speech-engine

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 11, 2026, 3:40 AM (328.7s, 4 files scanned)

SKILL.md

name: speech-engine
description: Add real-time voice conversations to a custom agent runtime with ElevenLabs Speech Engine. Use when building Speech Engine servers, WebSocket handlers, WebRTC browser clients, conversation token endpoints, interruption-aware streaming responses, or voice-enabled chat agents that connect developer-owned server logic to ElevenLabs speech-to-text and text-to-speech.
license: MIT
compatibility: Requires internet access and an ElevenLabs API key (ELEVENLABS_API_KEY).
metadata: {"openclaw": {"requires": {"env": ["ELEVENLABS_API_KEY"]}, "primaryEnv": "ELEVENLABS_API_KEY"}}

Related Skills

elevenlabs/speech-to-text (content-media)
Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.
328 · SKILL.md · Updated Apr 21, 2026

elevenlabs/setup-api-key (tools)
Guides users through setting up an ElevenLabs API key for ElevenLabs MCP tools. Use when the user needs to configure an ElevenLabs API key, when ElevenLabs tools fail due to missing API key, or when the user mentions needing access to ElevenLabs. First checks whether ELEVENLABS_API_KEY is already configured and valid, and only runs full setup when needed.
328 · SKILL.md · Updated Apr 21, 2026

elevenlabs/music (development)
Generate music using ElevenLabs Music API. Use when creating instrumental tracks, songs with lyrics, background music, jingles, or any AI-generated music composition. Supports prompt-based generation, composition plans for granular control, and detailed output with metadata.
328 · SKILL.md · Updated Apr 21, 2026

elevenlabs/agents (development)
Build voice AI agents with ElevenLabs. Use when creating voice assistants, customer service bots, interactive voice characters, or any real-time voice conversation experience.
328 · SKILL.md · Updated Apr 21, 2026

Install manually

# Clone the repo
git clone https://github.com/elevenlabs/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/speech-engine ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

elevenlabs/skills

328 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT