voice-isolator/SKILL.md
Remove background noise and isolate vocals/speech from audio using ElevenLabs Voice Isolator (audio isolation) API. Use when cleaning up noisy recordings, removing music or background ambience from dialogue, isolating speech from field recordings, preparing audio for transcription, extracting vocals, or any "denoise / clean up / isolate voice" task.
npx skillsauth add elevenlabs/skills voice-isolatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Removes background noise from audio and isolates vocals/speech — useful for cleaning up noisy recordings, prepping audio for transcription, or pulling dialogue out of a mixed track.
Setup: See Installation Guide. For JavaScript, use
@elevenlabs/*packages only.
from elevenlabs import ElevenLabs
client = ElevenLabs()
with open("noisy.mp3", "rb") as audio_file:
audio_stream = client.audio_isolation.convert(audio=audio_file)
with open("clean.mp3", "wb") as f:
for chunk in audio_stream:
f.write(chunk)
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createReadStream, createWriteStream } from "fs";
const client = new ElevenLabsClient();
const audioStream = await client.audioIsolation.convert({
audio: createReadStream("noisy.mp3"),
});
audioStream.pipe(createWriteStream("clean.mp3"));
curl -X POST "https://api.elevenlabs.io/v1/audio-isolation" \
-H "xi-api-key: $ELEVENLABS_API_KEY" \
-F "[email protected]" \
--output clean.mp3
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| audio | file (required) | — | Audio file with vocals/speech to isolate |
| file_format | string | other | other for any encoded audio, or pcm_s16le_16 for 16-bit PCM mono @ 16kHz little-endian (lower latency) |
import requests
from io import BytesIO
from elevenlabs import ElevenLabs
client = ElevenLabs()
audio_url = "https://example.com/noisy.mp3"
response = requests.get(audio_url)
audio_data = BytesIO(response.content)
audio_stream = client.audio_isolation.convert(audio=audio_data)
with open("clean.mp3", "wb") as f:
for chunk in audio_stream:
f.write(chunk)
If you already have raw 16-bit PCM mono @ 16kHz, passing file_format="pcm_s16le_16" skips decoding and reduces latency:
audio_stream = client.audio_isolation.convert(
audio=pcm_bytes,
file_format="pcm_s16le_16",
)
Any common encoded audio/video container works as input (MP3, WAV, M4A, FLAC, OGG, WebM, MP4, etc.). Response is a streamed MP3 by default.
speech_to_text.convert() for better transcription accuracy.try:
audio_stream = client.audio_isolation.convert(audio=audio_file)
except Exception as e:
print(f"Voice isolation failed: {e}")
Common errors:
file_format for the supplied audio)development
Build voice AI agents with ElevenLabs. Use when creating voice assistants, customer service bots, interactive voice characters, or any real-time voice conversation experience.
tools
Transform the voice in an audio recording into a different target voice while preserving emotion, timing, and delivery using the ElevenLabs Voice Changer (speech-to-speech) API. Use when converting one voice to another, changing the speaker/narrator of an existing recording, dubbing a voice-over in a different voice, creating character voices from a scratch performance, anonymizing a speaker, or any "voice conversion / voice transfer / speech-to-speech" task. Make sure to use this skill whenever the user mentions voice changing, voice conversion, speech-to-speech, swapping a voice in audio, re-voicing a clip, or applying a different voice to an existing recording — even if they don't explicitly say "voice changer".
content-media
Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.
development
Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.