openclaw/elevenlabs-transcribe/SKILL.md
Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add elevenlabs/skills elevenlabs-transcribe
Official ElevenLabs skill for speech-to-text transcription.
Convert audio to text with state-of-the-art accuracy. Supports 90+ languages, speaker diarization, and realtime streaming.
Requires ffmpeg (brew install ffmpeg on macOS).
{baseDir}/scripts/transcribe.sh <audio_file> [options]
{baseDir}/scripts/transcribe.sh --url <stream_url> [options]
{baseDir}/scripts/transcribe.sh --mic [options]
Transcribe a local audio file:
{baseDir}/scripts/transcribe.sh recording.mp3
With speaker identification:
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize
Get full JSON response with timestamps:
{baseDir}/scripts/transcribe.sh interview.wav --diarize --json
Stream from a URL (e.g., live radio, podcast):
{baseDir}/scripts/transcribe.sh --url https://npr-ice.streamguys1.com/live.mp3
Transcribe from microphone:
{baseDir}/scripts/transcribe.sh --mic
Stream a local file in realtime (useful for testing):
{baseDir}/scripts/transcribe.sh audio.mp3 --realtime
Suppress status messages on stderr:
{baseDir}/scripts/transcribe.sh --mic --quiet
| Option | Description |
|--------|-------------|
| --diarize | Identify different speakers in the audio |
| --lang CODE | ISO language hint (e.g., en, pt, es, fr) |
| --json | Output full JSON with timestamps and metadata |
| --events | Tag audio events (laughter, music, applause) |
| --realtime | Stream local file instead of batch processing |
| --partials | Show interim transcripts during realtime mode |
| -q, --quiet | Suppress status messages (recommended for agents) |
Plain text transcription:
The quick brown fox jumps over the lazy dog.
JSON output (--json):
{
"text": "The quick brown fox jumps over the lazy dog.",
"language_code": "eng",
"language_probability": 0.98,
"words": [
{"text": "The", "start": 0.0, "end": 0.15, "type": "word", "speaker_id": "speaker_0"}
]
}
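The JSON response can be post-processed with standard tools. The following is a minimal sketch, assuming `jq` is installed and that the response has the shape shown above; the function name `words_with_speaker` is hypothetical and not part of the skill.

```shell
# Print one "speaker<TAB>word" line per spoken word, reading --json output on stdin.
# Assumptions: jq is available, and each entry in .words carries "type", "text",
# and (when --diarize is used) "speaker_id", as in the sample response.
words_with_speaker() {
  jq -r '.words[]
         | select(.type == "word")
         | [.speaker_id // "unknown", .text]
         | @tsv'
}

# Possible usage (not run here):
#   {baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --json --quiet | words_with_speaker
```

Entries without a `speaker_id` fall back to the label `unknown`, so the same filter should also work on output produced without --diarize.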
In realtime mode, final transcripts print as they are committed. With --partials, interim transcripts are shown as well:
[partial] The quick
[partial] The quick brown fox
The quick brown fox jumps over the lazy dog.
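When --partials is enabled but only committed text should be kept (for a log file, say), the interim lines can be filtered out. This is a small sketch that assumes partial lines arrive on stdout with the exact `[partial] ` prefix shown above; `final_only` is a hypothetical helper name.

```shell
# Drop interim lines tagged "[partial] " and pass committed transcripts through.
final_only() {
  grep -v '^\[partial\] '
}

# Possible usage (not run here):
#   {baseDir}/scripts/transcribe.sh audio.mp3 --realtime --partials | final_only
```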
Audio: MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus
Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP
Limits: Up to 3GB file size, 10 hours duration
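A caller may want to check the size limit locally before uploading a large file. The sketch below is one way to do that, under the assumption that "3GB" means 3 × 1024³ bytes (the source does not say whether the limit is binary or decimal); `MAX_BYTES` and `within_size_limit` are hypothetical names. The duration limit is not checked here.

```shell
# Assumption: the 3GB limit is interpreted as 3 * 1024^3 bytes.
MAX_BYTES=3221225472

# Succeed if the file exists and is no larger than MAX_BYTES.
within_size_limit() {
  [ -f "$1" ] || return 1
  # Arithmetic expansion strips the padding some wc implementations emit.
  size=$(($(wc -c < "$1")))
  [ "$size" -le "$MAX_BYTES" ]
}

# Possible usage (not run here):
#   within_size_limit recording.mp3 && {baseDir}/scripts/transcribe.sh recording.mp3
```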
The script exits with non-zero status on errors:
ELEVENLABS_API_KEY environment variable
| Scenario | Command |
|----------|---------|
| Transcribe a recording | ./transcribe.sh file.mp3 |
| Meeting with multiple speakers | ./transcribe.sh meeting.mp3 --diarize |
| Live radio/podcast stream | ./transcribe.sh --url <url> |
| Voice input from user | ./transcribe.sh --mic --quiet |
| Need word timestamps | ./transcribe.sh file.mp3 --json |
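Because the script signals errors through a non-zero exit status and --quiet keeps status messages out of the way, an agent can wrap any of the commands above in a small helper. This is a sketch only: `TRANSCRIBE_SH` and `transcribe_quiet` are hypothetical names, and the default path is an assumption standing in for `{baseDir}/scripts/transcribe.sh`.

```shell
# Path to the skill's script. TRANSCRIBE_SH is a hypothetical override for this
# sketch; the default stands in for {baseDir}/scripts/transcribe.sh.
TRANSCRIBE_SH="${TRANSCRIBE_SH:-./scripts/transcribe.sh}"

# Run a transcription with --quiet. On success, print the transcript;
# on failure, report the exit status on stderr and return it to the caller.
transcribe_quiet() {
  if transcript="$("$TRANSCRIBE_SH" "$@" --quiet)"; then
    printf '%s\n' "$transcript"
  else
    status=$?
    echo "transcribe.sh failed with exit status $status" >&2
    return "$status"
  fi
}

# Possible usage (not run here):
#   transcribe_quiet meeting.mp3 --diarize || echo "could not transcribe" >&2
```

Capturing the transcript first and printing it only on success keeps partial output from reaching the caller when the script fails midway.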