7 skills
tools
Transcribe a video clip using Gemini to get timestamped segments for captions
testing
ASR with ~30 ms timestamp precision using Qwen3-ASR + ForcedAligner
data-ai
---
id: speech
name: Speech
description: Voice-to-text (Whisper) and text-to-voice (11Labs). Use when transcribing audio, converting speech to text, or generating spoken audio from text. Commands: transcribe, synthesize.
---
# Speech
Voice-to-text via **Whisper** (OpenAI) and text-to-voice via **11Labs**. Use when the user wants to transcribe audio, convert speech to text, or generate spoken audio from text.
Call **run_skill** with **skill: "speech"**. Set **command** or **arguments.action**
content-media
Summarize or extract text/transcripts from URLs, podcasts, and local files (great fallback for “transcribe this YouTube/video”).
tools
Local speech-to-text with the Whisper CLI (no API key).
tools
ElevenLabs text-to-speech with a macOS-style say UX.
content-media
Audio processing utilities: noise reduction, normalization, enhancement