skills/speech/SKILL.md
---
id: speech
name: Speech
description: Voice-to-text (Whisper) and text-to-voice (11Labs). Use when transcribing audio, converting speech to text, or generating spoken audio from text. Commands: transcribe, synthesize.
---

# Speech
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add bishwashere/cowcode skills/speech
Voice-to-text via Whisper (OpenAI) and text-to-voice via 11Labs. Use when the user wants to transcribe audio, convert speech to text, or generate spoken audio from text.
Call run_skill with skill: "speech". Set command or arguments.action to the operation.
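As a rough illustration of the two ways to name the operation, the sketch below builds a call payload for the speech skill. The payload shape (a dict with `skill`, `command`, and `arguments` keys), the helper names `build_speech_call` and `resolve_operation`, and the rule that `command` takes precedence over `arguments.action` are all assumptions made for this example, not a documented interface.

```python
from typing import Any


def build_speech_call(operation: str, use_action: bool = False, **arguments: Any) -> dict[str, Any]:
    """Build a run_skill payload for the speech skill (payload shape is assumed)."""
    if use_action:
        # Operation carried inside arguments.action.
        return {"skill": "speech", "arguments": {"action": operation, **arguments}}
    # Operation carried in the top-level command field.
    return {"skill": "speech", "command": operation, "arguments": dict(arguments)}


def resolve_operation(call: dict[str, Any]) -> str:
    """Read the operation from command, falling back to arguments.action.

    The precedence order is an assumption of this sketch.
    """
    operation = call.get("command") or call.get("arguments", {}).get("action")
    if not operation:
        raise ValueError("set command or arguments.action")
    return operation


# Hypothetical usage: both forms resolve to the same operation.
by_command = build_speech_call("transcribe", audio="memo.mp3", language="en")
by_action = build_speech_call("transcribe", use_action=True, audio="memo.mp3")
```

Either form should be equivalent from the caller's point of view; which one a given runtime prefers is not stated in this document.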
- transcribe: arguments.audio (required), arguments.model (whisper-1 or gpt-4o-transcribe), arguments.language (ISO code, e.g. en).
- synthesize: arguments.text (required), arguments.voiceId, arguments.outputPath.
- reply_as_voice: arguments.text (required), the reply to speak; the message will be sent as voice.

When the user sends a voice message, the bot transcribes it with Whisper and feeds the text to the LLM. The system adds a hint so you reply using reply_as_voice; your reply is then sent as a voice message. So voice always goes through the speech skill: transcribe for input, reply_as_voice for the reply.
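The voice round trip described above might be sketched as follows. This is only an illustration: `handle_voice_message`, the injected `transcribe` and `generate_reply` callables, the wording of the hint, and the returned payload shape are all hypothetical stand-ins, since the bot's real internals are not shown here.

```python
from typing import Callable


def handle_voice_message(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate_reply: Callable[[str], str],
) -> dict:
    """Turn an incoming voice message into a reply_as_voice call."""
    # 1. Voice in: transcribe the audio to text.
    user_text = transcribe(audio_path)
    # 2. Add a hint so the model answers with reply_as_voice
    #    (the hint wording here is invented for the example).
    prompt = f"{user_text}\n\n[The user sent a voice message; answer with reply_as_voice.]"
    reply_text = generate_reply(prompt)
    # 3. Voice out: the reply text is sent back as a voice message.
    return {
        "skill": "speech",
        "command": "reply_as_voice",
        "arguments": {"text": reply_text},
    }


# Stand-in callables so the flow can be followed without any API keys.
result = handle_voice_message(
    "memo.ogg",
    transcribe=lambda path: "what time is it",
    generate_reply=lambda prompt: "It is three o'clock.",
)
```

Passing the transcriber and the model in as callables keeps the sketch independent of any particular Whisper or LLM client.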
Voice replies must be spoken-word friendly, because the listener hears the reply as audio.
speech_transcribe
  description: Voice to text. Pass path to audio file (mp3, wav, etc.).
  parameters:
    audio: string
    model: string
    language: string

speech_synthesize
  description: Text to voice. Pass text and optional voiceId, outputPath.
  parameters:
    text: string
    voiceId: string
    outputPath: string

speech_reply_as_voice
  description: Send the reply as a voice message. Pass the exact reply text.
  parameters:
    text: string
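One way to check arguments against the tool schemas above before making a call is sketched below. The parameter names and the required flags (audio for transcribe, text for synthesize and reply_as_voice) come from this document; the `validate_arguments` helper and its message wording are hypothetical, and the skill itself may validate differently.

```python
# Parameter names per command, taken from the tool schemas in this document.
ALLOWED = {
    "transcribe": {"audio", "model", "language"},
    "synthesize": {"text", "voiceId", "outputPath"},
    "reply_as_voice": {"text"},
}
# Arguments the document marks as required.
REQUIRED = {
    "transcribe": {"audio"},
    "synthesize": {"text"},
    "reply_as_voice": {"text"},
}


def validate_arguments(command: str, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    if command not in ALLOWED:
        return [f"unknown command: {command}"]
    given = set(arguments)
    problems = []
    for name in sorted(REQUIRED[command] - given):
        problems.append(f"missing required argument: {name}")
    for name in sorted(given - ALLOWED[command]):
        problems.append(f"unexpected argument: {name}")
    for name in sorted(given & ALLOWED[command]):
        # Every parameter in the schemas is typed as string.
        if not isinstance(arguments[name], str):
            problems.append(f"{name} must be a string")
    return problems
```

For instance, `validate_arguments("synthesize", {"voiceId": "abc"})` reports that `text` is missing, while a transcribe call with only `audio` set passes.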
Speech uses a separate setup from the LLM cloud provider:
- Whisper: reuse the LLM key (LLM_1_API_KEY) or enter a separate Whisper/OpenAI key (stored as SPEECH_WHISPER_API_KEY). Config: skills.speech.whisper.apiKey (env var name).
- 11Labs: the key is stored in .env as ELEVEN_LABS_API_KEY. Config: skills.speech.elevenLabs.apiKey (env var name).

Re-run setup to add or change speech keys.
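Because the config entries hold the name of an environment variable rather than the key itself, a lookup takes two steps. The sketch below assumes the config is available as a nested dict mirroring the dotted paths above; `resolve_speech_key` is a hypothetical helper, and how the real skill loads its config or .env file is not shown in this document.

```python
import os
from typing import Mapping, Optional


def resolve_speech_key(
    config: Mapping,
    provider: str,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Look up the API key for 'whisper' or 'elevenLabs' via its env var name."""
    environ = os.environ if environ is None else environ
    # Step 1: the config stores the env var name, not the secret.
    env_name = config["skills"]["speech"][provider]["apiKey"]
    # Step 2: read the secret from the environment.
    key = environ.get(env_name)
    if not key:
        raise KeyError(f"{env_name} is not set; re-run setup to add it")
    return key


# Example config shaped after skills.speech.<provider>.apiKey (layout assumed).
config = {
    "skills": {
        "speech": {
            "whisper": {"apiKey": "SPEECH_WHISPER_API_KEY"},
            "elevenLabs": {"apiKey": "ELEVEN_LABS_API_KEY"},
        }
    }
}
```

Pointing `skills.speech.whisper.apiKey` at `LLM_1_API_KEY` instead would reuse the LLM key with no code change, which is presumably the reason for the indirection.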