# voice: TTS & STT tool for AI agents
`voice` speaks text aloud using Kokoro TTS and transcribes speech using Moonshine STT on Apple Silicon. Use it to talk to your user, listen for their response, or run a full voice conversation loop.
# Speak text (backward compatible — no subcommand needed)
voice Hello, I finished the task.
# Explicit say subcommand with options
voice say -v am_michael "Switching to a male voice."
# Speak from a pipe
echo "Build complete." | voice say
# Read a file aloud (strip markdown first)
voice say --markdown -f README.md
# Save to WAV instead of playing
voice say -o result.wav "Here is your audio."
# Precise pronunciation via IPA phonemes
voice say --phonemes "həlˈO wˈɜɹld"
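
One way an agent might use `voice say` is to announce whether a long-running command succeeded. The wrapper below is a minimal sketch: `run_and_announce` is a hypothetical helper name, not part of `voice`, and the spoken wording is only illustrative.

```bash
# Hypothetical helper (not part of voice): run a command, then speak the outcome.
# Usage: run_and_announce cargo build --release
run_and_announce() {
  if "$@"; then
    voice say "Command succeeded: $1"
  else
    status=$?   # exit code of the command that just failed
    voice say "Command failed with exit code $status: $1"
    return "$status"
  fi
}
```

The helper passes the original exit code through, so it can sit in front of a command inside a larger script without hiding failures.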
# Speak text, then immediately listen for a response
voice converse "How are you today?"
# With voice and speed options
voice converse -v am_michael -s 1.2 "What do you think about that?"
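
To act on a spoken answer, a script can capture what `voice converse` transcribes and classify it. The sketch below assumes the transcript is printed to stdout, which this document does not state explicitly. `classify_reply` and `ask_yes_no` are hypothetical helper names, and the word lists are illustrative rather than exhaustive.

```bash
# Hypothetical helper: map a transcript to yes / no / unclear.
classify_reply() {
  # Lowercase, then turn every non-letter into a space so whole words can be matched.
  words=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z' ' ')
  for w in $words; do
    case "$w" in
      yes|yeah|yep) echo yes; return 0 ;;
      no|nope|nah)  echo no;  return 0 ;;
    esac
  done
  echo unclear
}

# Hypothetical helper: speak a question, listen, and classify the answer.
# Assumes `voice converse` writes the transcript to stdout.
ask_yes_no() {
  classify_reply "$(voice converse "$1")"
}
```

A caller could then branch on the result, for example `[ "$(ask_yes_no "Shall I deploy?")" = yes ] && ./deploy.sh`, where `./deploy.sh` stands in for whatever action needs confirmation.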
# Record from mic, transcribe on Enter/Ctrl+C
voice listen
# Continuous mode — transcribe segments as you speak, split on silence
voice listen --continuous
# Transcribe a WAV file
voice transcribe recording.wav
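
For more than one recording, `voice transcribe` can be run in a loop. This is a sketch under the assumption that the transcript goes to stdout; `transcribe_dir` is a hypothetical helper name.

```bash
# Hypothetical helper: transcribe every .wav in a directory to a matching .txt file.
# Assumes `voice transcribe` writes the transcript to stdout.
transcribe_dir() {
  for wav in "$1"/*.wav; do
    [ -e "$wav" ] || continue   # skip the unexpanded pattern when no .wav files exist
    voice transcribe "$wav" > "${wav%.wav}.txt"
  done
}
```

Calling `transcribe_dir recordings` would leave `recordings/a.txt` next to `recordings/a.wav`, and so on for each file.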
# Start the server (for programmatic control)
voice serve -v am_michael
# Speak
→ {"jsonrpc":"2.0","method":"speak","params":{"text":"Hello"},"id":1}
← {"jsonrpc":"2.0","result":{"duration_ms":1800,"chunks":1},"id":1}
# Listen (ding plays, records, auto-stops on silence)
→ {"jsonrpc":"2.0","method":"listen","id":2}
← {"jsonrpc":"2.0","result":{"text":"I heard you","tokens":4,"duration_ms":3200},"id":2}
# Cancel current playback or recording
→ {"jsonrpc":"2.0","method":"cancel","id":3}
# Other methods: set_voice, set_speed, list_voices, ping
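
The request lines above (the ones marked `→`) are plain JSON-RPC 2.0 objects, so a small shell function can generate them. This is only a sketch: `rpc_request` is a hypothetical helper, and it does no JSON escaping, so the params argument must already be valid JSON.

```bash
# Hypothetical helper: print one JSON-RPC 2.0 request line.
# Usage: rpc_request <id> <method> [params-json]
rpc_request() {
  if [ -n "${3:-}" ]; then
    printf '{"jsonrpc":"2.0","method":"%s","params":%s,"id":%s}\n' "$2" "$3" "$1"
  else
    printf '{"jsonrpc":"2.0","method":"%s","id":%s}\n' "$2" "$1"
  fi
}
```

Assuming the server reads one request per line from stdin, which this document implies but does not spell out, the output could be piped straight in: `rpc_request 1 speak '{"text":"Hello"}' | voice serve`.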
- Use `voice say` to read back docs, errors, or summaries.
- Use `voice listen` to capture a spoken response from the user.
- Use `voice converse` to speak then listen in one shot, or `voice serve` for programmatic control.
- Use `voice transcribe` to convert audio files to text.
- Use `-q` for quiet mode: it suppresses phonemes and progress, so only errors print.
- `voice` automatically chunks text at ~510 phonemes and streams playback.
- Use `--sub word=replacement` to fix names: `voice say --sub kubectl=cube-cuddle "Restarting kubectl"`.
- A `.voice-subs` file in the project root is auto-discovered for persistent fixes.
- Use `/slashes/` for raw phoneme overrides: `Kokoro=/kˈOkəɹO/`.
- Set `STT_MODEL=UsefulSensors/moonshine-tiny` for faster (but less accurate) transcription.
- `moonshine-base`: 61M params, ~50× real-time on Apple Silicon.
- `voice serve` loads the TTS model at startup; the STT model loads lazily on the first `listen`.
- `cancel` interrupts the current `speak` or `listen` mid-operation.
- `speak` supports per-request `voice` and `speed` overrides without changing defaults.
- `listen` params are tunable: `noise_multiplier`, `calibration_ms`, `silence_timeout_ms`.
- Notifications (requests without an `id`) are fire-and-forget: no response is returned.

| Command | What it does |
|---------|-------------|
| `voice <text>` | Speak text (implicit `say`, backward compatible) |
| `voice say` | Speak text with full TTS options |
| `voice converse` | Speak text, then listen for a response |
| `voice listen` | Record from mic, transcribe once |
| `voice listen --continuous` | Record and transcribe segments continuously |
| `voice transcribe <file>` | Transcribe a WAV file |
| `voice serve` | Start JSON-RPC server on stdin/stdout |
Voices: `af_heart` (default), `af_bella`, `af_sarah`, `af_sky`, `am_michael`, `am_adam`, `bf_emma`
git clone https://github.com/rgbkrk/voice.git
cd voice
cargo install --path crates/voice-cli
Requires macOS with Apple Silicon, Git LFS, and Rust 1.85+. TTS model weights download on the first `voice say` (~312MB, cached). STT model weights download on the first `voice listen` (~246MB, cached).
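
The requirements above can be checked before building. The following is a rough preflight sketch, not something shipped with `voice`: `version_ge` and `check_prereqs` are hypothetical helpers, and the version comparison only looks at the major and minor numbers.

```bash
# Hypothetical helper: succeed if version $1 is at least $2 (major.minor only).
version_ge() {
  have_major=${1%%.*}; have_rest=${1#*.}; have_minor=${have_rest%%.*}
  want_major=${2%%.*}; want_rest=${2#*.}; want_minor=${want_rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Hypothetical helper: print each unmet requirement; non-zero exit if any check fails.
check_prereqs() {
  ok=0
  [ "$(uname -s)" = Darwin ] && [ "$(uname -m)" = arm64 ] ||
    { echo "needs macOS on Apple Silicon"; ok=1; }
  command -v git-lfs >/dev/null 2>&1 || { echo "Git LFS not found"; ok=1; }
  if command -v rustc >/dev/null 2>&1; then
    # `rustc --version` prints e.g. "rustc 1.85.0 (...)"; the second field is the version.
    version_ge "$(rustc --version | awk '{print $2}')" 1.85 ||
      { echo "Rust 1.85+ required"; ok=1; }
  else
    echo "rustc not found"; ok=1
  fi
  return "$ok"
}
```

Running `check_prereqs && cargo install --path crates/voice-cli` would then skip the build when something is missing.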