.agents/skills/add-voice-transcription/SKILL.md
Add voice message transcription to OmniClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.
npx skillsauth add omniaura/omniclaw add-voice-transcription
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill adds automatic voice message transcription to OmniClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].
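The download, transcribe, deliver flow described above can be sketched as follows. This is a minimal illustration, not the code in src/transcription.ts: the `VoiceNote` and `Transcriber` types and the `handleVoiceNote` function are hypothetical names, and the transcriber is passed in as a plain function so the sketch does not depend on any particular client library.

```typescript
// Hypothetical shape of an incoming voice note (assumption for illustration;
// the real message types live in src/channels/whatsapp.ts).
interface VoiceNote {
  chatId: string;
  download: () => Promise<Uint8Array>; // fetches the raw audio bytes
}

// Any function that turns audio bytes into text, for example a Whisper API call.
type Transcriber = (audio: Uint8Array) => Promise<string>;

// Wrap a transcript in the [Voice: <transcript>] form the agent receives.
function formatVoiceMessage(transcript: string): string {
  return `[Voice: ${transcript}]`;
}

// Download, transcribe, and format a single voice note.
async function handleVoiceNote(
  note: VoiceNote,
  transcribe: Transcriber,
): Promise<string> {
  const audio = await note.download();
  const transcript = await transcribe(audio);
  return formatVoiceMessage(transcript);
}
```

Injecting the transcriber keeps the formatting step testable without network access; a test can pass a stub that returns a fixed string.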
Read .omniclaw/state.yaml. If voice-transcription is in applied_skills, skip to Phase 3 (Configure). The code changes are already in place.
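The pre-flight check above might look roughly like this once the YAML has been parsed into an object. The exact layout of .omniclaw/state.yaml is not shown here, so this sketch assumes that `applied_skills` is a list whose entries are either plain skill names or objects with a `name` field; `isSkillApplied` is a hypothetical helper.

```typescript
// Assumed entry shape: either a bare skill name or an object carrying a name.
type AppliedSkill = string | { name: string };

// Assumed shape of the parsed state file; only the field used here is modelled.
interface SkillState {
  applied_skills?: AppliedSkill[];
}

// Returns true when the named skill already appears in applied_skills.
function isSkillApplied(state: SkillState, skill: string): boolean {
  const entries = state.applied_skills || [];
  return entries.some(
    (entry) => (typeof entry === "string" ? entry : entry.name) === skill,
  );
}

// When this is true, the code changes are in place and only configuration remains.
const skipToConfigure = isSkillApplied(
  { applied_skills: ["voice-transcription"] },
  "voice-transcription",
);
```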
Run the skills engine to apply this skill's code package.
If .omniclaw/ directory doesn't exist yet:
npx tsx scripts/apply-skill.ts --init
npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription
This deterministically:
- Adds src/transcription.ts (voice transcription module using OpenAI Whisper)
- Modifies src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
- Modifies src/channels/whatsapp.test.ts (mock + 3 test cases)
- Adds the openai npm dependency
- Updates .env.example with OPENAI_API_KEY
- Updates .omniclaw/state.yaml
If the apply reports merge conflicts, read the intent files:
- modify/src/channels/whatsapp.ts.intent.md: what changed and invariants for whatsapp.ts
- modify/src/channels/whatsapp.test.ts.intent.md: what changed for whatsapp.test.ts
npm test
npm run build
All tests must pass (including the 3 new voice transcription tests) and build must be clean before proceeding.
If the user doesn't have an API key:
I need you to create an OpenAI API key:
- Go to https://platform.openai.com/api-keys
- Click "Create new secret key"
- Give it a name (e.g., "OmniClaw Transcription")
- Copy the key (starts with sk-)
Cost: ~$0.006 per minute of audio (~$0.003 per typical 30-second voice note)
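The cost figures above follow from simple proportional arithmetic. The sketch below assumes billing scales linearly with audio duration at the quoted rate; how the provider rounds durations is not covered here, and `estimateCostUsd` is a hypothetical helper.

```typescript
// Approximate rate quoted above, in US dollars per minute of audio.
const USD_PER_MINUTE = 0.006;

// Estimate the transcription cost for a clip of the given length in seconds.
function estimateCostUsd(durationSeconds: number): number {
  if (durationSeconds < 0) {
    throw new RangeError("durationSeconds must be non-negative");
  }
  return (durationSeconds / 60) * USD_PER_MINUTE;
}

// A 30-second voice note: (30 / 60) * 0.006
console.log(estimateCostUsd(30).toFixed(4)); // prints "0.0030"
```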
Wait for the user to provide the key.
Add to .env:
OPENAI_API_KEY=<their-key>
Sync to container environment:
mkdir -p data/env && cp .env data/env/env
The container reads environment from data/env/env, not .env directly.
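Because the container reads a copy of the environment file, a stale copy is an easy mistake to make. The following sketch compares the two files' contents once they have been read into strings. `parseEnv` and `isKeySynced` are hypothetical helpers, and the parser only handles plain KEY=value lines (no quoting or `export` prefixes).

```typescript
// Parse KEY=value lines, skipping blank lines and # comments.
function parseEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before the "=" sign
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// True when the key is non-empty in .env and identical in the container copy.
function isKeySynced(
  envText: string,
  containerEnvText: string,
  key: string = "OPENAI_API_KEY",
): boolean {
  const local = parseEnv(envText)[key];
  const synced = parseEnv(containerEnvText)[key];
  return local !== undefined && local !== "" && local === synced;
}
```

In practice the two arguments would be the contents of .env and data/env/env; a `false` result suggests re-running the copy step above.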
npm run build
Tell the user:
Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.
tail -f logs/omniclaw.log | grep -i voice
Look for:
- Transcribed voice message: successful transcription with character count
- OPENAI_API_KEY not set: key missing from .env
- OpenAI transcription failed: API error (check key validity, billing)
- Failed to download audio message: media download issue

- Check that OPENAI_API_KEY is set in .env AND synced to data/env/env
- Verify that the key works:
curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200

Check logs for the specific error. Common causes:
Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.
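When scanning the log output, the four messages listed earlier can be mapped to their meanings mechanically. The sketch below is one way to do that; it assumes each message appears verbatim somewhere in the log line, and `explainLogLine` is a hypothetical helper rather than part of the skill.

```typescript
// Log message fragments and their meanings, as listed in this document.
const LOG_HINTS: Array<[string, string]> = [
  ["Transcribed voice message", "successful transcription with character count"],
  ["OPENAI_API_KEY not set", "key missing from .env"],
  ["OpenAI transcription failed", "API error (check key validity, billing)"],
  ["Failed to download audio message", "media download issue"],
];

// Return the meaning of the first known message found in the line, or null.
function explainLogLine(line: string): string | null {
  for (const [needle, meaning] of LOG_HINTS) {
    if (line.indexOf(needle) !== -1) return meaning;
  }
  return null;
}
```

Matching is case-sensitive here, unlike the `grep -i` filter above, so lines that differ only in case would need to be normalized first.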