Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

omniaura/add-voice-transcription

Name: add-voice-transcription
Author: omniaura

.agents/skills/add-voice-transcription/SKILL.md

npx skillsauth add omniaura/omniclaw add-voice-transcription

Add Voice Transcription

This skill adds automatic voice message transcription to OmniClaw's WhatsApp channel using OpenAI's Whisper API. When a voice note arrives, it is downloaded, transcribed, and delivered to the agent as [Voice: <transcript>].
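As a rough illustration of the delivery format, the sketch below wraps a transcription outcome in the bracketed text the agent receives. The `TranscriptionResult` type and the `formatVoiceContent` name are hypothetical and are not taken from src/transcription.ts. The two fallback strings are the placeholders listed under Troubleshooting; mapping them to "unavailable" and "failed" outcomes is an assumption.

```typescript
// Hypothetical outcome type; the real module may model results differently.
type TranscriptionResult =
  | { status: "ok"; transcript: string }
  | { status: "unavailable" } // assumed: no usable API key or account
  | { status: "failed" }; // assumed: the API call or download errored

// Build the text that stands in for the voice note in the agent's input.
function formatVoiceContent(result: TranscriptionResult): string {
  switch (result.status) {
    case "ok":
      return `[Voice: ${result.transcript.trim()}]`;
    case "unavailable":
      return "[Voice Message - transcription unavailable]";
    case "failed":
      return "[Voice Message - transcription failed]";
  }
}
```

Keeping the formatting in one small function means the success and fallback texts stay consistent wherever a voice note is handled.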

Phase 1: Pre-flight

Check if already applied

Read .omniclaw/state.yaml. If voice-transcription is in applied_skills, skip to Phase 3 (Configure). The code changes are already in place.
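The check above could be automated along these lines. This is only a sketch: the actual layout of .omniclaw/state.yaml is not shown in this skill, so the code assumes a top-level `applied_skills:` key followed by `- <name>` or `- name: <name>` list items. The `isSkillApplied` helper is hypothetical and takes the file contents as a string.

```typescript
// Hypothetical helper. Assumes `applied_skills:` is a top-level key whose
// value is a block list of `- <name>` or `- name: <name>` entries.
function isSkillApplied(stateYaml: string, skill: string): boolean {
  let inBlock = false;
  for (const line of stateYaml.split(/\r?\n/)) {
    if (/^applied_skills:\s*$/.test(line)) {
      inBlock = true;
      continue;
    }
    if (!inBlock) continue;
    // Any other non-indented, non-list line starts a new top-level key.
    if (/^\S/.test(line) && !line.startsWith("-")) {
      inBlock = false;
      continue;
    }
    const item = line.match(/^\s*-\s*(?:name:\s*)?["']?([\w.-]+)["']?/);
    if (item && item[1] === skill) return true;
  }
  return false;
}
```

If this returns true for `voice-transcription`, Phase 2 can be skipped. A real implementation would more likely use a YAML parser than line matching.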

Ask the user

Do they have an OpenAI API key? If yes, collect it now. If no, they'll need to create one at https://platform.openai.com/api-keys.

Phase 2: Apply Code Changes

Run the skills engine to apply this skill's code package.

Initialize skills system (if needed)

If the .omniclaw/ directory doesn't exist yet:

npx tsx scripts/apply-skill.ts --init

Apply the skill

npx tsx scripts/apply-skill.ts .claude/skills/add-voice-transcription

This deterministically:

Adds src/transcription.ts (voice transcription module using OpenAI Whisper)
Three-way merges voice handling into src/channels/whatsapp.ts (isVoiceMessage check, transcribeAudioMessage call)
Three-way merges transcription tests into src/channels/whatsapp.test.ts (mock + 3 test cases)
Installs the openai npm dependency
Updates .env.example with OPENAI_API_KEY
Records the application in .omniclaw/state.yaml
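To give a feel for what the merged voice handling might look like, here is a minimal sketch of a voice-note check and the routing decision around it. The message shape is an assumption (a Baileys-style object in which voice notes are audio messages flagged `ptt`); the real types in src/channels/whatsapp.ts are not shown here. `routeMessage` and `Route` are hypothetical names.

```typescript
// Assumed, Baileys-style message shape; not taken from the skill's code.
interface IncomingMessage {
  message?: {
    conversation?: string | null;
    audioMessage?: { ptt?: boolean | null } | null;
  } | null;
}

// Voice notes are assumed to be audio messages flagged as push-to-talk.
function isVoiceMessage(msg: IncomingMessage): boolean {
  return msg.message?.audioMessage?.ptt === true;
}

type Route =
  | { kind: "transcribe" }
  | { kind: "text"; text: string }
  | { kind: "ignore" };

// Decide what to do with a message before any transcription call is made.
function routeMessage(msg: IncomingMessage): Route {
  if (isVoiceMessage(msg)) return { kind: "transcribe" };
  const text = msg.message?.conversation;
  if (text) return { kind: "text", text };
  return { kind: "ignore" };
}
```

Under these assumptions, only messages routed as `transcribe` would be passed on to `transcribeAudioMessage`.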

If the apply reports merge conflicts, read the intent files:

modify/src/channels/whatsapp.ts.intent.md — what changed and invariants for whatsapp.ts
modify/src/channels/whatsapp.test.ts.intent.md — what changed for whatsapp.test.ts

Validate code changes

npm test
npm run build

All tests must pass (including the 3 new voice transcription tests) and the build must be clean before proceeding.

Phase 3: Configure

Get OpenAI API key (if needed)

If the user doesn't have an API key:

I need you to create an OpenAI API key:

Go to https://platform.openai.com/api-keys

Click "Create new secret key"

Give it a name (e.g., "OmniClaw Transcription")

Copy the key (starts with sk-)

Cost: ~$0.006 per minute of audio (~$0.003 per typical 30-second voice note)

Wait for the user to provide the key.
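The cost figure quoted to the user is simple per-minute arithmetic. The sketch below reproduces it, assuming the ~$0.006 per minute rate stated above still applies; `estimateTranscriptionCostUsd` is a hypothetical helper, not part of the skill.

```typescript
// Rate taken from the message above; treat it as approximate.
const USD_PER_MINUTE = 0.006;

// Estimate the transcription cost for an audio clip of the given length.
function estimateTranscriptionCostUsd(durationSeconds: number): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError("durationSeconds must be a non-negative number");
  }
  return (durationSeconds / 60) * USD_PER_MINUTE;
}
```

For a 30-second voice note this gives roughly $0.003, which matches the figure in the message.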

Add to environment

Add to .env:

OPENAI_API_KEY=<their-key>

Sync to container environment:

mkdir -p data/env && cp .env data/env/env

The container reads environment from data/env/env, not .env directly.
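If the key is added programmatically rather than by hand, it helps to replace an existing entry instead of appending a second one. The following is a sketch of that idea on the file contents as a string; `upsertEnvVar` is a hypothetical helper, and it assumes plain `KEY=value` lines with `\n` line endings.

```typescript
// Replace an existing KEY=... line or append a new one.
// Assumes simple KEY=value lines and "\n" line endings.
function upsertEnvVar(envText: string, key: string, value: string): string {
  const lines = envText === "" ? [] : envText.replace(/\n$/, "").split("\n");
  const entry = `${key}=${value}`;
  const index = lines.findIndex((line) => line.startsWith(`${key}=`));
  if (index >= 0) {
    lines[index] = entry;
  } else {
    lines.push(entry);
  }
  return lines.join("\n") + "\n";
}
```

The result would be written to .env first and then copied to data/env/env as shown above, so both files stay identical.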

Build and restart

npm run build

Phase 4: Verify

Test with a voice note

Tell the user:

Send a voice note in any registered WhatsApp chat. The agent should receive it as [Voice: <transcript>] and respond to its content.

Check logs if needed

tail -f logs/omniclaw.log | grep -i voice

Look for:

Transcribed voice message — successful transcription with character count
OPENAI_API_KEY not set — key missing from .env
OpenAI transcription failed — API error (check key validity, billing)
Failed to download audio message — media download issue
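The four messages above can be turned into a small lookup when scanning logs in bulk. This sketch matches on the message text only; the full log line format is not documented here, so `diagnoseLogLine` (a hypothetical helper) uses a case-insensitive substring match.

```typescript
// Message fragments and meanings as listed above.
const LOG_HINTS: ReadonlyArray<readonly [string, string]> = [
  ["Transcribed voice message", "Transcription succeeded."],
  ["OPENAI_API_KEY not set", "Key missing from .env."],
  ["OpenAI transcription failed", "API error: check key validity and billing."],
  ["Failed to download audio message", "Media download issue."],
];

// Return a hint for a known log message, or null for anything else.
function diagnoseLogLine(line: string): string | null {
  const lower = line.toLowerCase();
  for (const [needle, hint] of LOG_HINTS) {
    if (lower.includes(needle.toLowerCase())) return hint;
  }
  return null;
}
```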

Troubleshooting

Voice notes show "[Voice Message - transcription unavailable]"

Check OPENAI_API_KEY is set in .env AND synced to data/env/env
Verify key works: curl -s https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 200
Check OpenAI billing — Whisper requires a funded account
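When running the curl check, the HTTP status code is often more telling than the first 200 bytes of the body (curl can print it with `-o /dev/null -w "%{http_code}"`). The sketch below maps common status codes to a next step. `interpretKeyCheckStatus` is a hypothetical helper, and the wording of each hint is an assumption based on how the OpenAI API generally reports authentication and quota problems.

```typescript
// Map the HTTP status from the /v1/models check to a suggested next step.
function interpretKeyCheckStatus(status: number): string {
  if (status >= 200 && status < 300) return "Key accepted.";
  if (status === 401) return "Key rejected: regenerate it and update .env.";
  if (status === 429) return "Rate limited or out of quota: check billing, then retry.";
  if (status >= 500) return "Server-side error: retry later.";
  return `Unexpected status ${status}.`;
}
```

Note that a successful response only shows the key is valid; it does not by itself confirm that the account is funded for transcription.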

Voice notes show "[Voice Message - transcription failed]"

Check logs for the specific error. Common causes:

Network timeout — transient, will work on next message
Invalid API key — regenerate at https://platform.openai.com/api-keys
Rate limiting — wait and retry
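The skill does not state whether failed transcriptions are retried automatically. If a caller wanted to retry the transient causes above (timeouts and rate limits), one common approach is capped exponential backoff, sketched here. `isTransient` and `backoffDelayMs` are hypothetical helpers, and the one-second base and 30-second cap are arbitrary choices.

```typescript
// Timeouts, rate limits (429), and server errors are treated as retryable;
// an invalid key (401) is not, since retrying cannot fix it.
function isTransient(status: number | "timeout"): boolean {
  return status === "timeout" || status === 429 || status >= 500;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

With the defaults, successive delays are 1 s, 2 s, 4 s, and so on, up to the 30 s cap.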

Agent doesn't respond to voice notes

Verify the chat is registered and the agent is running. Voice transcription only runs for registered groups.

Add voice message transcription to OmniClaw using OpenAI's Whisper API. Automatically transcribes WhatsApp voice notes so the agent can read and respond to them.

11 stars

development

Updated Apr 9, 2026

Install this skill globally with one command:

npx skillsauth add omniaura/omniclaw add-voice-transcription

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 9, 2026, 2:29 AM · 5.5s · 8 files scanned

Related Skills

omniaura/graphite

tools

Verified · Trusted · Community

Manage stacked pull requests using Graphite CLI. Create, submit, and restack PR chains.

11 · SKILL.md · Updated Apr 9, 2026

omniaura/github

tools

Verified · Trusted · Community

Full GitHub operations via `gh` CLI — pull requests, issues, code review, CI/CD, search, and GraphQL API. Use for any GitHub interaction beyond basic git.

11 · SKILL.md · Updated Apr 9, 2026

omniaura/agent-browser

development

Verified · Trusted · Community

Browse the web for any task — research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test web pages. Use whenever a browser would be useful, not just when the user explicitly asks.

11 · SKILL.md · Updated Apr 9, 2026

omniaura/x-integration

testing

Verified · Trusted · Community

X (Twitter) integration for OmniClaw. Post tweets, like, reply, retweet, and quote. Use for setup, testing, or troubleshooting X functionality. Triggers on "setup x", "x integration", "twitter", "post tweet", "tweet".

11 · SKILL.md · Updated Apr 9, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/omniaura/omniclaw.git

# Copy into Claude Code skills folder (global)
cp -r omniclaw/.agents/skills/add-voice-transcription ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

omniaura/omniclaw

11 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT