alphaonedev/openai-whisper

Name: openai-whisper
Author: alphaonedev

skills/community/openai-whisper/SKILL.md

npx skillsauth add alphaonedev/openclaw-graph openai-whisper

Purpose

This skill enables local audio transcription using the OpenAI Whisper model, processing files directly on the device so that audio never leaves the machine and no network round trip is needed. It converts speech to text with multi-language support and word-level timestamps, and can be combined with a diarization library such as pyannote.audio to identify speakers.

When to Use

Use this skill for tasks involving local audio files, such as transcribing interviews, podcasts, or meetings, when you need offline processing to avoid network dependencies or data privacy concerns. Apply it in workflows requiring accurate timestamps or speaker identification, like content creation or analysis.

Key Capabilities

Transcription: Converts audio to text in over 90 languages; specify language via --language flag (e.g., --language en for English).
Timestamps: Outputs segment-level timings by default; enable word-level timings with --word_timestamps True (each word's start and end in seconds).
Speaker Diarization: Identifies speakers in audio; Whisper does not label speakers itself, so this requires a separate library such as pyannote.audio, whose speaker turns are then matched against Whisper's timestamps.
Multi-Format Support: Handles input formats like MP3, WAV, or FLAC; outputs JSON or SRT for easy parsing.
Model Selection: Choose from models like tiny, base, small, medium, or large; larger models improve accuracy but increase compute needs (e.g., --model medium).
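
Speaker labels come from the diarization library rather than from Whisper's own output, so the two results have to be merged. A minimal sketch of that merge, assuming the diarizer's output has already been reduced to (start, end, speaker) tuples; the function name assign_speakers, the tuple layout, and the sample values are illustrative assumptions, not part of Whisper or pyannote.audio:

```python
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start_sec, end_sec, speaker_label), hypothetical layout


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Segments shaped like the "segments" list in a Whisper result (sample values).
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Hi, how are you?"},
]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.2, "SPEAKER_01")]

for seg in assign_speakers(segments, turns):
    print(f'{seg["speaker"]}: {seg["text"].strip()}')
```

Choosing the turn with the largest overlap keeps the logic simple; a segment that overlaps no turn is left with speaker set to None so the caller can decide how to handle it.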

Usage Patterns

Always run Whisper in a Python environment with the library installed and ffmpeg available on the PATH. For basic transcription, pass an audio file and options to the CLI. In pipelines, hand the output files (for example the JSON transcript) to downstream tools such as text analysis skills. For speaker diarization, install the extra dependencies first. Typical uses are transcribing a short audio clip for note-taking and processing a multi-speaker recording for meeting summaries.

To accomplish transcription:

Install dependencies: Run pip install git+https://github.com/openai/whisper.git pyannote.audio in your environment.
Load and process audio: Use the Whisper CLI to handle files directly.
Handle outputs: Parse JSON results for timestamps and integrate into larger scripts.
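
The output-handling step can be sketched as follows. The sample mimics the shape of Whisper's JSON output (a top-level "text" plus a "segments" list whose entries carry "start", "end", and "text"); the sample values and the helper names format_time and segments_to_lines are made up for illustration:

```python
import json


def format_time(seconds: float) -> str:
    """Render seconds as MM:SS.mmm for readable timestamps."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def segments_to_lines(result: dict) -> list:
    """Turn each segment into a '[start -> end] text' line."""
    return [
        f'[{format_time(s["start"])} -> {format_time(s["end"])}] {s["text"].strip()}'
        for s in result.get("segments", [])
    ]


# Stand-in for the contents of a JSON file written by the CLI.
sample_json = """
{
  "text": " Welcome back. Let's get started.",
  "language": "en",
  "segments": [
    {"id": 0, "start": 0.0, "end": 1.8, "text": " Welcome back."},
    {"id": 1, "start": 1.8, "end": 75.5, "text": " Let's get started."}
  ]
}
"""

result = json.loads(sample_json)
lines = segments_to_lines(result)
print("\n".join(lines))
```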

Common Commands/API

Use the Whisper CLI for quick tasks. For API integration, call the Python library directly.

CLI Command for Basic Transcription:
whisper path/to/audio.mp3 --model base --language en --output_format json
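This transcribes the file and writes audio.json to the current directory (or to the directory given by --output_dir); the JSON includes segment-level timestamps.
CLI with Word-Level Timestamps:
whisper path/to/audio.wav --model medium --word_timestamps True --output_format json
Adds start and end times for each word to the JSON segments. The stock Whisper CLI has no diarization flag; speaker labels come from running pyannote.audio separately and merging its output with these timestamps.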
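
For scripting, a Whisper command line can be assembled programmatically. A small sketch, assuming the upstream openai-whisper CLI options --model, --language, --output_format, --output_dir, and --word_timestamps; the helper name build_whisper_command is hypothetical:

```python
import shlex
from typing import List, Optional


def build_whisper_command(
    audio_path: str,
    model: str = "base",
    language: Optional[str] = None,
    output_format: str = "json",
    output_dir: str = ".",
    word_timestamps: bool = False,
) -> List[str]:
    """Build an argument list suitable for subprocess.run (no shell involved)."""
    cmd = ["whisper", audio_path, "--model", model,
           "--output_format", output_format, "--output_dir", output_dir]
    if language:  # omit the flag to let Whisper auto-detect the language
        cmd += ["--language", language]
    if word_timestamps:
        cmd += ["--word_timestamps", "True"]
    return cmd


cmd = build_whisper_command("path/to/audio.wav", model="medium", word_timestamps=True)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.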

Python API Snippet:

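import whisper

# Load the model weights (downloaded on first use)
model = whisper.load_model("base")
# Transcribe with the language fixed to English instead of auto-detected
result = model.transcribe("path/to/audio.mp3", language="en")
print(result["text"])  # Full transcribed text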
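
Word-level timings are also available from the Python API. A sketch, assuming a recent openai-whisper release in which transcribe() accepts word_timestamps=True and each segment then carries a "words" list; the audio path is a placeholder and the helper names iter_words and transcribe_with_words are hypothetical:

```python
from typing import Dict, Iterator, Tuple


def iter_words(result: Dict) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, word) for every word in a transcription result."""
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            yield word["start"], word["end"], word["word"].strip()


def transcribe_with_words(audio_path: str, model_name: str = "base") -> Dict:
    """Run Whisper with word timestamps enabled (needs openai-whisper and ffmpeg)."""
    import whisper  # imported lazily so iter_words works without the library

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, word_timestamps=True)


# Sample result with the same shape, so the loop can be tried without a model.
sample = {
    "segments": [
        {"words": [
            {"word": " Hello", "start": 0.0, "end": 0.4},
            {"word": " world", "start": 0.4, "end": 0.9},
        ]}
    ]
}
for start, end, word in iter_words(sample):
    print(f"{start:.2f}-{end:.2f} {word}")
```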

Config Format: Whisper itself has no config-file option; for batch processing, a wrapper script can keep its settings in a simple JSON file, for example:
{ "model": "small", "language": "es", "task": "transcribe" }
Load the file in the script and pass its values to load_model() and transcribe(). No auth keys are needed for local use; if extending to cloud services, read credentials from an env var such as $WHISPER_API_KEY.

To use in code: Import the library, load the model, and call transcribe() with the audio file path as the first argument and options such as task for mode (task="transcribe" or task="translate").
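
One way to apply a JSON config like the one above from a script is to split it into a model name and transcription options. This is a sketch of a wrapper, not a Whisper feature; the function name split_config, the DEFAULTS values, and the split itself are assumptions:

```python
import json
from typing import Dict, Tuple

DEFAULTS = {"model": "base", "task": "transcribe"}


def split_config(raw: str) -> Tuple[str, Dict]:
    """Parse JSON config text into (model_name, transcribe_kwargs)."""
    config = {**DEFAULTS, **json.loads(raw)}
    if config["task"] not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {config['task']!r}")
    model_name = config.pop("model")
    return model_name, config  # remaining keys are passed to transcribe()


model_name, options = split_config(
    '{ "model": "small", "language": "es", "task": "transcribe" }'
)
print(model_name, options)

# With openai-whisper installed, the values would be used like this:
#   model = whisper.load_model(model_name)
#   result = model.transcribe("path/to/audio.mp3", **options)
```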

Integration Notes

Integrate Whisper into AI workflows by wrapping it in Python scripts or CLI calls. For example, chain it with text processing tools by passing the JSON transcript to a sentiment analysis skill, or use it in Jupyter notebooks for interactive transcription. A CUDA-capable GPU speeds up processing considerably, though Whisper also runs on CPU. To embed it in larger applications, handle file I/O explicitly: transcribe() accepts a file path or an in-memory waveform, so audio read with a library such as soundfile can be passed directly once it is 16 kHz mono float32. For multi-step tasks, use subprocess to call the Whisper CLI from other scripts and read the JSON file it writes to the output directory.
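
Calling the CLI from another script might look like the sketch below. It assumes the CLI writes a file named after the audio file's stem with a .json extension into the directory given by --output_dir; the function names are hypothetical, and the runner parameter exists only so the flow can be exercised without Whisper installed:

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def expected_json_path(audio_path: str, output_dir: str) -> Path:
    """Where the CLI is assumed to write its JSON transcript."""
    return Path(output_dir) / (Path(audio_path).stem + ".json")


def transcribe_via_cli(
    audio_path: str,
    output_dir: str,
    model: str = "base",
    runner: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> Dict:
    """Run the whisper CLI and load the JSON file it produces."""
    cmd = ["whisper", audio_path, "--model", model,
           "--output_format", "json", "--output_dir", output_dir]
    runner(cmd)  # the default runner raises CalledProcessError on a non-zero exit
    with open(expected_json_path(audio_path, output_dir), encoding="utf-8") as fh:
        return json.load(fh)


# Dry run with a fake runner that writes the file the real CLI would create.
with tempfile.TemporaryDirectory() as tmp:
    def fake_runner(cmd: List[str]) -> None:
        expected_json_path(cmd[1], tmp).write_text('{"text": " demo"}', encoding="utf-8")

    demo = transcribe_via_cli("path/to/meeting.wav", tmp, runner=fake_runner)
    print(demo["text"].strip())
```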

Error Handling

Common errors include missing dependencies, invalid audio formats, or out-of-memory issues with large models. To handle:

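File Not Found: Check file paths before running. Whisper decodes audio through ffmpeg, so in current versions a missing or unreadable file typically surfaces as a RuntimeError from transcribe() rather than FileNotFoundError; verify the path explicitly in Python:

import os
try:
    if not os.path.isfile("path/to/audio.mp3"):
        raise FileNotFoundError("path/to/audio.mp3")  # fail early with a clear error
    result = model.transcribe("path/to/audio.mp3")
except FileNotFoundError:
    print("Audio file missing; verify path.")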

Model Load Failures: Ensure sufficient RAM (or GPU memory when running on CUDA); fall back to smaller models if needed (e.g., switch --model large to --model base).
Diarization Errors: If Pyannote fails, verify installation with pip show pyannote.audio; log errors and retry with basic transcription.
General Pattern: Wrap CLI calls in scripts using subprocess and check return codes; for API, catch exceptions such as RuntimeError (raised when the audio cannot be decoded) and retry with different parameters.

Always log detailed errors (e.g., via Python's logging module) and provide user-friendly messages, such as "Error: Not enough memory for the selected model; try a smaller one."
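
The fallback and logging advice can be combined in one helper. A sketch, assuming out-of-memory conditions surface as RuntimeError (as PyTorch CUDA out-of-memory errors do) or MemoryError; load_with_fallback and the injected loader are hypothetical names, with whisper.load_model being the natural loader in real use:

```python
import logging
from typing import Any, Callable, Sequence, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("whisper-skill")


def load_with_fallback(
    loader: Callable[[str], Any],
    model_names: Sequence[str] = ("large", "medium", "small", "base", "tiny"),
) -> Tuple[str, Any]:
    """Try models from largest to smallest; return the first that loads."""
    last_error = None
    for name in model_names:
        try:
            model = loader(name)
            log.info("Loaded model %r", name)
            return name, model
        except (RuntimeError, MemoryError) as exc:
            log.warning("Could not load %r (%s); trying the next model", name, exc)
            last_error = exc
    raise RuntimeError("No Whisper model could be loaded") from last_error


# Stand-in loader that pretends only "base" and "tiny" fit in memory.
def fake_loader(name: str) -> str:
    if name not in ("base", "tiny"):
        raise RuntimeError("out of memory")
    return f"<model {name}>"


name, model = load_with_fallback(fake_loader)
print(name)
# Real use: load_with_fallback(whisper.load_model)
```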

Graph Relationships

Relates to: "audio-processing" cluster for upstream tasks like noise reduction.
Connected to: "text-analysis" skills for downstream processing of transcription outputs.
Links with: "openai-gpt" for enhancing transcripts with AI summaries.

Local Whisper: audio transcription, multi-language, word timestamps, speaker diarization

2 stars

data-ai

Updated Apr 3, 2026

Install this skill globally with one command:

npx skillsauth add alphaonedev/openclaw-graph openai-whisper

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy, container and dependency vulnerability scanner: Clean, 95%
Semgrep, static code analysis for vulnerabilities: Clean, 95%
mcp-scan (Snyk), Model Context Protocol security validation: Clean, 95%
Snyk (dep), open source security scanning: Skipped, 50%
Socket.dev, supply chain security analysis: Skipped, 50%
VirusTotal, multi-engine malware detection: Skipped, 50%
CrowdStrike, advanced threat intelligence: Skipped, 50%
OSV-Scanner, Open Source Vulnerability database check: Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 20, 2026, 3:21 PM (1.8s, 1 file scanned)

SKILL.md

name:: openai-whisper
cluster:: community
description:: Local Whisper: audio transcription, multi-language, word timestamps, speaker diarization
tags:: ["whisper","transcription","audio","local"]
dependencies:: []
composes:: []
similar_to:: []
called_by:: []
authorization_required:: false
scope:: general
model_hint:: claude-sonnet
embedding_hint:: whisper local transcription audio speech to text timestamps

Install manually

# Clone the repo
git clone https://github.com/alphaonedev/openclaw-graph.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r openclaw-graph/skills/community/openai-whisper ~/.claude/skills/

Repository

alphaonedev/openclaw-graph

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT