skills/community/openai-whisper/SKILL.md
Local Whisper: audio transcription, multi-language, word timestamps, speaker diarization
This skill enables local audio transcription using the OpenAI Whisper model, processing files directly on the device for privacy and speed. It's ideal for converting speech to text with features like multi-language support, word-level timestamps, and speaker diarization.
Use this skill for tasks involving local audio files, such as transcribing interviews, podcasts, or meetings, when you need offline processing to avoid network dependencies or data privacy concerns. Apply it in workflows requiring accurate timestamps or speaker identification, like content creation or analysis.
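- Multi-language support: set the language with the --language flag (e.g., --language en for English); if the flag is omitted, Whisper detects the language automatically.
- Word timestamps: pass --word_timestamps True for detailed segments (e.g., each word's start/end in seconds).
- Speaker diarization: not built into the whisper CLI; add it through a separate tool such as pyannote.audio if configured.
- Model selection: choose a model size with --model (e.g., --model medium).

Always run Whisper in a Python environment with the library installed. For basic transcription, pass an audio file to the CLI and specify options as flags. Use it in pipelines by handing its output to other tools, like text analysis skills. For speaker diarization, ensure dependencies are installed first.

Example 1: Transcribe a short audio clip for note-taking.
Example 2: Process a multi-speaker recording for meeting summaries.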
To accomplish transcription:
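- Install the dependencies by running pip install git+https://github.com/openai/whisper.git pyannote.audio in your environment. Whisper also needs the ffmpeg command-line tool available on your PATH.
- Use the Whisper CLI for quick tasks.
- For API integration, call the Python library directly.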
CLI Command for Basic Transcription:
whisper path/to/audio.mp3 --model base --language en --output_format json
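This transcribes the file and writes audio.json to the current directory (or to the directory given by --output_dir). The JSON contains the full text plus segment-level timestamps.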
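CLI with Timestamps and Diarization:
whisper path/to/audio.wav --model medium --word_timestamps True
Generates word-level timings. The whisper CLI has no diarization flag, so speaker labels require a separate tool such as pyannote.audio, whose speaker turns are then matched against the word timings.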
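As a rough sketch of how word timings and speaker turns might be combined in Python: `transcribe(..., word_timestamps=True)` adds a `words` list to each segment, and a diarization step supplies speaker turns. The helper names (`collect_words`, `assign_speakers`, `transcribe_with_speakers`) and the `(start, end, label)` turn format are assumptions made for illustration; they are not part of Whisper or pyannote.audio.

```python
def collect_words(result):
    """Flatten the per-segment "words" lists of a Whisper result into one list."""
    words = []
    for segment in result.get("segments", []):
        # "words" is only present when transcribe() ran with word_timestamps=True.
        words.extend(segment.get("words", []))
    return words


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    `turns` is a list of (start, end, label) tuples in seconds. This format is
    an assumption about what your diarization step produces.
    """
    labeled = []
    for word in words:
        best_label, best_overlap = None, 0.0
        for start, end, label in turns:
            overlap = min(word["end"], end) - max(word["start"], start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        # Words that overlap no turn keep speaker=None.
        labeled.append({**word, "speaker": best_label})
    return labeled


def transcribe_with_speakers(audio_path, turns, model_name="base"):
    """Hypothetical wrapper: transcribe with word timestamps, then add speakers."""
    import whisper  # imported lazily so the helpers above work without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    return assign_speakers(collect_words(result), turns)
```

Assigning by largest overlap is only one possible policy; words that straddle a speaker change may need a different rule, such as using the word's midpoint.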
Python API Snippet:
import whisper
model = whisper.load_model("base")
result = model.transcribe("path/to/audio.mp3", language="en")
print(result["text"]) # Outputs transcribed text
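Besides `"text"`, the result dictionary carries a `"segments"` list whose entries include `"start"`, `"end"`, and `"text"`. A minimal sketch of turning those into timestamped lines follows; the helper names `format_timestamp`, `format_segments`, and `transcribe_to_lines` are made up for this example.

```python
def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rest_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rest_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(result):
    """Return one '[start -> end] text' line per segment of a Whisper result."""
    lines = []
    for segment in result["segments"]:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


def transcribe_to_lines(audio_path, model_name="base", language="en"):
    """Hypothetical convenience wrapper around load_model() and transcribe()."""
    import whisper  # imported lazily so the formatters work without the package

    model = whisper.load_model(model_name)
    return format_segments(model.transcribe(audio_path, language=language))
```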
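Config Format: The whisper CLI does not read a config file and has no --config flag. For batch processing, one option is to keep the settings in a simple JSON file that your own wrapper script reads; example:
{ "model": "small", "language": "es", "task": "transcribe" }
The wrapper passes "model" to whisper.load_model() and the remaining keys to transcribe(). No auth keys are needed for local use; if extending to cloud services, read credentials from an environment variable (the name $WHISPER_API_KEY is illustrative) rather than hard-coding them.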
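One way to apply a JSON settings file like the one above from Python is a small wrapper script. This is a sketch under assumptions: `split_config` and `transcribe_with_config` are hypothetical helpers, and the file layout (a `"model"` key plus options accepted by `transcribe()`, such as `language` and `task`) is a convention of this example, not something Whisper defines.

```python
import json


def split_config(config):
    """Separate the model name from the options forwarded to transcribe()."""
    options = dict(config)  # copy so the caller's dict is left untouched
    model_name = options.pop("model", "base")
    return model_name, options


def transcribe_with_config(audio_path, config_path):
    """Load settings from a JSON file and run a transcription with them."""
    import whisper  # imported lazily so split_config works without the package

    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    model_name, options = split_config(config)
    model = whisper.load_model(model_name)
    # Keys such as "language" and "task" are passed through as keyword arguments.
    return model.transcribe(audio_path, **options)
```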
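To use in code: Import the library, load the model, and call transcribe() with the audio as the first argument (a file path or a 16 kHz mono waveform array) plus keyword options such as language and task (task="transcribe" keeps the source language; task="translate" produces English).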
Integrate Whisper into AI workflows by wrapping it in Python scripts or CLI calls. For example, chain it with text processing tools by passing the JSON output to a sentiment analysis skill, or use it in Jupyter notebooks for interactive transcription. A CUDA-capable GPU speeds up processing, though Whisper also runs on CPU.

To embed it in larger applications, handle file I/O explicitly: either pass a file path and let Whisper decode it through ffmpeg, or read the audio yourself (e.g., with the soundfile library) and pass a 16 kHz mono float32 array. For multi-step tasks, use subprocess to call the Whisper CLI from other scripts with --output_format json and --output_dir, then parse the JSON file it writes; stdout carries the console transcript, not JSON.
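The subprocess route could look roughly like the sketch below. The flags used (`--model`, `--output_format`, `--output_dir`) are existing CLI options; the helper names are hypothetical, and the output file name (audio file stem plus `.json`) is assumed from how the CLI names its output files.

```python
import json
import subprocess
from pathlib import Path


def build_command(audio_path, output_dir, model="base"):
    """Build the argument list for a JSON-producing whisper CLI call."""
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--output_format", "json",
        "--output_dir", str(output_dir),
    ]


def expected_json_path(audio_path, output_dir):
    """Where the CLI is assumed to write its JSON: <output_dir>/<stem>.json."""
    return Path(output_dir) / (Path(audio_path).stem + ".json")


def run_whisper_cli(audio_path, output_dir, model="base"):
    """Run the CLI, then load and return the JSON result it wrote."""
    completed = subprocess.run(
        build_command(audio_path, output_dir, model),
        capture_output=True, text=True,
    )
    if completed.returncode != 0:
        raise RuntimeError(f"whisper exited with {completed.returncode}: {completed.stderr.strip()}")
    json_path = expected_json_path(audio_path, output_dir)
    # Assumption: a failed file may be skipped without a non-zero exit code,
    # so the presence of the output file is checked as well.
    if not json_path.exists():
        raise RuntimeError(f"whisper produced no output at {json_path}")
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.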
Common errors include missing dependencies, invalid audio formats, or out-of-memory issues with large models. To handle:
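try:
    result = model.transcribe("path/to/audio.mp3")
except (FileNotFoundError, RuntimeError) as exc:
    # RuntimeError: ffmpeg could not read the file; FileNotFoundError: ffmpeg itself is missing
    print(f"Could not load audio; verify the path and the ffmpeg install: {exc}")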
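- Out-of-memory errors: switch to a smaller model (e.g., from --model large to --model base).
- Missing diarization dependencies: verify the install with pip show pyannote.audio; log errors and retry with basic transcription.
- CLI and API failures: for the CLI, use subprocess and check return codes; for the API, catch exceptions such as RuntimeError (raised when the audio cannot be decoded) and retry with different parameters.

Always log detailed errors (e.g., via Python's logging module) and provide user-friendly messages, such as "Error: Audio too long for selected model; try a smaller one."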
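The retry-with-a-smaller-model advice can be sketched as a small fallback loop. The names `transcribe_with_fallback` and `whisper_transcribe` are hypothetical, and the sketch assumes that out-of-memory and decoding failures surface as `RuntimeError` or `MemoryError`; adjust the caught exceptions to what you observe in your environment.

```python
import logging

logger = logging.getLogger("whisper_skill")  # logger name is arbitrary


def transcribe_with_fallback(audio_path, model_names, transcribe_fn):
    """Try each model in order; return (model_name, result) for the first success.

    `transcribe_fn(model_name, audio_path)` does the actual work, which keeps
    this loop independent of Whisper itself.
    """
    last_error = None
    for name in model_names:
        try:
            return name, transcribe_fn(name, audio_path)
        except (RuntimeError, MemoryError) as exc:
            logger.warning("Model %s failed on %s: %s", name, audio_path, exc)
            last_error = exc
    raise RuntimeError(
        f"All models failed for {audio_path}; try a smaller model or check the audio file."
    ) from last_error


def whisper_transcribe(model_name, audio_path):
    """Adapter that runs a real Whisper transcription."""
    import whisper  # imported lazily so the fallback loop works without the package

    return whisper.load_model(model_name).transcribe(audio_path)
```

A call such as `transcribe_with_fallback("path/to/audio.mp3", ["large", "medium", "base"], whisper_transcribe)` would then move to smaller models only after a larger one fails.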