plugins/video-editing/skills/transcription/SKILL.md
Provides Whisper transcription patterns — model selection, SRT/VTT/JSON, timing sync, diarization. Use when transcribing audio or video, or generating subtitles.
npx skillsauth add madappgang/magus transcriptionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
plugin: video-editing updated: 2026-01-20
Production-ready patterns for audio/video transcription using OpenAI Whisper.
Option 1: OpenAI Whisper (Python)
# macOS/Linux/Windows
pip install openai-whisper
# Verify
whisper --help
Option 2: whisper.cpp (C++ - faster)
# macOS
brew install whisper-cpp
# Linux - build from source
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
# Windows - use pre-built binaries or build with cmake
Option 3: Insanely Fast Whisper (GPU accelerated)
pip install insanely-fast-whisper
| Model | Size | VRAM | Accuracy | Speed | Use Case | |-------|------|------|----------|-------|----------| | tiny | 39M | ~1GB | Low | Fastest | Quick previews | | base | 74M | ~1GB | Medium | Fast | Draft transcripts | | small | 244M | ~2GB | Good | Medium | General use | | medium | 769M | ~5GB | Better | Slow | Quality transcripts | | large-v3 | 1550M | ~10GB | Best | Slowest | Final production |
Recommendation: Start with small for speed/quality balance. Use large-v3 for final delivery.
# Basic transcription (auto-detect language)
whisper audio.mp3 --model small
# Specify language and output format
whisper audio.mp3 --model medium --language en --output_format srt
# Multiple output formats
whisper audio.mp3 --model small --output_format all
# With timestamps and word-level timing
whisper audio.mp3 --model small --word_timestamps True
# Download model first
./models/download-ggml-model.sh base.en
# Transcribe
./main -m models/ggml-base.en.bin -f audio.wav -osrt
# With timestamps
./main -m models/ggml-base.en.bin -f audio.wav -ocsv
1
00:00:01,000 --> 00:00:04,500
Hello and welcome to this video.
2
00:00:05,000 --> 00:00:08,200
Today we'll discuss video editing.
WEBVTT
00:00:01.000 --> 00:00:04.500
Hello and welcome to this video.
00:00:05.000 --> 00:00:08.200
Today we'll discuss video editing.
{
"text": "Hello and welcome to this video.",
"segments": [
{
"id": 0,
"start": 1.0,
"end": 4.5,
"text": " Hello and welcome to this video.",
"words": [
{"word": "Hello", "start": 1.0, "end": 1.3},
{"word": "and", "start": 1.4, "end": 1.5},
{"word": "welcome", "start": 1.6, "end": 2.0},
{"word": "to", "start": 2.1, "end": 2.2},
{"word": "this", "start": 2.3, "end": 2.5},
{"word": "video", "start": 2.6, "end": 3.0}
]
}
]
}
Before transcribing video, extract audio in optimal format:
# Extract audio as WAV (16kHz, mono - optimal for Whisper)
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
# Extract as high-quality WAV for archival
ffmpeg -i video.mp4 -vn -c:a pcm_s16le audio.wav
# Extract as compressed MP3 (smaller, still works)
ffmpeg -i video.mp4 -vn -c:a libmp3lame -q:a 2 audio.mp3
import json
def whisper_to_fcp_timing(whisper_json_path, fps=24):
"""Convert Whisper JSON output to FCP-compatible timing."""
with open(whisper_json_path) as f:
data = json.load(f)
segments = []
for seg in data.get("segments", []):
segments.append({
"start_time": seg["start"],
"end_time": seg["end"],
"start_frame": int(seg["start"] * fps),
"end_frame": int(seg["end"] * fps),
"text": seg["text"].strip(),
"words": seg.get("words", [])
})
return segments
# Get exact frame count and duration
ffprobe -v error -count_frames -select_streams v:0 \
-show_entries stream=nb_read_frames,duration,r_frame_rate \
-of json video.mp4
For multi-speaker content, use pyannote.audio:
pip install pyannote.audio
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/[email protected]")
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
#!/bin/bash
# Transcribe all videos in directory
MODEL="small"
OUTPUT_DIR="transcripts"
mkdir -p "$OUTPUT_DIR"
for video in *.mp4 *.mov *.avi; do
[[ -f "$video" ]] || continue
base="${video%.*}"
# Extract audio
ffmpeg -i "$video" -ar 16000 -ac 1 -c:a pcm_s16le "/tmp/${base}.wav" -y
# Transcribe
whisper "/tmp/${base}.wav" --model "$MODEL" \
--output_format all \
--output_dir "$OUTPUT_DIR"
# Cleanup temp audio
rm "/tmp/${base}.wav"
echo "Transcribed: $video"
done
ffmpeg -i noisy_audio.wav -af "highpass=f=200,lowpass=f=3000,afftdn=nf=-25" clean_audio.wav
whisper audio.mp3 --language en --model medium
whisper audio.mp3 --initial_prompt "Technical discussion about video editing software."
whisper audio.mp3 --model large-v3 --device cuda
# Split audio into 10-minute chunks
# Transcribe each chunk
# Merge results with time offset adjustment
# Validate audio file before transcription
validate_audio() {
local file="$1"
if ffprobe -v error -select_streams a:0 -show_entries stream=codec_type -of csv=p=0 "$file" 2>/dev/null | grep -q "audio"; then
return 0
else
echo "Error: No audio stream found in $file"
return 1
fi
}
# Check Whisper installation
check_whisper() {
if command -v whisper &> /dev/null; then
echo "Whisper available"
return 0
else
echo "Error: Whisper not installed. Run: pip install openai-whisper"
return 1
fi
}
testing
A test skill for validation testing. Use when testing skill parsing and validation logic.
tools
--- name: bad-skill description: This skill has invalid YAML in frontmatter allowed-tools: [invalid, array, syntax prerequisites: not-an-array --- # Bad Skill This skill has malformed frontmatter that should fail parsing. The YAML has: - Unclosed array bracket - Wrong type for prerequisites (should be array, not string)
development
Sync model aliases from the curated Firebase database. Fetches default model assignments, short aliases, team compositions, and known model metadata from the claudish API. Run this to get fresh model recommendations.
tools
Release one or more Magus plugins to the distribution repos (magus, magus-alpha, magus-marketing). Handles version inference from git history, marketplace.json updates, tagging, and force-push to lean dist repos. Use whenever the user says "release kanban", "release the dev plugin", "cut a new version of gtd", "bump kanban to 1.7", or hands you a batch like "release kanban and gtd". Also use for multi-plugin releases and for checking what a release would contain before committing.