skills/transcribe/SKILL.md
Transcribe audio or video files to text via OpenAI Whisper API or local whisper-cpp. Standalone utility — also used by ingest-video for meeting recordings and screen captures.
npx skillsauth add RonanCodes/ronan-skills transcribeInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Convert audio or video files to text using OpenAI Whisper (API or local). Outputs a timestamped transcript as markdown.
/transcribe <file-path> [--output <path>] [--format text|srt|vtt|json] [--language <code>]
/transcribe <file-path> --vault <name> # saves to vault's raw/ folder
--output — output path. Default: <input-stem>-transcript.md.--format — output format. Default: text. Options: text (plain), srt (subtitles), vtt (web subtitles), json (timestamps + segments).--language — ISO 639-1 code (e.g. en, nl, de). Auto-detected if omitted.--vault — save transcript to vaults/<vault>/raw/<filename>-transcript.md with proper frontmatter.--provider — force a provider: openai, local. Default: auto-detect (OpenAI if key present, else local).# ffmpeg is required to extract/convert audio
which ffmpeg >/dev/null 2>&1 || {
echo "Installing ffmpeg…"
brew install ffmpeg
}
# Determine transcription provider
if [ "$PROVIDER" = "local" ] || [ -z "$OPENAI_API_KEY" ]; then
# Try local whisper
if which whisper-cpp >/dev/null 2>&1; then
TRANSCRIBE_BACKEND="whisper-cpp"
elif which whisper >/dev/null 2>&1; then
TRANSCRIBE_BACKEND="whisper-python"
elif [ -n "$OPENAI_API_KEY" ]; then
TRANSCRIBE_BACKEND="openai"
else
echo "❌ No transcription backend available."
echo ""
echo "Options:"
echo " 1. Set OPENAI_API_KEY for cloud transcription (~\$0.006/min)"
echo " 2. Install local whisper: brew install whisper-cpp"
echo " 3. Install Python whisper: pip install openai-whisper"
exit 1
fi
else
TRANSCRIBE_BACKEND="openai"
fi
Convert video to audio, or normalize audio format for Whisper:
INPUT_FILE="$1"
AUDIO_FILE="/tmp/transcribe-audio-$$.wav"
# Get file info
DURATION=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 "$INPUT_FILE" 2>/dev/null | cut -d. -f1)
HAS_VIDEO=$(ffprobe -v quiet -select_streams v -show_entries stream=codec_type -of csv=p=0 "$INPUT_FILE" 2>/dev/null)
if [ -n "$HAS_VIDEO" ]; then
echo "Extracting audio from video…"
fi
# Convert to 16kHz mono WAV (Whisper's preferred format)
ffmpeg -i "$INPUT_FILE" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$AUDIO_FILE" -y 2>/dev/null
if [ "$TRANSCRIBE_BACKEND" = "openai" ]; then
# Whisper API has a 25MB file size limit
FILE_SIZE=$(wc -c < "$AUDIO_FILE" | tr -d ' ')
if [ "$FILE_SIZE" -gt 26214400 ]; then
# Split into 10-minute chunks
CHUNK_DIR="/tmp/transcribe-chunks-$$"
mkdir -p "$CHUNK_DIR"
ffmpeg -i "$AUDIO_FILE" -f segment -segment_time 600 \
-c copy "$CHUNK_DIR/chunk_%03d.wav" 2>/dev/null
TRANSCRIPT=""
for chunk in "$CHUNK_DIR"/chunk_*.wav; do
CHUNK_RESULT=$(curl -s https://api.openai.com/v1/audio/transcriptions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F file=@"$chunk" \
-F model=whisper-1 \
-F response_format="${FORMAT:-text}" \
${LANGUAGE:+-F language="$LANGUAGE"})
TRANSCRIPT="$TRANSCRIPT$CHUNK_RESULT\n\n"
done
rm -rf "$CHUNK_DIR"
else
TRANSCRIPT=$(curl -s https://api.openai.com/v1/audio/transcriptions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-F file=@"$AUDIO_FILE" \
-F model=whisper-1 \
-F response_format="${FORMAT:-text}" \
${LANGUAGE:+-F language="$LANGUAGE"})
fi
fi
if [ "$TRANSCRIBE_BACKEND" = "whisper-cpp" ]; then
MODEL_PATH="/usr/local/share/whisper-cpp/models/ggml-base.en.bin"
[ -f "$MODEL_PATH" ] || MODEL_PATH="$(brew --prefix whisper-cpp 2>/dev/null)/share/whisper-cpp/models/ggml-base.en.bin"
whisper-cpp -m "$MODEL_PATH" \
-f "$AUDIO_FILE" \
--output-format txt \
-of "/tmp/transcribe-result-$$" \
${LANGUAGE:+--language "$LANGUAGE"}
TRANSCRIPT=$(cat "/tmp/transcribe-result-$$.txt")
fi
if [ "$TRANSCRIBE_BACKEND" = "whisper-python" ]; then
TRANSCRIPT=$(whisper "$AUDIO_FILE" \
--model base.en \
--output_format txt \
--output_dir /tmp \
${LANGUAGE:+--language "$LANGUAGE"} 2>/dev/null)
[ -z "$TRANSCRIPT" ] && TRANSCRIPT=$(cat "/tmp/$(basename "$AUDIO_FILE" .wav).txt")
fi
OUTPUT="${OUTPUT_PATH:-$(dirname "$INPUT_FILE")/$(basename "$INPUT_FILE" | sed 's/\.[^.]*$//')-transcript.md}"
if [ -n "$VAULT_NAME" ]; then
VAULT_DIR="vaults/$VAULT_NAME"
SLUG=$(basename "$INPUT_FILE" | sed 's/\.[^.]*$//' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g')
OUTPUT="$VAULT_DIR/raw/${SLUG}-transcript.md"
cat > "$OUTPUT" <<EOF
---
source-url: "file://$(realpath "$INPUT_FILE")"
title: "$(basename "$INPUT_FILE")"
date-fetched: $(date +%Y-%m-%d)
source-type: $([ -n "$HAS_VIDEO" ] && echo "video" || echo "audio")
transcription-provider: $TRANSCRIBE_BACKEND
duration-seconds: ${DURATION:-unknown}
---
$TRANSCRIPT
EOF
else
echo "$TRANSCRIPT" > "$OUTPUT"
fi
✅ Transcription complete
Input: <input file> (<duration>s)
Provider: <openai|whisper-cpp|whisper-python>
Language: <detected or specified>
Words: <word count>
Output: <output path>
Cost: ~$<estimated> (OpenAI) or free (local)
Quick connectivity test to verify the API key works:
# Test OpenAI Whisper API
curl -s https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY" | python3 -c "
import json, sys
data = json.load(sys.stdin)
models = [m['id'] for m in data.get('data', []) if 'whisper' in m['id']]
print('✅ OpenAI API key valid. Whisper models:', ', '.join(models) if models else 'whisper-1 (default)')
" 2>/dev/null || echo "❌ OpenAI API key invalid or network error"
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| OPENAI_API_KEY | no* | — | OpenAI API key for Whisper cloud transcription |
*Not required if local whisper-cpp or whisper-python is installed.
A 90-minute meeting recording costs ~$0.54 via OpenAI API.
ingest-video — uses this as its transcription backend (Tier 1: local, Tier 2: OpenAI)ingest — could route audio files (.mp3, .wav, .m4a) through this skill/transcribe recording.mp4 --vault llm-wiki-simplicity-taskforce-partnership → wiki pageASSEMBLYAI_API_KEY)..claude/skills/ingest-video/SKILL.md — full video ingestion pipeline that uses this for transcription.claude/skills/generate-podcast/SKILL.md — produces audio that could be round-trip tested via transcribedevelopment
Close the loop on a Linear ticket when its work ships - move the status and post a deploy comment with the PR link, what shipped, and a try-it link, mentioning the collaborator. Used as the tail of /ro:linear-nightshift for every merged mirror, or manually after an ad-hoc build. Triggers on "linear update", "update the linear ticket", "mark NUT-x done", "tell eoin it shipped", "/ro:linear-update".
devops
Run a night-shift against a collaborator's Linear board. Pulls the team's Grilled tickets (/ro:linear-grill moves a ticket to Grilled once its questions are answered), VERIFIES the questions were actually answered (unanswered → bounce the ticket to the "Question for <name>" state), mirrors verified tickets to ephemeral GitHub issues with ready-for-agent, then runs the standard /ro:night-shift machinery on GitHub. Tail-calls /ro:linear-update for everything that merged + deployed. Triggers on "linear nightshift", "nightshift linear", "drain the linear board", "run the shift off linear", "/ro:linear-nightshift".
development
Grill a collaborator's Linear tickets and move every processed ticket to where it belongs. Resolves the board from the repo's .ro-linear.json, reads the collaborator's Backlog / Ready-for-agent issues, then per ticket either posts 3-5 decision-extracting questions (state moves to "Question for <name>") or confirms it build-ready (state moves to "Grilled", the gate /ro:linear-nightshift consumes); shipped-and-confirmed tickets close as Done. The async-collaborator counterpart of /ro:day-shift for people who never touch GitHub. Triggers on "grill linear", "grill eoin's tickets", "linear grill", "add questions to the linear tickets", "/ro:linear-grill".
development
--- name: about-page description: Add a standard About page to any web app, what it is, the tech stack, and an FAQ, wired into a footer link with a sticky footer. Built with Spartan + Tailwind (the canonical component layer) and falls back to semantic HTML so it ships reliably. Use whenever building, polishing, or shipping an app, every app should have one. Triggers on "add an about page", "about page", "footer about link", or as a standard step in app build/polish. category: frontend argument-h