Transcribe speech to text using OpenRouter's speech-to-text API. Use when the user asks to transcribe audio, convert speech to text, extract a transcript from a recording or meeting, caption a video's audio, or mentions STT, speech-to-text, ASR, or transcription.
Transcribe audio via POST /api/v1/audio/transcriptions using curl. Requires OPENROUTER_API_KEY (get one at https://openrouter.ai/keys). If unset, stop and ask.
This endpoint is not OpenAI-compatible. The body is JSON with base64 audio under input_audio: { data, format } — not multipart/form-data with a file field the way OpenAI's /v1/audio/transcriptions works. Do not point the OpenAI SDK at this endpoint; it will send the wrong shape. Use curl, fetch, or requests directly.
Both request and response are JSON. The response body carries:

- text: the transcript.
- usage: always includes cost. Depending on how they price the request, providers additionally report either seconds (audio seconds billed) or a token breakdown (total_tokens, input_tokens, output_tokens). Don't assume both are present.

Sample response (duration-priced provider, e.g. google/chirp-3):
{
"text": "I used to rule the world.",
"usage": {
"seconds": 20,
"cost": 0.005333
}
}
Sample response (token-priced provider):
{
"text": "Hello, this is a test of speech-to-text transcription.",
"usage": {
"total_tokens": 113,
"input_tokens": 83,
"output_tokens": 30,
"cost": 0.000508
}
}
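The billing fields differ by provider, so code that reads `usage` should branch on which keys are present. Below is a minimal Python sketch. The helper name `describe_usage` is hypothetical, and it relies only on the field names shown in the two samples above.

```python
from typing import Any, Mapping


def describe_usage(usage: Mapping[str, Any]) -> str:
    """Summarize an STT `usage` object without assuming which billing fields exist."""
    # `cost` is documented as always present.
    parts = [f"cost=${usage['cost']}"]
    # Duration-priced providers report billed audio seconds.
    if "seconds" in usage:
        parts.append(f"seconds={usage['seconds']}")
    # Token-priced providers report a token breakdown instead.
    if "total_tokens" in usage:
        parts.append(
            f"tokens={usage['total_tokens']} "
            f"(in={usage.get('input_tokens', '?')}, out={usage.get('output_tokens', '?')})"
        )
    return ", ".join(parts)
```

Consider the duration-priced sample, `describe_usage({"seconds": 20, "cost": 0.005333})`. It yields `cost=$0.005333, seconds=20`, and no token fields are ever read.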
#!/usr/bin/env bash
set -euo pipefail

: "${OPENROUTER_API_KEY:?OPENROUTER_API_KEY is not set (get one at https://openrouter.ai/keys)}"

MODEL="google/chirp-3"
FORMAT="wav"   # wav, mp3, flac, m4a, ogg, webm, aac; must match the actual bytes
AUDIO="audio.wav"

BODY=$(mktemp)
PAYLOAD=$(mktemp)
# Remove the temp files on every exit path, including errors.
trap 'rm -f "$BODY" "$PAYLOAD"' EXIT

# Stream the base64 into jq on stdin (-R raw input, -s slurp into one string)
# so the payload never lands on jq's argv (avoids E2BIG / ARG_MAX).
base64 < "$AUDIO" | tr -d '\n' \
  | jq -Rs --arg model "$MODEL" --arg fmt "$FORMAT" \
      '{model: $model, input_audio: {data: ., format: $fmt}}' > "$PAYLOAD"

# --data-binary @file likewise keeps the payload off curl's argv.
http_code=$(curl -sS -X POST https://openrouter.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  --output "$BODY" \
  -w '%{http_code}' \
  --data-binary @"$PAYLOAD")

if [[ "$http_code" != "200" ]]; then
  echo "STT failed (HTTP $http_code):" >&2
  cat "$BODY" >&2
  exit 1
fi

jq -r '.text' "$BODY"
Filter the models endpoint by output modality to list transcription models.
curl -sS "https://openrouter.ai/api/v1/models?output_modalities=transcription" \
| jq '.data[] | {id, name, pricing}'
Models are provider-namespaced — use the full slug (google/chirp-3, openai/whisper-1, openai/whisper-large-v3), not the short name.
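A client can map a short name to its full slug before sending, which avoids the "model not found" error. Below is a minimal Python sketch with some assumptions:

- `resolve_slug` and `fetch_transcription_model_ids` are hypothetical helper names.
- The models response is assumed to have the `data[].id` shape used by the jq filter above.
- The fetch assumes the listing needs no API key, since the curl above sends none.

```python
import json
import urllib.request
from typing import Iterable, List

MODELS_URL = "https://openrouter.ai/api/v1/models?output_modalities=transcription"


def fetch_transcription_model_ids(timeout: float = 30.0) -> List[str]:
    """Return the full slugs of all transcription models (network call)."""
    with urllib.request.urlopen(MODELS_URL, timeout=timeout) as resp:
        return [m["id"] for m in json.load(resp)["data"]]


def resolve_slug(name: str, known_ids: Iterable[str]) -> str:
    """Return the full provider/model slug for `name`, or raise if unknown or ambiguous."""
    ids = list(known_ids)
    if name in ids:
        return name  # already a full slug
    # Match on the part after the provider namespace, e.g. "chirp-3" -> "google/chirp-3".
    matches = [i for i in ids if i.split("/", 1)[-1] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown transcription model: {name!r}")
    raise ValueError(f"ambiguous model name {name!r}: {', '.join(matches)}")
```

Raising on ambiguity is deliberate: if two providers expose the same short name, silently picking one would change which provider bills the request.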
| Field | Required | Notes |
| -------------------- | -------- | --------------------------------------------------------------------------------------------------------- |
| model | yes | Full model slug from /api/v1/models?output_modalities=transcription. |
| input_audio.data | yes | Base64-encoded raw audio bytes. Not a data URI — just the base64 payload, no data:audio/...;base64, prefix. |
| input_audio.format | yes | wav, mp3, flac, m4a, ogg, webm, or aac. Must match the actual bytes. Support varies by provider. |
| language | no | ISO-639-1 code (en, ja, fr). Auto-detected if omitted. |
| temperature | no | 0–1. Lower is more deterministic. |
| provider | no | Provider passthrough — see below. |
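The table maps directly onto a request builder. Below is a minimal Python sketch. The name `build_payload` and its validation rules are illustrative assumptions drawn from the table, not an official client.

- It base64-encodes raw bytes, so `data` never carries a data-URI prefix.
- It leaves per-provider format support to the server.

```python
import base64
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}


def build_payload(model: str, audio: bytes, fmt: str,
                  language: Optional[str] = None,
                  temperature: Optional[float] = None) -> Dict[str, Any]:
    """Build a JSON body for POST /api/v1/audio/transcriptions from raw audio bytes."""
    if "/" not in model:
        raise ValueError("use the full provider/model slug, e.g. google/chirp-3")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}")
    body: Dict[str, Any] = {
        "model": model,
        # Plain base64 of the raw bytes: no "data:audio/...;base64," prefix.
        "input_audio": {"data": base64.b64encode(audio).decode("ascii"), "format": fmt},
    }
    # Optional fields are only sent when explicitly set.
    if language is not None:
        body["language"] = language  # ISO-639-1 code, e.g. "en"
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        body["temperature"] = temperature
    return body
```

The result can be passed straight to `requests.post(..., json=build_payload(...))` in place of the hand-written dict.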
- wav / flac: uncompressed or lossless. Highest quality; largest uploads.
- mp3 / m4a / aac: compressed. Smaller payloads, which matters because base64 inflates bytes by ~33% on top of whatever the file already weighs.
- webm / ogg: typical for browser recordings (MediaRecorder).

The format field must match the actual container/codec of the bytes. A file saved as .wav that is actually mp3 will be rejected or mis-decoded. When in doubt, confirm with ffprobe <file>.
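A cheap pre-flight check can catch an obvious extension/content mismatch and estimate the encoded size before uploading. Below is a minimal Python sketch with some assumptions:

- `sniff_audio_format` and `base64_size` are hypothetical helpers.
- The magic-byte signatures are the common ones for each container.
- A `None` or surprising result is a cue to run ffprobe, not a verdict.

```python
from typing import Optional


def sniff_audio_format(head: bytes) -> Optional[str]:
    """Best-effort container guess from a file's first ~12 bytes (not a substitute for ffprobe)."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"  # EBML header; Matroska (.mkv) files start the same way
    if head[4:8] == b"ftyp":
        return "m4a"  # ISO BMFF; could also be an .mp4 video container
    if head[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, usually followed by MP3 frames
    if len(head) >= 2 and head[0] == 0xFF:
        if (head[1] & 0xF6) == 0xF0:
            return "aac"  # ADTS sync word with layer bits 00
        if (head[1] & 0xE0) == 0xE0:
            return "mp3"  # MPEG audio frame sync
    return None


def base64_size(raw_size: int) -> int:
    """Length of the base64 text for `raw_size` input bytes (4 chars per 3 bytes, padded)."""
    return 4 * ((raw_size + 2) // 3)
```

For example, `base64_size(30_000_000)` is `40_000_000`, so a 30 MB WAV becomes 40 MB of base64 before the JSON wrapper is added.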
Provider passthrough goes under provider.options.<slug> and is only forwarded when that provider handles the request. Example — Groq's prompt for vocabulary hinting:
{
"model": "openai/whisper-large-v3",
"input_audio": { "data": "UklGRiQA...", "format": "wav" },
"provider": {
"options": {
"groq": {
"prompt": "Expected vocabulary: OpenRouter, API, transcription"
}
}
}
}
Options keyed under any other provider's slug are ignored. Check each provider's upstream docs for the passthrough keys it accepts.
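Provider passthrough sits alongside the normal fields, so one way to keep request-building code tidy is to merge it in as a last step. Below is a minimal sketch. `with_provider_options` is a hypothetical helper, and the only structure it assumes is the provider.options.<slug> nesting described above.

```python
import copy
from typing import Any, Dict, Mapping


def with_provider_options(payload: Mapping[str, Any], slug: str,
                          options: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `payload` with `options` merged under provider.options.<slug>."""
    out = copy.deepcopy(dict(payload))  # never mutate the caller's dict
    provider = out.setdefault("provider", {})
    provider.setdefault("options", {}).setdefault(slug, {}).update(options)
    return out
```

Calling it once per provider accumulates options for several providers in the same request. Only the block matching the provider that actually serves the request is used.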
import fs from "fs";
const audio = await fs.promises.readFile("audio.wav");
const data = audio.toString("base64");
const res = await fetch("https://openrouter.ai/api/v1/audio/transcriptions", {
method: "POST",
headers: {
Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "google/chirp-3",
input_audio: { data, format: "wav" },
}),
});
if (!res.ok) {
throw new Error(`STT failed (HTTP ${res.status}): ${await res.text()}`);
}
const result = await res.json();
console.log(result.text);
import base64
import os
import requests
with open("audio.wav", "rb") as f:
data = base64.b64encode(f.read()).decode("utf-8")
res = requests.post(
"https://openrouter.ai/api/v1/audio/transcriptions",
headers={
"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
"Content-Type": "application/json",
},
json={
"model": "google/chirp-3",
"input_audio": {"data": data, "format": "wav"},
},
)
if not res.ok:
raise RuntimeError(f"STT failed (HTTP {res.status_code}): {res.text}")
print(res.json()["text"])
Garbled or empty text — the format field probably doesn't match the actual bytes, or the audio is silent/corrupted. Confirm with ffprobe audio.wav.
400 with "Invalid base64" or silent failure — data must be just base64, not a data URI (data:audio/wav;base64,...). Strip the prefix if you copied it from a browser FileReader.
400 with a ZodError — a required field is missing or the wrong type. The body looks like {"success":false,"error":{"name":"ZodError","message":"[...]"}} — the nested message JSON string names the bad path (commonly input_audio.data or input_audio.format).
413 / request too large — base64 inflates bytes by ~33%, so a large raw file becomes an even larger JSON payload. Use a smaller source file (compressed format, lower sample rate, or trimmed clip).
Model not found — use the full slug from /api/v1/models?output_modalities=transcription (google/chirp-3, not chirp-3).
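Two of these failures can be handled in code. Below is a minimal Python sketch with hypothetical helper names:

- `strip_data_uri` removes a FileReader-style prefix.
- `zod_error_paths` pulls the failing field paths out of a ZodError body. It assumes the nested message string is a JSON array of Zod issues, each with a `path` array. That is Zod's usual issue format, not something this endpoint documents.

```python
import json
from typing import List


def strip_data_uri(b64: str) -> str:
    """Drop a leading 'data:...;base64,' prefix, leaving only the base64 payload."""
    if b64.startswith("data:"):
        _, sep, rest = b64.partition(",")
        if sep:
            return rest
    return b64


def zod_error_paths(body_text: str) -> List[str]:
    """Return dotted field paths from a 400 ZodError body, or [] for any other body."""
    try:
        error = json.loads(body_text).get("error")
    except (json.JSONDecodeError, AttributeError):
        return []
    if not isinstance(error, dict) or error.get("name") != "ZodError":
        return []
    try:
        # The message field is itself a JSON-encoded string.
        issues = json.loads(error.get("message", ""))
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(issues, list):
        return []
    # Assumed issue shape: {"path": ["input_audio", "data"], "message": "..."}.
    return [".".join(str(p) for p in issue.get("path", []))
            for issue in issues if isinstance(issue, dict)]
```

Logging `zod_error_paths(res.text)` next to the HTTP status usually points straight at the missing or mistyped field, such as `input_audio.format`.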