openrouterteam/openrouter-stt

Name: openrouter-stt
Author: openrouterteam

skills/openrouter-stt/SKILL.md

npx skillsauth add openrouterteam/skills openrouter-stt

OpenRouter Speech-to-Text

Transcribe audio via POST /api/v1/audio/transcriptions using curl. Requires OPENROUTER_API_KEY (get one at https://openrouter.ai/keys). If unset, stop and ask.

This endpoint is not OpenAI-compatible. The body is JSON with base64 audio under input_audio: { data, format } — not multipart/form-data with a file field the way OpenAI's /v1/audio/transcriptions works. Do not point the OpenAI SDK at this endpoint; it will send the wrong shape. Use curl, fetch, or requests directly.
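To make the shape difference concrete, here is a minimal Python sketch of building the JSON body described above. The helper name `build_transcription_body` is hypothetical; only the field names (`model`, `input_audio.data`, `input_audio.format`) come from this document.

```python
import base64
import json


def build_transcription_body(audio: bytes, fmt: str, model: str) -> str:
    """Return a JSON request body for POST /api/v1/audio/transcriptions.

    Hypothetical helper: the field names follow this document, not an SDK.
    """
    payload = {
        "model": model,
        "input_audio": {
            # Plain base64 of the raw bytes, with no "data:audio/...;base64," prefix.
            "data": base64.b64encode(audio).decode("ascii"),
            "format": fmt,
        },
    }
    return json.dumps(payload)
```

The result is a single JSON string that can be sent with `Content-Type: application/json`, which is what distinguishes it from a multipart upload with a file field.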

One call, JSON back

Both request and response are JSON. The response body carries:

text — the transcript.
usage — always includes cost. Providers additionally report either seconds of audio billed or a token breakdown (total_tokens, input_tokens, output_tokens), depending on how they price the request. Don't assume both are present.

Sample response (duration-priced provider, e.g. google/chirp-3):

{
  "text": "I used to rule the world.",
  "usage": {
    "seconds": 20,
    "cost": 0.005333
  }
}

Sample response (token-priced provider):

{
  "text": "Hello, this is a test of speech-to-text transcription.",
  "usage": {
    "total_tokens": 113,
    "input_tokens": 83,
    "output_tokens": 30,
    "cost": 0.000508
  }
}
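Because the two samples above differ in which usage fields they carry, client code should probably branch on what is present rather than index blindly. The following Python sketch shows one way to do that; `describe_usage` is a hypothetical name, and the only assumption about the payload is what the samples show.

```python
from typing import Any, Dict


def describe_usage(usage: Dict[str, Any]) -> str:
    """Summarize a usage object that may be duration-priced or token-priced."""
    parts = [f"cost={usage['cost']}"]  # cost is documented as always present
    if "seconds" in usage:
        # Duration-priced providers report seconds of audio billed.
        parts.append(f"seconds={usage['seconds']}")
    if "total_tokens" in usage:
        # Token-priced providers report a token breakdown instead.
        parts.append(
            "tokens={} (in={}, out={})".format(
                usage["total_tokens"],
                usage.get("input_tokens"),
                usage.get("output_tokens"),
            )
        )
    return ", ".join(parts)
```

Using `in` checks and `.get()` keeps the function from raising `KeyError` when a provider reports only one of the two shapes.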

Drop-in workflow

#!/usr/bin/env bash
set -euo pipefail

MODEL="google/chirp-3"
FORMAT="wav"                          # wav, mp3, flac, m4a, ogg, webm, aac
AUDIO="audio.wav"
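BODY=$(mktemp)
PAYLOAD=$(mktemp)
trap 'rm -f "$BODY" "$PAYLOAD"' EXIT   # clean up even if jq or curl fails under set -e

# Pipe the base64 through stdin (-R raw, -s slurp) so it stays off jq's argv too;
# passing it with --arg can hit the argument-size limit on larger clips.
base64 < "$AUDIO" | tr -d '\n' \
  | jq -Rs --arg model "$MODEL" --arg fmt "$FORMAT" \
      '{model: $model, input_audio: {data: ., format: $fmt}}' > "$PAYLOAD"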

# --data-binary @file keeps the base64 payload off argv (avoids E2BIG / ARG_MAX).
http_code=$(curl -sS -X POST https://openrouter.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  --output "$BODY" \
  -w '%{http_code}' \
  --data-binary @"$PAYLOAD")

if [[ "$http_code" != "200" ]]; then
  echo "STT failed (HTTP $http_code):" >&2
  cat "$BODY" >&2
  rm -f "$BODY" "$PAYLOAD"
  exit 1
fi

jq -r '.text' "$BODY"
rm -f "$BODY" "$PAYLOAD"

Discovering STT models

Filter the models endpoint by output modality to list transcription models.

curl -sS "https://openrouter.ai/api/v1/models?output_modalities=transcription" \
  | jq '.data[] | {id, name, pricing}'

Models are provider-namespaced — use the full slug (google/chirp-3, openai/whisper-1, openai/whisper-large-v3), not the short name.
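As a rough illustration of working with the models listing, the Python sketch below pulls the slugs out of an already-parsed response and checks whether a string looks provider-namespaced. Both function names are hypothetical, and the only response shape assumed is the `data[]` array with `id` fields implied by the jq filter above.

```python
from typing import Any, Dict, List


def list_model_ids(models_response: Dict[str, Any]) -> List[str]:
    """Return the full provider-namespaced slugs from a parsed models response."""
    return [m["id"] for m in models_response.get("data", [])]


def is_full_slug(model: str) -> bool:
    """True when the model looks like provider/name rather than a short name."""
    provider, sep, name = model.partition("/")
    return bool(provider and sep and name)
```

A check like `is_full_slug` only catches the missing-namespace mistake; it does not confirm that the model exists, so the listing remains the source of truth.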

Parameters

| Field | Required | Notes |
| -------------------- | -------- | --------------------------------------------------------------------------------------------------------- |
| model | yes | Full model slug from /api/v1/models?output_modalities=transcription. |
| input_audio.data | yes | Base64-encoded raw audio bytes. Not a data URI: just the base64 payload, with no data:audio/...;base64, prefix. |
| input_audio.format | yes | wav, mp3, flac, m4a, ogg, webm, or aac. Must match the actual bytes. Support varies by provider. |
| language | no | ISO-639-1 code (en, ja, fr). Auto-detected if omitted. |
| temperature | no | 0–1. Lower is more deterministic. |
| provider | no | Provider passthrough (see Provider-specific options). |
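A client-side pre-flight check can catch most parameter mistakes before the request is sent. The Python sketch below encodes the constraints listed above; `check_request` and `ALLOWED_FORMATS` are hypothetical names, and the checks are an approximation of the documented rules, not the server's actual validation.

```python
from typing import Any, Dict, List

# Formats listed in this document; actual support varies by provider.
ALLOWED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}


def check_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means nothing obvious is wrong."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or "/" not in model:
        problems.append("model: expected a full slug such as google/chirp-3")
    audio = body.get("input_audio")
    if not isinstance(audio, dict):
        audio = {}
    data = audio.get("data")
    if not isinstance(data, str) or not data:
        problems.append("input_audio.data: required base64 string")
    elif data.startswith("data:"):
        problems.append("input_audio.data: strip the data URI prefix")
    if audio.get("format") not in ALLOWED_FORMATS:
        problems.append("input_audio.format: must be one of " + ", ".join(sorted(ALLOWED_FORMATS)))
    temperature = body.get("temperature")
    if temperature is not None:
        if not isinstance(temperature, (int, float)) or not (0 <= temperature <= 1):
            problems.append("temperature: expected a number between 0 and 1")
    return problems
```

An empty list only means the body passes these local checks; a provider may still reject a format it does not support.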

Picking an audio format

wav / flac — uncompressed or lossless. Highest quality; largest uploads.
mp3 / m4a / aac — compressed. Smaller payloads, which matters because base64 inflates bytes by ~33% on top of whatever the file already weighs.
webm / ogg — typical for browser recordings (MediaRecorder).
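The ~33% inflation follows from base64 emitting 4 output characters for every 3 input bytes. A small Python sketch of the arithmetic, with the hypothetical helper name `base64_length`, can help estimate payload size before encoding:

```python
import base64


def base64_length(raw_bytes: int) -> int:
    """Length in characters of padded base64 output for raw_bytes input bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


if __name__ == "__main__":
    sample = b"\x00" * 1000
    # The estimate should agree with what the standard library produces.
    print(base64_length(len(sample)), len(base64.b64encode(sample)))
```

The JSON envelope adds a little more on top of this, so the estimate is a lower bound on the request size.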

The format field must match the actual container/codec of the bytes. A file saved as .wav that is actually mp3 will be rejected or mis-decoded. When in doubt, confirm with ffprobe <file>.
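When ffprobe is not available, a quick look at the first bytes of the file can catch the most common mismatches. The Python sketch below is a best-effort heuristic based on well-known magic numbers, not a substitute for ffprobe; `sniff_audio_format` is a hypothetical name, and it reports the container only (for example, it cannot tell what codec sits inside an Ogg or WebM file).

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Best-effort guess of the format from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"\x1a\x45\xdf\xa3":  # EBML magic, shared by WebM and Matroska
        return "webm"
    if header[4:8] == b"ftyp":  # ISO base media file (m4a, mp4, and relatives)
        return "m4a"
    if header[:3] == b"ID3":  # ID3v2 tag, most commonly in front of MP3 data
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF:
        if (header[1] & 0xF6) == 0xF0:  # ADTS sync word with layer bits 00
            return "aac"
        if (header[1] & 0xE0) == 0xE0:  # MPEG audio frame sync
            return "mp3"
    return None
```

Reading the first 12 bytes of the file is enough for every check here; a `None` result means the heuristic could not decide, not that the file is invalid.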

Provider-specific options

Provider passthrough goes under provider.options.<slug> and is only forwarded when that provider handles the request. Example — Groq's prompt for vocabulary hinting:

{
  "model": "openai/whisper-large-v3",
  "input_audio": { "data": "UklGRiQA...", "format": "wav" },
  "provider": {
    "options": {
      "groq": {
        "prompt": "Expected vocabulary: OpenRouter, API, transcription"
      }
    }
  }
}

Options keyed by provider slug are forwarded only when that provider matches; other keys are ignored. Check each provider's upstream docs for available passthrough keys.
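For callers that build the body programmatically, the nesting under provider.options.<slug> can be sketched as follows. `with_provider_options` is a hypothetical helper; the key layout is taken from the JSON example above, and which option keys a given provider accepts is not something this sketch knows.

```python
import copy
from typing import Any, Dict


def with_provider_options(body: Dict[str, Any], slug: str, options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of body with options nested under provider.options.<slug>."""
    updated = copy.deepcopy(body)  # leave the caller's dict untouched
    provider = updated.setdefault("provider", {})
    provider.setdefault("options", {})[slug] = dict(options)
    return updated
```

Calling it more than once with different slugs accumulates entries side by side, which matches the statement that each entry is forwarded only when its provider handles the request.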

TypeScript (fetch)

import fs from "fs";

const audio = await fs.promises.readFile("audio.wav");
const data = audio.toString("base64");

const res = await fetch("https://openrouter.ai/api/v1/audio/transcriptions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/chirp-3",
    input_audio: { data, format: "wav" },
  }),
});

if (!res.ok) {
  throw new Error(`STT failed (HTTP ${res.status}): ${await res.text()}`);
}

const result = await res.json();
console.log(result.text);

Python (requests)

import base64
import os
import requests

with open("audio.wav", "rb") as f:
    data = base64.b64encode(f.read()).decode("utf-8")

res = requests.post(
    "https://openrouter.ai/api/v1/audio/transcriptions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "google/chirp-3",
        "input_audio": {"data": data, "format": "wav"},
    },
)

if not res.ok:
    raise RuntimeError(f"STT failed (HTTP {res.status_code}): {res.text}")

print(res.json()["text"])

Troubleshooting

Garbled or empty text — the format field probably doesn't match the actual bytes, or the audio is silent/corrupted. Confirm with ffprobe audio.wav.

400 with "Invalid base64" or silent failure — data must be just base64, not a data URI (data:audio/wav;base64,...). Strip the prefix if you copied it from a browser FileReader.

400 with a ZodError — a required field is missing or the wrong type. The body looks like {"success":false,"error":{"name":"ZodError","message":"[...]"}} — the nested message JSON string names the bad path (commonly input_audio.data or input_audio.format).

413 / request too large — base64 inflates bytes by ~33%, so a large raw file becomes an even larger JSON payload. Use a smaller source file (compressed format, lower sample rate, or trimmed clip).

Model not found — use the full slug from /api/v1/models?output_modalities=transcription (google/chirp-3, not chirp-3).
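Two of the fixes above are easy to automate. The Python sketch below strips a data URI prefix and pulls the failing field paths out of a ZodError body. Both function names are hypothetical, and the assumption that each entry in the nested message array carries a `path` list reflects Zod's usual issue format rather than anything this document guarantees.

```python
import json
from typing import List


def strip_data_uri(data: str) -> str:
    """Drop a leading data:<mime>;base64, prefix if one is present."""
    if data.startswith("data:"):
        _, sep, rest = data.partition(",")
        if sep:
            return rest
    return data


def zod_issue_paths(error_body: str) -> List[str]:
    """Extract dotted field paths from a ZodError response body."""
    parsed = json.loads(error_body)
    error = parsed.get("error") or {}
    if error.get("name") != "ZodError":
        return []
    # The message field is itself a JSON string holding a list of issues.
    issues = json.loads(error.get("message", "[]"))
    return [".".join(str(part) for part in issue.get("path", [])) for issue in issues]
```

An empty list from `zod_issue_paths` means the body was not a ZodError, in which case the raw response text is the better thing to show the user.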

Transcribe speech to text using OpenRouter's speech-to-text API. Use when the user asks to transcribe audio, convert speech to text, extract a transcript from a recording or meeting, caption a video's audio, or mentions STT, speech-to-text, ASR, or transcription.

129 stars · development · Updated May 8, 2026

npx skillsauth add openrouterteam/skills openrouter-stt

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percent |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 8, 2026, 5:31 AM (184.4 s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/openrouterteam/skills.git

# Copy into the Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r skills/skills/openrouter-stt ~/.claude/skills/

Repository: openrouterteam/skills (129 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT