Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

NoizAI/speech-to-text

Name: speech-to-text
Author: NoizAI

skills/speech-to-text/SKILL.md

npx skillsauth add NoizAI/skills speech-to-text

speech-to-text

Transcribe any audio file to text. Supports multilingual auto-detection, timestamps, and speaker labels.

Triggers

transcribe / transcript / transcription
speech to text / STT / audio to text
what does this audio say / convert audio
转录 / 语音转文字 / 识别音频

Quick Start

# Transcribe with auto language detection
python3 skills/speech-to-text/scripts/stt.py audio.mp3

# Specify language explicitly
python3 skills/speech-to-text/scripts/stt.py interview.wav --language en

# Save transcript to file
python3 skills/speech-to-text/scripts/stt.py podcast.m4a -o transcript.txt

# Output full JSON (with timestamps and speaker labels)
python3 skills/speech-to-text/scripts/stt.py meeting.wav --json -o result.json
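The commands above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: `build_stt_command` and `transcribe_json` are hypothetical helper names, and the sketch assumes that running the script with `--json` and no `-o` prints only the JSON document to stdout.

```python
import json
import subprocess
import sys

# Script path taken from the Quick Start examples; adjust if the skill lives elsewhere.
STT_SCRIPT = "skills/speech-to-text/scripts/stt.py"


def build_stt_command(audio_path, language=None, output=None, json_output=False):
    """Return the argv list for one stt.py invocation (hypothetical helper)."""
    cmd = [sys.executable, STT_SCRIPT, audio_path]
    if language:
        cmd += ["--language", language]
    if json_output:
        cmd.append("--json")
    if output:
        cmd += ["-o", output]
    return cmd


def transcribe_json(audio_path, language=None):
    """Run stt.py with --json and parse its stdout.

    Assumes stdout contains only the JSON response. Raises
    subprocess.CalledProcessError if the script exits with a non-zero status.
    """
    cmd = build_stt_command(audio_path, language=language, json_output=True)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Building the argument list separately from running it keeps the command easy to inspect or log before any audio is uploaded.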

Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| file | required | Audio file to transcribe (mp3, wav, m4a, ogg, flac, aac, webm). Max 50 MB, max 10 min. |
| --language / -l | auto-detect | BCP-47 language code (e.g. en, zh, ja). Omit to auto-detect. |
| --output / -o | stdout | Path to save transcript text (or JSON if --json is set). |
| --json | off | Output full JSON response with timestamps and speaker labels. |
| --api-key | from env/config | Noiz API key (overrides stored key). |

Output Format

Without --json, only the transcript text is printed:

Hello, welcome to today's podcast. We have a special guest joining us...

With --json, the full structured response is printed:

{
  "language": "en",
  "transcript": "Hello, welcome to today's podcast...",
  "duration": 42.5,
  "segments": [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0}
  ]
}
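Because each segment carries `start` and `end` times in seconds, the JSON response can be turned into captions. The sketch below converts the `segments` list into SubRip (SRT) text. The function names are illustrative, not part of the skill, and the sample data is copied from the response shown above.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of {"text", "start", "end"} dicts as SRT caption blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(blocks)


# Segments from the example response in this section.
SAMPLE_SEGMENTS = [
    {"text": "Hello, welcome to today's podcast.", "start": 0.0, "end": 3.2, "spk": 0},
    {"text": "We have a special guest joining us.", "start": 3.5, "end": 6.1, "spk": 0},
]

if __name__ == "__main__":
    print(segments_to_srt(SAMPLE_SEGMENTS))
```

The `spk` field is ignored here. A caption writer that needs speaker labels could prefix each text line with it.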

Supported Languages

Common codes: en (English), zh (Chinese), ja (Japanese), ko (Korean), es (Spanish), fr (French), de (German), pt (Portuguese), ru (Russian), ar (Arabic). Omit --language to auto-detect.

Configuration

# Save your API key once
python3 skills/speech-to-text/scripts/stt.py config --set-api-key YOUR_KEY

# Or set via environment variable
export NOIZ_API_KEY=YOUR_KEY

Get your API key at developers.noiz.ai.
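A wrapper that needs the key outside the script might resolve it as sketched below. This is an illustration only: `resolve_api_key` is a hypothetical name, the key file path is the one given under "Security & data disclosure", and the lookup order is an assumption. The documentation states only that `--api-key` overrides the stored key, so checking the environment variable before the file is a guess.

```python
import os
from pathlib import Path

# Location documented under "Security & data disclosure".
KEY_FILE = Path.home() / ".config" / "noiz" / "api_key"


def resolve_api_key(cli_key=None, env=None, key_file=KEY_FILE):
    """Return the first available key, or None if no key is configured.

    Assumed order: explicit CLI value, then NOIZ_API_KEY, then the key file.
    Only "CLI overrides stored key" is documented; the rest is an assumption.
    """
    env = os.environ if env is None else env
    if cli_key:
        return cli_key
    env_key = env.get("NOIZ_API_KEY")
    if env_key:
        return env_key
    try:
        stored = Path(key_file).read_text(encoding="utf-8").strip()
    except OSError:
        # Missing or unreadable key file: treat as "no stored key".
        return None
    return stored or None
```

Passing `env` and `key_file` as parameters keeps the function testable without touching the real environment or home directory.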

Pricing

Billed at $0.0006 per second of audio. A 10-minute file costs ~$0.36. New accounts include 10,000 free TTS characters; STT is billed separately.
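The per-second rate makes cost easy to estimate before uploading. A minimal sketch of that arithmetic, using only the rate quoted above (`estimate_cost_usd` is an illustrative name, and the result ignores any free allowance or rounding the provider may apply):

```python
RATE_USD_PER_SECOND = 0.0006  # rate quoted in the Pricing section


def estimate_cost_usd(duration_seconds):
    """Estimate the STT charge for a clip of the given length in seconds."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return round(duration_seconds * RATE_USD_PER_SECOND, 4)


if __name__ == "__main__":
    # A 10-minute (600-second) file, the documented maximum length.
    print(estimate_cost_usd(600))  # 0.36
```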

Security & data disclosure

Credential storage: API key is saved to ~/.config/noiz/api_key (permissions 0600). NOIZ_API_KEY env var is also supported.
Network calls: The audio file is uploaded to https://noiz.ai/v1/speech-to-text for transcription. No data is sent until you run the command.
File limits: Max 50 MB per file, max 10 minutes (600 seconds) of audio.
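Since the file is uploaded as soon as the command runs, the documented limits can be checked locally first. The sketch below is a hypothetical pre-flight helper, not something the skill ships. It assumes "50 MB" means 50 × 1024 × 1024 bytes, and it checks duration only if the caller already knows it, because reading audio length reliably needs a format-aware library.

```python
import os

# Formats and limits as listed in the Arguments table and File limits line.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".webm"}
MAX_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 600


def preflight_problems(path, duration_seconds=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file larger than 50 MB")
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append("audio longer than 10 minutes")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues before the user retries.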

Requirements

requests package: pip install requests
Get your API key at developers.noiz.ai

Use this skill whenever the user wants to transcribe audio to text, convert speech to text, or get a transcript from an audio or video file. Triggers include: any mention of 'transcribe', 'transcription', 'speech to text', 'STT', 'convert audio to text', 'what does this audio say', 'get transcript', 'subtitle generation', or requests to extract spoken words from a file. Also use when the user wants speaker identification from audio, timestamps for captions, or multilingual transcription.

494 stars

content-media

Updated May 8, 2026

Install globally with skillsauth

npx skillsauth add NoizAI/skills speech-to-text

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

Trivy: Container and dependency vulnerability scanner. Clean, 95%.
Semgrep: Static code analysis for vulnerabilities. Clean, 95%.
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%.
Snyk (dep): Open source security scanning. Skipped, 50%.
Socket.dev: Supply chain security analysis. Skipped, 50%.
VirusTotal: Multi-engine malware detection. Skipped, 50%.
CrowdStrike: Advanced threat intelligence. Skipped, 50%.
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%.
OWASP Dep-Check. Skipped, 50%.

Last scanned: May 8, 2026, 4:02 AM · 35.8s · 2 files scanned

SKILL.md

name: speech-to-text
description: Use this skill whenever the user wants to transcribe audio to text, convert speech to text, or get a transcript from an audio or video file. Triggers include: any mention of 'transcribe', 'transcription', 'speech to text', 'STT', 'convert audio to text', 'what does this audio say', 'get transcript', 'subtitle generation', or requests to extract spoken words from a file. Also use when the user wants speaker identification from audio, timestamps for captions, or multilingual transcription.
metadata: {"openclaw": {"primaryEnv": "NOIZ_API_KEY"}}

Related Skills

NoizAI/sound-fx

tools

Verified · Trusted · Community

Use this skill whenever the user wants to generate sound effects, ambient audio, or short audio clips from a text description. Triggers include: any mention of 'sound effect', 'sfx', 'generate sound', 'make a sound', 'audio effect', 'ambient sound', 'foley', 'sound clip', 'noise', or requests to produce a specific sound (e.g. 'make a gunshot sound', 'generate thunder', 'create the sound of rain'). Also use when the user describes an action or scenario and wants the corresponding audio (e.g. 'someone getting spanked', 'a door slamming', 'cartoon boing'). Do NOT use for speech synthesis, music generation with melody/lyrics, or voice cloning.

470 · SKILL.md · Updated Apr 20, 2026

NoizAI/video-translation

testing

Verified · Trusted · Community

Translate and dub videos from one language to another, replacing the original audio with TTS while keeping the video intact.

371 · SKILL.md · Updated Mar 20, 2026

NoizAI/tts

testing

Verified · Trusted · Community

Use this skill whenever the user wants to convert text into speech, generate audio from text, or produce voiceovers. Triggers include: any mention of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', 'dubbing', or requests to turn written content into spoken audio. Also use when converting EPUB/PDF/SRT/articles to audio, cloning voices from reference audio, controlling emotion or speed in speech, aligning speech to subtitle timelines, or producing per-segment voice-mapped audio.

371 · SKILL.md · Updated Mar 20, 2026

NoizAI/template-skill

data-ai

Verified · Trusted · Community

Reusable template for authoring new Agent Skills with clear triggers, workflow, and I/O contracts.

371 · SKILL.md · Updated Mar 20, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/NoizAI/skills.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r skills/skills/speech-to-text ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

NoizAI/skills

494 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT