ace-step/acestep-lyrics-transcription

Name: acestep-lyrics-transcription
Author: ace-step

skills/acestep-lyrics-transcription/SKILL.md

npx skillsauth add ace-step/ace-step-skills acestep-lyrics-transcription

Lyrics Transcription Skill

Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via OpenAI Whisper or ElevenLabs Scribe API.

API Key Setup Guide

Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key

This command only reports whether the active provider's API key is set or empty — it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use config --get on key fields or read config.json directly. The config --list command is safe — it automatically masks API keys as *** in output.
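The masking behaviour described above can be pictured with a small sketch. This is not the skill's actual implementation: the nested config layout and the `mask_api_keys` helper are assumptions made for illustration, as is the choice to leave empty keys visible as empty.

```python
import copy


def mask_api_keys(config: dict) -> dict:
    """Return a deep copy of config with every non-empty api_key shown as '***'.

    Hypothetical helper: the nested layout of config.json is assumed here.
    """
    masked = copy.deepcopy(config)

    def _walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "api_key" and isinstance(value, str) and value:
                    node[key] = "***"  # never expose the real value
                else:
                    _walk(value)

    _walk(masked)
    return masked


example = {
    "provider": "openai",
    "openai": {"api_key": "EXAMPLE_KEY", "model": "whisper-1"},
    "elevenlabs": {"api_key": ""},
}
print(mask_api_keys(example))
```

The original dictionary is left untouched, so the masked copy is the only thing that would ever be printed.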

If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.

Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:

1. Tell the user which provider is currently active (openai or elevenlabs) and that its API key is not configured. Explain that transcription cannot proceed without it.
2. Provide clear instructions on where to obtain a key:
   - OpenAI: Get an API key at https://platform.openai.com/api-keys (requires an OpenAI account with billing enabled). The Whisper API costs ~$0.006/min.
   - ElevenLabs: Get an API key at https://elevenlabs.io/app/settings/api-keys (requires an ElevenLabs account). The free tier includes limited credits.
3. Also offer the option to switch to the other provider if they already have a key for it.

Once the user provides the key, configure it using:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>

If the user wants to switch providers, also run:

cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>

After configuring, re-run config --check-key to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.
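The order of commands in this setup flow can be sketched as follows. The helper name `setup_commands` is hypothetical; only the `config --set` and `config --check-key` subcommands come from the text above, and the sketch merely builds the argument lists without running anything.

```python
from typing import List

SCRIPT = "./scripts/acestep-lyrics-transcription.sh"
VALID_PROVIDERS = ("openai", "elevenlabs")


def setup_commands(provider: str, api_key: str, switch_provider: bool = False) -> List[List[str]]:
    """Build the commands for storing a key, optionally switching provider, then re-checking."""
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not api_key:
        raise ValueError("api_key must not be empty")

    commands = [["bash", SCRIPT, "config", "--set", f"{provider}.api_key", api_key]]
    if switch_provider:
        commands.append(["bash", SCRIPT, "config", "--set", "provider", provider])
    # Always finish by verifying that the key is now set.
    commands.append(["bash", SCRIPT, "config", "--check-key"])
    return commands


for cmd in setup_commands("elevenlabs", "EXAMPLE_KEY", switch_provider=True):
    print(cmd[:5])  # print without the key argument
```

Building argument lists rather than one shell string avoids quoting problems if a key contains special characters.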

Quick Start

# 1. cd to this skill's directory
cd {project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/

# 2. Configure API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh

# 4. Output saved to: {project_root}/acestep_output/<filename>.lrc

Prerequisites

- curl, jq, python3 (or python)
- An API key for OpenAI or ElevenLabs

Script Usage

./scripts/acestep-lyrics-transcription.sh transcribe --audio <file> [options]

Options:
  -a, --audio      Audio file path (required)
  -l, --language   Language code (zh, en, ja, etc.)
  -f, --format     Output format: lrc, srt, json (default: lrc)
  -p, --provider   API provider: openai, elevenlabs (overrides config)
  -o, --output     Output file path (default: acestep_output/<filename>.lrc)
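As a rough illustration of the default output location, the sketch below derives `acestep_output/<filename>.<format>` from an audio path. It assumes that `<filename>` means the audio file name without its extension and that the extension follows the chosen format; the source only states the `.lrc` default, so both points are assumptions, and `default_output_path` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("lrc", "srt", "json")


def default_output_path(audio_path: str, fmt: str = "lrc", project_root: str = ".") -> Path:
    """Guess the default output path for a given audio file (assumed naming rule)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    stem = Path(audio_path).stem  # "song.mp3" -> "song"
    return Path(project_root) / "acestep_output" / f"{stem}.{fmt}"


print(default_output_path("/path/to/song.mp3").as_posix())
```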

Post-Transcription Lyrics Correction (MANDATORY)

CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics:

- Proper nouns: "ACE-Step" → "AC step", "Spotify" → "spot a fly"
- Similar-sounding words: "arrives" → "eyes", "open source" → "open sores"
- Merged/split words: "lighting up" → "lightin' nup"

Correction Workflow

1. Read the transcribed LRC file using the Read tool.
2. Read the original lyrics from the ACE-Step output JSON file.
3. Use the original lyrics as a whole reference: do NOT attempt line-by-line alignment, because transcription often splits, merges, or reorders lines differently from the original. Instead, read the original lyrics in full to understand the correct wording, then scan each LRC line and fix any misrecognized words based on your knowledge of what the original lyrics say.
4. Fix transcription errors: replace misrecognized words with the correct original words, keeping the timestamps intact.
5. Write the corrected LRC back using the Write tool.

What to Correct

- Replace misrecognized words with their correct original versions.
- Keep all [MM:SS.cc] timestamps exactly as-is (timestamps from transcription are accurate).
- Do NOT add structure tags like [Verse] or [Chorus]; the LRC should only have timestamped text lines.
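One way to sanity-check a corrected file against these rules is sketched below. It is a hypothetical check, not part of the skill: it assumes every non-empty line has the form `[MM:SS.cc]text`, rejects anything else (including structure tags), and compares the timestamp sequence before and after correction.

```python
import re
from typing import List

# A timestamped LRC line: [MM:SS.cc] followed by the lyric text.
LRC_LINE = re.compile(r"^\[(\d{2}:\d{2}\.\d{2})\](.*)$")


def timestamps(lrc_text: str) -> List[str]:
    """Return the timestamps in order; raise ValueError on any other non-empty line."""
    stamps = []
    for line in lrc_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LRC_LINE.match(line)
        if match is None:
            raise ValueError(f"not a timestamped LRC line: {line!r}")
        stamps.append(match.group(1))
    return stamps


def timestamps_preserved(transcribed: str, corrected: str) -> bool:
    """True when correction changed only the text, never the timestamps."""
    return timestamps(transcribed) == timestamps(corrected)


before = "[00:46.96]AC step alive,\n[00:50.80]one point five eyes."
after = "[00:46.96]ACE-Step alive,\n[00:50.80]One point five arrives."
print(timestamps_preserved(before, after))
```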

Example

Transcribed (wrong):

[00:46.96]AC step alive,
[00:50.80]one point five eyes.

Original lyrics reference:

ACE-Step alive
One point five arrives

Corrected (right):

[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
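The example above can be reproduced with a minimal sketch that touches only the text after each timestamp. The `fixes` mapping is hypothetical: in the workflow described here, the wording is decided by reading the original lyrics in full, not by a fixed lookup table.

```python
import re

# Capture the timestamp tag and the lyric text separately.
LRC_LINE = re.compile(r"^(\[\d{2}:\d{2}\.\d{2}\])(.*)$")


def apply_corrections(lrc_text: str, fixes: dict) -> str:
    """Replace misrecognized phrases in the lyric text, leaving timestamps untouched."""
    corrected = []
    for line in lrc_text.splitlines():
        match = LRC_LINE.match(line)
        if match is None:
            corrected.append(line)  # leave unexpected lines as they are
            continue
        stamp, text = match.groups()
        for wrong, right in fixes.items():
            text = text.replace(wrong, right)
        corrected.append(stamp + text)
    return "\n".join(corrected)


transcribed = "[00:46.96]AC step alive,\n[00:50.80]one point five eyes."
fixes = {"AC step": "ACE-Step", "one point five eyes": "One point five arrives"}
print(apply_corrections(transcribed, fixes))
```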

Configuration

Config file: scripts/config.json

# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs

# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...

# View config
./scripts/acestep-lyrics-transcription.sh config --list

| Option | Default | Description |
|--------|---------|-------------|
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output: lrc, srt, or json |
| openai.api_key | "" | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 for word timestamps) |
| elevenlabs.api_key | "" | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |
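The dotted option names suggest a nested JSON layout, with `openai.api_key` living under an `openai` object. That layout is an assumption, since the contents of config.json are not shown here. Under that assumption, reading and writing a dotted key might look like this (`get_option` and `set_option` are hypothetical helpers):

```python
# Defaults taken from the option table; the nesting itself is assumed.
DEFAULTS = {
    "provider": "openai",
    "output_format": "lrc",
    "openai": {"api_key": "", "api_url": "https://api.openai.com/v1", "model": "whisper-1"},
    "elevenlabs": {"api_key": "", "api_url": "https://api.elevenlabs.io/v1", "model": "scribe_v2"},
}


def set_option(config: dict, dotted_key: str, value: str) -> None:
    """Set config['a']['b'] for a key written as 'a.b', creating objects as needed."""
    parts = dotted_key.split(".")
    node = config
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def get_option(config: dict, dotted_key: str, default=None):
    """Look up a dotted key, returning default when any part is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = {}
set_option(config, "provider", "elevenlabs")
print(get_option(config, "provider"), get_option(DEFAULTS, "elevenlabs.model"))
```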

Provider Notes

| Provider | Model | Word Timestamps | Pricing |
|----------|-------|-----------------|---------|
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |

- OpenAI whisper-1 is the only OpenAI model supporting word-level timestamps
- ElevenLabs scribe_v2 returns word-level timestamps with type filtering
- Both support multilingual transcription
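For a rough sense of the OpenAI pricing quoted here, the sketch below multiplies a track's duration by the approximate $0.006/min rate. It is only an estimate: how the provider rounds billable time is not covered here, and `estimate_openai_cost` is a hypothetical helper.

```python
OPENAI_USD_PER_MINUTE = 0.006  # approximate rate quoted in this document


def estimate_openai_cost(duration_seconds: float) -> float:
    """Estimate the transcription cost in USD for one audio file."""
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    return round(duration_seconds / 60 * OPENAI_USD_PER_MINUTE, 4)


# A 3.5-minute song (210 seconds) costs about two cents at this rate.
print(estimate_openai_cost(210))
```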

Examples

# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3

# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh

# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt

# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
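The `--format` option switches between LRC and SRT, which write the same moment in time differently: LRC uses `[MM:SS.cc]` tags, while SRT uses `HH:MM:SS,mmm`. The sketch below shows the two conversions from seconds. It illustrates the formats only and is not taken from the skill's script; both function names are hypothetical.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format seconds as an LRC tag, [MM:SS.cc], with centisecond precision."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT time, HH:MM:SS,mmm, with millisecond precision."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


print(lrc_timestamp(46.96), srt_timestamp(46.96))
```

Converting to whole centiseconds or milliseconds before splitting avoids floating-point drift in the printed digits.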

Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.

43 stars

development

Updated Mar 27, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add ace-step/ace-step-skills acestep-lyrics-transcription

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | What it checks | Status | Percentage |
|---------|----------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Mar 29, 2026, 11:19 PM (36.8s, 3 files scanned)

SKILL.md

name: acestep-lyrics-transcription
description: Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.
allowed-tools: Read, Write, Bash


Install manually

# Clone the repo
git clone https://github.com/ace-step/ace-step-skills.git

# Copy into Claude Code skills folder (global)
cp -r ace-step-skills/skills/acestep-lyrics-transcription ~/.claude/skills/

Repository: ace-step/ace-step-skills (43 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT