skills/acestep-lyrics-transcription/SKILL.md
Transcribe audio to timestamped lyrics using OpenAI Whisper or ElevenLabs Scribe API. Outputs LRC, SRT, or JSON with word-level timestamps. Use when users want to transcribe songs, generate LRC files, or extract lyrics with timestamps from audio.
npx skillsauth add ace-step/ace-step-skills acestep-lyrics-transcription
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Transcribe audio files to timestamped lyrics (LRC/SRT/JSON) via OpenAI Whisper or ElevenLabs Scribe API.
Before transcribing, you MUST check whether the user's API key is configured. Run the following command to check:
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --check-key
This command only reports whether the active provider's API key is set or empty; it does NOT print the actual key value. NEVER read or display the user's API key content. Do not use `config --get` on key fields or read `config.json` directly. The `config --list` command is safe: it automatically masks API keys as `***` in output.
If the command reports the key is empty, you MUST stop and guide the user to configure it before proceeding. Do NOT attempt transcription without a valid key — it will fail.
Use AskUserQuestion to ask the user to provide their API key, with the following options and guidance:
Set the key for the chosen provider:
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set <provider>.api_key <KEY>
Switch the active provider if needed:
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/" && bash ./scripts/acestep-lyrics-transcription.sh config --set provider <provider_name>
Then run `config --check-key` again to verify the key is set before proceeding.

If the API key is already configured, proceed directly to transcription without asking.
# 1. cd to this skill's directory
cd "{project_root}/{.claude or .codex}/skills/acestep-lyrics-transcription/"
# 2. Configure API key (choose one)
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
# or
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs
# 3. Transcribe
./scripts/acestep-lyrics-transcription.sh transcribe --audio /path/to/song.mp3 --language zh
# 4. Output saved to: {project_root}/acestep_output/<filename>.lrc
./scripts/acestep-lyrics-transcription.sh transcribe --audio <file> [options]
Options:
-a, --audio Audio file path (required)
-l, --language Language code (zh, en, ja, etc.)
-f, --format Output format: lrc, srt, json (default: lrc)
-p, --provider API provider: openai, elevenlabs (overrides config)
-o, --output Output file path (default: acestep_output/<filename>.lrc)
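To make the default for `--output` concrete, here is a minimal sketch of how such a path could be derived from the audio file name. The helper `default_output_path` is hypothetical and is not part of the skill's script. It assumes that `<filename>` means the audio file's base name without its extension, and that the extension follows the chosen format; the options above only confirm the `.lrc` case.

```shell
# Hypothetical helper (not part of the skill's script).
# Assumption: <filename> is the audio base name without its extension.
# Assumption: the output extension matches the chosen format.
default_output_path() {
  local audio="$1" format="${2:-lrc}"
  local base
  base="$(basename -- "$audio")"   # strip the directory part
  base="${base%.*}"                # strip the last extension, if any
  printf 'acestep_output/%s.%s\n' "$base" "$format"
}

default_output_path /path/to/song.mp3    # prints acestep_output/song.lrc
default_output_path "my song.flac" srt   # prints acestep_output/my song.srt
```

Passing `--output` explicitly avoids relying on this default.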
CRITICAL: After transcription, you MUST manually correct the LRC file before using it for MV rendering. Transcription models frequently produce errors on sung lyrics, such as misheard proper nouns and similar-sounding words. When correcting:
- Keep the `[MM:SS.cc]` timestamps exactly as-is (timestamps from transcription are accurate).
- Do not add structure tags such as `[Verse]` or `[Chorus]`; the LRC should only have timestamped text lines.

Transcribed (wrong):
[00:46.96]AC step alive,
[00:50.80]one point five eyes.
Original lyrics reference:
ACE-Step alive
One point five arrives
Corrected (right):
[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
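The two correction rules (timestamps unchanged, only timestamped text lines) can be checked mechanically after editing. The sketch below is one possible way to do that. The function names `lrc_timestamps` and `check_corrected_lrc` are hypothetical, and it assumes every lyric line starts with a two-decimal `[MM:SS.cc]` timestamp as in the example above.

```shell
# Hypothetical helpers (not part of the skill's script).
# Assumption: each lyric line starts with a [MM:SS.cc] timestamp.

# Print the leading timestamp of every timestamped line, one per line.
lrc_timestamps() {
  grep -oE '^\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]' "$1"
}

# Succeed only if the corrected file keeps the same timestamps in the same
# order and has no non-blank line without a timestamp (e.g. [Verse]).
check_corrected_lrc() {
  local transcribed="$1" corrected="$2"
  if [ "$(lrc_timestamps "$transcribed")" != "$(lrc_timestamps "$corrected")" ]; then
    echo "timestamps changed" >&2
    return 1
  fi
  if grep -qvE '^(\[[0-9]{2}:[0-9]{2}\.[0-9]{2}\]|$)' "$corrected"; then
    echo "line without timestamp found" >&2
    return 1
  fi
  echo "ok"
}

# Demo using the example lines from this section.
tmp="$(mktemp -d)"
cat > "$tmp/transcribed.lrc" <<'EOF'
[00:46.96]AC step alive,
[00:50.80]one point five eyes.
EOF
cat > "$tmp/corrected.lrc" <<'EOF'
[00:46.96]ACE-Step alive,
[00:50.80]One point five arrives.
EOF
check_corrected_lrc "$tmp/transcribed.lrc" "$tmp/corrected.lrc"   # prints ok
rm -rf "$tmp"
```

This only verifies structure; the lyric text itself still needs a manual read against the original lyrics.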
Config file: scripts/config.json
# Switch provider
./scripts/acestep-lyrics-transcription.sh config --set provider openai
./scripts/acestep-lyrics-transcription.sh config --set provider elevenlabs
# Set API keys
./scripts/acestep-lyrics-transcription.sh config --set openai.api_key sk-...
./scripts/acestep-lyrics-transcription.sh config --set elevenlabs.api_key ...
# View config
./scripts/acestep-lyrics-transcription.sh config --list
| Option | Default | Description |
|--------|---------|-------------|
| provider | openai | Active provider: openai or elevenlabs |
| output_format | lrc | Default output: lrc, srt, or json |
| openai.api_key | "" | OpenAI API key |
| openai.api_url | https://api.openai.com/v1 | OpenAI API base URL |
| openai.model | whisper-1 | OpenAI model (whisper-1 for word timestamps) |
| elevenlabs.api_key | "" | ElevenLabs API key |
| elevenlabs.api_url | https://api.elevenlabs.io/v1 | ElevenLabs API base URL |
| elevenlabs.model | scribe_v2 | ElevenLabs model |
| Provider | Model | Word Timestamps | Pricing |
|----------|-------|-----------------|---------|
| OpenAI | whisper-1 | Yes (segment + word) | $0.006/min |
| ElevenLabs | scribe_v2 | Yes (word-level) | Varies by plan |
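As a rough illustration of the OpenAI rate in the table, the sketch below estimates the cost of one transcription from the audio duration. The helper `estimate_whisper_cost` is hypothetical, and it assumes cost scales linearly with duration, which the table does not state; actual billing may round differently.

```shell
# Hypothetical helper (not part of the skill's script).
# Assumption: cost = duration in minutes * $0.006, with no rounding rules.
estimate_whisper_cost() {
  awk -v secs="$1" 'BEGIN { printf "%.4f\n", secs / 60 * 0.006 }'
}

estimate_whisper_cost 210   # a 3.5-minute song; prints 0.0210
```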
- `whisper-1` is the only OpenAI model supporting word-level timestamps
- `scribe_v2` returns word-level timestamps with type filtering

# Basic transcription (uses config defaults)
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3
# Chinese song to LRC
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --language zh
# Use ElevenLabs, output SRT
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --provider elevenlabs --format srt
# Custom output path
./scripts/acestep-lyrics-transcription.sh transcribe --audio song.mp3 --output ./my_lyrics.lrc
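For several songs at once, the documented `transcribe` command can be wrapped in a loop. This is a minimal sketch under stated assumptions: the function `transcribe_all` and the `TRANSCRIBE` variable are hypothetical names, the current directory is assumed to be this skill's directory, and the API key is assumed to be configured already. Only the `--audio` and `--language` options shown above are used.

```shell
# Hypothetical wrapper (not part of the skill's script).
# Assumption: run from this skill's directory with a configured API key.
TRANSCRIBE="${TRANSCRIBE:-./scripts/acestep-lyrics-transcription.sh}"

# Transcribe every .mp3 in a directory with the same language code.
transcribe_all() {
  local dir="$1" lang="$2" f
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    "$TRANSCRIBE" transcribe --audio "$f" --language "$lang" || return 1
  done
}
```

A call such as `transcribe_all ./songs zh` would then transcribe each file in turn and stop at the first failure. Each call is a separate API request, and each resulting LRC file still needs the manual correction described earlier.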