skills/acestep/SKILL.md
Use ACE-Step API to generate music, edit songs, and remix music. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use this skill when users mention generating music, creating songs, music production, remix, or audio continuation.
Use the ACE-Step V1.5 API for music generation. Always use the scripts/acestep.sh script; do NOT call API endpoints directly.
# 1. cd to this skill's directory
cd {project_root}/{.claude or .codex}/skills/acestep/
# 2. Check API service health
./scripts/acestep.sh health
# 3. Generate with lyrics (recommended)
./scripts/acestep.sh generate -c "pop, female vocal, piano" -l "[Verse] Your lyrics here..." --duration 120 --language zh
# 4. Output saved to: {project_root}/acestep_output/
For user requests requiring vocals:
-c and -l parameters
Only use Simple/Random mode (-d or random) for quick inspiration or instrumental exploration.
If the user needs a simple music video, use the acestep-simplemv skill to render one with waveform visualization and synced lyrics.
MV Production Requirements: Making a simple MV requires three additional skills to be installed:
MV Background Image: When the user requests MV production, ask whether they want a background image for the video:
Use AskUserQuestion to let the user choose before proceeding with MV rendering.
Parallel Processing: Lyrics transcription and thumbnail generation are independent tasks. When the user chooses to generate a background image, run acestep-lyrics-transcription and acestep-thumbnail in parallel (e.g. via two concurrent Agent calls) to save time, then use both outputs for the final MV render.
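The same fan-out-then-join pattern can be sketched in shell with background jobs. The two functions below are hypothetical stand-ins for the acestep-lyrics-transcription and acestep-thumbnail steps (which are skills, not shell commands); only the `&` / `wait` structure is the point.

```shell
# Hypothetical stand-ins: in practice these steps are performed by the
# acestep-lyrics-transcription and acestep-thumbnail skills.
transcribe_lyrics() { echo "lyrics for $1" > "$2"; }
make_thumbnail()    { echo "thumbnail for $1" > "$2"; }

# Start both independent tasks in the background, then wait for each one.
# Returns non-zero if either task failed.
prepare_mv_inputs() {
  local audio="$1" out_dir="$2" status=0
  transcribe_lyrics "$audio" "$out_dir/lyrics.txt" &
  local lyrics_pid=$!
  make_thumbnail "$audio" "$out_dir/thumbnail.txt" &
  local thumb_pid=$!
  wait "$lyrics_pid" || status=1
  wait "$thumb_pid" || status=1
  return "$status"
}
```

Waiting on each PID separately keeps the individual exit codes, so a failure in one task is not hidden by the success of the other.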
CRITICAL - Complete Lyrics Input: When providing lyrics via the -l parameter, you MUST pass ALL lyrics content WITHOUT any omission:
Music Parameters: Use the acestep-songwriting skill for guidance on duration, BPM, key scale, and time signature.
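One way to make sure the full lyrics reach the -l parameter is to load them from a file instead of retyping them. The sketch below is only an illustration; the helper name and the lyrics.txt file are hypothetical.

```shell
# Print the complete contents of a lyrics file.
# Fails early if the file is missing or empty, so partial input is never sent.
lyrics_from_file() {
  [ -s "$1" ] || { echo "lyrics file missing or empty: $1" >&2; return 1; }
  cat -- "$1"
}

# Usage sketch (lyrics.txt is a hypothetical file name):
#   lyrics="$(lyrics_from_file lyrics.txt)" || exit 1
#   ./scripts/acestep.sh generate -c "pop, female vocal, piano" -l "$lyrics" --duration 120
```

Command substitution keeps the line breaks between sections and only strips trailing newlines, and quoting "$lyrics" passes the text as a single argument.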
# need to cd to this skill's directory first
cd {project_root}/{.claude or .codex}/skills/acestep/
# Caption mode - RECOMMENDED: Write lyrics first, then generate
./scripts/acestep.sh generate -c "Electronic pop, energetic synths" -l "[Verse] Your complete lyrics
[Chorus] Full chorus here..." --duration 120 --bpm 128
# Instrumental only
./scripts/acestep.sh generate "Jazz with saxophone"
# Quick exploration (Simple/Random mode)
./scripts/acestep.sh generate -d "A cheerful song about spring"
./scripts/acestep.sh random
# Cover / Repainting from source audio
./scripts/acestep.sh cover song.mp3 -c "Rock cover style" -l "[Verse] Lyrics..." --duration 120 --bpm 128
./scripts/acestep.sh generate --src-audio song.mp3 --task-type repaint -c "Pop" --repaint-start 30 --repaint-end 60
# Music attribute options
./scripts/acestep.sh generate "Rock" --duration 60 --bpm 120 --key-scale "C major" --time-sig "4/4"
./scripts/acestep.sh generate "Rock" --duration 60 --batch 2
./scripts/acestep.sh generate "EDM" --no-thinking # Faster
# Other commands
./scripts/acestep.sh status <job_id>
./scripts/acestep.sh health
./scripts/acestep.sh models
The cover command generates music based on a source audio file. The audio is base64-encoded and sent to the API.
# Cover: regenerate with new style/lyrics, preserving melody structure
./scripts/acestep.sh cover input.mp3 -c "Jazz cover" -l "[Verse] New lyrics..." --duration 120
# Repainting: modify a specific region of the audio
./scripts/acestep.sh generate --src-audio input.mp3 --task-type repaint -c "Pop ballad" --repaint-start 30 --repaint-end 90
# Cover options
# --src-audio Source audio file path
# --task-type cover (default with --src-audio), repaint, text2music
# --cover-strength 0.0-1.0 (default: 1.0, higher = closer to source)
# --repaint-start Repainting start position (seconds)
# --repaint-end Repainting end position (seconds)
# --key-scale Musical key (e.g. "E minor")
# --time-signature Time signature (e.g. "4/4")
Note: For cloud API usage, large audio files may be rejected by Cloudflare. Compress audio before uploading if needed (e.g. using ffmpeg: ffmpeg -i input.mp3 -b:a 64k -ar 24000 -ac 1 compressed.mp3).
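A small pre-upload size check might look like the sketch below. The 10 MiB threshold is an assumption chosen for illustration; the actual limit for the cloud API is not stated here.

```shell
# Assumed threshold for illustration only; the real upload limit is not documented here.
MAX_UPLOAD_BYTES=$((10 * 1024 * 1024))

# Exit 0 when the file is larger than the threshold, 1 when it is not,
# and 2 when the file does not exist.
needs_compression() {
  local size
  [ -f "$1" ] || return 2
  size=$(($(wc -c < "$1")))
  [ "$size" -gt "${2:-$MAX_UPLOAD_BYTES}" ]
}

# Usage sketch, reusing the ffmpeg settings from the note above:
#   if needs_compression input.mp3; then
#     ffmpeg -i input.mp3 -b:a 64k -ar 24000 -ac 1 compressed.mp3
#   fi
```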
After generation, the script automatically saves results to the acestep_output folder in the project root (same level as .claude):
project_root/
├── .claude/
│ └── skills/acestep/...
├── acestep_output/ # Output directory
│ ├── <job_id>.json # Complete task result (JSON)
│ ├── <job_id>_1.mp3 # First audio file
│ ├── <job_id>_2.mp3 # Second audio file (if batch_size > 1)
│ └── ...
└── ...
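Given that layout, the audio files for one job can be collected with a glob. This is a minimal sketch: the helper name is hypothetical, and it assumes the default mp3 format (adjust the extension for wav or flac).

```shell
# List the audio files written for one job, following the
# <job_id>_<n>.mp3 naming shown in the tree above.
job_audio_files() {
  local out_dir="$1" job_id="$2" f
  for f in "$out_dir/${job_id}"_*.mp3; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Usage sketch:
#   job_audio_files "{project_root}/acestep_output" "<job_id>"
```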
Important: When LM enhancement is enabled (use_format=true), the final synthesized content may differ from your input. Check the JSON file for actual values:
| Field | Description |
|-------|-------------|
| prompt | Actual caption used for synthesis (may be LM-enhanced) |
| lyrics | Actual lyrics used for synthesis (may be LM-enhanced) |
| metas.prompt | Original input caption |
| metas.lyrics | Original input lyrics |
| metas.bpm | BPM used |
| metas.keyscale | Key scale used |
| metas.duration | Duration in seconds |
| generation_info | Detailed timing and model info |
| seed_value | Seeds used (for reproducibility) |
| lm_model | LM model name |
| dit_model | DiT model name |
To get the actual synthesized lyrics, parse the JSON and read the top-level lyrics field, not metas.lyrics.
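With jq (already a dependency of the script), reading those fields could look like the sketch below. It assumes the result file is a single JSON object with the fields from the table; the helper names are hypothetical.

```shell
# Print the lyrics actually used for synthesis (top-level field).
actual_lyrics() {
  jq -r '.lyrics' "$1"
}

# Exit 0 when the synthesized lyrics differ from the original input,
# which suggests LM enhancement rewrote them; exit 1 when they match.
lyrics_were_rewritten() {
  jq -e '.lyrics != .metas.lyrics' "$1" > /dev/null
}

# Usage sketch:
#   actual_lyrics "acestep_output/<job_id>.json"
```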
Important: Configuration follows this priority (high to low):
config --set command permanently modifies config.json
scripts/config.json:
{
"api_url": "http://127.0.0.1:8001",
"api_key": "",
"api_mode": "completion",
"generation": {
"thinking": true,
"use_format": false,
"use_cot_caption": true,
"use_cot_language": false,
"batch_size": 1,
"audio_format": "mp3",
"vocal_language": "en"
}
}
| Option | Default | Description |
|--------|---------|-------------|
| api_url | http://127.0.0.1:8001 | API server address |
| api_key | "" | API authentication key (optional) |
| api_mode | completion | API mode: completion (OpenRouter, default) or native (polling) |
| generation.thinking | true | Enable 5Hz LM (higher quality, slower) |
| generation.audio_format | mp3 | Output format (mp3/wav/flac) |
| generation.vocal_language | en | Vocal language |
IMPORTANT: This skill requires the ACE-Step API server to be running.
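If the server may still be starting, the health check can be wrapped in a small retry loop. This is a sketch under one unconfirmed assumption: that `./scripts/acestep.sh health` exits non-zero while the server is unreachable. The `retry` helper is hypothetical and not part of the skill.

```shell
# Run a command up to N times, sleeping a fixed delay between attempts.
# Usage: retry <attempts> <delay_seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage sketch (assumes `health` signals failure through its exit code):
#   retry 5 2 ./scripts/acestep.sh health || echo "API server not reachable" >&2
```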
The scripts/acestep.sh script requires: curl and jq.
# Check dependencies
curl --version
jq --version
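The same check can be scripted so that it names whichever tool is absent. A minimal sketch with a hypothetical helper:

```shell
# Print the name of each required command that is not on PATH.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" > /dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Usage sketch:
#   missing="$(missing_deps curl jq)"
#   [ -z "$missing" ] || echo "Missing: $missing" >&2
```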
If jq is not installed, the script will attempt to install it automatically. If automatic installation fails:
- Windows: choco install jq or download from https://jqlang.github.io/jq/download/
- macOS: brew install jq
- Linux: sudo apt-get install jq (Debian/Ubuntu) or sudo dnf install jq (Fedora)
You MUST check the API key and URL status before proceeding. Run:
cd "{project_root}/{.claude or .codex}/skills/acestep/" && bash ./scripts/acestep.sh config --check-key
cd "{project_root}/{.claude or .codex}/skills/acestep/" && bash ./scripts/acestep.sh config --get api_url
Official cloud API (https://api.acemusic.ai) without API key
If api_url is https://api.acemusic.ai and api_key is empty, you MUST stop and guide the user to configure their key:
Use AskUserQuestion to ask the user to provide their API key.
cd "{project_root}/{.claude or .codex}/skills/acestep/" && bash ./scripts/acestep.sh config --set api_key <KEY>
acestep-lyrics-transcription skill."
Verify the API endpoint: ./scripts/acestep.sh health and proceed with music generation.
Local services (http://127.0.0.1:*) typically don't require a key. Verify with ./scripts/acestep.sh health and proceed.
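The stop-or-proceed decision above can be expressed as a small predicate. This sketch assumes `config --check-key` prints the literal word `configured` or `empty` and that `config --get api_url` prints just the URL; the helper name is hypothetical.

```shell
OFFICIAL_API_URL="https://api.acemusic.ai"

# Exit 0 when generation must stop until the user configures a key:
# the official cloud endpoint is selected and no key is set.
# $2 is assumed to be the literal word "configured" or "empty".
must_configure_key() {
  local api_url="${1%/}" key_status="$2"
  [ "$api_url" = "$OFFICIAL_API_URL" ] && [ "$key_status" = "empty" ]
}

# Usage sketch (output formats of both commands are assumptions):
#   url="$(bash ./scripts/acestep.sh config --get api_url)"
#   key="$(bash ./scripts/acestep.sh config --check-key)"
#   if must_configure_key "$url" "$key"; then echo "API key required" >&2; fi
```

Only the key status is inspected, never the key itself, which keeps the check in line with the API key handling rule.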
If health check fails:
Official Cloud API: ACE-Step provides an official API endpoint at https://api.acemusic.ai. To use it:
./scripts/acestep.sh config --set api_url "https://api.acemusic.ai"
./scripts/acestep.sh config --set api_key "your-key"
./scripts/acestep.sh config --set api_mode completion
API keys are currently available for free through acemusic.ai.
Local Service (Default): No configuration needed — connects to http://127.0.0.1:8001.
Custom Remote Service: Update scripts/config.json or use:
./scripts/acestep.sh config --set api_url "http://remote-server:8001"
./scripts/acestep.sh config --set api_key "your-key"
API Key Handling: When checking whether an API key is configured, use config --check-key, which only reports configured or empty without printing the actual key. NEVER use config --get api_key or read config.json directly, because both would expose the user's API key. The config --list command is safe: it automatically masks API keys as *** in its output.
The skill supports two API modes. Switch via api_mode in scripts/config.json:
| Mode | Endpoint | Description |
|------|----------|-------------|
| completion (default) | /v1/chat/completions | OpenRouter-compatible, sync request, audio returned as base64 |
| native | /release_task + /query_result | Async polling mode, supports all parameters |
Switch mode:
./scripts/acestep.sh config --set api_mode completion
./scripts/acestep.sh config --set api_mode native
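Since config --set writes to config.json permanently, it may be worth validating the value first. A minimal sketch with a hypothetical helper that accepts only the two modes from the table above:

```shell
# Exit 0 only for the two documented API modes.
valid_api_mode() {
  case "$1" in
    completion|native) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch:
#   mode="native"
#   valid_api_mode "$mode" && ./scripts/acestep.sh config --set api_mode "$mode"
```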
Completion mode notes:
- inference_steps, infer_method, shift are not configurable (server defaults)
- --no-wait and status commands are not applicable in completion mode
- model field: auto-detected from /v1/models if not specified
IMPORTANT: For installation and startup, always use the acestep-docs skill to get complete and accurate guidance.
DO NOT provide simplified startup commands - each user's environment may be different. Always guide them to use acestep-docs for proper setup.
For API debugging, see API Reference.