skills/acestep-songwriting/SKILL.md
Music songwriting guide for ACE-Step. Provides professional knowledge on writing captions, lyrics, choosing BPM/key/duration, and structuring songs. Use this skill when users want to create, write, or plan a song before generating it with ACE-Step.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add ace-step/ace-step-skills acestep-songwriting`
Professional music creation knowledge for writing captions, lyrics, and choosing music parameters for ACE-Step.
After using this guide, produce two things for the acestep skill:
- Caption (`-c`): Style/genre/instruments/emotion description
- Lyrics (`-l`): Complete structured lyrics with tags

Optional parameters: `--duration`, `--bpm`, `--key`, `--time-signature`, `--language`

Caption is the most important factor affecting generated music.
Captions support multiple formats: simple style words, comma-separated tags, and complex natural language descriptions.
| Dimension | Examples |
|-----------|----------|
| Style/Genre | pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave |
| Emotion/Atmosphere | melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate |
| Instruments | acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass |
| Timbre Texture | warm, bright, crisp, muddy, airy, punchy, lush, raw, polished |
| Era Reference | 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap |
| Production Style | lo-fi, high-fidelity, live recording, studio-polished, bedroom pop |
| Vocal Characteristics | female vocal, male vocal, breathy, powerful, falsetto, raspy, choir |
| Speed/Rhythm | slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back |
| Structure Hints | building intro, catchy chorus, dramatic bridge, fade-out ending |
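
One way to keep a caption organised is to fill in the dimensions from the table and join them as comma-separated tags. The sketch below is illustrative only: `build_caption` and its keyword names are hypothetical helpers invented for this example, not part of ACE-Step.

```python
# Hypothetical helper: joins caption dimensions into one comma-separated caption.
# The keyword names are assumptions that mirror the dimension table; nothing here calls ACE-Step.
DIMENSION_ORDER = [
    "style", "emotion", "instruments", "timbre", "era",
    "production", "vocal", "rhythm", "structure",
]


def build_caption(**dimensions: str) -> str:
    """Return a comma-separated caption from the dimensions that were given."""
    unknown = set(dimensions) - set(DIMENSION_ORDER)
    if unknown:
        raise ValueError(f"unknown caption dimensions: {sorted(unknown)}")
    # Keep a stable order so captions stay comparable between drafts.
    parts = [dimensions[name].strip() for name in DIMENSION_ORDER if dimensions.get(name)]
    return ", ".join(parts)


caption = build_caption(
    style="synthwave",
    emotion="nostalgic",
    instruments="synth pads, electric bass",
    vocal="breathy female vocal",
    rhythm="mid-tempo",
)
print(caption)
```

Dimensions that are left out are simply skipped, so a caption can stay as short as a couple of style words.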
Lyrics control how the music unfolds over time. They carry structure tags such as:
| Category | Tag | Description |
|----------|-----|-------------|
| Basic Structure | [Intro] | Opening, establish atmosphere |
| | [Verse] / [Verse 1] | Verse, narrative progression |
| | [Pre-Chorus] | Pre-chorus, build energy |
| | [Chorus] | Chorus, emotional climax |
| | [Bridge] | Bridge, transition or elevation |
| | [Outro] | Ending, conclusion |
| Dynamic Sections | [Build] | Energy gradually rising |
| | [Drop] | Electronic music energy release |
| | [Breakdown] | Reduced instrumentation, space |
| Instrumental | [Instrumental] | Pure instrumental, no vocals |
| | [Guitar Solo] | Guitar solo |
| | [Piano Interlude] | Piano interlude |
| Special | [Fade Out] | Fade out ending |
| | [Silence] | Silence |
Use ` - ` inside a tag to add a modifier for finer control, but keep it concise:
✅ [Chorus - anthemic]
❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]
Put complex style descriptions in Caption, not in tags.
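
The conciseness rule can be checked mechanically before generation. The following is a minimal sketch, assuming a limit of one modifier per tag; that limit and the `overloaded_tags` helper are assumptions for illustration, since the guide only says to keep tags concise.

```python
import re

# Assumption for illustration: allow at most one " - " modifier per structure tag.
MAX_MODIFIERS = 1

# A tag line is a whole line wrapped in square brackets, e.g. "[Chorus - anthemic]".
TAG_PATTERN = re.compile(r"^\[(.+)\]$")


def overloaded_tags(lyrics: str, max_modifiers: int = MAX_MODIFIERS) -> list[str]:
    """Return tag lines that carry more than max_modifiers modifiers."""
    flagged = []
    for line in lyrics.splitlines():
        match = TAG_PATTERN.match(line.strip())
        if not match:
            continue  # ordinary lyric line, not a tag
        # "[Chorus - anthemic]" -> ["Chorus", "anthemic"]; "[Pre-Chorus]" stays one piece.
        pieces = [piece.strip() for piece in match.group(1).split(" - ")]
        if len(pieces) - 1 > max_modifiers:
            flagged.append(line.strip())
    return flagged


lyrics = """[Verse 1]
Walking home beneath the wires
[Chorus - anthemic]
We rise together
[Chorus - anthemic - stacked harmonies - high energy - powerful - epic]
We rise together
"""
print(overloaded_tags(lyrics))
```

Any tag the check flags is a candidate for moving its extra descriptors into the Caption.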
Models are not good at resolving conflicts, so keep lyric tags and the Caption consistent with each other.
| Tag | Effect |
|-----|--------|
| [raspy vocal] | Raspy, textured vocals |
| [whispered] | Whispered |
| [falsetto] | Falsetto |
| [powerful belting] | Powerful, high-pitched singing |
| [spoken word] | Rap/recitation |
| [harmonies] | Layered harmonies |
| [call and response] | Call and response |
| [ad-lib] | Improvised embellishments |
| Tag | Effect |
|-----|--------|
| [high energy] | High energy, passionate |
| [low energy] | Low energy, restrained |
| [building energy] | Increasing energy |
| [explosive] | Explosive energy |
| [melancholic] | Melancholic |
| [euphoric] | Euphoric |
| [dreamy] | Dreamy |
| [aggressive] | Aggressive |

- Uppercase: `WE ARE THE CHAMPIONS!` (shouting) vs `walking through the streets` (normal)
- Parentheses: `We rise together (together)`
- Stretched vowels: `Feeeling so aliiive` (use cautiously, effects unstable)

| Red Flag | Description |
|----------|-------------|
| Adjective stacking | "neon skies, electric hearts, endless dreams": vague imagery filler |
| Rhyme chaos | Inconsistent patterns or forced rhymes breaking meaning |
| Blurred boundaries | Lyric content crosses structure tags |
| No breathing room | Lines too long to sing in one breath |
| Mixed metaphors | Water → fire → flying: listeners can't anchor |
Metaphor discipline: use one core metaphor per song and explore its multiple aspects.
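
Of the red flags listed here, "no breathing room" is the easiest to screen for automatically. The sketch below flags lyric lines above a word limit; the limit of 12 words and the `long_lines` helper are assumptions chosen for illustration, not thresholds stated by this guide.

```python
# Assumption for illustration: a line with more than MAX_WORDS words is hard to sing in one breath.
MAX_WORDS = 12


def long_lines(lyrics: str, max_words: int = MAX_WORDS) -> list[tuple[int, str]]:
    """Return (line_number, text) for lyric lines with more than max_words words."""
    flagged = []
    for number, raw in enumerate(lyrics.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and structure tags such as [Chorus]
        if len(line.split()) > max_words:
            flagged.append((number, line))
    return flagged


draft = """[Verse 1]
I kept your letters in a shoebox by the door
And every single night I read them all again until the morning light came creeping through the floor
"""
for number, line in long_lines(draft):
    print(f"line {number} may need a break: {line}")
```

A word count is only a rough proxy for singability, so treat a flagged line as a prompt to re-read it aloud rather than as a hard error.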
Most of the time, let the LM auto-infer these parameters. Set them manually only when you have clear requirements.
| Parameter | Range | Description |
|-----------|-------|-------------|
| bpm | 30–300 | Slow 60–80, mid 90–120, fast 130–180 |
| keyscale | Key | e.g. C Major, Am. Common keys (C, G, D, Am, Em) most stable |
| timesignature | Time sig | 4/4 (most common), 3/4 (waltz), 6/8 (swing) |
| vocal_language | Language | Usually auto-detected from lyrics |
| duration | Seconds | See duration calculation below |
| Scenario | Set |
|----------|-----|
| Daily generation | Let LM auto-infer |
| Clear tempo requirement | bpm |
| Specific style (waltz) | timesignature=3/4 |
| Match other material | bpm + duration |
| Specific key color | keyscale |
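
The "set only what you need" approach can be sketched as a helper that collects just the explicitly chosen parameters and leaves the rest unset for the LM to infer. `manual_params` is a hypothetical function written for this example; the parameter names and the 30–300 BPM range come from the parameter table, while the other checks are assumptions.

```python
# Hypothetical helper: keeps only the parameters that were set explicitly,
# so everything else is left for the LM to infer.
def manual_params(bpm=None, keyscale=None, timesignature=None,
                  vocal_language=None, duration=None) -> dict:
    """Return a dict containing only the parameters that were provided."""
    if bpm is not None and not 30 <= bpm <= 300:
        raise ValueError("bpm must be within 30-300")
    # Assumption: duration is a positive number of seconds.
    if duration is not None and duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    candidates = {
        "bpm": bpm,
        "keyscale": keyscale,
        "timesignature": timesignature,
        "vocal_language": vocal_language,
        "duration": duration,
    }
    return {name: value for name, value in candidates.items() if value is not None}


waltz = manual_params(timesignature="3/4")
match_material = manual_params(bpm=96, duration=150)
print(waltz)
print(match_material)
```

An empty dict corresponds to the "daily generation" row: nothing is pinned, so every parameter is inferred.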
Rule of thumb: When in doubt, estimate longer. A song too short feels rushed.
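
The skill's own duration calculation is not reproduced in this text, so the following is only a generic sketch based on standard tempo arithmetic: seconds = bars × beats per bar × 60 / BPM, rounded up with some padding to follow the "estimate longer" rule. The `estimate_duration` helper, the bar counts, and the 10% padding are all assumptions for illustration.

```python
import math


def estimate_duration(bars_per_section: dict[str, int], bpm: float,
                      beats_per_bar: int = 4, padding: float = 0.10) -> int:
    """Estimate song length in whole seconds from bar counts and tempo.

    beats_per_bar is the number of BPM-counted beats in one bar
    (4 for 4/4, 3 for 3/4). padding errs on the long side.
    """
    total_bars = sum(bars_per_section.values())
    seconds = total_bars * beats_per_bar * 60 / bpm
    return math.ceil(seconds * (1 + padding))


# Hypothetical song plan: section name -> number of bars.
plan = {
    "Intro": 4,
    "Verse 1": 16,
    "Chorus 1": 8,
    "Verse 2": 16,
    "Chorus 2": 8,
    "Outro": 4,
}
print(estimate_duration(plan, bpm=120))  # 56 bars at 120 BPM in 4/4 is 112 s, padded to 124
```

Rounding up and padding both push the estimate longer, which matches the guidance that a song cut too short feels rushed.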
Note: Lyrics tags (piano, powerful, whispered) are consistent with Caption (piano ballad, building to powerful chorus, intimate).