skills/minimax-music-gen/SKILL.md
Use when user wants to generate music, songs, or audio tracks. Triggers on any request involving music creation, song writing, lyrics generation, audio production, or covers. Also triggers when user provides lyrics and wants them turned into a song, or describes a mood/scene and wants background music. Supports multilingual triggers — match equivalent phrases in any language. Do NOT use for music playback of existing files, music theory questions, or music recommendation without generation.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add MiniMax-AI/skills minimax-music-gen
Generate songs (vocal or instrumental) using the MiniMax Music API. Supports two creation modes: Basic (one-sentence-in, song-out) and Advanced Control (edit lyrics, refine prompt, plan before generating).
mmx CLI (required): Music generation uses the mmx command-line tool.
Check if installed:
command -v mmx && mmx --version || echo "mmx not found"
Install (requires Node.js):
npm install -g mmx-cli
Authenticate (first time only):
mmx auth login --api-key <your-minimax-api-key>
The API key can be obtained from MiniMax Platform.
Credentials are saved to ~/.mmx/credentials.json and persist across sessions.
Verify:
mmx quota show
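The checks above can be chained into one preflight step. The following is a minimal sketch, not part of the mmx CLI: the function name `preflight` and its return codes (1 for a missing CLI, 2 for a failed `quota show`) are illustrative assumptions. The CLI name is passed as a parameter so the logic can be exercised without mmx installed.

```shell
# Hypothetical helper: verifies that the CLI exists and that a quota query succeeds.
preflight() {
  cli="${1:-mmx}"
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "$cli not found: install with 'npm install -g mmx-cli'" >&2
    return 1
  fi
  # A failing quota query is treated here as a sign that authentication is missing.
  if ! "$cli" quota show >/dev/null 2>&1; then
    echo "$cli is installed but 'quota show' failed: run 'mmx auth login --api-key <your-minimax-api-key>'" >&2
    return 2
  fi
  echo "ready"
}
```

Calling `preflight` with no argument checks `mmx` itself.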
Audio player (recommended): mpv, ffplay, or afplay (macOS built-in) for local
playback. mpv is preferred for its interactive controls.
This skill uses the mmx CLI for all music generation:
Music Generation: mmx music generate (model: music-2.6-free)
- --lyrics-optimizer to auto-generate lyrics from prompt
- --instrumental for instrumental tracks
- --lyrics for user-provided lyrics
- --genre, --mood, --vocals, --instruments, --bpm, --key, --tempo, --structure, --references
Cover: mmx music cover (model: music-cover-free)
- --audio-file <path> or --audio <url>
- --prompt describes the target cover style
Agent flags: Always add --quiet --non-interactive when calling mmx from agents.
Pipeline:
User description -> mmx music generate --lyrics-optimizer -> MP3
User description -> mmx music generate --instrumental -> MP3
Source audio + style -> mmx music cover -> MP3
All generated music is saved to ~/Music/minimax-gen/. Create the directory if it doesn't
exist. Files are named with a timestamp and a short slug derived from the prompt:
YYYYMMDD_HHMMSS_<slug>.mp3
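The naming scheme could be implemented roughly as follows. This is a sketch under stated assumptions: the helper names `slugify` and `make_output_path`, the 40-character slug cap, the `track` fallback, and the `MUSIC_OUT_DIR` override are all illustrative and not defined by the skill or by mmx.

```shell
# Hypothetical helper: turn free text into a short, filesystem-safe slug.
slugify() {
  # Lowercase, collapse non-alphanumeric runs to '-', cap at 40 chars, trim edge dashes.
  printf '%s\n' "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed 's/[^a-z0-9][^a-z0-9]*/-/g' \
    | cut -c1-40 \
    | sed -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: create the output directory and print a timestamped path.
make_output_path() {
  out_dir="${MUSIC_OUT_DIR:-$HOME/Music/minimax-gen}"  # MUSIC_OUT_DIR is an assumed override
  mkdir -p "$out_dir"
  slug=$(slugify "$1")
  printf '%s/%s_%s.mp3\n' "$out_dir" "$(date +%Y%m%d_%H%M%S)" "${slug:-track}"
}
```

For example, `make_output_path "Rainy Night Jazz"` prints a path ending in `_rainy-night-jazz.mp3`, which can then be passed to `--out`.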
Detect the user's language from their first message and respond in that language for the entire session. This applies to all interaction text, questions, confirmations, and feedback prompts.
User-facing text localization rule:
Lyrics language rule:
Parse the user's message to determine:
If ambiguous, ask using this decision tree:
Q1: What type of music?
- Vocal (with lyrics)
- Instrumental (no vocals)
- Cover
Q2: Creation mode?
- Basic — one-line description, auto-generate
- Advanced — edit lyrics, refine prompt, plan
If the user gives a clear one-liner like "make me a sad piano piece", skip the questions — infer instrumental + basic mode and proceed.
Goal: User provides a short description, the skill auto-generates everything, then calls the API.
Expand the description into a prompt: Take the user's one-liner and expand it into a rich music prompt. Refer to the Prompt Writing Guide appendix at the end of this document for style vocabulary, genre/instrument references, and prompt structure. The API prompt should always be written in English for best generation quality, regardless of the user's language.
Follow this pattern:
A [mood] [BPM optional] [genre] song, featuring [vocal description],
about [narrative/theme], [atmosphere], [key instruments and production].
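To make the pattern concrete, here is a small sketch that fills the template from individual fields. The function name `build_prompt` and its argument order are assumptions made for illustration; the skill itself only prescribes the sentence shape.

```shell
# Hypothetical helper: assemble an English prompt from the template fields.
# Args: mood, genre, vocals, theme, atmosphere, production, [bpm]
build_prompt() {
  mood=$1 genre=$2 vocals=$3 theme=$4 atmosphere=$5 production=$6 bpm=${7:-}
  tempo=""
  if [ -n "$bpm" ]; then
    tempo=" ${bpm} BPM"   # BPM is optional in the template
  fi
  printf 'A %s%s %s song, featuring %s, about %s, %s, %s.\n' \
    "$mood" "$tempo" "$genre" "$vocals" "$theme" "$atmosphere" "$production"
}

build_prompt melancholy "indie folk" "a gentle female voice" \
  "leaving a small town" "intimate and warm" \
  "acoustic guitar and soft strings" 90
# prints: A melancholy 90 BPM indie folk song, featuring a gentle female voice, about leaving a small town, intimate and warm, acoustic guitar and soft strings.
```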
Show the user a preview before generating. Translate all labels AND the prompt description into the user's language. The English prompt is only used internally when calling the API — the user should never see it. Example template (English reference — localize everything at runtime):
About to generate:
Type: Vocal / Instrumental
Description: indie folk, melancholy, acoustic guitar, gentle female voice
Lyrics: Auto-generated (--lyrics-optimizer)
Confirm? (press enter to confirm, or tell me what to change)
Call mmx: Generate the music directly.
Goal: User has full control over every parameter before generation.
Lyrics phase:
- --lyrics to mmx.
- --lyrics-optimizer to auto-generate.
- --lyrics.
Prompt phase:
Advanced planning (optional, offer but don't force):
Final confirmation: Show complete parameter summary, then generate.
Generate music using the mmx CLI:
Vocal with auto-generated lyrics:
mmx music generate \
--prompt "<prompt>" \
--lyrics-optimizer \
--genre "<genre>" --mood "<mood>" --vocals "<vocal style>" \
--instruments "<instruments>" --bpm <bpm> \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
Vocal with user-provided lyrics:
mmx music generate \
--prompt "<prompt>" \
--lyrics "<lyrics with section markers>" \
--genre "<genre>" --mood "<mood>" --vocals "<vocal style>" \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
Instrumental (no vocal):
mmx music generate \
--prompt "<prompt>" \
--instrumental \
--genre "<genre>" --mood "<mood>" --instruments "<instruments>" \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
Use structured flags (--genre, --mood, --vocals, --instruments, --bpm, --key,
--tempo, --structure, --references, --avoid, --use-case) to give the API
fine-grained control instead of cramming everything into --prompt.
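One way to apply this is to build the argument list incrementally and add each structured flag only when a value is available. The sketch below uses only flags named in this document; the function name `generate_instrumental`, the variable names `GENRE`, `MOOD`, `INSTRUMENTS`, `BPM`, and the `MMX_DRY_RUN` switch are illustrative assumptions.

```shell
# Hypothetical wrapper: $1 = English prompt, $2 = output path.
# Optional values are read from GENRE, MOOD, INSTRUMENTS, BPM (assumed variable names).
generate_instrumental() {
  prompt=$1 out=$2
  set -- mmx music generate --prompt "$prompt" --instrumental
  if [ -n "${GENRE:-}" ]; then set -- "$@" --genre "$GENRE"; fi
  if [ -n "${MOOD:-}" ]; then set -- "$@" --mood "$MOOD"; fi
  if [ -n "${INSTRUMENTS:-}" ]; then set -- "$@" --instruments "$INSTRUMENTS"; fi
  if [ -n "${BPM:-}" ]; then set -- "$@" --bpm "$BPM"; fi
  set -- "$@" --out "$out" --quiet --non-interactive
  if [ -n "${MMX_DRY_RUN:-}" ]; then
    printf '%s\n' "$@"   # dry run: print one argument per line instead of calling mmx
  else
    "$@"
  fi
}
```

Keeping the arguments in the positional parameters preserves values that contain spaces, such as multi-word instrument lists.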
Display a progress indicator while waiting. Typical generation takes 30-120 seconds.
After generation, detect an available audio player and play the file.
Detect player:
command -v mpv || command -v ffplay || command -v afplay
Play based on detected player (in priority order):
| Player | Command | Controls |
|--------|---------|----------|
| mpv (preferred) | mpv --no-video ~/Music/minimax-gen/<filename>.mp3 | space = pause/resume, q = quit, left/right = seek |
| ffplay | ffplay -nodisp -autoexit ~/Music/minimax-gen/<filename>.mp3 | q = quit |
| afplay (macOS) | afplay ~/Music/minimax-gen/<filename>.mp3 | Ctrl+C = stop |
| None found | Do not attempt playback | Show file path only |
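The detection and the table can be combined into a single helper. This is a sketch only: `first_available` and `play_file` are hypothetical names, and the player commands are the ones listed in the table.

```shell
# Hypothetical helper: print the first command from the argument list that is installed.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: play $1 with the preferred available player, or just report the path.
play_file() {
  player=$(first_available mpv ffplay afplay) || {
    echo "No audio player detected. File saved to: $1"
    return 0
  }
  case "$player" in
    mpv)    mpv --no-video "$1" ;;
    ffplay) ffplay -nodisp -autoexit "$1" ;;
    afplay) afplay "$1" ;;
  esac
}
```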
After starting playback, tell the user (localize all text):
Now playing: <filename>.mp3
Saved to: ~/Music/minimax-gen/<filename>.mp3
Do NOT show playback controls (e.g. keyboard shortcuts) — they don't work in this environment since the player runs in the background.
If no player is found (localize all text):
No audio player detected.
File saved to: ~/Music/minimax-gen/<filename>.mp3
Tip: Install mpv for the best playback experience (brew install mpv).
After playback, ask for feedback:
How was this song?
1. Love it, keep it!
2. Not quite, adjust and regenerate
3. Fine-tune lyrics/style then regenerate
4. Don't want it, start over
Based on feedback:
_v1 suffix for comparison.
Generate a cover version of a song based on reference audio. Model: music-cover-free.
Reference audio requirements: mp3, wav, flac — duration 6s to 6min, max 50MB. If no lyrics are provided, the original lyrics are extracted via ASR automatically.
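A local file can be screened against these requirements before it is uploaded. The sketch below checks only the extension and the size; the duration limit is not checked because that would need an external probe tool. The function name `check_reference_audio` is hypothetical, and reading "50MB" as 50 * 1024 * 1024 bytes is an assumption.

```shell
# Hypothetical helper: returns 0 if $1 looks like an acceptable reference file.
check_reference_audio() {
  file=$1
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 1
  fi
  # Compare the extension case-insensitively.
  case "$(printf '%s' "$file" | tr '[:upper:]' '[:lower:]')" in
    *.mp3|*.wav|*.flac) ;;
    *) echo "unsupported format (need mp3, wav, or flac): $file" >&2; return 1 ;;
  esac
  size=$(($(wc -c < "$file")))   # arithmetic expansion strips any padding from wc
  max=$((50 * 1024 * 1024))      # assumption: MB is interpreted as MiB
  if [ "$size" -gt "$max" ]; then
    echo "file too large ($size bytes, limit $max): $file" >&2
    return 1
  fi
  return 0
}
```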
When the user selects Cover mode:
Cover from local file:
mmx music cover \
--prompt "<cover style description>" \
--audio-file <source.mp3> \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
Cover from URL:
mmx music cover \
--prompt "<cover style description>" \
--audio <source_url> \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
With custom lyrics (text):
mmx music cover \
--prompt "<style>" \
--audio-file <source.mp3> \
--lyrics "<custom lyrics>" \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
With custom lyrics (file):
mmx music cover \
--prompt "<style>" \
--audio-file <source.mp3> \
--lyrics-file <lyrics.txt> \
--out ~/Music/minimax-gen/<filename>.mp3 \
--quiet --non-interactive
| Flag | Description |
|------|-------------|
| --seed <number> | Random seed 0-1000000 for reproducible results |
| --channel <n> | 1 (mono) or 2 (stereo, default) |
| --format <fmt> | mp3 (default), wav, pcm |
| --sample-rate <hz> | Sample rate (default: 44100) |
| --bitrate <bps> | Bitrate (default: 256000) |
Proceed with normal playback and feedback flow (Step 4 & 5).
| Error | Action |
|-------|--------|
| mmx not found | npm install -g mmx-cli |
| mmx auth error (exit code 3) | mmx auth login |
| Quota exceeded (exit code 4) | Report quota limit, suggest waiting or upgrading |
| API timeout (exit code 5) | Retry once, then report failure |
| Content filter (exit code 10) | Adjust prompt to avoid filtered content |
| Invalid lyrics format | Auto-fix section markers, warn user |
| No audio player found | Save file and tell user the path, suggest installing mpv |
| Network error | Show error detail, suggest checking connection |
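The exit codes in the table lend themselves to a small dispatch routine. This is a sketch, assuming the codes are exactly as listed above; `describe_mmx_exit` and `run_with_retry` are hypothetical names, and 127 is the shell's own "command not found" status rather than an mmx code.

```shell
# Hypothetical helper: map an exit status to the action suggested in the table.
describe_mmx_exit() {
  case "$1" in
    0)   echo "ok" ;;
    3)   echo "auth error: run 'mmx auth login'" ;;
    4)   echo "quota exceeded: wait or upgrade" ;;
    5)   echo "API timeout: retry once, then report failure" ;;
    10)  echo "content filter: adjust the prompt" ;;
    127) echo "mmx not found: run 'npm install -g mmx-cli'" ;;
    *)   echo "unexpected error (exit code $1)" ;;
  esac
}

# Hypothetical helper: run a command, retrying exactly once on exit code 5 (timeout).
run_with_retry() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 5 ]; then
    status=0
    "$@" || status=$?
  fi
  return "$status"
}
```

A call such as `run_with_retry mmx music generate ... || describe_mmx_exit $?` would then retry a timeout once and report any remaining failure.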
- Section markers: [verse], [chorus], [bridge], [outro], [intro]. Always include them when providing --lyrics.
- If ~/Music/minimax-gen/ has more than 50 files, suggest cleanup when starting a new session.
- Prefer --genre, --mood, --vocals, --instruments, --bpm etc. over embedding everything in --prompt. This gives the API better control.
See references/prompt_guide.md for the complete prompt writing guide, including genre/vocal/instrument references and BPM tables.
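The 50-file cleanup check mentioned in these notes could be sketched as follows. The helper names `count_generated` and `suggest_cleanup` are hypothetical, and only regular files directly inside the directory are counted.

```shell
# Hypothetical helper: count regular files directly inside the output directory.
count_generated() {
  dir="${1:-$HOME/Music/minimax-gen}"
  if [ ! -d "$dir" ]; then
    echo 0
    return 0
  fi
  find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Hypothetical helper: $1 = directory, $2 = threshold (50 in this document).
suggest_cleanup() {
  n=$(count_generated "$1")
  if [ "$n" -gt "${2:-50}" ]; then
    echo "$1 holds $n files; consider cleaning up old tracks."
  fi
}
```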