skills/bgm-prompter/SKILL.md
MUST read this skill BEFORE entering generate mode for music tasks. Covers prompt crafting framework, structure syntax, and multi-clip strategy.
When crafting music generation prompts, adopt the mindset of a world-class music arranger and producer. Think holistically about the entire piece — its emotional arc, sonic palette, and structural flow — before writing the prompt. Your goal is to translate the user's vision into a single, cohesive musical blueprint that the model can execute in one take whenever possible.
Each call to the music generation tool produces a single audio file with a maximum duration of ~184 seconds (approx. 3 minutes).
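The decision to use single-call vs multi-clip is based SOLELY on duration: a request that fits within the ~184-second limit is generated in a single call, and a longer request is split into multiple clips.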
Good prompts are descriptive and clear. The prompt must be a single, continuous text string. Construct your prompt by combining the following 9 dimensions. Be descriptive and specific, using adjectives and adverbs to paint a clear sonic picture.

- `electronic dance`, `classical`, `jazz`, `ambient`, `8-bit`, `cinematic`, `lo-fi`
- `fast tempo`, `slow ballad`, `120 BPM`, `driving beat`, `syncopated rhythm`, `gentle waltz`
- `in D minor key`, `in the key of C major`
- `energetic`, `melancholy`, `peaceful`, `tense`
- `piano`, `synthesizer`, `acoustic guitar`, `string orchestra`, `electronic drums`
- `sparse arrangement`, `dense layers`, `warm dark tones`, `bright crisp tones`
- Arrangement/Structure (e.g., `starts with a solo piano, then strings enter, crescendo into a powerful chorus`). For more precise control, you can optionally use timestamp cues `[mm:ss - mm:ss]` and Intensity parameters `Intensity: X/10 (Level)`; see the Detailed Structure Example below.
  - `starts with a solo piano, then strings enter at the halfway point`, `solo piano from 0-8s, strings enter at 8s, drums join at 16s`
- `rain falling`, `city nightlife`, `underwater feel`, `large hall reverb`, `tight room reverb`, `wide stereo image`, `intimate close-mic feel`, `distant and far away`, `as if playing in the next room`
- `high-quality production`, `clean mix`, `vintage recording`, `raw demo feel`, `studio dry sound`, `live concert hall recording`, `outdoor open-air feel`

Note on Vocals: The model supports vocal generation for songs. If the user explicitly wants background music without vocals, you MUST append `Instrumental only, no vocals` to the prompt.
To ensure the model accurately follows your instructions, especially regarding duration, always structure your prompt in this specific order:

1. `Instrumental only, no vocals. Create a 60-second track at 80 BPM.`
2. `The feeling is nostalgic, introspective, and atmospheric. The sound should be centered around a warm Fender Rhodes...`
3. `[0:00 - 0:12] Intro... [0:48 - 1:00] Outro...`

When the user requests a specific structure or duration, use the Arrangement/Structure dimension to write a detailed script using timestamps and intensity markers.
Example:
Instrumental only, no vocals. Create a 60-second track at 80 BPM. The feeling is nostalgic, introspective, and atmospheric - a warm, comforting melancholy with a soft, minor-key feel. The sound should be centered around a warm, slightly overdriven Fender Rhodes and soft, ethereal synth pads. The rhythm is a minimalist, laid-back drum beat with a relaxed, human feel. Weave subtle atmospheric textures, like soft static or room tone, through the entire track for texture.
[0:00 - 0:12] Intro: Begin atmospherically with just the Fender Rhodes playing soft, hazy chords. Drench it in warm reverb and introduce a light atmospheric texture. The mood is like a memory coming into focus. Intensity: 1/10 (Very Low)
[0:12 - 0:24] Verse 1: The laid-back drum beat enters with a simple kick and snare. A soft, ethereal synth pad swells in the background. A clean, subtle sub-bass joins, adding depth. The Rhodes melody becomes slightly more defined, following a simple, melancholic progression. Intensity: 3/10 (Low)
[0:24 - 0:36] Build: The groove deepens as a gentle, syncopated hi-hat is added. A simple, memorable lead melody appears, played on a warm, rounded synth. This section should feel like the gentle peak of the track's focus, with a chord progression that builds a sense of hopeful tension. Intensity: 5/10 (Medium)
[0:36 - 0:48] Chorus: Gracefully pull back the intensity. The synth lead melody fades out, returning focus to the core Rhodes groove and the drums. This gives the track space to breathe, resolving the tension from the build. Intensity: 4/10 (Medium-Low)
[0:48 - 1:00] Outro: The drums and bass drop out completely. The track fades out leaving only the Rhodes playing spacious chords, the lingering synth pad, and the persistent atmospheric texture. Intensity: 2/10 (Very Low)
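
The ordering and timestamp format shown in the example can also be assembled programmatically. The following is a minimal Python sketch, not part of the skill itself: `Section`, `format_timestamp`, and `build_prompt` are hypothetical names, and the contiguity check and the 1-10 intensity range are assumptions inferred from the example rather than documented rules.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One timestamped block of the arrangement (hypothetical helper type)."""
    start_s: int
    end_s: int
    name: str
    description: str
    intensity: int   # assumed range 1-10, matching "Intensity: X/10"
    level: str       # free-text label such as "Very Low" or "Medium"


def format_timestamp(seconds: int) -> str:
    """Render whole seconds as m:ss, e.g. 72 -> '1:12'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def build_prompt(constraints: str, description: str, sections: list[Section]) -> str:
    """Join constraints, overall description, then timestamped sections, in that order."""
    parts = [constraints.strip(), description.strip()]
    previous_end = None
    for sec in sections:
        if not 1 <= sec.intensity <= 10:
            raise ValueError(f"intensity out of range in {sec.name!r}")
        # Assumption: sections should be contiguous, as in the example above.
        if previous_end is not None and sec.start_s != previous_end:
            raise ValueError(f"gap or overlap before {sec.name!r}")
        previous_end = sec.end_s
        parts.append(
            f"[{format_timestamp(sec.start_s)} - {format_timestamp(sec.end_s)}] "
            f"{sec.name}: {sec.description.strip()} "
            f"Intensity: {sec.intensity}/10 ({sec.level})"
        )
    # The prompt must be a single continuous string, so join with spaces.
    return " ".join(parts)
```

Calling `build_prompt` with the constraint sentence first keeps the duration and vocal instructions at the front of the string, which is the order this section asks for.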
To exclude elements from the music, state what the model should not generate directly in your main prompt, using explicit negative phrasing.

- Instead of: `negative_prompt: "drums, fast tempo"`
- Use: `"Ensure there are no drums or percussion. Avoid fast tempos."` or `"A drumless, percussion-free ambient track..."`

Categories of elements commonly excluded:

- `no drums`, `no percussion`, `no vocals`
- `no complex melodies`, `no sudden dynamic changes`, `no fast runs`
- `avoid dark mood`, `avoid aggressive energy`

To emulate specific parameter controls, use these prompt translations:

- `sparse arrangement, minimal layers, lots of space between notes`
- `dense, busy arrangement with many overlapping layers and fills`
- `bright, crisp tones, emphasizing high frequencies and presence`
- Add `Ensure there are no drums, no percussion, no beat, no rhythm section` to the prompt
- `Only bass and drums, rhythm section only. No melody, no chords, no harmony, no piano, no guitar, no strings, no synth pads.`

When the user's request exceeds the single-call limit (~184 seconds), generate multiple independent clips and concatenate them. Think like a professional arranger to make them sound like a cohesive song: plan the entire song structure first, then write prompts for each clip.
Plan the song's progression (e.g., Intro → Verse → Chorus → Bridge → Outro). Determine which musical elements define the song's core identity (DNA) and which elements drive the narrative forward. Divide the total duration into logical chunks of up to ~180s each.
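
The duration split can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's prescribed method: `plan_clip_durations` is a hypothetical helper, and the even split is a simplification, since in practice chunk boundaries should follow the planned song sections. It also ignores the small amount of time consumed by crossfade overlaps.

```python
import math

SINGLE_CALL_LIMIT_S = 184  # approximate per-call maximum stated in this document
CHUNK_CEILING_S = 180      # planning ceiling per clip stated in this document


def plan_clip_durations(total_s: float) -> list[float]:
    """Return one duration for a single call, or evenly sized chunks otherwise."""
    if total_s <= 0:
        raise ValueError("total duration must be positive")
    if total_s <= SINGLE_CALL_LIMIT_S:
        return [float(total_s)]
    # Smallest number of clips that keeps every chunk at or under the ceiling.
    clip_count = math.ceil(total_s / CHUNK_CEILING_S)
    return [total_s / clip_count] * clip_count
```

For a 400-second request this yields three clips of roughly 133 seconds each, which can then be nudged so that each boundary falls on a section change.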
Category A: Always Lock (Identical across all clips). These elements are the song's DNA. Changing them will cause jarring transitions.
Category B: Default Lock, Intentional Vary. These elements are usually locked, but can be changed if the arrangement plan specifically calls for it.
Category C: Should Vary (Different across clips). These elements drive the song's narrative.
The core principle is Precise State Alignment: use timestamp cues to ensure the musical state at the end of one clip exactly matches the musical state at the beginning of the next clip. This means matching instrumentation, energy level, and dynamic intensity.
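
One way to check state alignment before generating anything is to write down the planned state at each clip boundary and compare neighbours. The sketch below is hypothetical: `BoundaryState`, `ClipPlan`, and `find_misaligned_joins` are illustrative names, and reducing energy to a short label and intensity to a 1-10 integer is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryState:
    """Planned musical state at the very start or very end of a clip."""
    instruments: frozenset[str]
    energy: str      # short label, e.g. "calm" or "driving" (assumed encoding)
    intensity: int   # assumed 1-10 scale, as in "Intensity: X/10"


@dataclass
class ClipPlan:
    name: str
    start: BoundaryState
    end: BoundaryState


def find_misaligned_joins(clips: list[ClipPlan]) -> list[str]:
    """List every join where one clip's end state differs from the next clip's start."""
    problems = []
    for prev, nxt in zip(clips, clips[1:]):
        if prev.end != nxt.start:
            problems.append(f"{prev.name} -> {nxt.name}")
    return problems
```

An empty result means every join is planned to match in instrumentation, energy, and intensity; any listed pair needs its timestamp cues revised before the prompts are written.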
Avoiding Instrumentation & Loudness Jumps:
intimate, gentle arrangement vs full, powerful arrangement).
Prompt design rules for each clip type:
Use ffmpeg via the shell tool to apply crossfades between the clips. With precise state alignment via timestamps, shorter crossfades (0.5-1s) are usually sufficient. Calculate the crossfade duration to align with the beat grid based on the BPM (e.g., at 120 BPM, 1 beat = 0.5s, so a 2-beat crossfade = 1.0s).
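
The beat arithmetic and the ffmpeg invocation can be sketched as follows. This is a hedged example: `crossfade_seconds` and `build_crossfade_command` are hypothetical helpers, the file names are placeholders, and the triangular fade curve (`tri`) is an assumed choice. The command relies on ffmpeg's `acrossfade` audio filter, chained once per join, and is only built here, not executed.

```python
def crossfade_seconds(bpm: float, beats: int = 2) -> float:
    """Length of `beats` beats at `bpm`, in seconds (one beat = 60 / bpm)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats * 60.0 / bpm


def build_crossfade_command(clips: list[str], output: str, fade_s: float) -> list[str]:
    """Build an ffmpeg argument list that crossfades the clips in order."""
    if len(clips) < 2:
        raise ValueError("need at least two clips to crossfade")
    cmd = ["ffmpeg", "-y"]
    for path in clips:
        cmd += ["-i", path]
    steps = []
    prev = "[0:a]"
    for i in range(1, len(clips)):
        out_label = f"[a{i}]"
        # Fade the running result into the next input over fade_s seconds.
        steps.append(f"{prev}[{i}:a]acrossfade=d={fade_s:g}:c1=tri:c2=tri{out_label}")
        prev = out_label
    cmd += ["-filter_complex", ";".join(steps), "-map", prev, output]
    return cmd
```

At 120 BPM, `crossfade_seconds(120, 2)` gives 1.0 second, matching the worked figure above. The resulting list can be passed to `subprocess.run` or joined into a shell command once the clip files exist.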