skills/video-generator/SKILL.md
Professional AI video production workflow. Use when creating videos, short films, commercials, or any video content using AI generation tools.
Before starting, memorize these non-negotiable rules:
[PHASE 1 STOP] MUST ask questions to gather information. DO NOT assume or guess missing details—always ask the user. Never proceed without explicit user confirmation.
[DETAILED VIDEO PROMPT] Video prompts must include detailed transition_description (2-4 sentences). One-line prompts are insufficient.
[KEYFRAME DIFFERENCE] Last keyframe must show interpolatable change from first keyframe: subject position/pose, subject state (open/close, appear/disappear), or composition change. Subtle-only changes (lighting, background) while subject stays static cause unnatural video motion.
[PHASE 4 MANDATORY] MUST generate reference images before keyframes. Never skip Phase 4.
[ASPECT RATIO] ALL keyframes must use 16:9 or 9:16, and must be upright (not rotated). Never generate 1:1 or other ratios.
[NO TTS FOR ON-SCREEN] Never use TTS for on-screen dialogue or singing. Video model generates audio with lip sync.
[NARRATION CLIP BY CLIP] Generate off-screen narration separately for each clip, not all at once.
[AUDIO MIXING] When combining audio tracks (video audio, narration, BGM), preserve ALL tracks—overlay, never replace. Narration must be clearly audible and maintain consistent volume across all clips.
| Tool | Use When |
|------|----------|
| generate_image | Create new images (with or without references) |
| generate_image_variation | Edit existing images |
| Field | Description |
|-------|-------------|
| Purpose | Goal and target audience |
| Narrative arc | Story structure and key points |
| Duration | Total length in seconds |
| Aspect ratio | 16:9 or 9:16 only |
| Visual style | Sub-genre aesthetic (e.g., "Makoto Shinkai anime", "Pixar 3D") |
| Reference materials | Reference videos, images, brand guidelines |
| Language | For dialogue and narration |
| Recurring elements | Characters/objects with appearance descriptions |
| Dialogue/singing needs | On-screen character audio |
| Narration needs | Off-screen narrator (gender, tone, pace) |
Use these perspectives to guide your questions:
| Dimension | Expert Role | Key Questions |
|-----------|-------------|---------------|
| Strategy & Audience | Creative Director | Who is this for? What's the goal? What action should viewers take? |
| Narrative & Structure | Screenwriter | What's the story? Key moments? Emotional arc? |
| Visual Style | Director + Art Director | What look and feel? Reference videos/images? Color mood? |
| Shot Execution | Cinematographer | Any specific shots in mind? Product hero shots needed? |
| Sound Design | Sound Designer | Voiceover? Music mood? Dialogue? Sound effects? |
Ask questions across all dimensions. Prioritize based on user's initial description.
[MANDATORY STOP - DO NOT PROCEED WITHOUT USER CONFIRMATION] Summarize gathered information and wait for user confirmation before Phase 2.
Define these 4 dimensions (applied to primary reference images in Phase 4):
| Dimension | Example Values |
|-----------|----------------|
| Sub-genre | Makoto Shinkai anime, Pixar 3D, cyberpunk noir |
| Rendering + Line | 2D hand-drawn with thick outlines, 3D cel-shading |
| Color + Lighting | High saturation neon, soft diffused natural light |
| Detail density | Minimalist, highly detailed backgrounds |
Example specification:
Sub-genre: Cyberpunk anime
Rendering + Line: 2D digital painting, thin glowing outlines
Color + Lighting: High saturation neon (pink, cyan, purple), dark backgrounds, rim lighting
Detail density: Highly detailed backgrounds, moderate character detail
For each character/object:
| Field | Description |
|-------|-------------|
| unique_identifier | Name for reference |
| appearance | Text description for prompts |
| outfit_description | Clothing/accessories (characters) |
| language | Spoken/sung language (if applicable) |
| mechanical_properties | Physical behavior (if applicable) |
| Scenario | BGM Source |
|----------|------------|
| Music video / diegetic music (visible source) | Embedded (in video prompt) |
| Background mood music | Separate (Phase 5 BGM Preparation) |
| No music | None |
If Separate, define: genre, instruments, tempo
| Field | Values |
|-------|--------|
| narrative_purpose | establish / develop / climax / resolve / transition / supplementary (product shot, detail, reaction, insert, B-roll, POV) |
| pacing | slow / moderate / fast |
| scene | Environment description |
| content_action | Subject + action + trajectory |
| transition_description | [REQUIRED] Detailed transition process. Must include: subject appearance, movement trajectory, state changes, existence statements. 2-4 sentences minimum. |
| duration | 4 / 6 / 8 |
| camera_movement | static / pan / tilt / dolly / zoom / crane / arc / handheld |
| first_keyframe_framing | Shot size + angle + composition |
| first_keyframe_visible_content | What's visible |
| last_keyframe_framing | Shot size + angle + composition |
| last_keyframe_visible_content | What's visible |
| last_keyframe_edit_from_first | yes / no (see decision table below) |
| inter_clip_boundary | continuous / scene_cut |
| first_keyframe_reuse | yes / no |
| last_keyframe_required | yes / no |
| on_screen_dialogue | "Name: text" or "Name: [lyrics] (style)" or None |
| sound_effects | Sources or None |
| bgm_source | embedded / separate / none |
| bgm_cue | If embedded: style, BPM, instruments. If separate: emotion, intensity |
| narration_cue | Narrator text or None |
inter_clip_boundary = continuous → next clip's first_keyframe_reuse = yes
first_keyframe_reuse = yes → previous clip must have last_keyframe_required = yes
When planning last_keyframe_visible_content, ensure an interpolatable change from first_keyframe_visible_content: subject position/pose, subject state (open/close, appear/disappear), or composition change.
[WARNING] Avoid last keyframes with only lighting or background changes while subject remains static—this causes unnatural video motion.
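The two continuity rules above can be checked mechanically once all clips are planned. The following is a minimal sketch, not part of the skill itself: the `ClipPlan` class and `check_continuity` function are hypothetical names, and the sketch assumes that a clip's `inter_clip_boundary` describes the boundary to the clip that follows it. The check on the very first clip is an added assumption, since the source does not state it.

```python
from dataclasses import dataclass


@dataclass
class ClipPlan:
    """Hypothetical container holding only the fields the continuity rules use."""
    name: str
    inter_clip_boundary: str      # "continuous" or "scene_cut" (boundary to the next clip)
    first_keyframe_reuse: bool    # yes / no
    last_keyframe_required: bool  # yes / no


def check_continuity(clips: list[ClipPlan]) -> list[str]:
    """Return a list of human-readable rule violations (empty if the plan is consistent)."""
    problems = []
    for prev, nxt in zip(clips, clips[1:]):
        # Rule 1: continuous boundary -> next clip reuses the previous last keyframe.
        if prev.inter_clip_boundary == "continuous" and not nxt.first_keyframe_reuse:
            problems.append(
                f"{nxt.name}: first_keyframe_reuse must be yes "
                f"(boundary after {prev.name} is continuous)"
            )
        # Rule 2: reuse -> the previous clip must actually produce a last keyframe.
        if nxt.first_keyframe_reuse and not prev.last_keyframe_required:
            problems.append(
                f"{prev.name}: last_keyframe_required must be yes "
                f"({nxt.name} reuses it)"
            )
    # Assumption (not stated in the source): the first clip has nothing to reuse.
    if clips and clips[0].first_keyframe_reuse:
        problems.append(f"{clips[0].name}: no previous clip to reuse a keyframe from")
    return problems
```

Running the check before Phase 4 would surface plans where a clip expects a keyframe that no earlier clip is scheduled to produce.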
| Camera Movement | First & Last Keyframe Overlap? | Set to |
|-----------------|-------------------------------|--------|
| static, small pan/tilt, zoom | Yes (same scene area) | yes |
| large pan, dolly, tracking, crane, arc | No (different area) | no |
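The decision table above maps naturally onto a small lookup. This is a sketch under assumptions: the function name and the `large_move` flag are hypothetical, a large tilt is treated like a large pan (the table only names "large pan"), and `handheld` is rejected because the table does not list it.

```python
# Movements whose first and last keyframes show the same scene area.
OVERLAPPING = {"static", "zoom"}
# Movements whose keyframes show different areas.
NON_OVERLAPPING = {"dolly", "tracking", "crane", "arc"}


def last_keyframe_edit_from_first(camera_movement: str, large_move: bool = False) -> str:
    """Return "yes" (edit the first keyframe) or "no" (generate a new one)."""
    if camera_movement in OVERLAPPING:
        return "yes"
    if camera_movement in {"pan", "tilt"}:
        # Small pans/tilts keep the same scene area; large ones do not.
        # Treating a large tilt like a large pan is an assumption.
        return "no" if large_move else "yes"
    if camera_movement in NON_OVERLAPPING:
        return "no"
    # The table does not cover this movement (e.g. handheld); decide manually.
    raise ValueError(f"no rule for camera movement: {camera_movement!r}")
```

For example, `last_keyframe_edit_from_first("pan", large_move=True)` returns `"no"`, so the last keyframe would be produced in generate mode rather than edit mode.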
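The transition_description field directly becomes part of the video prompt. The more detailed, the better.
Must include: subject appearance, movement trajectory, state changes, and existence statements.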
Length guideline: 2-4 sentences minimum. One-line descriptions are insufficient.
| Insufficient | Sufficient |
|--------------|------------|
| "Open box revealing jar" | "The frosted glass jar with gold lid is inside the box from the start, hidden by the closed cream-colored lid. Elegant hands with manicured nails lift the lid upward smoothly. As the lid rises, the jar gradually comes into view - first the gold cap edge, then the full jar nestled in champagne velvet." |
| "Person walks left to right" | "Woman in white dress with brown hair starts at left edge of frame, walks steadily rightward at moderate pace, maintaining upright posture, reaches right edge by end of clip." |
| "Light turns on" | "Room starts in complete darkness. Light gradually increases from the ceiling fixture at center, warm yellow glow spreading outward across the wooden furniture until fully illuminated." |
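A rough automated guard for the length guideline might count sentences before a description is accepted. This is only a sketch: the function names are hypothetical, and splitting on terminal punctuation is a crude proxy that says nothing about whether the required elements are present.

```python
import re


def count_sentences(text: str) -> int:
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def meets_length_guideline(transition_description: str, minimum: int = 2) -> bool:
    """True if the description has at least `minimum` sentences."""
    return count_sentences(transition_description) >= minimum
```

Note that a long single-sentence description, such as the walking example in the table, would be flagged by this check even though it is detailed, so a failure is better treated as a prompt to review than as a hard rejection.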
| Movement | Constraint |
|----------|------------|
| Pan/Tilt/Zoom | Camera fixed, content within rotational/zoom range |
| Dolly/Tracking/Crane | Content physically traversable within duration |
| Arc | Subject centered in both keyframes, environment allows orbit |
| Handheld | Similar to Dolly but allows irregularity |
| Combined | Must satisfy ALL involved movement constraints |
Common Mistakes:
| Mistake | Correction |
|---------|------------|
| "Pan from corridor entrance to middle" | Use "dolly forward" |
| First: room A, Last: room B | Split into two clips |
| 6-second clip covering 100 meters | Extend duration or reduce distance |
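The "physically traversable within duration" constraint comes down to simple arithmetic: required camera speed is distance divided by clip duration. The sketch below illustrates this with a hypothetical `traversal_check` helper. The speed limit is a caller-supplied assumption, because the source does not define a threshold.

```python
def traversal_check(distance_m: float, duration_s: float, max_speed_m_per_s: float) -> dict:
    """Compare the camera speed a clip implies against a caller-chosen limit."""
    required_speed = distance_m / duration_s
    return {
        "required_speed_m_per_s": required_speed,
        "feasible": required_speed <= max_speed_m_per_s,
        # Longest distance this clip duration could cover at the chosen limit.
        "max_distance_m": max_speed_m_per_s * duration_s,
    }
```

With the table's example of 100 meters in 6 seconds and an assumed limit of 2 m/s, the required speed is about 16.7 m/s, so the clip is infeasible. At that limit a 6-second clip could cover at most 12 meters.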
After all clips planned, list required reference images:
| Element | Clips Using It | Required Images |
|---------|----------------|-----------------|
| (name) | Clip X (MS), Clip Y (CU) | Full body, Face close-up |
[WARNING] Only generate what clips actually need. Do NOT generate all angles by default.
MANDATORY. Do not skip to Phase 5.
Step 1: Primary reference (visual anchor)
generate_image (no references)
Step 2: Additional angles/shots
generate_image with primary reference as reference
[WARNING] Never generate additional refs without using primary ref as reference.
[CRITICAL] ALL keyframes: aspect ratio from Phase 1 (16:9 or 9:16). Never 1:1.
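One way to catch a wrong aspect ratio early is to test each generated keyframe's pixel dimensions. This is a minimal sketch: the function name and the 2% tolerance are assumptions, allowing for output sizes that only approximate the target ratio. Dimensions alone cannot detect rotated content, so the upright requirement still needs a visual check.

```python
# Allowed ratios from Phase 1, expressed as width / height.
ALLOWED_RATIOS = {"16:9": 16 / 9, "9:16": 9 / 16}


def aspect_ratio_label(width: int, height: int, tolerance: float = 0.02):
    """Return "16:9" or "9:16" if the size matches within tolerance, else None."""
    ratio = width / height
    for label, target in ALLOWED_RATIOS.items():
        # Relative deviation from the target ratio.
        if abs(ratio - target) / target <= tolerance:
            return label
    return None
```

A keyframe for which this returns `None`, or a label different from the ratio chosen in Phase 1, would be regenerated rather than passed to the video model.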
first_keyframe_reuse = yes → Use previous clip's last keyframe (no generation)
first_keyframe_reuse = no → Generate new keyframe
If generating first keyframe:
generate_image
last_keyframe_required = no → Skip
last_keyframe_required = yes:
last_keyframe_edit_from_first = yes → Edit mode
last_keyframe_edit_from_first = no → Generate mode
If EDIT mode:
generate_image_variation
If GENERATE mode:
generate_image
When generating last keyframe, verify:
Video prompt should be detailed. Even with keyframes, video models may drift during generation.
Prompt includes:
Audio in prompt:
| Type | Include |
|------|---------|
| On-screen dialogue | "Name says: text" with tone, language |
| On-screen singing | "Name sings: [lyrics]" with style, language |
| Sound effects | Source + quality |
| Embedded BGM | Style, BPM, instruments, mood |
Prompt ending by bgm_source:
Example (music video with embedded BGM):
Hatsune Miku center stage, singing in Japanese with sweet electronic voice:
"ラララ、光の中で踊り出す", energetic J-pop at 140 BPM with synthesizer,
crowd cheering, concert atmosphere
[CRITICAL] Never use TTS for on-screen dialogue/singing. Video model generates audio with lip sync.
Method: Search and download from royalty-free music libraries (e.g., Pixabay, YouTube Audio Library).
[CRITICAL] Generating music with Python or any other tools is strictly prohibited. You must only use pre-existing, royalty-free tracks.
Match the downloaded music to the style defined in Phase 2.
[WARNING] Generate clip by clip, not all at once.
| Type | Method | Output |
|------|--------|--------|
| On-screen dialogue/singing | Video model | Embedded |
| Sound effects | Video model | Embedded |
| Embedded BGM | Video model | Embedded |
| Separate BGM | Search only | Separate track |
| Narration | TTS (clip by clip) | Separate track |
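The clip-by-clip narration rule can be pictured as one TTS call per clip, keyed by that clip's narration_cue, rather than a single call over the concatenated script. In this sketch `synthesize` is a hypothetical stand-in for whatever TTS tool is available. It is passed in as a parameter because the source does not name one.

```python
from typing import Callable, Optional


def narrate_per_clip(
    narration_cues: list[Optional[str]],
    synthesize: Callable[[str], bytes],
) -> list[Optional[bytes]]:
    """Produce one narration track per clip; clips without a cue get None."""
    tracks: list[Optional[bytes]] = []
    for cue in narration_cues:
        # One TTS request per clip keeps each track aligned to its own clip.
        tracks.append(synthesize(cue) if cue else None)
    return tracks
```

Keeping the result aligned index-by-index with the clip list makes it straightforward to overlay each narration track on its own clip during mixing.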
When combining multiple audio sources:
| Track | Source |
|-------|--------|
| Video audio | Embedded in video clips (dialogue, sound effects, embedded BGM) |
| Narration | TTS generated (off-screen narrator) |
| Separate BGM | Searched from royalty-free source |
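As an illustration of overlaying these tracks rather than replacing any of them, the sketch below builds an FFmpeg command that mixes a clip's embedded audio with optional narration and BGM. It is not the skill's prescribed method. The function name, file names, and the 0.3 BGM gain are assumptions. It relies on FFmpeg's `volume` and `amix` filters, and the `normalize=0` option needs a reasonably recent FFmpeg build. It also assumes the video clip has an audio stream.

```python
from typing import Optional


def build_mix_command(
    video: str,
    output: str,
    narration: Optional[str] = None,
    bgm: Optional[str] = None,
    bgm_volume: float = 0.3,  # assumed gain that keeps BGM under the narration
) -> list[str]:
    """Build (but do not run) an ffmpeg command that overlays all audio tracks."""
    # (path, gain) pairs; input 0 is the clip with its embedded audio.
    sources = [(video, 1.0)]
    if narration:
        sources.append((narration, 1.0))
    if bgm:
        sources.append((bgm, bgm_volume))

    cmd = ["ffmpeg", "-y"]
    for path, _ in sources:
        cmd += ["-i", path]

    # Apply a fixed gain per track, then sum them; nothing is dropped.
    chains = [f"[{i}:a]volume={gain}[a{i}]" for i, (_, gain) in enumerate(sources)]
    labels = "".join(f"[a{i}]" for i in range(len(sources)))
    chains.append(
        f"{labels}amix=inputs={len(sources)}:duration=first:normalize=0[aout]"
    )

    cmd += [
        "-filter_complex", ";".join(chains),
        "-map", "0:v", "-map", "[aout]",  # keep the original video stream
        "-c:v", "copy", "-c:a", "aac",
        output,
    ]
    return cmd
```

Using the same fixed gains for every clip is one simple way to keep narration volume consistent across clips. The returned list could be passed to `subprocess.run` once the input files exist.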
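[CRITICAL] Mixing rules: preserve ALL tracks (overlay, never replace), and keep narration clearly audible at a consistent volume across all clips.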