skills/cinematic-video-gen/SKILL.md
AI视频生成质量增强引擎。解决AI视频角色僵硬、表情空白、动作机械的问题。通过高级prompt工程+角色卡+运镜指令+表情库,让AI视频角色活起来。触发词:视频质量、角色僵硬、灵动、表情、动作、cinematic video、视频生成增强、MV视频、角色动画
npx skillsauth add aaaaqwq/agi-super-team cinematic-video-genInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
AI视频生成(doubao-seedance / kling / sora / veo)的通病:
| 情绪 | 英文指令 | 中文意境 | |------|---------|---------| | 忧伤 | "eyes glistening with unshed tears, lips slightly trembling, a bittersweet smile forming" | 眼含泪光,唇微颤,苦笑 | | 怅然 | "gazing into the distance with a melancholic longing, wind tousling her hair, lost in thought" | 远望含愁,风拂发丝 | | 温柔 | "a warm gentle smile spreading across her face, eyes crinkling at the corners, radiating kindness" | 温暖微笑,眼角弯弯 | | 坚定 | "jaw set with quiet determination, eyes blazing with inner strength, unwavering gaze" | 下巴微抬,目光坚定 | | 惊喜 | "eyes widening with wonder, mouth forming a soft 'oh', face illuminated with sudden joy" | 眼睛一亮,嘴微张 | | 释然 | "shoulders dropping with relief, a peaceful exhale visible, serene contentment" | 肩膀放松,释然微笑 |
使用规则:每个 clip 的 prompt 必须包含至少一个表情指令。
| 部位 | 微动作 | 大动作 | |------|--------|--------| | 手 | "fingers gracefully tracing the air" / "hands gently clasping petals" | "arms sweeping wide open" / "reaching toward the light" | | 头 | "head tilting slightly" / "chin lifting gently" | "turning to look over her shoulder" / "head thrown back in laughter" | | 身体 | "shoulders swaying subtly to the rhythm" / "body swaying gently" | "spinning slowly with dress flowing" / "leaning into the wind" | | 脚 | "shifting weight from one foot to another" | "walking gracefully through the petals" / "twirling on tiptoe" |
使用规则:每个 clip 至少 1 个微动作 + 1 个大动作(大场景时)。
| 运镜 | 角色配合 | 效果 | |------|---------|------| | slow push-in | 角色慢慢转向镜头 | 亲密感、情感拉近 | | tracking shot | 角色漫步,镜头跟随 | 自然流畅、叙事感 | | crane up | 角色抬头仰望,镜头上升 | 壮阔感、精神升华 | | orbit 360° | 角色原地轻转,镜头环绕 | 空间感、梦幻感 | | handheld close-up | 角色微表情特写 | 纪实感、情感冲击 | | pull back reveal | 角色静止,镜头拉远 | 孤独感、渺小感 |
使用规则:prompt 中必须指定运镜方式 + 角色对该运镜的自然反应。
每个 MV/视频项目需要一个 角色卡,在所有 clip prompt 中嵌入:
[CHARACTER_ANCHOR]
Name: Chloe (克洛伊)
Appearance: A young woman in her early 20s, long flowing black hair with subtle cherry blossom pink tips,
wearing an ethereal white dress with delicate floral embroidery, slender figure, porcelain skin
Signature features: Always has a small cherry blossom hairpin on the left side,
eyes are deep brown with a hint of melancholy
Build: Petite, graceful, dancer-like posture
[/CHARACTER_ANCHOR]
使用规则:
相邻 clip 之间的动作衔接:
| 衔接类型 | 指令 | |---------|------| | 延续动作 | "continuing her graceful turn from the previous moment" | | 情绪转变 | "her expression shifts from wonder to quiet contentment" | | 位置移动 | "having moved from the center to the edge of the frame" | | 时间流逝 | "a moment later, petals have settled around her feet" |
[角色锚点: Chloe, long black hair with pink tips, white floral dress, cherry blossom hairpin]
[运镜] tracking shot following [角色] as she [动作] through [场景].
[主体动作] She [大动作], her [身体部位] [微动作].
[表情] [表情指令库中的具体描述].
[场景/氛围] [场景细节], [光影], [色调].
[动态连贯] [与上一个clip的衔接].
cinematic quality, 4K, shallow depth of field, natural motion blur, film grain
Before(僵硬版):
"Slowly rising from deep darkness, a dim silhouette of a young woman emerges..."
After(灵动版):
"Chloe, a young woman with long flowing black hair tipped in soft pink, wearing an ethereal white floral dress, her cherry blossom hairpin catching the first rays of light. Slow crane-up shot: she lifts her chin slowly, eyes opening with quiet wonder as warm golden light begins to illuminate her face from below. Her fingers uncurl gracefully at her sides, shoulders relaxing as if releasing a deep breath she's been holding. Her lips part slightly, a soft expression of dawning hope crossing her features. Darkness gently dissolves around her like morning mist. Cinematic chiaroscuro, warm amber and gold palette, 4K, shallow depth of field, natural motion blur."
graceful > smooth > natural > fluidclose-up + soft lighting = 表情感生成后逐条检查:
scripts/generate_cinematic_prompt.sh输入:场景描述 + 角色卡 + 上下文 → 输出:优化后的 prompt
scripts/quality_check.py输入:生成的 mp4 → 输出:质量评分(运动量/表情变化/连贯性)
cinematic-video-gen (prompt增强 + 口型歌词)
→ qingyun-api / video_generate (实际生成)
→ video-lyrics-subtitle (字幕生成 + 烧录)
→ lyrics-video-sync (歌词卡点匹配)
→ ffmpeg-video-editor (后期合成)
→ av-sync-workflow (音视频节拍同步)
以下是一首 MV 从零到成片的完整 pipeline,所有步骤使用现有 skill 工具链。
所有素材存放在规范目录结构中:
~/clawd/projects/MediaClaw/output/<project-name>/
├── lyrics/lyrics.txt # 歌词原文
├── audio/song_final.mp3 # 最终版音频
├── script/mv_script.json # MV脚本(场景+镜头列表)
├── frames/ # 参考帧图片
│ ├── 01_first.png # 每个场景的首帧
│ ├── 01_last.png # 每个场景的尾帧
│ └── ...
└── script/chloe-card.json # 角色卡
工具: video-lyrics-subtitle
python3 ~/clawd/skills/video-lyrics-subtitle/scripts/generate_srt.py \
lyrics/lyrics.txt \
--audio audio/song_final.mp3 \
-o subtitles/lyrics.srt
输出:每句歌词的精确起止时间(SRT 格式)
⚠️ 关键步骤!确保歌词/音频/视频三者同步。 跳过此步骤将导致:音频从头播放、字幕时间轴错位、视频和音频完全不对齐。
根据 edit_plan.json 中每个 Phase 的时间范围,从完整素材中截取对应的分段:
python3 ~/clawd/projects/MediaClaw/skills/cinematic-video-gen/scripts/extract_phase.py \
--audio audio/song_final.mp3 \
--srt subtitles/lyrics.srt \
--start 208.0 \
--end 237.0 \
--phase 1 \
--outdir .
输出:
audio/phaseN_audio.m4a — 分段音频(精确对应视频时间范围)subtitles/phaseN_lyrics.srt — 分段字幕(时间轴从 00:00:00,000 开始)lyrics/phaseN_lyrics.txt — 歌词文本片段SRT 时间轴重置逻辑:
00:00:00,000 开始工具: lyrics-video-sync
将 MV 脚本中的场景/镜头与歌词时间轴对齐,确定:
输出:edit_plan.json(每clip的歌词+时间+场景)
工具: cinematic-video-gen (本 skill)
对每个 clip,自动读取:
frames/XX_first.png输出:每个 clip 的完整 prompt
工具: qingyun-api (doubao-seedance)
python3 ~/clawd/skills/qingyun-api/scripts/qingyun-video-doubao-local.py \
--prompt "<Step 3 生成的完整prompt>" \
--first-frame frames/01_first.png \
--model doubao-seedance-1-5-pro-251215 \
--output video/clips/01_scene1.mp4
关键:--first-frame 确保角色一致性(base64 本地文件)
每个 clip 约 5 秒,按场景依次生成。
工具: video-lyrics-subtitle
# SRT → ASS(KTV 高亮效果)
python3 ~/clawd/skills/video-lyrics-subtitle/scripts/generate_ass.py \
subtitles/lyrics.srt \
--style ktv \
-o subtitles/lyrics.ass
工具: ffmpeg-video-editor + video-lyrics-subtitle
🔴 必须使用 Step 1.5 提取的分段素材,不是完整音频/字幕!
- ✅ 用
audio/phaseN_audio.m4a(分段音频)- ❌ 不要用
audio/song_final.mp3(完整音频)- ✅ 用
subtitles/phaseN_lyrics.srt(分段字幕,从 00:00 开始)- ❌ 不要用
subtitles/lyrics.srt(完整字幕)
# 6a. FFmpeg xfade 拼接所有 clips + 分段音频 → 无字幕版
~/tools/ffmpeg -i clip1.mp4 -i clip2.mp4 ... \
-i audio/phase1_audio.m4a \
-filter_complex "xfade chain..." \
-map "[vout]" -map N:a \
video/phase1_raw.mp4
# 6b. 生成分段 ASS 字幕(从分段 SRT 转换)
python3 ~/clawd/skills/video-lyrics-subtitle/scripts/generate_ass.py \
subtitles/phase1_lyrics.srt \
--style ktv \
-o subtitles/phase1_lyrics.ass
# 6c. 用 skill 脚本烧录字幕 → 最终成片
bash ~/clawd/skills/video-lyrics-subtitle/scripts/burn_subtitles.sh \
video/phase1_raw.mp4 \
subtitles/phase1_lyrics.ass \
video/phase1_final.mp4
lyrics.txt + song.mp3
↓
[Step 1] generate_srt.py → lyrics.srt (歌词时间轴)
↓
[Step 1.5] extract_phase.py → phaseN_audio.m4a + phaseN_lyrics.srt + phaseN_lyrics.txt
↓
[Step 2] lyrics-video-sync → edit_plan.json (场景×歌词映射)
↓
[Step 3] cinematic-video-gen → 6维prompt (含口型歌词指令)
↓
[Step 4] qingyun-video-doubao-local.py → clips/*.mp4 (首帧参考生成)
↓
[Step 5] generate_ass.py → phaseN_lyrics.ass (从分段SRT转换)
↓
[Step 6a] FFmpeg xfade → phaseN_raw.mp4 (拼接 + 分段音频)
↓
[Step 6b] burn_subtitles.sh → phaseN_final.mp4 (烧录分段字幕)
↓
🎬 最终成片 (歌词/音频/视频三者严格同步)
每个 clip 生成后立即检查:
不达标 → 调整 prompt 重生成(同一 clip 最多重试 3 次)
1. doubao-seedance-1-5-pro-251215 ← 首选,支持首帧参考
2. kling (xingjiabiapi) ← 备选,动作生成强
3. veo3.1 (Google AI Studio) ← 备选,质量高但渠道不稳定
AI视频生成时角色唱歌但嘴巴不动,没有对上歌词,看起来像假唱。
每个 clip 的 prompt 必须包含对应时间段的歌词文本+口型描述:
[角色锚点 + 运镜 + 动作 + 表情]
She is singing the lyrics: "桜が散る季節に" (在樱花飘落的季节)
Her mouth forms each syllable naturally, lips moving gracefully with the melody.
She sings with emotion, her voice carrying the weight of the words.
[场景/氛围 + 质量后缀]
| 歌唱状态 | 英文指令 | |---------|----------| | 正常演唱 | "singing softly, lips forming each syllable with gentle precision" | | 高音爆发 | "belting out a powerful note, mouth open wide, jaw dropped slightly" | | 气声轻唱 | "singing in a breathy whisper, lips barely parting, intimate and close" | | 停顿换气 | "a brief pause, catching her breath, lips closing softly before the next phrase" | | 情感爆发 | "voice cracking with emotion, singing through tears, raw and vulnerable" | | 吟唱/哼鸣 | "humming softly with closed lips, a gentle melody vibrating through her" |
每个 clip 的时间范围必须在 lyrics.json 中找到对应歌词行,然后:
She is singing: "[原文]" ([中文翻译])She moves gracefully to the music, no singing in this momentBetween phrases, she takes a breath, a moment of quiet before the next note[Character: Chloe, long black hair with pink tips, white floral dress, cherry blossom hairpin]
[运镜] shot of [Chloe] as she [动作] through [场景].
[主体动作] She [大动作], her [身体部位] [微动作].
[表情] [表情指令].
[歌唱] She is singing: "[对应歌词原文]" ([中文翻译]).
Her lips form each syllable naturally, [口型状态描述].
She pours emotion into every word, [情感力度].
[场景/氛围] [场景细节], [光影], [色调].
[动态连贯] [与上一个clip的衔接].
cinematic quality, 4K, shallow depth of field, natural motion blur, film grain
[Character: Chloe, long black hair with pink tips, white floral dress, cherry blossom hairpin]
Slow dissolve shot: Chloe's silhouette gradually transforms into floating cherry blossom petals.
She is singing the final line: "桜が散る季節に、また逢いましょう" (在樱花飘落的季节,我们再相见)
Her lips softly form each syllable as her body dissolves, singing with bittersweet acceptance.
A gentle smile plays on her face even as she fades, eyes glistening with unshed tears.
The last words linger in the air as petals carry her voice away.
Warm golden light, soft pink and white palette, dreamy atmosphere.
Transitioning from the previous aerial dawn shot into this ethereal dissolution.
cinematic quality, 4K, shallow depth of field, natural motion blur, film grain
lyrics-video-sync (歌词时间轴)
→ cinematic-video-gen (生成带歌词的prompt)
→ qingyun-video-doubao-local.py (首帧参考 + prompt生成视频)
→ ffmpeg-video-editor (合成 + 音频混合)
当 lyrics-video-sync 输出 edit_plan.json 后,
cinematic-video-gen 自动读取每个 clip 对应的歌词行,
生成完整 prompt(含口型指令),然后送入视频生成 API。
创建日期: 2026-04-10 更新日期: 2026-04-10 — 新增维度6:口型歌词指令
development
Technology-agnostic prompt generator that creates customizable AI prompts for scanning codebases and identifying high-quality code exemplars. Supports multiple programming languages (.NET, Java, JavaScript, TypeScript, React, Angular, Python) with configurable analysis depth, categorization methods, and documentation formats to establish coding standards and maintain consistency across development teams.
tools
Expert-level browser automation, debugging, and performance analysis using Chrome DevTools MCP. Use for interacting with web pages, capturing screenshots, analyzing network traffic, and profiling performance.
data-ai
Prompt for creating detailed feature implementation plans, following Epoch monorepo structure.
tools
Interactive prompt refinement workflow: interrogates scope, deliverables, constraints; copies final markdown to clipboard; never writes code. Requires the Joyride extension.