skills/kling-3-prompting/SKILL.md
Write better prompts for Kling 3.0 AI video generation. Use when the user wants to create, write, improve, or refine prompts — text-to-video, image-to-video, keyframes, multi-shot sequences, or dialogue scenes.
npx skillsauth add aedev-tools/kling-3-prompting-skill kling-3-promptingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Kling 3.0 is a unified multimodal video model. It understands cinematic direction, not keyword lists. Write prompts like a director — describe what the audience sees, hears, and feels over time.
Core shift: Description → Direction. Think "direct a scene" not "describe an image."
When invoked, guide the user through these steps using AskUserQuestion:
digraph builder {
"1. Generation mode?" [shape=diamond];
"Text-to-Video" [shape=box];
"Image-to-Video" [shape=box];
"Multi-Shot Sequence" [shape=box];
"Keyframe Transition" [shape=box];
"2. Gather scene details" [shape=box];
"3. Assemble prompt" [shape=box];
"4. Present & refine" [shape=box];
"1. Generation mode?" -> "Text-to-Video";
"1. Generation mode?" -> "Image-to-Video";
"1. Generation mode?" -> "Multi-Shot Sequence";
"1. Generation mode?" -> "Keyframe Transition";
"Text-to-Video" -> "2. Gather scene details";
"Image-to-Video" -> "2. Gather scene details";
"Multi-Shot Sequence" -> "2. Gather scene details";
"Keyframe Transition" -> "2. Gather scene details";
"2. Gather scene details" -> "3. Assemble prompt";
"3. Assemble prompt" -> "4. Present & refine";
}
Ask the user which mode:
Ask about each element (adapt questions to mode):
| Element | Question | Why it matters | |---------|----------|----------------| | Subject | Who/what is the focus? Specific appearance details? | Anchors consistency — define distinguishing traits early | | Action | What happens? Describe the timeline (first → then → finally) | Kling 3.0 excels at sequential action over 15s arcs | | Environment | Where? Be specific (not "a street" but "narrow Tokyo alley, steam from grates") | Grounds the scene physically | | Camera | Shot type and movement? (See camera reference below) | Cinematic language produces far better results | | Lighting | What light sources? Name them specifically | "Flickering neon" beats "dramatic lighting" | | Mood/Emotion | What should the audience feel? | Drives color grade, pacing, music | | Audio | Dialogue? Ambient sound? Music? | Kling 3.0 generates native audio + lip-sync | | Duration | How long? (3-15s) | Longer = describe progression over time | | Aspect Ratio | 16:9 / 9:16 / 1:1 / 21:9? | 16:9 cinematic, 9:16 social, 21:9 ultra-wide |
Image-to-Video: Focus on how the scene evolves from the image — movement, camera motion, environmental change. The model preserves identity/layout from the source.
Keyframes: Ask for start and end frame descriptions. Frames should match in color, style, and lighting. Prompt sparingly — Kling infers motion well.
Multi-Shot: Define each shot separately with its own framing, subject, action, and duration. Label shots explicitly.
Use the Master Formula:
[Scene/Environment] + [Subject & Appearance] + [Action Timeline] + [Camera Movement] + [Audio & Atmosphere] + [Technical Specs]
Writing rules:
Present the assembled prompt. Ask if they want to:
| Movement | Effect | Example phrase | |----------|--------|---------------| | Dolly push-in | Builds intimacy/tension | "slow dolly push-in toward her face" | | Dolly zoom | Vertigo/dramatic reveal | "dolly zoom creating disorienting depth shift" | | Tracking shot | Follows subject laterally | "camera tracks alongside as she walks" | | Whip-pan | Energy/surprise | "whip-pan to reveal the door" | | Crash zoom | Shock/emphasis | "sudden crash zoom on the object" | | Rack focus | Shift attention | "rack focus from foreground hand to background figure" | | Handheld/shoulder-cam | Raw/documentary feel | "handheld shoulder-cam with subtle sway" | | Static tripod | Composed/observational | "locked-off static tripod, wide shot" | | FPV drone | High-energy immersion | "dynamic FPV drone shot chasing through corridor" | | Low-angle tracking | Heroic/imposing | "low-angle tracking shot, subject towers above" | | Truck left/right | Lateral reveal | "camera trucks right revealing the cityscape" | | Tilt up/down | Vertical reveal | "slow tilt up from boots to face" |
| Phrase | Effect | |--------|--------| | "Shot on 35mm film" | Warm grain, organic texture | | "Macro 85mm lens" | Tight detail, shallow depth of field | | "Wide-angle steadicam" | Smooth, immersive, spatial | | "Handheld camcorder" | Raw VHS energy, nostalgic | | "Anamorphic lens flare" | Cinematic horizontal streaks |
Use specific sources, not adjectives:
| Rule | Do | Don't |
|------|-----|-------|
| Name characters | [Character A: Silver-haired CEO] | [Man] says... |
| Anchor to action | Agent slams table. [Agent, angrily]: "Where is it?" | Just dialogue without visual action |
| Assign voice tone | [CEO, deep authoritative gravelly voice] | Generic "says" |
| Control timing | "Immediately," "Pause," "After a beat" | Back-to-back dialogue without transitions |
Shot 1 (0-5s): [Wide establishing shot description]
Shot 2 (5-10s): [Medium/close-up with action progression]
Shot 3 (10-15s): [Resolution/reaction with camera payoff]
Atmosphere: [Overall mood, color grade]
Audio: [Sound design, music, dialogue]
Label every shot. Assign durations. Describe framing + subject + motion per shot.
Use to prevent common AI defaults:
smiling, laughing, cartoonish, bright saturated colors, low resolution,
morphing, blurry text, disfigured hands, extra fingers, static pose,
frozen expression, stock photo aesthetic
Customize based on scene — remove items that conflict with your intent.
| Element | Weak | Strong | |---------|------|--------| | Camera | "Camera follows person" | "Handheld shoulder-cam drifts behind subject with subtle sway" | | Subject | "A woman walking" | "Woman in red dress, heels clicking wet cobblestone" | | Environment | "In a city" | "Narrow Tokyo alley, steam from grates, glowing vending machines" | | Lighting | "Dramatic lighting" | "Flickering neon casting magenta/cyan across wet pavement" | | Texture | "It looks realistic" | "Rain beading on leather jacket, condensation on glass, visible breath" | | Motion | "She walks away" | "She turns slowly, hair catches light, disappears around corner" |
| Mistake | Fix | |---------|-----| | Keyword lists instead of scene direction | Write like directing a shot: subject + action + camera + environment | | Vague motion ("moves," "goes") | Use cinematic verbs: dolly, track, whip-pan, crash zoom | | Generic lighting ("dramatic") | Name the source: neon, candle, golden hour, LED panel | | Overlong prompts | 1-3 rich sentences per shot; specificity > length | | No temporal progression | Describe beginning → middle → end of the shot | | Mismatched keyframes | Match color, lighting, and style between start/end frames | | Unattributed dialogue | Label every speaker with name, tone, and emotion | | Cramming multi-shot into one paragraph | Separate and label each shot with duration |
documentation
Fetch GitHub issues, spawn sub-agents to implement fixes and open PRs, then monitor and address PR review comments. Usage: /gh-issues [owner/repo] [--label bug] [--limit 5] [--milestone v1.0] [--assignee @me] [--fork user/repo] [--watch] [--interval 5] [--reviews-only] [--cron] [--dry-run] [--model glm-5] [--notify-channel -1002381931352]
documentation
Maintain the OpenClaw memory wiki vault with deterministic pages, managed blocks, and source-backed updates.
documentation
Feishu knowledge base navigation. Activate when user mentions knowledge base, wiki, or wiki links.
documentation
Feishu permission management for documents and files. Activate when user mentions sharing, permissions, collaborators.