skills/shorts-generation/SKILL.md
---
name: shorts-generation
description: Generates AI video shorts using ClawdVine with structured workflows for music videos, dark anime, and cinematic trailers. Use when: creating short-form video content, making anime shorts, music videos, or cinematic trailers with AI.
---

# ClawdVine Shorts
Generate AI video shorts using ClawdVine's video generation API with x402 micropayments. This skill supports multiple specialized workflows — each with its own pipeline and creative approach — sharing a common core for video generation, evaluation, and composition.
At the start of every conversation, determine which workflow to use.
cat session.json 2>/dev/null | jq -r '.skill // empty'
If a session exists, resume with the workflow that owns it (see Session State below).
If the user's intent is clear from context, load the matching workflow directly. Otherwise, present the menu:
ClawdVine Shorts — What are we making?
1. Music Video — mp3 + reference images → audio-synced music video (~60s)
2. Dark Anime — concept → narrated dark anime short (~30-45s)
3. Cinematic Trailer — concept → epic landscape trailer (~30-60s)
Or describe what you want and I'll pick the right workflow.
Once selected, read the workflow's instructions:
workflows/music-video.md
workflows/dark-anime.md
workflows/cinematic-trailers.md
Follow the workflow's pipeline from start to finish. The shared core below applies to all workflows.
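The selection logic above can be sketched in JavaScript, the language of the skill's scripts. This is only an illustration: the function name `resolveWorkflow` is hypothetical, and the `cinematic-trailers` key is an assumption inferred from the file name `workflows/cinematic-trailers.md`. The `skill` field and the menu order come from the text above.

```javascript
// Workflow keys mapped to their instruction files.
// "cinematic-trailers" as a key is an assumption based on the file name.
const WORKFLOW_FILES = {
  "music-video": "workflows/music-video.md",
  "dark-anime": "workflows/dark-anime.md",
  "cinematic-trailers": "workflows/cinematic-trailers.md",
};

// Menu order as presented to the user (options 1, 2, 3).
const MENU_ORDER = ["music-video", "dark-anime", "cinematic-trailers"];

function isKnownWorkflow(skill) {
  return Object.prototype.hasOwnProperty.call(WORKFLOW_FILES, skill);
}

// session: parsed session.json or null; menuChoice: 1, 2, 3 or undefined.
// Returns { skill, file }, or null when the menu still has to be shown.
function resolveWorkflow(session, menuChoice) {
  // An existing session wins: resume with the workflow that owns it.
  const fromSession = session ? session.skill : undefined;
  if (isKnownWorkflow(fromSession)) {
    return { skill: fromSession, file: WORKFLOW_FILES[fromSession] };
  }
  const fromMenu = MENU_ORDER[menuChoice - 1];
  if (isKnownWorkflow(fromMenu)) {
    return { skill: fromMenu, file: WORKFLOW_FILES[fromMenu] };
  }
  return null;
}
```

Returning `null` rather than a default keeps the "present the menu" branch explicit for the caller.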
Before starting any workflow pipeline, verify these tools are available:
- ffmpeg (brew install ffmpeg)
- Node.js dependencies (npm install in this directory)
- jq (brew install jq)
- A payment key: EVM_PRIVATE_KEY or SOLANA_PRIVATE_KEY in .env

Workflow-specific prerequisites (e.g., Python + Essentia for music-video) are listed in each workflow's file.
ClawdVine identity is auto-detected from your wallet (see ClawdVine Identity below). To create a new identity or join via Moltbook, see the ClawdVine skill.
If any prerequisite is missing, tell the user what to install and stop.
Generate a video clip using the ClawdVine API via the generation script:
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "" "true"
Arguments:
- prompt: the full scene prompt (style block + action, under 60 words)
- model: one of the supported models (see below)
- duration: clip length in seconds (8, 10, 15, or 20)
- agentId: ClawdVine agent ID (or empty string for anonymous)
- aspectRatio: 9:16, 16:9, or 1:1
- "" (sixth argument): leave empty for a fresh generation; the reference-frame commands later in this document pass an imageData data URI in this position
- "true": autoEnhance (always true; we craft precise prompts)

Output (JSON to stdout):
{
"success": true,
"taskId": "uuid",
"videoUrl": "https://...",
"thumbUrl": "https://...",
"gifUrl": "https://...",
"shareUrl": "https://clawdvine.sh/media/{taskId}",
"elapsed": 45
}
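As a sketch of how a caller might validate arguments before invoking the script and then read its result, the two helpers below follow the argument order and output fields documented above. The helper names are hypothetical. The failure shape (anything other than `success: true` with a `videoUrl`) is an assumption, since the source only shows a successful response.

```javascript
const DURATIONS = [8, 10, 15, 20];
const ASPECT_RATIOS = ["9:16", "16:9", "1:1"];

// Build the seven positional arguments for x402-generate.mjs in documented order.
function buildGenerateArgs({ prompt, model, duration, agentId = "", aspectRatio }) {
  const words = prompt.trim().split(/\s+/).filter(Boolean).length;
  if (words === 0 || words >= 60) {
    throw new Error(`prompt must be 1-59 words, got ${words}`);
  }
  if (!DURATIONS.includes(duration)) {
    throw new Error(`duration must be one of ${DURATIONS.join(", ")}`);
  }
  if (!ASPECT_RATIOS.includes(aspectRatio)) {
    throw new Error(`aspectRatio must be one of ${ASPECT_RATIOS.join(", ")}`);
  }
  // Sixth slot left empty (fresh generation); seventh is autoEnhance.
  return [prompt, model, String(duration), agentId, aspectRatio, "", "true"];
}

// Parse the JSON the script prints to stdout.
// Treating a missing videoUrl as failure is an assumption of this sketch.
function parseGenerateOutput(stdout) {
  const result = JSON.parse(stdout);
  if (result.success !== true || typeof result.videoUrl !== "string") {
    throw new Error("generation did not report success");
  }
  return { taskId: result.taskId, videoUrl: result.videoUrl, elapsed: result.elapsed };
}
```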
| Model | Cost/Clip | Quality | Speed |
|---|---|---|---|
| fal-kling-o3 | ~$2.60 | Best | ~15 min |
| sora-2 | ~$1.20 | Good | ~3 min |
| xai-grok-imagine | ~$1.00 | Decent | ~3 min |
Always show cost estimates before generating. Track running cost during generation.
## ClawdVine Identity

Check for an existing ClawdVine identity before starting the pipeline. This enables portfolio attribution on generated videos.
Check session: If session.json exists and has agent.agentId, use it — skip lookup.
Lookup by wallet:
source .env && node ../../scripts/lookup-agent.mjs
If found, store in session and inform the user:
ClawdVine identity found: "YourName" (8453:606)
All generations will be attributed to your portfolio.
No identity found: The skill works fine without one (anonymous mode). Let the user know:
No ClawdVine identity found for this wallet.
Generations will work but won't be attributed to a portfolio.
To create a ClawdVine identity, use the ClawdVine skill:
https://clawdvine.sh/skill.md
This step is non-blocking — the pipeline continues regardless. Pass the resolved agentId (or empty string) to x402-generate.mjs in all subsequent steps.
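The lookup order described above (session first, then wallet lookup, then anonymous) might be expressed like this. The function name and the shape of the `lookup` argument are assumptions: the source does not show what `lookup-agent.mjs` prints, so the sketch only assumes an object with an `agentId` field, or `null` when nothing was found.

```javascript
// Resolve the agentId to pass to x402-generate.mjs.
// session: parsed session.json or null.
// lookup: hypothetical result of the wallet lookup, or null if none was found.
function resolveAgentId(session, lookup) {
  // A cached identity in the session skips the lookup entirely.
  const cached = session && session.agent ? session.agent.agentId : null;
  if (cached) return { agentId: cached, source: "session" };
  if (lookup && lookup.agentId) return { agentId: lookup.agentId, source: "lookup" };
  // Non-blocking: fall back to anonymous mode with an empty string.
  return { agentId: "", source: "anonymous" };
}
```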
## Session State

All progress is persisted to session.json in the skill working directory. This lets sessions survive context resets and conversation restarts. The workflow should update this file after every meaningful action.
At the start of every conversation, check for an existing session.json:
cat session.json 2>/dev/null | jq '{skill: .skill, step: .current_step, title: (.audio.title // .concept // "untitled"), done: ([.segments[]? | select(.accepted_video_url != null)] | length), total: ([.segments[]?] | length)}'
If a session exists, resume from where it left off — don't re-run completed steps. Present a summary:
Resuming session: "My Own Summer (Shove It)"
Workflow: music-video
Step: style_check_approved
Scenes: 2/6 done | Cost: $7.80
Next: Generate scenes 3–6
[Continue] [Start over] [Review session]
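For readers who prefer to do this in Node rather than jq, the summary query above translates roughly as follows. The function name is hypothetical; the field names are the ones used in the jq expression.

```javascript
// Mirror of the jq summary: skill, step, title, and accepted/total scene counts.
function summarizeSession(session) {
  const segments = Array.isArray(session.segments) ? session.segments : [];
  return {
    skill: session.skill ?? null,
    step: session.current_step ?? null,
    // Same fallback chain as the jq expression: audio title, then concept.
    title: session.audio?.title ?? session.concept ?? "untitled",
    done: segments.filter((s) => s.accepted_video_url != null).length,
    total: segments.length,
  };
}
```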
All workflows share this base schema. Each workflow extends it with additional fields (documented in the workflow's file).
{
"skill": "music-video | dark-anime | <workflow-name>",
"current_step": "<workflow-defined step names>",
"agent": {
"agentId": "8453:606 or null",
"name": "Agent display name or null",
"address": "0x... wallet address"
},
"settings": {
"model": "fal-kling-o3",
"aspect_ratio": "9:16",
"auto_enhance": true
},
"style": {
"style_dna": { "...": "structured style analysis" },
"style_block": "the prompt-ready style block text"
},
"segments": [
{
"frames": ["output/frames/scene-0/frame_1.png", "..."],
"reference_frame": "scene_X_frame_YY% or null"
}
],
"cost_summary": {
"total_spent": 0,
"estimated_remaining": 0
},
"generation_history": []
}
Update session.json after every meaningful action. The fields involved are:

- current_step
- accepted_video_url, evaluation_score, evaluation_verdict, cost_usd on the segment
- generation_history, together with cost_summary.total_spent
- the segment's prompt field
- the settings block

Use jq for surgical updates instead of rewriting the whole file:
# Accept a scene
jq '.segments[0].accepted_video_url = "https://..." | .segments[0].evaluation_score = 19 | .current_step = "generating"' session.json > tmp.$$.json && mv tmp.$$.json session.json
# Update cost
jq '.cost_summary.total_spent = 10.40' session.json > tmp.$$.json && mv tmp.$$.json session.json
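The same two updates can be combined into one pure function when the session is handled in Node. This is a sketch: `acceptScene` is a hypothetical name, and the shape of each `generation_history` entry is an assumption, since the source leaves that array's contents unspecified.

```javascript
// Return a new session object with one scene accepted and costs updated.
function acceptScene(session, index, { videoUrl, score, verdict, costUsd }) {
  if (!Array.isArray(session.segments) || !session.segments[index]) {
    throw new Error(`no segment at index ${index}`);
  }
  const segments = session.segments.map((seg, i) =>
    i === index
      ? { ...seg, accepted_video_url: videoUrl, evaluation_score: score, evaluation_verdict: verdict, cost_usd: costUsd }
      : seg
  );
  const spent = (session.cost_summary?.total_spent ?? 0) + costUsd;
  return {
    ...session,
    segments,
    // Entry shape is illustrative only.
    generation_history: [
      ...(session.generation_history ?? []),
      { segment: index, video_url: videoUrl, cost_usd: costUsd },
    ],
    // Round to cents so repeated additions do not accumulate float noise.
    cost_summary: { ...session.cost_summary, total_spent: Math.round(spent * 100) / 100 },
  };
}
```

When persisting the result, writing to a temporary file and renaming it over session.json, as the jq commands do, avoids leaving a half-written file behind.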
All workflows use the Style DNA + Style Block format for visual consistency.
A structured analysis of the visual identity:
STYLE DNA:
- color_palette: [color1, color2, color3, color4, color5]
- setting: [one sentence]
- texture: [primary texture quality]
- lighting: [primary lighting approach]
- mood: [2-3 descriptors]
- reference_era: [era/aesthetic shorthand]
- visual_motifs: [recurring visual elements]
- style_range: [if applicable]
- avoid: [conflicting elements]
A prompt-ready phrase (25-30 words max) injected at the start of every scene prompt:
STYLE BLOCK:
"[texture], [color palette], [setting], [lighting], [mood]"
This is critical — video models get confused by dense, long prompts. Cut adjectives ruthlessly. The style block is the biggest chunk of the prompt, but it can't crowd out the scene content.
Some workflows have a baked-in style (e.g., dark-anime). Others extract it from user-provided reference images using prompts/style-extraction.md.
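One way to keep the style block within its word budget is to assemble it from the Style DNA slots and reject anything too long. The sketch below assumes hypothetical names (`buildStyleBlock`, `colorPalette`, and so on); the slot order and the 30-word cap come from the format above.

```javascript
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Join the five slots in the documented order and enforce the word cap.
function buildStyleBlock({ texture, colorPalette, setting, lighting, mood }, maxWords = 30) {
  const block = [texture, colorPalette, setting, lighting, mood]
    .map((part) => part.trim())
    .join(", ");
  const words = countWords(block);
  if (words > maxWords) {
    throw new Error(`style block has ${words} words; limit is ${maxWords}`);
  }
  return block;
}

// Example values are invented for illustration.
const exampleBlock = buildStyleBlock({
  texture: "grainy 16mm film",
  colorPalette: "teal and amber",
  setting: "rain-soaked alley",
  lighting: "neon backlight",
  mood: "tense",
});
// exampleBlock is "grainy 16mm film, teal and amber, rain-soaked alley, neon backlight, tense"
```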
Auto-evaluate every generated clip before presenting to the user.
Extract frames (persist for visual memory):
bash scripts/evaluate-scene.sh "<video_url>" "output/frames/scene-<N>"
This downloads the video and extracts 4 frames at 10%, 30%, 60%, 90% timestamps. By specifying a persistent output directory, the frames are saved for use in frame reference chaining (see Visual Memory below).
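To make the sampling points concrete, the timestamps for a clip of a given length can be computed as below. This only illustrates the percentages named above; how `evaluate-scene.sh` computes them internally is not shown in the source, and `frameTimestamps` is a hypothetical helper.

```javascript
const FRAME_PERCENTS = [10, 30, 60, 90];

// Seconds into the clip at which each evaluation frame would be taken.
function frameTimestamps(durationSeconds) {
  if (!(durationSeconds > 0)) throw new Error("duration must be positive");
  return FRAME_PERCENTS.map((pct) => (durationSeconds * pct) / 100);
}

// For a 10-second clip: frameTimestamps(10) gives [1, 3, 6, 9]
```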
Read the extracted frames with vision and score per prompts/scene-evaluation.md — 6 criteria scored 1-5 each:
Decision thresholds:
Before committing to video generation, the workflow can generate a storyboard — one image per scene using the same prompts. This gives the user a fast, cheap visual preview of every scene before spending on video.
~$0.05 USDC per image (Google nano-banana-pro). A 6-scene storyboard costs ~$0.30 total — negligible vs. video generation.
After scene prompts are finalized (scene planning complete), ask the user:
Scene prompts are ready. Want me to generate a storyboard preview first?
This creates one image per scene so you can see the visual direction before
committing to video generation.
Cost: ~$0.30 for 6 images (~$0.05 each)
[Yes, generate storyboard] [Skip, go straight to video]
Image generation uses the ClawdVine generate_image MCP tool powered by Google nano-banana-pro. Cost: $0.05 USDC per image.
For maximum image quality, use the full cinematic still template from prompts/image-prompting.md instead of the short video prompt format. Nano-banana thrives on dense technical detail — camera body, lens, film stock, lighting specs. The 3-variable framework (subject, environment, style) produces dramatically better stills than a generic prompt.
node ../../scripts/x402-image.mjs "<prompt>" "<aspectRatio>" "<agentId>" "output/storyboard/scene-<N>.png"
Generate all scene images (can run sequentially — they're fast). Present the full storyboard to the user:
Storyboard Preview — "Title"
Scene 1: output/storyboard/scene-0.png
Scene 2: output/storyboard/scene-1.png
Scene 3: output/storyboard/scene-2.png
...
[Approve storyboard] [Edit prompts] [Regenerate scene X]
The user reviews the visual direction. They can approve the storyboard, edit the prompts, or regenerate an individual scene.
When a storyboard image nails the composition, the workflow can use it directly as the imageData input for video generation — converting it from a preview into a first-frame anchor. This is a natural bridge into the Visual Memory system:
Passed as imageData, the storyboard image becomes the first frame: the video starts from that exact frame.

The workflow should note in the session which scenes used storyboard-to-video vs. fresh generation:
{
"segments": [
{
"storyboard_image": "output/storyboard/scene-0.png",
"storyboard_used_as_reference": true
}
]
}
{
"storyboard_approved": true,
"segments": [
{
"storyboard_image": "output/storyboard/scene-0.png or null",
"storyboard_used_as_reference": false
}
]
}
## Visual Memory

AI video models use imageData as the starting frame of the generated clip. By selectively passing a frame from a previously accepted scene, we can anchor characters, environments, and visual continuity across clips without relying solely on the text prompt.
Build a frame library. After each scene is accepted, persist the 4 evaluation frames (10%, 30%, 60%, 90%) to a permanent location:
bash scripts/evaluate-scene.sh "<video_url>" "output/frames/scene-<N>"
Store the frame paths in the session under each segment's frames array.
Select a reference frame (or none). Before generating the next scene, read all candidate frames from the library and the upcoming prompt. Follow prompts/frame-selection.md to pick the single best anchor frame — or NONE if fresh generation is better.
The selection considers:
Pass the selected frame (if any) as the imageData argument to x402-generate.mjs:
# With reference frame (base64 encode the selected frame; strip newlines because GNU base64 wraps its output)
IMAGE_DATA=$(base64 < output/frames/scene-2/frame_2.png | tr -d '\n')
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "data:image/png;base64,${IMAGE_DATA}" "true"
# Without reference frame (fresh generation)
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "" "true"
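If the frame is encoded from Node instead of the shell, the data URI used in the command above can be built with the built-in `Buffer`. The helper name `toPngDataUri` is hypothetical; the `data:image/png;base64,` prefix is the one shown in the command.

```javascript
// Encode raw PNG bytes as the data URI passed in the imageData position.
// Buffer's base64 output is a single line, so no newline stripping is needed.
function toPngDataUri(bytes) {
  return `data:image/png;base64,${Buffer.from(bytes).toString("base64")}`;
}
```

In practice the bytes would come from reading the selected frame file, for example with `fs.readFileSync` from `node:fs`.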
| Scenario | Use frame reference? |
|---|---|
| Same character, tighter shot | Yes: anchors appearance |
| Same environment, different angle | Yes: anchors architecture/space |
| Wide establishing shot | Usually no: let it set a fresh baseline |
| Extreme scale jump (wide → extreme close) | No: too big a gap for the model |
| New subject not seen before | No: nothing to anchor to |
| Continuing a push-in or slow pan | Yes: natural extension |
Each accepted segment stores its extracted frames:
{
"segments": [
{
"frames": [
"output/frames/scene-0/frame_1.png",
"output/frames/scene-0/frame_2.png",
"output/frames/scene-0/frame_3.png",
"output/frames/scene-0/frame_4.png"
],
"reference_frame": "scene_2_frame_60%"
}
]
}
The reference_frame field records which frame was used as input (or null for fresh generation) — useful for debugging continuity issues.
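When debugging continuity, it can help to turn a `reference_frame` label back into a frame file. The parser below follows the `scene_X_frame_YY%` pattern from the schema. The mapping from percentage to file name (10% to frame_1.png, 30% to frame_2.png, and so on) is an assumption of this sketch, based only on the four frames being listed in order; both function names are hypothetical.

```javascript
const FRAME_PERCENTS = [10, 30, 60, 90];

// Parse "scene_2_frame_60%" into { scene: 2, percent: 60 }; null means fresh generation.
function parseReferenceFrame(label) {
  if (label === null || label === undefined) return null;
  const match = /^scene_(\d+)_frame_(\d+)%$/.exec(label);
  if (!match) throw new Error(`unrecognized reference_frame: ${label}`);
  return { scene: Number(match[1]), percent: Number(match[2]) };
}

// ASSUMPTION: frame files are numbered in timestamp order (10% is frame_1, 90% is frame_4).
function referenceFramePath(label) {
  const ref = parseReferenceFrame(label);
  if (ref === null) return null;
  const position = FRAME_PERCENTS.indexOf(ref.percent);
  if (position === -1) throw new Error(`no extracted frame at ${ref.percent}%`);
  return `output/frames/scene-${ref.scene}/frame_${position + 1}.png`;
}
```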
All video generation prompts must follow these rules:
[STYLE BLOCK — 25-30 words]
[SHOT TYPE + camera move] [SUBJECT with ACTION] [ENVIRONMENT with dynamic elements].
- autoEnhance: true on all generations

Rotate through shot types across scenes to maintain visual interest:
Wide → Close → Low Angle → Extreme Close → Overhead → Medium Action → POV → Macro
This rotation is especially important for single-location videos where shot variety is the primary source of visual dynamism.
Full prompt construction details (energy/BPM mapping, camera keywords, visual archetypes) are in prompts/video-prompting.md.
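The rotation and the two-part prompt layout can be combined in a small helper. This is a sketch under assumptions: the function names are hypothetical, and the exact punctuation between the shot, subject, and environment parts is a choice made here, not something the source prescribes. The shot order and the under-60-word limit come from this document.

```javascript
const SHOT_ROTATION = [
  "Wide", "Close", "Low Angle", "Extreme Close",
  "Overhead", "Medium Action", "POV", "Macro",
];

// Scene 0 gets "Wide", scene 1 "Close", and the cycle restarts after "Macro".
function shotTypeForScene(sceneIndex) {
  return SHOT_ROTATION[sceneIndex % SHOT_ROTATION.length];
}

// Style block first, then shot + camera move, subject with action, environment.
function buildScenePrompt({ styleBlock, sceneIndex, cameraMove, subjectAction, environment }) {
  const shot = shotTypeForScene(sceneIndex);
  const prompt = `${styleBlock}\n${shot} shot, ${cameraMove}. ${subjectAction}. ${environment}.`;
  const words = prompt.trim().split(/\s+/).length;
  if (words >= 60) {
    throw new Error(`prompt has ${words} words; keep it under 60`);
  }
  return prompt;
}
```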
At any point during generation, build a preview that stitches accepted scenes with audio:
bash scripts/preview.sh [session.json] [output.mp4]
Each clip is trimmed to its target_duration.

When to preview:
Build manifest from session state:
jq '{
clips: [.segments[] | {url: .accepted_video_url, target_duration: .target_duration, generated_duration: .gen_duration}],
audio: .audio.path,
audio_start: (.audio.window_start // 0),
total_duration: (.audio.window_duration // (.segments | map(.target_duration) | add))
}' session.json > manifest.json
Run composition:
bash scripts/compose.sh manifest.json output/<filename>.mp4
The script downloads clips, trims to exact segment durations, concatenates, strips AI audio, overlays the source audio, and outputs the final video.
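The manifest built by the jq expression above has this shape when produced from Node. The function name `buildManifest` is hypothetical; field names follow the jq expression. One deliberate difference is noted in the comments.

```javascript
// Build the compose manifest from session state, mirroring the jq expression.
function buildManifest(session) {
  const segments = Array.isArray(session.segments) ? session.segments : [];
  // Sums to 0 for an empty list, whereas jq's `add` would yield null.
  const summed = segments.reduce((sum, s) => sum + (s.target_duration ?? 0), 0);
  return {
    clips: segments.map((s) => ({
      url: s.accepted_video_url ?? null,
      target_duration: s.target_duration ?? null,
      generated_duration: s.gen_duration ?? null,
    })),
    audio: session.audio?.path ?? null,
    audio_start: session.audio?.window_start ?? 0,
    // Prefer the explicit audio window; otherwise use the summed clip targets.
    total_duration: session.audio?.window_duration ?? summed,
  };
}
```

Comparing `total_duration` with the summed clip targets before composing is a cheap way to spot the sync drift described in the troubleshooting notes.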
| Model | Per 8s Clip | 60s Video (~6 clips) | With 2 Regens |
|---|---|---|---|
| fal-kling-o3 | $2.60 | ~$15.60 | ~$20.80 |
| sora-2 | $1.20 | ~$7.20 | ~$9.60 |
| xai-grok-imagine | $1.00 | ~$6.00 | ~$8.00 |

| Model | Per Image | 10-image Storyboard |
|---|---|---|
| Google nano-banana-pro | $0.05 | ~$0.50 |
Always show the cost estimate before generating. Track running cost during generation.
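The estimate shown to the user can be derived directly from the tables above. The sketch below uses the per-clip prices listed for 8-second clips; pricing for other durations is not given in the source, so the function makes no attempt to scale by length. `estimateCost` is a hypothetical name.

```javascript
// Approximate prices in USD from the tables above (8-second clips).
const CLIP_COST_USD = { "fal-kling-o3": 2.6, "sora-2": 1.2, "xai-grok-imagine": 1.0 };
const IMAGE_COST_USD = 0.05;

// Estimated total for a run: clips, expected regenerations, optional storyboard.
function estimateCost({ model, clips, regens = 0, storyboardImages = 0 }) {
  const perClip = CLIP_COST_USD[model];
  if (perClip === undefined) throw new Error(`unknown model: ${model}`);
  // Work in whole cents to avoid floating-point drift.
  const cents =
    Math.round(perClip * 100) * (clips + regens) +
    Math.round(IMAGE_COST_USD * 100) * storyboardImages;
  return cents / 100;
}

// estimateCost({ model: "fal-kling-o3", clips: 6 }) gives 15.6
// estimateCost({ model: "fal-kling-o3", clips: 6, regens: 2 }) gives 20.8
```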
"ffmpeg not found": Run brew install ffmpeg (macOS) or apt install ffmpeg (Linux).
"No payment key": Set EVM_PRIVATE_KEY=0x... or SOLANA_PRIVATE_KEY=... in your .env file. You need USDC on Base or Solana.
Generation timeout: Kling models (fal-kling-o3) take 7-15 minutes. The script polls for up to 20 minutes. If it still times out, the ClawdVine API may be under heavy load — try again or switch to a faster model (sora-2).
Composition fails with codec mismatch: This can happen if different scenes used different models. The compose script will automatically re-encode in this case, which takes longer but works.
Audio/video sync drift: If the total clip duration doesn't exactly match the audio duration, ffmpeg's -shortest flag ensures the output ends with the shorter stream. Minor drift (< 0.5s) is normal and usually unnoticeable.
"essentia not installed" (music-video workflow only): Run pip install essentia numpy. On macOS with Python 3.9-3.11 this usually works. If it fails, try conda install -c mtg essentia.