Name: skills/shorts-generation
Author: clawdvine

ClawdVine Shorts

Generate AI video shorts using ClawdVine's video generation API with x402 micropayments. This skill supports multiple specialized workflows — each with its own pipeline and creative approach — sharing a common core for video generation, evaluation, and composition.

Workflow Selection

At the start of every conversation, determine which workflow to use.

Check for existing session

cat session.json 2>/dev/null | jq -r '.skill // empty'

If a session exists, resume with the workflow that owns it (see Session State below).

New session

If the user's intent is clear from context, load the matching workflow directly. Otherwise, present the menu:

ClawdVine Shorts — What are we making?

  1. Music Video — mp3 + reference images → audio-synced music video (~60s)
  2. Dark Anime — concept → narrated dark anime short (~30-45s)
  3. Cinematic Trailer — concept → epic landscape trailer (~30-60s)

  Or describe what you want and I'll pick the right workflow.

Once selected, read the workflow's instructions:

workflows/music-video.md
workflows/dark-anime.md
workflows/cinematic-trailers.md

Follow the workflow's pipeline from start to finish. The shared core below applies to all workflows.

Prerequisites

Before starting any workflow pipeline, verify these tools are available:

ffmpeg installed and on PATH (brew install ffmpeg)
Node.js 18+ with dependencies installed (npm install in this directory)
jq for JSON parsing in shell (brew install jq)
A wallet with USDC — set EVM_PRIVATE_KEY or SOLANA_PRIVATE_KEY in .env

Workflow-specific prerequisites (e.g., Python + Essentia for music-video) are listed in each workflow's file.

ClawdVine identity is auto-detected from your wallet (see ClawdVine Identity below). To create a new identity or join via Moltbook, see the ClawdVine skill.

If any prerequisite is missing, tell the user what to install and stop.

ClawdVine API

Video Generation

Generate a video clip using the ClawdVine API via the generation script:

node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "" "true"

Arguments:

prompt — the full scene prompt (style block + action, under 60 words)
model — one of the supported models (see below)
duration — clip length in seconds (8, 10, 15, or 20)
agentId — ClawdVine agent ID (or empty string for anonymous)
aspectRatio — 9:16, 16:9, or 1:1
"" — reserved (leave empty)
"true" — autoEnhance (always true — we craft precise prompts)

Output (JSON to stdout):

{
  "success": true,
  "taskId": "uuid",
  "videoUrl": "https://...",
  "thumbUrl": "https://...",
  "gifUrl": "https://...",
  "shareUrl": "https://clawdvine.sh/media/{taskId}",
  "elapsed": 45
}

Supported Models

| Model | Cost/Clip | Quality | Speed | |---|---|---|---| | fal-kling-o3 | ~$2.60 | Best | ~15 min | | sora-2 | ~$1.20 | Good | ~3 min | | xai-grok-imagine | ~$1.00 | Decent | ~3 min |

Always show cost estimates before generating. Track running cost during generation.

ClawdVine Identity

Check for an existing ClawdVine identity before starting the pipeline. This enables portfolio attribution on generated videos.

Check session: If session.json exists and has agent.agentId, use it — skip lookup.

Lookup by wallet:

source .env && node ../../scripts/lookup-agent.mjs

If found, store in session and inform the user:

ClawdVine identity found: "YourName" (8453:606)
All generations will be attributed to your portfolio.

No identity found: The skill works fine without one (anonymous mode). Let the user know:

No ClawdVine identity found for this wallet.
Generations will work but won't be attributed to a portfolio.

To create a ClawdVine identity, use the ClawdVine skill:
https://clawdvine.sh/skill.md

This step is non-blocking — the pipeline continues regardless. Pass the resolved agentId (or empty string) to x402-generate.mjs in all subsequent steps.

Session State

All progress is persisted to session.json in the skill working directory. This lets sessions survive context resets and conversation restarts. The workflow should update this file after every meaningful action.

Loading a session

At the start of every conversation, check for an existing session.json:

cat session.json 2>/dev/null | jq '{skill: .skill, step: .current_step, title: (.audio.title // .concept // "untitled"), done: ([.segments[]? | select(.accepted_video_url != null)] | length), total: ([.segments[]?] | length)}'

If a session exists, resume from where it left off — don't re-run completed steps. Present a summary:

Resuming session: "My Own Summer (Shove It)"
  Workflow: music-video
  Step: style_check_approved
  Scenes: 2/6 done | Cost: $7.80
  Next: Generate scenes 3–6

  [Continue] [Start over] [Review session]

Base session schema

All workflows share this base schema. Each workflow extends it with additional fields (documented in the workflow's file).

{
  "skill": "music-video | dark-anime | <workflow-name>",
  "current_step": "<workflow-defined step names>",

  "agent": {
    "agentId": "8453:606 or null",
    "name": "Agent display name or null",
    "address": "0x... wallet address"
  },

  "settings": {
    "model": "fal-kling-o3",
    "aspect_ratio": "9:16",
    "auto_enhance": true
  },

  "style": {
    "style_dna": { "...": "structured style analysis" },
    "style_block": "the prompt-ready style block text"
  },

  "segments": [
    {
      "frames": ["output/frames/scene-0/frame_1.png", "..."],
      "reference_frame": "scene_X_frame_YY% or null"
    }
  ],

  "cost_summary": {
    "total_spent": 0,
    "estimated_remaining": 0
  },

  "generation_history": []
}

Updating session state

Update session.json after:

Step completion: Update current_step
Scene accepted: Set accepted_video_url, evaluation_score, evaluation_verdict, cost_usd on the segment
Scene rejected/revised: Add entry to generation_history, update cost_summary.total_spent
Prompt edited: Update the segment's prompt field
Settings changed: Update settings block

Use jq for surgical updates instead of rewriting the whole file:

# Accept a scene
jq '.segments[0].accepted_video_url = "https://..." | .segments[0].evaluation_score = 19 | .current_step = "generating"' session.json > tmp.$$.json && mv tmp.$$.json session.json

# Update cost
jq '.cost_summary.total_spent = 10.40' session.json > tmp.$$.json && mv tmp.$$.json session.json

Style System

All workflows use the Style DNA + Style Block format for visual consistency.

Style DNA

A structured analysis of the visual identity:

STYLE DNA:
- color_palette: [color1, color2, color3, color4, color5]
- setting: [one sentence]
- texture: [primary texture quality]
- lighting: [primary lighting approach]
- mood: [2-3 descriptors]
- reference_era: [era/aesthetic shorthand]
- visual_motifs: [recurring visual elements]
- style_range: [if applicable]
- avoid: [conflicting elements]

Style Block

A prompt-ready phrase (25-30 words max) injected at the start of every scene prompt:

STYLE BLOCK:
"[texture], [color palette], [setting], [lighting], [mood]"

This is critical — video models get confused by dense, long prompts. Cut adjectives ruthlessly. The style block is the biggest chunk of the prompt, but it can't crowd out the scene content.

Some workflows have a baked-in style (e.g., dark-anime). Others extract it from user-provided reference images using prompts/style-extraction.md.

Scene Evaluation

Auto-evaluate every generated clip before presenting to the user.

Extract frames (persist for visual memory):
```
bash scripts/evaluate-scene.sh "<video_url>" "output/frames/scene-<N>"
```
This downloads the video and extracts 4 frames at 10%, 30%, 60%, 90% timestamps. By specifying a persistent output directory, the frames are saved for use in frame reference chaining (see Visual Memory below).
Read the extracted frames with vision and score per prompts/scene-evaluation.md — 6 criteria scored 1-5 each:
- Prompt Adherence
- Style Match
- Readability
- Visual Dynamism
- Continuity Fit (skip for scene 1)
- Shot Variety (skip for scene 1)
Decision thresholds:
- First scene (out of 20): PASS >= 14, REVIEW 10-13, FAIL < 10
- Subsequent scenes (out of 30): PASS >= 21, REVIEW 15-20, FAIL < 15
- FAIL triggers auto-revision and regeneration before showing the user
- REVIEW notes the issues when presenting

Storyboard Preview (Optional)

Before committing to video generation, the workflow can generate a storyboard — one image per scene using the same prompts. This gives the user a fast, cheap visual preview of every scene before spending on video.

Cost

~$0.05 USDC per image (Google nano-banana-pro). A 6-scene storyboard costs ~$0.30 total — negligible vs. video generation.

When to offer

After scene prompts are finalized (scene planning complete), ask the user:

Scene prompts are ready. Want me to generate a storyboard preview first?
This creates one image per scene so you can see the visual direction before
committing to video generation.

Cost: ~$0.30 for 6 images (~$0.05 each)

[Yes, generate storyboard] [Skip, go straight to video]

How it works

Image generation uses the ClawdVine generate_image MCP tool powered by Google nano-banana-pro. Cost: $0.05 USDC per image.

For maximum image quality, use the full cinematic still template from prompts/image-prompting.md instead of the short video prompt format. Nano-banana thrives on dense technical detail — camera body, lens, film stock, lighting specs. The 3-variable framework (subject, environment, style) produces dramatically better stills than a generic prompt.

node ../../scripts/x402-image.mjs "<prompt>" "<aspectRatio>" "<agentId>" "output/storyboard/scene-<N>.png"

Generate all scene images (can run sequentially — they're fast). Present the full storyboard to the user:

Storyboard Preview — "Title"

Scene 1: output/storyboard/scene-0.png
Scene 2: output/storyboard/scene-1.png
Scene 3: output/storyboard/scene-2.png
...

[Approve storyboard] [Edit prompts] [Regenerate scene X]

The user reviews the visual direction. They can:

Approve → proceed to video generation
Edit prompts → revise scene prompts, regenerate affected storyboard images
Regenerate → re-roll a specific scene image (~$0.05)

Storyboard → Image-to-Video

When a storyboard image nails the composition, the workflow can use it directly as the imageData input for video generation — converting it from a preview into a first-frame anchor. This is a natural bridge into the Visual Memory system:

Storyboard image looks perfect → pass it as imageData → video starts from that exact frame
Storyboard image is close but not right → generate video fresh (text-to-video), rely on style block

The workflow should note in the session which scenes used storyboard-to-video vs. fresh generation:

{
  "segments": [
    {
      "storyboard_image": "output/storyboard/scene-0.png",
      "storyboard_used_as_reference": true
    }
  ]
}

Session schema

{
  "storyboard_approved": true,
  "segments": [
    {
      "storyboard_image": "output/storyboard/scene-0.png or null",
      "storyboard_used_as_reference": false
    }
  ]
}

Visual Memory — Frame Reference Chaining

AI video models use imageData as the starting frame of the generated clip. By selectively passing a frame from a previously accepted scene, we can anchor characters, environments, and visual continuity across clips without relying solely on the text prompt.

How it works

Build a frame library. After each scene is accepted, persist the 4 evaluation frames (10%, 30%, 60%, 90%) to a permanent location:
```
bash scripts/evaluate-scene.sh "<video_url>" "output/frames/scene-<N>"
```
Store the frame paths in the session under each segment's frames array.
Select a reference frame (or none). Before generating the next scene, read all candidate frames from the library and the upcoming prompt. Follow prompts/frame-selection.md to pick the single best anchor frame — or NONE if fresh generation is better.

The selection considers:
- Subject match — does a prior frame contain the character/environment the next scene needs?
- Shot scale compatibility — can the model naturally animate from this composition to the target?
- Frame quality — avoid motion blur, transitional moments, intro/outro artifacts
- Narrative continuity — does reusing this visual strengthen the story?

Pass the selected frame (if any) as the imageData argument to x402-generate.mjs:

# With reference frame (base64 encode the selected frame)
IMAGE_DATA=$(base64 -i output/frames/scene-2/frame_2.png)
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "data:image/png;base64,${IMAGE_DATA}" "true"

# Without reference frame (fresh generation)
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "" "true"

When to use vs. skip

| Scenario | Use frame reference? | |---|---| | Same character, tighter shot | Yes — anchors appearance | | Same environment, different angle | Yes — anchors architecture/space | | Wide establishing shot | Usually no — let it set a fresh baseline | | Extreme scale jump (wide → extreme close) | No — too big a gap for the model | | New subject not seen before | No — nothing to anchor to | | Continuing a push-in or slow pan | Yes — natural extension |

Session schema

Each accepted segment stores its extracted frames:

{
  "segments": [
    {
      "frames": [
        "output/frames/scene-0/frame_1.png",
        "output/frames/scene-0/frame_2.png",
        "output/frames/scene-0/frame_3.png",
        "output/frames/scene-0/frame_4.png"
      ],
      "reference_frame": "scene_2_frame_60%"
    }
  ]
}

The reference_frame field records which frame was used as input (or null for fresh generation) — useful for debugging continuity issues.

Prompt Constraints

All video generation prompts must follow these rules:

Total prompt: under 60 words (most critical rule)
Style block: 25-30 words (injected verbatim at the start)
Action description: ~25 words (the remaining budget)

Prompt structure:

[STYLE BLOCK — 25-30 words]
[SHOT TYPE + camera move] [SUBJECT with ACTION] [ENVIRONMENT with dynamic elements].

autoEnhance: true on all generations

Shot Variety

Rotate through shot types across scenes to maintain visual interest:

Wide → Close → Low Angle → Extreme Close → Overhead → Medium Action → POV → Macro

This rotation is especially important for single-location videos where shot variety is the primary source of visual dynamism.

Full prompt construction details (energy/BPM mapping, camera keywords, visual archetypes) are in prompts/video-prompting.md.

Composition

Preview (during generation)

At any point during generation, build a preview that stitches accepted scenes with audio:

bash scripts/preview.sh [session.json] [output.mp4]

Downloads and trims each accepted scene to its target_duration
Inserts black frames for segments not yet generated
Extracts the audio window from the source audio
Muxes audio onto video

When to preview:

After the Style Check Gate (2/N scenes)
After every 2-3 accepted scenes during generation
Before final composition

Final Composition

Build manifest from session state:

jq '{
  clips: [.segments[] | {url: .accepted_video_url, target_duration: .target_duration, generated_duration: .gen_duration}],
  audio: .audio.path,
  audio_start: (.audio.window_start // 0),
  total_duration: (.audio.window_duration // (.segments | map(.target_duration) | add))
}' session.json > manifest.json

Run composition:
```
bash scripts/compose.sh manifest.json output/<filename>.mp4
```
The script downloads clips, trims to exact segment durations, concatenates, strips AI audio, overlays the source audio, and outputs the final video.

Cost Reference

Video Generation

| Model | Per 8s Clip | 60s Video (~6 clips) | With 2 Regens | |---|---|---|---| | fal-kling-o3 | $2.60 | ~$15.60 | ~$20.80 | | sora-2 | $1.20 | ~$7.20 | ~$9.60 | | xai-grok-imagine | $1.00 | ~$6.00 | ~$8.00 |

Image Generation

| Model | Per Image | 10-image Storyboard | |---|---|---| | Google nano-banana-pro | $0.05 | ~$0.50 |

Always show the cost estimate before generating. Track running cost during generation.

Troubleshooting

"ffmpeg not found": Run brew install ffmpeg (macOS) or apt install ffmpeg (Linux).

"No payment key": Set EVM_PRIVATE_KEY=0x... or SOLANA_PRIVATE_KEY=... in your .env file. You need USDC on Base or Solana.

Generation timeout: Kling models (fal-kling-o3) take 7-15 minutes. The script polls for up to 20 minutes. If it still times out, the ClawdVine API may be under heavy load — try again or switch to a faster model (sora-2).

Composition fails with codec mismatch: This can happen if different scenes used different models. The compose script will automatically re-encode in this case, which takes longer but works.

Audio/video sync drift: If the total clip duration doesn't exactly match the audio duration, ffmpeg's -shortest flag ensures the output ends with the shorter stream. Minor drift (< 0.5s) is normal and usually unnoticeable.

"essentia not installed" (music-video workflow only): Run pip install essentia numpy. On macOS with Python 3.9-3.11 this usually works. If it fails, try conda install -c mtg essentia.

ClawdVine Shorts

Workflow Selection

At the start of every conversation, determine which workflow to use.

Check for existing session

cat session.json 2>/dev/null | jq -r '.skill // empty'

If a session exists, resume with the workflow that owns it (see Session State below).

New session

If the user's intent is clear from context, load the matching workflow directly. Otherwise, present the menu:

ClawdVine Shorts — What are we making?

  1. Music Video — mp3 + reference images → audio-synced music video (~60s)
  2. Dark Anime — concept → narrated dark anime short (~30-45s)
  3. Cinematic Trailer — concept → epic landscape trailer (~30-60s)

  Or describe what you want and I'll pick the right workflow.

Once selected, read the workflow's instructions:

workflows/music-video.md
workflows/dark-anime.md
workflows/cinematic-trailers.md

Follow the workflow's pipeline from start to finish. The shared core below applies to all workflows.

Prerequisites

Before starting any workflow pipeline, verify these tools are available:

ffmpeg installed and on PATH (brew install ffmpeg)
Node.js 18+ with dependencies installed (npm install in this directory)
jq for JSON parsing in shell (brew install jq)
A wallet with USDC — set EVM_PRIVATE_KEY or SOLANA_PRIVATE_KEY in .env

Workflow-specific prerequisites (e.g., Python + Essentia for music-video) are listed in each workflow's file.

ClawdVine identity is auto-detected from your wallet (see ClawdVine Identity below). To create a new identity or join via Moltbook, see the ClawdVine skill.

If any prerequisite is missing, tell the user what to install and stop.

ClawdVine API

Video Generation

Generate a video clip using the ClawdVine API via the generation script:

node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "" "true"

Arguments:

prompt — the full scene prompt (style block + action, under 60 words)
model — one of the supported models (see below)
duration — clip length in seconds (8, 10, 15, or 20)
agentId — ClawdVine agent ID (or empty string for anonymous)
aspectRatio — 9:16, 16:9, or 1:1
"" — reserved (leave empty)
"true" — autoEnhance (always true — we craft precise prompts)

Output (JSON to stdout):

{
  "success": true,
  "taskId": "uuid",
  "videoUrl": "https://...",
  "thumbUrl": "https://...",
  "gifUrl": "https://...",
  "shareUrl": "https://clawdvine.sh/media/{taskId}",
  "elapsed": 45
}

Supported Models

| Model | Cost/Clip | Quality | Speed | |---|---|---|---| | fal-kling-o3 | ~$2.60 | Best | ~15 min | | sora-2 | ~$1.20 | Good | ~3 min | | xai-grok-imagine | ~$1.00 | Decent | ~3 min |

Always show cost estimates before generating. Track running cost during generation.

ClawdVine Identity

Check for an existing ClawdVine identity before starting the pipeline. This enables portfolio attribution on generated videos.

Check session: If session.json exists and has agent.agentId, use it — skip lookup.

Lookup by wallet:

source .env && node ../../scripts/lookup-agent.mjs

If found, store in session and inform the user:

ClawdVine identity found: "YourName" (8453:606)
All generations will be attributed to your portfolio.

No identity found: The skill works fine without one (anonymous mode). Let the user know:

No ClawdVine identity found for this wallet.
Generations will work but won't be attributed to a portfolio.

To create a ClawdVine identity, use the ClawdVine skill:
https://clawdvine.sh/skill.md

This step is non-blocking — the pipeline continues regardless. Pass the resolved agentId (or empty string) to x402-generate.mjs in all subsequent steps.

Session State

Loading a session

At the start of every conversation, check for an existing session.json:

cat session.json 2>/dev/null | jq '{skill: .skill, step: .current_step, title: (.audio.title // .concept // "untitled"), done: ([.segments[]? | select(.accepted_video_url != null)] | length), total: ([.segments[]?] | length)}'

If a session exists, resume from where it left off — don't re-run completed steps. Present a summary:

Resuming session: "My Own Summer (Shove It)"
  Workflow: music-video
  Step: style_check_approved
  Scenes: 2/6 done | Cost: $7.80
  Next: Generate scenes 3–6

  [Continue] [Start over] [Review session]

Base session schema

All workflows share this base schema. Each workflow extends it with additional fields (documented in the workflow's file).

{
  "skill": "music-video | dark-anime | <workflow-name>",
  "current_step": "<workflow-defined step names>",

  "agent": {
    "agentId": "8453:606 or null",
    "name": "Agent display name or null",
    "address": "0x... wallet address"
  },

  "settings": {
    "model": "fal-kling-o3",
    "aspect_ratio": "9:16",
    "auto_enhance": true
  },

  "style": {
    "style_dna": { "...": "structured style analysis" },
    "style_block": "the prompt-ready style block text"
  },

  "segments": [
    {
      "frames": ["output/frames/scene-0/frame_1.png", "..."],
      "reference_frame": "scene_X_frame_YY% or null"
    }
  ],

  "cost_summary": {
    "total_spent": 0,
    "estimated_remaining": 0
  },

  "generation_history": []
}

Updating session state

Update session.json after:

Step completion: Update current_step
Scene accepted: Set accepted_video_url, evaluation_score, evaluation_verdict, cost_usd on the segment
Scene rejected/revised: Add entry to generation_history, update cost_summary.total_spent
Prompt edited: Update the segment's prompt field
Settings changed: Update settings block

Use jq for surgical updates instead of rewriting the whole file:

# Accept a scene
jq '.segments[0].accepted_video_url = "https://..." | .segments[0].evaluation_score = 19 | .current_step = "generating"' session.json > tmp.$$.json && mv tmp.$$.json session.json

# Update cost
jq '.cost_summary.total_spent = 10.40' session.json > tmp.$$.json && mv tmp.$$.json session.json

Style System

All workflows use the Style DNA + Style Block format for visual consistency.

Style DNA

A structured analysis of the visual identity:

STYLE DNA:
- color_palette: [color1, color2, color3, color4, color5]
- setting: [one sentence]
- texture: [primary texture quality]
- lighting: [primary lighting approach]
- mood: [2-3 descriptors]
- reference_era: [era/aesthetic shorthand]
- visual_motifs: [recurring visual elements]
- style_range: [if applicable]
- avoid: [conflicting elements]

Style Block

A prompt-ready phrase (25-30 words max) injected at the start of every scene prompt:

STYLE BLOCK:
"[texture], [color palette], [setting], [lighting], [mood]"

This is critical — video models get confused by dense, long prompts. Cut adjectives ruthlessly. The style block is the biggest chunk of the prompt, but it can't crowd out the scene content.

Some workflows have a baked-in style (e.g., dark-anime). Others extract it from user-provided reference images using prompts/style-extraction.md.

Scene Evaluation

Auto-evaluate every generated clip before presenting to the user.

Extract frames (persist for visual memory):
```
bash scripts/evaluate-scene.sh "<video_url>" "output/frames/scene-<N>"
```
This downloads the video and extracts 4 frames at 10%, 30%, 60%, 90% timestamps. By specifying a persistent output directory, the frames are saved for use in frame reference chaining (see Visual Memory below).
Read the extracted frames with vision and score per prompts/scene-evaluation.md — 6 criteria scored 1-5 each:
- Prompt Adherence
- Style Match
- Readability
- Visual Dynamism
- Continuity Fit (skip for scene 1)
- Shot Variety (skip for scene 1)
Decision thresholds:
- First scene (out of 20): PASS >= 14, REVIEW 10-13, FAIL < 10
- Subsequent scenes (out of 30): PASS >= 21, REVIEW 15-20, FAIL < 15
- FAIL triggers auto-revision and regeneration before showing the user
- REVIEW notes the issues when presenting

Storyboard Preview (Optional)

Cost

~$0.05 USDC per image (Google nano-banana-pro). A 6-scene storyboard costs ~$0.30 total — negligible vs. video generation.

When to offer

After scene prompts are finalized (scene planning complete), ask the user:

Scene prompts are ready. Want me to generate a storyboard preview first?
This creates one image per scene so you can see the visual direction before
committing to video generation.

Cost: ~$0.30 for 6 images (~$0.05 each)

[Yes, generate storyboard] [Skip, go straight to video]

How it works

Image generation uses the ClawdVine generate_image MCP tool powered by Google nano-banana-pro. Cost: $0.05 USDC per image.

node ../../scripts/x402-image.mjs "<prompt>" "<aspectRatio>" "<agentId>" "output/storyboard/scene-<N>.png"

Generate all scene images (can run sequentially — they're fast). Present the full storyboard to the user:

Storyboard Preview — "Title"

Scene 1: output/storyboard/scene-0.png
Scene 2: output/storyboard/scene-1.png
Scene 3: output/storyboard/scene-2.png
...

[Approve storyboard] [Edit prompts] [Regenerate scene X]

The user reviews the visual direction. They can:

Approve → proceed to video generation
Edit prompts → revise scene prompts, regenerate affected storyboard images
Regenerate → re-roll a specific scene image (~$0.05)

Storyboard → Image-to-Video

Storyboard image looks perfect → pass it as imageData → video starts from that exact frame
Storyboard image is close but not right → generate video fresh (text-to-video), rely on style block

The workflow should note in the session which scenes used storyboard-to-video vs. fresh generation:

{
  "segments": [
    {
      "storyboard_image": "output/storyboard/scene-0.png",
      "storyboard_used_as_reference": true
    }
  ]
}

Session schema

{
  "storyboard_approved": true,
  "segments": [
    {
      "storyboard_image": "output/storyboard/scene-0.png or null",
      "storyboard_used_as_reference": false
    }
  ]
}

Visual Memory — Frame Reference Chaining

How it works

Build a frame library. After each scene is accepted, persist the 4 evaluation frames (10%, 30%, 60%, 90%) to a permanent location:
```
bash scripts/evaluate-scene.sh "<video_url>" "output/frames/scene-<N>"
```
Store the frame paths in the session under each segment's frames array.
Select a reference frame (or none). Before generating the next scene, read all candidate frames from the library and the upcoming prompt. Follow prompts/frame-selection.md to pick the single best anchor frame — or NONE if fresh generation is better.

The selection considers:
- Subject match — does a prior frame contain the character/environment the next scene needs?
- Shot scale compatibility — can the model naturally animate from this composition to the target?
- Frame quality — avoid motion blur, transitional moments, intro/outro artifacts
- Narrative continuity — does reusing this visual strengthen the story?

Pass the selected frame (if any) as the imageData argument to x402-generate.mjs:

# With reference frame (base64 encode the selected frame)
IMAGE_DATA=$(base64 -i output/frames/scene-2/frame_2.png)
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "data:image/png;base64,${IMAGE_DATA}" "true"

# Without reference frame (fresh generation)
node ../../scripts/x402-generate.mjs "<prompt>" "<model>" <duration> "<agentId>" "<aspectRatio>" "" "true"

When to use vs. skip

Session schema

Each accepted segment stores its extracted frames:

{
  "segments": [
    {
      "frames": [
        "output/frames/scene-0/frame_1.png",
        "output/frames/scene-0/frame_2.png",
        "output/frames/scene-0/frame_3.png",
        "output/frames/scene-0/frame_4.png"
      ],
      "reference_frame": "scene_2_frame_60%"
    }
  ]
}

The reference_frame field records which frame was used as input (or null for fresh generation) — useful for debugging continuity issues.

Prompt Constraints

All video generation prompts must follow these rules:

Total prompt: under 60 words (most critical rule)
Style block: 25-30 words (injected verbatim at the start)
Action description: ~25 words (the remaining budget)

Prompt structure:

[STYLE BLOCK — 25-30 words]
[SHOT TYPE + camera move] [SUBJECT with ACTION] [ENVIRONMENT with dynamic elements].

autoEnhance: true on all generations

Shot Variety

Rotate through shot types across scenes to maintain visual interest:

Wide → Close → Low Angle → Extreme Close → Overhead → Medium Action → POV → Macro

This rotation is especially important for single-location videos where shot variety is the primary source of visual dynamism.

Full prompt construction details (energy/BPM mapping, camera keywords, visual archetypes) are in prompts/video-prompting.md.

Composition

Preview (during generation)

At any point during generation, build a preview that stitches accepted scenes with audio:

bash scripts/preview.sh [session.json] [output.mp4]

Downloads and trims each accepted scene to its target_duration
Inserts black frames for segments not yet generated
Extracts the audio window from the source audio
Muxes audio onto video

When to preview:

After the Style Check Gate (2/N scenes)
After every 2-3 accepted scenes during generation
Before final composition

Final Composition

Build manifest from session state:

jq '{
  clips: [.segments[] | {url: .accepted_video_url, target_duration: .target_duration, generated_duration: .gen_duration}],
  audio: .audio.path,
  audio_start: (.audio.window_start // 0),
  total_duration: (.audio.window_duration // (.segments | map(.target_duration) | add))
}' session.json > manifest.json

Run composition:
```
bash scripts/compose.sh manifest.json output/<filename>.mp4
```
The script downloads clips, trims to exact segment durations, concatenates, strips AI audio, overlays the source audio, and outputs the final video.

Cost Reference

Video Generation

Image Generation

| Model | Per Image | 10-image Storyboard | |---|---|---| | Google nano-banana-pro | $0.05 | ~$0.50 |

Always show the cost estimate before generating. Track running cost during generation.

Troubleshooting

"ffmpeg not found": Run brew install ffmpeg (macOS) or apt install ffmpeg (Linux).

"No payment key": Set EVM_PRIVATE_KEY=0x... or SOLANA_PRIVATE_KEY=... in your .env file. You need USDC on Base or Solana.

Composition fails with codec mismatch: This can happen if different scenes used different models. The compose script will automatically re-encode in this case, which takes longer but works.

"essentia not installed" (music-video workflow only): Run pip install essentia numpy. On macOS with Python 3.9-3.11 this usually works. If it fails, try conda install -c mtg essentia.

Adoption

clawdvine/skills/shorts-generation

$ install --global

Security Scan Results

SKILL.md

ClawdVine Shorts

Workflow Selection

Check for existing session

New session

Prerequisites

ClawdVine API

Video Generation

Supported Models

ClawdVine Identity

Session State

Loading a session

Base session schema

Updating session state

Style System

Style DNA

Style Block

Scene Evaluation

Storyboard Preview (Optional)

Cost

When to offer

How it works

Storyboard → Image-to-Video

Session schema

Visual Memory — Frame Reference Chaining

How it works

When to use vs. skip

Session schema

Prompt Constraints

Shot Variety

Composition

Preview (during generation)

Final Composition

Cost Reference

Video Generation

Image Generation

Troubleshooting

Related Skills

clawdvine/youtube-upload

clawdvine/skills/video-post-processing

clawdvine/tiktok-upload

clawdvine/skills/clawdvine

clawdvine/skills/shorts-generation

$ install --global

Security Scan Results

SKILL.md

ClawdVine Shorts

Workflow Selection

Check for existing session

New session

Prerequisites

ClawdVine API

Video Generation

Supported Models

ClawdVine Identity

Session State

Loading a session

Base session schema

Updating session state

Style System

Style DNA

Style Block

Scene Evaluation

Storyboard Preview (Optional)

Cost

When to offer

How it works

Storyboard → Image-to-Video

Session schema

Visual Memory — Frame Reference Chaining

How it works

When to use vs. skip

Session schema

Prompt Constraints

Shot Variety

Composition