tools/video/seedance/SKILL.md
Generate videos with ByteDance Seedance 2.0 via inference.sh CLI. Unified model for text-to-video, image-to-video, and reference-to-video with synchronized audio, up to 1080p, 4-15s duration. Pro and Fast variants. Studio variants with private asset library for portrait consistency. Use for: social media videos, music videos, product demos, animated content, AI video with sound. Triggers: seedance, seedance 2, bytedance video, seedance t2v, seedance i2v, seedance r2v, video with audio, seedance 2.0, bytedance seedance, seedance studio
Install the belt CLI skill:
```bash
npx skills add belt-sh/cli
```
Generate videos with synchronized audio using ByteDance's Seedance 2.0 via inference.sh CLI.
Requires the inference.sh CLI (`belt`); see its install instructions.
```bash
belt login
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "a jazz band performing in a dimly lit club",
  "generate_audio": true
}'
```
| Model | App ID | Best For |
|-------|--------|----------|
| Seedance 2.0 | bytedance/seedance-2-0 | Best quality, up to 1080p |
| Seedance 2.0 Fast | bytedance/seedance-2-0-fast | Faster generation, up to 720p |
| Seedance 2.0 Studio | bytedance/seedance-2-0-studio | Quality + private asset library for portrait consistency |
| Seedance 2.0 Studio Fast | bytedance/seedance-2-0-studio-fast | Fast + private asset library for portrait consistency |
All models support text-to-video, image-to-video, multimodal reference-to-video, and synchronized audio generation. Studio variants automatically upload reference images to the BytePlus private virtual portrait library for enhanced character consistency, which is particularly useful for faces and branded characters.
The model determines the generation mode from your inputs. The first-frame/last-frame inputs and the reference inputs are mutually exclusive: use one group or the other, not both.
| Mode | Inputs | Description |
|------|--------|-------------|
| Text-to-Video | prompt only | Generate video from text description |
| Image-to-Video | prompt + image | Animate a still image (first frame) |
| First+Last Frame | prompt + image + end_image | Control start and end frames |
| Multimodal Reference | prompt + reference_images/reference_videos/reference_audios | Guide generation with reference material |
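The selection rules in the table can be sketched as a small shell helper. This only illustrates the rules stated above and is not the service's actual logic; `seedance_mode` and its arguments are hypothetical names, not part of the belt CLI.

```bash
# Hypothetical helper (not part of belt): report which generation mode a set
# of inputs would select, following the mode table above.
# Usage: seedance_mode <image-or-empty> <end-image-or-empty> <reference-count>
seedance_mode() {
  local image="$1" end_image="$2" ref_count="${3:-0}"

  # First/last-frame inputs and reference inputs are mutually exclusive.
  if [ -n "$image$end_image" ] && [ "$ref_count" -gt 0 ]; then
    echo "error: use image/end_image OR reference inputs, not both" >&2
    return 1
  fi
  # end_image is only valid together with a first-frame image.
  if [ -n "$end_image" ] && [ -z "$image" ]; then
    echo "error: end_image requires image" >&2
    return 1
  fi

  if [ "$ref_count" -gt 0 ]; then
    echo "multimodal-reference"
  elif [ -n "$end_image" ]; then
    echo "first-last-frame"
  elif [ -n "$image" ]; then
    echo "image-to-video"
  else
    echo "text-to-video"
  fi
}

seedance_mode "" "" 0
seedance_mode "https://start-frame.jpg" "https://end-frame.jpg" 0
```

Running a check like this before calling `belt app run` catches a mixed-input request locally instead of after submission.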
```bash
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "ocean waves crashing on rocks during a storm, dramatic cinematic shot",
  "generate_audio": true,
  "duration": 10,
  "ratio": "16:9"
}'
```

```bash
belt app run bytedance/seedance-2-0-fast --input '{
  "prompt": "a butterfly landing on a flower in slow motion",
  "generate_audio": true
}'
```
Animate a still image into a video:
```bash
belt app run bytedance/seedance-2-0 --input '{
  "image": "https://your-image.jpg",
  "prompt": "gentle camera movement, leaves rustling in the wind",
  "generate_audio": true
}'
```

```bash
belt app run bytedance/seedance-2-0 --input '{
  "image": "https://start-frame.jpg",
  "end_image": "https://end-frame.jpg",
  "prompt": "smooth transition between scenes",
  "generate_audio": true
}'
```
Use multiple reference images to guide character appearance, outfits, and scene elements:
```bash
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "The girl from Image 1 wearing the outfit from Image 2 walks through the cafe from Image 3",
  "reference_images": [
    "https://character-portrait.jpg",
    "https://outfit-reference.jpg",
    "https://cafe-scene.jpg"
  ],
  "generate_audio": true,
  "duration": 8
}'
```

```bash
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "Replace the perfume in Video 1 with the face cream from Image 1, preserving all original motions and camera work",
  "reference_images": ["https://face-cream.jpg"],
  "reference_videos": ["https://original-video.mp4"],
  "generate_audio": true
}'
```

```bash
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "Video 1 transitions smoothly into Video 2, then the camera enters the painting from Video 3",
  "reference_videos": [
    "https://clip1.mp4",
    "https://clip2.mp4",
    "https://clip3.mp4"
  ],
  "generate_audio": true,
  "duration": 8
}'
```

```bash
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "The musician from Image 1 performs the song from Audio 1, voice style referenced from Audio 1",
  "reference_images": ["https://musician.jpg"],
  "reference_audios": ["https://music.mp3"],
  "generate_audio": true
}'
```
Studio variants upload images to BytePlus's private asset library for enhanced face/character consistency:
```bash
belt app run bytedance/seedance-2-0-studio --input '{
  "prompt": "The person in Image 1 smiles at the camera, golden hour lighting, cinematic",
  "reference_images": ["https://portrait.jpg"],
  "safety_identifier": "user-abc123",
  "generate_audio": true
}'
```
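The parameter table recommends a hash of the user ID for `safety_identifier`, with a 64-character limit. One way to produce such a value, as a sketch: a SHA-256 hex digest is exactly 64 characters long, so it fits the limit. The `safety_id` helper is a hypothetical name, and the choice of SHA-256 is an assumption; the source does not prescribe a hash function.

```bash
# Hypothetical helper: derive a safety_identifier from an internal user ID.
# Assumption: SHA-256 is acceptable; its hex digest is 64 characters long.
safety_id() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    # Fallback for systems that provide shasum instead of sha256sum.
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

id="$(safety_id "user-abc123")"
echo "${#id}"   # prints 64, the digest length in characters
```

The resulting value can be pasted into the `safety_identifier` field in place of a raw user ID.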
```bash
belt app run bytedance/seedance-2-0 --input '{
  "prompt": "First-person POV product ad. Opening frame is Image 1, hand picks up the product. Camera pushes into close-up showing details. Use the camera movement style from Video 1. Background music from Audio 1.",
  "reference_images": ["https://product-hero.jpg", "https://product-detail.jpg"],
  "reference_videos": ["https://camera-style.mp4"],
  "reference_audios": ["https://bgm.mp3"],
  "generate_audio": true,
  "ratio": "9:16",
  "duration": 11
}'
```
Reference assets in your prompt using type + index: Image 1, Image 2, Video 1, Audio 1. The index is the position within that type in the arrays you provide. Do NOT use asset IDs in prompts.
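To make the type + index rule concrete, the sketch below prints the label each asset would get in a prompt. `label_assets` is a hypothetical helper for illustration only; it assumes 1-based numbering, which matches the `Image 1` / `Video 1` wording used in the examples.

```bash
# Hypothetical helper: print the prompt label for each asset of one type.
# Labels are 1-based and count only within that type's own array.
label_assets() {
  local type="$1" i=1 url
  shift
  for url in "$@"; do
    printf '%s %d -> %s\n' "$type" "$i" "$url"
    i=$((i + 1))
  done
}

label_assets Image "https://product-hero.jpg" "https://product-detail.jpg"
label_assets Video "https://camera-style.mp4"
```

Note that the numbering restarts for each type, so a request can contain both an `Image 1` and a `Video 1`.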
Multimodal reference formula:
Video editing formula:
Video extension formula:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Text description of the video |
| generate_audio | boolean | true | Generate synchronized audio |
| duration | integer | 5 | Duration in seconds (4-15), or -1 for auto |
| ratio | enum | adaptive | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, or adaptive |
| resolution | enum | 720p | 480p, 720p, 1080p (Fast: 480p, 720p only) |
| seed | integer | -1 | Seed for reproducibility (-1 for random) |
| watermark | boolean | false | Add watermark to output |
| safety_identifier | string | - | Unique end-user identifier for safety policy (max 64 chars, hash of user ID recommended) |
| image | file | - | First-frame image (mutually exclusive with reference inputs) |
| end_image | file | - | Last-frame image (requires image) |
| reference_images | file[] | - | Reference images, up to 9 (mutually exclusive with image/end_image) |
| reference_videos | file[] | - | Reference videos, up to 3. Max 15s each, total max 15s. mp4/mov |
| reference_audios | file[] | - | Reference audios, up to 3. Max 15s each, total max 15s. wav/mp3. Requires at least one image or video |
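The count limits in the table can be checked locally before submitting a request. The following is a minimal sketch, assuming the "at least one image or video" requirement refers to the reference arrays; `check_refs` is a hypothetical name and it does not check durations or file formats.

```bash
# Hypothetical pre-flight check for reference counts (limits from the table).
# Usage: check_refs <image-count> <video-count> <audio-count>
check_refs() {
  local images="$1" videos="$2" audios="$3"

  [ "$images" -le 9 ] || { echo "error: at most 9 reference_images" >&2; return 1; }
  [ "$videos" -le 3 ] || { echo "error: at most 3 reference_videos" >&2; return 1; }
  [ "$audios" -le 3 ] || { echo "error: at most 3 reference_audios" >&2; return 1; }

  # Audio references cannot be used on their own.
  if [ "$audios" -gt 0 ] && [ $((images + videos)) -eq 0 ]; then
    echo "error: reference_audios requires at least one image or video" >&2
    return 1
  fi
  echo "ok"
}

check_refs 2 1 1
```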
| Model | Pricing |
|-------|---------|
| Seedance 2.0 | $4.30-$7.70/M tokens (varies by resolution and input type) |
| Seedance 2.0 Fast | $3.30-$5.60/M tokens |
Token formula: (width x height x fps x duration) / 1024
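As a worked example of the formula, the sketch below estimates tokens and cost for one clip. The frame size (1280x720 for 720p at 16:9) and the 24 fps frame rate are assumptions chosen for illustration, since the source does not state the actual output dimensions or frame rate; `seedance_tokens` and `seedance_cost` are hypothetical helper names.

```bash
# Token formula from above: (width x height x fps x duration) / 1024.
# Shell arithmetic is integer-only, so any fractional part is truncated.
seedance_tokens() {
  local width="$1" height="$2" fps="$3" duration="$4"
  echo $(( width * height * fps * duration / 1024 ))
}

# Cost in USD for a token count at a given price per million tokens.
seedance_cost() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.4f\n", t / 1000000 * p }'
}

# Assumed values for illustration: 1280x720 frame, 24 fps, 5 s clip.
tokens="$(seedance_tokens 1280 720 24 5)"
echo "$tokens"                  # 108000
seedance_cost "$tokens" 4.30    # low end of the Seedance 2.0 range
seedance_cost "$tokens" 7.70    # high end of the Seedance 2.0 range
```

Under these assumptions a 5-second 720p clip comes to 108,000 tokens; the actual charge depends on the resolution and input type, as the pricing table notes.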
```bash
belt app store search "seedance"
```

```bash
# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli

# All video generation models
npx skills add inference-sh/skills@ai-video-generation

# Google Veo
npx skills add inference-sh/skills@google-veo

# Image generation (for image-to-video)
npx skills add inference-sh/skills@ai-image-generation

# AI avatars & lipsync
npx skills add inference-sh/skills@ai-avatar-video
```

Browse all video apps: `belt app store --category video`