tools/video/seedance/SKILL.md
Generate videos with ByteDance Seedance 2.0 via inference.sh CLI. Unified model for text-to-video, image-to-video, and reference-to-video with synchronized audio, up to 1080p, 4-15s duration. Pro and Fast variants. Studio variants with private asset library for portrait consistency. Use for: social media videos, music videos, product demos, animated content, AI video with sound. Triggers: seedance, seedance 2, bytedance video, seedance t2v, seedance i2v, seedance r2v, video with audio, seedance 2.0, bytedance seedance, seedance studio
npx skillsauth add 1nfsh-s3/skills seedance
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Install the belt CLI skill:
npx skills add belt-sh/cli
Generate videos with synchronized audio using ByteDance's Seedance 2.0 via inference.sh CLI.
Requires the inference.sh CLI (belt); see its install instructions.
belt login
belt app run bytedance/seedance-2-0 --input '{
"prompt": "a jazz band performing in a dimly lit club",
"generate_audio": true
}'
| Model | App ID | Best For |
|-------|--------|----------|
| Seedance 2.0 | bytedance/seedance-2-0 | Best quality, up to 1080p |
| Seedance 2.0 Fast | bytedance/seedance-2-0-fast | Faster generation, up to 720p |
| Seedance 2.0 Studio | bytedance/seedance-2-0-studio | Quality + private asset library for portrait consistency |
| Seedance 2.0 Studio Fast | bytedance/seedance-2-0-studio-fast | Fast + private asset library for portrait consistency |
All models support text-to-video, image-to-video, multimodal reference-to-video, and synchronized audio generation. Studio variants automatically upload reference images to the BytePlus private virtual portrait library for enhanced character consistency, which is particularly useful for faces and branded characters.
The model determines the generation mode from your inputs. These modes are mutually exclusive - use either first-frame/last-frame OR reference inputs, not both.
| Mode | Inputs | Description |
|------|--------|-------------|
| Text-to-Video | prompt only | Generate video from text description |
| Image-to-Video | prompt + image | Animate a still image (first frame) |
| First+Last Frame | prompt + image + end_image | Control start and end frames |
| Multimodal Reference | prompt + reference_images/reference_videos/reference_audios | Guide generation with reference material |
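
The mode rules in the table above can be sketched as a small pre-flight check. This is only an illustration: `seedance_mode` and the mode labels it prints are hypothetical names, not part of the belt CLI, and the real service may classify requests differently.

```shell
# Hypothetical helper: classify a request by which inputs are present.
# Arguments are counts: image (0 or 1), end_image (0 or 1),
# and the total number of reference assets (images + videos + audios).
seedance_mode() {
  image=$1
  end_image=$2
  refs=$3
  # First/last-frame inputs and reference inputs are mutually exclusive.
  if [ "$refs" -gt 0 ] && { [ "$image" -gt 0 ] || [ "$end_image" -gt 0 ]; }; then
    echo "error: image/end_image cannot be combined with reference inputs" >&2
    return 1
  fi
  # end_image only makes sense together with a first-frame image.
  if [ "$end_image" -gt 0 ] && [ "$image" -eq 0 ]; then
    echo "error: end_image requires image" >&2
    return 1
  fi
  if [ "$refs" -gt 0 ]; then
    echo "multimodal-reference"
  elif [ "$end_image" -gt 0 ]; then
    echo "first-last-frame"
  elif [ "$image" -gt 0 ]; then
    echo "image-to-video"
  else
    echo "text-to-video"
  fi
}

seedance_mode 0 0 0   # prompt only
seedance_mode 1 0 0   # prompt + image
seedance_mode 1 1 0   # prompt + image + end_image
seedance_mode 0 0 3   # prompt + three reference assets
```

Running a check like this before `belt app run` catches a mixed-mode payload locally instead of waiting for the request to be rejected.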
belt app run bytedance/seedance-2-0 --input '{
"prompt": "ocean waves crashing on rocks during a storm, dramatic cinematic shot",
"generate_audio": true,
"duration": 10,
"ratio": "16:9"
}'
belt app run bytedance/seedance-2-0-fast --input '{
"prompt": "a butterfly landing on a flower in slow motion",
"generate_audio": true
}'
Animate a still image into a video:
belt app run bytedance/seedance-2-0 --input '{
"image": "https://your-image.jpg",
"prompt": "gentle camera movement, leaves rustling in the wind",
"generate_audio": true
}'
belt app run bytedance/seedance-2-0 --input '{
"image": "https://start-frame.jpg",
"end_image": "https://end-frame.jpg",
"prompt": "smooth transition between scenes",
"generate_audio": true
}'
Use multiple reference images to guide character appearance, outfits, and scene elements:
belt app run bytedance/seedance-2-0 --input '{
"prompt": "The girl from Image 1 wearing the outfit from Image 2 walks through the cafe from Image 3",
"reference_images": [
"https://character-portrait.jpg",
"https://outfit-reference.jpg",
"https://cafe-scene.jpg"
],
"generate_audio": true,
"duration": 8
}'
belt app run bytedance/seedance-2-0 --input '{
"prompt": "Replace the perfume in Video 1 with the face cream from Image 1, preserving all original motions and camera work",
"reference_images": ["https://face-cream.jpg"],
"reference_videos": ["https://original-video.mp4"],
"generate_audio": true
}'
belt app run bytedance/seedance-2-0 --input '{
"prompt": "Video 1 transitions smoothly into Video 2, then the camera enters the painting from Video 3",
"reference_videos": [
"https://clip1.mp4",
"https://clip2.mp4",
"https://clip3.mp4"
],
"generate_audio": true,
"duration": 8
}'
belt app run bytedance/seedance-2-0 --input '{
"prompt": "The musician from Image 1 performs the song from Audio 1, voice style referenced from Audio 1",
"reference_images": ["https://musician.jpg"],
"reference_audios": ["https://music.mp3"],
"generate_audio": true
}'
Studio variants upload images to BytePlus's private asset library for enhanced face/character consistency:
belt app run bytedance/seedance-2-0-studio --input '{
"prompt": "The person in Image 1 smiles at the camera, golden hour lighting, cinematic",
"reference_images": ["https://portrait.jpg"],
"safety_identifier": "user-abc123",
"generate_audio": true
}'
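The parameter table recommends a hash of the user ID for safety_identifier, with a 64-character limit. A minimal sketch, assuming SHA-256 is an acceptable hash here (the source does not name one) and that `sha256sum` from GNU coreutils is available (on macOS, `shasum -a 256` is the usual substitute). `safety_id` is a hypothetical helper name.

```shell
# Hypothetical helper: derive a safety_identifier from an internal user ID.
# A SHA-256 digest in hex is exactly 64 characters, which fits the
# documented 64-character maximum. Assumes GNU coreutils' sha256sum.
safety_id() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Print the identifier for an example user ID.
safety_id "user-abc123"
```

The printed value can be placed in the "safety_identifier" field of the input JSON in place of a raw user ID.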
belt app run bytedance/seedance-2-0 --input '{
"prompt": "First-person POV product ad. Opening frame is Image 1, hand picks up the product. Camera pushes into close-up showing details. Use the camera movement style from Video 1. Background music from Audio 1.",
"reference_images": ["https://product-hero.jpg", "https://product-detail.jpg"],
"reference_videos": ["https://camera-style.mp4"],
"reference_audios": ["https://bgm.mp3"],
"generate_audio": true,
"ratio": "9:16",
"duration": 11
}'
Reference assets in your prompt using type + index: Image 1, Image 2, Video 1, Audio 1. The index is the position within that type in the arrays you provide. Do NOT use asset IDs in prompts.
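To make the type + index rule concrete, the sketch below prints the label each asset would get, based on its position within its own array. `ref_labels` is a hypothetical helper for illustration only; the URLs are the placeholder ones from the examples above.

```shell
# Hypothetical helper: print the prompt label for each asset of one type.
# Usage: ref_labels KIND url1 url2 ...
# The index restarts at 1 for every type, matching the position in
# reference_images, reference_videos, or reference_audios.
ref_labels() {
  kind=$1
  shift
  i=1
  for url in "$@"; do
    printf '%s %d -> %s\n' "$kind" "$i" "$url"
    i=$((i + 1))
  done
}

ref_labels Image "https://character-portrait.jpg" "https://outfit-reference.jpg" "https://cafe-scene.jpg"
ref_labels Video "https://original-video.mp4"
```

Note that "Image 1" and "Video 1" can coexist in one prompt, because each type is numbered independently.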
Multimodal reference formula:
Video editing formula:
Video extension formula:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| prompt | string | required | Text description of the video |
| generate_audio | boolean | true | Generate synchronized audio |
| duration | integer | 5 | Duration in seconds (4-15), or -1 for auto |
| ratio | enum | adaptive | 21:9, 16:9, 4:3, 1:1, 3:4, 9:16, or adaptive |
| resolution | enum | 720p | 480p, 720p, 1080p (Fast: 480p, 720p only) |
| seed | integer | -1 | Seed for reproducibility (-1 for random) |
| watermark | boolean | false | Add watermark to output |
| safety_identifier | string | - | Unique end-user identifier for safety policy (max 64 chars, hash of user ID recommended) |
| image | file | - | First-frame image (mutually exclusive with reference inputs) |
| end_image | file | - | Last-frame image (requires image) |
| reference_images | file[] | - | Reference images, up to 9 (mutually exclusive with image/end_image) |
| reference_videos | file[] | - | Reference videos, up to 3. Max 15s each, total max 15s. mp4/mov |
| reference_audios | file[] | - | Reference audios, up to 3. Max 15s each, total max 15s. wav/mp3. Requires at least one image or video |
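
The limits in the table above can be checked locally before submitting a job. The sketch below covers only duration and reference counts; `check_inputs` is a hypothetical helper, and it does not check file formats or the per-clip and total 15s length limits.

```shell
# Hypothetical pre-flight check for the documented limits.
# Usage: check_inputs DURATION N_IMAGES N_VIDEOS N_AUDIOS
check_inputs() {
  duration=$1
  n_img=$2
  n_vid=$3
  n_aud=$4
  # duration: 4-15 seconds, or -1 for auto.
  if [ "$duration" -ne -1 ] && { [ "$duration" -lt 4 ] || [ "$duration" -gt 15 ]; }; then
    echo "duration must be 4-15, or -1 for auto" >&2
    return 1
  fi
  # Up to 9 reference images, 3 reference videos, 3 reference audios.
  if [ "$n_img" -gt 9 ] || [ "$n_vid" -gt 3 ] || [ "$n_aud" -gt 3 ]; then
    echo "too many reference assets (max 9 images, 3 videos, 3 audios)" >&2
    return 1
  fi
  # Reference audio requires at least one image or video.
  if [ "$n_aud" -gt 0 ] && [ $((n_img + n_vid)) -eq 0 ]; then
    echo "reference_audios requires at least one image or video" >&2
    return 1
  fi
  return 0
}

# Example: 8 seconds, three reference images, no videos or audios.
check_inputs 8 3 0 0 && echo "inputs look valid"
```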

| Model | Pricing |
|-------|---------|
| Seedance 2.0 | $4.30-$7.70/M tokens (varies by resolution and input type) |
| Seedance 2.0 Fast | $3.30-$5.60/M tokens |

Token formula: (width x height x fps x duration) / 1024
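As a rough worked example of the formula, the sketch below estimates tokens and a cost range for one clip. Two inputs are assumptions that the source does not state: that 720p at 16:9 means 1280x720 pixels, and that output runs at 24 fps. `seedance_tokens` and `seedance_cost` are hypothetical helper names, and the result is an estimate, not a quote.

```shell
# Token count from the formula (width x height x fps x duration) / 1024.
# Uses integer arithmetic, so any fractional part is truncated.
# Usage: seedance_tokens WIDTH HEIGHT FPS DURATION
seedance_tokens() {
  echo $(( $1 * $2 * $3 * $4 / 1024 ))
}

# Cost in USD for a token count at a given price per million tokens.
# Usage: seedance_cost TOKENS PRICE_PER_MILLION
seedance_cost() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.4f\n", t * p / 1000000 }'
}

# Assumed: 1280x720 frame, 24 fps (neither is confirmed by the source), 5 s clip.
tokens=$(seedance_tokens 1280 720 24 5)
echo "tokens: $tokens"
echo "low:  \$$(seedance_cost "$tokens" 4.30)"
echo "high: \$$(seedance_cost "$tokens" 7.70)"
```

Under those assumptions a 5-second clip comes to 108000 tokens, or roughly $0.46 to $0.83 at the listed Seedance 2.0 rates.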
belt app store search "seedance"
# Full platform skill (all 250+ apps)
npx skills add inference-sh/skills@infsh-cli
# All video generation models
npx skills add inference-sh/skills@ai-video-generation
# Google Veo
npx skills add inference-sh/skills@google-veo
# Image generation (for image-to-video)
npx skills add inference-sh/skills@ai-image-generation
# AI avatars & lipsync
npx skills add inference-sh/skills@ai-avatar-video
Browse all video apps: belt app store --category video