guides/video/video-ad-specs/SKILL.md
Video ad creation with exact platform-specific specs for TikTok, Instagram, YouTube, Facebook, LinkedIn. Covers dimensions, duration limits, AIDA framework, and caption requirements. Use for: video ads, social media ads, paid media creative, video marketing, ad production. Triggers: video ad, social media ad, tiktok ad, instagram ad, youtube ad, facebook ad, linkedin ad, video creative, ad specs, paid media, video marketing, ad production, reels ad, stories ad, pre roll, bumper ad
npx skillsauth add inference-sh/agent-skills video-ad-specsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Create platform-specific video ads via inference.sh CLI.
Requires inference.sh CLI (
belt). Install instructions
belt login
# Generate a vertical video ad scene
belt app run bytedance/seedance-1-5-pro --input '{
"prompt": "vertical video, person excitedly unboxing a product, clean modern room, bright natural lighting, social media ad style, authentic feeling, 9:16 format"
}'
| Spec | Value | |------|-------| | Aspect ratio | 9:16 (vertical) | | Resolution | 1080 x 1920 px | | Duration | 5-60 seconds (15-30s recommended) | | File size | Max 500 MB | | Format | MP4, MOV | | Sound | On by default (design with sound) | | Text safe zone | 150px from all edges | | Hook window | 1 second — first frame must grab attention |
| Spec | Value | |------|-------| | Aspect ratio | 9:16 (vertical) | | Resolution | 1080 x 1920 px | | Duration | Up to 90 seconds (15-30s for ads) | | Cover image | Separate upload, shows in grid | | Sound | On by default | | Caption area | Bottom 20% reserved for text overlay |
| Spec | Value | |------|-------| | Aspect ratio | 9:16 | | Resolution | 1080 x 1920 px | | Duration | Up to 15 seconds per segment | | Swipe-up/Link | Available for ads | | Top/bottom | 14% top and 20% bottom = unsafe for key content |
| Format | Aspect | Duration | Skip | |--------|--------|----------|------| | Bumper | 16:9 | 6 seconds exactly | Non-skippable | | Non-skippable | 16:9 | 15 seconds | Non-skippable | | Skippable (TrueView) | 16:9 | Any length | Skip after 5 seconds | | Shorts | 9:16 | Up to 60 seconds | N/A |
Resolution: 1920 x 1080 (16:9) or 1080 x 1920 (Shorts)
| Spec | Value | |------|-------| | Aspect ratio | 1:1 (square) or 4:5 (recommended for mobile) | | Resolution | 1080 x 1080 or 1080 x 1350 | | Duration | Up to 240 min (15-30s recommended) | | Autoplay | Silent — captions are essential | | Sound | 85% of Facebook video is watched without sound |
| Spec | Value | |------|-------| | Aspect ratio | 1:1 or 16:9 | | Resolution | 1080 x 1080 or 1920 x 1080 | | Duration | 3 seconds to 10 minutes (15-30s for ads) | | Tone | Professional | | Autoplay | Silent in feed |
| Phase | Time | Goal | Technique | |-------|------|------|-----------| | Attention | 0-3s | Stop the scroll | Pattern interrupt, bold visual, question | | Interest | 3-10s | Keep watching | State the problem, show relevance | | Desire | 10-20s | Want the solution | Show the product/outcome, social proof | | Action | Final 3-5s | Click/buy/sign up | Clear CTA, urgency, offer |
| Technique | Example | |-----------|---------| | Bold statement | "This tool replaced my entire marketing team" | | Question | "Why are you still doing this manually?" | | Surprising visual | Unexpected transformation, before/after reveal | | Pattern interrupt | Start mid-action, unusual angle, bright color | | Social proof | "2 million people switched to this" | | Pain point | "If you hate [common frustration], watch this" |
# Hook scene (0-3s)
belt app run google/veo-3-1-fast --input '{
"prompt": "vertical 9:16 video, close-up of hands struggling with tangled cables and messy desk, frustrated energy, shaky handheld camera, authentic social media style, bright lighting"
}'
# Solution reveal (3-15s)
belt app run bytedance/seedance-1-5-pro --input '{
"prompt": "vertical video, smooth product reveal, clean wireless charging station on minimalist desk, satisfying organization transformation, bright modern room, social media ad aesthetic"
}'
# Add voiceover
belt app run falai/dia-tts --input '{
"prompt": "[S1] Stop wasting time with this mess. This one product changed my entire setup. Everything charges. Everything is organized. Link in bio."
}'
# Merge video + audio
belt app run infsh/video-audio-merger --input '{
"video": "solution-reveal.mp4",
"audio": "voiceover.mp3"
}'
# Add captions (critical for silent autoplay)
belt app run infsh/caption-videos --input '{
"video": "ad-with-audio.mp4",
"caption_file": "captions.srt"
}'
belt app run google/veo-3-1-fast --input '{
"prompt": "square 1:1 video, professional person at desk discovering a new software tool, laptop screen showing clean dashboard, natural office lighting, corporate commercial style, satisfied expression"
}'
# 6-second bumper: one message, one visual, one CTA
belt app run google/veo-3-1-fast --input '{
"prompt": "6 second product ad, quick montage of a sleek app being used on phone, fast cuts, modern, energetic, brand logo reveal at end, punchy and dynamic, wide 16:9"
}'
# Keep it tight
belt app run falai/dia-tts --input '{
"prompt": "[S1] Your reports. Automated. Try DataFlow free."
}'
85% of Facebook and 40%+ of Instagram video is watched on mute.
| Rule | Reason | |------|--------| | Always add captions | Silent viewing is the default on most platforms | | Large, readable font | Small text is invisible on mobile | | High contrast | White text with dark outline/background | | Centered or bottom-third | Standard viewing position | | Max 2 lines at a time | More text = can't be read fast enough | | Key words in bold/color | Draws eye to important words |
# Generate captions from audio
# (create SRT file from your script, then burn in)
belt app run infsh/caption-videos --input '{
"video": "ad-video.mp4",
"caption_file": "ad-captions.srt"
}'
| Time | Content | |------|---------| | 0-3s | Customer states the problem they had | | 3-15s | How they discovered and tried the product | | 15-25s | The specific result they achieved | | 25-30s | Product name + CTA |
| Time | Content | |------|---------| | 0-3s | The problem (text or visual) | | 3-20s | Product demo showing the solution | | 20-25s | Key result/benefit | | 25-30s | CTA + offer |
| Time | Content | |------|---------| | 0-3s | "Before" state (messy, slow, frustrating) | | 3-5s | Transition / product introduction | | 5-12s | "After" state (clean, fast, satisfying) | | 12-15s | CTA |
| Mistake | Problem | Fix | |---------|---------|-----| | No hook in first 1-3s | Viewer scrolls past | Open with pattern interrupt | | Landscape video on TikTok/Reels | Letterboxed, looks amateur | Use 9:16 for vertical platforms | | No captions | Most viewers watch silent | Always add captions | | CTA too late | Viewers already left | Clear CTA within last 5 seconds | | Too long for platform | Forced skip or dropout | Match platform duration norms | | Same ad for all platforms | Wrong specs, wrong tone | Create platform-specific versions | | Logo in first 3s | Feels like a commercial, gets skipped | Save branding for the end | | Text in unsafe zones | Cut off by platform UI | Check safe zone per platform |
npx skills add inference-sh/skills@ai-video-generation
npx skills add inference-sh/skills@video-prompting-guide
npx skills add inference-sh/skills@text-to-speech
npx skills add inference-sh/skills@prompt-engineering
Browse all apps: belt app list
development
Render videos from React/Remotion component code via inference.sh. Pass TSX code, get MP4. Supports all Remotion APIs: useCurrentFrame, useVideoConfig, spring, interpolate, AbsoluteFill, Sequence. Configurable resolution, FPS, duration, codec. Use for: programmatic video generation, animated graphics, motion design, data-driven videos, React animations to video. Triggers: remotion, render video from code, tsx to video, react video, programmatic video, remotion render, code to video, animated video, motion graphics code, react animation video
tools
Generate videos with Pruna P-Video and WAN models via inference.sh CLI. Models: P-Video, WAN-T2V, WAN-I2V. Capabilities: text-to-video, image-to-video, audio support, 720p/1080p, fast inference. Pruna optimizes models for speed without quality loss. Triggers: pruna video, p-video, pruna ai video, fast video generation, optimized video, wan t2v, wan i2v, economic video generation, cheap video generation, pruna text to video, pruna image to video
documentation
Still-to-video conversion guide: model selection, motion prompting, and camera movement. Covers Wan 2.5 i2v, Seedance, Fabric, Grok Video with when to use each. Use for: animating images, creating video from stills, adding motion, product animations. Triggers: image to video, i2v, animate image, still to video, add motion to image, image animation, photo to video, animate still, wan i2v, image2video, bring image to life, animate photo, motion from image
tools
Generate videos with Google Veo models via inference.sh CLI. Models: Veo 3.1, Veo 3.1 Fast, Veo 3, Veo 3 Fast, Veo 2. Capabilities: text-to-video, cinematic output, high quality video generation. Triggers: veo, google veo, veo 3, veo 2, veo 3.1, vertex ai video, google video generation, google video ai, veo model, veo video