YouTube thumbnail design with specific dimensions, contrast rules, and mobile preview optimization. Covers safe zones, text placement, face expression psychology, and A/B testing. Use for: YouTube thumbnails, video cover images, click-through optimization. Triggers: youtube thumbnail, thumbnail design, video thumbnail, click through rate, ctr optimization, youtube cover, video cover image, thumbnail maker, thumbnail tips, youtube design, video preview image
Install the belt CLI skill:

```bash
npx skills add belt-sh/cli
```
Create high-CTR YouTube thumbnails with AI image generation via inference.sh CLI.
Requires the inference.sh CLI (`belt`). Follow its install instructions, then log in:
```bash
belt login

# Generate a thumbnail
belt app run falai/flux-dev-lora --input '{
  "prompt": "YouTube thumbnail style, close-up of a person with surprised excited expression looking at a glowing laptop screen, vibrant blue and orange color scheme, dramatic studio lighting, shallow depth of field, high contrast, cinematic",
  "width": 1280,
  "height": 720
}'
```
| Spec | Value |
|------|-------|
| Dimensions | 1280 x 720 px (minimum) |
| Recommended | 1920 x 1080 px |
| Aspect ratio | 16:9 |
| Max file size | 2 MB |
| Formats | JPG, GIF, PNG |
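The spec table translates directly into a pre-upload check. Below is a minimal shell sketch, not an official YouTube or inference.sh tool: `check_thumbnail_specs` is a hypothetical helper name, it reads "2 MB" as 2,000,000 bytes (an assumption; the stricter of the two common readings), and the caller supplies the file name, width, height, and byte size.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks a thumbnail against the spec table.
# Assumption: "2 MB" means 2,000,000 bytes (the stricter reading).
# Prints one line per problem and returns non-zero if any check fails.
check_thumbnail_specs() {
  local name="$1" width="$2" height="$3" bytes="$4"
  local rc=0 ext

  # Formats: JPG, GIF, PNG (case-insensitive; .jpeg is accepted as JPG)
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg | jpeg | gif | png) ;;
    *) echo "format: .$ext is not JPG, GIF, or PNG"; rc=1 ;;
  esac

  # Dimensions: at least 1280 x 720
  if [ "$width" -lt 1280 ] || [ "$height" -lt 720 ]; then
    echo "dimensions: ${width}x${height} is below 1280x720"; rc=1
  fi

  # Aspect ratio: exactly 16:9, compared with integers (w * 9 == h * 16)
  if [ $((width * 9)) -ne $((height * 16)) ]; then
    echo "aspect ratio: ${width}x${height} is not 16:9"; rc=1
  fi

  # File size: at most 2 MB under the assumption above
  if [ "$bytes" -gt 2000000 ]; then
    echo "file size: $bytes bytes exceeds 2 MB"; rc=1
  fi

  return "$rc"
}
```

If ImageMagick is installed (an assumption), `identify -format '%w %h' thumb.png` prints a still image's width and height, so a call could look like `check_thumbnail_specs thumb.png $(identify -format '%w %h' thumb.png) $(wc -c < thumb.png)`. Animated GIFs report one size per frame, so check those by hand.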
Your thumbnail appears at roughly 120px wide on mobile — that's how most viewers first see it.
At 120px, viewers must be able to identify:
Test: view your thumbnail at 120px width. If it's a muddy blur, redesign.
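Before eyeballing the preview, the shrink factor tells you roughly how big each element will end up. A 1280 px-wide canvas shown at 120 px is scaled by 120/1280, about 0.094, so everything on it shrinks to under a tenth of its original size. The sketch below does that arithmetic with integer rounding; `preview_px` is a hypothetical helper name, and its 120 px default is the mobile figure above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: size in pixels of an element that is SIZE px on a
# canvas CANVAS_WIDTH px wide, once the thumbnail is displayed DISPLAY_WIDTH
# px wide (default 120). Rounds to the nearest pixel.
preview_px() {
  local size="$1" canvas_width="$2" display_width="${3:-120}"
  # Adding half the divisor before integer division rounds instead of truncating.
  echo $(( (size * display_width + canvas_width / 2) / canvas_width ))
}

preview_px 720 1280   # prints 68: a 1280x720 canvas is ~68 px tall at 120 px wide
preview_px 100 1280   # prints 9: a 100 px headline shrinks to ~9 px
```

To see the actual result, ImageMagick (assumed installed) can write a 120 px-wide copy with `magick thumb.png -resize 120x preview.png` (`convert` instead of `magick` on ImageMagick 6).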
```
┌─────────────────────────────────────────────┐
│                                             │
│     ✅ SAFE FOR TEXT AND KEY ELEMENTS       │
│                                             │
│                                             │
│                                             │
│                                ┌────────────┤
│                                │ ⏱ DURATION │  ← Timestamp overlay (bottom-right)
│ ┌────┐                         └────────────┤
│ │ CH │ Chapter marker                       │
└─┴────┴──────────────────────────────────────┘
  ↑ Bottom-left: chapter/progress markers
```
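Avoid placing critical elements in the bottom-right corner, where the video duration is overlaid, or along the bottom-left edge, where chapter and progress markers appear.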
| Combination | Mood | Best For |
|-------------|------|----------|
| Yellow + Black | Urgency, attention | Tech, business, lists |
| Red + White | Energy, excitement | Entertainment, reactions |
| Blue + Orange | Professional contrast | Education, tutorials |
| Green + White | Growth, money | Finance, success stories |
| Purple + Yellow | Premium, creative | Design, art, creativity |
| White + Dark | Clean, minimal | Luxury, minimalist channels |
| Rule | Reason |
|------|--------|
| Max 6 words | Readability at thumbnail size |
| Min 60pt equivalent | Must be legible at 120px width |
| Bold sans-serif font | Thin fonts disappear at small sizes |
| Contrast stroke/shadow | Ensures readability on any background |
| No small text | If it's not readable small, cut it |
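The word limit is easy to enforce before any text is baked into the image. A minimal sketch, assuming the overlay text is passed as a single shell string and that "words" means whitespace-separated tokens; `check_overlay_text` is a hypothetical helper name, and its default limit of 6 comes from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reject overlay text longer than MAX_WORDS (default 6).
# Words are whitespace-separated tokens, so "10X" or "2024" count as one word.
check_overlay_text() {
  local text="$1" max_words="${2:-6}" count
  # wc -w counts words; the arithmetic expansion strips the leading
  # spaces that some wc implementations print.
  count=$(( $(printf '%s' "$text" | wc -w) ))
  if [ "$count" -gt "$max_words" ]; then
    echo "too long: $count words (max $max_words): $text"
    return 1
  fi
  echo "ok: $count words"
}

check_overlay_text "5 MISTAKES TO AVOID"                                  # prints "ok: 4 words"
check_overlay_text "Here is everything you need to know" || echo "shorten it"
```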
Thumbnails with faces get higher CTR than faceless thumbnails. Expression matters:
| Expression | CTR Impact | Best For |
|------------|-----------|----------|
| Surprise/shock | Highest | Reaction, reveal, discovery content |
| Curiosity | High | Tutorial, how-to, tips |
| Excitement | High | Unboxing, reviews, announcements |
| Concern/worry | Medium-high | Warning, mistake, problem content |
| Confidence | Medium | Expert advice, authority content |
| Neutral | Lowest | Avoid unless your brand is minimalist |
```bash
# Generate a face-forward thumbnail
belt app run falai/flux-dev-lora --input '{
  "prompt": "close-up portrait of a man with genuinely surprised expression, mouth slightly open, raised eyebrows, looking at camera, left side of frame, vibrant teal background, dramatic rim lighting, YouTube thumbnail style, high contrast, cinematic",
  "width": 1280,
  "height": 720
}'

# Generate a face-looking-at-subject thumbnail
belt app run bytedance/seedream-4-5 --input '{
  "prompt": "person looking amazed at a glowing holographic chart showing upward growth, dramatic blue and green lighting, right side profile view, dark background, tech aesthetic, high energy",
  "size": "2K"
}'
```
```bash
# Tutorial / how-to: organized workspace with a laptop showing code
belt app run falai/flux-dev-lora --input '{
  "prompt": "overhead flat lay of organized workspace with laptop showing code editor, colorful sticky notes, coffee cup, clean bright background, professional setup, tutorial style composition, warm lighting",
  "width": 1280,
  "height": 720
}'

# Before/after: split frame contrasting chaos and order
belt app run falai/flux-dev-lora --input '{
  "prompt": "split composition, left side dark and messy disorganized desk, right side bright clean organized minimalist workspace, dramatic contrast between chaos and order, clear dividing line in center, high contrast",
  "width": 1280,
  "height": 720
}'

# Versus: two products facing off
belt app run falai/flux-dev-lora --input '{
  "prompt": "two products facing each other with dramatic lighting and sparks between them, competition battle concept, dark background with colorful rim lighting, versus comparison style, high energy, product photography",
  "width": 1280,
  "height": 720
}'

# List-style: 7 distinct objects, clearly separated
belt app run falai/flux-dev-lora --input '{
  "prompt": "dynamic arrangement of 7 different colorful objects floating in space against dark gradient background, each item distinct and clearly separated, energetic composition, vibrant saturated colors, studio lighting",
  "width": 1280,
  "height": 720
}'
```
Test one variable at a time:
| Variable | Test A vs B |
|----------|-------------|
| Face vs No face | Same composition, with/without person |
| Expression | Surprise vs curiosity |
| Color scheme | Warm vs cool palette |
| Text vs No text | With/without text overlay |
| Background | Bright vs dark |
| Composition | Left-facing vs right-facing subject |
```bash
# Generate variant A
belt app run falai/flux-dev-lora --input '{
  "prompt": "..., bright yellow background, ...",
  "width": 1280, "height": 720
}' --no-wait

# Generate variant B (same prompt, different background)
belt app run falai/flux-dev-lora --input '{
  "prompt": "..., dark navy background, ...",
  "width": 1280, "height": 720
}' --no-wait
```
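Because each variant should differ in exactly one phrase, it can help to build the `--input` JSON from a shared base prompt instead of editing two copies by hand. A minimal sketch: `variant_input` and `launch_background_test` are hypothetical helper names, the base prompt in the example is a placeholder, and the JSON escaping only handles backslashes and double quotes in single-line prompt text. The `belt app run ... --no-wait` call mirrors the variant commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the --input JSON for one variant by appending
# the single changed phrase to a shared base prompt.
variant_input() {
  local base="$1" variable="$2" prompt
  # Escape backslashes first, then double quotes, so the prompt stays valid JSON.
  prompt=$(printf '%s, %s' "$base" "$variable" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt": "%s", "width": 1280, "height": 720}\n' "$prompt"
}

# Hypothetical wrapper: queue one generation per background, all else fixed.
launch_background_test() {
  local base="$1"; shift
  local background
  for background in "$@"; do
    belt app run falai/flux-dev-lora \
      --input "$(variant_input "$base" "$background")" --no-wait
  done
}

# Example (placeholder prompt; run `belt login` first):
# launch_background_test "close-up of a surprised person, YouTube thumbnail style" \
#   "bright yellow background" "dark navy background"
```

Keeping the base prompt in one variable is what guarantees the test isolates a single variable; only the appended phrase changes between runs.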
| Mistake | Problem | Fix |
|---------|---------|-----|
| Too much text | Unreadable at thumbnail size | Max 6 words or no text |
| Low contrast | Disappears in the feed | Use complementary colors |
| Cluttered composition | Eye doesn't know where to look | One focal point |
| Generic stock photo feel | No personality, gets skipped | Authentic expressions, unique angles |
| Tiny details | Lost at 120px | Bold, simple shapes |
| Same style every video | Viewer fatigue | Vary within brand guidelines |
| Misleading thumbnail | Kills trust, hurts retention | Match the actual content |
Related skills:

```bash
npx skills add inference-sh/skills@ai-image-generation
npx skills add inference-sh/skills@image-upscaling
npx skills add inference-sh/skills@prompt-engineering
```

Browse all apps: `belt app store`