skills/demo-video/SKILL.md
Build a narrated demo video from a screen recording, a talk track script with act timelines, branded intro/outro slides, background music, and AI-generated voiceover via ElevenLabs. Use when user asks to create a "demo video", "product demo", "narrated screencast", "talk track video", or "voiceover video".
npx skillsauth add paolomoz/skills demo-video
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Assembles a polished narrated demo video from a screen recording, a scripted talk track with per-act timelines, branded intro/outro slides, background music, and AI voiceover. Output is a single MP4 ready for sharing.
Screen recording + script + music + brand assets
-> Slide generation (PIL)
-> TTS voiceover per act (ElevenLabs)
-> Segment assembly per act (ffmpeg)
-> Final concatenation with re-encode (ffmpeg)
-> Output MP4
/demo-video path/to/recording.mov
/demo-video path/to/recording.mov --music path/to/music.mp3 --voice Eric
Gather from user before building:
ffprobe to get resolution, duration, FPS
Each act needs:
Example:
Act 1 (intro, slide): With AEM, teams can build a complete site in minutes.
Act 2 (0:00-0:15): Here we're building the Tavex site from a detailed briefing.
Act 3 (0:16-0:45): The agent reads the briefing and gets to work...
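The act lines in the example follow a regular shape, so a build script could parse them mechanically. A minimal sketch, assuming exactly the `Act N (label): narration` layout shown above; the `Act` dataclass and `parse_acts` helper are hypothetical names, not part of the skill.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as "Act 2 (0:00-0:15): Here we're building..."
ACT_RE = re.compile(r"^Act (\d+) \(([^)]*)\):\s*(.+)$")
# Matches a time range label such as "0:16-0:45"
TIME_RE = re.compile(r"^(\d+):(\d{2})-(\d+):(\d{2})$")


@dataclass
class Act:
    number: int
    label: str            # raw text inside the parentheses
    start: Optional[int]  # seconds into the recording; None for slide acts
    end: Optional[int]
    narration: str


def parse_acts(script: str) -> List[Act]:
    """Parse act lines; lines that do not match the layout are skipped."""
    acts = []
    for line in script.splitlines():
        match = ACT_RE.match(line.strip())
        if not match:
            continue
        label = match.group(2).strip()
        times = TIME_RE.match(label)
        if times:
            start = int(times.group(1)) * 60 + int(times.group(2))
            end = int(times.group(3)) * 60 + int(times.group(4))
        else:
            # e.g. "intro, slide": no source footage, so no timestamps
            start = end = None
        acts.append(Act(int(match.group(1)), label, start, end, match.group(3).strip()))
    return acts
```

Acts whose label is not a time range (such as `intro, slide`) come back with `start` and `end` set to `None`, which a build script could use to pick a slide instead of a clip.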
#EB1000)
The build pipeline assumes a finished script. Writing that script well is a separate discipline: generic scripts feel dubbed, synced scripts feel alive. Run this process before the build.
Use ffmpeg -vf "fps=1/2,scale=800:-1" — every 2s for transition-heavy videos, every 5s for simple ones. Read every frame visually to build a precise timeline.
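The extraction could be wrapped in a small helper. This is a sketch only: the `fps=1/N,scale=800:-1` filter comes from the step above, while the `frame_extract_cmd` name, the `interval_s` parameter, and the `frame-%04d.jpg` output pattern are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def frame_extract_cmd(video: str, out_dir: str = "build-demo/frames", interval_s: int = 2) -> list:
    """Build the ffmpeg command that saves one 800px-wide frame every interval_s seconds."""
    video_filter = f"fps=1/{interval_s},scale=800:-1"
    pattern = str(Path(out_dir) / "frame-%04d.jpg")  # hypothetical naming scheme
    return ["ffmpeg", "-y", "-i", video, "-vf", video_filter, pattern]


def extract_frames(video: str, out_dir: str = "build-demo/frames", interval_s: int = 2) -> None:
    """Create the output directory and run ffmpeg; raises if ffmpeg exits non-zero."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(frame_extract_cmd(video, out_dir, interval_s), check=True)
```

Calling `extract_frames("recording.mov", interval_s=5)` would give the sparser sampling suggested for simple videos.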
Map timestamps → screen content → production notes. Save as frame-analysis.md alongside the script. This is the source of truth for act boundaries.
Extract: banned words, preferred vocabulary, sentence rhythm, anti-traits. Don't invent a tone — match it. A "cheeky" first draft often has to be completely rewritten once the brand's actual voice is in view.
Mark exact timestamps where the UI changes (panels appearing/disappearing, page changes, loading states). These become act boundaries with deliberate pauses and narration bridges.
Mention the exact product name, the exact UI element ("See the interest bars?"), the exact text on screen ("Five forty-five a.m."). This is what makes narration feel synced rather than dubbed.
When a new UI element appears, the narration should prime the viewer ("let me show you what I see") → 0.5s pause → element appears. When it disappears, the last line should land on it → 0.5s pause → resume.
Count words per act, compare against act duration. Over-length acts need trimming or freeze-frame extension.
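The word-count check is simple arithmetic. A minimal sketch, assuming a narration pace of about 150 words per minute; that rate and the `act_budget` helper are assumptions, so measure the chosen voice's actual pace before relying on the numbers.

```python
def narration_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate how long the narration takes to speak at the assumed pace."""
    return len(text.split()) / words_per_minute * 60.0


def act_budget(text: str, act_seconds: float, words_per_minute: float = 150.0) -> dict:
    """Compare estimated narration length against the time the act has on screen."""
    needed = narration_seconds(text, words_per_minute)
    return {
        "words": len(text.split()),
        "needed_s": round(needed, 1),
        "available_s": act_seconds,
        # Positive value: trim the narration or extend the act with a freeze frame.
        "over_by_s": round(max(0.0, needed - act_seconds), 1),
    }
```

An act with 50 words of narration and 15 seconds of footage comes out about 5 seconds over at this pace, which flags it for trimming or a freeze-frame extension.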
script.md
Include: metadata (source video, duration, character, tone), brand voice rules, and each act with timestamps + on-screen description + narration + production notes.
build-demo/
frames/ # Extracted frames for analysis (step 1)
frame-analysis.md # Timeline mapping timestamps to screen content (step 2)
script.md # Talk track script (step 8)
audio/ # TTS voiceover MP3 per act
images/ # Generated slide PNGs
segments/ # Assembled MP4 per act
concat.txt # ffmpeg concat manifest
build.py # Build script
aem-demo.mp4 # Final output
The build script skips existing artifacts. To rebuild a specific act:
audio/actN.mp3)
segments/NN-actN.mp4)
aem-demo.mp4)
build.py
Detailed technical reference is split into supplementary files; consult as needed during the build:
ffprobe for specs
mkdir -p build-demo/{audio,images,segments}
/System/Library/Fonts/Helvetica.ttc
/v1/text-to-speech/{voice_id}
eleven_multilingual_v2
stability=0.6, similarity_boost=0.8, style=0.15
Intro segment:
amix:normalize=0 (no overlap, just sequential)
Video segments (per act):
Outro segment:
concat.txt with all segment paths in order
-c copy)
open output.mp4
Check: audio plays on both channels, no sync drift at act transitions, music doesn't overlap voice.
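The manifest and final concatenation might be driven like this. A sketch under assumptions: it uses ffmpeg's concat demuxer with a re-encode, matching the pipeline overview; the libx264/aac codec choices and the helper names are illustrative and not taken from the skill's build script.

```python
from pathlib import Path


def write_concat_manifest(segments_dir: str, manifest_path: str) -> list:
    """Write an ffmpeg concat manifest listing every segment in filename order."""
    # Zero-padded prefixes (00-intro, 01-act1, ..., 99-outro) sort into play order.
    segments = sorted(Path(segments_dir).glob("*.mp4"))
    lines = []
    for segment in segments:
        # The concat demuxer expects single quotes inside a quoted path as '\''
        escaped = str(segment.resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return segments


def concat_cmd(manifest_path: str, output: str) -> list:
    """Build the ffmpeg command that joins the segments and re-encodes the result."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(manifest_path),
        "-c:v", "libx264", "-c:a", "aac",  # assumed codecs; re-encode instead of stream copy
        str(output),
    ]
```

Re-encoding is slower than a stream copy but tolerates small differences between segments, which is one plausible reason for the pipeline's choice.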
Common adjustments (delete affected files and re-run):
audio/actN.mp3 + segments/NN-actN.mp4
audio/act*.mp3 + all segments/*.mp4
images/*.png + segments/00-intro.mp4 or segments/99-outro.mp4

| Tool | Purpose | Install |
|------|---------|---------|
| ffmpeg | Video/audio assembly, concat | brew install ffmpeg |
| ffprobe | Source video analysis | Included with ffmpeg |
| Pillow | Slide image generation | pip install Pillow |
| requests | ElevenLabs API calls | pip install requests |
| python-dotenv | Load API keys from .env | pip install python-dotenv |
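Reading the source specs with ffprobe, listed in the table above, could look like the following. A sketch, assuming ffprobe's JSON output for the first video stream; `probe_cmd`, `parse_probe`, and `probe` are hypothetical helper names.

```python
import json
import subprocess
from fractions import Fraction


def probe_cmd(video: str) -> list:
    """Build an ffprobe command that reports size, frame rate, and duration as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,r_frame_rate:format=duration",
        "-of", "json",
        video,
    ]


def parse_probe(raw_json: str) -> dict:
    """Extract resolution, FPS, and duration from ffprobe's JSON output."""
    data = json.loads(raw_json)
    stream = data["streams"][0]
    return {
        "width": int(stream["width"]),
        "height": int(stream["height"]),
        # r_frame_rate is a fraction string such as "30/1" or "30000/1001"
        "fps": float(Fraction(stream["r_frame_rate"])),
        "duration": float(data["format"]["duration"]),
    }


def probe(video: str) -> dict:
    """Run ffprobe on the recording and return the parsed specs."""
    result = subprocess.run(probe_cmd(video), capture_output=True, text=True, check=True)
    return parse_probe(result.stdout)
```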
Requires ELEVENLABS_API_KEY in .env file.
See ELEVENLABS.md for API reference and voice recommendations.
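A voiceover request for one act might be assembled as follows. This is a sketch, not the skill's own build code: the endpoint path, model, and voice settings are the values listed earlier, while the `https://api.elevenlabs.io` host, the `xi-api-key` header, and the helper names are assumptions to check against ELEVENLABS.md.

```python
import os

API_BASE = "https://api.elevenlabs.io"  # assumed host; confirm in ELEVENLABS.md


def build_tts_request(text: str, voice_id: str, api_key: str):
    """Return (url, headers, payload) for a single text-to-speech call."""
    # voice_id is the ElevenLabs voice identifier, not a display name such as "Eric"
    url = f"{API_BASE}/v1/text-to-speech/{voice_id}"
    headers = {
        "xi-api-key": api_key,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",
    }
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {"stability": 0.6, "similarity_boost": 0.8, "style": 0.15},
    }
    return url, headers, payload


def synthesize(text: str, voice_id: str, out_path: str) -> None:
    """Call the API and write the returned MP3 bytes to out_path."""
    # Imported here so the request builder above works without these packages installed.
    import requests
    from dotenv import load_dotenv

    load_dotenv()  # reads ELEVENLABS_API_KEY from the .env file
    api_key = os.environ["ELEVENLABS_API_KEY"]
    url, headers, payload = build_tts_request(text, voice_id, api_key)
    response = requests.post(url, headers=headers, json=payload, timeout=120)
    response.raise_for_status()
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.content)
```

Keeping the request builder separate from the network call makes it possible to inspect the payload for an act without spending API credits.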