plugins/sjawhar/skills/product-demos/SKILL.md
Use when creating narrated product demo videos from terminal recordings. Triggers on: asciinema, screen recording, product video, demo video, narrated walkthrough, voiceover, TTS, cast-to-video, product announcement with video
Produce narrated product demo videos from asciinema terminal recordings. Pipeline: .cast → MP4 → ElevenLabs voiceover → synced narrated video.
.cast files (asciinema recordings)
↓
agg → GIF → ffmpeg → MP4 clips (per section)
↓
Trim clips to interesting parts (thumbnail-guided)
↓
ElevenLabs API → per-section MP3 narration
↓
ffmpeg sync (speed-adjust video to match audio)
↓
Normalize + concatenate → final MP4
# Install agg (asciinema gif generator) — MUST use --git, not crate name
cargo install --git https://github.com/asciinema/agg
# Python deps in a venv
uv venv /tmp/demo/venv
source /tmp/demo/venv/bin/activate
uv pip install elevenlabs
# Verify
which agg ffmpeg ffprobe
Gotcha: cargo install agg installs a DIFFERENT crate (a library). Must use --git.
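The verification step above can also be scripted so a pipeline fails early when a tool is missing. This is a minimal sketch; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, not part of the skill.

```python
import shutil

# Tools the pipeline shells out to (same list as the `which` check above).
REQUIRED_TOOLS = ("agg", "ffmpeg", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` before rendering and aborting on a non-empty result gives a clearer error than a failed subprocess halfway through the pipeline. Note that this only checks presence on PATH; it cannot tell the asciinema `agg` binary apart from an unrelated executable with the same name.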
asciinema rec /tmp/demo/recordings/section-name.cast
# Terminal size: 120x35 recommended for consistency
# Theme: set your terminal to a dark theme before recording
Key principles:
- Comments (# Section: ...) typed into the terminal help with trim-point discovery later

# Step 1: Cast → GIF (agg compresses idle time automatically)
agg --font-size 24 --theme monokai input.cast output.gif
# Step 2: GIF → MP4 (terminal-optimized encoding)
ffmpeg -y -i output.gif \
-movflags faststart -pix_fmt yuv420p \
-vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
-c:v libx264 -preset slow -crf 15 -tune stillimage \
output.mp4
Critical settings:
- -crf 15 (not 18 or 23), because terminal text needs near-lossless quality
- -tune stillimage optimizes for low-motion content (terminal = mostly static)
- scale=trunc(iw/2)*2:trunc(ih/2)*2 ensures even dimensions for h264

Gotcha: agg compresses idle time, so .cast timestamps ≠ MP4 timestamps. Find trim points via thumbnails, not math.
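For batch conversion, the two steps above can be wrapped in Python. The sketch below only rebuilds the same command lines shown earlier; the function names (`agg_cmd`, `gif_to_mp4_cmd`, `render_cast`) are hypothetical, and it assumes `agg` and `ffmpeg` are on PATH.

```python
import subprocess
from pathlib import Path


def agg_cmd(cast, gif, font_size=24, theme="monokai"):
    """Command line for step 1: .cast -> GIF."""
    return ["agg", "--font-size", str(font_size), "--theme", theme,
            str(cast), str(gif)]


def gif_to_mp4_cmd(gif, mp4, crf=15):
    """Command line for step 2: GIF -> MP4 with the terminal-optimized settings."""
    return ["ffmpeg", "-y", "-i", str(gif),
            "-movflags", "faststart", "-pix_fmt", "yuv420p",
            "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
            "-c:v", "libx264", "-preset", "slow",
            "-crf", str(crf), "-tune", "stillimage",
            str(mp4)]


def render_cast(cast):
    """Run both steps, writing the GIF and MP4 next to the .cast file."""
    cast = Path(cast)
    gif, mp4 = cast.with_suffix(".gif"), cast.with_suffix(".mp4")
    subprocess.run(agg_cmd(cast, gif), check=True)
    subprocess.run(gif_to_mp4_cmd(gif, mp4), check=True)
    return mp4
```

Passing argument lists instead of a shell string avoids quoting problems with the `scale=` expression, and `check=True` stops the pipeline on the first failing step.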
# Generate thumbnails at intervals
for t in 0 5 10 15 20 30 40 50 60; do
ffmpeg -y -ss $t -i full.mp4 -frames:v 1 -q:v 5 thumb_${t}s.jpg 2>/dev/null
done
Then use look_at or manual inspection to identify section boundaries. Trim with:
ffmpeg -y -i full.mp4 -ss $START -to $END \
-c:v libx264 -crf 15 -tune stillimage -pix_fmt yuv420p -an \
trimmed.mp4
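The thumbnail loop and trim command above can be generated programmatically when there are many recordings. This is a sketch under the same settings; `thumbnail_times`, `thumbnail_cmd`, and `trim_cmd` are hypothetical helper names.

```python
import math


def thumbnail_times(duration, step=5):
    """Evenly spaced timestamps (seconds) from 0 up to, but excluding, the end."""
    return list(range(0, math.ceil(duration), step))


def thumbnail_cmd(video, t):
    """ffmpeg command that grabs one JPEG frame at second `t`."""
    return ["ffmpeg", "-y", "-ss", str(t), "-i", str(video),
            "-frames:v", "1", "-q:v", "5", f"thumb_{t}s.jpg"]


def trim_cmd(video, start, end, out="trimmed.mp4"):
    """ffmpeg command that re-encodes the [start, end] range without audio."""
    if not 0 <= start < end:
        raise ValueError(f"invalid trim range: {start}..{end}")
    return ["ffmpeg", "-y", "-i", str(video),
            "-ss", str(start), "-to", str(end),
            "-c:v", "libx264", "-crf", "15", "-tune", "stillimage",
            "-pix_fmt", "yuv420p", "-an", str(out)]
```

Unlike the fixed timestamp list in the shell loop, `thumbnail_times` adapts to the clip length, so long recordings still get thumbnails across their full duration.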
Write narration as a Python data structure for programmatic generation:
SECTIONS = [
    {
        "id": "1a_feature_intro",
        "title": "Feature Name",  # → title card
        "narration": "Script text here. Use <break time=\"0.8s\" /> for pauses.",
        "video": {
            "source": "recording-full.mp4",
            "trim": (start_sec, end_sec),
        },
    },
]
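Because the script is plain data, it can be checked mechanically before any audio is generated. The sketch below assumes the dictionary layout shown above; `validate_sections` is a hypothetical helper, and it only catches structural problems, not mismatches between narration and screen content.

```python
def validate_sections(sections):
    """Return a list of human-readable problems found in a SECTIONS-style list."""
    errors = []
    seen_ids = set()
    for index, section in enumerate(sections):
        section_id = section.get("id")
        label = section_id or f"section #{index}"
        if not section_id:
            errors.append(f"{label}: missing id")
        elif section_id in seen_ids:
            errors.append(f"{label}: duplicate id")
        seen_ids.add(section_id)

        if not section.get("narration", "").strip():
            errors.append(f"{label}: empty narration")

        # A trim is valid only when 0 <= start < end.
        start, end = section.get("video", {}).get("trim", (None, None))
        if start is None or end is None or not 0 <= start < end:
            errors.append(f"{label}: invalid trim {(start, end)}")
    return errors
```

An empty return value means the structure is sound.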
Script-to-screen audit (MANDATORY before final render): After all recordings are finalized, compare every narration line to what's actually visible on screen. Pre-written scripts WILL diverge from actual recordings. Common mismatches:
Generate comparison samples before committing to a voice:
import os

from elevenlabs import ElevenLabs, VoiceSettings, save

# Assumes the API key is exported as ELEVENLABS_API_KEY
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

SAMPLE_TEXT = "Your representative 2-3 sentence sample."
for voice_id, name in [
    ("CwhRBWXzGAHq8TQ4Fs17", "Roger"),
    ("iP95p4xoKVk53GoZ742B", "Chris"),
    ("cjVigY5qzO86Huf0OWal", "Eric"),
    ("onwK4e9ZLuTAKqWW03F9", "Daniel"),
]:
    # Same model and voice settings as the final render, so samples are comparable
    audio = client.text_to_speech.convert(
        voice_id=voice_id, text=SAMPLE_TEXT,
        model_id="eleven_turbo_v2_5",
        output_format="mp3_44100_192",
        voice_settings=VoiceSettings(
            stability=0.75, similarity_boost=0.85,
            style=0.0, speed=0.92, use_speaker_boost=True,
        ),
    )
    save(audio, f"sample_{name}.mp3")
Build a comparison video with labels so the user can A/B in one file:
ffmpeg -y \
-f lavfi -i "color=c=0x1a1a2e:size=1280x720:duration=${dur}:rate=24" \
-i sample.mp3 \
-filter_complex "[1:a]volume=2.0,aformat=channel_layouts=stereo[a];
[0:v]drawtext=fontfile=${FONT}:text='${NAME}':fontsize=48:
fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2,format=yuv420p[v]" \
-map "[v]" -map "[a]" \
-c:v libx264 -crf 18 -c:a aac -b:a 192k -ar 44100 -ac 2 \
-shortest labeled_sample.mp4
audio = client.text_to_speech.convert(
    voice_id=VOICE_ID,
    text=section_text,
    model_id="eleven_turbo_v2_5",  # Best for English narration
    output_format="mp3_44100_192",  # 192kbps; 128 sounds bad
    voice_settings=VoiceSettings(
        stability=0.75,  # 0.6-0.8 for narration
        similarity_boost=0.85,
        style=0.0,  # Keep at 0; reduces artifacts
        speed=0.92,  # Slightly slower for clarity
        use_speaker_boost=True,
    ),
    previous_text=prev[-200:],  # Cross-section continuity
    next_text=nxt[:200],
)
save(audio, output_path)
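The `prev` and `nxt` values in the call above are the neighbouring sections' narration. One way to derive them for every section is sketched below; `context_windows` is a hypothetical helper, and using an empty string at the first and last section is an assumption about how the edges should be handled.

```python
def context_windows(texts, width=200):
    """For each narration text, return (previous_tail, next_head) context strings.

    previous_tail is the last `width` characters of the preceding section and
    next_head is the first `width` characters of the following one. The first
    and last sections get an empty string on the missing side.
    """
    windows = []
    for i in range(len(texts)):
        previous_tail = texts[i - 1][-width:] if i > 0 else ""
        next_head = texts[i + 1][:width] if i + 1 < len(texts) else ""
        windows.append((previous_tail, next_head))
    return windows
```

The returned pairs are already sliced, so they can be passed as `previous_text` and `next_text` directly, in the same order as the sections.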
Critical audio settings:
- mp3_44100_192 minimum: 128kbps sounds tinny/compressed
- eleven_turbo_v2_5 model: more natural than multilingual_v2 for English
- pcm_44100 (lossless) requires Pro plan

ElevenLabs handles most acronyms. For problem terms, use alias substitution in text:
- "jj" → "jay-jay", "CLI" → "C L I", "OAuth" → "Oh-Auth"
- "uv sync" → "you-vee sync", "tl run" → "T L run"

If no ElevenLabs key, pip install gTTS provides free Google TTS. Lower quality but unblocks the pipeline. Strip <break> tags (unsupported) and replace with periods.
Speed-adjust video to match audio duration. Terminal recordings tolerate wide speed ranges:
video_dur = get_duration(video_path)
audio_dur = get_duration(audio_path)
pts = max(0.25, min(4.0, video_dur / audio_dur))
inv_pts = 1.0 / pts
ffmpeg ... -filter_complex
"[0:v]setpts={inv_pts}*PTS,...[v];[1:a]volume=2.0,aformat=channel_layouts=stereo[a]"
-map "[v]" -map "[a]"
-c:a aac -b:a 192k -ar 44100 -ac 2
Acceptable speed ranges:
3x: video becomes unwatchably fast — trim the narration instead
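The snippet above relies on a `get_duration` helper that is not shown. A minimal sketch follows, assuming durations are read with `ffprobe`; `sync_factors` is a hypothetical name that wraps the same clamp and returns both the speed factor and the `setpts` multiplier.

```python
import subprocess


def get_duration(path):
    """Return the container duration of a media file in seconds, via ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1",
         str(path)],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def sync_factors(video_dur, audio_dur, lo=0.25, hi=4.0):
    """Return (speed, setpts_multiplier) so the video lasts as long as the audio.

    speed > 1 means the video plays faster; setpts expects the inverse,
    because multiplying timestamps by 0.5 doubles playback speed.
    """
    if video_dur <= 0 or audio_dur <= 0:
        raise ValueError("durations must be positive")
    speed = max(lo, min(hi, video_dur / audio_dur))
    return speed, 1.0 / speed
```

For example, a 30-second clip with 15 seconds of narration gives `(2.0, 0.5)`, so the filter becomes `setpts=0.5*PTS`. When the ratio is clamped, the video and audio durations no longer match, which is a signal to re-trim the clip or rewrite the narration.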
ALL clips MUST be normalized before concatenation. ffmpeg concat demuxer requires identical:
- Resolution
- Frame rate (fps=10 is fine for terminal)
- Pixel format (format=yuv420p)
- Audio format

ffmpeg -y -i clip.mp4 \
-vf "scale=${W}:${H}:force_original_aspect_ratio=decrease,
pad=${W}:${H}:(ow-iw)/2:(oh-ih)/2:color=0x1a1a2e,
fps=10,format=yuv420p" \
-c:v libx264 -crf 15 \
-c:a aac -b:a 192k -ar 44100 -ac 2 \
normalized.mp4
Gotcha: ffmpeg scale filter uses : separator, NOT x. scale=1756:1208 ✅, scale=1756x1208 ❌.
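Building the filter string in one place makes the separator mistake hard to repeat. This sketch reproduces the normalization filter above for a given target size; `normalize_filter` is a hypothetical helper, and rejecting odd dimensions is an added guard based on the even-dimension requirement for h264 noted earlier.

```python
def normalize_filter(width, height, fps=10, pad_color="0x1a1a2e"):
    """Return the -vf value that scales, pads, and re-times a clip to one format."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for yuv420p h264")
    return (
        # Fit inside the target box without distorting the aspect ratio...
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        # ...then centre it on a padded canvas of exactly the target size.
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2:color={pad_color},"
        f"fps={fps},format=yuv420p"
    )
```

The result is a single-line string, so it can be passed as one argument after `-vf` without relying on how ffmpeg treats line breaks inside a filter graph.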
ffmpeg -y -f lavfi \
-i "color=c=0x1a1a2e:size=${W}x${H}:duration=3:rate=10" \
-f lavfi -i "anullsrc=r=44100:cl=stereo" \
-vf "drawtext=fontfile=${FONT}:text='Section Title':
fontsize=52:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2,
format=yuv420p" \
-c:v libx264 -crf 15 -c:a aac -b:a 192k -t 3 title.mp4
# Build concat list
for clip in normalized_*.mp4; do
echo "file '$clip'" >> concat.txt
done
ffmpeg -y -f concat -safe 0 -i concat.txt -c copy final.mp4
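The shell loop above appends to `concat.txt`, so re-running it leaves stale entries, and a file name containing a single quote breaks the list. A sketch that rewrites the list from scratch and escapes quotes follows; `concat_list_text` and `write_concat_list` are hypothetical names.

```python
from pathlib import Path


def concat_list_text(clips):
    """Return the contents of a concat-demuxer list for `clips`, in order."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal ' is written as '\'' .
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def write_concat_list(clips, list_path="concat.txt"):
    """Overwrite `list_path` with a fresh list and return its path."""
    path = Path(list_path)
    path.write_text(concat_list_text(clips), encoding="utf-8")
    return path
```

Clips are concatenated in the order given, so title cards and sections should be interleaved explicitly by the caller instead of relying on glob ordering.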
For recording + production split across agents, use file-based mailbox:
~/.agent-mail/project-name/
001-recording-requests.md # Production → Recording: what to record
002-recording-status.md # Recording → Production: what's done, issues
003-followup.md # Iterate as needed
Each message includes: date, what's done, what's needed, file locations.
New recordings go directly to the shared recordings directory. Production agent polls for new files.
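The numbered file names in the mailbox suggest a simple convention: take the highest existing prefix and add one. A sketch of that follows, assuming three-digit prefixes as in the listing above; `next_message_name` is a hypothetical helper.

```python
import re


def next_message_name(existing, slug):
    """Return the next NNN-slug.md file name given the names already present."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := re.match(r"(\d{3})-", name))
    ]
    return f"{max(numbers, default=0) + 1:03d}-{slug}.md"
```

Deriving the number from the directory listing keeps the two agents from needing any shared counter, though two agents writing at the same moment could still pick the same number.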
| Mistake | Fix |
|---------|-----|
| cargo install agg installs wrong package | Use --git https://github.com/asciinema/agg |
| 128kbps MP3 sounds tinny | Use mp3_44100_192 (Creator+ plan) |
| Mono audio plays silent on some devices | Always output stereo (-ac 2) with volume boost (volume=2.0) |
| scale=WxH in ffmpeg | Use scale=W:H (colon, not x) |
| Narration doesn't match screen | Audit script-to-screen AFTER recordings finalize |
| Concat produces garbage | Normalize ALL clips to same resolution/fps/pix_fmt/audio first |
| Writing narration before recording | Record first, write narration to match |
| Picking voice without samples | Always generate A/B comparison video for user |
For simple feature demo recordings attached to PRs (no narration needed):
asciinema rec /tmp/demo.cast --cols 120 --rows 35
# Demonstrate the feature, then exit
agg --font-size 24 --theme monokai /tmp/demo.cast /tmp/demo.gif
ffmpeg -y -i /tmp/demo.gif \
-movflags faststart -pix_fmt yuv420p \
-vf 'scale=trunc(iw/2)*2:trunc(ih/2)*2' \
-c:v libx264 -preset slow -crf 15 -tune stillimage \
/tmp/demo.mp4
Option 1 (preferred): Upload to asciinema.org + post as PR comment
asciinema upload /tmp/demo.cast
# Copy the URL, then post as a PR comment:
gh pr comment $PR_NUM --repo $OWNER/$REPO --body '## Demo Video
https://asciinema.org/a/XXXXX'
Option 2: Post mp4 URL as PR comment (GitHub auto-renders inline)
Post the raw .mp4 URL on its own line in a PR comment. GitHub renders it as an inline video player.
gh pr comment $PR_NUM --repo $OWNER/$REPO --body '## Demo Video
https://github.com/OWNER/REPO/releases/download/TAG/demo.mp4'
DO NOT create GitHub releases just to host demo videos. Release assets pollute the releases page and do not render inline. Prefer asciinema.org for terminal recordings.