Product Demo Videos

Produce narrated product demo videos from asciinema terminal recordings. Pipeline: .cast → MP4 → ElevenLabs voiceover → synced narrated video.

Pipeline Overview

.cast files (asciinema recordings)
    ↓
agg → GIF → ffmpeg → MP4 clips (per section)
    ↓
Trim clips to interesting parts (thumbnail-guided)
    ↓
ElevenLabs API → per-section MP3 narration
    ↓
ffmpeg sync (speed-adjust video to match audio)
    ↓
Normalize + concatenate → final MP4
    Normalize + concatenate → final MP4

Setup

# Install agg (asciinema gif generator) — MUST use --git, not crate name
cargo install --git https://github.com/asciinema/agg

# Python deps in a venv
uv venv /tmp/demo/venv
source /tmp/demo/venv/bin/activate
uv pip install elevenlabs

# Verify
which agg ffmpeg ffprobe

Gotcha: cargo install agg installs a DIFFERENT crate (a library). Must use --git.

Recording with asciinema

asciinema rec /tmp/demo/recordings/section-name.cast
# Terminal size: 120x35 recommended for consistency
# Theme: set your terminal to a dark theme before recording

Key principles:

Record one logical section per file
Keep a script of what to type, but don't over-rehearse
Comments (# Section: ...) typed into terminal help with trim-point discovery later
If a command errors on camera, that's usually fine — re-record only if the error is misleading

Cast → MP4 Conversion

# Step 1: Cast → GIF (agg compresses idle time automatically)
agg --font-size 24 --theme monokai input.cast output.gif

# Step 2: GIF → MP4 (terminal-optimized encoding)
ffmpeg -y -i output.gif \
  -movflags faststart -pix_fmt yuv420p \
  -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
  -c:v libx264 -preset slow -crf 15 -tune stillimage \
  output.mp4

Critical settings:

-crf 15 (not 18 or 23) — terminal text needs near-lossless quality
-tune stillimage — optimizes for low-motion content (terminal = mostly static)
scale=trunc(iw/2)*2:trunc(ih/2)*2 — ensures even dimensions for h264

Gotcha: agg compresses idle time, so .cast timestamps ≠ MP4 timestamps. Find trim points via thumbnails, not math.

Finding Trim Points

# Generate thumbnails at intervals
for t in 0 5 10 15 20 30 40 50 60; do
  ffmpeg -y -ss $t -i full.mp4 -frames:v 1 -q:v 5 thumb_${t}s.jpg 2>/dev/null
done

Then use look_at or manual inspection to identify section boundaries. Trim with:

ffmpeg -y -i full.mp4 -ss $START -to $END \
  -c:v libx264 -crf 15 -tune stillimage -pix_fmt yuv420p -an \
  trimmed.mp4

Narration Script Structure

Write narration as a Python data structure for programmatic generation:

SECTIONS = [
    {
        "id": "1a_feature_intro",
        "title": "Feature Name",        # → title card
        "narration": "Script text here. Use <break time=\"0.8s\" /> for pauses.",
        "video": {
            "source": "recording-full.mp4",
            "trim": (start_sec, end_sec),
        },
    },
]

Script-to-screen audit (MANDATORY before final render): After all recordings are finalized, compare every narration line to what's actually visible on screen. Pre-written scripts WILL diverge from actual recordings. Common mismatches:

Command output differs from what narration describes
Specific numbers/stats don't match (e.g., "resisted" vs "ignored")
Feature names differ (e.g., "slash run-inspect" vs "running-tasks")
Described workflow doesn't match what the recording shows

ElevenLabs Voice Generation

Voice Selection (DO THIS FIRST)

Generate comparison samples before committing to a voice:

from elevenlabs import ElevenLabs, VoiceSettings, save

SAMPLE_TEXT = "Your representative 2-3 sentence sample."

for voice_id, name in [
    ("CwhRBWXzGAHq8TQ4Fs17", "Roger"),
    ("iP95p4xoKVk53GoZ742B", "Chris"),
    ("cjVigY5qzO86Huf0OWal", "Eric"),
    ("onwK4e9ZLuTAKqWW03F9", "Daniel"),
]:
    audio = client.text_to_speech.convert(
        voice_id=voice_id, text=SAMPLE_TEXT,
        model_id="eleven_turbo_v2_5",
        output_format="mp3_44100_192",
        voice_settings=VoiceSettings(
            stability=0.75, similarity_boost=0.85,
            style=0.0, speed=0.92, use_speaker_boost=True,
        ),
    )
    save(audio, f"sample_{name}.mp3")

Build a comparison video with labels so the user can A/B in one file:

ffmpeg -y \
  -f lavfi -i "color=c=0x1a1a2e:size=1280x720:duration=${dur}:rate=24" \
  -i sample.mp3 \
  -filter_complex "[1:a]volume=2.0,aformat=channel_layouts=stereo[a];
    [0:v]drawtext=fontfile=${FONT}:text='${NAME}':fontsize=48:
    fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2,format=yuv420p[v]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -crf 18 -c:a aac -b:a 192k -ar 44100 -ac 2 \
  -shortest labeled_sample.mp4

Generating Narration

audio = client.text_to_speech.convert(
    voice_id=VOICE_ID,
    text=section_text,
    model_id="eleven_turbo_v2_5",      # Best for English narration
    output_format="mp3_44100_192",      # 192kbps — 128 sounds bad
    voice_settings=VoiceSettings(
        stability=0.75,                  # 0.6-0.8 for narration
        similarity_boost=0.85,
        style=0.0,                       # Keep at 0 — reduces artifacts
        speed=0.92,                      # Slightly slower for clarity
        use_speaker_boost=True,
    ),
    previous_text=prev[-200:],           # Cross-section continuity
    next_text=nxt[:200],
)
save(audio, output_path)

Critical audio settings:

mp3_44100_192 minimum — 128kbps sounds tinny/compressed
eleven_turbo_v2_5 model — more natural than multilingual_v2 for English
pcm_44100 (lossless) requires Pro plan
Mono output from API — must convert to stereo + boost volume for video

Pronunciation

ElevenLabs handles most acronyms. For problem terms, use alias substitution in text:

"jj" → "jay-jay", "CLI" → "C L I", "OAuth" → "Oh-Auth"
"uv sync" → "you-vee sync", "tl run" → "T L run"

Fallback: gTTS

If no ElevenLabs key, pip install gTTS provides free Google TTS. Lower quality but unblocks the pipeline. Strip <break> tags (unsupported) and replace with periods.

Video Assembly

Syncing Video + Audio

Speed-adjust video to match audio duration. Terminal recordings tolerate wide speed ranges:

video_dur = get_duration(video_path)
audio_dur = get_duration(audio_path)
pts = max(0.25, min(4.0, video_dur / audio_dur))
inv_pts = 1.0 / pts

ffmpeg ... -filter_complex
  "[0:v]setpts={inv_pts}*PTS,...[v];[1:a]volume=2.0,aformat=channel_layouts=stereo[a]"
  -map "[v]" -map "[a]"
  -c:a aac -b:a 192k -ar 44100 -ac 2

Acceptable speed ranges:

0.5x–2.0x: imperceptible for terminal recordings
0.3x–0.5x: fine for "reading the screen" moments (diagnostics output)
3x: video becomes unwatchably fast — trim the narration instead

Normalization for Concat

ALL clips MUST be normalized before concatenation. ffmpeg concat demuxer requires identical:

Resolution (scale + pad to target)
FPS (fps=10 is fine for terminal)
Pixel format (format=yuv420p)
Audio: stereo, 44100Hz, AAC

ffmpeg -y -i clip.mp4 \
  -vf "scale=${W}:${H}:force_original_aspect_ratio=decrease,
       pad=${W}:${H}:(ow-iw)/2:(oh-ih)/2:color=0x1a1a2e,
       fps=10,format=yuv420p" \
  -c:v libx264 -crf 15 \
  -c:a aac -b:a 192k -ar 44100 -ac 2 \
  normalized.mp4

Gotcha: ffmpeg scale filter uses : separator, NOT x. scale=1756:1208 ✅, scale=1756x1208 ❌.

Title Cards

ffmpeg -y -f lavfi \
  -i "color=c=0x1a1a2e:size=${W}x${H}:duration=3:rate=10" \
  -f lavfi -i "anullsrc=r=44100:cl=stereo" \
  -vf "drawtext=fontfile=${FONT}:text='Section Title':
       fontsize=52:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2,
       format=yuv420p" \
  -c:v libx264 -crf 15 -c:a aac -b:a 192k -t 3 title.mp4

Concatenation

# Build concat list
for clip in normalized_*.mp4; do
  echo "file '$clip'" >> concat.txt
done

ffmpeg -y -f concat -safe 0 -i concat.txt -c copy final.mp4

Multi-Agent Coordination

For recording + production split across agents, use file-based mailbox:

~/.agent-mail/project-name/
  001-recording-requests.md   # Production → Recording: what to record
  002-recording-status.md     # Recording → Production: what's done, issues
  003-followup.md             # Iterate as needed

Each message includes: date, what's done, what's needed, file locations.

New recordings go directly to the shared recordings directory. Production agent polls for new files.

Common Mistakes

| Mistake | Fix | |---------|-----| | cargo install agg installs wrong package | Use --git https://github.com/asciinema/agg | | 128kbps MP3 sounds tinny | Use mp3_44100_192 (Creator+ plan) | | Mono audio plays silent on some devices | Always output stereo (-ac 2) with volume boost (volume=2.0) | | scale=WxH in ffmpeg | Use scale=W:H (colon, not x) | | Narration doesn't match screen | Audit script-to-screen AFTER recordings finalize | | Concat produces garbage | Normalize ALL clips to same resolution/fps/pix_fmt/audio first | | Writing narration before recording | Record first, write narration to match | | Picking voice without samples | Always generate A/B comparison video for user | | Picking voice without samples | Always generate A/B comparison video for user |

Quick PR Demo Videos (Lightweight)

For simple feature demo recordings attached to PRs (no narration needed):

Record

asciinema rec /tmp/demo.cast --cols 120 --rows 35
# Demonstrate the feature, then exit

Convert to MP4

agg --font-size 24 --theme monokai /tmp/demo.cast /tmp/demo.gif
ffmpeg -y -i /tmp/demo.gif \
  -movflags faststart -pix_fmt yuv420p \
  -vf 'scale=trunc(iw/2)*2:trunc(ih/2)*2' \
  -c:v libx264 -preset slow -crf 15 -tune stillimage \
  /tmp/demo.mp4

Upload and Attach to PR

Option 1 (preferred): Upload to asciinema.org + post as PR comment

asciinema upload /tmp/demo.cast
# Copy the URL, then post as a PR comment:
gh pr comment $PR_NUM --repo $OWNER/$REPO --body '## Demo Video

https://asciinema.org/a/XXXXX'

Option 2: Post mp4 URL as PR comment (GitHub auto-renders inline)

Post the raw .mp4 URL on its own line in a PR comment. GitHub renders it as an inline video player.

gh pr comment $PR_NUM --repo $OWNER/$REPO --body '## Demo Video

https://github.com/OWNER/REPO/releases/download/TAG/demo.mp4'

DO NOT create GitHub releases just to host demo videos. Release assets pollute the releases page and do not render inline. Prefer asciinema.org for terminal recordings.

Product Demo Videos

Produce narrated product demo videos from asciinema terminal recordings. Pipeline: .cast → MP4 → ElevenLabs voiceover → synced narrated video.

Pipeline Overview

.cast files (asciinema recordings)
    ↓
agg → GIF → ffmpeg → MP4 clips (per section)
    ↓
Trim clips to interesting parts (thumbnail-guided)
    ↓
ElevenLabs API → per-section MP3 narration
    ↓
ffmpeg sync (speed-adjust video to match audio)
    ↓
Normalize + concatenate → final MP4
    Normalize + concatenate → final MP4

Setup

# Install agg (asciinema gif generator) — MUST use --git, not crate name
cargo install --git https://github.com/asciinema/agg

# Python deps in a venv
uv venv /tmp/demo/venv
source /tmp/demo/venv/bin/activate
uv pip install elevenlabs

# Verify
which agg ffmpeg ffprobe

Gotcha: cargo install agg installs a DIFFERENT crate (a library). Must use --git.

Recording with asciinema

asciinema rec /tmp/demo/recordings/section-name.cast
# Terminal size: 120x35 recommended for consistency
# Theme: set your terminal to a dark theme before recording

Key principles:

Record one logical section per file
Keep a script of what to type, but don't over-rehearse
Comments (# Section: ...) typed into terminal help with trim-point discovery later
If a command errors on camera, that's usually fine — re-record only if the error is misleading

Cast → MP4 Conversion

# Step 1: Cast → GIF (agg compresses idle time automatically)
agg --font-size 24 --theme monokai input.cast output.gif

# Step 2: GIF → MP4 (terminal-optimized encoding)
ffmpeg -y -i output.gif \
  -movflags faststart -pix_fmt yuv420p \
  -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" \
  -c:v libx264 -preset slow -crf 15 -tune stillimage \
  output.mp4

Critical settings:

-crf 15 (not 18 or 23) — terminal text needs near-lossless quality
-tune stillimage — optimizes for low-motion content (terminal = mostly static)
scale=trunc(iw/2)*2:trunc(ih/2)*2 — ensures even dimensions for h264

Gotcha: agg compresses idle time, so .cast timestamps ≠ MP4 timestamps. Find trim points via thumbnails, not math.

Finding Trim Points

# Generate thumbnails at intervals
for t in 0 5 10 15 20 30 40 50 60; do
  ffmpeg -y -ss $t -i full.mp4 -frames:v 1 -q:v 5 thumb_${t}s.jpg 2>/dev/null
done

Then use look_at or manual inspection to identify section boundaries. Trim with:

ffmpeg -y -i full.mp4 -ss $START -to $END \
  -c:v libx264 -crf 15 -tune stillimage -pix_fmt yuv420p -an \
  trimmed.mp4

Narration Script Structure

Write narration as a Python data structure for programmatic generation:

SECTIONS = [
    {
        "id": "1a_feature_intro",
        "title": "Feature Name",        # → title card
        "narration": "Script text here. Use <break time=\"0.8s\" /> for pauses.",
        "video": {
            "source": "recording-full.mp4",
            "trim": (start_sec, end_sec),
        },
    },
]

Command output differs from what narration describes
Specific numbers/stats don't match (e.g., "resisted" vs "ignored")
Feature names differ (e.g., "slash run-inspect" vs "running-tasks")
Described workflow doesn't match what the recording shows

ElevenLabs Voice Generation

Voice Selection (DO THIS FIRST)

Generate comparison samples before committing to a voice:

from elevenlabs import ElevenLabs, VoiceSettings, save

SAMPLE_TEXT = "Your representative 2-3 sentence sample."

for voice_id, name in [
    ("CwhRBWXzGAHq8TQ4Fs17", "Roger"),
    ("iP95p4xoKVk53GoZ742B", "Chris"),
    ("cjVigY5qzO86Huf0OWal", "Eric"),
    ("onwK4e9ZLuTAKqWW03F9", "Daniel"),
]:
    audio = client.text_to_speech.convert(
        voice_id=voice_id, text=SAMPLE_TEXT,
        model_id="eleven_turbo_v2_5",
        output_format="mp3_44100_192",
        voice_settings=VoiceSettings(
            stability=0.75, similarity_boost=0.85,
            style=0.0, speed=0.92, use_speaker_boost=True,
        ),
    )
    save(audio, f"sample_{name}.mp3")

Build a comparison video with labels so the user can A/B in one file:

ffmpeg -y \
  -f lavfi -i "color=c=0x1a1a2e:size=1280x720:duration=${dur}:rate=24" \
  -i sample.mp3 \
  -filter_complex "[1:a]volume=2.0,aformat=channel_layouts=stereo[a];
    [0:v]drawtext=fontfile=${FONT}:text='${NAME}':fontsize=48:
    fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2,format=yuv420p[v]" \
  -map "[v]" -map "[a]" \
  -c:v libx264 -crf 18 -c:a aac -b:a 192k -ar 44100 -ac 2 \
  -shortest labeled_sample.mp4

Generating Narration

audio = client.text_to_speech.convert(
    voice_id=VOICE_ID,
    text=section_text,
    model_id="eleven_turbo_v2_5",      # Best for English narration
    output_format="mp3_44100_192",      # 192kbps — 128 sounds bad
    voice_settings=VoiceSettings(
        stability=0.75,                  # 0.6-0.8 for narration
        similarity_boost=0.85,
        style=0.0,                       # Keep at 0 — reduces artifacts
        speed=0.92,                      # Slightly slower for clarity
        use_speaker_boost=True,
    ),
    previous_text=prev[-200:],           # Cross-section continuity
    next_text=nxt[:200],
)
save(audio, output_path)

Critical audio settings:

mp3_44100_192 minimum — 128kbps sounds tinny/compressed
eleven_turbo_v2_5 model — more natural than multilingual_v2 for English
pcm_44100 (lossless) requires Pro plan
Mono output from API — must convert to stereo + boost volume for video

Pronunciation

ElevenLabs handles most acronyms. For problem terms, use alias substitution in text:

"jj" → "jay-jay", "CLI" → "C L I", "OAuth" → "Oh-Auth"
"uv sync" → "you-vee sync", "tl run" → "T L run"

Fallback: gTTS

If no ElevenLabs key, pip install gTTS provides free Google TTS. Lower quality but unblocks the pipeline. Strip <break> tags (unsupported) and replace with periods.

Video Assembly

Syncing Video + Audio

Speed-adjust video to match audio duration. Terminal recordings tolerate wide speed ranges:

video_dur = get_duration(video_path)
audio_dur = get_duration(audio_path)
pts = max(0.25, min(4.0, video_dur / audio_dur))
inv_pts = 1.0 / pts

ffmpeg ... -filter_complex
  "[0:v]setpts={inv_pts}*PTS,...[v];[1:a]volume=2.0,aformat=channel_layouts=stereo[a]"
  -map "[v]" -map "[a]"
  -c:a aac -b:a 192k -ar 44100 -ac 2

Acceptable speed ranges:

0.5x–2.0x: imperceptible for terminal recordings
0.3x–0.5x: fine for "reading the screen" moments (diagnostics output)
3x: video becomes unwatchably fast — trim the narration instead

Normalization for Concat

ALL clips MUST be normalized before concatenation. ffmpeg concat demuxer requires identical:

Resolution (scale + pad to target)
FPS (fps=10 is fine for terminal)
Pixel format (format=yuv420p)
Audio: stereo, 44100Hz, AAC

ffmpeg -y -i clip.mp4 \
  -vf "scale=${W}:${H}:force_original_aspect_ratio=decrease,
       pad=${W}:${H}:(ow-iw)/2:(oh-ih)/2:color=0x1a1a2e,
       fps=10,format=yuv420p" \
  -c:v libx264 -crf 15 \
  -c:a aac -b:a 192k -ar 44100 -ac 2 \
  normalized.mp4

Gotcha: ffmpeg scale filter uses : separator, NOT x. scale=1756:1208 ✅, scale=1756x1208 ❌.

Title Cards

ffmpeg -y -f lavfi \
  -i "color=c=0x1a1a2e:size=${W}x${H}:duration=3:rate=10" \
  -f lavfi -i "anullsrc=r=44100:cl=stereo" \
  -vf "drawtext=fontfile=${FONT}:text='Section Title':
       fontsize=52:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2,
       format=yuv420p" \
  -c:v libx264 -crf 15 -c:a aac -b:a 192k -t 3 title.mp4

Concatenation

# Build concat list
for clip in normalized_*.mp4; do
  echo "file '$clip'" >> concat.txt
done

ffmpeg -y -f concat -safe 0 -i concat.txt -c copy final.mp4

Multi-Agent Coordination

For recording + production split across agents, use file-based mailbox:

~/.agent-mail/project-name/
  001-recording-requests.md   # Production → Recording: what to record
  002-recording-status.md     # Recording → Production: what's done, issues
  003-followup.md             # Iterate as needed

Each message includes: date, what's done, what's needed, file locations.

New recordings go directly to the shared recordings directory. Production agent polls for new files.

Common Mistakes

Quick PR Demo Videos (Lightweight)

For simple feature demo recordings attached to PRs (no narration needed):

Record

asciinema rec /tmp/demo.cast --cols 120 --rows 35
# Demonstrate the feature, then exit

Convert to MP4

agg --font-size 24 --theme monokai /tmp/demo.cast /tmp/demo.gif
ffmpeg -y -i /tmp/demo.gif \
  -movflags faststart -pix_fmt yuv420p \
  -vf 'scale=trunc(iw/2)*2:trunc(ih/2)*2' \
  -c:v libx264 -preset slow -crf 15 -tune stillimage \
  /tmp/demo.mp4

Upload and Attach to PR

Option 1 (preferred): Upload to asciinema.org + post as PR comment

asciinema upload /tmp/demo.cast
# Copy the URL, then post as a PR comment:
gh pr comment $PR_NUM --repo $OWNER/$REPO --body '## Demo Video

https://asciinema.org/a/XXXXX'

Option 2: Post mp4 URL as PR comment (GitHub auto-renders inline)

Post the raw .mp4 URL on its own line in a PR comment. GitHub renders it as an inline video player.

gh pr comment $PR_NUM --repo $OWNER/$REPO --body '## Demo Video

https://github.com/OWNER/REPO/releases/download/TAG/demo.mp4'

DO NOT create GitHub releases just to host demo videos. Release assets pollute the releases page and do not render inline. Prefer asciinema.org for terminal recordings.

Adoption

sjawhar/product-demos

$ install --global

Security Scan Results

SKILL.md

Product Demo Videos

Pipeline Overview

Setup

Recording with asciinema

Cast → MP4 Conversion

Finding Trim Points

Narration Script Structure

ElevenLabs Voice Generation

Voice Selection (DO THIS FIRST)

Generating Narration

Pronunciation

Fallback: gTTS

Video Assembly

Syncing Video + Audio

Normalization for Concat

Title Cards

Concatenation

Multi-Agent Coordination

Common Mistakes

Quick PR Demo Videos (Lightweight)

Record

Convert to MP4

Upload and Attach to PR

Related Skills

sjawhar/skiplagged

sjawhar/ceramic

sjawhar/whatsapp

sjawhar/plugins/sjawhar/skills/watch-ci-merge

sjawhar/product-demos

$ install --global

Security Scan Results

SKILL.md

Product Demo Videos

Pipeline Overview

Setup

Recording with asciinema

Cast → MP4 Conversion

Finding Trim Points

Narration Script Structure

ElevenLabs Voice Generation

Voice Selection (DO THIS FIRST)

Generating Narration

Pronunciation

Fallback: gTTS

Video Assembly

Syncing Video + Audio

Normalization for Concat

Title Cards

Concatenation

Multi-Agent Coordination

Common Mistakes

Quick PR Demo Videos (Lightweight)

Record

Convert to MP4

Upload and Attach to PR

Related Skills

sjawhar/skiplagged

sjawhar/ceramic

sjawhar/whatsapp

sjawhar/plugins/sjawhar/skills/watch-ci-merge