skills/infographic-video/SKILL.md
Generates a short video recap from a meeting summary. Splits the content into sections, creates infographics, generates two-host dialogue audio via the ElevenLabs Text-to-Dialogue API, and assembles everything into an MP4 with background music. Use when the user asks to create a "meeting video", "video recap", "video summary", or "meeting recap video".
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add paolomoz/skills infographic-video
Transforms a meeting summary (Markdown) into a short (~5 min) video with infographic visuals, two-host podcast-style dialogue audio, and background music.
Meeting Summary (.md)
→ Section Splitting (6 sections)
→ Infographic Prompt Crafting (Sumi style+layout references → Claude)
→ Infographic Generation (Gemini 3 Pro Image Preview)
→ Dialogue Scripts (two-host, fast-paced podcast tone, with bridge lines)
→ Audio Generation (ElevenLabs Text-to-Dialogue API, 2 voices, 1.0s lead-in silence, 1.2x speed)
→ Video Assembly (moviepy + background music → MP4)
Agent Execution:
SKILL_DIR = this SKILL.md file's directory
Scripts are located in ${SKILL_DIR}/scripts/
/infographic-video path/to/meeting-summary.md
/infographic-video path/to/summary.md --sections 6 --voices Alex=ID1,Jordan=ID2
/infographic-video path/to/summary.md --output video/output/
| Option | Description | Default |
|--------|-------------|---------|
| --sections <n> | Number of content sections | 6 |
| --output <dir> | Output directory for all artifacts | meeting/ |
| --voices <a,b> | Voice mapping: Name=ID,Name=ID | Archer, Alexandra |
| --host-names <a,b> | Display names for the two hosts | Alex, Jordan |
| --aspect | Infographic aspect ratio | landscape (16:9) |
| --lang | Language for all text content | en |
| --duration | Target video duration in seconds | 300 (5 min) |
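The options above can be modelled with `argparse`, as a minimal sketch. The skill is invoked as a slash command, so this parser is purely illustrative: `parse_voices` and `build_parser` are hypothetical names, and only the flag names and defaults come from the table.

```python
import argparse


def parse_voices(value: str) -> dict[str, str]:
    """Parse 'Name=ID,Name=ID' into a {host_name: voice_id} mapping."""
    voices = {}
    for pair in value.split(","):
        name, sep, voice_id = pair.partition("=")
        if not sep or not name.strip() or not voice_id.strip():
            raise argparse.ArgumentTypeError(f"expected Name=ID, got {pair!r}")
        voices[name.strip()] = voice_id.strip()
    if len(voices) != 2:
        raise argparse.ArgumentTypeError("exactly two voices are required")
    return voices


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option table (defaults from the table)."""
    parser = argparse.ArgumentParser(prog="infographic-video")
    parser.add_argument("summary", help="path to the meeting summary (.md)")
    parser.add_argument("--sections", type=int, default=6)
    parser.add_argument("--output", default="meeting/")
    parser.add_argument("--voices", type=parse_voices, default=None)
    parser.add_argument("--host-names", default="Alex,Jordan")
    parser.add_argument("--aspect", default="landscape")
    parser.add_argument("--lang", default="en")
    parser.add_argument("--duration", type=int, default=300)
    return parser


args = build_parser().parse_args(
    ["summary.md", "--sections", "6", "--voices", "Alex=ID1,Jordan=ID2"]
)
print(args.voices, args.duration)
```

Parsing `--voices` through a dedicated type function rejects malformed pairs before any API call is made.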
| Tool | Purpose | Install |
|------|---------|---------|
| ffmpeg | Audio probing, video encoding | brew install ffmpeg |
| moviepy | Video composition, crossfades, music mixing | pip3 install moviepy |
| requests | ElevenLabs API calls | pip3 install requests |
| python-dotenv | Load API keys from .env | pip3 install python-dotenv |
| Pillow | Title/outro card generation | pip3 install Pillow |
| google-genai | Gemini 3 Pro Image Preview for infographics | pip3 install google-genai |
| anthropic | Claude API for Sumi prompt crafting | pip3 install anthropic |
| Variable | Description |
|----------|-------------|
| ELEVENLABS_API_KEY | ElevenLabs API key (required for audio) |
| GOOGLE_API_KEY | Google AI API key (required for Gemini image generation) |
| ANTHROPIC_API_KEY | Anthropic API key (required for Sumi prompt crafting) |
| Host | Voice | Voice ID | Character |
|------|-------|----------|-----------|
| Alex | Archer | L0Dsvb3SLTyegXwtm47J | Conversational, warm male guide |
| Jordan | Alexandra | kdmDKE6EkgrWrrykO9Qt | Realistic, chatty female reactor |
Fallback voices if defaults unavailable: Roger (CwhRBWXzGAHq8TQ4Fs17), Laura (FGY2WhTYpPnrIDTdsKH5).
Run scripts/list_voices.py to discover available ElevenLabs voices.
{output-dir}/
├── sections/
│   ├── 01-{slug}/source.md        # Section content
│   ├── 02-{slug}/source.md
│   └── ...
├── infographic/
│   ├── 01-{slug}.png              # Generated infographics
│   └── ...
├── audio/
│   ├── 01-{slug}.mp3              # Generated dialogue audio
│   └── bgm.mp3                    # Background music (optional)
├── dialogue.json                  # Dialogue scripts (agent-generated)
├── video-config.json              # Title/outro card config
└── output/
    └── meeting-recap-final.mp4    # Final video
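The per-section paths in this layout can be derived from an index and a title, as a rough sketch. The `slugify` rule (lowercase, non-alphanumerics collapsed to hyphens) and both function names are assumptions for illustration; the skill's own slug convention is not documented here.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Assumed slug rule: lowercase, runs of non-alphanumerics become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def section_paths(output_dir, index: int, title: str) -> dict:
    """Return the source, infographic and audio paths for one section."""
    name = f"{index:02d}-{slugify(title)}"
    root = Path(output_dir)
    return {
        "source": root / "sections" / name / "source.md",
        "infographic": root / "infographic" / f"{name}.png",
        "audio": root / "audio" / f"{name}.mp3",
    }


paths = section_paths("meeting", 1, "Intro")
print(paths["source"].as_posix())
```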
ffmpeg -version
python3 -c "import moviepy; import PIL; import requests; import dotenv"
Check that the API keys are present in .env:
grep ELEVENLABS_API_KEY .env
grep GOOGLE_API_KEY .env
grep ANTHROPIC_API_KEY .env
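The same key check can be done from Python, as a small sketch. `missing_keys` is a hypothetical helper, and it inspects a mapping (by default the process environment) rather than parsing `.env` itself; calling `load_dotenv()` from python-dotenv first would populate the environment from the file.

```python
import os

REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "GOOGLE_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=None) -> list:
    """Return the required key names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"GOOGLE_API_KEY": "example-value"}))
```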
Read the source meeting summary and split into sections.
Section Planning Guidelines:
Each section directory contains a source.md with extracted content from the summary.
See references/section-planning.md for detailed guidelines.
Pick ONE style for all sections for visual consistency. Vary only the layout from section to section, to match each section's content type.
Recommended layouts by content type:
| Content Type | Layout |
|--------------|--------|
| Overview / Intro | bento-grid |
| Journey / Process | winding-roadmap |
| Core Concept / Mission | hub-spoke |
| Framework / Model | linear-progression |
| Metrics / KPIs | dashboard |
| Future / Technology | circular-flow |
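The table can be expressed as a lookup, as a minimal sketch. The dictionary keys are hypothetical short labels for the content types, and the fallback to `bento-grid` is an assumption; only the layout IDs come from the table.

```python
# Layout IDs are from the table; the key names are illustrative labels.
LAYOUT_BY_CONTENT = {
    "overview": "bento-grid",
    "journey": "winding-roadmap",
    "core-concept": "hub-spoke",
    "framework": "linear-progression",
    "metrics": "dashboard",
    "future": "circular-flow",
}


def pick_layout(content_type: str, default: str = "bento-grid") -> str:
    """Return the recommended layout, or the (assumed) default if unknown."""
    return LAYOUT_BY_CONTENT.get(content_type.strip().lower(), default)


print(pick_layout("Metrics"))
```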
Good style choices (pick one and use for all 6 sections): bold-graphic, corporate-memphis, technical-schematic, storybook-watercolor.
All infographics: 16:9 landscape (1920×1080 or 2K equivalent).
For each section, generate an infographic using Sumi prompt crafting + Gemini 3 Pro:
python3 ${SKILL_DIR}/scripts/generate_image.py \
{output}/sections/{NN}-{slug}/source.md \
{output}/infographic/{NN}-{slug}.png \
--layout {layout-id} --style {style-id}
The script:
Saves the crafted prompt as {NN}-{slug}.prompt.md for debugging.
Use the same --style for all sections (vary --layout only).
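One way to keep the style fixed while varying the layout is to build each command from a shared style value, sketched below. `image_command` is a hypothetical helper; the script path and the `--layout`, `--style` and `--sumi-dir` flags are the ones named in this document.

```python
from pathlib import Path


def image_command(skill_dir, output_dir, name, layout, style, sumi_dir=None):
    """Build the generate_image.py argument list for one section."""
    out = Path(output_dir)
    cmd = [
        "python3",
        (Path(skill_dir) / "scripts" / "generate_image.py").as_posix(),
        (out / "sections" / name / "source.md").as_posix(),
        (out / "infographic" / f"{name}.png").as_posix(),
        "--layout", layout,
        "--style", style,
    ]
    if sumi_dir is not None:
        cmd += ["--sumi-dir", str(sumi_dir)]
    return cmd


STYLE = "bold-graphic"  # one style for every section
plan = [("01-intro", "bento-grid"), ("02-journey", "winding-roadmap")]
for name, layout in plan:
    print(" ".join(image_command("skill", "meeting", name, layout, STYLE)))
```

Each list could be passed to `subprocess.run(cmd, check=True)` one at a time, which matches the sequential generation this workflow asks for.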
Generate sequentially (one at a time) to ensure quality. Verify output exists and is 16:9 aspect ratio.
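The aspect-ratio check can be done without extra dependencies by reading the PNG header, as a sketch. `png_size` and `is_16_9` are hypothetical helpers and the 1% tolerance is an assumption; the header layout (8-byte signature, then the IHDR chunk carrying big-endian width and height) follows the PNG format.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_16_9(width: int, height: int, tolerance: float = 0.01) -> bool:
    """True if width/height is within tolerance of 16:9."""
    return height > 0 and abs(width / height - 16 / 9) <= tolerance


# Synthetic header for a 1920x1080 image, used here instead of a real file.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1920, 1080)
print(png_size(header), is_16_9(*png_size(header)))
```

For a real file, the first 24 bytes are enough: `png_size(open(path, "rb").read(24))`.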
Sumi references are loaded from /Users/paolo/playground/sumi/backend/sumi/references/data/ (override with --sumi-dir).
Write casual two-host dialogue for each section and save as {output}/dialogue.json:
{
  "sections": [
    {
      "name": "01-intro",
      "dialogue": [
        {"speaker": "Alex", "text": "Opening line..."},
        {"speaker": "Jordan", "text": "Reaction..."},
        {"speaker": "Alex", "text": "Bridge to next section..."}
      ]
    }
  ]
}
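Before spending API credits, a file in this shape can be checked for structural problems. The validator below is a sketch under assumptions: `validate_dialogue` is a hypothetical helper, and the rules it enforces (known speakers, non-empty text, at least one line per section) are inferred from the example rather than taken from the skill's scripts.

```python
def validate_dialogue(doc: dict, hosts=("Alex", "Jordan")) -> list:
    """Return a list of human-readable problems; empty means the doc looks OK."""
    sections = doc.get("sections")
    if not isinstance(sections, list) or not sections:
        return ["'sections' must be a non-empty list"]
    errors = []
    for i, section in enumerate(sections, start=1):
        name = section.get("name") or f"section #{i}"
        lines = section.get("dialogue") or []
        if not lines:
            errors.append(f"{name}: no dialogue lines")
        for j, line in enumerate(lines, start=1):
            if line.get("speaker") not in hosts:
                errors.append(f"{name} line {j}: unknown speaker {line.get('speaker')!r}")
            if not str(line.get("text", "")).strip():
                errors.append(f"{name} line {j}: empty text")
    return errors


sample = {
    "sections": [
        {
            "name": "01-intro",
            "dialogue": [
                {"speaker": "Alex", "text": "Opening line..."},
                {"speaker": "Jordan", "text": "Reaction..."},
            ],
        }
    ]
}
print(validate_dialogue(sample))
```

In practice the document would come from `json.load` on `{output}/dialogue.json`.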
Dialogue Format:
See references/dialogue-format.md for detailed guidelines, bridging examples, and voice configuration.
Create {output}/video-config.json with title/outro card configuration:
{
  "title": {
    "line1": "Meeting Title",
    "line2": "Meeting Recap",
    "line3": "Date"
  },
  "outro": {
    "quote": "A memorable closing quote from the meeting.",
    "attribution": "Speaker Name"
  }
}
Then run the dialogue generation script:
python3 ${SKILL_DIR}/scripts/generate_dialogue.py {output}/dialogue.json {output}/audio
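The script's internals are not reproduced here. As a hedged illustration of the idea, the sketch below maps one section's dialogue lines to a request body for a text-to-dialogue call. `dialogue_payload` is a hypothetical helper, and the field names (`inputs`, `text`, `voice_id`, `model_id`) and the `eleven_v3` model value are assumptions that should be checked against the ElevenLabs API reference; the voice IDs are the defaults listed in this document.

```python
DEFAULT_VOICES = {
    "Alex": "L0Dsvb3SLTyegXwtm47J",    # Archer (default voice in this document)
    "Jordan": "kdmDKE6EkgrWrrykO9Qt",  # Alexandra (default voice in this document)
}


def dialogue_payload(section: dict, voices=DEFAULT_VOICES, model_id="eleven_v3") -> dict:
    """Build an assumed request body: one input per dialogue line, in order."""
    inputs = []
    for line in section["dialogue"]:
        speaker = line["speaker"]
        if speaker not in voices:
            raise KeyError(f"no voice configured for speaker {speaker!r}")
        inputs.append({"text": line["text"], "voice_id": voices[speaker]})
    return {"model_id": model_id, "inputs": inputs}


section = {
    "name": "01-intro",
    "dialogue": [
        {"speaker": "Alex", "text": "Opening line..."},
        {"speaker": "Jordan", "text": "Reaction..."},
    ],
}
print(dialogue_payload(section))
```

Building the body in a pure function keeps the speaker-to-voice mapping testable without a network call.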
How it works:
IMPORTANT ffmpeg note: When generating silence with -f lavfi, the -t duration flag must come BEFORE -i to bound the source. Placing it after creates an unbounded silence source.
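The flag ordering in the note above can be made explicit when building the command in Python. This is a sketch: `silence_command` is a hypothetical helper, and the `anullsrc` parameters (44.1 kHz, stereo) are assumptions. The key point is that `-t` appears before `-i`, so it is an input option that bounds the lavfi source.

```python
def silence_command(seconds: float, out_path: str, sample_rate: int = 44100) -> list:
    """Build an ffmpeg command that writes `seconds` of silence to out_path."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi",
        "-t", str(seconds),  # BEFORE -i: limits the duration of the source
        "-i", f"anullsrc=r={sample_rate}:cl=stereo",
        out_path,
    ]


cmd = silence_command(1.0, "lead-in.mp3")
print(" ".join(cmd))
```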
Verification:
Place a background music MP3 at {output}/audio/bgm.mp3. The assembler will automatically:
If bgm.mp3 is not present, the video assembles without background music.
Recommended source: Pixabay Music (free, no attribution required) — search for corporate/tech presentation background tracks, 1-2 minutes (will be looped automatically).
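Because a 1-2 minute track has to cover a video of roughly five minutes, the loop count follows from simple arithmetic. The sketch below is illustrative only; `loops_needed` is a hypothetical helper and does not describe how the assembler implements looping.

```python
import math


def loops_needed(video_seconds: float, bgm_seconds: float) -> int:
    """Number of times the track must be played to cover the whole video."""
    if bgm_seconds <= 0:
        raise ValueError("bgm_seconds must be positive")
    return max(1, math.ceil(video_seconds / bgm_seconds))


# A 90-second track under a 300-second video needs four plays.
print(loops_needed(300, 90))
```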
Run the video assembly script:
python3 ${SKILL_DIR}/scripts/assemble_video.py {output}/video-config.json {output}/infographic {output}/audio {output}/output
How it works:
Video structure:
[Title Card 3s] → [Section 1] → [Section 2] → ... → [Section 6] → [Outro Card 3s]
↕ 0.5s crossfade between each clip ↕
🎵 full vol 🎵 15% vol (under dialogue) 🎵 full → fade out
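The structure above implies a total running time slightly shorter than the sum of the clips, if each 0.5s crossfade overlaps two adjacent clips. That overlap behaviour is an assumption about the assembler, and `total_duration` is a hypothetical helper; a quick estimate under that assumption:

```python
def total_duration(section_seconds, card_seconds: float = 3.0, crossfade: float = 0.5) -> float:
    """Estimate video length: title + sections + outro, minus crossfade overlaps."""
    clips = [card_seconds, *section_seconds, card_seconds]
    # One overlap between each pair of adjacent clips.
    return sum(clips) - crossfade * (len(clips) - 1)


# Six sections of 49 seconds each, with 3-second title and outro cards.
print(total_duration([49.0] * 6))
```

With eight clips there are seven crossfades, so the six 49-second sections plus two 3-second cards come to 300 - 3.5 = 296.5 seconds.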
Export settings:
| Setting | Value |
|---------|-------|
| Resolution | 1920×1080 |
| FPS | 24 |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Video bitrate | 5000k |
| Audio bitrate | 192k |
ffprobe -v quiet -show_entries format=duration \
-show_entries stream=width,height,codec_name \
-of json output/meeting-recap-final.mp4
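The JSON printed by this command can be checked against the export settings programmatically. The sketch below makes assumptions: `check_probe` is a hypothetical helper, and the sample string is hand-written in the general shape of ffprobe's JSON output (a `streams` list and a `format` object), not captured from a real run.

```python
import json


def check_probe(probe_json: str, width: int = 1920, height: int = 1080) -> list:
    """Return a list of mismatches against the documented export settings."""
    probe = json.loads(probe_json)
    streams = probe.get("streams", [])
    problems = []
    video = next((s for s in streams if "width" in s), None)
    if video is None:
        problems.append("no video stream found")
    else:
        if (video.get("width"), video.get("height")) != (width, height):
            problems.append(f"resolution is {video.get('width')}x{video.get('height')}")
        if video.get("codec_name") != "h264":
            problems.append(f"video codec is {video.get('codec_name')}")
    if not any(s.get("codec_name") == "aac" for s in streams):
        problems.append("no AAC audio stream found")
    if float(probe.get("format", {}).get("duration", 0)) <= 0:
        problems.append("missing or zero duration")
    return problems


# Illustrative sample, not real ffprobe output.
sample_probe = json.dumps({
    "streams": [
        {"codec_name": "h264", "width": 1920, "height": 1080},
        {"codec_name": "aac"},
    ],
    "format": {"duration": "296.500000"},
})
print(check_probe(sample_probe))
```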
Title card (3s):
Outro card (3s):
Cards are generated as PNGs via Pillow (no ImageMagick dependency).
references/section-planning.md: How to split content into sections
references/dialogue-format.md: Dialogue writing guidelines, bridging, voice config, API details