skills/packs/video-production/beat-sync-reel/SKILL.md
---
name: beat-sync-reel
description: Generates Instagram Reels where product image cuts are synced to audio beats. Accepts audio as a local file, URL, or search query. Uses librosa for beat detection, FFmpeg Ken Burns for scene animation, and Pillow for text overlays. No AI video generation; fully free, fast, and scalable.
user-invocable: true
allowed-tools: Bash, Read, Write, Edit, Grep, Glob, WebSearch
argument-hint: [product-url-or-image-paths] [audio-source]
---

# Beat-Sync Reel Generator
Takes product images and a trending audio track, detects beats, and produces an Instagram Reel where every image cut lands exactly on a beat. Fast, free (no API credits), and scalable.
librosa and Pillow packages

The user provides:
Audio (required) — one of three formats:
- Local file: `/path/to/trending-audio.mp3`
- URL: download with `yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "<URL>"`
- Audio name: a search query, resolved to a URL and downloaded the same way

Product images (required), one of:
- Append `.json` to the product URL and extract image URLs from the response
- `curl` with `-H "Referer: <site-domain>"` and a browser user-agent, then parse `<img>` tags

Audio segment (optional): start and end timestamps in seconds to use a specific portion of the audio. Defaults to 0-15s.
Beat frequency (optional) — cut on every Nth beat. Defaults to 2 (every 2nd beat, ~1.3s per image at typical tempos). Use 1 for fast cuts, 4 for slower.
Product info (optional) — brand name, product name, price, CTA URL. Used for end card. If not provided, skip end card.
Style preset (optional) — for end card text. One of: minimal, luxury, bold, editorial, clean. Defaults to clean. See Style Presets table below for font details.
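For the product-page route above, the `<img>` parsing step could look roughly like the sketch below. It uses only the Python standard library; the class and function names (`ImgSrcParser`, `extract_image_urls`) and the sample HTML are illustrative assumptions, not part of the skill.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative and protocol-relative URLs against the page URL
            self.srcs.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    # Drop duplicates while keeping first-seen order
    return list(dict.fromkeys(parser.srcs))


# Hypothetical page fragment for illustration
sample = '<img src="/a.jpg"><img src="//cdn.example.com/b.jpg"><img src="/a.jpg">'
urls = extract_image_urls(sample, "https://shop.example.com/products/polo")
```

Keeping page order matters here because the position-based classification later in this skill relies on it.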
Based on input type:
Local file:
```bash
# Verify the file exists and read its duration
ffprobe -v quiet -print_format json -show_format "audio.mp3"
```
URL (Instagram/TikTok/YouTube):
```bash
yt-dlp -x --audio-format mp3 -o "<workdir>/audio.%(ext)s" "<URL>"
```
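Audio name (search):
Search the web for `"<audio name>" site:youtube.com` or `"<audio name>" instagram audio`, then download the matching URL:

```bash
yt-dlp -x --audio-format mp3 -o "<workdir>/audio.%(ext)s" "<URL>"
```

Detect beats with librosa:

```python
import librosa
import numpy as np

# sr=None keeps the file's native sample rate
y, sr = librosa.load("audio.mp3", sr=None)
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
# Convert beat frame indices to timestamps in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
beat_times = [float(t) for t in beat_times]
```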
Select cut points based on beat frequency:
```python
beat_freq = 2  # cut on every 2nd beat
# Always start with a cut at 0.0, then take every beat_freq-th beat
cut_times = [0.0] + [beat_times[i] for i in range(beat_freq - 1, len(beat_times), beat_freq)]
```
Trim to audio segment:
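```python
start, end = 0.0, 15.0  # or user-provided
# Keep cuts inside the segment and shift them so the segment starts at 0
cut_times = [t - start for t in cut_times if start <= t < end]
# Guard against an empty list before checking the first cut
if not cut_times or cut_times[0] != 0.0:
    cut_times.insert(0, 0.0)
```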
Typical results by tempo:
| Tempo (BPM) | Beat interval | Every 2nd beat | Cuts in 15s |
|-------------|---------------|----------------|-------------|
| 80 | 0.75s | 1.5s | ~10 |
| 100 | 0.60s | 1.2s | ~12 |
| 120 | 0.50s | 1.0s | ~15 |
| 140 | 0.43s | 0.86s | ~17 |
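The table values follow from simple arithmetic: the beat interval is 60 / BPM, and the cut interval is that times the beat frequency. A small sketch for estimating cut counts at other tempos or segment lengths (the function name `cut_stats` is an illustrative assumption):

```python
def cut_stats(bpm, beat_freq=2, segment_seconds=15.0):
    """Return (beat interval, cut interval, approximate cuts) for a steady tempo."""
    beat_interval = 60.0 / bpm
    cut_interval = beat_interval * beat_freq
    approx_cuts = segment_seconds / cut_interval
    return beat_interval, cut_interval, approx_cuts


for bpm in (80, 100, 120, 140):
    beat, cut, n = cut_stats(bpm)
    print(f"{bpm} BPM: beat {beat:.2f}s, cut every {cut:.2f}s, ~{n:.1f} cuts in 15s")
```

Real tracks drift in tempo, so the detected beat times, not this estimate, should drive the actual cuts.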
If cuts > available images, cycle through images with different Ken Burns effects.
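One way to cycle images and effects across the cut intervals is sketched below. The effect labels, the `plan_scenes` name, and the choice to end the last scene at the segment end are assumptions for illustration; the rotation shifts the effect by one on each pass so a reused image does not repeat the effect it had on its previous pass.

```python
# Hypothetical labels for the six Ken Burns effects
EFFECTS = ["zoom_in", "zoom_out", "pan_lr", "pan_rl", "zoom_in_top", "pan_up"]


def plan_scenes(cut_times, segment_end, images, fps=25):
    """Pair each cut interval with an image and an effect, cycling both lists."""
    boundaries = list(cut_times) + [segment_end]
    scenes = []
    for i, (t0, t1) in enumerate(zip(boundaries, boundaries[1:])):
        duration = t1 - t0
        # pass_no counts how many times the image list has been exhausted
        pass_no, image_idx = divmod(i, len(images))
        scenes.append({
            "index": i,
            "image": images[image_idx],
            "effect": EFFECTS[(image_idx + pass_no) % len(EFFECTS)],
            "duration": round(duration, 3),
            "frames": int(duration * fps),
        })
    return scenes


scenes = plan_scenes([0.0, 1.0, 2.0, 3.0], 4.0, ["a.jpg", "b.jpg"])
```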
If images were scraped from a product URL, filter out infographics and size charts:
Classification heuristic (by position on product page):
| Position | Likely Type |
|----------|-------------|
| Image 1 (first on page) | Hero / front-facing model |
| Image 2 | Alternate angle (side/back) |
| Image 3-4 | Close-up or detail |
| Last image | Size guide or back view |
Model vs product-only detection: If image height > 1.5× width AND file size > 100KB → likely a model photo. Otherwise → product-only photo.
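A minimal sketch of this rule follows. It assumes "100KB" means 100 × 1024 bytes; the function names are illustrative. The rule itself is kept dependency-free, and Pillow is only imported when reading a real file.

```python
import os


def is_model_photo(width, height, file_size_bytes):
    """Tall aspect ratio plus a large file suggests a model photo."""
    return height > 1.5 * width and file_size_bytes > 100 * 1024


def classify_image(path):
    """Read dimensions with Pillow and size from disk, then apply the rule."""
    from PIL import Image  # imported lazily so the pure rule has no dependencies

    with Image.open(path) as img:
        width, height = img.size
    if is_model_photo(width, height, os.path.getsize(path)):
        return "model"
    return "product-only"
```

Both comparisons are strict, so an image at exactly 1.5× or exactly 100KB is treated as product-only.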
Order images for visual variety: hero → detail → alternate angle → repeat.
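The position table and the ordering rule can be combined as in the sketch below. Several choices here are assumptions: explicit positions 1-4 take priority over the "last image" rule, images beyond position 4 that are not last get a hypothetical `other` label, and only hero, detail, and alternate images enter the rotation (so size guides are dropped).

```python
def classify_by_position(images):
    """Label images by their position on the product page."""
    last = len(images) - 1
    labeled = []
    for i, path in enumerate(images):
        if i == 0:
            label = "hero"
        elif i == 1:
            label = "alternate"
        elif i in (2, 3):
            label = "detail"
        elif i == last:
            label = "size_guide"
        else:
            label = "other"
        labeled.append((path, label))
    return labeled


def order_for_variety(labeled):
    """Round-robin hero -> detail -> alternate until those buckets are empty."""
    buckets = {"hero": [], "detail": [], "alternate": []}
    for path, label in labeled:
        if label in buckets:
            buckets[label].append(path)
    ordered = []
    while any(buckets.values()):
        for label in ("hero", "detail", "alternate"):
            if buckets[label]:
                ordered.append(buckets[label].pop(0))
    return ordered


ordered = order_for_variety(
    classify_by_position(["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"])
)
```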
For each cut interval, create a Ken Burns clip from the assigned image. Alternate through these effects:
```bash
# Zoom in center
# on = zoompan's output frame counter, so the motion advances on every rendered frame
ffmpeg -y -loop 1 -i "image.jpg" \
  -vf "scale=2160:3840,zoompan=z='1+0.08*on/{frames}':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':d={frames}:s=1080x1920:fps=25" \
  -t {duration} -c:v libx264 -pix_fmt yuv420p -r 25 scene.mp4
```

The other effects use the same command and swap only the `zoompan` filter:

```text
# Zoom out center
zoompan=z='1.15-0.08*on/{frames}':x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':d={frames}:s=1080x1920:fps=25

# Pan left to right
zoompan=z='1.08':x='(iw-iw/zoom)*on/{frames}':y='ih/2-(ih/zoom/2)':d={frames}:s=1080x1920:fps=25

# Pan right to left
zoompan=z='1.08':x='(iw-iw/zoom)*(1-on/{frames})':y='ih/2-(ih/zoom/2)':d={frames}:s=1080x1920:fps=25

# Zoom in top-center (for torso/face crops)
zoompan=z='1+0.08*on/{frames}':x='iw/2-(iw/zoom/2)':y='ih/4-(ih/zoom/4)':d={frames}:s=1080x1920:fps=25

# Pan up
zoompan=z='1.06':x='iw/2-(iw/zoom/2)':y='(ih-ih/zoom)*(1-on/{frames})':d={frames}:s=1080x1920:fps=25
```
Where {frames} = int(duration * 25) (25 fps).
Important: Always scale source image to at least 2160x3840 before zoompan so there's enough resolution for the zoom.
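Filling in `{frames}` and `{duration}` by hand is error-prone, so the command can be assembled programmatically. The sketch below is one possible shape, not the skill's actual implementation: the effect keys and `kenburns_command` are hypothetical names, only three of the six effects are shown, and the expressions use zoompan's `on` (output frame count) variable as the progress counter.

```python
FPS = 25

# (zoom, x, y) expression templates keyed by hypothetical effect names;
# {n} is replaced with the clip's frame count
ZOOMPAN_EFFECTS = {
    "zoom_in": ("1+0.08*on/{n}", "iw/2-(iw/zoom/2)", "ih/2-(ih/zoom/2)"),
    "zoom_out": ("1.15-0.08*on/{n}", "iw/2-(iw/zoom/2)", "ih/2-(ih/zoom/2)"),
    "pan_lr": ("1.08", "(iw-iw/zoom)*on/{n}", "ih/2-(ih/zoom/2)"),
}


def kenburns_command(image, output, duration, effect="zoom_in", fps=FPS):
    """Build the ffmpeg argument list for one Ken Burns scene."""
    frames = int(duration * fps)
    z, x, y = (expr.format(n=frames) for expr in ZOOMPAN_EFFECTS[effect])
    vf = (
        "scale=2160:3840,"  # upscale first so the zoom has resolution to work with
        f"zoompan=z='{z}':x='{x}':y='{y}':d={frames}:s=1080x1920:fps={fps}"
    )
    return [
        "ffmpeg", "-y", "-loop", "1", "-i", image,
        "-vf", vf,
        "-t", f"{duration:.3f}",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-r", str(fps),
        output,
    ]


cmd = kenburns_command("image.jpg", "scene-00.mp4", 2.0, effect="zoom_in")
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting issues with the single quotes inside the filter string.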
If product info is provided, create a 2-second end card using Pillow:
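```python
from PIL import Image, ImageDraw, ImageFont

card = Image.new("RGBA", (1080, 1920), (20, 20, 20, 255))
draw = ImageDraw.Draw(card)
font = ImageFont.load_default()  # placeholder; swap in the preset's font via ImageFont.truetype()

# (text, y, color) for each row; the text and colors here are placeholders
rows = [
    ("<brand name>", 750, (255, 255, 255)),    # brand name
    ("<product name>", 830, (255, 255, 255)),  # product name
    ("<price>", 920, (212, 175, 55)),          # price, accent color
    ("<CTA URL>", 1020, (160, 160, 160)),      # CTA, muted
]
for text, y, color in rows:
    x = (1080 - draw.textlength(text, font=font)) / 2  # center horizontally
    draw.text((x, y), text, font=font, fill=color)
card.save("endcard.png")
```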
Convert to video:
```bash
ffmpeg -y -loop 1 -i endcard.png -vf "scale=1080:1920" \
  -t 2 -c:v libx264 -pix_fmt yuv420p -r 25 endcard.mp4
```
Fonts are provided as shared files in the pack's fonts/ directory (copied into each skill on install). Fall back to system fonts if custom fonts are not found.
| Preset | Title Font | Body Font | Text Color | Treatment |
|--------|-----------|-----------|------------|-----------|
| minimal | Montserrat-Light.ttf | Montserrat-Light.ttf | White (255,255,255) | No background, subtle shadow |
| luxury | System Didot (/System/Library/Fonts/Supplemental/Didot.ttc) | Cormorant-Regular.ttf | Cream (245,235,210) | Thin gold stroke |
| bold | System Futura (/System/Library/Fonts/Supplemental/Futura.ttc) | Montserrat-Bold.ttf | White | Dark backdrop bar, uppercase |
| editorial | Cormorant-Italic.ttf | Cormorant-Regular.ttf | White | Minimal, italic titles |
| clean | System Helvetica (/System/Library/Fonts/Helvetica.ttc) | System Helvetica | White | Simple shadow, professional |
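The fallback from pack fonts to system fonts could be handled with a small path resolver like the sketch below. The function name `resolve_font_path` and its parameters are assumptions for illustration; the `fonts` default mirrors the pack directory mentioned above.

```python
import os


def resolve_font_path(font_name, fonts_dir="fonts", system_fallbacks=()):
    """Return the first existing font file, preferring the pack's fonts directory."""
    candidates = [os.path.join(fonts_dir, font_name), *system_fallbacks]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None  # caller decides what to do when nothing is found


# Example: try the pack's font first, then the macOS Helvetica path from the table
path = resolve_font_path(
    "Montserrat-Light.ttf",
    system_fallbacks=("/System/Library/Fonts/Helvetica.ttc",),
)
```

When a path is returned, it can be passed to Pillow's `ImageFont.truetype(path, size)`; when the result is `None`, `ImageFont.load_default()` is a reasonable last resort.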
Concatenate the scene clips and the end card:

```bash
cat > concat.txt << EOF
file 'scene-00.mp4'
file 'scene-01.mp4'
...
file 'endcard.mp4'
EOF

ffmpeg -y -f concat -safe 0 -i concat.txt \
  -c:v libx264 -pix_fmt yuv420p -r 25 reel-silent.mp4
```
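Since the number of scenes depends on the detected beats, the concat list is easier to generate than to type. A minimal sketch, assuming scene files follow the `scene-NN.mp4` naming shown above (`write_concat_list` is a hypothetical helper name):

```python
def write_concat_list(scene_files, list_path="concat.txt"):
    """Write an ffmpeg concat-demuxer list and return its lines."""
    lines = []
    for name in scene_files:
        # Paths are single-quoted; an embedded quote is written as '\''
        escaped = name.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    with open(list_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines


# Example: three scenes followed by the end card
scene_files = [f"scene-{i:02d}.mp4" for i in range(3)] + ["endcard.mp4"]
```

Calling `write_concat_list(scene_files)` then produces the `concat.txt` consumed by the `ffmpeg -f concat` command.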
Mix in the trimmed audio with a fade-in and fade-out:

```bash
ffmpeg -y -i reel-silent.mp4 -i audio.mp3 \
  -filter_complex "[1:a]atrim={start}:{end},asetpts=PTS-STARTPTS,afade=t=in:st=0:d=0.5,afade=t=out:st={fade_start}:d=2,volume=0.8[aud]" \
  -map 0:v -map "[aud]" \
  -c:v copy -c:a aac -shortest output.mp4
```
Where {start} and {end} are the audio segment timestamps, and {fade_start} = total_duration - 2.0.
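The placeholder substitution can be sketched as below. It assumes `total_duration` is the length of the trimmed audio (`end - start`), because the fade timestamps are measured on the audio timeline after `asetpts` resets it to zero; that reading is an assumption, and `audio_filter` is a hypothetical helper name.

```python
def audio_filter(start, end, fade_in=0.5, fade_out=2.0, volume=0.8):
    """Build the -filter_complex string for trimming, fading, and levelling audio."""
    total_duration = end - start  # assumption: duration of the trimmed segment
    # Clamp so very short segments do not produce a negative fade start
    fade_start = max(0.0, total_duration - fade_out)
    return (
        f"[1:a]atrim={start}:{end},asetpts=PTS-STARTPTS,"
        f"afade=t=in:st=0:d={fade_in},"
        f"afade=t=out:st={fade_start}:d={fade_out},"
        f"volume={volume}[aud]"
    )


filter_str = audio_filter(0.0, 15.0)
```

Note that `-shortest` trims the output to the shorter stream, so if an end card extends the video past the audio segment, the segment end may need to be extended by the same amount.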
Save the final reel to a user-specified directory (or the current working directory).
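Output specs:
- Resolution: 1080x1920 (vertical)
- Frame rate: 25 fps
- Video codec: H.264 (libx264), yuv420p
- Audio codec: AAC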
For AI-generated video clips, use the product-reel-generator skill, which supports Higgsfield/Kling/Seedance video generation APIs.

Free. No API credits needed. Only uses FFmpeg, librosa, and Pillow; all processing is local.
User: "Make a beat-sync reel for this product: https://www.damensch.com/products/full-sleeve-polo
Use this audio: https://www.instagram.com/reels/audio/123456789/
Cut on every 2nd beat, use the first 15 seconds"
Agent:
1. Downloads audio with yt-dlp
2. Scrapes product images from URL
3. Detects beats with librosa
4. Creates Ken Burns clips at beat intervals
5. Adds end card with product info
6. Mixes audio
7. Outputs reel