skills/packs/video-production/video-clipper/SKILL.md
Repurposes long-form video (podcasts, interviews, talks) into short-form vertical clips for Instagram Reels, TikTok, and YouTube Shorts. Handles transcription, moment selection, clip extraction, speaker-tracked reframing (16:9 to 9:16), and animated captions.
npx skillsauth add athina-ai/goose-skills video-clipper
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Takes a long-form video and produces ready-to-post short-form vertical clips with speaker-tracked framing and professional animated captions. Works with podcasts, interviews, talks, and any talking-head content.
- FFmpeg (brew install ffmpeg on macOS, apt install ffmpeg on Linux)
- Python with the openai-whisper and requests packages (pip install openai-whisper requests). Note: openai-whisper installs PyTorch (~2GB download). This skill uses openai-whisper instead of the lighter whisper-cpp because it provides word-level timestamps needed for accurate viral moment scoring.
- yt-dlp (brew install yt-dlp on macOS, pip install yt-dlp on Linux)
- API keys in a .env file (project root or any parent directory):
  - KLAP_API_KEY: from klap.app (reframing with speaker tracking)
  - CAPTIONS_AI_API_KEY: from captions.ai / platform.mirage.app (animated captions)

Before starting: Verify that FFmpeg, yt-dlp, and the Python packages are installed. If any are missing, instruct the user to install them before proceeding.
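The pre-flight check described above could be automated with a small sketch like the following. It uses only the standard library; the lists `REQUIRED_TOOLS` and `REQUIRED_MODULES` and both function names are illustrative assumptions, not part of the skill itself.

```python
import importlib.util
import shutil

# Names taken from the prerequisites above. The openai-whisper package
# is imported as "whisper"; ffprobe normally ships with FFmpeg.
REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "yt-dlp"]
REQUIRED_MODULES = ["whisper", "requests"]


def missing_tools(tools, which=shutil.which):
    """Return the command-line tools that are not found on PATH."""
    return [name for name in tools if which(name) is None]


def missing_modules(modules, find_spec=importlib.util.find_spec):
    """Return the top-level Python modules that cannot be located."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    problems = missing_tools(REQUIRED_TOOLS) + missing_modules(REQUIRED_MODULES)
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

If anything is reported missing, the install commands listed above apply.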
| Step | Cost |
|---|---|
| Whisper (transcription) | Free (local) |
| FFmpeg (clip extraction) | Free (local) |
| Klap (reframing) | ~$1.50-2.50/clip depending on plan |
| Captions.ai (captions) | ~$0.15/min of output |
| Total per clip | ~$2-3 |
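As a rough illustration of how these figures combine for a batch, here is a minimal estimate. The default `klap_per_clip=2.00` is an assumed midpoint of the quoted range, and the function name is hypothetical; actual pricing depends on the plan.

```python
def estimate_cost(clip_seconds, klap_per_clip=2.00, captions_per_minute=0.15):
    """Estimate total API spend in USD for a batch of clips.

    klap_per_clip is an assumed midpoint of the ~$1.50-2.50 range;
    captions_per_minute is the ~$0.15/min figure from the table.
    """
    klap_total = klap_per_clip * len(clip_seconds)
    captions_total = captions_per_minute * sum(clip_seconds) / 60
    return round(klap_total + captions_total, 2)


# Three clips of 30, 45, and 60 seconds (2.25 minutes of output).
print(estimate_cost([30, 45, 60]))  # prints 6.34
```

Reframing dominates the total: the captioning charge for a one-minute clip is small next to the per-clip Klap cost.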
The user provides:
Video source (required) — one of:
- Local file path: /path/to/podcast.mp4
- YouTube URL: https://www.youtube.com/watch?v=...

Moment selection mode (ask the user):
Number of clips (optional) — default 3-5. Depends on video length and content density.
Caption template (optional) — Captions.ai template ID. Default: ctpl_DxflLOnuKkb198FNdI9E (Heat). List available templates via the API if user wants to browse.
Target clip duration (optional) — default 15-60 seconds. User can specify a range.
Based on input type:
Local file:
# Verify it exists and get duration
ffprobe -v quiet -print_format json -show_format "video.mp4"
YouTube URL:
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" --merge-output-format mp4 -o "<workdir>/source.mp4" "<URL>"
Other URL:
curl -L -o "<workdir>/source.mp4" "<URL>"
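For the duration check in the local-file case, the ffprobe JSON can be read from Python. This is a sketch under the assumption that the file has a container-level duration; `parse_duration` and `probe_duration` are illustrative names.

```python
import json
import subprocess


def parse_duration(ffprobe_json):
    """Extract the duration in seconds from ffprobe's JSON output."""
    info = json.loads(ffprobe_json)
    # ffprobe reports format.duration as a string, e.g. "3605.210000".
    return float(info["format"]["duration"])


def probe_duration(path):
    """Run ffprobe on a local file and return its duration in seconds."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_duration(out)
```

With `check=True`, a missing or unreadable file raises `subprocess.CalledProcessError`, which doubles as the existence check.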
import whisper
model = whisper.load_model("base")
result = model.transcribe("source.mp4", language="en", word_timestamps=True)
Save both:
- transcript.json: full result with word-level timestamps (needed for Step 3)
- transcript.txt: readable version with timestamps per segment (for Claude to analyze)

Moment selection is the key intelligence step: Claude reads the full transcript and identifies potential clip moments.
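The two transcript files listed above could be written with a sketch like this. It assumes `result` is the dictionary returned by `model.transcribe`, whose `segments` entries carry `start`, `end`, and `text`; the helper names and the `[MM:SS - MM:SS]` line format are illustrative choices.

```python
import json


def fmt_time(seconds):
    """Format seconds as MM:SS (minutes may exceed 59 for long videos)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def readable_transcript(result):
    """Render Whisper segments as one timestamped line each."""
    lines = []
    for seg in result["segments"]:
        text = seg["text"].strip()
        lines.append(f"[{fmt_time(seg['start'])} - {fmt_time(seg['end'])}] {text}")
    return "\n".join(lines)


def save_transcripts(result, json_path="transcript.json", txt_path="transcript.txt"):
    """Write the full result (with word timestamps) and the readable version."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(readable_transcript(result) + "\n")
```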
Step 3a: Segment the transcript into candidate moments
Scan the transcript for self-contained 15-60 second windows. Look for natural start/end points (topic changes, pauses, complete thoughts).
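One mechanical way to enumerate windows of the right length is sketched below. It only proposes spans that start and end on Whisper segment boundaries; judging whether a span is a complete thought still falls to the reading described above. The function name and the returned dictionary shape are assumptions.

```python
def candidate_windows(segments, min_len=15.0, max_len=60.0):
    """Build candidate clips that start and end on segment boundaries.

    For each starting segment, extend the window segment by segment and
    keep the longest span that still fits within max_len.
    """
    candidates = []
    for i, first in enumerate(segments):
        end_index = None
        for j in range(i, len(segments)):
            if segments[j]["end"] - first["start"] > max_len:
                break
            end_index = j
        if end_index is None:
            continue  # a single segment longer than max_len
        last = segments[end_index]
        if last["end"] - first["start"] >= min_len:
            text = " ".join(s["text"].strip() for s in segments[i:end_index + 1])
            candidates.append({"start": first["start"], "end": last["end"], "text": text})
    return candidates
```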
Step 3b: Score each candidate moment on this rubric
For each candidate, score 1-10 on these five criteria:
| Criteria | What to look for | Score guide |
|---|---|---|
| Hook Strength | Does the first sentence grab attention? Is it a surprising claim, provocative question, or bold statement? | 10 = "wait, what?" reaction. 1 = generic setup |
| Quotability | Contains a memorable one-liner that people would screenshot or share? | 10 = tweet-worthy standalone quote. 1 = no standalone phrases |
| Emotional Intensity | Does the speaker show passion, humor, anger, vulnerability, or conviction? | 10 = genuine emotion. 1 = monotone/flat delivery |
| Self-Containedness | Does it make complete sense without watching the rest of the video? | 10 = fully standalone. 1 = needs prior context |
| Surprise/Controversy | Does it challenge conventional wisdom, reveal something unexpected, or take a hot take? | 10 = counterintuitive insight. 1 = commonly known information |
Total score = sum of all five (max 50).
Step 3c: Rank and select top N moments
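The totalling and ranking can be sketched as follows. The score keys in `CRITERIA` are hypothetical names for the five rubric rows, and skipping moments that overlap an already chosen one is an added assumption rather than something the skill specifies.

```python
CRITERIA = ("hook", "quotability", "emotion", "self_contained", "surprise")


def total_score(scores):
    """Sum the five 1-10 rubric scores (maximum 50)."""
    return sum(scores[name] for name in CRITERIA)


def overlaps(a, b):
    """True when two moments share any span of time."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def select_top(moments, n):
    """Pick the n highest-scoring moments, skipping overlapping ones."""
    ranked = sorted(moments, key=lambda m: total_score(m["scores"]), reverse=True)
    chosen = []
    for moment in ranked:
        if len(chosen) == n:
            break
        if not any(overlaps(moment, kept) for kept in chosen):
            chosen.append(moment)
    # Return in chronological order for presentation.
    return sorted(chosen, key=lambda m: m["start"])
```

Rejecting overlaps keeps two clips from covering the same quote when adjacent windows both score well.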
Step 3d: Present to user for approval
For each selected moment, show:
Wait for user approval. User can:
Do NOT proceed to Step 4 until user approves.
For each approved moment, extract with FFmpeg:
ffmpeg -y -ss <start> -to <end> -i source.mp4 -c copy clip<N>-raw.mp4
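To run the same command once per approved moment, a thin wrapper along these lines may help. The function names and the `start`/`end` keys on each moment are assumptions; the FFmpeg arguments mirror the command above.

```python
import subprocess


def build_extract_cmd(source, start, end, index):
    """Build the FFmpeg stream-copy command for one approved moment."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",   # seek to the clip start (seconds)
        "-to", f"{end:.2f}",     # stop at the clip end (seconds)
        "-i", source,
        "-c", "copy",            # no re-encode, so cuts may land on keyframes
        f"clip{index}-raw.mp4",
    ]


def extract_clips(source, moments):
    """Run one extraction per moment; moments hold 'start' and 'end' seconds."""
    for index, moment in enumerate(moments, start=1):
        cmd = build_extract_cmd(source, moment["start"], moment["end"], index)
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.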
Upload each raw clip to Klap for AI-powered speaker-tracked reframing to 9:16.
API: Klap
- Endpoint: POST https://api.klap.app/v2/tasks/video-to-video
- Header: Authorization: Bearer <KLAP_API_KEY>

Submit each clip:
import requests

# klap_key holds the KLAP_API_KEY value from .env
headers = {
    "Authorization": f"Bearer {klap_key}",
}

# Direct file upload
with open("clip-raw.mp4", "rb") as f:
    r = requests.post(
        "https://api.klap.app/v2/tasks/video-to-video",
        headers=headers,
        files={"video": f},
        data={
            "language": "en",
            "editing_options": '{"captions":false,"reframe":true,"emojis":false,"intro_title":false}',
            "dimensions": '{"width":1080,"height":1920}',
        },
    )
r.raise_for_status()  # stop early on an HTTP error
task = r.json()
task_id = task["id"]
output_id = task.get("output_id")
Poll until ready:
# Poll every 30 seconds
r = requests.get(f"https://api.klap.app/v2/tasks/{task_id}", headers=headers)
status = r.json()["status"] # "processing" or "ready"
output_id = r.json()["output_id"] # project ID when ready
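The single status request above has to be repeated until the task is ready. A generic loop, sketched here with hypothetical names, takes the request as a callable so the same helper can be reused for other polling steps; the 30-minute `timeout` default is an assumption, not a documented limit.

```python
import time


def poll_until(fetch, is_done, interval=30.0, timeout=1800.0, sleep=time.sleep):
    """Call fetch() every `interval` seconds until is_done(payload) is true.

    fetch is any zero-argument callable returning the decoded JSON body;
    timeout (an assumed 30-minute cap) guards against waiting forever.
    """
    waited = 0.0
    while True:
        payload = fetch()
        if is_done(payload):
            return payload
        if waited >= timeout:
            raise TimeoutError(f"gave up after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

For the Klap task, `fetch` would wrap the `requests.get(...).json()` call shown above and `is_done` would test `payload["status"] == "ready"`.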
Export the reframed video:
# Request export
r = requests.post(
    f"https://api.klap.app/v2/projects/{output_id}/exports",
    headers=headers,
    json={},
)
export_id = r.json()["id"]

# Poll export every 15 seconds
r = requests.get(
    f"https://api.klap.app/v2/projects/{output_id}/exports/{export_id}",
    headers=headers,
)
# When status != "processing", download from src_url
download_url = r.json()["src_url"]
Klap handles:
Upload each reframed clip to Captions.ai for professional animated captions.
API: Captions.ai (Mirage)
- Endpoint: POST https://api.mirage.app/v1/videos/captions
- Header: x-api-key: <CAPTIONS_AI_API_KEY>

Submit each clip:
headers = {"x-api-key": captions_key}

with open("clip-reframed.mp4", "rb") as f:
    r = requests.post(
        "https://api.mirage.app/v1/videos/captions",
        headers=headers,
        files={"video": f},
        data={"caption_template_id": "ctpl_DxflLOnuKkb198FNdI9E"},
    )
video_id = r.json()["video_id"]
Poll until complete:
# Poll every 10 seconds
r = requests.get(f"https://api.mirage.app/v1/videos/{video_id}", headers=headers)
status = r.json()["status"] # QUEUED → PROCESSING → COMPLETE or FAILED
Download the captioned video:
r = requests.get(
    f"https://api.mirage.app/v1/videos/{video_id}/content",
    headers=headers,
    allow_redirects=True,
)
with open("clip-FINAL.mp4", "wb") as f:
    f.write(r.content)
Video requirements for Captions.ai:
Available caption templates (fetch full list via GET https://api.mirage.app/v1/videos/captions/templates):
Some popular templates:
| Template | ID |
|---|---|
| Heat (default) | ctpl_DxflLOnuKkb198FNdI9E |
| Buzz | ctpl_yvE0ZnYzEj6ClCD2ee1f |
| Medusa | ctpl_yNnJyDLSH5oIouKdjQx2 |
| Drive | ctpl_wR9PXfmxW1DFxEUuATFg |
| Magazine | ctpl_vrs1M2VrxvzQWNRypRvh |
| Energy | ctpl_oofP3mxbx8CaEPNYqnKD |
| Sirius | ctpl_miZu2nLWyP7X8oEAAHcM |
| Milky Way | ctpl_jcTmJGX77Uwz2AqLOX4S |
For each final clip, Claude writes platform-specific captions:
Instagram Reel:
TikTok:
YouTube Short:
LinkedIn (if applicable):
Save everything to the output directory:
<output-dir>/
  clip1-FINAL.mp4    # Ready-to-post clip
  clip2-FINAL.mp4
  clip3-FINAL.mp4
  captions.md        # All platform captions for each clip
  summary.md         # Overview: source video, clips made, scores, costs
Output specs:
User provides video
↓
[ASK] "Do you want me to pick the best moments, or do you have specific timestamps?"
↓
Whisper transcribes locally (free)
↓
Claude scores moments on viral rubric (hook, quotability, emotion, self-contained, surprise)
↓
[ASK] "Here are the top N moments with scores. Approve, adjust, or add your own?"
↓
FFmpeg extracts raw clips (free)
↓
Klap reframes to 9:16 with speaker tracking (~$2/clip)
↓
Captions.ai adds animated captions (~$0.15/min of output)
↓
Claude writes platform-specific captions
↓
Output: final clips + captions, ready to post
- yt-dlp: install with brew install yt-dlp and keep it updated. If a download fails, the user should download the video manually and provide the local file path.
- Transcription accuracy: use whisper.load_model("medium") for better accuracy at the cost of slower transcription.

Add these to your .env file:
KLAP_API_KEY=kak_xxxxx
CAPTIONS_AI_API_KEY=sk-xxxxx
No other API keys or local dependencies required. Whisper model downloads automatically on first run.
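Since the .env file may live in the project root or any parent directory, the lookup can be sketched with the standard library alone. The function names are illustrative, and the parser handles only plain KEY=VALUE lines (no export prefixes or multi-line values), which is an assumption about the file's shape.

```python
from pathlib import Path


def find_env_file(start):
    """Return the nearest .env in `start` or any parent directory, else None."""
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        candidate = folder / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_keys(start="."):
    """Load both API keys, raising if the file or a key is missing."""
    env_file = find_env_file(start)
    if env_file is None:
        raise FileNotFoundError("no .env file found in this or any parent directory")
    values = parse_env(env_file.read_text(encoding="utf-8"))
    missing = [k for k in ("KLAP_API_KEY", "CAPTIONS_AI_API_KEY") if not values.get(k)]
    if missing:
        raise KeyError(f"missing keys in {env_file}: {', '.join(missing)}")
    return values["KLAP_API_KEY"], values["CAPTIONS_AI_API_KEY"]
```

Failing fast on a missing key avoids discovering the problem only after transcription and clip extraction have already run.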