skills/packs/video-production/video-clipper/SKILL.md
Repurposes long-form video (podcasts, interviews, talks) into short-form vertical clips for Instagram Reels, TikTok, and YouTube Shorts. Handles transcription, moment selection, clip extraction, speaker-tracked reframing (16:9 to 9:16), and animated captions.
npx skillsauth add gooseworks-ai/goose-skills video-clipper
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Takes a long-form video and produces ready-to-post short-form vertical clips with speaker-tracked framing and professional animated captions. Works with podcasts, interviews, talks, and any talking-head content.
- FFmpeg (brew install ffmpeg on macOS, apt install ffmpeg on Linux)
- Python with the openai-whisper and requests packages (pip install openai-whisper requests). Note: openai-whisper installs PyTorch (~2GB download). This skill uses openai-whisper instead of the lighter whisper-cpp because it provides the word-level timestamps needed for accurate viral moment scoring.
- yt-dlp (brew install yt-dlp on macOS, pip install yt-dlp on Linux)
- API keys in a .env file (project root or any parent directory):
  - KLAP_API_KEY: from klap.app (reframing with speaker tracking)
  - CAPTIONS_AI_API_KEY: from captions.ai / platform.mirage.app (animated captions)

Before starting: Verify that FFmpeg, yt-dlp, and the Python packages are installed. If any are missing, instruct the user to install them before proceeding.
| Step | Cost |
|---|---|
| Whisper (transcription) | Free (local) |
| FFmpeg (clip extraction) | Free (local) |
| Klap (reframing) | ~$1.50-2.50/clip depending on plan |
| Captions.ai (captions) | ~$0.15/min of output |
| Total per clip | ~$2-3 |
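As a rough worked example of the cost table, a batch estimate adds the per-clip Klap charge to the per-minute Captions.ai charge. The function name and default prices below are illustrative and simply reuse the approximate figures from the table; actual pricing depends on your plan.

```python
def estimate_cost(num_clips, clip_seconds,
                  klap_per_clip=(1.50, 2.50), captions_per_min=0.15):
    """Return a (low, high) USD estimate for a batch of clips.

    Prices are the approximate figures from the cost table, not quotes.
    Whisper and FFmpeg run locally and add nothing to the total.
    """
    captions = captions_per_min * (clip_seconds / 60.0)
    low = num_clips * (klap_per_clip[0] + captions)
    high = num_clips * (klap_per_clip[1] + captions)
    return round(low, 2), round(high, 2)


# Four one-minute clips
print(estimate_cost(4, 60))
```

Shorter clips lower only the Captions.ai share, so the Klap charge dominates the total.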
The user provides:
Video source (required) — one of:
- /path/to/podcast.mp4
- https://www.youtube.com/watch?v=...

Moment selection mode (ask the user):
Number of clips (optional) — default 3-5. Depends on video length and content density.
Caption template (optional) — Captions.ai template ID. Default: ctpl_DxflLOnuKkb198FNdI9E (Heat). List available templates via the API if user wants to browse.
Target clip duration (optional) — default 15-60 seconds. User can specify a range.
Based on input type:
Local file:
# Verify it exists and get duration
ffprobe -v quiet -print_format json -show_format "video.mp4"
YouTube URL:
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" --merge-output-format mp4 -o "<workdir>/source.mp4" "<URL>"
Other URL:
curl -L -o "<workdir>/source.mp4" "<URL>"
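The three cases above could be told apart with a small helper like the following. This is a minimal sketch: the function names and the list of YouTube hostnames are assumptions, not part of the skill. `parse_duration` reads the `format.duration` field that `ffprobe -print_format json -show_format` reports as a string.

```python
import json
from urllib.parse import urlparse

# Hostnames treated as YouTube; an illustrative list, not exhaustive.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source):
    """Return 'youtube', 'url', or 'local' for a user-supplied video source."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube" if host in YOUTUBE_HOSTS else "url"
    return "local"


def parse_duration(ffprobe_json):
    """Extract the duration in seconds from ffprobe's JSON format output."""
    return float(json.loads(ffprobe_json)["format"]["duration"])
```

A 'youtube' result would go to yt-dlp, 'url' to curl, and 'local' straight to ffprobe.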
import whisper
model = whisper.load_model("base")
result = model.transcribe("source.mp4", language="en", word_timestamps=True)
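Once transcription finishes, the result dictionary can be written out as a JSON file and a readable text file. The sketch below assumes the usual Whisper result shape (a `segments` list whose entries carry `start`, `end`, and `text`); the helper names and the `[HH:MM:SS - HH:MM:SS]` line format are illustrative choices.

```python
import json


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render_transcript(result):
    """Build the readable transcript: one timestamped line per segment."""
    lines = []
    for seg in result["segments"]:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def save_transcripts(result, json_path="transcript.json", txt_path="transcript.txt"):
    """Write the full result as JSON and the readable version as text."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(render_transcript(result) + "\n")
```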
Save both:
- transcript.json: full result with word-level timestamps (needed for Step 3)
- transcript.txt: readable version with timestamps per segment (for Claude to analyze)

This is the key intelligence step. Claude reads the full transcript and identifies potential clip moments.
Step 3a: Segment the transcript into candidate moments
Scan the transcript for self-contained 15-60 second windows. Look for natural start/end points (topic changes, pauses, complete thoughts).
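A mechanical pre-filter can enumerate windows of the right length before any judgment about topic changes or complete thoughts is applied. The sketch below is one possible approach, not the skill's own method: it assumes Whisper-style segments with `start`, `end`, and `text`, and the function name is hypothetical.

```python
def candidate_windows(segments, min_len=15.0, max_len=60.0):
    """Return (start, end, text) windows built from consecutive segments.

    For each starting segment, the window grows one segment at a time and is
    recorded every time its length falls inside [min_len, max_len].
    """
    windows = []
    for i, first in enumerate(segments):
        texts = []
        for seg in segments[i:]:
            length = seg["end"] - first["start"]
            if length > max_len:
                break
            texts.append(seg["text"].strip())
            if length >= min_len:
                windows.append((first["start"], seg["end"], " ".join(texts)))
    return windows
```

The windows overlap heavily by design; choosing natural start and end points among them remains a reading task.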
Step 3b: Score each candidate moment on this rubric
For each candidate, score 1-10 on these five criteria:
| Criteria | What to look for | Score guide |
|---|---|---|
| Hook Strength | Does the first sentence grab attention? Is it a surprising claim, provocative question, or bold statement? | 10 = "wait, what?" reaction. 1 = generic setup |
| Quotability | Does it contain a memorable one-liner that people would screenshot or share? | 10 = tweet-worthy standalone quote. 1 = no standalone phrases |
| Emotional Intensity | Does the speaker show passion, humor, anger, vulnerability, or conviction? | 10 = genuine emotion. 1 = monotone/flat delivery |
| Self-Containedness | Does it make complete sense without watching the rest of the video? | 10 = fully standalone. 1 = needs prior context |
| Surprise/Controversy | Does it challenge conventional wisdom, reveal something unexpected, or offer a hot take? | 10 = counterintuitive insight. 1 = commonly known information |
Total score = sum of all five (max 50).
Step 3c: Rank and select top N moments
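Totaling and ranking can be sketched as follows. The criterion keys and the moment dictionary shape are assumptions made for illustration; only the rule itself (five scores of 1-10, summed to a maximum of 50, highest first) comes from the rubric.

```python
# Hypothetical keys for the five rubric criteria.
CRITERIA = ("hook", "quotability", "emotion", "self_contained", "surprise")


def total_score(scores):
    """Sum the five 1-10 rubric scores (maximum 50)."""
    for name in CRITERIA:
        if not 1 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 1 and 10")
    return sum(scores[name] for name in CRITERIA)


def select_top(moments, n):
    """Return the n highest-scoring moments, best first."""
    return sorted(moments, key=lambda m: total_score(m["scores"]), reverse=True)[:n]
```

Python's sort is stable, so moments with equal totals keep their transcript order.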
Step 3d: Present to user for approval
For each selected moment, show:
Wait for user approval. User can:
Do NOT proceed to Step 4 until user approves.
For each approved moment, extract with FFmpeg:
ffmpeg -y -ss <start> -to <end> -i source.mp4 -c copy clip<N>-raw.mp4
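When extraction is driven from Python, the same command can be assembled as an argument list, which avoids shell quoting problems with unusual file names. The function names and default file names are illustrative.

```python
import subprocess


def build_extract_cmd(start, end, source="source.mp4", output="clip1-raw.mp4"):
    """Build the FFmpeg argument list for a stream-copy cut between two timestamps."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start), "-to", str(end),
        "-i", source,
        "-c", "copy",
        output,
    ]


def extract_clip(start, end, source="source.mp4", output="clip1-raw.mp4"):
    """Run FFmpeg and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_extract_cmd(start, end, source, output), check=True)
```

Because `-c copy` does not re-encode, cuts generally snap to keyframes, so a clip may begin slightly off the requested timestamp. If the first word is clipped, re-encoding instead of copying is the usual remedy, at the cost of speed.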
Upload each raw clip to Klap for AI-powered speaker-tracked reframing to 9:16.
API: Klap
- POST https://api.klap.app/v2/tasks/video-to-video
- Authorization: Bearer <KLAP_API_KEY>

Submit each clip:
import os
import requests

# Assumes the .env values have already been loaded into the environment
klap_key = os.environ["KLAP_API_KEY"]
headers = {
    "Authorization": f"Bearer {klap_key}",
}

# Direct file upload
with open("clip-raw.mp4", "rb") as f:
    r = requests.post(
        "https://api.klap.app/v2/tasks/video-to-video",
        headers=headers,
        files={"video": f},
        data={
            "language": "en",
            "editing_options": '{"captions":false,"reframe":true,"emojis":false,"intro_title":false}',
            "dimensions": '{"width":1080,"height":1920}',
        },
    )
r.raise_for_status()  # stop early on an HTTP error
task_id = r.json()["id"]
output_id = r.json().get("output_id")
Poll until ready:
# Poll every 30 seconds
r = requests.get(f"https://api.klap.app/v2/tasks/{task_id}", headers=headers)
status = r.json()["status"] # "processing" or "ready"
output_id = r.json()["output_id"] # project ID when ready
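The polling step can be wrapped in a small loop with a timeout so a stuck task does not block forever. This is a generic sketch: `poll_until` is a hypothetical helper, and the 30-minute default timeout is an arbitrary assumption. The `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time


def poll_until(fetch, is_done, interval=30.0, timeout=1800.0, sleep=time.sleep):
    """Call fetch() every `interval` seconds until is_done(result) is true.

    Raises TimeoutError once `timeout` seconds of waiting have elapsed.
    """
    waited = 0.0
    while True:
        result = fetch()
        if is_done(result):
            return result
        if waited >= timeout:
            raise TimeoutError(f"still not done after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Possible use with the task request shown above:
# task = poll_until(
#     lambda: requests.get(f"https://api.klap.app/v2/tasks/{task_id}", headers=headers).json(),
#     lambda body: body["status"] != "processing",
# )
```

The caller should still check that the returned status is "ready" before reading `output_id`.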
Export the reframed video:
# Request export
r = requests.post(
    f"https://api.klap.app/v2/projects/{output_id}/exports",
    headers=headers,
    json={},
)
export_id = r.json()["id"]

# Poll export every 15 seconds
r = requests.get(
    f"https://api.klap.app/v2/projects/{output_id}/exports/{export_id}",
    headers=headers,
)
# When status != "processing", download from src_url
download_url = r.json()["src_url"]
Klap handles:
Upload each reframed clip to Captions.ai for professional animated captions.
API: Captions.ai (Mirage)
- POST https://api.mirage.app/v1/videos/captions
- x-api-key: <CAPTIONS_AI_API_KEY>

Submit each clip:
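import os
import requests

# Assumes the .env values have already been loaded into the environment
captions_key = os.environ["CAPTIONS_AI_API_KEY"]
headers = {"x-api-key": captions_key}

with open("clip-reframed.mp4", "rb") as f:
    r = requests.post(
        "https://api.mirage.app/v1/videos/captions",
        headers=headers,
        files={"video": f},
        data={"caption_template_id": "ctpl_DxflLOnuKkb198FNdI9E"},
    )
r.raise_for_status()  # stop early on an HTTP error
video_id = r.json()["video_id"]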
Poll until complete:
# Poll every 10 seconds
r = requests.get(f"https://api.mirage.app/v1/videos/{video_id}", headers=headers)
status = r.json()["status"] # QUEUED → PROCESSING → COMPLETE or FAILED
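The status sequence can be turned into an explicit decision so that a FAILED job stops the pipeline instead of being polled indefinitely. The function name and the returned action strings are illustrative; only the four status values come from the comment above.

```python
def next_action(status):
    """Map a captioning job status to what the caller should do next."""
    if status in ("QUEUED", "PROCESSING"):
        return "wait"
    if status == "COMPLETE":
        return "download"
    if status == "FAILED":
        raise RuntimeError("captioning job failed")
    # Anything else is treated as a bug or an API change, not as "keep waiting".
    raise ValueError(f"unexpected status: {status!r}")
```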
Download the captioned video:
r = requests.get(
    f"https://api.mirage.app/v1/videos/{video_id}/content",
    headers=headers,
    allow_redirects=True,
)
r.raise_for_status()  # avoid writing an error body to the output file
with open("clip-FINAL.mp4", "wb") as f:
    f.write(r.content)
Video requirements for Captions.ai:
Available caption templates (fetch full list via GET https://api.mirage.app/v1/videos/captions/templates):
Some popular templates:
| Template | ID |
|---|---|
| Heat (default) | ctpl_DxflLOnuKkb198FNdI9E |
| Buzz | ctpl_yvE0ZnYzEj6ClCD2ee1f |
| Medusa | ctpl_yNnJyDLSH5oIouKdjQx2 |
| Drive | ctpl_wR9PXfmxW1DFxEUuATFg |
| Magazine | ctpl_vrs1M2VrxvzQWNRypRvh |
| Energy | ctpl_oofP3mxbx8CaEPNYqnKD |
| Sirius | ctpl_miZu2nLWyP7X8oEAAHcM |
| Milky Way | ctpl_jcTmJGX77Uwz2AqLOX4S |
For each final clip, Claude writes platform-specific captions:
Instagram Reel:
TikTok:
YouTube Short:
LinkedIn (if applicable):
Save everything to the output directory:
<output-dir>/
  clip1-FINAL.mp4    # Ready-to-post clip
  clip2-FINAL.mp4
  clip3-FINAL.mp4
  captions.md        # All platform captions for each clip
  summary.md         # Overview: source video, clips made, scores, costs
Output specs:
User provides video
↓
[ASK] "Do you want me to pick the best moments, or do you have specific timestamps?"
↓
Whisper transcribes locally (free)
↓
Claude scores moments on viral rubric (hook, quotability, emotion, self-contained, surprise)
↓
[ASK] "Here are the top N moments with scores. Approve, adjust, or add your own?"
↓
FFmpeg extracts raw clips (free)
↓
Klap reframes to 9:16 with speaker tracking (~$2/clip)
↓
Captions.ai adds animated captions (~$0.15/min of output)
↓
Claude writes platform-specific captions
↓
Output: final clips + captions, ready to post
- Install yt-dlp with brew install yt-dlp and keep it updated. If the download fails, the user should download the video manually and provide the local file path.
- Use whisper.load_model("medium") for better accuracy at the cost of slower transcription.

Add these to your .env file:
KLAP_API_KEY=kak_xxxxx
CAPTIONS_AI_API_KEY=sk-xxxxx
No other API keys or local dependencies required. Whisper model downloads automatically on first run.
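Since the .env file may sit in the project root or any parent directory, the lookup amounts to walking upward from the working directory. The sketch below is a simplified stand-in for a dotenv library: both function names are hypothetical, and the parser handles only plain KEY=VALUE lines.

```python
from pathlib import Path


def find_env_file(start):
    """Return the nearest .env at `start` or any parent directory, or None."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def read_env(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Checking that both KLAP_API_KEY and CAPTIONS_AI_API_KEY are present before the first upload avoids paying for a reframe that cannot then be captioned.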