skills/autocut-shorts/SKILL.md
---
name: autocut-shorts
description: Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments like OpusClip and Vizard.ai.
allowed-tools: Bash(ffmpeg:*) Bash(yt-dlp:*) Bash(python:*)
compatibility: Requires all trimer-clip
This is the main orchestration skill that combines all other skills to automatically create short-form content from long videos.
This skill automates the entire workflow:
scripts/autocut.py

Main autocut workflow script.
Usage:
python skills/autocut-shorts/scripts/autocut.py <video_or_url> [options]
Options:
- `--source`: Source type (file, youtube); auto-detected
- `--num-clips`: Number of clips to generate (default: 5)
- `--min-duration`: Minimum clip duration in seconds (default: 15)
- `--max-duration`: Maximum clip duration in seconds (default: 60)
- `--platform`: Target platform (tiktok, shorts, reels, facebook); default: tiktok
- `--output-dir`: Output directory (default: ./shorts/)
- `--transcription-model`: Transcription model (auto, whisper, gemini, openai, google); default: auto
- `--whisper-model`: Whisper model size (tiny, base, small, medium, large-v3); default: large-v3
- `--openai-model`: OpenAI Whisper model (default: whisper-1)
- `--google-model`: Google Speech model (default: latest_long)
- `--diarization-model`: Speaker diarization (auto, pyannote, gemini, none); default: auto
- `--huggingface-token`: HuggingFace token for pyannote (or use env var)
- `--focus-speaker`: Extract clips only for a specific speaker (SPEAKER_00, etc.)
- `--gemini-api-key`: Gemini API key (or use env var)
- `--skip-transcribe`: Skip transcription if you already have a transcript
- `--skip-diarization`: Skip speaker diarization
- `--skip-scenes`: Skip scene detection
- `--skip-laughter`: Skip laughter detection
- `--skip-sentiment`: Skip sentiment analysis
- `--transcript-path`: Use an existing transcript file (SRT/VTT/JSON)
- `--word-timestamps-path`: Provide word-timestamp JSON for karaoke subtitles
- `--subtitle-mode`: Subtitle mode (auto, word, segment); default: auto
- `--style`: Subtitle style (tiktok, shorts, reels, default); default: tiktok

Examples:
Basic autocut from file:
python skills/autocut-shorts/scripts/autocut.py video.mp4
Autocut from YouTube URL:
python skills/autocut-shorts/scripts/autocut.py "https://www.youtube.com/watch?v=VIDEO_ID"
Generate 10 clips for Instagram Reels:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --num-clips 10 --platform reels --style reels
Use Gemini for transcription:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model gemini
Quick local test with Whisper tiny:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model whisper --whisper-model tiny
Use OpenAI Whisper API for word-level captions:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model openai --subtitle-mode word
Use Google Speech-to-Text for word-level captions:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model google --subtitle-mode word
Custom duration range:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --min-duration 20 --max-duration 45
Use existing transcript:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcript-path video.srt --skip-transcribe
Use word timestamps JSON directly:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --word-timestamps-path words.json --subtitle-mode word
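When the script is driven from other Python code (for example, to process several videos in a loop), the command lines above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: the helper name `build_autocut_command` is hypothetical, and it only maps keyword arguments onto the flags listed under Options without validating them.

```python
import shlex

# Path taken from the usage line above.
AUTOCUT_SCRIPT = "skills/autocut-shorts/scripts/autocut.py"


def build_autocut_command(video_or_url, **options):
    """Build an argv list for autocut.py from keyword options.

    Hypothetical helper: keyword names become flags by replacing
    underscores with dashes (num_clips=10 -> --num-clips 10).
    True emits a bare switch (skip_scenes=True -> --skip-scenes);
    False and None are omitted.
    """
    argv = ["python", AUTOCUT_SCRIPT, video_or_url]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not None and value is not False:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    cmd = build_autocut_command(
        "video.mp4", num_clips=10, platform="reels", style="reels"
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with URLs that contain `?` or `&`.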
scripts/quick_cut.py

Quick cut without full analysis (faster).
Usage:
python skills/autocut-shorts/scripts/quick_cut.py <video_path> [options]
Options:
- `--timestamps`: JSON file with timestamps to cut
- `--output-dir`: Output directory
- `--platform`: Target platform

Example:
python skills/autocut-shorts/scripts/quick_cut.py video.mp4 --timestamps cuts.json
If URL provided:
Extracts audio and transcribes:
Runs detection modules:
Combines all signals:
Virality Score =
35% Transcript (hooks, viral content) +
25% Laughter (humor) +
25% Sentiment (emotion) +
15% Scenes (visual transitions)
Ranks all segments and selects top N.
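The ranking step can be pictured as a filter followed by a sort. This is a minimal sketch under stated assumptions, not the skill's actual code: the function name `select_top_segments` is hypothetical, segments are assumed to be dictionaries carrying the `start_time`, `end_time`, and `virality_score` fields seen in the JSON report, and the duration limits are assumed to be applied before ranking.

```python
def select_top_segments(segments, num_clips=5, min_duration=15.0, max_duration=60.0):
    """Keep segments within the duration limits, then return the top N by score.

    Defaults mirror --num-clips, --min-duration and --max-duration.
    """
    eligible = [
        seg for seg in segments
        if min_duration <= seg["end_time"] - seg["start_time"] <= max_duration
    ]
    # Highest virality score first.
    ranked = sorted(eligible, key=lambda seg: seg["virality_score"], reverse=True)
    return ranked[:num_clips]


if __name__ == "__main__":
    candidates = [
        {"start_time": 0.0, "end_time": 10.0, "virality_score": 0.99},    # too short
        {"start_time": 45.2, "end_time": 72.5, "virality_score": 0.92},
        {"start_time": 100.0, "end_time": 130.0, "virality_score": 0.50},
        {"start_time": 200.0, "end_time": 300.0, "virality_score": 0.95},  # too long
    ]
    for seg in select_top_segments(candidates, num_clips=2):
        print(seg["start_time"], seg["end_time"], seg["virality_score"])
```

A real implementation would probably also drop overlapping segments so that two clips do not cover the same moment; that step is left out here.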
For each highlight:
Converts to 9:16:
Burns in captions:
Saves final clips:
{original}_short_{index}.mp4

shorts/
  <video_slug>_<YYYYMMDD-HHMMSS>/
    clip_001/
      master.mp4
      data.json
    clip_002/
      master.mp4
      data.json
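To post-process a finished run, the clip folders in the layout above can be enumerated with `pathlib`. This is a small sketch that assumes only what the tree shows (a `master.mp4` and a `data.json` inside each `clip_*` folder); the function name `list_clips` is hypothetical, and the contents of `data.json` are not documented here, so the sketch does not parse them.

```python
from pathlib import Path


def list_clips(run_dir):
    """Return (clip_name, master_path, data_path) for each complete clip folder."""
    clips = []
    for clip_dir in sorted(Path(run_dir).glob("clip_*")):
        master = clip_dir / "master.mp4"
        data = clip_dir / "data.json"
        # Skip folders that are missing either file (e.g. an interrupted run).
        if clip_dir.is_dir() and master.is_file() and data.is_file():
            clips.append((clip_dir.name, master, data))
    return clips


if __name__ == "__main__":
    # Point this at one <video_slug>_<YYYYMMDD-HHMMSS> run directory.
    for name, master, data in list_clips("shorts"):
        print(name, master, data)
```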
{
  "success": true,
  "source": {
    "type": "youtube",
    "url": "https://youtube.com/watch?v=...",
    "title": "Video Title",
    "duration": 1200.5
  },
  "processing": {
    "transcription_model": "gemini-flash-lite-latest",
    "detection_methods": ["transcript", "laughter", "sentiment", "scenes"],
    "platform": "tiktok"
  },
  "results": {
    "total_clips": 5,
    "clips": [
      {
        "rank": 1,
        "filename": "video_short_001.mp4",
        "start_time": 45.2,
        "end_time": 72.5,
        "duration": 27.3,
        "virality_score": 0.92,
        "text": "This is the key moment...",
        "output_path": "shorts/video_short_001.mp4"
      }
    ],
    "total_duration": 135.5,
    "avg_virality_score": 0.78
  },
  "performance": {
    "total_time": 180.5,
    "transcription_time": 45.2,
    "analysis_time": 67.3,
    "processing_time": 68.0
  }
}
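A report of this shape is straightforward to consume from a script. The sketch below filters clips by score using only the fields shown above; the function name `clips_above` is hypothetical, and where the report is written (file or standard output) is not specified here, so the example parses an inline, trimmed sample instead.

```python
import json


def clips_above(report, min_score=0.8):
    """Return the clips whose virality_score is at least min_score."""
    if not report.get("success"):
        return []
    return [
        clip for clip in report["results"]["clips"]
        if clip["virality_score"] >= min_score
    ]


# Trimmed copy of the example report, kept inline so the sketch is self-contained.
SAMPLE_REPORT = json.loads("""
{
  "success": true,
  "results": {
    "total_clips": 1,
    "clips": [
      {
        "rank": 1,
        "filename": "video_short_001.mp4",
        "start_time": 45.2,
        "end_time": 72.5,
        "duration": 27.3,
        "virality_score": 0.92,
        "output_path": "shorts/video_short_001.mp4"
      }
    ]
  }
}
""")

if __name__ == "__main__":
    for clip in clips_above(SAMPLE_REPORT):
        print(clip["rank"], clip["output_path"], clip["duration"])
```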
- _tiktok_{index}.mp4
- _shorts_{index}.mp4
- _reels_{index}.mp4
- _facebook_{index}.mp4

Transcript (35% weight):
Laughter (25% weight):
Sentiment (25% weight):
Scenes (15% weight):
virality_score = (
    transcript_score * 0.35 +
    laughter_score * 0.25 +
    sentiment_score * 0.25 +
    scene_score * 0.15
)
- Premium Clips (0.8-1.0): Must include
- Excellent Clips (0.6-0.8): High priority
- Good Clips (0.4-0.6): Consider including
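The weighted sum and the tiers above can be combined into one small worked example. This is a sketch under assumptions rather than the skill's own code: each signal score is assumed to be normalized to the range 0 to 1, a score that falls exactly on a boundary (0.8 or 0.6) is assigned to the higher tier, and scores below 0.4 get no tier because none is named for them. The function names are hypothetical.

```python
# Weights as stated in the virality formula.
WEIGHTS = {
    "transcript": 0.35,
    "laughter": 0.25,
    "sentiment": 0.25,
    "scenes": 0.15,
}


def virality_score(scores):
    """Weighted sum of per-signal scores; a missing signal counts as 0."""
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def tier(score):
    """Map a virality score to its tier name, or None below 0.4."""
    if score >= 0.8:
        return "premium"
    if score >= 0.6:
        return "excellent"
    if score >= 0.4:
        return "good"
    return None


if __name__ == "__main__":
    # 0.9*0.35 + 0.8*0.25 + 0.7*0.25 + 0.5*0.15 = 0.315 + 0.2 + 0.175 + 0.075
    signals = {"transcript": 0.9, "laughter": 0.8, "sentiment": 0.7, "scenes": 0.5}
    score = virality_score(signals)
    print(round(score, 3), tier(score))
```

Treating a missing signal as 0 means that skipping a detector (for example with `--skip-laughter`) lowers every score by that signal's share; renormalizing the remaining weights would be an alternative design, and which one the skill uses is not stated here.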
Default Behavior (--diarization-model auto): The AI agent automatically selects based on context:
def select_diarization_model(user_request, likely_multi_speaker=False):
    """Illustrative keyword rules for --diarization-model auto."""
    user_request = user_request.lower()  # match keywords case-insensitively

    # Use pyannote when:
    if "podcast" in user_request or "interview" in user_request:
        return "pyannote"  # Multi-speaker, needs accuracy
    if "accurate" in user_request or "precise" in user_request:
        return "pyannote"  # User explicitly wants accuracy
    if "panel" in user_request or "debate" in user_request:
        return "pyannote"  # Complex multi-speaker scenarios
    if "overlapping" in user_request or "talk over" in user_request:
        return "pyannote"  # Overlapping speech detection
    if "privacy" in user_request or "offline" in user_request:
        return "pyannote"  # Local processing needed

    # Use Gemini when:
    if "quick" in user_request or "fast" in user_request:
        return "gemini"  # Speed priority
    if "single speaker" in user_request or "monologue" in user_request:
        return "gemini"  # Simple scenario
    if "no diarization" in user_request or "skip speakers" in user_request:
        return "none"  # User doesn't want speaker detection

    # Default for ambiguous cases; likely_multi_speaker stands in for
    # the agent's own judgment of the video:
    return "pyannote" if likely_multi_speaker else "gemini"
Decision Matrix:
| Scenario | Recommended | Reason |
|----------|-------------|--------|
| Podcast with 2-3 hosts | pyannote | High accuracy for multi-speaker |
| Interview (host + guest) | pyannote | Precise speaker separation |
| Panel discussion | pyannote | Handles 4+ speakers well |
| Single speaker vlog | gemini | Faster, good enough |
| Gaming commentary | gemini | Usually 1-2 speakers |
| Tutorial video | gemini | Single speaker, speed matters |
| Debate/competitive | pyannote | Overlapping speech detection |
| Privacy-sensitive | pyannote | Local processing |
Examples by Use Case:
# Podcast - use pyannote automatically
python skills/autocut-shorts/scripts/autocut.py podcast.mp4
# Interview - use pyannote for accuracy
python skills/autocut-shorts/scripts/autocut.py interview.mp4
# Vlog - use gemini (single speaker, faster)
python skills/autocut-shorts/scripts/autocut.py vlog.mp4
# Force pyannote explicitly
python skills/autocut-shorts/scripts/autocut.py video.mp4 --diarization-model pyannote
# Skip diarization for simple content
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --diarization-model none
# Extract only host's segments
python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --focus-speaker SPEAKER_00
The agent automatically detects:
Override any time:
Users can always override the automatic choice with the `--diarization-model` flag.
This skill uses all other skills:
- youtube-downloader: Download from URL
- video-transcriber: Transcribe audio
- scene-detector: Find visual cut points
- laughter-detector: Find funny moments
- sentiment-analyzer: Find emotional peaks
- highlight-scanner: Combine all signals
- video-trimmer: Cut segments
- portrait-resizer: Convert to 9:16
- subtitle-overlay: Add captions

python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --num-clips 10 --platform shorts
python skills/autocut-shorts/scripts/autocut.py vlog.mp4 --num-clips 5 --platform tiktok
python skills/autocut-shorts/scripts/autocut.py "https://youtube.com/watch?v=..." --platform tiktok
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --min-duration 30 --max-duration 60
Processing Time (approximate):
Breakdown: