
Extract a video segment using FFmpeg with precise start/end times
# Remotion Render Skill Generic video rendering with motion graphics overlays. ## Components - `VideoClip.tsx` - Main composition (video + captions + annotations) - `components/RollingCaption.tsx` - Karaoke-style word highlighting - `components/SpeakerLabel.tsx` - Positioned speaker annotations - `components/ContextBadge.tsx` - Location/event badge ## Usage Called by `make-video` skill. Not typically invoked directly. ```bash npx remotion render VideoClip output.mp4 --props=props.json ```
Transcribe a video clip using Gemini to get timestamped segments for captions
Generate voiceover scripts in Joyce's style for video clips
Analyze raw video content using Gemini to identify speakers, topics, key moments, and potential clip opportunities
Text-guided audio source separation using SAM-Audio via mlx-audio
# Talking-Head Video Processing Process direct-to-camera videos with sentence-level precision, rolling captions, and multi-format output. ## Full Pipeline Overview ``` Raw Video (47+ min) ↓ smart_chunk.py Chunks (2.5-3.5 min) ↓ mlx_transcribe.py --word-timestamps Transcript (word-level timestamps) ↓ sentence_split.py (precise cuts, no overlap) Sentence Clips (~10s each, frame-accurate) ↓ analyze_script.py (RECALL-FIRST) Topics with complete arcs (hook → elaborate → conclude)
# Validate Media Skill Pre-flight media validation and diagnostics using ffprobe. ## Purpose Check video/audio files for common issues before rendering: - Duration mismatches between video and audio tracks - Missing audio tracks - Codec compatibility - Volume levels - Potential freeze points ## Usage ```bash python skills/validate-media/validate.py <video_file> [--verbose] ``` ## Output JSON report with issues and recommendations: ```json { "file": "video.mp4", "video_duration": 35.1
Generate karaoke-style word-level timestamps by aligning script text to audio using Qwen3-ForcedAligner + jieba for Chinese word segmentation. Use when the user says 'align captions', 'karaoke timestamps', 'word timestamps', 'caption alignment', 'sync text to audio'.
Audio processing utilities - noise reduction, normalization, enhancement
# Chunk Process Skill Smart video chunking and MLX-accelerated transcription for long-form content. ## Problem Solved - Raw footage too long for single Gemini upload (~47 min = 5GB+) - Need word-level timestamps for precise cutting - Fixed-length chunks break mid-sentence ## Smart Chunking Instead of fixed 5-minute segments, `smart_chunk.py` finds natural break points: ```bash python skills/chunk-process/smart_chunk.py raw_footage.mp4 -o chunks/ ``` **How it works:** 1. Detect silence regi
# Inspect Video Skill Gemini-powered video inspection for quality analysis and debugging. ## Purpose Use Gemini 3 Pro's video understanding to: - Describe what's happening in the video - Identify freeze frames, black frames, glitches - Detect visual quality issues - Verify content matches expectations - Generate quality reports ## Usage ```bash python skills/inspect-video/inspect.py <video_file> [--prompt "custom query"] ``` ## Default Inspection Without custom prompt, runs a standard qua
# Make Video Skill Single-script video production from project config. ## Usage ```bash python skills/make-video/make_video.py video_projects/space_investing/ ``` That's it. One command, one output. ## Project Structure ``` video_projects/<name>/ ├── project.json # Config (required) ├── source_video.mp4 # Input video (required) ├── voiceover.wav # Audio (optional, uses video audio if missing) └── output/ └── final.mp4 # Generated output ``` ## project.json ```json { "
ASR with ~30ms timestamp precision using Qwen3-ASR + ForcedAligner
Clone a voice using qwen3-tts and generate speech from text
Find naturally clean, coherent video segments worth keeping (selection over repair)