17 skills
development
# Validate Media Skill Pre-flight media validation and diagnostics using ffprobe. ## Purpose Check video/audio files for common issues before rendering: - Duration mismatches between video and audio tracks - Missing audio tracks - Codec compatibility - Volume levels - Potential freeze points ## Usage ```bash python skills/validate-media/validate.py <video_file> [--verbose] ``` ## Output JSON report with issues and recommendations: ```json { "file": "video.mp4", "video_duration": 35.1
tools
Transcribe a video clip using Gemini to get timestamped segments for captions
data-ai
--- id: speech name: Speech description: Voice-to-text (Whisper) and text-to-voice (11Labs). Use when transcribing audio, converting speech to text, or generating spoken audio from text. Commands: transcribe, synthesize. --- # Speech Voice-to-text via **Whisper** (OpenAI) and text-to-voice via **11Labs**. Use when the user wants to transcribe audio, convert speech to text, or generate spoken audio from text. Call **run_skill** with **skill: "speech"**. Set **command** or **arguments.action**
content-media
Create, organize, and distribute content across Workspace.
content-media
使用 yt-dlp 下载 YouTube 视频、音频或字幕。Use when user wants to 下载视频, 下载YouTube, youtube下载, 下载油管, download youtube, download video, 下载B站, bilibili下载.
tools
ElevenLabs text-to-speech with mac-style say UX.
tools
Local speech-to-text with the Whisper CLI (no API key).
tools
ElevenLabs text-to-speech with mac-style say UX.
testing
ASR with ~30ms timestamp precision using Qwen3-ASR + ForcedAligner
documentation
Text-guided audio source separation using SAM-Audio via mlx-audio
tools
# Remotion Render Skill Generic video rendering with motion graphics overlays. ## Components - `VideoClip.tsx` - Main composition (video + captions + annotations) - `components/RollingCaption.tsx` - Karaoke-style word highlighting - `components/SpeakerLabel.tsx` - Positioned speaker annotations - `components/ContextBadge.tsx` - Location/event badge ## Usage Called by `make-video` skill. Not typically invoked directly. ```bash npx remotion render VideoClip output.mp4 --props=props.json ```
development
# Make Video Skill Single-script video production from project config. ## Usage ```bash python skills/make-video/make_video.py video_projects/space_investing/ ``` That's it. One command, one output. ## Project Structure ``` video_projects/<name>/ ├── project.json # Config (required) ├── source_video.mp4 # Input video (required) ├── voiceover.wav # Audio (optional, uses video audio if missing) └── output/ └── final.mp4 # Generated output ``` ## project.json ```json { "
development
# Inspect Video Skill Gemini-powered video inspection for quality analysis and debugging. ## Purpose Use Gemini 3 Pro's video understanding to: - Describe what's happening in the video - Identify freeze frames, black frames, glitches - Detect visual quality issues - Verify content matches expectations - Generate quality reports ## Usage ```bash python skills/inspect-video/inspect.py <video_file> [--prompt "custom query"] ``` ## Default Inspection Without custom prompt, runs a standard qua
tools
Extract a video segment using FFmpeg with precise start/end times
development
# Chunk Process Skill Smart video chunking and MLX-accelerated transcription for long-form content. ## Problem Solved - Raw footage too long for single Gemini upload (~47 min = 5GB+) - Need word-level timestamps for precise cutting - Fixed-length chunks break mid-sentence ## Smart Chunking Instead of fixed 5-minute segments, `smart_chunk.py` finds natural break points: ```bash python skills/chunk-process/smart_chunk.py raw_footage.mp4 -o chunks/ ``` **How it works:** 1. Detect silence regi
content-media
Audio processing utilities - noise reduction, normalization, enhancement
content-media
Generate karaoke-style word-level timestamps by aligning script text to audio using Qwen3-ForcedAligner + jieba for Chinese word segmentation. Use when the user says 'align captions', 'karaoke timestamps', 'word timestamps', 'caption alignment', 'sync text to audio'.