
Process large volumes of requests using the Gemini Batch API via scripts/. Use for batch processing, bulk text generation, processing JSONL files, async job execution, and cost-efficient high-volume AI tasks. Triggers on "batch processing", "bulk requests", "JSONL", "async job", "batch job".
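Batch input is a JSONL file with one request per line. The sketch below builds such a file. It assumes the keyed layout `{"key": ..., "request": {"contents": [...]}}` from the Gemini Batch API's file-input mode. `write_batch_jsonl` is a hypothetical helper, and uploading the file and creating the job are left to the scripts.

```python
import json
from pathlib import Path


def write_batch_jsonl(prompts: list[str], path: Path) -> int:
    """Write one keyed batch request per line and return the number written.

    Hypothetical helper. The {"key", "request"} layout is assumed from the
    Gemini Batch API file-input format.
    """
    with path.open("w", encoding="utf-8") as fh:
        for i, prompt in enumerate(prompts, start=1):
            record = {
                "key": f"request-{i}",  # used to match each result to its prompt
                "request": {"contents": [{"parts": [{"text": prompt}]}]},
            }
            fh.write(json.dumps(record) + "\n")
    return len(prompts)
```

Stable keys let you join results back to their prompts without relying on line order in the output file.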
Upload and manage files using the Google Gemini File API via scripts/. Use for uploading images, audio, video, PDFs, and other files for use with Gemini models. Supports file upload, status checking, and file management. Triggers on "upload file", "file API", "upload image", "upload PDF", "upload video", "file management".
Generate images using Google Gemini and Imagen models via scripts/. Use for AI image generation, text-to-image, creating visuals from prompts, generating multiple images, custom aspect ratios, and high-resolution output up to 4K. Triggers on "generate image", "create image", "imagen", "text to image", "AI art", "nano banana".
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports multiple voices and streaming. Triggers on "text to speech", "TTS", "generate audio", "voice synthesis", "speak this text".
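Gemini TTS responses carry raw audio bytes, not a playable file. The sketch below saves them as WAV. It assumes 16-bit mono PCM at 24 kHz, the format used in Google's TTS examples. `pcm_to_wav` is hypothetical and uses only the standard-library `wave` module.

```python
import wave
from pathlib import Path


def pcm_to_wav(pcm: bytes, path: Path, rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> float:
    """Wrap raw PCM bytes in a WAV container and return the duration in seconds.

    Defaults assume 16-bit (2-byte) mono audio at 24 kHz.
    """
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample
        wf.setframerate(rate)
        wf.writeframes(pcm)
    frames = len(pcm) // (channels * sample_width)
    return frames / rate
```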
Generate text embeddings using the Gemini Embedding API via scripts/. Use for creating vector representations of text, semantic search, similarity matching, clustering, and RAG applications. Triggers on "embeddings", "semantic search", "vector search", "text similarity", "RAG", "retrieval".
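Semantic search over embeddings usually comes down to ranking stored vectors by their cosine similarity to the query vector. The sketch below does this without dependencies. `cosine_similarity` and `rank_documents` are hypothetical names, and the toy vectors in the usage stand in for real embedding output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: dict[str, list[float]],
                   top_k: int = 3) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For large collections, a vector index or matrix operations would replace the Python loop.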
Generate text content using Google Gemini models via scripts/. Use for text generation, multimodal prompts with images, thinking mode for complex reasoning, JSON-formatted outputs, and Google Search grounding for real-time information. Triggers on "generate with gemini", "use gemini for text", "AI text generation", "multimodal prompt", "gemini thinking mode", "grounded response".
Convert videos to 9:16 portrait format (1080x1920) for TikTok, YouTube Shorts, Instagram Reels, and Facebook Reels. Supports smart cropping (face-aware), center cropping, and letterboxing. Never stretches or distorts the source image, and preserves quality.
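The center-crop and letterbox modes map onto standard ffmpeg filters: `crop`, `scale`, `pad`, and the `force_original_aspect_ratio` option of `scale`. The sketch below builds the filter chain. `portrait_filter` is hypothetical. Face-aware cropping would move the crop's `x` offset toward a detected face instead of centering it.

```python
def portrait_filter(src_w: int, src_h: int, mode: str = "crop",
                    out_w: int = 1080, out_h: int = 1920) -> str:
    """Build an ffmpeg -vf chain that converts a src_w x src_h frame to out_w x out_h."""
    if mode == "letterbox":
        # Fit the whole frame inside the target, then pad the rest with black bars.
        return (f"scale={out_w}:{out_h}:force_original_aspect_ratio=decrease,"
                f"pad={out_w}:{out_h}:(ow-iw)/2:(oh-ih)/2")
    if mode != "crop":
        raise ValueError(f"unknown mode: {mode}")
    target = out_w / out_h  # 9:16 = 0.5625
    if src_w / src_h > target:
        # Source is wider than 9:16: keep the full height and trim the sides.
        crop_h = src_h
        crop_w = int(src_h * target) // 2 * 2  # round down to even for yuv420p
    else:
        # Source is narrower: keep the full width and trim top and bottom.
        crop_w = src_w
        crop_h = int(src_w / target) // 2 * 2
    x = (src_w - crop_w) // 2  # center horizontally
    y = (src_h - crop_h) // 2  # center vertically
    return f"crop={crop_w}:{crop_h}:{x}:{y},scale={out_w}:{out_h}"
```

Usage: `ffmpeg -i input.mp4 -vf "<filter>" -c:a copy output.mp4`. Crop dimensions are rounded to even numbers because 4:2:0 chroma subsampling requires them.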
Detect scene changes and shot boundaries in videos. Use when you need to identify where scenes change, find natural cut points, or segment video into scenes. Supports adaptive detection for both fast cuts and gradual fades.
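An adaptive threshold handles both static and fast-moving footage. It flags a cut when a frame's difference score jumps well above the recent average, rather than above a fixed value. The sketch below shows a simplified version of that idea. It is not the skill's actual detector. `detect_cuts`, its parameters, and the precomputed `frame_diffs` input are assumptions. It only catches hard cuts, because gradual fades require accumulating change across several frames.

```python
def detect_cuts(frame_diffs: list[float], window: int = 5, ratio: float = 3.0,
                min_diff: float = 10.0, min_gap: int = 10) -> list[int]:
    """Return frame indices where a shot boundary likely starts.

    frame_diffs[i] is a difference score between frames i-1 and i
    (for example, mean absolute pixel difference), computed elsewhere.
    """
    cuts: list[int] = []
    for i, diff in enumerate(frame_diffs):
        prev = frame_diffs[max(0, i - window):i]
        if not prev:
            continue
        baseline = sum(prev) / len(prev)
        # Compare against recent activity so busy footage needs a bigger jump.
        is_spike = diff >= min_diff and diff > ratio * max(baseline, 1e-6)
        # Enforce a minimum spacing between consecutive cuts.
        if is_spike and (not cuts or i - cuts[-1] >= min_gap):
            cuts.append(i)
    return cuts
```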
Analyze sentiment and emotion in audio/video content. Use when you want to identify emotional peaks, detect positive/negative sentiment, find reaction moments, or analyze the emotional journey throughout the video. Supports both transcript-based and AI-based emotion detection.
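To locate emotional peaks, smooth a per-second intensity series and pick the local maxima above a threshold. The series could come from transcript sentiment or an audio emotion model. `emotional_peaks` and its defaults are hypothetical, and the scores are assumed to be normalized to 0..1.

```python
def emotional_peaks(scores: list[float], threshold: float = 0.7,
                    window: int = 3) -> list[int]:
    """Return indices where the smoothed intensity is a local maximum above threshold."""
    n = len(scores)
    smoothed = []
    for i in range(n):
        # Moving average over up to `window` samples on each side.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    peaks = []
    for i in range(1, n - 1):
        cur = smoothed[i]
        # Strict on the left, non-strict on the right: a plateau reports its first index.
        if cur >= threshold and cur > smoothed[i - 1] and cur >= smoothed[i + 1]:
            peaks.append(i)
    return peaks
```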
Add burned-in subtitles/captions to video clips. Supports SRT/VTT/ASS subtitle files, customizable styling (font, size, color, position), and platform-specific presets for TikTok, YouTube Shorts, and Instagram Reels.
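Subtitle files for burning in can be generated from timed transcript segments. The sketch below renders `(start, end, text)` tuples as SRT, which writes timestamps as `HH:MM:SS,mmm`. `srt_timestamp` and `to_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

You can then burn the file in with ffmpeg's `subtitles` filter, for example `ffmpeg -i clip.mp4 -vf "subtitles=clip.srt:force_style='FontSize=24'" out.mp4`. This filter requires an ffmpeg build with libass.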
Download videos from YouTube URLs. Use when the user wants to download a YouTube video for processing, editing, or transcription. Supports different quality options, audio-only extraction, and playlist downloads.
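The sketch below assembles a `yt-dlp` command line for both the video and audio-only cases. `ytdlp_command` and its defaults are hypothetical. `-o`, `-f`, `-x`, and `--audio-format` are standard yt-dlp options.

```python
def ytdlp_command(url: str, audio_only: bool = False, max_height: int = 1080,
                  out_template: str = "%(title)s.%(ext)s") -> list[str]:
    """Build a yt-dlp argument list (hypothetical helper)."""
    cmd = ["yt-dlp", "-o", out_template]
    if audio_only:
        # -x extracts the audio track (requires ffmpeg) and converts it to mp3.
        cmd += ["-x", "--audio-format", "mp3"]
    else:
        # Best video up to max_height merged with best audio; otherwise the best single file.
        cmd += ["-f", f"bestvideo[height<={max_height}]+bestaudio/best[height<={max_height}]"]
    cmd.append(url)
    return cmd
```

Pass the list to `subprocess.run(cmd, check=True)`. Given a playlist URL, yt-dlp downloads every entry by default. Add `--no-playlist` to fetch a single video.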
Trim and cut videos by timestamp. Supports both stream copy (fast, but the start point snaps to a keyframe) and re-encoding (slower, frame-accurate) modes. Use when you need to extract specific segments from videos, create clips from highlights, or cut unwanted portions.
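The two modes differ mainly in their ffmpeg codec options. The sketch below builds either command. `trim_command` is hypothetical, and the x264 settings (`-crf 18`, `-preset medium`) are illustrative choices, not the skill's defaults.

```python
def trim_command(src: str, dst: str, start: float, end: float,
                 reencode: bool = False) -> list[str]:
    """Build an ffmpeg command that keeps [start, end) seconds of src."""
    if end <= start:
        raise ValueError("end must be after start")
    duration = end - start
    # -ss before -i seeks in the input. -t limits the output duration.
    cmd = ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src, "-t", f"{duration:.3f}"]
    if reencode:
        # Re-encoding allows frame-accurate cut points at the cost of speed.
        cmd += ["-c:v", "libx264", "-crf", "18", "-preset", "medium", "-c:a", "aac"]
    else:
        # Stream copy is fast, but the clip starts on a keyframe.
        cmd += ["-c", "copy"]
    cmd.append(dst)
    return cmd
```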
Process multiple videos in batch mode for efficiency. Supports batch download from YouTube URLs, batch autocut for multiple videos, and batch export to multiple platforms. Generates consolidated reports with all clips.
Detect laughter and humorous segments in audio/video. Use when you want to find funny moments, identify audience reactions, or create viral clips from humorous content. Supports both AI model detection and keyword-based detection from transcripts.
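The simplest form of keyword-based detection scans transcript segments for laughter cues. The sketch below assumes Whisper-style segments, where each is a dict with `start`, `end`, and `text`. The cue pattern and `find_laughter` are hypothetical, and the pattern would need tuning for each transcription source.

```python
import re

# Hypothetical cue list: bracketed/parenthesized laughter tags, "haha...", and "lol".
LAUGH_PATTERN = re.compile(
    r"\[(?:laughter|laughs)\]|\((?:laughter|laughs)\)|\b(?:ha){2,}\b|\blol\b",
    re.IGNORECASE,
)


def find_laughter(segments: list[dict]) -> list[dict]:
    """Return the time span and cue count of each segment that contains laughter cues."""
    hits = []
    for seg in segments:
        matches = LAUGH_PATTERN.findall(seg["text"])
        if matches:
            hits.append({"start": seg["start"], "end": seg["end"], "cues": len(matches)})
    return hits
```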
Advanced speaker diarization using pyannote-audio. Identify who speaks when, detect multiple speakers, handle overlapping speech, and create speaker-specific segments. Use when you need accurate speaker identification, multi-speaker content analysis, or speaker-specific clip extraction. More accurate than Gemini's built-in diarization for complex scenarios.
Transcribe audio from videos using Whisper (local), the OpenAI Whisper API, Google Speech-to-Text, or the Gemini API (gemini-flash-lite-latest). Use when you need to convert video/audio to text for further processing, subtitle generation, or content analysis. Supports multiple languages, speaker diarization, and timestamp-accurate transcription. Gemini provides additional features like emotion detection and viral segment analysis.
---
name: autocut-shorts
description: "Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments in the style of OpusClip and Vizard.ai."
allowed-tools: Bash(ffmpeg:*) Bash(yt-dlp:*) Bash(python:*)
compatibility: Requires all trimer-clip
Combined analysis skill to find viral-worthy highlights from videos. Scans transcripts, detects laughter, analyzes sentiment/emotion, and uses scene changes to identify the most engaging moments for TikTok/Shorts/Reels. Produces a ranked list of highlight segments with virality scores.
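One way to combine the signals is a weighted sum of per-segment scores, followed by a length filter suited to short-form platforms. The weights, the 15 to 60 second window, and `Candidate`/`rank_highlights` are hypothetical. Each signal is assumed to be pre-normalized to 0..1.

```python
from dataclasses import dataclass

# Hypothetical weights; tune them against clips that actually performed well.
WEIGHTS = {"transcript": 0.4, "laughter": 0.25, "sentiment": 0.25, "scene": 0.1}


@dataclass
class Candidate:
    start: float
    end: float
    signals: dict[str, float]  # each signal normalized to 0..1; missing ones count as 0

    def score(self) -> float:
        """Weighted sum of the signals, scaled to 0..100."""
        total = sum(WEIGHTS[name] * self.signals.get(name, 0.0) for name in WEIGHTS)
        return round(100 * total, 1)


def rank_highlights(cands: list[Candidate], min_len: float = 15.0,
                    max_len: float = 60.0, top_k: int = 5) -> list[Candidate]:
    """Keep short-form-length clips and return the top_k by virality score."""
    eligible = [c for c in cands if min_len <= c.end - c.start <= max_len]
    return sorted(eligible, key=lambda c: c.score(), reverse=True)[:top_k]
```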