.claude/skills/youtube-step-extractor/SKILL.md
Extract frames from a YouTube video and analyze them to identify a sequence of steps. Use when user provides a YouTube URL and wants to understand the process, tutorial, or workflow shown in the video by examining its visual content frame-by-frame. Triggers on "extract steps from video", "what steps does this video show", "analyze YouTube tutorial", "screenshot a video", "figure out the steps".
npx skillsauth add FacuM/yolo-agent youtube-step-extractor
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Download a YouTube video, extract frames at regular intervals, and analyze them to identify a specific sequence of steps from the visual content.
Requires yt-dlp and ffmpeg:
# Ubuntu/Debian
sudo apt-get install -y ffmpeg
pip install yt-dlp
# macOS
brew install ffmpeg yt-dlp
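Before running the commands below, it can help to confirm that both tools are actually on PATH. The following is a minimal sketch; the function name missing_tools is hypothetical and is not part of the skill's bundled scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill).
# Prints the name of every listed command that is not on PATH.
# Returns 0 when all commands are present, 1 otherwise.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   missing_tools yt-dlp ffmpeg || echo "install the tools listed above first"
```

The function only reports what is missing and leaves installation to the platform-specific commands above.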
Download the video (up to 1080p):
yt-dlp -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]" \
-o "/tmp/yt_video.mp4" \
--merge-output-format mp4 \
"YOUTUBE_URL"
For faster download (lower quality is fine for frame analysis):
yt-dlp -f "bestvideo[height<=720]+bestaudio/best[height<=720]" \
-o "/tmp/yt_video.mp4" \
--merge-output-format mp4 \
"YOUTUBE_URL"
Use the bundled script:
{baseDir}/scripts/extract_frames.sh /tmp/yt_video.mp4 /tmp/yt_frames 1
Arguments:
video_path (required): Path to the downloaded video
output_dir (optional): Where to save frames. Default: ./frames_<video_name>
fps (optional): Frames per second. Default: 1 (one frame per second)
For longer videos, reduce fps to avoid too many frames:
# 1 frame every 2 seconds for videos > 5 min
{baseDir}/scripts/extract_frames.sh /tmp/yt_video.mp4 /tmp/yt_frames 0.5
# 1 frame every 5 seconds for videos > 15 min
{baseDir}/scripts/extract_frames.sh /tmp/yt_video.mp4 /tmp/yt_frames 0.2
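The duration thresholds in the comments above (over 5 minutes, over 15 minutes) can be turned into a small selector. This is a sketch under assumptions: the function name choose_fps is hypothetical, and the bundled scripts may or may not do anything similar internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick an extraction rate from the video length in seconds.
# Thresholds follow the guidance above: > 5 min -> 0.5 fps, > 15 min -> 0.2 fps.
choose_fps() {
  local duration=${1%.*}   # drop any fractional part, e.g. 600.7 -> 600
  if [ "$duration" -gt 900 ]; then
    echo 0.2
  elif [ "$duration" -gt 300 ]; then
    echo 0.5
  else
    echo 1
  fi
}

# One way to obtain the duration, using ffprobe (installed alongside ffmpeg):
#   duration=$(ffprobe -v error -show_entries format=duration \
#     -of default=noprint_wrappers=1:nokey=1 /tmp/yt_video.mp4)
#   {baseDir}/scripts/extract_frames.sh /tmp/yt_video.mp4 /tmp/yt_frames "$(choose_fps "$duration")"
```

The cutoffs are only a starting point; a dense screen recording may still need a higher rate than its length suggests.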
Or use the all-in-one script:
{baseDir}/scripts/download_and_extract.sh "YOUTUBE_URL" /tmp/yt_frames 1
ls /tmp/yt_frames/
Subtitles add context to what's visible in the frames:
yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt \
-o "/tmp/yt_transcript" "YOUTUBE_URL"
Clean to plain text:
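# Strip headers, cue timings, and inline tags; awk drops repeated lines while keeping the spoken order
sed -e '/^$/d' -e '/^WEBVTT/d' -e '/^Kind:/d' -e '/^Language:/d' -e '/^[0-9]/d' -e '/-->/d' -e 's/<[^>]*>//g' \
/tmp/yt_transcript.en.vtt | awk '!seen[$0]++' > /tmp/yt_transcript.txt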
fps=1, review all frames
fps=0.5, sample every 3-5 frames
fps=0.2, focus on scene changes
The extracted frames are numbered sequentially: frame_001.jpg, frame_002.jpg, etc.
Each frame filename corresponds to its position in time:
fps=1: frame_001.jpg = ~1s, frame_060.jpg = ~60s
fps=0.5: frame_001.jpg = ~2s, frame_030.jpg = ~60s
Clean up the temporary files when finished:
rm -rf /tmp/yt_video.mp4 /tmp/yt_frames /tmp/yt_transcript*
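The filename-to-time mapping described above amounts to: frame N at a given fps sits roughly N / fps seconds into the video. A minimal sketch of that arithmetic follows; the function name frame_time is hypothetical, and the result is only approximate because the exact offset depends on how extract_frames.sh invokes ffmpeg, which is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a frame filename and the fps used for extraction
# into an approximate mm:ss position in the video.
frame_time() {
  local name=${1##*/}      # strip any leading directory
  local fps=$2
  local n=${name#frame_}   # frame_030.jpg -> 030.jpg
  n=${n%.jpg}              # 030.jpg -> 030
  awk -v n="$n" -v fps="$fps" 'BEGIN {
    s = int(n / fps + 0.5)                      # round to the nearest second
    printf "%02d:%02d\n", int(s / 60), s % 60
  }'
}

# Example:
#   frame_time /tmp/yt_frames/frame_030.jpg 0.5
```

This makes it easier to cite a timestamp next to each identified step, so the step can be checked against the video or the transcript.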