
etanhey/video-extract

Name: video-extract
Author: etanhey

skills/golem-powers/video-extract/SKILL.md

npx skillsauth add etanhey/golems video-extract


/video-extract — Video Knowledge Extraction

Two workflows, one entry point. YouTube URLs get the gems pipeline (deep extraction with frames and keyword hotspots). Screen recordings get the QA pipeline (stalker-based structured findings).

How It Works

Input Detection
  ├── YouTube URL (https://youtube.com/..., youtu.be/...)
  │     → GEMS workflow
  │       → yt-dlp (audio + metadata)
  │         → whisper-cli transcription (SRT + TXT)
  │           → LLM keyword hotspot detection
  │             → yt-dlp video download + ffmpeg frame extraction at hotspot timestamps
  │               → Claude Vision reads frames + transcript context
  │                 → brain_digest (full content)
  │                   → brain_store (structured gems)
  │
  └── Local recording (.mov, .mp4, .mkv)
        → QA workflow (delegates to /qa-video)
          → ffmpeg audio extraction
            → whisper-cli transcription
              → hotspot detection + frame extraction
                → Claude Vision analysis
                  → Structured QA findings
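The first stages of the gems branch above (download, transcription) can be sketched as a list of commands. This is a minimal illustration, not the skill's actual script: the function name `gems_commands`, the file names, and the 16 kHz mono resampling step are assumptions (whisper.cpp has traditionally expected 16 kHz WAV input). The commands are only built here, not executed.

```python
from pathlib import Path


def gems_commands(url: str, workdir: Path, model: Path) -> list[list[str]]:
    """Build (but do not run) the download and transcription commands."""
    raw_audio = workdir / "audio.wav"
    audio_16k = workdir / "audio16k.wav"
    # yt-dlp: extract audio as WAV and write the metadata JSON next to it.
    download = [
        "yt-dlp", "-x", "--audio-format", "wav", "--write-info-json",
        "-o", str(workdir / "audio.%(ext)s"), url,
    ]
    # ffmpeg: resample to 16 kHz mono (assumed input format for whisper-cli).
    resample = [
        "ffmpeg", "-y", "-i", str(raw_audio),
        "-ar", "16000", "-ac", "1", str(audio_16k),
    ]
    # whisper-cli: write both SRT and TXT using a shared output prefix.
    transcribe = [
        "whisper-cli", "-m", str(model), "-f", str(audio_16k),
        "-osrt", "-otxt", "-of", str(workdir / "transcript"),
    ]
    return [download, resample, transcribe]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order; keeping them as data makes the pipeline easy to log or dry-run.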

Workflow Detection & Routing

Read the user's input and route:

| Input pattern | Route to |
|---------------|----------|
| YouTube URL (youtube.com, youtu.be, yt.be) | workflows/gems.md: deep extraction with keyword hotspots + frames |
| Local video path (.mov, .mp4, .mkv, .webm) | workflows/qa.md: delegates to /qa-video stalker pipeline. BrainLayer storage still applies; verify after /qa-video returns. |
| "extract gems from [video]" | Gems workflow (even for local files if user wants gems, not QA) |
| "process QA recording", "QA round" | QA workflow (even for YouTube if it's a recorded QA session) |
| Ambiguous | Ask: "Is this a YouTube video you want gems from, or a QA recording you want processed?" |

Override signals: If the user says "gems" or "insights" or "takeaways", use gems workflow regardless of source. If they say "QA", "bugs", "findings", use QA workflow regardless of source.
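The routing rules and override signals can be sketched as a small function. This is an illustration under assumptions, not the skill's implementation (the skill routes by having the model read the input): the name `route`, the `www.` and `m.` host variants, and the choice to return "ask" when gems and QA signals conflict are all hypothetical.

```python
import re

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be", "yt.be"}
VIDEO_EXTENSIONS = (".mov", ".mp4", ".mkv", ".webm")
GEMS_WORDS = {"gems", "insights", "takeaways"}
QA_WORDS = {"qa", "bugs", "findings"}
HOST_RE = re.compile(r"^(?:https?://)?([^/\s?#]+)", re.IGNORECASE)


def route(user_input: str) -> str:
    """Return "gems", "qa", or "ask" for a raw user message."""
    source_route = None
    words: set[str] = set()
    for token in user_input.split():
        cleaned = token.strip("\"'<>()[],.")
        match = HOST_RE.match(cleaned)
        host = match.group(1).lower() if match else ""
        if host in YOUTUBE_HOSTS:
            source_route = source_route or "gems"
        elif cleaned.lower().endswith(VIDEO_EXTENSIONS):
            source_route = source_route or "qa"
        else:
            # Only plain words count as override signals, so a file named
            # "gems.mov" does not trigger the gems override by accident.
            words.update(re.findall(r"[a-z]+", cleaned.lower()))
    wants_gems = bool(words & GEMS_WORDS)
    wants_qa = bool(words & QA_WORDS)
    if wants_gems and wants_qa:
        return "ask"  # conflicting signals: treat as ambiguous
    if wants_gems:
        return "gems"  # override wins regardless of source
    if wants_qa:
        return "qa"
    return source_route or "ask"
```

For example, `route("extract gems from ~/Desktop/talk.mp4")` returns `"gems"` even though the source is a local file, because the override signal takes precedence over the source type.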

🚨 BrainLayer Protocol (MANDATORY)

Both workflows store results in BrainLayer. If BrainLayer is unavailable at ANY point:

🚨🚨🚨 BRAINLAYER UNAVAILABLE — [brain_store/brain_digest] failed.
Gems/findings NOT persisted. Raw output saved to [local path].
Fix: Check BrainLayer MCP connection. Retry: brain_store(content, tags, importance).
🚨🚨🚨

NEVER silently skip BrainLayer storage. NEVER say "I'll store it later." NEVER proceed without flagging. The whole point of this skill is durable knowledge extraction — without BrainLayer, the knowledge is lost.

Fallback: If BrainLayer is down, write the full output to docs.local/video-extract/[date]-[title].md so it can be manually digested later. But still flag loudly.
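The store-or-flag behavior can be sketched as follows. This is a hedged illustration: `persist_or_flag`, `slugify`, and the `store` callable (standing in for a brain_store or brain_digest call) are hypothetical names, and the ISO date plus slugged title used for `[date]-[title]` is an assumption, since the exact format is not specified above.

```python
import re
from datetime import date
from pathlib import Path

ALERT = (
    "🚨🚨🚨 BRAINLAYER UNAVAILABLE: {op} failed.\n"
    "Gems/findings NOT persisted. Raw output saved to {path}.\n"
    "Fix: Check BrainLayer MCP connection. "
    "Retry: brain_store(content, tags, importance).\n"
    "🚨🚨🚨"
)


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"


def persist_or_flag(content, title, store, op="brain_store",
                    root=Path("docs.local/video-extract"), today=None):
    """Try to store; on failure write the fallback file and return the alert.

    Returns None on success, or the alert text that must be shown to the user.
    """
    try:
        store(content)
        return None
    except Exception:
        day = (today or date.today()).isoformat()
        path = root / f"{day}-{slugify(title)}.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        # Never skip silently: the caller is expected to surface this alert.
        return ALERT.format(op=op, path=path)
```

Returning the alert text rather than printing it keeps the decision about where to surface it with the caller, while making it hard to ignore a failed store.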

Prerequisites

| Tool | Check | Install | Used by |
|------|-------|---------|---------|
| yt-dlp | which yt-dlp | pip3 install yt-dlp | Gems (YouTube download) |
| ffmpeg | which ffmpeg | brew install ffmpeg | Both (audio extraction, frame extraction) |
| whisper-cli | which whisper-cli | brew install whisper-cpp | Both (transcription) |
| whisper model | ls ~/.cache/whisper/ggml-small.bin | whisper-cli --download-model small | Both |

Optional: exa MCP (fallback transcript source for YouTube if yt-dlp fails)
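The checks in the table can be run in one pass before starting either workflow. A minimal sketch, assuming a hypothetical helper named `missing_prerequisites`; the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
from pathlib import Path

# Tool name -> install hint, taken from the prerequisites table.
REQUIRED_TOOLS = {
    "yt-dlp": "pip3 install yt-dlp",
    "ffmpeg": "brew install ffmpeg",
    "whisper-cli": "brew install whisper-cpp",
}
MODEL_PATH = Path("~/.cache/whisper/ggml-small.bin")


def missing_prerequisites(which=shutil.which, model_path=MODEL_PATH):
    """Return human-readable entries for every missing tool or model file."""
    missing = [
        f"{tool} (install: {hint})"
        for tool, hint in REQUIRED_TOOLS.items()
        if which(tool) is None
    ]
    if not model_path.expanduser().is_file():
        missing.append(f"whisper model (expected at {model_path})")
    return missing
```

An empty list means everything in the table was found; otherwise each entry names what is missing and, for the tools, how to install it.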

Quick Reference

YouTube gems:

# The skill handles this — just give it a URL
/video-extract https://www.youtube.com/watch?v=VIDEO_ID

QA recording:

# Give it a path to your screen recording
/video-extract ~/Desktop/recording.mov

Key Design Decisions

Hybrid approach (exa scout → yt-dlp deep): Tested on a real video (Chase AI GSD2, 15 min). Exa-only found 5 gems in 5 seconds. The full yt-dlp → whisper → frames pipeline found 12 gems in 4 minutes (2.4x as many). Exa missed all hard data ($30 cost, timing comparisons), war stories, and visual evidence. Use exa first as a scout ("is this video worth deep extraction?"), and use the full pipeline for high-value videos where durable knowledge matters. For batch processing (5+ videos), scout all with exa, then run the full pipeline on the top picks.
Keyword hotspot detection: The LLM reads the transcript and identifies "gem moments" (surprising insights, strong opinions, technical revelations, actionable advice). This differs from QA hotspots, which look for bugs/issues.
Frame extraction from YouTube: yt-dlp downloads the video, then ffmpeg extracts frames at gem timestamps. The visual context (slides, code, diagrams) makes gems 3x more useful than transcript-only.
Two workflows, shared tooling: Both use ffmpeg + whisper-cli. The difference is in the analysis: the gems workflow looks for insights/knowledge, the QA workflow looks for bugs/issues.
BrainLayer is the destination, not files: Files are intermediate artifacts. The durable output is in BrainLayer. Local files can be cleaned up after brain_digest succeeds.
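Frame extraction at gem timestamps can be sketched by converting SRT timestamps to seconds and building one ffmpeg command per frame. This is an illustrative sketch only: `srt_to_seconds`, `frame_commands`, and the `frame-NNN.jpg` naming are hypothetical, and the commands are built rather than executed.

```python
from pathlib import Path


def srt_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as "00:01:23,500" to seconds."""
    clock, _, millis = stamp.partition(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis or 0) / 1000


def frame_commands(video: Path, stamps: list[str], outdir: Path) -> list[list[str]]:
    """Build one ffmpeg command per timestamp, each grabbing a single frame."""
    commands = []
    for index, stamp in enumerate(stamps, start=1):
        offset = srt_to_seconds(stamp)
        target = outdir / f"frame-{index:03d}.jpg"
        # Placing -ss before -i seeks in the input; -frames:v 1 stops after one frame.
        commands.append([
            "ffmpeg", "-y", "-ss", f"{offset:.3f}", "-i", str(video),
            "-frames:v", "1", str(target),
        ])
    return commands
```

For instance, `srt_to_seconds("00:01:23,500")` gives `83.5`, so the first command seeks to 83.500 seconds before writing `frame-001.jpg`.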


2 stars

testing

Updated Apr 17, 2026


npx skillsauth add etanhey/golems video-extract

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 17, 2026, 12:16 PM | 7.8s | 1 file scanned

SKILL.md

name: video-extract
description: Extract structured knowledge from any video source: YouTube URLs or local screen recordings. YouTube → gems workflow (yt-dlp transcript → keyword hotspots → frame extract → brain_digest → structured gems). Screen recordings → QA workflow (reuses /qa-video stalker pipeline). Use when user shares a YouTube link wanting deep extraction with frames, shares a .mov/.mp4 for QA processing, says "extract from video", "video gems", "process this recording", or mentions gem extraction from video content.
execute: scripts/default.sh




Install manually


# Clone the repo
git clone https://github.com/etanhey/golems.git

# Copy into Claude Code skills folder (global)
cp -r golems/skills/golem-powers/video-extract ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

etanhey/golems

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT