skills/golem-powers/qa-video/SKILL.md
Video-based QA pipeline — screen recording with narration processed into structured findings. Use when the user records a QA session (.mov), wants to process a video for bugs/UX issues, needs to generate a QA checklist before testing, wants to hand off QA findings to an implementing agent, or mentions "stalker pipeline", "video QA", "screen recording QA", "qa-record", or click capture logging. Also triggers on iterative QA rounds (record → process → fix → retest cycles).
Record your screen while narrating. The pipeline extracts your speech, identifies what you were looking at, and compiles structured QA findings ready to hand off to an implementing agent.
Screen Recording (.mov)
→ ffmpeg audio extraction
→ whisper-cli transcription (SRT + TXT)
→ LLM reads SRT, identifies QA-relevant segments
→ Frame extraction at hotspot timestamps + regular intervals
→ Claude Vision reads frames + correlates with transcript
→ Structured QA findings document
→ Agent handoff (Codex/Claude worker via cmux)
Tell the user BEFORE every QA recording session:
Narrate your intentions BEFORE clicking. Say "I'm about to click Spin Rare on Sarah" → click → describe what happened. This aligns speech timestamps with actions, making the pipeline 3x more accurate at identifying what you were pointing at.
The 2-5 second gap between clicking and narrating is the #1 accuracy killer. Coaching the user to narrate-first is more impactful than any technical fix.
Read the user's request and route to the right workflow:
| User says | Route to |
|-----------|----------|
| "let's do QA", "test this", "QA round" | workflows/record.md: Pre-QA checklist + recording setup |
| "process this video", "I recorded QA", path to .mov file | workflows/process.md: Stalker pipeline processing |
| "send fixes to Codex", "hand off findings" | workflows/handoff.md: Agent handoff pattern |
| "next round", "retest", "QA round N" | workflows/iterate.md: Multi-round QA cycle |
| "set up click capture", "qa-record" | references/click-capture.md: CGEventTap + qa-record.sh |
If ambiguous: Default to the full cycle — checklist → record → process → handoff.
LLM reads the SRT directly — no automated hotspot detection (sox/ImageMagick). Claude reading the transcript is a better hotspot detector than volume spikes or frame diffs. The automated signals (from the original Twitch stalker pipeline) are unnecessary for QA narration.
Regular interval + hotspot frames — Extract one frame every 30 seconds for full visual coverage, PLUS frames at each identified hotspot. Hotspot-only extraction misses visual bugs described after the fact.
Before/after context frames — At each hotspot, extract frames at -5s and +5s in addition to the hotspot timestamp. The user may explain an issue THEN show it, or show it THEN explain.
Whisper model: ggml-small — Fast on Apple Silicon (~14s for 7min video), accurate enough for English QA narration. The ggml-large-v3 is better but 5x slower — not worth it for QA.
Findings live in the PROJECT repo — docs/qa-session-YYYY-MM-DD/ in the project being tested, not in the orchestrator. Each round gets a suffix: qa-findings-round2.md.
BrainLayer storage is mandatory — After every video processing run, brain_store the findings summary with tags ["qa", "<project>", "round-N"].
QA is iterative — Expect 3-6 rounds per feature. The skill supports multi-round workflows with proper round numbering and delta tracking (what was fixed vs. what persists).
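The naming conventions above (session directory, per-round findings file, BrainLayer tags) can be sketched as small shell helpers. This is only an illustration: the function names `qa_session_dir`, `qa_findings_file`, and `qa_tags` are hypothetical, and it assumes every round, including the first, carries a `-roundN` suffix, which the text does not state explicitly.

```shell
# Hypothetical helpers; names and the round-1 suffix are assumptions.

# Session directory inside the project under test.
# $1 = project repo root, $2 = date as YYYY-MM-DD (defaults to today).
qa_session_dir() {
  printf '%s/docs/qa-session-%s\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}

# Findings file for a given round.
# $1 = project repo root, $2 = round number, $3 = optional date.
qa_findings_file() {
  printf '%s/qa-findings-round%s.md\n' "$(qa_session_dir "$1" "$3")" "$2"
}

# Tag list in the shape the brain_store step expects.
# $1 = project name, $2 = round number.
qa_tags() {
  printf '["qa", "%s", "round-%s"]\n' "$1" "$2"
}
```

For example, `qa_findings_file ~/Gits/myproject 2` would print a path ending in `qa-findings-round2.md` under today's session directory (`myproject` is a placeholder).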
| Tool | Check | Install |
|------|-------|---------|
| ffmpeg | which ffmpeg | brew install ffmpeg |
| whisper-cli | which whisper-cli | brew install whisper-cpp |
| whisper model | ls ~/.cache/whisper/ggml-small.bin | whisper-cli --download-model small |
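The checks in the table above can be run in one pass with a small preflight function. This is a minimal sketch, not part of the skill: `missing_tools` and `qa_preflight` are hypothetical names, and it uses `command -v` in place of `which` because it is a shell builtin. The default model path is the one from the table.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report missing prerequisites on stderr; return non-zero if any is absent.
# $1 = path to the whisper model (defaults to the ggml-small location).
qa_preflight() {
  model="${1:-$HOME/.cache/whisper/ggml-small.bin}"
  rc=0
  for tool in $(missing_tools ffmpeg whisper-cli); do
    echo "missing tool: $tool" >&2
    rc=1
  done
  if [ ! -f "$model" ]; then
    echo "missing whisper model: $model" >&2
    rc=1
  fi
  return "$rc"
}
```

Calling `qa_preflight` before a session and stopping on a non-zero status avoids discovering a missing dependency halfway through processing.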
Optional (click capture — Phase 2):
| Tool | Check | Install |
|------|-------|---------|
| pyobjc | python3 -c "import Quartz" | pip3 install pyobjc-framework-Quartz pyobjc-framework-ApplicationServices pyobjc-framework-Cocoa |
| qa_click_logger.py | ls ~/Gits/orchestrator/scripts/qa/qa_click_logger.py | Already exists in orchestrator repo |
| qa-record.sh | ls ~/Gits/orchestrator/scripts/qa/qa-record.sh | Already exists in orchestrator repo |
Start a recording session:
# With click capture (recommended):
bash ~/Gits/orchestrator/scripts/qa/qa-record.sh ~/Gits/<project>/docs/
# Manual (just screen recording):
# Cmd+Shift+5 → Record Selected Portion → narrate while testing
Process a video:
VIDEO="/path/to/recording.mov"
# $HOME rather than ~, because ~ does not expand inside double quotes
WORKDIR="$HOME/Gits/<project>/docs/qa-session-$(date +%Y-%m-%d)"
mkdir -p "$WORKDIR/frames"
# 1. Extract audio
ffmpeg -i "$VIDEO" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$WORKDIR/audio.wav"
# 2. Transcribe
whisper-cli -m ~/.cache/whisper/ggml-small.bin -f "$WORKDIR/audio.wav" \
--output-srt --output-txt -of "$WORKDIR/transcript" -l auto
# 3. Read transcript.srt → identify hotspot timestamps
# 4. Extract frames (hotspots + every 30s)
# 5. Read frames with Claude Vision
# 6. Compile findings doc
For the full step-by-step, load workflows/process.md.
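Steps 3 and 4 are left as comments above. One way they might look in shell is sketched below, assuming the hotspot timestamps have already been picked out of `transcript.srt` by reading it. The helper names (`srt_to_seconds`, `frame_timestamps`, `extract_frames`), the JPEG file naming, and the `-q:v 2` quality setting are illustrative assumptions; `workflows/process.md` remains the reference and may differ.

```shell
# Convert an SRT timestamp (HH:MM:SS,mmm) to whole seconds.
srt_to_seconds() {
  printf '%s\n' "$1" | awk -F'[:,]' '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# List frame timestamps in seconds, one per line, sorted and de-duplicated:
# one every INTERVAL seconds, plus each hotspot at -5s, 0s and +5s,
# clamped to the range [0, DURATION]. INTERVAL must be greater than 0.
# $1 = duration in whole seconds, $2 = interval, remaining args = hotspots.
frame_timestamps() {
  duration=$1 interval=$2
  shift 2
  {
    t=0
    while [ "$t" -le "$duration" ]; do
      echo "$t"
      t=$((t + interval))
    done
    for h in "$@"; do
      for off in -5 0 5; do
        t=$((h + off))
        if [ "$t" -lt 0 ]; then t=0; fi
        if [ "$t" -gt "$duration" ]; then t=$duration; fi
        echo "$t"
      done
    done
  } | sort -n -u
}

# Extract one JPEG per timestamp into OUTDIR/frames.
# $1 = video path, $2 = output directory, remaining args = timestamps.
extract_frames() {
  video=$1 outdir=$2
  shift 2
  mkdir -p "$outdir/frames"
  for t in "$@"; do
    ffmpeg -nostdin -loglevel error -y -ss "$t" -i "$video" \
      -frames:v 1 -q:v 2 "$outdir/frames/frame_$(printf '%05d' "$t")s.jpg"
  done
}

# Usage sketch (420 is a 7-minute recording; the two hotspots are examples):
#   h1=$(srt_to_seconds "00:01:23,456")
#   h2=$(srt_to_seconds "00:03:17,000")
#   extract_frames "$VIDEO" "$WORKDIR" $(frame_timestamps 420 30 "$h1" "$h2")
```

The duration can be read with `ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$VIDEO"` and truncated to whole seconds before being passed in. Placing `-ss` before `-i` makes ffmpeg seek on the input, which keeps single-frame extraction fast on long recordings.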