skills/golem-powers/plan-validate/SKILL.md
Extract and validate assumptions from multi-agent sprint plans. Generates research prompts, flags unverified claims, rewrites plan. Triggers on: 'validate plan', 'check assumptions', 'plan-validate'. NOT for: single-task plans (overkill), runtime debugging, or code review.
npx skillsauth add etanhey/golems plan-validate
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Invoke BEFORE executing any multi-agent sprint. This skill saved the March 26 overnight sprint — v1→v3 killed 7 phantom assumptions that would have wasted all work.
Read the plan and extract every:
Mark each as:
VERIFIED (source: URL/file/brain_search): confirmed true
ESTIMATED (needs: research prompt): plausible but unverified
PHANTOM (evidence: none): made up, likely wrong
For each ESTIMATED or PHANTOM claim, generate a research prompt:
Research: Is [claim] true?
Sources to check: [specific URLs, papers, docs]
Expected answer format: [yes/no with evidence]
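The classification and prompt template above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the `Status` enum, the `Claim` dataclass, and the `research_prompt` function are hypothetical names introduced here, and only the three labels and the three template lines come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    """The three labels a claim can carry (taken from the text above)."""
    VERIFIED = "VERIFIED"
    ESTIMATED = "ESTIMATED"
    PHANTOM = "PHANTOM"


@dataclass
class Claim:
    """Hypothetical container for one claim extracted from a plan."""
    text: str
    status: Status
    sources: Tuple[str, ...] = ()  # candidate URLs, papers, or docs to check


def research_prompt(claim: Claim) -> Optional[str]:
    """Render the research prompt template for an unverified claim.

    Returns None for VERIFIED claims, which need no further research.
    """
    if claim.status is Status.VERIFIED:
        return None
    # Fall back to the template placeholder when no sources are known yet.
    sources = ", ".join(claim.sources) or "[specific URLs, papers, docs]"
    return (
        f"Research: Is {claim.text} true?\n"
        f"Sources to check: {sources}\n"
        "Expected answer format: [yes/no with evidence]"
    )


if __name__ == "__main__":
    claim = Claim('"+15pp delta = GREEN threshold"', Status.PHANTOM)
    print(research_prompt(claim))
```

Returning `None` for VERIFIED claims keeps the caller's loop simple: any claim that yields a prompt still needs research.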
Dispatch research prompts to (fallback order if tool unavailable):
brain_search: has this been answered before?
exa web_search: external validation
If a research tool is unavailable, skip it and proceed with remaining tools. Mark claims as ESTIMATED (not VERIFIED) if only one source confirms.
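The fallback rule can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the `Tool` callable interface, the `resolve_status` name, and the stub tools stand in for the real `brain_search` and exa `web_search` integrations, whose actual signatures are not given in this document. The sketch also assumes that two or more confirmations justify VERIFIED, and that a claim with no confirmations keeps its current label.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical tool interface: takes a research prompt and returns
# True (confirms the claim), False (refutes it), or None (no answer).
Tool = Callable[[str], Optional[bool]]


def resolve_status(
    current: str,
    prompt: str,
    tools: Sequence[Tuple[str, Optional[Tool]]],
) -> str:
    """Try each tool in order, skipping unavailable ones, and pick a label."""
    confirmations = 0
    for _name, tool in tools:
        if tool is None:
            continue  # tool not installed or configured: skip it
        try:
            answer = tool(prompt)
        except Exception:
            continue  # tool failed at call time: treat as unavailable
        if answer is True:
            confirmations += 1
    if confirmations >= 2:
        return "VERIFIED"   # assumption: two independent confirmations suffice
    if confirmations == 1:
        return "ESTIMATED"  # only one source confirms, per the rule above
    return current          # assumption: no confirmation leaves the label as-is


if __name__ == "__main__":
    # Stub tools in the documented fallback order; the second is unavailable.
    tools = [("brain_search", lambda p: True), ("exa web_search", None)]
    print(resolve_status("PHANTOM", "Research: Is X true?", tools))
```

With one tool unavailable, a single confirmation can only raise a claim to ESTIMATED, which matches the rule that one source is never enough for VERIFIED.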
For each claim:
Output a before/after diff showing:
PHANTOM killed: "PIER = Perceptual Information Error Rate" → actually "Point-of-Interest Error Rate" (code-switching only). Worker would have built wrong eval.
PHANTOM killed: "+15pp delta = GREEN threshold" → SkillsBench shows +4.5pp to +51.9pp range. No single threshold works. Worker would have failed all evals.
PHANTOM killed: "Meta-prompting improves code generation" → code-first-then-explain outperforms by 9.86%. Would have used wrong prompting strategy.