Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

etanhey/plan-validate

Name: plan-validate
Author: etanhey

skills/golem-powers/plan-validate/SKILL.md

npx skillsauth add etanhey/golems plan-validate


Plan Validation

Invoke BEFORE executing any multi-agent sprint. This skill saved the March 26 overnight sprint: between plan v1 and v3 it killed 7 phantom assumptions that would otherwise have wasted all the work.

Process

Phase 1: Extract Claims

Read the plan and extract every:

Quantitative claim — numbers, thresholds, percentages, durations
Tool/library assumption — "we'll use X" (does X exist? does it work how we think?)
Cross-dependency assumption — "Track A output feeds Track B" (is that interface defined?)
Metric definition — "measure X" (is X a real metric? can we actually compute it?)

Mark each as:

VERIFIED (source: URL/file/brain_search) — confirmed true
ESTIMATED (needs: research prompt) — plausible but unverified
PHANTOM (evidence: none) — made up, likely wrong
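
The three markers can be modeled as a small data structure. The following is a minimal sketch, not part of the skill itself: the `Status` and `Claim` names, the `kind` field, and the `label` helper are all hypothetical and chosen only to mirror the marker format above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    VERIFIED = "VERIFIED"    # confirmed true, with a source
    ESTIMATED = "ESTIMATED"  # plausible but unverified
    PHANTOM = "PHANTOM"      # no evidence, likely wrong


@dataclass
class Claim:
    text: str
    kind: str        # e.g. "quantitative", "tool", "cross-dependency", "metric"
    status: Status
    note: str = ""   # the source, the research need, or empty for no evidence

    def label(self) -> str:
        """Render the claim with its marker, e.g. 'PHANTOM (evidence: none): ...'."""
        key = {
            Status.VERIFIED: "source",
            Status.ESTIMATED: "needs",
            Status.PHANTOM: "evidence",
        }[self.status]
        return f"{self.status.value} ({key}: {self.note or 'none'}): {self.text}"
```

Keeping the status as an enum rather than a free string makes it harder for a claim to slip through with a misspelled or missing marker.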

Phase 2: Generate Research Prompts

For each ESTIMATED or PHANTOM claim, generate a research prompt:

Research: Is [claim] true?
Sources to check: [specific URLs, papers, docs]
Expected answer format: [yes/no with evidence]
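
As an illustration, the template above could be filled in programmatically. This is a sketch under assumptions: `research_prompt` and `PROMPT_TEMPLATE` are hypothetical names, and the "unspecified" fallback for an empty source list is an invention for the example, not something the skill prescribes.

```python
PROMPT_TEMPLATE = (
    "Research: Is {claim} true?\n"
    "Sources to check: {sources}\n"
    "Expected answer format: yes/no with evidence"
)


def research_prompt(claim: str, sources: list[str]) -> str:
    """Fill the three-line research template for one ESTIMATED or PHANTOM claim."""
    # An empty source list is made explicit rather than left blank (assumption).
    joined = ", ".join(sources) if sources else "unspecified"
    return PROMPT_TEMPLATE.format(claim=claim, sources=joined)


if __name__ == "__main__":
    print(research_prompt(
        "PIER = Perceptual Information Error Rate",
        ["paper that defines PIER"],
    ))
```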

Phase 3: Execute Research (parallel)

Dispatch research prompts to the tools below, falling back to the next one when a tool is unavailable:

brain_search — has this been answered before?
exa web_search — external validation
Claude Web / Gemini — for academic claims

If a research tool is unavailable, skip it and proceed with remaining tools. Mark claims as ESTIMATED (not VERIFIED) if only one source confirms.
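
The skip-and-continue rule and the single-source rule can be sketched as follows. Everything here is hypothetical: the `Tool` callable type, the `ToolUnavailable` exception, and `resolve_status` stand in for whatever wrappers exist around brain_search, exa web_search, and the other tools. The sketch runs tools sequentially for clarity, although the phase is described as parallel, and it assumes a claim with zero confirmations keeps its prior status.

```python
from typing import Callable


class ToolUnavailable(Exception):
    """Raised by a hypothetical tool wrapper when the tool cannot be reached."""


# A tool takes a research prompt and returns True when it finds confirming evidence.
Tool = Callable[[str], bool]


def resolve_status(prompt: str, tools: list[Tool], prior: str) -> str:
    """Query each tool in order and derive the claim status from the confirmations."""
    confirmations = 0
    for tool in tools:
        try:
            if tool(prompt):
                confirmations += 1
        except ToolUnavailable:
            continue  # skip this tool and proceed with the remaining ones
    if confirmations >= 2:
        return "VERIFIED"
    if confirmations == 1:
        return "ESTIMATED"  # only one source confirms: not enough for VERIFIED
    return prior  # assumption: with no confirmation the prior status stands
```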

Phase 4: Rewrite Plan

For each claim:

VERIFIED → keep, add source citation
ESTIMATED → keep with caveat, downgrade acceptance criteria
PHANTOM → remove or replace with verified alternative
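
The three rewrite rules map onto a simple per-claim transformation. The sketch below is illustrative only: `PlanClaim`, its fields, and the wording of the appended caveat are assumptions, and downgrading acceptance criteria is a judgment call that this code does not attempt.

```python
from typing import NamedTuple, Optional


class PlanClaim(NamedTuple):
    text: str
    status: str                        # "VERIFIED", "ESTIMATED", or "PHANTOM"
    source: str = ""                   # citation for VERIFIED claims
    replacement: Optional[str] = None  # verified alternative for a PHANTOM, if any


def rewrite(claims: list[PlanClaim]) -> list[str]:
    """Apply the keep / caveat / remove-or-replace rules to each claim."""
    rewritten = []
    for claim in claims:
        if claim.status == "VERIFIED":
            rewritten.append(f"{claim.text} (source: {claim.source})")
        elif claim.status == "ESTIMATED":
            rewritten.append(f"{claim.text} (caveat: unverified estimate)")
        elif claim.status == "PHANTOM":
            if claim.replacement is not None:
                rewritten.append(claim.replacement)
            # A phantom without a verified alternative is dropped entirely.
        else:
            raise ValueError(f"unknown status: {claim.status!r}")
    return rewritten
```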

Phase 5: Diff Report

Output a before/after diff showing:

Claims removed (phantoms killed)
Claims downgraded (estimated → caveated)
New claims added (from research findings)
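
One way to produce a before/after diff is Python's standard `difflib` module. This is a sketch, assuming the plan is available as a list of lines before and after the rewrite; `diff_report` and the `plan_before` / `plan_after` labels are hypothetical. A unified diff shows removed and added lines textually and does not by itself sort changes into the three categories above.

```python
import difflib


def diff_report(before: list[str], after: list[str]) -> str:
    """Return a unified diff between the original and the rewritten plan lines."""
    lines = difflib.unified_diff(
        before,
        after,
        fromfile="plan_before",
        tofile="plan_after",
        lineterm="",  # inputs carry no trailing newlines, so emit none either
    )
    return "\n".join(lines)


if __name__ == "__main__":
    before = ["Use +15pp delta as GREEN threshold", "Run evals overnight"]
    after = ["Run evals overnight"]
    print(diff_report(before, after))
```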

Example (from March 26 overnight)

PHANTOM killed: "PIER = Perceptual Information Error Rate" → actually "Point-of-Interest Error Rate" (code-switching only). Worker would have built wrong eval.

PHANTOM killed: "+15pp delta = GREEN threshold" → SkillsBench shows +4.5pp to +51.9pp range. No single threshold works. Worker would have failed all evals.

PHANTOM killed: "Meta-prompting improves code generation" → code-first-then-explain outperforms by 9.86%. Would have used wrong prompting strategy.

NEVER

Skip this for overnight/multi-agent sprints
Trust quantitative claims without source citations
Proceed with PHANTOM claims — kill them or verify them


Extract and validate assumptions from multi-agent sprint plans. Generates research prompts, flags unverified claims, rewrites plan. Triggers on: 'validate plan', 'check assumptions', 'plan-validate'. NOT for: single-task plans (overkill), runtime debugging, or code review.

2 stars

development

Updated Apr 17, 2026

Install this skill globally with one command:

npx skillsauth add etanhey/golems plan-validate

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 17, 2026, 12:15 PM. Duration: 14.2s. 2 files scanned.


Related Skills

etanhey/phoenix-human-view

tools

Verified · Trusted · Community

The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).

3 · SKILL.md · Updated Jun 7, 2026

etanhey/mac-systems

tools

Verified · Trusted · Community

macOS systems specialist — AppKit NSPanel architecture, launchd services, socket activation, MCP bridge resilience, syspolicyd, and high-frequency SwiftUI dashboards. Use when building menu-bar apps, LaunchAgents, debugging syspolicyd/Gatekeeper/TCC, resilient UDS/MCP bridges, or SwiftUI dashboards at 10Hz+.

3 · SKILL.md · Updated Jun 7, 2026

etanhey/judge-fleet

development

Verified · Trusted · Community

Bulk LLM-judging protocol for fleet-dispatched verdict runs (KG cluster, eval harness). Use when: dispatching or running judge workers (J1/J2/RT), planning bulk-apply from verdict JSONL, or triaging evidence_degraded outputs. Triggers: judge fleet, bulk judge, R3 verdicts, kg-judge, RT gate, evidence_degraded. NOT for: single-item code review, Phoenix view UX (use phoenix-human-view), or non-judge eval pipelines.

3 · SKILL.md · Updated Jun 7, 2026

etanhey/fleet-wrap

development

Verified · Trusted · Community

Quiet-down protocol for sprint close: when the fleet wraps, delete ALL polling crons and monitors, send ONE final dashboard + ONE message, then go SILENT. Use when: fleet wraps, all workers done, overnight queue exhausted, sprint close, Etan asleep/away with nothing approved left. Triggers: fleet wrap, wrap the fleet, stand down, going quiet, sprint close. NOT for: mid-sprint monitoring (keep your loops), spawning a successor (use /session-handoff first).

3 · SKILL.md · Updated Jun 7, 2026


Install manually


# Clone the repo
git clone https://github.com/etanhey/golems.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the skills folder
cp -r golems/skills/golem-powers/plan-validate ~/.claude/skills/


Repository

etanhey/golems

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT