skills/golem-powers/_archive/research-prompt-quality/SKILL.md
Mandatory pre-flight gate before any deep-research prompt ships. Three gates: CHECK-FIRST (non-redundancy), GROUND (Drive refs + current-usage examples + prior-research stance), emit-only-if-pass. Use when writing deep research prompts, Claude Desktop research prompts, deciding should we research, or proposing research. Triggers: 'deep research', 'research prompt', 'should we research', 'propose research'. NOT for executing research — use /research, /claude-desktop-research, or /gemini-research.
npx skillsauth add etanhey/golems research-prompt-qualityInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Stop flat, redundant deep-research prompts before they ship. Research execution skills run after this gate passes.
Run before writing or pasting any deep-research prompt (Claude Desktop, Gemini Deep Research, or /research --deep). If the gate fails, do not emit a research prompt — output the stop reason and route to engineering or plan work instead.
Search existing work before proposing new research.
skills/golem-powers/research-prompt-quality/scripts/check-first.sh "<topic keywords>"
Sources scanned:
Brain Drive/03_RESEARCH/ (when mounted)~/Gits/*/docs.local/research/~/Gits/*/docs.local/plans/~/Gits/*/docs.local/decisions/On exit 1: print ALREADY RESEARCHED → <paths> and STOP. This is engineering / plan execution, not research.
Canonical failure: proposing "RRF ranking" deep-research when ≥6 prior artifacts already exist (see evals/fixtures/neg-2-redundant-rrf.md).
Every research prompt MUST include:
Brain Drive/03_RESEARCH/... paths (or documented folder IDs from boot grounding docs).Use references/ground-template.md as the required structure. Fold grounding bundles (e.g. MCL-RESEARCH-GROUNDING.md) into the prompt; do not treat the bundle as the prompt itself.
If CHECK-FIRST passes and GROUND is satisfied, emit the self-contained research prompt (reuse /claude-desktop-research format). Otherwise output only the stop message or grounding gap list.
1. CHECK-FIRST → scripts/check-first.sh "<keywords>"
2. If hits → STOP ("ALREADY DONE → paths → engineering")
3. Gather ground → read repo paths, Drive folders, prior research files
4. Draft → references/ground-template.md
5. Score → scripts/score-research-prompt.py <draft.md> (target ≥8/10)
6. Ship prompt → hand to /claude-desktop-research or /research
Run scripts/score-research-prompt.py on the draft. RED gate for eval fixtures: negative prompts ≤4/10, grounded prompts ≥8/10. See EVAL.md for rubric and fixture scores.
/research — run this gate before --deep or external research dispatch./claude-desktop-research — run this gate before writing the paste-ready prompt file.| Bad | Good |
|-----|------|
| Flat MCL prompt with no Drive refs or repo examples | Grounded prompt + MCL-RESEARCH-GROUNDING.md folded in |
| "Deep-research RRF fusion" when 6+ prior docs exist | ALREADY RESEARCHED → <paths> → engineering |
| Generic "multi-agent ecosystem" without send_input / cmux paths | Cite cmux/SKILL.md, orc/workflows/fact-propagation.md, etc. |
| Restart prior art from scratch | BUILD-ON / VALIDATE / REFUTE per prior file |
See EVAL.md — four real fixtures, literal scorer stdout, with-skill vs without-skill delta target ≥+30%.
tools
The human-eval UX contract for Phoenix views: turn-by-turn scrollable replay (not a scorecard), hide-but-copyable IDs, collapsed thinking, identity chips, tool filters, tiny frozen starter datasets, mark-wrong-in-thread, mobile-first. Use when: building or reviewing ANY Phoenix/eval view, annotation UI, session replay, or human-grading surface. Triggers: phoenix view, eval UI, annotation view, session replay, human eval UX, grading interface. NOT for: Phoenix data pipelines/ingest (capture scripts have their own specs).
tools
macOS systems specialist — AppKit NSPanel architecture, launchd services, socket activation, MCP bridge resilience, syspolicyd, and high-frequency SwiftUI dashboards. Use when building menu-bar apps, LaunchAgents, debugging syspolicyd/Gatekeeper/TCC, resilient UDS/MCP bridges, or SwiftUI dashboards at 10Hz+.
development
Bulk LLM-judging protocol for fleet-dispatched verdict runs (KG cluster, eval harness). Use when: dispatching or running judge workers (J1/J2/RT), planning bulk-apply from verdict JSONL, or triaging evidence_degraded outputs. Triggers: judge fleet, bulk judge, R3 verdicts, kg-judge, RT gate, evidence_degraded. NOT for: single-item code review, Phoenix view UX (use phoenix-human-view), or non-judge eval pipelines.
development
Quiet-down protocol for sprint close: when the fleet wraps, delete ALL polling crons and monitors, send ONE final dashboard + ONE message, then go SILENT. Use when: fleet wraps, all workers done, overnight queue exhausted, sprint close, Etan asleep/away with nothing approved left. Triggers: fleet wrap, wrap the fleet, stand down, going quiet, sprint close. NOT for: mid-sprint monitoring (keep your loops), spawning a successor (use /session-handoff first).