skills/golem-powers/_archive/test-plan/SKILL.md
Generate structured manual testing checklists from git diffs for QA review before merging PRs. Analyzes changed files and produces step-by-step testing instructions covering happy paths, edge cases, and regression checks. Output is a markdown checklist suitable for QA handoff or self-review. Use when preparing a PR for manual QA, creating a testing checklist for a feature branch, or documenting what needs manual verification before merge. Triggers on 'test plan', 'QA checklist', 'testing checklist', 'manual testing', 'QA review prep'. NOT for: writing automated tests (write those in code), AI code reviews (use coderabbit), or CI pipeline configuration.
npx skillsauth add etanhey/golems test-plan
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Analyze changes in the current Git branch and generate a manual testing checklist organized by page/feature.
The skill runs automatically on load. To compare against a different base branch, pass `--base`:
./scripts/generate.sh --base main
./scripts/generate.sh --base dev
./scripts/generate.sh --base origin/staging
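The source does not show the contents of `scripts/generate.sh`, so the following is only a minimal sketch of how a script like it might read the `--base` flag and collect the changed files. The function names `parse_base` and `changed_files` and the fallback base `main` are assumptions made for illustration, not details taken from the skill.

```shell
# Hypothetical sketch; the real scripts/generate.sh is not shown in the source.

# Parse an optional "--base <ref>" argument.
# Falls back to "main", which is an assumed default.
parse_base() {
  local base="main"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --base)
        if [ "$#" -lt 2 ]; then
          echo "error: --base needs a value" >&2
          return 1
        fi
        base="$2"
        shift 2
        ;;
      *)
        # Ignore arguments this sketch does not know about.
        shift
        ;;
    esac
  done
  printf '%s\n' "$base"
}

# List files changed on the current branch since it diverged from <ref>.
# The three-dot form diffs against the merge base rather than the tip of <ref>.
changed_files() {
  git diff --name-only "$1"...HEAD
}
```

Inside a Git checkout, `changed_files "$(parse_base "$@")"` would then print one changed path per line, ready to be grouped into checklist sections.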
## Test Plan
### [Feature/Component Name]
- [ ] Test: Description of what to verify
- [ ] Test: Another thing to check
### API Changes
- [ ] Test: Verify endpoint returns expected shape
- [ ] Test: Error responses have correct status codes
### Database/Schema
- [ ] Test: Verify migrations run cleanly
- [ ] Test: Data integrity after changes
### Configuration
- [ ] Test: Verify env vars are documented
- [ ] Test: Config changes don't break existing deploys
### General
- [ ] No console errors during testing
- [ ] No TypeScript/build errors
- [ ] Mobile responsive (if UI changes)
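
One way the sections of this template could be filled in is to map each changed path to a section and emit one checklist item per file. The sketch below assumes that approach. The path patterns, the function names `section_for` and `render_plan`, and the fallback to "General" are all hypothetical, and the skill's actual grouping logic may differ.

```shell
# Hypothetical path-to-section mapping; every pattern here is an assumption.
section_for() {
  case "$1" in
    *migrations/*|*.sql|*schema*) echo "Database/Schema" ;;
    */api/*|api/*)                echo "API Changes" ;;
    *.env*|*.yml|*.yaml|*.toml)   echo "Configuration" ;;
    *)                            echo "General" ;;
  esac
}

# Read changed paths on stdin and print a markdown checklist grouped by section.
render_plan() {
  local path
  echo "## Test Plan"
  while IFS= read -r path; do
    # Skip blank lines in the input.
    [ -n "$path" ] || continue
    # Emit "section<TAB>path" so sorting groups paths by section.
    printf '%s\t%s\n' "$(section_for "$path")" "$path"
  done | LC_ALL=C sort -u | awk -F'\t' '
    # Print a heading whenever the section changes.
    $1 != current { current = $1; printf "### %s\n", current }
    { printf "- [ ] Test: Verify changes in %s\n", $2 }
  '
}
```

Piping the output of `git diff --name-only` into `render_plan` would produce a draft in the shape of the template above. The per-file items are placeholders that a reviewer would still rewrite into concrete test steps.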
Run this skill before creating a PR to generate the test plan section for your PR description.