plugins/game-dev/skills/playtest-design/SKILL.md
Question generation for playtests, what to observe vs. ask, metrics to track, and how to interpret playtest data without confirmation bias. Use when planning a playtest session, designing a feedback survey, setting up analytics, or when you have playtest data and need to make decisions from it.
Purpose: Get useful signal from playtests. Most playtest sessions are wasted — observers confirm what they already believe, ask leading questions, and draw conclusions from noise. This skill provides structured methods to avoid those traps.
Influences: Frameworks here draw on cognitive UX research methodology, metrics-driven iterative design practice, and experience engineering theory (emergent behavior observation, planning under uncertainty).
Use this skill when:
Players are reliable reporters of their experience (what they felt) but unreliable reporters of causes (why they felt it). Design your process accordingly.
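| Reliability | What the player reports | Kind of evidence |
|-------------|-------------------------|------------------|
| Most reliable | What they did | Behavior |
| In between | What they felt | Experience |
| Least reliable | Why they think they felt it | Attribution |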
Hierarchy of evidence:
Players attributing frustration to "bad controls" might actually be experiencing a perception failure (they couldn't see the indicator) or a pacing problem (too many new concepts at once). Use behavior to diagnose; use self-report to locate.
Generate questions along the perception → attention → memory pipeline:
Perception Questions (Did they see it?)
Attention Questions (Did they focus on the right thing?)
Memory Questions (Will they retain it?)
| Dev Stage | Focus | Key Questions |
|-----------|-------|---------------|
| Prototype | Core loop viability | Is the core action inherently interesting? Do they want to do it again? |
| Alpha | System comprehension | Do they understand the rules? Can they make intentional decisions? |
| Beta | Pacing and polish | Does the session arc feel right? Where do they get bored or frustrated? |
| Pre-launch | Edge cases and balance | What breaks? What's exploitable? What did we miss? |
| Observable | What It Tells You |
|------------|-------------------|
| First action | What the UI communicates as "start here" |
| Hesitation points | Where clarity fails or cognitive load spikes |
| Repeated failures | Where difficulty exceeds skill (or UI is misleading) |
| Where they look | What's grabbing attention (intended or not) |
| Body language | Leaning in = engaged; leaning back = disengaged; fidgeting = frustrated |
| Utterances | Unprompted comments ("what?", "oh!", "come on") are gold |
| Where they quit | The most valuable data point you'll collect |
| What they skip | Content they ignore reveals priority mismatches |
| Metric | What It Measures | Warning Signal |
|--------|-----------------|----------------|
| Session length | Engagement | Bimodal distribution (some quit fast, some stay long) |
| Quit points | Pain points | Cluster of quits at same location/moment |
| Completion rate | Difficulty/clarity | < 70% on intended-critical-path content |
| Time per section | Pacing | Sections taking 2x+ longer than designed |
| Death/failure rate | Difficulty curve | Spike = wall; zero = too easy |
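The warning signals in the table above can be checked mechanically once sessions are logged. The following is a minimal sketch, not a prescribed implementation: the session dictionary keys (`completed`, `quit_at`), the function names, and the `min_share` clustering threshold are all hypothetical, since the table names no cutoff for what counts as a "cluster" of quits. The 70% and 2x thresholds come from the table.

```python
from collections import Counter

def completion_rate(sessions):
    """Fraction of sessions that finished the intended critical path."""
    return sum(1 for s in sessions if s["completed"]) / len(sessions)

def slow_sections(actual_minutes, designed_minutes, factor=2.0):
    """Sections whose observed time is at least `factor` times the designed time."""
    return sorted(
        name
        for name, actual in actual_minutes.items()
        if actual >= factor * designed_minutes[name]
    )

def quit_clusters(sessions, min_share=0.5):
    """Quit locations that account for at least `min_share` of all quits.

    `min_share` is an assumed threshold for illustration only.
    """
    quits = [s["quit_at"] for s in sessions if s["quit_at"] is not None]
    if not quits:
        return []
    counts = Counter(quits)
    return sorted(loc for loc, n in counts.items() if n / len(quits) >= min_share)
```

A completion rate below 0.70, any entry returned by `slow_sections`, or any entry returned by `quit_clusters` would each correspond to one warning row in the table. With only a handful of sessions these numbers are noisy, so treat them as prompts for a closer look rather than verdicts.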
| Metric | What It Measures | Warning Signal |
|--------|-----------------|----------------|
| Pick rate by option | Strategy diversity | One option > 50% pick rate |
| Win rate by strategy | Balance | Any strategy > 55% win rate at comparable skill |
| Average game/match length | Pacing | Games consistently shorter or longer than intended |
| Resource accumulation rate | Economy health | Exponential growth = inflation incoming |
| Strategy churn | Meta health | If dominant strategy shifts too fast, balance is noisy |
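The pick-rate and win-rate rows above reduce to simple ratio checks. The sketch below assumes the match results have already been filtered to players of comparable skill, as the table requires. The function names, the input shapes, and the `min_games` sample-size guard are hypothetical additions; only the 50% and 55% thresholds come from the table.

```python
def dominant_options(pick_counts, max_share=0.50):
    """Options picked in more than `max_share` of all picks."""
    total = sum(pick_counts.values())
    return sorted(opt for opt, n in pick_counts.items() if n / total > max_share)

def overperforming_strategies(results, max_win_rate=0.55, min_games=20):
    """Strategies whose win rate exceeds `max_win_rate`.

    `results` maps strategy name -> (wins, games).
    `min_games` is an assumed guard so that a strategy with very few
    recorded games is not flagged on noise alone.
    """
    return sorted(
        strategy
        for strategy, (wins, games) in results.items()
        if games >= min_games and wins / games > max_win_rate
    )
```

For example, `dominant_options({"rush": 60, "turtle": 25, "econ": 15})` flags `rush`, because it accounts for 60 of 100 picks. Whether a flagged strategy is actually a balance problem still depends on observation and self-report.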
| Metric | What It Measures | Warning Signal |
|--------|-----------------|----------------|
| Time to first meaningful action | Onboarding quality | > 60 seconds before the player does something |
| Tutorial completion rate | Tutorial design | < 90% = tutorial is the problem, not the player |
| Hint/help usage | Clarity | High usage = UI isn't communicating; zero usage = help system is invisible |
| Error rate on intended actions | Usability | Player tries to do the right thing but fails due to UI |
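As a rough illustration, the first three rows of the table above could be evaluated over a batch of sessions like this. The session keys (`first_action_s`, `tutorial_done`, `hints`), the returned warning codes, the use of the median, and the `high_hints_per_session` cutoff are assumptions made for this sketch; the table specifies only the 60-second and 90% thresholds and does not define "high" hint usage.

```python
from statistics import median

def clarity_warnings(sessions, high_hints_per_session=3.0):
    """Return warning codes for onboarding and clarity signals.

    `high_hints_per_session` is an assumed cutoff for "high usage".
    """
    warnings = []

    # Median is used so one distracted player does not dominate the result.
    if median([s["first_action_s"] for s in sessions]) > 60:
        warnings.append("slow_first_action")

    done = sum(1 for s in sessions if s["tutorial_done"])
    if done / len(sessions) < 0.90:
        warnings.append("tutorial_completion")

    hints_per_session = sum(s["hints"] for s in sessions) / len(sessions)
    if hints_per_session == 0:
        warnings.append("help_invisible")       # nobody found or used help
    elif hints_per_session > high_hints_per_session:
        warnings.append("ui_not_communicating")  # players lean on help heavily

    return warnings
```

The error-rate row is left out because it needs a record of what the player intended, which usually comes from observation rather than telemetry.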
The biggest threat to useful playtest data is your own expectations.
Before the session:
After the session:
| Trap | Mechanism | Counter |
|------|-----------|---------|
| Anchoring | First session dominates your impression | Review all sessions before concluding |
| Availability | Dramatic moments overshadow quiet ones | Use metrics, not memory |
| Projection | Attributing your own experience to players | Watch what they do, not what you'd do |
| Sunk cost | Defending features you spent time on | Ask "would we add this today?" not "should we cut this?" |
| Survivorship | Only hearing from players who stayed | Track quit points with equal priority |
If you can only ask one question: "Tell me about a moment that stood out — good or bad."
Then follow up with: "What were you trying to do?" and "What happened next?"
| Signal | Confidence | Action |
|--------|------------|--------|
| Metrics + observation + self-report all agree | High | Act on it |
| Metrics show it, observation confirms, self-report disagrees | Moderate-High | Trust behavior over self-report |
| Self-report says it, but metrics/observation don't show it | Low | Investigate further; the report may point to a different real problem |
| Single session shows it, others don't | Very Low | Note it but don't act; one data point isn't a pattern |
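The table above works as a small decision procedure, which can be written down as a lookup. This is one possible encoding, with a hypothetical function name and parameters. It makes two choices the table does not state: the single-session row is checked first, and combinations the table does not cover return an explicit "Unclassified" result instead of a guess.

```python
def triage(metrics, observation, self_report, sessions_showing):
    """Map agreeing evidence sources to (confidence, action).

    `metrics`, `observation`, `self_report`: whether each source shows the issue.
    `sessions_showing`: how many sessions exhibited it.
    """
    if sessions_showing <= 1:
        # Assumed precedence: a lone session overrides everything else.
        return ("Very Low", "Note it but don't act")
    if metrics and observation and self_report:
        return ("High", "Act on it")
    if metrics and observation:
        return ("Moderate-High", "Trust behavior over self-report")
    if self_report and not (metrics or observation):
        return ("Low", "Investigate further")
    # Combinations the table does not list, e.g. metrics alone.
    return ("Unclassified", "Gather more data")
```

For instance, `triage(True, True, False, sessions_showing=4)` returns the "Moderate-High" row: behavior shows the problem in several sessions even though players do not report it.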
When you're building alone, you can't run traditional playtests during development. These techniques bridge the gap:
| Technique | How | What It Catches |
|-----------|-----|-----------------|
| The 2-week break | Play your own game after not touching it for 2 weeks | UX failures, forgotten controls, unclear objectives |
| The mute test | Play with sound off | Audio-dependent information, missing visual feedback |
| The squint test | Squint at the screen or reduce resolution | Visual clarity, contrast, UI readability |
| The record-and-review | Record gameplay, watch it the next day | Pacing problems, dead time, repetitive patterns |
| The explain test | Explain what you're doing out loud while playing | Logic gaps, unjustified assumptions, unclear goals |
| The wrong-hand test | Play with your non-dominant hand | Input complexity, timing windows, control accessibility |
When you're ready for external eyes (earlier than you think):
If you're a solo developer shipping updates: