claude/skills/playwright-testing/SKILL.md
Plan, implement, and debug frontend tests: unit/integration/E2E/visual/a11y. Use for Playwright MCP browser automation, Vitest/Jest/RTL, flaky test triage, CI stabilization, and canvas/WebGL games (Phaser) needing deterministic input plus screenshot/state assertions. Trigger: "test", "E2E", "flaky", "visual regression", "Playwright", "game testing".
Unlock reliable confidence fast: enable safe refactors by choosing the right test layer, making the app observable, and eliminating nondeterminism so failures are actionable.
Frontend tests fail for two reasons: the product is broken, or the test is lying. Your job is to maximize signal and minimize "test is lying".
Before writing a test, ask:
setTimeout?
Core principles:
Pick the cheapest layer that provides needed confidence:
| Layer | Speed | Use For |
|-------|-------|---------|
| Unit | Fastest | Pure functions, reducers, validators, math, pathfinding, deterministic simulation |
| Component | Medium | UI behavior with mocked IO (React Testing Library, Vue Testing Library) |
| E2E | Slowest | Critical user flows across routing, storage, real bundling/runtime |
| Visual | Specialized | Layout/pixel regressions; for canvas/WebGL, only after locking determinism |
Step-by-step sequence for testing a Phaser/canvas game:
1. mcp__playwright__browser_navigate
→ http://localhost:3000?test=1&seed=42
2. mcp__playwright__browser_evaluate
→ () => new Promise(r => { const c = () => window.__TEST__?.ready ? r(true) : setTimeout(c, 100); c(); })
(Wait for game ready)
3. mcp__playwright__browser_console_messages
→ level: "error"
(Fail if any errors)
4. mcp__playwright__browser_snapshot
→ Get UI state and refs
5. mcp__playwright__browser_click
→ element: "Start Button", ref: [from snapshot]
6. mcp__playwright__browser_evaluate
→ () => window.__TEST__.state()
(Assert game state is correct)
7. mcp__playwright__browser_press_key
→ key: "ArrowRight" (or WASD for movement)
8. mcp__playwright__browser_evaluate
→ () => window.__TEST__.state().player.x
(Verify movement happened)
9. mcp__playwright__browser_take_screenshot
→ filename: "gameplay-state.png"
(Visual evidence after deterministic setup)
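Step 8 reads `player.x` on its own, which only proves movement when it is compared with the value captured before the key press. A minimal sketch of that comparison, assuming the snapshot shape returned by `window.__TEST__.state()`; the helper name `assertMovedRight` and its `minDelta` parameter are hypothetical:

```javascript
// Hypothetical helper: compares two snapshots returned by window.__TEST__.state().
// Throws with both values in the message so a failure is actionable.
function assertMovedRight(before, after, minDelta = 1) {
  const delta = after.player.x - before.player.x;
  // Written as !(a >= b) so that NaN (missing fields) also fails.
  if (!(delta >= minDelta)) {
    throw new Error(
      `expected player.x to increase by at least ${minDelta}, ` +
        `but it went from ${before.player.x} to ${after.player.x}`
    );
  }
  return delta;
}

// Usage with snapshots captured in steps 6 and 8 (values here are made up).
const before = { player: { x: 100, y: 50, hp: 3 } };
const after = { player: { x: 104, y: 50, hp: 3 } };
console.log(assertMovedRight(before, after)); // prints 4
```

Capturing the full snapshot twice, rather than a single number, also lets the same test check that unrelated fields such as `hp` did not change.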
Add to the app for testability (read-only, stable, minimal):
window.__TEST__ = {
  ready: false, // set to true after the first interactive frame
  seed: null, // current RNG seed
  sceneKey: null, // current scene/route
  // JSON-serializable snapshot; `player`, `gameState`, and `entities` stand for the game's own objects
  state: () => ({
    scene: window.__TEST__.sceneKey, // not `this.sceneKey`: an arrow function does not bind `this` to the object
    player: { x: player.x, y: player.y, hp: player.hp },
    score: gameState.score,
    entities: entities.map(e => ({ id: e.id, type: e.type, x: e.x, y: e.y }))
  }),
  commands: { // optional mutation commands
    reset: () => {},
    seed: (n) => {},
    skipIntro: () => {}
  }
};
Rule: Expose IDs + essential fields, not raw Phaser/engine objects.
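The rule above can be sketched as a small factory that copies primitives out of the game instead of handing over engine objects. This is an illustration only: `createTestSeam` and the shape of the `game` argument are assumptions, and a real Phaser game would map its own scenes and sprites.

```javascript
// Hypothetical factory: builds the seam from a game object of an assumed shape.
function createTestSeam(game) {
  const seam = {
    ready: false,
    seed: null,
    sceneKey: null,
    // Copy primitives only, so the result survives JSON serialization
    // and never leaks engine objects to the test.
    state: () => ({
      scene: seam.sceneKey,
      player: { x: game.player.x, y: game.player.y, hp: game.player.hp },
      score: game.score,
      entities: game.entities.map(e => ({ id: e.id, type: e.type, x: e.x, y: e.y })),
    }),
  };
  return seam;
}

// Stand-in game with engine-only fields that must not appear in the snapshot.
const fakeGame = {
  player: { x: 10, y: 20, hp: 3, sprite: { texture: "engine-internal" } },
  score: 0,
  entities: [{ id: "e1", type: "coin", x: 5, y: 5, body: {} }],
};
const seam = createTestSeam(fakeGame);
seam.sceneKey = "Level1";
seam.ready = true;

console.log(JSON.stringify(seam.state().player)); // {"x":10,"y":20,"hp":3}
```

In the browser the result would be assigned once, for example `window.__TEST__ = createTestSeam(game)`, and `ready` flipped after the first interactive frame.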
❌ Testing the wrong layer: E2E tests for pure logic
Why tempting: "Let's just test everything through the browser"
Better: Unit tests for logic; reserve E2E for integration contracts
❌ Testing implementation details: Asserting DOM structure/classnames
Why tempting: Easy to assert what you can see in DevTools
Better: Assert user-meaningful outputs (text, score, HP changes)
❌ Sleep-driven tests: wait 2s then click
Why tempting: Simple and "works on my machine"
Better: Wait on explicit readiness (DOM marker, window.__TEST__.ready)
❌ Uncontrolled randomness: RNG/time in assertions
Why tempting: "The game uses random, so the test should too"
Better: Seed RNG (?seed=42), freeze time, assert stable invariants
❌ Pixel snapshots without determinism: Canvas screenshots that flake
Why tempting: "I'll catch visual bugs automatically"
Better: Deterministic mode first; then screenshot at known stable frames
❌ Retries as a strategy: "Just bump retries to 3"
Why tempting: Quick fix that makes CI green
Better: Fix the flake source; retries hide real problems
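For the uncontrolled-randomness case, one possible way to honor `?seed=42` is to build the game's random source from the URL. The sketch below uses mulberry32, a small public-domain PRNG chosen here for illustration; `seedFromUrl` and the fallback seed of 0 are assumptions, not something this skill prescribes.

```javascript
// mulberry32: a small 32-bit PRNG. The same seed always yields the same sequence.
function mulberry32(seed) {
  let a = seed | 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Reads ?test=1&seed=42 style URLs. Returns null when test mode is off,
// so production keeps its normal randomness.
function seedFromUrl(href) {
  const params = new URL(href).searchParams;
  if (params.get("test") !== "1") return null;
  const seed = Number.parseInt(params.get("seed") ?? "", 10);
  return Number.isNaN(seed) ? 0 : seed; // fallback seed 0 is an arbitrary choice
}

const seed = seedFromUrl("http://localhost:3000?test=1&seed=42");
const rngA = mulberry32(seed);
const rngB = mulberry32(seed);
// Two generators built from the same seed stay in lockstep.
console.log(rngA() === rngB() && rngA() === rngB()); // true
```

The game would then call the seeded generator wherever it currently calls `Math.random()`, so that a given seed replays the same spawn positions and drops.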
When a test fails, gather evidence in this order:
1. mcp__playwright__browser_console_messages({ level: "error" })
2. mcp__playwright__browser_network_requests() → check for non-2xx
3. mcp__playwright__browser_take_screenshot() → visual state at failure
4. mcp__playwright__browser_evaluate({ function: "() => window.__TEST__.state()" })
Minimum viable test suite:
- window.__TEST__ with ready flag and state)
- ?test=1 enables seeding)
Level up when:
For pixel comparison of screenshots:
# Compare baseline to current
python scripts/imgdiff.py baseline.png current.png --out diff.png
# Allow small tolerance (anti-aliasing differences)
python scripts/imgdiff.py baseline.png current.png --max-rms 2.0
Exit codes: 0 = identical, 1 = different, 2 = error
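`scripts/imgdiff.py` itself is not shown here, so the following is only a sketch of what an RMS tolerance check typically computes, not that script's implementation. It assumes two equally sized arrays of 8-bit channel values; `rmsDiff` and `compare` are hypothetical names, and treating "within tolerance" as exit code 0 is an assumption.

```javascript
// Root-mean-square difference over raw channel values (0-255).
function rmsDiff(a, b) {
  if (a.length !== b.length) {
    throw new Error("images must have the same dimensions");
  }
  if (a.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sumSquares += d * d;
  }
  return Math.sqrt(sumSquares / a.length);
}

// Mirrors the documented exit codes: 0 = identical (or, by assumption,
// within tolerance), 1 = different, 2 = error.
function compare(a, b, maxRms = 0) {
  try {
    return rmsDiff(a, b) <= maxRms ? 0 : 1;
  } catch {
    return 2;
  }
}

const baseline = Uint8Array.from([10, 10, 10, 255]);
const current = Uint8Array.from([12, 10, 10, 255]); // one channel off by 2
console.log(rmsDiff(baseline, current)); // 1, i.e. sqrt(4 / 4)
console.log(compare(baseline, current, 2.0)); // 0
console.log(compare(baseline, current)); // 1
```

A single RMS figure averages over the whole image, so a small but important defect in a large screenshot can fall under the tolerance; cropping to the region of interest before comparing keeps the threshold meaningful.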
Canvas UI issues (panel seams, segmented ribbons, invisible HUD fills) are best caught with a dedicated UI harness instead of the full gameplay flow.
- test.html/scene that loads only the UI assets.
- window.__TEST__ with .commands.showTest(n) so Playwright can toggle each mode deterministically.
See references/phaser-canvas-testing.md for the deterministic setup + screenshot workflow.
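A harness of this kind might look like the sketch below. Only `.commands.showTest(n)` comes from the text above; `createUiHarness`, the `modes` array, and the mode names are hypothetical, and the `draw` callbacks stand in for whatever renders each UI case.

```javascript
// Hypothetical harness: each mode is a named draw function for one UI case.
function createUiHarness(modes) {
  const seam = {
    ready: false,
    mode: null,
    state: () => ({ mode: seam.mode, modeCount: modes.length }),
    commands: {
      // Switch to mode n and redraw; rejects out-of-range input loudly.
      showTest(n) {
        if (!Number.isInteger(n) || n < 0 || n >= modes.length) {
          throw new RangeError(`no UI test mode ${n}`);
        }
        seam.mode = modes[n].name;
        modes[n].draw();
        seam.ready = true;
      },
    },
  };
  return seam;
}

const drawn = [];
const harness = createUiHarness([
  { name: "panel-seams", draw: () => drawn.push("panel") },
  { name: "hud-fills", draw: () => drawn.push("hud") },
]);
harness.commands.showTest(1);
console.log(JSON.stringify(harness.state())); // {"mode":"hud-fills","modeCount":2}
```

Because each mode is addressed by index, a test can loop over `modeCount`, call `showTest(i)`, and take one screenshot per mode without touching gameplay.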
Adapt approach based on context:
- window.__TEST__.ready
- references/phaser-canvas-testing.md).
Read these when needed:
- references/playwright-mcp-cheatsheet.md: Detailed MCP tool patterns
- references/phaser-canvas-testing.md: Deterministic mode for Phaser games
- references/flake-reduction.md: Flake classification and fixes

You can make almost any frontend (including canvas/WebGL games) testable by adding a tiny, stable seam for readiness + state. One reliable smoke test is the foundation. Aim for tests that are boring to maintain: deterministic, explicit about readiness, and rich in failure evidence. The goal is confidence, not coverage numbers.