engineering-team/playwright-pro/skills/fix/SKILL.md
Fix failing or flaky Playwright tests. Use when user says "fix test", "flaky test", "test failing", "debug test", "test broken", "test passes sometimes", or "intermittent failure".
Diagnose and fix a Playwright test that fails or passes intermittently using a systematic taxonomy.
$ARGUMENTS contains:
- A test file path, e.g. e2e/login.spec.ts
- A description of the problem, e.g. "the checkout test fails in CI but passes locally"

Run the test to capture the error:
npx playwright test <file> --reporter=list
If the test passes, it's likely flaky. Run burn-in:
npx playwright test <file> --repeat-each=10 --reporter=list
If it still passes, try with parallel workers:
npx playwright test --fully-parallel --workers=4 --repeat-each=5
Run with full tracing:
npx playwright test <file> --trace=on --retries=0
Read the trace output. Use /debug to analyze trace files if available.
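The reproduction steps above escalate from a single run, to a burn-in, to a parallel run, to a traced run. A minimal sketch of that ladder follows, assuming a hypothetical helper `reproductionCommands` (not part of Playwright or of this skill). It only builds the command strings shown above and leaves running them to the caller.

```typescript
// Hypothetical helper: lists the reproduction commands from the steps above, in order.
// It only formats strings; it does not execute anything.
type ReproStep = { purpose: string; command: string };

function reproductionCommands(file: string): ReproStep[] {
  const base = `npx playwright test ${file}`;
  return [
    { purpose: "capture the error", command: `${base} --reporter=list` },
    { purpose: "burn-in for flakiness", command: `${base} --repeat-each=10 --reporter=list` },
    {
      purpose: "stress with parallel workers",
      command: "npx playwright test --fully-parallel --workers=4 --repeat-each=5",
    },
    { purpose: "record a full trace", command: `${base} --trace=on --retries=0` },
  ];
}

for (const step of reproductionCommands("e2e/login.spec.ts")) {
  console.log(`${step.purpose}: ${step.command}`);
}
```

Stop at the first command that reproduces the failure; the later ones are only needed when the earlier ones pass.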
Load flaky-taxonomy.md from this skill directory.
Every failing test falls into one of four categories:
| Category | Symptom | Diagnosis |
|---|---|---|
| Timing/Async | Fails intermittently everywhere | --repeat-each=20 reproduces locally |
| Test Isolation | Fails in suite, passes alone | --workers=1 --grep "test name" passes |
| Environment | Fails in CI, passes locally | Compare CI vs local screenshots/traces |
| Infrastructure | Random, no pattern | Error references browser internals |
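The table can be read as a decision procedure. The sketch below encodes it as a pure function, assuming a hypothetical `Observations` record whose field names are illustrative and not a Playwright API. The order in which the checks run is also an assumption; the table itself does not rank the categories.

```typescript
// Hypothetical observation record gathered while reproducing the failure.
type Observations = {
  failsWithRepeatEachLocally: boolean; // --repeat-each=20 reproduces the failure locally
  failsInSuitePassesAlone: boolean; // --workers=1 --grep "test name" passes, the suite run fails
  failsOnlyInCi: boolean; // fails in CI, passes locally
  errorMentionsBrowserInternals: boolean; // error references browser internals
};

type Category =
  | "Timing/Async"
  | "Test Isolation"
  | "Environment"
  | "Infrastructure"
  | "Unclassified";

// Assumed ordering: check the most specific signals first.
function classify(o: Observations): Category {
  if (o.errorMentionsBrowserInternals) return "Infrastructure";
  if (o.failsOnlyInCi) return "Environment";
  if (o.failsInSuitePassesAlone) return "Test Isolation";
  if (o.failsWithRepeatEachLocally) return "Timing/Async";
  return "Unclassified";
}
```

An "Unclassified" result means none of the four diagnoses matched yet, so more reproduction runs or a trace are needed before choosing a fix.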
Timing/Async:
- Replace waitForTimeout() with web-first assertions
- Add await to missing Playwright calls
- Assert toBeVisible() before interacting with elements

Test Isolation:

Environment:
- Use docker locally to match CI environment

Infrastructure:
- Add retries (retries: 2)

Run the test 10 times to confirm stability:
npx playwright test <file> --repeat-each=10 --reporter=list
All 10 must pass. If any fail, go back to step 3.
Suggest:
- Adding retries: 2 if not already set
- Setting trace: 'on-first-retry' in config
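Both suggestions are set in the Playwright config file. A minimal sketch, assuming the project uses a `playwright.config.ts` at its root and has no other settings that need to be preserved:

```typescript
// playwright.config.ts (assumed location; merge into the existing config if one exists)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry a failed test up to two more times before reporting it as failed.
  retries: 2,
  use: {
    // Record a trace only on the first retry of a failed test.
    trace: 'on-first-retry',
  },
});
```

With this pairing, a test that fails once and then passes still leaves a trace from the retry, which gives the next diagnosis something to start from.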