Comprehensive E2E testing skill using Playwright MCP for systematic web application testing. This skill should be used when users need to test web-based systems end to end, set up test regimes, run exploratory tests, or analyze test history. It triggers on requests like "test my webapp", "set up E2E tests", "run the tests", or "what's been flaky", or when validating web application functionality. The skill observes and reports only; it never fixes issues. It supports three modes: setup (create a test regime), run (execute tests), and report (analyze results).
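A comprehensive E2E testing skill using Playwright MCP for systematic testing of any web-based system. The skill:

- Creates test regimes through interactive discovery (setup mode)
- Executes tests sequentially with full evidence capture (run mode)
- Generates actionable reports from test results and history (report mode)
- Observes and reports only; it never fixes issues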
Before using this skill, verify Playwright MCP is available:
Check for `playwright` in the MCP server configuration:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```
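Below is a minimal sketch of this check. It assumes the configuration has already been read into a string, because where the file lives depends on the MCP client. `has_playwright_mcp` is a hypothetical helper and is not part of Playwright MCP.

```python
import json


def has_playwright_mcp(config_text: str) -> bool:
    """Return True if an MCP config JSON declares a server named 'playwright'."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        return False  # unreadable config: treat Playwright MCP as unavailable
    if not isinstance(config, dict):
        return False
    servers = config.get("mcpServers", {})
    return isinstance(servers, dict) and "playwright" in servers


sample = '{"mcpServers": {"playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}}}'
print(has_playwright_mcp(sample))  # True
```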
If Playwright MCP is unavailable, inform the user and provide setup instructions before proceeding.
This skill operates in three modes. Determine mode from user request:

| User Request | Mode |
|--------------|------|
| "Set up tests for...", "Create test regime" | Setup |
| "Run the tests", "Test the...", "Execute tests" | Run |
| "Show test results", "What failed?", "What's flaky?" | Report |
If unclear, ask: "Would you like to set up a test regime, run existing tests, or view reports?"
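The mode table can be roughly approximated with phrase matching. This is a hypothetical sketch. The trigger phrases come from the table, and an unmatched request returns `None`, which corresponds to asking the clarifying question above.

```python
from typing import Optional

# Trigger phrases taken from the mode table; matching is a naive,
# case-insensitive substring check, tried in the order listed.
MODE_TRIGGERS = {
    "setup": ("set up tests", "create test regime"),
    "run": ("run the tests", "test the", "execute tests"),
    "report": ("show test results", "what failed", "what's flaky"),
}


def detect_mode(request: str) -> Optional[str]:
    """Return 'setup', 'run', or 'report', or None when the request is ambiguous."""
    text = request.lower()
    for mode, phrases in MODE_TRIGGERS.items():
        if any(phrase in text for phrase in phrases):
            return mode
    return None
```

Substring matching is crude. For example, "test the" also matches inside "latest theme", so in practice the agent's own reading of the request should take precedence.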
Setup mode purpose: create or update a test regime through interactive discovery.
Determine entry point from user context:

| Context | Entry |
|---------|-------|
| User provides URL | URL Exploration |
| User describes system purpose | Description-Based |
| User points to documentation | Documentation Extraction |
| Combination of above | Combined Flow (recommended) |
Ask for any missing information:
Use Playwright MCP to explore:
1. Navigate to the base URL
2. Capture an accessibility snapshot
3. Identify:
   - Navigation elements (menus, links)
   - Interactive elements (buttons, forms)
   - Key pages and sections
For each discovered element, note:
While exploring, actively look for:
Document discoveries as: "Found alternative: [description]"
For each key workflow, create a scenario with:
```yaml
scenario: [Descriptive name]
description: [What this tests]
preconditions:
  - [Required state before test]
blocking: [true/false - does failure prevent other tests?]
steps:
  - action: [navigate/click/type/verify/wait]
    target: [selector or description]
    value: [input value if applicable]
    flexibility:
      type: [exact/contains/ai_judgment]
      criteria: [specific rules or judgment prompt]
success_criteria:
  - [What must be true for pass]
alternatives:
  - [Alternative path if primary fails]
```
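The following is a sketch of a structural check for one scenario. It assumes the YAML has already been parsed into a Python dict. `validate_scenario` is hypothetical, and so is the choice of required keys; neither is a published schema.

```python
from typing import Any

# Assumed minimum fields; not taken from an official schema.
REQUIRED_KEYS = ("scenario", "description", "steps", "success_criteria")
ACTIONS = {"navigate", "click", "type", "verify", "wait"}
FLEX_TYPES = {"exact", "contains", "ai_judgment"}


def validate_scenario(scenario: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one parsed scenario (empty if none)."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in scenario]
    if not isinstance(scenario.get("blocking", False), bool):
        errors.append("blocking must be true or false")
    for i, step in enumerate(scenario.get("steps", []), start=1):
        action = step.get("action")
        if action not in ACTIONS:
            errors.append(f"step {i}: unknown action {action!r}")
        flexibility = step.get("flexibility")
        if flexibility is not None and flexibility.get("type") not in FLEX_TYPES:
            errors.append(f"step {i}: unknown flexibility type {flexibility.get('type')!r}")
    return errors
```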
Write the regime to `tests/e2e/test_regime.yml`:
```yaml
# Test Regime: [Application Name]
# Created: [YYYY-MM-DD]
# Last Updated: [YYYY-MM-DD]

metadata:
  application: [Name]
  base_url: [URL]
  description: [Purpose]

global_settings:
  screenshot_every_step: true
  capture_network: true
  capture_console: true
  discovery_cap: 5  # Max new paths to discover per run

blocking_dependencies:
  - scenario: login
    blocks: [profile, settings, checkout]  # These won't run if login fails

scenarios:
  - scenario: [name]
    # ... scenario definition
```
Show the user:
Run mode purpose: execute tests sequentially with full evidence capture.
Principle: No Invalid Skips
A test should have only three outcomes:

| Status | Meaning |
|--------|---------|
| PASSED | The feature works as specified |
| FAILED | The feature doesn't work or doesn't exist |
| SKIPPED | Only for legitimate environmental reasons (see below) |
Legitimate skip reasons include:

- A `@skip` decorator for documented WIP features, with a ticket reference

Situations that must be marked FAILED, never SKIPPED:

| Situation | Correct Status | Notes Format |
|-----------|----------------|--------------|
| Feature doesn't exist in UI | FAILED | "Expected [feature] not found. Feature not implemented." |
| Test wasn't executed/completed | FAILED | "Test not executed. [What wasn't verified]." |
| Test would fail | FAILED | That's the point of testing |
| "Didn't get around to it" | FAILED | Incomplete test coverage is a failure |
| Feature works differently than spec | FAILED | "Implementation doesn't match specification: [details]" |
The purpose of a test is to fail when something doesn't work. Marking missing features or unexecuted tests as "skipped" produces artificially inflated pass rates and hides real issues. A test report that only shows green isn't useful if it achieved that by ignoring problems.
When a test cannot find the expected UI element or feature:
Status: FAILED
Notes: "FAILED: Expected 'Add to Cart' button not found. Feature not implemented or selector changed."
When a test is not fully executed:
Status: FAILED
Notes: "FAILED: Test not executed. Checkout flow verification was not completed - stopped at payment step."
When environment is genuinely unavailable (valid skip):
Status: SKIPPED
Notes: "SKIPPED: Payment gateway sandbox unavailable. Ticket: PAY-123"
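These rules can be expressed as a single classification function. This is a hypothetical sketch, and `classify` and its parameters are illustrative. SKIPPED is only reachable when the environment is unavailable *and* a ticket is supplied. Every other non-pass becomes FAILED.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    status: str  # "PASSED", "FAILED", or "SKIPPED"
    notes: str


def classify(*, feature_found: bool, executed_fully: bool, passed: bool,
             env_unavailable: Optional[str] = None,
             ticket: Optional[str] = None) -> Outcome:
    """Apply the 'no invalid skips' rule to one test's observations."""
    if env_unavailable:
        if ticket:
            # The only legitimate skip: environment down, with a ticket reference.
            return Outcome("SKIPPED", f"SKIPPED: {env_unavailable}. Ticket: {ticket}")
        return Outcome("FAILED", f"FAILED: Test not executed. {env_unavailable} (no ticket, so not a valid skip).")
    if not feature_found:
        return Outcome("FAILED", "FAILED: Expected feature not found. Feature not implemented or selector changed.")
    if not executed_fully:
        return Outcome("FAILED", "FAILED: Test not executed to completion.")
    if not passed:
        return Outcome("FAILED", "FAILED: Feature does not work as specified.")
    return Outcome("PASSED", "")
```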
1. **Verify the regime exists**: check for `tests/e2e/test_regime.yml`
2. **Load history**: check for `tests/e2e/test_history.json`
3. **Verify Playwright MCP**: confirm browser automation is available
Every test run starts fresh from Step 1 of Scenario 1. Never skip steps or use cached state.
Execute scenarios in order. For each scenario:
```
1. Check preconditions
2. Execute each step:
   a. Perform action via Playwright MCP
   b. Capture screenshot
   c. Capture DOM state
   d. Capture network activity
   e. Capture console logs
   f. Evaluate success using flexibility criteria
3. Record result (pass/fail/blocked/skipped)
   - PASS: Step completed successfully
   - FAIL: Step failed OR element not found OR feature missing
   - BLOCKED: Dependent on a failed blocking scenario
   - SKIPPED: Only for valid environmental reasons (see Test Status Integrity)
4. If failed: Try alternatives if defined
5. If blocking failure: Stop dependent scenarios
```
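Blocked scenarios can be resolved from the regime's `blocking_dependencies` list. The sketch below assumes blocking is transitive: a scenario blocked by `login` never runs, so its own dependents are blocked too. `scenarios_to_block` is a hypothetical helper.

```python
def scenarios_to_block(failed: set[str], blocking_dependencies: list[dict]) -> set[str]:
    """Return every scenario that must be marked BLOCKED, following chains transitively."""
    blocks = {dep["scenario"]: dep["blocks"] for dep in blocking_dependencies}
    blocked: set[str] = set()
    frontier = list(failed)
    while frontier:
        name = frontier.pop()
        for dependent in blocks.get(name, []):
            if dependent not in blocked and dependent not in failed:
                blocked.add(dependent)
                # A blocked scenario never runs, so its dependents are blocked as well.
                frontier.append(dependent)
    return blocked


deps = [{"scenario": "login", "blocks": ["profile", "settings", "checkout"]},
        {"scenario": "checkout", "blocks": ["order-history"]}]
print(sorted(scenarios_to_block({"login"}, deps)))
# ['checkout', 'order-history', 'profile', 'settings']
```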
When a step fails:
While executing, watch for undocumented paths:
For discoveries:
- Stop discovering new paths once the `discovery_cap` limit is reached

For each success check, apply the configured flexibility type:
| Type | Evaluation Method |
|------|-------------------|
| exact | String/value must match exactly |
| contains | Target must contain specified text |
| ai_judgment | Use AI reasoning: "Does this accomplish [goal]?" |
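The first two types are mechanical. `ai_judgment` needs the model's own reasoning, which this sketch represents with a caller-supplied `judge` callback. All names here are hypothetical.

```python
from typing import Callable, Optional


def evaluate(flex_type: str, actual: str, expected: str,
             judge: Optional[Callable[[str, str], bool]] = None) -> bool:
    """Evaluate one success check according to its flexibility type."""
    if flex_type == "exact":
        return actual == expected
    if flex_type == "contains":
        return expected in actual
    if flex_type == "ai_judgment":
        if judge is None:
            raise ValueError("ai_judgment needs a judge callback")
        # judge(actual, goal) stands in for "Does this accomplish [goal]?"
        return judge(actual, expected)
    raise ValueError(f"unknown flexibility type: {flex_type}")
```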
For `ai_judgment`, provide a confidence level:
For each step, capture and store:
```
evidence/
  scenario-name/
    step-01/
      screenshot.png
      dom-snapshot.html
      network-log.json
      console-log.txt
      accessibility-snapshot.yaml
```
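Here is a small helper that builds these paths. It assumes steps are numbered from 1 and zero-padded to two digits, as in `step-01`. The helper name is hypothetical.

```python
from pathlib import PurePosixPath

EVIDENCE_FILES = ("screenshot.png", "dom-snapshot.html", "network-log.json",
                  "console-log.txt", "accessibility-snapshot.yaml")


def evidence_paths(scenario: str, step: int, root: str = "evidence") -> list[str]:
    """Return the evidence file paths for one step of one scenario."""
    step_dir = PurePosixPath(root) / scenario / f"step-{step:02d}"
    return [str(step_dir / name) for name in EVIDENCE_FILES]


print(evidence_paths("checkout", 3)[0])  # evidence/checkout/step-03/screenshot.png
```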
After the run completes:
Compare to previous runs:
Update the history file (`tests/e2e/test_history.json`):
```json
{
  "runs": [
    {
      "timestamp": "ISO-8601",
      "scenarios": {
        "scenario-name": {
          "result": "pass|fail|blocked|skipped",
          "result_notes": "Details about the result",
          "duration_ms": 1234,
          "steps_completed": 5,
          "confidence": "high|medium|low",
          "discoveries": []
        }
      }
    }
  ],
  "flaky_scenarios": ["scenario-1", "scenario-2"],
  "suggested_variations": [
    {
      "scenario": "login",
      "variation": "Test with special characters in password",
      "reason": "Failed 3/10 runs with complex passwords"
    }
  ]
}
```
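Flaky scenarios can be derived from the `runs` array. This is a heuristic sketch that makes two assumptions. First, runs are appended oldest first. Second, a scenario counts as flaky when it both passed and failed within the window. `blocked` and `skipped` results are ignored because they say nothing about the scenario itself. The function names are hypothetical.

```python
def pass_rates(history: dict, last_n: int = 10) -> dict[str, tuple[int, int]]:
    """Map scenario name -> (passes, pass-or-fail runs) over the last `last_n` runs."""
    counts: dict[str, tuple[int, int]] = {}
    for run in history.get("runs", [])[-last_n:]:
        for name, outcome in run.get("scenarios", {}).items():
            if outcome["result"] not in ("pass", "fail"):
                continue  # blocked/skipped runs say nothing about the scenario itself
            passes, total = counts.get(name, (0, 0))
            counts[name] = (passes + int(outcome["result"] == "pass"), total + 1)
    return counts


def flaky_scenarios(history: dict, last_n: int = 10) -> list[str]:
    """Scenarios that both passed and failed within the window."""
    return sorted(name for name, (passes, total) in pass_rates(history, last_n).items()
                  if 0 < passes < total)
```

A single recent flip from pass to fail also matches this rule, so treat the output as a shortlist rather than a verdict.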
Result status rules (see Test Status Integrity):
- `pass`: Feature works as specified
- `fail`: Feature doesn't work, doesn't exist, or the test is incomplete
- `blocked`: Depends on a failed blocking scenario
- `skipped`: ONLY for valid environmental reasons (with ticket reference)

Record suggested test variations under `suggested_variations` in the history file.

Report mode purpose: generate actionable reports from test results.
Generate both reports (human-readable Markdown and machine-readable JSON) after every run:
Output to `tests/e2e/reports/YYYY-MM-DD-HHmmss-report.md`:

```markdown
# E2E Test Report: [Application Name]

**Run Date**: YYYY-MM-DD HH:mm:ss
**Duration**: X minutes
**Result**: X passed, Y failed, Z blocked, W skipped

## Summary

| Scenario | Result | Duration | Confidence |
|----------|--------|----------|------------|
| Login | PASS | 2.3s | High |
| Checkout | FAIL | 5.1s | High |

## Failures

### Checkout Flow

**Step Failed**: Step 3 - Click "Complete Purchase"
**Error**: Button not found within timeout
**Evidence**:
- Screenshot: `evidence/checkout/step-03/screenshot.png`
- Expected: Button with text "Complete Purchase"
- Actual: Page showed error message "Session expired"

**Reproduction Steps**:
1. Navigate to https://app.example.com
2. Login with test credentials
3. Add item to cart
4. Click checkout
5. [FAILS HERE] Click "Complete Purchase"

**Suggested Investigation**:
- Session timeout may be too aggressive
- Check if login state persists through checkout flow

## Discoveries

Found 2 undocumented paths:
1. **Alternative checkout**: Guest checkout available via footer link
2. **Quick reorder**: "Buy again" button on order history

## Flaky Areas

Based on history (last 10 runs):
- `search-results`: 7/10 pass rate - timing issue suspected
- `image-upload`: 8/10 pass rate - file size variations

## Suggested New Tests

Based on failures and history:
1. Test session persistence during long checkout
2. Test guest checkout flow (discovered)
3. Add timeout resilience to search tests
```
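The summary table and the **Result** line can be generated from per-scenario results. The sketch assumes each result is a dict with `name`, `result`, `duration_ms`, and `confidence` keys, mirroring the JSON report fields. The helper names are hypothetical.

```python
from collections import Counter


def result_line(rows: list[dict]) -> str:
    """Build the Result line: counts per status, zero when a status is absent."""
    counts = Counter(row["result"] for row in rows)
    return (f"{counts['pass']} passed, {counts['fail']} failed, "
            f"{counts['blocked']} blocked, {counts['skipped']} skipped")


def summary_table(rows: list[dict]) -> str:
    """Render the Summary table in the report template's column layout."""
    lines = ["| Scenario | Result | Duration | Confidence |",
             "|----------|--------|----------|------------|"]
    for row in rows:
        seconds = row["duration_ms"] / 1000
        lines.append(f"| {row['name']} | {row['result'].upper()} | "
                     f"{seconds:.1f}s | {row['confidence'].capitalize()} |")
    return "\n".join(lines)


rows = [{"name": "Login", "result": "pass", "duration_ms": 2300, "confidence": "high"},
        {"name": "Checkout", "result": "fail", "duration_ms": 5100, "confidence": "high"}]
print(result_line(rows))  # 1 passed, 1 failed, 0 blocked, 0 skipped
```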
Output to `tests/e2e/reports/YYYY-MM-DD-HHmmss-report.json`:

```json
{
  "metadata": {
    "application": "App Name",
    "base_url": "https://...",
    "run_timestamp": "ISO-8601",
    "duration_ms": 123456,
    "regime_version": "hash-of-regime-file"
  },
  "summary": {
    "total": 10,
    "passed": 7,
    "failed": 2,
    "blocked": 1,
    "skipped": 0
  },
  "scenarios": [
    {
      "name": "checkout",
      "result": "fail",
      "duration_ms": 5100,
      "confidence": "high",
      "failed_step": {
        "index": 3,
        "action": "click",
        "target": "button:Complete Purchase",
        "error": "Element not found",
        "evidence_path": "evidence/checkout/step-03/"
      },
      "reproduction": {
        "playwright_commands": [
          "await page.goto('https://app.example.com')",
          "await page.fill('#username', 'test')",
          "await page.getByRole('button', { name: 'Login' }).click()",
          "await page.click('.add-to-cart')",
          "await page.getByRole('button', { name: 'Checkout' }).click()",
          "// FAILED: await page.getByRole('button', { name: 'Complete Purchase' }).click()"
        ]
      },
      "alternatives_tried": [
        {
          "path": "Use keyboard Enter instead of click",
          "result": "fail"
        }
      ]
    }
  ],
  "discoveries": [
    {
      "type": "alternative_path",
      "description": "Guest checkout via footer",
      "location": "footer > a.guest-checkout",
      "tested": true,
      "result": "pass"
    }
  ],
  "history_analysis": {
    "regressions": ["checkout"],
    "persistent_failures": [],
    "flaky": ["search-results", "image-upload"]
  },
  "suggested_actions": [
    {
      "type": "investigate",
      "scenario": "checkout",
      "reason": "New regression - passed in previous 5 runs"
    },
    {
      "type": "add_test",
      "scenario": "guest-checkout",
      "reason": "Discovered undocumented path"
    }
  ]
}
```
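The `history_analysis` split between regressions and persistent failures can be sketched as follows. It assumes a regression means "failed now, passed in the most recent prior run" and a persistent failure means "failed in every prior run". Both definitions and the helper name are assumptions.

```python
def classify_failures(current: dict[str, str],
                      previous_runs: list[dict[str, str]]) -> dict[str, list[str]]:
    """Split current failures into regressions and persistent failures.

    `current` maps scenario -> result for this run; `previous_runs` holds the
    same mapping for earlier runs, oldest first.
    """
    regressions, persistent = [], []
    for name, result in current.items():
        if result != "fail":
            continue
        prior = [run[name] for run in previous_runs if name in run]
        if prior and prior[-1] == "pass":
            regressions.append(name)  # passed last time: new regression
        elif prior and all(r == "fail" for r in prior):
            persistent.append(name)  # failed in every recorded prior run
    return {"regressions": sorted(regressions), "persistent_failures": sorted(persistent)}
```

A failing scenario with no recorded history lands in neither list, which leaves it for the report to flag as new.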
After generating reports:
Display a summary to the user:
Highlight actionable items:
Offer next steps:
Before completing any mode, verify:
- `test-regime-schema.md`: Complete YAML schema for test regime files
- `flexibility-criteria-guide.md`: How to define and evaluate flexible success criteria
- `history-schema.md`: JSON schema for test history tracking

Report templates are embedded in this skill. The machine-readable format is designed for consumption by a future bug-fix skill.