skills/qa-testing/SKILL.md
Perform QA testing on running web applications using cmux_browser. Covers functional testing, accessibility auditing, edge case testing, and structured reporting with evidence.

**Triggers (use this skill when):**

- User says "test this", "QA this", "QA this PR", "check the app"
- User says "run QA", "verify the build", "test the UI"
- User asks to "check if it works", "validate the feature"
- User says "acceptance testing", "smoke test", "regression test"
- User asks to "find bugs", "test for bugs", "break it"
- User wants to "check accessibility", "run axe", "a11y audit"

**Covers:** Any web application accessible via HTTP. Uses cmux_browser (Playwright under the hood) for all browser interactions.
Test running web applications by interacting as a real user via cmux_browser, then report findings with structured evidence.
If the app is already running, skip to Step 2.
If the app needs to be started:
```
# Start in a background cmux pane
cmux_split({ direction: "down", command: "cd /path/to/project && npm run dev\n" })

# Wait for it to be ready (check the pane output)
cmux_read({ surface: "surface:N" })

# Or poll the URL
cmux_browser({ action: "open", url: "http://localhost:3000" })
cmux_browser({ action: "wait", waitCondition: "load-state", loadState: "networkidle" })
```
If Docker is needed:
```bash
cd /path/to/project && docker compose up -d

# Wait for healthy
timeout 60 bash -c 'until curl -sf http://localhost:3000 > /dev/null; do sleep 2; done'
```
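If a script is driving the run instead of a shell, the same wait-until-ready loop can be sketched in JavaScript. This is a minimal sketch, not part of cmux: `waitForReady` and its `probe` option are hypothetical names, the default probe assumes Node 18+ (global `fetch`), and it only roughly mirrors `curl -sf` (for example, `fetch` follows redirects).

```javascript
// Hypothetical helper: poll until the probe reports the app is ready,
// or throw once the timeout would be exceeded.
async function waitForReady(url, { timeoutMs = 60000, intervalMs = 2000, probe } = {}) {
  // Default probe (assumption): a 2xx response means ready.
  const check = probe ?? (async () => {
    try {
      const res = await fetch(url);
      return res.ok;
    } catch {
      return false; // connection refused while the server is still starting
    }
  });

  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    if (await check()) return attempts; // number of probes it took
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`${url} not ready after ${timeoutMs} ms (${attempts} attempts)`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Calling `await waitForReady("http://localhost:3000")` before the smoke test gives the same 60-second budget and 2-second interval as the shell loop above.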
Verify the app loads at all before deeper testing.
```
# Open the app
cmux_browser({ action: "open", url: "http://localhost:3000" })

# Wait for it to load
cmux_browser({ action: "wait", waitCondition: "load-state", loadState: "networkidle" })

# Screenshot the initial state
cmux_browser({ action: "screenshot" })

# Check for JavaScript errors
cmux_browser({ action: "errors" })

# Check console output
cmux_browser({ action: "console" })

# Get a DOM snapshot (accessibility tree)
cmux_browser({ action: "snapshot" })
```
If the smoke test fails (page doesn't load, crash errors), stop and report immediately.
For each acceptance criterion, follow this pattern:
```
# 1. Navigate to the relevant state
cmux_browser({ action: "navigate", url: "/some-page" })
cmux_browser({ action: "wait", waitCondition: "load-state", loadState: "networkidle" })

# 2. Interact as a real user
cmux_browser({ action: "click", selector: "button.submit" })
cmux_browser({ action: "fill", selector: "input[name='email']", value: "user@example.com" })
cmux_browser({ action: "press", key: "Enter" })

# 3. Verify outcomes
cmux_browser({ action: "wait", waitCondition: "text", text: "Success" })
cmux_browser({ action: "snapshot", interactive: true })  # check element states
cmux_browser({ action: "get", selector: ".result", subaction: "textContent" })
cmux_browser({ action: "is", selector: ".modal", subaction: "visible" })

# 4. Screenshot as evidence
cmux_browser({ action: "screenshot" })

# 5. Check for errors after the interaction
cmux_browser({ action: "errors" })
```
Tips for reliable element selection:
- Use `snapshot` to see the accessibility tree and find the right selectors
- Use `find` with role for semantic targeting: `cmux_browser({ action: "find", subaction: "role", name: "Submit" })`
- Use `identify` to see interactive elements on the page
- Prefer `data-testid`, `aria-label`, or semantic selectors over fragile CSS paths

Be skeptical. Agents often ship code that works for the happy path but breaks on edge cases. Actively try to break things:
| Test | How | What to look for |
|------|-----|-----------------|
| Empty state | Clear all data, visit pages with no content | Crashes, blank screens, missing "no data" messages |
| Empty inputs | Submit forms with empty required fields | Missing validation, silent failures, crashes |
| Long text | Paste 500+ character strings into inputs | Overflow, layout breaking, truncation without indication |
| Special characters | Input <script>alert(1)</script>, emoji 🎉, Unicode ñ | XSS, encoding errors, display issues |
| Rapid clicks | Double-click submit buttons, rapidly toggle switches | Duplicate submissions, race conditions, broken state |
| Back button | Navigate forward through a flow, then press back | Lost state, stale data, errors |
| Refresh | F5 / reload mid-flow | Lost state, errors, unexpected redirects |
| Network errors | Disconnect WiFi / block API calls (if possible) | Missing error handling, infinite spinners, blank screens |
```
# Example: test empty form submission
cmux_browser({ action: "click", selector: "button[type='submit']" })
cmux_browser({ action: "screenshot" })
cmux_browser({ action: "errors" })

# Example: test long text
cmux_browser({ action: "fill", selector: "input[name='title']", value: "A".repeat(500) })
cmux_browser({ action: "screenshot" })

# Example: test special characters
cmux_browser({ action: "fill", selector: "input[name='name']", value: "<script>alert('xss')</script>" })
cmux_browser({ action: "click", selector: "button[type='submit']" })
cmux_browser({ action: "screenshot" })
cmux_browser({ action: "errors" })
```
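When the same edge cases need to run against several fields, it can help to keep the payloads in one place. The sketch below is illustrative only: `edgeCaseInputs` and `buildEdgeCasePlan` are hypothetical helpers, and the objects they return merely mirror the `cmux_browser` argument shapes used in the examples above; they do not call the tool.

```javascript
// Hypothetical helper: the input values from the edge-case table, in one list.
function edgeCaseInputs(longLength = 500) {
  return [
    { name: "empty", value: "" },
    { name: "long text", value: "A".repeat(longLength) },
    { name: "script tag", value: "<script>alert(1)</script>" },
    { name: "emoji", value: "🎉" },
    { name: "unicode", value: "ñ" },
  ];
}

// Hypothetical helper: for each payload, list the fill / submit /
// screenshot / errors sequence as plain argument objects.
function buildEdgeCasePlan(selector, submitSelector, inputs = edgeCaseInputs()) {
  return inputs.map(({ name, value }) => ({
    name,
    calls: [
      { action: "fill", selector, value },
      { action: "click", selector: submitSelector },
      { action: "screenshot" },
      { action: "errors" },
    ],
  }));
}
```

Each entry in the plan corresponds to one row of the "Edge Cases Tested" table in the report, so the `name` field can double as the row label.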
Inject axe-core and run a full accessibility audit:
```
# Inject axe-core library
cmux_browser({ action: "eval", value: `
  await new Promise((resolve, reject) => {
    const script = document.createElement('script');
    script.src = 'https://cdnjs.cloudflare.com/ajax/libs/axe-core/4.10.2/axe.min.js';
    script.onload = resolve;
    script.onerror = reject;
    document.head.appendChild(script);
  });
  const results = await axe.run();
  return JSON.stringify({
    violations: results.violations.map(v => ({
      id: v.id,
      impact: v.impact,
      description: v.description,
      helpUrl: v.helpUrl,
      nodes: v.nodes.length
    })),
    passes: results.passes.length,
    incomplete: results.incomplete.length,
    inapplicable: results.inapplicable.length
  }, null, 2);
` })
```
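The JSON summary returned by the audit above can be turned into report lines ordered by severity. This is a minimal sketch: `formatViolations` is a hypothetical helper, it assumes the summary has already been parsed with `JSON.parse`, and the line format is only an approximation of the "Details" entries in the report template.

```javascript
// axe-core impact levels, most severe first.
const IMPACT_ORDER = ["critical", "serious", "moderate", "minor"];

// Hypothetical helper: one report line per violated rule, worst first.
function formatViolations(summary) {
  const rank = (impact) => {
    const i = IMPACT_ORDER.indexOf(impact);
    return i === -1 ? IMPACT_ORDER.length : i; // unknown or null impact sorts last
  };
  return [...summary.violations]
    .sort((a, b) => rank(a.impact) - rank(b.impact))
    .map((v) => `- [${v.impact ?? "unknown"}] ${v.id}: ${v.nodes} element(s), ${v.description}`);
}
```

Sorting a copy keeps the original summary untouched, so the same object can still be used for scoring afterwards.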
CDN fallback: If the CDN is unreachable (air-gapped CI, network restriction, outage), the script injection will fail. In that case, fall back to:
- `npx axe-cli http://localhost:3000 --save axe-report.json` (CLI-based audit)

**Security note:** For production use, add Subresource Integrity (SRI) verification to the script tag: `script.integrity = "sha384-..."; script.crossOrigin = "anonymous";` (hash available on cdnjs.com).
Interpreting axe-core results:
Scoring accessibility:

| Violations | Score |
|-----------|-------|
| 0 violations | 10 |
| 1-3 minor | 8-9 |
| 1-3 serious | 6-7 |
| 4-10 mixed | 4-5 |
| 10+ or any critical | 2-3 |
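The scoring bands can be applied mechanically to the `violations` array from the audit. The sketch below is one possible reading of the table, with assumptions stated loudly: `accessibilityScoreRange` is a hypothetical helper, violations are counted per rule rather than per element, `moderate` is grouped with `minor`, and exactly 10 violations falls in the 4-5 band (the table's "4-10" and "10+" rows overlap there).

```javascript
// Hypothetical helper: map axe violations to a [low, high] score band.
function accessibilityScoreRange(violations) {
  const count = violations.length; // assumption: one entry per violated rule
  const hasCritical = violations.some((v) => v.impact === "critical");
  const hasSerious = violations.some((v) => v.impact === "serious");

  if (count === 0) return [10, 10];
  if (hasCritical || count > 10) return [2, 3]; // assumption: "10+" means more than 10
  if (count >= 4) return [4, 5];
  return hasSerious ? [6, 7] : [8, 9]; // 1-3 violations
}
```

The function returns a band rather than a single number because picking a score within the band remains a judgment call for the tester.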
Run Lighthouse for performance auditing when requested:
```bash
npx lighthouse http://localhost:3000 \
  --output=json \
  --output-path=./lighthouse-report.json \
  --chrome-flags="--headless --no-sandbox" \
  --only-categories=performance,accessibility,best-practices \
  --quiet
```
Then read and summarize the results:
```
read lighthouse-report.json
```
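For a scripted summary, the category scores can be pulled out of the report JSON. This is a minimal sketch: `summarizeLighthouse` and `loadLighthouseReport` are hypothetical helpers, and they rely on the Lighthouse result keeping each category under `categories` with a `title` and a `score` between 0 and 1 (or `null` when the category could not be scored).

```javascript
const fs = require("node:fs");

// Hypothetical helper: read and parse the report written by the command above.
function loadLighthouseReport(path = "./lighthouse-report.json") {
  return JSON.parse(fs.readFileSync(path, "utf8"));
}

// Hypothetical helper: one { category, score } pair per audited category,
// with scores converted from 0..1 to 0..100.
function summarizeLighthouse(report) {
  return Object.values(report.categories).map((category) => ({
    category: category.title,
    score: category.score === null ? null : Math.round(category.score * 100),
  }));
}
```

`summarizeLighthouse(loadLighthouseReport())` yields a short list that fits directly into the report's notes column.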
Produce a structured QA report. Always include:
If you started a dev server or Docker container in Step 1, clean up:
```
# Stop a dev server running in a cmux pane
cmux_close({ surface: "surface:N" })

# Or stop Docker containers
cd /path/to/project && docker compose down
```
This prevents orphaned processes and keeps the environment clean for the next test run.
# QA Report: [Feature/PR Name]
**Date:** YYYY-MM-DD
**App URL:** http://localhost:XXXX
**Tested by:** QA Agent
## Verdict: PASS ✅ / FAIL ❌
## Scores
| Dimension | Score | Notes |
|-----------|-------|-------|
| Functionality | X/10 | Brief justification |
| Completeness | X/10 | Brief justification |
| UX | X/10 | Brief justification |
| Robustness | X/10 | Brief justification |
| Accessibility | X/10 | N violations (N critical, N serious) |
**Average:** X.X/10
## Acceptance Criteria
- [x] Criterion 1 — PASS
- [x] Criterion 2 — PASS
- [ ] Criterion 3 — FAIL: [specific issue]
## Bugs Found
### 🔴 Bug 1: [Title] (Critical)
- **Steps:** 1. Navigate to /page 2. Click button 3. ...
- **Expected:** Form submits and shows success
- **Actual:** Page crashes with TypeError in console
- **Screenshot:** #3
### 🟡 Bug 2: [Title] (Major)
...
### 🔵 Bug 3: [Title] (Minor)
...
## Edge Cases Tested
| Test | Result | Notes |
|------|--------|-------|
| Empty form submission | ✅ PASS | Shows validation errors |
| Long text (500 chars) | ⚠️ WARN | Text overflows container |
| Special characters | ✅ PASS | Properly escaped |
| Back button | ❌ FAIL | State lost, shows blank page |
| Rapid double-click | ✅ PASS | Button disabled after first click |
## Accessibility (axe-core)
- **Violations:** N
- **Passes:** N
- **Details:**
- [serious] button-name: 2 buttons missing accessible names
- [moderate] color-contrast: 3 elements with insufficient contrast
## Screenshots
1. Initial load — [description]
2. After form submission — [description]
3. Bug #1 evidence — [description]
| Dimension | 10 (Exceptional) | 8-9 (Very Good) | 6-7 (Good) | 4-5 (Acceptable) | 2-3 (Poor) |
|-----------|-------------------|-----------------|------------|------------------|------------|
| Functionality | All criteria pass, flows smooth | — | Most pass, minor issues | Some criteria fail | Core flows broken |
| Completeness | Everything built and working | — | Minor features missing | Significant gaps | Mostly stubs |
| UX | Polished, delightful | — | Good, minor rough edges | Functional but clunky | Confusing/broken |
| Robustness | Handles everything gracefully | — | Handles common cases | Some edge cases crash | Fragile |
| Accessibility | 0 violations | 1-3 minor violations | 1-3 serious violations | 4-10 mixed violations | 10+ or any critical |
When reporting FAIL, always provide specific, actionable feedback the builder can use to fix the issues. Reference exact elements, URLs, and steps.
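The Scores section of the report can be generated from the per-dimension scores so the average is always consistent with the rows. This is a minimal sketch: `renderScores` is a hypothetical helper, and it assumes the average is an unweighted mean rounded to one decimal place.

```javascript
// Hypothetical helper: render the Scores table and the average line.
function renderScores(scores) {
  const rows = scores.map(
    ({ dimension, score, notes }) => `| ${dimension} | ${score}/10 | ${notes} |`
  );
  // Assumption: every dimension carries equal weight.
  const average = scores.reduce((sum, s) => sum + s.score, 0) / scores.length;
  return [
    "| Dimension | Score | Notes |",
    "|-----------|-------|-------|",
    ...rows,
    `**Average:** ${average.toFixed(1)}/10`,
  ].join("\n");
}
```

For example, scores of 8, 7, 6, 9, and 5 across the five dimensions produce an average line of `**Average:** 7.0/10`.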