code/ui-test/SKILL.md
Runs UI tests described in plain English by driving real Chrome via the Claude-in-Chrome extension. Covers end-to-end flows (clicks, forms, assertions), visual checks (screenshot + optional baseline diff), accessibility (axe-core), performance (Web Vitals + light Lighthouse-style metrics), and an interactive --debug mode that tails console + network and surfaces issues without needing explicit assertions. Accepts inline descriptions or test files in `./tests/ui/*.md`. Per-run folders contain artifacts (screenshots, console, network, axe report, perf metrics) plus a single-file 2026-aesthetic HTML report with a verdict-forward layout. Learns from per-run feedback — false-positive categories, screenshot strategy, severity floors, and per-project baselines personalize future runs. Use when the user wants to verify a UI flow, screenshot a regression, audit accessibility, profile a page, or debug something visibly broken in a real browser session.
npx skillsauth add mostafa-drz/claude-skills ui-test
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
On startup, use Read to load ~/.claude/skills/ui-test/preferences.md. If missing, treat as first-run (see First-time detection).
Defaults when no preferences exist:
- output-root: ./test-results/ui-test/ (cwd-relative when in a project; falls back to ~/.claude/skills/ui-test/runs/ outside a project)
- viewport: 1440x900 (single) — accepts a comma list for matrix runs (e.g. 375x812,1440x900)
- categories: e2e,visual,a11y,perf (debug is a separate mode, not a category)
- screenshot-strategy: per-step (per-step / on-fail / never)
- a11y-severity-floor: serious (minor / moderate / serious / critical — anything at-or-above is a failure)
- perf-budget: lcp<=2500,cls<=0.1,inp<=200,js-heap<=50mb — overridable per-run via --perf-budget
- verbosity: verbose-on-failure (concise / detailed / verbose-on-failure)
- spec-priority: inline-first (inline-first / project-first / both)
- auto-baseline: true (record on first run, diff thereafter)
- auto-open-report: true
- confirm-plan: always (always / only-multi-step / never)

Also load ~/.claude/skills/ui-test/feedback-journal.md if present — surface the one-line Signal: from the latest 5 sessions to bias defaults (false-positive categories get pre-deselected, baselines flagged-as-flaky get re-recorded, etc.).
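The viewport default accepts either a single size or a comma list for matrix runs. A minimal sketch of how such a value could be parsed is shown here; `parse_viewports` is a hypothetical helper name, not part of the skill.

```python
from typing import List, Tuple


def parse_viewports(spec: str) -> List[Tuple[int, int]]:
    """Parse a value such as '375x812,1440x900' into (width, height) pairs."""
    viewports = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        width, sep, height = part.lower().partition("x")
        if not sep or not width.isdigit() or not height.isdigit():
            raise ValueError(f"invalid viewport {part!r}, expected WxH")
        viewports.append((int(width), int(height)))
    if not viewports:
        raise ValueError("no viewports given")
    return viewports
```

Each returned pair would drive one pass of the viewport matrix loop.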
On startup, use Bash to detect: today's date (date +%Y-%m-%d-%H%M), the current working directory (pwd), whether ./tests/ui/ exists, whether the user is inside a git repo (git rev-parse --is-inside-work-tree — silently fail if not), and whether the configured output-root exists. Do NOT call any browser tool yet — preflight runs in Step 1.
Check $ARGUMENTS:
- help → display help, stop
- config → run config flow, stop
- reset → delete preferences.md + feedback-journal.md + sessions/ + resume-state.md, confirm, stop
- feedback → collect ratings on the most recent run (see Feedback section), stop
- history → list recent run folders with title, verdict, timestamp, stop
- setup → walk through Claude-in-Chrome extension setup (see Setup section), stop
- record <name> → take the most recent inline run and persist it as ./tests/ui/<name>.md, stop
- --debug <url|description> → enter interactive debug mode (see Debug section), stop
- --file <path> → load a single project test file and run it
- --suite <glob> → resolve the glob under ./tests/ui/, run each match in order
- anything else → treat $ARGUMENTS as an inline description, run it (Step 1 onward)

ui-test — Drive real Chrome through plain-English UI tests
Usage:
/ui-test "<description>" Inline test, parsed and run
/ui-test --file <path> Run a single ./tests/ui/*.md file
/ui-test --suite "<glob>" Run a glob of project tests
/ui-test --debug "<url|description>" Interactive debug session
/ui-test record <name> Save the last inline run as a project file
/ui-test history List recent runs (verdict + timestamp)
/ui-test feedback Rate the last run (teaches the skill)
/ui-test setup Walk through Chrome extension setup
/ui-test config Set preferences
/ui-test reset Clear preferences + journal + sessions
/ui-test help This help
Per-run flags:
--env <url> --viewport <WxH[,WxH]> --only <csv> --skip <csv>
--baseline (force-record) --perf-budget "lcp<=2000,cls<=0.05"
--a11y-floor <minor|moderate|serious|critical>
--headful-pause (pause for keypress before each navigation)
--record-gif --verbose
--html-report (default on) | --no-html-report (skip the HTML report; print CLI only)
--open-report | --no-open-report (override auto-open-report preference)
Examples:
/ui-test "log in as [email protected] on staging, expect Dashboard"
/ui-test "homepage <2s load + passes a11y" --env https://acme.com --only perf,a11y
/ui-test --suite "checkout/*.md" --viewport 375x812,1440x900
/ui-test --debug "sidebar disappears on resize at /dashboard"
/ui-test --file tests/ui/login.md --baseline
Output: {output-root}/{YYYY-MM-DD-HHMM}-{kebab-slug}/ contains
report.html, data.json, spec.md, steps/, a11y/, perf/, visual/, sources.md.
Current preferences:
(loaded from preferences.md)
Use AskUserQuestion to collect, in batches of ≤4 per call:
Batch 1 — Output & viewport
./test-results/ui-test/)
Batch 2 — Categories & strictness
4. Default categories enabled (e2e / visual / a11y / perf — multiSelect)
5. Screenshot strategy (per-step / on-fail / never)
6. a11y severity floor (minor / moderate / serious / critical)
7. Perf budget (free text — lcp<=2500,cls<=0.1,inp<=200)
Batch 3 — Workflow
8. Spec priority (inline-first / project-first / both)
9. Confirm plan before run (always / only-multi-step / never)
10. Auto-record baseline on first visual run (true/false)
Save to ~/.claude/skills/ui-test/preferences.md:
# /ui-test preferences
Updated: {date}
## Defaults
- output-root: {path}
- viewport: {WxH[,WxH]}
- categories: {csv}
- screenshot-strategy: {per-step|on-fail|never}
- a11y-severity-floor: {minor|moderate|serious|critical}
- perf-budget: {lcp<=...,cls<=...,inp<=...}
- verbosity: {concise|detailed|verbose-on-failure}
- spec-priority: {inline-first|project-first|both}
- auto-baseline: {true|false}
- auto-open-report: {true|false}
- confirm-plan: {always|only-multi-step|never}
## Profile (optional — edit freely)
- Default base URLs by env: staging=, prod=, local=http://localhost:3000
- Standing exclusions (selectors / pages to ignore for visual diff):
- Standing a11y rule overrides (e.g., color-contrast: skip on /admin):
- Trusted slow third-parties (perf budget exclusions):
## Learned
<!-- Auto-appended from feedback over time -->
Confirm: "Saved. I'll use this as the baseline — and I'll keep sharpening as you rate runs."
Delete preferences.md, feedback-journal.md, sessions/, resume-state.md (each via Bash(test -f ... && rm) style — graceful on missing). Confirm: "All cleared. Starting fresh on the next run."
If no preferences file exists, show one warm non-blocking line:
First time running /ui-test — describe a test in plain English, I drive real Chrome via the Claude-in-Chrome extension and produce a per-run folder + single-file HTML report. Run /ui-test setup if the extension isn't connected yet, or /ui-test config to set defaults — otherwise continue and I'll auto-check the connection.
Then proceed to Step 1 (preflight). After the first successful run, inline-prompt for 1-2 quick prefs (categories, screenshot strategy) — don't force the full config flow.
When invoked as /ui-test setup (or when Step 1 preflight fails and the user asks for help), load ~/.claude/skills/ui-test/reference/setup.md via Read and walk through it: prerequisites checklist (browser / plan / Claude Code version / extension version), enable steps (/chrome, /mcp), and a troubleshooting table. Reference: https://code.claude.com/docs/en/chrome — fetch via WebFetch if a detail in the local guide looks stale.
After the user says they're set up, run preflight (Step 1). On pass: "Connection good. Paste a test description: /ui-test \"<description>\"."
Read ~/.claude/skills/ui-test/preferences.md and ~/.claude/skills/ui-test/feedback-journal.md. Continue silently if either is missing. Extract the latest 5 Signal: lines and apply them as biases (e.g., "user keeps flagging color-contrast as false positive on /admin → pre-deselect that rule for this run").
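As a rough illustration of the Signal extraction, the sketch below pulls the most recent `Signal:` lines out of the journal text. It assumes entries are appended in chronological order (newest last) and that each signal sits on a `- Signal:` bullet as in the journal template; `latest_signals` is a hypothetical name.

```python
from typing import List

SIGNAL_PREFIX = "- Signal:"


def latest_signals(journal_text: str, limit: int = 5) -> List[str]:
    """Return the text of the last `limit` Signal lines, oldest first."""
    signals = []
    for line in journal_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(SIGNAL_PREFIX):
            signals.append(stripped[len(SIGNAL_PREFIX):].strip())
    # Assumption: the journal is append-only, so the tail holds the newest entries.
    return signals[-limit:] if limit > 0 else []
```

How each signal is turned into a bias (for example pre-deselecting a rule) is left to the model reading the lines.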
Preflight: call tabs_context_mcp. If it does not respond, show:
Hmm — the Claude-in-Chrome extension isn't responding.
Common causes:
• Extension not installed → https://chromewebstore.google.com/detail/claude/fcoeoabgfenejglbffodgkkbkcdhcgfn
• Chrome integration not enabled → run /chrome → "Enable"
• Extension idle → run /chrome → "Reconnect extension"
• First time ever → restart Chrome once after enabling, then retry
Need the full walkthrough? Run: /ui-test setup
One retry allowed: if the user replies "retry", call tabs_context_mcp once more. If still failing, route to /ui-test setup.
Decide what to run from $ARGUMENTS, honoring spec-priority:
- --file <path> → load that file, parse its frontmatter (env, viewport, categories) + body.
- --suite "<glob>" → resolve under ./tests/ui/, queue each, run sequentially.
- If ./tests/ui/ exists and spec-priority is project-first or both, list project files via Glob and offer them via AskUserQuestion. Otherwise prompt: "Describe the test in one or two sentences."

Also extract per-run flags: --env, --viewport, --only, --skip, --baseline, --perf-budget, --a11y-floor, --headful-pause, --record-gif, --verbose.
Convert the spec into a structured plan:
{
  title: kebab-slug,
  base_url: env || inferred,
  viewports: [{w, h}, ...],
  categories: filtered by --only/--skip and preference defaults,
  steps: [
    { id, kind: "navigate" | "click" | "fill" | "expect" | "wait" | "screenshot",
      target: selector | text | role,
      value: string | regex,
      timeout_ms: 5000,
      assertions: [{ type: "text" | "visible" | "url" | "console-clean" | "network-2xx", ... }] }
  ],
  perf_budget: parsed,
  a11y_floor: level,
  screenshot_strategy: from prefs unless overridden,
}
Write it to spec.md inside the run folder once Step 5 has created it. The plan is the source of truth — every later step references step IDs.
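The plan's `categories` field is described as filtered by --only/--skip and the preference defaults. One possible reading is sketched below; the precedence (--only replaces the defaults, --skip then subtracts) is an assumption, and `resolve_categories` is a hypothetical helper.

```python
from typing import Iterable, List, Optional

# The four categories named in the defaults; debug is a separate mode.
KNOWN_CATEGORIES = ("e2e", "visual", "a11y", "perf")


def resolve_categories(
    defaults: Iterable[str],
    only: Optional[Iterable[str]] = None,
    skip: Optional[Iterable[str]] = None,
) -> List[str]:
    """Combine preference defaults with --only / --skip into a final list."""
    only_list = list(only or [])
    skip_list = list(skip or [])
    # Assumption: --only, when given, replaces the defaults entirely.
    selected = only_list if only_list else list(defaults)
    unknown = [c for c in selected + skip_list if c not in KNOWN_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    skipped = set(skip_list)
    # Keep a stable order regardless of how the flags were typed.
    return [c for c in KNOWN_CATEGORIES if c in selected and c not in skipped]
```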
Honor confirm-plan preference:
- always → always show
- only-multi-step → skip if plan has 1 step + 0 assertions
- never → only echo a one-liner: "Running: {title} ({N} steps, {categories})"

Format:
ui-test plan — {title}
Base URL: {url}
Viewport: {WxH[, WxH...]}
Categories: e2e ✓ visual ✓ a11y ✓ perf ✓
Steps:
1. navigate → /login
2. fill #email = [email protected]
3. fill #password = ••••
4. click button[type=submit]
5. expect h1 contains "Dashboard"
6. screenshot full-page
...
Estimated: {N} browser ops, {M} screenshots, ~{seconds}s
[Run] [Tweak] [Skip]
AskUserQuestion: "Run this plan?" — pre-select Run if Learned rules don't contradict.
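The confirm-plan rule reduces to a small decision function. The sketch below assumes the plan exposes a step count and an assertion count; both function names are hypothetical.

```python
from typing import List


def should_show_plan(confirm_plan: str, step_count: int, assertion_count: int) -> bool:
    """Decide whether the full plan preview is shown before running."""
    if confirm_plan == "always":
        return True
    if confirm_plan == "only-multi-step":
        # Skipped only for the trivial case: one step and no assertions.
        return not (step_count == 1 and assertion_count == 0)
    if confirm_plan == "never":
        return False
    raise ValueError(f"unknown confirm-plan value: {confirm_plan!r}")


def one_liner(title: str, step_count: int, categories: List[str]) -> str:
    """The short echo used when the preview is not shown under 'never'."""
    return f"Running: {title} ({step_count} steps, {','.join(categories)})"
```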
Slug: {YYYY-MM-DD-HHMM}-{kebab-title-max-60-chars}. Title summarizes the outcome being verified, not the URL: 2026-05-08-1430-login-flow-staging, 2026-05-08-1430-checkout-a11y-mobile.
mkdir -p {output-root}/{slug}/{steps,a11y,perf,visual}
Isolation rule: only the per-run folder is created/written. Never touch sibling folders. Baselines live at {output-root}/baselines/{title}/{viewport}/{step}.png so they're shared across runs of the same test.
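The slug and baseline conventions can be sketched as follows. The kebab rule used here (lowercase, non-alphanumeric runs collapsed to a hyphen) is an assumption about what "kebab" means for this skill, and all three helper names are hypothetical.

```python
import re
from datetime import datetime


def kebab(text: str, max_len: int = 60) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', cap at max_len chars."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def run_slug(title: str, now: datetime) -> str:
    """Build {YYYY-MM-DD-HHMM}-{kebab-title} for the per-run folder."""
    return f"{now.strftime('%Y-%m-%d-%H%M')}-{kebab(title)}"


def baseline_path(output_root: str, title: str, viewport: str, step: str) -> str:
    """Baselines sit outside the run folder so later runs can diff against them."""
    root = output_root.rstrip("/")
    return f"{root}/baselines/{kebab(title)}/{viewport}/{step}.png"
```

For example, `run_slug("Login flow (staging)", datetime(2026, 5, 8, 14, 30))` yields `2026-05-08-1430-login-flow-staging`, matching the first example slug above.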
For each viewport (matrix outer loop), for each step:
- Call tabs_context_mcp once at the start of the run; create a fresh tab via tabs_create_mcp. Resize via resize_window to the current viewport. Never reuse tab IDs from a prior session.
- navigate → mcp__claude-in-chrome__navigate with the resolved URL.
- click / fill → prefer find to locate by role/text first, then javascript_tool for the actual interaction (more deterministic than relying on visual coordinates).
- fill on real form fields → form_input is fine for simple cases.
- wait → javascript_tool polling with timeout.
- expect → javascript_tool returns a JSON { pass: bool, actual, expected, evidence }.

Auto-assertions every step:
- read_console_messages since last step → fail on error level unless allowlisted in Profile.
- read_network_requests since last step → fail on 5xx, optionally 4xx based on prefs.
- Screenshot (per screenshot-strategy): use javascript_tool to call html2canvas (load via CDN if missing, single attempt; fall back to logging the gap) and write the PNG to steps/{NN}-{stepId}.png. If a step fails, screenshot regardless of strategy.
- Record each step in data.json — { id, status, duration_ms, screenshot, console, network, assertions }.
- On failure, mark the remaining steps for that viewport skipped. Other viewports continue.

Browser dialog rule (from MCP server instructions): if any step would trigger an alert/confirm/prompt, refuse to run it and mark the step blocked with a clear note. Never click elements that may surface a modal — the extension becomes unresponsive.
Batching: when many similar reads are needed in a row (selectors, text), prefer browser_batch over a serial loop.
Optional GIF: if --record-gif, wrap the run with gif_creator start/stop calls and save to run.gif.
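The per-step auto-assertions on console and network output could look roughly like the sketch below. The record shapes (`level`/`text` for console messages, `status`/`url` for requests) are assumptions for illustration; the actual output of read_console_messages and read_network_requests may differ. Treating the Profile allowlist as substring matches is also an assumption.

```python
from typing import Dict, List, Sequence


def check_console(messages: Sequence[Dict], allowlist: Sequence[str] = ()) -> List[str]:
    """Return one failure string per error-level message that is not allowlisted."""
    failures = []
    for msg in messages:
        if msg.get("level") != "error":
            continue
        text = msg.get("text", "")
        # Hypothetical allowlist semantics: any allowlisted substring exempts the message.
        if any(allowed in text for allowed in allowlist):
            continue
        failures.append(f"console error: {text}")
    return failures


def check_network(requests: Sequence[Dict], fail_on_4xx: bool = False) -> List[str]:
    """Fail on 5xx always, and on 4xx only when the preference asks for it."""
    failures = []
    for req in requests:
        status = req.get("status", 0)
        if status >= 500 or (fail_on_4xx and 400 <= status < 500):
            failures.append(f"{status} {req.get('url', '')}")
    return failures
```

A step would be marked failed when either list is non-empty, with the strings stored alongside the step record.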
After the main step loop, run category-specific passes:
a11y — inject axe-core via javascript_tool:
const s = document.createElement('script');
s.src = 'https://cdnjs.cloudflare.com/ajax/libs/axe-core/4.10.0/axe.min.js';
document.head.appendChild(s);
Wait for window.axe, run axe.run(), write a11y/violations.json. Filter by a11y-severity-floor. If CDN load fails, log the gap and skip — don't fake-pass.
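Filtering by the severity floor amounts to an ordered comparison on each violation's `impact` field, which axe-core reports as minor, moderate, serious, or critical. A minimal sketch, with `violations_at_or_above` as a hypothetical helper name:

```python
from typing import Dict, List, Sequence

# Ordered from least to most severe, matching the a11y-severity-floor options.
SEVERITY_ORDER = ("minor", "moderate", "serious", "critical")


def violations_at_or_above(violations: Sequence[Dict], floor: str = "serious") -> List[Dict]:
    """Keep violations whose impact is at or above the configured floor."""
    threshold = SEVERITY_ORDER.index(floor)  # raises ValueError on an unknown floor
    kept = []
    for violation in violations:
        impact = violation.get("impact")
        # Entries without a recognised impact are skipped rather than guessed at.
        if impact in SEVERITY_ORDER and SEVERITY_ORDER.index(impact) >= threshold:
            kept.append(violation)
    return kept
```

With the default floor of serious, a moderate color-contrast finding would be recorded in the report but would not fail the category.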
perf — javascript_tool reading performance.getEntriesByType('navigation'), PerformanceObserver for LCP/CLS/INP, performance.memory for JS heap. Combine with read_network_requests for waterfall. Write perf/metrics.json. Compare against perf_budget; mark each metric pass/fail.
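A budget string such as `lcp<=2500,cls<=0.1,inp<=200,js-heap<=50mb` can be parsed and checked along these lines. Converting `mb` to bytes with a factor of 1024*1024 and reporting unmeasured metrics as `missing` are assumptions of this sketch, and both function names are hypothetical.

```python
import re
from typing import Dict

_TERM = re.compile(r"^\s*([a-z][a-z0-9-]*)\s*<=\s*([0-9]*\.?[0-9]+)\s*(ms|mb)?\s*$", re.I)


def parse_budget(spec: str) -> Dict[str, float]:
    """Turn 'lcp<=2500,cls<=0.1' into {'lcp': 2500.0, 'cls': 0.1}."""
    budget = {}
    for term in spec.split(","):
        match = _TERM.match(term)
        if not match:
            raise ValueError(f"invalid budget term: {term!r}")
        name = match.group(1).lower()
        limit = float(match.group(2))
        unit = (match.group(3) or "").lower()
        # Assumption: heap limits in mb are compared against a byte count.
        budget[name] = limit * 1024 * 1024 if unit == "mb" else limit
    return budget


def evaluate_budget(budget: Dict[str, float], metrics: Dict[str, float]) -> Dict[str, str]:
    """Mark each budgeted metric 'pass', 'fail', or 'missing' (not measured)."""
    results = {}
    for name, limit in budget.items():
        if name not in metrics:
            results[name] = "missing"
        else:
            results[name] = "pass" if metrics[name] <= limit else "fail"
    return results
```

A `missing` result corresponds to a coverage gap rather than a failure, in line with the graceful-degradation rule for categories.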
visual — for each screenshot step: if auto-baseline=true and no baseline exists, save as baseline; else compute pixel diff (use javascript_tool with a tiny inline diff implementation, or skip diff and just save the new snapshot if diff lib unavailable). Write visual/{step}.png + visual/{step}.diff.png. Threshold from prefs (default 0.1% pixel delta).
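The diff itself runs in the browser via javascript_tool, but the threshold arithmetic is simple enough to show in isolation. The sketch below assumes both images are already decoded into equal-length sequences of pixel tuples and counts a pixel as changed on any exact mismatch; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA


def pixel_delta(baseline: Sequence[Pixel], current: Sequence[Pixel]) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    if len(baseline) != len(current):
        raise ValueError("images differ in size; re-record the baseline")
    if not baseline:
        return 0.0
    changed = sum(1 for a, b in zip(baseline, current) if a != b)
    return changed / len(baseline)


def over_threshold(baseline: Sequence[Pixel], current: Sequence[Pixel],
                   threshold_percent: float = 0.1) -> bool:
    """True when the changed fraction is strictly above the threshold."""
    return pixel_delta(baseline, current) > threshold_percent / 100
```

A real implementation would likely tolerate small per-channel differences from anti-aliasing; exact matching is used here only to keep the example short.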
Each category degrades gracefully — a CDN block, a browser quirk, or a missing API does NOT fail the run. It logs the gap in sources.md and shows up in the report's "Coverage gaps" panel so the verdict is honest about what was actually checked.
Compose verdict per run:
- PASS — {title}
- FAIL — {first-failing-step-summary}
- PASS WITH ISSUES — {N} categories below bar
- INCONCLUSIVE — {gap reason}

Generate report.html (skip if --no-html-report). Single self-contained file: inline JS ≤200 lines, no external scripts, CSS variables only, dark + light via prefers-color-scheme, mobile-responsive, bento-grid layout (12-col grid with mixed-size cards) reflecting current 2026 dashboard trends. Verdict-forward: hero card with verdict pill + screenshot, perf bento row (4 metric cards with conic-gradient rings + sparklines), step timeline with filter chips, a11y panel with severity badges, visual diff strip, coverage panel, footer.
Skeleton source — copy the structure and styles from ~/.claude/skills/ui-test/examples/sample-report.html (a fully-rendered reference with mock data). Replace its hardcoded data with the run's actual values. Load ~/.claude/skills/ui-test/reference/report.md for the full layout spec and inline-JS responsibilities if more detail is needed.
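The four verdict labels could be derived from the run results roughly as sketched here. The source does not spell out the precedence between them, so the ordering used (failed steps first, then coverage gaps, then categories below bar) is an assumption, as is the `compose_verdict` name and its inputs.

```python
from typing import List, Tuple


def compose_verdict(
    title: str,
    failed_steps: List[str],
    categories_below_bar: List[str],
    gaps: List[str],
) -> Tuple[str, str]:
    """Return (label, detail) for the hero card and the terminal summary."""
    if failed_steps:
        # Detail is the summary of the first failing step.
        return ("FAIL", failed_steps[0])
    if gaps:
        # Assumption: an unchecked category makes the run inconclusive.
        return ("INCONCLUSIVE", gaps[0])
    if categories_below_bar:
        return ("PASS WITH ISSUES", f"{len(categories_below_bar)} categories below bar")
    return ("PASS", title)
```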
Print to terminal (verbosity-aware):
ui-test — {title}
{VERDICT} {one-line summary}
Steps: {pass}/{total} passed ({fail} failed, {skip} skipped)
a11y: {N} violations ≥ {floor} ({pass|fail})
perf: LCP {ms}ms · CLS {n} · INP {ms}ms ({pass|fail})
visual: {changed}/{total} snapshots over threshold
Folder: {output-root}/{slug}/
Report: {output-root}/{slug}/report.html
[Open report] [/ui-test feedback] [/ui-test record <name>]
If auto-open-report=true, run open {report.html}.
Append a one-page session log to ~/.claude/skills/ui-test/sessions/{slug}.md:
# {slug}
Date: {timestamp}
Spec: {one-line summary or path}
Verdict: {label}
Failures: {short list}
Categories run: {csv}
Coverage gaps: {csv if any}
End with one soft line:
Run /ui-test feedback — even one rating sharpens future runs (esp. flagging false-positive a11y rules or flaky baselines).
When invoked as /ui-test feedback:
- Find the most recent run in sessions/. If none, say "No recent run found." and stop.
- AskUserQuestion (3-4 questions, one batch — pre-select using Learned):
- Append to ~/.claude/skills/ui-test/feedback-journal.md:
## {slug} — {date}
- Verdict: {label}
- Correctness: {answer}
- Misfiring categories: {csv}
- Screenshot strategy: {answer}
- Notes: {free text}
- Signal: {one-line generalization, e.g., "color-contrast on /admin keeps false-positive — exempt rule"}
Signal: (substring match on the key phrase), promote it to ## Learned in preferences.md and mention once: "Noticed {pattern} across recent runs — saving as a standing default."
Demoted: <reason> line). Never leave stale rules in Learned.

The journal is human-editable — respect anything the user writes there directly.
Debug mode (--debug)
Invoked as /ui-test --debug "<url-or-description>". Skips the assertion-driven flow; instead, runs an investigative session:
- [Reproduce now] (waits for the user to act in the browser, then re-snapshots), [Resize {viewport}], [Run a11y], [Run perf], [Inspect selector], [Tail console for 5s], [Done — write report].
- debug/iter-NN.{png,console.json,network.json}.
- [Done], generate report.html (same template, "Debug" verdict variant — focuses on findings, not pass/fail) and a findings.md with prioritized issues (Critical / Medium / Low).
- /ui-test record <name>."

Debug mode never asserts and never fails — it surfaces.
Record (record <name>)
Invoked as /ui-test record <name>:
Write ./tests/ui/<name>.md (creating the directory if missing) with:
---
title: <Human Title>
env: {base url}
viewport: 1440x900
categories: e2e,a11y
---
# <Human Title>
> Recorded from run {slug} on {date}.
## Steps
1. Navigate to /login
2. Fill #email = [email protected]
3. ...
## Assertions
- Page contains heading "Dashboard"
- No console errors
- All network requests 2xx-3xx
/ui-test --file tests/ui/<name>.md or include in a suite."

- {YYYY-MM-DD-HHMM}-{kebab-title} — title ≤ 60 chars, summarizes the verification, not just the URL
- {NN}-{kebab-step}.png — NN zero-padded order
- {output-root}/baselines/{title}/{viewport}/{step}.png — shared across runs of the same test
- ./tests/ui/<kebab-name>.md — never .test.md suffix
- -2, -3 for collisions; differentiate by viewport or env

- spec.md so the user can see exactly what was parsed before anything moves.
- tabs_context_mcp first; always create a new tab; never reuse a tab ID from a prior session.
- reset and --baseline (overwrites baselines) print what they'll touch and require a yes.
- resume-state.md with the next step ID and invite /ui-test resume (future enhancement; not yet a subcommand).
- spec.md; never silently rewrite assertions.