skills/ab-testing/SKILL.md
# A/B Testing Skill

Compare two URL variants with the same persona to determine which provides a better user experience. This skill guides the comparison logic and report generation.

## When to Use

- Comparing a redesign against current production
- Testing feature flag variations
- Evaluating staging vs production differences
- Comparing different UX approaches
## Core Concept

```
Multi-Persona Testing: Same URL + Different Personas → Compare user types
A/B Testing: Same Persona + Different URLs → Compare design variants
```
Track these metrics for each variant:
| Metric | Better When | Comparison |
|--------|-------------|------------|
| Tasks Completed | Higher | X/Y vs X/Y |
| Critical Issues | Lower | Count vs Count |
| Major Issues | Lower | Count vs Count |
| Minor Issues | Lower | Count vs Count |
| Confusion Events | Lower | Count vs Count |
| Frustration Triggers | Lower | Count vs Count |
```javascript
function getMetricWinner(metricName, valueA, valueB) {
  // For issues/confusion/frustration: lower is better
  if (['criticalIssues', 'majorIssues', 'minorIssues',
       'confusionEvents', 'frustrationTriggers'].includes(metricName)) {
    if (valueA < valueB) return 'A';
    if (valueB < valueA) return 'B';
    return 'Tie';
  }

  // For task completion: higher is better
  if (metricName === 'tasksCompleted') {
    if (valueA > valueB) return 'A';
    if (valueB > valueA) return 'B';
    return 'Tie';
  }

  return 'Tie';
}
```
```javascript
function getOverallWinner(results) {
  let aWins = 0;
  let bWins = 0;

  for (const metric of results.metrics) {
    if (metric.winner === 'A') aWins++;
    if (metric.winner === 'B') bWins++;
  }

  if (aWins > bWins) return 'A';
  if (bWins > aWins) return 'B';

  // Tiebreaker: task completion
  const taskA = results.variantA.tasksCompleted;
  const taskB = results.variantB.tasksCompleted;
  if (taskA > taskB) return 'A';
  if (taskB > taskA) return 'B';

  return 'Tie';
}
```
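To make the inputs to the functions above concrete, here is a minimal sketch of how a `results` object might be assembled from raw per-variant counts. `METRIC_DIRECTIONS` and `buildResults` are hypothetical names, not part of the skill, and the sample counts are made up. The winner rule is restated inline only so the block runs on its own.

```javascript
// Hypothetical lookup: which direction counts as "better" for each tracked metric.
const METRIC_DIRECTIONS = {
  tasksCompleted: 'higher',
  criticalIssues: 'lower',
  majorIssues: 'lower',
  minorIssues: 'lower',
  confusionEvents: 'lower',
  frustrationTriggers: 'lower',
};

// Assumed input shape: plain objects of numeric counts keyed by metric name.
function buildResults(variantA, variantB) {
  const metrics = Object.entries(METRIC_DIRECTIONS).map(([name, direction]) => {
    const a = variantA[name];
    const b = variantB[name];
    let winner = 'Tie';
    if (a !== b) {
      const aIsBetter = direction === 'higher' ? a > b : a < b;
      winner = aIsBetter ? 'A' : 'B';
    }
    return { name, a, b, winner };
  });
  // Same shape the overall-winner logic reads: metrics[].winner plus both variants.
  return { variantA, variantB, metrics };
}

// Illustrative counts only.
const results = buildResults(
  { tasksCompleted: 2, criticalIssues: 1, majorIssues: 2, minorIssues: 3, confusionEvents: 4, frustrationTriggers: 1 },
  { tasksCompleted: 3, criticalIssues: 0, majorIssues: 1, minorIssues: 3, confusionEvents: 2, frustrationTriggers: 1 },
);

console.log(results.metrics.map((m) => `${m.name}: ${m.winner}`).join('\n'));
```

With these sample counts, Variant B wins four metrics and two are tied, so the task-completion tiebreaker is never reached.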
Issues found only in one variant:
```markdown
### Variant A Only Issues
- Navigation menu hidden on mobile
- Form validation error messages unclear

### Variant B Only Issues
- New feature tooltip blocks content
- Loading spinner too subtle
```
Issues found in both variants (baseline problems):
```markdown
### Common Issues (Both Variants)
- Footer links have low contrast
- No keyboard focus indicators
- Missing alt text on hero image
```
Common issues should be fixed regardless of which variant wins.
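The split into variant-only and common issues can be sketched as a set comparison. This assumes each issue is recorded as a plain string and that the same wording is used for the same issue in both runs. `partitionIssues` is a hypothetical helper, not part of the skill.

```javascript
// Hypothetical helper: groups issue strings by where they were observed.
function partitionIssues(issuesA, issuesB) {
  const setA = new Set(issuesA);
  const setB = new Set(issuesB);
  return {
    aOnly: [...setA].filter((issue) => !setB.has(issue)),
    bOnly: [...setB].filter((issue) => !setA.has(issue)),
    common: [...setA].filter((issue) => setB.has(issue)),
  };
}

const partitioned = partitionIssues(
  ['Navigation menu hidden on mobile', 'Footer links have low contrast'],
  ['New feature tooltip blocks content', 'Footer links have low contrast'],
);

console.log(partitioned);
```

Exact string matching is the weak point of this approach: if the two runs describe the same problem in different words, it will show up as two variant-only issues rather than one common issue.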
Capture comparable screenshots at the same moments:
| Moment | Variant A | Variant B |
|--------|-----------|-----------|
| Landing Page | First impression | First impression |
| Navigation | Menu interaction | Menu interaction |
| Form Start | Beginning task | Beginning task |
| Confusion | If/when confused | If/when confused |
| Error | If error occurs | If error occurs |
| Task Complete | Success state | Success state |
```
ab-test-{persona}-variant-{a|b}-{moment}-{timestamp}.png
```

Examples:

```
ab-test-genz-digital-native-variant-a-landing-143022.png
ab-test-genz-digital-native-variant-b-landing-143024.png
ab-test-genz-digital-native-variant-a-confusion-143156.png
```
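A small helper can keep screenshot names consistent across both variants. The sketch below is hypothetical (`screenshotName` is not part of the skill), and it assumes the timestamp is local time in `HHMMSS` form, which is inferred from the examples above rather than stated by the source.

```javascript
// Hypothetical helper; the HHMMSS timestamp format is an assumption based on the examples.
function screenshotName(personaId, variant, moment, date = new Date()) {
  const variantKey = String(variant).toLowerCase();
  if (variantKey !== 'a' && variantKey !== 'b') {
    throw new Error(`Unknown variant: ${variant}`);
  }
  const pad = (n) => String(n).padStart(2, '0');
  const timestamp = pad(date.getHours()) + pad(date.getMinutes()) + pad(date.getSeconds());
  return `ab-test-${personaId}-variant-${variantKey}-${moment}-${timestamp}.png`;
}

// Fixed local time so the output is reproducible.
const exampleName = screenshotName('genz-digital-native', 'A', 'landing', new Date(2025, 0, 6, 14, 30, 22));
console.log(exampleName); // ab-test-genz-digital-native-variant-a-landing-143022.png
```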
When testing multiple personas on both variants:
```
Option 1: Persona-first (recommended)
  For each persona:
    Test Variant A
    Test Variant B
    Record preference

Option 2: Variant-first
  Test all personas on Variant A
  Test all personas on Variant B
  Compare results
```
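Persona-first is recommended because each persona tests both variants back to back, so its impressions of Variant A are still fresh when it evaluates Variant B and the two runs stay directly comparable.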
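The two orderings differ only in which loop is outermost. The sketch below produces the order of runs without executing any tests. `buildRunOrder` and the strategy strings are hypothetical names, and `persona-two` is a placeholder persona ID.

```javascript
// Hypothetical helper: returns the sequence of (persona, variant) runs for a strategy.
function buildRunOrder(personaIds, strategy = 'persona-first') {
  const variants = ['A', 'B'];
  const runs = [];
  if (strategy === 'persona-first') {
    // Each persona finishes both variants before the next persona starts.
    for (const persona of personaIds) {
      for (const variant of variants) runs.push({ persona, variant });
    }
  } else if (strategy === 'variant-first') {
    // Every persona tests Variant A before anyone tests Variant B.
    for (const variant of variants) {
      for (const persona of personaIds) runs.push({ persona, variant });
    }
  } else {
    throw new Error(`Unknown strategy: ${strategy}`);
  }
  return runs;
}

const personaFirst = buildRunOrder(['genz-digital-native', 'persona-two']);
console.log(personaFirst.map((run) => `${run.persona} -> ${run.variant}`).join('\n'));
```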
```javascript
function getConsensus(personaResults) {
  let aPrefs = 0;
  let bPrefs = 0;
  let ties = 0;

  for (const result of personaResults) {
    if (result.prefers === 'A') aPrefs++;
    else if (result.prefers === 'B') bPrefs++;
    else ties++;
  }

  const total = personaResults.length;

  if (aPrefs > bPrefs) {
    return `Variant A preferred by ${aPrefs}/${total} personas`;
  } else if (bPrefs > aPrefs) {
    return `Variant B preferred by ${bPrefs}/${total} personas`;
  } else {
    return `Split decision: ${aPrefs} prefer A, ${bPrefs} prefer B, ${ties} tied`;
  }
}
```
"I'm going to compare two versions of this site. Let me start with Variant A..."
"Okay, I've finished testing Variant A. Now let me try Variant B with fresh eyes..."
"Interesting - this is different from Variant A. The navigation is much clearer here."
"Hmm, I preferred how Variant A handled this form. This version feels cluttered."
"After testing both versions, Variant B was definitely easier to use.
The clearer navigation and faster load times made a big difference."
When `--quiet` is active:

```
A/B Test: [persona-id]
Variant A: [url-a]
Variant B: [url-b]

Testing Variant A...
[Screenshot: variant-a-landing.png]
[Screenshot: variant-a-complete.png]

Testing Variant B...
[Screenshot: variant-b-landing.png]
[Screenshot: variant-b-complete.png]

# Results

| Metric | A | B | Winner |
|--------|---|---|--------|
| Tasks | 2/3 | 3/3 | B |
| Critical | 1 | 0 | B |
| Major | 2 | 1 | B |

**Winner: Variant B** (3 metric wins vs 0)

Key differences:
- B: Clearer call-to-action buttons
- B: Faster page load
- A: Form validation broke on submit
```
When recording A/B tests, create separate trace files:
```
recordings/
├── ab-test-genz-digital-native-variant-a-2025-01-06-143022.zip
└── ab-test-genz-digital-native-variant-b-2025-01-06-143156.zip
```
This allows reviewing each variant's test independently at trace.playwright.dev.
```markdown
# A/B Testing Comparison Report

## Test Configuration
- **Persona**: {{PERSONA_NAME}} ({{PERSONA_ID}})
- **Variant A (Control)**: {{URL_A}}
- **Variant B (Test)**: {{URL_B}}
- **Tasks**: {{TASKS}}
- **Date**: {{DATE}}

## Results Summary

| Metric | Variant A | Variant B | Winner |
|--------|-----------|-----------|--------|
| Tasks Completed | {{A_TASKS}} | {{B_TASKS}} | {{TASKS_WINNER}} |
| Critical Issues | {{A_CRITICAL}} | {{B_CRITICAL}} | {{CRITICAL_WINNER}} |
| Major Issues | {{A_MAJOR}} | {{B_MAJOR}} | {{MAJOR_WINNER}} |
| Minor Issues | {{A_MINOR}} | {{B_MINOR}} | {{MINOR_WINNER}} |

## Overall Winner: Variant {{WINNER}}

{{WINNER_EXPLANATION}}

## Variant A Issues
{{#A_ISSUES}}
- {{ISSUE}}
{{/A_ISSUES}}

## Variant B Issues
{{#B_ISSUES}}
- {{ISSUE}}
{{/B_ISSUES}}

## Common Issues
{{#COMMON_ISSUES}}
- {{ISSUE}}
{{/COMMON_ISSUES}}

## Recommendations
{{#RECOMMENDATIONS}}
1. {{RECOMMENDATION}}
{{/RECOMMENDATIONS}}
```
```markdown
# Multi-Persona A/B Comparison

## Configuration
- **Variant A**: {{URL_A}}
- **Variant B**: {{URL_B}}
- **Personas Tested**: {{PERSONA_COUNT}}

## Results Matrix

| Persona | A Tasks | B Tasks | A Issues | B Issues | Prefers |
|---------|---------|---------|----------|----------|---------|
{{#PERSONA_RESULTS}}
| {{PERSONA_ID}} | {{A_TASKS}} | {{B_TASKS}} | {{A_ISSUES}} | {{B_ISSUES}} | {{PREFERS}} |
{{/PERSONA_RESULTS}}

## Consensus: {{CONSENSUS}}

{{#PERSONA_BREAKDOWNS}}
### {{PERSONA_ID}}
- **Prefers**: {{PREFERS}}
- **Reason**: {{REASON}}
{{/PERSONA_BREAKDOWNS}}
```
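The placeholders in the templates above resemble Mustache syntax, but the source does not say which engine fills them in. As a stand-in, the sketch below shows one way `{{KEY}}` values and `{{#SECTION}}...{{/SECTION}}` lists could be expanded. `renderTemplate` is a hypothetical helper that handles only these two forms, not a full Mustache implementation.

```javascript
// Hypothetical minimal renderer: supports {{KEY}} and {{#LIST}}...{{/LIST}} only.
function renderTemplate(template, data) {
  // Expand each section once per array item, merging the item's fields into scope.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}\n?([\s\S]*?)\{\{\/\1\}\}\n?/g,
    (match, key, body) => {
      const items = Array.isArray(data[key]) ? data[key] : [];
      return items.map((item) => renderTemplate(body, { ...data, ...item })).join('');
    },
  );
  // Fill remaining placeholders; unknown keys stay visible so gaps are easy to spot.
  return withSections.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in data ? String(data[key]) : match,
  );
}

const sampleTemplate = [
  '## Overall Winner: Variant {{WINNER}}',
  '## Common Issues',
  '{{#COMMON_ISSUES}}',
  '- {{ISSUE}}',
  '{{/COMMON_ISSUES}}',
].join('\n');

const rendered = renderTemplate(sampleTemplate, {
  WINNER: 'B',
  COMMON_ISSUES: [
    { ISSUE: 'Footer links have low contrast' },
    { ISSUE: 'No keyboard focus indicators' },
  ],
});

console.log(rendered);
```

Leaving unknown placeholders in the output is a deliberate choice here, since a report with a visible `{{A_TASKS}}` is easier to catch in review than one with a silent blank.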