
ncklrs/skills/ab-testing

Name: skills/ab-testing
Author: ncklrs

skills/ab-testing/SKILL.md

npx skillsauth add ncklrs/claude-chrome-user-testing skills/ab-testing

Security scan results (3 of 9 scanners reported clean):

Clean: Trivy, container and dependency vulnerability scanner
Clean: Semgrep, static code analysis for vulnerabilities
Clean: mcp-scan (Snyk), Model Context Protocol security validation
Skipped: Snyk (dep), open source security scanning
Skipped: Socket.dev, supply chain security analysis
Skipped: VirusTotal, multi-engine malware detection
Skipped: CrowdStrike, advanced threat intelligence
Skipped: OSV-Scanner, Open Source Vulnerability database check
Skipped: OWASP Dep-Check

A/B Testing Skill

Compare two URL variants with the same persona to determine which provides a better user experience. This skill guides the comparison logic and report generation.

When to Use

- Comparing a redesign against current production
- Testing feature flag variations
- Evaluating staging vs production differences
- Comparing different UX approaches

Core Concept

```
Multi-Persona Testing:  Same URL     + Different Personas → Compare user types
A/B Testing:            Same Persona + Different URLs     → Compare design variants
```

Comparison Metrics

Track these metrics for each variant:

| Metric | Better When | Comparison |
|--------|-------------|------------|
| Tasks Completed | Higher | X/Y vs X/Y |
| Critical Issues | Lower | Count vs Count |
| Major Issues | Lower | Count vs Count |
| Minor Issues | Lower | Count vs Count |
| Confusion Events | Lower | Count vs Count |
| Frustration Triggers | Lower | Count vs Count |
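The table can be turned into data so that every tracked metric is compared in a single pass. The sketch below assumes each variant's results are stored as a plain object of numeric counts. It also assumes that `tasksCompleted` holds only the X from "X/Y", because both variants run the same tasks. The names `METRIC_DIRECTION` and `compareVariants` are hypothetical. The returned `{ variantA, variantB, metrics: [{ name, a, b, winner }] }` object is shaped to match what `getOverallWinner()` reads below.

```javascript
// Hypothetical direction table: +1 means higher is better, -1 means lower is better.
const METRIC_DIRECTION = {
  tasksCompleted: +1,
  criticalIssues: -1,
  majorIssues: -1,
  minorIssues: -1,
  confusionEvents: -1,
  frustrationTriggers: -1,
};

// Compare two per-variant metric records (numeric counts) metric by metric.
function compareVariants(variantA, variantB) {
  const metrics = Object.entries(METRIC_DIRECTION).map(([name, direction]) => {
    const a = variantA[name];
    const b = variantB[name];
    // Multiplying by the direction makes a positive score always favor A.
    // A missing value yields NaN, which falls through to 'Tie'.
    const score = (a - b) * direction;
    const winner = score > 0 ? 'A' : score < 0 ? 'B' : 'Tie';
    return { name, a, b, winner };
  });
  return { variantA, variantB, metrics };
}
```

Keeping the "higher or lower is better" rule in one table means a new metric needs only one entry. The original `getMetricWinner()` achieves the same result with explicit branches.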

Winner Determination

Per-Metric Winner

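```javascript
// Returns 'A', 'B' or 'Tie' for one metric.
// Expects numeric values. For tasksCompleted, pass the number of completed tasks
// (the X in "X/Y"); comparing strings such as "2/3" would be lexicographic.
function getMetricWinner(metricName, valueA, valueB) {
  // For issues/confusion/frustration: lower is better
  if (['criticalIssues', 'majorIssues', 'minorIssues',
       'confusionEvents', 'frustrationTriggers'].includes(metricName)) {
    if (valueA < valueB) return 'A';
    if (valueB < valueA) return 'B';
    return 'Tie';
  }

  // For task completion: higher is better
  if (metricName === 'tasksCompleted') {
    if (valueA > valueB) return 'A';
    if (valueB > valueA) return 'B';
    return 'Tie';
  }

  // Unrecognized metric names never produce a winner
  return 'Tie';
}
```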

Overall Winner

```javascript
function getOverallWinner(results) {
  let aWins = 0;
  let bWins = 0;

  for (const metric of results.metrics) {
    if (metric.winner === 'A') aWins++;
    if (metric.winner === 'B') bWins++;
  }

  if (aWins > bWins) return 'A';
  if (bWins > aWins) return 'B';

  // Tiebreaker: task completion
  const taskA = results.variantA.tasksCompleted;
  const taskB = results.variantB.tasksCompleted;
  if (taskA > taskB) return 'A';
  if (taskB > taskA) return 'B';

  return 'Tie';
}
```

Issue Categorization

Variant-Specific Issues

Issues found only in one variant:

```markdown
### Variant A Only Issues
- Navigation menu hidden on mobile
- Form validation error messages unclear

### Variant B Only Issues
- New feature tooltip blocks content
- Loading spinner too subtle
```

Common Issues

Issues found in both variants (baseline problems):

```markdown
### Common Issues (Both Variants)
- Footer links have low contrast
- No keyboard focus indicators
- Missing alt text on hero image
```

Common issues should be fixed regardless of which variant wins.
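Splitting findings into these three buckets is a set comparison. The minimal sketch below assumes each issue is a plain string and that two findings match when their text is equal after trimming, lowercasing and collapsing whitespace. Real findings phrased differently would still need manual or semantic matching. `normalizeIssue` and `categorizeIssues` are hypothetical helpers.

```javascript
// Normalize an issue description so trivial wording differences still match.
// Assumption: issues are plain strings and matching is by normalized text.
function normalizeIssue(issue) {
  return issue.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Split issues into A-only, B-only and common (baseline) buckets.
function categorizeIssues(issuesA, issuesB) {
  const keysA = new Set(issuesA.map(normalizeIssue));
  const keysB = new Set(issuesB.map(normalizeIssue));
  return {
    aOnly: issuesA.filter((issue) => !keysB.has(normalizeIssue(issue))),
    bOnly: issuesB.filter((issue) => !keysA.has(normalizeIssue(issue))),
    // Issues present in both variants keep Variant A's wording.
    common: issuesA.filter((issue) => keysB.has(normalizeIssue(issue))),
  };
}
```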

Screenshot Strategy

Capture comparable screenshots at the same moments:

| Moment | Variant A | Variant B |
|--------|-----------|-----------|
| Landing Page | First impression | First impression |
| Navigation | Menu interaction | Menu interaction |
| Form Start | Beginning task | Beginning task |
| Confusion | If/when confused | If/when confused |
| Error | If error occurs | If error occurs |
| Task Complete | Success state | Success state |

Screenshot Naming Convention

```
ab-test-{persona}-variant-{a|b}-{moment}-{timestamp}.png
```

Examples:

```
ab-test-genz-digital-native-variant-a-landing-143022.png
ab-test-genz-digital-native-variant-b-landing-143024.png
ab-test-genz-digital-native-variant-a-confusion-143156.png
```
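A small helper can generate names that follow this convention. The sketch assumes, based on the examples, that `{timestamp}` is the local wall-clock time formatted as HHMMSS and that persona and moment IDs are already lowercase and hyphenated. `screenshotName` is a hypothetical name.

```javascript
// Zero-pad a number to two digits (7 -> "07").
const pad2 = (n) => String(n).padStart(2, '0');

// Build ab-test-{persona}-variant-{a|b}-{moment}-{timestamp}.png,
// with the timestamp taken as local time in HHMMSS form (an assumption).
function screenshotName(persona, variant, moment, date = new Date()) {
  const v = variant.toLowerCase();
  if (v !== 'a' && v !== 'b') {
    throw new Error(`variant must be "a" or "b", got "${variant}"`);
  }
  const timestamp = pad2(date.getHours()) + pad2(date.getMinutes()) + pad2(date.getSeconds());
  return `ab-test-${persona}-variant-${v}-${moment}-${timestamp}.png`;
}

console.log(screenshotName('genz-digital-native', 'A', 'landing', new Date(2025, 0, 6, 14, 30, 22)));
// prints ab-test-genz-digital-native-variant-a-landing-143022.png
```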

Multi-Persona A/B Testing

When testing multiple personas on both variants:

Execution Order

```
Option 1: Persona-first (recommended)
For each persona:
  Test Variant A
  Test Variant B
  Record preference

Option 2: Variant-first
Test all personas on Variant A
Test all personas on Variant B
Compare results
```

Persona-first is recommended because each persona tests both variants back to back, so its preference is recorded while both experiences are fresh and directly comparable.
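The persona-first loop can be sketched as follows. `runTest(persona, variant)` and `askPreference(persona, resultA, resultB)` are hypothetical async callbacks. They stand in for the actual browser session and the persona's verdict, neither of which is specified here. Each returned entry carries a `prefers` field, which is the shape `getConsensus()` reads.

```javascript
// Persona-first execution: each persona runs Variant A, then Variant B,
// then records a preference before the next persona starts.
async function runPersonaFirst(personas, runTest, askPreference) {
  const personaResults = [];
  for (const persona of personas) {
    const resultA = await runTest(persona, 'A');
    const resultB = await runTest(persona, 'B');
    personaResults.push({
      persona,
      variantA: resultA,
      variantB: resultB,
      // Expected to be 'A', 'B' or 'Tie'.
      prefers: await askPreference(persona, resultA, resultB),
    });
  }
  return personaResults;
}
```

The runs are awaited one at a time, so each persona's A and B sessions happen back to back in a fixed order.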

Consensus Calculation

```javascript
function getConsensus(personaResults) {
  let aPrefs = 0;
  let bPrefs = 0;
  let ties = 0;

  for (const result of personaResults) {
    if (result.prefers === 'A') aPrefs++;
    else if (result.prefers === 'B') bPrefs++;
    else ties++;
  }

  const total = personaResults.length;

  if (aPrefs > bPrefs) {
    return `Variant A preferred by ${aPrefs}/${total} personas`;
  } else if (bPrefs > aPrefs) {
    return `Variant B preferred by ${bPrefs}/${total} personas`;
  } else {
    return `Split decision: ${aPrefs} prefer A, ${bPrefs} prefer B, ${ties} tied`;
  }
}
```

Narration Guidelines

Starting A/B Test

"I'm going to compare two versions of this site. Let me start with Variant A..."

Transitioning Between Variants

"Okay, I've finished testing Variant A. Now let me try Variant B with fresh eyes..."

Noting Differences

"Interesting - this is different from Variant A. The navigation is much clearer here."
"Hmm, I preferred how Variant A handled this form. This version feels cluttered."

Concluding Comparison

"After testing both versions, Variant B was definitely easier to use.
The clearer navigation and faster load times made a big difference."

Quiet Mode Output

When --quiet is active:

```
A/B Test: [persona-id]
Variant A: [url-a]
Variant B: [url-b]

Testing Variant A...
[Screenshot: variant-a-landing.png]
[Screenshot: variant-a-complete.png]

Testing Variant B...
[Screenshot: variant-b-landing.png]
[Screenshot: variant-b-complete.png]

# Results

| Metric | A | B | Winner |
|--------|---|---|--------|
| Tasks | 2/3 | 3/3 | B |
| Critical | 1 | 0 | B |
| Major | 2 | 1 | B |

**Winner: Variant B** (3 metric wins vs 0)

Key differences:
- B: Clearer call-to-action buttons
- B: Faster page load
- A: Form validation broke on submit
```
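The results table and winner line in this output can be generated from the per-metric comparison. `renderQuietResults` is a hypothetical helper, and its row shape `{ label, a, b, winner }` is an assumption. The tie wording is also invented, because the example above only shows a clear winner.

```javascript
// Render the quiet-mode results table plus the winner line.
// rows: [{ label, a, b, winner }], with winner 'A', 'B' or 'Tie';
// overallWinner: 'A', 'B' or 'Tie' (for example from getOverallWinner()).
function renderQuietResults(rows, overallWinner) {
  const aWins = rows.filter((r) => r.winner === 'A').length;
  const bWins = rows.filter((r) => r.winner === 'B').length;
  const lines = [
    '| Metric | A | B | Winner |',
    '|--------|---|---|--------|',
    ...rows.map((r) => `| ${r.label} | ${r.a} | ${r.b} | ${r.winner} |`),
    '',
  ];
  if (overallWinner === 'Tie') {
    // Hypothetical wording; the source does not show a tied result.
    lines.push(`**Result: Tie** (${aWins} metric wins each)`);
  } else {
    const [own, other] = overallWinner === 'A' ? [aWins, bWins] : [bWins, aWins];
    lines.push(`**Winner: Variant ${overallWinner}** (${own} metric wins vs ${other})`);
  }
  return lines.join('\n');
}
```

With the three rows from the example and `overallWinner` set to `'B'`, this produces the same table and the line `**Winner: Variant B** (3 metric wins vs 0)`.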

Session Recording

When recording A/B tests, create separate trace files:

```
recordings/
├── ab-test-genz-digital-native-variant-a-2025-01-06-143022.zip
└── ab-test-genz-digital-native-variant-b-2025-01-06-143156.zip
```

Separate trace files let you review each variant's run independently in the Playwright Trace Viewer at trace.playwright.dev.
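A helper can build trace paths that follow this layout. The sketch assumes the date and time come from the local clock, formatted as YYYY-MM-DD and HHMMSS as in the listing above. `traceFileName` is a hypothetical name. If the session is driven by Playwright directly, such a path would be passed to `context.tracing.stop({ path })` after `context.tracing.start({ screenshots: true, snapshots: true })`. Whether this skill calls those APIs itself is not stated here.

```javascript
// Zero-pad a number to two digits (1 -> "01").
const pad2 = (n) => String(n).padStart(2, '0');

// Build recordings/ab-test-{persona}-variant-{a|b}-{YYYY-MM-DD}-{HHMMSS}.zip
// using local time (an assumption based on the example file names).
function traceFileName(persona, variant, date = new Date()) {
  const day = `${date.getFullYear()}-${pad2(date.getMonth() + 1)}-${pad2(date.getDate())}`;
  const time = pad2(date.getHours()) + pad2(date.getMinutes()) + pad2(date.getSeconds());
  return `recordings/ab-test-${persona}-variant-${variant.toLowerCase()}-${day}-${time}.zip`;
}
```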

Report Templates

Single Persona A/B Report

```markdown
# A/B Testing Comparison Report

## Test Configuration
- **Persona**: {{PERSONA_NAME}} ({{PERSONA_ID}})
- **Variant A (Control)**: {{URL_A}}
- **Variant B (Test)**: {{URL_B}}
- **Tasks**: {{TASKS}}
- **Date**: {{DATE}}

## Results Summary

| Metric | Variant A | Variant B | Winner |
|--------|-----------|-----------|--------|
| Tasks Completed | {{A_TASKS}} | {{B_TASKS}} | {{TASKS_WINNER}} |
| Critical Issues | {{A_CRITICAL}} | {{B_CRITICAL}} | {{CRITICAL_WINNER}} |
| Major Issues | {{A_MAJOR}} | {{B_MAJOR}} | {{MAJOR_WINNER}} |
| Minor Issues | {{A_MINOR}} | {{B_MINOR}} | {{MINOR_WINNER}} |

## Overall Winner: Variant {{WINNER}}
{{WINNER_EXPLANATION}}

## Variant A Issues
{{#A_ISSUES}}
- {{ISSUE}}
{{/A_ISSUES}}

## Variant B Issues
{{#B_ISSUES}}
- {{ISSUE}}
{{/B_ISSUES}}

## Common Issues
{{#COMMON_ISSUES}}
- {{ISSUE}}
{{/COMMON_ISSUES}}

## Recommendations
{{#RECOMMENDATIONS}}
1. {{RECOMMENDATION}}
{{/RECOMMENDATIONS}}
```
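The templates use Mustache-style tags: `{{KEY}}` for a value and `{{#LIST}}...{{/LIST}}` to repeat a block for each list item. A full Mustache implementation such as mustache.js (`Mustache.render(template, view)`) could fill them, but note that Mustache HTML-escapes `{{KEY}}` values by default. The hypothetical `renderTemplate` below implements only the small subset these templates need: no escaping, no partials, and no nesting of a section inside another section with the same name. It assumes list items are objects such as `{ ISSUE: '...' }`.

```javascript
// Minimal renderer for {{KEY}} placeholders and {{#LIST}}...{{/LIST}} sections.
// List items must be objects; inside a section, item fields shadow the outer view.
function renderTemplate(template, view) {
  // Expand sections first, rendering the section body once per list item.
  const withSections = template.replace(
    /\{\{#(\w+)\}\}\n?([\s\S]*?)\{\{\/\1\}\}\n?/g,
    (_, name, body) => (view[name] || [])
      .map((item) => renderTemplate(body, { ...view, ...item }))
      .join(''),
  );
  // Then substitute simple placeholders; unknown keys become empty strings.
  return withSections.replace(/\{\{(\w+)\}\}/g, (_, key) =>
    (view[key] === undefined ? '' : String(view[key])));
}
```

The section tags and the line breaks that follow them are consumed together, so each list item becomes exactly one `- {{ISSUE}}` line, with no blank lines left where the tags were.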

Multi-Persona A/B Report

```markdown
# Multi-Persona A/B Comparison

## Configuration
- **Variant A**: {{URL_A}}
- **Variant B**: {{URL_B}}
- **Personas Tested**: {{PERSONA_COUNT}}

## Results Matrix

| Persona | A Tasks | B Tasks | A Issues | B Issues | Prefers |
|---------|---------|---------|----------|----------|---------|
{{#PERSONA_RESULTS}}
| {{PERSONA_ID}} | {{A_TASKS}} | {{B_TASKS}} | {{A_ISSUES}} | {{B_ISSUES}} | {{PREFERS}} |
{{/PERSONA_RESULTS}}

## Consensus: {{CONSENSUS}}

{{#PERSONA_BREAKDOWNS}}
### {{PERSONA_ID}}
- **Prefers**: {{PREFERS}}
- **Reason**: {{REASON}}
{{/PERSONA_BREAKDOWNS}}
```
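Filling the multi-persona template requires a view object with one row per persona. The sketch below assumes a hypothetical per-persona record `{ personaId, tasksA, tasksB, issuesA, issuesB, prefers, reason }`, where `issuesA` and `issuesB` are arrays whose lengths fill the "A Issues" and "B Issues" columns. The `consensus` argument would typically be the string returned by `getConsensus()`. `buildMultiPersonaView` is an illustrative name.

```javascript
// Map per-persona outcomes onto the placeholder names used by the template.
function buildMultiPersonaView(urlA, urlB, personaOutcomes, consensus) {
  return {
    URL_A: urlA,
    URL_B: urlB,
    PERSONA_COUNT: personaOutcomes.length,
    CONSENSUS: consensus,
    // One row per persona for the results matrix; issue columns hold counts.
    PERSONA_RESULTS: personaOutcomes.map((p) => ({
      PERSONA_ID: p.personaId,
      A_TASKS: p.tasksA,
      B_TASKS: p.tasksB,
      A_ISSUES: p.issuesA.length,
      B_ISSUES: p.issuesB.length,
      PREFERS: p.prefers,
    })),
    // One breakdown section per persona.
    PERSONA_BREAKDOWNS: personaOutcomes.map((p) => ({
      PERSONA_ID: p.personaId,
      PREFERS: p.prefers,
      REASON: p.reason,
    })),
  };
}
```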

Best Practices

- Test Same Tasks: Ensure identical tasks are tested on both variants.
- Clear Browser State: Start each variant test with a fresh browser state.
- Capture Comparable Screenshots: Take screenshots at the same moments in each variant.
- Note Subjective Differences: Some preferences can't be quantified.
- Consider All Personas: Different users may prefer different variants.
- Fix Common Issues First: Issues present in both variants need fixing regardless of which variant wins.
- Document Assumptions: Note any structural differences between the variants.


14 stars · Category: testing · Updated Apr 8, 2026


Security scan last run: Apr 8, 2026, 2:31 AM (53.0 s, 1 file scanned).



Install manually

```bash
# Clone the repo
git clone https://github.com/ncklrs/claude-chrome-user-testing.git

# Copy into Claude Code skills folder (global)
cp -r claude-chrome-user-testing/skills/ab-testing ~/.claude/skills/
```
