Where This Fits

Main Chat                          Task Agent
─────────────────────────────────────────────────────
/goal <condition> (set at phase entry; refires turns)
dev-implement (loads dev-tdd)
  → dev-delegate
    → Task agent ──────────────────→ uses dev-test (this skill)
                                       ↓ loads dev-tdd again
                                     has TDD protocol + gates
                                       → routes to specific tool

<EXTREMELY-IMPORTANT>
## Load TDD Enforcement (REQUIRED)

Before choosing testing tools, you MUST load the TDD skill to ensure gate compliance:

Read `${CLAUDE_SKILL_DIR}/../../skills/dev-tdd/SKILL.md` and follow its instructions.

This loads:

- Task reframing (your job is writing tests, not features)
- The Execution Gate (6 mandatory gates before E2E testing)
- GATE 5: READ LOGS (mandatory - cannot skip)
- The Iron Law of TDD (test-first approach)

Read dev-tdd skill content now before selecting testing tools.
</EXTREMELY-IMPORTANT>

This skill routes to the right testing tool. The loaded dev-tdd skill provides TDD protocol details.

- The Iron Law
- Browser Testing Decision Tree
- Platform Detection
- Sub-Skills Reference
- Unit & Integration Tests

<EXTREMELY-IMPORTANT>
## The Iron Law of Testing

YOU MUST WRITE E2E TESTS FOR USER-FACING FEATURES. This is not negotiable.

When your changes affect what users see or interact with, you MUST:

1. Write an E2E test that simulates user behavior
2. Run it and verify it PASSES (not just unit tests)
3. Document: "E2E: [test name] passes with [evidence]"
4. Include screenshot/snapshot for visual changes

Unit tests prove components work. E2E tests prove YOUR feature works for users.

E2E Facts

- Manual testing leaves no evidence and misses regressions: "it worked when I tried it" becomes an unverified claim the moment the code changes again. Capture the manual check as an automated test.
- Backend-only changes still need E2E when they affect user-visible output; "this is just backend" classifies by where the code lives, not by what the user sees.
- "The UI is unchanged" is an assumption until a visual snapshot proves it; asserting it without one is an unverified claim presented as fact.

Fake E2E Detection - STOP

If your "E2E test" does any of these, it's NOT E2E:

| Pattern | Why It's Fake | Real E2E Alternative |
|---------|---------------|----------------------|
| `grep "success" logs.txt` | Only proves code ran | Verify actual output file/UI/API response |
| `assert mock.called` | Tests mock, not real system | Use real integration, verify real data |
| `cat output.txt \| wc -l` | File exists ≠ correct content | Read file, assert exact expected content |
| "I ran it manually" | No automation = no evidence | Capture manual test as automated test |
| Check log for icon name | Observability, not verification | Screenshot + visual diff of rendered icon |
| Exit code 0 | Process succeeded ≠ output correct | Verify the actual output data |

The test: If removing the actual implementation still passes your "E2E test", it's fake.

Example of fake E2E that caught nothing:

```python
# FAKE E2E - only checks logs
def test_icon_theme_change():
    run_command("set-theme papirus")
    logs = read_logs()
    assert "papirus" in logs  # ❌ FAKE - only proves code ran
    # BUG: 89% of icons weren't changed, test still passed!
```

Real E2E that would have caught the bug:

```python
# REAL E2E - verifies actual output
def test_icon_theme_change():
    run_command("set-theme papirus")
    screenshot = capture_desktop()
    assert visual_diff(screenshot, "expected_papirus.png") < threshold  # ✅ REAL
    # This would have shown 89% of icons were wrong
```

</EXTREMELY-IMPORTANT>
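
The removal test ("if removing the actual implementation still passes your E2E test, it's fake") can be applied mechanically. The following is a minimal sketch, not part of the skill: the toy feature and the names `real_impl`, `gutted_impl`, `fake_e2e`, `real_e2e`, and `survives_removal` are all hypothetical. It shows a log-only check surviving a gutted implementation while an exact-content check does not.

```python
import tempfile
from pathlib import Path


def real_impl(out: Path, log: list) -> None:
    """Hypothetical feature: log that it ran, then write a greeting file."""
    log.append("greeting written")
    out.write_text("Hello, Ada!\n")


def gutted_impl(out: Path, log: list) -> None:
    """Same feature with the real work removed: it still logs, writes nothing useful."""
    log.append("greeting written")
    out.write_text("")


def fake_e2e(impl) -> bool:
    """Passes whenever the log line appears (observability, not verification)."""
    with tempfile.TemporaryDirectory() as tmp:
        log = []
        impl(Path(tmp) / "out.txt", log)
        return "greeting written" in log


def real_e2e(impl) -> bool:
    """Passes only when the output file has the exact expected content."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "out.txt"
        impl(out, [])
        return out.read_text() == "Hello, Ada!\n"


def survives_removal(check) -> bool:
    """A check that still passes against the gutted implementation is fake."""
    return check(gutted_impl)


print(survives_removal(fake_e2e))  # True: the log check is fake
print(survives_removal(real_e2e))  # False: the content check is real
```

Both checks pass against `real_impl`; only the content check notices when the implementation is gone.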

Browser Testing Decision Tree

┌─────────────────────────────────────────────────────────────────┐
│                    BROWSER TESTING REQUIRED?                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
        ┌─────────────────────────────────────────────┐
        │  Need to debug JS errors or API calls?       │
        │  (console.log, network requests, XHR)        │
        └─────────────────────────────────────────────┘
                    │                    │
                   YES                   NO
                    │                    │
                    ▼                    ▼
        ┌───────────────────┐  ┌──────────────────────────┐
        │   CHROME MCP      │  │  Running in CI/CD?        │
        │   (debugging)     │  │  (headless, automated)    │
        └───────────────────┘  └──────────────────────────┘
                                      │           │
                                     YES          NO
                                      │           │
                                      ▼           ▼
                        ┌──────────────┐  ┌───────────────────┐
                        │ PLAYWRIGHT   │  │ Cross-browser     │
                        │ MCP          │  │ needed?           │
                        └──────────────┘  └───────────────────┘
                                                │          │
                                               YES         NO
                                                │          │
                                                ▼          ▼
                                    ┌──────────────┐ ┌────────────┐
                                    │ PLAYWRIGHT   │ │ Either OK  │
                                    │ MCP          │ │ (prefer    │
                                    └──────────────┘ │ Playwright)│
                                                     └────────────┘

<EXTREMELY-IMPORTANT>
### Iron Laws: Browser MCP Selection

YOU MUST USE CHROME MCP FOR API/CONSOLE DEBUGGING. NO EXCEPTIONS.
YOU MUST USE PLAYWRIGHT MCP FOR CI/CD TESTING. NO EXCEPTIONS.

Quick Decision Table

| Need | Tool | Why |
|------|------|-----|
| Debug console errors | Chrome MCP | `read_console_messages` |
| Inspect API calls/responses | Chrome MCP | `read_network_requests` |
| Execute custom JS in page | Chrome MCP | `javascript_tool` |
| Record interaction as GIF | Chrome MCP | `gif_creator` |
| Headless/CI automation | Playwright MCP | Headless mode |
| Cross-browser testing | Playwright MCP | Firefox/WebKit support |
| Standard E2E suite | Playwright MCP | Test isolation, maturity |
| Interactive debugging | Chrome MCP | Real browser, console access |

Capability Comparison

| Capability | Playwright MCP | Chrome MCP |
|------------|---------------|------------|
| Navigate/click/type | ✅ | ✅ |
| Accessibility tree | ✅ `browser_snapshot` | ✅ `read_page` |
| Screenshots | ✅ | ✅ |
| Console messages | ❌ | ✅ `read_console_messages` |
| Network requests | ❌ | ✅ `read_network_requests` |
| JavaScript execution | ❌ | ✅ `javascript_tool` |
| GIF recording | ❌ | ✅ `gif_creator` |
| Headless mode | ✅ | ❌ (requires visible browser) |
| Cross-browser | ✅ (Chromium/Firefox/WebKit) | ❌ (Chrome only) |
| Natural language find | ❌ | ✅ `find` |

Browser MCP Facts

- Playwright MCP cannot read console messages or network requests, so debugging those requires Chrome MCP (`read_console_messages`, `read_network_requests`). Chrome MCP cannot run headless, so CI/CD requires Playwright MCP. Choosing by familiarity instead of by these constraints produces a test that cannot observe what it claims to verify.
- An API response inferred from code is not the real response; real data differs. Asserting behavior from inference instead of `read_network_requests` is an unverified claim presented as fact.
</EXTREMELY-IMPORTANT>
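
The browser decision tree can be written as a small routing function. This is a sketch only: `BrowserTool` and `choose_browser_tool` are hypothetical names, and the three boolean inputs are one assumed way of encoding the tree's questions. The branches are checked in the same order as the tree.

```python
from enum import Enum


class BrowserTool(Enum):
    CHROME_MCP = "Chrome MCP"
    PLAYWRIGHT_MCP = "Playwright MCP"


def choose_browser_tool(
    needs_console_or_network: bool,
    in_ci: bool,
    cross_browser: bool,
) -> BrowserTool:
    """Walk the decision tree top to bottom and return the selected tool."""
    # First question: debugging JS errors or API calls?
    if needs_console_or_network:
        return BrowserTool.CHROME_MCP
    # Second question: running headless in CI/CD?
    if in_ci:
        return BrowserTool.PLAYWRIGHT_MCP
    # Third question: cross-browser coverage needed?
    if cross_browser:
        return BrowserTool.PLAYWRIGHT_MCP
    # Leaf "Either OK (prefer Playwright)"
    return BrowserTool.PLAYWRIGHT_MCP


print(choose_browser_tool(True, False, False).value)   # Chrome MCP
print(choose_browser_tool(False, True, False).value)   # Playwright MCP
```

The sketch follows the tree literally, so a request that needs console or network debugging inside CI still routes to Chrome MCP, which cannot run headless according to the comparison table. Treat that combination as a case to resolve by hand rather than something the function decides.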

Platform Detection

Detect the operating system and display server to select the appropriate testing tool:

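```bash
# Detect platform for desktop automation
case "$(uname -s)" in
    Darwin) echo "macOS - use dev-test-hammerspoon" ;;
    Linux)
        # ${VAR:-} keeps the test safe when the variable is unset
        if [ "${XDG_SESSION_TYPE:-}" = "wayland" ]; then
            echo "Linux/Wayland - use dev-test-linux (ydotool)"
        else
            echo "Linux/X11 - use dev-test-linux (xdotool)"
        fi
        ;;
    # Anything else (e.g. Windows) has no desktop automation sub-skill
    *) echo "Unsupported platform - desktop automation not available" >&2 ;;
esac
```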

Desktop Automation Decision Tree

┌─────────────────────────────────────────────────────────────────┐
│                   DESKTOP AUTOMATION REQUIRED?                   │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │   Platform?      │
                    └─────────────────┘
                    /        |         \
                 macOS    Linux      Windows
                   │         │           │
                   ▼         ▼           ▼
        ┌──────────────┐ ┌─────────┐ ┌───────────┐
        │ HAMMERSPOON  │ │ LINUX   │ │ NOT       │
        │ (dev-test-   │ │ (dev-   │ │ SUPPORTED │
        │ hammerspoon) │ │ test-   │ └───────────┘
        └──────────────┘ │ linux)  │
                         └─────────┘
                              │
                    ┌─────────┴─────────┐
                    │   Display Server?  │
                    └───────────────────┘
                         /         \
                    Wayland        X11
                       │            │
                       ▼            ▼
                 ┌──────────┐ ┌──────────┐
                 │ ydotool  │ │ xdotool  │
                 └──────────┘ └──────────┘
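
For test harnesses written in Python, the same platform routing can be expressed as a pure function. This is a hedged sketch: `desktop_skill` and `detect_desktop_skill` are hypothetical names, and raising `RuntimeError` for unsupported platforms is an assumed way of representing the "NOT SUPPORTED" leaf.

```python
import os
import platform


def desktop_skill(system: str, session_type: str = "") -> str:
    """Map an OS name (as returned by platform.system()) to a sub-skill."""
    if system == "Darwin":
        return "dev-test-hammerspoon"
    if system == "Linux":
        # Wayland sessions use ydotool; everything else is treated as X11.
        tool = "ydotool" if session_type == "wayland" else "xdotool"
        return f"dev-test-linux ({tool})"
    raise RuntimeError(f"desktop automation is not supported on {system!r}")


def detect_desktop_skill() -> str:
    """Apply the mapping to the machine this code is running on."""
    return desktop_skill(platform.system(), os.environ.get("XDG_SESSION_TYPE", ""))


print(desktop_skill("Linux", "wayland"))  # dev-test-linux (ydotool)
```

Keeping the mapping separate from the environment lookup lets the routing be tested on any machine.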

Sub-Skills Reference

<EXTREMELY-IMPORTANT>
### Tool Availability Gate

Verify tools are available BEFORE proceeding. Missing tools = FULL STOP.

Each sub-skill has its own availability gate. Load the appropriate skill and follow its gate.
</EXTREMELY-IMPORTANT>
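
As an illustration of what a gate of this kind can look like for command-line tools, the sketch below checks that executables are on `PATH` and stops otherwise. It is an assumption, not the gate defined by any sub-skill: `missing_tools` and `availability_gate` are hypothetical names, a `PATH` lookup says nothing about MCP servers, and the gate in the loaded sub-skill remains authoritative.

```python
import shutil


def missing_tools(required):
    """Return the names in `required` that are not found on PATH."""
    return [name for name in required if shutil.which(name) is None]


def availability_gate(required):
    """Stop the run when any required tool is missing; otherwise return None."""
    missing = missing_tools(required)
    if missing:
        raise SystemExit(f"FULL STOP - missing tools: {', '.join(missing)}")


# Example: report which of the Linux automation tools are absent here.
print(missing_tools(["xdotool", "ydotool"]))
```

Calling `availability_gate(["ydotool"])` before any Wayland test would end the run with a clear message rather than letting later steps fail in confusing ways.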

Browser Testing

| Skill | Use Case | Key Capabilities |
|-------|----------|------------------|
| `skills/dev-test-chrome/SKILL.md` (via cache lookup) | Debugging, console/network inspection | `read_console_messages`, `read_network_requests`, `javascript_tool` |
| `skills/dev-test-playwright/SKILL.md` (via cache lookup) | CI/CD, headless, cross-browser E2E | Headless mode, Firefox/WebKit, test isolation |

Desktop Automation

| Skill | Platform | Primary Tool |
|-------|----------|--------------|
| `skills/dev-test-hammerspoon/SKILL.md` (via cache lookup) | macOS | Hammerspoon (`hs`) |
| `skills/dev-test-linux/SKILL.md` (via cache lookup) | Linux | ydotool (Wayland) / xdotool (X11) |

Unit & Integration Tests

Test Discovery

Locate test directories and identify the test framework used in the project:

```bash
# Find test directory
ls -d tests/ test/ spec/ __tests__/ 2>/dev/null

# Find test framework (grep reads the file directly; missing files are silenced)
grep -E "(test|jest)" package.json 2>/dev/null
grep -i pytest pyproject.toml 2>/dev/null
grep -i "\[dev-dependencies\]" Cargo.toml 2>/dev/null
grep -i test meson.build 2>/dev/null
```

Common Test Frameworks

| Language | Framework | Command |
|----------|-----------|---------|
| Python | pytest | `pytest tests/ -v` |
| JavaScript | jest | `npm test` |
| TypeScript | vitest | `npx vitest` |
| Rust | cargo | `cargo test` |
| C/C++ | meson | `meson test -C build -v` |
| Go | go test | `go test ./...` |
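
Discovery and the command table can be combined into one lookup. The sketch below is illustrative: `detect_test_commands` is a hypothetical helper, and its detection rules are assumptions (a `test` script in `package.json`, the word `pytest` in `pyproject.toml`, the mere presence of `Cargo.toml` or `go.mod`, a `test(` call in `meson.build`). Real projects may need stricter checks.

```python
import json
from pathlib import Path


def detect_test_commands(root):
    """Return test commands suggested by the marker files found under `root`."""
    root = Path(root)
    commands = []

    pyproject = root / "pyproject.toml"
    if pyproject.is_file() and "pytest" in pyproject.read_text(encoding="utf-8").lower():
        commands.append("pytest tests/ -v")

    package = root / "package.json"
    if package.is_file():
        try:
            scripts = json.loads(package.read_text(encoding="utf-8")).get("scripts", {})
        except json.JSONDecodeError:
            scripts = {}  # unreadable manifest: make no claim about it
        if "test" in scripts:
            commands.append("npm test")

    if (root / "Cargo.toml").is_file():
        commands.append("cargo test")

    if (root / "go.mod").is_file():
        commands.append("go test ./...")

    meson = root / "meson.build"
    if meson.is_file() and "test(" in meson.read_text(encoding="utf-8"):
        commands.append("meson test -C build -v")

    return commands
```

Running `detect_test_commands(".")` at a repository root returns a list such as `["pytest tests/ -v"]`; an empty list means none of the assumed markers were found, not that the project has no tests.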

CLI Application Testing

Execute CLI applications with test inputs and verify outputs against expected results:

```bash
# Run with test inputs
./app --test-mode input.txt > output.txt

# Compare to expected
diff expected.txt output.txt

# Check exit code
./app --validate file && echo "PASS" || echo "FAIL"
```
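
The same run-and-compare check can be automated so that exit code and output are verified together, since exit code 0 alone does not show the output is correct. This is a minimal sketch: `run_and_diff` is a hypothetical helper, and the Python interpreter stands in for `./app` so the example runs anywhere.

```python
import difflib
import subprocess
import sys


def run_and_diff(cmd, expected: str):
    """Run `cmd`, then return (exit_code, unified diff of expected vs. stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    diff = list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            result.stdout.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
    return result.returncode, diff


# Stand-in for `./app --test-mode input.txt`: a command with known output.
code, diff = run_and_diff([sys.executable, "-c", "print('hello')"], "hello\n")
print("PASS" if code == 0 and not diff else "FAIL")
```

A non-empty `diff` can be pasted into the test-run record as evidence of exactly what differed.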

Output Requirements

Every test run MUST be documented in LEARNINGS.md:

## Test Run: [Description]

**Tool:** [Chrome MCP / Playwright / Hammerspoon / ydotool / pytest / etc.]

**Command:**
```bash
pytest tests/ -v
```

**Output:**

    tests/test_feature.py::test_basic PASSED
    tests/test_feature.py::test_edge_case PASSED
    tests/test_feature.py::test_error FAILED

    1 failed, 2 passed

**Result:** 2/3 PASS, 1 FAIL

**Next:** Fix test_error failure


## Integration

For TDD protocol (RED-GREEN-REFACTOR), see:

Read `${CLAUDE_SKILL_DIR}/../../skills/dev-tdd/SKILL.md` and follow its instructions.

This skill is invoked by Task agents during `dev-implement` phase.

