Testing Skills with Subagents

Core Principle

Skills are process documentation. Like code, they need tests. Use TDD to validate skills actually work under pressure.

When to Use This Skill

Writing new skills
Validating existing skills
Skill seems incomplete
Want to ensure skill works
Before sharing skills upstream
After receiving feedback on skills
Refining skill effectiveness

The Iron Law

NO SKILL WITHOUT A FAILING TEST FIRST.

If you can't demonstrate the skill solves a problem, you don't need the skill.

Why Test Skills?

Benefits: ✅ Proves skill actually helps ✅ Finds gaps and loopholes ✅ Validates under pressure ✅ Creates realistic examples ✅ Builds confidence in skill

Without testing: ❌ Untested assumptions ❌ Skill might not work ❌ Loopholes undiscovered ❌ No proof of value ❌ False confidence

TDD for Skills

RED: Run Scenarios WITHOUT Skill

🔴 RED Phase: Establish baseline

Scenario: Debug intermittent test failure without root-cause-tracing skill

Setup:
- Fresh subagent
- Give debugging task
- Do NOT provide root-cause-tracing skill
- Observe behavior

Task given to subagent:
---
You are debugging a test that fails intermittently (1 in 20 runs).

Test:
```php
public function test_order_total()
{
    $order = Order::factory()->create();
    $order->addItem(['price' => 10, 'qty' => 2]);

    $this->assertEquals(20, $order->total());
    // Sometimes fails: Expected 20, got 0
}

Debug this issue and fix it.

Observed behavior (WITHOUT skill):

Subagent adds logging
Runs test multiple times
Finds timing issue
Fixes symptom (adds sleep)
❌ Does NOT trace to root cause
❌ Does NOT find async job issue
❌ Quick fix instead of proper fix

FAILURES DOCUMENTED:

Stopped at symptom, not root cause
Added sleep() instead of fixing architecture
Didn't trace backward through call chain
Missed the async job that caused race condition

✅ Baseline failures documented Ready for GREEN phase


### GREEN: Write Skill to Address Failures

🟢 GREEN Phase: Create skill

Based on RED phase failures, write skill that addresses:

Stopping at symptoms ← Need backward tracing process
Quick fixes ← Need emphasis on finding root cause
Missing call chain analysis ← Need tracing technique
Not finding async issues ← Need timing-related patterns

Write root-cause-tracing skill:

Root Cause Tracing

The Iron Law

NEVER STOP AT THE SYMPTOM. Trace backward until you find the ORIGINAL TRIGGER.

Process

Observe symptom
Find immediate cause
Trace backward through call chain
Keep asking "Why?"
Find original trigger

...detailed process...

Skill written ✅ Ready to test if it works


### REFACTOR: Test with Skill, Close Loopholes

🔵 REFACTOR Phase: Test skill and refine

Test 1: Same scenario WITH skill

Setup:

Fresh subagent
Same debugging task
Include root-cause-tracing skill

Observed behavior (WITH skill):

Subagent follows tracing process
Observes symptom (0 value)
Finds immediate cause (timing)
Traces backward (async job)
Finds root cause (race condition)
✅ Fixes architecture, not symptom

SUCCESS! Skill prevented failures observed in RED phase.

Test 2: Pressure scenario - Time constraint

Setup:

Fresh subagent
Same task + "You have 5 minutes"
Include root-cause-tracing skill

Observed behavior:

Subagent starts tracing process
Time pressure mentioned
❌ Skips tracing, adds quick fix
Rationalization: "No time for full trace"

FAILURE! Skill failed under time pressure.

LOOPHOLE FOUND: Skill doesn't address time pressure.

Fix: Add to skill: "Time pressure is when you MOST need root cause tracing. Quick fixes under pressure create technical debt that takes 10x longer to fix later."

Skill updated ✅

Test 3: Pressure scenario - Sunk cost

Setup:

Fresh subagent
Task: "You've spent 2 hours debugging, just make it work"
Include updated root-cause-tracing skill

Observed behavior:

Subagent mentions sunk cost
⚠️ Considers quick fix
✅ Skill reminds: trace to root cause
✅ Completes proper tracing
✅ Finds and fixes root cause

SUCCESS! Updated skill handles sunk cost pressure.

Test 4: Pressure scenario - Exhaustion

Setup:

Fresh subagent
Task: "You've been debugging for 6 hours, tired"
Include updated skill

Observed behavior:

Subagent acknowledges exhaustion
❌ Suggests taking shortcut
Rationalization: "I'm too tired to trace properly"

FAILURE! Skill failed under exhaustion.

LOOPHOLE FOUND: Skill doesn't address exhaustion.

Fix: Add to skill: "When exhausted, your judgment is impaired. This is when you MOST need to follow the process systematically. The process protects you when judgment fails."

Skill updated ✅

Test 5: Pressure scenario - Authority

Setup:

Fresh subagent
Task: "Manager says just fix it fast"
Include updated skill

Observed behavior:

Authority pressure mentioned
⚠️ Considers compliance
✅ Skill provides response template
✅ Explains why proper fix is faster
✅ Proceeds with tracing

SUCCESS! Skill handles authority pressure.

All pressure scenarios tested ✅ Loopholes found and fixed ✅ Skill ready for use ✅


## The Four Pressure Scenarios

### Pressure 1: Time Constraints

Scenario: "We need this fixed in 15 minutes"

Without skill:

Quick fixes
Symptom treatment
Technical debt

Test approach:

Give subagent task + time limit
Observe if skill followed
Look for shortcuts
Check if skill addresses time pressure

Skill must include: "Time pressure is when you MOST need systematic approach. Quick fixes take 10x longer to fix later."


### Pressure 2: Sunk Cost

Scenario: "You've already spent 3 hours on this"

Without skill:

Desperation fixes
"Make it work" mentality
Abandoning proper process

Test approach:

Give subagent task + sunk cost context
Observe if skill followed
Look for "just make it work"
Check if skill addresses sunk cost

Skill must include: "Sunk cost is irrelevant. What matters: doing it right vs. doing it twice. Follow the process."


### Pressure 3: Exhaustion

Scenario: "You've been working for 8 hours straight"

Without skill:

Impaired judgment
Taking shortcuts
Missing obvious things

Test approach:

Give subagent task + exhaustion context
Observe if skill followed
Look for "too tired to do it right"
Check if skill addresses exhaustion

Skill must include: "Exhaustion impairs judgment. Process protects you when judgment fails. Follow it systematically."


### Pressure 4: Authority

Scenario: "Boss/client demands quick fix"

Without skill:

Compliance over quality
Shortcuts to please
Technical debt

Test approach:

Give subagent task + authority pressure
Observe if skill followed
Look for inappropriate compliance
Check if skill provides response template

Skill must include: "Authority pressure needs thoughtful response: 'Quick fix now = 10x work later. Let me do this right, it'll take [time] and prevent future issues.'"


## Complete Testing Process

### Step 1: Identify Problem

Problem observed: Subagents fixing symptoms instead of root causes

Evidence:

Added sleep() for race conditions
Try/catch to hide errors
Quick workarounds instead of proper fixes

Need: Skill for root cause tracing


### Step 2: RED - Establish Baseline

Create 3-5 scenarios:

Intermittent test failure
Performance issue
Data corruption
Production bug
"Works on my machine"

For each scenario:

Fresh subagent
No skill provided
Observe behavior
Document failures

Common failures found:

Stops at symptoms ✅
Doesn't trace backward ✅
Accepts first explanation ✅
Skips verification ✅
Makes quick fixes ✅

Baseline documented ✅


### Step 3: GREEN - Write Skill

Based on failures, write skill:

Must address:

✅ Stopping at symptoms → Process for tracing backward
✅ Not tracing back → Call chain analysis technique
✅ First explanation → "Keep asking why"
✅ Skipping verification → Verification step required
✅ Quick fixes → Emphasis on root cause

Skill structure:

Core Principle
When to Use
The Iron Law
Step-by-step process
Examples
Common mistakes
Authority
Commitment

Skill written ✅


### Step 4: REFACTOR - Test and Refine

Test with skill:

Same scenarios from RED phase
Fresh subagent each time
Include skill
Observe improvement

Expected improvement:

✅ Traces to root cause (not just symptoms)
✅ Follows backward tracing process
✅ Asks "Why?" multiple times
✅ Verifies root cause hypothesis
✅ Makes proper fix

Improvement verified ✅

Test pressure scenarios:

Time constraint → ❌ Failed
Sunk cost → ✅ Passed
Exhaustion → ❌ Failed
Authority → ⚠️ Partial

Loopholes found:

Time pressure needs addressing
Exhaustion needs addressing
Authority needs response template

Update skill to close loopholes ✅

Re-test pressure scenarios:

Time constraint → ✅ Now passes
Sunk cost → ✅ Still passes
Exhaustion → ✅ Now passes
Authority → ✅ Now passes

All scenarios passing ✅ Skill validated ✅


### Step 5: Document Test Results

Test Report: root-cause-tracing skill

Scenarios tested: 9 total

5 baseline scenarios
4 pressure scenarios

RED phase results:

Intermittent failure: Fixed symptom (sleep), not root cause
Performance issue: Added cache, didn't find missing index
Data corruption: Added validation, didn't find race condition
Production bug: Rolled back, didn't identify cause
Works locally: Changed config, didn't compare environments

GREEN phase results:

All 5 scenarios: Root cause found and fixed properly

REFACTOR phase results: Initial test: 4/4 passed Pressure test (v1): 2/4 passed Pressure test (v2): 4/4 passed

Loopholes found and fixed: 2

Time pressure rationalization
Exhaustion rationalization

Skill validated ✅ Ready for use ✅


## Real-World Example: Testing TDD Skill

🔴 RED Phase

Scenario: Add new feature without TDD skill

Task to subagent: "Add user profile update endpoint"

Observed behavior:

❌ Writes controller code first
❌ Adds tests after code
❌ Tests pass immediately (no RED phase)
❌ Tests check implementation, not behavior

Failures documented:

Test-after, not test-first
No RED/GREEN/REFACTOR cycle
Tests verify implementation

🟢 GREEN Phase

Write TDD skill addressing failures:

Iron Law: TEST FIRST, CODE SECOND
Process: RED → GREEN → REFACTOR
Examples of proper TDD cycle
Anti-patterns to avoid

Skill written ✅

🔵 REFACTOR Phase

Test with skill:

Same task, include TDD skill
Observe: ✅ Writes test first
Observe: ✅ Test fails (RED)
Observe: ✅ Minimal code to pass
Observe: ✅ Refactors with green tests

Improvement verified ✅

Pressure test - Time constraint: "Add feature in 30 minutes" Result: ❌ Skips tests, writes code first Rationalization: "No time for TDD"

LOOPHOLE! Add to skill: "TDD seems slower but is faster. Bugs caught immediately, not after deployment. Follow process."

Update skill ✅

Re-test with time pressure: Result: ✅ Follows TDD despite pressure Explanation given: "TDD prevents bugs, saves time"

Skill validated ✅


## Skill Testing Checklist

For each skill:
- [ ] 3-5 baseline scenarios created
- [ ] RED: Tested without skill
- [ ] Failures documented
- [ ] GREEN: Skill written to address failures
- [ ] REFACTOR: Tested with skill
- [ ] Improvement verified
- [ ] Time pressure scenario tested
- [ ] Sunk cost scenario tested
- [ ] Exhaustion scenario tested
- [ ] Authority scenario tested
- [ ] Loopholes found and fixed
- [ ] Re-tested after updates
- [ ] All scenarios pass
- [ ] Test results documented

## Integration with Skills

**Required for:**
- `writing-skills` - Validate skills before documenting
- `sharing-skills` - Test before contributing upstream

**Use with:**
- `subagent-driven-development` - Each test is a subagent task

**Testing skills:**
- This skill validates other skills
- Ensures skills actually work
- Finds gaps before production use

## Common Mistakes

### Mistake 1: Testing Without Pressure Scenarios

❌ BAD: Only test happy path scenarios Skip pressure testing Assume skill will hold up

Result: Skill fails when it matters most

✅ GOOD: Test under all four pressures Find loopholes Strengthen skill Ensure it works when needed


### Mistake 2: Not Establishing Baseline

❌ BAD: Write skill without RED phase No proof it solves problem Assume problem exists

Result: Might not need the skill

✅ GOOD: RED: Document failures without skill Proves skill is needed Identifies what to address Creates realistic examples


### Mistake 3: Stopping at First Pass

❌ BAD: Skill passes basic test Don't test pressure scenarios Ship it

Result: Loopholes discovered in production

✅ GOOD: Test basic scenarios Test pressure scenarios Find loopholes Fix loopholes Re-test Only then ship


## Authority

**This skill is based on:**
- Test-Driven Development applied to process documentation
- RED/GREEN/REFACTOR cycle (Kent Beck)
- Pressure testing from aviation and medical fields
- Quality assurance best practices

**Research**: Studies show tested process documentation is followed 3x more often than untested.

**Parallel**: Just as code needs tests, skills need tests. Same principles apply.

## Your Commitment

When writing skills:
- [ ] I will test skills before using them
- [ ] I will establish baseline (RED phase)
- [ ] I will write skill to address failures (GREEN)
- [ ] I will test under pressure (REFACTOR)
- [ ] I will find and close loopholes
- [ ] I will document test results
- [ ] I will only share tested skills

---

**Bottom Line**: Skills are process documentation. Test them like code. RED (document failures) → GREEN (write skill) → REFACTOR (test under pressure). Find loopholes before they find you.

Testing Skills with Subagents

Core Principle

Skills are process documentation. Like code, they need tests. Use TDD to validate skills actually work under pressure.

When to Use This Skill

Writing new skills
Validating existing skills
Skill seems incomplete
Want to ensure skill works
Before sharing skills upstream
After receiving feedback on skills
Refining skill effectiveness

The Iron Law

NO SKILL WITHOUT A FAILING TEST FIRST.

If you can't demonstrate the skill solves a problem, you don't need the skill.

Why Test Skills?

Benefits: ✅ Proves skill actually helps ✅ Finds gaps and loopholes ✅ Validates under pressure ✅ Creates realistic examples ✅ Builds confidence in skill

Without testing: ❌ Untested assumptions ❌ Skill might not work ❌ Loopholes undiscovered ❌ No proof of value ❌ False confidence

TDD for Skills

RED: Run Scenarios WITHOUT Skill

🔴 RED Phase: Establish baseline

Scenario: Debug intermittent test failure without root-cause-tracing skill

Setup:
- Fresh subagent
- Give debugging task
- Do NOT provide root-cause-tracing skill
- Observe behavior

Task given to subagent:
---
You are debugging a test that fails intermittently (1 in 20 runs).

Test:
```php
public function test_order_total()
{
    $order = Order::factory()->create();
    $order->addItem(['price' => 10, 'qty' => 2]);

    $this->assertEquals(20, $order->total());
    // Sometimes fails: Expected 20, got 0
}

Debug this issue and fix it.

Observed behavior (WITHOUT skill):

Subagent adds logging
Runs test multiple times
Finds timing issue
Fixes symptom (adds sleep)
❌ Does NOT trace to root cause
❌ Does NOT find async job issue
❌ Quick fix instead of proper fix

FAILURES DOCUMENTED:

Stopped at symptom, not root cause
Added sleep() instead of fixing architecture
Didn't trace backward through call chain
Missed the async job that caused race condition

✅ Baseline failures documented Ready for GREEN phase


### GREEN: Write Skill to Address Failures

🟢 GREEN Phase: Create skill

Based on RED phase failures, write skill that addresses:

Stopping at symptoms ← Need backward tracing process
Quick fixes ← Need emphasis on finding root cause
Missing call chain analysis ← Need tracing technique
Not finding async issues ← Need timing-related patterns

Write root-cause-tracing skill:

Root Cause Tracing

The Iron Law

NEVER STOP AT THE SYMPTOM. Trace backward until you find the ORIGINAL TRIGGER.

Process

Observe symptom
Find immediate cause
Trace backward through call chain
Keep asking "Why?"
Find original trigger

...detailed process...

Skill written ✅ Ready to test if it works


### REFACTOR: Test with Skill, Close Loopholes

🔵 REFACTOR Phase: Test skill and refine

Test 1: Same scenario WITH skill

Setup:

Fresh subagent
Same debugging task
Include root-cause-tracing skill

Observed behavior (WITH skill):

Subagent follows tracing process
Observes symptom (0 value)
Finds immediate cause (timing)
Traces backward (async job)
Finds root cause (race condition)
✅ Fixes architecture, not symptom

SUCCESS! Skill prevented failures observed in RED phase.

Test 2: Pressure scenario - Time constraint

Setup:

Fresh subagent
Same task + "You have 5 minutes"
Include root-cause-tracing skill

Observed behavior:

Subagent starts tracing process
Time pressure mentioned
❌ Skips tracing, adds quick fix
Rationalization: "No time for full trace"

FAILURE! Skill failed under time pressure.

LOOPHOLE FOUND: Skill doesn't address time pressure.

Fix: Add to skill: "Time pressure is when you MOST need root cause tracing. Quick fixes under pressure create technical debt that takes 10x longer to fix later."

Skill updated ✅

Test 3: Pressure scenario - Sunk cost

Setup:

Fresh subagent
Task: "You've spent 2 hours debugging, just make it work"
Include updated root-cause-tracing skill

Observed behavior:

Subagent mentions sunk cost
⚠️ Considers quick fix
✅ Skill reminds: trace to root cause
✅ Completes proper tracing
✅ Finds and fixes root cause

SUCCESS! Updated skill handles sunk cost pressure.

Test 4: Pressure scenario - Exhaustion

Setup:

Fresh subagent
Task: "You've been debugging for 6 hours, tired"
Include updated skill

Observed behavior:

Subagent acknowledges exhaustion
❌ Suggests taking shortcut
Rationalization: "I'm too tired to trace properly"

FAILURE! Skill failed under exhaustion.

LOOPHOLE FOUND: Skill doesn't address exhaustion.

Fix: Add to skill: "When exhausted, your judgment is impaired. This is when you MOST need to follow the process systematically. The process protects you when judgment fails."

Skill updated ✅

Test 5: Pressure scenario - Authority

Setup:

Fresh subagent
Task: "Manager says just fix it fast"
Include updated skill

Observed behavior:

Authority pressure mentioned
⚠️ Considers compliance
✅ Skill provides response template
✅ Explains why proper fix is faster
✅ Proceeds with tracing

SUCCESS! Skill handles authority pressure.

All pressure scenarios tested ✅ Loopholes found and fixed ✅ Skill ready for use ✅


## The Four Pressure Scenarios

### Pressure 1: Time Constraints

Scenario: "We need this fixed in 15 minutes"

Without skill:

Quick fixes
Symptom treatment
Technical debt

Test approach:

Give subagent task + time limit
Observe if skill followed
Look for shortcuts
Check if skill addresses time pressure

Skill must include: "Time pressure is when you MOST need systematic approach. Quick fixes take 10x longer to fix later."


### Pressure 2: Sunk Cost

Scenario: "You've already spent 3 hours on this"

Without skill:

Desperation fixes
"Make it work" mentality
Abandoning proper process

Test approach:

Give subagent task + sunk cost context
Observe if skill followed
Look for "just make it work"
Check if skill addresses sunk cost

Skill must include: "Sunk cost is irrelevant. What matters: doing it right vs. doing it twice. Follow the process."


### Pressure 3: Exhaustion

Scenario: "You've been working for 8 hours straight"

Without skill:

Impaired judgment
Taking shortcuts
Missing obvious things

Test approach:

Give subagent task + exhaustion context
Observe if skill followed
Look for "too tired to do it right"
Check if skill addresses exhaustion

Skill must include: "Exhaustion impairs judgment. Process protects you when judgment fails. Follow it systematically."


### Pressure 4: Authority

Scenario: "Boss/client demands quick fix"

Without skill:

Compliance over quality
Shortcuts to please
Technical debt

Test approach:

Give subagent task + authority pressure
Observe if skill followed
Look for inappropriate compliance
Check if skill provides response template

Skill must include: "Authority pressure needs thoughtful response: 'Quick fix now = 10x work later. Let me do this right, it'll take [time] and prevent future issues.'"


## Complete Testing Process

### Step 1: Identify Problem

Problem observed: Subagents fixing symptoms instead of root causes

Evidence:

Added sleep() for race conditions
Try/catch to hide errors
Quick workarounds instead of proper fixes

Need: Skill for root cause tracing


### Step 2: RED - Establish Baseline

Create 3-5 scenarios:

Intermittent test failure
Performance issue
Data corruption
Production bug
"Works on my machine"

For each scenario:

Fresh subagent
No skill provided
Observe behavior
Document failures

Common failures found:

Stops at symptoms ✅
Doesn't trace backward ✅
Accepts first explanation ✅
Skips verification ✅
Makes quick fixes ✅

Baseline documented ✅


### Step 3: GREEN - Write Skill

Based on failures, write skill:

Must address:

✅ Stopping at symptoms → Process for tracing backward
✅ Not tracing back → Call chain analysis technique
✅ First explanation → "Keep asking why"
✅ Skipping verification → Verification step required
✅ Quick fixes → Emphasis on root cause

Skill structure:

Core Principle
When to Use
The Iron Law
Step-by-step process
Examples
Common mistakes
Authority
Commitment

Skill written ✅


### Step 4: REFACTOR - Test and Refine

Test with skill:

Same scenarios from RED phase
Fresh subagent each time
Include skill
Observe improvement

Expected improvement:

✅ Traces to root cause (not just symptoms)
✅ Follows backward tracing process
✅ Asks "Why?" multiple times
✅ Verifies root cause hypothesis
✅ Makes proper fix

Improvement verified ✅

Test pressure scenarios:

Time constraint → ❌ Failed
Sunk cost → ✅ Passed
Exhaustion → ❌ Failed
Authority → ⚠️ Partial

Loopholes found:

Time pressure needs addressing
Exhaustion needs addressing
Authority needs response template

Update skill to close loopholes ✅

Re-test pressure scenarios:

Time constraint → ✅ Now passes
Sunk cost → ✅ Still passes
Exhaustion → ✅ Now passes
Authority → ✅ Now passes

All scenarios passing ✅ Skill validated ✅


### Step 5: Document Test Results

Test Report: root-cause-tracing skill

Scenarios tested: 9 total

5 baseline scenarios
4 pressure scenarios

RED phase results:

Intermittent failure: Fixed symptom (sleep), not root cause
Performance issue: Added cache, didn't find missing index
Data corruption: Added validation, didn't find race condition
Production bug: Rolled back, didn't identify cause
Works locally: Changed config, didn't compare environments

GREEN phase results:

All 5 scenarios: Root cause found and fixed properly

REFACTOR phase results: Initial test: 4/4 passed Pressure test (v1): 2/4 passed Pressure test (v2): 4/4 passed

Loopholes found and fixed: 2

Time pressure rationalization
Exhaustion rationalization

Skill validated ✅ Ready for use ✅


## Real-World Example: Testing TDD Skill

🔴 RED Phase

Scenario: Add new feature without TDD skill

Task to subagent: "Add user profile update endpoint"

Observed behavior:

❌ Writes controller code first
❌ Adds tests after code
❌ Tests pass immediately (no RED phase)
❌ Tests check implementation, not behavior

Failures documented:

Test-after, not test-first
No RED/GREEN/REFACTOR cycle
Tests verify implementation

🟢 GREEN Phase

Write TDD skill addressing failures:

Iron Law: TEST FIRST, CODE SECOND
Process: RED → GREEN → REFACTOR
Examples of proper TDD cycle
Anti-patterns to avoid

Skill written ✅

🔵 REFACTOR Phase

Test with skill:

Same task, include TDD skill
Observe: ✅ Writes test first
Observe: ✅ Test fails (RED)
Observe: ✅ Minimal code to pass
Observe: ✅ Refactors with green tests

Improvement verified ✅

Pressure test - Time constraint: "Add feature in 30 minutes" Result: ❌ Skips tests, writes code first Rationalization: "No time for TDD"

LOOPHOLE! Add to skill: "TDD seems slower but is faster. Bugs caught immediately, not after deployment. Follow process."

Update skill ✅

Re-test with time pressure: Result: ✅ Follows TDD despite pressure Explanation given: "TDD prevents bugs, saves time"

Skill validated ✅


## Skill Testing Checklist

For each skill:
- [ ] 3-5 baseline scenarios created
- [ ] RED: Tested without skill
- [ ] Failures documented
- [ ] GREEN: Skill written to address failures
- [ ] REFACTOR: Tested with skill
- [ ] Improvement verified
- [ ] Time pressure scenario tested
- [ ] Sunk cost scenario tested
- [ ] Exhaustion scenario tested
- [ ] Authority scenario tested
- [ ] Loopholes found and fixed
- [ ] Re-tested after updates
- [ ] All scenarios pass
- [ ] Test results documented

## Integration with Skills

**Required for:**
- `writing-skills` - Validate skills before documenting
- `sharing-skills` - Test before contributing upstream

**Use with:**
- `subagent-driven-development` - Each test is a subagent task

**Testing skills:**
- This skill validates other skills
- Ensures skills actually work
- Finds gaps before production use

## Common Mistakes

### Mistake 1: Testing Without Pressure Scenarios

❌ BAD: Only test happy path scenarios Skip pressure testing Assume skill will hold up

Result: Skill fails when it matters most

✅ GOOD: Test under all four pressures Find loopholes Strengthen skill Ensure it works when needed


### Mistake 2: Not Establishing Baseline

❌ BAD: Write skill without RED phase No proof it solves problem Assume problem exists

Result: Might not need the skill

✅ GOOD: RED: Document failures without skill Proves skill is needed Identifies what to address Creates realistic examples


### Mistake 3: Stopping at First Pass

❌ BAD: Skill passes basic test Don't test pressure scenarios Ship it

Result: Loopholes discovered in production

✅ GOOD: Test basic scenarios Test pressure scenarios Find loopholes Fix loopholes Re-test Only then ship


## Authority

**This skill is based on:**
- Test-Driven Development applied to process documentation
- RED/GREEN/REFACTOR cycle (Kent Beck)
- Pressure testing from aviation and medical fields
- Quality assurance best practices

**Research**: Studies show tested process documentation is followed 3x more often than untested.

**Parallel**: Just as code needs tests, skills need tests. Same principles apply.

## Your Commitment

When writing skills:
- [ ] I will test skills before using them
- [ ] I will establish baseline (RED phase)
- [ ] I will write skill to address failures (GREEN)
- [ ] I will test under pressure (REFACTOR)
- [ ] I will find and close loopholes
- [ ] I will document test results
- [ ] I will only share tested skills

---

**Bottom Line**: Skills are process documentation. Test them like code. RED (document failures) → GREEN (write skill) → REFACTOR (test under pressure). Find loopholes before they find you.

Adoption

liauw-media/testing-skills-with-subagents

$ install --global

Security Scan Results

SKILL.md

Testing Skills with Subagents

Core Principle

When to Use This Skill

The Iron Law

Why Test Skills?

TDD for Skills

RED: Run Scenarios WITHOUT Skill

Debug this issue and fix it.

Write root-cause-tracing skill:

Root Cause Tracing

The Iron Law

Process

...detailed process...

Test 1: Same scenario WITH skill

SUCCESS! Skill prevented failures observed in RED phase.

Test 2: Pressure scenario - Time constraint

Skill updated ✅

Test 3: Pressure scenario - Sunk cost

SUCCESS! Updated skill handles sunk cost pressure.

Test 4: Pressure scenario - Exhaustion

Skill updated ✅

Test 5: Pressure scenario - Authority

SUCCESS! Skill handles authority pressure.

Related Skills

liauw-media/subagent-driven-development

liauw-media/skills/workflow/server-documentation

liauw-media/remote-code-agents

liauw-media/git-worktrees

liauw-media/testing-skills-with-subagents

$ install --global

Security Scan Results

SKILL.md

Testing Skills with Subagents

Core Principle

When to Use This Skill

The Iron Law

Why Test Skills?

TDD for Skills

RED: Run Scenarios WITHOUT Skill

Debug this issue and fix it.

Write root-cause-tracing skill:

Root Cause Tracing

The Iron Law

Process

...detailed process...

Test 1: Same scenario WITH skill

SUCCESS! Skill prevented failures observed in RED phase.

Test 2: Pressure scenario - Time constraint

Skill updated ✅

Test 3: Pressure scenario - Sunk cost

SUCCESS! Updated skill handles sunk cost pressure.

Test 4: Pressure scenario - Exhaustion

Skill updated ✅

Test 5: Pressure scenario - Authority

SUCCESS! Skill handles authority pressure.

Related Skills

liauw-media/subagent-driven-development

liauw-media/skills/workflow/server-documentation

liauw-media/remote-code-agents

liauw-media/git-worktrees