Skill access pattern (post-kernel-pivot, 2026.5+)

Skill names referenced in this document are AIWG skills, not slash commands. Most are not kernel-listed and cannot be invoked as /skill-name by the platform. Reach them via:

```bash
aiwg discover "<capability>"
aiwg show skill <name>
```

Only kernel-listed skills (aiwg-doctor, aiwg-refresh, aiwg-status, aiwg-help, use, steward) are directly invokable as slash commands. See skill-discovery rule.

Flaky Detect Skill

Purpose

Identify flaky tests (tests that pass and fail non-deterministically) by analyzing CI history, execution patterns, and test characteristics. Google research shows 4.56% of tests are flaky, costing millions in developer productivity.

Research Foundation

| Finding | Source | Reference |
|---------|--------|-----------|
| 4.56% flaky rate | Google (2016) | Flaky Tests at Google |
| ML Classification | FlaKat (2024) | arXiv:2403.01003 - 85%+ accuracy |
| LLM Auto-repair | FlakyFix (2023) | arXiv:2307.00012 |
| Flaky Taxonomy | Luo et al. (2014) | "An Empirical Analysis of Flaky Tests" |

When This Skill Applies

- User reports "tests sometimes fail" or "intermittent failures"
- CI has been unstable or unreliable
- User wants to audit test suite reliability
- Pre-release quality assessment
- Debugging non-deterministic behavior

Trigger Phrases

| Natural Language | Action |
|------------------|--------|
| "Find flaky tests" | Analyze CI history for flaky patterns |
| "Why does CI keep failing?" | Identify flaky tests causing failures |
| "Test suite is unreliable" | Full flaky test audit |
| "This test sometimes passes" | Analyze specific test for flakiness |
| "Audit test reliability" | Comprehensive flaky detection |
| "Quarantine flaky tests" | Identify and isolate flaky tests |

Flaky Test Taxonomy (Google Research)

| Category | Percentage | Root Causes |
|----------|------------|-------------|
| Async/Timing | 45% | Race conditions, insufficient waits, timeouts |
| Test Order | 20% | Shared state, execution order dependencies |
| Environment | 15% | File system, network, configuration differences |
| Resource Limits | 10% | Memory, threads, connection pools |
| Non-deterministic | 10% | Random values, timestamps, UUIDs |

Detection Methods

1. CI History Analysis

Parse GitHub Actions / CI logs to find inconsistent results:

```python
def analyze_ci_history(repo, days=30):
    """Return tests whose CI pass rate over the last `days` days suggests flakiness."""
    # get_ci_runs is not defined in this document; it is expected to return
    # run objects whose .tests items expose .name and .passed.
    runs = get_ci_runs(repo, days)
    test_results = {}

    # Tally passes and failures per test name
    for run in runs:
        for test in run.tests:
            counts = test_results.setdefault(test.name, {"pass": 0, "fail": 0})
            counts["pass" if test.passed else "fail"] += 1

    # Identify flaky tests (pass rate strictly between 5% and 95%)
    flaky = []
    for test, results in test_results.items():
        total = results["pass"] + results["fail"]
        if total >= 5:  # Skip tests with too few runs to judge
            pass_rate = results["pass"] / total
            if 0.05 < pass_rate < 0.95:
                flaky.append({
                    "test": test,
                    "pass_rate": pass_rate,
                    "total_runs": total
                })

    # Least reliable tests first
    return sorted(flaky, key=lambda x: x["pass_rate"])
```
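A pass-rate window on its own cannot separate a flaky test from one that a real regression broke and a later commit fixed. One complementary check, offered here as a sketch and not as part of the skill's scripts, is to look for tests that both passed and failed on the same commit. The record shape `(commit_sha, test_name, passed)` and the function name `find_same_commit_flakes` are assumptions made for illustration.

```python
from collections import defaultdict


def find_same_commit_flakes(records):
    """Return test names that both passed and failed on at least one commit.

    `records` is an iterable of (commit_sha, test_name, passed) tuples. This
    shape is an assumption for the sketch, not a format defined by the skill.
    """
    outcomes = defaultdict(set)  # (sha, test) -> set of observed results
    for sha, test, passed in records:
        outcomes[(sha, test)].add(bool(passed))
    # Two distinct results for one (sha, test) pair means the code did not
    # change between a pass and a fail.
    flaky = {test for (_sha, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


records = [
    ("a1f3", "login expires token", True),
    ("a1f3", "login expires token", False),  # same commit, different result
    ("a1f3", "db connects", True),
    ("b7c9", "db connects", False),          # different commit: may be a real regression
]
print(find_same_commit_flakes(records))  # ['login expires token']
```

Tests flagged by both this check and the pass-rate window are the strongest quarantine candidates; tests flagged only by pass rate may deserve a look at the commit history first.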

2. Code Pattern Analysis

Scan test code for flaky patterns:

```python
import re
from pathlib import Path

# Heuristic patterns for JavaScript/TypeScript tests: (regex, category, description).
# A match marks code worth reviewing; it does not prove the test is flaky.
FLAKY_PATTERNS = [
    # Timing issues
    (r'setTimeout|sleep|delay', "timing", "Uses explicit delays"),
    (r'Date\.now\(\)|new Date\(\)', "timing", "Uses current time"),

    # Async issues
    (r'\.then\([^)]*\)(?!.*await)', "async", "Promise without await"),
    # Matches an `async` keyword with no `await` anywhere after it in the file
    (r'\basync\b(?![\s\S]*\bawait\b)', "async", "Async without await"),

    # Non-deterministic values
    (r'Math\.random\(\)', "random", "Uses random values"),
    (r'uuid|nanoid', "random", "Uses generated IDs"),

    # Environment and network
    (r'process\.env', "environment", "Environment-dependent"),
    (r'fs\.(read|write)', "environment", "File system access"),
    (r'fetch\(|axios\.|http\.', "network", "Network calls"),
]

def scan_for_flaky_patterns(test_file):
    """Scan test file for flaky patterns"""
    content = Path(test_file).read_text(encoding="utf-8")
    matches = []

    # Record each pattern at most once per file
    for pattern, category, description in FLAKY_PATTERNS:
        if re.search(pattern, content):
            matches.append({
                "category": category,
                "description": description,
                "pattern": pattern
            })

    return matches
```
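The scanner above reports whether a pattern occurs anywhere in a file, while the report format identifies tests by `file:line`. A line-by-line variant can bridge the two. The following is a minimal sketch: `EXAMPLE_PATTERNS` and `scan_lines` are hypothetical names, and only two of the patterns are repeated to keep it short.

```python
import re

# Two patterns taken from the list above; the rest are omitted for brevity.
EXAMPLE_PATTERNS = [
    (re.compile(r'Date\.now\(\)|new Date\(\)'), "timing", "Uses current time"),
    (re.compile(r'Math\.random\(\)'), "random", "Uses random values"),
]


def scan_lines(source, path="<memory>"):
    """Return one finding per (line, pattern) match, with a file:line location."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for regex, category, description in EXAMPLE_PATTERNS:
            if regex.search(line):
                findings.append({
                    "location": f"{path}:{lineno}",
                    "category": category,
                    "description": description,
                })
    return findings


sample = """it('expires', () => {
  const expiry = Date.now() + 3600000;
  expect(token.expiresAt).toBe(expiry);
});"""
for finding in scan_lines(sample, "test/api/login.test.ts"):
    print(finding["location"], finding["category"], finding["description"])
# test/api/login.test.ts:2 timing Uses current time
```

Scanning per line gives up multi-line patterns, so it suits the single-expression patterns (clock reads, random values, environment access) better than the async heuristics.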

3. Re-run Analysis

Run tests multiple times to detect flakiness:

```bash
# Run tests 10 times, track results
# --silent keeps npm's own banner lines out of the JSON output
for i in {1..10}; do
  npm test --silent -- --reporter=json >> test-results.jsonl
done

# Analyze for inconsistency
python analyze_reruns.py test-results.jsonl
```
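The document does not show `analyze_reruns.py`, so the following is only a sketch of what such a script might do. It assumes each run appended a Jest-compatible JSON report (`testResults[].assertionResults[]` with `fullName` and `status`), which is the shape produced by Jest's `--json` output and Vitest's `json` reporter; other runners would need a different parser. All function names here are hypothetical.

```python
import json
from collections import defaultdict


def iter_json_documents(text):
    """Yield each JSON document from text that holds several, back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        document, pos = decoder.raw_decode(text, pos)
        yield document


def count_outcomes(reports):
    """Map each test's full name to its passed/failed counts across reports."""
    counts = defaultdict(lambda: {"passed": 0, "failed": 0})
    for report in reports:
        for suite in report.get("testResults", []):
            for case in suite.get("assertionResults", []):
                status = case.get("status")
                if status in ("passed", "failed"):  # ignore skipped or pending
                    counts[case["fullName"]][status] += 1
    return dict(counts)


def inconsistent_tests(counts):
    """Return the sorted names of tests that both passed and failed."""
    return sorted(n for n, c in counts.items() if c["passed"] and c["failed"])


def analyze_file(path):
    """Read a file of concatenated reports and list the inconsistent tests."""
    with open(path, encoding="utf-8") as handle:
        reports = iter_json_documents(handle.read())
    return inconsistent_tests(count_outcomes(reports))
```

Calling `analyze_file("test-results.jsonl")` returns the tests whose outcome changed between runs. Parsing with `raw_decode` instead of splitting on newlines means the sketch works whether the reporter prints compact or pretty-printed JSON.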

Output Format

## Flaky Test Report

**Analysis Period**: Last 30 days
**Total Tests**: 450
**Flaky Tests Found**: 12 (2.7%)

### Critical Flaky Tests (< 50% pass rate)

#### 1. `test/api/login.test.ts:45`
**Pass Rate**: 42% (21/50 runs)
**Category**: Timing
**Pattern**: Uses `Date.now()` for token expiry

```typescript
// Flaky code
it('should expire token after 1 hour', () => {
  const token = createToken();
  const expiry = Date.now() + 3600000;  // Flaky!
  expect(token.expiresAt).toBe(expiry);
});
```

**Root Cause**: `createToken()` and the test each read the clock separately. When both reads land in the same millisecond the values match; when the clock ticks between them, the assertion fails.

**Recommended Fix**: Use mocked time

```typescript
it('should expire token after 1 hour', () => {
  vi.useFakeTimers();  // install the mocked clock before setting the time
  vi.setSystemTime(new Date('2024-01-01T00:00:00Z'));
  try {
    const token = createToken();
    expect(token.expiresAt).toBe(new Date('2024-01-01T01:00:00Z').getTime());
  } finally {
    vi.useRealTimers();  // restore the real clock even if the assertion fails
  }
});
```

### High Flaky Tests (50-80% pass rate)

#### 2. `test/db/connection.test.ts:23`
**Pass Rate**: 68% (34/50 runs)
**Category**: Resource
**Pattern**: Connection pool exhaustion

[... more tests ...]

### Summary by Category

| Category | Count | Impact |
|----------|-------|--------|
| Timing | 5 | HIGH |
| Async | 3 | HIGH |
| Environment | 2 | MEDIUM |
| Order | 1 | MEDIUM |
| Network | 1 | LOW |

### Recommendations

1. **Quick Win**: Fix 5 timing tests with `vi.setSystemTime()` (+0.5% stability)
2. **Medium Effort**: Add proper async handling (+0.3% stability)
3. **Infrastructure**: Add test isolation for DB tests (+0.2% stability)

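### Quarantine Candidates

These tests should be skipped in CI until fixed:

```typescript
// vitest.config.ts
import { configDefaults, defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    exclude: [
      ...configDefaults.exclude,      // keep Vitest's default excludes
      'test/api/login.test.ts',       // Timing flaky
      'test/db/connection.test.ts',   // Resource flaky
    ],
  },
});
```

**Note**: Track quarantined tests in `.aiwg/testing/flaky-quarantine.md`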


Quarantine Process

1. Identify

```bash
# Run flaky detection
python scripts/flaky_detect.py --ci-history 30 --threshold 95
```

2. Quarantine

```typescript
// Mark test as flaky
describe.skip('flaky: login expiry', () => {
  // FLAKY: https://github.com/org/repo/issues/123
  // Root cause: timing-dependent
  // Fix in progress: PR #456
});
```

3. Track

Create tracking issue:

```markdown
## Flaky Test: test/api/login.test.ts:45

- **Pass Rate**: 42%
- **Category**: Timing
- **Root Cause**: Uses real system time
- **Quarantined**: 2024-12-12
- **Fix PR**: #456
- **Target Unquarantine**: 2024-12-15
```

4. Fix and Unquarantine

After fix:

```bash
# Verify fix with multiple runs; stop at the first failure
for i in {1..20}; do
  npm test -- test/api/login.test.ts || { echo "Run $i failed"; break; }
done

# Remove from quarantine if all pass
```
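Twenty clean runs are strong evidence for a test that used to fail often, but weak evidence for one that failed only occasionally. The arithmetic below is a rough guide, assuming runs are independent (real CI runs only approximate this). The function names and the 95% confidence default are choices made for this sketch, not values from the skill.

```python
import math


def runs_needed(pass_rate, confidence=0.95):
    """Smallest n such that an unfixed test with this pass rate would pass
    n times in a row with probability at most 1 - confidence.
    """
    if not 0.0 < pass_rate < 1.0:
        raise ValueError("pass_rate must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(pass_rate))


def chance_of_clean_streak(pass_rate, runs):
    """Probability that a still-flaky test passes `runs` times in a row."""
    return pass_rate ** runs


print(runs_needed(0.42))                           # 4
print(runs_needed(0.95))                           # 59
print(round(chance_of_clean_streak(0.95, 20), 2))  # 0.36
```

For the 42% login test, 20 passes in a row would be very unlikely without a real fix. A test that was passing 95% of the time before the fix would still produce a clean 20-run streak about a third of the time, so it needs more verification runs before leaving quarantine.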

Integration Points

- Works with flaky-fix skill for automated repairs
- Reports to CI dashboard
- Feeds into flow-gate-check for release decisions
- Tracks in .aiwg/testing/flaky-registry.md

Script Reference

flaky_detect.py

Analyze CI history for flaky tests:

```bash
python scripts/flaky_detect.py --repo owner/repo --days 30
```

flaky_scanner.py

Scan code for flaky patterns:

```bash
python scripts/flaky_scanner.py --target test/
```

References

@$AIWG_ROOT/agentic/code/addons/testing-quality/README.md — Testing quality addon overview
@$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Research-first approach for root cause analysis
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/README.md — SDLC framework context for quality gates
@$AIWG_ROOT/docs/cli-reference.md — CLI reference

