agentic/code/addons/testing-quality/skills/flaky-fix/SKILL.md
Suggest and apply fixes for flaky tests based on detected patterns. Use after flaky-detect identifies unreliable tests that need repair.
Analyze flaky test patterns and suggest or auto-apply fixes. Based on FlakyFix research showing LLMs can automatically repair flaky tests with targeted prompts.
| Finding | Source | Reference |
|---------|--------|-----------|
| LLM Auto-repair | FlakyFix (2023) | arXiv:2307.00012 - 70%+ success rate |
| Flaky Taxonomy | Google (2016) | Flaky Tests Study |
| Pattern-based Fixes | FlaKat (2024) | arXiv:2403.01003 |
`flaky-detect` identifies flaky tests

| Natural Language | Action |
|------------------|--------|
| "Fix this flaky test" | Analyze and suggest fix |
| "Make this test reliable" | Apply deterministic patterns |
| "Why is this test flaky?" | Root cause analysis + fix |
| "Auto-fix flaky tests" | Batch fix safe patterns |
| "Remove timing dependency" | Specific timing fix |
```typescript
// FLAKY: Time-dependent
it('should expire after 1 hour', () => {
  const token = createToken();
  expect(token.expiresAt).toBeGreaterThan(Date.now());
});

// FIXED: Mocked time
it('should expire after 1 hour', () => {
  const fixedTime = new Date('2024-01-01T00:00:00Z');
  vi.setSystemTime(fixedTime);
  const token = createToken();
  expect(token.expiresAt).toBe(fixedTime.getTime() + 3600000);
  vi.useRealTimers();
});
```
```typescript
// FLAKY: Arbitrary delay
it('should complete async operation', async () => {
  startAsyncOperation();
  await sleep(100); // Race condition!
  expect(result).toBeDefined();
});

// FIXED: Wait for actual completion
it('should complete async operation', async () => {
  const result = await startAsyncOperation();
  expect(result).toBeDefined();
});

// Or use waitFor for DOM
it('should show loading state', async () => {
  render(<Component />);
  await waitFor(() => {
    expect(screen.getByText('Loaded')).toBeInTheDocument();
  });
});
```
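
The idea behind `waitFor` is to poll for a condition until it holds or a timeout expires, instead of sleeping for a fixed interval. A dependency-free sketch of that idea follows. The `waitUntil` name, its option names, and the default values are assumptions made for illustration and do not reproduce Testing Library's implementation.

```typescript
// Poll `condition` until it returns true or `timeoutMs` elapses.
// Hypothetical helper: name, options, and defaults are illustrative only.
async function waitUntil(
  condition: () => boolean,
  { timeoutMs = 1000, intervalMs = 10 }: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Yield to the event loop before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: the operation finishes after a delay the caller does not know.
// The caller waits for the state change itself rather than guessing a sleep.
let loaded = false;
setTimeout(() => {
  loaded = true;
}, 50);

waitUntil(() => loaded).then(() => console.log("loaded:", loaded));
```

The timeout turns a hang into a clear failure message, which is usually easier to diagnose than an assertion that ran too early.
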
```typescript
// FLAKY: Promise not awaited
it('should fetch data', () => {
  const promise = fetchData();
  promise.then(data => {
    expect(data).toBeDefined(); // May not run before test ends
  });
});

// FIXED: Awaited promise
it('should fetch data', async () => {
  const data = await fetchData();
  expect(data).toBeDefined();
});
```
```typescript
// FLAKY: Order not guaranteed
it('should process items', async () => {
  const results = [];
  // forEach does not wait for async callbacks
  items.forEach(async item => {
    results.push(await process(item));
  });
  expect(results.length).toBe(3); // Race! Runs before the callbacks finish
});

// FIXED: Guaranteed order
it('should process items', async () => {
  const results = await Promise.all(
    items.map(item => process(item))
  );
  expect(results.length).toBe(3);
});
```
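
`Promise.all` resolves to results in the order of the input array, not in completion order, which is why the fixed version is deterministic. A small self-contained check of that behavior is sketched here. The `processItem` worker and its delays are made up for illustration.

```typescript
// Hypothetical async worker: items with a larger delay finish later.
function processItem(item: { id: number; delayMs: number }): Promise<number> {
  return new Promise((resolve) => setTimeout(() => resolve(item.id), item.delayMs));
}

const items = [
  { id: 1, delayMs: 30 },
  { id: 2, delayMs: 10 },
  { id: 3, delayMs: 20 },
];

async function processAll(): Promise<{ inputOrder: number[]; completionOrder: number[] }> {
  const completionOrder: number[] = [];
  const inputOrder = await Promise.all(
    items.map(async (item) => {
      const id = await processItem(item);
      completionOrder.push(id); // records when each item actually finished
      return id;
    })
  );
  return { inputOrder, completionOrder };
}

processAll().then(({ inputOrder, completionOrder }) => {
  console.log("Promise.all result:", inputOrder); // matches the input array order
  console.log("completion order:", completionOrder); // depends on the delays
});
```

Asserting on the `Promise.all` result is safe; asserting on an array filled by side effects, like `completionOrder` here, reintroduces the ordering dependency.
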
```typescript
// FLAKY: Shared state between tests
let counter = 0;
it('should increment', () => {
  counter++;
  expect(counter).toBe(1); // Fails if other test runs first
});

// FIXED: Isolated state
describe('counter', () => {
  let counter;
  beforeEach(() => {
    counter = 0; // Fresh state each test
  });
  it('should increment', () => {
    counter++;
    expect(counter).toBe(1);
  });
});
```
```typescript
// FLAKY: Random ID
it('should create user with ID', () => {
  const user = createUser();
  expect(user.id).toBe('expected-id'); // Random!
});

// FIXED: Deterministic ID
// vi.mock is hoisted to the top of the file, so declare it at module level
// rather than inside the test body.
vi.mock('uuid', () => ({
  v4: () => 'test-uuid-1234'
}));

it('should create user with ID', () => {
  const user = createUser();
  expect(user.id).toBe('test-uuid-1234');
});
```
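
Module mocking is one way to pin the ID. Another option, sketched here, is to pass the ID generator in as a parameter so the test can supply a fixed one. The `createUser` signature, the `IdGenerator` type, and `randomId` below are hypothetical and are not taken from the code under test.

```typescript
// Hypothetical user factory that accepts its ID source as a parameter.
type IdGenerator = () => string;

interface User {
  id: string;
  name: string;
}

// Default generator for production use: non-deterministic by design.
const randomId: IdGenerator = () => Math.random().toString(36).slice(2, 10);

function createUser(name: string, generateId: IdGenerator = randomId): User {
  return { id: generateId(), name };
}

// In a test, inject a fixed generator instead of mocking a module.
const user = createUser("alice", () => "test-uuid-1234");
console.log(user.id); // "test-uuid-1234"
```

This trades a small change to the production signature for a test that needs no mocking framework at all.
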
```typescript
// FLAKY: Real network
it('should fetch from API', async () => {
  const data = await fetch('https://api.example.com/data');
  expect(data).toBeDefined(); // Network failures!
});

// FIXED: Mocked network
// vi.mock is hoisted to the top of the file, so declare it at module level.
vi.mock('node-fetch', () => ({
  default: vi.fn().mockResolvedValue({
    json: async () => ({ success: true }) // json() returns a promise, like a real response
  })
}));

it('should fetch from API', async () => {
  const data = await fetchFromApi();
  expect(data.success).toBe(true);
});
```
| Pattern | Detection | Fix |
|---------|-----------|-----|
| Date.now() in assertion | Regex | Wrap with vi.setSystemTime() |
| Missing await on async | AST analysis | Add await keyword |
| setTimeout in test | Regex | Replace with vi.advanceTimersByTime() |
| Math.random() | Regex | Mock with deterministic value |
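
The regex-based rows of the table above can be sketched as a small scanner. This is a minimal illustration, not the actual `flaky_fixer.py` logic: the `SAFE_PATTERNS` list, the `detectSafePatterns` name, and the exact regular expressions are assumptions chosen for this example. The missing-`await` row is left out because the table assigns it to AST analysis rather than regex matching.

```typescript
// Hypothetical regex rules mirroring the regex-detectable rows of the table.
interface PatternRule {
  name: string;
  regex: RegExp;
  suggestedFix: string;
}

const SAFE_PATTERNS: PatternRule[] = [
  {
    name: "Date.now() in assertion",
    regex: /expect\([^\n]*Date\.now\(\)/,
    suggestedFix: "Wrap with vi.setSystemTime()",
  },
  {
    name: "setTimeout in test",
    regex: /\bsetTimeout\s*\(/,
    suggestedFix: "Replace with vi.advanceTimersByTime()",
  },
  {
    name: "Math.random()",
    regex: /\bMath\.random\s*\(/,
    suggestedFix: "Mock with deterministic value",
  },
];

interface Finding {
  line: number; // 1-based line number within the scanned source
  pattern: string;
  suggestedFix: string;
}

// Scan test source line by line and report every rule that matches.
function detectSafePatterns(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of SAFE_PATTERNS) {
      if (rule.regex.test(text)) {
        findings.push({ line: index + 1, pattern: rule.name, suggestedFix: rule.suggestedFix });
      }
    }
  });
  return findings;
}

const sample = [
  "it('should expire', () => {",
  "  const token = createToken();",
  "  expect(token.expiresAt).toBeGreaterThan(Date.now());",
  "});",
].join("\n");

console.log(detectSafePatterns(sample));
```

Line-by-line regex matching is cheap but shallow: it cannot tell a `setTimeout` in a test body from one in a helper, so findings are best treated as candidates for a fix rather than confirmed root causes.
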
| Pattern | Why Review Needed |
|---------|-------------------|
| Shared test state | May require architectural changes |
| Database fixtures | Needs isolation strategy |
| External service calls | Mock design decision |
| Complex async flows | Multiple fix approaches |
## Flaky Test Fix Report
### Test: `test/api/login.test.ts:45`
**Root Cause**: Timing - uses `Date.now()` in assertion
**Confidence**: HIGH (pattern match)
**Auto-fixable**: YES
#### Original Code
```typescript
it('should create token with expiry', () => {
  const token = createToken();
  expect(token.expiresAt).toBeGreaterThan(Date.now());
});
```

#### Fixed Code
```typescript
it('should create token with expiry', () => {
  const now = new Date('2024-01-01T12:00:00Z');
  vi.setSystemTime(now);
  const token = createToken();
  expect(token.expiresAt).toBe(now.getTime() + TOKEN_LIFETIME);
  vi.useRealTimers();
});
```

- `vi.setSystemTime()` for deterministic time
- `vi.useRealTimers()` cleanup

Run 10x to confirm fix:

```bash
for i in {1..10}; do npm test -- test/api/login.test.ts:45; done
```
| Test | Category | Auto-Fixed | Status |
|------|----------|-----------|--------|
| login.test.ts:45 | Timing | Yes | ✅ Fixed |
| user.test.ts:23 | Async | Yes | ✅ Fixed |
| db.test.ts:67 | State | No | Suggested |
| api.test.ts:12 | Network | No | Suggested |

- Auto-fixed: 2 tests
- Manual review: 2 tests
- Estimated stability improvement: +1.5%
## Integration Points
- Works with `flaky-detect` for test identification
- Reports to Test Engineer for complex fixes
- Feeds into CI stability metrics
- Updates `.aiwg/testing/flaky-fixes.md`
## Script Reference
### flaky_fixer.py
Analyze and fix flaky tests:
```bash
python scripts/flaky_fixer.py --test test/api/login.test.ts --auto-fix
```

### batch_fix.py
Fix multiple flaky tests:
```bash
python scripts/batch_fix.py --input flaky-report.json --safe-only
```