plugins/ce/skills/fixing-flaky-tests/SKILL.md
Diagnose and fix tests that pass in isolation but fail when run concurrently. Covers shared state isolation and resource conflicts. References condition-based-waiting for timing issues.
npx skillsauth add rileyhilliard/claude-essentials fixing-flaky-tests
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Target symptom: Tests pass when run alone, fail when run with other tests.
Test passes alone, fails with others?
│
├─ Same error every time → Shared state
│   └─ Database, globals, files, singletons
│
├─ Random/timing failures → Race condition
│   └─ Use `condition-based-waiting` skill
│
└─ Resource errors (port, file lock) → Resource conflict
    └─ Need unique resources per test/worker
Quick diagnosis:
Tests pollute state that other tests depend on. Fix by isolating state per test.
| State Type | Isolation Pattern |
|------------|-------------------|
| Database | Transaction rollback, savepoints, worker-specific DBs |
| Global variables | Reset in beforeEach/afterEach |
| Singletons | Provide fresh instance per test |
| Module state | jest.resetModules() or equivalent |
| Files | Unique paths per test, temp directories |
| Environment vars | Save/restore in setup/teardown |
Database isolation (most common):
# Python: Savepoint rollback - each test gets rolled back
@pytest.fixture
async def db_session(db_engine):
    async with db_engine.connect() as conn:
        await conn.begin()
        await conn.begin_nested()  # Savepoint
        # ... yield session ...
        await conn.rollback()  # All changes vanish
// Jest: Reset mocks between tests
beforeEach(() => {
  jest.clearAllMocks()
  jest.resetModules() // Clear module cache before test
})
afterEach(() => {
  jest.restoreAllMocks() // Restore spied functions
})
See language-specific references for complete patterns.
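The environment-variable and singleton rows of the table above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own reference code: `preserved_environ`, `Registry`, and `fresh_registry` are hypothetical names introduced here.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserved_environ():
    """Snapshot os.environ on entry and restore it exactly on exit."""
    saved = dict(os.environ)
    try:
        yield
    finally:
        # Drop anything the test added, then put the original values back.
        os.environ.clear()
        os.environ.update(saved)


class Registry:
    """Hypothetical singleton-style object that holds mutable state."""

    def __init__(self):
        self.items = []


def fresh_registry():
    """Build a new Registry per test instead of sharing a module-level one."""
    return Registry()
```

In a pytest suite the same idea is usually wrapped in a fixture. For environment variables specifically, pytest's built-in `monkeypatch.setenv` undoes its changes at teardown, so a hand-rolled helper like this is mainly useful outside pytest.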
Tests don't wait for async operations to complete.
Use the `condition-based-waiting` skill for detailed patterns, including framework-specific helpers such as `findBy` and Playwright auto-wait.
Quick summary: wait for conditions, not time:
// Bad
await sleep(500)
// Good
await waitFor(() => expect(result).toBe('done'))
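For Python suites without a built-in `waitFor`, the same idea can be approximated with a small polling helper. The sketch below is an assumption-laden illustration: `wait_for` and its `timeout` and `interval` parameters are hypothetical names, not part of any library mentioned above.

```python
import time


def wait_for(condition, timeout=2.0, interval=0.01):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        # Short sleep between polls; the test finishes as soon as the condition holds.
        time.sleep(interval)
```

Unlike a fixed `sleep(500)`, the helper returns as soon as the condition is true and fails with a clear error when it never becomes true, so slow machines get more time without slowing down fast ones.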
Multiple tests or workers compete for the same resource.
Worker-specific resources:
# Python pytest-xdist: unique DB per worker
@pytest.fixture(scope="session")
def database_url(worker_id):
    if worker_id == "master":
        return "postgresql://localhost/test"
    return f"postgresql://localhost/test_{worker_id}"
// Jest/Node: dynamic port allocation
const server = app.listen(0) // OS assigns available port
const port = server.address().port
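The port-0 technique shown for Node carries over to Python's standard `socket` module. The following is a minimal sketch; `bind_free_port` is a hypothetical helper name, and the loopback host is an assumption.

```python
import socket


def bind_free_port(host="127.0.0.1"):
    """Bind a listening socket to port 0 so the OS picks a free port.

    Returns the open socket together with the port number it was given.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # Port 0 asks the OS for any available port
    sock.listen()
    return sock, sock.getsockname()[1]
```

Returning the socket while it is still bound is deliberate: looking up a free port, closing the socket, and binding again later would reopen a window in which another worker could take the same port.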
File conflicts:
import tempfile

import pytest

@pytest.fixture
def temp_dir():
    # Each test gets its own directory, removed automatically on teardown
    with tempfile.TemporaryDirectory() as d:
        yield d
| Stack | Reference |
|-------|-----------|
| Python (pytest, SQLAlchemy) | references/python.md |
| Jest / Testing Library | references/jest.md |
| Playwright E2E | references/playwright.md |
After fixing, verify the fix worked:
# Run the specific test many times (--count comes from the pytest-repeat plugin)
pytest tests/test_flaky.py -x --count=20
# Run with parallelism (-n comes from the pytest-xdist plugin)
pytest -n auto
# Jest equivalent
jest --runInBand # First verify serial works
jest # Then verify parallel works
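When no repeat plugin is available, the repeated runs above can be approximated with a short script that invokes the test command several times and counts failures. This is only a sketch: `count_failures` is a hypothetical helper, and the command passed to it is whatever test invocation the project already uses.

```python
import subprocess
import sys


def count_failures(cmd, runs=20):
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        completed = subprocess.run(cmd, capture_output=True)
        if completed.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Pass the test command as arguments, e.g. the pytest invocation for one file.
    failed = count_failures(sys.argv[1:])
    print(f"{failed} failing run(s)")
    sys.exit(1 if failed else 0)
```

A result of zero failures across many runs is evidence, not proof, that the flakiness is gone. Increasing `runs` raises confidence for failures that occur rarely.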