
rohitg00/hypothesis-testing

Name: hypothesis-testing
Author: rohitg00

packages/core/src/methodology/packs/debugging/hypothesis-testing/SKILL.md

npx skillsauth add rohitg00/skillkit hypothesis-testing


Hypothesis-Driven Debugging

You are applying the scientific method to debugging. Form clear hypotheses, design tests that can definitively confirm or reject them, and systematically narrow down to the root cause.

Core Principle

Every debugging action should test a specific hypothesis. Random changes are not debugging.

The Scientific Debugging Method

1. Observe - Gather Facts

Before forming hypotheses, collect observations:

- What exactly happens? (specific symptoms)
- When does it happen? (timing, frequency)
- Where does it happen? (environment, component)
- What changed recently? (code, config, data)

Write down observations objectively:

Observations:
- API returns 500 error on POST /orders
- Happens only when cart has > 10 items
- Started after deployment on 2024-01-15
- Works fine in staging environment
- Error logs show "connection refused" to inventory service
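A claim like "happens only when cart has > 10 items" is itself an observation worth checking against data rather than memory. The minimal sketch below assumes you can export request records with an item count and an HTTP status; the `failureRateBy` helper and the `{ items, status }` record shape are hypothetical. It compares the failure rate of requests that match a condition against all other requests.

```javascript
// Compare the 5xx rate of requests matching `predicate` against all others.
// The record shape { items, status } is an assumption for illustration.
function failureRateBy(records, predicate) {
  const buckets = {
    matching: { total: 0, failed: 0 },
    other: { total: 0, failed: 0 },
  };
  for (const record of records) {
    const bucket = predicate(record) ? buckets.matching : buckets.other;
    bucket.total += 1;
    if (record.status >= 500) bucket.failed += 1;
  }
  // Report 0 for an empty bucket instead of dividing by zero
  const rate = (b) => (b.total === 0 ? 0 : b.failed / b.total);
  return { matching: rate(buckets.matching), other: rate(buckets.other) };
}

// Made-up sample records, not real data
const records = [
  { items: 3, status: 201 },
  { items: 12, status: 500 },
  { items: 15, status: 500 },
  { items: 8, status: 201 },
  { items: 11, status: 201 },
];
console.log(failureRateBy(records, (r) => r.items > 10));
// → { matching: 0.6666666666666666, other: 0 }
```

A high rate in the matching bucket with a near-zero rate elsewhere supports the observation; similar rates in both buckets suggest the cart-size pattern was a coincidence.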

2. Hypothesize - Form Testable Theories

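A testable hypothesis names a specific component and the condition under which it fails, so that an experiment can prove it wrong. Examples (bad → good):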

~~"Something is wrong with the network"~~ → "The inventory service connection pool is exhausted when processing orders with >10 items"
~~"There might be a race condition"~~ → "The order processing timeout (5s) is insufficient for large orders"

3. Predict - Define Expected Results

For each hypothesis, define what you expect to observe if it is true versus false:

Hypothesis: Connection pool exhausted for large orders

If TRUE:
- Active connections should hit max (20) during large orders
- Small orders should still work during this time
- Increasing pool size should fix the issue

If FALSE:
- Connection count stays well below max
- Small orders also fail during the issue
- Pool size change has no effect

4. Test - Experiment Systematically

Design tests whose outcomes clearly confirm or reject the hypothesis:

Test Plan for Connection Pool Hypothesis:

1. Add connection pool monitoring
   - Log active connections before/after each request
   - Expected if true: Count reaches 20 during failures

2. Artificial stress test
   - Send 5 large orders simultaneously
   - Expected if true: Failures start when pool exhausted

3. Increase pool size to 50
   - Repeat stress test
   - Expected if true: Failures stop or threshold moves

4. Control test with small orders
   - Send 20 small orders simultaneously
   - Expected if true: No failures (small orders finish quickly and release their connections)
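The test plan can be rehearsed against a toy model before touching a real system, to check that each experiment yields a number you can compare with a prediction. This sketch is entirely hypothetical: `FakePool`, `startOrder`, and `stressTest` are illustrative names, the pool fails fast with "connection refused" when empty, and each order is assumed to hold one connection per item while in flight (the follow-up hypothesis from the analysis step). Orders are started in a loop so that all of them acquire connections before any finishes, which stands in for "simultaneously".

```javascript
// Hypothetical fail-fast pool: refuses a connection when all are in use.
class FakePool {
  constructor(max) {
    this.max = max;
    this.active = 0;
    this.peak = 0; // highest simultaneous usage, like the monitoring in test 1
  }

  acquire() {
    if (this.active >= this.max) throw new Error('connection refused');
    this.active += 1;
    this.peak = Math.max(this.peak, this.active);
  }

  release(count) {
    this.active -= count;
  }
}

// Assumption: an order holds one connection per item while in flight.
// Returns the number of connections held, or -1 if the order failed.
function startOrder(pool, itemCount) {
  let held = 0;
  try {
    for (let i = 0; i < itemCount; i += 1) {
      pool.acquire();
      held += 1;
    }
    return held;
  } catch (err) {
    pool.release(held); // hand back any partial acquisitions
    if (err.message !== 'connection refused') throw err;
    return -1;
  }
}

// Start `count` orders before any completes, then complete them all.
function stressTest({ poolSize, count, itemsPerOrder }) {
  const pool = new FakePool(poolSize);
  const inFlight = [];
  let failures = 0;
  for (let i = 0; i < count; i += 1) {
    const held = startOrder(pool, itemsPerOrder);
    if (held < 0) failures += 1;
    else inFlight.push(held);
  }
  for (const held of inFlight) pool.release(held);
  return { failures, peak: pool.peak, poolSize };
}

console.log(stressTest({ poolSize: 20, count: 5, itemsPerOrder: 11 }));
// → { failures: 4, peak: 20, poolSize: 20 }
```

Under these assumptions, raising `poolSize` to 50 leaves one failure out of five (the threshold moves), and 20 one-item orders against a pool of 20 produce no failures, which lines up with the TRUE predictions for tests 2 to 4.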

5. Analyze - Interpret Results

After testing:

- Did results match predictions for TRUE or FALSE?
- Are results conclusive or ambiguous?
- Do results suggest a different hypothesis?

Results:
- Connection count reached 20/20 during failures ✓
- Small orders succeeded during same period ✓
- Pool size increase to 50 → failures stopped ✓

Conclusion: Hypothesis CONFIRMED
Connection pool exhaustion is the proximate cause.

New question: Why do large orders exhaust the pool?
New hypothesis: Large orders make multiple inventory calls per item
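Step 5 asks whether results matched the TRUE or FALSE predictions and whether they are conclusive. One way to make that explicit is to write each prediction as a check over the measured results and report CONFIRMED only when every TRUE prediction holds and no FALSE prediction does. This is a sketch: `judge`, `poolHypothesis`, and the metric names are hypothetical.

```javascript
// Predictions from step 3, written as checks over measured results.
// Metric names are illustrative, not from any real monitoring API.
const poolHypothesis = {
  name: 'Connection pool exhausted for large orders',
  ifTrue: [
    (m) => m.peakConnections >= m.poolMax, // active connections hit max
    (m) => m.smallOrdersSucceeded, // small orders still work
    (m) => m.failuresStoppedWithLargerPool, // bigger pool fixes it
  ],
  ifFalse: [
    (m) => m.peakConnections < m.poolMax,
    (m) => !m.smallOrdersSucceeded,
    (m) => !m.failuresStoppedWithLargerPool,
  ],
};

// CONFIRMED or REJECTED only when one set of predictions holds completely
// and the other not at all; any mix is INCONCLUSIVE (design a better test).
function judge(hypothesis, measured) {
  const trueHits = hypothesis.ifTrue.filter((check) => check(measured)).length;
  const falseHits = hypothesis.ifFalse.filter((check) => check(measured)).length;
  if (trueHits === hypothesis.ifTrue.length && falseHits === 0) return 'CONFIRMED';
  if (falseHits === hypothesis.ifFalse.length && trueHits === 0) return 'REJECTED';
  return 'INCONCLUSIVE';
}

// The results listed above: 20/20 connections, small orders fine, pool of 50 fixed it
console.log(judge(poolHypothesis, {
  peakConnections: 20,
  poolMax: 20,
  smallOrdersSucceeded: true,
  failuresStoppedWithLargerPool: true,
})); // → CONFIRMED
```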

Hypothesis Tracking Template

## Bug: [Description]

### Hypothesis 1: [Theory]
**Status:** Testing | Confirmed | Rejected
**Probability:** High | Medium | Low

**Evidence For:**
- [Evidence 1]
- [Evidence 2]

**Evidence Against:**
- [Evidence 1]

**Test Plan:**
1. [Test 1] - Expected if true: [result]; expected if false: [result]
2. [Test 2] - Expected if true: [result]; expected if false: [result]

**Test Results:**
- [Result 1]: [Supports/Contradicts]
- [Result 2]: [Supports/Contradicts]

**Conclusion:** [Confirmed/Rejected] because [reasoning]

---

### Hypothesis 2: [Next Theory]
...
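If you keep the log in a script rather than by hand, a small renderer keeps every entry in the same shape. The sketch below is hypothetical (`renderHypothesis`, `example`, and the record fields are illustrative) and adds one rule of its own that the template does not state: a hypothesis cannot be marked Confirmed or Rejected without at least one test result.

```javascript
const STATUSES = ['Testing', 'Confirmed', 'Rejected'];

// Render one hypothesis record in the tracking-template format.
function renderHypothesis(n, h) {
  if (!STATUSES.includes(h.status)) throw new Error(`Unknown status: ${h.status}`);
  // Added rule (not in the template): a verdict needs evidence from a test
  if (h.status !== 'Testing' && h.results.length === 0) {
    throw new Error(`Hypothesis ${n} is ${h.status} but has no test results`);
  }
  const bullets = (items) => items.map((item) => `- ${item}`);
  return [
    `### Hypothesis ${n}: ${h.theory}`,
    `**Status:** ${h.status}`,
    `**Probability:** ${h.probability}`,
    '',
    '**Evidence For:**',
    ...bullets(h.evidenceFor),
    '',
    '**Evidence Against:**',
    ...bullets(h.evidenceAgainst),
    '',
    '**Test Plan:**',
    ...h.tests.map((test, i) => `${i + 1}. ${test}`),
    '',
    '**Test Results:**',
    ...h.results.map((r) => `- ${r.text}: ${r.supports ? 'Supports' : 'Contradicts'}`),
    '',
    `**Conclusion:** ${h.conclusion}`,
  ].join('\n');
}

// Illustrative record based on the connection-pool example
const example = {
  theory: 'Connection pool exhausted for large orders',
  status: 'Confirmed',
  probability: 'High',
  evidenceFor: ['Error logs show "connection refused" to inventory service'],
  evidenceAgainst: [],
  tests: ['Stress test with 5 large orders - expected if true: failures at 20 connections'],
  results: [{ text: 'Connection count reached 20/20 during failures', supports: true }],
  conclusion: 'Confirmed because all TRUE predictions held',
};
console.log(renderHypothesis(1, example));
```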

Testing Techniques by Hypothesis Type

Testing Timing Hypotheses

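```javascript
// Add timing instrumentation around the suspected operation.
// Top-level await needs an ES module; otherwise wrap this in an async function.
const EXPECTED_MAX_MS = 5000; // your expected budget, e.g. the 5s order timeout
const start = performance.now();
await suspectedSlowOperation(); // placeholder for the code under suspicion
const duration = performance.now() - start;
console.log(`Operation took ${duration.toFixed(1)}ms`);
// Supports the timing hypothesis if duration exceeds the expected budget
if (duration > EXPECTED_MAX_MS) console.warn('Slower than expected');
```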
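A single timing sample is noisy (JIT warm-up, garbage-collection pauses, network jitter), so one slow run proves little. A minimal sketch, assuming a runtime with a global `performance` (browsers, Node.js 16+): `measure` is a hypothetical helper that times several runs and reports the median, the maximum, and how many runs exceeded a budget you choose.

```javascript
// Run `fn` several times and summarize how long each run took.
// `fn` may be sync or async: awaiting a non-promise value is harmless.
async function measure(fn, { runs = 20, budgetMs = Infinity } = {}) {
  const samples = [];
  for (let i = 0; i < runs; i += 1) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  const median = samples.length % 2
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
  return {
    median,
    max: samples[samples.length - 1],
    overBudget: samples.filter((ms) => ms > budgetMs).length, // runs slower than budget
  };
}
```

For example, `measure(() => suspectedSlowOperation(), { runs: 10, budgetMs: 5000 }).then(console.log)` reuses the placeholder from the snippet above. A high median points to a consistently slow operation; a low median with a few slow outliers points to something intermittent, which is a different hypothesis.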

Testing Data Hypotheses

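```javascript
// Validate data at key points in the pipeline
function processWithValidation(data) {
  console.assert(data.id != null, 'Missing id');
  console.assert(Array.isArray(data.items) && data.items.length > 0, 'Empty items');
  // Number.isFinite also rejects NaN, which passes a typeof check
  console.assert(Number.isFinite(data.total), 'Invalid total');
  // console.assert only logs (it does not throw); a logged failure here
  // suggests the data hypothesis is true
  // ...continue with normal processing
}
```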
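Because `console.assert` only logs, the checks above are easy to miss in noisy output and hard to reuse. A sketch of an alternative, assuming the same checks apply at every stage: `validateOrder` returns a list of problems, and `firstBadStage` (both hypothetical names) runs it over snapshots taken at successive points in the pipeline. The first stage that reports problems is where bad data most likely enters.

```javascript
// Same checks as above, but returning every violation instead of logging.
function validateOrder(data) {
  if (data === null || typeof data !== 'object') return ['Not an object'];
  const problems = [];
  if (data.id == null) problems.push('Missing id');
  if (!Array.isArray(data.items) || data.items.length === 0) problems.push('Empty items');
  // Number.isFinite rejects NaN and Infinity as well as non-numbers
  if (!Number.isFinite(data.total)) problems.push('Invalid total');
  return problems;
}

// `stages` is a list of [label, snapshot] pairs captured in pipeline order.
function firstBadStage(stages) {
  for (const [label, snapshot] of stages) {
    const problems = validateOrder(snapshot);
    if (problems.length > 0) return { label, problems };
  }
  return null; // no stage failed these checks
}

// Illustrative snapshots of one order at two hypothetical stages
const parsed = { id: 42, items: [{ sku: 'A1', qty: 2 }], total: 19.98 };
const priced = { ...parsed, total: NaN }; // e.g. a pricing step produced NaN
console.log(firstBadStage([['parsed', parsed], ['priced', priced]]));
// → { label: 'priced', problems: [ 'Invalid total' ] }
```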

Testing State Hypotheses

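```javascript
// Snapshot state before and after the suspected mutation.
// JSON.stringify skips functions/undefined and depends on key order.
const stateBefore = JSON.stringify(currentState);
suspectedStateMutation(); // placeholder for the code under suspicion
const stateAfter = JSON.stringify(currentState);
if (stateBefore !== stateAfter) {
  const before = JSON.parse(stateBefore);
  const after = JSON.parse(stateAfter);
  // List top-level keys whose serialized values differ
  const changedKeys = Object.keys({ ...before, ...after }).filter(
    (key) => JSON.stringify(before[key]) !== JSON.stringify(after[key]),
  );
  console.log('State changed in keys:', changedKeys);
}
```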
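Snapshot diffs tell you that state changed, not which line changed it. A complementary technique is to freeze the state and let the runtime throw at the offending write. This sketch uses the standard `Object.freeze` (which is shallow, hence the recursive helper); `deepFreeze`, `findMutation`, and `cheapestItem` are hypothetical names. It assumes the suspected code receives the state as an argument and runs in strict mode (ES modules, class bodies, or `'use strict'` functions), because in sloppy mode a plain property assignment to a frozen object fails silently.

```javascript
// Recursively freeze an object graph so that writes anywhere in it fail.
function deepFreeze(obj) {
  for (const value of Object.values(obj)) {
    if (value !== null && typeof value === 'object') deepFreeze(value);
  }
  return Object.freeze(obj);
}

// Run `fn` on a frozen copy of `state` and return the TypeError raised by the
// first write, or null if none was raised. The JSON round-trip shares the
// limits of the snapshot approach (no functions, Maps, or cycles).
function findMutation(state, fn) {
  const frozen = deepFreeze(JSON.parse(JSON.stringify(state)));
  try {
    fn(frozen);
    return null;
  } catch (err) {
    // Other bugs can also throw TypeError, so check err.message and err.stack
    if (err instanceof TypeError) return err;
    throw err;
  }
}

// Hypothetical suspect: it looks like a read but sorts the array in place
function cheapestItem(state) {
  'use strict';
  return state.cart.items.sort((a, b) => a.price - b.price)[0];
}

const err = findMutation({ cart: { items: [{ price: 5 }, { price: 2 }] } }, cheapestItem);
console.log(err ? err.message : 'no mutation detected');
```

The stack trace of the returned error points at the write itself, which is usually more useful than knowing only which keys differ.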

Decision Tree

Is the hypothesis testable?
├── NO → Refine it to be more specific
└── YES → Can I test it without side effects?
    ├── NO → Design a safe test (staging, logs-only)
    └── YES → Run the test
        └── Results conclusive?
            ├── NO → Design a better test
            └── YES → Hypothesis confirmed or rejected?
                ├── CONFIRMED → Root cause found?
                │   ├── YES → Fix and verify
                │   └── NO → Form next hypothesis (why?)
                └── REJECTED → Form next hypothesis
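The tree can also be read as a short function, which makes the order of the questions explicit. A sketch; `nextStep` and its boolean fields are hypothetical names for the tree's questions, checked in the same order as the tree.

```javascript
// Each field answers one question from the decision tree, in order.
function nextStep({ testable, safeToTest, conclusive, confirmed, rootCauseFound }) {
  if (!testable) return 'Refine it to be more specific';
  if (!safeToTest) return 'Design a safe test (staging, logs-only)';
  if (!conclusive) return 'Design a better test';
  if (!confirmed) return 'Form next hypothesis';
  if (!rootCauseFound) return 'Form next hypothesis (why?)';
  return 'Fix and verify';
}

console.log(nextStep({
  testable: true,
  safeToTest: true,
  conclusive: true,
  confirmed: true,
  rootCauseFound: false,
})); // → Form next hypothesis (why?)
```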

Integration with Other Skills

- root-cause-analysis: Hypothesis testing is a key technique within RCA.
- trace-and-isolate: Use tracing to gather evidence for hypotheses.
- testing/red-green-refactor: Write a failing test that reproduces the bug before fixing it.


Applies the scientific method to debugging by helping users form specific, testable hypotheses, design targeted experiments, and systematically confirm or reject theories to find root causes. Use when a user says their code isn't working, they're getting an error, something broke, they want to troubleshoot a bug, or they're trying to figure out what's causing an issue. Concrete actions include isolating failing components, forming and testing hypotheses, analyzing error messages, tracing execution paths, and interpreting test results to narrow down root causes.

757 stars

development

Updated Apr 12, 2026


Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 12, 2026, 2:01 AM (11.2 s, 1 file scanned)

SKILL.md

name: hypothesis-testing
version: 1.0.0
difficulty: intermediate
estimatedTime: 15


Related Skills

- rohitg00/find-skills (tools): Discovers, searches, and installs skills from multiple AI agent skill marketplaces (400K+ skills) using the SkillKit CLI. Supports browsing official partner collections (Anthropic, Vercel, Supabase, Stripe, and more) and community repositories, searching by domain or technology, and installing specific skills from GitHub. Use when the user wants to find, browse, or install new agent skills, plugins, extensions, or add-ons; asks 'is there a skill for X' or 'find a skill for X'; wants to explore a skill store or marketplace; needs to extend agent capabilities in areas like React, testing, DevOps, security, or APIs; or says 'browse skills', 'search skill marketplace', 'install a skill', or 'what skills are available'.
- rohitg00/test-patterns (development): Applies proven testing patterns, including Arrange-Act-Assert (AAA), Given-When-Then, Test Data Builders, Object Mother, parameterized tests, fixtures, spies, and test doubles, to help write maintainable, reliable, and readable test suites. Use when the user asks about writing unit tests, integration tests, or end-to-end tests; structuring test cases or test suites; applying TDD or BDD practices; working with mocks, stubs, spies, or fakes; improving test coverage or reducing flakiness; or needs guidance on test organization, naming conventions, or assertions in frameworks like Jest, Vitest, pytest, or similar.
- rohitg00/red-green-refactor (development): Guides the red-green-refactor TDD workflow: write a failing test first, implement the minimum code to make it pass, then refactor while keeping tests green. Use when a user asks to practice TDD, write tests first, follow red-green-refactor, do test-driven development, write failing tests before code, or phrases like 'make the test pass', 'test coverage', or 'unit tests before implementation'.
- rohitg00/testing-anti-patterns (development): Reviews test code to identify and fix common testing anti-patterns including flaky tests, over-mocking, brittle assertions, test interdependency, and hidden test logic. Flags bad patterns, explains the specific defect, and provides corrected implementations. Use when reviewing test code, debugging intermittent or unreliable test failures, or when the user mentions flaky tests, test smells, brittle tests, test isolation issues, mock overuse, slow tests, or test maintenance problems.


Install manually

```shell
# Clone the repo
git clone https://github.com/rohitg00/skillkit.git

# Copy into the Claude Code skills folder (global); create it first if needed
mkdir -p ~/.claude/skills
cp -r skillkit/packages/core/src/methodology/packs/debugging/hypothesis-testing ~/.claude/skills/
```

Repository: rohitg00/skillkit

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT