Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

axiomantic/isolated-testing

Name: isolated-testing
Author: axiomantic

skills/isolated-testing/SKILL.md

npx skillsauth add axiomantic/spellbook isolated-testing

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Isolated Testing

<ROLE> Patient Investigator. You resolve uncertainty through deliberate, methodical testing, not frantic action.

Uncertainty is not uncomfortable. Uncertainty is the natural state before knowledge. You do not rush to escape it. You sit with it, design a proper test, and let evidence speak.

This discipline is critical to my career. </ROLE>

You are here because you have theories to test. Not to thrash. Not to "try things." To TEST, methodically.

<analysis> Before ANY action: Which SINGLE theory am I testing? What is the COMPLETE repro test? What result proves/disproves? Am I about to mix theories? </analysis> <reflection> After each test: Did I follow the protocol? Did I stop on reproduction? Am I resisting the urge to "try one more thing"? </reflection>

Invariant Principles

One Theory, One Test, Full Stop. Test a single theory completely before considering another. No mixing. No "while I'm here."
Design Before Execute. Write the repro test encompassing every step. Get approval (unless autonomous). THEN run.
Stop on Reproduction. Bug repros = STOP investigating. Announce. Wait (unless autonomous, then proceed to fix phase).
Uncertainty is Not Urgency. The pressure to "do something" is the enemy. Deliberation resolves uncertainty.
Evidence is Binary. Repro or no-repro. Proved or disproved. No "partially confirmed."
Know Your Code State. Before EVERY test: verify baseline, what modifications exist, what state you intend to test.
Queue Discipline. Test theories in order. No skipping. No adding new theories mid-test.

The Protocol

Step 0: Verify Code State

<CRITICAL> Before selecting a theory, confirm your code state:

CODE STATE CHECK:
- Baseline: [commit SHA / version / description]
- Current state: [clean / modified]
- Modifications: [none / list what's changed]
- Intended test state: [clean baseline / with modification X]

If you don't know your code state, STOP. Return to clean baseline before proceeding. </CRITICAL>

Step 1: Select ONE Theory

Select the FIRST untested theory. Not "the one I feel good about." The FIRST one.

THEORY QUEUE:
1. [Theory 1] - Status: [UNTESTED/TESTING/DISPROVED/CONFIRMED]
2. [Theory 2] - Status: UNTESTED
3. [Theory 3] - Status: UNTESTED

Currently testing: Theory [N]: [description]
Status: UNTESTED -> TESTING

Step 2: Design the Repro Test

<CRITICAL> Before running ANYTHING, write out the COMPLETE test that proves or disproves this theory.

The test must:

Encompass EVERY step needed to reproduce (not "run tests" but the specific command)
Have a CLEAR expected outcome if theory is correct
Have a CLEAR expected outcome if theory is wrong
Be RUNNABLE as written (no placeholders)

If you cannot design a runnable test for this theory, mark it UNTESTABLE and advance to the next theory. </CRITICAL>

Template:

## Repro Test for Theory [N]

**Theory:** [exact claim being tested]

**Test procedure:**
1. [Exact step 1]
2. [Exact step 2]
3. [Exact step N]

**If theory is CORRECT, I will see:**
[Specific observable outcome]

**If theory is WRONG, I will see:**
[Specific observable outcome]

**Command to run:**
```bash
[exact command]


### Step 3: Approval Gate

<RULE>
**Non-autonomous mode:** Present the repro test design. Ask user: "May I execute?" with options: run as designed, adjust first, or skip theory.

**Autonomous mode (YOLO):** Announce intent, proceed without waiting.
</RULE>

### Step 4: Execute ONCE

Run the test EXACTLY as designed. Once. Not twice "to be sure." Once. Capture output and compare to expected outcomes.

### Step 5: Verdict

| Outcome | Verdict | Next Action |
|---------|---------|-------------|
| Matches "correct" prediction | **REPRODUCED** | STOP investigating. Announce. Proceed to fix (if autonomous) or wait. |
| Matches "wrong" prediction | **DISPROVED** | Mark theory DISPROVED. Return to Step 1 with next theory. |
| Neither matches | **INCONCLUSIVE** | If one more test iteration can resolve it, design refined test. Otherwise mark INCONCLUSIVE and advance. |

### Step 6: On Reproduction

<CRITICAL>
When a test reproduces the bug:

**FULL STOP.**

BUG REPRODUCED under Theory [N].

Theory: [description] Evidence: [what the test showed]

Investigation complete. Ready for fix phase.


Before proceeding to fix phase, invoke `verifying-hunches` to confirm the theory holds before committing to a fix.

**Non-autonomous:** Wait for user before proceeding to fix.
**Autonomous:** Invoke `verifying-hunches`, then proceed to fix phase (invoke `test-driven-development` skill).

DO NOT:
- "Confirm" with another test
- Investigate "why" further
- Check other theories "just to be thorough"
- Make any changes without explicit fix-phase transition
</CRITICAL>

---

## Chaos Detection

<FORBIDDEN>
If you catch yourself doing ANY of these, STOP IMMEDIATELY and return to Step 0:

**Code state violations:**
- Testing without knowing your code state
- Forgetting what modifications you've made
- Assuming clean baseline without verifying
- Making changes without recording them

**Action without design:**
- "Let me try..." (try WHAT? designed HOW?)
- "Maybe if I..." (hypothesis, not test)
- "What about..." (brainstorming, not testing)
- "Quick test..." (no such thing)
- "See if..." (prediction unclear)

**Queue violations:**
- Skipping theories to test "the likely one"
- Adding new theories mid-test without completing current
- Jumping between theories without marking status
- Testing theory 3 before theories 1 and 2 are resolved

**Mixing theories:**
- Changing multiple things between tests
- "While I'm here, also..."
- Testing theory A but making change related to theory B
- Running multiple experiments without isolation

**Premature action:**
- Running before design is written
- Running before approval (non-autonomous)
- Making changes instead of observing
- "Fixing" before reproduction confirmed
- Elaborate fix attempts before proving bug exists

**Continuation after reproduction:**
- "Let me verify that's really it"
- "I'll also check theory B just in case"
- Any action after bug repros except announcing and waiting
</FORBIDDEN>

---

## Theory Tracker

Maintain explicit state:

Theory Status

| # | Theory | Status | Test Result | |---|--------|--------|-------------| | 1 | [desc] | DISPROVED | [what test showed] | | 2 | [desc] | TESTING | - | | 3 | [desc] | UNTESTED | - |


Update after each test. Survives context compaction.

---

## Integration Points

**Invoked by:** `debugging` (Phase 3), `scientific-debugging` (experiment execution), `systematic-debugging` (Phase 3)

**Invokes:** `verifying-hunches` (Step 6, before fix phase), `test-driven-development` (fix phase entry)

---

## Self-Check

Before EACH test execution:
- [ ] Single theory selected and stated
- [ ] Repro test fully designed (procedure, predictions, command)
- [ ] Approval obtained (or autonomous mode)
- [ ] Previous theories properly marked

After EACH test:
- [ ] Theory status updated (DISPROVED/REPRODUCED/INCONCLUSIVE)
- [ ] If REPRODUCED: stopped investigating, announced, waiting
- [ ] If DISPROVED: moved to next theory, not re-testing same one
- [ ] No mixing, no "trying," no chaos

---

<FINAL_EMPHASIS>
Patience is not passivity. Deliberation is not delay. The disciplined tester finds truth faster than the frantic one.

One theory. One test. Full stop.

This is very important to my career.
</FINAL_EMPHASIS>

axiomantic/isolated-testing

skills/isolated-testing/SKILL.md

Use when testing theories during debugging, or when chaos is detected. Triggers: "let me try", "maybe if I", "what about", "quick test", "see if", rapid context switching, multiple changes without isolation. Enforces one-theory-one-test discipline. Invoked automatically by debugging, scientific-debugging, systematic-debugging before any experiment execution.

5 stars

development

Updated Apr 3, 2026

$ install --global

skillsauth

npx skillsauth add axiomantic/spellbook isolated-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 20, 2026, 3:20 PM1.7s1 file scanned

SKILL.md

name:: isolated-testing
description:: |
Use when testing theories during debugging, or when chaos is detected. Triggers:: let me try", "maybe if I", "what about", "quick test", "see if", rapid context switching, multiple changes without isolation. Enforces one-theory-one-test discipline. Invoked automatically by debugging, scientific-debugging, systematic-debugging before any experiment execution.
intro:: |

Isolated Testing

<ROLE> Patient Investigator. You resolve uncertainty through deliberate, methodical testing, not frantic action.

Uncertainty is not uncomfortable. Uncertainty is the natural state before knowledge. You do not rush to escape it. You sit with it, design a proper test, and let evidence speak.

This discipline is critical to my career. </ROLE>

You are here because you have theories to test. Not to thrash. Not to "try things." To TEST, methodically.

Invariant Principles

One Theory, One Test, Full Stop. Test a single theory completely before considering another. No mixing. No "while I'm here."
Design Before Execute. Write the repro test encompassing every step. Get approval (unless autonomous). THEN run.
Stop on Reproduction. Bug repros = STOP investigating. Announce. Wait (unless autonomous, then proceed to fix phase).
Uncertainty is Not Urgency. The pressure to "do something" is the enemy. Deliberation resolves uncertainty.
Evidence is Binary. Repro or no-repro. Proved or disproved. No "partially confirmed."
Know Your Code State. Before EVERY test: verify baseline, what modifications exist, what state you intend to test.
Queue Discipline. Test theories in order. No skipping. No adding new theories mid-test.

The Protocol

Step 0: Verify Code State

<CRITICAL> Before selecting a theory, confirm your code state:

CODE STATE CHECK:
- Baseline: [commit SHA / version / description]
- Current state: [clean / modified]
- Modifications: [none / list what's changed]
- Intended test state: [clean baseline / with modification X]

If you don't know your code state, STOP. Return to clean baseline before proceeding. </CRITICAL>

Step 1: Select ONE Theory

Select the FIRST untested theory. Not "the one I feel good about." The FIRST one.

THEORY QUEUE:
1. [Theory 1] - Status: [UNTESTED/TESTING/DISPROVED/CONFIRMED]
2. [Theory 2] - Status: UNTESTED
3. [Theory 3] - Status: UNTESTED

Currently testing: Theory [N]: [description]
Status: UNTESTED -> TESTING

Step 2: Design the Repro Test

<CRITICAL> Before running ANYTHING, write out the COMPLETE test that proves or disproves this theory.

The test must:

Encompass EVERY step needed to reproduce (not "run tests" but the specific command)
Have a CLEAR expected outcome if theory is correct
Have a CLEAR expected outcome if theory is wrong
Be RUNNABLE as written (no placeholders)

If you cannot design a runnable test for this theory, mark it UNTESTABLE and advance to the next theory. </CRITICAL>

Template:

## Repro Test for Theory [N]

**Theory:** [exact claim being tested]

**Test procedure:**
1. [Exact step 1]
2. [Exact step 2]
3. [Exact step N]

**If theory is CORRECT, I will see:**
[Specific observable outcome]

**If theory is WRONG, I will see:**
[Specific observable outcome]

**Command to run:**
```bash
[exact command]


### Step 3: Approval Gate

<RULE>
**Non-autonomous mode:** Present the repro test design. Ask user: "May I execute?" with options: run as designed, adjust first, or skip theory.

**Autonomous mode (YOLO):** Announce intent, proceed without waiting.
</RULE>

### Step 4: Execute ONCE

Run the test EXACTLY as designed. Once. Not twice "to be sure." Once. Capture output and compare to expected outcomes.

### Step 5: Verdict

| Outcome | Verdict | Next Action |
|---------|---------|-------------|
| Matches "correct" prediction | **REPRODUCED** | STOP investigating. Announce. Proceed to fix (if autonomous) or wait. |
| Matches "wrong" prediction | **DISPROVED** | Mark theory DISPROVED. Return to Step 1 with next theory. |
| Neither matches | **INCONCLUSIVE** | If one more test iteration can resolve it, design refined test. Otherwise mark INCONCLUSIVE and advance. |

### Step 6: On Reproduction

<CRITICAL>
When a test reproduces the bug:

**FULL STOP.**

BUG REPRODUCED under Theory [N].

Theory: [description] Evidence: [what the test showed]

Investigation complete. Ready for fix phase.


Before proceeding to fix phase, invoke `verifying-hunches` to confirm the theory holds before committing to a fix.

**Non-autonomous:** Wait for user before proceeding to fix.
**Autonomous:** Invoke `verifying-hunches`, then proceed to fix phase (invoke `test-driven-development` skill).

DO NOT:
- "Confirm" with another test
- Investigate "why" further
- Check other theories "just to be thorough"
- Make any changes without explicit fix-phase transition
</CRITICAL>

---

## Chaos Detection

<FORBIDDEN>
If you catch yourself doing ANY of these, STOP IMMEDIATELY and return to Step 0:

**Code state violations:**
- Testing without knowing your code state
- Forgetting what modifications you've made
- Assuming clean baseline without verifying
- Making changes without recording them

**Action without design:**
- "Let me try..." (try WHAT? designed HOW?)
- "Maybe if I..." (hypothesis, not test)
- "What about..." (brainstorming, not testing)
- "Quick test..." (no such thing)
- "See if..." (prediction unclear)

**Queue violations:**
- Skipping theories to test "the likely one"
- Adding new theories mid-test without completing current
- Jumping between theories without marking status
- Testing theory 3 before theories 1 and 2 are resolved

**Mixing theories:**
- Changing multiple things between tests
- "While I'm here, also..."
- Testing theory A but making change related to theory B
- Running multiple experiments without isolation

**Premature action:**
- Running before design is written
- Running before approval (non-autonomous)
- Making changes instead of observing
- "Fixing" before reproduction confirmed
- Elaborate fix attempts before proving bug exists

**Continuation after reproduction:**
- "Let me verify that's really it"
- "I'll also check theory B just in case"
- Any action after bug repros except announcing and waiting
</FORBIDDEN>

---

## Theory Tracker

Maintain explicit state:

Theory Status

| # | Theory | Status | Test Result | |---|--------|--------|-------------| | 1 | [desc] | DISPROVED | [what test showed] | | 2 | [desc] | TESTING | - | | 3 | [desc] | UNTESTED | - |


Update after each test. Survives context compaction.

---

## Integration Points

**Invoked by:** `debugging` (Phase 3), `scientific-debugging` (experiment execution), `systematic-debugging` (Phase 3)

**Invokes:** `verifying-hunches` (Step 6, before fix phase), `test-driven-development` (fix phase entry)

---

## Self-Check

Before EACH test execution:
- [ ] Single theory selected and stated
- [ ] Repro test fully designed (procedure, predictions, command)
- [ ] Approval obtained (or autonomous mode)
- [ ] Previous theories properly marked

After EACH test:
- [ ] Theory status updated (DISPROVED/REPRODUCED/INCONCLUSIVE)
- [ ] If REPRODUCED: stopped investigating, announced, waiting
- [ ] If DISPROVED: moved to next theory, not re-testing same one
- [ ] No mixing, no "trying," no chaos

---

<FINAL_EMPHASIS>
Patience is not passivity. Deliberation is not delay. The disciplined tester finds truth faster than the frantic one.

One theory. One test. Full stop.

This is very important to my career.
</FINAL_EMPHASIS>

Related Skills

axiomantic/writing-skills

testing

VerifiedTrustedCommunity

Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-skills

axiomantic/writing-plans

development

VerifiedTrustedCommunity

Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-plans

axiomantic/writing-commands

testing

VerifiedTrustedCommunity

Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-commands

axiomantic/verifying-hunches

development

VerifiedTrustedCommunity

Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.

5SKILL.mdUpdated Apr 3, 2026

axiomantic/verifying-hunches

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/axiomantic/spellbook.git

# Copy into Claude Code skills folder (global)
cp -r spellbook/skills/isolated-testing ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

axiomantic/spellbook

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT