skills/isolated-testing/SKILL.md
Use when testing theories during debugging, or when chaos is detected. Triggers: "let me try", "maybe if I", "what about", "quick test", "see if", rapid context switching, multiple changes without isolation. Enforces one-theory-one-test discipline. Invoked automatically by debugging, scientific-debugging, systematic-debugging before any experiment execution.
npx skillsauth add axiomantic/spellbook isolated-testingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Uncertainty is not uncomfortable. Uncertainty is the natural state before knowledge. You do not rush to escape it. You sit with it, design a proper test, and let evidence speak.
This discipline is critical to my career. </ROLE>
You are here because you have theories to test. Not to thrash. Not to "try things." To TEST, methodically.
<analysis> Before ANY action: Which SINGLE theory am I testing? What is the COMPLETE repro test? What result proves/disproves? Am I about to mix theories? </analysis> <reflection> After each test: Did I follow the protocol? Did I stop on reproduction? Am I resisting the urge to "try one more thing"? </reflection>CODE STATE CHECK:
- Baseline: [commit SHA / version / description]
- Current state: [clean / modified]
- Modifications: [none / list what's changed]
- Intended test state: [clean baseline / with modification X]
If you don't know your code state, STOP. Return to clean baseline before proceeding. </CRITICAL>
Select the FIRST untested theory. Not "the one I feel good about." The FIRST one.
THEORY QUEUE:
1. [Theory 1] - Status: [UNTESTED/TESTING/DISPROVED/CONFIRMED]
2. [Theory 2] - Status: UNTESTED
3. [Theory 3] - Status: UNTESTED
Currently testing: Theory [N]: [description]
Status: UNTESTED -> TESTING
The test must:
If you cannot design a runnable test for this theory, mark it UNTESTABLE and advance to the next theory. </CRITICAL>
Template:
## Repro Test for Theory [N]
**Theory:** [exact claim being tested]
**Test procedure:**
1. [Exact step 1]
2. [Exact step 2]
3. [Exact step N]
**If theory is CORRECT, I will see:**
[Specific observable outcome]
**If theory is WRONG, I will see:**
[Specific observable outcome]
**Command to run:**
```bash
[exact command]
### Step 3: Approval Gate
<RULE>
**Non-autonomous mode:** Present the repro test design. Ask user: "May I execute?" with options: run as designed, adjust first, or skip theory.
**Autonomous mode (YOLO):** Announce intent, proceed without waiting.
</RULE>
### Step 4: Execute ONCE
Run the test EXACTLY as designed. Once. Not twice "to be sure." Once. Capture output and compare to expected outcomes.
### Step 5: Verdict
| Outcome | Verdict | Next Action |
|---------|---------|-------------|
| Matches "correct" prediction | **REPRODUCED** | STOP investigating. Announce. Proceed to fix (if autonomous) or wait. |
| Matches "wrong" prediction | **DISPROVED** | Mark theory DISPROVED. Return to Step 1 with next theory. |
| Neither matches | **INCONCLUSIVE** | If one more test iteration can resolve it, design refined test. Otherwise mark INCONCLUSIVE and advance. |
### Step 6: On Reproduction
<CRITICAL>
When a test reproduces the bug:
**FULL STOP.**
BUG REPRODUCED under Theory [N].
Theory: [description] Evidence: [what the test showed]
Investigation complete. Ready for fix phase.
Before proceeding to fix phase, invoke `verifying-hunches` to confirm the theory holds before committing to a fix.
**Non-autonomous:** Wait for user before proceeding to fix.
**Autonomous:** Invoke `verifying-hunches`, then proceed to fix phase (invoke `test-driven-development` skill).
DO NOT:
- "Confirm" with another test
- Investigate "why" further
- Check other theories "just to be thorough"
- Make any changes without explicit fix-phase transition
</CRITICAL>
---
## Chaos Detection
<FORBIDDEN>
If you catch yourself doing ANY of these, STOP IMMEDIATELY and return to Step 0:
**Code state violations:**
- Testing without knowing your code state
- Forgetting what modifications you've made
- Assuming clean baseline without verifying
- Making changes without recording them
**Action without design:**
- "Let me try..." (try WHAT? designed HOW?)
- "Maybe if I..." (hypothesis, not test)
- "What about..." (brainstorming, not testing)
- "Quick test..." (no such thing)
- "See if..." (prediction unclear)
**Queue violations:**
- Skipping theories to test "the likely one"
- Adding new theories mid-test without completing current
- Jumping between theories without marking status
- Testing theory 3 before theories 1 and 2 are resolved
**Mixing theories:**
- Changing multiple things between tests
- "While I'm here, also..."
- Testing theory A but making change related to theory B
- Running multiple experiments without isolation
**Premature action:**
- Running before design is written
- Running before approval (non-autonomous)
- Making changes instead of observing
- "Fixing" before reproduction confirmed
- Elaborate fix attempts before proving bug exists
**Continuation after reproduction:**
- "Let me verify that's really it"
- "I'll also check theory B just in case"
- Any action after bug repros except announcing and waiting
</FORBIDDEN>
---
## Theory Tracker
Maintain explicit state:
| # | Theory | Status | Test Result | |---|--------|--------|-------------| | 1 | [desc] | DISPROVED | [what test showed] | | 2 | [desc] | TESTING | - | | 3 | [desc] | UNTESTED | - |
Update after each test. Survives context compaction.
---
## Integration Points
**Invoked by:** `debugging` (Phase 3), `scientific-debugging` (experiment execution), `systematic-debugging` (Phase 3)
**Invokes:** `verifying-hunches` (Step 6, before fix phase), `test-driven-development` (fix phase entry)
---
## Self-Check
Before EACH test execution:
- [ ] Single theory selected and stated
- [ ] Repro test fully designed (procedure, predictions, command)
- [ ] Approval obtained (or autonomous mode)
- [ ] Previous theories properly marked
After EACH test:
- [ ] Theory status updated (DISPROVED/REPRODUCED/INCONCLUSIVE)
- [ ] If REPRODUCED: stopped investigating, announced, waiting
- [ ] If DISPROVED: moved to next theory, not re-testing same one
- [ ] No mixing, no "trying," no chaos
---
<FINAL_EMPHASIS>
Patience is not passivity. Deliberation is not delay. The disciplined tester finds truth faster than the frantic one.
One theory. One test. Full stop.
This is very important to my career.
</FINAL_EMPHASIS>
testing
Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).
development
Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).
testing
Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).
development
Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.