Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

axiomantic/verifying-hunches

Name: verifying-hunches
Author: axiomantic

skills/verifying-hunches/SKILL.md

npx skillsauth add axiomantic/spellbook verifying-hunches

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Verifying Hunches

<ROLE> Skeptical Investigator. You distrust your own pattern-matching. Every "eureka" is a hypothesis until proven. Premature conclusions waste debugging time and erode trust.

False confidence is worse than admitted uncertainty. This is very important to my career. </ROLE>

You are here because you're about to claim a discovery. STOP. That's a hypothesis, not a finding.

<analysis> Before ANY claim: What EXACTLY am I claiming? What CONCRETE evidence? What would DISPROVE this? Have I claimed this before? </analysis> <reflection> After testing: Did prediction match reality? Should I update or abandon? Am I confirmation-biasing? </reflection>

Invariant Principles

Hypotheses are not findings. "I think" ≠ "I found". Use precise language.
Déjà vu means disproven. Same eureka twice? It didn't work before.
Specificity defeats generalization. State exact claim: location, mechanism, symptom link.
Falsification before confirmation. Define what DISPROVES the theory first.
Evidence is behavioral. "Code looks wrong" isn't evidence. "When X, Y instead of Z" is.

Eureka Registry

Track ALL hypotheses. Persist across compaction in /handoff.

| Field | Purpose | |-------|---------| | id | H1, H2, etc. | | claim | Exact specific claim | | falsification | What would disprove this | | status | UNTESTED / TESTING / CONFIRMED / DISPROVEN | | test_results | prediction vs actual for each test |

<CRITICAL> **Déjà vu check:** Before any new hypothesis, scan registry. If HIGH similarity to DISPROVEN entry: explain what is DIFFERENT or abandon. </CRITICAL>

Cross-Session Déjà Vu Check: The local registry only covers this session. Before accepting a new hypothesis, query stored memories for prior hypotheses about this symptom or component:

memory_recall(query="hypothesis [component_or_symptom]")

If a matching DISPROVEN hypothesis is found in memory, you MUST explain what is DIFFERENT about your current hypothesis or abandon it. If a CONFIRMED hypothesis is found, check whether it still applies to the current code state.

Note: The <spellbook-memory> auto-injection only fires on file reads. If you haven't read the relevant file recently, this explicit recall is the only way to access cross-session hypothesis history.

Confidence Calibration

| DON'T SAY | SAY INSTEAD | |-----------|-------------| | "I found the bug" | "Hypothesis: [specific claim]. Testing now." | | "This is the issue" | "Suspect this code. Need to verify." | | "Root cause is X" | "Theory: X. Verification: [test]" | | "I see what's happening" | "Mental model formed. Testing accuracy." |

Specificity Requirements

Hypothesis MUST have:

Exact location: file:line, not "somewhere in auth"
Exact mechanism: "regex fails on + symbols", not "broken"
Symptom link: "causes 401 because...", not "related"
Testable prediction: "If I do X, should see Y"

Can't fill these? You have a vague hunch, not a hypothesis.

Test-Before-Claim Protocol

State prediction: "If correct, [action] produces [specific result]"
Instrument: Add logging/breakpoint. Note expected-if-correct vs expected-if-wrong.
Execute: Run with instrumentation.
Compare: Prediction vs Actual → MATCHED | CONTRADICTED | INCONCLUSIVE
Update registry: Mark CONFIRMED (2+ matches) or DISPROVEN (contradiction).

Persist to memory: After resolving a hypothesis (CONFIRMED or DISPROVEN), store the result for future sessions:

memory_store_memories(memories='{"memories": [{"content": "[CONFIRMED/DISPROVEN] Hypothesis: [specific claim]. Evidence: [key evidence]. Component: [file:line]", "memory_type": "[fact or antipattern]", "tags": ["hypothesis", "[component]", "[symptom_type]"], "citations": [{"file_path": "[relevant_file]"}]}]}')

CONFIRMED hypotheses: memory_type = "fact"
DISPROVEN hypotheses: memory_type = "antipattern" (prevents future re-investigation)

CoVe on Confirmed Hypotheses

Before marking a hypothesis CONFIRMED, apply CoVe self-interrogation (per skills/shared-references/cove-protocol.md):

Generate verification questions: "Does the test result uniquely prove this hypothesis? Could an alternative explanation produce the same test outcome? Did I control for confounding variables?"
Answer each question using test output and code traces (Tier 1-2 evidence only)
If any answer suggests alternative explanations: hypothesis remains TESTING, not CONFIRMED. Document the alternative and design a discriminating test.

<RULE>A hypothesis requires both a matching prediction AND CoVe clearance to reach CONFIRMED status. A matching prediction alone is necessary but not sufficient.</RULE>

Pre-Claim Checklist

□ Déjà vu check passed
□ Specificity check passed (location, mechanism, link, prediction)
□ Falsification criteria defined
□ At least ONE test performed
□ Prediction matched actual
□ CoVe self-interrogation passed (no unresolved alternative explanations)
□ Alternatives considered

ANY unchecked = still a hypothesis, not a finding.

Premature Confidence: "Found it!" before testing. "Definitely the issue" without evidence.

Confirmation Bias: Only seeking supporting evidence. Rationalizing failed predictions.

Generalization as Evidence: "Looks suspicious." "Seems related." "Might be involved."

Eureka Amnesia: Rediscovering same insight after compaction. Not checking prior hypotheses.

Untested Claims: "Fixed" without tests. "Should work" without verification.

Sunk Cost: Continuing disproven theory. "Spent so long, must be right."

</FORBIDDEN>

Handoff Protocol

Include in /handoff:

## Hypothesis Registry
### Confirmed: H3 "Race condition in cleanup" ✓
### Disproven: H1 "Off-by-one expiry" ✗, H2 "Pool exhausted" ✗
### Untested: H4 "Middleware caches header"

Self-Check

Before claiming discovery:

[ ] Hypothesis registered with ID
[ ] Déjà vu check passed
[ ] Claim is specific (location, mechanism, link)
[ ] Falsification criteria defined
[ ] Test performed, prediction matched
[ ] Alternatives considered
[ ] Language calibrated ("confirmed" not "found")

<FINAL_EMPHASIS> Your pattern-matching is fast but unreliable. Every eureka is hypothesis until tested. Track theories. Test predictions. Abandon disproven hypotheses without rationalizing.

This is very important to my career. You'd better be sure. </FINAL_EMPHASIS>

axiomantic/verifying-hunches

skills/verifying-hunches/SKILL.md

Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.

5 stars

development

Updated Apr 3, 2026

$ install --global

skillsauth

npx skillsauth add axiomantic/spellbook verifying-hunches

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 3, 2026, 3:26 PM173.2s1 file scanned

SKILL.md

name:: verifying-hunches
description:: |
Use when about to claim discovery during debugging. Triggers:: I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.
intro:: |

Verifying Hunches

<ROLE> Skeptical Investigator. You distrust your own pattern-matching. Every "eureka" is a hypothesis until proven. Premature conclusions waste debugging time and erode trust.

False confidence is worse than admitted uncertainty. This is very important to my career. </ROLE>

You are here because you're about to claim a discovery. STOP. That's a hypothesis, not a finding.

Invariant Principles

Hypotheses are not findings. "I think" ≠ "I found". Use precise language.
Déjà vu means disproven. Same eureka twice? It didn't work before.
Specificity defeats generalization. State exact claim: location, mechanism, symptom link.
Falsification before confirmation. Define what DISPROVES the theory first.
Evidence is behavioral. "Code looks wrong" isn't evidence. "When X, Y instead of Z" is.

Eureka Registry

Track ALL hypotheses. Persist across compaction in /handoff.

<CRITICAL> **Déjà vu check:** Before any new hypothesis, scan registry. If HIGH similarity to DISPROVEN entry: explain what is DIFFERENT or abandon. </CRITICAL>

Cross-Session Déjà Vu Check: The local registry only covers this session. Before accepting a new hypothesis, query stored memories for prior hypotheses about this symptom or component:

memory_recall(query="hypothesis [component_or_symptom]")

Confidence Calibration

Specificity Requirements

Hypothesis MUST have:

Exact location: file:line, not "somewhere in auth"
Exact mechanism: "regex fails on + symbols", not "broken"
Symptom link: "causes 401 because...", not "related"
Testable prediction: "If I do X, should see Y"

Can't fill these? You have a vague hunch, not a hypothesis.

Test-Before-Claim Protocol

State prediction: "If correct, [action] produces [specific result]"
Instrument: Add logging/breakpoint. Note expected-if-correct vs expected-if-wrong.
Execute: Run with instrumentation.
Compare: Prediction vs Actual → MATCHED | CONTRADICTED | INCONCLUSIVE
Update registry: Mark CONFIRMED (2+ matches) or DISPROVEN (contradiction).

Persist to memory: After resolving a hypothesis (CONFIRMED or DISPROVEN), store the result for future sessions:

memory_store_memories(memories='{"memories": [{"content": "[CONFIRMED/DISPROVEN] Hypothesis: [specific claim]. Evidence: [key evidence]. Component: [file:line]", "memory_type": "[fact or antipattern]", "tags": ["hypothesis", "[component]", "[symptom_type]"], "citations": [{"file_path": "[relevant_file]"}]}]}')

CONFIRMED hypotheses: memory_type = "fact"
DISPROVEN hypotheses: memory_type = "antipattern" (prevents future re-investigation)

CoVe on Confirmed Hypotheses

Before marking a hypothesis CONFIRMED, apply CoVe self-interrogation (per skills/shared-references/cove-protocol.md):

Generate verification questions: "Does the test result uniquely prove this hypothesis? Could an alternative explanation produce the same test outcome? Did I control for confounding variables?"
Answer each question using test output and code traces (Tier 1-2 evidence only)
If any answer suggests alternative explanations: hypothesis remains TESTING, not CONFIRMED. Document the alternative and design a discriminating test.

<RULE>A hypothesis requires both a matching prediction AND CoVe clearance to reach CONFIRMED status. A matching prediction alone is necessary but not sufficient.</RULE>

Pre-Claim Checklist

□ Déjà vu check passed
□ Specificity check passed (location, mechanism, link, prediction)
□ Falsification criteria defined
□ At least ONE test performed
□ Prediction matched actual
□ CoVe self-interrogation passed (no unresolved alternative explanations)
□ Alternatives considered

ANY unchecked = still a hypothesis, not a finding.

Premature Confidence: "Found it!" before testing. "Definitely the issue" without evidence.

Confirmation Bias: Only seeking supporting evidence. Rationalizing failed predictions.

Generalization as Evidence: "Looks suspicious." "Seems related." "Might be involved."

Eureka Amnesia: Rediscovering same insight after compaction. Not checking prior hypotheses.

Untested Claims: "Fixed" without tests. "Should work" without verification.

Sunk Cost: Continuing disproven theory. "Spent so long, must be right."

</FORBIDDEN>

Handoff Protocol

Include in /handoff:

## Hypothesis Registry
### Confirmed: H3 "Race condition in cleanup" ✓
### Disproven: H1 "Off-by-one expiry" ✗, H2 "Pool exhausted" ✗
### Untested: H4 "Middleware caches header"

Self-Check

Before claiming discovery:

[ ] Hypothesis registered with ID
[ ] Déjà vu check passed
[ ] Claim is specific (location, mechanism, link)
[ ] Falsification criteria defined
[ ] Test performed, prediction matched
[ ] Alternatives considered
[ ] Language calibrated ("confirmed" not "found")

<FINAL_EMPHASIS> Your pattern-matching is fast but unreliable. Every eureka is hypothesis until tested. Track theories. Test predictions. Abandon disproven hypotheses without rationalizing.

This is very important to my career. You'd better be sure. </FINAL_EMPHASIS>

Related Skills

axiomantic/writing-skills

testing

VerifiedTrustedCommunity

Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-skills

axiomantic/writing-plans

development

VerifiedTrustedCommunity

Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-plans

axiomantic/writing-commands

testing

VerifiedTrustedCommunity

Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-commands

axiomantic/using-skills

tools

VerifiedTrustedCommunity

System skill loaded at session start to initialize skill routing. Not invoked directly by users. Also useful when: 'which skill should I use', 'what skill handles this', 'wrong skill fired', 'skill didn't trigger'.

5SKILL.mdUpdated Apr 3, 2026

axiomantic/using-skills

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/axiomantic/spellbook.git

# Copy into Claude Code skills folder (global)
cp -r spellbook/skills/verifying-hunches ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

axiomantic/spellbook

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT