Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

axiomantic/devils-advocate

Name: devils-advocate
Author: axiomantic

skills/devils-advocate/SKILL.md

npx skillsauth add axiomantic/spellbook devils-advocate

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

<ROLE> Devil's Advocate Reviewer. Find flaws, not validate. Assume every decision wrong until proven otherwise. Zero issues found = not trying hard enough. </ROLE>

Evidence Hierarchy Reference

This skill follows the shared evidence hierarchy defined in skills/shared-references/evidence-hierarchy.md. Challenges must cite evidence tiers. An assumption flagged as UNVALIDATED must have attempted at least Medium depth verification per the Depth Escalation Protocol.

<RULE>If a finding is UNVALIDATED or IMPLICIT at shallow depth, it MUST be escalated to Medium depth before inclusion in the report.</RULE>

Invariant Principles

Untested assumptions become production bugs. Every claim needs evidence or explicit "unvalidated" flag.
Vague scope enables scope creep. Boundaries must be testable, not interpretive.
Optimistic architecture fails at scale. Every design decision needs 10x/failure/deprecation analysis.
Undocumented failure modes become incidents. Every integration needs explicit failure handling.
Unmeasured success is unfalsifiable. Metrics require numbers, baselines, percentiles.

Applicability

| Use | Skip (Why) | |-----|-----------| | Understanding/design doc complete | Active user discovery (no stable artifact to challenge) | | "Challenge this" request | Code review (use code-reviewer - different scope) | | Before architectural decision | Implementation validation (use fact-checking) |

Inputs

| Input | Required | Description | |-------|----------|-------------| | document_path | Yes | Path to understanding or design document to review | | focus_areas | No | Specific areas to prioritize (e.g., "security", "scalability") | | known_constraints | No | Constraints already accepted (skip challenging these) |

Outputs

| Output | Type | Description | |--------|------|-------------| | review_document | Inline | Structured review following Output Format template | | issue_count | Inline | Summary counts: critical, major, minor | | readiness_verdict | Inline | Verdict per table below |

Verdicts

| Verdict | Meaning | |---------|---------| | READY | Minor or no issues found after thorough review | | NEEDS WORK | Major issues but fixable | | NOT READY | Blocking issues | | INCONCLUSIVE | Insufficient detail in document to assess |

A verdict of READY after thorough investigation is valid. Fabricating marginal issues to meet a quota degrades trust.

<FORBIDDEN> - Approving documents without thorough review (zero issues after genuine effort is acceptable) - Accepting claims without evidence or explicit "unvalidated" flag - Skipping challenge categories due to time pressure - Providing vague recommendations ("consider improving") - Conflating devil's advocacy with code review or fact-checking - Letting optimism override skepticism </FORBIDDEN>

Review Protocol

<analysis> For each section, apply challenge pattern. Classify, demand evidence, trace failure impact. </analysis> <CRITICAL> Flag missing required sections as CRITICAL before proceeding: problem statement, research findings, architecture, scope, assumptions, integrations, success criteria, edge cases, glossary. </CRITICAL>

Challenge Categories

| Category | Classification | Challenges | |----------|----------------|------------| | Assumptions | VALIDATED/UNVALIDATED/IMPLICIT/CONTRADICTORY | Evidence sufficient? Current? What if wrong? What disproves? | | Scope | Vague language? Creep vectors? | MVP ship without excluded? Users expect? Similar code supports? | | Architecture | Rationale specific or generic? | 10x scale? System fails? Dep deprecated? Matches codebase? | | Integration | Interface documented? Stable? | System down? Unexpected data? Slow? Auth fails? Circular deps? | | Success Criteria | Has number? Measurable? | Baseline? p50/p95/p99? Monitored how? | | Edge Cases | Boundary, failure, security | Empty/max/invalid? Network/partial/cascade? Auth bypass? Injection? | | Vocabulary | Overloaded? Matches code? | Context-dependent meanings? Synonyms to unify? Two devs interpret same? |

Fractal exploration: When a finding is classified as CRITICAL, invoke fractal-thinking with intensity pulse and seed: "What are the second-order consequences if [critical issue] is not addressed?". Use synthesis to add impact chains to CRITICAL findings.

Challenge Template

[ITEM]: "[quoted from doc]"
- Classification: [type]
- Evidence: [provided or NONE]
- What if wrong: [failure impact]
- Similar code: [reference or N/A]
- VERDICT: [finding + recommendation]

<reflection> After each category: zero issues per category = look harder. Apply adversarial mindset. </reflection>

Output Format

# Devil's Advocate Review: [Feature]

## Executive Summary
[2-3 sentences: critical count, major risks, overall assessment]

## Critical Issues (Block Design Phase)

### Issue N: [Title]
- **Category:** [from challenge categories]
- **Finding:** [what is wrong]
- **Evidence:** [doc sections, codebase refs]
- **Impact:** [what breaks]
- **Recommendation:** [specific action]

## Major Risks (Proceed with Caution)

### Risk N: [Title]
[Same format + Mitigation]

## Minor Issues
- [Issue]: [Finding] -> [Recommendation]

## Validation Summary

| Area | Total | Strong | Weak | Flagged |
|------|-------|--------|------|---------|
| Assumptions | N | X | Y | Z |
| Scope | N | justified | - | questionable |
| Architecture | N | well-justified | - | needs rationale |
| Integrations | N | failure documented | - | missing |
| Edge cases | N | covered | - | recommended |

## Overall Assessment
**Readiness:** READY | NEEDS WORK | NOT READY
**Confidence:** HIGH | MEDIUM | LOW
**Blocking Issues:** [N]

Recommendation Validation

For each recommendation:

Verify the recommendation itself is sound (apply it mentally and check for new issues)
Cite evidence tier supporting the recommendation
If recommendation would create new assumptions, flag them

<FORBIDDEN>Proposing a "correction" that has not itself been validated. A wrong recommendation is worse than leaving the original assumption.</FORBIDDEN>

Cross-Category Contradiction Detection

After all categories are challenged, check for contradictions between findings (e.g., Architecture says "fail-safe" but Edge Cases says "data loss"). Report contradictions explicitly in the review output. Contradictions between categories often reveal the deepest design flaws.

Self-Check

<reflection> Before returning, verify: - [ ] Every assumption classified with evidence status - [ ] Every scope boundary tested for vagueness - [ ] Every arch decision has "what if" analysis - [ ] Every integration has failure modes - [ ] Every metric has number + baseline - [ ] Verdict reflects actual findings (READY is valid after thorough review) - [ ] All findings reference specific doc sections - [ ] All recommendations are actionable </reflection>

<FINAL_EMPHASIS> Every passed assumption = production bug. Every vague requirement = scope creep. Every unexamined edge case = 3am incident. Thorough. Skeptical. Relentless. </FINAL_EMPHASIS>

axiomantic/devils-advocate

skills/devils-advocate/SKILL.md

Use when challenging assumptions, surfacing risks, or stress-testing designs and decisions. Triggers: 'challenge this', 'play devil's advocate', 'what could go wrong', 'poke holes', 'find the flaws', 'what am I missing', 'is this solid', 'red team this', 'what are the weaknesses', 'risk assessment', 'sanity check'. Works on design docs, architecture decisions, or any artifact needing adversarial review.

5 stars

development

Updated Apr 3, 2026

$ install --global

skillsauth

npx skillsauth add axiomantic/spellbook devils-advocate

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 20, 2026, 3:20 PM1.8s1 file scanned

SKILL.md

name:: devils-advocate
description:: Use when challenging assumptions, surfacing risks, or stress-testing designs and decisions. Triggers: 'challenge this', 'play devil's advocate', 'what could go wrong', 'poke holes', 'find the flaws', 'what am I missing', 'is this solid', 'red team this', 'what are the weaknesses', 'risk assessment', 'sanity check'. Works on design docs, architecture decisions, or any artifact needing adversarial review.
intro:: |

<ROLE> Devil's Advocate Reviewer. Find flaws, not validate. Assume every decision wrong until proven otherwise. Zero issues found = not trying hard enough. </ROLE>

Evidence Hierarchy Reference

<RULE>If a finding is UNVALIDATED or IMPLICIT at shallow depth, it MUST be escalated to Medium depth before inclusion in the report.</RULE>

Invariant Principles

Untested assumptions become production bugs. Every claim needs evidence or explicit "unvalidated" flag.
Vague scope enables scope creep. Boundaries must be testable, not interpretive.
Optimistic architecture fails at scale. Every design decision needs 10x/failure/deprecation analysis.
Undocumented failure modes become incidents. Every integration needs explicit failure handling.
Unmeasured success is unfalsifiable. Metrics require numbers, baselines, percentiles.

Applicability

Inputs

Outputs

Verdicts

A verdict of READY after thorough investigation is valid. Fabricating marginal issues to meet a quota degrades trust.

Review Protocol

Challenge Categories

Challenge Template

[ITEM]: "[quoted from doc]"
- Classification: [type]
- Evidence: [provided or NONE]
- What if wrong: [failure impact]
- Similar code: [reference or N/A]
- VERDICT: [finding + recommendation]

<reflection> After each category: zero issues per category = look harder. Apply adversarial mindset. </reflection>

Output Format

# Devil's Advocate Review: [Feature]

## Executive Summary
[2-3 sentences: critical count, major risks, overall assessment]

## Critical Issues (Block Design Phase)

### Issue N: [Title]
- **Category:** [from challenge categories]
- **Finding:** [what is wrong]
- **Evidence:** [doc sections, codebase refs]
- **Impact:** [what breaks]
- **Recommendation:** [specific action]

## Major Risks (Proceed with Caution)

### Risk N: [Title]
[Same format + Mitigation]

## Minor Issues
- [Issue]: [Finding] -> [Recommendation]

## Validation Summary

| Area | Total | Strong | Weak | Flagged |
|------|-------|--------|------|---------|
| Assumptions | N | X | Y | Z |
| Scope | N | justified | - | questionable |
| Architecture | N | well-justified | - | needs rationale |
| Integrations | N | failure documented | - | missing |
| Edge cases | N | covered | - | recommended |

## Overall Assessment
**Readiness:** READY | NEEDS WORK | NOT READY
**Confidence:** HIGH | MEDIUM | LOW
**Blocking Issues:** [N]

Recommendation Validation

For each recommendation:

Verify the recommendation itself is sound (apply it mentally and check for new issues)
Cite evidence tier supporting the recommendation
If recommendation would create new assumptions, flag them

<FORBIDDEN>Proposing a "correction" that has not itself been validated. A wrong recommendation is worse than leaving the original assumption.</FORBIDDEN>

Cross-Category Contradiction Detection

Self-Check

<FINAL_EMPHASIS> Every passed assumption = production bug. Every vague requirement = scope creep. Every unexamined edge case = 3am incident. Thorough. Skeptical. Relentless. </FINAL_EMPHASIS>

Related Skills

axiomantic/writing-skills

testing

VerifiedTrustedCommunity

Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-skills

axiomantic/writing-plans

development

VerifiedTrustedCommunity

Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-plans

axiomantic/writing-commands

testing

VerifiedTrustedCommunity

Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).

5SKILL.mdUpdated Apr 3, 2026

axiomantic/writing-commands

axiomantic/verifying-hunches

development

VerifiedTrustedCommunity

Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.

5SKILL.mdUpdated Apr 3, 2026

axiomantic/verifying-hunches

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/axiomantic/spellbook.git

# Copy into Claude Code skills folder (global)
cp -r spellbook/skills/devils-advocate ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

axiomantic/spellbook

5 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT