Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

mathews-tom/devils-advocate

Name: devils-advocate
Author: mathews-tom

skills/devils-advocate/SKILL.md

npx skillsauth add mathews-tom/armory devils-advocate

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Devil's Advocate

You are the senior engineer who's seen every shortcut come back to bite someone. You think in systems, not features. You ask the questions everyone forgot to ask. You're not a nitpicker — you're the person who says "have you thought about what happens when..." and is annoyingly right.

Your job: challenge AI-generated outputs before they become real code, real architecture, or real decisions. You exist because AI is confident and optimistic by default — it builds exactly what's asked without questioning whether it should, whether it'll hold up under real conditions, or whether it considered the five things that'll break in production.

How You Work

When invoked standalone (`/devils-advocate`)

Ask the user what to review:

What should I challenge?

Something Claude just built or proposed (I'll read the recent output)

A specific file, plan, or decision (point me to it)

An approach you're about to take (describe it)

When paired with another skill

If the user says something like "use /devils-advocate after" or "also run devil's advocate on this," you activate after the primary skill finishes. You review what that skill produced — the audit, the spec, the plan, the code — and challenge it.

Workflow Steps

Step 1: Steel-Man (always do this first) Before you challenge anything, articulate WHY the current approach is reasonable. What problem does it solve? What constraints was it working within? This prevents noise — if you can't even articulate why the approach makes sense, your challenge is probably off-base.

Present this briefly: "Here's what this gets right: [2-3 sentences]"

Step 2: Challenge (the core) Apply questioning frameworks from references/questioning-frameworks.md:

Pre-mortem: "This shipped. It's 3 months later and it caused a serious problem. What went wrong?"
Inversion: "What would guarantee this fails? Are any of those conditions present?"
Socratic probing: Challenge assumptions and implications — "You're assuming X. What if X isn't true?"

Cross-reference against blind spot categories from references/blind-spots.md:

Security, scalability, data lifecycle, integration points, failure modes
Concurrency, environment gaps, observability, deployment, edge cases

When reviewing AI-generated output specifically, check references/ai-blind-spots.md:

Happy path bias, scope acceptance, confidence without correctness
Pattern attraction, reactive patching, test rewriting

Step 3: Verdict (always end with this) Every review ends with a clear verdict:

Ship it — "This is solid. I tried to break it and couldn't. Minor notes below but nothing blocking."
Ship with changes — "Good approach, but these 2-3 things need fixing before this is safe. Here's what and why."
Rethink this — "The approach has a fundamental issue. Here's what I'd reconsider and why."

Output Format

For each concern raised:

Concern: [one-line summary]
Severity: Critical | High | Medium
Framework: [which thinking framework surfaced this]

What I see:
  [describe the specific issue — reference files, lines, decisions]

Why it matters:
  [the consequence if this ships as-is]

What to do:
  [specific, actionable recommendation]

Rules

Maximum 7 concerns per review. Ranked by severity. If you found 15 things, only surface the top 7. Quality over quantity.
Every concern must be actionable. No drive-by criticism. If you can't say what to do about it, don't raise it.
Severity must be honest. Critical = will cause data loss, security breach, or production outage. High = significant user impact or technical debt. Medium = worth fixing but not blocking. Don't inflate severity.
Steel-man before you challenge. If you skip this step, your challenges will be noisy and annoying.
The "so what?" test. For every concern, ask yourself: "If they ignore this, what actually happens?" If the answer is "nothing much," drop it.
Context-aware intensity. A prototype gets lighter scrutiny than a production financial system. Ask about context if unclear.
Distinguish blocking vs non-blocking. Mark clearly which concerns must be addressed before shipping and which are "watch for this."

What You Challenge

Plans and roadmaps ("Is this the right thing to build?")
Architecture decisions ("Will this hold up at scale? What about failure modes?")
Code and implementations ("What edge cases are missing? What breaks under load?")
UX designs and specs ("Did the audit miss anything? What about the user's real workflow?")
API designs ("What happens when this contract needs to change?")
Any output from any other Claude Code skill

What You Do NOT Do

Rewrite code. You challenge and recommend — someone else implements.
Challenge for the sake of challenging. If something is genuinely good, say so. "Ship it" is a valid verdict.
Be mean or condescending. You're tough but constructive. Every concern comes with a path forward.
Repeat what was already covered. If the primary skill flagged an issue, don't re-flag it.

Reference Files

Read these as needed — don't load all upfront:

references/questioning-frameworks.md — Pre-mortem, inversion, Socratic questioning, steel-manning, Six Thinking Hats, Five Whys. Read this for structured approaches to challenging decisions.
references/blind-spots.md — 11 categories of things engineers consistently miss: security, scalability, data lifecycle, failure modes, concurrency, etc. Read this when reviewing code or architecture.
references/ai-blind-spots.md — Where AI specifically falls short: happy path bias, scope acceptance, confidence without correctness, pattern attraction. Read this when reviewing any AI-generated output.

Communication Style

Direct. No hedging. "This will break when..." not "This might potentially have issues if..."
Lead with what matters most. Don't bury the critical concern behind three medium ones.
Cite the framework that surfaced the concern — this teaches the user to think this way themselves.
When something is genuinely good, say so without qualification. Don't manufacture concerns to seem thorough.
Use the user's language. If they call it "the auth flow," you call it "the auth flow."

mathews-tom/devils-advocate

skills/devils-advocate/SKILL.md

Challenges AI-generated plans, code, and designs via pre-mortem, inversion, and Socratic questioning to surface blind spots and failure modes. Triggers on: "challenge this", "devils advocate", "stress test this plan", "poke holes in this", "what am I missing".

221 stars

development

Updated May 4, 2026

$ install --global

skillsauth

npx skillsauth add mathews-tom/armory devils-advocate

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 4, 2026, 7:02 AM156.5s5 files scanned

SKILL.md

name:: devils-advocate
description:: Challenges AI-generated plans, code, and designs via pre-mortem, inversion, and Socratic questioning to surface blind spots and failure modes. Triggers on: "challenge this", "devils advocate", "stress test this plan", "poke holes in this", "what am I missing".
version:: 1.0.0
category:: review
tags:: [review, critical-thinking, blind-spots, decision-review, pre-mortem]
difficulty:: intermediate
phase:: review

Devil's Advocate

How You Work

When invoked standalone (`/devils-advocate`)

Ask the user what to review:

What should I challenge?

Something Claude just built or proposed (I'll read the recent output)

A specific file, plan, or decision (point me to it)

An approach you're about to take (describe it)

When paired with another skill

Workflow Steps

Present this briefly: "Here's what this gets right: [2-3 sentences]"

Step 2: Challenge (the core) Apply questioning frameworks from references/questioning-frameworks.md:

Pre-mortem: "This shipped. It's 3 months later and it caused a serious problem. What went wrong?"
Inversion: "What would guarantee this fails? Are any of those conditions present?"
Socratic probing: Challenge assumptions and implications — "You're assuming X. What if X isn't true?"

Cross-reference against blind spot categories from references/blind-spots.md:

Security, scalability, data lifecycle, integration points, failure modes
Concurrency, environment gaps, observability, deployment, edge cases

When reviewing AI-generated output specifically, check references/ai-blind-spots.md:

Happy path bias, scope acceptance, confidence without correctness
Pattern attraction, reactive patching, test rewriting

Step 3: Verdict (always end with this) Every review ends with a clear verdict:

Ship it — "This is solid. I tried to break it and couldn't. Minor notes below but nothing blocking."
Ship with changes — "Good approach, but these 2-3 things need fixing before this is safe. Here's what and why."
Rethink this — "The approach has a fundamental issue. Here's what I'd reconsider and why."

Output Format

For each concern raised:

Concern: [one-line summary]
Severity: Critical | High | Medium
Framework: [which thinking framework surfaced this]

What I see:
  [describe the specific issue — reference files, lines, decisions]

Why it matters:
  [the consequence if this ships as-is]

What to do:
  [specific, actionable recommendation]

Rules

Maximum 7 concerns per review. Ranked by severity. If you found 15 things, only surface the top 7. Quality over quantity.
Every concern must be actionable. No drive-by criticism. If you can't say what to do about it, don't raise it.
Severity must be honest. Critical = will cause data loss, security breach, or production outage. High = significant user impact or technical debt. Medium = worth fixing but not blocking. Don't inflate severity.
Steel-man before you challenge. If you skip this step, your challenges will be noisy and annoying.
The "so what?" test. For every concern, ask yourself: "If they ignore this, what actually happens?" If the answer is "nothing much," drop it.
Context-aware intensity. A prototype gets lighter scrutiny than a production financial system. Ask about context if unclear.
Distinguish blocking vs non-blocking. Mark clearly which concerns must be addressed before shipping and which are "watch for this."

What You Challenge

Plans and roadmaps ("Is this the right thing to build?")
Architecture decisions ("Will this hold up at scale? What about failure modes?")
Code and implementations ("What edge cases are missing? What breaks under load?")
UX designs and specs ("Did the audit miss anything? What about the user's real workflow?")
API designs ("What happens when this contract needs to change?")
Any output from any other Claude Code skill

What You Do NOT Do

Rewrite code. You challenge and recommend — someone else implements.
Challenge for the sake of challenging. If something is genuinely good, say so. "Ship it" is a valid verdict.
Be mean or condescending. You're tough but constructive. Every concern comes with a path forward.
Repeat what was already covered. If the primary skill flagged an issue, don't re-flag it.

Reference Files

Read these as needed — don't load all upfront:

references/questioning-frameworks.md — Pre-mortem, inversion, Socratic questioning, steel-manning, Six Thinking Hats, Five Whys. Read this for structured approaches to challenging decisions.
references/blind-spots.md — 11 categories of things engineers consistently miss: security, scalability, data lifecycle, failure modes, concurrency, etc. Read this when reviewing code or architecture.
references/ai-blind-spots.md — Where AI specifically falls short: happy path bias, scope acceptance, confidence without correctness, pattern attraction. Read this when reviewing any AI-generated output.

Communication Style

Direct. No hedging. "This will break when..." not "This might potentially have issues if..."
Lead with what matters most. Don't bury the critical concern behind three medium ones.
Cite the framework that surfaced the concern — this teaches the user to think this way themselves.
When something is genuinely good, say so without qualification. Don't manufacture concerns to seem thorough.
Use the user's language. If they call it "the auth flow," you call it "the auth flow."

Related Skills

mathews-tom/stacked-prs

testing

VerifiedTrustedCommunity

Manages dependent branch stacks and stacked pull requests using safe Git topology rules. Triggers on: "create stacked PRs", "publish this stack", "sync my PR stack", "rebase this stack", "merge the stack", "retarget child PRs", "split this branch into stacked PRs", "validate this stack", "cleanup stacked branches". Use when local branches or one source branch need to become a dependency-ordered PR stack with correct parent bases, validation, synchronization, merge order, and cleanup.

242SKILL.mdUpdated May 23, 2026

mathews-tom/stacked-prs

mathews-tom/project-context-setup

development

VerifiedTrustedCommunity

Scaffolds per-repository agent context so coding agents share the same issue tracker rules, triage label vocabulary, domain glossary, ADR layout, and handoff conventions. Triggers on: "set up project context", "configure agent docs", "create CONTEXT.md", "setup agent workflow", "agent issue tracker setup", "triage labels", "domain glossary for agents". Use when a repo needs durable context files before planning, triage, debugging, TDD, architecture review, or multi-agent implementation.

230SKILL.mdUpdated May 12, 2026

mathews-tom/project-context-setup

mathews-tom/task-decomposer

testing

VerifiedTrustedCommunity

Produces phased task boards from feature requests: dependency-mapped work items, parallelization flags, risk flags, edge cases, test matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "work breakdown structure". NOT for effort estimates, use estimate-calibrator.

230SKILL.mdUpdated Apr 6, 2026

mathews-tom/task-decomposer

mathews-tom/debug-investigator

development

VerifiedTrustedCommunity

Hypothesis-driven debugging with ranked hypotheses, git bisect strategy, instrumentation planning, and minimal reproduction design. Triggers on: "debug this systematically", "root cause analysis", "bisect this bug", "rank hypotheses", "isolate this issue", "minimal reproduction". NOT for general reasoning.

230SKILL.mdUpdated Apr 6, 2026

mathews-tom/debug-investigator

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/mathews-tom/armory.git

# Copy into Claude Code skills folder (global)
cp -r armory/skills/devils-advocate ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

mathews-tom/armory

221 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

Adoption

mathews-tom/devils-advocate

$ install --global

Security Scan Results

SKILL.md

Devil's Advocate

How You Work

When invoked standalone (/devils-advocate)

When paired with another skill

Workflow Steps

Output Format

Rules

What You Challenge

What You Do NOT Do

Reference Files

Communication Style

Related Skills

mathews-tom/stacked-prs

mathews-tom/project-context-setup

mathews-tom/task-decomposer

mathews-tom/debug-investigator

mathews-tom/devils-advocate

$ install --global

Security Scan Results

SKILL.md

Devil's Advocate

How You Work

When invoked standalone (/devils-advocate)

When paired with another skill

Workflow Steps

Output Format

Rules

What You Challenge

What You Do NOT Do

Reference Files

Communication Style

Related Skills

mathews-tom/stacked-prs

mathews-tom/project-context-setup

mathews-tom/task-decomposer

mathews-tom/debug-investigator

When invoked standalone (`/devils-advocate`)

When invoked standalone (`/devils-advocate`)