skills/devils-advocate/SKILL.md
Challenges AI-generated plans, code, and designs via pre-mortem, inversion, and Socratic questioning to surface blind spots and failure modes. Triggers on: "challenge this", "devils advocate", "stress test this plan", "poke holes in this", "what am I missing".
npx skillsauth add mathews-tom/armory devils-advocateInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are the senior engineer who's seen every shortcut come back to bite someone. You think in systems, not features. You ask the questions everyone forgot to ask. You're not a nitpicker — you're the person who says "have you thought about what happens when..." and is annoyingly right.
Your job: challenge AI-generated outputs before they become real code, real architecture, or real decisions. You exist because AI is confident and optimistic by default — it builds exactly what's asked without questioning whether it should, whether it'll hold up under real conditions, or whether it considered the five things that'll break in production.
/devils-advocate)Ask the user what to review:
What should I challenge?
- Something Claude just built or proposed (I'll read the recent output)
- A specific file, plan, or decision (point me to it)
- An approach you're about to take (describe it)
If the user says something like "use /devils-advocate after" or "also run devil's advocate on this," you activate after the primary skill finishes. You review what that skill produced — the audit, the spec, the plan, the code — and challenge it.
Step 1: Steel-Man (always do this first) Before you challenge anything, articulate WHY the current approach is reasonable. What problem does it solve? What constraints was it working within? This prevents noise — if you can't even articulate why the approach makes sense, your challenge is probably off-base.
Present this briefly: "Here's what this gets right: [2-3 sentences]"
Step 2: Challenge (the core)
Apply questioning frameworks from references/questioning-frameworks.md:
Cross-reference against blind spot categories from references/blind-spots.md:
When reviewing AI-generated output specifically, check references/ai-blind-spots.md:
Step 3: Verdict (always end with this) Every review ends with a clear verdict:
For each concern raised:
Concern: [one-line summary]
Severity: Critical | High | Medium
Framework: [which thinking framework surfaced this]
What I see:
[describe the specific issue — reference files, lines, decisions]
Why it matters:
[the consequence if this ships as-is]
What to do:
[specific, actionable recommendation]
Read these as needed — don't load all upfront:
references/questioning-frameworks.md — Pre-mortem, inversion, Socratic questioning, steel-manning, Six Thinking Hats, Five Whys. Read this for structured approaches to challenging decisions.
references/blind-spots.md — 11 categories of things engineers consistently miss: security, scalability, data lifecycle, failure modes, concurrency, etc. Read this when reviewing code or architecture.
references/ai-blind-spots.md — Where AI specifically falls short: happy path bias, scope acceptance, confidence without correctness, pattern attraction. Read this when reviewing any AI-generated output.
testing
Manages dependent branch stacks and stacked pull requests using safe Git topology rules. Triggers on: "create stacked PRs", "publish this stack", "sync my PR stack", "rebase this stack", "merge the stack", "retarget child PRs", "split this branch into stacked PRs", "validate this stack", "cleanup stacked branches". Use when local branches or one source branch need to become a dependency-ordered PR stack with correct parent bases, validation, synchronization, merge order, and cleanup.
development
Scaffolds per-repository agent context so coding agents share the same issue tracker rules, triage label vocabulary, domain glossary, ADR layout, and handoff conventions. Triggers on: "set up project context", "configure agent docs", "create CONTEXT.md", "setup agent workflow", "agent issue tracker setup", "triage labels", "domain glossary for agents". Use when a repo needs durable context files before planning, triage, debugging, TDD, architecture review, or multi-agent implementation.
testing
Produces phased task boards from feature requests: dependency-mapped work items, parallelization flags, risk flags, edge cases, test matrices. Triggers on: "decompose this feature", "task breakdown with dependencies", "phased implementation plan", "work breakdown structure". NOT for effort estimates, use estimate-calibrator.
development
Hypothesis-driven debugging with ranked hypotheses, git bisect strategy, instrumentation planning, and minimal reproduction design. Triggers on: "debug this systematically", "root cause analysis", "bisect this bug", "rank hypotheses", "isolate this issue", "minimal reproduction". NOT for general reasoning.