.agents/skills/skeptic/SKILL.md
Use when the user questions or suspects an agent claim is wrong. Adversarially gathers evidence to verify or refute the claim using the best sources available in the current environment.
npx skillsauth add tkstang/open-agent-toolkit skepticInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Adversarially evaluates a claim made by the agent. Dispatches a Skeptical Evaluator sub-agent (or self-evaluates when unavailable) to find contradicting evidence first, then supporting evidence. Returns an inline verdict with a confidence score and cited sources.
Use when:
Don't use when:
Parse from $ARGUMENTS:
Print this banner once at start:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
/skeptic
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Then print step indicators before beginning work:
[1/5] Identifying claim…[2/5] Scanning environment…[3/5] Checking sub-agent availability…[4/5] Gathering evidence…[5/5] Synthesizing verdict…For long-running operations (large codebase search, multiple web fetches), print a start line and a completion line:
→ Searching codebase for contradicting evidence…
→ Complete. (Contradicting: N · Supporting: N)
Keep it concise; don't print a line for every shell command.
[1/5] Identifying claim…
If $ARGUMENTS is provided, use it as the claim. Otherwise, identify the most recent factual assertion made by the agent in the conversation. State the claim explicitly — evaluation must be anchored to something precise.
If the claim is ambiguous, ask the user to clarify before proceeding:
AskUserQuestionThe prompt should ask: "Which claim should I evaluate? Please describe or quote it."
[2/5] Scanning environment…
Assess available evidence sources based on current environment:
| Environment | Available sources | | --------------------- | ----------------------------------------------------------- | | Claude Code (in repo) | Local files, Grep/Glob, Bash, git history, tests, lockfiles | | Claude Code (general) | Filesystem, Bash, web search if enabled | | Cursor | Filesystem access, web search if enabled | | Codex | Filesystem, web search if enabled | | Claude.ai | Web search if enabled, conversation context only |
Classify the claim type to determine evidence priority:
package.json, installed module source, docs MCP, npm registry[3/5] Checking sub-agent availability…
→ skeptical-evaluator: {available | not resolved} ({reason})
→ Selected: Execution Tier {1|2|3} — {Sub-agent | Self-evaluation (recommended) | Inline evaluation}
Detection logic:
Execution Tier 1 — Sub-agent dispatch (preferred):
Agent tool is available → dispatch with subagent_type: "skeptical-evaluator", resolved from .agents/agents/skeptical-evaluator.md (synced to .claude/agents/)/skeptical-evaluator or natural mention, resolved from .cursor/agents/skeptical-evaluator.md (synced from .agents/agents/)[features] multi_agent = true is enabled in active Codex config. Codex may also auto-select and spawn agents without explicit role pinning.Execution Tier 2 — Self-evaluation (recommended fallback):
→ skeptical-evaluator: not resolved — falling back to Execution Tier 2 self-evaluationExecution Tier 3 — Inline evaluation:
[4/5] Gathering evidence…
The evaluator (or orchestrator in Execution Tier 2/3) receives this context package:
CLAIM: [exact claim]
BASIS: [original reasoning/sources/assumptions]
CLAIM_TYPE: [code_behavior | library_specific | documentation | factual | architectural]
AVAILABLE_SOURCES: [what's accessible]
INSTRUCTION: Adversarial. Disprove first, then note supporting evidence.
Evidence priority by claim type:
package.json → installed module source → docs MCP → npm registryRules:
[5/5] Synthesizing verdict…
Return one of four verdict frames inline. Always attempt a clear lean — only use "inconclusive" when evidence is genuinely ambiguous after thorough search.
Holds up
✅ **This holds up** *(confidence: X%)*
I adversarially reviewed this and still believe it to be accurate.
[Key supporting evidence with citations]
[Any caveats if relevant]
You were right to be skeptical
❌ **You were right to be skeptical** *(confidence: X%)*
The evidence contradicts this claim.
[Contradicting evidence with citations]
[Corrected or more accurate version of the claim]
Nuanced
⚠️ **This is nuanced** *(confidence: X%)*
[What's correct, what's incorrect, and under what conditions]
[Both contradicting and supporting evidence with citations]
Genuinely inconclusive
❓ **I can't determine this with confidence** *(confidence: X%)*
[What was found, what wasn't, and why a clear verdict isn't possible]
[Suggested next steps if applicable]
/skeptic
(evaluates most recent agent assertion)
/skeptic you said React 18 introduced concurrent rendering — verify that
/skeptic the claim that useCallback prevents child re-renders
The skill may also be invoked when the user says things like:
Are you sure about that?
That doesn't sound right — can you check?
Claim is ambiguous:
AskUserQuestion (Claude Code), structured user-input tooling (Codex), or plain text (fallback)No evidence sources available:
Sub-agent dispatch fails:
→ skeptical-evaluator: not resolved (dispatch failed) — falling back to Execution Tier 2documentation
Use when OAT implementation changes and repository reference docs must be synchronized. Updates .oat/repo/reference to match current behavior.
business
Merge multiple analysis artifacts into a single coherent report with provenance tracking. Reads existing artifacts from /deep-research, /analyze, and /compare.
tools
Use when prioritizing backlog work or evaluating a roadmap. Produces value-effort ratings, dependency mapping, and execution recommendations.
documentation
Use when preparing a shipping digest or weekly/biweekly wrap-up summarizing OAT projects and merged PRs over a time window. Reads local summary files and GitHub PR metadata; writes a version-controlled markdown report.