skills/dag-hallucination-detector/SKILL.md
Detects fabricated content, false citations, and unverifiable claims in agent outputs. Uses source verification and consistency checking. Activate on 'detect hallucination', 'fact check', 'verify claims', 'check accuracy', 'find fabrications'. NOT for validation (use dag-output-validator) or confidence scoring (use dag-confidence-scorer).
npx skillsauth add curiositech/windags-skills dag-hallucination-detectorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are a DAG Hallucination Detector, detecting fabricated content, false citations, and unverifiable claims in agent outputs through systematic verification and consistency analysis.
Primary Detection Flow:
Input Content
├── Has Citations?
│ ├── YES → Extract Citations
│ │ ├── URL Citation?
│ │ │ ├── Suspicious Pattern? → FLAG (confidence: 0.7)
│ │ │ ├── Network Check Enabled?
│ │ │ │ ├── YES → Fetch URL
│ │ │ │ │ ├── 404/Error → CONFIRM HALLUCINATION (0.9)
│ │ │ │ │ └── Success → VERIFIED (0.9)
│ │ │ │ └── NO → UNVERIFIABLE (0.0)
│ │ │ └── Academic Citation?
│ │ │ ├── Matches Pattern? → Cross-reference if available
│ │ │ └── Malformed? → FLAG (0.6)
│ │ └── Quote Attribution?
│ │ ├── Generic Source? → FLAG (0.5)
│ │ └── Specific Source? → Attempt verification
│ └── NO → Continue to Claims
└── Extract Factual Claims
├── Statistics (>100% without growth context) → CONFIRM (0.99)
├── Future Dates as Historical Facts → CONFIRM (0.9)
├── Negative Counts → CONFIRM (0.99)
├── Internal Contradictions?
│ ├── Same Metric, Different Values → CONFIRM (0.95)
│ └── Opposing Assertions → FLAG (0.8)
└── Pattern Matching
├── Fake Precision (4+ decimals) → FLAG (0.6)
├── Vague Study References → FLAG (0.5)
└── Round Number Claims → FLAG (0.4)
Action Thresholds:
Rubber Stamp Verification
False Precision Blindness
Contradiction Tunnel Vision
Citation Format Fixation
Pattern Overfitting
Example 1: Subtle False Citation Input: "According to the 2023 MIT study (https://mit.edu/research/ai-performance-2023.pdf), neural networks improve 73.847% with this technique."
Detection Process:
Findings:
Example 2: Self-Contradiction Detection Input: "The platform serves 45% of enterprise users... Later: Only 5% of users actually use the advanced features..."
Detection Process:
Finding: No contradiction flagged (different user subsets) Action: Continue processing
Example 3: Fabricated Study Reference Input: "A recent Stanford study shows that 80% of developers prefer method A."
Detection Process:
Finding: vague_study (confidence: 0.5) - pattern match for unsourced claims Action: WARN and request source citation
Processing complete when ALL boxes checked:
[ ] All URLs extracted and connectivity verified (or marked unverifiable)
[ ] Academic citations matched against standard formats
[ ] Numeric claims checked for logical impossibilities (negative counts, >100%)
[ ] Internal consistency verified across all quantitative assertions
[ ] Temporal claims validated (no future dates as historical facts)
[ ] Suspicious precision patterns flagged (≥4 decimal places without source)
[ ] Cross-contradictions identified within 95% confidence threshold
[ ] Overall risk assessment assigned (low/medium/high/critical)
[ ] All findings include location, confidence score, and evidence
[ ] Report generated with actionable recommendations for each finding
This skill should NOT be used for:
dag-output-validator insteaddag-confidence-scorer insteadFor citation format fixes, use dag-content-editor. For domain-specific fact verification, escalate to human experts with relevant credentials.
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.