skills/dag-quality/SKILL.md
Validates agent outputs against schemas and quality criteria, scores confidence, detects hallucinations, monitors convergence, decides when to iterate, and synthesizes actionable feedback. Use when checking if a node's output is acceptable, scoring confidence, detecting fabricated content, deciding whether to re-execute, or generating improvement feedback. Activate on "validate output", "check quality", "confidence score", "hallucination check", "should we iterate", "improvement feedback". NOT for executing DAGs (use dag-runtime), planning DAGs (use dag-planner), or matching skills (use dag-skills-matcher).
Install this skill globally with one command: `npx skillsauth add curiositech/windags-skills dag-quality`. Works with Claude Code, Cursor, and Windsurf.
Validates outputs, scores confidence, detects hallucinations, monitors convergence, decides on iteration, and synthesizes feedback. The quality gate between DAG nodes. Consolidates dag-output-validator, dag-confidence-scorer, dag-hallucination-detector, dag-convergence-monitor, dag-iteration-detector, and dag-feedback-synthesizer.
✅ Use for:
- Checking whether a node's output is acceptable
- Scoring confidence
- Detecting fabricated content
- Deciding whether to re-execute
- Generating improvement feedback

❌ NOT for:
- Executing DAGs (use dag-runtime)
- Planning DAGs (use dag-planner)
- … (use skill-grader)

```mermaid
flowchart TD
    O[Node output] --> SV[Schema validation]
    SV -->|Invalid| REJ[Reject + specific errors]
    SV -->|Valid| CV[Content validation]
    CV --> CS[Confidence scoring]
    CS --> HD[Hallucination detection]
    HD --> D{Quality above threshold?}
    D -->|Yes| ACC[Accept → pass to downstream]
    D -->|Below threshold, iteration < max| FB[Generate feedback]
    FB --> RE[Re-execute with feedback]
    D -->|Below threshold, iteration = max| ESC[Escalate to human]
```
Structural check: does the output match the node's declared output contract?
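A minimal sketch of the structural check, assuming the output contract is a mapping from required field names to expected Python types. The source does not specify the contract format, so `validate_schema` and the contract shape are hypothetical. It returns specific errors rather than a bare pass/fail, in line with the "Reject + specific errors" branch of the flowchart.

```python
from typing import Any


def validate_schema(output: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return a list of specific errors; an empty list means the output is valid.

    Assumption: the contract maps required field names to expected types.
    """
    errors = []
    for field, expected in contract.items():
        if field not in output:
            errors.append(f"missing required field '{field}'")
        elif not isinstance(output[field], expected):
            actual = type(output[field]).__name__
            errors.append(f"field '{field}' expected {expected.__name__}, got {actual}")
    return errors


# Hypothetical contract for a node that must emit a summary and recommendations.
contract = {"summary": str, "recommendations": list}
print(validate_schema({"summary": "ok"}, contract))
# → ["missing required field 'recommendations'"]
```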
Semantic check: is the content reasonable?
Aggregate four evaluator signals (see skill-lifecycle.md for full architecture):
| Evaluator | Weight | Signal |
|-----------|--------|--------|
| Self-evaluation | 0.15 | Agent's own assessment (sycophancy-biased) |
| Peer evaluation | 0.25 | Separate judge agent with skill-grader |
| Downstream evaluation | 0.35 | Next node reports usability |
| Human evaluation | 0.50 | At human gates, gold standard |
Final score = weighted average of available signals (normalize weights to sum to 1.0).
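The listed weights sum to 1.25, and not every evaluator reports at every node, so the normalization step matters. The sketch below shows one way to compute the weighted average over whichever signals are present. The function name `aggregate_confidence` and the short signal keys are illustrative assumptions, not names from the skill itself.

```python
# Weights from the evaluator table; the keys are shorthand chosen for this sketch.
WEIGHTS = {"self": 0.15, "peer": 0.25, "downstream": 0.35, "human": 0.50}


def aggregate_confidence(signals: dict[str, float]) -> float:
    """Weighted average of the available signals, with weights renormalized to sum to 1.0."""
    available = {name: score for name, score in signals.items() if name in WEIGHTS}
    if not available:
        raise ValueError("no evaluator signals available")
    total_weight = sum(WEIGHTS[name] for name in available)
    return sum(WEIGHTS[name] * score for name, score in available.items()) / total_weight


# Only self and peer evaluation are available:
# (0.15 * 0.9 + 0.25 * 0.7) / (0.15 + 0.25) = 0.31 / 0.40 = 0.775
print(round(aggregate_confidence({"self": 0.9, "peer": 0.7}), 3))
```

With a single available signal the score equals that signal, since its weight normalizes to 1.0.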
Specific checks for fabricated content:
If quality_score >= 0.8: ACCEPT
If quality_score < 0.8 AND iterations < max_iterations: ITERATE with feedback
If quality_score < 0.5 AND iterations >= max_iterations: ESCALATE to human
If quality_score < 0.3 on first attempt: ESCALATE immediately (fundamentally wrong)
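The rules above can be sketched as a single function. Two points are assumptions rather than something the source states. First, the first-attempt floor (score below 0.3) is checked before the iterate rule, since otherwise the iterate rule would always match first. Second, a score between 0.5 and 0.8 with the iteration budget exhausted is not covered by the rules, so this sketch follows the flowchart and escalates. `iterations` is assumed to count completed re-executions, so it is 0 on the first attempt.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    ITERATE = "iterate"
    ESCALATE = "escalate"


def decide(quality_score: float, iterations: int, max_iterations: int) -> Decision:
    """Apply the accept / iterate / escalate rules (precedence is an assumption)."""
    if quality_score >= 0.8:
        return Decision.ACCEPT
    # Fundamentally wrong on the first attempt: escalate without iterating.
    if iterations == 0 and quality_score < 0.3:
        return Decision.ESCALATE
    if iterations < max_iterations:
        return Decision.ITERATE
    # Budget exhausted and still below threshold. Assumption: escalate for any
    # score below 0.8, as in the flowchart, not only for scores below 0.5.
    return Decision.ESCALATE


print(decide(0.65, iterations=1, max_iterations=3))  # → Decision.ITERATE
```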
When iterating, produce structured improvement guidance:
```json
{
  "overall_score": 0.65,
  "specific_issues": [
    {"field": "recommendations", "issue": "Only 2 of 5 required recommendations provided", "fix": "Add 3 more recommendations addressing scalability, testing, and deployment"},
    {"field": "citations", "issue": "Source [3] returns 404", "fix": "Replace with a working source or remove the claim"}
  ],
  "strengths_to_preserve": ["Clear structure", "Good code examples"],
  "iteration_guidance": "Focus on completeness (missing recommendations) and citation accuracy. Do not rewrite the well-structured sections."
}
```
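A feedback object of this shape could be assembled roughly as follows. This is a sketch under assumptions: the `Issue` dataclass and `synthesize_feedback` are hypothetical names, and the guidance string is templated here, whereas the real skill presumably writes it with more nuance.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Issue:
    field: str  # output field the issue applies to
    issue: str  # what is wrong
    fix: str    # concrete instruction for the next iteration


def synthesize_feedback(score: float, issues: list[Issue], strengths: list[str]) -> dict:
    """Build a feedback dict with the same keys as the JSON example above."""
    # Deduplicate field names while preserving their order of appearance.
    fields = ", ".join(dict.fromkeys(i.field for i in issues))
    return {
        "overall_score": score,
        "specific_issues": [asdict(i) for i in issues],
        "strengths_to_preserve": strengths,
        "iteration_guidance": f"Focus on: {fields}. Preserve: {', '.join(strengths)}.",
    }


feedback = synthesize_feedback(
    0.65,
    [Issue("citations", "Source [3] returns 404", "Replace with a working source or remove the claim")],
    ["Clear structure", "Good code examples"],
)
print(json.dumps(feedback, indent=2))
```

Keeping `strengths_to_preserve` in the payload tells the re-executing agent what not to touch, which limits regressions in sections that already passed.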
Track quality scores across iterations to detect:
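The specific patterns are not enumerated here, so the following is only a sketch under assumed categories: improvement, plateau (gain below a small threshold), and regression (score dropped). The function name, the labels, and the `min_gain` default of 0.02 are all hypothetical.

```python
def convergence_status(scores: list[float], min_gain: float = 0.02) -> str:
    """Classify the trend between the two most recent iteration scores.

    Assumed categories (not from the source): improving, plateau, regressing.
    """
    if len(scores) < 2:
        return "insufficient_data"
    delta = scores[-1] - scores[-2]
    if delta < 0:
        return "regressing"
    if delta < min_gain:
        return "plateau"
    return "improving"


print(convergence_status([0.50, 0.65]))   # → improving
print(convergence_status([0.70, 0.705]))  # → plateau
print(convergence_status([0.70, 0.60]))   # → regressing
```

A plateau or regression is a reasonable signal to stop iterating early, since further re-execution with the same feedback is unlikely to cross the acceptance threshold.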
Consolidates: dag-output-validator, dag-confidence-scorer, dag-hallucination-detector, dag-convergence-monitor, dag-iteration-detector, dag-feedback-synthesizer