skills/dag-mutation-strategist/SKILL.md
Decides HOW to mutate a DAG when a node fails, quality is below threshold, or new information changes the plan. Selects from mutation strategies (add node, replace agent, fork paths, loop back, downgrade model) based on failure type, cost budget, and execution history. Activate on "DAG failed how to fix", "mutation strategy", "replan on failure", "adaptive DAG", "recovery strategy", "what to do when node fails". NOT for detecting failures (use dag-quality), executing mutations (use dag-planner), or runtime execution (use dag-runtime).
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add curiositech/windags-skills dag-mutation-strategist`
Decides HOW to mutate a DAG when things go wrong. Given a failure diagnosis from dag-quality or dag-ops, selects the optimal recovery strategy.
✅ Use for:
- Choosing a recovery strategy when a node fails
- Responding to output quality that is below threshold
- Replanning when new information changes the plan

❌ NOT for:
- Detecting failures (use `dag-quality`)
- Executing mutations (use `dag-planner` to rewire, `dag-runtime` to execute)

```mermaid
flowchart TD
    F[Failure detected] --> T{Failure type?}
    T -->|Transient: API timeout, rate limit| R[Retry with backoff]
    T -->|Model: wrong format, refusal| M{Budget allows upgrade?}
    M -->|Yes| U[Replace with stronger model]
    M -->|No| P[Rephrase prompt, same model]
    T -->|Contract: output schema mismatch| C[Retry with explicit schema in prompt]
    T -->|Quality: below threshold| Q{Iteration count?}
    Q -->|< max| L[Loop back with feedback from dag-quality]
    Q -->|≥ max| E{Challenger skill available?}
    E -->|Yes| SW[Swap to challenger skill]
    E -->|No| HG[Escalate to human gate]
    T -->|Logic: wrong approach entirely| D{Cost of replan vs remaining budget?}
    D -->|Replan < 30% of budget| RP[Replan affected subgraph]
    D -->|Replan > 30%| HG
    T -->|Cascade: upstream caused this| FIX[Fix root cause node first]
    FIX --> R
```
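The decision tree above can be sketched as a plain function. This is a minimal illustration, not the skill's actual implementation: the `Diagnosis` fields, the strategy names, and the default `max_iterations` of 3 are all hypothetical. The flowchart does not say what happens when the replan cost is exactly 30% of the remaining budget, so this sketch assumes escalation in that case.

```python
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    TRANSIENT = "transient"  # API timeout, rate limit
    MODEL = "model"          # wrong format, refusal
    CONTRACT = "contract"    # output schema mismatch
    QUALITY = "quality"      # below threshold
    LOGIC = "logic"          # wrong approach entirely
    CASCADE = "cascade"      # upstream caused this


@dataclass
class Diagnosis:
    """Hypothetical shape of a diagnosis handed over by dag-quality or dag-ops."""
    failure_type: FailureType
    budget_allows_upgrade: bool = False
    iteration: int = 0
    max_iterations: int = 3  # assumed default, not taken from the source
    challenger_available: bool = False
    replan_cost: float = 0.0
    remaining_budget: float = 0.0


def select_strategy(d: Diagnosis) -> str:
    """Walk the flowchart branches and return a (hypothetical) strategy name."""
    if d.failure_type is FailureType.TRANSIENT:
        return "retry_with_backoff"
    if d.failure_type is FailureType.MODEL:
        return "upgrade_model" if d.budget_allows_upgrade else "rephrase_prompt"
    if d.failure_type is FailureType.CONTRACT:
        return "inject_schema"
    if d.failure_type is FailureType.QUALITY:
        if d.iteration < d.max_iterations:
            return "loop_with_feedback"
        return "swap_skill" if d.challenger_available else "escalate_to_human"
    if d.failure_type is FailureType.LOGIC:
        # Replan only when it costs less than 30% of the remaining budget.
        # Exactly 30% is unspecified in the flowchart; assumed to escalate here.
        if d.replan_cost < 0.30 * d.remaining_budget:
            return "replan_subgraph"
        return "escalate_to_human"
    if d.failure_type is FailureType.CASCADE:
        return "fix_root_cause_then_retry"
    raise ValueError(f"unknown failure type: {d.failure_type}")
```

A caller would build a `Diagnosis` from the failure report and pass the returned name on to whatever component applies the mutation.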
| Strategy | When | Cost | Risk |
|----------|------|------|------|
| Retry with backoff | Transient failures (timeout, rate limit) | Same as original (~$0.001-0.01) | Low: usually works in 1-3 retries |
| Rephrase prompt | Model refused or misunderstood | Same as original | Medium: may not fix the root cause |
| Upgrade model | Cheap model failed on complex task | +$0.01-0.10 | Low: stronger model usually succeeds |
| Downgrade + simplify | Expensive model failed, budget tight | -$0.01-0.10 | Medium: simpler approach may miss nuance |
| Inject schema | Output didn't match contract | Same as original | Low: explicit schema usually works |
| Loop with feedback | Quality below threshold | Same as original + eval cost | Medium: may plateau after 2-3 iterations |
| Swap skill | Current skill isn't working for this task | Same as original | Medium: challenger may or may not be better |
| Fork parallel paths | Ambiguous situation, multiple valid approaches | 2-3x original | Low: pick best result from parallel attempts |
| Replan subgraph | Wrong approach for this section | Variable (Sonnet call for replanning) | Medium: new plan may also be wrong |
| Insert validator node | Output needs additional checking | +$0.001 | Low: cheap verification step |
| Escalate to human | All automated strategies exhausted or too risky | Human time | Zero technical risk, but blocks execution |
Wrong: Retrying every failure with the same prompt and model 10 times. Right: Retry once for transient failures. If it fails twice, the problem isn't transient — change strategy.
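One way to enforce this rule is a bounded retry wrapper that gives up after a single retry and signals that a different strategy is needed. The sketch below is illustrative only: `TransientError`, `NotTransient`, and `run_with_bounded_retry` are hypothetical names, and the backoff parameters are assumptions rather than values from the skill.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for timeouts and rate limits."""


class NotTransient(Exception):
    """Raised when retries are exhausted, signalling a change of strategy."""


def run_with_bounded_retry(
    call: Callable[[], T],
    max_retries: int = 1,       # "retry once", per the rule above
    base_delay: float = 0.5,    # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    attempt = 0
    while True:
        try:
            return call()
        except TransientError as exc:
            if attempt >= max_retries:
                # A second failure means the problem is probably not transient.
                raise NotTransient("retries exhausted; change strategy") from exc
            # Exponential backoff with jitter before the next attempt.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
            attempt += 1
```

Catching `NotTransient` is then the point at which the strategist would reclassify the failure instead of retrying again.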
Wrong: Spawning an Opus replan call ($0.10) to recover from a Haiku formatting error ($0.001). Right: Just retry with explicit schema injection ($0.001).
Wrong: Fixing Node C when Node A was the root cause. Node C will fail again on the next run. Right: Trace backward through dependencies to find the first node that deviated. Fix that one.
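The backward trace can be sketched as a walk over the dependency map. This is a minimal sketch under stated assumptions: `deps` (node to upstream nodes) and `deviated` (nodes whose output failed a check) are hypothetical inputs, and the failed node itself is assumed to be in `deviated`.

```python
from typing import Dict, List, Set


def find_root_causes(
    failed: str,
    deps: Dict[str, List[str]],
    deviated: Set[str],
) -> List[str]:
    """Return the earliest deviating ancestors of `failed`.

    A node counts as a root cause when it deviated but none of its
    direct upstream dependencies did.
    """
    roots: List[str] = []
    seen: Set[str] = set()
    stack = [failed]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        bad_upstream = [u for u in deps.get(node, []) if u in deviated]
        if bad_upstream:
            # Keep walking backward through the deviating inputs.
            stack.extend(bad_upstream)
        else:
            # All inputs were fine, so the deviation started here.
            roots.append(node)
    return sorted(roots)


# Node C failed, but A (two steps upstream) is where the deviation began.
deps = {"C": ["B"], "B": ["A"], "A": []}
print(find_root_causes("C", deps, deviated={"A", "B", "C"}))
```

With the chain above, the function points at node A rather than C, which matches the guidance to fix the first node that deviated.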