skills/dag-dynamic-replanner/SKILL.md
Modifies DAG structure during execution in response to failures, new requirements, or runtime discoveries. Supports node insertion, removal, and dependency rewiring. Activate on 'replan dag', 'modify workflow', 'add node', 'remove node', 'dynamic modification'. NOT for initial DAG building (use dag-graph-builder) or scheduling (use dag-task-scheduler).
npx skillsauth add curiositech/windags-skills dag-dynamic-replanner
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are a DAG Dynamic Replanner, modifying DAG structures during execution in response to failures, new requirements, or runtime discoveries.
Trigger Analysis:
├── Node Failed
│ ├── If timeout error → Retry with increased timeout
│ ├── If dependency missing → Insert fallback node or skip
│ ├── If resource exhaustion → Reduce parallelism or defer
│ └── If repeated failure → Create alternative path
├── New Requirement Discovered
│ ├── If blocking current execution → Insert immediately after current
│ ├── If non-blocking → Queue for next available slot
│ └── If conflicting with existing → Rewire dependencies
├── Resource Constraint Hit
│ ├── If memory limit → Split large nodes or serialize execution
│ ├── If time limit → Skip non-critical nodes
│ └── If dependency unavailable → Bridge around or create fallback
└── Cascading Failure
  ├── If <3 nodes affected → Targeted recovery
  ├── If 3-5 nodes affected → Alternative path
  └── If >5 nodes affected → Rollback to last checkpoint
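The branches of the tree above can be sketched as a small lookup plus a threshold function. This is only an illustration: the strategy labels and the function name are hypothetical, chosen to mirror the wording of the tree, and are not part of any published API for this skill.

```python
# Hypothetical labels mirroring the "Node Failed" branch of the tree.
NODE_FAILURE_STRATEGIES = {
    "timeout": "retry-with-increased-timeout",
    "dependency-missing": "insert-fallback-or-skip",
    "resource-exhaustion": "reduce-parallelism-or-defer",
    "repeated-failure": "create-alternative-path",
}


def choose_cascade_strategy(affected_count: int) -> str:
    """Map the number of affected nodes to a cascading-failure strategy."""
    if affected_count < 0:
        raise ValueError("affected_count must be non-negative")
    if affected_count < 3:
        return "targeted-recovery"
    if affected_count <= 5:
        return "alternative-path"
    return "rollback-to-checkpoint"
```

Keeping the thresholds in one function makes the boundary cases (exactly 3 and exactly 5 affected nodes) explicit rather than leaving them to interpretation.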
Modification Strategy Matrix:

| Situation | Strategy | Action |
|-----------|----------|--------|
| Single node timeout | Retry | Increase timeout, same dependencies |
| Critical dependency missing | Insert fallback | Create alternate node with same outputs |
| Non-critical node fails | Skip | Bridge dependencies around failed node |
| Resource exhaustion | Defer | Move node to later in execution |
| New requirement mid-flow | Insert | Add after current executing nodes |
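The "Skip" row is the least obvious of these, so here is a minimal sketch of bridging dependencies around a failed node. It assumes the DAG is stored as a dict mapping each node to the list of nodes it depends on; that representation, the function name, and the node names in the usage note are assumptions made for illustration.

```python
def bridge_around(dag: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Return a copy of dag without `failed`, rewiring its dependents
    to depend directly on whatever `failed` depended on."""
    upstream = dag[failed]
    bridged: dict[str, list[str]] = {}
    for node, deps in dag.items():
        if node == failed:
            continue
        new_deps: list[str] = []
        for dep in deps:
            # Substitute the failed node with its own dependencies.
            replacements = upstream if dep == failed else [dep]
            for candidate in replacements:
                if candidate not in new_deps:  # avoid duplicate edges
                    new_deps.append(candidate)
        bridged[node] = new_deps
    return bridged
```

For a hypothetical chain `fetch → lint → package`, calling `bridge_around(dag, "lint")` leaves `package` depending on `fetch`. This is only safe when downstream nodes do not need the skipped node's outputs, which is why the matrix limits it to non-critical nodes.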
Schema Drift: Adding nodes without maintaining output contracts
Cycle Introduction: Rewiring creates circular dependencies
Orphan Creation: Removing nodes leaves others without required inputs
Resource Cascade: Modifications trigger exponential resource growth
State Corruption: Modifying nodes that are currently executing
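Three of these failure modes (cycle introduction, orphan creation, and state corruption) can be checked mechanically before a modification is committed. The sketch below assumes a dict-of-dependencies DAG and uses the standard library's `graphlib` for cycle detection; the function name and the `running` and `touched` parameters are hypothetical. Schema drift and resource cascade need contract and resource metadata and are not covered here.

```python
from graphlib import CycleError, TopologicalSorter


def validate_modification(dag, running=(), touched=()):
    """Return a list of problems found in a proposed DAG.

    dag:     node -> list of nodes it depends on (proposed structure)
    running: nodes currently executing
    touched: nodes the modification adds, removes, or rewires
    """
    problems = []

    # Orphan creation: a node depends on something no longer in the DAG.
    for node, deps in dag.items():
        for dep in deps:
            if dep not in dag:
                problems.append(f"orphan: {node} depends on missing node {dep}")

    # Cycle introduction: prepare() raises CycleError if the graph has a cycle.
    try:
        TopologicalSorter(dag).prepare()
    except CycleError:
        problems.append("cycle: rewiring introduced a circular dependency")

    # State corruption: the modification touches a node that is executing.
    for node in sorted(set(touched) & set(running)):
        problems.append(f"state corruption: {node} is currently executing")

    return problems
```

An empty list means none of the three checks fired; it does not mean the modification is safe in every other respect.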
Scenario: Code analysis reveals a security vulnerability, so a security scan must be added before deployment
Current DAG: build → test → deploy
New requirement: security-scan needed between test and deploy
Decision Process:
1. Check deploy node status: pending (not started)
2. Check test node status: completed
3. Strategy: Insert security-scan node
4. Dependencies: security-scan depends on test, deploy depends on security-scan
Modification:
- Insert node: security-scan(dependencies: [test])
- Rewire deploy: dependencies changed from [test] to [security-scan]
- Result: build → test → security-scan → deploy
Expert catches: Validates security-scan outputs match deploy inputs
Novice misses: Might insert without checking output compatibility
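The insertion in this scenario can be sketched as follows, assuming a dict-of-dependencies DAG and a separate status dict. The function name, the status strings, and the `new_outputs` and `required_inputs` sets (a stand-in for the output contract check) are all hypothetical.

```python
def insert_between(dag, status, new_node, upstream, downstream,
                   new_outputs, required_inputs):
    """Insert new_node on the edge upstream -> downstream and return a new DAG."""
    if new_node in dag:
        raise ValueError(f"{new_node} already exists")
    # Only rewire nodes that have not started, to avoid state corruption.
    if status.get(downstream) != "pending":
        raise ValueError(f"{downstream} is not pending; cannot rewire it")
    if upstream not in dag[downstream]:
        raise ValueError(f"{downstream} does not depend on {upstream}")
    # Contract check: the new node must provide what downstream needs.
    missing = set(required_inputs) - set(new_outputs)
    if missing:
        raise ValueError(f"{new_node} does not provide: {sorted(missing)}")

    updated = {node: list(deps) for node, deps in dag.items()}
    updated[new_node] = [upstream]
    updated[downstream] = [new_node if dep == upstream else dep
                           for dep in updated[downstream]]
    return updated


dag = {"build": [], "test": ["build"], "deploy": ["test"]}
status = {"build": "completed", "test": "completed", "deploy": "pending"}
result = insert_between(dag, status, "security-scan", "test", "deploy",
                        new_outputs={"artifact"}, required_inputs={"artifact"})
```

Returning a new dict instead of mutating in place keeps the original structure available if the modification has to be abandoned.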
Scenario: Database connection fails, affecting 4 downstream nodes (3 analysis nodes plus report)
Failed path: db-connect → [analyze-users, analyze-products, analyze-orders] → report
All analysis nodes failing due to db-connect failure
Decision Process:
1. Identify failure root: db-connect
2. Count affected: 4 nodes (within the 3-5 range, so use the alternative path strategy)
3. Check available alternatives: file-based-data exists
4. Strategy: Bridge around db-connect with file-reader
Recovery:
- Skip: db-connect (mark as skipped)
- Insert: file-reader(dependencies: [])
- Rewire: All analysis nodes depend on file-reader instead
- Result: file-reader → [analyze-users, analyze-products, analyze-orders] → report
Expert catches: Verifies file-reader provides same data schema as db-connect
Novice misses: Might not validate data compatibility between sources
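A minimal sketch of this recovery, assuming a dict-of-dependencies DAG: one helper counts everything downstream of the failed root, and another rewires its direct dependents to a fallback source. Both function names are hypothetical, and marking db-connect as skipped is left to the caller's status tracking.

```python
def downstream_of(dag, root):
    """Return every node that depends on root, directly or transitively."""
    affected = set()
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for node, deps in dag.items():
            if current in deps and node not in affected:
                affected.add(node)
                frontier.append(node)
    return affected


def reroute_to_fallback(dag, failed, fallback):
    """Add fallback as a root node and point failed's dependents at it.

    The failed node stays in the DAG with no dependents so it can be
    marked as skipped rather than silently disappearing.
    """
    updated = {node: [fallback if dep == failed else dep for dep in deps]
               for node, deps in dag.items()}
    updated[fallback] = []
    return updated


dag = {
    "db-connect": [],
    "analyze-users": ["db-connect"],
    "analyze-products": ["db-connect"],
    "analyze-orders": ["db-connect"],
    "report": ["analyze-users", "analyze-products", "analyze-orders"],
}
affected = downstream_of(dag, "db-connect")   # 3 analysis nodes plus report
recovered = reroute_to_fallback(dag, "db-connect", "file-reader")
```

The rewiring itself says nothing about whether file-reader yields the same schema as db-connect; that check still has to happen before the analysis nodes run.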
Scenario: Memory usage at 90%, large model-training node queued
Current: data-prep(4GB) → model-training(12GB) → evaluation(2GB)
Constraint: Only 8GB available
Decision Process:
1. Check resource usage: 90% of 16GB = 14.4GB used, 1.6GB free
2. model-training needs 12GB, insufficient
3. Strategy: Split model-training into smaller chunks
4. Alternative: Defer model-training until data-prep memory freed
Modification:
- Remove: model-training
- Insert: model-training-chunk1(dependencies: [data-prep], memory: 6GB)
- Insert: model-training-chunk2(dependencies: [model-training-chunk1], memory: 6GB)
- Insert: model-merge(dependencies: [model-training-chunk1, model-training-chunk2])
- Rewire: evaluation depends on model-merge
Expert catches: Ensures chunks can be properly merged, validates no accuracy loss
Novice misses: Might split without considering model coherence requirements
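The split in this scenario can be sketched as below. It assumes a dict-of-dependencies DAG, a per-node memory dict in GB, and that the node's memory divides evenly across serial chunks, which real training workloads may not satisfy. The function name and the `per_chunk_limit_gb` parameter are hypothetical; the scenario does not state how the 6GB chunk size was chosen, and the merge node's own memory is not specified, so it is left out.

```python
import math


def split_node(dag, memory_gb, node, per_chunk_limit_gb, merge_name):
    """Replace node with serial chunks plus a merge node.

    Returns (new_dag, new_memory_gb). Chunks run one after another, so
    only one chunk's memory is held at a time.
    """
    chunk_count = math.ceil(memory_gb[node] / per_chunk_limit_gb)
    per_chunk = memory_gb[node] / chunk_count
    chunks = [f"{node}-chunk{i}" for i in range(1, chunk_count + 1)]

    new_dag, new_memory = {}, {}
    for name, deps in dag.items():
        if name == node:
            continue
        # Dependents of the original node now wait for the merge node.
        new_dag[name] = [merge_name if dep == node else dep for dep in deps]
        new_memory[name] = memory_gb[name]

    previous = None
    for chunk in chunks:
        # The first chunk inherits the original dependencies.
        new_dag[chunk] = list(dag[node]) if previous is None else [previous]
        new_memory[chunk] = per_chunk
        previous = chunk
    new_dag[merge_name] = list(chunks)
    return new_dag, new_memory


dag = {"data-prep": [], "model-training": ["data-prep"],
       "evaluation": ["model-training"]}
memory = {"data-prep": 4, "model-training": 12, "evaluation": 2}
new_dag, new_memory = split_node(dag, memory, "model-training", 6, "model-merge")
```

The structural rewrite is the easy part. Whether two half-size training runs can be merged without accuracy loss depends on the model and is outside what this sketch can check.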
Initial DAG Construction: For building DAGs from scratch, use dag-graph-builder instead
Static Optimization: For pre-execution DAG optimization, use dag-dependency-resolver instead
Scheduling Decisions: For deciding when to run nodes, use dag-task-scheduler instead
Failure Analysis: For diagnosing why nodes failed, use dag-failure-analyzer instead
Performance Monitoring: For tracking execution metrics, use dag-execution-tracer instead
Use this skill ONLY when the DAG structure itself needs to change during execution in response to runtime conditions.