skills/wang-et-al-2025-tdag/SKILL.md
Architectural patterns for building agent systems that dynamically decompose complex tasks, generate specialized subagents just-in-time, and prevent cascading failures through adaptive replanning.
npx skillsauth add curiositech/windags-skills wang-et-al-2025-tdag
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Source: TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation (Wang et al.)
Activate when: Designing multi-agent systems, debugging cascading failures, evaluating complex task performance, managing agent context, or building systems with unpredictable subtask dependencies.
Is your task multi-step with >5 sequential dependencies?
├─ NO → Use single-agent ReAct (sufficient for simple tasks)
└─ YES → Are later steps dependent on earlier outcomes?
    ├─ NO → Use static Plan-and-Execute (predictable workflow)
    └─ YES → DYNAMIC DECOMPOSITION required
        │
        ├─ Can you enumerate all possible subtask types upfront?
        │   ├─ YES → Pre-defined agent roles OK
        │   └─ NO → Just-in-time agent generation required
        │
        ├─ Do agents need >20 tools or >10 info sources per subtask?
        │   ├─ NO → Standard context provision
        │   └─ YES → Subtask-specific context refinement required
        │
        └─ Is task completion rate <40%?
            ├─ NO → Binary evaluation sufficient
            └─ YES → Fine-grained subtask metrics required
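
The decision tree above can be read as a small function. The following is a minimal sketch, assuming the thresholds stated in the tree; `TaskProfile`, its field names, and `choose_architecture` are hypothetical names introduced for illustration and do not come from the TDAG paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskProfile:
    """Hypothetical summary of a task; field names are illustrative."""
    sequential_dependencies: int
    later_steps_depend_on_outcomes: bool
    subtask_types_enumerable: bool
    tools_per_subtask: int
    info_sources_per_subtask: int
    completion_rate: float  # fraction between 0.0 and 1.0


def choose_architecture(p: TaskProfile) -> List[str]:
    """Walk the decision tree top to bottom and collect the recommendations."""
    if p.sequential_dependencies <= 5:
        return ["single-agent ReAct"]
    if not p.later_steps_depend_on_outcomes:
        return ["static Plan-and-Execute"]

    # From here on, dynamic decomposition is required; the three
    # remaining questions are independent of each other.
    choices = ["dynamic decomposition"]
    choices.append(
        "pre-defined agent roles"
        if p.subtask_types_enumerable
        else "just-in-time agent generation"
    )
    if p.tools_per_subtask > 20 or p.info_sources_per_subtask > 10:
        choices.append("subtask-specific context refinement")
    else:
        choices.append("standard context provision")
    choices.append(
        "fine-grained subtask metrics"
        if p.completion_rate < 0.40
        else "binary evaluation"
    )
    return choices
```

Returning a list instead of a single label reflects that the three lower questions are answered together once dynamic decomposition is chosen.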
| Failure Type | Detection Signal | Recovery Strategy |
|--------------|------------------|-------------------|
| Cascading Task Failure | Later subtasks fail with "invalid assumption" errors | STOP execution → Reassess from current state → Generate new decomposition |
| LLM Capability Limit | Model explicitly states inability or produces nonsensical output | Retry with simplified subtask OR delegate to human |
| External Info Misalignment | Correct tool used with wrong parameters | Regenerate subtask-specific tool documentation → Retry |
| Invalid State Constraint | Execution proceeds but violates user requirements | Backtrack to last valid state → Add constraint validation |
TRIGGER immediate replanning IF:
├─ Current subtask output contradicts assumptions in remaining planned subtasks
├─ External API returns unavailable/changed data that invalidates downstream plans
├─ User constraints discovered that weren't captured in original decomposition
├─ Two consecutive subtasks fail with same error type
└─ Execution time exceeds 2x estimated duration (indicates wrong approach)
DO NOT replan IF:
├─ Single subtask fails but downstream tasks remain valid
├─ Minor parameter adjustments can fix current subtask
└─ Failure is due to transient external service issues (retry first)
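
The trigger and non-trigger rules above can be sketched as a single predicate. This is an illustrative sketch only: `SubtaskResult`, `ExecutionState`, and `should_replan` are hypothetical names, and the choice to exclude transient errors from the "two consecutive failures" rule is an assumption that follows the "retry first" guidance.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SubtaskResult:
    ok: bool
    error_type: Optional[str] = None
    transient: bool = False  # e.g. a timeout: retry before replanning


@dataclass
class ExecutionState:
    results: List[SubtaskResult] = field(default_factory=list)  # oldest first
    contradicts_remaining_plan: bool = False
    downstream_data_invalidated: bool = False
    new_user_constraint: bool = False
    elapsed_s: float = 0.0
    estimated_s: float = 0.0


def should_replan(state: ExecutionState) -> Tuple[bool, str]:
    """Return (replan?, reason) following the trigger list above."""
    if state.contradicts_remaining_plan:
        return True, "subtask output contradicts remaining plan"
    if state.downstream_data_invalidated:
        return True, "external data invalidates downstream plan"
    if state.new_user_constraint:
        return True, "new user constraint discovered"

    last_two = state.results[-2:]
    if (
        len(last_two) == 2
        and all(not r.ok and not r.transient for r in last_two)
        and last_two[0].error_type == last_two[1].error_type
    ):
        return True, "two consecutive failures of the same type"

    if state.estimated_s > 0 and state.elapsed_s > 2 * state.estimated_s:
        return True, "execution time exceeded 2x estimate"

    # A single failure, a fixable parameter, or a transient error:
    # keep the plan and retry or adjust the current subtask.
    return False, "retry or adjust the current subtask"
```

Returning the reason alongside the decision makes it easier to log why a replan happened when debugging cascading failures.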
- Detection: Planning agent generates same 5-7 step template regardless of task specifics
- Symptom: High Cascading Task Failure rates (>30%) because generic plans don't match actual task constraints
- Root Cause: Static decomposition treats all tasks in domain as identical structure
- Fix: Generate decomposition after analyzing current task's unique constraints and dependencies

- Detection: Agents frequently select wrong tools despite having correct capabilities
- Symptom: External Information Misalignment errors with messages like "used BookingAPI with FlightAPI parameters"
- Root Cause: Providing all 50+ tools to every agent creates cognitive load in option filtering
- Fix: Generate subtask-specific tool documentation containing only relevant 3-5 tools with enriched context

- Detection: Two architectures show identical ~30% success rates but vastly different user satisfaction
- Symptom: Cannot distinguish between "accomplished nothing" vs "completed 8/10 subtasks" failures
- Root Cause: Pass/fail metrics hide incremental progress on complex tasks
- Fix: Implement subtask-level completion tracking alongside binary outcomes

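
The difference between binary and subtask-level evaluation can be shown with a few lines of arithmetic. The sketch below is illustrative: `RunReport` and `summarize` are hypothetical names, and the two sample systems are made-up data chosen to mirror the "accomplished nothing" versus "completed 8/10 subtasks" contrast above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RunReport:
    completed: int  # subtasks finished in this run
    total: int      # subtasks the task was decomposed into

    @property
    def passed(self) -> bool:
        return self.total > 0 and self.completed == self.total

    @property
    def progress(self) -> float:
        return self.completed / self.total if self.total else 0.0


def summarize(runs: List[RunReport]) -> Tuple[float, float]:
    """Return (binary success rate, mean subtask completion)."""
    if not runs:
        return 0.0, 0.0
    n = len(runs)
    return sum(r.passed for r in runs) / n, sum(r.progress for r in runs) / n


# Both systems pass 3 of 10 runs, but fail in very different ways.
system_a = [RunReport(10, 10)] * 3 + [RunReport(0, 10)] * 7
system_b = [RunReport(10, 10)] * 3 + [RunReport(8, 10)] * 7
```

Both systems report a 30% binary success rate, while mean subtask completion is about 0.30 for `system_a` and about 0.86 for `system_b`, which is the gap the pass/fail metric hides.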
- Detection: Agent executes cached "successful" approaches that fail in current context
- Symptom: High confidence execution of invalid solutions because they worked previously
- Root Cause: Treating real-world skills like deterministic game strategies that work universally
- Fix: Store skills with rich contextual metadata; validate preconditions before execution

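
One way to picture the fix above is a skill record that carries its own preconditions and is only replayed when they hold. This is a sketch under assumptions: `Skill`, `execute_if_valid`, and the sample booking skill are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]


@dataclass
class Skill:
    name: str
    run: Callable[[Context], str]
    # Checks that must all hold in the current context before reuse.
    preconditions: List[Callable[[Context], bool]] = field(default_factory=list)
    # Metadata describing where the skill was learned (kept for auditing).
    learned_in: Context = field(default_factory=dict)


def execute_if_valid(skill: Skill, context: Context) -> Optional[str]:
    """Run a cached skill only if every precondition passes; else return None."""
    if all(check(context) for check in skill.preconditions):
        return skill.run(context)
    return None  # caller should fall back to fresh planning


book_refundable = Skill(
    name="book_refundable_room",
    run=lambda ctx: f"booked refundable room in {ctx['city']}",
    preconditions=[
        lambda ctx: "city" in ctx,
        lambda ctx: ctx.get("refundable_rates_offered") is True,
    ],
    learned_in={"city": "Tokyo", "season": "spring"},
)
```

Returning `None` instead of raising keeps the decision with the caller, which can then decompose the subtask afresh rather than replaying a stale approach.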
- Detection: System frequently hits "no suitable agent for this subtask" errors
- Symptom: Forcing subtasks into predefined agent roles creates capability gaps
- Root Cause: Pre-defining agent roles assumes complete knowledge of task space
- Fix: Generate agents with roles derived from current subtask requirements, not organizational chart

Initial Task: "Book 3-night hotel in Tokyo for family of 4, budget $200/night, near Shibuya, check-in March 15"
Static Approach (Plan-and-Execute) - FAILS:
Planned Steps:
1. Search hotels near Shibuya
2. Filter by family rooms and budget
3. Select hotel with best rating
4. Book for March 15-18
5. Confirm booking
Execution Reality:
✓ Step 1: Found 15 hotels near Shibuya
✓ Step 2: 3 hotels meet criteria
✗ Step 3: Highest rated hotel selected (Hotel A) without checking availability
✗ Step 4: Hotel A unavailable March 15-17 (but agent doesn't know this when planning)
✗ Step 5: Booking fails → CASCADE FAILURE
Why it failed: Steps 4-5 were planned on the assumption that whichever hotel Step 3 selected would be available, but Step 3 selected Hotel A without checking availability.
TDAG Dynamic Approach - SUCCEEDS:
Decompose: "Find available family hotels near Shibuya for March 15-18, budget $200/night"
│
├─ Generate SearchAgent with tools: [HotelSearch, AvailabilityCheck, DistanceCalculate]
├─ Execute: Returns 3 hotels with confirmed March 15-18 availability
├─ Current State: [Hotel X: $180/night available, Hotel Y: $195/night available, Hotel Z: $200/night available]
│
└─ Decompose: "Select optimal hotel from verified available options and complete booking"
    │
    ├─ Generate BookingAgent with tools: [HotelCompare, BookingSubmit, PaymentProcess]
    ├─ Execute: Selects Hotel Y (best value), completes booking
    └─ Result: SUCCESS - Family room booked, $195/night, 2 blocks from Shibuya station
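
The dynamic trace above follows a simple loop: decide the next subtask from the current state, create an agent for it, execute, and fold the result back into the state. The sketch below illustrates that loop on toy data. Everything in it is an assumption made for illustration: the hotel ratings, the function names, and especially `generate_agent`, which here is a plain lookup standing in for LLM-driven agent generation.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]

# Hypothetical data; prices for X, Y, Z follow the example, ratings are made up.
HOTELS = [
    {"name": "Hotel A", "price": 190, "rating": 4.8, "available": False},
    {"name": "Hotel X", "price": 180, "rating": 4.1, "available": True},
    {"name": "Hotel Y", "price": 195, "rating": 4.6, "available": True},
    {"name": "Hotel Z", "price": 200, "rating": 4.3, "available": True},
]


def decompose(state: State) -> Optional[str]:
    """Pick the next subtask from the *current* state, not a fixed plan."""
    if "candidates" not in state:
        return "search"
    if "booked" not in state:
        return "book"
    return None  # nothing left to do


def search_agent(state: State) -> State:
    # Availability is checked during search, so later steps only see valid options.
    found = [h for h in HOTELS if h["available"] and h["price"] <= state["budget"]]
    return {"candidates": found}


def booking_agent(state: State) -> State:
    best = max(state["candidates"], key=lambda h: h["rating"])
    return {"booked": best["name"]}


def generate_agent(subtask: str) -> Callable[[State], State]:
    """Stand-in for just-in-time agent generation."""
    return {"search": search_agent, "book": booking_agent}[subtask]


def run_dynamic(state: State, max_steps: int = 5) -> State:
    for _ in range(max_steps):
        subtask = decompose(state)
        if subtask is None:
            return state
        state = {**state, **generate_agent(subtask)(state)}
    raise RuntimeError("step budget exhausted before the task completed")


result = run_dynamic({"budget": 200})
```

Here `result["booked"]` is `"Hotel Y"`: the unavailable Hotel A never reaches the selection step, which is the property the static plan lacked.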
Key Differences:
Task: "Compare carbon footprint of train vs flight for Shanghai-Beijing route, including lifecycle emissions"
Context Bloat Approach (Universal Agent) - FAILS:
Agent receives full context:
- 50 transportation APIs (flight, train, bus, car, bike, boat)
- 30 emissions databases (lifecycle, operational, per-passenger, freight)
- 40 comparison frameworks (financial, time, comfort, environmental)
- 25 geographic tools (distances, routes, elevation, weather)
Error Pattern:
✗ Selects FlightEmissions API but passes train route parameters
✗ Uses LifecycleCarbon tool with BusTransport data format
✗ Compares per-passenger flight emissions with freight train emissions
Failure Mode: External Information Misalignment - correct tools, wrong parameters due to cognitive overload.
TDAG Context Precision - SUCCEEDS:
Decompose: "Gather Shanghai-Beijing transportation options and base emissions data"
│
├─ Generate TransportResearcher with refined tools:
│   - Tools: [ChinaRailAPI, DomesticFlightAPI, RouteDistance] (3 tools only)
│   - Context: "Focus on Shanghai-Beijing corridor, passenger transport only"
├─ Execute: Returns train options (4.5h, 120g CO2/km) and flight options (2h, 180g CO2/km)
├─ Current State: Base transport data confirmed for specific route
│
└─ Decompose: "Calculate lifecycle emissions for confirmed transport options"
    │
    ├─ Generate EmissionsAnalyst with refined tools:
    │   - Tools: [LifecycleTransport, InfrastructureCarbon, FuelUpstream] (3 tools only)
    │   - Context: Pre-populated with Shanghai-Beijing route data, passenger context
    ├─ Execute: Train lifecycle: 140g CO2/km, Flight lifecycle: 250g CO2/km
    └─ Result: SUCCESS - Comprehensive comparison with lifecycle methodology
Trade-off Analysis:

| Factor | Universal Agent | TDAG Context Precision |
|--------|-----------------|----------------------|
| Tool Selection Errors | High (7/10 attempts) | Low (1/10 attempts) |
| Context Relevance | 20% relevant | 95% relevant |
| Completion Time | 45 minutes (including retries) | 12 minutes |
| Result Accuracy | Partial/inconsistent | Complete/methodologically sound |
Task decomposition is complete when:
Do NOT use TDAG patterns for:
Delegate to other skills:
- Use function-calling-best-practices instead
- Use prompt-engineering-fundamentals instead
- Use llm-reasoning-optimization instead
- Use workflow-automation-patterns instead
- Use streaming-decision-systems instead

TDAG is specifically for: Complex, multi-step tasks with uncertain subtask dependencies where early commitment to plans creates cascading failure risks and where adaptability matters more than execution efficiency.