skills/dag-performance-profiler/SKILL.md
Profiles DAG execution performance including latency, token usage, cost, and resource consumption. Identifies bottlenecks and optimization opportunities. Activate on 'performance profile', 'execution metrics', 'latency analysis', 'token usage', 'cost analysis'. NOT for execution tracing (use dag-execution-tracer) or failure analysis (use dag-failure-analyzer).
npx skillsauth add curiositech/windags-skills dag-performance-profiler
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You analyze DAG execution performance to identify bottlenecks and optimization opportunities through systematic profiling of latency, token usage, cost, and resource consumption.
Primary bottleneck detected?
├─ High latency (>3x avg node time)
│ ├─ Sequential dependency chain → Restructure for parallelization
│ └─ Single slow node → Break into smaller tasks or downgrade model
├─ High cost (>40% of total budget)
│ ├─ Token usage >5000/node → Context reduction strategy
│ └─ Expensive model overuse → Model selection optimization
└─ Resource contention (wait time >50% execution time)
  ├─ Tool latency bottleneck → Cache or parallelize tool calls
  └─ Dependency blocking → DAG restructuring
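The thresholds in the tree above can be sketched as a small classifier. This is a minimal illustration, not part of the skill itself: `NodeStats` and `classify_bottlenecks` are hypothetical names, "node time" is assumed to mean wait plus execution time, the 40% cost threshold is assumed to apply per node, and wait time is compared against the node's total elapsed time.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    # Hypothetical record type; field names are assumptions for this sketch.
    name: str
    exec_s: float    # seconds spent executing
    wait_s: float    # seconds spent blocked on dependencies or tools
    tokens: int
    cost_usd: float


def classify_bottlenecks(nodes, total_budget_usd):
    """Return (node name, finding) pairs using the thresholds from the tree."""
    avg_elapsed = sum(n.exec_s + n.wait_s for n in nodes) / len(nodes)
    findings = []
    for n in nodes:
        elapsed = n.exec_s + n.wait_s
        if elapsed > 3 * avg_elapsed:
            findings.append((n.name, "high latency"))
        if n.cost_usd > 0.40 * total_budget_usd:
            # Sub-branch: large prompts point to context reduction,
            # otherwise the model choice is the likelier culprit.
            hint = "reduce context" if n.tokens > 5000 else "review model choice"
            findings.append((n.name, f"high cost: {hint}"))
        if elapsed > 0 and n.wait_s > 0.5 * elapsed:
            findings.append((n.name, "resource contention"))
    return findings
```

A node can appear more than once in the result, which mirrors the tree: high latency and high cost are separate branches and both may apply.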
Impact vs Effort analysis:
├─ High Impact (>20% improvement) + Low Effort → IMMEDIATE (same day)
│ ├─ Model downgrade for simple tasks → Execute immediately
│ └─ Remove obvious sequential dependencies → Execute immediately
├─ High Impact + High Effort → PLANNED (next sprint)
│ ├─ Major DAG restructuring → Schedule with stakeholders
│ └─ Tool replacement/caching → Plan implementation
├─ Low Impact (<10% improvement) → DEFER
│ └─ Minor optimizations → Document but don't implement
└─ Negative Impact → REJECT
  └─ Optimizations that hurt other metrics → Explicitly reject
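The impact-versus-effort tree can be expressed as one function. The sketch below uses a hypothetical `prioritize` helper. The tree does not say what to do with an impact between 10% and 20%, so the sketch returns `"UNCLASSIFIED"` for that range rather than guessing.

```python
def prioritize(impact_pct: float, low_effort: bool) -> str:
    """Map an estimated improvement (in percent) and effort level to an action."""
    if impact_pct < 0:
        return "REJECT"        # the change hurts overall
    if impact_pct < 10:
        return "DEFER"         # document it, do not implement
    if impact_pct > 20:
        return "IMMEDIATE" if low_effort else "PLANNED"
    return "UNCLASSIFIED"      # 10-20% is not covered by the tree


print(prioritize(25, low_effort=True))
print(prioritize(25, low_effort=False))
```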
Performance requirement context?
├─ Cost-sensitive (budget constrained)
│ ├─ Accept 20% latency increase for 30%+ cost reduction → Recommend
│ └─ <20% cost savings → Keep current configuration
├─ Latency-critical (real-time requirements)
│ ├─ Accept 40%+ cost increase for 20% latency reduction → Recommend
│ └─ <15% latency improvement → Reject cost increase
└─ Balanced requirements
  ├─ Cost/latency ratio improvement >15% → Recommend
  └─ <10% improvement either metric → No change recommended
Symptoms: Recommending micro-optimizations that save <5% while ignoring major bottlenecks
Detection: If optimization list has >5 items with <10% individual impact each
Fix: Rank by impact percentage, focus only on top 2-3 optimizations with >15% impact. Defer others explicitly.
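The fix above amounts to a sort followed by a cutoff. A minimal sketch, assuming candidates arrive as `(name, impact_pct)` tuples; `select_optimizations` is a hypothetical helper name.

```python
def select_optimizations(candidates, min_impact_pct=15.0, max_items=3):
    """Split candidates into those worth recommending and those to defer."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    keep = [c for c in ranked if c[1] > min_impact_pct][:max_items]
    deferred = [c for c in ranked if c not in keep]
    return keep, deferred


keep, deferred = select_optimizations(
    [("a", 4.0), ("b", 22.0), ("c", 8.0), ("d", 31.0), ("e", 16.0), ("f", 18.0)]
)
print("recommend:", keep)
print("defer:", deferred)
```

Returning the deferred list alongside the kept one makes the "defer others explicitly" step hard to skip.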
Symptoms: Misidentifying wait time as execution bottleneck, blaming wrong nodes
Detection: If "slow node" has high wait time but normal execution time relative to task complexity
Fix: Separate wait time from execution time in analysis. Focus on dependency structure causing waits, not node speed.
Symptoms: Providing token savings calculations without accounting for model pricing differences
Detection: If cost savings percentages don't match token reduction ratios by model type
Fix: Always calculate actual cost: (token_change / 1000) × model_price_per_1k. Show both token AND dollar impact.
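The cost formula in the fix can be sketched as follows. The function names are hypothetical. The $0.15 per 1k rate is the one implied by this document's second scenario (2,800 tokens for $0.42). The $0.03 rate is invented for illustration and is not a real price.

```python
def cost_usd(tokens, price_per_1k_usd):
    """Actual cost: (tokens / 1000) x price per 1k tokens."""
    return (tokens / 1000) * price_per_1k_usd


def savings(tokens_before, price_before, tokens_after, price_after):
    """Report token and dollar impact side by side."""
    before = cost_usd(tokens_before, price_before)
    after = cost_usd(tokens_after, price_after)
    return {
        "tokens_saved_pct": 100 * (tokens_before - tokens_after) / tokens_before,
        "usd_saved": before - after,
        "usd_saved_pct": 100 * (before - after) / before,
    }


# Same model: token and dollar percentages agree.
print(savings(4000, 0.15, 3000, 0.15))
# Cheaper (hypothetical) model as well: dollar savings far exceed token savings.
print(savings(4000, 0.15, 3000, 0.03))
```

The second call shows why a token percentage alone is misleading once the model changes: the token reduction stays the same while the dollar reduction is much larger.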
Symptoms: Suggesting parallelization for inherently sequential tasks with data dependencies
Detection: If recommending parallel execution for nodes where output of A feeds input of B
Fix: Map actual data dependencies before suggesting parallelization. Only truly independent nodes can run parallel.
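One way to check independence before recommending parallel execution is a reachability test over the dependency map. The sketch below assumes the DAG is given as a dict from each node to the nodes whose output it reads. The helper names and the example edges are hypothetical, not the skill's actual data model.

```python
def depends_on(upstream, node, target):
    """True if `node` directly or transitively consumes the output of `target`."""
    stack = list(upstream.get(node, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(upstream.get(current, ()))
    return False


def can_run_in_parallel(upstream, a, b):
    """Two nodes are independent only if neither depends on the other."""
    return not depends_on(upstream, a, b) and not depends_on(upstream, b, a)


# Hypothetical edges for a small code-review DAG.
upstream = {
    "analyze-complexity": ["extract-code"],
    "check-security": ["extract-code"],
    "generate-report": ["analyze-complexity", "check-security"],
}
print(can_run_in_parallel(upstream, "analyze-complexity", "check-security"))
print(can_run_in_parallel(upstream, "extract-code", "generate-report"))
```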
Symptoms: Optimizing one metric while catastrophically degrading another
Detection: If optimizing for cost increases latency >50% or optimizing latency increases cost >100%
Fix: Always provide trade-off analysis: "20% cost savings, 15% latency increase, 5% accuracy impact"
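The detection rule and the trade-off line can be produced together. This is a minimal sketch with a hypothetical `tradeoff_report` helper. It assumes signed percentages, where a negative value means the metric went down, so the summary format differs slightly from the quoted example.

```python
def tradeoff_report(goal, cost_change_pct, latency_change_pct, accuracy_change_pct):
    """Return (catastrophic, summary) for a proposed optimization."""
    if goal == "cost":
        catastrophic = latency_change_pct > 50    # cost win, latency blow-up
    elif goal == "latency":
        catastrophic = cost_change_pct > 100      # latency win, cost blow-up
    else:
        raise ValueError(f"unknown goal: {goal!r}")
    summary = (
        f"cost {cost_change_pct:+.0f}%, "
        f"latency {latency_change_pct:+.0f}%, "
        f"accuracy {accuracy_change_pct:+.0f}%"
    )
    return catastrophic, summary


print(tradeoff_report("cost", -20, 15, -5))
```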
Initial State: 5-node code review DAG: 45s total, $0.42 cost
extract-code: 4.2s, 2,400 tokens, Sonnet
analyze-complexity: 8.1s (3.4s wait + 4.7s exec), 4,200 tokens, Sonnet
check-security: 6.8s, 3,100 tokens, Sonnet
review-performance: 12.4s, 8,900 tokens, Opus
generate-report: 13.5s (9.2s wait + 4.3s exec), 5,200 tokens, Sonnet
Step 1 - Bottleneck Classification
Primary bottleneck: review-performance at 12.4s (about 28% of total) - Single slow node pattern
Secondary: Dependency blocking causing 12.6s total wait time
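The Step 1 figures can be reproduced from the per-node numbers in the initial state. The sketch below treats nodes listed without a wait/exec split as pure execution time, which is an assumption. It ranks nodes by execution time rather than elapsed time, so generate-report (13.5s elapsed, mostly waiting) is not mistaken for the slow node.

```python
# (name, wait seconds, execution seconds) taken from the initial state.
nodes = [
    ("extract-code", 0.0, 4.2),
    ("analyze-complexity", 3.4, 4.7),
    ("check-security", 0.0, 6.8),
    ("review-performance", 0.0, 12.4),
    ("generate-report", 9.2, 4.3),
]

total_s = sum(wait + exec_s for _, wait, exec_s in nodes)
total_wait_s = sum(wait for _, wait, _ in nodes)
slowest = max(nodes, key=lambda n: n[2])  # rank by execution time only

print(f"total: {total_s:.1f}s, waiting: {total_wait_s:.1f}s")
print(f"slowest by execution: {slowest[0]} ({100 * slowest[2] / total_s:.1f}% of total)")
```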
Step 2 - Apply Decision Tree
High latency bottleneck + resource contention → Restructure for parallelization + break down slow node
Step 3 - Optimization Recommendations
Split review-performance into check-patterns (3s, Sonnet) + assess-complexity (4s, Sonnet)
Parallelize analyze-complexity + check-security (currently sequential)
Final Result: 28s total (38% faster), $0.34 cost (19% cheaper)
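The percentages in the final result follow from the before and after totals. A short check, using a hypothetical `improvement_pct` helper:

```python
def improvement_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


latency_gain = improvement_pct(45, 28)      # seconds
cost_gain = improvement_pct(0.42, 0.34)     # dollars
print(f"{latency_gain:.0f}% faster, {cost_gain:.0f}% cheaper")
```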
Initial State: 8-node data analysis: 67s total, $2.40 cost, 95% Opus usage
Step 1 - Cost Analysis Discovery
extract-tables: 2,800 tokens, Opus ($0.42) - Simple extraction task
clean-data: 3,200 tokens, Opus ($0.48) - Pattern matching task
statistical-analysis: 12,600 tokens, Opus ($1.89) - Complex reasoning
generate-insights: 9,400 tokens, Opus ($1.41) - Moderate analysis
Step 2 - Model Selection Decision Tree
Using complexity assessment:
Step 3 - Impact Calculation
Expert Decision: Accept trade-off - massive cost savings for minimal latency impact in non-critical analytics pipeline.
Performance profiling complete when:
This skill should NOT be used for:
dag-execution-tracer instead
dag-failure-analyzer instead
dag-architect instead
dag-auto-optimizer instead
dag-task-scheduler instead
Delegate when:
dag-execution-tracer
dag-failure-analyzer
dag-architect
dag-auto-optimizer
dag-task-scheduler
This skill focuses on analysis and actionable recommendations, not monitoring, design, or automatic implementation.