skills/wei-2022-chain-of-thought/SKILL.md
Strategic framework for eliciting and orchestrating reasoning in LLM-based agents through structured decomposition
npx skillsauth add curiositech/windags-skills wei-2022-chain-of-thoughtInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Strategic framework for eliciting and orchestrating reasoning in LLM-based agents through structured decomposition
Task Complexity × Agent Capability → Action
IF agent_parameters < 100B AND task_requires_multi_step_reasoning:
→ DIRECT prompting (decomposition hurts below emergence threshold)
ELIF baseline_accuracy > 80% AND reasoning_steps < 3:
→ DIRECT prompting (minimal gains, added latency cost)
ELIF task_complexity = HIGH AND multi_step_reasoning = TRUE:
→ CHAIN_OF_THOUGHT prompting (gains scale with complexity)
ELIF task_distribution = MIXED_COMPLEXITY:
→ SELF_SELECT mode (provide examples, let agent choose when to decompose)
Observe failure type:
IF error_type IN [calculator_mistake, symbol_mapping_error, missing_single_step]:
→ SHALLOW failure
→ Add external tools (calculator, validator)
→ Implement consistency checks
→ Re-run with augmentation
ELIF error_type IN [semantic_misunderstanding, incoherent_logic, fundamental_reasoning_flaw]:
→ DEEP failure
→ Check agent capability threshold
→ Re-route to more capable agent OR reject task
ELIF correct_answer = TRUE:
→ Verify reasoning chain (98% of correct answers have sound reasoning)
→ High confidence in output quality
Task Type × Semantic Requirements → Structure
IF task = SYMBOLIC_MANIPULATION:
→ HYBRID: Natural language setup → formal operations → natural language verification
ELIF semantic_understanding_across_steps = TRUE:
→ FULL natural language chain-of-thought
→ Don't optimize for brevity (semantic grounding requires explicit articulation)
ELIF multi_agent_coordination = TRUE:
→ Natural language for coordination medium
→ Formal protocols within individual agents only
Detection Rule: If small model (<100B params) + chain-of-thought prompting → worse performance than baseline Symptom: Fluent but illogical reasoning chains, performance degradation Fix: Verify emergence threshold empirically before deploying decomposition; use direct prompting below threshold
Detection Rule: If adding compute time/tokens without structured intermediate steps → no performance gain Symptom: Longer outputs with dots/padding but same accuracy Fix: Structure compute through meaningful semantic intermediate states, not just duration
Detection Rule: If same recovery strategy applied to calculator errors and semantic failures Symptom: Tool augmentation fails on deep reasoning errors; model re-routing wastes cycles on shallow errors Fix: Classify failures first (shallow vs. deep), then route to appropriate intervention
Detection Rule: If natural language reasoning shortened to save tokens → coherence loss Symptom: Broken semantic grounding, reasoning chain loses logical connection Fix: Preserve explicit articulation; the "inefficiency" maintains semantic coherence
Detection Rule: If demo performance (95%) ≫ production performance (70%) across task variations Symptom: High variance across annotators/exemplar sets on low-complexity tasks Fix: Test robustness envelope before production; expect brittleness on tasks with high baseline accuracy
Scenario: Multi-step arithmetic problem arrives at LaMDA 68B agent
Decision Process:
Execution:
Result: 14% accuracy vs. 6% baseline (emergence threshold effect)
Scenario: Agent produces wrong answer: "The total cost is $47" (correct: $52)
Analysis Process:
Recovery:
Scenario: Multi-agent system coordinating complex financial analysis
Decision Point: Use formal API calls or natural language coordination?
Analysis:
Decision: Natural language coordination despite token overhead
Implementation:
Agent A: "Given the Q3 earnings show 15% revenue growth but 8% margin compression,
I need to analyze if this indicates sustainable growth or pricing pressure..."
Agent B: "Building on your margin analysis, the compression aligns with our competitive
positioning data showing 3 new market entrants..."
Expert reasoning: The "inefficiency" of natural language maintains semantic coherence Novice mistake: Would optimize for concise formal protocols and lose reasoning grounding
Scenario: Prompting technique achieves 95% accuracy on test exemplars
Pre-deployment Process:
Production Strategy:
This skill is NOT for:
Delegate to other skills:
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.