skills/thinking-systems/SKILL.md
Analyze problems as interconnected systems with feedback loops, emergent behavior, and non-linear effects. Use for debugging complex systems, architecture decisions, and understanding unexpected behavior.
Systems thinking views problems as part of interconnected wholes rather than isolated components. It focuses on relationships, feedback loops, and emergent properties—behaviors that arise from interactions and can't be predicted from parts alone. Essential for debugging complex distributed systems and understanding why "obvious" fixes often fail.
Core Principle: The behavior of a system cannot be understood by analyzing components in isolation. Look at connections, feedback, and emergence.
Decision flow:
Problem spans multiple components? → yes → APPLY SYSTEMS THINKING
Fix in one place caused issue in another? → yes → APPLY SYSTEMS THINKING
Behavior seems "emergent" or unexpected? → yes → APPLY SYSTEMS THINKING
Reinforcing (Positive) Loops: Amplify change
Technical Debt Loop:
Deadline pressure → Shortcuts → More bugs → More firefighting
↓
← Less time for quality ←
Balancing (Negative) Loops: Counteract change
Auto-scaling Loop:
Load increases → More instances spawn → Load per instance decreases
↓
← Fewer instances needed ←
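The difference between the two loop types can be seen in a few lines of simulation. This is a minimal sketch, not a model of any real system: the function names, the `gain` parameter, and the update rules are illustrative assumptions.

```python
def reinforcing(value: float, gain: float, steps: int) -> list:
    """Reinforcing loop: each step adds a change proportional to the
    current value, so any deviation is amplified."""
    history = [value]
    for _ in range(steps):
        value += gain * value
        history.append(value)
    return history


def balancing(value: float, target: float, gain: float, steps: int) -> list:
    """Balancing loop: each step closes a fraction of the gap to a
    target, so any deviation is counteracted."""
    history = [value]
    for _ in range(steps):
        value += gain * (target - value)
        history.append(value)
    return history


if __name__ == "__main__":
    print(reinforcing(10.0, 0.5, 3))       # [10.0, 15.0, 22.5, 33.75]
    print(balancing(100.0, 50.0, 0.5, 3))  # [100.0, 75.0, 62.5, 56.25]
```

The reinforcing series runs away from its starting point, while the balancing series settles toward the target.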
Questions to identify loops:
Stocks: Accumulated quantities (users, technical debt, cache size)
Flows: Rates of change (registrations/day, bugs fixed/sprint)
┌─────────────────────────────────────┐
│ Inflow → [Stock] → Outflow │
│ │
│ New bugs → [Bug Backlog] → Fixes │
│ Requests → [Queue Depth] → Processed│
│ Hires → [Team Size] → Attrition │
└─────────────────────────────────────┘
Key insight: Stocks change slowly even when flows change quickly. Queue depth doesn't drop instantly when you add capacity.
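A small sketch of the queue example makes the key insight concrete. The function name and the numbers are hypothetical; the only rule modeled is "stock changes by inflow minus outflow each tick".

```python
def queue_depth_over_time(initial_depth: int, arrival_rate: int,
                          service_rates: list) -> list:
    """Track a stock (queue depth) given a constant inflow and a
    per-tick outflow capacity. Depth never goes below zero."""
    depth = initial_depth
    history = [depth]
    for service_rate in service_rates:
        depth = max(0, depth + arrival_rate - service_rate)
        history.append(depth)
    return history


if __name__ == "__main__":
    # Capacity jumps from 100 to 150 per tick at the third tick,
    # but the backlog of 1000 drains by only 50 per tick.
    print(queue_depth_over_time(1000, 100, [100, 100, 150, 150, 150]))
    # [1000, 1000, 1000, 950, 900, 850]
```

Under these assumed numbers the outflow changes instantly, yet the backlog would need 20 ticks to clear.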
Time lags between cause and effect obscure relationships:
Code deployed → [Delay: Cache TTL] → Users see change
Feature shipped → [Delay: Adoption curve] → Metrics change
New hire starts → [Delay: Ramp-up] → Productivity impact
Danger: Acting before feedback arrives leads to overcorrection.
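Overcorrection under delay can be illustrated with a controller that steers toward a target but only sees stale readings. This is a toy sketch; `track_target`, its parameters, and the update rule are assumptions chosen for illustration.

```python
def track_target(target: float, gain: float, delay: int, steps: int) -> list:
    """Adjust a value toward `target`, where each adjustment is based on
    the value observed `delay` steps ago rather than the current value."""
    history = [0.0]
    for _ in range(steps):
        # With delay > 0 the controller reacts to outdated feedback.
        observed = history[max(0, len(history) - 1 - delay)]
        history.append(history[-1] + gain * (target - observed))
    return history


if __name__ == "__main__":
    print(max(track_target(100.0, 0.5, delay=0, steps=20)))  # stays below 100
    print(max(track_target(100.0, 0.5, delay=3, steps=20)))  # overshoots 100
```

With no delay the value approaches the target smoothly. With a three-step delay the same gain keeps pushing after the target has already been reached, so the value overshoots and then has to swing back.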
Small changes can have large effects (and vice versa):
Linear assumption: 2x traffic = 2x latency
Reality: Traffic crosses a saturation threshold → 10x latency (queue buildup)
Linear assumption: Adding an engineer adds one engineer's worth of capacity
Reality: Communication paths grow as n(n-1)/2, i.e. O(n²)
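Both non-linear effects have simple closed forms that are easy to check numerically. The sketch below assumes an M/M/1 queue (a single server with random arrivals and service times), for which mean time in system is 1/(μ − λ). A real service will differ, so treat the numbers as illustrative only.

```python
def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: 1 / (mu - lambda).
    Returns infinity once arrivals meet or exceed service capacity."""
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def communication_channels(team_size: int) -> int:
    """Number of distinct pairs in a team of n people: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    # Traffic rises from 50 to 95 req/s against a capacity of 100 req/s:
    # less than 2x the traffic, roughly 10x the latency.
    print(mm1_latency(50, 100), mm1_latency(95, 100))
    # Doubling a team from 5 to 10 more than quadruples the pairs.
    print(communication_channels(5), communication_channels(10))  # 10 45
```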
Behaviors that arise from interactions, not individual components:
Draw components, connections, and data/control flows:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Client │────▶│ API │────▶│ DB │
└─────────┘ └────┬────┘ └─────────┘
│
▼
┌─────────┐
│ Cache │
└─────────┘
For each loop, determine:
Retry Storm Loop (Reinforcing - Dangerous):
Service slow → Clients retry → More load → Service slower → More retries
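The retry storm can be sketched as a feedback iteration. The model below is deliberately crude and every name in it is hypothetical. It assumes that requests beyond capacity fail, that each failed attempt is retried up to `max_retries` times, and that the failure probability is recomputed from the previous step's load.

```python
def expected_attempts(failure_prob: float, max_retries: int) -> float:
    """Expected attempts per request when each attempt fails with
    probability `failure_prob` and is retried up to `max_retries` times."""
    return sum(failure_prob ** k for k in range(max_retries + 1))


def simulate_retry_storm(base_rps: float, capacity_rps: float,
                         max_retries: int, steps: int) -> list:
    """Iterate the loop: load -> failures -> retries -> more load.
    `base_rps` must be positive."""
    load = base_rps
    history = [load]
    for _ in range(steps):
        failure_prob = max(0.0, 1.0 - capacity_rps / load)
        load = base_rps * expected_attempts(failure_prob, max_retries)
        history.append(load)
    return history


if __name__ == "__main__":
    print(simulate_retry_storm(80.0, 100.0, 3, 5))   # under capacity: stays at 80
    print(simulate_retry_storm(120.0, 100.0, 3, 5))  # over capacity: load keeps climbing
```

In this model a 20% overload feeds on itself and settles well above the original demand, while demand below capacity never triggers the loop at all.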
Follow the symptom backward to find originating cause:
Symptom: High latency in Service C
→ Service C waiting on Service B
→ Service B waiting on Service A
→ Service A doing full table scan (ROOT CAUSE)
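The backward trace above amounts to walking a "waiting on" graph until reaching a component that is not blocked on anything else. Here is a minimal sketch. The `waiting_on` mapping is hypothetical data that would in practice come from traces or dependency metadata.

```python
def trace_root_cause(symptom: str, waiting_on: dict) -> list:
    """Follow 'X is waiting on Y' edges from the symptom until reaching
    a component that waits on nothing. Stops if a cycle is detected."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in waiting_on:
        upstream = waiting_on[chain[-1]]
        if upstream in seen:  # circular wait; no single origin
            break
        chain.append(upstream)
        seen.add(upstream)
    return chain


if __name__ == "__main__":
    waiting_on = {"service-c": "service-b", "service-b": "service-a"}
    print(trace_root_cause("service-c", waiting_on))
    # ['service-c', 'service-b', 'service-a']
```

The last element of the chain is the candidate origin. It still needs inspection to find the actual cause, such as the full table scan in the example.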
What happens when components interact under stress?
One component fails → Dependent components overload → They fail
↓
← More traffic to remaining ←
Mitigation: Circuit breakers, bulkheads, graceful degradation
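As an illustration of the first mitigation, a circuit breaker stops calling a failing dependency for a while instead of adding load to it. The class below is a simplified sketch, not a production implementation. The names, thresholds, and the injectable `clock` are assumptions, and real libraries handle the half-open state and concurrency more carefully.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open)
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast breaks the cascade: dependents receive an immediate error they can degrade around, rather than piling up blocked requests.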
Cache expires → All requests hit backend simultaneously → Overload
Mitigation: Jittered expiration, cache warming, request coalescing
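Jittered expiration can be as small as one helper that spreads TTLs around a base value so that entries written together do not expire together. The function name and the 10% default are illustrative assumptions.

```python
import random


def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.1,
                 rng=random) -> float:
    """Return a TTL drawn uniformly from base_ttl +/- jitter_fraction.
    `rng` may be the random module or a random.Random instance."""
    spread = base_ttl * jitter_fraction
    return base_ttl + rng.uniform(-spread, spread)


if __name__ == "__main__":
    # Entries cached at the same moment get expiry times spread
    # across roughly 270 to 330 seconds instead of all at 300.
    print([round(jittered_ttl(300.0)) for _ in range(5)])
```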
Processing rate < Arrival rate → Queue grows → Memory pressure → OOM
Mitigation: Backpressure, rate limiting, queue bounds
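A bounded queue is the simplest form of the queue-bounds mitigation: when the queue is full, the producer is told so immediately instead of memory growing without limit. This sketch uses the standard library `queue.Queue`; the helper name `try_enqueue` is hypothetical.

```python
import queue


def try_enqueue(work_queue: queue.Queue, item) -> bool:
    """Attempt to enqueue without blocking. Returns False when the
    queue is full so the caller can shed load or signal backpressure."""
    try:
        work_queue.put_nowait(item)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    q = queue.Queue(maxsize=2)
    print([try_enqueue(q, i) for i in range(3)])  # [True, True, False]
```

What the caller does on `False` (reject the request, slow down, or drop the item) is a policy choice that this sketch leaves open.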
Multiple processes → Same resource → Lock contention → Serialization
↓
Throughput collapses despite available CPU
Mitigation: Sharding, optimistic locking, resource isolation
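Sharding reduces contention by giving each slice of the key space its own lock, so unrelated keys rarely wait on each other. The class below is a minimal sketch with hypothetical names. It assumes string keys and uses a SHA-256 prefix only as a stable way to pick a shard.

```python
import hashlib
import threading


class ShardedCounter:
    """Per-key counters split across independently locked shards."""

    def __init__(self, shard_count: int = 8):
        self._locks = [threading.Lock() for _ in range(shard_count)]
        self._counts = [{} for _ in range(shard_count)]

    def _shard(self, key: str) -> int:
        # Stable hash so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self._locks)

    def increment(self, key: str) -> int:
        i = self._shard(key)
        with self._locks[i]:  # only this shard is serialized
            self._counts[i][key] = self._counts[i].get(key, 0) + 1
            return self._counts[i][key]

    def get(self, key: str) -> int:
        i = self._shard(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)
```

A single hot key still serializes on its one shard; sharding helps when contention comes from many keys sharing one lock.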
┌──────────────────────────────────────────────────────────────┐
│ System: [Name] │
├──────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ Factor │──────(+)──────────────▶│ Factor │ │
│ │ A │ │ B │ │
│ └─────────┘ └────┬────┘ │
│ ▲ │ │
│ │ │ │
│ (-) (+) │
│ │ │ │
│ │ ┌─────────┐ │ │
│ └─────────│ Factor │◀─────────────┘ │
│ │ C │ │
│ └─────────┘ │
│ │
│ Legend: (+) = same direction, (-) = opposite direction │
│ Loop type: Reinforcing / Balancing │
└──────────────────────────────────────────────────────────────┘
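Once the links are labeled, the loop type follows from a standard rule of causal loop diagrams: multiply the link signs around the loop. A positive product (an even number of (-) links) is reinforcing and a negative product is balancing. A small sketch, with `loop_type` as a hypothetical helper name:

```python
from math import prod


def loop_type(link_signs: list) -> str:
    """Classify a causal loop from its link polarities.
    Each sign is +1 (same direction) or -1 (opposite direction)."""
    if not link_signs or any(s not in (1, -1) for s in link_signs):
        raise ValueError("link signs must be a non-empty list of +1/-1")
    return "reinforcing" if prod(link_signs) > 0 else "balancing"


if __name__ == "__main__":
    # The template above: A -(+)-> B -(+)-> C -(-)-> A
    print(loop_type([+1, +1, -1]))  # balancing
```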
Places to intervene where small changes can have large effects, listed from lowest to highest leverage (after Donella Meadows):
| Leverage | Example | Impact |
|----------|---------|--------|
| Parameters | Timeout values | Low |
| Buffer sizes | Queue limits | Low-Medium |
| Feedback loops | Add monitoring | Medium |
| Information flows | Make metrics visible | Medium-High |
| Rules | Change retry policy | High |
| Goals | Redefine SLOs | Very High |
| Paradigm | Rethink architecture | Transformational |
"We can't control systems or figure them out. But we can dance with them."
Systems resist simple fixes. Effective intervention requires understanding the whole, finding leverage points, and accepting that you're influencing, not controlling.