skills/thinking-kepner-tregoe/SKILL.md
Use when a defect is selective (some endpoints/regions/users/times affected, not all) and the cause is unclear — map what IS vs IS-NOT affected; the boundary contrast points at the root cause.
npx skillsauth add tjboudreaux/cc-thinking-skills thinking-kepner-tregoe
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Kepner-Tregoe (KT) is a structured root-cause method. This skill focuses on Problem Analysis (PA) — the IS/IS-NOT boundary contrast — which is the high-value KT process for debugging. When a defect is selective (some cases affected, others not), the boundary between IS and IS-NOT reveals the distinction that points at the root cause.
Decision Analysis (DA) and Potential Problem Analysis (PPA) are de-emphasized here. For pure decision-making among alternatives, use thinking-opportunity-cost. For risk anticipation before a change, use thinking-pre-mortem. Those skills are purpose-built for their tasks; KT's DA and PPA add overhead without contributing a mechanism of their own.
Situation Analysis (SA) is retained as a lightweight triage step when facing multiple concerns, but it is not a required preamble — jump directly to PA when the problem is already clear.
Core Principle: The boundary between what IS affected and what IS NOT affected encodes the root cause. Find the distinction, find the cause.
Decision flow:
- Defect is selective (not 100%)?
  - No: IS/IS-NOT has no signal; use direct debugging or thinking-systems.
  - Yes: Is the cause obvious from a stack trace or recent change?
    - Yes: Just fix it.
    - No: APPLY KT PROBLEM ANALYSIS.
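The decision flow above can be written as a small function. This is a minimal sketch, not part of the skill itself: the function name `choose_approach` and its two boolean parameters are hypothetical, and the return strings simply echo the wording of the flow.

```python
def choose_approach(defect_is_selective: bool, cause_is_obvious: bool) -> str:
    """Return the recommended next step for a defect, following the decision flow."""
    if not defect_is_selective:
        # Uniform failure: there is no IS/IS-NOT boundary to contrast.
        return "direct debugging or thinking-systems"
    if cause_is_obvious:
        # A stack trace or recent change already points at the cause.
        return "just fix it"
    # Selective defect with an unclear cause: the case this skill targets.
    return "KT Problem Analysis"


print(choose_approach(defect_is_selective=True, cause_is_obvious=False))
```

Selectivity is checked first because a uniform failure leaves nothing to contrast, no matter how unclear the cause is.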
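Reach for a different tool in these cases:
- Uniform failure (100% affected): use thinking-systems or direct debugging.
- A single cheaply-testable hypothesis: test it first (thinking-occams-razor) before building a full specification matrix.
- Choosing among alternatives: use thinking-opportunity-cost, not KT's Decision Analysis.
- Anticipating risk before a change: use thinking-pre-mortem, not KT's Potential Problem Analysis.

Apply Problem Analysis when a defect is selective (some cases affected, others not) and the cause is unclear.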
Skip if the failure is uniform (100%) — there's no boundary to contrast; use direct debugging. If the cause is obvious from a stack trace or recent change, just fix it. For a single cheaply-testable hypothesis, test it first.
Run Situation Analysis only when facing several problems at once. List all concerns, split any compound concern into separate ones, and prioritize by Timing/Impact/Trend:
| Concern | Timing | Impact | Trend | Priority |
|---------|--------|--------|-------|----------|
| API latency spike | Urgent | High | Worsening | P0 |
| Checkout errors | Soon | High | Stable | P1 |
For each concern, decide: Problem Analysis (PA), or delegate to another skill.
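One way to make the Timing/Impact/Trend ranking repeatable is to score each axis and sort. The sketch below is an assumption-heavy illustration: the skill names the three axes but gives no scoring rule, so the ordinal scales, the extra levels ("Later", "Medium", "Low", "Improving"), and the equal weighting are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales; the source names the axes but not a scoring rule.
TIMING = {"Urgent": 2, "Soon": 1, "Later": 0}
IMPACT = {"High": 2, "Medium": 1, "Low": 0}
TREND = {"Worsening": 2, "Stable": 1, "Improving": 0}


@dataclass
class Concern:
    name: str
    timing: str
    impact: str
    trend: str

    def score(self) -> int:
        # Equal weighting of the three axes is an assumption, not a KT rule.
        return TIMING[self.timing] + IMPACT[self.impact] + TREND[self.trend]


def prioritize(concerns):
    """Return (label, concern) pairs, highest score first, labelled P0, P1, ..."""
    ranked = sorted(concerns, key=lambda c: c.score(), reverse=True)
    return [(f"P{i}", c) for i, c in enumerate(ranked)]


concerns = [
    Concern("Checkout errors", "Soon", "High", "Stable"),
    Concern("API latency spike", "Urgent", "High", "Worsening"),
]
for label, concern in prioritize(concerns):
    print(label, concern.name)
```

With these assumed scales the two example concerns come out in the same order as the table: the latency spike first, the checkout errors second.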
Describe the deviation from expected behavior with specificity:
"API response time increased from 200ms to 800ms for /checkout endpoint,
US-East only, starting Monday 9 AM, affecting ~30% of requests."
Specify the problem across four dimensions. The power is in the distinction column — what's unique about the IS side?
| Dimension | IS (affected) | IS NOT (not affected) | Distinction |
|-----------|---------------|----------------------|-------------|
| WHAT: object | /checkout endpoint | /cart, /product, /user | Payment processing |
| WHAT: defect | 4x latency increase | Errors, timeouts, data corruption | Performance only |
| WHERE: location | Production US-East | EU, US-West, staging | Single region |
| WHERE: on object | Database query phase | Auth, validation, serialization | DB layer |
| WHEN: first seen | Monday 9:00 AM | Before Monday, after 6 PM | Business hours |
| WHEN: pattern | During checkout submit | During browsing, cart add | Write operations |
| EXTENT: how many | ~30% of requests | 100% of requests | Intermittent |
| EXTENT: trend | Stable since Tuesday | Getting worse | Plateaued |
For each row, ask: "What's unique or distinctive about the IS side compared to the IS-NOT side?"
Distinctions:
- Only /checkout (payment processing) — not other endpoints
- Only US-East (specific DB replica) — not other regions
- Only during business hours (load-related?) — not off-peak
- Only ~30% of requests (specific query pattern?) — not all
- Started Monday 9 AM — what changed?
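If the specification is kept as data instead of a table, it is easy to check that no row was left without a distinction. The sketch below assumes a hypothetical `SpecRow` record and helper name. It uses three rows from the example, and the last distinction is deliberately left blank here to show what the check reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecRow:
    dimension: str
    is_affected: str
    is_not_affected: str
    distinction: str = ""  # empty until someone states what is unique about IS


rows = [
    SpecRow("WHAT: object", "/checkout endpoint", "/cart, /product, /user",
            "Payment processing"),
    SpecRow("WHERE: location", "Production US-East", "EU, US-West, staging",
            "Single region"),
    # Distinction intentionally omitted for illustration.
    SpecRow("WHEN: pattern", "During checkout submit", "During browsing, cart add"),
]


def missing_distinctions(spec):
    """Return the dimensions whose distinction has not been filled in yet."""
    return [row.dimension for row in spec if not row.distinction.strip()]


print(missing_distinctions(rows))
```

A non-empty result means the matrix is still just a table: the IS and IS-NOT cells are filled, but nobody has yet said what sets the IS side apart.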
What changed in, on, around, or about the distinctions near the first observation time?
Changes near Monday 9 AM:
- Payment provider SDK updated (Sunday night deploy)
- Database index rebuild scheduled (Sunday maintenance)
- New fraud detection rules enabled (Monday 8:45 AM)
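Listing changes near the first observation amounts to filtering a change log by a time window that ends at onset. The sketch below is illustrative only: the calendar dates, the clock times for the two Sunday changes, the 48-hour window, and the unrelated "Logo refresh" entry are all made up to give the filter something to work on.

```python
from datetime import datetime, timedelta

# Hypothetical date standing in for "Monday 9 AM".
first_seen = datetime(2024, 1, 8, 9, 0)

# (description, time applied); the Sunday clock times are assumed.
changes = [
    ("Payment provider SDK updated", datetime(2024, 1, 7, 23, 0)),
    ("Database index rebuild", datetime(2024, 1, 7, 2, 0)),
    ("New fraud detection rules enabled", datetime(2024, 1, 8, 8, 45)),
    ("Logo refresh on marketing site", datetime(2023, 12, 20, 10, 0)),
]


def changes_before(onset, change_log, window=timedelta(hours=48)):
    """Return changes inside the window before onset, closest to onset first."""
    recent = [(name, at) for name, at in change_log if onset - window <= at <= onset]
    return sorted(recent, key=lambda item: onset - item[1])


for name, at in changes_before(first_seen, changes):
    print(f"{first_seen - at} before onset: {name}")
```

Sorting by distance from onset puts the tightest timing match first, but timing alone does not settle anything; each surviving change still has to be tested against the IS and IS-NOT facts.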
Each candidate cause must explain BOTH the IS and the IS-NOT:
| Possible Cause | Explains IS? | Explains IS-NOT? | Verdict |
|----------------|-------------|------------------|---------|
| Fraud rules adding DB queries | ✓ Only checkout, only write ops | ✓ Not other endpoints | Pursue |
| Payment SDK change | ✓ Only checkout | ✗ Would affect all regions | Ruled out |
| Index rebuild | ✓ DB layer | ✗ Would affect all queries | Ruled out |
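The two-sided test can be expressed as a simple rule: a candidate is pursued only if it explains both the IS and the IS-NOT. The sketch below assumes a hypothetical `Candidate` record. The boolean judgments are copied from the table and still have to come from a person reasoning about each cause.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cause: str
    explains_is: bool
    explains_is_not: bool


def verdict(candidate: Candidate) -> str:
    """A cause survives only if it accounts for both sides of the boundary."""
    if candidate.explains_is and candidate.explains_is_not:
        return "Pursue"
    return "Ruled out"


candidates = [
    Candidate("Fraud rules adding DB queries", True, True),
    Candidate("Payment SDK change", True, False),
    Candidate("Index rebuild", True, False),
]
for c in candidates:
    print(f"{c.cause}: {verdict(c)}")
```

All three candidates explain the IS side, so the IS-NOT column is what separates them.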
Design a test to confirm or rule out the leading candidate:
Verification for "Fraud detection rules":
1. Check: Rules enabled 8:45 AM (matches timeline)
2. Check: Rules only on checkout (matches scope)
3. Test: Disable rules in canary, measure latency
4. Examine: Query logs for fraud check queries
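For step 3, one plausible way to read the canary result is to compare the share of slow requests with the rules enabled and disabled, since only about 30% of requests are affected and an average or median could hide them. This is a sketch under stated assumptions: the 400 ms threshold, the required drop of 0.2, the function names, and every latency sample are made up for illustration.

```python
def slow_fraction(samples_ms, threshold_ms=400.0):
    """Share of requests slower than the threshold (0.0 for an empty sample)."""
    if not samples_ms:
        return 0.0
    return sum(1 for s in samples_ms if s > threshold_ms) / len(samples_ms)


def rules_implicated(control_ms, canary_ms, min_drop=0.2):
    """True if disabling the rules removes at least min_drop of the slow share."""
    return slow_fraction(control_ms) - slow_fraction(canary_ms) >= min_drop


# Made-up latency samples in milliseconds.
control = [200, 210, 790, 195, 205, 810, 198, 202, 805, 207]  # rules enabled
canary = [201, 208, 199, 195, 204, 212, 197, 203, 206, 200]   # rules disabled

print(slow_fraction(control), slow_fraction(canary))
print(rules_implicated(control, canary))
```

A real check would need far more samples and a comparison over the same time window and region; the point here is only that the measurement should target the affected fraction.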
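A completed KT Problem Analysis produces the artifacts built up in the steps above: a specific problem statement, an IS/IS-NOT specification with distinctions, a list of relevant changes, candidate causes tested against both IS and IS-NOT, and a verification plan for the leading candidate.

The anti-patterns to avoid along the way: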
| Anti-Pattern | Symptom | Correction |
|---|---|---|
| KT on uniform failure | Running PA when 100% of requests fail | No boundary to contrast; use direct debugging or thinking-systems |
| Over-specifying the matrix | Filling every IS/IS-NOT cell for a simple bug | Stop when the distinction is clear; don't ritualize |
| DA/PPA sprawl | Running full Decision Analysis or Potential Problem Analysis for routine tasks | Redirect to thinking-opportunity-cost (decisions) or thinking-pre-mortem (risks) |
| Skipping cause testing | Pursuing the first plausible cause without testing against IS-NOT | Every cause must explain BOTH IS and IS-NOT |
| SA as mandatory preamble | Running full Situation Analysis before every PA | Jump directly to PA when the problem is already clear |
| Ignoring the distinction | Building the matrix but not extracting what's unique about IS | The distinction IS the signal; without it, the matrix is just a table |