skills/thinking-thought-experiment/SKILL.md
You need to trace how a system would fail or behave at a scale you can't cheaply test or measure. Use to imagine the scenario and walk the consequence chain step by step.
npx skillsauth add tjboudreaux/cc-thinking-skills thinking-thought-experimentInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Thought experiments use imagination to test ideas when direct experimentation is impractical, impossible, or premature. Einstein's elevator, Schrödinger's cat, and the trolley problem have advanced physics and philosophy. In engineering, thought experiments help evaluate designs, explore edge cases, and stress-test decisions before implementation.
Core Principle: Structured imagination is a tool. When you can't test in reality, test in your mind with discipline.
When you can't run the real experiment — the failure is too rare, the scale too large, the decision too one-way — you can still run it in your head with discipline: fix the conditions, then trace the consequence chain one step at a time until something breaks. The value is in the step-by-step trace, not in the verdict you jumped to.
Decision flow:
Need to understand how something behaves?
→ Can you cheaply test or measure it? → yes → DO THAT (stop here)
→ Is the real test too rare/large/irreversible? → yes → THOUGHT EXPERIMENT
When you need to trace how a system would behave at a scale you can't cheaply test:
If you can cheaply test or measure the real thing (load test, feature flag, query against real data), do that instead — a thought experiment is a substitute for empiricism, not a replacement. For adversarial "how would an attacker break this" analysis, use thinking-red-team.
Imagine a specific situation and trace consequences:
"What if we had 100x current traffic tomorrow?"
"What if our primary database failed right now?"
"What if a competitor copied our core feature?"
Push a parameter to its limit:
"What happens with 0 users? 1 billion users?"
"What if latency was 0? What if it was 10 seconds?"
"What if this team was 1 person? 100 people?"
Imagine the opposite of reality:
"What if we hadn't chosen microservices?"
"What if we didn't have this dependency?"
"What if our main customer segment didn't exist?"
Think from a different perspective:
"What would a malicious actor do with this system?"
"How would a new hire experience this codebase?"
"What would a competitor do with our technology?"
Move forward or backward in time:
"How will this look in 5 years?"
"If we had known this a year ago, what would we have done?"
"What will we wish we had built today?"
State clearly what you're testing:
## Thought Experiment: Database Failure Resilience
Question: How would our system behave if the primary database
became unavailable for 30 minutes during peak traffic?
Variables:
- Database state: Unavailable
- Duration: 30 minutes
- Traffic: Peak (3x normal)
- Preparation time: None (unexpected)
Describe the starting state precisely:
## Initial Conditions
System state:
- 10,000 active users
- 500 RPS to database
- No circuit breaker currently
- 10-second connection timeout
- No read replica failover configured
User behavior:
- Users will retry failed requests
- ~50% will leave after 2 failures
Walk through what happens step by step:
## Consequence Chain
T+0s: Database becomes unavailable
T+0-10s: Requests queue waiting for connection
T+10s: Connection timeouts start
T+10-30s: App servers accumulate waiting requests
T+30s: Thread pools exhausted
T+30s-2m: Cascading timeouts to upstream services
T+2m: Load balancer health checks fail
T+2m: Alerts fire (too late - damage done)
T+2-30m: Service completely unavailable
T+30m: Database returns, but app servers need restart
T+35m: Service gradually recovers
T+1h: Backlog of failed requests processed
What did you learn?
## Insights
1. Circuit breaker would limit cascade (need to implement)
2. Thread pool exhaustion is the failure mode (need limits)
3. Alerts fire too late (need proactive monitoring)
4. Recovery is slow (need automated recovery)
5. User experience is terrible (need graceful degradation)
What should you do based on this experiment?
## Actions
| Insight | Action | Priority |
|---------|--------|----------|
| No circuit breaker | Implement circuit breaker | High |
| Thread pool exhaustion | Add connection limits | High |
| Late alerts | Add DB latency monitoring | Medium |
| Slow recovery | Automate failover | Medium |
| Poor UX | Implement degraded mode | Medium |
## Thought Experiment: What if [component] fails?
For each critical component:
- API Gateway fails: [Trace consequences]
- Auth service fails: [Trace consequences]
- Cache layer fails: [Trace consequences]
- Message queue fails: [Trace consequences]
Template for each:
1. What fails immediately?
2. What fails within 1 minute?
3. What's the blast radius?
4. How would we know?
5. How would we recover?
For real security work, prefer thinking-red-team (structured attack surface + reproducible-attack-path findings). Use the sketch below only as a quick lens, not a substitute.
## Thought Experiment: Attacker Perspective
"I am a malicious actor with [access level]. How would I cause maximum damage?"
Scenario 1: External attacker with no access
- What public endpoints are vulnerable?
- What would a DDoS look like?
- What data could I infer from public behavior?
Scenario 2: Compromised user account
- What can I access beyond my own data?
- Can I escalate privileges?
- What's the blast radius of one compromised account?
Scenario 3: Malicious insider
- What would a rogue employee do?
- What access is over-provisioned?
- What audit trails would catch this?
## Thought Experiment: 10x Scale
Current: 10,000 users, 100 RPS
Imagined: 100,000 users, 1,000 RPS
What breaks?
- [ ] Database connections (pool exhaustion at ~300)
- [ ] Single-threaded workers (CPU bound at ~500 RPS)
- [ ] Memory per request (OOM at ~800 RPS)
- [ ] Network bandwidth (saturated at ~1500 RPS)
What doesn't break?
- [x] Stateless API servers (horizontally scalable)
- [x] CDN-served assets (already edge-cached)
- [x] Read-heavy queries (can add replicas)
First constraint to address: Database connection pooling
## Thought Experiment: 3 Years From Now
It's 2027. Our system is [successful/struggling]. Why?
Success scenario:
- What architectural decisions paid off?
- What did we avoid that would have hurt us?
- What did we invest in that scaled well?
Struggle scenario:
- What technical debt crushed us?
- What market change did we miss?
- What architecture couldn't adapt?
Present implications:
- What should we invest in now for 2027 success?
- What should we avoid now to prevent 2027 failure?
# Thought Experiment: [Title]
## Question
What am I trying to understand?
[Precise question]
## Setup
Initial conditions:
- [Condition 1]
- [Condition 2]
Variables being tested:
- [Variable 1]: [Value in experiment]
- [Variable 2]: [Value in experiment]
## Thought Experiment
[Step-by-step trace of what happens]
T+0: [Initial event]
T+X: [Consequence]
T+Y: [Next consequence]
...
T+end: [Final state]
## Observations
What happened in the experiment?
- [Observation 1]
- [Observation 2]
## Insights
What did I learn?
- [Insight 1]
- [Insight 2]
## Implications
What should I do differently?
- [Action 1]
- [Action 2]
## Limitations
What didn't this experiment capture?
- [Limitation 1]
- [Limitation 2]
"A thought experiment is a device with which one performs an intentional, structured process of intellectual deliberation in order to speculate, within a specifiable problem domain, about potential consequents (or antecedents) for a designated antecedent (or consequent)."
More simply: Imagine a scenario, trace what happens, and learn.
"I have no special talents. I am only passionately curious."
Thought experiments are curiosity with structure. Ask "what if?" systematically, trace the answer carefully, and you'll see what testing in reality would eventually show you—sometimes years sooner.
tools
About to add a feature/layer/process to fix a problem. First ask what to remove instead — subtraction is often more robust than addition. Use for simplification and complexity reduction.
development
Use when stuck between two architecture or API requirements that seem mutually exclusive — name the contradiction precisely, then separate the conflicting states in time, space, or condition.
devops
Use when optimizing latency or throughput in a pipeline and one stage dominates—focus all effort on that single bottleneck, since speeding up the others changes nothing until it's fixed.
development
Use when debugging across services/an incident where a fix in one place breaks another, or behavior is emergent and no single component explains it. Maps the system and traces causes.