skills/thinking-probabilistic/SKILL.md
Use when stating a forecast, estimate, or risk. Anchor on the base rate, give a confidence range instead of a point, and update the number when new evidence arrives.
npx skillsauth add tjboudreaux/cc-thinking-skills thinking-probabilisticInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Probabilistic thinking, informed by Philip Tetlock's "Superforecasting," treats a forecast as a probability and a range rather than a single confident number. Three moves do almost all the work: anchor on the base rate, express the estimate as a range (not a point), and update the number when new evidence arrives.
Core Principle: Start from how often similar things happen, state your estimate as a range with a confidence level, and move the number — explicitly — when the evidence moves.
Stateless-agent note. Across a single task you have no persistent prediction log, so there is no "track my calibration over months" step here. The leverage is in the act of estimating: base rate, range, update. Apply the calibration attitude (assume you're overconfident; widen the range) without pretending to keep a cross-session scorecard you don't have.
Decision flow:
About to state a forecast/estimate/risk?
→ Outcome genuinely uncertain? → yes → BASE RATE, then a RANGE (not a point)
→ New evidence since last estimate? → yes → UPDATE THE NUMBER
→ Can you just look it up / measure it? → yes → DO THAT INSTEAD
thinking-bayesian; use it for the explicit prior × likelihood-ratio update.Convert vague language to numbers:
| Vague Statement | Probability Range | |-----------------|-------------------| | "Certain" | 99%+ | | "Almost certain" | 90-99% | | "Very likely" | 80-90% | | "Likely" / "Probable" | 65-80% | | "Better than even" | 55-65% | | "Toss-up" | 45-55% | | "Unlikely" | 20-35% | | "Very unlikely" | 10-20% | | "Almost impossible" | 1-10% | | "Impossible" | <1% |
Express estimates as ranges, not points:
BAD: "The project will take 6 weeks"
GOOD: "I'm 80% confident the project will take 4-8 weeks"
BETTER: "50% confidence: 5-7 weeks; 90% confidence: 3-10 weeks"
Start with how often similar things happen:
Question: Will this feature launch on time?
Base rate: What % of similar features launched on time? ~40%
Adjustment: This team is experienced (+10%), scope is clear (+10%)
Estimate: ~60% probability of on-time launch
State your belief as a number:
## Prediction: Will we hit Q2 revenue target?
Initial estimate: 65%
Reasoning:
- Last 4 quarters: Hit 3/4 targets (75% base rate)
- Current pipeline: Slightly below historical (-10%)
- New product launching: Uncertain impact
What could change the probability?
Key uncertainties:
1. Will Enterprise deal close? (+15% if yes)
2. Will new product cannibalize existing? (-10% if significant)
3. Will competitor launch disrupt? (-20% if aggressive)
For complex predictions, branch scenarios:
Project success: ?
├── Technical risk resolves well (60%)
│ ├── Team stays intact (80%) → 0.60 × 0.80 = 48% → SUCCESS
│ └── Key person leaves (20%) → 0.60 × 0.20 × 0.50 = 6% → PARTIAL
├── Technical risk causes delays (30%)
│ ├── Scope reduced (60%) → 0.30 × 0.60 × 0.70 = 12.6% → SUCCESS
│ └── Scope maintained (40%) → 0.30 × 0.40 = 12% → FAILURE
└── Technical risk blocks project (10%) → 10% → FAILURE
P(Success) = 48% + 12.6% = 60.6% ≈ 60%
When new evidence arrives, update:
Original estimate: 65% hit revenue target
New information: Enterprise deal delayed to Q3
Impact: -15% (was +15% if closed, now neutral)
Updated estimate: 50%
New information: Competitor launch was weak
Impact: +10% (was -20% if aggressive)
Updated estimate: 60%
Make the forecast falsifiable within the task itself: a clear claim, a timeframe, and the range. This lets the user or a later observation verify it — you don't carry a personal scorecard across sessions, but a sharply-stated prediction can still be proven right or wrong.
Prediction: "80% confident the migration completes with <5 min downtime,
range 1-15 min downtime." (Checkable against the actual run.)
These are sanity checks you apply now, within the task — not a longitudinal tracking exercise.
"Would I bet at these odds?"
Prediction: 80% confident project finishes on time
Equivalent: Would I bet $4 to win $1?
If that feels wrong, adjust the probability.
Always check base rates:
Inside view: "Our team is great, we'll definitely finish on time"
Outside view: "What % of similar projects finished on time?"
Inside tends toward overconfidence
Outside provides calibration anchor
Imagine failure, then adjust:
Initial estimate: 85% success
After pre-mortem: Identified 5 failure modes I hadn't considered
Adjusted estimate: 70%
Are your intervals too narrow?
Test: Of your 90% confidence intervals, do 90% contain the actual?
Common finding: Only 60-70% do
Fix: Widen intervals by 50%
## Project: Payment System Rewrite
Timeline estimate:
- 50% confidence: 8-12 weeks
- 80% confidence: 6-16 weeks
- 95% confidence: 4-24 weeks
Key variables:
- API complexity: High uncertainty (+/- 3 weeks)
- Team availability: Medium uncertainty (+/- 2 weeks)
- Integration testing: High uncertainty (+/- 4 weeks)
Commitment: "We're 80% confident we'll deliver in Q2"
## Risk: Database migration causes extended downtime
Probability assessment:
- Base rate for similar migrations: 20% have issues
- Our preparation level: Above average (-5%)
- Complexity of our schema: Above average (+5%)
- Rollback plan quality: Strong (-5%)
Estimate: 15% probability of extended downtime
Mitigation value:
- If issue occurs: 4 hours downtime × $10K/hour = $40K
- Expected loss: 15% × $40K = $6K
- Mitigation cost: $3K for additional testing
- Decision: Mitigation worth it (ROI positive)
## Decision: Adopt new framework
Success probability factors:
| Factor | Probability | Weight |
|--------|-------------|--------|
| Team learns quickly | 70% | 0.3 |
| Framework matures | 80% | 0.2 |
| Performance meets needs | 60% | 0.3 |
| Integration works | 75% | 0.2 |
Combined probability (simplified):
0.70 × 0.80 × 0.60 × 0.75 = 25% (if all must succeed)
OR weighted average: 70% (if partial success acceptable)
Decision: High uncertainty suggests pilot first
# Probabilistic Assessment: [Prediction]
## Prediction
[Clear, falsifiable statement with timeframe]
## Initial Probability
Estimate: [X]%
Base rate: [Similar events: Y%]
Adjustment rationale: [Why different from base rate]
## Confidence Interval
- 50% CI: [Range]
- 80% CI: [Range]
- 95% CI: [Range]
## Key Uncertainties
| Uncertainty | If positive | If negative |
|-------------|-------------|-------------|
| [Factor 1] | +X% | -Y% |
| [Factor 2] | +X% | -Y% |
## Updates (within this task)
| New information | Old P | New P |
|-----------------|-------|-------|
| | | |
## Checkable Outcome
[The specific observation that will prove this forecast right or wrong]
"The fox knows many things, but the hedgehog knows one big thing."
Superforecasters are foxes—they integrate many perspectives, update frequently, and avoid ideological certainty. They're not smarter; they're more calibrated.
"Beliefs are hypotheses to be tested, not treasures to be protected."
Your predictions should change as evidence changes. Holding steady when you should update is a calibration failure.
tools
About to add a feature/layer/process to fix a problem. First ask what to remove instead — subtraction is often more robust than addition. Use for simplification and complexity reduction.
development
Use when stuck between two architecture or API requirements that seem mutually exclusive — name the contradiction precisely, then separate the conflicting states in time, space, or condition.
testing
You need to trace how a system would fail or behave at a scale you can't cheaply test or measure. Use to imagine the scenario and walk the consequence chain step by step.
devops
Use when optimizing latency or throughput in a pipeline and one stage dominates—focus all effort on that single bottleneck, since speeding up the others changes nothing until it's fixed.