Forecasting Expert Knowledge

Superforecasting Principles

Based on research by Philip Tetlock and the Good Judgment Project:

Triage: Focus on questions that are hard enough to be interesting but not so hard they're unknowable
Break problems apart: Decompose big questions into smaller, researchable sub-questions (Fermi estimation)
Balance inside and outside views: Use both specific evidence AND base rates from reference classes
Update incrementally: Adjust predictions in small steps as new evidence arrives (Bayesian updating)
Look for clashing forces: Identify factors pulling in opposite directions
Distinguish signal from noise: Weight signals by their reliability and relevance
Calibrate: Your 70% predictions should come true ~70% of the time
Post-mortem: Analyze why predictions went wrong instead of only celebrating the ones that went right
Avoid the narrative trap: A compelling story is not the same as a likely outcome
Collaborate: Aggregate views from diverse perspectives
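
The "Update incrementally" principle can be illustrated with Bayes' rule in odds form. This is a minimal sketch, not part of the original skill: the function name `bayes_update` and the example numbers are assumptions chosen for illustration. The likelihood ratio is how much more likely the evidence is if the event will happen than if it will not.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Return the posterior probability after one piece of evidence.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical example: a 30% base rate and evidence judged twice as likely
# if the event is going to happen than if it is not.
posterior = bayes_update(0.30, 2.0)
print(round(posterior, 3))  # 0.462
```

A likelihood ratio near 1 barely moves the estimate, which matches the advice to adjust in small steps unless the evidence is strongly diagnostic.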

Signal Taxonomy

Signal Types

| Type | Description | Weight | Example |
|------|-------------|--------|---------|
| Leading indicator | Predicts future movement | High | Job postings surge → company expanding |
| Lagging indicator | Confirms past movement | Medium | Quarterly earnings → business health |
| Base rate | Historical frequency | High | "80% of startups fail within 5 years" |
| Expert opinion | Informed prediction | Medium | Analyst forecast, CEO statement |
| Data point | Factual measurement | High | Revenue figure, user count, benchmark |
| Anomaly | Deviation from pattern | High | Unusual trading volume, sudden hiring freeze |
| Structural change | Systemic shift | Very High | New regulation, technology breakthrough |
| Sentiment shift | Collective mood change | Medium | Media tone change, social media trend |

Signal Strength Assessment

STRONG signal (high predictive value):
  - Multiple independent sources confirm
  - Quantitative data (not just opinions)
  - Leading indicator with historical track record
  - Structural change with clear causal mechanism

MODERATE signal (some predictive value):
  - Single authoritative source
  - Expert opinion from domain specialist
  - Historical pattern that may or may not repeat
  - Lagging indicator (confirms direction)

WEAK signal (limited predictive value):
  - Social media buzz without substance
  - Single anecdote or case study
  - Rumor or unconfirmed report
  - Opinion from non-specialist

Confidence Calibration

Probability Scale

95% — Almost certain (would bet 19:1)
90% — Very likely (would bet 9:1)
80% — Likely (would bet 4:1)
70% — Probable (would bet 7:3)
60% — Slightly more likely than not
50% — Toss-up (genuine uncertainty)
40% — Slightly less likely than not
30% — Unlikely (but plausible)
20% — Very unlikely (but possible)
10% — Extremely unlikely
5%  — Almost impossible (but not zero)
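
The betting odds quoted in the scale follow directly from the probability: p% corresponds to p : (100 - p), reduced to lowest terms. A small sketch, with `betting_odds` as a hypothetical helper name, reproduces the listed values and fills in the rows where the scale gives no odds.

```python
from fractions import Fraction


def betting_odds(percent: int) -> str:
    """Express a whole-number percentage as for:against odds in lowest terms."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    odds = Fraction(percent, 100 - percent)
    return f"{odds.numerator}:{odds.denominator}"


for p in (95, 90, 80, 70, 60):
    print(f"{p}% -> {betting_odds(p)}")
# 95% -> 19:1
# 90% -> 9:1
# 80% -> 4:1
# 70% -> 7:3
# 60% -> 3:2
```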

Calibration Rules

NEVER use 0% or 100% — nothing is absolutely certain
If you haven't done research, default to the base rate (outside view)
Your first estimate should be the reference class base rate
Adjust from the base rate using specific evidence (inside view)
Typical adjustment: ±5-15 percentage points per strong signal, ±2-5 percentage points per moderate signal
If your gut says 80% but your analysis says 55%, trust the analysis
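
The rules above amount to simple arithmetic: start at the base rate, add signed adjustments, and never land on 0% or 100%. The sketch below assumes the adjustments are additive percentage points and clamps the result to 1%-99%; those bounds, the function name, and the sample inputs are illustrative assumptions, not values from the source.

```python
def adjusted_probability(base_rate, adjustments, floor=0.01, ceiling=0.99):
    """Add signed percentage-point adjustments to a base rate.

    The result is clamped to [floor, ceiling] so it never reaches 0% or 100%.
    """
    estimate = base_rate + sum(adjustments)
    return min(ceiling, max(floor, estimate))


# Hypothetical inputs: 20% base rate, one strong signal for (+10 points),
# one moderate signal for (+4), one moderate signal against (-3).
print(round(adjusted_probability(0.20, [0.10, 0.04, -0.03]), 2))  # 0.31
```

Additive adjustments are easy to audit but behave poorly near the extremes, which is one reason to keep the clamp.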

Brier Score

The gold standard for measuring prediction accuracy:

Brier Score = mean of (predicted_probability - actual_outcome)^2 over all resolved predictions

actual_outcome = 1 if prediction came true, 0 if not

Perfect score: 0.0 (you're always right with perfect confidence)
Coin flip: 0.25 (saying 50% on everything)
Terrible: 1.0 (100% confident, always wrong)

Good forecaster: < 0.15
Average forecaster: 0.20-0.30
Bad forecaster: > 0.35
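
The formula scores one prediction at a time; a forecaster's overall Brier score is the average of those per-prediction scores. A minimal sketch follows, in which the function name and the sample forecasts are made up for illustration.

```python
def brier_score(forecasts):
    """Mean squared error over (probability, outcome) pairs.

    outcome is True/1 if the prediction came true, False/0 otherwise.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)


# Hypothetical resolved forecasts.
resolved = [(0.65, True), (0.80, True), (0.30, False), (0.90, False)]
print(round(brier_score(resolved), 4))  # 0.2656
```

The single confident miss (90% on something that did not happen) contributes 0.81 of the 1.0625 total, which shows how heavily the score punishes overconfidence.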

Domain-Specific Source Guide

Technology Predictions

| Source Type | Examples | Use For |
|-------------|----------|---------|
| Product roadmaps | GitHub issues, release notes, blog posts | Feature predictions |
| Adoption data | Stack Overflow surveys, NPM downloads, DB-Engines | Technology trends |
| Funding data | Crunchbase, PitchBook, TechCrunch | Startup success/failure |
| Patent filings | Google Patents, USPTO | Innovation direction |
| Job postings | LinkedIn, Indeed, Levels.fyi | Technology demand |
| Benchmark data | TechEmpower, MLPerf, Geekbench | Performance trends |

Finance Predictions

| Source Type | Examples | Use For |
|-------------|----------|---------|
| Economic data | FRED, BLS, Census | Macro trends |
| Earnings | SEC filings, earnings calls | Company performance |
| Analyst reports | Bloomberg, Reuters, S&P | Market consensus |
| Central bank | Fed minutes, ECB statements | Interest rates, policy |
| Commodity data | EIA, OPEC reports | Energy/commodity prices |
| Sentiment | VIX, put/call ratio, AAII survey | Market mood |

Geopolitics Predictions

| Source Type | Examples | Use For |
|-------------|----------|---------|
| Official sources | Government statements, UN reports | Policy direction |
| Think tanks | RAND, Brookings, Chatham House | Analysis |
| Election data | Polls, voter registration, 538 | Election outcomes |
| Trade data | WTO, customs data, trade balances | Trade policy |
| Military data | SIPRI, defense budgets, deployments | Conflict risk |
| Diplomatic signals | Ambassador recalls, sanctions, treaties | Relations |

Climate Predictions

| Source Type | Examples | Use For |
|-------------|----------|---------|
| Scientific data | IPCC, NASA, NOAA | Climate trends |
| Energy data | IEA, EIA, IRENA | Energy transition |
| Policy data | COP agreements, national plans | Regulation |
| Corporate data | CDP disclosures, sustainability reports | Corporate action |
| Technology data | BloombergNEF, patent filings | Clean tech trends |
| Investment data | Green bond issuance, ESG flows | Capital allocation |

Reasoning Chain Construction

Template

PREDICTION: [Specific, falsifiable claim]

1. REFERENCE CLASS (Outside View)
   Base rate: [What % of similar events occur?]
   Reference examples: [3-5 historical analogues]

2. SPECIFIC EVIDENCE (Inside View)
   Signals FOR (+):
   a. [Signal] — strength: [strong/moderate/weak] — adjustment: +X%
   b. [Signal] — strength: [strong/moderate/weak] — adjustment: +X%

   Signals AGAINST (-):
   a. [Signal] — strength: [strong/moderate/weak] — adjustment: -X%
   b. [Signal] — strength: [strong/moderate/weak] — adjustment: -X%

3. SYNTHESIS
   Starting probability (base rate): X%
   Net adjustment: +/-Y%
   Final probability: Z%

4. KEY ASSUMPTIONS
   - [Assumption 1]: If wrong, probability shifts to [W%]
   - [Assumption 2]: If wrong, probability shifts to [V%]

5. RESOLUTION
   Date: [When can this be resolved?]
   Criteria: [Exactly how to determine if correct]
   Data source: [Where to check the outcome]

Prediction Tracking & Scoring

Prediction Ledger Format

{
  "id": "pred_001",
  "created": "2025-01-15",
  "prediction": "OpenAI will release GPT-5 before July 2025",
  "confidence": 0.65,
  "domain": "tech",
  "time_horizon": "2025-07-01",
  "reasoning_chain": "...",
  "key_signals": ["leaked roadmap", "compute scaling", "hiring patterns"],
  "status": "active|resolved|expired",
  "resolution": {
    "date": "2025-06-30",
    "outcome": true,
    "evidence": "Released June 15, 2025",
    "brier_score": 0.1225
  },
  "updates": [
    {"date": "2025-03-01", "new_confidence": 0.75, "reason": "New evidence: leaked demo"}
  ]
}
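
As a sanity check on the ledger example, the stored `brier_score` of 0.1225 equals (0.65 - 1)^2, so it was computed from the original confidence rather than the updated 0.75. The source does not say which value should be scored; the sketch below supports both through a hypothetical `use_latest_update` flag and uses a trimmed copy of the entry with `status` set to `"resolved"`.

```python
import json

# Trimmed copy of the ledger entry; only the fields needed for scoring.
entry_json = """
{
  "id": "pred_001",
  "confidence": 0.65,
  "status": "resolved",
  "resolution": {"outcome": true},
  "updates": [{"date": "2025-03-01", "new_confidence": 0.75}]
}
"""


def score_entry(entry, use_latest_update=False):
    """Return the Brier score of a resolved entry, or None if unresolved."""
    resolution = entry.get("resolution")
    if entry.get("status") != "resolved" or resolution is None:
        return None
    confidence = entry["confidence"]
    if use_latest_update and entry.get("updates"):
        # Assumes updates are stored in chronological order.
        confidence = entry["updates"][-1]["new_confidence"]
    outcome = 1.0 if resolution["outcome"] else 0.0
    return round((confidence - outcome) ** 2, 4)


entry = json.loads(entry_json)
print(score_entry(entry))                          # 0.1225, original 0.65
print(score_entry(entry, use_latest_update=True))  # 0.0625, updated 0.75
```

Whichever convention is chosen, applying it consistently across the ledger keeps scores comparable over time.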

Accuracy Report Template

ACCURACY DASHBOARD
==================
Total predictions:     N
Resolved predictions:  N (N correct, N incorrect, N partial)
Active predictions:    N
Expired (unresolvable):N

Overall accuracy:      X%
Brier score:           0.XX

Calibration:
  Predicted 90%+ → Actual: X% (N predictions)
  Predicted 70-89% → Actual: X% (N predictions)
  Predicted 50-69% → Actual: X% (N predictions)
  Predicted 30-49% → Actual: X% (N predictions)
  Predicted <30% → Actual: X% (N predictions)

Strengths: [domains/types where you perform well]
Weaknesses: [domains/types where you perform poorly]
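
The calibration rows of the dashboard can be filled by grouping resolved predictions into the same confidence bands and comparing each band's hit rate with its stated confidence. This is a sketch under assumptions: the bucket boundaries are read off the template, while the function name and sample data are hypothetical.

```python
# Buckets mirror the report template: (label, inclusive lower, exclusive upper).
BUCKETS = [
    ("90%+", 0.90, 1.01),
    ("70-89%", 0.70, 0.90),
    ("50-69%", 0.50, 0.70),
    ("30-49%", 0.30, 0.50),
    ("<30%", 0.00, 0.30),
]


def calibration_table(forecasts):
    """Return {label: (hit rate or None, count)} for (probability, outcome) pairs."""
    table = {}
    for label, low, high in BUCKETS:
        outcomes = [bool(o) for p, o in forecasts if low <= p < high]
        rate = sum(outcomes) / len(outcomes) if outcomes else None
        table[label] = (rate, len(outcomes))
    return table


# Hypothetical resolved forecasts.
resolved = [(0.95, True), (0.90, True), (0.75, True), (0.70, False),
            (0.60, True), (0.40, False), (0.10, False)]
for label, (rate, n) in calibration_table(resolved).items():
    shown = "n/a" if rate is None else f"{rate:.0%}"
    print(f"Predicted {label} -> Actual: {shown} ({n} predictions)")
```

With only a handful of predictions per band the hit rates are noisy, so the counts belong next to the percentages, as the template already does.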

Cognitive Bias Checklist

Before finalizing any prediction, check for these biases:

Anchoring: Am I fixated on the first number I encountered?
- Fix: Deliberately consider the base rate before looking at specific evidence
Availability bias: Am I overweighting recent or memorable events?
- Fix: Check the actual frequency, not just what comes to mind
Confirmation bias: Am I only looking for evidence that supports my prediction?
- Fix: Actively search for contradicting evidence (steel-man the opposite)
Narrative bias: Am I choosing a prediction because it makes a good story?
- Fix: Boring predictions are often more accurate
Overconfidence: Am I too sure?
- Fix: If you've never been wrong at this confidence level, you're probably overconfident
Scope insensitivity: Am I treating very different scales the same?
- Fix: Be specific about magnitudes and timeframes
Recency bias: Am I extrapolating recent trends too far?
- Fix: Check longer time horizons and mean reversion patterns
Status quo bias: Am I defaulting to "nothing will change"?
- Fix: Consider structural changes that could break the status quo

Contrarian Mode

When enabled, for each consensus prediction:

Identify what the consensus view is
Search for evidence the consensus is wrong
Consider: "What would have to be true for the opposite to happen?"
If credible contrarian evidence exists, include a contrarian prediction
Always label contrarian predictions clearly with the consensus for comparison
