rickoslyder/prediction-tracking

Name: prediction-tracking
Author: rickoslyder

.claude/skills/prediction-tracking/SKILL.md

npx skillsauth add rickoslyder/HypeDelta prediction-tracking

Prediction Tracking Skill

Track predictions made by AI researchers and critics, and evaluate their accuracy over time.

Prediction Recording

When recording a new prediction, capture:

Required Fields

text: The prediction as stated
author: Who made it
madeAt: When it was made
timeframe: When they expect it to happen
topic: What area of AI
confidence: How confident they seemed

Optional Fields

sourceUrl: Where the prediction was made
targetDate: Specific date if mentioned
conditions: Any caveats or conditions
metrics: How to measure success
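
One way to hold these fields in code is a small record type. The sketch below is illustrative only: the class name `Prediction`, the Python types, the `missing_required` helper, and the placeholder values are assumptions, not part of the skill. Only the field names come from the lists above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Prediction:
    # Required fields, named as in the list above.
    text: str        # the prediction as stated
    author: str      # who made it
    madeAt: date     # when it was made
    timeframe: str   # when they expect it to happen (free text is an assumption)
    topic: str       # what area of AI
    confidence: str  # how confident they seemed (free text is an assumption)
    # Optional fields stay None when the source does not provide them.
    sourceUrl: Optional[str] = None
    targetDate: Optional[date] = None
    conditions: Optional[str] = None
    metrics: Optional[str] = None

    def missing_required(self) -> list:
        """Return the names of required fields that are empty."""
        required = ["text", "author", "madeAt", "timeframe", "topic", "confidence"]
        return [name for name in required if not getattr(self, name)]


# Placeholder values for illustration, not real data.
example = Prediction(
    text="Placeholder prediction text",
    author="Placeholder author",
    madeAt=date(2025, 1, 1),
    timeframe="within 2 years",
    topic="agents",
    confidence="high",
)
```

Calling `example.missing_required()` returns an empty list when every required field is filled in.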

Evaluation Status

When evaluating predictions, assign one of:

`verified`

Clearly came true as stated.

The predicted capability/event occurred
Within the stated timeframe
Substantially as described

`falsified`

Clearly did not come true.

Timeframe passed without occurrence
Contradictory evidence emerged
Author retracted or modified claim

`partially-verified`

Partially accurate.

Some aspects came true, others didn't
Capability exists but weaker than claimed
Timeframe was off but direction correct

`too-early`

Not enough time has passed.

Still within stated timeframe
No definitive evidence either way

`unfalsifiable`

Cannot be objectively assessed.

Too vague to measure
No clear success criteria
Moved goalposts

`ambiguous`

Prediction can be read in more than one way, so no single verdict applies.

Multiple interpretations possible
Success criteria unclear
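
The six status values above form a closed set, so they can be modelled as an enumeration that rejects anything else. This is a minimal sketch: the names `EvaluationStatus`, `SCORABLE`, and `parse_status` are hypothetical, and treating only the first three statuses as scorable is an assumption.

```python
from enum import Enum


class EvaluationStatus(str, Enum):
    VERIFIED = "verified"
    FALSIFIED = "falsified"
    PARTIALLY_VERIFIED = "partially-verified"
    TOO_EARLY = "too-early"
    UNFALSIFIABLE = "unfalsifiable"
    AMBIGUOUS = "ambiguous"


# Assumption: an accuracy score only makes sense for these statuses.
SCORABLE = {
    EvaluationStatus.VERIFIED,
    EvaluationStatus.FALSIFIED,
    EvaluationStatus.PARTIALLY_VERIFIED,
}


def parse_status(raw: str) -> EvaluationStatus:
    """Convert a status string to the enum; unknown values raise ValueError."""
    return EvaluationStatus(raw.strip().lower())
```

Parsing at the boundary keeps typos such as `partially_verified` from entering the stored evaluations unnoticed.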

Evaluation Process

For each prediction being evaluated:

1. Restate the prediction

What exactly was claimed?

2. Identify timeframe

Has enough time passed to evaluate?

3. Gather evidence

What has happened since?

Relevant releases or announcements
Benchmark results
Real-world deployments
Counter-evidence

4. Assess status

Which evaluation status applies?

5. Score accuracy

If verifiable, rate 0.0-1.0:

1.0: Exactly as predicted
0.7-0.9: Substantially correct
0.4-0.6: Partially correct
0.1-0.3: Mostly wrong
0.0: Completely wrong

6. Note lessons

What does this tell us about:

The author's forecasting ability
The topic's predictability
Common prediction pitfalls
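
The score bands in step 5 leave small gaps (for example between 0.6 and 0.7, or between 0.9 and 1.0). The sketch below maps a score to its label and, as an assumption not stated in the skill, assigns any score in a gap to the band just below it. The function name `describe_score` is hypothetical.

```python
def describe_score(score: float) -> str:
    """Map an accuracy score in [0.0, 1.0] to the label from step 5."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score == 1.0:
        return "Exactly as predicted"
    if score >= 0.7:
        return "Substantially correct"  # also covers the 0.9-1.0 gap (assumption)
    if score >= 0.4:
        return "Partially correct"      # also covers the 0.6-0.7 gap (assumption)
    if score > 0.0:
        return "Mostly wrong"           # also covers scores below 0.1 (assumption)
    return "Completely wrong"
```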

Output Format

For evaluation:

{
  "evaluations": [
    {
      "predictionId": "id",
      "status": "verified",
      "accuracyScore": 0.85,
      "evidence": "Description of evidence",
      "notes": "Additional context",
      "evaluatedAt": "timestamp"
    }
  ]
}

For accuracy statistics:

{
  "author": "Author name",
  "totalPredictions": 15,
  "verified": 5,
  "falsified": 3,
  "partiallyVerified": 2,
  "pending": 4,
  "unfalsifiable": 1,
  "averageAccuracy": 0.62,
  "topicBreakdown": {
    "reasoning": { "predictions": 5, "accuracy": 0.7 },
    "agents": { "predictions": 3, "accuracy": 0.4 }
  },
  "calibration": "Assessment of how well-calibrated they are"
}
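
The statistics object can be derived from a list of evaluation records. The following is a rough sketch under stated assumptions: the function name `summarize` is hypothetical, `pending` is taken to mean the `too-early` status, and `ambiguous` evaluations are counted only in `totalPredictions` because the example above has no field for them. `topicBreakdown` and `calibration` are left out because evaluation records in the format above carry no topic or confidence.

```python
from collections import Counter


def summarize(author: str, evaluations: list) -> dict:
    """Aggregate evaluation records into per-author accuracy statistics."""
    counts = Counter(e["status"] for e in evaluations)
    # Only records that actually carry a score contribute to the average.
    scores = [e["accuracyScore"] for e in evaluations
              if e.get("accuracyScore") is not None]
    return {
        "author": author,
        "totalPredictions": len(evaluations),
        "verified": counts["verified"],
        "falsified": counts["falsified"],
        "partiallyVerified": counts["partially-verified"],
        "pending": counts["too-early"],  # assumption: pending == too-early
        "unfalsifiable": counts["unfalsifiable"],
        "averageAccuracy": round(sum(scores) / len(scores), 2) if scores else None,
    }


# Placeholder records for illustration, not real evaluations.
sample = [
    {"predictionId": "p1", "status": "verified", "accuracyScore": 1.0},
    {"predictionId": "p2", "status": "partially-verified", "accuracyScore": 0.5},
    {"predictionId": "p3", "status": "too-early", "accuracyScore": None},
]
summary = summarize("Author name", sample)
```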

Calibration Assessment

Evaluate whether predictors are well-calibrated:

Well-Calibrated

High-confidence predictions usually come true
Low-confidence predictions have mixed results
Acknowledges uncertainty appropriately

Overconfident

High-confidence predictions often fail
Rarely expresses uncertainty
Doesn't update on evidence

Underconfident

Low-confidence predictions often come true
Hedges even on likely outcomes
Too conservative

Inconsistent

Confidence doesn't correlate with accuracy
Random relationship between stated and actual accuracy
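
A crude way to turn these four categories into a check is to compare the mean accuracy of high-confidence and low-confidence predictions. Everything specific in this sketch is an assumption: the function name `assess_calibration`, the `"high"`/`"low"` confidence labels, the 0.5 and 0.7 thresholds, and the order in which the rules are tried. It is a heuristic sketch, not a statistical calibration measure.

```python
from statistics import mean


def assess_calibration(records, fail_below=0.5, likely_at=0.7):
    """Classify a predictor from (confidence, accuracy_score) pairs.

    Thresholds are illustrative assumptions, not values from the skill.
    """
    high = [score for conf, score in records if conf == "high"]
    low = [score for conf, score in records if conf == "low"]
    if not high or not low:
        return "insufficient data"
    high_mean, low_mean = mean(high), mean(low)
    if high_mean < fail_below:
        return "overconfident"    # high-confidence predictions often fail
    if low_mean >= likely_at:
        return "underconfident"   # low-confidence predictions often come true
    if high_mean > low_mean:
        return "well-calibrated"  # confidence tracks accuracy
    return "inconsistent"         # confidence does not track accuracy
```

With only a handful of predictions per author the result is noisy, so it is best read as a prompt for closer review rather than a verdict.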

Tracking Notable Predictors

Keep running assessments of key voices:

| Predictor | Total | Accuracy | Calibration | Notes |
|-----------|-------|----------|-------------|-------|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly over | Safety-focused |

Red Flags

Watch for prediction patterns that suggest bias:

Always bullish regardless of topic
Never acknowledges failed predictions
Moves goalposts when wrong
Predictions align suspiciously with financial interests
Vague enough to claim credit for anything

Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.

data-ai

Updated Apr 16, 2026

npx skillsauth add rickoslyder/HypeDelta prediction-tracking

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | % |
|--------|---------|-------------|---|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 16, 2026, 2:51 PM · 4.1s · 1 file scanned

Related Skills

rickoslyder/content-filter (development)

Filter and classify AI research content for relevance. Use when processing raw content from Twitter, Substacks, blogs, or podcasts to determine if it's worth extracting claims from. Assigns relevance scores, topics, and author categories.

Updated Apr 16, 2026

rickoslyder/topic-synthesis (data-ai)

Synthesize claims across multiple sources to identify consensus, disagreements, and emerging narratives on AI research topics. Use when you have claims from both lab researchers and critics on the same topic and need to understand where they agree, disagree, and what the overall hype level is.

Updated Apr 16, 2026

rickoslyder/hype-assessment (testing)

Assess overall hype levels across AI topics by comparing lab researcher enthusiasm against critic skepticism. Use after topic synthesis to identify which topics are overhyped, underhyped, or accurately assessed by the field.

Updated Apr 16, 2026

rickoslyder/hint-detection (data-ai)

Detect hints about unreleased AI research or capabilities from lab researcher communications. Use when analyzing tweets, posts, or interviews from people at major AI labs to identify signals about upcoming work.

Updated Apr 16, 2026

Install manually

# Clone the repo
git clone https://github.com/rickoslyder/HypeDelta.git

# Copy into Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r HypeDelta/.claude/skills/prediction-tracking ~/.claude/skills/

Repository: rickoslyder/HypeDelta

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT