.claude/skills/hype-assessment/SKILL.md
Assess overall hype levels across AI topics by comparing lab researcher enthusiasm against critic skepticism. Use after topic synthesis to identify which topics are overhyped, underhyped, or accurately assessed by the field.
npx skillsauth add rickoslyder/HypeDelta hype-assessment

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Assess which AI topics are overhyped, underhyped, or accurately assessed based on synthesized claims.
Signs of overhype:
Signs of underhype:
Signs of accurate assessment:
For each topic, assign a score from -1.0 to +1.0:
| Score | Meaning |
|-------|---------|
| +1.0 | Severely overhyped - massive gap between claims and reality |
| +0.5 | Moderately overhyped - lab enthusiasm outpaces evidence |
| +0.2 | Slightly overhyped |
| 0.0 | Accurately assessed |
| -0.2 | Slightly underhyped |
| -0.5 | Moderately underhyped - real progress being underrated |
| -1.0 | Severely underhyped - major developments being ignored |
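As a rough illustration of how the scale might be read back programmatically, here is a minimal Python sketch. The anchor values and labels come from the table; the function name `describe_score` and the nearest-anchor rule for in-between scores are assumptions of this sketch, not something the skill specifies.

```python
# Anchor points copied from the scoring table. How to treat scores that fall
# between anchors is NOT defined by the skill; nearest-anchor matching is an
# assumption made here for illustration.
SCALE = [
    (1.0, "severely overhyped"),
    (0.5, "moderately overhyped"),
    (0.2, "slightly overhyped"),
    (0.0, "accurately assessed"),
    (-0.2, "slightly underhyped"),
    (-0.5, "moderately underhyped"),
    (-1.0, "severely underhyped"),
]


def describe_score(score: float) -> str:
    """Return the label of the table anchor closest to `score` (hypothetical helper)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the -1.0 to +1.0 range")
    # min() keeps the first anchor on a tie, so ties resolve toward the
    # entry listed earlier in SCALE.
    _, label = min(SCALE, key=lambda pair: abs(pair[0] - score))
    return label
```

Under this rule a score of 0.6 reads as moderately overhyped, because 0.5 is the closest anchor. A different convention, such as fixed bands around zero, would be equally defensible.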
Return JSON:
{
  "overhypedTopics": [
    {
      "topic": "agents",
      "score": 0.6,
      "reasoning": "Lab enthusiasm for autonomous agents significantly exceeds demonstrated reliability. Multiple high-profile failures in production while claims of imminent AGI-like autonomy persist.",
      "keyEvidence": [
        "Devin and similar demos failed to replicate",
        "Production agent deployments have high failure rates",
        "Claims of 'replacing developers' haven't materialized"
      ]
    }
  ],
  "underhypedTopics": [
    {
      "topic": "interpretability",
      "score": -0.5,
      "reasoning": "Significant progress on mechanistic interpretability is being made at Anthropic and elsewhere, but mainstream coverage focuses on capabilities. Real tools for understanding models are emerging.",
      "keyEvidence": [
        "Golden Gate Claude demonstrated genuine steering",
        "Feature extraction becoming reproducible",
        "SAEs showing practical utility"
      ]
    }
  ],
  "accuratelyAssessedTopics": [
    {
      "topic": "multimodal",
      "score": 0.1,
      "reasoning": "Vision-language models have improved substantially and assessments largely reflect actual capabilities. Both enthusiasm and concerns are grounded.",
      "keyEvidence": [
        "GPT-4V and Claude vision work as advertised",
        "Known limitations acknowledged",
        "Incremental improvements match expectations"
      ]
    }
  ],
  "overallFieldSentiment": 0.72,
  "summary": "A paragraph summarizing the overall hype landscape..."
}
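A consumer of this output may want to sanity-check it before use. The following Python sketch checks the key names and ranges shown in the example above. The function name `validate_report` is hypothetical, and the sign checks (overhyped entries positive, underhyped entries negative) are an assumption inferred from the scoring table rather than a rule the skill states.

```python
# Key names are taken from the example JSON above.
TOPIC_LISTS = ("overhypedTopics", "underhypedTopics", "accuratelyAssessedTopics")
ENTRY_FIELDS = ("topic", "score", "reasoning", "keyEvidence")


def validate_report(report: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Every top-level key from the example should be present.
    for key in TOPIC_LISTS + ("overallFieldSentiment", "summary"):
        if key not in report:
            problems.append(f"missing key: {key}")

    for list_name in TOPIC_LISTS:
        for entry in report.get(list_name, []):
            name = entry.get("topic", "<unnamed>")
            for field in ENTRY_FIELDS:
                if field not in entry:
                    problems.append(f"{list_name}/{name}: missing {field}")

            score = entry.get("score")
            if not isinstance(score, (int, float)) or not -1.0 <= score <= 1.0:
                problems.append(f"{list_name}/{name}: score must be in [-1.0, 1.0]")
            # Assumption: the sign of the score should agree with the bucket.
            elif list_name == "overhypedTopics" and score <= 0:
                problems.append(f"{list_name}/{name}: expected a positive score")
            elif list_name == "underhypedTopics" and score >= 0:
                problems.append(f"{list_name}/{name}: expected a negative score")

    sentiment = report.get("overallFieldSentiment")
    if sentiment is not None and not (
        isinstance(sentiment, (int, float)) and 0.0 <= sentiment <= 1.0
    ):
        problems.append("overallFieldSentiment must be in [0.0, 1.0]")

    return problems
```

Typical use would be to parse the model's reply with `json.loads` and pass the resulting dict to `validate_report`, treating any returned problems as a reason to retry or flag the output.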
Calculate `overallFieldSentiment` as a weighted average of lab researcher bullishness across all topics, on a scale from 0.0 to 1.0.
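The weighted average described above can be sketched in Python as follows. The skill does not say what the weights are, so this sketch assumes one non-negative weight per topic (for example, the number of lab researcher claims on that topic); the function name and both parameter names are hypothetical.

```python
def overall_field_sentiment(
    bullishness: dict[str, float],
    weights: dict[str, float],
) -> float:
    """Weighted average of per-topic bullishness values in [0.0, 1.0].

    `bullishness` maps topic -> lab researcher bullishness (0.0 to 1.0).
    `weights` maps topic -> non-negative weight. The choice of weight is an
    assumption of this sketch, not something the skill defines.
    """
    for topic, value in bullishness.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"bullishness for {topic!r} must be in [0.0, 1.0]")

    total_weight = sum(weights[topic] for topic in bullishness)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")

    weighted_sum = sum(bullishness[topic] * weights[topic] for topic in bullishness)
    return weighted_sum / total_weight
```

With equal weights this reduces to a plain mean. Unequal weights let heavily discussed topics pull the overall figure toward their own bullishness.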
Interpretation:
Write a single paragraph summarizing:
Tone: Direct, opinionated but fair, grounded in evidence.