skills/05-kthorn-research-superpower/research/evaluating-paper-relevance/SKILL.md
<!--
This file is the original documentation of an open-source Skill, collected for study and research reference only.
Collected and organized by CoPaper.AI | https://copaper.ai
Source repository: https://github.com/kthorn/research-superpower
Project name: research-superpower
License: MIT License
Date collected: 2026-04-02
Notice: Copyright of this file belongs to the original author. It is collected here to give empirical social-science researchers a centralized reference for AI Agent Skills. If it infringes any rights, please contact us for removal.
-->
---
name: Evaluating Paper Relevance
description: Two-
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research skills/05-kthorn-research-superpower/research/evaluating-paper-relevance
Two-stage screening process: quick abstract scoring followed by deep dive into promising papers.
Core principle: Precision over breadth. Find papers that actually contain the specific data/methods user needs, not just topically related papers.
Use this skill when:
Small searches (<50 papers):
Large searches (50-150 papers):
Very large searches (>150 papers):
Goal: Quickly identify promising papers
Score 0-10 based on:
Decision rules:
IMPORTANT: Report to user for EVERY paper:
📄 [N/Total] Screening: "Paper Title"
Abstract score: 8 → Fetching full text...
or
📄 [N/Total] Screening: "Paper Title"
Abstract score: 4 → Skipping (insufficient relevance)
Never screen silently - user needs to see progress happening
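The per-paper report above can be produced by a small formatter. This is a minimal sketch, not part of the original skill: the function name `screening_line` and its parameters are hypothetical, and the default threshold of 7 is taken from the "score ≥7" rule used for relevant papers elsewhere in this document.

```python
def screening_line(index: int, total: int, title: str, score: int,
                   threshold: int = 7) -> str:
    """Format the two-line progress report for one screened paper.

    Papers at or above the threshold go on to full-text fetching;
    everything else is reported as skipped.
    """
    if score >= threshold:
        outcome = "Fetching full text..."
    else:
        outcome = "Skipping (insufficient relevance)"
    return (f'📄 [{index}/{total}] Screening: "{title}"\n'
            f"Abstract score: {score} → {outcome}")
```

Printing the returned string for every paper, including skipped ones, keeps the screening visible to the user.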
Goal: Extract specific data/methods from promising papers
If paper describes medicinal chemistry / SAR data:
Use skills/research/checking-chembl to check if paper is in ChEMBL database:
curl -s "https://www.ebi.ac.uk/chembl/api/data/document.json?doi=$doi"
If found in ChEMBL:
Continue to full text fetch for context, methods, discussion.
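The ChEMBL lookup can also be scripted. The sketch below uses only the standard library and the endpoint from the curl command above. The response shape it relies on (a top-level `documents` list whose items carry `document_chembl_id`) is an assumption about the ChEMBL API and should be checked against a live response; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CHEMBL_DOCUMENT_URL = "https://www.ebi.ac.uk/chembl/api/data/document.json"


def chembl_lookup_url(doi: str) -> str:
    """Build the document lookup URL, percent-encoding the DOI."""
    return f"{CHEMBL_DOCUMENT_URL}?{urlencode({'doi': doi})}"


def chembl_ids(payload: dict) -> list[str]:
    """Extract document IDs from a decoded response.

    Assumed shape: {"documents": [{"document_chembl_id": "CHEMBL..."}, ...]}
    """
    return [doc["document_chembl_id"]
            for doc in payload.get("documents", [])
            if doc.get("document_chembl_id")]


def fetch_chembl_ids(doi: str, timeout: float = 30.0) -> list[str]:
    """Query ChEMBL for a DOI; an empty list means the paper was not found."""
    with urlopen(chembl_lookup_url(doi), timeout=timeout) as response:
        return chembl_ids(json.load(response))
```

An empty result simply means the paper is handled through the normal full-text path.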
Try in order:
A. PubMed Central (free full text):
# Check if available in PMC
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=PMID[PMID]&retmode=json"
# If found, fetch full text XML via API
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMCID&rettype=full&retmode=xml"
# Or fetch HTML directly (note: use pmc.ncbi.nlm.nih.gov, not www.ncbi.nlm.nih.gov/pmc)
curl "https://pmc.ncbi.nlm.nih.gov/articles/PMCID/"
B. DOI resolution:
# Try publisher link
curl -L "https://doi.org/10.1234/example.2023"
# May hit paywall - check response
C. Unpaywall (MANDATORY if paywalled): CRITICAL: If step B hits a paywall, you MUST immediately try Unpaywall before giving up.
Use skills/research/finding-open-access-papers to find free OA version:
curl "https://api.unpaywall.org/v2/DOI?email=USER_EMAIL"
# Often finds versions in repositories, preprint servers, author copies
# IMPORTANT: Ask user for their email if not already provided - do NOT use [email protected]
Report to user:
⚠️ Paper behind paywall, checking Unpaywall...
✓ Found open access version at [repository/preprint server]
or
⚠️ Paper behind paywall, checking Unpaywall...
✗ No open access version available - continuing with abstract only
D. Preprints (direct):
https://www.biorxiv.org/content/10.1101/{doi}
If full text unavailable AFTER trying Unpaywall:
CRITICAL: Do NOT skip Unpaywall check. Many paywalled papers have free versions in repositories.
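The Unpaywall step can be sketched as two small helpers: one builds the request URL shown in step C, the other picks a usable link from the decoded JSON. The field names `is_oa`, `best_oa_location`, `url_for_pdf` and `url` reflect my understanding of the Unpaywall v2 response and are an assumption to verify; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 lookup URL.

    The email must be the user's real address, supplied by the user.
    """
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?email={quote(email)}"


def best_oa_link(record: dict) -> Optional[str]:
    """Return the best open-access link from a decoded Unpaywall record.

    Prefers a direct PDF link, falls back to the landing page, and
    returns None when no open-access copy is listed.
    """
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")
```

A `None` result corresponds to the "No open access version available" report, after which the paper continues with abstract only.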
Focus on sections:
What to look for (adapt to research domain):
Use grep/text search (adapt search terms):
# Examples for different domains
grep -i "IC50\|Ki\|MIC" paper.xml # Medicinal chemistry
grep -i "expression\|FPKM\|RNA-seq" paper.xml # Genomics
grep -i "abundance\|population\|sampling" paper.xml # Ecology
grep -i "algorithm\|github\|code" paper.xml # Computational
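Inside a helper script, the same search can be done with Python's `re` module. This is a minimal sketch with hypothetical names. Like `grep -i`, it matches substrings by default, so a short term such as "Ki" also hits "kinase"; the optional `whole_word` flag is an addition of this sketch to narrow that.

```python
import re


def find_matches(text: str, pattern: str,
                 whole_word: bool = False) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose line matches the pattern.

    Matching is case-insensitive. With whole_word=True the alternatives
    are wrapped in word boundaries.
    """
    if whole_word:
        pattern = rf"\b(?:{pattern})\b"
    regex = re.compile(pattern, re.IGNORECASE)
    return [(number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)]
```

Note that Python uses `|` for alternation where basic grep uses `\|`, so the medicinal chemistry pattern becomes `r"IC50|Ki|MIC"`.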
Create structured extraction (adapt to research domain):
Example 1: Medicinal chemistry
{
"doi": "10.1234/medchem.2023",
"title": "Novel kinase inhibitors...",
"relevance_score": 9,
"findings": {
"data_found": [
"IC50 values for compounds 1-12 (Table 2)",
"Selectivity data (Figure 3)",
"Synthesis route (Scheme 1)"
],
"key_results": [
"Compound 7: IC50 = 12 nM",
"10-step synthesis, 34% yield"
]
}
}
Example 2: Genomics
{
"doi": "10.1234/genomics.2023",
"title": "Gene expression in disease...",
"relevance_score": 8,
"findings": {
"data_found": [
"RNA-seq data for 50 samples (GEO: GSE12345)",
"Differential expression results (Table 1)",
"Gene set enrichment analysis (Figure 4)"
],
"key_results": [
"123 genes upregulated (FDR < 0.05)",
"Pathway enrichment: immune response"
]
}
}
Example 3: Computational methods
{
"doi": "10.1234/compbio.2023",
"title": "Novel alignment algorithm...",
"relevance_score": 9,
"findings": {
"data_found": [
"Algorithm pseudocode (Methods)",
"Code repository (github.com/user/tool)",
"Benchmark results (Table 2)"
],
"key_results": [
"10x faster than BLAST",
"98% accuracy on test dataset"
]
}
}
PDFs:
# If PDF available
curl -L -o "papers/$(echo $doi | tr '/' '_').pdf" "https://doi.org/$doi"
Supplementary data:
# Download SI files if URLs found
curl -L -o "papers/$(echo "$doi" | tr '/' '_')_supp.zip" "https://publisher.com/supp/file.zip"
CRITICAL: Use ONLY papers-reviewed.json and SUMMARY.md. Do NOT create custom tracking files.
CRITICAL: Add EVERY paper to papers-reviewed.json, regardless of score. This prevents re-reviewing papers and tracks complete search history.
Add to papers-reviewed.json:
For relevant papers (score ≥7):
{
"10.1234/example.2023": {
"pmid": "12345678",
"status": "relevant",
"score": 9,
"source": "pubmed_search",
"timestamp": "2025-10-11T10:30:00Z",
"found_data": ["IC50 values", "synthesis methods"],
"has_full_text": true,
"chembl_id": "CHEMBL1234567"
}
}
For not-relevant papers (score <7):
{
"10.1234/another.2023": {
"pmid": "12345679",
"status": "not_relevant",
"score": 4,
"source": "pubmed_search",
"timestamp": "2025-10-11T10:31:00Z",
"reason": "no activity data, review paper"
}
}
Always add papers even if skipped - this prevents re-processing and documents what was already checked.
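The tracking rule can be sketched as a pair of helpers that read and update papers-reviewed.json. The function names and the `threshold` parameter are hypothetical; the entry fields mirror the two JSON examples above, and any extra fields (such as `found_data` or `reason`) are passed through as keyword arguments.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def load_reviewed(tracking_file: str) -> dict:
    """Load the tracking file, or return an empty mapping if it is missing."""
    path = Path(tracking_file)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def already_reviewed(tracking_file: str, doi: str) -> bool:
    """Check whether a DOI has been screened before."""
    return doi in load_reviewed(tracking_file)


def record_paper(tracking_file: str, doi: str, score: int, source: str, *,
                 pmid: Optional[str] = None, threshold: int = 7,
                 **extra) -> dict:
    """Add or overwrite one paper's entry, whatever its score."""
    reviewed = load_reviewed(tracking_file)
    entry = {
        "status": "relevant" if score >= threshold else "not_relevant",
        "score": score,
        "source": source,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        **extra,
    }
    if pmid is not None:
        entry["pmid"] = pmid
    reviewed[doi] = entry
    Path(tracking_file).write_text(
        json.dumps(reviewed, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8")
    return entry
```

Calling `already_reviewed` before scoring and `record_paper` after it covers both halves of the rule: no paper is screened twice, and skipped papers are still documented.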
Add to SUMMARY.md (examples for different domains):
Medicinal chemistry example:
### [Novel kinase inhibitors with improved selectivity](https://doi.org/10.1234/medchem.2023) (Score: 9)
**DOI:** [10.1234/medchem.2023](https://doi.org/10.1234/medchem.2023)
**PMID:** [12345678](https://pubmed.ncbi.nlm.nih.gov/12345678/)
**ChEMBL:** [CHEMBL1234567](https://www.ebi.ac.uk/chembl/document_report_card/CHEMBL1234567/)
**Key Findings:**
- IC50 values for 12 inhibitors (Table 2)
- Compound 7: IC50 = 12 nM, >80-fold selectivity
- Synthesis route (Scheme 1, page 4)
**Files:** PDF, supplementary data
Genomics example:
### [Transcriptomic analysis of disease progression](https://doi.org/10.1234/genomics.2023) (Score: 8)
**DOI:** [10.1234/genomics.2023](https://doi.org/10.1234/genomics.2023)
**PMID:** [23456789](https://pubmed.ncbi.nlm.nih.gov/23456789/)
**Data:** [GEO: GSE12345](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12345)
**Key Findings:**
- RNA-seq data: 50 samples, 3 conditions
- 123 differentially expressed genes (FDR < 0.05)
- Immune pathway enrichment (Figure 3)
**Files:** PDF, supplementary tables with gene lists
Computational methods example:
### [Fast sequence alignment with novel algorithm](https://doi.org/10.1234/compbio.2023) (Score: 9)
**DOI:** [10.1234/compbio.2023](https://doi.org/10.1234/compbio.2023)
**Code:** [github.com/user/tool](https://github.com/user/tool)
**Key Findings:**
- New alignment algorithm (pseudocode in Methods)
- 10x faster than BLAST, 98% accuracy
- Benchmark datasets available
**Files:** PDF, code repository linked
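The entry layout shown in these examples can be generated rather than typed by hand, which also guarantees that DOIs and PMIDs come out as links. A minimal sketch with hypothetical function names; domain-specific lines such as ChEMBL, GEO or code links are left out and would need to be added per domain.

```python
from typing import Optional


def doi_link(doi: str) -> str:
    """Render a DOI as a clickable markdown link."""
    return f"[{doi}](https://doi.org/{doi})"


def pmid_link(pmid: str) -> str:
    """Render a PMID as a clickable markdown link to PubMed."""
    return f"[{pmid}](https://pubmed.ncbi.nlm.nih.gov/{pmid}/)"


def summary_entry(title: str, doi: str, score: int, findings: list[str],
                  pmid: Optional[str] = None,
                  files: Optional[str] = None) -> str:
    """Build one SUMMARY.md entry in the layout used by the examples."""
    lines = [
        f"### [{title}](https://doi.org/{doi}) (Score: {score})",
        f"**DOI:** {doi_link(doi)}",
    ]
    if pmid:
        lines.append(f"**PMID:** {pmid_link(pmid)}")
    lines.append("**Key Findings:**")
    lines.extend(f"- {finding}" for finding in findings)
    if files:
        lines.append(f"**Files:** {files}")
    return "\n".join(lines) + "\n"
```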
IMPORTANT: Always make DOIs and PMIDs clickable links:
[10.1234/example.2023](https://doi.org/10.1234/example.2023)
[12345678](https://pubmed.ncbi.nlm.nih.gov/12345678/)
CRITICAL: Report to user as you work - never work silently!
For every paper, report:
📄 [N/Total] Screening: "Title..."
Abstract score: X/10
For relevant papers, report findings immediately (adapt to domain):
Medicinal chemistry example:
📄 [15/127] Screening: "Selective BTK inhibitors..."
Abstract score: 8 → Fetching full text...
✓ Found IC50 data for 8 compounds (Table 2)
✓ Selectivity data vs 50 kinases (Figure 3)
→ Added to SUMMARY.md
Genomics example:
📄 [23/89] Screening: "Gene expression in liver disease..."
Abstract score: 9 → Fetching full text...
✓ RNA-seq data available (GEO: GSE12345)
✓ 123 DEGs identified (Table 1, FDR < 0.05)
→ Added to SUMMARY.md
Computational methods example:
📄 [7/45] Screening: "Novel phylogenetic algorithm..."
Abstract score: 8 → Fetching full text...
✓ Code available (github.com/user/tool)
✓ Benchmark results (10x faster, Table 2)
→ Added to SUMMARY.md
Update user every 5-10 papers with summary:
📊 Progress: Reviewed 30/127 papers
- Highly relevant: 3
- Relevant: 5
- Currently screening paper 31...
Why this matters: User needs to see work happening and provide feedback/corrections early
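The periodic summary can be derived from the scores collected so far. A minimal sketch with a hypothetical function name; it assumes the band boundaries used in this document (9-10 highly relevant, 7-8 relevant).

```python
def progress_summary(scores: list[int], total: int) -> str:
    """Summarize screening progress from the scores assigned so far."""
    reviewed = len(scores)
    highly_relevant = sum(1 for score in scores if score >= 9)
    relevant = sum(1 for score in scores if 7 <= score <= 8)
    lines = [
        f"📊 Progress: Reviewed {reviewed}/{total} papers",
        f"- Highly relevant: {highly_relevant}",
        f"- Relevant: {relevant}",
    ]
    if reviewed < total:
        lines.append(f"- Currently screening paper {reviewed + 1}...")
    return "\n".join(lines)
```

Emitting it whenever `len(scores)` reaches a multiple of 5 or 10 matches the suggested cadence.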
For medicinal chemistry papers:
skills/research/checking-chembl to find curated SAR data
During full text fetching:
skills/research/finding-open-access-papers (Unpaywall)
After finding relevant paper:
| Score | Meaning | Action |
|-------|---------|--------|
| 0-4 | Not relevant | Skip, brief note in summary |
| 5-6 | Possibly relevant | Note for later, skip deep dive for now |
| 7-8 | Relevant | Deep dive, extract data, add to summary |
| 9-10 | Highly relevant | Deep dive, extract data, follow citations, highlight in summary |
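In a script, the score bands reduce to a simple lookup. The sketch below uses a hypothetical function name; the band boundaries and action text are copied from the score table.

```python
def action_for_score(score: int) -> tuple[str, str]:
    """Map a 0-10 abstract score to its (meaning, action) pair."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score <= 4:
        return ("Not relevant", "Skip, brief note in summary")
    if score <= 6:
        return ("Possibly relevant", "Note for later, skip deep dive for now")
    if score <= 8:
        return ("Relevant", "Deep dive, extract data, add to summary")
    return ("Highly relevant",
            "Deep dive, extract data, follow citations, highlight in summary")
```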
When screening many papers (>20), consider creating a helper script:
Benefits:
Create in research session folder:
# research-sessions/YYYY-MM-DD-query/screen_papers.py
Key components:
For large-scale screening, use two-script pattern:
Script 1: Abstract Screening (screen_papers.py)
evaluated-papers.json with basic metadata
Script 2: Deep Dive (deep_dive_papers.py)
Benefits:
Script design:
When NOT to create helper script:
Not tracking all papers: Only adding relevant papers to papers-reviewed.json → Add EVERY paper regardless of score to prevent re-review
Skipping Unpaywall: Hitting paywall and giving up → ALWAYS check Unpaywall first, many papers have free versions
Creating unnecessary files for small searches: For <50 papers, use ONLY papers-reviewed.json and SUMMARY.md. For large searches (>100 papers), structured evaluated-papers.json and auxiliary files (README.md, TOP_PRIORITY_PAPERS.md) add significant value and should be used.
Too strict: Skipping papers that mention data indirectly → Re-read abstract carefully
Too lenient: Deep diving into tangentially related papers → Focus on specific data user needs
Missing supplementary data: Many papers hide key data in SI → Always check for supplementary files
Silent screening: User can't see progress → Report EVERY paper as you screen it
No periodic summaries: User loses big picture → Update every 5-10 papers
Non-clickable DOIs/PMIDs: Plain text identifiers → Always use markdown links
Re-reviewing papers: Wastes time → Always check papers-reviewed.json first
Not using helper scripts: Manually screening 100+ papers → Consider batch script
| Task | Action |
|------|--------|
| Check if reviewed | Look up DOI in papers-reviewed.json |
| Score abstract | Keywords (0-3) + Data type (0-4) + Specificity (0-3) |
| Get full text | Try PMC → DOI → Unpaywall → Preprints |
| Find data | Grep for terms, focus on Methods/Results/Tables |
| Download PDF | curl -L -o papers/FILE.pdf URL |
| Update tracking | Add to papers-reviewed.json + SUMMARY.md |
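The abstract-scoring rubric in the table (Keywords 0-3, Data type 0-4, Specificity 0-3) can be captured in a small value type that rejects out-of-range components. The class and field names are hypothetical. How each component is judged is left to the reviewer; this sketch only enforces the ranges and sums them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractScore:
    """Three-part abstract score; the parts sum to a 0-10 total."""
    keywords: int      # 0-3
    data_type: int     # 0-4
    specificity: int   # 0-3

    def __post_init__(self) -> None:
        limits = {"keywords": 3, "data_type": 4, "specificity": 3}
        for name, maximum in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= maximum:
                raise ValueError(f"{name} must be 0-{maximum}, got {value}")

    @property
    def total(self) -> int:
        """Overall abstract score on the 0-10 scale."""
        return self.keywords + self.data_type + self.specificity
```

For example, `AbstractScore(keywords=2, data_type=3, specificity=2).total` gives 7, which is enough to trigger a deep dive.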
After evaluating paper:
skills/research/traversing-citations
Use this structure for research projects with 100+ papers:
Project Overview
Quick Start Guide
File Inventory
Key Findings Summary
Methodology
Next Steps
For datasets with >50 relevant papers, create curated priority list:
Example structure:
# Top Priority Papers
## Tier 1: Must-Read (Score 10)
### [Paper Title](https://doi.org/10.xxxx/yyyy) (Score: 10)
**DOI:** [10.xxxx/yyyy](https://doi.org/10.xxxx/yyyy)
**PMID:** [12345678](https://pubmed.ncbi.nlm.nih.gov/12345678/)
**Full text:** ✓ PMC12345678
**Key Findings:**
- Finding 1
- Finding 2
---
## Tier 2: High-Value (Score 8-9)
[Additional papers organized by priority...]