skills/tooluniverse-polygenic-risk-score/SKILL.md
Build and interpret polygenic risk scores (PRS) for complex diseases using GWAS summary statistics. Covers PRS construction (clumping/thresholding, PRS-CS), validation in independent cohorts, ancestry-aware adjustment, and clinical interpretation (population-relative risk, not absolute prediction). Use for PRS-based risk stratification.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-polygenic-risk-score
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.
A polygenic risk score predicts genetic risk, not disease. A high PRS means elevated risk relative to the population; it does not mean the person will develop the condition, and a low PRS does not confer immunity. PRS performance varies dramatically across ancestries: a European-derived PRS applied to a West African population can lose 50–70% of its predictive power because the underlying GWAS reflects European allele frequencies and linkage disequilibrium (LD) patterns. Effect sizes from discovery GWAS are subject to winner's curse (overestimation in single studies); always prefer weights from large meta-analyses or validated PGS Catalog models. PRS should always be interpreted in the context of non-genetic risk factors: for most complex diseases, environmental factors contribute as much as or more than genetics.
LOOK UP DON'T GUESS: Do not assume effect sizes, allele frequencies, or which SNPs are genome-wide significant for a trait — always query GWAS Catalog (gwas_get_associations_for_trait) for actual data. Do not assume a validated PRS model exists for a trait; check PGS Catalog via PubMed search.
Use Cases:
What This Skill Does:
What This Skill Does NOT Do:
A polygenic risk score is calculated as a weighted sum across genetic variants:
PRS = Σ (dosage_i × effect_size_i)
Where:
Raw PRS is standardized to z-scores for interpretation:
z-score = (PRS - population_mean) / population_std
This allows comparison to population distribution and percentile calculation.
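The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming dosages count effect alleles (0, 1, or 2) and effect sizes are per-allele betas or log odds ratios. The helpers `weighted_prs` and `standardize` are hypothetical names, not ToolUniverse functions, and the rsIDs, weights, and reference scores are made up.

```python
from statistics import mean, pstdev

def weighted_prs(dosages, weights):
    """Sum dosage_i * effect_size_i over variants present in both dicts."""
    shared = dosages.keys() & weights.keys()
    return sum(dosages[rsid] * weights[rsid] for rsid in shared)

def standardize(raw_score, reference_scores):
    """Convert a raw PRS to a z-score against a reference population."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores)
    return (raw_score - mu) / sigma

# Illustrative values only (not real GWAS weights or genotypes)
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}

raw = weighted_prs(dosages, weights)  # 2*0.10 + 1*(-0.05) + 0*0.20
reference = [0.05, 0.15, 0.25, 0.35]  # hypothetical reference-population scores
z = standardize(raw, reference)
print(f"raw={raw:.3f} z={z:.3f}")
```

In practice the reference mean and standard deviation should come from a cohort whose ancestry matches the individual being scored; a four-element list is used here only to keep the arithmetic easy to follow.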
This skill uses ToolUniverse GWAS tools to query:
GWAS Catalog (EMBL-EBI)
- gwas_search_associations (params: disease_trait, size; see also gwas_get_associations_for_trait)
- gwas_get_snps_for_gene (param: gene_symbol)
- dbsnp_get_variant_by_rsid
Note: a disease_trait search returns associations where the trait is one of potentially several linked EFO traits. For precise filtering, use EFO IDs via the efo_trait param.
Open Targets Genetics
- OpenTargets_search_gwas_studies_by_disease
- EnsemblVEP_annotate_hgvs (for variant consequence/frequency)
Variant Annotation
- gnomad_search_variants + gnomad_get_variant: population allele frequencies (ancestry-specific via VEP colocated_variants)
- MyVariant_query_variants: CADD, SIFT, PolyPhen, ClinVar, gnomAD in one call
- gnomad_get_gene_constraints: gene constraint metrics (pLI, oe_lof) for target prioritization
Polygenic risk scores aggregate the effects of many genetic variants to estimate an individual's genetic predisposition to a trait or disease. Unlike Mendelian diseases caused by single mutations, complex diseases involve hundreds to thousands of variants, each with small effects.
Key Properties:
GWAS compare allele frequencies between cases and controls (or correlate with trait values) across millions of SNPs to identify disease-associated variants.
Study Design:
Nearby variants are often inherited together (LD). To avoid double-counting:
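One common way to avoid double-counting is greedy LD clumping: keep the most significant variant, then discard any remaining variant that is correlated with a variant already kept. The sketch below illustrates the idea only. The `clump` helper, the r² threshold of 0.1, and the toy p-values and LD values are all assumptions, and real clumping tools also restrict comparisons to a physical distance window, which is omitted here.

```python
def clump(p_values, r2, r2_threshold=0.1):
    """Greedy clumping: keep index variants, drop variants in LD with them.

    p_values: dict of rsID -> association p-value
    r2: dict of frozenset({rsid_a, rsid_b}) -> pairwise r^2 (missing pairs count as 0)
    """
    kept = []
    # Visit variants from most to least significant
    for rsid in sorted(p_values, key=p_values.get):
        in_ld_with_kept = any(
            r2.get(frozenset((rsid, index)), 0.0) >= r2_threshold
            for index in kept
        )
        if not in_ld_with_kept:
            kept.append(rsid)
    return kept

# Toy inputs: rsB is in strong LD with rsA, rsC is independent
p_values = {"rsA": 1e-12, "rsB": 3e-9, "rsC": 2e-8}
ld = {frozenset(("rsA", "rsB")): 0.8}

index_snps = clump(p_values, ld)
print(index_snps)  # ['rsA', 'rsC']
```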
GWAS and PRS are most accurate when ancestries match:
PRS can stratify individuals for:
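Example: Khera et al. (2018) showed that a genome-wide PRS identifies 8% of the population at more than threefold increased risk of coronary artery disease, a prevalence about 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk.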
Consumer genetic testing (23andMe, AncestryDNA) provides raw genotypes. Users can:
Caution: Personal PRS should not replace medical advice. Results may cause anxiety if not properly contextualized.
Identify the disease or trait of interest:
Query GWAS databases for genome-wide significant associations:
prs = build_polygenic_risk_score(
    trait="coronary artery disease",
    p_threshold=5e-8,  # Genome-wide significance
    max_snps=1000,
)
Considerations:
Extract beta coefficients or odds ratios:
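Because the PRS is an additive sum, odds ratios are usually converted to the log scale before use, while betas from quantitative-trait GWAS are used as reported. A minimal sketch of that conversion follows; `to_additive_weight` is a hypothetical helper, not a ToolUniverse function.

```python
import math

def to_additive_weight(effect, is_odds_ratio):
    """Return a per-allele weight on the additive (log-odds or beta) scale."""
    if not is_odds_ratio:
        return effect  # betas are already additive
    if effect <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(effect)  # natural log of the odds ratio

# OR = 1.0 means no effect, so its weight is 0
print(to_additive_weight(1.0, is_odds_ratio=True))
print(round(to_additive_weight(1.2, is_odds_ratio=True), 4))
print(to_additive_weight(-0.03, is_odds_ratio=False))
```

An odds ratio below 1 becomes a negative weight, so protective alleles lower the score instead of raising it.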
Quality control filters:
Calculate weighted sum of genotype dosages:
result = calculate_personal_prs(
    prs_weights=prs,
    genotypes=my_genotypes,
    population_mean=0.0,
    population_std=1.0,
)
Genotype Sources:
Convert to percentiles and risk categories:
result = interpret_prs_percentile(result)
print(f"Percentile: {result.percentile:.1f}%")
print(f"Risk: {result.risk_category}")
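For intuition about the percentile step, a standardized PRS can be mapped to a percentile with the normal cumulative distribution function. This sketch assumes the score is approximately normally distributed in the reference population. It is not the implementation of `interpret_prs_percentile`, and `z_to_percentile` is a hypothetical name.

```python
from statistics import NormalDist

def z_to_percentile(z):
    """Percentile of a z-score under a standard normal reference distribution."""
    return NormalDist().cdf(z) * 100.0

for z in (-1.0, 0.0, 2.0):
    print(f"z={z:+.1f} -> percentile {z_to_percentile(z):.1f}")
```

A z-score of 0 sits at the 50th percentile, and a z-score of 2 falls just below the 98th. When the reference distribution is clearly non-normal, an empirical percentile computed from the reference cohort is the safer choice.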
Risk Categories:
Clinical Interpretation:
This skill is for educational and research purposes only.
For clinical genetic testing, consult:
PRS is a rapidly evolving field. Guidelines and best practices will continue to change as research progresses.
Regulatory Status: