plugin/skills/tooluniverse-polygenic-risk-score/SKILL.md
Build and interpret polygenic risk scores (PRS) for complex diseases using GWAS summary statistics. Covers PRS construction (clumping/thresholding, PRS-CS), validation in independent cohorts, ancestry-aware adjustment, and clinical interpretation (population-relative risk, not absolute prediction). Use for PRS-based risk stratification.
Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.
A polygenic risk score predicts genetic risk, not disease. A high PRS means elevated risk relative to the population; it does not mean the person will develop the condition, and a low PRS does not confer immunity.
PRS performance varies dramatically across ancestries: a European-derived PRS applied to a West African population can lose 50–70% of its predictive power because the underlying GWAS was trained on European allele frequencies and LD patterns.
Effect sizes from discovery GWAS are subject to winner's curse (overestimation in single studies); always prefer weights from large meta-analyses or validated PGS Catalog models.
PRS should always be interpreted in the context of non-genetic risk factors: for most complex diseases, environmental factors contribute as much as or more than genetics.
LOOK UP DON'T GUESS: Do not assume effect sizes, allele frequencies, or which SNPs are genome-wide significant for a trait — always query GWAS Catalog (gwas_get_associations_for_trait) for actual data. Do not assume a validated PRS model exists for a trait; check PGS Catalog via PubMed search.
Use Cases:
What This Skill Does:
What This Skill Does NOT Do:
A polygenic risk score is calculated as a weighted sum across genetic variants:
PRS = Σ (dosage_i × effect_size_i)
Where:
Raw PRS is standardized to z-scores for interpretation:
z-score = (PRS - population_mean) / population_std
This allows comparison to population distribution and percentile calculation.
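The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the helper names (`raw_prs`, `prs_z_score`, `z_to_percentile`), the variant IDs, and all numeric values are made up for the example, and the percentile step assumes the PRS is approximately normally distributed in the reference population.

```python
from statistics import NormalDist


def raw_prs(dosages, weights):
    """Weighted sum of effect-allele dosages over variants present in both dicts."""
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


def prs_z_score(prs, population_mean, population_std):
    """Standardize a raw PRS against a reference population."""
    if population_std <= 0:
        raise ValueError("population_std must be positive")
    return (prs - population_mean) / population_std


def z_to_percentile(z):
    """Percentile of a z-score, assuming a normal PRS distribution."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical weights (betas) and dosages (0, 1, or 2 copies of the effect allele).
weights = {"snpA": 0.10, "snpB": -0.05, "snpC": 0.20}
dosages = {"snpA": 2, "snpB": 1, "snpC": 0}

score = raw_prs(dosages, weights)
z = prs_z_score(score, population_mean=0.05, population_std=0.10)
pct = z_to_percentile(z)
print(f"raw={score:.3f} z={z:.2f} percentile={pct:.1f}")
```

Variants missing from either input are skipped here; a real pipeline would also report how many weighted variants were actually scored, since low overlap makes the z-score unreliable.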
This skill uses ToolUniverse GWAS tools to query:
GWAS Catalog (EMBL-EBI)
gwas_search_associations (param: disease_trait, size; also gwas_get_associations_for_trait), gwas_get_snps_for_gene (param: gene_symbol), dbsnp_get_variant_by_rsid
disease_trait search returns associations where the trait is one of potentially several linked EFO traits. For precise filtering, use EFO IDs via the efo_trait param.
Open Targets Genetics
OpenTargets_search_gwas_studies_by_disease, EnsemblVEP_annotate_hgvs (for variant consequence/frequency)
Variant Annotation
gnomad_search_variants + gnomad_get_variant: population allele frequencies (ancestry-specific via VEP colocated_variants)
MyVariant_query_variants: CADD, SIFT, PolyPhen, ClinVar, gnomAD in one call
gnomad_get_gene_constraints: gene constraint metrics (pLI, oe_lof) for target prioritization
Polygenic risk scores aggregate the effects of many genetic variants to estimate an individual's genetic predisposition to a trait or disease. Unlike Mendelian diseases caused by single mutations, complex diseases involve hundreds to thousands of variants, each with small effects.
Key Properties:
GWAS compare allele frequencies between cases and controls (or correlate with trait values) across millions of SNPs to identify disease-associated variants.
Study Design:
Nearby variants are often inherited together (LD). To avoid double-counting:
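One common way to avoid double-counting is LD clumping: keep the most significant variant, drop anything in strong LD with it, and repeat. The sketch below is a simplified greedy version under stated assumptions: the function name `clump`, the `r2` lookup table, the 0.1 threshold, and the variant IDs are all hypothetical, and production tools also restrict comparisons to a physical distance window.

```python
def clump(variants, r2, r2_threshold=0.1):
    """Greedy LD clumping.

    variants: list of (variant_id, p_value) tuples.
    r2: dict mapping frozenset({id_a, id_b}) -> pairwise r^2 (missing pairs count as 0).
    Returns the IDs of retained index variants, most significant first.
    """
    kept = []
    for variant_id, _p in sorted(variants, key=lambda v: v[1]):
        in_ld = any(
            r2.get(frozenset((variant_id, k)), 0.0) >= r2_threshold for k in kept
        )
        if not in_ld:
            kept.append(variant_id)
    return kept


# Made-up example: snpB is in strong LD with the lead variant snpA.
variants = [("snpA", 1e-12), ("snpB", 3e-9), ("snpC", 2e-10)]
r2 = {
    frozenset(("snpA", "snpB")): 0.80,
    frozenset(("snpA", "snpC")): 0.02,
}
kept = clump(variants, r2)
print(kept)
```

The LD matrix should come from a reference panel matching the target ancestry, for the same reason the GWAS itself should.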
GWAS and PRS are most accurate when ancestries match:
PRS can stratify individuals for:
Example: Khera et al. (2018) reported that a coronary artery disease PRS places about 8% of the population at ≥3-fold risk, roughly 20× as many people as carry monogenic mutations conferring comparable risk.
Consumer genetic testing (23andMe, AncestryDNA) provides raw genotypes. Users can:
Caution: Personal PRS should not replace medical advice. Results may cause anxiety if not properly contextualized.
Identify the disease or trait of interest:
Query GWAS databases for genome-wide significant associations:
prs = build_polygenic_risk_score(
    trait="coronary artery disease",
    p_threshold=5e-8,  # Genome-wide significance
    max_snps=1000,
)
Considerations:
Extract beta coefficients or odds ratios:
Quality control filters:
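As an illustration of weight extraction and filtering, the sketch below converts odds ratios to log-odds betas (beta = ln(OR)) and applies three commonly used filters: genome-wide significance, a minimum effect-allele frequency, and removal of strand-ambiguous A/T and C/G variants. The record fields, the function names, and the 0.01 frequency cutoff are assumptions made for this example, not values taken from the skill or from GWAS Catalog responses.

```python
import math

# Strand-ambiguous allele pairs: the complement of A/T is T/A, and of C/G is G/C.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def to_beta(effect, kind):
    """Return a log-odds beta from either a beta or an odds ratio."""
    if kind == "beta":
        return float(effect)
    if kind == "or":
        if effect <= 0:
            raise ValueError("odds ratio must be positive")
        return math.log(effect)
    raise ValueError(f"unknown effect kind: {kind}")


def passes_qc(rec, p_threshold=5e-8, min_eaf=0.01):
    """Apply significance, allele-frequency, and strand-ambiguity filters."""
    alleles = frozenset((rec["effect_allele"], rec["other_allele"]))
    return (
        rec["p"] < p_threshold
        and min_eaf <= rec["eaf"] <= 1 - min_eaf
        and alleles not in AMBIGUOUS
    )


# Made-up records for illustration only.
records = [
    {"id": "snp1", "effect": 1.20, "kind": "or", "p": 1e-10, "eaf": 0.30,
     "effect_allele": "A", "other_allele": "G"},
    {"id": "snp2", "effect": 0.05, "kind": "beta", "p": 1e-9, "eaf": 0.40,
     "effect_allele": "A", "other_allele": "T"},   # strand-ambiguous
    {"id": "snp3", "effect": 1.50, "kind": "or", "p": 1e-3, "eaf": 0.20,
     "effect_allele": "C", "other_allele": "T"},   # not genome-wide significant
]
weights = {r["id"]: to_beta(r["effect"], r["kind"]) for r in records if passes_qc(r)}
print(weights)
```

Mixing betas and odds ratios without converting them to a common scale is a frequent source of silently wrong scores, which is why the conversion is explicit here.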
Calculate weighted sum of genotype dosages:
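result = calculate_personal_prs(
    prs_weights=prs,
    genotypes=my_genotypes,
    population_mean=0.0,  # placeholder; use the mean raw PRS of a matched reference population
    population_std=1.0,   # placeholder; use that population's standard deviation
)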
Genotype Sources:
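For consumer raw-data exports, a rough sketch of turning genotype calls into effect-allele dosages might look like the following. It assumes the four-column, tab-separated layout used by 23andMe raw files (rsid, chromosome, position, genotype, with `#` comment lines and `--` for no-calls); other vendors use different layouts. The function names and the example variant IDs are hypothetical, and the sketch assumes the effect alleles are reported on the same strand as the genotype file.

```python
def parse_raw_genotypes(lines):
    """Parse 4-column tab-separated genotype lines into {rsid: call}."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        rsid, _chrom, _pos, call = fields[:4]
        if "-" in call:
            continue  # skip no-calls such as "--"
        genotypes[rsid] = call.upper()
    return genotypes


def effect_allele_dosage(call, effect_allele):
    """Count copies of the effect allele in a genotype call such as 'AG'."""
    return call.count(effect_allele.upper())


# Made-up file content and effect alleles for illustration.
raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rsX1\t1\t1000\tAG",
    "rsX2\t2\t2000\t--",
    "rsX3\t3\t3000\tTT",
]
effect_alleles = {"rsX1": "G", "rsX2": "C", "rsX3": "T"}

genotypes = parse_raw_genotypes(raw)
dosages = {
    rsid: effect_allele_dosage(genotypes[rsid], allele)
    for rsid, allele in effect_alleles.items()
    if rsid in genotypes
}
print(dosages)
```

Array-based exports cover only a fraction of the variants in most PRS models, so many weighted variants will be missing unless the genotypes are imputed first.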
Convert to percentiles and risk categories:
result = interpret_prs_percentile(result)
print(f"Percentile: {result.percentile:.1f}%")
print(f"Risk: {result.risk_category}")
Risk Categories:
Clinical Interpretation:
This skill is for educational and research purposes only.
For clinical genetic testing, consult:
PRS is a rapidly evolving field. Guidelines and best practices will continue to change as research progresses.
Regulatory Status: