plugin/skills/tooluniverse-variant-interpretation/SKILL.md
Clinical variant interpretation from raw variant calls to ACMG-classified recommendations with structural impact analysis. Use for VUS classification, pathogenicity assessment with cited criteria, structure-based variant impact (AlphaFold/PDB), and producing clinical-grade variant reports for return of results or molecular tumor boards.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-variant-interpretationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematic variant interpretation using ToolUniverse - from raw variant calls to ACMG-classified clinical recommendations with structural impact analysis.
Use this skill when users:
When asked about a variant's significance, query ClinVar/gnomAD/CIViC FIRST. Never classify a variant without checking databases. When you're not sure about a fact, your first instinct should be to SEARCH for it using tools, not to reason harder from memory.
Phase 1: VARIANT IDENTITY → Normalize HGVS, map gene/transcript/consequence
Phase 2: CLINICAL DATABASES → ClinVar, gnomAD, OMIM, ClinGen, COSMIC, SpliceAI
Phase 2.5: REGULATORY CONTEXT → ChIPAtlas, ENCODE (non-coding variants only)
Phase 3: COMPUTATIONAL PREDICTIONS → CADD, AlphaMissense, EVE, SIFT/PolyPhen
Phase 4: STRUCTURAL ANALYSIS → PDB/AlphaFold2, domains, functional sites (VUS/novel)
Phase 4.5: EXPRESSION CONTEXT → CELLxGENE, GTEx tissue expression
Phase 5: LITERATURE EVIDENCE → PubMed, EuropePMC, BioRxiv, MedRxiv
Phase 6: ACMG CLASSIFICATION → Evidence codes, classification, recommendations
Tools: MyVariant_query_variants, EnsemblVar_get_variant_consequences, NCBIGene_search, VariantValidator_gene2transcripts, VariantValidator_validate_variant, Tark_get_mane_transcripts, Tark_get_transcript
VariantValidator_gene2transcripts: Look up MANE Select and MANE Plus Clinical transcripts for a gene. Use this to identify the correct canonical transcript before variant annotation.
gene_symbol (e.g. "TP53"), transcript_set ("mane" | "refseq" | "ensembl" | "all"), genome_build ("GRCh38" default){current_symbol, transcripts: [{reference, annotations: {mane_select, mane_plus_clinical}}]}gene and gene_name also accepted for gene_symbolTark_get_mane_transcripts: Lightweight MANE Select / MANE Plus Clinical lookup from Ensembl Tark with the ENST↔RefSeq pairing (e.g. gene="BRCA2" → ENST00000380152.8 / NM_000059.4). Use as a quick cross-check of the canonical transcript namespace alongside VariantValidator, or to translate an ENST↔NM accession. Tark_get_transcript (param stable_id, e.g. "ENST00000380152") returns the archived transcript record (assembly, biotype, coordinates, per-release versions) when you need to resolve a specific transcript version.
VariantValidator_validate_variant: Validate HGVS variant descriptions and get normalized notation with genomic/transcript/protein consequences.
genome_build ("GRCh37" | "GRCh38"), variant_description (HGVS, e.g. "NM_007294.4:c.5266dup"), select_transcripts (transcript or "all")Capture: HGVS notation (c. and p.), gene symbol, canonical transcript (MANE Select via VariantValidator), consequence type, amino acid change, exon/intron location.
Tools: ClinVar_search_variants, gnomad_search_variants, gnomad_get_variant, OMIM_search, OMIM_get_entry, ClinGen_search_gene_validity, ClinGen_search_dosage_sensitivity, ClinGen_search_actionability, COSMIC_search_mutations, COSMIC_get_mutations_by_gene, DisGeNET_search_gene, DisGeNET_get_vda, SpliceAI_predict_splice, SpliceAI_get_max_delta, civic_get_variants_by_gene, civic_search_evidence_items, civic_search_assertions
gnomAD two-step workflow:
gnomad_search_variantsonly accepts rsIDs or variant IDs (not gene names). Search by rsID first, then use the returnedvariant_idwithgnomad_get_variantto get population allele frequencies.CIViC: Use
civic_search_genes(query="<gene_symbol>")to find the CIViC gene ID dynamically (do NOT rely on a hardcoded lookup table). Then usecivic_get_variants_by_gene(gene_id=<id>)andcivic_search_evidence_itemsfor actionability details. Ifcivic_search_genesreturns no results, the gene may not be curated in CIViC — note this gap.OncoKB note: Demo mode only supports BRAF, TP53, ROS1. For other genes, set
ONCOKB_API_TOKENenvironment variable.
Use SpliceAI for: intronic variants near splice sites, synonymous variants, exonic variants near splice junctions.
See CODE_PATTERNS.md for implementation details.
Apply for intronic (non-splice), promoter, UTR, or intergenic variants near disease genes.
Tools: ChIPAtlas_enrichment_analysis, ChIPAtlas_get_peak_data, ENCODE_search_experiments, ENCODE_get_experiment
Before full ACMG classification, check if the variant already has an expert panel classification in ClinVar. Use MyVariant_query_variants with the rsID or HGVS notation — the clinvar field in the response includes clinical significance, review status, and RCV records. If an expert panel has already classified the variant as Pathogenic or Benign, note this prominently and focus on confirming/contextualizing rather than de novo classification.
Primary approach: MyVariant_query_variants with fields=dbnsfp,clinvar,cadd,gnomad_genome retrieves 15+ predictor scores (SIFT, PolyPhen, CADD, REVEL, AlphaMissense, MetaRNN, FATHMM, GERP, PhyloP, etc.) in a single call. This is usually sufficient.
REVEL/AlphaMissense fallback: If MyVariant_query_variants returns no dbnsfp block, use the dedicated tool:
MyVariant_get_pathogenicity_scores (PREFERRED FALLBACK) — returns REVEL, AlphaMissense, SIFT, PolyPhen2, MetaRNN, GERP, PhyloP, and more in a single call with pre-configured dbnsfp fields. Input: variant_id (rsID or HGVS genomic).CADD_get_variant_score (PHRED 0-99) — works for most variantsAlphaMissense_get_variant_score (0-1, needs UniProt ID) — missense onlyEVE_get_variant_score (0-1) — missense onlyEnsemblVEP_annotate_hgvs (VEP with colocated variants) — includes SIFT/PolyPhenConsensus: Run CADD (all variants) + AlphaMissense + EVE (missense). 2+ concordant damaging = strong PP3; 2+ concordant benign = strong BP4.
See ACMG_CLASSIFICATION.md for thresholds.
Tools: PDBe_get_uniprot_mappings, NvidiaNIM_alphafold2 (requires NVIDIA_API_KEY env var; free key at build.nvidia.com), alphafold_get_prediction (param: qualifier, e.g., UniProt accession), InterPro_get_protein_domains, UniProt_get_function_by_accession
Workflow: Get structure -> map residue -> assess domain/functional site -> predict destabilization.
AlphaFold size limitation: Very large proteins (>2,700 aa, e.g., BRCA2 at 3,418 aa) may not have AlphaFold predictions via the standard API. Fall back to published structural studies or
PDBe_get_uniprot_mappingsfor experimental structures.
AlphaMissense / REVEL / CADD give a pathogenicity score but no mechanism. When you need to answer "how does this variant disrupt protein function" — e.g. for VUS write-ups, clinical reports, or to triangulate a discordant predictor consensus — use the ESMC-6B Sparse Autoencoder to identify which interpretable protein-language-model features the mutation disrupts.
One-call mechanism summary (recommended starting point):
mech = tu.tools.ESM_explain_variant_mechanism(
sequence=wt_aa_sequence, # full reference protein sequence
position=600, # 1-indexed
ref_aa="V",
alt_aa="E",
top_k_features=5, # describe top 5 lost + top 5 gained
)
# mech["data"]["mechanism_summary"] e.g.:
# "Disrupted feature categories (lost): catalytic=2, ligand-binding=1;
# Induced feature categories (gained): structural-stability=1"
Returns mechanism_summary, per-feature lost/gained tables, and category aggregates. Use the category aggregate to support or qualify the pathogenicity verdict in the report:
catalytic / ligand-binding / ptm lost → mechanistic support for PP3secondary-structure / structural-stability gained on a stable WT region → mechanistic basis for "destabilizing" claimWhen you have a saturation question (e.g. "score all 19 substitutions at residue 600 to find the most disruptive"): use ESM_score_variant_sae_batch — 1 Forge call for the reference + 1 per variant, instead of 2 per variant.
When the region is what matters (e.g. "what's the SAE signature of the kinase activation loop, residues 754-771"): use ESM_get_region_sae_features then ESM_describe_sae_feature on the top hits.
Requires: ESM_API_KEY env var (free non-commercial token at https://forge.evolutionaryscale.ai) and pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52' (SAE support is on an unmerged feature branch — PyPI esm 3.2.x does NOT include SAEConfig). License: EvolutionaryScale Cambrian Inference License — non-commercial use only.
Tools: CELLxGENE_get_expression_data, CELLxGENE_get_cell_metadata, GTEx_get_median_gene_expression
Confirms gene expression in disease-relevant tissues. Supports PP4 if highly restricted; challenges classification if not expressed in affected tissue.
Tools: PubMed_search_articles, EuropePMC_search_articles, BioRxiv_list_recent_preprints, MedRxiv_get_preprint, openalex_search_works, SemanticScholar_search_papers
Always flag preprints as NOT peer-reviewed.
Apply all relevant evidence codes (PVS1, PS1, PS3, PM1, PM2, PM5, PP3, PP5 for pathogenic; BA1, BS1, BS3, BP4, BP7 for benign). See ACMG_CLASSIFICATION.md for the complete algorithm.
BS1 (allele frequency too high for disorder) requires gene-specific calibration, not a universal cutoff:
This is one of the most challenging scenarios in variant interpretation. When a biochemical assay shows damage but population/epidemiological data shows no disease association:
Modern clinical labs use a point-based system instead of the original rule-counting approach:
| Evidence Level | Pathogenic Points | Benign Points | |---|---|---| | Very Strong (PVS1) | +8 | -- | | Strong (PS1-PS4) | +4 each | -4 each (BS1-BS4) | | Moderate (PM1-PM6) | +2 each | -- | | Supporting (PP1-PP5) | +1 each | -1 each (BP1-BP7) | | Stand-alone (BA1) | -- | -8 |
Classification by total points:
This system handles conflicting evidence naturally — a variant with PS3 (+4) and BS1 (-4) and BP4 (-1) nets -1, which is VUS. The original rule-based approach struggles with this scenario.
Computational procedure: ACMG Bayesian classification
# Automated ACMG point calculation
# Input: dict of evidence codes with their applied strength
def classify_acmg(evidence: dict) -> dict:
"""
Classify a variant using the Bayesian ACMG point system.
Args:
evidence: dict mapping ACMG codes to strength levels.
Pathogenic codes: 'very_strong', 'strong', 'moderate', 'supporting'
Benign codes: 'stand_alone', 'strong', 'supporting'
Example:
evidence = {
'BS1': 'strong', # AF too high
'BS3': 'supporting', # Epidemiological evidence against pathogenicity
'BP6': 'supporting', # ClinVar benign consensus
'PP3': 'supporting', # Computational predictors say damaging
}
"""
pathogenic_points = {
'very_strong': 8, 'strong': 4, 'moderate': 2, 'supporting': 1
}
benign_points = {
'stand_alone': -8, 'strong': -4, 'supporting': -1
}
total = 0
details = []
for code, strength in evidence.items():
if code.startswith(('PVS', 'PS', 'PM', 'PP')):
pts = pathogenic_points.get(strength, 0)
elif code.startswith(('BA', 'BS', 'BP')):
pts = benign_points.get(strength, 0)
else:
pts = 0
total += pts
details.append(f"{code} ({strength}): {pts:+d}")
if total >= 10:
classification = "Pathogenic"
elif 6 <= total <= 9:
classification = "Likely Pathogenic"
elif -5 <= total <= 5:
classification = "VUS"
elif -9 <= total <= -6:
classification = "Likely Benign"
else:
classification = "Benign"
return {
'classification': classification,
'total_points': total,
'evidence_breakdown': details
}
# Example: PALB2 c.2816T>G (from test case)
result = classify_acmg({
'BS1': 'strong', # gnomAD AF 0.00105 exceeds threshold
'BS3': 'supporting', # Case-control study shows no association
'BP6': 'supporting', # ClinVar 13 submitters say benign/likely benign
})
# Output: Likely Benign, total_points=-6, evidence: BS1(strong):-4, BS3(supporting):-1, BP6(supporting):-1
Use this procedure after collecting all evidence from Phases 1-5 to compute the final classification.
ClinGen Variant Curation Expert Panels (VCEPs) publish gene-specific ACMG modifications. Before classifying, check if a VCEP exists:
ClinGen_search_gene_validity(gene="<gene_symbol>") — if validity is "Definitive" or "Strong", a VCEP likely existsNot all computational predictors are equal. For missense variants:
When predictors disagree: if REVEL says tolerated but SIFT/PolyPhen say damaging, lean toward REVEL. If REVEL is unavailable, require 3+ concordant predictions for PP3/BP4.
If a primary tool fails, use these alternatives:
MyVariant_query_variants with rsID or HGVS — the clinvar field in MyVariant is more reliable for variant lookup than NCBI Entrez searchEnsemblVEP_annotate_hgvs which includes gnomAD frequency via colocated variantsdbnsfp block from MyVariantPDBe_get_uniprot_mappings for experimental structuresNovel Missense VUS: Check PM5 (other pathogenic at same residue), get AlphaFold2 structure, apply PM1/PP3 as appropriate.
Truncating Variant: Check LOF mechanism, NMD escape, alternative isoforms, ClinGen LOF curation. Apply PVS1 at appropriate strength.
Splice Variant: Run SpliceAI, assess canonical splice distance, in-frame skipping potential. Apply PP3/BP7 based on scores.
# Variant Interpretation Report: {GENE} {VARIANT}
## Executive Summary
## 1. Variant Identity
## 2. Population Data
## 3. Clinical Database Evidence
## 4. Computational Predictions
## 5. Structural Analysis
## 6. Literature Evidence
## 7. ACMG Classification
## 8. Clinical Recommendations
## 9. Limitations & Uncertainties
## Data Sources
File naming: {GENE}_{VARIANT}_interpretation_report.md
Pathogenic/Likely Pathogenic: Enhanced screening, risk-reducing options, drug dosing adjustment, reproductive counseling, family cascade screening.
VUS: Do not use for medical decisions. Reinterpret in 1-2 years. Pursue functional studies and segregation data.
Benign/Likely Benign: Not expected to cause disease. No cascade testing needed.
| Section | Requirement | |---------|-------------| | Population frequency | gnomAD overall + at least 3 ancestry groups | | Predictions | At least 3 computational predictors | | Literature search | At least 2 search strategies | | ACMG codes | All applicable codes listed |
For amino acid properties at variant position, run: python3 skills/tooluniverse-sequence-analysis/scripts/amino_acids.py --type amino_acid --code X
ACMG_CLASSIFICATION.md - Evidence codes, classification algorithm, prediction thresholds, structural/regulatory impact tablesCODE_PATTERNS.md - Reusable code patterns for each workflow phaseCHECKLIST.md - Pre-delivery verificationEXAMPLES.md - Sample interpretationsTOOLS_REFERENCE.md - Tool parameters and fallbackstools
Post-market safety surveillance and recall/adverse-event RETRIEVAL across the full spectrum of FDA-regulated products that are NOT covered by the drug-AE signal skills: medical devices, food / dietary supplements / cosmetics, veterinary drugs, and drug supply (shortages). Orchestrates openFDA endpoints (MAUDE device adverse events + device recalls + 510(k), CAERS food/supplement/ cosmetic adverse events, veterinary adverse events, drug shortages, and cross-product enforcement/recall reports). USE WHEN the user asks: "are there adverse events for [device / pacemaker / infusion pump / insulin pump]", "device recalls for [firm/product]", "supplement / vitamin / cosmetic adverse reactions", "is [drug] in shortage", "what injectables are on shortage", "veterinary / animal adverse events for [drug] in [dog/cat/horse]", "food recall for listeria", "MAUDE report for [device]", "CAERS reactions for [brand]". DO NOT USE for drug adverse-event SIGNAL detection or disproportionality (PRR / ROR / IC) or drug-AE association scoring — that is `tooluniverse-pharmacovigilance` / `tooluniverse-adverse-event-detection`. This skill is multi-product surveillance and retrieval, not drug-AE statistical signal mining.
tools
--- name: tooluniverse-phewas description: Cross-ancestry / cross-biobank phenome-wide association (PheWAS) and replication. Given ONE variant (rsID) or ONE gene, look up every phenotype it associates with across European/UK (UKB-TOPMed), Finnish (FinnGen), Japanese (BioBank Japan), and Taiwanese (TPMI) biobanks, plus exome-wide gene-burden PheWAS (Genebass), then judge whether an association replicates across ancestries or is population-specific. Use whenever the user asks "what else is this va
tools
Dereplicate a putative natural product and assign its chemical taxonomy. Use to answer "is [compound] a known natural product", "what microbe/organism produces [compound]", "what chemical class is [compound]", "dereplicate this metabolite (by formula/exact mass/InChIKey/SMILES)", or "classify this molecule into ChemOnt". Searches NPAtlas for known microbial natural products (producing organism + literature reference), assigns the ChemOnt kingdom→superclass→class→subclass hierarchy via ClassyFire, resolves systematic IUPAC names to structure via OPSIN, and cross-references identity in PubChem. NOT for general drug/compound identity or ADMET (use tooluniverse-chemical-compound-retrieval / tooluniverse-small-molecule-discovery) and NOT for metabolomics pathway/enrichment analysis (use tooluniverse-metabolomics skills).
tools
Genome-ASSEMBLY discovery, QC, and replicon mapping for any organism (bacteria, archaea, fungi, and beyond) using NCBI Datasets. Resolves an organism name or taxid to assemblies, picks the reference/representative or best-quality assembly, pulls assembly QC metrics (total length, contig/scaffold N50, contig count, GC%, assembly level, RefSeq category), enumerates chromosomes and plasmids via per-replicon sequence reports, and compares candidate assemblies on quality. Use for "what genomes are available for [organism]", "assembly stats / N50 / GC content for [GCF_/GCA_ accession]", "how many plasmids does [strain] have", "compare assemblies for [species]", "find the reference genome for [taxon]", "is this assembly Complete Genome or just contigs". NOT for gene-level orthology/synteny (use tooluniverse-comparative-genomics), plant gene structure (use tooluniverse-plant-genomics), de novo assembly from raw reads (no tool exists), or taxonomy-only name/lineage lookups.