plugin/skills/tooluniverse-regulatory-variant-analysis/SKILL.md
Non-coding/regulatory variant interpretation — GWAS association lookup, eQTL evidence (GTEx), chromatin state (ENCODE), regulatory variant scoring (RegulomeDB, CADD), and TF-binding disruption. Use for non-coding GWAS hit interpretation, eQTL-based gene assignment, and regulatory mechanism reasoning. Distinct from coding-variant tools.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-regulatory-variant-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Systematic regulatory variant interpretation: discover trait associations from GWAS, map eQTL effects, annotate chromatin context, assess regulatory element overlap, and produce evidence-graded functional impact predictions for non-coding variants.
NOT for (use other skills instead):
tooluniverse-variant-interpretation
tooluniverse-gene-disease-association
tooluniverse-pharmacogenomics
tooluniverse-epigenomics

When evaluating a non-coding variant, build evidence across four questions:
1. Is the variant in a regulatory element? Use RegulomeDB to assess whether the variant overlaps TF binding sites, chromatin accessibility peaks, or known regulatory annotations. A low RegulomeDB score (categories 1a-2a) indicates strong evidence that the position is functionally active. Confirm with ENCODE histone marks: H3K27ac signals active enhancers and active promoters; H3K4me1 alone marks poised enhancers; H3K4me3 marks active promoters; H3K27me3 marks silenced regions.
2. Does it alter a transcription factor binding site? Check RegulomeDB's TF binding evidence and ENCODE TF ChIP-seq experiments. A variant that falls within a TF footprint and disrupts the consensus motif is mechanistically actionable, especially if the TF is known to be relevant in the disease tissue.
3. Is there eQTL evidence linking it to a gene? Query GTEx to determine whether the variant (or variants in tight LD) modulates expression of a nearby gene in a tissue-specific or ubiquitous manner. A tissue-specific eQTL suggests cell-type-specific regulation; a ubiquitous eQTL suggests a core regulatory element. The direction of the NES (positive = alternative allele increases expression, negative = decreases) and effect size matter for interpretation.
4. Is there GWAS evidence for trait association? Search the GWAS Catalog for the rsID or the surrounding locus. Genome-wide significant associations (p < 5×10⁻⁸) in relevant traits anchor the variant's biological importance. Cross-reference with OpenTargets for locus-to-gene mapping from multiple GWAS studies.
Synthesizing the evidence: Build a multi-layer case. A variant with GWAS significance + eQTL evidence + RegulomeDB score 1a-2a + active chromatin (H3K27ac) in the relevant tissue represents high-confidence regulatory impact. Two or three converging lines of evidence (e.g., eQTL plus active enhancer) constitute moderate confidence. A single line, or a variant only in a poised but not active regulatory context, represents lower confidence.
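The grading rule above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the `RegulatoryEvidence` class, its field names, and `grade_confidence` are hypothetical, and a poised-only chromatin context is simply not counted as an evidence line.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryEvidence:
    """Hypothetical container for the four evidence lines described above."""
    gwas_significant: bool    # genome-wide significant (p < 5e-8) in a relevant trait
    eqtl_support: bool        # GTEx eQTL linking the variant to a gene
    regulomedb_strong: bool   # RegulomeDB score in categories 1a-2a
    active_chromatin: bool    # H3K27ac in the relevant tissue (poised-only does not count)


def grade_confidence(ev: RegulatoryEvidence) -> str:
    """Map the number of converging evidence lines to a confidence level."""
    n_lines = sum([ev.gwas_significant, ev.eqtl_support,
                   ev.regulomedb_strong, ev.active_chromatin])
    if n_lines == 4:
        return "high"
    if n_lines >= 2:
        return "moderate"
    return "low"
```

For example, a variant with only eQTL support and active chromatin would grade as "moderate" under this sketch.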
Input (rsID, genomic coordinates, trait/disease, gene)
|
v
Phase 0: Variant/Trait Resolution
Resolve rsIDs, map trait names to EFO/MONDO IDs via OLS
|
v
Phase 1: GWAS Association Lookup
GWAS Catalog associations, p-values, effect sizes, study metadata
|
v
Phase 2: eQTL Analysis
GTEx tissue-specific eQTLs, target gene identification
|
v
Phase 3: Regulatory Element Annotation
ENCODE histone marks, RegulomeDB scores, chromatin state
|
v
Phase 4: OpenTargets GWAS Integration
OpenTargets GWAS study aggregation, locus-to-gene mapping
|
v
Phase 5: Functional Impact Synthesis
Integrate all evidence, assign regulatory impact level
|
v
Phase 6: Report
Evidence-graded regulatory variant report
Use ols_search_terms to resolve trait names to ontology IDs before GWAS queries. Restrict to ontology="efo" for GWAS traits; OpenTargets prefers MONDO IDs (e.g., MONDO_0005148 for type 2 diabetes rather than EFO_0001360). Use EnsemblVEP_annotate_rsid (param is variant_id, not rsid) for initial consequence annotation and nearest gene identification.
gwas_search_associations is the primary tool: accepts disease_trait (free text), efo_id (preferred for precision), rs_id, and p_value threshold. Use p_value=5e-8 for genome-wide significance. For locus-level discovery, gwas_get_variants_for_trait retrieves all SNPs for a trait. gwas_get_snps_for_gene finds GWAS-cataloged SNPs mapped to a specific gene.
Reasoning tip: When GWAS Catalog returns empty for a free-text trait, switch to the efo_id parameter — the catalog uses controlled vocabulary and free-text matching is imprecise.
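The free-text to efo_id fallback can be sketched as follows. The tool name and the parameter names (disease_trait, efo_id, p_value) come from the text above; `run_tool` is a hypothetical callable standing in for however your environment invokes ToolUniverse tools, and the assumption that it returns a list of hits is mine.

```python
from typing import Any, Callable, Dict, List, Optional

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold used in the text

# Hypothetical signature: run_tool(tool_name, arguments) -> list of association records
ToolRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def search_gwas_with_fallback(run_tool: ToolRunner,
                              trait: str,
                              efo_id: Optional[str] = None) -> List[Dict[str, Any]]:
    """Query by free-text trait first; if nothing comes back, retry with the EFO ID."""
    hits = run_tool("gwas_search_associations",
                    {"disease_trait": trait, "p_value": GENOME_WIDE_P})
    if hits or efo_id is None:
        return hits
    # Free-text matching is imprecise, so fall back to the controlled vocabulary.
    return run_tool("gwas_search_associations",
                    {"efo_id": efo_id, "p_value": GENOME_WIDE_P})
```

Passing the runner in as an argument keeps the sketch independent of any particular client library and makes the fallback easy to exercise with a stub.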
GTEx_query_eqtl accepts a gene symbol (auto-resolved to GENCODE ID) or Ensembl gene ID. It returns tissue-specific SNP-gene associations with NES (normalized effect size) and p-value per tissue.
When interpreting results, ask: does the eQTL effect occur in the tissue most relevant to the disease? A brain-specific eQTL for a neurodegenerative disease variant is more compelling than a ubiquitous one. Use GTEx_get_median_gene_expression to confirm that the target gene is actually expressed in the relevant tissue before placing weight on eQTL evidence.
Note: GTEx API uses v8 data; gtex_v10 endpoints may return empty for some queries.
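A minimal sketch of the tissue check and NES reading described above. The record keys ("tissue", "nes", "p_value") are assumptions for illustration and may not match the actual fields returned by GTEx_query_eqtl; the p-value cutoff is left to the caller rather than fixed here.

```python
from typing import Any, Dict, List


def nes_direction(nes: float) -> str:
    """Interpret the sign of the normalized effect size (relative to the alternative allele)."""
    if nes > 0:
        return "alternative allele increases expression"
    if nes < 0:
        return "alternative allele decreases expression"
    return "no directional effect"


def eqtls_in_tissue(records: List[Dict[str, Any]],
                    tissue_keyword: str,
                    max_p: float) -> List[Dict[str, Any]]:
    """Keep eQTL records whose tissue name contains the keyword and whose p-value passes max_p."""
    keyword = tissue_keyword.lower()
    return [r for r in records
            if keyword in r["tissue"].lower() and r["p_value"] <= max_p]
```

An empty result for the disease-relevant tissue is a cue to check GTEx_get_median_gene_expression before leaning on eQTL evidence from other tissues.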
RegulomeDB_query_variant (param: rsid) returns a regulatory score and feature annotations. Scores in categories 1a–2a indicate strong regulatory evidence (eQTL overlap + TF binding + chromatin accessibility). Scores 3a–6 represent progressively weaker evidence.
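The score bands above can be turned into a lookup. This is a sketch under stated assumptions: the full set of category labels (1a to 1f, 2a to 2c, 3a, 3b, 4, 5, 6) is my understanding of the RegulomeDB ranking scheme rather than something the text spells out, and because the text does not place 2b and 2c in either band, the sketch reports them as "ungraded" instead of guessing.

```python
STRONG_SCORES = {"1a", "1b", "1c", "1d", "1e", "1f", "2a"}   # "1a-2a" in the text
WEAKER_SCORES = {"3a", "3b", "4", "5", "6"}                  # "3a-6" in the text
UNGRADED_SCORES = {"2b", "2c"}                               # not placed by the text


def regulomedb_tier(score: str) -> str:
    """Map a RegulomeDB category label to the evidence band used in this guide."""
    label = score.strip().lower()
    if label in STRONG_SCORES:
        return "strong"
    if label in WEAKER_SCORES:
        return "weaker"
    if label in UNGRADED_SCORES:
        return "ungraded"
    return "unknown"
```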
ENCODE_search_histone_experiments accepts histone_mark (e.g., "H3K27ac") and biosample_term_name (tissue or cell line name — NOT a disease name; ENCODE uses biological sample names like "liver" or "breast epithelium"). Use assay_title="TF ChIP-seq" (not just "ChIP-seq") when querying TF binding data.
Reasoning tip: RegulomeDB aggregates ENCODE, Roadmap, and other data. If ENCODE doesn't have the specific biosample, RegulomeDB may still have aggregate evidence from related cell types.
When you have the variant as a GRCh38 coordinate, FAVOR_annotate_variant(variant="chr-pos-ref-alt") returns a regulatory annotation block (plus conservation, frequency, and CADD) in one call — use it to quickly confirm whether the position falls in an annotated regulatory element before drilling into RegulomeDB/ENCODE.
OpenTargets_search_gwas_studies_by_disease takes diseaseIds as an array of MONDO IDs. It provides locus-to-gene (L2G) scores from multiple GWAS studies, which go beyond simple proximity to incorporate colocalisation, eQTL, and chromatin data. Use OpenTargets_multi_entity_search_by_query_string or OpenTargets_get_disease_id_description_by_name to resolve disease names to MONDO/EFO IDs first.
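Since diseaseIds must be an array of underscore-style IDs such as MONDO_0005148, a small normalizer can catch malformed input before the call. Both helper names are hypothetical, and accepting a colon-separated form (MONDO:0005148) as input is an assumption for convenience, not something the tool is documented to require.

```python
import re
from typing import Dict, Iterable, List

_ONTOLOGY_ID = re.compile(r"^(MONDO|EFO)[_:](\d+)$", re.IGNORECASE)


def normalize_ontology_id(raw: str) -> str:
    """Return the ID as PREFIX_digits, or raise ValueError if it is not a MONDO/EFO ID."""
    match = _ONTOLOGY_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a MONDO/EFO identifier: {raw!r}")
    return f"{match.group(1).upper()}_{match.group(2)}"


def build_gwas_study_arguments(disease_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Build the arguments for OpenTargets_search_gwas_studies_by_disease (diseaseIds is an array)."""
    return {"diseaseIds": [normalize_ontology_id(d) for d in disease_ids]}
```

Rejecting free-text names here is deliberate: disease names should be resolved to IDs first with one of the OpenTargets search tools named above.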
After collecting evidence, reason through the layers:
Switch from disease_trait to efo_id; broaden the trait term.
… size parameter.
Try FAVOR_annotate_variant (GRCh38 coordinate) for its regulatory + conservation annotation; the variant may lack regulatory annotations in available data.
Use OpenTargets_multi_entity_search_by_query_string first to confirm the correct ID.

Step 1: gwas_search_associations(rs_id="rs429358")
-> All trait associations (Alzheimer's disease, LDL cholesterol, etc.)
Step 2: GTEx_query_eqtl(gene_symbol="APOE")
-> Tissue-specific eQTL evidence; note effect in brain vs liver
Step 3: RegulomeDB_query_variant(rsid="rs429358")
-> Regulatory score and TF binding annotations
Step 4: ENCODE_search_histone_experiments(histone_mark="H3K27ac", biosample_term_name="brain")
-> Active enhancer context near the variant
Step 5: Synthesize: does GWAS significance + eQTL + active chromatin converge on one gene?
Step 1: EnsemblVEP_annotate_rsid(variant_id="rs12345678")
-> Confirm non-coding consequence, identify nearest gene
Step 2: RegulomeDB_query_variant(rsid="rs12345678")
-> Is this position in a regulatory context?
Step 3: gwas_search_associations(rs_id="rs12345678")
-> Any GWAS associations in relevant traits?
Step 4: GTEx_query_eqtl(gene_symbol=nearest_gene)
-> Does this variant or nearby variants modulate expression?
Step 5: ENCODE_search_histone_experiments(histone_mark="H3K27ac", biosample_term_name=relevant_tissue)
-> Active chromatin confirmation
Step 6: Classify impact based on convergence of evidence lines
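The data-gathering steps for an uncharacterized variant (Steps 1 to 5) can be written down as an ordered call plan. Tool and parameter names are taken from the steps above; the function name and the (tool, arguments) tuple layout are hypothetical. In practice the nearest gene comes from the Step 1 result, so it is passed in here only to keep the sketch self-contained.

```python
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def plan_unknown_variant_calls(rsid: str,
                               nearest_gene: str,
                               relevant_tissue: str) -> List[ToolCall]:
    """Return the ordered (tool, arguments) pairs for Steps 1-5 of the workflow."""
    return [
        # Step 1: note the parameter is variant_id, not rsid
        ("EnsemblVEP_annotate_rsid", {"variant_id": rsid}),
        # Step 2: regulatory context of the position
        ("RegulomeDB_query_variant", {"rsid": rsid}),
        # Step 3: trait associations for the rsID
        ("gwas_search_associations", {"rs_id": rsid}),
        # Step 4: eQTLs for the candidate target gene
        ("GTEx_query_eqtl", {"gene_symbol": nearest_gene}),
        # Step 5: active chromatin in the tissue (a biosample name, not a disease name)
        ("ENCODE_search_histone_experiments",
         {"histone_mark": "H3K27ac", "biosample_term_name": relevant_tissue}),
    ]
```

Step 6 (classification) then operates on the collected results rather than on another tool call.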