plugin/skills/tooluniverse-population-genetics-1000genomes/SKILL.md
Population genetics using the 1000 Genomes Project (IGSR): superpopulation/population search, sample metadata, variant frequencies across AFR/AMR/EAS/EUR/SAS, and ancestry-specific analyses. Use for ancestry comparison, population-aware allele frequency lookups, and 1000-Genomes-cohort-specific analyses (distinct from gnomAD, which has a different sample composition).
Install this skill globally with one command: `npx skillsauth add mims-harvard/tooluniverse tooluniverse-population-genetics-1000genomes`. Works with Claude Code, Cursor, and Windsurf.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Use IGSR tools to search 1000 Genomes populations and samples, explore data collections, and combine with GWAS tools for population-stratified analysis.
tooluniverse-population-genetics, tooluniverse-variant-interpretation, tooluniverse-gwas-finemapping

IGSR_search_populations: superpopulation (string/null, one of AFR/AMR/EAS/EUR/SAS), query (string/null, free-text search by name), limit (int).
Returns {status, data: {total, populations: [{code, name, description, sample_count, superpopulation_code, superpopulation_name, latitude, longitude}]}, metadata: {source, filter_superpopulation, filter_query}}.
Superpopulation codes:

| Code | Ancestry |
|------|----------|
| AFR | African |
| AMR | Admixed American |
| EAS | East Asian |
| EUR | European |
| SAS | South Asian |
// List all AFR populations
{"superpopulation": "AFR", "limit": 10}
// Search by name (free-text)
{"query": "Yoruba", "limit": 5}
// List all populations
{"limit": 26}
Response example:
{
"status": "success",
"data": {
"total": 3,
"populations": [
{"code": "YRI", "name": "Yoruba", "description": "Yoruba in Ibadan, Nigeria",
"sample_count": 188, "superpopulation_code": "AFR", "superpopulation_name": "African Ancestry"}
]
}
}
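The response shape above can be turned into a lookup table in a few lines of Python. This is a minimal sketch that assumes the payload matches the documented example exactly; `population_index` is a hypothetical helper name, not part of ToolUniverse.

```python
import json

# Payload copied from the response example above (one population shown).
response = json.loads("""
{
  "status": "success",
  "data": {
    "total": 3,
    "populations": [
      {"code": "YRI", "name": "Yoruba", "description": "Yoruba in Ibadan, Nigeria",
       "sample_count": 188, "superpopulation_code": "AFR",
       "superpopulation_name": "African Ancestry"}
    ]
  }
}
""")


def population_index(resp):
    """Map population code -> (superpopulation code, sample count).

    Hypothetical helper: assumes the documented response shape.
    """
    if resp.get("status") != "success":
        raise ValueError(f"IGSR query failed: {resp.get('status')!r}")
    return {
        p["code"]: (p["superpopulation_code"], p["sample_count"])
        for p in resp["data"]["populations"]
    }


index = population_index(response)
print(index)
```

Keying on the population code makes it straightforward to feed the codes into IGSR_search_samples, which expects a population code rather than a superpopulation code.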
IGSR_search_samples: population (string/null, population code e.g. "YRI"), data_collection (string/null, collection title), sample_name (string/null, specific sample e.g. "NA12878"), limit (int).
Returns {status, data: {total, samples: [{name, sex, biosample_id, populations: [{code, name, superpopulation}], data_collections: [...]}]}}.
// Find all YRI samples
{"population": "YRI", "limit": 10}
// Look up the reference sample NA12878
{"sample_name": "NA12878", "limit": 1}
// Find samples in the 30x high-coverage collection
{"data_collection": "1000 Genomes 30x on GRCh38", "limit": 5}
NOTE: population takes a population code (e.g. "YRI", "GBR", "CHB"), not a superpopulation code. Use IGSR_search_populations first to get population codes if starting from a superpopulation.
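The two-step lookup described in the note can be sketched as follows. The `call_tool` argument is a hypothetical callable standing in for however the tools are invoked in your environment, and `fake_call_tool` is a stub with placeholder sample names so the example runs offline; neither is part of ToolUniverse.

```python
from typing import Callable, Dict, List


def samples_for_superpopulation(
    call_tool: Callable[[str, dict], dict],
    superpopulation: str,
    per_population_limit: int = 100,
) -> Dict[str, List[str]]:
    """Expand a superpopulation into population codes, then fetch samples per code."""
    pops = call_tool(
        "IGSR_search_populations",
        {"superpopulation": superpopulation, "limit": 26},
    )
    codes = [p["code"] for p in pops["data"]["populations"]]

    samples_by_population = {}
    for code in codes:
        # IGSR_search_samples takes a population code, never a superpopulation code.
        resp = call_tool(
            "IGSR_search_samples",
            {"population": code, "limit": per_population_limit},
        )
        samples_by_population[code] = [s["name"] for s in resp["data"]["samples"]]
    return samples_by_population


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Offline stub (assumption): returns placeholder data in the documented shape."""
    if name == "IGSR_search_populations":
        pops = [{"code": "GBR"}, {"code": "FIN"}]
        return {"status": "success", "data": {"total": 2, "populations": pops}}
    if name == "IGSR_search_samples":
        code = arguments["population"]
        samples = [{"name": f"PLACEHOLDER_{code}_1"}]
        return {"status": "success", "data": {"total": 1, "samples": samples}}
    raise KeyError(name)


by_population = samples_for_superpopulation(fake_call_tool, "EUR")
print(by_population)
```

Passing the tool runner in as a parameter keeps the lookup logic testable without network access; swap the stub for a real runner when executing against IGSR.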
IGSR_list_data_collections: limit (int).
Returns {status, data: {total, collections: [{code, title, short_title, sample_count, population_count, data_types, website}]}}.
{"limit": 20}
Key collections available (18 total):

| Collection | Description | Data Types |
|------------|-------------|------------|
| 1000 Genomes on GRCh38 | 2709 samples, 26 populations | sequence, alignment, variants |
| 1000 Genomes 30x on GRCh38 | High-coverage resequencing | sequence, alignment, variants |
| 1000 Genomes phase 3 release | Original phase 3 | sequence, alignment, variants |
| Human Genome Structural Variation Consortium | HGSVC SV discovery | sequence, alignment |
| MAGE RNA-seq | RNA-seq data | - |
| Geuvadis | Expression + genotype | - |
gwas_search_associations: trait (string, free text), limit (int).
Returns GWAS associations with rsID, p-value, mapped genes, EFO trait IDs.
{"trait": "type 2 diabetes", "limit": 10}
gwas_get_variants_for_trait: trait (string, EFO ID e.g. "EFO_0001645"), limit (int).
{"trait": "EFO_0001645", "limit": 10}
gwas_get_snps_for_gene: gene_symbol (string), limit (int).
Returns SNPs mapped to the gene with rsIDs, genomic positions, functional classes.
{"gene_symbol": "TCF7L2", "limit": 10}
Step 1 -- Find populations of interest:
// Get all EUR populations
{"superpopulation": "EUR", "limit": 10}
// -> Returns codes like GBR, FIN, CEU, TSI, IBS
Step 2 -- Get samples from target population:
// Get YRI samples (AFR)
{"population": "YRI", "limit": 100}
Step 3 -- Get GWAS SNPs for the gene or trait:
// GWAS hits for TCF7L2 (T2D gene)
{"gene_symbol": "TCF7L2", "limit": 20}
Step 4 -- Cross-reference with population data for stratification analysis.
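One way to do the cross-referencing in Step 4 is to pool per-population allele frequencies into superpopulation frequencies, weighted by sample count. This is only a sketch under stated assumptions: the return shapes documented above list no allele-frequency field, so `af_by_population` is assumed to come from another source, and the population codes and numbers below are placeholders rather than real data. The function names are hypothetical.

```python
from collections import defaultdict


def superpopulation_af(populations, af_by_population):
    """Sample-count-weighted mean allele frequency per superpopulation.

    populations: records shaped like IGSR_search_populations entries.
    af_by_population: population code -> allele frequency (assumed external input).
    """
    weighted_sum = defaultdict(float)
    sample_total = defaultdict(int)
    for pop in populations:
        code = pop["code"]
        if code not in af_by_population:
            continue  # no frequency available for this population
        n = pop["sample_count"]
        superpop = pop["superpopulation_code"]
        weighted_sum[superpop] += af_by_population[code] * n
        sample_total[superpop] += n
    return {
        superpop: weighted_sum[superpop] / sample_total[superpop]
        for superpop in sample_total
        if sample_total[superpop] > 0
    }


def max_af_difference(af_by_superpop):
    """Largest allele-frequency gap between any two superpopulations."""
    values = list(af_by_superpop.values())
    return max(values) - min(values)


# Placeholder inputs (not real populations or frequencies).
populations = [
    {"code": "POP_A", "sample_count": 100, "superpopulation_code": "SP1"},
    {"code": "POP_B", "sample_count": 300, "superpopulation_code": "SP1"},
    {"code": "POP_C", "sample_count": 200, "superpopulation_code": "SP2"},
]
af_by_population = {"POP_A": 0.2, "POP_B": 0.4, "POP_C": 0.1}

pooled = superpopulation_af(populations, af_by_population)
print(pooled, max_af_difference(pooled))
```

Weighting by sample count keeps a small population from dominating the pooled estimate; an unweighted mean is a reasonable alternative if each population should count equally.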
| Code | Population | Superpopulation |
|------|-----------|-----------------|
| YRI | Yoruba in Ibadan, Nigeria | AFR |
| LWK | Luhya in Webuye, Kenya | AFR |
| GWD | Gambian Mandinka | AFR |
| CEU | Utah residents (CEPH) | EUR |
| GBR | British in England/Scotland | EUR |
| FIN | Finnish in Finland | EUR |
| TSI | Toscani in Italia | EUR |
| CHB | Han Chinese in Beijing | EAS |
| JPT | Japanese in Tokyo | EAS |
| CHS | Southern Han Chinese | EAS |
| MXL | Mexican Ancestry in LA | AMR |
| PUR | Puerto Rican in Puerto Rico | AMR |
| GIH | Gujarati Indian in Houston | SAS |
| PJL | Punjabi from Lahore | SAS |
| Grade | Criteria | Example |
|-------|----------|---------|
| Strong | AF difference > 0.2 across superpopulations, GWAS p < 5e-8, replicated in multiple cohorts | rs7903146 (TCF7L2) with AF = 0.30 EUR vs 0.05 EAS, GWAS p = 1e-40 |
| Moderate | AF difference 0.05-0.2, GWAS p < 5e-8 in one ancestry, nominal in others | Variant with AF = 0.15 AFR vs 0.08 EUR, GWAS p < 5e-8 in EUR only |
| Weak | AF difference < 0.05, GWAS p < 5e-8 but single study, no cross-ancestry replication | Common variant with similar AF across populations, significant in one cohort |
| Population-specific | Variant common (AF > 0.01) in one superpopulation, rare (AF < 0.01) in others | Sickle cell variant (rs334) AF ~0.10 in AFR, < 0.001 elsewhere |
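The grading criteria can be approximated in code as below. This is a simplified sketch, not a definitive implementation: the table does not define precedence or cover every combination, so checking Population-specific first, requiring replication only for Strong, and returning "Ungraded" for uncovered cases are all assumptions. `grade_evidence` is a hypothetical name, and a single best p-value stands in for the per-ancestry significance described in the Moderate row.

```python
GENOME_WIDE_P = 5e-8   # genome-wide significance threshold from the table
COMMON_AF = 0.01       # common/rare boundary from the Population-specific row


def grade_evidence(af_by_superpop, gwas_p, replicated):
    """Assign an evidence grade from superpopulation AFs and GWAS support.

    af_by_superpop: superpopulation code -> allele frequency
    gwas_p: best GWAS p-value for the variant (simplification)
    replicated: True if the association replicates in multiple cohorts
    """
    afs = list(af_by_superpop.values())
    n_common = sum(af > COMMON_AF for af in afs)
    n_rare = sum(af < COMMON_AF for af in afs)

    # Assumption: population-specificity is checked before the AF-difference grades.
    if len(afs) > 1 and n_common == 1 and n_rare == len(afs) - 1:
        return "Population-specific"

    if gwas_p >= GENOME_WIDE_P:
        return "Ungraded"  # not covered by the table

    af_difference = max(afs) - min(afs)
    if af_difference > 0.2:
        return "Strong" if replicated else "Ungraded"
    if af_difference >= 0.05:
        return "Moderate"
    return "Weak"


# Examples taken from the table; 0.0005 is an illustrative value for "< 0.001".
print(grade_evidence({"EUR": 0.30, "EAS": 0.05}, 1e-40, True))
print(grade_evidence({"AFR": 0.15, "EUR": 0.08}, 1e-9, False))
print(grade_evidence({"AFR": 0.10, "EUR": 0.0005}, 1e-20, True))
```

Keeping the thresholds as named constants makes it easy to adjust them if a stricter or looser grading scheme is needed.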
| Tool | Key Parameters | Notes |
|------|---------------|-------|
| IGSR_search_populations | superpopulation, query, limit | superpopulation: AFR/AMR/EAS/EUR/SAS |
| IGSR_search_samples | population, data_collection, sample_name, limit | population = population code (e.g. YRI) |
| IGSR_list_data_collections | limit | 18 collections total |
| gwas_search_associations | trait, limit | free-text trait search |
| gwas_get_variants_for_trait | trait, limit | trait = EFO ID |
| gwas_get_snps_for_gene | gene_symbol, limit | returns mapped SNPs |