skills/ancestry-pca/SKILL.md
Ancestry decomposition PCA against the Simons Genome Diversity Project
npx skillsauth add andyzhuang/openlife claw-ancestry-pca
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Place your study cohort in global genetic context by computing a joint PCA against the Simons Genome Diversity Project (SGDP) — 345 samples from 164 populations spanning every inhabited continent.
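As a rough illustration of what "joint PCA" means here, the sketch below stacks a cohort and a reference genotype matrix and projects all samples into one shared coordinate system. It is not the skill's implementation: the function name `joint_pca`, the 0/1/2 genotype coding, and the random toy data are assumptions made for illustration, and the sketch assumes both matrices have already been restricted to the same set of biallelic SNPs.

```python
import numpy as np

def joint_pca(cohort, reference, n_components=3):
    """PCA on stacked genotype matrices (samples x shared SNPs, coded 0/1/2).

    Hypothetical helper: assumes both matrices already cover the same SNPs
    in the same column order.
    """
    # Stack both panels so every sample is placed in one coordinate system.
    genotypes = np.vstack([cohort, reference]).astype(float)
    # Drop monomorphic SNPs, which contribute no variance.
    genotypes = genotypes[:, genotypes.std(axis=0) > 0]
    # Center each SNP at its mean allele count.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; U * S gives the per-sample PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance carried by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    n_cohort = len(cohort)
    return scores[:n_cohort], scores[n_cohort:], explained[:n_components]

# Toy data only: random genotypes, not real samples.
rng = np.random.default_rng(0)
cohort = rng.integers(0, 3, size=(6, 50))
reference = rng.integers(0, 3, size=(10, 50))
cohort_pcs, ref_pcs, explained = joint_pca(cohort, reference)
```

Fitting the components on the stacked matrix is one design choice; another common option is to fit on the reference panel alone and project the cohort onto those axes.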
If you ask ChatGPT to "run a PCA against a global reference panel," it will:
This skill encodes the correct methodological decisions:
The skill bundles the SGDP v4 dataset (Mallick et al., 2016, Nature):
python ancestry_pca.py \
    --vcf your_cohort.vcf.gz \
    --pop-map your_populations.tsv \
    --output ancestry_report
python ancestry_pca.py --demo --output demo_report
The demo uses pre-computed PCA results from the Peruvian Genome Project (736 samples, 28 populations) and generates the full 4-panel figure instantly.
Ancestry Decomposition PCA
==========================
Cohort: 736 samples, 28 populations
Reference: SGDP (345 samples, 164 populations)
Common variants: 42,831 biallelic SNPs
Variance explained:
  PC1: 51.44%  PC2: 21.70%  PC3: 6.70%
Panel D — Global Context:
  Cohort samples cluster between European and East Asian
  reference populations, with Amazonian groups showing
  distinct positioning from Highland and Coastal groups.
Figures saved to: ancestry_report/
  Figure3_PCA_composite.png (300 dpi)
  Figure3_PCA_composite.pdf (vector)
Reproducibility:
  commands.sh | environment.yml | checksums.sha256
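The `checksums.sha256` file listed above can be used to confirm that report files have not changed. The sketch below is a minimal checker, assuming the manifest follows the common `sha256sum` layout (one `<hex digest>  <filename>` entry per line, paths relative to the manifest); the skill's actual file format is not documented here, and `verify_checksums` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

def verify_checksums(manifest_path):
    """Return {filename: bool} for each entry in a sha256sum-style manifest.

    Assumes lines of the form '<hex digest>  <filename>'. A file that is
    missing or whose digest differs maps to False.
    """
    manifest = Path(manifest_path)
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = manifest.parent / name
        if target.exists():
            actual = hashlib.sha256(target.read_bytes()).hexdigest()
        else:
            actual = None
        results[name] = (actual == expected.lower())
    return results
```

Calling `verify_checksums("ancestry_report/checksums.sha256")` would then return a dictionary in which any `False` value flags a missing or modified file.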
If you use this skill in a publication, please cite: