skills/tooluniverse-epigenomics-chromatin/SKILL.md
Histone-modification ChIP-seq, ATAC-seq accessibility, chromatin state, and TF binding analysis from ENCODE, Roadmap Epigenomics, ChIP-Atlas. Use for chromatin-state-by-tissue queries, TF-binding-by-region, regulatory landscape mapping, and ENCODE-cCRE annotations. For DNA methylation use tooluniverse-epigenomics; for RNA-seq use tooluniverse-rnaseq-deseq2.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-epigenomics-chromatinInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
tooluniverse-epigenomicstooluniverse-rnaseq-deseq2tooluniverse-gwas-snp-interpretationtooluniverse-variant-analysisBefore calling any tool, identify which question type you're answering. Each maps to a different tool set.
(a) Which regulatory elements exist at a locus? Use UCSC_get_encode_cCREs (region-based) or SCREEN_get_regulatory_elements (gene-based). Then check ENCODE_get_chromatin_state for ChromHMM annotation and ENCODE_search_chromatin_accessibility for ATAC-seq evidence.
(b) Which TFs bind there? Use ReMap_get_transcription_factor_binding for ChIP-seq experiments. Use jaspar_search_matrices to retrieve binding motifs and check whether the sequence disrupts a known motif.
(c) How does a variant affect regulation? Use RegulomeDB_query_variant for a scored summary. Then build multi-layer evidence: UCSC_get_encode_cCREs (is the variant in a cCRE?), GTEx_get_single_tissue_eqtls (is it an eQTL?), jaspar_search_matrices (does it disrupt a TF motif?). No single layer is sufficient — see the variant reasoning section below.
(d) What genes are regulated by an element? Use GTEx_get_single_tissue_eqtls or GTEx_query_eqtl to find genes whose expression is associated with variants in the element. Use SCREEN_get_regulatory_elements with element_type="PLS"/"pELS"/"dELS" to classify element-to-promoter relationships.
Use histone mark identity to guide tool queries and interpret results before fetching data.
Bivalent promoter logic: If you observe H3K4me3 + H3K27me3 together at the same locus, the promoter is bivalent — poised but not active. This is common in stem cells and developmentally regulated genes. Do not report such genes as "actively transcribed." Use GTEx_get_expression_summary to check if the gene is actually expressed in the tissue of interest.
Inference rule: If a user asks about a mark you haven't queried yet, ask: does the mark you have found already answer the question? H3K4me3 in a region predicts active transcription; you may not need to also query H3K36me3 unless confirming elongation specifically.
An eQTL means variant X is statistically associated with expression of gene Y in tissue T. Before reporting eQTL results, apply this chain of reasoning:
To assess a non-coding variant's regulatory impact, build evidence from multiple independent layers. No single layer is sufficient.
Layer 1 — RegulomeDB score: High probability (score 1a–2b) means convergent evidence from eQTL + TF binding + DNase. Score 4–7 means weak support. Use as a triage filter.
Layer 2 — Regulatory element overlap: Query UCSC_get_encode_cCREs at the variant's coordinates. If the variant falls in a cCRE (especially PLS or pELS), it is in a functional context.
Layer 3 — eQTL evidence: Query GTEx_get_single_tissue_eqtls for nearby genes. If the variant is a significant eQTL, the association supports regulatory function.
Layer 4 — TFBS disruption: Query jaspar_search_matrices for TFs with motifs at the locus. If the variant changes a high-information-content position in a motif, it is a strong functional candidate.
Synthesis rule: Report each layer separately. Convergence across 3+ layers = high-confidence regulatory variant. A single layer (e.g., eQTL alone) warrants caution.
MyGene_query_genes: query (string). Converts gene symbols to Ensembl IDs and coordinates. Filter results by symbol == '<GENE>' — first hit may not match.
ensembl_lookup_gene: gene_id (Ensembl ID), species (REQUIRED, "homo_sapiens"). Returns chr/start/end.
Key format notes:
ENSG00000012048.20rs4994chr17_43705621_T_C_b38chrom="chr17", start=7668421, end=7687490ENCODE_search_histone_experiments: target (histone mark), cell_type (or tissue alias), biosample_term_name (most explicit ENCODE ontology name), limit.
ENCODE anatomy term notes: "breast" → try "breast epithelium" or "mammary epithelial cell"; "brain" → "brain" works; if 0 results, append "tissue", "epithelium", or "cell".
result = tu.tools.ENCODE_search_histone_experiments(target="H3K27ac", cell_type="GM12878", limit=5)
# result["data"]["experiments"][0]["accession"] -> "ENCSR000AKC"
GEO_search_chipseq_datasets: Fallback for older or non-ENCODE ChIP-seq datasets.
ENCODE_search_chromatin_accessibility: cell_type, limit. Returns ATAC-seq experiments.
ENCODE_get_chromatin_state: cell_type, limit. Returns ChromHMM 15-state annotations (TssA, Enh, TssBiv, ReprPC, etc.). Use to confirm bivalent promoter state or enhancer classification.
ENCODE_search_rnaseq_experiments: assay_type (default "total RNA-seq"), biosample, limit. If 0 results, retry with assay_type="polyA plus RNA-seq".
GEO_search_rnaseq_datasets / GEO_search_atacseq_datasets: query, organism, limit (also max_results). GEO adds "ATAC-seq" automatically for the ATAC tool.
ReMap_get_transcription_factor_binding (CTCF): gene_name="CTCF", cell_type, limit. Returns ENCODE TF ChIP-seq experiments.
SCREEN_get_regulatory_elements: gene_name, element_type (PLS/pELS/dELS/CTCF-only/DNase-H3K4me3), limit.
UCSC_get_encode_cCREs: chrom (REQUIRED), start (REQUIRED), end (REQUIRED), genome (default "hg38"). Returns cCREs with Z-scores for DNase, H3K4me3, H3K27ac, CTCF signals.
# cCREs near TP53
result = tu.tools.UCSC_get_encode_cCREs(chrom="chr17", start=7668421, end=7687490, genome="hg38")
ENCODE_search_annotations: annotation_type ("candidate Cis-Regulatory Elements" or "chromatin state"), biosample_term_name, organism, assembly, limit.
GTEx_get_single_tissue_eqtls: gene_symbol. Returns all significant eQTLs across tissues with snpId, pValue, tissueSiteDetailId, nes (normalized effect size).
result = tu.tools.GTEx_get_single_tissue_eqtls(gene_symbol="BRCA1")
from collections import Counter
tissue_counts = Counter(e["tissueSiteDetailId"] for e in result["data"])
GTEx_query_eqtl: gene_symbol, tissue (tissueSiteDetailId), page (1-indexed), size. Use for a specific tissue.
GTEx_get_multi_tissue_eqtls: operation="get_multi_tissue_eqtls", gencode_id (versioned, REQUIRED). Returns per-variant m-values showing tissue-sharing. m-value near 1.0 = effect present; near 0.0 = absent.
result = tu.tools.GTEx_get_multi_tissue_eqtls(
operation="get_multi_tissue_eqtls",
gencode_id="ENSG00000012048.20"
)
GTEx_calculate_eqtl: operation="calculate_eqtl", gencode_id, variant_id (chr_pos_ref_alt_b38), tissue_site_detail_id. Works for non-significant pairs.
eQTL_list_datasets / eQTL_get_associations: EBI eQTL Catalogue. Use dataset_id (from list call), gene_id (Ensembl), variant. Complementary to GTEx.
GTEx_get_expression_summary: gene_symbol. Recommended — auto-resolves GENCODE versions. Returns median TPM per tissue.
result = tu.tools.GTEx_get_expression_summary(gene_symbol="BRCA1")
top_tissues = sorted(result["data"], key=lambda x: x["median"], reverse=True)[:5]
GTEx_get_median_gene_expression: Requires operation="get_median_gene_expression" + exact versioned gencode_id. Use only when version precision is needed.
GTEx_get_tissue_sites: No params. Returns all tissueSiteDetailId values.
jaspar_search_matrices: name (TF name), collection ("CORE"), tax_group ("vertebrates"), species ("9606"), page_size.
result = tu.tools.jaspar_search_matrices(name="CTCF", collection="CORE", page_size=5)
jaspar_get_matrix: Returns position frequency matrix for a JASPAR matrix ID. Use to check if a variant allele disrupts a high-information-content position.
ReMap_get_transcription_factor_binding: gene_name (TF), cell_type, limit. Same tool used for CTCF in Phase 2 — applies to any TF.
STRING_get_functional_annotations: identifiers (gene name), species (9606), category ("Process"/"Function"/"KEGG"). Returns GO/KEGG/Reactome annotations for regulatory context.
RegulomeDB_query_variant: rsid (e.g., "rs4994"). Returns probability, ranking (1a = strongest, 7 = weakest), and tissue-specific scores.
result = tu.tools.RegulomeDB_query_variant(rsid="rs4994")
score = result["data"]["regulome_score"]
# score["ranking"]: "1a" (eQTL + TF + motif + DNase) ... "7" (no evidence)
# score["probability"]: 0.0–1.0
top_tissues = sorted(score["tissue_specific_scores"].items(), key=lambda x: float(x[1]), reverse=True)[:5]
Rankings 1a–1f all have eQTL evidence. Rankings 2a–3b have TF binding without eQTL. Rankings 4–7 have decreasing evidence. Use ranking <= 2b as a threshold for "strong regulatory support."
Combine evidence tiers before reporting:
Convergence of T1+T2 evidence from independent sources (e.g., ENCODE ChIP-seq overlapping a RegulomeDB 1a variant with GTEx eQTL) constitutes strong evidence for regulatory function. Contradictions between layers (e.g., high RegulomeDB score but no eQTL) should be explicitly noted.
| Phase | Primary Tool | Fallback | |-------|-------------|----------| | Histone ChIP-seq | ENCODE_search_histone_experiments | GEO_search_chipseq_datasets | | RNA-seq | ENCODE_search_rnaseq_experiments (total RNA-seq) | retry with polyA plus RNA-seq | | ATAC-seq | ENCODE_search_chromatin_accessibility | GEO_search_atacseq_datasets | | cCREs | UCSC_get_encode_cCREs | SCREEN_get_regulatory_elements | | eQTLs | GTEx_get_single_tissue_eqtls | eQTL_get_associations (EBI) | | Expression | GTEx_get_expression_summary | GTEx_get_median_gene_expression | | TF motifs | jaspar_search_matrices | ReMap_get_transcription_factor_binding | | Variant scoring | RegulomeDB_query_variant | combine eQTL + TF binding manually |
tools
Post-market safety surveillance and recall/adverse-event RETRIEVAL across the full spectrum of FDA-regulated products that are NOT covered by the drug-AE signal skills: medical devices, food / dietary supplements / cosmetics, veterinary drugs, and drug supply (shortages). Orchestrates openFDA endpoints (MAUDE device adverse events + device recalls + 510(k), CAERS food/supplement/ cosmetic adverse events, veterinary adverse events, drug shortages, and cross-product enforcement/recall reports). USE WHEN the user asks: "are there adverse events for [device / pacemaker / infusion pump / insulin pump]", "device recalls for [firm/product]", "supplement / vitamin / cosmetic adverse reactions", "is [drug] in shortage", "what injectables are on shortage", "veterinary / animal adverse events for [drug] in [dog/cat/horse]", "food recall for listeria", "MAUDE report for [device]", "CAERS reactions for [brand]". DO NOT USE for drug adverse-event SIGNAL detection or disproportionality (PRR / ROR / IC) or drug-AE association scoring — that is `tooluniverse-pharmacovigilance` / `tooluniverse-adverse-event-detection`. This skill is multi-product surveillance and retrieval, not drug-AE statistical signal mining.
tools
--- name: tooluniverse-phewas description: Cross-ancestry / cross-biobank phenome-wide association (PheWAS) and replication. Given ONE variant (rsID) or ONE gene, look up every phenotype it associates with across European/UK (UKB-TOPMed), Finnish (FinnGen), Japanese (BioBank Japan), and Taiwanese (TPMI) biobanks, plus exome-wide gene-burden PheWAS (Genebass), then judge whether an association replicates across ancestries or is population-specific. Use whenever the user asks "what else is this va
tools
Dereplicate a putative natural product and assign its chemical taxonomy. Use to answer "is [compound] a known natural product", "what microbe/organism produces [compound]", "what chemical class is [compound]", "dereplicate this metabolite (by formula/exact mass/InChIKey/SMILES)", or "classify this molecule into ChemOnt". Searches NPAtlas for known microbial natural products (producing organism + literature reference), assigns the ChemOnt kingdom→superclass→class→subclass hierarchy via ClassyFire, resolves systematic IUPAC names to structure via OPSIN, and cross-references identity in PubChem. NOT for general drug/compound identity or ADMET (use tooluniverse-chemical-compound-retrieval / tooluniverse-small-molecule-discovery) and NOT for metabolomics pathway/enrichment analysis (use tooluniverse-metabolomics skills).
tools
Genome-ASSEMBLY discovery, QC, and replicon mapping for any organism (bacteria, archaea, fungi, and beyond) using NCBI Datasets. Resolves an organism name or taxid to assemblies, picks the reference/representative or best-quality assembly, pulls assembly QC metrics (total length, contig/scaffold N50, contig count, GC%, assembly level, RefSeq category), enumerates chromosomes and plasmids via per-replicon sequence reports, and compares candidate assemblies on quality. Use for "what genomes are available for [organism]", "assembly stats / N50 / GC content for [GCF_/GCA_ accession]", "how many plasmids does [strain] have", "compare assemblies for [species]", "find the reference genome for [taxon]", "is this assembly Complete Genome or just contigs". NOT for gene-level orthology/synteny (use tooluniverse-comparative-genomics), plant gene structure (use tooluniverse-plant-genomics), de novo assembly from raw reads (no tool exists), or taxonomy-only name/lineage lookups.