skills/tooluniverse-meta-analysis/SKILL.md
Meta-analysis / evidence synthesis — pool effect sizes across studies (odds ratios, risk ratios, hazard ratios, mean differences, correlations, GWAS betas) with fixed- or random-effects models, quantify heterogeneity (Q, I², τ²), and build a forest plot. Use when you have results from MULTIPLE studies and need a single pooled estimate, or to synthesize evidence from a systematic review / multiple GWAS / replicated experiments. Handles the error-prone effect-size + standard-error preparation (converting OR/HR/CI, two-group means±SD, proportions, and correlations into the (effect, SE) the pooling step needs).
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add mims-harvard/tooluniverse tooluniverse-meta-analysis`
Pool quantitative results from multiple studies into one estimate, and judge how consistent the studies are. This is the statistical half of a systematic review (the literature-collection half is tooluniverse-literature-deep-research).
Do NOT use it to find the studies — use tooluniverse-literature-deep-research / the literature tools for that, then bring the extracted numbers here. Before trusting any input study, consider checking it with Crossref_check_retraction.
1. Extract (effect, SE) per study ← THE ERROR-PRONE STEP
2. Pick fixed vs random effects
3. Pool: MetaAnalysis_run
4. Read heterogeneity (I², Q, τ²)
5. Forest plot + interpret
The pooling step needs an effect size on an additive scale and its standard error. Ratio measures (OR/RR/HR) must be log-transformed first. Most reported numbers give you a CI, not an SE — derive the SE from the CI.
| What the paper reports | effect_size | se |
|---|---|---|
| OR / RR / HR with 95% CI [L, U] | ln(point) | (ln(U) − ln(L)) / (2 × 1.96) |
| OR / RR / HR with a p-value (no CI) | ln(point) | \|ln(point)\| / z_from_p (two-sided z) |
| GWAS / regression β with SE | β (as reported) | the reported SE |
| Two groups, means + SDs + n₁,n₂ | Hedges' g (see script) | SE of g (see script) |
| Single proportion p, n | logit(p)=ln(p/(1−p)) | sqrt(1/(np) + 1/(n(1−p))) |
| Pearson correlation r, n | Fisher z = atanh(r) | 1 / sqrt(n − 3) |
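
The closed-form rows of the table above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the helper script's code: the function names are hypothetical, and the confidence interval in the usage line is made up for demonstration.

```python
import math
from statistics import NormalDist

def z_for_ci(level=0.95):
    """Two-sided normal critical value (about 1.96 for a 95% CI)."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def ratio_ci_to_effect(point, ci_low, ci_high, level=0.95):
    """OR / RR / HR with a CI -> (log effect, SE on the log scale)."""
    effect = math.log(point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_for_ci(level))
    return effect, se

def ratio_p_to_effect(point, p_value):
    """OR / RR / HR with a two-sided p-value -> (log effect, SE).
    Undefined when p_value is 1 (z = 0); callers should guard against that."""
    effect = math.log(point)
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return effect, abs(effect) / z

def proportion_to_effect(p, n):
    """Single proportion -> (logit(p), SE of the logit)."""
    return math.log(p / (1 - p)), math.sqrt(1 / (n * p) + 1 / (n * (1 - p)))

def correlation_to_effect(r, n):
    """Pearson r -> (Fisher z, SE of Fisher z)."""
    return math.atanh(r), 1 / math.sqrt(n - 3)

# Hypothetical study: OR 1.42 with a 95% CI of 1.10 to 1.83.
effect, se = ratio_ci_to_effect(1.42, 1.10, 1.83)
print(round(effect, 3), round(se, 3))
```

Passing `level=0.90` or `level=0.99` swaps the divisor for CIs reported at other confidence levels.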
Critical rules

- Pool ratio measures on the log scale: use ln(OR) and exponentiate the pooled result back. The script does this for you.
- 2 × 1.96 assumes a 95% CI; use 2 × 1.645 for 90%, 2 × 2.576 for 99%.

The helper script does these conversions; prefer it over hand math:
python skills/tooluniverse-meta-analysis/scripts/meta_analysis.py --input studies.csv
# studies.csv columns (use the set that matches your data):
# name, or, ci_low, ci_high (ratio + CI)
# name, beta, se (already on log/linear scale)
# name, mean1, sd1, n1, mean2, sd2, n2 (two-group means -> Hedges' g)
# name, r, n (correlation -> Fisher z)
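
For the two-group means row, the conversion to Hedges' g can be sketched as follows. This uses the common textbook formulas (pooled SD, Cohen's d, and the small-sample correction J = 1 − 3/(4·df − 1)); the helper script may use a different variant, and the function name and input numbers here are hypothetical.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Two-group means + SDs -> (Hedges' g, SE of g). Illustrative sketch only."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (mean1 - mean2) / sd_pooled            # Cohen's d
    j = 1 - 3 / (4 * df - 1)                   # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j * math.sqrt(var_d)

# Hypothetical study: treatment 10 ± 2 (n=20) vs control 8 ± 2 (n=20).
g, se_g = hedges_g(10, 2, 20, 8, 2, 20)
print(round(g, 3), round(se_g, 3))
```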

| Use fixed-effects when | Use random-effects when |
|---|---|
| Studies estimate the same true effect (e.g. exact replications, one trial split by site) | Studies differ in population/design/dose (the usual real-world case) |
| I² is low (<25%) | I² is moderate–high, or studies are clinically heterogeneous |

When unsure, report random-effects (DerSimonian–Laird) as primary — it is the conservative default and widens the CI to reflect between-study variance.
tu run MetaAnalysis_run '{"method":"random","studies":[
{"name":"Smith 2019","effect_size":0.41,"se":0.12},
{"name":"Lee 2021","effect_size":0.67,"se":0.18},
{"name":"Garcia 2023","effect_size":0.33,"se":0.10}]}'
Returns pooled_effect, pooled_se, pooled_ci_lower/upper, pooled_z, pooled_p_value, a heterogeneity block (Q, Q_df, Q_p_value, I_squared, tau_squared), and per_study weights + CIs.
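
To make the returned quantities concrete, here is a minimal sketch of inverse-variance pooling with the DerSimonian–Laird estimate of τ², applied to the three example studies. It is not the implementation behind `MetaAnalysis_run`; the `pool` function and the flat dictionary it returns are assumptions that borrow the field names listed above for readability.

```python
import math
from statistics import NormalDist

def pool(studies, method="random"):
    """Inverse-variance pooling; DerSimonian-Laird tau^2 when method='random'."""
    y = [s["effect_size"] for s in studies]
    v = [s["se"] ** 2 for s in studies]
    w = [1 / vi for vi in v]                       # fixed-effects weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if method == "random":
        w = [1 / (vi + tau2) for vi in v]          # add between-study variance
        sw = sum(w)
    est = sum(wi * yi for wi, yi in zip(w, y)) / sw
    se = math.sqrt(1 / sw)
    zcrit = NormalDist().inv_cdf(0.975)
    return {
        "pooled_effect": est,
        "pooled_se": se,
        "pooled_ci_lower": est - zcrit * se,
        "pooled_ci_upper": est + zcrit * se,
        "pooled_p_value": 2 * (1 - NormalDist().cdf(abs(est / se))),
        "Q": q, "Q_df": df, "I_squared": i2, "tau_squared": tau2,
    }

studies = [
    {"name": "Smith 2019", "effect_size": 0.41, "se": 0.12},
    {"name": "Lee 2021", "effect_size": 0.67, "se": 0.18},
    {"name": "Garcia 2023", "effect_size": 0.33, "se": 0.10},
]
result = pool(studies)
print({k: round(val, 4) for k, val in result.items()})
```

Calling `pool(studies, method="fixed")` on the same inputs gives a narrower interval, because the between-study variance is not added to each study's variance.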
Scale foot-gun: read this.
`MetaAnalysis_run` pools whatever scale you hand it and has no idea your inputs were ratios. For an OR/RR/HR you MUST pass the log-transformed `effect_size` + `se` from Step 1 (e.g. `ln(1.42)=0.351`, not `1.42`); feeding raw ratios silently produces a wrong pooled value with no error. And the values it returns, including its prose `interpretation` string, are on that same log scale. So: ignore the tool's `interpretation` field for ratios, and `exp()` the `pooled_effect` and CI bounds back to the OR/RR/HR scale yourself before reporting. The helper script avoids all of this: it takes raw ORs, tracks the scale, and prints results already back-transformed.
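
The back-transformation step can be sketched as below. The field names follow the return description above, but the flat dictionary layout and the numbers in `log_scale_result` are assumptions made for illustration; check the actual tool output before relying on this shape.

```python
import math

def back_transform(result):
    """Exponentiate a log-scale pooled estimate and its CI back to the ratio scale."""
    return {
        "ratio": math.exp(result["pooled_effect"]),
        "ci_lower": math.exp(result["pooled_ci_lower"]),
        "ci_upper": math.exp(result["pooled_ci_upper"]),
    }

# Hypothetical log-scale output for a pooled odds ratio.
log_scale_result = {
    "pooled_effect": 0.351,
    "pooled_ci_lower": 0.095,
    "pooled_ci_upper": 0.607,
}
reported = back_transform(log_scale_result)
print({k: round(val, 2) for k, val in reported.items()})
```

Only the point estimate and CI bounds are exponentiated; the standard error stays on the log scale, since `exp(se)` is not a standard error of the ratio.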

| I² | Heterogeneity | What it means |
|---|---|---|
| 0–25% | Low | Studies largely agree; fixed-effects is defensible |
| 25–50% | Moderate | Prefer random-effects; note the variability |
| 50–75% | Substantial | Random-effects; investigate sources (subgroup / meta-regression) |
| >75% | Considerable | Pooling may be inappropriate; explain why studies differ instead |

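
The I² bands can be encoded as a small lookup, for example to label results in a report. The table's ranges share their endpoints, so this sketch assumes a value exactly on a boundary (25, 50, 75) falls in the lower band; the function name is hypothetical.

```python
def classify_i2(i_squared):
    """Map I² (in percent) to the heterogeneity label used in the table.
    Assumption: boundary values (25, 50, 75) go to the lower band."""
    if not 0 <= i_squared <= 100:
        raise ValueError("I² must be a percentage between 0 and 100")
    if i_squared <= 25:
        return "Low"
    if i_squared <= 50:
        return "Moderate"
    if i_squared <= 75:
        return "Substantial"
    return "Considerable"

print(classify_i2(55))
```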
- `Q_p_value` < 0.10 → statistically significant heterogeneity (Q is low-powered, so 0.10 not 0.05).
- `tau_squared` is the between-study variance on the effect scale; a value > 0 is what random-effects adds over fixed.
- When heterogeneity is negligible (`Q_p_value` ≫ 0.10), especially with few studies: fixed and random-effects converge. Report random-effects as primary and note that fixed-effects agrees. Don't agonize over the model choice when both give essentially the same pooled estimate.

The script prints a text forest plot (per-study effect, CI, weight%, and the pooled diamond). Report, in order:
Example: "Across 3 cohorts (N=4,210), the pooled OR was 1.51 (95% CI 1.33–1.72, p=3.4×10⁻⁶), random-effects. Heterogeneity was substantial (I²=55%, Q p=0.11), so the random-effects model is reported; all three studies showed the same direction of effect."
- Check input studies for retraction (`Crossref_check_retraction`) first.
- `tooluniverse-literature-deep-research`: find and grade the studies to feed in.
- `tooluniverse-statistical-modeling`: single-study regression, Cox, ORs (see `references/cox_regression.md` for HR extraction).
- `tooluniverse-gwas-study-explorer` / `tooluniverse-gwas-finemapping`: GWAS-specific multi-cohort analysis.