plugin/skills/tooluniverse-diagnostic-test-evaluation/SKILL.md
Diagnostic test / biomarker accuracy — sensitivity, specificity, PPV, NPV, likelihood ratios, accuracy from a 2x2 table; ROC curve, AUC, and the optimal cutoff (Youden) for a continuous biomarker; and post-test probability via Bayes. Use when you have test results vs a gold standard (binary 2x2, or a continuous score + true labels) and need to judge how good the test is, pick a threshold, or compute the probability of disease given a result. Emphasizes the prevalence-dependence of PPV/NPV.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-diagnostic-test-evaluation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Judge how well a test or biomarker discriminates disease — at a fixed cutoff (2×2) or across all cutoffs (ROC) — and turn a result into a probability of disease.
| You have… | Go to |
|---|---|
| A 2×2 table (TP/FP/TN/FN) at a fixed cutoff | Step 1 (Epidemiology_diagnostic) |
| A continuous biomarker score + true labels | Step 2 (ROC / AUC / Youden, Python) |
| A test's sens/spec + a patient's pre-test probability | Step 3 (Epidemiology_bayesian) |
tu run Epidemiology_diagnostic '{"operation":"diagnostic","tp":90,"fp":10,"tn":180,"fn":20}'
Returns sensitivity, specificity, PPV, NPV, accuracy, LR_pos, LR_neg, and the sample prevalence.
| Metric | Question it answers | Depends on prevalence? |
|---|---|---|
| Sensitivity = TP/(TP+FN) | Of those WITH disease, what fraction test positive? | No |
| Specificity = TN/(TN+FP) | Of those WITHOUT disease, what fraction test negative? | No |
| PPV = TP/(TP+FP) | If positive, what's the chance of disease? | Yes, strongly |
| NPV = TN/(TN+FN) | If negative, what's the chance of being disease-free? | Yes |
| LR+ = sens/(1−spec) | How much a positive raises the odds of disease | No |
| LR− = (1−sens)/spec | How much a negative lowers the odds | No |
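The formulas in the table can be checked by hand with a few lines of Python. This is a minimal sketch, not the implementation behind `Epidemiology_diagnostic`; the function name `diagnostic_metrics` and the output key names are assumptions chosen for illustration, and the counts are the ones from the example command above.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute 2x2 diagnostic accuracy metrics (hypothetical helper).

    Assumes every denominator is non-zero; a table with an empty row
    or column would need explicit handling.
    """
    total = tp + fp + tn + fn
    sens = tp / (tp + fn)   # fraction of diseased who test positive
    spec = tn / (tn + fp)   # fraction of healthy who test negative
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "prevalence": (tp + fn) / total,  # prevalence in this sample only
    }


metrics = diagnostic_metrics(tp=90, fp=10, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts, sensitivity is about 0.818, specificity about 0.947, PPV and NPV are both 0.900, and LR+ is about 15.5. The sample prevalence of about 0.367 is what makes the PPV look high here.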
The PPV/NPV trap. Sensitivity and specificity are properties of the test; PPV and NPV also depend on the disease prevalence in the tested population. A test with excellent sens/spec can still have poor PPV in a low-prevalence (screening) setting. Never quote PPV/NPV from a case-control design (its 50/50 prevalence is artificial). Instead, compute them for the real-world prevalence with Epidemiology_bayesian (Step 3). Report sensitivity, specificity, and likelihood ratios as the prevalence-independent summary.
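To see the prevalence dependence concretely, the sketch below holds sensitivity and specificity fixed and sweeps the prevalence. The helper name `ppv_npv` and the chosen prevalence values are illustrative assumptions, not part of the skill's tools.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence (hypothetical helper)."""
    tp = sensitivity * prevalence               # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence         # expected false-negative fraction
    tn = specificity * (1 - prevalence)         # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)   # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same 90% / 95% test, four different populations.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv, npv = ppv_npv(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:<5}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

At 50% prevalence (the artificial case-control situation) the PPV is about 0.95, while at 1% prevalence the same test has a PPV of roughly 0.15.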
When the test is a continuous score, evaluate across all thresholds:
python skills/tooluniverse-diagnostic-test-evaluation/scripts/roc_analysis.py --input scores.csv
# scores.csv columns: label (1=disease, 0=healthy), score (continuous biomarker)
It reports AUC (with a bootstrap 95% CI), the Youden-optimal cutoff (max sensitivity+specificity−1) and its sens/spec, and a text ROC curve.
| AUC | Discrimination |
|---|---|
| 0.5 | no better than chance |
| 0.7–0.8 | acceptable |
| 0.8–0.9 | excellent |
| >0.9 | outstanding |
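For intuition about what AUC and the Youden cutoff measure, here is a small standard-library sketch. It is not the code in `roc_analysis.py`: the function names are hypothetical, the data are made up, the bootstrap confidence interval is omitted, and it assumes a higher score means more likely diseased, with a sample called positive when `score >= threshold`.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        points.append((thr, tp / n_pos, 1 - fp / n_neg))
    return points


def auc(labels, scores):
    """AUC as the probability that a random diseased case outscores a random
    healthy one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return the ROC point maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda pt: pt[1] + pt[2] - 1)


# Made-up toy data: 6 healthy (label 0) and 5 diseased (label 1) subjects.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.5, 0.6, 0.7, 0.8, 0.9]

thr, sens, spec = youden_cutoff(labels, scores)
print(f"AUC = {auc(labels, scores):.3f}")
print(f"Youden cutoff = {thr} (sens={sens:.3f}, spec={spec:.3f})")
```

On this toy data the AUC is 29/30 (about 0.967) and the Youden cutoff is 0.5, where sensitivity is 1.0 and specificity is 5/6. The pairwise AUC is quadratic in sample size, which is fine for illustration but not for large datasets.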
Turn a result into the probability of disease for a given pre-test probability/prevalence:
tu run Epidemiology_bayesian '{"operation":"bayesian","prevalence":0.10,
"sensitivity":0.90,"specificity":0.95,"test_result":"positive"}'
Returns pre_test_odds, the LR, and post_test_probability. This is how you get the real-world PPV: plug the true prevalence in. (Example: a 90%/95% test at 10% prevalence gives a post-positive probability of only ~67%, not 95%.)
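The arithmetic behind that example is the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. The sketch below reproduces it; the function name `post_test_probability` is an assumption for illustration and is not the tool's actual code, although its parameters mirror the fields in the command above.

```python
def post_test_probability(prevalence, sensitivity, specificity, test_result="positive"):
    """Return (pre_test_odds, likelihood_ratio, post_test_probability).

    Hypothetical helper; assumes 0 < prevalence < 1 and 0 < specificity < 1.
    """
    pre_test_odds = prevalence / (1 - prevalence)
    if test_result == "positive":
        lr = sensitivity / (1 - specificity)    # LR+
    elif test_result == "negative":
        lr = (1 - sensitivity) / specificity    # LR-
    else:
        raise ValueError("test_result must be 'positive' or 'negative'")
    post_test_odds = pre_test_odds * lr
    return pre_test_odds, lr, post_test_odds / (1 + post_test_odds)


for result in ("positive", "negative"):
    odds, lr, prob = post_test_probability(0.10, 0.90, 0.95, result)
    print(f"{result}: pre-test odds={odds:.3f}, LR={lr:.3f}, post-test probability={prob:.3f}")
```

For the positive result this gives pre-test odds of about 0.111, LR+ of 18, and a post-test probability of about 0.667, matching the ~67% quoted above. For a negative result the probability of disease falls to about 1.2%.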
- tooluniverse-statistical-modeling: logistic regression that produces the score, ORs.
- tooluniverse-epidemiological-analysis: population-level risk, screening program metrics.
- tooluniverse-meta-analysis: pool diagnostic accuracy across studies.