skills/tooluniverse-diagnostic-test-evaluation/SKILL.md
Diagnostic test / biomarker accuracy — sensitivity, specificity, PPV, NPV, likelihood ratios, accuracy from a 2x2 table; ROC curve, AUC, and the optimal cutoff (Youden) for a continuous biomarker; and post-test probability via Bayes. Use when you have test results vs a gold standard (binary 2x2, or a continuous score + true labels) and need to judge how good the test is, pick a threshold, or compute the probability of disease given a result. Emphasizes the prevalence-dependence of PPV/NPV.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-diagnostic-test-evaluation
Judge how well a test or biomarker discriminates disease — at a fixed cutoff (2×2) or across all cutoffs (ROC) — and turn a result into a probability of disease.
| You have… | Go to |
|---|---|
| A 2×2 table (TP/FP/TN/FN) at a fixed cutoff | Step 1 (Epidemiology_diagnostic) |
| A continuous biomarker score + true labels | Step 2 (ROC / AUC / Youden, Python) |
| A test's sens/spec + a patient's pre-test probability | Step 3 (Epidemiology_bayesian) |
tu run Epidemiology_diagnostic '{"operation":"diagnostic","tp":90,"fp":10,"tn":180,"fn":20}'
Returns sensitivity, specificity, PPV, NPV, accuracy, LR_pos, LR_neg, and the sample prevalence.
| Metric | Question it answers | Depends on prevalence? |
|---|---|---|
| Sensitivity = TP/(TP+FN) | Of those WITH disease, what fraction test positive? | No |
| Specificity = TN/(TN+FP) | Of those WITHOUT disease, what fraction test negative? | No |
| PPV = TP/(TP+FP) | If positive, what's the chance of disease? | Yes, strongly |
| NPV = TN/(TN+FN) | If negative, what's the chance of being disease-free? | Yes |
| LR+ = sens/(1−spec) | How much a positive raises the odds of disease | No |
| LR− = (1−sens)/spec | How much a negative lowers the odds | No |
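
The formulas in the table can be checked by hand with a few lines of Python. This is a minimal sketch, not the implementation behind `Epidemiology_diagnostic`; the function name `diagnostic_metrics` and the output keys are illustrative assumptions, and the counts are the ones from the example command above.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute 2x2 diagnostic accuracy metrics from raw cell counts.

    Hypothetical helper for illustration. It assumes every denominator
    is non-zero, except that a perfect specificity gives LR+ = infinity.
    """
    diseased = tp + fn   # everyone with disease per the gold standard
    healthy = tn + fp    # everyone without disease
    total = diseased + healthy
    sens = tp / diseased
    spec = tn / healthy
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "lr_pos": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_neg": (1 - sens) / spec,
        "prevalence": diseased / total,  # sample prevalence, not population
    }


metrics = diagnostic_metrics(tp=90, fp=10, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the PPV and NPV computed here are tied to the sample prevalence (110 of 300 in this example), which is the reason they do not transfer to populations with a different prevalence.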
The PPV/NPV trap. Sensitivity and specificity are properties of the test; PPV and NPV depend on the disease prevalence in the tested population. A test with great sens/spec can still have poor PPV in a low-prevalence (screening) setting. Never quote PPV/NPV from a case-control design: its prevalence is set by the sampling ratio (often 50/50) and is artificial. Compute them for the real-world prevalence with Epidemiology_bayesian (Step 3). Report sensitivity, specificity, and likelihood ratios as the prevalence-independent summary.
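
To see the prevalence dependence concretely, the sketch below holds sensitivity and specificity fixed at 90%/95% and recomputes PPV and NPV at three prevalences. The function name `predictive_values` and the chosen prevalences are illustrative assumptions, not part of the tool.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    Works on expected cell fractions of a 2x2 table rather than counts.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same test, three settings: screening, a referred population,
# and a 50/50 case-control sample.
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.95, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 1% prevalence the PPV falls to roughly 15%, while the 50/50 sample gives about 95%, which is why predictive values from a case-control sample overstate what a positive result means in screening.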
When the test is a continuous score, evaluate across all thresholds:
python skills/tooluniverse-diagnostic-test-evaluation/scripts/roc_analysis.py --input scores.csv
# scores.csv columns: label (1=disease, 0=healthy), score (continuous biomarker)
It reports AUC (with a bootstrap 95% CI), the Youden-optimal cutoff (max sensitivity+specificity−1) and its sens/spec, and a text ROC curve.
| AUC | Discrimination |
|---|---|
| 0.5 | no better than chance |
| 0.7–0.8 | acceptable |
| 0.8–0.9 | excellent |
| >0.9 | outstanding |
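
For readers who want to see what the reported quantities mean, here is a minimal standard-library sketch of AUC, the Youden cutoff, and a percentile bootstrap interval. It is not the code in `roc_analysis.py`; the function names, the toy data, and the conventions (higher score means more likely diseased, a result is positive when score >= cutoff) are assumptions made for illustration.

```python
import random


def auc_mann_whitney(labels, scores):
    """AUC as the probability that a diseased case outscores a healthy one
    (ties count one half). O(n_pos * n_neg), fine for small samples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Scan every observed score as a cutoff and return
    (J, cutoff, sensitivity, specificity) with J = sens + spec - 1 maximal."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best


def bootstrap_auc_ci(labels, scores, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for AUC (resampling subjects)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            aucs.append(auc_mann_whitney(ys, [scores[i] for i in idx]))
    aucs.sort()
    return aucs[int(0.025 * n_boot)], aucs[int(0.975 * n_boot) - 1]


# Toy data (made up): 1 = disease, 0 = healthy.
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]

auc = auc_mann_whitney(labels, scores)
j, cut, sens, spec = youden_cutoff(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores)
print(f"AUC={auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"Youden cutoff={cut} J={j:.2f} sens={sens:.2f} spec={spec:.2f}")
```

The Youden cutoff weights sensitivity and specificity equally; when a false negative is costlier than a false positive (or the reverse), a different cutoff may be the better clinical choice.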
Turn a result into the probability of disease for a given pre-test probability/prevalence:
tu run Epidemiology_bayesian '{"operation":"bayesian","prevalence":0.10,
"sensitivity":0.90,"specificity":0.95,"test_result":"positive"}'
Returns pre_test_odds, the LR, and post_test_probability. This is how you get the real-world PPV: plug the true prevalence in. (Example: a 90%/95% test at 10% prevalence gives a post-positive probability of only ~67%, not 95%.)
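
The arithmetic behind that example is the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. The sketch below reproduces it; the argument names mirror the JSON fields in the command above, but the function itself is a hypothetical illustration and not the implementation behind `Epidemiology_bayesian`.

```python
def post_test_probability(prevalence, sensitivity, specificity, test_result):
    """Return (pre_test_odds, likelihood_ratio, post_test_probability)."""
    if test_result == "positive":
        lr = sensitivity / (1 - specificity)   # LR+
    elif test_result == "negative":
        lr = (1 - sensitivity) / specificity   # LR-
    else:
        raise ValueError("test_result must be 'positive' or 'negative'")
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr
    return pre_odds, lr, post_odds / (1 + post_odds)


# 90% sensitive, 95% specific test at 10% pre-test probability.
pre_odds, lr, prob_pos = post_test_probability(0.10, 0.90, 0.95, "positive")
print(f"pre-test odds={pre_odds:.3f}  LR+={lr:.1f}  P(disease | +)={prob_pos:.3f}")

_, lr_neg, prob_neg = post_test_probability(0.10, 0.90, 0.95, "negative")
print(f"LR-={lr_neg:.3f}  P(disease | -)={prob_neg:.3f}")
```

For a positive result the returned probability is the PPV at that prevalence (about 0.667 here); for a negative result it is the residual probability of disease, that is, 1 − NPV.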
Related skills:
- tooluniverse-statistical-modeling: logistic regression that produces the score, ORs.
- tooluniverse-epidemiological-analysis: population-level risk, screening program metrics.
- tooluniverse-meta-analysis: pool diagnostic accuracy across studies.