clinical-biostatistics/missing-data-sensitivity/SKILL.md
Implements missing-data sensitivity analyses for confirmatory clinical trials including MMRM under MAR (with Kenward-Roger correction), reference-based multiple imputation (J2R, CR, CIR, LMCF per Carpenter-Roger 2013), Permutt delta-adjustment / tipping-point analysis, pattern-mixture identifying restrictions (CCMV, NCMV, ACMV), and the Cro vs Bartlett variance debate. Use when handling missing primary or secondary endpoint data in regulatory submissions following NRC 2010 and ICH E9(R1).
Reference examples tested with: R mmrm 0.3+ (Roche/openpharma), R rbmi 1.5+ (Roche/Bayer via insightsengineering), R mice 3.16+, R mitools 2.4+, Python sklearn 1.4+, statsmodels 0.14+.
Before using code patterns, verify installed versions match. If versions differ:
- R: packageVersion('<pkg>') then ?function_name
- Python: pip show <package> then help(module.function)

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Handle missing data in a confirmatory clinical trial" -> Pre-specify the missing-data assumption per ICH E9(R1); execute the primary analysis under the chosen assumption (typically MAR via MMRM or MI); run clinically-articulable MNAR sensitivity analyses (reference-based MI per Carpenter-Roger 2013); report the tipping delta that would overturn the conclusion (Permutt 2016).
The U.S. National Research Council Panel ("The Prevention and Treatment of Missing Data in Clinical Trials," 2010; chaired by Roderick Little; Little, D'Agostino, Cohen et al 2012 NEJM 367:1355): 18 recommendations grouped as prevention (Recs 1-7), analysis (Recs 8-14), sensitivity (Recs 15-18).
Key recommendations:
ICH E9(R1) (2019) forces the ordering: define the estimand (5 attributes including ICE strategy) BEFORE choosing the analysis. The missing-data strategy maps to the ICE handling strategy:
| Mechanism | Definition | Testable? | Valid method |
|-----------|------------|-----------|--------------|
| MCAR | Independent of all data | Partially (Little's 1988 test) | Complete-case unbiased but loses power |
| MAR | Depends on observed data only | NOT testable | MMRM under MAR; MI under MAR |
| MNAR | Depends on unobserved values | NOT testable | Sensitivity analysis (J2R, CR, CIR, tipping point, pattern-mixture, selection model) |
Critical philosophical point: MAR vs MNAR cannot be distinguished from observed data alone. This is fundamental. Pre-specify the assumed mechanism in the SAP based on clinical reasoning, not the data.
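A small simulation can make the three mechanisms concrete. The sketch below is illustrative only: the data-generating model, the dropout coefficients, and the name `simulate_mechanisms` are assumptions introduced here, not taken from any cited reference. It compares the complete-case mean of an outcome with a covariate-adjusted mean (regression on an always-observed covariate, averaged over all subjects), which is the kind of estimator that MAR licenses.

```python
import math
import random
import statistics


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_mechanisms(n=20000, seed=0):
    """Complete-case and covariate-adjusted means of y under three dropout mechanisms."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # always observed covariate
    y = [0.8 * xi + rng.gauss(0.0, 0.6) for xi in x]   # outcome; true mean is 0

    # Probability that y is missing (all coefficients are illustrative assumptions)
    p_missing = {
        "MCAR": lambda xi, yi: 0.3,                      # independent of all data
        "MAR": lambda xi, yi: expit(-1.0 + 1.5 * xi),    # depends on observed x only
        "MNAR": lambda xi, yi: expit(-1.0 + 1.5 * yi),   # depends on the unobserved y
    }
    x_bar = statistics.fmean(x)
    out = {}
    for name, p in p_missing.items():
        kept = [(xi, yi) for xi, yi in zip(x, y) if rng.random() > p(xi, yi)]
        xo = [xi for xi, _ in kept]
        yo = [yi for _, yi in kept]
        mx, my = statistics.fmean(xo), statistics.fmean(yo)
        # Least-squares line fitted to completers, then evaluated at the full-sample mean of x
        slope = sum((a - mx) * (b - my) for a, b in kept) / sum((a - mx) ** 2 for a in xo)
        intercept = my - slope * mx
        out[name] = {
            "complete_case": my,
            "covariate_adjusted": intercept + slope * x_bar,
        }
    return out


results = simulate_mechanisms()
for name, est in results.items():
    print(name, {k: round(v, 3) for k, v in est.items()})
```

In this setup the complete-case mean should sit near zero only under MCAR, the covariate-adjusted mean should recover zero under MAR, and neither should do so under MNAR. Nothing in the observed data distinguishes the MAR run from the MNAR run, which is why the mechanism has to be argued clinically.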
| Method | Estimand strategy | Identification | Variance | Strength | Fails when |
|--------|-------------------|----------------|----------|----------|------------|
| Complete-case analysis | MAR or MCAR | MAR | Standard | Simple; valid under MCAR | Loses power; biased under MAR with informative covariates |
| LOCF | Implicit MNAR | Assumes flat post-ICE trajectory | Standard | Historically used | Biased even under MCAR (Mallinckrodt 2008); NRC 2010 rejects |
| MMRM with UN+KR | Hypothetical via MAR | MAR | Kenward-Roger SE | FDA-favoured continuous longitudinal | High differential dropout makes MAR implausible |
| Multiple imputation (MAR) | Hypothetical via MAR | MAR | Rubin's rules | Flexible; handles arbitrary patterns | In sklearn, sample_posterior must be enabled and the estimator must support return_std (the default BayesianRidge does) |
| Reference-based MI (J2R, CR, CIR, LMCF) | Treatment policy / MNAR sensitivity | Clinical narrative (e.g., "after withdrawal patient resembles placebo") | Cro 2019 information-anchored vs Wolbers 2022 frequentist (active debate) | FDA-acceptable; clinician-interpretable | Variance choice contested; Rubin over-conservative, jackknife may inflate Type-I |
| Tipping-point delta-adjustment | MNAR sensitivity | Pre-specified delta function | Standard | Direct regulatory question: "how bad would missing data have to be?" | Delta interpretation depends on scale |
| Pattern-mixture with CCMV | Pattern-mixture MNAR | "Missing pattern resembles completer pattern" | Multiple imputation | Identifies missing cells via restriction | CCMV may be implausible if completers are atypical |
| Pattern-mixture with NCMV | Pattern-mixture MNAR | "Missing pattern resembles neighbouring pattern" | MI | Less extreme assumption than CCMV | Choice of "neighbouring" is ambiguous |
| Pattern-mixture with ACMV | Pattern-mixture MNAR (equivalent to MAR) | "Missing pattern resembles available cases" | MI | Equivalent to MAR (Molenberghs 1998) | Reduces to standard MAR analysis |
| Selection model (Diggle-Kenward 1994) | MNAR | Joint normal outcome + logistic dropout | Likelihood | Theoretically elegant | Conclusions driven by untestable parametric assumptions; FDA discouraged |
| Retrieved-dropout MI | Treatment policy | Sampling from observed post-discontinuation data | MI variance | Empirically grounded (no model assumption for missing) | Requires actual post-ICE data collection |
Postdoc reading list:
| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| Continuous endpoint, monotone missingness, MAR plausible | MMRM with UN + Kenward-Roger via R mmrm | FDA-favoured Mallinckrodt 2008/2014 default |
| Continuous endpoint, ICE = treatment discontinuation, treatment policy estimand | Hybrid: J2R for discontinuation ICEs, MMRM-MAR for other (Aprocitentan 2024 precedent) | De facto FDA standard 2024-2025 |
| Continuous endpoint, treatment policy + post-ICE data collected | Retrieved-dropout MI (Wegovy STEP precedent) | Empirically grounded; FDA 2025 obesity guidance endorses |
| Continuous endpoint, MNAR sensitivity required | J2R via rbmi with both Rubin's variance AND frequentist (CMI+jackknife) | Cro vs Bartlett debate; report both for safety |
| Continuous endpoint, tipping-point analysis | Permutt 2016 delta-adjustment in active arm only | Direct regulatory question; pre-specify delta range |
| Binary endpoint with missing primary | MI with logistic imputation; modified Poisson for marginal RR | Per FDA 2023 covariate adjustment |
| Time-to-event with informative censoring | IPCW (Robins) or sensitivity under composite | Censoring-as-event composite; cite Lewis 2023 |
| Very high missingness (>40%) | Report as hypothesis-generating; multiple sensitivity analyses | NRC 2010 caveat |
| Aducanumab-style differential dropout pattern | MAR primary is questionable; treatment-policy with reference-based MI primary | Lessons from FDA 6-1 AdCom decision (2021) |
| Pediatric trial with hard-to-retain population | Retrieved-dropout MI + Bayesian extrapolation from adult data | FDA 2025 obesity pediatric extension guidance |
Goal: Estimate the treatment-by-visit contrast at the primary timepoint for a continuous longitudinal endpoint under MAR with valid Type-I control in small/moderate trials.
Approach: Fit MMRM with unstructured covariance, REML, Kenward-Roger DF correction via the Roche/openpharma mmrm package; pre-specify the convergence fallback hierarchy in the SAP.
Mallinckrodt 2008/2014: for continuous longitudinal endpoints under monotone (or near-monotone) MAR, an MMRM with treatment + visit + treatment-by-visit + baseline + baseline-by-visit, UN covariance, REML, contrast at primary timepoint -- is consistent and FDA-preferred.
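library(mmrm)
# visit and subject must be factors in trial_data
# mmrm 0.3+: `method` selects the degrees of freedom, `vcov` the covariance adjustment
fit <- mmrm(
  formula = change_from_baseline ~ baseline + arm * visit + us(visit | subject),
  data = trial_data,
  method = "Kenward-Roger",
  vcov = "Kenward-Roger-Linear", # linearised KR; the variant matched to SAS PROC MIXED
  reml = TRUE
)
summary(fit)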
Kenward-Roger flavour question:
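In mmrm 0.3+ the flavour is selected through vcov, alongside method = "Kenward-Roger" for the degrees of freedom:

- vcov = "Kenward-Roger" -- full second-order KR (Kenward-Roger 1997 Biometrics 53:983)
- vcov = "Kenward-Roger-Linear" -- drops second-order Cholesky-derivative term; matches SAS PROC MIXED bit-for-bit

Convergence fallback hierarchy (pre-specify in SAP): UN+KR -> UN+Satterthwaite -> heterogeneous Toeplitz -> AR(1).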
The Olarte Parra unification (2022): MMRM under MAR IS a causal hypothetical estimand by g-formula equivalence under specific identifying assumptions. The issue is articulation, not statistical machinery.
Carpenter-Roger-Kenward 2013 operationalises MNAR sensitivity as clinical narrative, not numeric delta:
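library(rbmi)
# Variable roles for the imputation model
vars <- set_vars(
  outcome = 'CHG', # change from baseline
  visit = 'AVISIT',
  subjid = 'USUBJID',
  group = 'ARM',
  covariates = c('BASE', 'STRATA1'),
  strategy = 'strategy' # column of data_ice holding the imputation strategy
)
# Assumption: ice_data has one row per subject with an ICE, with columns USUBJID,
# AVISIT (first visit affected) and strategy = 'JR' (jump to reference).
# Without data_ice, draws() treats all missing data as MAR.
# Draws -> Impute -> Analyse -> Pool pipeline
draws_obj <- draws(data = trial_data, data_ice = ice_data, vars = vars,
                   method = method_bayes(n_samples = 100)) # method belongs to draws(), not set_vars()
imputed_j2r <- impute(draws_obj,
                      references = c('Active' = 'Placebo', 'Placebo' = 'Placebo'))
analyses <- analyse(imputed_j2r,
                    fun = ancova,
                    vars = set_vars(subjid = 'USUBJID', outcome = 'CHG', visit = 'AVISIT',
                                    group = 'ARM', covariates = c('BASE')))
result <- pool(analyses) # Rubin's rules pooling
as.data.frame(result) # pooled estimates, SEs, CIs and p-values as a table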
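What the reference-based strategies assume can be shown for the smallest case: two visits, with an active-arm patient who is observed at visit 1 and drops out before visit 2. The sketch below is a minimal illustration, assuming a single covariance matrix shared by both arms; the function name `reference_based_conditional` and every number are hypothetical. Each strategy defines a joint mean for the dropout, and the imputation distribution for visit 2 is the usual bivariate-normal conditional given the observed visit 1.

```python
def reference_based_conditional(y1, strategy, mu_active, mu_ref, s11, s12, s22):
    """Conditional mean and variance of visit 2 given visit 1 for an active-arm dropout.

    mu_active, mu_ref: (visit 1 mean, visit 2 mean) per arm.
    s11, s12, s22: entries of a covariance matrix assumed common to both arms.
    """
    # Joint mean (visit 1, visit 2) implied by each strategy before conditioning
    joint_mean = {
        "MAR": (mu_active[0], mu_active[1]),       # stays on the active trajectory
        "J2R": (mu_active[0], mu_ref[1]),          # jumps to the reference mean
        "CR": (mu_ref[0], mu_ref[1]),              # reference profile throughout
        "CIR": (mu_active[0], mu_active[0] + (mu_ref[1] - mu_ref[0])),  # reference increments
        "LMCF": (mu_active[0], mu_active[0]),      # last active mean carried forward
    }[strategy]
    beta = s12 / s11                               # regression of visit 2 on visit 1
    cond_mean = joint_mean[1] + beta * (y1 - joint_mean[0])
    cond_var = s22 - s12 ** 2 / s11
    return cond_mean, cond_var


# Hypothetical means (change from baseline, lower = better) and covariance
implied = {
    s: reference_based_conditional(
        y1=-4.0, strategy=s,
        mu_active=(-2.0, -6.0), mu_ref=(-1.0, -3.0),
        s11=16.0, s12=10.0, s22=25.0,
    )
    for s in ("MAR", "J2R", "CR", "CIR", "LMCF")
}
for s, (mean, var) in implied.items():
    print(f"{s:5s} conditional mean = {mean:.3f}, conditional variance = {var:.3f}")
```

The conditional variance is identical across strategies here; only the mean moves, which is the sense in which the strategies encode a clinical narrative rather than a numeric delta.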
The single most active argument in current biostatistics.
Cro/Carpenter/Kenward 2019 JRSS-A 182:623: Rubin's-rules variance applied to J2R/CR/CIR is approximately information-anchored — the relative loss of information from missingness in sensitivity analysis matches the relative loss in MAR primary analysis. True repeated-sampling variance is "information positive" because reference-based MI borrows from reference arm, reducing marginal variance of active arm BELOW what an MAR analysis with same missingness would give.
Philosophical position (Cro et al): a sensitivity analysis should not import information the primary analysis didn't have; if borrowing from placebo makes active CI narrower, it's no longer anchored.
The clearer framing for postdocs: the dispute is NOT about "which variance is conservative" — it is about "which variance answers the right question." Cro: if the sensitivity analysis is meant to assess robustness to MAR, the variance should be the one that matches the informational scope of the primary MAR analysis (Rubin's, which under-states borrowing). Bartlett: if J2R is taken as the true data-generating mechanism, then the variance of inference under that mechanism is the frequentist (jackknife) variance, which delivers nominal Type-I. Both can be correct under different framings of "what is the sensitivity analysis for?"
Bartlett 2021 Stat Biopharm Res 15(1):178 + Wolbers 2022 Pharm Stat counter: if J2R is the actual sampling model under which inference is made, the correct frequentist variance is the one delivering nominal Type-I and coverage — the jackknife/bootstrap variance, NOT Rubin's. Simulations: Bayesian MI + Rubin's gives Type-I 0.9-2.5% (over-conservative); CMI+jackknife gives 4.84-4.96% (nominal) under J2R.
Regulatory practice 2024-2026:
Permutt 2016 Stat Med 35:2876 (Permutt was head of FDA Division of Biometrics IV at the time): the regulator's question is not "what is a reasonable MNAR adjustment?" but "how bad would the missing data have to be in the active arm to overturn the significant primary result?"
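library(rbmi)
# imputed: imputation object returned by impute(); vars: the set_vars() object
# Delta-template: per-visit, per-arm, per-pattern delta
delta_grid <- seq(0, 20, by = 2) # delta range to scan
results <- list()
for (delta in delta_grid) {
  # delta and dlag need one entry per visit (4 visits assumed here). The applied
  # shift is the running sum of delta * dlag from the ICE visit onward, so
  # dlag = c(1, 1, 1, 1) adds one further delta at each post-ICE visit.
  dat_delta <- delta_template(imputed, delta = rep(delta, 4),
                              dlag = c(1, 1, 1, 1))
  # Shift the active arm only (group column assumed to be named ARM)
  dat_delta$delta[dat_delta$ARM != 'Active'] <- 0
  analyses <- analyse(imputed, delta = dat_delta, fun = ancova, vars = vars)
  pooled <- pool(analyses)
  results[[as.character(delta)]] <- pooled
}
# Find minimum delta that flips p > 0.05 -> tipping delta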
Delta-adjustment patterns:
Report tipping delta in residual SD units (FDA preference for cross-trial comparison), not raw outcome units.
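The logic of a tipping-point scan, including the conversion to residual SD units, can be sketched without any trial software. The example below is a deliberately simplified illustration on simulated data: one timepoint, imputation by draws from each arm's observed distribution (proper MI would also draw the parameters), a normal reference for the p-value, and a delta applied to imputed active-arm values only. All names and numbers are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(2016)
N_PER_ARM, M = 200, 50
TRUE_MEAN = {"active": 0.6, "placebo": 0.0}       # hypothetical effect, SD = 1
MISSING_RATE = {"active": 0.25, "placebo": 0.15}  # hypothetical dropout rates

# Simulated trial: None marks a missing outcome (higher outcome = better)
data = {
    arm: [rng.gauss(TRUE_MEAN[arm], 1.0) if rng.random() > MISSING_RATE[arm] else None
          for _ in range(N_PER_ARM)]
    for arm in TRUE_MEAN
}


def impute_once(values):
    """Fill missing values with draws from the arm's observed distribution.

    Returns (value, was_imputed) pairs so delta can be applied at the analysis step.
    """
    observed = [v for v in values if v is not None]
    mu, sd = statistics.fmean(observed), statistics.stdev(observed)
    return [(v, False) if v is not None else (rng.gauss(mu, sd), True) for v in values]


# Impute once, then reuse the same imputations for every delta
imputations = [{arm: impute_once(vals) for arm, vals in data.items()} for _ in range(M)]


def analyse(imputation, delta):
    """Difference in means after worsening imputed active-arm values by delta."""
    active = [v - delta if was_imputed else v for v, was_imputed in imputation["active"]]
    placebo = [v for v, _ in imputation["placebo"]]
    estimate = statistics.fmean(active) - statistics.fmean(placebo)
    variance = (statistics.variance(active) / len(active)
                + statistics.variance(placebo) / len(placebo))
    return estimate, variance


def pool(per_imputation):
    """Rubin's rules; the p-value uses a normal reference (simplification)."""
    estimates = [e for e, _ in per_imputation]
    within = statistics.fmean([v for _, v in per_imputation])
    between = statistics.variance(estimates)
    se = math.sqrt(within + (1 + 1 / len(estimates)) * between)
    estimate = statistics.fmean(estimates)
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2.0))
    return estimate, se, p_value


scan = []
for step in range(17):                      # delta = 0.00, 0.25, ..., 4.00
    delta = 0.25 * step
    scan.append((delta, *pool([analyse(imp, delta) for imp in imputations])))

# Smallest delta on the grid at which significance is lost
tipping_delta = next((d for d, _, _, p in scan if p > 0.05), None)

# Pooled within-arm SD of observed outcomes, for reporting delta in SD units
obs = {arm: [v for v in vals if v is not None] for arm, vals in data.items()}
residual_sd = math.sqrt(
    sum((len(v) - 1) * statistics.variance(v) for v in obs.values())
    / sum(len(v) - 1 for v in obs.values())
)
print("tipping delta (raw units):", tipping_delta)
if tipping_delta is not None:
    print("tipping delta (residual SD units):", round(tipping_delta / residual_sd, 2))
```

Reusing one set of imputations across the grid mirrors applying delta at the analysis step, so the pooled estimate falls linearly in delta and the scan isolates the effect of the shift itself.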
Pattern-mixture factorises joint distribution as observed-data distribution stratified by dropout pattern × pattern probability. Introduces unidentifiable parameters for unobserved cells, resolved by restrictions:
Molenberghs et al 1998 proved ACMV is exactly equivalent to MAR under monotone missingness — so pattern-mixture under ACMV is a reparameterisation of MAR analysis.
J2R/CR/CIR/LMCF are pattern-mixture models with reference-arm-based identifying restrictions.
Combines a multivariate normal response model with a logistic dropout model that depends on the unobserved current value.
Canonical critique (Diggle-Kenward 1994 discussion; Molenberghs/Kenward/Verbeke subsequent work): selection-model MNAR conclusions are driven not by data but by parametric assumptions that are empirically untestable — the difference between MAR and MNAR fit is identified entirely from joint normality assumption. A non-normal response will spuriously appear MNAR.
FDA position: selection models are used as sensitivity, never primary. FDA Division of Biometrics has repeatedly pushed back on selection models on the grounds that "the sponsor cannot tell me in clinical English what assumption I am being asked to accept." Reference-based MI is preferred because it is clinically articulable.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer
import statsmodels.formula.api as smf
import numpy as np
import pandas as pd

# Assumed to exist: df (trial DataFrame) and numeric_cols (numeric columns to impute)
n_imputations = 20 # rule: m >= 100 * FMI
# CRITICAL: sample_posterior=True needs an estimator whose predict() supports
# return_std; the default BayesianRidge does
imputer = IterativeImputer(max_iter=10, random_state=0, sample_posterior=True)
results = []
for i in range(n_imputations):
    imputer.set_params(random_state=i)  # a different posterior draw per imputation
    imputed = pd.DataFrame(imputer.fit_transform(df[numeric_cols]),
                           columns=numeric_cols, index=df.index)
    for col in ['ARM', 'sex']:
        imputed[col] = df[col].values  # re-attach non-numeric columns
    model = smf.logit('outcome ~ C(ARM, Treatment(reference="Placebo")) + age', data=imputed).fit(disp=0)
    # position 1 is the treatment coefficient (position 0 is the intercept)
    results.append({'coef': model.params.iloc[1], 'se': model.bse.iloc[1]})

# Rubin's rules
pooled_coef = np.mean([r['coef'] for r in results])
within_var = np.mean([r['se']**2 for r in results])
between_var = np.var([r['coef'] for r in results], ddof=1)
total_var = within_var + (1 + 1/n_imputations) * between_var
pooled_se = np.sqrt(total_var)
Critical caveats:
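- sample_posterior=True requires an estimator whose predict() accepts return_std (the default BayesianRidge does), so it is not available after switching to, e.g., RandomForest; with sample_posterior=False the fills are deterministic conditional means -> MI degenerates to single imputation
- Alternatives: miceforest or R mice/rbmi

For confirmatory regulatory work, prefer R rbmi or mice over Python sklearn: the SAS / R precedent is stronger and the variance theory is better developed.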
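The pooling step above stops at the total variance. A fuller version of Rubin's rules also yields the degrees of freedom and the fraction of missing information (FMI), which is the input to the m >= 100 * FMI rule. The helper below is a minimal stdlib sketch using the classical large-sample formulas; the name `rubin_pool` and the example numbers are hypothetical.

```python
import math
import statistics


def rubin_pool(estimates, std_errors):
    """Pool m point estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = statistics.fmean(estimates)                   # pooled estimate
    w = statistics.fmean([se ** 2 for se in std_errors])  # within-imputation variance
    b = statistics.variance(estimates)                    # between-imputation variance
    t = w + (1 + 1 / m) * b                               # total variance
    if b == 0:
        riv, df, fmi = 0.0, math.inf, 0.0
    else:
        riv = (1 + 1 / m) * b / w                         # relative increase in variance
        df = (m - 1) * (1 + 1 / riv) ** 2                 # classical Rubin degrees of freedom
        fmi = (riv + 2 / (df + 3)) / (riv + 1)            # fraction of missing information
    return {
        "estimate": q_bar,
        "se": math.sqrt(t),
        "total_variance": t,
        "df": df,
        "fmi": fmi,
        "suggested_m": math.ceil(100 * fmi),              # m >= 100 * FMI rule of thumb
    }


# Hypothetical log-odds-ratio estimates from five imputed datasets
pooled = rubin_pool([0.50, 0.55, 0.45, 0.52, 0.48], [0.10] * 5)
print(pooled)
```

If `suggested_m` exceeds the number of imputations actually run, the FMI estimate itself is noisy, so a reasonable workflow is to rerun with more imputations and recompute.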
Aducanumab: EMERGE and ENGAGE were both stopped early for futility; EMERGE high-dose was positive, ENGAGE negative. The primary analysis was MMRM under MAR. FDA Office of Biostatistics (Tristan Massie review) argued that futility-stop-induced missingness was NOT MAR (differential ARIA-driven unblinding). 6-1 AdCom against approval; the subsequent approval over-ruled OB. Textbook case showing that an MAR-based primary in a trial with high differential missingness is regulator-divisive.
Lesson: when dropout patterns differ by arm in clinically meaningful ways, MAR is questionable; primary should be treatment-policy with reference-based MI.
Aprocitentan: Mathur 2025 Pharm Stat (PMC12753554) documents the negotiation. FDA pushed back on MMRM-MAR primary; the accepted compromise was stratified imputation: J2R for treatment-discontinuation ICEs, MAR-MMRM for other missingness.
Lesson: this hybrid is now the de facto FDA standard for treatment-policy estimands. Pre-specify in SAP.
Wegovy STEP: retrieved-dropout MI was the primary analysis for the treatment-policy estimand. Missing body weight at week 68 was imputed by sampling from observed week-68 measurements among "retrieved dropouts" (patients who discontinued semaglutide but remained in follow-up). J2R-MI was supportive.
Lesson: RD-MI is now standard for chronic weight management. FDA 2025 obesity guidance explicitly endorses MI as primary.
sample_posterior=False in sklearn IterativeImputer -> set sample_posterior=True; verify estimator is BayesianRidge; consider R rbmi.

| Pattern | Likely cause | Action |
|---------|--------------|--------|
| MMRM-MAR CI overlaps null; J2R MI CI excludes null | Reference-based MI borrows information from reference arm, reducing active-arm marginal variance below what MAR analysis with same missingness would give (Cro 2019) | Both valid under their assumptions; choose by clinical plausibility (MAR vs MNAR after differential dropout); report both; cite estimand strategy |
| Bayesian MI + Rubin's variance Type-I ~1%; CMI+jackknife Type-I ~5% under J2R | Rubin's information-anchored, over-conservative (Cro 2019); jackknife frequentist nominal (Wolbers 2022); active methodological debate | Report both; flag as Cro vs Bartlett debate; FDA increasingly accepts jackknife as supportive |
| LOCF "conservative" sensitivity gives smaller effect than MMRM-MAR | LOCF assumes flat post-ICE trajectory; biased even under MCAR (Mallinckrodt 2008) | Replace LOCF with reference-based MI; cite NRC 2010 Rec 11; reframe sensitivity as "MNAR robustness" not "conservative" |
| Tipping delta is small relative to MCID | MAR plausibility weak; primary result fragile to mild MNAR | Report tipping delta in residual SD AND relative to MCID; reconsider primary estimand toward treatment-policy with reference-based MI |
| Pattern-mixture with CCMV vs J2R conclusions differ | CCMV assumes missing pattern mirrors completers; J2R assumes mirror reference arm; different MNAR mechanisms | Choose based on which clinical scenario is more plausible (Carpenter-Roger 2013); report both as sensitivity range |
| Selection model (Diggle-Kenward) rejects MAR; pattern-mixture (reference-based MI) accepts MAR | Selection model MNAR conclusion driven by joint normality assumption (not data); pattern-mixture clinically articulable | Report pattern-mixture as primary sensitivity; selection model as supportive only; FDA prefers clinical articulation |
| MMRM with UN converges in arm A but not arm B | Convergence fragility in unstructured covariance with high dropout in one arm | Apply pre-specified SAP fallback (UN+KR -> UN+Satterthwaite -> heterogeneous Toeplitz -> AR(1)); document in CSR |
| Threshold | Source | Rationale |
|-----------|--------|-----------|
| m >= 100 * FMI imputations | von Hippel 2020 Sociol Methods Res 49:699 | Stable pooled SE; with 40% missingness and FMI~0.3, m=30 needed |
| Missing > 40% on key variable | NRC 2010 | MI under MAR unreliable; treat as hypothesis-generating |
| Kenward-Roger DF correction for MMRM with UN | Kenward-Roger 1997 | Without it MMRM-REML under-covers; Type-I inflates 1-2 pp |
| Tipping delta in residual SD units | FDA Division of Biometrics preference | Cross-trial comparison |
| Rubin's vs frequentist for reference-based MI | Cro 2019 vs Wolbers 2022 | Report both for regulatory safety |
| Pre-specify ICE strategy in SAP | ICH E9(R1) | Estimand-before-method; cite Kahan 2023 |
| Examine DS domain for differential dropout | NRC 2010 implicit | If dropout differs by arm, MAR is suspect |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| LOCF reported as "conservative" | Persistent misconception | LOCF is biased under MCAR (Mallinckrodt 2008); cite NRC 2010 |
| MAR primary in trial with differential dropout | MAR plausibility not verified | Examine DS; if differential, switch to treatment-policy + reference-based MI |
| Selection model used as primary | Untestable parametric assumption drives MNAR conclusion | Pattern-mixture (reference-based MI) as primary; selection model as supportive |
| sample_posterior=True unavailable or switched off | Estimator changed from BayesianRidge in sklearn to one without return_std | Verify estimator; consider R rbmi or mice |
| Imputation model missing interaction term | Uncongeniality with analysis model | Include analysis-model predictors AND interactions |
| Tipping delta in raw outcome scale only | Hard to compare | Report in residual SD units alongside raw |
| Rubin's variance for J2R without frequentist sensitivity | Cro 2019 over-conservative | CMI+jackknife frequentist as supportive (Wolbers 2022) |
| MMRM with CS forced after UN convergence failure | Pre-specification of fallback missing | Pre-specify hierarchy in SAP; document deviation if invoked |
| MI without outcome as predictor in imputation model | Underestimates association | Include outcome as predictor; exclude from imputed variables |
| MMRM in Python (statsmodels.mixedlm) treated as FDA-equivalent | Lacks Kenward-Roger | Use R mmrm for confirmatory; statsmodels.mixedlm only for exploratory |
| Pushback | Response |
|----------|----------|
| "Why MAR not MNAR primary?" | Examined DS domain; dropout rates and reasons symmetric across arms; cite Mallinckrodt 2008 MMRM-MAR convention |
| "Why not LOCF as sensitivity?" | LOCF biased even under MCAR; cite NRC 2010 Rec 11; J2R and tipping-point are clinically articulable alternatives |
| "Rubin's variance for J2R?" | Cite Cro 2019 information-anchored argument as rationale; report frequentist (CMI+jackknife) as supportive per Wolbers 2022 |
| "Tipping delta plausibility?" | Tipping delta = X.X in residual SD units; X.X exceeds MCID of Y.Y; deemed clinically implausible |
| "Selection model sensitivity?" | Provided as supportive; cite Diggle-Kenward 1994 critique that conclusions are driven by joint normality assumption; pattern-mixture is the primary sensitivity |
| "Imputation model congeniality?" | Imputation model includes all analysis-model predictors and interactions per Meng 1994 |
| "ICE strategy?" | Pre-specified per ICH E9(R1): treatment policy primary, hypothetical sensitivity, with explicit ICE mechanism in protocol |
| "Why m=20 imputations?" | Computed m >= 100 * FMI; observed FMI=0.20 -> m=20 sufficient (von Hippel 2020) |
| "Retrieved-dropout MI with sparse post-ICE data?" | If post-ICE retrieval is <50% complete, retrieved-dropout MI degrades to reference-based; switch to J2R as primary with reference-based MI as named sensitivity per Aprocitentan precedent |
| "Why is the reference arm chosen for J2R clinically plausible?" | Per ICH E9(R1), reference-based MI assumes post-discontinuation trajectory mirrors the placebo arm; this matches the clinical scenario of "patient stops treatment due to AE and returns to standard-of-care baseline." Documented in protocol with medical-monitor sign-off. |