Version Compatibility

Reference examples tested with: tableone 0.9+, statsmodels 0.14+, scikit-learn 1.4+, pandas 2.1+, numpy 1.26+. R packages cited (essential for current regulatory work): mmrm 0.3+ (Roche/openpharma), rbmi 1.5+ (Roche/Bayer via insightsengineering), gMCP, RBesT.

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Trial Reporting Under CONSORT 2025 + ICH E9(R1)

"Prepare a clinical trial statistical report" -> Define the estimand explicitly per ICH E9(R1); execute a covariate-adjusted primary analysis targeting the right summary measure; pre-specify the missing-data strategy and run regulatory-grade sensitivity analyses; structure the output per CONSORT 2025 and the new SPIRIT 2025 alignment.

The Single Most Important Methodological Shift -- The Estimand Comes First

Kahan, Cro, Li, Harhay 2023 Am J Epidemiol 192:987 ("Eliminating Ambiguous Treatment Effects Using Estimands"): 98% of published trial reports do not describe what the reported treatment effect represents. 54% of trials: impossible to deduce the estimand from reported methods. In 74% of trials submitted for regulatory approval 1996-2017, "what-if" hypothetical effects were used but only 2 trials explained this.

The framework: ICH E9(R1) Addendum (November 2019, EMA effective 30 July 2020, FDA May 2021) defines an estimand as the precise specification of what is being estimated, via five attributes:

Treatment condition -- what is being compared
Population -- which patients
Endpoint -- which variable
Population-level summary measure -- mean diff, OR, HR, RD
Intercurrent-event (ICE) handling strategy -- one of five

The order is non-negotiable: specify the estimand BEFORE choosing the statistical method. Choosing MMRM and retrofitting the estimand to match is the canonical error.

The Five Intercurrent-Event Strategies

| Strategy | What it does | Typical implementation | Identification cost | Regulatory pattern | |----------|--------------|------------------------|---------------------|---------------------| | Treatment policy | Include all data regardless of ICE | ANCOVA on observed value (retrieved-dropout data); ITT-like | Trivially identified; needs full follow-up regardless of ICE | FDA-preferred default for cardio/HF/CV-safety; Fleming 2025 endorsement | | Hypothetical | What would have been observed had ICE not occurred | MMRM under MAR; g-computation; IPCW; reference-based MI under MAR | Sequential ignorability (causal); MAR (missing-data shorthand) | Heavy use in CNS, diabetes, respiratory; EMA more accepting than FDA | | Composite | Incorporate ICE into endpoint | Death = non-responder; PFS = composite of progression OR death; MACE | Identified from observed data; embeds a ranking choice | Standard in oncology PFS; acceptable when ICE has clinical signal | | While-on-treatment | Use only pre-ICE values | Censor at discontinuation (for TTE); analyse last pre-ICE value (for repeated measures) | Estimates conditional quantity | Safety endpoints (AE rate per time on drug); FDA cautious for efficacy | | Principal stratum | Confine to latent stratum (e.g. tolerators) | Bayesian estimation under monotonicity/principal ignorability | Latent membership; unverifiable assumptions | Rare as primary; some oncology/vaccine acceptance |

Postdoc reading list:

Bornkamp, Rufibach, Lin, Liu, Mehrotra, Roychoudhury, Schmidli, Shentu, Wolbers 2021 Pharm Stat 20:737 — principal stratum critique
Olarte Parra, Daniel, Bartlett 2022 Stat Biopharm Res — proves MMRM-MAR IS a hypothetical estimator under specific causal assumptions
Permutt 2016 Stat Med 35:2876; 2020 Stat Biopharm Res 12:45 — FDA Missing Data Working Group taxonomy; "do covariates change the estimand?"
Fleming, Carroll, Wittes et al 2025 Stat Med — argue treatment policy is the only strategy preserving randomisation; critique hypothetical
Morris 2026 Stat Med — causal-perspective comment on Fleming
Lipkovich, Ratitch, Mallinckrodt 2020 Stat Biopharm Res — Rubin causal model connection

Decision Tree for Estimand Selection

| Trial scenario | Recommended estimand strategy | Why | |----------------|-------------------------------|-----| | Continuous endpoint, monotone missingness, MAR plausible | Hypothetical via MMRM (mmrm + KR) | Standard FDA-favoured MAR analysis; cite Mallinckrodt 2008/2014 | | Continuous endpoint, ICE = treatment discontinuation, sponsor wants effectiveness | Treatment policy via retrieved-dropout MI | If post-ICE data available; ITT-respecting | | Continuous endpoint, treatment-policy primary with ICE-related missingness | Hybrid: J2R for discontinuation ICEs, MMRM-MAR for other missingness | Aprocitentan precedent; de facto FDA standard 2024-2025 | | Binary endpoint, RCT, FDA 2023-compliant | Marginal RD via g-computation; conditional OR supportive | See clinical-biostatistics/logistic-regression for g-computation | | Oncology OS with crossover | Treatment policy as primary; hypothetical (RPSFT/IPCW) as sensitivity | Sotorasib CodeBreaK 200 precedent | | Oncology PFS | Treatment policy for subsequent-therapy ICE (composite may incorporate death) | Sun 2021 framework; Fleming 2025 | | Weight management / chronic disease | Retrieved-dropout MI (Wegovy STEP precedent); J2R supportive | FDA 2025 obesity draft guidance explicitly endorses | | Long-term safety endpoint | While-on-treatment for rate; treatment policy for cumulative incidence | Standard ICH E2A practice | | AlloSCT in hematologic oncology | Composite "event-free survival" treating alloSCT as event | Sun 2021 (alloSCT as intercurrent event); "no alloSCT" hypothetical is clinically meaningless | | Symptomatic palliative endpoint with high dropout | Composite with worst-rank for dropouts (Permutt trimmed means) | Permutt 2017 Pharm Stat 16:20 |

MMRM -- The FDA-Favoured MAR Analysis

Mallinckrodt 2008/2014, codified in DIA Scientific Working Group "three pillars" doctrine: for continuous longitudinal endpoints under monotone (or near-monotone) MAR, an MMRM with treatment + visit + treatment-by-visit + baseline + baseline-by-visit, unstructured (UN) within-subject covariance, REML, contrast at the primary timepoint -- is the consistent and FDA-preferred analysis. LOCF is biased even under MCAR because it discards imputation uncertainty and assumes a flat post-withdrawal trajectory.

The mmrm R package (Roche / openpharma)

library(mmrm)
fit <- mmrm(
    formula = change_from_baseline ~ baseline + arm * visit + us(visit | subject),
    data = trial_data,
    method = "Kenward-Roger",  # or "Satterthwaite", "Kenward-Roger-Linear"
    reml = TRUE
)
summary(fit)  # treatment-by-visit contrast at primary timepoint

The Kenward-Roger flavour question: method = "Kenward-Roger" uses full second-order Kenward-Roger (Kenward-Roger 1997 Biometrics 53:983), which inflates SE for fixed-effect contrasts using an adjusted covariance estimator with second-order Taylor terms. method = "Kenward-Roger-Linear" drops the second-order Cholesky-derivative term to match SAS PROC MIXED bit-for-bit. Most submissions use Kenward-Roger-Linear to maintain SAS-R reproducibility.

Convergence-vs-correctness trade-off

Unstructured (UN) covariance has p(p+1)/2 parameters for p visits. With ~30-50 patients per arm by week 12 and 6+ visits, UN can fail to converge. The industry-standard fallback hierarchy per pre-specified SAP:

UN with KR (preferred)
UN with Satterthwaite (if KR fails)
Heterogeneous Toeplitz (k+1 parameters)
AR(1) with heterogeneous variances
CS with heterogeneous variances (last resort)

Each step down imposes more structure and the structure can be wrong — biasing both SEs and point estimates. CS imposes equal correlation across time which is rarely true for treatment-ramp-up endpoints (HbA1c, BP). Pre-specify the fallback in the SAP, not at analysis time.

MMRM = hypothetical estimator (Olarte Parra unification)

Olarte Parra, Daniel, Bartlett 2022 Stat Biopharm Res: under specific identifying assumptions, MMRM under MAR IS a causal hypothetical estimand via g-formula equivalence. The "issue" is articulation, not statistical machinery — MMRM-MAR implicitly answers a hypothetical estimand whose hypothetical scenario must be made explicit in the SAP (e.g., "what would the mean response at week 24 be had all patients continued randomised treatment and remained observable?").

Reference-Based Multiple Imputation -- The rbmi Framework

Carpenter, Roger, Kenward 2013 J Biopharm Stat 23:1352 — the canonical paper. Reference-based MI operationalises MNAR sensitivity not as a numeric delta but as a clinical narrative:

Jump-to-reference (J2R): "after withdrawal the patient instantly resembles the placebo arm"
Copy-reference (CR): "the entire post-baseline trajectory copied from placebo, with subject's baseline deviation preserved"
Copy-increments-in-reference (CIR): "the patient retains the on-treatment increment but trends with the placebo arm thereafter, anchored at last observed value"
Last-mean-carried-forward (LMCF): "patient stays at last on-treatment mean"

rbmi R package (Wolbers et al 2022 Pharm Stat 21(6):1246-1257; CRAN; insightsengineering):

library(rbmi)
# Draws -> Impute -> Analyse -> Pool pipeline
draws <- draws(data = trial_data, vars = vars,
               method = method_bayes(n_samples = 100))
imputed <- impute(draws, references = c('Active' = 'Placebo', 'Placebo' = 'Placebo'))
analyses <- analyse(imputed, fun = ancova,
                    vars = list(outcome = 'change', visit = 'avisit',
                                group = 'arm', covariates = c('baseline')))
result <- pool(analyses)  # Rubin's rules pooling

Four inference engines:

Bayesian MI + Rubin's rules (historical default) — information-anchored variance
Approximate Bayesian via REML + bootstrap — frequentist variance from bootstrap
Conditional mean imputation + jackknife (Wolbers 2022 contribution) — single deterministic imputation per ANCOVA-linearity theorem, jackknife for SE; FDA-friendly because deterministic + frequentist
BMLMI (bootstrapped MI of Lipkovich/Ratitch) — within/between variance decomposition

The Variance Debate -- Cro/Carpenter vs Bartlett/Wolbers

The single most active methodological argument in current biostatistics.

Cro/Carpenter/Kenward 2019 JRSS-A 182:623 ("Information-Anchored Sensitivity Analysis"): proved that Rubin's-rules variance applied to J2R/CR/CIR is approximately information-anchored — the relative loss of information from missingness in the sensitivity analysis matches the relative loss in the MAR primary analysis. True repeated-sampling variance is "information positive" because reference-based imputation borrows from the reference arm and reduces the marginal variance of the active arm BELOW what an MAR analysis with the same missingness would give.

Philosophical position: a sensitivity analysis should not import information the primary analysis did not have; if borrowing from placebo makes the active-arm CI narrower, the analysis is no longer "anchored" to the same information state.

Bartlett 2021 Stat Biopharm Res 15(1):178 + Wolbers 2022 Pharm Stat counter: if J2R is the actual sampling model under which inference is made, then the correct frequentist variance is the one that delivers nominal Type-I error and CI coverage under that model -- the jackknife/bootstrap variance, NOT Rubin's. Simulations in rbmi vignettes: Bayesian MI with Rubin's gives Type-I error 0.9-2.5% (over-conservative); CMI+jackknife gives 4.84-4.96% (nominal) under J2R; Bayesian MI loses real power.

Regulatory practice 2024-2025 is bifurcating: EMA tolerates either; FDA reviewers increasingly flag Rubin's-rules variance under reference-based MI as needing a frequentist sensitivity analysis in addition. What postdocs argue about: whether Type-I inflation under bootstrap is the price of correct inference, or evidence J2R was never coherent as a true sampling model.

Permutt Tipping-Point Analysis -- The Analyst as Adversary

Permutt 2016 Stat Med 35:2876 (Permutt was head of FDA Division of Biometrics IV): the regulator's question is not "what is a reasonable MNAR adjustment?" but "how bad would the missing data have to be in the active arm to overturn the significant primary result?"

Delta-adjustment patterns:

One-arm shift (FDA preferred): add delta to imputed values in active arm only; vary delta from 0 to the value that nullifies the effect
Symmetric shift: both arms worsened by delta (probes systematic optimism)
Reverse shift: placebo improved by delta (more aggressive, rarely needed)

# rbmi with delta adjustment
delta <- delta_template(imputed, delta = c(0, 5, 10, 15, 20), dlag = c(1, 1, 1, 1))
adjusted <- analyse(imputed, delta = delta, ...)
# Report: minimum delta that flips p-value below 0.05

The regulator then judges whether the tipping delta is clinically plausible — larger than the active-arm treatment effect itself? Larger than the MCID? FDA-preferred report: tipping delta in units of residual SD (for cross-trial comparison), not raw outcome units.

Decisive Regulatory Cases -- The 2020-2025 Casebook

Aducanumab (Biogen BLA 761178, 2021): EMERGE and ENGAGE studies both stopped early for futility; EMERGE high-dose positive, ENGAGE negative. MMRM-MAR primary. FDA Office of Biostatistics (Tristan Massie review) argued futility-stop-induced missingness was not MAR (differential ARIA-driven unblinding); the Nov 2020 AdCom voted overwhelmingly against approval. Textbook case showing MAR-based primary in trial with high differential missingness is regulator-divisive.

Aprocitentan (Idorsia PRECISION trial, FDA approval 2024): documented in Sassi-Sayadi et al 2025 Ther Innov Regul Sci (PMC12753554). FDA pushed back on sponsor's MMRM-MAR primary; MAR was not credible for treatment-discontinuers. Accepted compromise: stratified imputation — J2R for treatment-discontinuation ICEs, MAR-MMRM for other missingness. This hybrid is now de facto FDA standard for treatment-policy estimand.

Wegovy/Ozempic STEP trials (Wilding 2021 NEJM; NDA 215256): retrieved-dropout MI as primary for treatment-policy. Missing body weight at week 68 imputed by sampling from observed week-68 measurements among "retrieved dropouts" (patients who discontinued semaglutide but remained in follow-up). J2R-MI as supportive. RD-MI now standard for chronic weight management. FDA 2025 obesity guidance explicitly endorses MI as primary.

Table 1 -- Baseline Characteristics

CONSORT 2010 discouraged baseline significance tests because randomisation is a known mechanism, not a hypothesis. Many journals still require them.

from tableone import TableOne

columns = ['age', 'sex', 'race', 'bmi', 'baseline_score', 'disease_stage']
categorical = ['sex', 'race', 'disease_stage']

table1 = TableOne(df, columns=columns, categorical=categorical,
                  groupby='ARM', pval=True, smd=True,
                  missing=True, overall=True)
print(table1.tabulate(tablefmt='github'))
table1.to_excel('table1.xlsx')

Use standardised mean differences (SMD) rather than p-values: SMD > 0.1 suggests meaningful imbalance regardless of statistical significance.

Senn's "balance testing is incoherent" (1994 Stat Med 13:1715; Altman 1985): balancing via randomisation, testing balance, then adjusting only when the test fails is a selection rule that destroys nominal Type-I error. Pre-specify covariates in the SAP; do not condition adjustment on observed imbalance.

Analysis Populations -- ITT vs FAS vs PP vs Safety

| Population | Definition | Bias direction | Primary use | |-----------|------------|----------------|-------------| | ITT | All randomised, as randomised | Conservative (toward null) | Primary efficacy per ICH E9 | | FAS (Full Analysis Set) | ITT excluding eligibility failures + subjects with no post-baseline data | Middle ground; close to ITT | Common practical primary; ICH E9 "as complete as possible while remaining unbiased" | | Per-Protocol | Completed treatment per protocol without major violations | Anti-conservative (inflates effect) | Sensitivity analysis only | | Safety | All received at least one dose | n/a | AE analysis | | mITT | Sponsor-defined modified ITT | Variable | Pre-specify and justify |

FAS vs ITT distinction is critical for regulatory submissions — FAS may exclude post-randomisation subjects (ineligibility, no post-baseline efficacy); ITT cannot. Sponsors often equate them on the SAP only to discover at submission that FDA expected stricter ITT. Pre-specification in protocol is essential.

itt = dm.copy()
pp = dm[dm['USUBJID'].isin(completers) & ~dm['USUBJID'].isin(protocol_violators)]
dosed = ex[ex['EXDOSE'] > 0]['USUBJID'].unique()
safety = dm[dm['USUBJID'].isin(dosed)]

Missing Data Mechanisms -- The Practical Framework

| Mechanism | Definition | Testable? | Valid method | |-----------|------------|-----------|--------------| | MCAR | Independent of all data | Partially (Little's test) | Complete-case unbiased but loses power | | MAR | Depends on observed data only | NO (assumption) | MMRM under MAR; MI under MAR | | MNAR | Depends on unobserved values | NO | Requires sensitivity analysis (J2R, CR, CIR, tipping point) |

MAR vs MNAR cannot be distinguished from observed data alone — this is a fundamental limitation. Pre-specify the assumed mechanism in the SAP; pre-specify the sensitivity analysis under MNAR (NRC 2010 Recommendation 15: "examining sensitivity to assumptions about the missing-data mechanism should be a mandatory component of reporting").

Clinical reasoning beyond the abstraction: examine the DS (Disposition) domain to tabulate reasons for discontinuation by treatment arm. If discontinuation rates or reasons differ between arms, missing data is likely informative and MNAR sensitivity analyses are mandatory.

Standard MMRM-MAR Primary Analysis Code

Goal: Execute the FDA-preferred primary analysis for a continuous longitudinal endpoint under MAR with valid Type-I in small/moderate trials.

Approach: Fit MMRM with unstructured covariance + Kenward-Roger via the Roche/openpharma mmrm R package; in Python, use statsmodels.mixedlm as exploratory-only (lacks KR).

# Python is weak for MMRM — current state of the art is R `mmrm`
# For Python users, statsmodels.mixedlm is the closest alternative but lacks
# Kenward-Roger; consider rpy2 to call R from Python for confirmatory work.

import statsmodels.formula.api as smf
import pandas as pd

# Random intercept LMM (suboptimal vs MMRM but Python-native)
model = smf.mixedlm(
    'change ~ baseline + C(ARM) * C(VISIT)',
    data=df_long,
    groups=df_long['USUBJID']
).fit(reml=True)
# WARNING: this is NOT FDA-equivalent to MMRM with UN+KR
# For confirmatory work, use R `mmrm` package via rpy2 or fit in R directly

Multiple Imputation in Python (sklearn)

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import statsmodels.formula.api as smf
import numpy as np

n_imputations = 20  # rule: m >= 100 * FMI (fraction of missing info)
imputer = IterativeImputer(max_iter=10, random_state=0, sample_posterior=True)

results = []
for i in range(n_imputations):
    imputer.set_params(random_state=i)
    imputed = pd.DataFrame(imputer.fit_transform(df[numeric_cols]), columns=numeric_cols)
    for col in ['ARM', 'sex']:
        imputed[col] = df[col].values
    model = smf.logit(
        'outcome ~ C(ARM, Treatment(reference="Placebo")) + age', data=imputed
    ).fit(disp=0)
    results.append({'coef': model.params.iloc[1], 'se': model.bse.iloc[1]})

# Rubin's rules
pooled_coef = np.mean([r['coef'] for r in results])
within_var = np.mean([r['se']**2 for r in results])
between_var = np.var([r['coef'] for r in results], ddof=1)
total_var = within_var + (1 + 1/n_imputations) * between_var
pooled_se = np.sqrt(total_var)
pooled_or = np.exp(pooled_coef)

Critical sklearn caveats:

sample_posterior=True is essential — without it all m imputations are nearly identical
sample_posterior=True only works with BayesianRidge (the default estimator). If estimator is changed (e.g., RandomForestRegressor), parameter is silently ignored and MI degenerates to single imputation
IterativeImputer treats binary covariates as continuous; consider miceforest for mixed types or move to R mice/rbmi
Only impute covariates and post-baseline outcomes, NOT treatment assignment (fully determined by randomisation)
Include outcome in imputation model as predictor but exclude from imputed variables to avoid circular dependency
IterativeImputer is experimental in sklearn — API may change without standard deprecation

Congeniality (Meng 1994): imputation model must be at least as flexible as analysis model. If analysis includes treatment-by-covariate interactions, imputation model should include them. Uncongenial imputation biases estimates and invalidates variance pooling.

Co-Primary Endpoints and Multiplicity

| Method | Approach | Conservatism | |--------|----------|--------------| | Bonferroni | alpha / m | Most conservative | | Hierarchical (gatekeeping) | Pre-specified order; proceed only if previous rejects | Moderate; full alpha for first | | Graphical procedure (Bretz-Maurer) | Directed graph; alpha propagates on rejection | Flexible; standard in modern SAPs | | Hochberg / Hommel step-up | Ordered p-values vs alpha/(m-k+1) | Less conservative than Bonferroni; requires PRDS |

See clinical-biostatistics/multiplicity-graphical for the full Bretz-Maurer-Hommel treatment with gMCP.

Reconciliation: When Methods Disagree

| Pattern | Likely cause | Action | |---------|--------------|--------| | MMRM-MAR primary p < 0.05; reference-based MI sensitivity p > 0.05 | MNAR mechanism after treatment discontinuation; J2R imputes active arm toward placebo | Decide which estimand is regulatory primary; if treatment policy, switch to reference-based MI as primary (Aprocitentan precedent) | | Retrieved-dropout MI vs J2R-MI differ | RD-MI uses observed post-ICE data; J2R uses reference-arm-based assumption | If post-ICE data available, RD-MI is empirically grounded (preferred); J2R as supportive | | Cro information-anchored variance vs Wolbers frequentist variance differ | Information-anchored Rubin's pools within+between; frequentist jackknife/bootstrap captures borrowing-induced variance reduction | Report both; cite Cro 2019 and Wolbers 2022 (Bartlett 2021 frequentist critique); 2024-2025 FDA practice accepts both with one as supportive | | ITT primary p < 0.05; PP secondary p > 0.05 | Per-protocol excludes non-completers (often differentially); PP-only effect inflated when significant; non-significance in PP is signal of fragility | ITT remains primary (ICH E9); PP as sensitivity flag; investigate completer pattern | | Statistical significance achieved but effect estimate below MCID | Powered for δ << MCID; or δ = MCID with no precision margin | Pre-specify δ >= 1.5 × MCID in SAP (postdoc rule); report against pre-specified MCID, not 0 | | Estimand strategies (hypothetical vs treatment-policy) give different effect sizes | Different ICE-handling strategies target different parameters | Report all pre-specified estimands; pick PRIMARY for the regulatory question (not the "winner"); cite Kahan 2023 | | Subgroup effect estimate >2x main effect | Winner's curse -- discovery-data subgroup estimates are inflated | Apply Bayesian shrinkage (Dixon-Simon, RBesT) for corrected estimate; cite as discovery, not confirmatory |

CONSORT 2025 (April 2025) -- What Changed

Citation: Hopewell, Chan, Collins et al 2025. CONSORT 2025 statement. Lancet 405:1633-1640. E&E in BMJ 2025;389:e081124.

30-item checklist (was 25), 7 new items, 3 substantially revised, 1 deleted. Key new items relevant to statistical reporting:

Item 4 (data and code sharing — where/how de-identified IPD and analysis code can be accessed) — NEW
Item 15 (how harms were actually assessed — methods, attribution, intensity, monitoring rules) — NEW; absorbs CONSORT-Harms 2022
Item 21c (missing-data handling methods) — NEW; reflects NRC 2010 / Carpenter-Kenward consensus
Item 24a/24b (intervention/comparator actual delivery details) — absorbs TIDieR

Estimands did NOT make consensus for mandatory inclusion — they appear in Box 1 (terminology only). For regulatory submissions, ICH E9(R1) is the operative estimand standard, NOT CONSORT 2025. Sponsors should follow ICH E9(R1) directly for the 5-attribute estimand statement.

No DOORS framework in CONSORT 2025 — diversity/equity additions are happening via a separate SAGER-SPIRIT-CONSORT alignment workstream (GENDRO/EASE).

CONSORT Flow Diagram

flow = {
    'screened': len(screening_log),
    'eligible': len(screening_log[screening_log['eligible']]),
    'randomized': len(dm),
    'allocated_drug': len(dm[dm['ARM'] == 'Drug']),
    'allocated_placebo': len(dm[dm['ARM'] == 'Placebo']),
    'completed_drug': len(dm[(dm['ARM'] == 'Drug') & dm['USUBJID'].isin(completers)]),
    'completed_placebo': len(dm[(dm['ARM'] == 'Placebo') & dm['USUBJID'].isin(completers)]),
    'analyzed_itt': len(itt),
    'analyzed_fas': len(fas),
    'analyzed_pp': len(pp),
    'analyzed_safety': len(safety),
}

CONSORT 2025 templates on consort-spirit.org now explicitly accommodate non-1:1 allocation, cluster, multi-arm, and crossover variants.

Per-Method Failure Modes

MMRM-MAR with high differential missingness

Trigger: Differential dropout between arms (e.g., toxic active vs tolerated placebo).
Mechanism: MAR assumption requires that conditional on observed covariates, missingness is unrelated to outcome — implausible when patients drop out because the treatment isn't working.
Symptom: Discontinuation reasons differ qualitatively between arms; primary p-value sensitive to model specification.
Fix: Treatment-policy estimand with retrieved-dropout MI as primary; J2R as sensitivity. Tipping-point delta in active arm (Permutt 2016).

Reference-based MI with Rubin's variance

Trigger: J2R/CR/CIR with Rubin's-rules variance reported as primary.
Mechanism: Rubin's variance is information-anchored but over-conservative (Cro 2019); under-rejects relative to nominal.
Symptom: Power loss vs frequentist variance; Type-I 0.9-2.5% vs nominal 5% in J2R simulations.
Fix: Report frequentist variance via CMI+jackknife as supportive (Wolbers 2022); cite both Cro and Bartlett.

sample_posterior=False in IterativeImputer

Trigger: Default behaviour or estimator changed away from BayesianRidge.
Mechanism: Imputations are point predictions, near-identical across draws; between-imputation variance approximately zero.
Symptom: Artificially narrow Rubin's-pooled CIs.
Fix: Always set sample_posterior=True; verify estimator is BayesianRidge; consider miceforest or R mice for mixed types.

Per-protocol as primary

Trigger: SAP specifies PP analysis as primary.
Mechanism: Excludes non-completers who may have dropped due to treatment failure; inflates effect.
Symptom: PP effect much larger than ITT; reviewers raise post-randomisation bias concern.
Fix: ITT or FAS as primary per ICH E9; PP as sensitivity only.

Method-before-estimand

Trigger: SAP describes MMRM, MI, or LOCF without specifying the estimand the method targets.
Mechanism: Retrofits the estimand to match the chosen method.
Symptom: Reviewer asks "what is the estimand?" and sponsor cannot articulate.
Fix: Per ICH E9(R1), define the 5 attributes BEFORE choosing method; cite Kahan 2023.

Missing the FAS vs ITT distinction

Trigger: Treating FAS and ITT as synonymous.
Mechanism: FAS may exclude eligibility failures or no-post-baseline; ITT cannot.
Symptom: Submission counts differ between FAS and ITT; reviewer asks for ITT.
Fix: Pre-specify both populations explicitly in protocol; report both with reconciliation.

Quantitative Thresholds

| Threshold | Source | Rationale | |-----------|--------|-----------| | m >= 100 * FMI imputations (linear rule of thumb) | von Hippel 2020 Sociol Methods Res (two-stage quadratic refinement) | Adequate for stable pooled SE; with 40% missingness and FMI ~0.3, m=30 needed | | SMD > 0.1 = meaningful imbalance | Austin 2009 Stat Med 28:3083 | Beyond what randomisation would normally produce | | Missing data > 40% on key variable | NRC 2010 | Above this, MI under MAR is unreliable; treat as hypothesis-generating | | Kenward-Roger DF correction for MMRM with UN | Kenward-Roger 1997 | Without it, MMRM-REML under-covers in small/moderate trials; Type-I inflates 1-2 pp | | Information-anchored vs frequentist variance for reference-based MI | Cro 2019 vs Wolbers 2022 | Active regulatory debate; report both for safety | | Tipping delta in residual SD units, not raw | FDA Division of Biometrics preference | Cross-trial comparison; report adjacent to raw | | Treatment policy default for cardio/HF/CV | Fleming 2025 Stat Med | Only strategy preserving randomisation; FDA-favoured |

Common Errors

| Error / symptom | Cause | Solution | |-----------------|-------|----------| | Reviewer: "what is the estimand?" with no answer | Method-before-estimand | Pre-specify 5 attributes in protocol; cite Kahan 2023 finding 98% don't articulate | | LOCF used as primary | Inertia or "conservative" misconception | LOCF is biased even under MCAR (Mallinckrodt 2008); switch to MMRM or MI | | MMRM with CS covariance treated as equivalent to UN | Convergence forced fallback without pre-specification | Pre-specify fallback hierarchy; document deviation if invoked | | Rubin's variance for J2R with no sensitivity | Cro 2019 information-anchored argument applied without acknowledgement | Cite Bartlett 2021 + Wolbers 2022; report frequentist variance via CMI+jackknife | | Tipping delta in raw outcome units only | Hard to compare cross-trial | Also report in residual SD units (FDA preference) | | ITT and FAS conflated | SAP vague on distinction | Pre-specify both with explicit criteria for FAS exclusion | | Per-protocol significant, ITT not — sponsor highlights PP | Post-randomisation bias inflation | ITT as primary; PP as sensitivity with explicit caveat | | Imputation only of outcome | Throws away covariates | Joint imputation of outcome + covariates; cite Carpenter-Roger 2013 | | sample_posterior silently ignored | Non-default estimator | Verify with imputer.estimator is BayesianRidge or use rbmi/mice |

Anticipated Reviewer Pushback

| Pushback | Response | |----------|----------| | "What is the estimand?" | Articulate 5 ICH E9(R1) attributes; cite Kahan 2023; state ICE strategy with mechanism (not just label). | | "Is MAR plausible?" | Examine DS for differential discontinuation patterns; if differential, switch to treatment-policy with retrieved-dropout MI or J2R sensitivity. | | "Why MMRM not LOCF?" | Mallinckrodt 2008/2014 case for MMRM; LOCF is biased even under MCAR. Cite DIASWG three-pillars doctrine. | | "Why Rubin's not frequentist variance for J2R?" | Cite Cro 2019 information-anchored argument as rationale; report frequentist (CMI+jackknife) as supportive per Wolbers 2022. | | "Tipping point analysis result?" | Minimum delta in active arm that flips p > 0.05; report in residual SD units; judge plausibility against MCID and treatment effect. | | "FAS vs ITT?" | Pre-specified in protocol; FAS excludes [explicit criteria]; both populations analysed; reconciliation table provided. | | "Has the SAP been registered?" | Yes — clinicaltrials.gov NCTxxxxx with full SAP appended; EU CTR EUCTxxxx; SPIRIT 2025 compliant. | | "How is subsequent therapy handled?" | Pre-specified per ICH E9(R1); composite strategy (e.g., subsequent therapy = treatment failure) or treatment-policy (include all data). | | "Where is the multiplicity adjustment for co-primary / key secondary?" | Bretz-Maurer graphical procedure via gMCP pre-specified in SAP; alpha allocation diagram in CSR appendix; cite CONSORT 2025 item 30 (limitations, multiplicity of analyses) + FDA Multiple Endpoints Final Oct 2022; see clinical-biostatistics/multiplicity-graphical. | | "How are missing baseline covariates handled in ANCOVA?" | Complete-case ANCOVA is unbiased under MCAR baseline missingness (White & Thompson 2005 Stat Med 24:993); proportion with complete baseline reported in flow diagram; if >5% missing, multiple imputation of baseline as sensitivity analysis. |

References

Bornkamp B, Rufibach K, Lin J, Liu Y, Mehrotra DV, Roychoudhury S, Schmidli H, Shentu Y, Wolbers M. 2021. Principal stratum strategy: potential role in drug development. Pharm Stat 20:737-751.
Carpenter JR, Roger JH, Kenward MG. 2013. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. J Biopharm Stat 23:1352-1371.
Cro S, Carpenter JR, Kenward MG. 2019. Information-anchored sensitivity analysis: theory and application. JRSS-A 182:623-645.
EMA. 2010. Guideline on Missing Data in Confirmatory Clinical Trials. EMA/CPMP/EWP/1776/99 Rev.1.
Fleming TR, Carroll KJ, Wittes JT, Emerson SS, Rothmann M, Collins S, Levin G. 2025. A perspective on the appropriate implementation of ICH E9(R1) addendum strategies for handling intercurrent events. Stat Med 44:e70104.
Hopewell S, Chan AW, Collins GS, et al. 2025. CONSORT 2025 statement. Lancet 405:1633-1640.
ICH. 2019. E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials. Step 4.
Kahan BC, Cro S, Li F, Harhay MO. 2023. Eliminating ambiguous treatment effects using estimands. Am J Epidemiol 192:987-994.
Kenward MG, Roger JH. 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53:983-997.
Lipkovich I, Ratitch B, Mallinckrodt CH. 2020. Causal inference and estimands in clinical trials. Stat Biopharm Res 12:54-67.
Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. 2008. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Information Journal 42:303-319.
Sassi-Sayadi M, Verweij P, Cornelisse P. 2025. Regulatory experiences with the use of multiple imputation for missing data in a phase 3 confirmatory trial. Ther Innov Regul Sci.
Morris TP. 2026. Comment on Fleming et al. Stat Med e70455.
NRC. 2010. The Prevention and Treatment of Missing Data in Clinical Trials. National Academies Press.
Olarte Parra C, Daniel RM, Bartlett JW. 2022. Hypothetical estimands in clinical trials: a unification of causal inference and missing data methods. Stat Biopharm Res 15(2):421-432.
Permutt T. 2016. Sensitivity analysis for missing data in regulatory submissions. Stat Med 35:2876-2879.
Permutt T. 2020. Do covariates change the estimand? Stat Biopharm Res 12:45-53.
Sun S, Weber HJ, Butler E, Rufibach K, Roychoudhury S. 2021. Estimands in hematologic oncology trials. Pharm Stat 20(4):793-805.
von Hippel PT. 2020. How many imputations are needed (a two-stage calculation using a quadratic rule). Sociol Methods Res 49:699-718.
Wolbers M, Noci A, Delmar P, Gower-Page C, Yiu S, Bartlett JW. 2022. Standard and reference-based conditional mean imputation. Pharm Stat 21(6):1246-1257.

Related Skills

clinical-biostatistics/cdisc-data-handling - SDTM/ADaM data preparation feeding the analysis dataset
clinical-biostatistics/logistic-regression - Primary analysis models (binary) with marginal-vs-conditional
clinical-biostatistics/effect-measures - Effect-measure CIs under MI pooling
clinical-biostatistics/subgroup-analysis - Pre-specified subgroup analyses for the CSR
clinical-biostatistics/missing-data-sensitivity - MMRM, reference-based MI, tipping point (in depth)
clinical-biostatistics/multiplicity-graphical - Bretz-Maurer graphs for co-primary and key secondary
clinical-biostatistics/survival-analysis - Time-to-event estimand framing under ICH E9(R1)
clinical-biostatistics/power-and-sample-size - Sample size justification per CONSORT 2025 item 16a
reporting/rmarkdown-reports - Formatted statistical report generation
experimental-design/multiple-testing - General multiplicity correction

Version Compatibility

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Trial Reporting Under CONSORT 2025 + ICH E9(R1)

The Single Most Important Methodological Shift -- The Estimand Comes First

The framework: ICH E9(R1) Addendum (November 2019, EMA effective 30 July 2020, FDA May 2021) defines an estimand as the precise specification of what is being estimated, via five attributes:

Treatment condition -- what is being compared
Population -- which patients
Endpoint -- which variable
Population-level summary measure -- mean diff, OR, HR, RD
Intercurrent-event (ICE) handling strategy -- one of five

The order is non-negotiable: specify the estimand BEFORE choosing the statistical method. Choosing MMRM and retrofitting the estimand to match is the canonical error.

The Five Intercurrent-Event Strategies

Postdoc reading list:

Bornkamp, Rufibach, Lin, Liu, Mehrotra, Roychoudhury, Schmidli, Shentu, Wolbers 2021 Pharm Stat 20:737 — principal stratum critique
Olarte Parra, Daniel, Bartlett 2022 Stat Biopharm Res — proves MMRM-MAR IS a hypothetical estimator under specific causal assumptions
Permutt 2016 Stat Med 35:2876; 2020 Stat Biopharm Res 12:45 — FDA Missing Data Working Group taxonomy; "do covariates change the estimand?"
Fleming, Carroll, Wittes et al 2025 Stat Med — argue treatment policy is the only strategy preserving randomisation; critique hypothetical
Morris 2026 Stat Med — causal-perspective comment on Fleming
Lipkovich, Ratitch, Mallinckrodt 2020 Stat Biopharm Res — Rubin causal model connection

Decision Tree for Estimand Selection

MMRM -- The FDA-Favoured MAR Analysis

The mmrm R package (Roche / openpharma)

library(mmrm)
fit <- mmrm(
    formula = change_from_baseline ~ baseline + arm * visit + us(visit | subject),
    data = trial_data,
    method = "Kenward-Roger",  # or "Satterthwaite", "Kenward-Roger-Linear"
    reml = TRUE
)
summary(fit)  # treatment-by-visit contrast at primary timepoint

Convergence-vs-correctness trade-off

UN with KR (preferred)
UN with Satterthwaite (if KR fails)
Heterogeneous Toeplitz (k+1 parameters)
AR(1) with heterogeneous variances
CS with heterogeneous variances (last resort)

MMRM = hypothetical estimator (Olarte Parra unification)

Reference-Based Multiple Imputation -- The rbmi Framework

Carpenter, Roger, Kenward 2013 J Biopharm Stat 23:1352 — the canonical paper. Reference-based MI operationalises MNAR sensitivity not as a numeric delta but as a clinical narrative:

Jump-to-reference (J2R): "after withdrawal the patient instantly resembles the placebo arm"
Copy-reference (CR): "the entire post-baseline trajectory copied from placebo, with subject's baseline deviation preserved"
Copy-increments-in-reference (CIR): "the patient retains the on-treatment increment but trends with the placebo arm thereafter, anchored at last observed value"
Last-mean-carried-forward (LMCF): "patient stays at last on-treatment mean"

rbmi R package (Wolbers et al 2022 Pharm Stat 21(6):1246-1257; CRAN; insightsengineering):

library(rbmi)
# Draws -> Impute -> Analyse -> Pool pipeline
draws <- draws(data = trial_data, vars = vars,
               method = method_bayes(n_samples = 100))
imputed <- impute(draws, references = c('Active' = 'Placebo', 'Placebo' = 'Placebo'))
analyses <- analyse(imputed, fun = ancova,
                    vars = list(outcome = 'change', visit = 'avisit',
                                group = 'arm', covariates = c('baseline')))
result <- pool(analyses)  # Rubin's rules pooling

Four inference engines:

Bayesian MI + Rubin's rules (historical default) — information-anchored variance
Approximate Bayesian via REML + bootstrap — frequentist variance from bootstrap
Conditional mean imputation + jackknife (Wolbers 2022 contribution) — single deterministic imputation per ANCOVA-linearity theorem, jackknife for SE; FDA-friendly because deterministic + frequentist
BMLMI (bootstrapped MI of Lipkovich/Ratitch) — within/between variance decomposition

The Variance Debate -- Cro/Carpenter vs Bartlett/Wolbers

The single most active methodological argument in current biostatistics.

Permutt Tipping-Point Analysis -- The Analyst as Adversary

Delta-adjustment patterns:

One-arm shift (FDA preferred): add delta to imputed values in active arm only; vary delta from 0 to the value that nullifies the effect
Symmetric shift: both arms worsened by delta (probes systematic optimism)
Reverse shift: placebo improved by delta (more aggressive, rarely needed)

# rbmi with delta adjustment
delta <- delta_template(imputed, delta = c(0, 5, 10, 15, 20), dlag = c(1, 1, 1, 1))
adjusted <- analyse(imputed, delta = delta, ...)
# Report: minimum delta that flips p-value below 0.05

Decisive Regulatory Cases -- The 2020-2025 Casebook

Table 1 -- Baseline Characteristics

CONSORT 2010 discouraged baseline significance tests because randomisation is a known mechanism, not a hypothesis. Many journals still require them.

from tableone import TableOne

columns = ['age', 'sex', 'race', 'bmi', 'baseline_score', 'disease_stage']
categorical = ['sex', 'race', 'disease_stage']

table1 = TableOne(df, columns=columns, categorical=categorical,
                  groupby='ARM', pval=True, smd=True,
                  missing=True, overall=True)
print(table1.tabulate(tablefmt='github'))
table1.to_excel('table1.xlsx')

Use standardised mean differences (SMD) rather than p-values: SMD > 0.1 suggests meaningful imbalance regardless of statistical significance.

Analysis Populations -- ITT vs FAS vs PP vs Safety

itt = dm.copy()
pp = dm[dm['USUBJID'].isin(completers) & ~dm['USUBJID'].isin(protocol_violators)]
dosed = ex[ex['EXDOSE'] > 0]['USUBJID'].unique()
safety = dm[dm['USUBJID'].isin(dosed)]

Missing Data Mechanisms -- The Practical Framework

Standard MMRM-MAR Primary Analysis Code

Goal: Execute the FDA-preferred primary analysis for a continuous longitudinal endpoint under MAR with valid Type-I in small/moderate trials.

Approach: Fit MMRM with unstructured covariance + Kenward-Roger via the Roche/openpharma mmrm R package; in Python, use statsmodels.mixedlm as exploratory-only (lacks KR).

# Python is weak for MMRM — current state of the art is R `mmrm`
# For Python users, statsmodels.mixedlm is the closest alternative but lacks
# Kenward-Roger; consider rpy2 to call R from Python for confirmatory work.

import statsmodels.formula.api as smf
import pandas as pd

# Random intercept LMM (suboptimal vs MMRM but Python-native)
model = smf.mixedlm(
    'change ~ baseline + C(ARM) * C(VISIT)',
    data=df_long,
    groups=df_long['USUBJID']
).fit(reml=True)
# WARNING: this is NOT FDA-equivalent to MMRM with UN+KR
# For confirmatory work, use R `mmrm` package via rpy2 or fit in R directly

Multiple Imputation in Python (sklearn)

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import statsmodels.formula.api as smf
import numpy as np

n_imputations = 20  # rule: m >= 100 * FMI (fraction of missing info)
imputer = IterativeImputer(max_iter=10, random_state=0, sample_posterior=True)

results = []
for i in range(n_imputations):
    imputer.set_params(random_state=i)
    imputed = pd.DataFrame(imputer.fit_transform(df[numeric_cols]), columns=numeric_cols)
    for col in ['ARM', 'sex']:
        imputed[col] = df[col].values
    model = smf.logit(
        'outcome ~ C(ARM, Treatment(reference="Placebo")) + age', data=imputed
    ).fit(disp=0)
    results.append({'coef': model.params.iloc[1], 'se': model.bse.iloc[1]})

# Rubin's rules
pooled_coef = np.mean([r['coef'] for r in results])
within_var = np.mean([r['se']**2 for r in results])
between_var = np.var([r['coef'] for r in results], ddof=1)
total_var = within_var + (1 + 1/n_imputations) * between_var
pooled_se = np.sqrt(total_var)
pooled_or = np.exp(pooled_coef)

Critical sklearn caveats:

sample_posterior=True is essential — without it all m imputations are nearly identical
sample_posterior=True only works with BayesianRidge (the default estimator). If estimator is changed (e.g., RandomForestRegressor), parameter is silently ignored and MI degenerates to single imputation
IterativeImputer treats binary covariates as continuous; consider miceforest for mixed types or move to R mice/rbmi
Only impute covariates and post-baseline outcomes, NOT treatment assignment (fully determined by randomisation)
Include outcome in imputation model as predictor but exclude from imputed variables to avoid circular dependency
IterativeImputer is experimental in sklearn — API may change without standard deprecation

Co-Primary Endpoints and Multiplicity

See clinical-biostatistics/multiplicity-graphical for the full Bretz-Maurer-Hommel treatment with gMCP.

Reconciliation: When Methods Disagree

CONSORT 2025 (April 2025) -- What Changed

Citation: Hopewell, Chan, Collins et al 2025. CONSORT 2025 statement. Lancet 405:1633-1640. E&E in BMJ 2025;389:e081124.

30-item checklist (was 25), 7 new items, 3 substantially revised, 1 deleted. Key new items relevant to statistical reporting:

Item 4 (data and code sharing — where/how de-identified IPD and analysis code can be accessed) — NEW
Item 15 (how harms were actually assessed — methods, attribution, intensity, monitoring rules) — NEW; absorbs CONSORT-Harms 2022
Item 21c (missing-data handling methods) — NEW; reflects NRC 2010 / Carpenter-Kenward consensus
Item 24a/24b (intervention/comparator actual delivery details) — absorbs TIDieR

No DOORS framework in CONSORT 2025 — diversity/equity additions are happening via a separate SAGER-SPIRIT-CONSORT alignment workstream (GENDRO/EASE).

CONSORT Flow Diagram

flow = {
    'screened': len(screening_log),
    'eligible': len(screening_log[screening_log['eligible']]),
    'randomized': len(dm),
    'allocated_drug': len(dm[dm['ARM'] == 'Drug']),
    'allocated_placebo': len(dm[dm['ARM'] == 'Placebo']),
    'completed_drug': len(dm[(dm['ARM'] == 'Drug') & dm['USUBJID'].isin(completers)]),
    'completed_placebo': len(dm[(dm['ARM'] == 'Placebo') & dm['USUBJID'].isin(completers)]),
    'analyzed_itt': len(itt),
    'analyzed_fas': len(fas),
    'analyzed_pp': len(pp),
    'analyzed_safety': len(safety),
}

CONSORT 2025 templates on consort-spirit.org now explicitly accommodate non-1:1 allocation, cluster, multi-arm, and crossover variants.

Per-Method Failure Modes

MMRM-MAR with high differential missingness

Trigger: Differential dropout between arms (e.g., toxic active vs tolerated placebo).
Mechanism: MAR assumption requires that conditional on observed covariates, missingness is unrelated to outcome — implausible when patients drop out because the treatment isn't working.
Symptom: Discontinuation reasons differ qualitatively between arms; primary p-value sensitive to model specification.
Fix: Treatment-policy estimand with retrieved-dropout MI as primary; J2R as sensitivity. Tipping-point delta in active arm (Permutt 2016).

Reference-based MI with Rubin's variance

Trigger: J2R/CR/CIR with Rubin's-rules variance reported as primary.
Mechanism: Rubin's variance is information-anchored but over-conservative (Cro 2019); under-rejects relative to nominal.
Symptom: Power loss vs frequentist variance; Type-I 0.9-2.5% vs nominal 5% in J2R simulations.
Fix: Report frequentist variance via CMI+jackknife as supportive (Wolbers 2022); cite both Cro and Bartlett.

sample_posterior=False in IterativeImputer

Trigger: Default behaviour or estimator changed away from BayesianRidge.
Mechanism: Imputations are point predictions, near-identical across draws; between-imputation variance approximately zero.
Symptom: Artificially narrow Rubin's-pooled CIs.
Fix: Always set sample_posterior=True; verify estimator is BayesianRidge; consider miceforest or R mice for mixed types.

Per-protocol as primary

Trigger: SAP specifies PP analysis as primary.
Mechanism: Excludes non-completers who may have dropped due to treatment failure; inflates effect.
Symptom: PP effect much larger than ITT; reviewers raise post-randomisation bias concern.
Fix: ITT or FAS as primary per ICH E9; PP as sensitivity only.

Method-before-estimand

Trigger: SAP describes MMRM, MI, or LOCF without specifying the estimand the method targets.
Mechanism: Retrofits the estimand to match the chosen method.
Symptom: Reviewer asks "what is the estimand?" and sponsor cannot articulate.
Fix: Per ICH E9(R1), define the 5 attributes BEFORE choosing method; cite Kahan 2023.

Missing the FAS vs ITT distinction

Trigger: Treating FAS and ITT as synonymous.
Mechanism: FAS may exclude eligibility failures or no-post-baseline; ITT cannot.
Symptom: Submission counts differ between FAS and ITT; reviewer asks for ITT.
Fix: Pre-specify both populations explicitly in protocol; report both with reconciliation.

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback

References

Bornkamp B, Rufibach K, Lin J, Liu Y, Mehrotra DV, Roychoudhury S, Schmidli H, Shentu Y, Wolbers M. 2021. Principal stratum strategy: potential role in drug development. Pharm Stat 20:737-751.
Carpenter JR, Roger JH, Kenward MG. 2013. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. J Biopharm Stat 23:1352-1371.
Cro S, Carpenter JR, Kenward MG. 2019. Information-anchored sensitivity analysis: theory and application. JRSS-A 182:623-645.
EMA. 2010. Guideline on Missing Data in Confirmatory Clinical Trials. EMA/CPMP/EWP/1776/99 Rev.1.
Fleming TR, Carroll KJ, Wittes JT, Emerson SS, Rothmann M, Collins S, Levin G. 2025. A perspective on the appropriate implementation of ICH E9(R1) addendum strategies for handling intercurrent events. Stat Med 44:e70104.
Hopewell S, Chan AW, Collins GS, et al. 2025. CONSORT 2025 statement. Lancet 405:1633-1640.
ICH. 2019. E9(R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials. Step 4.
Kahan BC, Cro S, Li F, Harhay MO. 2023. Eliminating ambiguous treatment effects using estimands. Am J Epidemiol 192:987-994.
Kenward MG, Roger JH. 1997. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53:983-997.
Lipkovich I, Ratitch B, Mallinckrodt CH. 2020. Causal inference and estimands in clinical trials. Stat Biopharm Res 12:54-67.
Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. 2008. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Information Journal 42:303-319.
Sassi-Sayadi M, Verweij P, Cornelisse P. 2025. Regulatory experiences with the use of multiple imputation for missing data in a phase 3 confirmatory trial. Ther Innov Regul Sci.
Morris TP. 2026. Comment on Fleming et al. Stat Med e70455.
NRC. 2010. The Prevention and Treatment of Missing Data in Clinical Trials. National Academies Press.
Olarte Parra C, Daniel RM, Bartlett JW. 2022. Hypothetical estimands in clinical trials: a unification of causal inference and missing data methods. Stat Biopharm Res 15(2):421-432.
Permutt T. 2016. Sensitivity analysis for missing data in regulatory submissions. Stat Med 35:2876-2879.
Permutt T. 2020. Do covariates change the estimand? Stat Biopharm Res 12:45-53.
Sun S, Weber HJ, Butler E, Rufibach K, Roychoudhury S. 2021. Estimands in hematologic oncology trials. Pharm Stat 20(4):793-805.
von Hippel PT. 2020. How many imputations are needed (a two-stage calculation using a quadratic rule). Sociol Methods Res 49:699-718.
Wolbers M, Noci A, Delmar P, Gower-Page C, Yiu S, Bartlett JW. 2022. Standard and reference-based conditional mean imputation. Pharm Stat 21(6):1246-1257.

Related Skills

clinical-biostatistics/cdisc-data-handling - SDTM/ADaM data preparation feeding the analysis dataset
clinical-biostatistics/logistic-regression - Primary analysis models (binary) with marginal-vs-conditional
clinical-biostatistics/effect-measures - Effect-measure CIs under MI pooling
clinical-biostatistics/subgroup-analysis - Pre-specified subgroup analyses for the CSR
clinical-biostatistics/missing-data-sensitivity - MMRM, reference-based MI, tipping point (in depth)
clinical-biostatistics/multiplicity-graphical - Bretz-Maurer graphs for co-primary and key secondary
clinical-biostatistics/survival-analysis - Time-to-event estimand framing under ICH E9(R1)
clinical-biostatistics/power-and-sample-size - Sample size justification per CONSORT 2025 item 16a
reporting/rmarkdown-reports - Formatted statistical report generation
experimental-design/multiple-testing - General multiplicity correction

Adoption

GPTomics/bio-clinical-biostatistics-trial-reporting

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Trial Reporting Under CONSORT 2025 + ICH E9(R1)

The Single Most Important Methodological Shift -- The Estimand Comes First

The Five Intercurrent-Event Strategies

Decision Tree for Estimand Selection

MMRM -- The FDA-Favoured MAR Analysis

The mmrm R package (Roche / openpharma)

Convergence-vs-correctness trade-off

MMRM = hypothetical estimator (Olarte Parra unification)

Reference-Based Multiple Imputation -- The rbmi Framework

The Variance Debate -- Cro/Carpenter vs Bartlett/Wolbers

Permutt Tipping-Point Analysis -- The Analyst as Adversary

Decisive Regulatory Cases -- The 2020-2025 Casebook

Table 1 -- Baseline Characteristics

Analysis Populations -- ITT vs FAS vs PP vs Safety

Missing Data Mechanisms -- The Practical Framework

Standard MMRM-MAR Primary Analysis Code

Multiple Imputation in Python (sklearn)

Co-Primary Endpoints and Multiplicity

Reconciliation: When Methods Disagree

CONSORT 2025 (April 2025) -- What Changed

CONSORT Flow Diagram

Per-Method Failure Modes

MMRM-MAR with high differential missingness

Reference-based MI with Rubin's variance

sample_posterior=False in IterativeImputer

Per-protocol as primary

Method-before-estimand

Missing the FAS vs ITT distinction

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-clinical-biostatistics-trial-reporting

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Trial Reporting Under CONSORT 2025 + ICH E9(R1)

The Single Most Important Methodological Shift -- The Estimand Comes First

The Five Intercurrent-Event Strategies

Decision Tree for Estimand Selection

MMRM -- The FDA-Favoured MAR Analysis

The mmrm R package (Roche / openpharma)

Convergence-vs-correctness trade-off

MMRM = hypothetical estimator (Olarte Parra unification)

Reference-Based Multiple Imputation -- The rbmi Framework

The Variance Debate -- Cro/Carpenter vs Bartlett/Wolbers

Permutt Tipping-Point Analysis -- The Analyst as Adversary

Decisive Regulatory Cases -- The 2020-2025 Casebook

Table 1 -- Baseline Characteristics

Analysis Populations -- ITT vs FAS vs PP vs Safety

Missing Data Mechanisms -- The Practical Framework

Standard MMRM-MAR Primary Analysis Code

Multiple Imputation in Python (sklearn)

Co-Primary Endpoints and Multiplicity

Reconciliation: When Methods Disagree

CONSORT 2025 (April 2025) -- What Changed

CONSORT Flow Diagram

Per-Method Failure Modes

MMRM-MAR with high differential missingness

Reference-based MI with Rubin's variance

sample_posterior=False in IterativeImputer

Per-protocol as primary

Method-before-estimand

Missing the FAS vs ITT distinction

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback