Version Compatibility

Reference examples tested with: R rpact 4.2+ (Wassmer/Brannath), gsDesign 3.6+ and gsDesign2 1.1+ (Anderson/Merck), adaptr, simtrial. Commercial: East/EastHorizon (Cytel), ADDPLAN (ICON), FACTS (Berry Consultants).

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name
Python adaptive packages are limited; R is the regulatory de facto standard

If code throws an error, introspect the installed package and adapt the example to match the actual API rather than retrying.

Adaptive Clinical Trial Designs

"Design an adaptive trial" -> Pre-specify a design with one or more interim adaptations (early stopping, sample-size re-estimation, treatment selection, population enrichment, randomisation ratio changes) that strongly controls Type-I error at the trial-wide level via combination tests or the Conditional Rejection Probability principle.

Regulatory Status -- The 2024-2026 Landscape

FDA 2019 Final Adaptive Designs Guidance (Federal Register 2019-25986, Dec 2 2019) finalised the 2010 and 2018 drafts. Recognises 5 design types: group-sequential, blinded SSR, unblinded SSR, adaptive enrichment, adaptive randomisation.

FDA 2022 Final Master Protocols Guidance (March 2022, NOT 2018 — common citation error): basket (one drug, many diseases), umbrella (multiple drugs, one disease), platform (perpetual, drugs enter/exit).

ICH E20 Adaptive Clinical Trials: Step 2b draft June 25 2025; Step 3 public consultation (EU deadline Nov 30 2025; FDA Federal Register Sept 30 2025); Step 4 final expected in 2026. As of May 2026, ICH E20 is NOT final. The EFPIA/PhRMA position paper preceded the formal ICH work; Berry Consultants public comment letter is one of the more important submissions.

FDA CDER Bayesian Methodology Draft (Jan 2026) (FDA-2025-D-3217): first-ever drug-side Bayesian guidance; permits Bayesian primary inference in pivotals with simulation-based Type-I error calibration.

Project Optimus (FDA OCE, 2021-2024): rewrites Phase I/II oncology by requiring randomised dose comparison before registration, replacing MTD-and-go. Made BOIN, mTPI-2, and multi-arm dose-finding the default.

Algorithmic Taxonomy

| Design type | Adaptation | Type-I preservation | Software | Strength | Fails when | |-------------|-----------|---------------------|----------|----------|------------| | Group-sequential (O'Brien-Fleming) | Early stopping for efficacy/futility | Boundary calculation; very conservative early, near-nominal at end | rpact, gsDesign | FDA's preferred adaptive design | More complex SAP; IDMC firewall essential | | Group-sequential (Pocock) | Early stopping | Constant nominal alpha at each look | rpact, gsDesign | Easy early stopping | Large penalty at final analysis | | Wang-Tsiatis power family | Early stopping | Parameterised by Delta | rpact | Tunable conservatism | Δ choice matters | | Lan-DeMets spending function | Early stopping (flexible timing) | Alpha-spending function | rpact, gsDesign | Operational flexibility; analyses don't need pre-specified number | FDA's de facto preferred framework | | Blinded SSR (Friede-Kieser 2006) | Re-estimate variance/event-rate; recompute n | No Type-I inflation; agency-uncontroversial | rpact | EMA/FDA endorsed | Variance estimate must be blinded | | Unblinded SSR (Cui-Hung-Wang 1999) | Increase n based on interim effect estimate | Requires CHW weights for control; or Mehta-Pocock promising zone | rpact | Recovers power if interim promising | IDMC firewall must be perfect; Jennison-Turnbull 2015 critique | | Mehta-Pocock promising zone (2011) | Increase n if conditional power in (0.3, 0.8) | Calibrated so Type-I inflation negligible (~0.001) | rpact | Operational simplicity | "Stealth alpha inflation" critique (Jennison 2015) | | Bauer-Köhne 1994 combination | Combine stagewise p-values via Fisher product | Any pre-specified design modification | rpact | Most flexible; theoretical foundation | Power loss vs designed group-sequential | | Müller-Schäfer 2001 CRP principle | Preserve null conditional rejection probability | Any adaptation at any time | rpact | Modern theoretical bedrock | Implementation complexity | | Adaptive enrichment | Drop sub-populations failing futility | Closed-test stage-wise | rpact, adaptr | Recovers power on responders | Selection bias on enriched population | | Response-adaptive randomisation | Update allocation probabilities | Stratification + time-trend covariates required | adaptr, FACTS | Patient-welfare; learn-and-confirm | Drift bias, estimator bias; controversial (Hey-Kimmelman 2015 ethics) | | Bayesian platform (I-SPY 2 style) | RAR + biomarker stratification + graduation criterion | Frequentist OCs via simulation | FACTS, custom Stan/JAGS | Modern oncology adaptive | Operational complexity; requires IDMC sophistication |

Postdoc reading list:

Bauer P, Köhne K 1994 Biometrics 50:1029 (combination test; original adaptive)
Cui L, Hung HMJ, Wang SJ 1999 Biometrics 55:853 (CHW weighted test for unblinded SSR)
Müller HH, Schäfer H 2001 Biometrics 57:886 (CRP principle — theoretical bedrock)
Mehta CR, Pocock SJ 2011 Stat Med 30:3267 (promising zone)
Jennison C, Turnbull BW 2015 Stat Med 34(29):3793-3810 (Mehta-Pocock critique)
Friede T, Kieser M 2006 Biom J 48:537 (blinded SSR)
Lan KKG, DeMets DL 1983 Biometrika 70:659 (alpha spending function)
O'Brien PC, Fleming TR 1979 Biometrics 35:549 (OBF boundary)
Pocock SJ 1977 Biometrika 64:191 (Pocock boundary)
Hey SP, Kimmelman J 2015 Clin Trials 12:102 (RAR ethics critique)
Berry DA 2015 commentary Clin Trials 12:107 (counter)
Robertson DS, Lee KM, López-Kolkovska BC, Villar SS 2023 Stat Sci 38:185 (canonical modern RAR review)
Wassmer G, Brannath W 2016 Group Sequential and Confirmatory Adaptive Designs in Clinical Trials (Springer)
Jennison C, Turnbull BW 2000 Group Sequential Methods with Applications to Clinical Trials (CRC)

Decision Tree by Scenario

| Scenario | Recommended approach | Why | |----------|---------------------|-----| | Confirmatory trial wanting interim early stopping | Group-sequential with O'Brien-Fleming boundaries via gsDesign | FDA-preferred; near-nominal final alpha | | Group-sequential with flexible look timing | Lan-DeMets spending function | Operational flexibility; FDA de facto preferred | | Phase 3 with uncertain nuisance parameter (variance, event rate) | Blinded SSR (Friede-Kieser) | No Type-I inflation; agency-uncontroversial | | Phase 3 wanting to increase n if interim shows promise | Mehta-Pocock promising zone with CHW weights | Recovers power; calibrated Type-I | | Seamless Phase 2/3 with arm selection | Bauer-Köhne combination test + closed testing | Most flexible; cite Müller-Schäfer CRP | | Adaptive enrichment (drop subpopulation) | Adaptive enrichment with closed-test stage-wise | Recovers power on responders | | Multi-arm oncology platform | Bayesian platform with RAR (I-SPY 2 model) | Patient-welfare argument strong for multi-arm | | 2-arm phase 3 oncology with potential RAR | Avoid RAR; group-sequential preferred | Hey-Kimmelman 2015 ethics + drift bias | | Continuous endpoint, treatment discontinuation, follow-up data available | Hybrid: J2R imputation for treatment-discontinuation ICEs, MMRM-MAR for other missingness | Aprocitentan PRECISION precedent (2024); FDA de facto standard 2024-2025 for treatment-policy estimands | | Phase 1 dose-finding | BOIN (FDA Fit-for-Purpose qualified 2021) | Transparent, tabulated decisions; no bedside Bayesian software | | Phase 1b/2 dose-optimisation (Project Optimus) | Multi-arm BOIN-12 or multi-dose randomised | FDA Aug 2024 final dose-optimisation guidance | | Basket trial (one drug, multiple diseases) | EXNEX or robust MAP via RBesT | Borrows across baskets while permitting one to detach | | Umbrella trial (one disease, multiple drugs) | Bayesian platform with shared control | FDA Master Protocols 2022 | | Pediatric extrapolation borrowing from adults | Power prior with discount γ in 0.3-0.6 | FDA Bayesian Jan 2026 draft endorses |

Group-Sequential Designs

O'Brien-Fleming -- the regulatory default

library(gsDesign)

# OBF boundaries; 3 interim looks at 33%, 67%, 100% information
design <- gsDesign(
    k = 4,                  # total analyses including final
    test.type = 1,          # 1-sided efficacy
    alpha = 0.025,
    beta = 0.10,            # power = 0.90
    sfu = sfLDOF,           # Lan-DeMets approximation of OBF
    timing = c(0.25, 0.50, 0.75, 1.0)
)
print(design)
plot(design)

OBF is very conservative at early looks (nominal alpha approximately 0.0001 at 25% info) and near-nominal at final analysis (~0.024 of 0.025). Preferred by FDA because the final-analysis penalty is small.

Pocock -- constant nominal

Constant nominal alpha at each look. Easy early stopping but large final-analysis penalty (~0.018 of 0.025 with k=4). Rarely used in confirmatory.

Lan-DeMets spending function -- the modern flexibility

# Lan-DeMets OBF-like spending function (sfLDOF)
# Allows analysis timing to differ from pre-specified
design_flex <- gsDesign(
    k = 3,
    sfu = sfLDOF,      # OBF-like spending
    alpha = 0.025,
    beta = 0.10
)

# Actual analyses can occur at different information fractions
# Spending function returns alpha to spend at each look based on actual timing

The flexibility: sponsor can perform analyses at different information fractions than originally planned. FDA's de facto preferred framework.

Sample-size for group-sequential

# Time-to-event group-sequential
library(gsDesign)
n_gs <- gsSurv(
    k = 3,
    test.type = 2,    # 2-sided
    alpha = 0.025,
    beta = 0.10,
    sfu = sfLDOF,
    lambdaC = 0.04,   # control hazard per month
    hr = 0.70,        # treatment HR
    eta = 0.005,      # dropout hazard
    T = 24,           # total study duration
    minfup = 12       # minimum follow-up
)
print(n_gs)

Sample-Size Re-Estimation

Blinded SSR (Friede-Kieser 2006)

Re-estimate nuisance parameter (variance σ² for continuous, control event rate p_0 for binary, overall event rate for survival) from blinded interim data. No Type-I error inflation when test statistic ignores the SSR.

library(rpact)
# Blinded SSR for continuous outcome
design_blinded_ssr <- getDesignGroupSequential(
    kMax = 2,
    alpha = 0.025,
    beta = 0.20,
    sided = 1,
    informationRates = c(0.5, 1)
)

# At interim, re-estimate variance and recompute n
# (manual implementation; rpact has built-in support via getDesignInverseNormal for unblinded)

EMA Reflection Paper 2007 and FDA 2019 explicitly endorse blinded SSR. Uncontroversial.

Unblinded SSR (Cui-Hung-Wang 1999)

Interim effect estimate triggers sample-size change. Type-I inflation if naive: Cui-Hung-Wang showed 8% Type-I vs 2.5% target.

The Cui-Hung-Wang weighted test uses pre-specified weights from the original design:

Z_weighted = w_1 * Z_1 + w_2 * Z_2_residual

where w_1, w_2 are the pre-specified weights (based on original n_1, n_2) and Z_2_residual is the test statistic on the data after the interim. Pre-specified weights preserve alpha even if the actual n at stage 2 differs.

library(rpact)
design_unblinded_ssr <- getDesignInverseNormal(
    kMax = 2,
    alpha = 0.025,
    beta = 0.20,
    sided = 1,
    informationRates = c(0.5, 1),
    typeOfDesign = 'WT',  # Wang-Tsiatis power family
    deltaWT = 0.25
)

# Use inverse normal combination for adaptive SSR
analysis_result <- getAnalysisResults(
    design_unblinded_ssr,
    dataInput = getDataMeans(...)
)

Mehta-Pocock Promising Zone (2011)

At interim, compute conditional power (CP) given observed effect:

Unfavourable zone (CP < ~30%): stop or continue without modification
Promising zone (CP in 30-80%): increase n to recover power; NO Type-I penalty if increase rule pre-specified and uses original test statistic with original weights
Favourable zone (CP > 80%): continue without change

# rpact implementation
# Sample size recalculation in promising zone
n_increased <- getSampleSizeMeans(
    design_unblinded_ssr,
    alternative = 5,        # detect mean diff of 5
    stDev = 12,
    groups = 2
)

The mathematical sleight: promising zone is constructed so unconditional Type-I error inflation is negligible (~0.001) even WITHOUT CHW weighting. Jennison-Turnbull 2015 critique: stealth alpha inflation in unpublished simulation assumptions; inefficient relative to CHW-weighted GSD. Mehta defends on operational grounds.

Edwards et al 2020 Trials 21:1000 is the systematic review.

Combination Tests and CRP Principle

Bauer-Köhne 1994 Biometrics 50:1029: combine stagewise p-values via Fisher's product test. Permits design modifications post-interim while controlling Type-I error.

Müller-Schäfer 2001 Biometrics 57:886: Conditional Rejection Probability (CRP) principle — preserve the null conditional rejection probability at every adaptation, and unconditional Type-I is preserved. The theoretical bedrock of all post-2001 confirmatory adaptive designs.

Müller-Schäfer 2004 Stat Med 23:2497 extended to ANY design change at ANY time.

# rpact natively supports combination tests
design_comb <- getDesignFisher(
    kMax = 3,
    alpha = 0.025,
    sided = 1
)

# Or inverse normal combination
design_inv_norm <- getDesignInverseNormal(
    kMax = 3,
    alpha = 0.025,
    informationRates = c(0.33, 0.67, 1.0)
)

Adaptive Enrichment

Drop sub-populations failing futility; re-power on responders. Closed-test stage-wise to control familywise error across full and enriched populations.

# rpact: enrichment design via getDesignEnrichmentSubgroup
# Standard implementation requires explicit definition of full population (F)
# and enriched population (S)

Postdoc concern: selection bias on the enriched population — the observed treatment effect on the enriched subgroup is biased upward by selection. Bias-correction via simulation or hierarchical Bayesian.

Response-Adaptive Randomisation -- The Ethics Fight

Hey & Kimmelman 2015 Clin Trials 12:102 "Are outcome-adaptive allocation trials ethical?" Argued RAR's purported ethical advantage (equipoise, sub-optimal exposure minimisation) fails in two-arm and early-phase settings because:

Drift bias inflates Type-I error / biases estimates (time trends confounded with allocation)
Consent dynamics confused — patients believe allocation is "personalised" when stochastic
Marginal patient-welfare benefit is statistical and small while operational risks real

Counter-arguments:

Berry DA 2015 commentary Clin Trials 12:107: RAR enables learn-and-confirm, multi-arm platforms (I-SPY 2 model) where the patient-welfare argument IS the point and equal allocation would be unethical given accumulating evidence.
Saville & Berry 2016 Clin Trials 13:358: RAR's operating characteristics are competitive in multi-arm platforms. Drift bias and Type-I inflation are controlled by adjusting for temporal trends (time-trend covariates) alongside stratification and proper analysis weights -- the "Bayesian time machine" approach developed in later work.
Buyse 2015 Clin Trials 12:119: Hey-Kimmelman correct for 2-arm but wrong for multi-arm.

Consensus position (2020s; ICH E20): RAR appropriate when (a) multi-arm (>=3 arms), (b) rare disease / limited pool, (c) strong PoC of differential biomarker response, (d) robust drift-bias adjustment and pre-specified analysis weights. Inappropriate for confirmatory 2-arm trials.

Robertson, Lee, López-Kolkovska, Villar 2023 Stat Sci 38:185 ("Response-adaptive randomization: from myths to practical considerations") is the canonical modern review settling the debate.

Bayesian Platform Trials

I-SPY 2 (Barker-Sigman 2009 Clin Pharmacol Ther; Park-Liu 2016 NEJM 375:11): neoadjuvant breast cancer; 10 biomarker-defined subtypes × multiple arms; Bayesian RAR; graduation criterion = posterior predictive probability of success in 300-patient Phase 3 ≥ 85%. Berry Consultants designed the engine. Multiple drugs graduated (neratinib, veliparib, pembrolizumab).

GBM AGILE (Alexander 2018; published readouts beginning 2024): glioblastoma; response-adaptive Bayesian; first global registrational platform in neuro-oncology. Regorafenib readout 2025 JCO JCO-25-01137.

REMAP-CAP (Angus 2020 JAMA): severe pneumonia, repurposed for COVID-19 in 2020; Bayesian factorial multi-domain design — multiple intervention domains tested simultaneously and combinatorially. Generated corticosteroid signal in COVID independently of RECOVERY.

Drop-the-loser vs promising-the-winner

Adaptive arm-dropping (futility): Bayesian posterior probability of beating control drops below threshold -> arm closes. Mathematically straightforward; FDA-acceptable.
"Promising-the-winner" (graduate to Phase 3): introduces selection bias. Bias-adjusted estimators (Robertson 2023; conditional MLE) now standard in I-SPY 2 reports.

Phase I Dose-Finding -- BOIN, mTPI, CRM

| Design | Citation | Idea | Where it wins | |--------|----------|------|---------------| | CRM | O'Quigley-Pepe-Fisher 1990 | Single-parameter logistic/power model; updates posterior MTD probability after each cohort | Statistically efficient; skeleton calibration needed | | EWOC | Babb-Rogatko-Zacks 1998 | CRM-like with explicit overdose-control constraint (P(dose > MTD) <= 0.25) | Safer than CRM in small trials | | mTPI | Ji et al 2010 Clin Trials 7:653 | Beta-binomial; UPM decision rule on under/proper/over-dosing intervals | Pre-tabulated decisions; documented over-shoot bias | | mTPI-2 / Keyboard | Guo-Wang-Yang-Lynn-Ji 2017 | Fixes mTPI Ockham bias by equal-width intervals | Default mTPI replacement | | BOIN | Liu-Yuan 2015 J R Stat Soc C 64:507 | Pre-tabulated escalation interval bounds optimised to minimise incorrect-decision probability | FDA Fit-for-Purpose qualified Dec 2021; near-CRM with no bedside software |

Why FDA prefers BOIN operationally: qualified as Fit-for-Purpose under FDA's Drug Development Tools program (FDA Determination Letter, December 10, 2021). Investigator uses pre-printed escalation table — no real-time Bayesian software at the bedside.

R packages: BOIN, dfcrm (Cheung — author of CRM textbook), trialr (Brock — includes EffTox), escalation (Brock — unified framework).

Reconciliation: When Methods Disagree

| Pattern | Likely cause | Action | |---------|--------------|--------| | Blinded SSR n vs unblinded SSR n differ substantially | Unblinded SSR uses interim effect estimate; blinded uses nuisance parameter only | Blinded is Type-I-clean; unblinded requires CHW weighting; pre-specify the approach in SAP | | Group-sequential rejects at interim; Cui-Hung-Wang weighted final test does not | Naive interim rejection used original test statistic; CHW weights downweight late data | Pre-specify boundary and weights; do NOT switch tests mid-stream | | Mehta-Pocock promising-zone vs CHW-weighted GSD give different n increases | Promising zone calibrated for Type-I (~0.001 inflation); CHW more efficient under known effect | Jennison-Turnbull 2015 critique: promising zone "stealth alpha"; pre-specify with simulation OCs | | Adaptive enrichment selects subgroup at interim; replication shows smaller effect | Selection bias on enriched population (winner's curse) | Bias-correction via conditional MLE or hierarchical Bayesian; cite Robertson 2023 | | RAR posterior allocation favours active in 2-arm trial; randomisation drift bias suspected | Time trends confounded with allocation changes | Pre-specify time-trend covariates in analysis; use proper analysis weights; cite Robertson 2023 RAR consensus (RAR INAPPROPRIATE for 2-arm confirmatory) | | BOIN vs CRM choose different MTD on same data | CRM uses model; BOIN uses tabulated boundaries; differ when skeleton mis-calibrated | BOIN Fit-for-Purpose qualified (Dec 2021); CRM more efficient under correct skeleton; report OCs over both | | I-SPY 2 graduation criterion met but Phase 3 replication fails | Selection bias on graduated arm; PP threshold not bias-corrected | Apply conditional MLE; cite Robertson 2023; report both raw and bias-corrected estimates | | Müller-Schäfer CRP preserved but ad hoc rule appears Type-I-inflated in simulation | Implementation deviation from formal CRP | Verify CRP equation precisely; report OCs via simulation; cite Müller-Schäfer 2001 |

Per-Method Failure Modes

Unblinded SSR with naive sample increase

Trigger: Sponsor increases n based on interim effect without CHW weighting.
Mechanism: Type-I inflation up to 8% (Cui-Hung-Wang 1999 simulation).
Symptom: Independent reanalysis finds Type-I > 5%.
Fix: Pre-specify CHW weights from original design; use combination test in rpact.

Mehta-Pocock promising-zone "stealth alpha"

Trigger: Promising-zone applied without sufficient simulation.
Mechanism: Jennison-Turnbull 2015 critique — Type-I inflation hidden in unpublished simulation assumptions.
Symptom: Independent reanalysis finds Type-I 5.3% vs nominal 5%.
Fix: Pre-specify increase rule transparently; report simulation OCs.

RAR drift bias

Trigger: RAR in trial with time trends (calendar effects, learning curves).
Mechanism: Time trends confounded with allocation changes; biased effect estimate.
Symptom: Effect estimate sensitive to time-trend adjustment.
Fix: Pre-specify time-trend covariates in analysis; use proper analysis weights; cite Robertson 2023.

Schoenfeld formula under immunotherapy delayed effect

Trigger: Sample size calculated via Schoenfeld 1981 assuming PH.
Mechanism: Delayed effect violates PH; events under-estimated by 20-50%.
Symptom: Trial under-powered; observed events insufficient.
Fix: Lakatos 1988 or simulation under expected HR(t); cite Lin 2020 NPH Working Group.

IDMC firewall failure in unblinded SSR (IDMC = Independent Data Monitoring Committee; the regulatory-standard term)

Trigger: Interim effect estimate leaks beyond IDMC.
Mechanism: Sponsor inference from increase decision reveals direction of interim effect.
Symptom: Regulator audit reveals unblinding.
Fix: Strict firewall SOP; only "increase / no increase" communicated to sponsor; cite ICH E20.

Adaptive enrichment selection bias

Trigger: Enriched population effect reported without bias correction.
Mechanism: Selection on subgroup with promising interim effect inflates estimate.
Symptom: Independent replication on enriched subgroup gives smaller effect.
Fix: Bias-correction via simulation or hierarchical Bayesian; cite Robertson 2023 for bias-adjusted estimation.

RAR in 2-arm confirmatory

Trigger: RAR applied to confirmatory 2-arm trial.
Mechanism: Hey-Kimmelman 2015 critique — drift bias, consent confusion, marginal benefit.
Symptom: Reviewer rejects RAR as inappropriate for setting.
Fix: Group-sequential with futility/efficacy boundaries instead; cite Robertson 2023 consensus.

CRM skeleton mis-calibration

Trigger: CRM applied with default skeleton without simulation.
Mechanism: Skeleton dictates target dose; mis-calibration biases MTD.
Symptom: MTD selection differs systematically from clinical expectation.
Fix: Calibrate skeleton via Lee-Cheung 2009 indifference-interval method; or switch to BOIN.

Quantitative Thresholds

| Threshold | Source | Rationale | |-----------|--------|-----------| | FDA Fit-for-Purpose BOIN qualification (Dec 2021) | FDA Drug Development Tools program | First dose-finding design with formal FDA endorsement | | Mehta-Pocock promising zone CP 30-80% | Mehta-Pocock 2011 | Mathematical calibration for Type-I preservation | | RAR appropriate >= 3 arms | Robertson 2023 consensus | Multi-arm patient-welfare argument | | OBF nominal alpha ~0.024 at final / 0.025 | gsDesign | Small final penalty preferred by FDA | | Schoenfeld under non-PH under-estimates 20-50% | Lin 2020 NPH WG | Use Lakatos or simulation | | I-SPY 2 graduation: PP success in Phase 3 >= 85% | Barker 2009 | Bayesian platform standard | | Power prior discount γ 0.3-0.6 for pediatric extrapolation | FDA Bayesian Jan 2026 draft | Partial borrowing default |

Common Errors

| Error / symptom | Cause | Solution | |-----------------|-------|----------| | Unblinded SSR with naive sample increase | No CHW weighting | Pre-specify CHW weights; cite Cui-Hung-Wang 1999 | | Mehta-Pocock without simulation OCs | Stealth alpha inflation | Report simulation OCs; cite Jennison 2015 | | RAR in 2-arm confirmatory | Misapplication | Group-sequential instead; cite Robertson 2023 | | Schoenfeld for immuno-oncology | PH assumption violated | Lakatos or simulation; cite Lin 2020 | | Adaptive enrichment effect reported uncorrected | Selection bias | Bias-correction; cite Robertson 2023 | | CRM with default skeleton | Mis-calibration | Calibrate via Lee-Cheung 2009 or switch to BOIN | | ICH E20 cited as "finalised April 2024" | Confusion with EFPIA position paper | ICH E20 is Step 2b/3 draft (June 2025); not final | | FDA Master Protocols "2018" | 2018 was draft | March 2022 was the final | | Bauer-Köhne combination test as "old-fashioned" | Misunderstanding | Foundational; cited in modern combination-test implementations | | Stop-for-efficacy at first interim with OBF | OBF nominal alpha ~0.0001 at 25% info | Trial must show very strong evidence to stop early; expected |

Anticipated Reviewer Pushback

| Pushback | Response | |----------|----------| | "How is Type-I error controlled?" | Closed testing via Müller-Schäfer CRP principle; specific implementation is inverse normal combination test in rpact | | "Why these boundaries?" | OBF via Lan-DeMets sfLDOF spending function; preserves final-analysis power; pre-specified in SAP | | "Pre-specification of SSR rule?" | Promising zone CP in (0.3, 0.8) triggers increase to n_max via CHW-weighted statistic; pre-specified n_max in SAP | | "IDMC firewall?" | IDMC receives interim effect estimate; sponsor receives only "increase / no increase" decision; SOP documented; pre-specified | | "RAR ethics?" | Multi-arm (4 arms) setting; Berry 2015 consensus that patient-welfare argument valid; drift-bias adjustment in primary analysis | | "Promising zone vs CHW-weighted GSD?" | Operational simplicity preferred; OCs from simulation confirm Type-I ~5%; supportive Cui-Hung-Wang analysis | | "Adaptive enrichment bias?" | Bias-correction via simulation; conditional MLE for enriched-population effect; cite Robertson 2023 | | "Phase 1 BOIN vs CRM?" | BOIN Fit-for-Purpose qualified by FDA Dec 2021; tabulated decisions; no bedside Bayesian software |

References

Babb J, Rogatko A, Zacks S. 1998. Cancer Phase I clinical trials: efficient dose escalation with overdose control. Stat Med 17:1103-1120.
Bauer P, Köhne K. 1994. Evaluation of experiments with adaptive interim analyses. Biometrics 50:1029-1041.
Berry DA. 2015. Commentary on Hey & Kimmelman. Clin Trials 12:107-109.
Cui L, Hung HMJ, Wang SJ. 1999. Modification of sample size in group sequential clinical trials. Biometrics 55:853-857.
FDA. 2019. Adaptive Designs for Clinical Trials of Drugs and Biologics. Final Guidance.
FDA. 2022. Master Protocols: Efficient Clinical Trial Design Strategies to Expedite Development of Oncology Drugs and Biologics. Final Guidance, March 2022.
FDA. 2026. Use of Bayesian Methodology in Clinical Trials. Draft Guidance, January 2026.
Friede T, Kieser M. 2006. Sample size recalculation in internal pilot study designs. Biom J 48:537-555.
Hey SP, Kimmelman J. 2015. Are outcome-adaptive allocation trials ethical? Clin Trials 12:102-106.
Jennison C, Turnbull BW. 2015. Adaptive sample size modification in clinical trials: start small then ask for more? Stat Med 34(29):3793-3810.
Lan KKG, DeMets DL. 1983. Discrete sequential boundaries for clinical trials. Biometrika 70:659-663.
Liu S, Yuan Y. 2015. Bayesian optimal interval designs for phase I clinical trials. JRSS-C 64:507-523.
Mehta CR, Pocock SJ. 2011. Adaptive increase in sample size when interim results are promising. Stat Med 30:3267-3284.
Müller HH, Schäfer H. 2001. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics 57:886-891.
O'Brien PC, Fleming TR. 1979. A multiple testing procedure for clinical trials. Biometrics 35:549-556.
O'Quigley J, Pepe M, Fisher L. 1990. Continual reassessment method: a practical design for phase 1 clinical trials in cancer. Biometrics 46:33-48.
Pocock SJ. 1977. Group sequential methods in the design and analysis of clinical trials. Biometrika 64:191-199.
Robertson DS, Lee KM, López-Kolkovska BC, Villar SS. 2023. Response-adaptive randomization in clinical trials: from myths to practical considerations. Stat Sci 38:185-208.
Wassmer G, Brannath W. 2016. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer.

Related Skills

clinical-biostatistics/power-and-sample-size - Sample size for adaptive designs
clinical-biostatistics/multiplicity-graphical - Closed testing in adaptive contexts
clinical-biostatistics/bayesian-trials - Bayesian platform trials, BOIN/CRM/EWOC
clinical-biostatistics/trial-reporting - Reporting adaptive trial results per CONSORT 2025
clinical-biostatistics/survival-analysis - Adaptive designs for TTE endpoints
experimental-design/sample-size - General sample-size methods

Version Compatibility

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name
Python adaptive packages are limited; R is the regulatory de facto standard

If code throws an error, introspect the installed package and adapt the example to match the actual API rather than retrying.

Adaptive Clinical Trial Designs

Regulatory Status -- The 2024-2026 Landscape

Algorithmic Taxonomy

Postdoc reading list:

Bauer P, Köhne K 1994 Biometrics 50:1029 (combination test; original adaptive)
Cui L, Hung HMJ, Wang SJ 1999 Biometrics 55:853 (CHW weighted test for unblinded SSR)
Müller HH, Schäfer H 2001 Biometrics 57:886 (CRP principle — theoretical bedrock)
Mehta CR, Pocock SJ 2011 Stat Med 30:3267 (promising zone)
Jennison C, Turnbull BW 2015 Stat Med 34(29):3793-3810 (Mehta-Pocock critique)
Friede T, Kieser M 2006 Biom J 48:537 (blinded SSR)
Lan KKG, DeMets DL 1983 Biometrika 70:659 (alpha spending function)
O'Brien PC, Fleming TR 1979 Biometrics 35:549 (OBF boundary)
Pocock SJ 1977 Biometrika 64:191 (Pocock boundary)
Hey SP, Kimmelman J 2015 Clin Trials 12:102 (RAR ethics critique)
Berry DA 2015 commentary Clin Trials 12:107 (counter)
Robertson DS, Lee KM, López-Kolkovska BC, Villar SS 2023 Stat Sci 38:185 (canonical modern RAR review)
Wassmer G, Brannath W 2016 Group Sequential and Confirmatory Adaptive Designs in Clinical Trials (Springer)
Jennison C, Turnbull BW 2000 Group Sequential Methods with Applications to Clinical Trials (CRC)

Decision Tree by Scenario

Group-Sequential Designs

O'Brien-Fleming -- the regulatory default

library(gsDesign)

# OBF boundaries; 3 interim looks at 33%, 67%, 100% information
design <- gsDesign(
    k = 4,                  # total analyses including final
    test.type = 1,          # 1-sided efficacy
    alpha = 0.025,
    beta = 0.10,            # power = 0.90
    sfu = sfLDOF,           # Lan-DeMets approximation of OBF
    timing = c(0.25, 0.50, 0.75, 1.0)
)
print(design)
plot(design)

Pocock -- constant nominal

Constant nominal alpha at each look. Easy early stopping but large final-analysis penalty (~0.018 of 0.025 with k=4). Rarely used in confirmatory.

Lan-DeMets spending function -- the modern flexibility

# Lan-DeMets OBF-like spending function (sfLDOF)
# Allows analysis timing to differ from pre-specified
design_flex <- gsDesign(
    k = 3,
    sfu = sfLDOF,      # OBF-like spending
    alpha = 0.025,
    beta = 0.10
)

# Actual analyses can occur at different information fractions
# Spending function returns alpha to spend at each look based on actual timing

The flexibility: sponsor can perform analyses at different information fractions than originally planned. FDA's de facto preferred framework.

Sample-size for group-sequential

# Time-to-event group-sequential
library(gsDesign)
n_gs <- gsSurv(
    k = 3,
    test.type = 2,    # 2-sided
    alpha = 0.025,
    beta = 0.10,
    sfu = sfLDOF,
    lambdaC = 0.04,   # control hazard per month
    hr = 0.70,        # treatment HR
    eta = 0.005,      # dropout hazard
    T = 24,           # total study duration
    minfup = 12       # minimum follow-up
)
print(n_gs)

Sample-Size Re-Estimation

Blinded SSR (Friede-Kieser 2006)

library(rpact)
# Blinded SSR for continuous outcome
design_blinded_ssr <- getDesignGroupSequential(
    kMax = 2,
    alpha = 0.025,
    beta = 0.20,
    sided = 1,
    informationRates = c(0.5, 1)
)

# At interim, re-estimate variance and recompute n
# (manual implementation; rpact has built-in support via getDesignInverseNormal for unblinded)

EMA Reflection Paper 2007 and FDA 2019 explicitly endorse blinded SSR. Uncontroversial.

Unblinded SSR (Cui-Hung-Wang 1999)

Interim effect estimate triggers sample-size change. Type-I inflation if naive: Cui-Hung-Wang showed 8% Type-I vs 2.5% target.

The Cui-Hung-Wang weighted test uses pre-specified weights from the original design:

Z_weighted = w_1 * Z_1 + w_2 * Z_2_residual

library(rpact)
design_unblinded_ssr <- getDesignInverseNormal(
    kMax = 2,
    alpha = 0.025,
    beta = 0.20,
    sided = 1,
    informationRates = c(0.5, 1),
    typeOfDesign = 'WT',  # Wang-Tsiatis power family
    deltaWT = 0.25
)

# Use inverse normal combination for adaptive SSR
analysis_result <- getAnalysisResults(
    design_unblinded_ssr,
    dataInput = getDataMeans(...)
)

Mehta-Pocock Promising Zone (2011)

At interim, compute conditional power (CP) given observed effect:

Unfavourable zone (CP < ~30%): stop or continue without modification
Promising zone (CP in 30-80%): increase n to recover power; NO Type-I penalty if increase rule pre-specified and uses original test statistic with original weights
Favourable zone (CP > 80%): continue without change

# rpact implementation
# Sample size recalculation in promising zone
n_increased <- getSampleSizeMeans(
    design_unblinded_ssr,
    alternative = 5,        # detect mean diff of 5
    stDev = 12,
    groups = 2
)

Edwards et al 2020 Trials 21:1000 is the systematic review.

Combination Tests and CRP Principle

Bauer-Köhne 1994 Biometrics 50:1029: combine stagewise p-values via Fisher's product test. Permits design modifications post-interim while controlling Type-I error.

Müller-Schäfer 2004 Stat Med 23:2497 extended to ANY design change at ANY time.

# rpact natively supports combination tests
design_comb <- getDesignFisher(
    kMax = 3,
    alpha = 0.025,
    sided = 1
)

# Or inverse normal combination
design_inv_norm <- getDesignInverseNormal(
    kMax = 3,
    alpha = 0.025,
    informationRates = c(0.33, 0.67, 1.0)
)

Adaptive Enrichment

Drop sub-populations failing futility; re-power on responders. Closed-test stage-wise to control familywise error across full and enriched populations.

# rpact: enrichment design via getDesignEnrichmentSubgroup
# Standard implementation requires explicit definition of full population (F)
# and enriched population (S)

Response-Adaptive Randomisation -- The Ethics Fight

Drift bias inflates Type-I error / biases estimates (time trends confounded with allocation)
Consent dynamics confused — patients believe allocation is "personalised" when stochastic
Marginal patient-welfare benefit is statistical and small while operational risks real

Counter-arguments:

Berry DA 2015 commentary Clin Trials 12:107: RAR enables learn-and-confirm, multi-arm platforms (I-SPY 2 model) where the patient-welfare argument IS the point and equal allocation would be unethical given accumulating evidence.
Saville & Berry 2016 Clin Trials 13:358: RAR's operating characteristics are competitive in multi-arm platforms. Drift bias and Type-I inflation are controlled by adjusting for temporal trends (time-trend covariates) alongside stratification and proper analysis weights -- the "Bayesian time machine" approach developed in later work.
Buyse 2015 Clin Trials 12:119: Hey-Kimmelman correct for 2-arm but wrong for multi-arm.

Robertson, Lee, López-Kolkovska, Villar 2023 Stat Sci 38:185 ("Response-adaptive randomization: from myths to practical considerations") is the canonical modern review settling the debate.

Bayesian Platform Trials

Drop-the-loser vs promising-the-winner

Adaptive arm-dropping (futility): Bayesian posterior probability of beating control drops below threshold -> arm closes. Mathematically straightforward; FDA-acceptable.
"Promising-the-winner" (graduate to Phase 3): introduces selection bias. Bias-adjusted estimators (Robertson 2023; conditional MLE) now standard in I-SPY 2 reports.

Phase I Dose-Finding -- BOIN, mTPI, CRM

R packages: BOIN, dfcrm (Cheung — author of CRM textbook), trialr (Brock — includes EffTox), escalation (Brock — unified framework).

Reconciliation: When Methods Disagree

Per-Method Failure Modes

Unblinded SSR with naive sample increase

Trigger: Sponsor increases n based on interim effect without CHW weighting.
Mechanism: Type-I inflation up to 8% (Cui-Hung-Wang 1999 simulation).
Symptom: Independent reanalysis finds Type-I > 5%.
Fix: Pre-specify CHW weights from original design; use combination test in rpact.

Mehta-Pocock promising-zone "stealth alpha"

Trigger: Promising-zone applied without sufficient simulation.
Mechanism: Jennison-Turnbull 2015 critique — Type-I inflation hidden in unpublished simulation assumptions.
Symptom: Independent reanalysis finds Type-I 5.3% vs nominal 5%.
Fix: Pre-specify increase rule transparently; report simulation OCs.

RAR drift bias

Trigger: RAR in trial with time trends (calendar effects, learning curves).
Mechanism: Time trends confounded with allocation changes; biased effect estimate.
Symptom: Effect estimate sensitive to time-trend adjustment.
Fix: Pre-specify time-trend covariates in analysis; use proper analysis weights; cite Robertson 2023.

Schoenfeld formula under immunotherapy delayed effect

Trigger: Sample size calculated via Schoenfeld 1981 assuming PH.
Mechanism: Delayed effect violates PH; events under-estimated by 20-50%.
Symptom: Trial under-powered; observed events insufficient.
Fix: Lakatos 1988 or simulation under expected HR(t); cite Lin 2020 NPH Working Group.

IDMC firewall failure in unblinded SSR (IDMC = Independent Data Monitoring Committee; the regulatory-standard term)

Trigger: Interim effect estimate leaks beyond IDMC.
Mechanism: Sponsor inference from increase decision reveals direction of interim effect.
Symptom: Regulator audit reveals unblinding.
Fix: Strict firewall SOP; only "increase / no increase" communicated to sponsor; cite ICH E20.

Adaptive enrichment selection bias

Trigger: Enriched population effect reported without bias correction.
Mechanism: Selection on subgroup with promising interim effect inflates estimate.
Symptom: Independent replication on enriched subgroup gives smaller effect.
Fix: Bias-correction via simulation or hierarchical Bayesian; cite Robertson 2023 for bias-adjusted estimation.

RAR in 2-arm confirmatory

Trigger: RAR applied to confirmatory 2-arm trial.
Mechanism: Hey-Kimmelman 2015 critique — drift bias, consent confusion, marginal benefit.
Symptom: Reviewer rejects RAR as inappropriate for setting.
Fix: Group-sequential with futility/efficacy boundaries instead; cite Robertson 2023 consensus.

CRM skeleton mis-calibration

Trigger: CRM applied with default skeleton without simulation.
Mechanism: Skeleton dictates target dose; mis-calibration biases MTD.
Symptom: MTD selection differs systematically from clinical expectation.
Fix: Calibrate skeleton via Lee-Cheung 2009 indifference-interval method; or switch to BOIN.

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback

References

Babb J, Rogatko A, Zacks S. 1998. Cancer Phase I clinical trials: efficient dose escalation with overdose control. Stat Med 17:1103-1120.
Bauer P, Köhne K. 1994. Evaluation of experiments with adaptive interim analyses. Biometrics 50:1029-1041.
Berry DA. 2015. Commentary on Hey & Kimmelman. Clin Trials 12:107-109.
Cui L, Hung HMJ, Wang SJ. 1999. Modification of sample size in group sequential clinical trials. Biometrics 55:853-857.
FDA. 2019. Adaptive Designs for Clinical Trials of Drugs and Biologics. Final Guidance.
FDA. 2022. Master Protocols: Efficient Clinical Trial Design Strategies to Expedite Development of Oncology Drugs and Biologics. Final Guidance, March 2022.
FDA. 2026. Use of Bayesian Methodology in Clinical Trials. Draft Guidance, January 2026.
Friede T, Kieser M. 2006. Sample size recalculation in internal pilot study designs. Biom J 48:537-555.
Hey SP, Kimmelman J. 2015. Are outcome-adaptive allocation trials ethical? Clin Trials 12:102-106.
Jennison C, Turnbull BW. 2015. Adaptive sample size modification in clinical trials: start small then ask for more? Stat Med 34(29):3793-3810.
Lan KKG, DeMets DL. 1983. Discrete sequential boundaries for clinical trials. Biometrika 70:659-663.
Liu S, Yuan Y. 2015. Bayesian optimal interval designs for phase I clinical trials. JRSS-C 64:507-523.
Mehta CR, Pocock SJ. 2011. Adaptive increase in sample size when interim results are promising. Stat Med 30:3267-3284.
Müller HH, Schäfer H. 2001. Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches. Biometrics 57:886-891.
O'Brien PC, Fleming TR. 1979. A multiple testing procedure for clinical trials. Biometrics 35:549-556.
O'Quigley J, Pepe M, Fisher L. 1990. Continual reassessment method: a practical design for phase 1 clinical trials in cancer. Biometrics 46:33-48.
Pocock SJ. 1977. Group sequential methods in the design and analysis of clinical trials. Biometrika 64:191-199.
Robertson DS, Lee KM, López-Kolkovska BC, Villar SS. 2023. Response-adaptive randomization in clinical trials: from myths to practical considerations. Stat Sci 38:185-208.
Wassmer G, Brannath W. 2016. Group Sequential and Confirmatory Adaptive Designs in Clinical Trials. Springer.

Related Skills

clinical-biostatistics/power-and-sample-size - Sample size for adaptive designs
clinical-biostatistics/multiplicity-graphical - Closed testing in adaptive contexts
clinical-biostatistics/bayesian-trials - Bayesian platform trials, BOIN/CRM/EWOC
clinical-biostatistics/trial-reporting - Reporting adaptive trial results per CONSORT 2025
clinical-biostatistics/survival-analysis - Adaptive designs for TTE endpoints
experimental-design/sample-size - General sample-size methods

Adoption

GPTomics/bio-clinical-biostatistics-adaptive-designs

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Adaptive Clinical Trial Designs

Regulatory Status -- The 2024-2026 Landscape

Algorithmic Taxonomy

Decision Tree by Scenario

Group-Sequential Designs

O'Brien-Fleming -- the regulatory default

Pocock -- constant nominal

Lan-DeMets spending function -- the modern flexibility

Sample-size for group-sequential

Sample-Size Re-Estimation

Blinded SSR (Friede-Kieser 2006)

Unblinded SSR (Cui-Hung-Wang 1999)

Mehta-Pocock Promising Zone (2011)

Combination Tests and CRP Principle

Adaptive Enrichment

Response-Adaptive Randomisation -- The Ethics Fight

Bayesian Platform Trials

Drop-the-loser vs promising-the-winner

Phase I Dose-Finding -- BOIN, mTPI, CRM

Reconciliation: When Methods Disagree

Per-Method Failure Modes

Unblinded SSR with naive sample increase

Mehta-Pocock promising-zone "stealth alpha"

RAR drift bias

Schoenfeld formula under immunotherapy delayed effect

IDMC firewall failure in unblinded SSR (IDMC = Independent Data Monitoring Committee; the regulatory-standard term)

Adaptive enrichment selection bias

RAR in 2-arm confirmatory

CRM skeleton mis-calibration

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-clinical-biostatistics-adaptive-designs

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Adaptive Clinical Trial Designs

Regulatory Status -- The 2024-2026 Landscape

Algorithmic Taxonomy

Decision Tree by Scenario

Group-Sequential Designs

O'Brien-Fleming -- the regulatory default

Pocock -- constant nominal

Lan-DeMets spending function -- the modern flexibility

Sample-size for group-sequential

Sample-Size Re-Estimation

Blinded SSR (Friede-Kieser 2006)

Unblinded SSR (Cui-Hung-Wang 1999)

Mehta-Pocock Promising Zone (2011)

Combination Tests and CRP Principle

Adaptive Enrichment

Response-Adaptive Randomisation -- The Ethics Fight

Bayesian Platform Trials

Drop-the-loser vs promising-the-winner

Phase I Dose-Finding -- BOIN, mTPI, CRM

Reconciliation: When Methods Disagree

Per-Method Failure Modes

Unblinded SSR with naive sample increase

Mehta-Pocock promising-zone "stealth alpha"

RAR drift bias

Schoenfeld formula under immunotherapy delayed effect

IDMC firewall failure in unblinded SSR (IDMC = Independent Data Monitoring Committee; the regulatory-standard term)

Adaptive enrichment selection bias

RAR in 2-arm confirmatory

CRM skeleton mis-calibration

Quantitative Thresholds