clinical-biostatistics/adaptive-designs/SKILL.md
Designs adaptive clinical trials including group-sequential (O'Brien-Fleming, Pocock, Lan-DeMets spending), sample-size re-estimation (blinded Friede-Kieser, unblinded Cui-Hung-Wang, Mehta-Pocock promising zone), seamless Phase 2/3 with treatment-arm selection, population enrichment, and response-adaptive randomisation. Covers FDA 2019 Final Adaptive Designs Guidance, FDA 2022 Master Protocols, and ICH E20 Step 2b/3 draft (June 2025, NOT final). Use when planning interim analyses, sample-size re-estimation, or master/platform-trial designs.
Reference examples tested with: R rpact 4.2+ (Wassmer/Brannath), gsDesign 3.6+ and gsDesign2 1.1+ (Anderson/Merck), adaptr, simtrial. Commercial: East/EastHorizon (Cytel), ADDPLAN (ICON), FACTS (Berry Consultants).
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name

If code throws an error, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Design an adaptive trial" -> Pre-specify a design with one or more interim adaptations (early stopping, sample-size re-estimation, treatment selection, population enrichment, randomisation ratio changes) that strongly controls Type-I error at the trial-wide level via combination tests or the Conditional Rejection Probability principle.
FDA 2019 Final Adaptive Designs Guidance (Federal Register 2019-25986, Dec 2 2019) finalised the 2010 and 2018 drafts. Recognises 5 design types: group-sequential, blinded SSR, unblinded SSR, adaptive enrichment, adaptive randomisation.
FDA 2022 Final Master Protocols Guidance (March 2022, NOT 2018 — common citation error): basket (one drug, many diseases), umbrella (multiple drugs, one disease), platform (perpetual, drugs enter/exit).
ICH E20 Adaptive Clinical Trials: Step 2b draft June 25 2025; Step 3 public consultation (EU deadline Nov 30 2025; FDA Federal Register Sept 30 2025); Step 4 final targeted late 2026. As of May 2026, ICH E20 is NOT final. The EFPIA/PhRMA position paper preceded the formal ICH work; Berry Consultants public comment letter is one of the more important submissions.
FDA CDER Bayesian Methodology Draft (Jan 2026) (FDA-2025-D-3217): first-ever drug-side Bayesian guidance; permits Bayesian primary inference in pivotals with simulation-based Type-I error calibration.
Project Optimus (FDA OCE, 2021-2024): rewrites Phase I/II oncology by requiring randomised dose comparison before registration, replacing MTD-and-go. Made BOIN, mTPI-2, and multi-arm dose-finding the default.
| Design type | Adaptation | Type-I preservation | Software | Strength | Fails when |
|-------------|-----------|---------------------|----------|----------|------------|
| Group-sequential (O'Brien-Fleming) | Early stopping for efficacy/futility | Boundary calculation; very conservative early, near-nominal at end | rpact, gsDesign | FDA's preferred adaptive design | More complex SAP; IDMC firewall essential |
| Group-sequential (Pocock) | Early stopping | Constant nominal alpha at each look | rpact, gsDesign | Easy early stopping | Large penalty at final analysis |
| Wang-Tsiatis power family | Early stopping | Parameterised by Δ | rpact | Tunable conservatism | Δ choice matters |
| Lan-DeMets spending function | Early stopping (flexible timing) | Alpha-spending function | rpact, gsDesign | Operational flexibility; number and timing of analyses need not be fixed in advance; FDA's de facto preferred framework | Spending function not pre-specified, or look timing chosen in response to interim results |
| Blinded SSR (Friede-Kieser 2006) | Re-estimate variance/event-rate; recompute n | No Type-I inflation; agency-uncontroversial | rpact | EMA/FDA endorsed | Variance estimate must be blinded |
| Unblinded SSR (Cui-Hung-Wang 1999) | Increase n based on interim effect estimate | Requires CHW weights for control; or Mehta-Pocock promising zone | rpact | Recovers power if interim promising | IDMC firewall must be perfect; Jennison-Turnbull 2015 critique |
| Mehta-Pocock promising zone (2011) | Increase n if conditional power in (0.3, 0.8) | Calibrated so Type-I inflation negligible (~0.001) | rpact | Operational simplicity | "Stealth alpha inflation" critique (Jennison 2015) |
| Bauer-Köhne 1994 combination | Combine stagewise p-values via Fisher product | Pre-specified combination function; the design modification itself may be data-driven | rpact | Most flexible; theoretical foundation | Power loss vs designed group-sequential |
| Müller-Schäfer 2001 CRP principle | Preserve null conditional rejection probability | Any adaptation at any time | rpact | Modern theoretical bedrock | Implementation complexity |
| Adaptive enrichment | Drop sub-populations failing futility | Closed-test stage-wise | rpact, adaptr | Recovers power on responders | Selection bias on enriched population |
| Response-adaptive randomisation | Update allocation probabilities | Stratification + time-trend covariates required | adaptr, FACTS | Patient-welfare; learn-and-confirm | Drift bias, estimator bias; controversial (Hey-Kimmelman 2015 ethics) |
| Bayesian platform (I-SPY 2 style) | RAR + biomarker stratification + graduation criterion | Frequentist OCs via simulation | FACTS, custom Stan/JAGS | Modern oncology adaptive | Operational complexity; requires IDMC sophistication |
Postdoc reading list:
| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| Confirmatory trial wanting interim early stopping | Group-sequential with O'Brien-Fleming boundaries via gsDesign | FDA-preferred; near-nominal final alpha |
| Group-sequential with flexible look timing | Lan-DeMets spending function | Operational flexibility; FDA de facto preferred |
| Phase 3 with uncertain nuisance parameter (variance, event rate) | Blinded SSR (Friede-Kieser) | No Type-I inflation; agency-uncontroversial |
| Phase 3 wanting to increase n if interim shows promise | Mehta-Pocock promising zone with CHW weights | Recovers power; calibrated Type-I |
| Seamless Phase 2/3 with arm selection | Bauer-Köhne combination test + closed testing | Most flexible; cite Müller-Schäfer CRP |
| Adaptive enrichment (drop subpopulation) | Adaptive enrichment with closed-test stage-wise | Recovers power on responders |
| Multi-arm oncology platform | Bayesian platform with RAR (I-SPY 2 model) | Patient-welfare argument strong for multi-arm |
| 2-arm phase 3 oncology with potential RAR | Avoid RAR; group-sequential preferred | Hey-Kimmelman 2015 ethics + drift bias |
| Continuous endpoint, treatment discontinuation, follow-up data available | Hybrid: J2R imputation for treatment-discontinuation ICEs, MMRM-MAR for other missingness | Aprocitentan PRECISION precedent (2024); FDA de facto standard 2024-2025 for treatment-policy estimands |
| Phase 1 dose-finding | BOIN (FDA Fit-for-Purpose qualified 2021) | Transparent, tabulated decisions; no bedside Bayesian software |
| Phase 1b/2 dose-optimisation (Project Optimus) | Multi-arm BOIN-12 or multi-dose randomised | FDA Aug 2024 final dose-optimisation guidance |
| Basket trial (one drug, multiple diseases) | EXNEX or robust MAP via RBesT | Borrows across baskets while permitting one to detach |
| Umbrella trial (one disease, multiple drugs) | Bayesian platform with shared control | FDA Master Protocols 2022 |
| Pediatric extrapolation borrowing from adults | Power prior with discount γ in 0.3-0.6 | FDA Bayesian Jan 2026 draft endorses |
```r
library(gsDesign)
# OBF boundaries: 3 interim looks at 25%, 50%, 75% information plus the final analysis at 100%
design <- gsDesign(
  k = 4,            # total analyses including final
  test.type = 1,    # 1-sided efficacy
  alpha = 0.025,
  beta = 0.10,      # power = 0.90
  sfu = sfLDOF,     # Lan-DeMets approximation of OBF
  timing = c(0.25, 0.50, 0.75, 1.0)
)
print(design)
plot(design)
```
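OBF is very conservative at early looks (with Lan-DeMets OBF spending, the nominal one-sided alpha at 25% information is roughly 0.00001, a z-boundary near 4.3) and near-nominal at the final analysis (about 0.022 of 0.025 for this four-look design). Preferred by FDA because the final-analysis penalty is small.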
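The early-look conservatism follows from the Lan-DeMets O'Brien-Fleming-type spending function, α(t) = 2 − 2Φ(z_{1−α/2} / √t) for one-sided α. A minimal base-R sketch of the cumulative alpha spent at the four planned information fractions; since nothing is spent before the first look, the first nominal boundary is just a normal quantile. Later boundaries need the recursive multivariate-normal integration that gsDesign performs, so this sketch does not attempt them.

```r
# Lan-DeMets O'Brien-Fleming-type spending function (one-sided alpha)
ld_obf_spend <- function(t, alpha = 0.025) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

timing <- c(0.25, 0.50, 0.75, 1.00)
cum_alpha <- ld_obf_spend(timing)     # cumulative alpha spent by each look
incr_alpha <- diff(c(0, cum_alpha))   # alpha spent at each individual look

# No alpha has been spent before look 1, so its boundary is a plain normal quantile
z_first <- qnorm(1 - incr_alpha[1])

round(cum_alpha, 6)
z_first
```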
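Constant nominal alpha at each look. Early stopping is easy, but the final-analysis penalty is large: with k = 4 equally spaced looks the one-sided nominal alpha at every look, including the final, is about 0.009 of 0.025 (about 0.018 two-sided of 0.05). Rarely used in confirmatory trials.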
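The Pocock constant can be checked by simulation. A Monte Carlo sketch, assuming k = 4 equally spaced looks and a one-sided level of 0.025: it estimates the single critical value c for which the probability, under H0, that any of the four Z-statistics exceeds c equals alpha. The seed and simulation size are arbitrary, so the estimate carries a Monte Carlo error of a few thousandths.

```r
set.seed(2024)
n_sim <- 200000
k <- 4
alpha <- 0.025

# Independent standard-normal increments, one per stage (equal information per stage)
incr <- matrix(rnorm(n_sim * k), nrow = n_sim, ncol = k)

# Cumulative sums give the score process; dividing by sqrt(j) standardises Z at look j
score <- incr
for (j in 2:k) score[, j] <- score[, j - 1] + incr[, j]
z <- sweep(score, 2, sqrt(seq_len(k)), "/")

# Largest Z over the k looks in each simulated trial
max_z <- Reduce(pmax, lapply(seq_len(k), function(j) z[, j]))

# Pocock constant: the (1 - alpha) quantile of the maximum
pocock_c <- unname(quantile(max_z, 1 - alpha))
nominal_alpha <- 1 - pnorm(pocock_c)
c(critical_value = pocock_c, nominal_alpha = nominal_alpha)
```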
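```r
library(gsDesign)
# Lan-DeMets OBF-like spending function (sfLDOF): the spending function,
# not the exact look schedule, is what gets pre-specified
design_flex <- gsDesign(
  k = 3,
  test.type = 1,  # 1-sided efficacy bound, as in the OBF example
  sfu = sfLDOF,   # OBF-like spending
  alpha = 0.025,
  beta = 0.10
)
# Actual analyses can occur at different information fractions: recompute the
# bounds from the same spending function at the observed timing
# (illustrative: looks at 40% and 70% instead of the planned 33% and 67%)
design_actual <- gsDesign(
  k = 3, test.type = 1, sfu = sfLDOF, alpha = 0.025, beta = 0.10,
  n.I = design_flex$n.I[3] * c(0.40, 0.70, 1.00),
  maxn.IPlan = design_flex$n.I[3]
)
design_actual$upper$bound
```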
The flexibility: sponsor can perform analyses at different information fractions than originally planned. FDA's de facto preferred framework.
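```r
# Time-to-event group-sequential
library(gsDesign)
n_gs <- gsSurv(
  k = 3,
  test.type = 2,   # 2-sided symmetric bounds; alpha is the one-sided level (0.05 two-sided)
  alpha = 0.025,
  beta = 0.10,
  sfu = sfLDOF,
  lambdaC = 0.04,  # control hazard per month
  hr = 0.70,       # treatment HR
  eta = 0.005,     # dropout hazard per month
  R = 12,          # accrual duration in months, so that T = R + minfup
  T = 24,          # total study duration in months
  minfup = 12      # minimum follow-up in months
)
print(n_gs)
```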
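As a back-of-envelope cross-check on the event count gsSurv reports, Schoenfeld's approximation gives the events needed by a fixed (non-sequential) design under proportional hazards: D = (z_{1−α} + z_{1−β})² / (p(1 − p) (log HR)²), with p the fraction randomised to treatment. This sketch assumes 1:1 allocation and the same one-sided α = 0.025 and power 0.90 as the gsSurv call; the group-sequential design should need somewhat more events than this fixed-design figure.

```r
# Schoenfeld approximation: events for a fixed design, proportional hazards assumed
schoenfeld_events <- function(hr, alpha = 0.025, beta = 0.10, ratio = 1) {
  p <- ratio / (1 + ratio)                      # proportion randomised to treatment
  z_sum <- qnorm(1 - alpha) + qnorm(1 - beta)
  ceiling(z_sum^2 / (p * (1 - p) * log(hr)^2))  # round up to whole events
}

d_fixed <- schoenfeld_events(hr = 0.70)
d_fixed
```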
Re-estimate nuisance parameter (variance σ² for continuous, control event rate p_0 for binary, overall event rate for survival) from blinded interim data. No Type-I error inflation when test statistic ignores the SSR.
```r
library(rpact)
# Blinded SSR for continuous outcome
design_blinded_ssr <- getDesignGroupSequential(
  kMax = 2,
  alpha = 0.025,
  beta = 0.20,
  sided = 1,
  informationRates = c(0.5, 1)
)
# At interim, re-estimate variance and recompute n
# (manual implementation; rpact has built-in support via getDesignInverseNormal for unblinded)
```
EMA Reflection Paper 2007 and FDA 2019 explicitly endorse blinded SSR. Uncontroversial.
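A minimal sketch of the blinded re-estimation step for a two-arm continuous outcome, assuming 1:1 allocation, a normal-approximation sample-size formula, and the simple one-sample ("lumped") variance of the pooled interim data as the blinded estimator. The interim data are simulated and purely hypothetical.

```r
# Normal-approximation sample size per group for one-sided alpha and target power
n_per_group <- function(sigma2, delta, alpha = 0.025, power = 0.80) {
  z_sum <- qnorm(1 - alpha) + qnorm(power)
  ceiling(2 * z_sum^2 * sigma2 / delta^2)
}

# Blinded re-estimation: uses only pooled interim outcomes (treatment labels stay masked).
# The lumped variance overstates sigma^2 by about d^2 / 4 when the true
# between-arm difference is d, so it errs towards a larger n.
blinded_ssr <- function(y_pooled, delta, alpha = 0.025, power = 0.80) {
  s2_lumped <- var(y_pooled)
  n_per_group(s2_lumped, delta, alpha, power)
}

# Planning assumptions: sigma = 12, delta = 5
n_planned <- n_per_group(sigma2 = 12^2, delta = 5)

# Hypothetical blinded interim data: 60 pooled outcomes, SD larger than planned
set.seed(11)
y_interim <- c(rnorm(30, mean = 0, sd = 15), rnorm(30, mean = 5, sd = 15))
n_reestimated <- blinded_ssr(y_interim, delta = 5)

c(planned = n_planned, reestimated = n_reestimated)
```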
Interim effect estimate triggers sample-size change. Type-I inflation if naive: Cui-Hung-Wang showed 8% Type-I vs 2.5% target.
The Cui-Hung-Wang weighted test uses pre-specified weights from the original design:
Z_weighted = w_1 * Z_1 + w_2 * Z_2_residual
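where w_1 = sqrt(n_1 / (n_1 + n_2)) and w_2 = sqrt(n_2 / (n_1 + n_2)) are fixed by the originally planned stage sizes n_1, n_2 (so w_1² + w_2² = 1), and Z_2_residual is the test statistic computed only on data accrued after the interim. Under H0, Z_2_residual is standard normal and independent of Z_1 whatever stage-2 size is actually used, so Z_weighted is standard normal too: the pre-specified weights preserve alpha even if the actual n at stage 2 differs from the plan.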
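To see why the weights must come from the original plan, this sketch simulates z-statistics under H0 with a hypothetical data-dependent rule (quadruple the stage-2 size whenever the interim z lies between 0 and 1) and compares a naive pooled test with the CHW weighted test. The stage sizes, the rule, and the seed are illustrative assumptions, not a recommended design.

```r
set.seed(1999)
n_sim <- 1e6
alpha <- 0.025
crit <- qnorm(1 - alpha)

n1 <- 100        # planned stage-1 size
n2_plan <- 100   # planned stage-2 size

z1 <- rnorm(n_sim)  # stage-1 statistic under H0
z2 <- rnorm(n_sim)  # post-interim statistic; N(0, 1) under H0 whatever n2 turns out to be

# Hypothetical SSR rule: quadruple stage 2 when the interim is positive but weak
n2_actual <- ifelse(z1 > 0 & z1 < 1, 4 * n2_plan, n2_plan)

# Naive test: pool all data as if n2_actual had been planned from the start
z_naive <- (sqrt(n1) * z1 + sqrt(n2_actual) * z2) / sqrt(n1 + n2_actual)

# CHW test: weights fixed by the ORIGINAL plan, so w1^2 + w2^2 = 1 regardless of n2_actual
w1 <- sqrt(n1 / (n1 + n2_plan))
w2 <- sqrt(n2_plan / (n1 + n2_plan))
z_chw <- w1 * z1 + w2 * z2

type1 <- c(naive = mean(z_naive > crit), chw = mean(z_chw > crit))
type1
```

With these settings the naive rejection rate should sit noticeably above 0.025, while the CHW rate matches 0.025 up to Monte Carlo error.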
```r
library(rpact)
design_unblinded_ssr <- getDesignInverseNormal(
  kMax = 2,
  alpha = 0.025,
  beta = 0.20,
  sided = 1,
  informationRates = c(0.5, 1),
  typeOfDesign = 'WT', # Wang-Tsiatis power family
  deltaWT = 0.25
)
# Use inverse normal combination for adaptive SSR
analysis_result <- getAnalysisResults(
  design_unblinded_ssr,
  dataInput = getDataset(...)  # stage-wise n, means and stDevs per group
)
```
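At interim, compute conditional power (CP) given the observed effect. The rpact call below only sizes the initial two-stage design from which the re-estimation starts; the interim CP and any promising-zone increase are separate steps.

```r
library(rpact)
# Initial (planned) sample size of the two-stage design defined above.
# CP at the interim and the promising-zone increase are computed separately
# (e.g. getConditionalPower(); getSimulationMeans() for operating characteristics)
n_initial <- getSampleSizeMeans(
  design_unblinded_ssr,
  alternative = 5, # detect mean diff of 5
  stDev = 12,
  groups = 2
)
```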
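Conditional power under the "current trend" (the interim estimate taken as the true effect) has a closed form in B-value terms: with interim statistic Z_1 at information fraction t, CP = 1 − Φ((z_{1−α} − Z_1/√t) / √(1 − t)). This sketch computes it and classifies the result with the (0.3, 0.8) promising-zone thresholds quoted above; the interim z-values are hypothetical, and the step that then chooses the increased n (up to n_max) is not shown.

```r
# Conditional power under the current trend, one-sided level alpha
cond_power_trend <- function(z1, t, alpha = 0.025) {
  z_alpha <- qnorm(1 - alpha)
  1 - pnorm((z_alpha - z1 / sqrt(t)) / sqrt(1 - t))
}

# Zone classification with the CP thresholds used in this section
pz_zone <- function(cp, lower = 0.30, upper = 0.80) {
  ifelse(cp < lower, "unfavourable",
         ifelse(cp < upper, "promising", "favourable"))
}

z_interim <- c(0.0, 1.0, 1.5, 2.2)  # hypothetical interim z-values at t = 0.5
cp <- cond_power_trend(z_interim, t = 0.5)
data.frame(z_interim, cp = round(cp, 3), zone = pz_zone(cp))
```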
The mathematical sleight: promising zone is constructed so unconditional Type-I error inflation is negligible (~0.001) even WITHOUT CHW weighting. Jennison-Turnbull 2015 critique: stealth alpha inflation in unpublished simulation assumptions; inefficient relative to CHW-weighted GSD. Mehta defends on operational grounds.
Hsiao et al 2020 Trials 21:1003 is the systematic review.
Bauer-Köhne 1994 Biometrics 50:1029: combine stagewise p-values via Fisher's product test. Permits design modifications post-interim while controlling Type-I error.
Müller-Schäfer 2001 Biometrics 57:886: Conditional Rejection Probability (CRP) principle — preserve the null conditional rejection probability at every adaptation, and unconditional Type-I is preserved. The theoretical bedrock of all post-2001 confirmatory adaptive designs.
Müller-Schäfer 2004 Stat Med 23:2497 extended to ANY design change at ANY time.
```r
library(rpact)
# rpact natively supports combination tests
design_comb <- getDesignFisher(
  kMax = 3,
  alpha = 0.025,
  sided = 1
)
# Or inverse normal combination
design_inv_norm <- getDesignInverseNormal(
  kMax = 3,
  alpha = 0.025,
  informationRates = c(0.33, 0.67, 1.0)
)
```
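Both combination rules are short enough to write out directly. A base-R sketch for two stages without early stopping, assuming independent stage-wise one-sided p-values: the Fisher product test rejects when p_1 p_2 ≤ c_α with c_α = exp(−χ²_{4,1−α} / 2), and the inverse normal test uses pre-specified weights with w_1² + w_2² = 1 (equal weights are an assumption here). The example p-values are hypothetical and show that the two rules can disagree.

```r
alpha <- 0.025

# Fisher product: -2 * log(p1 * p2) is chi-square with 4 df under H0
c_alpha <- exp(-qchisq(1 - alpha, df = 4) / 2)
fisher_reject <- function(p1, p2) p1 * p2 <= c_alpha

# Inverse normal: pre-specified weights with w1^2 + w2^2 = 1
inv_normal_reject <- function(p1, p2, w1 = sqrt(0.5)) {
  w2 <- sqrt(1 - w1^2)
  w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2) >= qnorm(1 - alpha)
}

c_alpha
# A null first stage rescued by a very strong second stage
c(fisher = fisher_reject(0.5, 0.005), inverse_normal = inv_normal_reject(0.5, 0.005))
```

Fisher's product rewards a single very small p-value more than the inverse normal rule does, which is one reason the choice must be fixed before the interim.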
Drop sub-populations failing futility; re-power on responders. Closed-test stage-wise to control familywise error across full and enriched populations.
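```r
# rpact: enrichment designs reuse a combination-test design
# (getDesignInverseNormal() or getDesignFisher()); operating characteristics
# are simulated with getSimulationEnrichmentMeans() (also ...Rates, ...Survival).
# Standard implementation requires explicit definition of full population (F)
# and enriched population (S)
```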
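A hypothetical two-stage sketch of the closed-testing logic, assuming an inverse normal combination with equal weights, a Simes test for the intersection hypothesis at stage 1, and a decision after stage 1 to continue in subgroup S only (so stage-2 data bear only on H_S and the intersection). Simes is one common intersection test; Bonferroni or Dunnett-type tests are alternatives. All p-values are invented for illustration.

```r
alpha <- 0.025
w1 <- sqrt(0.5); w2 <- sqrt(0.5)  # pre-specified inverse normal weights

inv_norm <- function(p1, p2) w1 * qnorm(1 - p1) + w2 * qnorm(1 - p2)

# Simes p-value for the intersection of two hypotheses
simes2 <- function(pa, pb) min(2 * min(pa, pb), max(pa, pb))

reject_H_S <- function(p1_F, p1_S, p2_S) {
  crit <- qnorm(1 - alpha)
  # Intersection H_FS: Simes at stage 1; only S continues, so its stage-2 p-value is p2_S
  z_FS <- inv_norm(simes2(p1_F, p1_S), p2_S)
  z_S  <- inv_norm(p1_S, p2_S)
  # Closed testing: H_S is rejected only if both H_FS and H_S are rejected
  (z_FS >= crit) && (z_S >= crit)
}

reject_H_S(p1_F = 0.20, p1_S = 0.03, p2_S = 0.01)
reject_H_S(p1_F = 0.50, p1_S = 0.40, p2_S = 0.30)
```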
Postdoc concern: selection bias on the enriched population — the observed treatment effect on the enriched subgroup is biased upward by selection. Bias-correction via simulation or hierarchical Bayesian.
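The size of this bias is easy to see by simulation. A minimal sketch, assuming two subgroups with the same true effect and independent normally distributed estimates: if the subgroup with the larger interim estimate is carried forward and its naive estimate reported, the expected upward bias is se/√π, because the mean of the larger of two standard normals is 1/√π. The effect size and standard error are hypothetical.

```r
set.seed(2010)
n_sim <- 1e5
true_effect <- 0.20  # hypothetical common true effect in both subgroups
se <- 0.10           # hypothetical standard error of each subgroup estimate

est_a <- rnorm(n_sim, true_effect, se)
est_b <- rnorm(n_sim, true_effect, se)
selected <- pmax(est_a, est_b)  # estimate of the subgroup carried forward

bias <- mean(selected) - true_effect
c(simulated_bias = bias, theoretical_bias = se / sqrt(pi))
```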
Hey & Kimmelman 2015 Clin Trials 12:102 "Are outcome-adaptive allocation trials ethical?" Argued RAR's purported ethical advantage (equipoise, sub-optimal exposure minimisation) fails in two-arm and early-phase settings because:
Counter-arguments:
Consensus position (2020s; ICH E20): RAR appropriate when (a) multi-arm (>=3 arms), (b) rare disease / limited pool, (c) strong PoC of differential biomarker response, (d) robust drift-bias adjustment and pre-specified analysis weights. Inappropriate for confirmatory 2-arm trials.
Robertson, Lee, López-Kolkovska, Villar 2023 Stat Sci 38:185 ("Response-adaptive randomization: from myths to practical considerations") is the canonical modern review settling the debate.
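One common form of Bayesian RAR sets allocation probabilities proportional to each arm's posterior probability of being best. A sketch for a three-arm binary-outcome trial with Beta(1, 1) priors and invented interim counts; it deliberately omits the refinements real designs use (tempering the probabilities, protecting the control allocation, burn-in), and it does nothing about the time-trend drift discussed above.

```r
set.seed(42)
responders <- c(control = 8, arm_a = 12, arm_b = 18)  # hypothetical interim data
patients   <- c(control = 30, arm_a = 30, arm_b = 30)

# Posterior draws of each arm's response rate under Beta(1, 1) priors
n_draws <- 1e5
draws <- sapply(seq_along(responders), function(i)
  rbeta(n_draws, 1 + responders[i], 1 + patients[i] - responders[i]))
colnames(draws) <- names(responders)

# Posterior probability that each arm has the highest response rate
p_best <- tabulate(max.col(draws), nbins = ncol(draws)) / n_draws
names(p_best) <- names(responders)

# Allocation for the next block: proportional to p_best (no tempering, no fixed control share)
alloc <- p_best / sum(p_best)
round(alloc, 3)
```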
I-SPY 2 (Barker-Sigman 2009 Clin Pharmacol Ther; Park-Liu 2016 NEJM 375:11): neoadjuvant breast cancer; 10 biomarker-defined subtypes × multiple arms; Bayesian RAR; graduation criterion = posterior predictive probability of success in 300-patient Phase 3 ≥ 85%. Berry Consultants designed the engine. Multiple drugs graduated (neratinib, veliparib, pembrolizumab).
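The graduation criterion is a Bayesian predictive probability. A heavily simplified sketch, assuming a binary endpoint, independent Beta(1, 1) posteriors per arm, a future 300-patient 1:1 Phase 3, and a one-sided two-proportion z-test at 0.025 as the definition of success; the real I-SPY 2 model (subtype-specific, longitudinal) is considerably richer, and the Phase 2 counts below are hypothetical.

```r
pp_success <- function(x_trt, n_trt, x_ctl, n_ctl, n_ph3 = 300,
                       alpha = 0.025, n_sims = 20000) {
  m <- n_ph3 / 2
  # 1) Draw response rates from the Phase 2 posteriors
  p_trt <- rbeta(n_sims, 1 + x_trt, 1 + n_trt - x_trt)
  p_ctl <- rbeta(n_sims, 1 + x_ctl, 1 + n_ctl - x_ctl)
  # 2) Simulate the future Phase 3 outcomes
  y_trt <- rbinom(n_sims, m, p_trt)
  y_ctl <- rbinom(n_sims, m, p_ctl)
  # 3) Pooled two-proportion z-test in each simulated Phase 3
  p_pool <- (y_trt + y_ctl) / (2 * m)
  se <- sqrt(p_pool * (1 - p_pool) * 2 / m)
  z <- (y_trt - y_ctl) / m / se
  z[!is.finite(z)] <- 0  # undefined when both arms are all or no responders; count as failure
  mean(z > qnorm(1 - alpha))
}

set.seed(2016)
pp <- pp_success(x_trt = 28, n_trt = 60, x_ctl = 8, n_ctl = 40)
c(pp = pp, graduates = pp >= 0.85)
```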
GBM AGILE (Alexander 2018; published readouts beginning 2024): glioblastoma; response-adaptive Bayesian; first global registrational platform in neuro-oncology. Regorafenib readout 2025 JCO JCO-25-01137.
REMAP-CAP (Angus 2020 JAMA): severe pneumonia, repurposed for COVID-19 in 2020; Bayesian factorial multi-domain design — multiple intervention domains tested simultaneously and combinatorially. Generated corticosteroid signal in COVID independently of RECOVERY.
| Design | Citation | Idea | Where it wins |
|--------|----------|------|---------------|
| CRM | O'Quigley-Pepe-Fisher 1990 | Single-parameter logistic/power model; updates posterior MTD probability after each cohort | Statistically efficient; skeleton calibration needed |
| EWOC | Babb-Rogatko-Zacks 1998 | CRM-like with explicit overdose-control constraint (P(dose > MTD) <= 0.25) | Safer than CRM in small trials |
| mTPI | Ji et al 2010 Clin Trials 7:653 | Beta-binomial; UPM decision rule on under/proper/over-dosing intervals | Pre-tabulated decisions; documented over-shoot bias |
| mTPI-2 / Keyboard | Guo-Wang-Chen-Ji 2017 | Fixes mTPI Ockham bias by equal-width intervals | Default mTPI replacement |
| BOIN | Liu-Yuan 2015 J R Stat Soc C 64:507 | Pre-tabulated escalation interval bounds optimised to minimise incorrect-decision probability | FDA Fit-for-Purpose qualified Dec 2021; near-CRM with no bedside software |
Why FDA prefers BOIN operationally: qualified as Fit-for-Purpose under FDA Drug Development Tools program (review document FDA-2020-X-XXXX, posted 2021). Investigator uses pre-printed escalation table — no real-time Bayesian software at the bedside.
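BOIN's escalation and de-escalation boundaries are closed-form, which is why its decision table can be printed in advance. A sketch of the Liu-Yuan 2015 boundaries, assuming the commonly used defaults φ_1 = 0.6φ and φ_2 = 1.4φ; it omits the dose-elimination rule, so the BOIN R package listed below remains the reference implementation.

```r
# BOIN boundaries for target DLT rate phi
boin_bounds <- function(phi, phi1 = 0.6 * phi, phi2 = 1.4 * phi) {
  lambda_e <- log((1 - phi1) / (1 - phi)) /
    log(phi * (1 - phi1) / (phi1 * (1 - phi)))
  lambda_d <- log((1 - phi) / (1 - phi2)) /
    log(phi2 * (1 - phi) / (phi * (1 - phi2)))
  c(lambda_e = lambda_e, lambda_d = lambda_d)
}

# Decision at the current dose given y DLTs among n treated patients
boin_decide <- function(y, n, bounds) {
  p_hat <- y / n
  if (p_hat <= bounds[["lambda_e"]]) {
    "escalate"
  } else if (p_hat >= bounds[["lambda_d"]]) {
    "de-escalate"
  } else {
    "stay"
  }
}

b <- boin_bounds(0.30)
round(b, 3)

# Printable decisions for 0-3 DLTs in a cohort of 3 patients
decisions <- sapply(0:3, function(y) boin_decide(y, 3, b))
decisions
```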
R packages: BOIN, dfcrm (Cheung — author of CRM textbook), trialr (Brock — includes EffTox), escalation (Brock — unified framework).
| Pattern | Likely cause | Action |
|---------|--------------|--------|
| Blinded SSR n vs unblinded SSR n differ substantially | Unblinded SSR uses interim effect estimate; blinded uses nuisance parameter only | Blinded is Type-I-clean; unblinded requires CHW weighting; pre-specify the approach in SAP |
| Group-sequential rejects at interim; Cui-Hung-Wang weighted final test does not | Naive interim rejection used original test statistic; CHW weights downweight late data | Pre-specify boundary and weights; do NOT switch tests mid-stream |
| Mehta-Pocock promising-zone vs CHW-weighted GSD give different n increases | Promising zone calibrated for Type-I (~0.001 inflation); CHW more efficient under known effect | Jennison-Turnbull 2015 critique: promising zone "stealth alpha"; pre-specify with simulation OCs |
| Adaptive enrichment selects subgroup at interim; replication shows smaller effect | Selection bias on enriched population (Sun 2010 winner's curse) | Bias-correction via conditional MLE or hierarchical Bayesian; cite Robertson 2023 |
| RAR posterior allocation favours active in 2-arm trial; randomisation drift bias suspected | Time trends confounded with allocation changes | Pre-specify time-trend covariates in analysis; use proper analysis weights; cite Robertson 2023 RAR consensus (RAR INAPPROPRIATE for 2-arm confirmatory) |
| BOIN vs CRM choose different MTD on same data | CRM uses model; BOIN uses tabulated boundaries; differ when skeleton mis-calibrated | BOIN Fit-for-Purpose qualified (Dec 2021); CRM more efficient under correct skeleton; report OCs over both |
| I-SPY 2 graduation criterion met but Phase 3 replication fails | Selection bias on graduated arm; PP threshold not bias-corrected | Apply conditional MLE; cite Robertson 2023; report both raw and bias-corrected estimates |
| Müller-Schäfer CRP preserved but ad hoc rule appears Type-I-inflated in simulation | Implementation deviation from formal CRP | Verify CRP equation precisely; report OCs via simulation; cite Müller-Schäfer 2001 |
| Threshold | Source | Rationale |
|-----------|--------|-----------|
| FDA Fit-for-Purpose BOIN qualification (Dec 2021) | FDA Drug Development Tools program | First dose-finding design with formal FDA endorsement |
| Mehta-Pocock promising zone CP 30-80% | Mehta-Pocock 2011 | Mathematical calibration for Type-I preservation |
| RAR appropriate >= 3 arms | Robertson 2023 consensus | Multi-arm patient-welfare argument |
| OBF (Lan-DeMets sfLDOF) nominal alpha ~0.022-0.024 at final vs 0.025 overall, for 2-4 equally spaced looks | gsDesign | Small final penalty preferred by FDA |
| Schoenfeld under non-PH under-estimates 20-50% | Lin 2020 NPH WG | Use Lakatos or simulation |
| I-SPY 2 graduation: PP success in Phase 3 >= 85% | Barker 2009 | Bayesian platform standard |
| Power prior discount γ 0.3-0.6 for pediatric extrapolation | FDA Bayesian Jan 2026 draft | Partial borrowing default |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Unblinded SSR with naive sample increase | No CHW weighting | Pre-specify CHW weights; cite Cui-Hung-Wang 1999 |
| Mehta-Pocock without simulation OCs | Stealth alpha inflation | Report simulation OCs; cite Jennison 2015 |
| RAR in 2-arm confirmatory | Misapplication | Group-sequential instead; cite Robertson 2023 |
| Schoenfeld for immuno-oncology | PH assumption violated | Lakatos or simulation; cite Lin 2020 |
| Adaptive enrichment effect reported uncorrected | Selection bias | Bias-correction; cite Sun 2010 |
| CRM with default skeleton | Mis-calibration | Calibrate via Lee-Cheung 2009 or switch to BOIN |
| ICH E20 cited as "finalised April 2024" | Confusion with EFPIA position paper | ICH E20 is Step 2b/3 draft (June 2025); not final |
| FDA Master Protocols "2018" | 2018 was draft | March 2022 was the final |
| Bauer-Köhne combination test as "old-fashioned" | Misunderstanding | Foundational; cited in modern combination-test implementations |
| Stop-for-efficacy at first interim with OBF | Lan-DeMets OBF nominal alpha ~0.00001 (z ≈ 4.3) at 25% info | Trial must show very strong evidence to stop early; expected |
| Pushback | Response |
|----------|----------|
| "How is Type-I error controlled?" | Closed testing via Müller-Schäfer CRP principle; specific implementation is inverse normal combination test in rpact |
| "Why these boundaries?" | OBF via Lan-DeMets sfLDOF spending function; preserves final-analysis power; pre-specified in SAP |
| "Pre-specification of SSR rule?" | Promising zone CP in (0.3, 0.8) triggers increase to n_max via CHW-weighted statistic; pre-specified n_max in SAP |
| "IDMC firewall?" | IDMC receives interim effect estimate; sponsor receives only "increase / no increase" decision; SOP documented; pre-specified |
| "RAR ethics?" | Multi-arm (4 arms) setting; Berry 2015 consensus that patient-welfare argument valid; drift-bias adjustment in primary analysis |
| "Promising zone vs CHW-weighted GSD?" | Operational simplicity preferred; OCs from simulation confirm one-sided Type-I at or below 2.5%; supportive Cui-Hung-Wang analysis |
| "Adaptive enrichment bias?" | Bias-correction via simulation; conditional MLE for enriched-population effect; cite Sun 2010 |
| "Phase 1 BOIN vs CRM?" | BOIN Fit-for-Purpose qualified by FDA Dec 2021; tabulated decisions; no bedside Bayesian software |