clinical-biostatistics/bayesian-trials/SKILL.md
Designs Bayesian clinical trials including Phase I dose-finding (BOIN, CRM, EWOC, mTPI-2), meta-analytic-predictive (MAP) priors with robust mixtures for external data borrowing, EXNEX for basket trials, hierarchical models for safety AE (Berry-Berry), Bayesian platform trials (I-SPY 2, GBM AGILE, REMAP-CAP), and posterior probability stopping rules. Covers FDA Bayesian Devices Guidance (2010), FDA Bayesian Methodology in Drugs Draft (January 2026), BOIN Fit-for-Purpose qualification (December 2021), and Project Optimus dose-optimisation. Use when designing dose-finding studies, platform trials, or sensitivity analyses with informative priors.
Reference examples tested with: R RBesT 1.7+ (Roche), OncoBayes2 0.8+ (Novartis), BOIN 2.7+, dfcrm 0.2-2+, escalation 0.1+, trialr 0.1.6+, bayesDP, psborrow2 (FDA-supported), rstan / cmdstanr, brms. Legacy: JAGS, WinBUGS.
Before using code patterns, verify installed versions match. If versions differ:
- `packageVersion('<pkg>')` then `?function_name`

If code throws an error, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Design a Bayesian clinical trial" -> Specify a prior, likelihood, and decision rule with frequentist operating characteristics demonstrated via simulation; for dose-finding use FDA-endorsed BOIN; for borrowing use robust MAP priors; for adaptive platforms use posterior probability of efficacy stopping with simulation-calibrated thresholds.
FDA 2010 CDRH Bayesian Devices Guidance (Feb 5 2010): the only Bayesian-specific FDA guidance until January 2026. Why devices were ahead: CDRH's PMA pathway permits one pivotal trial and accepts borrowing from prior/OUS data more readily than CDER. Example: Edwards SAPIEN (PARTNER B, PMA P100041, Nov 2011) approved using Bayesian propensity-matched comparison to registry of standard-of-care patients.
FDA January 2026 CDER Bayesian Methodology Draft (FDA-2025-D-3217; comment period closed March 13 2026): first-ever drug-side Bayesian guidance. Explicit that Bayesian primary inference in pivotals is acceptable provided:
Project Optimus (FDA OCE, launched 2021; final dose-optimisation guidance Aug 2024): rewrites Phase I/II oncology by requiring randomised dose comparison before registration. Has made multi-arm randomised dose-finding (BOIN-12, gBOIN-ET) much more important than classic MTD-finding.
FDA BOIN Fit-for-Purpose qualification (December 2021): first formal FDA endorsement of a specific dose-finding design under the Drug Development Tools program.
ICH E20 (Step 2b/3 draft June 2025; NOT final) treats Bayesian as a legitimate analytic framework but requires demonstration of acceptable frequentist operating characteristics (Type-I, power) over a pre-specified parameter space.
| Method | Use case | Software | Strength | Fails when |
|--------|----------|----------|----------|------------|
| BOIN | Phase I MTD | R BOIN (Yuan) | FDA Fit-for-Purpose 2021; pre-tabulated decisions; no bedside Bayesian software | Statistically less efficient than CRM under correct skeleton |
| mTPI-2 / Keyboard | Phase I MTD | R escalation; R Keyboard | Default replacement for mTPI; fixes Ockham bias | Tabulated; transparency |
| CRM | Phase I MTD | R dfcrm, trialr | Most efficient under correct skeleton | Skeleton mis-specification biases MTD |
| EWOC | Phase I MTD | R ewoc, dfcrm | Explicit overdose-control constraint (P(dose>MTD) <= 0.25) | More conservative than CRM in small trials |
| BOIN-12 / gBOIN-ET | Phase 1b dose-optimisation (Project Optimus) | R BOIN extensions | Multi-arm randomised dose comparison | Requires explicit efficacy + toxicity scoring |
| MAP prior | Borrowing from historical control arms | R RBesT::gMAP | Industry-standard borrowing | Sample-size of MAP prior must be calibrated (Schmidli 2014) |
| Robust MAP | Borrowing with prior-data conflict protection | R RBesT::robustify | Adds vague component (weight 0.1-0.3) to detach if conflict | Mixture weight choice affects borrowing |
| EXNEX | Basket trial across rare-disease strata | R bhmbasket; OncoBayes2 | Avoids HM catastrophic borrowing; mixture 0.5/0.5 default (Neuenschwander 2016) | Default weights may over-borrow |
| Dixon-Simon shrinkage | Subgroup analysis | Custom Stan/brms | Honest about no qualitative interaction prior | Prior on tau drives results |
| Berry-Berry 3-level hierarchical | AE multiplicity (AE within PT within SOC) | R c212; JMP Clinical | Tames safety multiplicity | Spike-and-slab tuning matters |
| Posterior probability stopping | Adaptive sequential | Custom; FACTS commercial | Bayesian likelihood-principle compatible | Threshold calibration via simulation |
| Predictive probability of success | End-of-Phase-2 go/no-go | Custom Stan | Decision-theoretic; integrates over posterior | Requires Phase 3 design specified |
| Spiegelhalter skeptical/enthusiastic prior | Sensitivity for regulatory pivotals | Custom | Frames regulator-vs-sponsor evidence | Prior elicitation effort |
| Power prior | Pediatric extrapolation borrowing from adults | R bayesDP, psborrow2 | Partial borrowing with discount gamma | gamma choice (0.3-0.6 is a working convention; the Jan 2026 FDA draft does not prescribe a specific range) |
Postdoc reading list:
| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| Phase 1 oncology, single-agent MTD | BOIN with target DLT 30%; cohort size 3 | FDA Fit-for-Purpose 2021; tabulated escalation |
| Phase 1 oncology, combination (2 agents) | BLRM with EXNEX in OncoBayes2 | Multi-dimensional dose; industry standard at Novartis/Roche |
| Phase 1b/2 dose-optimisation (Project Optimus) | BOIN-12 or gBOIN-ET; randomised 2-dose comparison | Aug 2024 FDA dose-optimisation guidance |
| Phase 3 with historical control arms available | Robust MAP via RBesT; gMAP() + robustify() | Industry standard borrowing with prior-data conflict protection |
| Basket trial across rare-disease strata | EXNEX (0.5 EX / 0.5 NEX mixture) via OncoBayes2 | Avoids HM catastrophic borrowing |
| Pediatric extrapolation from adult data | Power prior with discount gamma 0.3-0.6 | working convention; the FDA Bayesian Jan 2026 draft does not prescribe a specific gamma range -- check the draft for the current language before quoting |
| Phase 3 trial with single arm + RWE comparator | Propensity-score-integrated power prior via psborrow2 | FDA-supported package for external controls |
| Adaptive trial wanting posterior-probability stopping | Custom Stan model + simulation-calibrated threshold | Bayesian likelihood-principle compatible; no penalty for repeated looks |
| End-of-Phase-2 go/no-go | Predictive probability of success in Phase 3 | Integrates posterior over Phase 3 design |
| Hypothesis-generating safety AE analysis (>100 PTs) | Berry-Berry 3-level hierarchical (AE within PT within SOC) | Tames multiplicity; spike-and-slab on log OR |
| Subgroup analysis post-signal | Bayesian shrinkage (Dixon-Simon, RBesT) | Hemmings-Koch 2019: shrinkage for replication planning, NOT signal generation |
| Regulatory pivotal sensitivity | Spiegelhalter skeptical-prior framework | Frames "evidence for regulators" vs "evidence for sponsor" |
```r
library(BOIN)

# Generate escalation table for protocol
boundary_table <- get.boundary(
  target = 0.30,        # target DLT rate
  ncohort = 10,         # 10 cohorts -> max 30 patients with size 3
  cohortsize = 3,
  n.earlystop = 12,     # stop early once 12 patients have been treated at the current dose
  p.saf = 0.6 * 0.30,   # highest DLT rate deemed subtherapeutic (sets the escalation boundary)
  p.tox = 1.4 * 0.30    # lowest DLT rate deemed overly toxic (sets the de-escalation boundary)
)
print(boundary_table)
# Pre-printed at investigator desk; no bedside Bayesian software

# Operating characteristics simulation
oc_boin <- get.oc(
  target = 0.30,
  p.true = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.55),  # true DLT per dose
  ncohort = 10,
  cohortsize = 3,
  n.earlystop = 12,     # same early-stopping rule as the boundary table
  ntrial = 1000
)
print(oc_boin)
# Reports: MTD selection accuracy, overdose risk, average sample size
```
BOIN's transparency-over-modelling philosophy: unlike CRM, BOIN does NOT use information from intermediate dose levels in a model-based way. The Jin-Yuan vs Neuenschwander/Mozgunov debate (Stat Med, Pharm Stat, since ~2018): BLRM/CRM are statistically more efficient under correct model; BOIN is operationally simpler and more transparent.
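The pre-tabulated decisions come from two fixed cut-offs on the observed DLT rate at the current dose. The base-R sketch below computes them from the closed-form expressions published by Liu and Yuan (2015), which are not quoted in this document and are included here as background. The helper names `boin_boundaries` and `boin_decision` are hypothetical, and the sketch ignores the dose-elimination rule that the BOIN package also applies.

```r
# Closed-form BOIN cut-offs: escalate if the observed DLT rate is <= lambda_e,
# de-escalate if it is >= lambda_d, otherwise stay at the current dose.
boin_boundaries <- function(target, p_saf = 0.6 * target, p_tox = 1.4 * target) {
  lambda_e <- log((1 - p_saf) / (1 - target)) /
    log(target * (1 - p_saf) / (p_saf * (1 - target)))
  lambda_d <- log((1 - target) / (1 - p_tox)) /
    log(p_tox * (1 - target) / (target * (1 - p_tox)))
  c(lambda_e = lambda_e, lambda_d = lambda_d)
}

# Decision for a dose with n_dlt toxicities among n_treated patients
boin_decision <- function(n_dlt, n_treated, bounds) {
  rate <- n_dlt / n_treated
  if (rate <= bounds[["lambda_e"]]) {
    "escalate"
  } else if (rate >= bounds[["lambda_d"]]) {
    "de-escalate"
  } else {
    "stay"
  }
}

bounds <- boin_boundaries(target = 0.30)
print(round(bounds, 3))

# Decisions after the first cohort of 3 patients, for 0 to 3 observed DLTs
decisions <- sapply(0:3, boin_decision, n_treated = 3, bounds = bounds)
print(decisions)
```

Because the cut-offs depend only on the target and the two reference rates, the whole table can be printed before the first patient is enrolled, which is the operational advantage the paragraph above describes.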
```r
library(dfcrm)

# Lee-Cheung 2009 indifference-interval calibration.
# The skeleton is calibrated under the same working model used in crmsim() below.
prior_skeleton <- getprior(halfwidth = 0.05, target = 0.30, nu = 3, nlevel = 6,
                           model = 'logistic')

crm_sim <- crmsim(
  PI = c(0.05, 0.10, 0.20, 0.30, 0.40, 0.55),  # true DLT per dose
  prior = prior_skeleton,
  target = 0.30,
  n = 30,
  x0 = 1,            # starting dose
  nsim = 1000,
  method = 'bayes',
  model = 'logistic'
)
print(crm_sim)
```
Skeleton mis-specification is the canonical CRM failure mode. Lee-Cheung 2009 indifference-interval method gives a systematic calibration approach.
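To make the role of the skeleton concrete, here is a minimal base-R sketch of a single CRM update under the one-parameter power ("empiric") working model, p_i = skeleton_i^exp(a) with a normal prior on a. The skeleton, the patient data, and the prior variance of 1.34 are illustrative assumptions, not values from this document; dfcrm remains the reference implementation.

```r
# Hypothetical skeleton and trial data (three cohorts of 3)
skeleton   <- c(0.05, 0.12, 0.20, 0.30, 0.40, 0.52)
target     <- 0.30
dose_given <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)  # dose level per patient
dlt        <- c(0, 0, 0, 0, 0, 0, 0, 1, 0)  # 1 = dose-limiting toxicity
prior_sd   <- sqrt(1.34)                    # assumed prior SD for a

# Unnormalised posterior density of a: binomial likelihood times normal prior
posterior_kernel <- function(a) {
  vapply(a, function(ai) {
    p <- skeleton[dose_given]^exp(ai)
    prod(p^dlt * (1 - p)^(1 - dlt)) * dnorm(ai, mean = 0, sd = prior_sd)
  }, numeric(1))
}

# Posterior mean of a by numerical integration
norm_const <- integrate(posterior_kernel, lower = -10, upper = 10)$value
a_hat <- integrate(function(a) a * posterior_kernel(a),
                   lower = -10, upper = 10)$value / norm_const

# Plug-in DLT estimates and the dose closest to target
p_hat     <- skeleton^exp(a_hat)
next_dose <- which.min(abs(p_hat - target))

print(round(p_hat, 3))
print(next_dose)
```

Every estimated DLT rate is a power transform of the skeleton, so a skeleton whose spacing is far from the truth pulls all doses in the same wrong direction. That is why calibration matters.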
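```r
# Babb-Rogatko-Zacks 1998: explicit P(dose > MTD) <= alpha (default 0.25)
# Implementation: the `ewoc` package. dfcrm does not appear to export an ewoc
# function; check ls("package:dfcrm") before relying on dfcrm::ewoc.
```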
Schmidli et al 2014 Biometrics 70:1023: Meta-Analytic-Predictive prior. Fit random-effects meta-analysis of historical control arms; derive predictive distribution for new control arm; use as informative prior. Effective sample size from history typically 20-80% of new control arm.
```r
library(RBesT)

# Historical control data (4 prior studies)
historical_data <- data.frame(
  study = c('s1', 's2', 's3', 's4'),
  n = c(40, 35, 50, 45),
  r = c(8, 6, 12, 9)   # responders
)

# Fit MAP via gMAP (Stan-based random-effects meta-analysis)
map_prior <- gMAP(
  cbind(r, n - r) ~ 1 | study,
  data = historical_data,
  family = binomial,
  tau.dist = 'HalfNormal',
  tau.prior = 0.5,            # between-study SD prior
  beta.prior = cbind(0, 2)    # weakly informative on logit response
)
print(map_prior)

# Approximate posterior with mixture for downstream computation
map_mix <- automixfit(map_prior, Nc = 2)
print(map_mix)

# Effective sample size
ess(map_mix)

# Robust MAP: add vague mixture component (weight 0.1-0.3) to guard against prior-data conflict
robust_map <- robustify(map_mix, weight = 0.2, mean = 0.5, n = 1)
print(robust_map)
ess(robust_map)
```
Robust MAP rationale: if the new data disagree with historical (prior-data conflict), the mixture down-weights the informative component automatically. Schoenfeld 2017 critique: Schmidli 2014's choice of mixture weight requires sensitivity analysis.
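The automatic down-weighting can be seen in a small conjugate example. This base-R sketch assumes a two-component Beta mixture prior (an informative Beta(8, 32) standing in for a MAP approximation, plus a vague Beta(1, 1)) and updates the mixture weights with each component's beta-binomial marginal likelihood. All numbers and the helper name `update_mixture_weights` are hypothetical; RBesT performs the equivalent update internally.

```r
# Posterior mixture weights after observing r responders out of n.
# Each weight is multiplied by its component's marginal likelihood;
# the binomial coefficient is common to all components and cancels.
update_mixture_weights <- function(r, n, weights, a, b) {
  log_marginal <- lbeta(a + r, b + n - r) - lbeta(a, b)
  unnormalised <- weights * exp(log_marginal - max(log_marginal))
  unnormalised / sum(unnormalised)
}

weights <- c(informative = 0.8, vague = 0.2)  # robust weight 0.2
a <- c(8, 1)    # Beta shape1: informative, vague
b <- c(32, 1)   # Beta shape2: informative, vague (informative mean 0.20)

# New control arm consistent with history: 4/20 responders
w_consistent <- update_mixture_weights(r = 4, n = 20, weights, a, b)

# New control arm in conflict with history: 12/20 responders
w_conflict <- update_mixture_weights(r = 12, n = 20, weights, a, b)

print(round(w_consistent, 3))
print(round(w_conflict, 3))
```

When the new data agree with the history, the informative component gains weight. When they conflict, almost all of the weight moves to the vague component and the borrowing switches off.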
Neuenschwander, Wandel, Roychoudhury, Bailey 2016 Pharm Stat 15:123: Mixture of exchangeable (shared mean+variance) + non-exchangeable (per-basket independent), typically weighted 0.5/0.5. Avoids HM catastrophic borrowing when one basket truly different.
```r
library(OncoBayes2)  # Novartis-developed; canonical EXNEX implementation
# Or simplified via bhmbasket
library(bhmbasket)

# Conceptual: each basket has its own posterior, with shrinkage governed by exchangeability mixture
# Default weights 0.5 EX / 0.5 NEX
# Sensitivity over weights (0.1, 0.3, 0.5, 0.7, 0.9) is essential
```
Neoadjuvant breast cancer; 10 biomarker-defined subtypes × multiple arms; Bayesian RAR; graduation criterion = posterior predictive probability of success in 300-patient Phase 3 ≥ 0.85. Berry Consultants designed engine.
```r
# Conceptual implementation requires custom Stan or FACTS (Berry Consultants commercial)
# Pseudocode:
# 1. Fit hierarchical model to platform data: response ~ arm + biomarker_subtype + arm:subtype
# 2. Posterior draws of treatment effect by subtype
# 3. For each draw, simulate Phase 3 trial: n=300, treatment vs control, observed effect
# 4. Compute proportion of draws meeting Phase 3 success criterion
# 5. If proportion >= 0.85, arm graduates
```
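Steps 2 to 5 of the pseudocode can be sketched in base R for a single subtype if the hierarchical model is replaced by independent Beta-binomial posteriors. That replacement is a deliberate simplification, not the I-SPY 2 engine. The interim counts, the Beta(1, 1) priors, the 1:1 allocation, and the one-sided z-test success criterion are all assumptions made for illustration.

```r
set.seed(101)
n_draws <- 4000

# Step 2 (simplified): posterior draws of response rates for one subtype.
# Hypothetical interim data: 24/60 responders on the arm, 12/60 on control.
p_trt <- rbeta(n_draws, 1 + 24, 1 + 36)
p_ctl <- rbeta(n_draws, 1 + 12, 1 + 48)

# Step 3: for each draw, simulate a 300-patient Phase 3 trial (150 per arm)
n_per_arm <- 150
x_trt <- rbinom(n_draws, n_per_arm, p_trt)
x_ctl <- rbinom(n_draws, n_per_arm, p_ctl)

# Assumed success criterion: pooled two-proportion z-test, one-sided 0.025
pooled <- (x_trt + x_ctl) / (2 * n_per_arm)
se     <- sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
z      <- ifelse(se > 0, (x_trt - x_ctl) / n_per_arm / se, 0)
success <- z > qnorm(0.975)

# Steps 4 and 5: predictive probability of success and graduation decision
pred_prob <- mean(success)
graduates <- pred_prob >= 0.85

print(pred_prob)
print(graduates)
```

The predictive probability averages Phase 3 power over the current posterior, so it is lower than the power computed at the point estimate whenever the interim data leave real uncertainty about the effect.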
Severe pneumonia, repurposed for COVID-19; Bayesian factorial multi-domain design. Generated corticosteroid signal independently of RECOVERY.
Berry SM, Berry DA 2004 Biometrics 60:418: three-level hierarchical model for AE multiplicity (AE within MedDRA PT within SOC); spike-and-slab on the log OR. Tames the FDA-feared multiplicity in safety summaries.
```r
library(c212)  # Berry-Berry implementation

# Conceptual: each AE has log OR drawn from spike-and-slab prior
# Spike at 0 (no effect); slab as N(mu_SOC, sigma_SOC)
# SOC-level parameters from N(mu_overall, sigma_overall)
# Borrowing within SOC; shrinkage toward 0 if no evidence
# JMP Clinical also implements this for industry use
```
```r
library(bayesDP)
library(psborrow2)  # FDA-supported package

# Power prior: combines current data L(theta | D_current) with historical L(theta | D_hist)^gamma
# gamma in [0, 1]; gamma = 0 = no borrowing; gamma = 1 = full pooling
# Typical pediatric extrapolation: gamma = 0.3 to 0.6 (working convention; the FDA Bayesian
# Jan 2026 draft does not prescribe a specific range)
```
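For a binary endpoint with a Beta initial prior, the fixed-gamma power prior has a closed form: the historical counts are multiplied by gamma before being added to the Beta parameters. The base-R sketch below shows that arithmetic. The function name `power_prior_posterior` and all counts are hypothetical, and the Beta(1, 1) initial prior is an assumption.

```r
# Posterior Beta parameters under a fixed-gamma power prior:
#   shape1 = a0 + gamma * r_hist + r
#   shape2 = b0 + gamma * (n_hist - r_hist) + (n - r)
power_prior_posterior <- function(r, n, r_hist, n_hist, gamma, a0 = 1, b0 = 1) {
  stopifnot(gamma >= 0, gamma <= 1)
  c(shape1 = a0 + gamma * r_hist + r,
    shape2 = b0 + gamma * (n_hist - r_hist) + (n - r))
}

# Hypothetical data: adult study 60/150 responders, paediatric study 9/30
post_none <- power_prior_posterior(9, 30, 60, 150, gamma = 0)    # no borrowing
post_half <- power_prior_posterior(9, 30, 60, 150, gamma = 0.5)  # partial borrowing
post_full <- power_prior_posterior(9, 30, 60, 150, gamma = 1)    # full pooling

# Posterior mean response rate under each discount
post_means <- sapply(list(none = post_none, half = post_half, full = post_full),
                     function(p) p[["shape1"]] / sum(p))
print(round(post_means, 3))

# With gamma = 0.5 the 150 adult patients contribute as if they were 75
borrowed_n <- 0.5 * 150
print(borrowed_n)
```

Reading gamma as a multiplier on the historical sample size makes the discount easy to explain in a protocol and easy to vary in a sensitivity analysis.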
The 2024-2026 regulatory shift: FDA has materially expanded acceptance of external/historical/synthetic control arms in rare disease, paediatric, and accelerated-approval settings. Key documents: FDA 2018 RWE Framework (and 2024 enhancements), FDA 2023 Considerations for Use of RWE/RWD for Regulatory Decisions, EMA Reflection Paper on Use of RWE in Regulatory Decision-Making (effective 2024). Bayesian methods are the natural fit because historical data become prior information rather than concurrent control.
| Method | Borrowing mechanism | Discount control | When to use |
|--------|---------------------|------------------|-------------|
| Power prior (Ibrahim-Chen 2000) | Likelihood of historical data raised to power gamma | gamma in [0, 1] fixed or modelled | When historical data is single source; gamma ~ Beta in adaptive power prior |
| Robust MAP (Schmidli 2014) | Meta-analytic-predictive prior + vague mixture | Mixture weight (typ 0.1-0.3) | Multiple historical control arms; standard for borrowing |
| Commensurate prior (Hobbs 2011) | Conditional model on agreement parameter | Tau estimated from data | When agreement between historical and current is data-determined |
| Propensity-integrated power prior | Power prior weighted by PS overlap | gamma * (PS-trimmed overlap) | RWE comparator with covariate imbalance |
| Doubly robust ATT via causal inference | IPW + outcome regression | n/a | RWE comparator; identifies marginal ATT |
The psborrow2 package (Genentech / Bayer / FDA-Janssen collaboration; CRAN 2024+) is the canonical R implementation for propensity-score-integrated Bayesian Dynamic Borrowing. The skeleton below illustrates the workflow conceptually; verify exact function names and arguments against the current psborrow2 vignette before use (the package API has evolved through 2024-2026).
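```r
library(psborrow2)

# Conceptual skeleton: confirm every function name and argument against the
# installed psborrow2 version before use.

# Analysis data: one numeric matrix stacking internal (trial) and external patients, with
#   ext        = 1 for external patients, 0 for internal
#   trt        = 1 for treated, 0 for control (external patients are all controls)
#   time, cens = time-to-event outcome and censoring indicator
#   age, ecog, baseline_severity = adjustment covariates
# borrow_obj below is a placeholder name for that matrix.

# Borrowing: Bayesian Dynamic Borrowing via a hierarchical commensurate prior
borrowing_design <- borrowing_hierarchical_commensurate(
  ext_flag_col = "ext",
  tau_prior = prior_gamma(0.001, 0.001)  # vague prior on the commensurability parameter
)

# Outcome model (exponential survival for TTE here; outcome_bin_logistic() for binary)
outcome_model <- outcome_surv_exponential(
  time_var = "time",
  cens_var = "cens",
  baseline_prior = prior_normal(0, 100)
)

# Treatment indicator and prior on the treatment effect
treatment <- treatment_details(
  trt_flag_col = "trt",
  trt_prior = prior_normal(0, 100)
)

# Run Bayesian analysis with covariate adjustment + borrowing
result <- create_analysis_obj(
  data_matrix = borrow_obj,
  outcome = outcome_model,
  borrowing = borrowing_design,
  treatment = treatment,
  covariates = add_covariates(
    covariates = c("age", "ecog", "baseline_severity"),
    priors = prior_normal(0, 100)
  )
)
mcmc_result <- mcmc_sample(result, chains = 4, iter_warmup = 1000, iter_sampling = 4000)
```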
Spiegelhalter, Freedman, Blackburn 1986 Stat Med 5:421: the trip-wire / skeptical-prior framework. Pre-specify a skeptical prior centred at the null and an enthusiastic prior centred at the alternative; stopping requires the skeptic to be convinced (posterior under skeptical prior exceeds threshold).
Frames "evidence for regulators" vs "evidence for sponsor" in Bayesian language; still cited in modern Bayesian-trial protocols.
```r
# Skeptical prior: N(0, sd_sk), centred at null
# Enthusiastic prior: N(delta_alt, sd_en), centred at clinically meaningful effect
# Decision: stop for efficacy if P(theta > 0 | skeptical posterior) > 0.975
#           stop for futility if P(theta < delta_alt | enthusiastic posterior) > 0.80
```
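With a normal approximation to the interim estimate, both posteriors have closed forms. The base-R sketch below applies the two decision rules to one interim look. The effect scale, the interim estimate, its standard error, and the choice of prior SD (set so the skeptic gives only 5% prior probability to an effect beyond `delta_alt`) are illustrative assumptions, not values from this document.

```r
# Conjugate normal-normal update: precision-weighted average of prior and data
normal_posterior <- function(prior_mean, prior_sd, estimate, se) {
  prior_prec <- 1 / prior_sd^2
  data_prec  <- 1 / se^2
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * prior_mean + data_prec * estimate)
  list(mean = post_mean, sd = sqrt(post_var))
}

delta_alt <- 0.40   # hypothetical clinically meaningful effect (positive = benefit)
estimate  <- 0.35   # hypothetical interim estimate
se        <- 0.12   # hypothetical standard error

# Assumed calibration: skeptic's prior P(theta > delta_alt) = 0.05
sd_sk <- delta_alt / qnorm(0.95)
sd_en <- sd_sk      # enthusiast uses the same spread, centred at delta_alt

skeptical    <- normal_posterior(0, sd_sk, estimate, se)
enthusiastic <- normal_posterior(delta_alt, sd_en, estimate, se)

# Posterior probabilities used by the two stopping rules
p_efficacy <- 1 - pnorm(0, skeptical$mean, skeptical$sd)
p_futility <- pnorm(delta_alt, enthusiastic$mean, enthusiastic$sd)

stop_efficacy <- p_efficacy > 0.975
stop_futility <- p_futility > 0.80

print(round(c(p_efficacy = p_efficacy, p_futility = p_futility), 3))
print(c(stop_efficacy = stop_efficacy, stop_futility = stop_futility))
```

The skeptical posterior is pulled toward zero relative to the raw estimate, so crossing 0.975 under that prior is a stricter requirement than the same threshold under a flat prior.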
rstan/cmdstanr); Docker/renv-pinned environment; include seeds + posterior diagnostics (R-hat <1.01, ESS >1000 per chain).

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| FDA BOIN Fit-for-Purpose qualification (Dec 2021) | FDA Drug Development Tools program | First formal FDA dose-finding endorsement |
| Target DLT rate 30% (Phase 1 oncology) | Standard convention | Modal target across oncology Phase 1 |
| MAP prior effective sample size 20-80% of new control | Schmidli 2014 | Borrowing strength typical range |
| Robust MAP mixture weight 0.1-0.3 | Schmidli 2014 | Guards against prior-data conflict |
| EXNEX default 0.5 EX / 0.5 NEX | Neuenschwander 2016 | Standard starting weight; sensitivity required |
| I-SPY 2 graduation PP >= 0.85 | Barker 2009 | Bayesian platform standard |
| Power prior gamma 0.3-0.6 for pediatric extrapolation | working convention; the FDA Bayesian Jan 2026 draft does not prescribe a specific range | Partial borrowing default |
| Stan R-hat <1.01, ESS >1000 per chain | Vehtari 2021 Bayesian Analysis | Posterior convergence criteria |
| EWOC overdose constraint P(dose > MTD) <= 0.25 | Babb-Rogatko-Zacks 1998 | Safety floor |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| CRM with arbitrary skeleton | No calibration | Lee-Cheung 2009 indifference-interval; or BOIN |
| MAP without prior-data conflict check | Posterior dominated by prior | Robust MAP; PP-check; sensitivity over mixture weight |
| EXNEX with single weight scheme | No sensitivity | Weights 0.1, 0.3, 0.5, 0.7, 0.9; report range |
| Posterior probability stopping without Type-I sim | Regulatory rejection | Simulate under null; calibrate threshold |
| I-SPY 2 graduated arm reported uncorrected | Selection bias | Conditional MLE; cite Robertson 2023 |
| Bayesian shrinkage for signal discovery | Hemmings-Koch critique | Shrinkage for replication only |
| Power prior gamma = 1 | Full pooling | Discount 0.3-0.6 (working convention; the FDA 2026 draft does not prescribe a range) |
| WinBUGS without containerisation | Reproducibility | Stan + Docker/renv-pinned |
| BOIN vs CRM comparison without simulation OCs | Apples-to-oranges | Compare OCs over same true DLT rates |
| FDA cited for Bayesian drugs guidance pre-2026 | Confusion | FDA 2010 is DEVICES; FDA 2026 (draft) is drugs |
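The "simulate under null; calibrate threshold" remedy can be sketched in base R for a single-arm binary endpoint. The null response rate, the look schedule, the Beta(1, 1) prior, and the candidate thresholds are all assumptions chosen for illustration. A real design would simulate the full protocol, including any futility rules.

```r
set.seed(2024)
p0         <- 0.20             # assumed null response rate
looks      <- c(20, 40, 60)    # cumulative sample size at each analysis
n_sim      <- 2000             # simulated trials under the null
thresholds <- c(0.95, 0.975, 0.99)

# Cumulative responders at each look, one column per simulated null trial
cum_resp <- replicate(
  n_sim,
  cumsum(rbinom(length(looks), size = diff(c(0, looks)), prob = p0))
)

# Posterior P(p > p0 | data) at each look under a Beta(1, 1) prior
post_prob <- matrix(
  1 - pbeta(p0, 1 + cum_resp, 1 + looks - cum_resp),
  nrow = length(looks)
)

# A trial is a false positive if any look exceeds the threshold
max_post_prob <- apply(post_prob, 2, max)
type1 <- vapply(thresholds, function(th) mean(max_post_prob > th), numeric(1))
names(type1) <- thresholds

print(type1)
```

The usual workflow is to pick the smallest threshold whose simulated false-positive rate is at or below the target (for example 0.025 one-sided) and to document the simulation in the SAP appendix.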
| Pushback | Response |
|----------|----------|
| "Type-I error control?" | Simulation under null demonstrates frequentist Type-I = nominal at threshold chosen; documented in SAP appendix |
| "Prior justification?" | MAP from historical control arms via gMAP; robust mixture weight 0.2 for prior-data conflict; sensitivity over prior provided |
| "Why BOIN over CRM?" | BOIN Fit-for-Purpose qualified Dec 2021; pre-tabulated escalation; no bedside Bayesian software; OCs comparable to CRM in simulation |
| "EXNEX weight sensitivity?" | Reported over weights 0.1, 0.3, 0.5, 0.7, 0.9; results stable; primary at 0.5/0.5 per Neuenschwander 2016 |
| "Power prior gamma?" | Discount 0.5 (working convention; the FDA Bayesian Jan 2026 draft does not prescribe a specific gamma range); sensitivity over 0.3-0.7 provided |
| "Posterior probability threshold?" | Calibrated via simulation to frequentist Type-I 0.025 one-sided; cite Berry 2010 |
| "Stan reproducibility?" | Docker container + renv-pinned R + Stan version; seeds provided; R-hat <1.01, ESS >2000 per parameter |
| "Bias correction on platform graduation?" | Conditional MLE applied to estimate Phase 3 effect; cite Robertson 2023 |
| "Why not frequentist instead?" | Bayesian framework permits borrowing (rare disease, pediatric); the FDA Bayesian Jan 2026 draft accepts Bayesian primary inference with simulation calibration |