Version Compatibility

Reference examples tested with: statsmodels 0.14+, scipy 1.12+, numpy 1.26+, pandas 2.1+. R packages cited: pwr, gsDesign (Anderson/Merck), gsDesign2, rpact (Wassmer/Brannath), presize, npsurvSS, nph, simtrial.

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Power and Sample Size for Clinical Trials

"Justify the trial's sample size" -> Compute the n needed to detect a pre-specified alternative δ with power 1-β at significance α, accounting for endpoint distribution, design (superiority/NI/equivalence), expected dropout, multiplicity, and stratification — and distinguish δ (the effect the trial is powered to detect) from MCID (the clinically meaningful difference).

The Foundational Distinction -- δ vs MCID

δ (the alternative effect): what the trial is powered to detect. Usually set above the MCID because sponsors want a strong signal that exceeds noise + design uncertainty.

MCID (Minimum Clinically Important Difference): the smallest effect size considered clinically meaningful. Jaeschke-Singer-Guyatt 1989 Control Clin Trials 10:407 (anchor-based) and Norman-Sloan-Wyrwich 2003 Med Care 41:582 ("the remarkable universality of half a standard deviation") established the modern conventions.

Confusing the two has produced both:

Underpowered trials where sponsor sets δ = MCID and gets a CI straddling zero
Overgenerous NI margins where sponsor sets M2 = full MCID (NI margin should be a fraction of MCID)

Postdoc rule of thumb: for superiority, δ >= 1.5 × MCID; for NI, M2 <= 0.5 × MCID.

Algorithmic Taxonomy

| Design | Formula / approach | Software | Strength | Fails when | |--------|--------------------|----------|----------|------------| | Two-sample t-test, continuous | Cohen's d; n = 2 × (z_α/2 + z_β)² / d² | pwr::pwr.t.test (R); statsmodels.power.tt_ind_solve_power (Py) | Standard | Heteroscedasticity; non-normal outcomes | | Two-sample proportions (Fleiss) | Asymptotic normal approximation with/without continuity correction | power.prop.test (R) -- uncorrected; pwr::pwr.2p.test; statsmodels | Standard | n < 100/arm: continuity correction debate (D'Agostino 1988) | | Survival (Schoenfeld 1981) | events ≈ 4(z_α/2 + z_β)² / (log HR)² for 1:1 | gsDesign::nSurv; npsurvSS::size_two_arm | Standard PH-conformant | PH violated (immuno-oncology) under-estimates by 20-50% | | Survival under non-PH (Lakatos 1988) | Markov chain accommodating time-varying HR, accrual, dropout | gsDesign::nSurv; npsurvSS; simtrial | Handles immuno-oncology delayed effects | Requires explicit specification of HR(t) and accrual | | MaxCombo SS under NPH | Simulation-based; pre-specify weight family | nphRCT; simtrial | Robust to NPH pattern | Computationally heavier | | Non-inferiority fixed-margin | n = (z_α + z_β)² × variance / M² | pwr::pwr.t2n.test adapted; rpact::getSampleSizeMeans | Pre-discounted M | Constancy assumption violation invisible | | Non-inferiority synthesis | Pool historical control-vs-placebo + current test-vs-control | gsDesign::ssTwoArmTest | More efficient than fixed-margin | Constancy assumption MUST hold exactly | | Equivalence TOST | Two one-sided tests at α each | pwr::pwr.t.test adapted; presize | No multiplicity adjustment needed | Wrong question when superiority/NI is intended | | Group-sequential | Lan-DeMets spending function | rpact; gsDesign | Interim analyses; early stopping | More complex SAP | | Sample-size re-estimation (Mehta-Pocock) | Promising-zone conditional power | rpact::getSampleSizeMeans with reestimation | Recovers power if interim shows promise | Unblinded SSR scares FDA | | Cluster-randomised | Adjust for design effect = 1 + (m-1)ICC | clusterPower; pwr adapted | Standard | ICC misspecification |

Postdoc reading list:

Fleiss JL 1981 Statistical Methods for Rates and Proportions (with/without continuity correction tables)
Schoenfeld DA 1981 Biometrika 68:316 (canonical TTE formula)
Lakatos E 1988 Biometrics 44:229 (Markov chain SS for complex survival)
Schuirmann DJ 1987 J Pharmacokinet Biopharm 15:657 (TOST)
Jaeschke R, Singer J, Guyatt GH 1989 Control Clin Trials 10:407 (MCID anchor-based)
Norman GR, Sloan JA, Wyrwich KW 2003 Med Care 41:582 (0.5 SD heuristic)
Snapinn SM 2000 Curr Control Trials Cardiovasc Med 1:19 (NI biocreep)
Temple R, Ellenberg SS 2000 Ann Intern Med 133:455 + 133:464 (NI assay sensitivity; Ann Intern Med NOT NEJM — common citation error)
Hung HMJ, Wang SJ, O'Neill RT 2005 Biom J (NI within-trial Type-I not guaranteed)
Mehta CR, Pocock SJ 2011 Stat Med 30:3267 (promising zone)
Lin RS, Lin J, Roychoudhury S et al 2020 Stat Biopharm Res (NPH Working Group)

Decision Tree by Scenario

| Scenario | Recommended approach | Why | |----------|---------------------|-----| | Continuous outcome, two-arm parallel, superiority | Fleiss / Cohen's d formula via power.t.test or statsmodels | Standard; cite Fleiss 1981 | | Binary outcome, n > 100/arm | Uncorrected normal approximation via power.prop.test; statsmodels power_proportions_2indep | Adequate for n > 100 | | Binary outcome, n < 100/arm | Fisher exact-based simulation OR Fleiss with continuity correction | Continuity correction debate; cite D'Agostino 1988 if uncorrected | | Time-to-event with PH | Schoenfeld 1981 via gsDesign::nSurv | Standard; pre-specify hazards | | Time-to-event under expected NPH (immuno-oncology) | Lakatos 1988 OR simulation via simtrial::simtrial/nphRCT | Schoenfeld under-estimates by 20-50%; cite Lin 2020 | | Non-inferiority continuous | Fixed-margin with M2 = 0.5 × historical M1 (discounted) | FDA 2016 NI guidance; cite Temple-Ellenberg 2000 | | Non-inferiority binary | Fixed-margin with Miettinen-Nurminen CI for RD | Cite EMA NI guidance; MN-CI standard | | Bioequivalence (Cmax/AUC) | TOST with 80-125% margins on geometric mean ratio | FDA 1992 BE guidance; Schuirmann 1987 | | Cluster-randomised | n × design effect = 1 + (m-1)ICC | Cite Murray 1998 cluster RCT methodology | | Group-sequential | Lan-DeMets spending function | rpact or gsDesign | | Pilot for sample size re-estimation | Blinded SSR (Friede-Kieser 2006) | No Type-I inflation; safer than unblinded | | Promising-zone reestimation | Mehta-Pocock 2011 with pre-specified increase rule | Recovers power; cite caveat re Jennison-Turnbull 2015 critique |

Continuous Outcomes -- Two-Sample t-Test

from statsmodels.stats.power import tt_ind_solve_power

# Solve for n per group
n = tt_ind_solve_power(
    effect_size=0.5,  # Cohen's d = (mu1 - mu2) / sigma
    alpha=0.05,
    power=0.80,
    alternative='two-sided'
)
print(f'n per arm = {np.ceil(n):.0f}')

# Solve for power given n
power = tt_ind_solve_power(effect_size=0.5, alpha=0.05, nobs1=100, alternative='two-sided')

Cohen's d benchmarks: small = 0.2, medium = 0.5, large = 0.8. Choose d to detect based on prior literature or clinically meaningful effect, NOT post-hoc to fit the affordable n.

Inflate for dropout: if dropout rate is q, multiply final n by 1/(1-q). For q=0.20, n_total = n/0.80.

Stratified randomisation efficiency: if randomisation stratifies on prognostic factors with combined R² = r against outcome, the effective n is n/(1-r²). Senn 2013 Stat Med 32:1439 makes the precision argument explicit.

Binary Outcomes -- The Continuity Correction Debate

from statsmodels.stats.power import NormalIndPower
import math

# Without continuity correction
p1, p2 = 0.30, 0.20
pbar = (p1 + p2) / 2
z_alpha = 1.96  # two-sided alpha=0.05
z_beta = 0.84   # power=0.80
n = ((z_alpha * math.sqrt(2 * pbar * (1 - pbar)) +
      z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2)) ** 2
print(f'n per arm (uncorrected) = {math.ceil(n)}')

# Continuity-corrected (Fleiss formula)
correction = (1 + math.sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2 / 4
n_corrected = math.ceil(n * correction)
print(f'n per arm (continuity corrected) = {n_corrected}')

The continuity correction debate (D'Agostino, Chase & Belanger 1988 Am Stat 42:198):

Pro (Fleiss, Yates): better matches the exact distribution under H0
Con: overly conservative, wastes ~10% sample size; modern computing makes exact (Fisher-Irwin, Boschloo) tests feasible

R's power.prop.test uses uncorrected normal approximation — can over-state power 5-10% in small samples (n<100). For confirmatory work with n<100, use simulation-based SS via simr or presize.

Survival (Time-to-Event)

Schoenfeld 1981 -- the canonical formula

import math

# Events needed for two-sample log-rank under PH (Schoenfeld 1981)
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.80, allocation_ratio=1.0, two_sided=True):
    """
    events = (z_crit + z_beta)^2 / (p*(1-p) * log(HR)^2)

    where z_crit = z_{1-alpha/2} for two-sided test OR z_{1-alpha} for one-sided.
    """
    z_crit = norm.ppf(1 - alpha/2) if two_sided else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    p = 1 / (1 + allocation_ratio)
    return math.ceil((z_crit + z_beta) ** 2 / (p * (1 - p) * math.log(hr) ** 2))

# For HR=0.70, alpha=0.05 two-sided, 80% power, 1:1 -> approx 247 events
events_needed = schoenfeld_events(hr=0.70)
print(f'Events needed for HR=0.70: {events_needed}')

Events drive power, not subjects. Convert to n via expected event rate and follow-up:

# n needed = events / overall event probability over follow-up
expected_overall_event_prob = 0.40  # from pilot or historical
n_per_arm = math.ceil(events_needed / (2 * expected_overall_event_prob))
# Inflate for dropout
n_per_arm_with_dropout = math.ceil(n_per_arm / (1 - 0.15))  # 15% dropout

Lakatos 1988 -- complex survival

Use R gsDesign::nSurv() or npsurvSS::size_two_arm() for complex scenarios with:

Time-varying hazard ratios (delayed effect)
Non-uniform accrual
Different dropout rates per arm
Cure fractions

library(gsDesign)
n_lakatos <- nSurv(
    lambdaC = 0.04,     # control hazard per month
    hr = 0.70,           # treatment HR
    eta = 0.005,         # dropout hazard per month
    T = 24,              # total study duration in months
    minfup = 12,         # minimum follow-up
    accrualTime = 12,    # accrual duration
    alpha = 0.025,       # one-sided
    beta = 0.10          # power = 90%
)
print(n_lakatos)

Under non-PH (immuno-oncology with delayed effect)

Schoenfeld 1981 ASSUMES PH. Under immuno-oncology with delayed separation, the formula under-estimates required events by 20-50%. The fix:

# Simulate under expected hazard pattern (Lin 2020 NPH Working Group recommendation)
library(simtrial)
sim_result <- simtrial(
    n_per_arm = 250,
    enroll_rate = piecewise_enroll(),
    fail_rate = piecewise_fail(  # delayed-effect specification
        duration = c(6, 18),
        fail_rate = c(0.04, 0.04),  # control
        hr = c(1.0, 0.5),            # treatment HR is 1 for first 6 months, then 0.5
        dropout_rate = c(0.005, 0.005)
    ),
    total_duration = 36
)
# Compute empirical power via MaxCombo or weighted log-rank

Non-Inferiority Designs -- The FDA 2016 Framework

FDA NI Trials Guidance (November 2016) establishes the canonical M1/M2 framework:

M1 = entire effect of active control vs placebo (lower bound of CI from historical meta-analysis)
M2 = clinically acceptable loss, typically M1/2 (50% effect retention) but as conservative as M1/4 for mortality endpoints

The "double discount" trap: M2 = 0.5 × (lower CI bound of M1) effectively requires 50% retention of a discounted historical estimate, yielding a much tighter margin than naive point-estimate retention. This has effectively halted new antibiotic NDA development under stringent NI margins.

# Fixed-margin NI sample size (binary endpoint)
import math

p_control = 0.85         # historical/observed control success rate
p_test = 0.85            # null assumption: test = control (NI null is test - control < -M2)
margin_M2 = 0.05         # pre-specified clinically acceptable loss
z_alpha = 1.645          # one-sided alpha=0.025
z_beta = 0.84            # power=0.80

# Approximate: n per arm
variance_term = p_control * (1 - p_control) + p_test * (1 - p_test)
n = math.ceil((z_alpha + z_beta) ** 2 * variance_term / margin_M2 ** 2)
print(f'NI sample size per arm: {n}')

Use Miettinen-Nurminen score CI for RD (the regulatory standard) when computing the NI margin from observed proportions. Cite ratesci::scoreci (R) for production.

Constancy assumption: the historical placebo-vs-active effect must persist in the NI trial. If standard of care drifted, M1 over-estimates and margin is too liberal. Hung-Wang-O'Neill 2005 Biom J critique: NI trials have NO within-trial Type-I error guarantee — alpha is conditional on constancy.

Assay sensitivity (Temple-Ellenberg 2000 Ann Intern Med 133:455 + 133:464): NI trials have only external assay sensitivity (inferred from historical data + constancy); placebo-controlled trials have internal assay sensitivity. This is why NI trials are second-best when placebo is ethical.

Equivalence Designs and Bioequivalence

TOST (Two One-Sided Tests; Schuirmann 1987): reject H0 of inequivalence iff both one-sided tests reject at α each — equivalent to the 1-2α CI lying within (-δ, +δ). Closed under intersection-union -> no multiplicity adjustment needed despite two tests.

# TOST sample size for continuous outcome
import math

def tost_sample_size(mu_diff, sigma, margin, alpha=0.05, power=0.80):
    """Sample size per arm for TOST equivalence test."""
    z_alpha = 1.645  # one-sided
    z_beta = 0.84
    effect = abs(margin - abs(mu_diff))
    if effect <= 0:
        raise ValueError('|mu_diff| must be < margin')
    n = math.ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2)
    return n

n_eq = tost_sample_size(mu_diff=0.0, sigma=10, margin=5)
print(f'Equivalence sample size per arm: {n_eq}')

FDA 1992 bioequivalence (80-125% rule): geometric mean ratio of Cmax and AUC must have 90% CI within (0.80, 1.25). Implemented in PowerTOST R package for crossover BE designs.

Crossover Designs -- Bioequivalence and Repeated-Measures

Crossover trials randomise each subject to receive both treatments in different periods. Power calculation is fundamentally different from parallel: the within-subject SD (sigma_w) drives power, not between-subject SD (sigma_b). With reasonable carryover-free designs, crossover requires ~25-50% the n of parallel for the same precision.

# Two-period crossover sample size (continuous endpoint)
import math

def crossover_n_continuous(mean_diff, sd_within, alpha=0.05, power=0.80):
    """n per sequence (so total n = 2*n) for two-period crossover."""
    z_alpha = 1.96  # two-sided
    z_beta = 0.84   # power=0.80
    n_per_seq = math.ceil(2 * (z_alpha + z_beta)**2 * sd_within**2 / mean_diff**2)
    return n_per_seq, 2 * n_per_seq  # n_per_seq, n_total

Bioequivalence (FDA 1992 / EMA 2010 framework)

Average bioequivalence: geometric mean ratio (GMR) of test/reference Cmax and AUC must have 90% CI within (0.80, 1.25). Log-transform pharmacokinetic parameters; analyse with mixed-effects ANOVA (period, sequence, treatment, subject random).

library(PowerTOST)
# Sample size for 2x2 crossover bioequivalence with CV (within-subject)
n_be <- sampleN.TOST(
    alpha = 0.05,
    targetpower = 0.80,
    theta0 = 0.95,        # expected GMR
    theta1 = 0.80,        # lower BE bound
    theta2 = 1.25,        # upper BE bound
    CV = 0.25,             # within-subject CV
    design = '2x2x2'      # standard 2-period 2-sequence
)
print(n_be)

Highly variable drugs (CV > 30%): standard 80-125% fails with feasible n. Use scaled average bioequivalence (SABE); the expanding-limits (ABEL) approach below is the EMA method (the FDA's reference-scaled RSABE uses a different criterion and PowerTOST::sampleN.RSABE()):

Reference-scaled limits: theta = exp(0.760 * sigma_WR) when sigma_WR > 0.294
Widens BE bounds proportional to within-subject variability of reference
PowerTOST::sampleN.scABEL() for sample size

Carryover assessment (Grizzle 1965)

Grizzle test for carryover: compares baseline-adjusted period 1 vs period 2 effects. Now controversial: the two-stage Grizzle carryover pretest is misleading -- it inflates the Type-I error of the treatment comparison (Freeman 1989 Stat Med 8:1421); significant "carryover" should drive design choice (washout extension), not an analysis switch (Senn 2002, Cross-over Trials in Clinical Research, 2nd ed.).

Modern operational rule: pre-specify adequate washout (>=5 half-lives); do NOT routinely test for carryover in primary analysis. If carryover is biologically plausible, use parallel design or extend washout.

Period effect

Period effects (calendar/learning) are estimable in 2x2 crossover. Standard analysis includes period as fixed effect:

library(nlme)
fit <- lme(response ~ treatment + period, random = ~1 | subject, data = df)

If period significant, treatment effect is still unbiased (orthogonal in balanced 2x2); report period effect for transparency.

Decision tree -- when crossover is appropriate

| Scenario | Use crossover? | Why | |----------|----------------|-----| | Bioequivalence of pharmacokinetic parameters | YES | Standard regulatory; within-subject precision much higher | | Chronic stable disease (HTN, GERD, asthma) | YES | Reversible response; ~50% sample size savings | | Acute disease (sepsis, MI) | NO | Cannot retreat same patient | | Curative intent (oncology, surgery) | NO | Treatment alters disease state irreversibly | | Long-half-life drug (>1 week) | NO | Washout impractical | | Patient-reported outcomes with strong period bias | CAUTION | Period bias may dominate |

MCID -- Anchor-Based vs Distribution-Based

| Method | Source | Description | |--------|--------|-------------| | Anchor-based | Jaeschke 1989 | Patient-reported anchor question; MCID = mean change in subjects who report "small important" change | | 1 SEM (Wyrwich 1999) | J Clin Epidemiol | MCID ≈ standard error of measurement | | 0.5 SD (Norman 2003) | Med Care 41:582 | "Remarkable universality of half a standard deviation" across PROMs | | 0.5 SD baseline | Common rule of thumb | Quick approximation for unknown PROMs |

Postdoc rule: δ in power calculation should EXCEED MCID (often 1.5-2× MCID) so that trials are powered to detect effects clearly larger than the noise threshold.

Multiple Endpoints -- See multiplicity-graphical

For co-primary endpoints, power-adjust per FDA Multiple Endpoints Guidance (October 2022):

Co-primary (all-must-win): each endpoint at full alpha but joint power = product -> inflate n
Multiple primary (any-wins): alpha split (e.g., Bonferroni or graphical)
Hierarchical: test in order; if any fail, downstream cannot be claimed

Reconciliation: When Methods Disagree

| Pattern | Likely cause | Action | |---------|--------------|--------| | Schoenfeld n much smaller than Lakatos simulation n | PH violated under expected HR(t) pattern (immuno-oncology delayed effect) | Use Lakatos OR simulation under expected HR(t); cite Lin 2020 NPH Working Group | | Continuity-corrected vs uncorrected proportion SS differ by ~10% | Asymptotic approximation debate (D'Agostino 1988) | For confirmatory n < 100/arm, use simulation-based SS; document correction approach in SAP | | NI margin per FDA double discount (50% of M1 lower CI) vs sponsor proposed margin (50% of M1 point estimate) | Sponsor uses point estimate; FDA expects lower-CI discount | FDA 2016 NI guidance: M2 from lower CI bound is the regulatory expectation | | Bayesian predictive probability of success vs frequentist power give different n | Bayesian integrates prior uncertainty; frequentist conditions on hypothesised δ | Bayesian PPoS more honest about pre-trial uncertainty; frequentist preferred for confirmatory regulatory; report both for adaptive contexts | | TOST and CI-inclusion approach give different equivalence conclusions | TOST is two-one-sided at α; CI is 1-2α — mathematically equivalent so should NOT disagree | If disagree, check α conventions (Schuirmann 1987: TOST at α, CI at 1-2α) | | Sponsor's MCID vs published anchor-based MCID differ | Different anchor questions or population | Cite primary MCID source; cross-check with Norman 2003 0.5 SD heuristic; declare in SAP | | Stratified-randomisation SS calculation vs unstratified differ substantially | Stratification efficiency gain via Senn 2013 formula | Use stratified formula; cite efficiency = 1/(1-r²) where r² is R² of strata against outcome | | Cluster-RCT SS underpowered after enrollment | ICC misspecified at design (most common SS failure in cluster RCTs) | Use historical ICC + sensitivity analysis at ±50% of estimate; cite Murray 1998 |

Per-Method Failure Modes

Schoenfeld under non-PH

Trigger: Immunotherapy or other delayed-effect mechanism
Mechanism: Schoenfeld assumes constant log-HR
Symptom: Trial under-powered; observed events insufficient
Fix: Lakatos 1988 or simulation under expected HR(t); cite Lin 2020 NPH Working Group

Continuity correction debate

Trigger: Small-sample binary design
Mechanism: Uncorrected normal approx over-states power; corrected wastes ~10%
Symptom: Trial reach significance at lower n than expected
Fix: Use simulation-based SS for n<100; pre-specify correction approach in SAP

NI margin too liberal

Trigger: M2 set to MCID rather than 50% of historical M1
Mechanism: Allows clinically unacceptable inferiority within "non-inferior" CI
Symptom: Regulators flag margin as inappropriate at scientific advice
Fix: M2 <= 0.5 × MCID; M2 <= 0.5 × historical M1 lower bound (FDA double discount)

Constancy assumption violated

Trigger: NI trial with new standard of care differing from historical
Mechanism: M1 derived from historical data no longer reflects current control efficacy
Symptom: Active control performs worse than expected; "successful" NI may mask true inferiority
Fix: Synthesis approach with formal sensitivity to constancy; consider three-arm trial with small placebo

MCID confused with δ

Trigger: Sponsor sets δ = MCID in power calculation
Mechanism: Trial powered to detect smallest meaningful effect, with no margin for noise/dropout
Symptom: Effect estimate near MCID with wide CI straddling zero
Fix: δ >= 1.5 × MCID; document rationale in SAP

Promising-zone "stealth alpha inflation"

Trigger: Mehta-Pocock SSR applied without sufficient simulation
Mechanism: a naive unblinded increase analysed with the conventional (unweighted) statistic inflates Type-I error (Cui-Hung-Wang 1999); the pre-specified weighted CHW statistic preserves alpha
Symptom: Type-I inflation when the CHW weighting is dropped; separately, Jennison-Turnbull 2015 show the promising zone is inefficient -- the largest power gains per added patient lie outside it
Fix: Pre-specify increase rule transparently; report simulation operating characteristics

Unblinded SSR -- DMC firewall failure

Trigger: Interim effect estimate leaked beyond IDMC
Mechanism: Sponsor inference from sample-size increase decision reveals direction of interim effect
Symptom: Regulator audit reveals unblinding
Fix: Strict firewall; only "increase / no increase" communicated to sponsor; document SOP

Quantitative Thresholds

| Threshold | Source | Rationale | |-----------|--------|-----------| | δ >= 1.5 × MCID for superiority | Postdoc rule; Norman 2003 implication | δ = MCID gives no margin for sampling variation | | M2 <= 0.5 × historical M1 (FDA double discount) | FDA NI 2016 guidance | Conservative retention against biocreep | | M2 <= 0.5 × MCID for NI | Standard regulatory expectation | Acceptable loss < clinically meaningful difference | | Schoenfeld under non-PH under-estimates by 20-50% | Lin 2020 NPH WG | Switch to Lakatos or simulation | | Continuity correction wastes ~10% sample size | D'Agostino, Chase & Belanger 1988 Am Stat 42:198 | Modern computing makes exact tests cheap | | 90% CI within (0.80, 1.25) for BE | FDA 1992 BE guidance | Geometric mean ratio of Cmax/AUC | | Unblinded SSR Type-I requires CHW weights | Cui-Hung-Wang 1999 | Naive increase inflates Type-I | | Promising-zone CP range ~30-80% | Mehta-Pocock 2011 Stat Med 30:3267 | Mathematical calibration for Type-I preservation |

Common Errors

| Error / symptom | Cause | Solution | |-----------------|-------|----------| | Trial under-powered for immunotherapy | Schoenfeld formula used under non-PH | Lakatos or simulation; cite Lin 2020 | | δ = MCID, CI straddles zero | δ-MCID confusion | Pre-specify δ >= 1.5 × MCID | | NI "success" but active much worse than historical | Constancy violation | Cite Hung 2005; consider synthesis or 3-arm | | Unblinded SSR with naive sample increase | Type-I inflation | CHW-weighted test; cite Cui-Hung-Wang 1999 | | power.prop.test over-states power n < 100 | Uncorrected normal approximation | Simulation-based SS or Fleiss continuity correction | | BE failure after large n with point estimate near 1.0 | Variability higher than assumed | Re-estimate from pilot; consider replicate design for highly variable drugs | | Promising-zone tipping to favourable not pre-specified | Operational confusion | Document increase rule in SAP; pre-specify cap | | Cluster RCT n based on individual-level | Design effect ignored | n × (1 + (m-1)ICC); cite Murray 1998 | | TOST sample size with non-zero true difference | Treating null as abs(mu_diff)=0 | Include realistic mu_diff in calculation |

Anticipated Reviewer Pushback

| Pushback | Response | |----------|----------| | "Is δ realistic?" | Justified from prior phase 2; δ exceeds MCID by factor X; pre-specified | | "Why this NI margin?" | M2 = 50% of historical lower-CI M1 per FDA 2016 double discount; M2 < MCID | | "Constancy assumption?" | Reviewed historical placebo-vs-active effect over time; stable; sensitivity to lower-end constancy provided | | "Schoenfeld vs Lakatos?" | PH plausible from phase 2 KM; pre-specified Schoenfeld; Lakatos simulation under expected delayed-effect as sensitivity | | "Continuity correction?" | n > 200/arm; uncorrected normal approximation valid (Fleiss 1981) | | "Bioequivalence variability assumption?" | Conservative (upper bound of historical CV); pilot will trigger re-estimation if CV >X% | | "Cluster ICC source?" | Historical ICC from prior cluster trial in similar setting; sensitivity at ICC ±50% provided | | "MCID source?" | Anchor-based MCID from validation study (Jaeschke 1989); supported by 0.5 SD heuristic (Norman 2003) | | "Promising zone or just blinded SSR?" | Promising zone with pre-specified CP boundaries; CHW weights for Type-I control |

References

Cui L, Hung HMJ, Wang SJ. 1999. Modification of sample size in group sequential clinical trials. Biometrics 55:853-857.
D'Agostino RB, Chase W, Belanger A. 1988. The appropriateness of some common procedures for testing the equality of two independent binomial populations. Am Stat 42:198-202.
FDA. 2016. Non-Inferiority Clinical Trials to Establish Effectiveness. Final Guidance.
FDA. 2022. Multiple Endpoints in Clinical Trials. Final Guidance.
Fleiss JL. 1981. Statistical Methods for Rates and Proportions (2nd ed). Wiley.
Friede T, Kieser M. 2006. Sample size recalculation in internal pilot study designs. Biom J 48:537-555.
Hung HMJ, Wang SJ, O'Neill RT. 2005. A regulatory perspective on choice of margin and statistical inference issue in non-inferiority trials. Biom J 47:28-36.
Jaeschke R, Singer J, Guyatt GH. 1989. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials 10:407-415.
Lakatos E. 1988. Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 44:229-241.
Lin RS, Lin J, Roychoudhury S, Anderson KM, Hu T, Huang B, Leon LF, Liao JJZ, Liu R, Luo X, Mukhopadhyay P, Qin R, Tatsuoka K, Wang X, Wang Y, Zhu J, Chen TT, Iacona R. 2020. Alternative analysis methods for time to event endpoints under nonproportional hazards: a comparative analysis. Stat Biopharm Res.
Mehta CR, Pocock SJ. 2011. Adaptive increase in sample size when interim results are promising. Stat Med 30:3267-3284.
Norman GR, Sloan JA, Wyrwich KW. 2003. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 41:582-592.
Schoenfeld DA. 1981. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68:316-319.
Schuirmann DJ. 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm 15:657-680.
Senn S. 2013. Seven myths of randomisation in clinical trials. Stat Med 32:1439-1450.
Snapinn SM. 2000. Noninferiority trials. Curr Control Trials Cardiovasc Med 1:19-21.
Temple R, Ellenberg SS. 2000. Placebo-controlled trials and active-control trials in the evaluation of new treatments, part 1: ethical and scientific issues. Ann Intern Med 133:455-463.
Ellenberg SS, Temple R. 2000. Placebo-controlled trials and active-control trials in the evaluation of new treatments, part 2: practical issues and specific cases. Ann Intern Med 133:464-470.
Wyrwich KW, Tierney WM, Wolinsky FD. 1999. Further evidence supporting an SEM-based criterion for the identification of meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 52:861-873.

Related Skills

clinical-biostatistics/survival-analysis - TTE-specific sample size (Schoenfeld, Lakatos, simulation)
clinical-biostatistics/effect-measures - δ on OR/RR/RD scales
clinical-biostatistics/categorical-tests - Binary endpoint test selection
clinical-biostatistics/multiplicity-graphical - Power adjustment for co-primary endpoints
clinical-biostatistics/adaptive-designs - SSR, promising zone, group-sequential
clinical-biostatistics/bayesian-trials - Bayesian SS via predictive probability of success
clinical-biostatistics/trial-reporting - CONSORT 2025 item 16a (sample size justification)
experimental-design/sample-size - General sample-size methods
experimental-design/power-analysis - General power methods

Version Compatibility

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Power and Sample Size for Clinical Trials

The Foundational Distinction -- δ vs MCID

δ (the alternative effect): what the trial is powered to detect. Usually set above the MCID because sponsors want a strong signal that exceeds noise + design uncertainty.

Confusing the two has produced both:

Underpowered trials where sponsor sets δ = MCID and gets a CI straddling zero
Overgenerous NI margins where sponsor sets M2 = full MCID (NI margin should be a fraction of MCID)

Postdoc rule of thumb: for superiority, δ >= 1.5 × MCID; for NI, M2 <= 0.5 × MCID.

Algorithmic Taxonomy

Postdoc reading list:

Fleiss JL 1981 Statistical Methods for Rates and Proportions (with/without continuity correction tables)
Schoenfeld DA 1981 Biometrika 68:316 (canonical TTE formula)
Lakatos E 1988 Biometrics 44:229 (Markov chain SS for complex survival)
Schuirmann DJ 1987 J Pharmacokinet Biopharm 15:657 (TOST)
Jaeschke R, Singer J, Guyatt GH 1989 Control Clin Trials 10:407 (MCID anchor-based)
Norman GR, Sloan JA, Wyrwich KW 2003 Med Care 41:582 (0.5 SD heuristic)
Snapinn SM 2000 Curr Control Trials Cardiovasc Med 1:19 (NI biocreep)
Temple R, Ellenberg SS 2000 Ann Intern Med 133:455 + 133:464 (NI assay sensitivity; Ann Intern Med NOT NEJM — common citation error)
Hung HMJ, Wang SJ, O'Neill RT 2005 Biom J (NI within-trial Type-I not guaranteed)
Mehta CR, Pocock SJ 2011 Stat Med 30:3267 (promising zone)
Lin RS, Lin J, Roychoudhury S et al 2020 Stat Biopharm Res (NPH Working Group)

Decision Tree by Scenario

Continuous Outcomes -- Two-Sample t-Test

from statsmodels.stats.power import tt_ind_solve_power

# Solve for n per group
n = tt_ind_solve_power(
    effect_size=0.5,  # Cohen's d = (mu1 - mu2) / sigma
    alpha=0.05,
    power=0.80,
    alternative='two-sided'
)
print(f'n per arm = {np.ceil(n):.0f}')

# Solve for power given n
power = tt_ind_solve_power(effect_size=0.5, alpha=0.05, nobs1=100, alternative='two-sided')

Cohen's d benchmarks: small = 0.2, medium = 0.5, large = 0.8. Choose d to detect based on prior literature or clinically meaningful effect, NOT post-hoc to fit the affordable n.

Inflate for dropout: if dropout rate is q, multiply final n by 1/(1-q). For q=0.20, n_total = n/0.80.

Binary Outcomes -- The Continuity Correction Debate

from statsmodels.stats.power import NormalIndPower
import math

# Without continuity correction
p1, p2 = 0.30, 0.20
pbar = (p1 + p2) / 2
z_alpha = 1.96  # two-sided alpha=0.05
z_beta = 0.84   # power=0.80
n = ((z_alpha * math.sqrt(2 * pbar * (1 - pbar)) +
      z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2)) ** 2
print(f'n per arm (uncorrected) = {math.ceil(n)}')

# Continuity-corrected (Fleiss formula)
correction = (1 + math.sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2 / 4
n_corrected = math.ceil(n * correction)
print(f'n per arm (continuity corrected) = {n_corrected}')

The continuity correction debate (D'Agostino, Chase & Belanger 1988 Am Stat 42:198):

Pro (Fleiss, Yates): better matches the exact distribution under H0
Con: overly conservative, wastes ~10% sample size; modern computing makes exact (Fisher-Irwin, Boschloo) tests feasible

Survival (Time-to-Event)

Schoenfeld 1981 -- the canonical formula

import math

# Events needed for two-sample log-rank under PH (Schoenfeld 1981)
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.80, allocation_ratio=1.0, two_sided=True):
    """
    events = (z_crit + z_beta)^2 / (p*(1-p) * log(HR)^2)

    where z_crit = z_{1-alpha/2} for two-sided test OR z_{1-alpha} for one-sided.
    """
    z_crit = norm.ppf(1 - alpha/2) if two_sided else norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    p = 1 / (1 + allocation_ratio)
    return math.ceil((z_crit + z_beta) ** 2 / (p * (1 - p) * math.log(hr) ** 2))

# For HR=0.70, alpha=0.05 two-sided, 80% power, 1:1 -> approx 247 events
events_needed = schoenfeld_events(hr=0.70)
print(f'Events needed for HR=0.70: {events_needed}')

Events drive power, not subjects. Convert to n via expected event rate and follow-up:

# n needed = events / overall event probability over follow-up
expected_overall_event_prob = 0.40  # from pilot or historical
n_per_arm = math.ceil(events_needed / (2 * expected_overall_event_prob))
# Inflate for dropout
n_per_arm_with_dropout = math.ceil(n_per_arm / (1 - 0.15))  # 15% dropout

Lakatos 1988 -- complex survival

Use R gsDesign::nSurv() or npsurvSS::size_two_arm() for complex scenarios with:

Time-varying hazard ratios (delayed effect)
Non-uniform accrual
Different dropout rates per arm
Cure fractions

library(gsDesign)
n_lakatos <- nSurv(
    lambdaC = 0.04,     # control hazard per month
    hr = 0.70,           # treatment HR
    eta = 0.005,         # dropout hazard per month
    T = 24,              # total study duration in months
    minfup = 12,         # minimum follow-up
    accrualTime = 12,    # accrual duration
    alpha = 0.025,       # one-sided
    beta = 0.10          # power = 90%
)
print(n_lakatos)

Under non-PH (immuno-oncology with delayed effect)

Schoenfeld 1981 ASSUMES PH. Under immuno-oncology with delayed separation, the formula under-estimates required events by 20-50%. The fix:

# Simulate under expected hazard pattern (Lin 2020 NPH Working Group recommendation)
library(simtrial)
sim_result <- simtrial(
    n_per_arm = 250,
    enroll_rate = piecewise_enroll(),
    fail_rate = piecewise_fail(  # delayed-effect specification
        duration = c(6, 18),
        fail_rate = c(0.04, 0.04),  # control
        hr = c(1.0, 0.5),            # treatment HR is 1 for first 6 months, then 0.5
        dropout_rate = c(0.005, 0.005)
    ),
    total_duration = 36
)
# Compute empirical power via MaxCombo or weighted log-rank

Non-Inferiority Designs -- The FDA 2016 Framework

FDA NI Trials Guidance (November 2016) establishes the canonical M1/M2 framework:

M1 = entire effect of active control vs placebo (lower bound of CI from historical meta-analysis)
M2 = clinically acceptable loss, typically M1/2 (50% effect retention) but as conservative as M1/4 for mortality endpoints

# Fixed-margin NI sample size (binary endpoint)
import math

p_control = 0.85         # historical/observed control success rate
p_test = 0.85            # null assumption: test = control (NI null is test - control < -M2)
margin_M2 = 0.05         # pre-specified clinically acceptable loss
z_alpha = 1.645          # one-sided alpha=0.025
z_beta = 0.84            # power=0.80

# Approximate: n per arm
variance_term = p_control * (1 - p_control) + p_test * (1 - p_test)
n = math.ceil((z_alpha + z_beta) ** 2 * variance_term / margin_M2 ** 2)
print(f'NI sample size per arm: {n}')

Use Miettinen-Nurminen score CI for RD (the regulatory standard) when computing the NI margin from observed proportions. Cite ratesci::scoreci (R) for production.

Equivalence Designs and Bioequivalence

# TOST sample size for continuous outcome
import math

def tost_sample_size(mu_diff, sigma, margin, alpha=0.05, power=0.80):
    """Sample size per arm for TOST equivalence test."""
    z_alpha = 1.645  # one-sided
    z_beta = 0.84
    effect = abs(margin - abs(mu_diff))
    if effect <= 0:
        raise ValueError('|mu_diff| must be < margin')
    n = math.ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2)
    return n

n_eq = tost_sample_size(mu_diff=0.0, sigma=10, margin=5)
print(f'Equivalence sample size per arm: {n_eq}')

FDA 1992 bioequivalence (80-125% rule): geometric mean ratio of Cmax and AUC must have 90% CI within (0.80, 1.25). Implemented in PowerTOST R package for crossover BE designs.

Crossover Designs -- Bioequivalence and Repeated-Measures

# Two-period crossover sample size (continuous endpoint)
import math

def crossover_n_continuous(mean_diff, sd_within, alpha=0.05, power=0.80):
    """n per sequence (so total n = 2*n) for two-period crossover."""
    z_alpha = 1.96  # two-sided
    z_beta = 0.84   # power=0.80
    n_per_seq = math.ceil(2 * (z_alpha + z_beta)**2 * sd_within**2 / mean_diff**2)
    return n_per_seq, 2 * n_per_seq  # n_per_seq, n_total

Bioequivalence (FDA 1992 / EMA 2010 framework)

library(PowerTOST)
# Sample size for 2x2 crossover bioequivalence with CV (within-subject)
n_be <- sampleN.TOST(
    alpha = 0.05,
    targetpower = 0.80,
    theta0 = 0.95,        # expected GMR
    theta1 = 0.80,        # lower BE bound
    theta2 = 1.25,        # upper BE bound
    CV = 0.25,             # within-subject CV
    design = '2x2x2'      # standard 2-period 2-sequence
)
print(n_be)

Reference-scaled limits: theta = exp(0.760 * sigma_WR) when sigma_WR > 0.294
Widens BE bounds proportional to within-subject variability of reference
PowerTOST::sampleN.scABEL() for sample size

Carryover assessment (Grizzle 1965)

Period effect

Period effects (calendar/learning) are estimable in 2x2 crossover. Standard analysis includes period as fixed effect:

library(nlme)
fit <- lme(response ~ treatment + period, random = ~1 | subject, data = df)

If period significant, treatment effect is still unbiased (orthogonal in balanced 2x2); report period effect for transparency.

Decision tree -- when crossover is appropriate

MCID -- Anchor-Based vs Distribution-Based

Postdoc rule: δ in power calculation should EXCEED MCID (often 1.5-2× MCID) so that trials are powered to detect effects clearly larger than the noise threshold.

Multiple Endpoints -- See multiplicity-graphical

For co-primary endpoints, power-adjust per FDA Multiple Endpoints Guidance (October 2022):

Co-primary (all-must-win): each endpoint at full alpha but joint power = product -> inflate n
Multiple primary (any-wins): alpha split (e.g., Bonferroni or graphical)
Hierarchical: test in order; if any fail, downstream cannot be claimed

Reconciliation: When Methods Disagree

Per-Method Failure Modes

Schoenfeld under non-PH

Trigger: Immunotherapy or other delayed-effect mechanism
Mechanism: Schoenfeld assumes constant log-HR
Symptom: Trial under-powered; observed events insufficient
Fix: Lakatos 1988 or simulation under expected HR(t); cite Lin 2020 NPH Working Group

Continuity correction debate

Trigger: Small-sample binary design
Mechanism: Uncorrected normal approx over-states power; corrected wastes ~10%
Symptom: Trial reach significance at lower n than expected
Fix: Use simulation-based SS for n<100; pre-specify correction approach in SAP

NI margin too liberal

Trigger: M2 set to MCID rather than 50% of historical M1
Mechanism: Allows clinically unacceptable inferiority within "non-inferior" CI
Symptom: Regulators flag margin as inappropriate at scientific advice
Fix: M2 <= 0.5 × MCID; M2 <= 0.5 × historical M1 lower bound (FDA double discount)

Constancy assumption violated

Trigger: NI trial with new standard of care differing from historical
Mechanism: M1 derived from historical data no longer reflects current control efficacy
Symptom: Active control performs worse than expected; "successful" NI may mask true inferiority
Fix: Synthesis approach with formal sensitivity to constancy; consider three-arm trial with small placebo

MCID confused with δ

Trigger: Sponsor sets δ = MCID in power calculation
Mechanism: Trial powered to detect smallest meaningful effect, with no margin for noise/dropout
Symptom: Effect estimate near MCID with wide CI straddling zero
Fix: δ >= 1.5 × MCID; document rationale in SAP

Promising-zone "stealth alpha inflation"

Trigger: Mehta-Pocock SSR applied without sufficient simulation
Mechanism: a naive unblinded increase analysed with the conventional (unweighted) statistic inflates Type-I error (Cui-Hung-Wang 1999); the pre-specified weighted CHW statistic preserves alpha
Symptom: Type-I inflation when the CHW weighting is dropped; separately, Jennison-Turnbull 2015 show the promising zone is inefficient -- the largest power gains per added patient lie outside it
Fix: Pre-specify increase rule transparently; report simulation operating characteristics

Unblinded SSR -- DMC firewall failure

Trigger: Interim effect estimate leaked beyond IDMC
Mechanism: Sponsor inference from sample-size increase decision reveals direction of interim effect
Symptom: Regulator audit reveals unblinding
Fix: Strict firewall; only "increase / no increase" communicated to sponsor; document SOP

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback

References

Cui L, Hung HMJ, Wang SJ. 1999. Modification of sample size in group sequential clinical trials. Biometrics 55:853-857.
D'Agostino RB, Chase W, Belanger A. 1988. The appropriateness of some common procedures for testing the equality of two independent binomial populations. Am Stat 42:198-202.
FDA. 2016. Non-Inferiority Clinical Trials to Establish Effectiveness. Final Guidance.
FDA. 2022. Multiple Endpoints in Clinical Trials. Final Guidance.
Fleiss JL. 1981. Statistical Methods for Rates and Proportions (2nd ed). Wiley.
Friede T, Kieser M. 2006. Sample size recalculation in internal pilot study designs. Biom J 48:537-555.
Hung HMJ, Wang SJ, O'Neill RT. 2005. A regulatory perspective on choice of margin and statistical inference issue in non-inferiority trials. Biom J 47:28-36.
Jaeschke R, Singer J, Guyatt GH. 1989. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials 10:407-415.
Lakatos E. 1988. Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 44:229-241.
Lin RS, Lin J, Roychoudhury S, Anderson KM, Hu T, Huang B, Leon LF, Liao JJZ, Liu R, Luo X, Mukhopadhyay P, Qin R, Tatsuoka K, Wang X, Wang Y, Zhu J, Chen TT, Iacona R. 2020. Alternative analysis methods for time to event endpoints under nonproportional hazards: a comparative analysis. Stat Biopharm Res.
Mehta CR, Pocock SJ. 2011. Adaptive increase in sample size when interim results are promising. Stat Med 30:3267-3284.
Norman GR, Sloan JA, Wyrwich KW. 2003. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 41:582-592.
Schoenfeld DA. 1981. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68:316-319.
Schuirmann DJ. 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm 15:657-680.
Senn S. 2013. Seven myths of randomisation in clinical trials. Stat Med 32:1439-1450.
Snapinn SM. 2000. Noninferiority trials. Curr Control Trials Cardiovasc Med 1:19-21.
Temple R, Ellenberg SS. 2000. Placebo-controlled trials and active-control trials in the evaluation of new treatments, part 1: ethical and scientific issues. Ann Intern Med 133:455-463.
Ellenberg SS, Temple R. 2000. Placebo-controlled trials and active-control trials in the evaluation of new treatments, part 2: practical issues and specific cases. Ann Intern Med 133:464-470.
Wyrwich KW, Tierney WM, Wolinsky FD. 1999. Further evidence supporting an SEM-based criterion for the identification of meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 52:861-873.

Related Skills

clinical-biostatistics/survival-analysis - TTE-specific sample size (Schoenfeld, Lakatos, simulation)
clinical-biostatistics/effect-measures - δ on OR/RR/RD scales
clinical-biostatistics/categorical-tests - Binary endpoint test selection
clinical-biostatistics/multiplicity-graphical - Power adjustment for co-primary endpoints
clinical-biostatistics/adaptive-designs - SSR, promising zone, group-sequential
clinical-biostatistics/bayesian-trials - Bayesian SS via predictive probability of success
clinical-biostatistics/trial-reporting - CONSORT 2025 item 16a (sample size justification)
experimental-design/sample-size - General sample-size methods
experimental-design/power-analysis - General power methods

Adoption

GPTomics/bio-clinical-biostatistics-power-sample-size

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Power and Sample Size for Clinical Trials

The Foundational Distinction -- δ vs MCID

Algorithmic Taxonomy

Decision Tree by Scenario

Continuous Outcomes -- Two-Sample t-Test

Binary Outcomes -- The Continuity Correction Debate

Survival (Time-to-Event)

Schoenfeld 1981 -- the canonical formula

Lakatos 1988 -- complex survival

Under non-PH (immuno-oncology with delayed effect)

Non-Inferiority Designs -- The FDA 2016 Framework

Equivalence Designs and Bioequivalence

Crossover Designs -- Bioequivalence and Repeated-Measures

Bioequivalence (FDA 1992 / EMA 2010 framework)

Carryover assessment (Grizzle 1965)

Period effect

Decision tree -- when crossover is appropriate

MCID -- Anchor-Based vs Distribution-Based

Multiple Endpoints -- See multiplicity-graphical

Reconciliation: When Methods Disagree

Per-Method Failure Modes

Schoenfeld under non-PH

Continuity correction debate

NI margin too liberal

Constancy assumption violated

MCID confused with δ

Promising-zone "stealth alpha inflation"

Unblinded SSR -- DMC firewall failure

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-clinical-biostatistics-power-sample-size

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Power and Sample Size for Clinical Trials

The Foundational Distinction -- δ vs MCID

Algorithmic Taxonomy

Decision Tree by Scenario

Continuous Outcomes -- Two-Sample t-Test

Binary Outcomes -- The Continuity Correction Debate

Survival (Time-to-Event)

Schoenfeld 1981 -- the canonical formula

Lakatos 1988 -- complex survival

Under non-PH (immuno-oncology with delayed effect)

Non-Inferiority Designs -- The FDA 2016 Framework

Equivalence Designs and Bioequivalence

Crossover Designs -- Bioequivalence and Repeated-Measures

Bioequivalence (FDA 1992 / EMA 2010 framework)

Carryover assessment (Grizzle 1965)

Period effect

Decision tree -- when crossover is appropriate

MCID -- Anchor-Based vs Distribution-Based

Multiple Endpoints -- See multiplicity-graphical

Reconciliation: When Methods Disagree

Per-Method Failure Modes

Schoenfeld under non-PH

Continuity correction debate

NI margin too liberal

Constancy assumption violated

MCID confused with δ

Promising-zone "stealth alpha inflation"

Unblinded SSR -- DMC firewall failure

Quantitative Thresholds

Common Errors

Anticipated Reviewer Pushback