Version Compatibility

Reference examples tested with: lifelines 0.27+, scikit-survival 0.21+, statsmodels 0.14+, pandas 2.1+, numpy 1.26+. R packages cited (still the SOTA for survival): survival 3.8+, survRM2, cmprsk, riskRegression, mstate, flexsurv, icenReg, rpsftm.

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Time-to-Event Analysis for Clinical Trials

"Analyze time-to-event endpoint" -> Estimate a hazard, survival probability, cumulative incidence, or restricted mean time using a method calibrated to (a) whether proportional hazards holds, (b) whether competing events exist, (c) whether censoring is informative, and (d) which estimand the trial targets under ICH E9(R1).

The Single Most Important Modern Insight -- PH Almost Never Holds

In modern oncology, with checkpoint inhibitors, targeted therapies, crossover, and depletion of high-risk subjects over follow-up, proportional hazards (PH) violations are the rule, not the exception. The Cross-Pharma NPH Working Group (Lin et al 2020 Stat Biopharm Res) documented systematic PH violations across phase III oncology trials, particularly delayed-effect patterns from checkpoint inhibitors; Magirr-Burman 2021 (Stat Biopharm Res 15(2):295) is the published comment on that group's MaxCombo guidance.

Under PH violation, the Cox estimate is a weighted time-average of the log-hazard ratio, with weights that depend on the censoring pattern and follow-up (Xu-O'Quigley 2000), so it may or may not be the estimand of interest. RMST (Royston-Parmar 2013) provides a clinically interpretable, hazard-free alternative.

Algorithmic Taxonomy

| Method | Estimand | Inference | Strength | Fails when |
|--------|----------|-----------|----------|------------|
| Log-rank (unstratified) | Test of S_A(t) = S_B(t) for all t | Permutation / asymptotic chi-square | Standard; controls Type-I error under the null; most powerful under PH | Underpowered under non-PH; weights all events equally |
| Stratified log-rank | Same null within strata, pooled | Asymptotic | Preserves stratification factor from randomisation | Stratification factor must be pre-specified |
| Weighted log-rank G(rho, gamma) | Direction-specific test under non-PH | Asymptotic | High power for delayed/early/middle effects | Weight choice must match true effect time profile; chasing weight = p-hacking |
| Cox PH | Conditional log-HR | Wald, LR, score | Standard; semi-parametric; covariate adjustment | PH violation makes HR a misleading summary; check via cox.zph |
| Stratified Cox | Same; baseline hazards differ by stratum | Wald | Handles non-PH by stratification | Loses inference on stratification variable; assumes a common treatment effect across strata unless an interaction is added |
| Time-varying Cox (tt()) | Time-dependent log-HR | Wald | Quantifies non-PH explicitly | Interpretability: no single HR to report; choose g(t) carefully |
| Flexible parametric (Royston-Parmar) | Time-varying log-HR via splines | Wald | Smooth S(t), HR(t); supports extrapolation | Spline choice affects results; software: R rstpm2 / flexsurv (stpm2/stpm3 are the Stata commands) |
| RMST | Difference in mean survival truncated at tau | Wald with delta or pseudo-obs regression | Hazard-free; clinically interpretable in time units | tau choice; min follow-up across arms constrains tau |
| MaxCombo | Maximum over weighted log-rank family | Asymptotic multivariate normal | Robust to range of NPH patterns | Can reject in opposite directions on same data (Magirr-Burman 2021 critique) |
| Fine-Gray subdistribution HR | Conditional subdistribution HR | Wald | Direct CIF modeling | Andersen-Keiding 2012 critique: violates causal hazard semantics |
| Cause-specific Cox | Conditional cause-specific HR | Wald | Causally interpretable | Predicts hazards, not CIFs; CIF prediction needs cause-specific models for every cause |
| Multi-state Cox (mstate) | Transition-specific HRs | Wald | Subsumes competing risks; handles relapse/remission | More complex; more parameters to estimate |
| Andersen-Gill (recurrent) | Rate ratio | Robust (cluster) Wald | Most efficient under exchangeability | Assumes exchangeable events |
| PWP (recurrent) | Conditional event-order HR | Stratified Wald | Handles event-order qualitative heterogeneity | More strata = more parameters; smaller per-stratum n |
| Interval-censored Cox (NPMLE) | Cumulative hazard | Likelihood ratio | Correct for periodic-assessment data | Slower; software in R icenReg |

Postdoc reading list:

Royston-Parmar 2013 BMC Med Res Methodol 13:152 (RMST as primary)
Uno et al 2014 JCO 32:2380 (RMST in oncology)
Andersen-Keiding 2012 Stat Med 31:1074 (Fine-Gray semantic critique)
Putter-Schumacher-van Houwelingen 2020 Biom J 62:790 (Fine-Gray revisited; reduction factor)
Putter-Fiocco-Geskus 2007 Stat Med 26:2389 (competing risks tutorial)
Magirr-Burman 2021 Stat Biopharm Res 15(2):295 (MaxCombo critique)
Grambsch-Therneau 1994 Biometrika 81:515 (scaled Schoenfeld residuals)
Buyse-Molenberghs 1998 Biometrics 54:1014 (PFS-OS surrogacy framework)
Sun et al 2021 Pharm Stat 20:793 (estimands in oncology; ICH E9(R1) ICE strategies)
Sun 2006 The Statistical Analysis of Interval-Censored Failure Time Data (Springer)

Decision Tree by Scenario

| Scenario | Recommended approach | Why |
|----------|---------------------|-----|
| OS, drug expected to extend survival uniformly, PH plausible | Stratified log-rank + Cox HR with cox.zph diagnostic | Standard regulatory approach; pre-specify stratification factors from randomisation |
| PFS in immuno-oncology with expected delayed separation | MaxCombo with pre-specified G(0,0), G(0,1), G(1,0), G(1,1); RMST as sensitivity | Robust to NPH; explicit direction check (Magirr-Burman 2021) |
| OS with crossover to active arm | Treatment policy estimand (ITT) primary; RPSFT/IPCW as sensitivity for hypothetical | ICH E9(R1); FDA/EMA accept both, ITT is primary |
| Time-to-event with assessment-schedule artifact (periodic scans) | Interval-censored regression (R icenReg::ic_sp for semi-parametric Cox; ic_par for parametric) | Standard right-censoring at midpoint is biased |
| DOR (duration of response) | KM among responders; censor at next-therapy/death/dropout per pre-specified estimand | Weber 2023 Pharm Stat; responder-conditioned = doubly post-randomisation |
| Competing risk: drug efficacy on event A in context of high event-B mortality | Cause-specific Cox for A AND for B; report both; CIF via Aalen-Johansen | Andersen-Keiding 2012; Fine-Gray SHR is not causal |
| Prediction of CIF for clinical decision | Fine-Gray for CIF estimate; cause-specific Cox for etiology | Putter-Fiocco-Geskus 2007 split |
| Multi-state model (alive -> relapse -> death) | mstate framework with transition-specific Cox models | Subsumes competing risks |
| Recurrent events (exacerbations, hospitalisations) | Andersen-Gill with robust SE; cite event-order assumption | Most efficient; falls back to PWP if order matters |
| Single-arm OS extrapolation for HTA | Flexible parametric (Royston-Parmar) + external information | Smooth tail; supports extrapolation beyond trial follow-up |
| Non-PH with crossing hazards | RMST with pre-specified tau; OR multi-state model | HR loses meaning under crossing |

Cox PH Diagnostics -- The Therneau-Grambsch Test (and Its Pitfalls)

Goal: Detect violations of the proportional hazards assumption that would invalidate the Cox HR as a meaningful summary statistic.

Approach: Compute scaled Schoenfeld residuals (Grambsch-Therneau 1994); regress against a time-transform g(t) under H0 of zero slope; supplement the asymptotic p-value with a graphical residual plot since the test is sensitive to g(t) choice and sample-size dependent.

from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

cph = CoxPHFitter()
cph.fit(df, duration_col='time', event_col='event', formula='treatment + age + baseline_score')
cph.print_summary()

# Schoenfeld residuals PH test (lifelines)
results = proportional_hazard_test(cph, df, time_transform='rank')
print(results.summary)

# In R: cox.zph(coxph_fit) runs the analogous test; its default time transform is 'km'

The g(t) choice trap (Park-Hendry 2015 American Journal of Political Science): the test power depends critically on g(t). KM transform, identity, log, and rank give materially different p-values. With n > 5000 even trivial deviations reject; with n < 100 the test misses meaningful violations.

Critical interpretation rule: a global p > 0.05 does NOT mean PH holds — it means the null cannot be rejected. The graphical diagnostic is more informative — a flat smoothed line is the target. Use cox.zph as a failure detector, not a PH validator.

# Plot scaled Schoenfeld residuals (lifelines)
cph.check_assumptions(df, p_value_threshold=0.05, show_plots=True)

Fixes when PH violated

Stratify on the violating covariate (loses inference on it)
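Time-dependent coefficients: coxph(Surv(t, d) ~ x + tt(x), tt=function(x,t,...) x*log(t)) in R; in lifelines, split follow-up into episodes with lifelines.utils.to_episodic_format, add a treatment x g(t) column, and fit CoxTimeVaryingFitter (a formula term such as 'treatment * time' in CoxPHFitter does not fit a time-varying coefficient)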
Royston-Parmar flexible parametric (R flexsurv::flexsurvspline or rstpm2::stpm2)
RMST as primary; HR as secondary

Restricted Mean Survival Time (RMST) -- The Modern Alternative

Royston-Parmar 2013 + Uno 2014 case: RMST(tau) = E[min(T, tau)] = integral from 0 to tau of S(t) dt = area under KM curve up to tau. The difference RMST_A(tau) - RMST_B(tau) is a time gained in months -- clinically interpretable, requires no PH assumption, always estimable up to the minimum of the largest follow-up across arms.

# Python: rmst via lifelines or manual
from lifelines.utils import restricted_mean_survival_time
from lifelines import KaplanMeierFitter

kmf_A = KaplanMeierFitter().fit(df[df['arm']=='A']['time'], event_observed=df[df['arm']=='A']['event'])
kmf_B = KaplanMeierFitter().fit(df[df['arm']=='B']['time'], event_observed=df[df['arm']=='B']['event'])
tau = 36  # months; pre-specified
rmst_A = restricted_mean_survival_time(kmf_A, t=tau)
rmst_B = restricted_mean_survival_time(kmf_B, t=tau)
print(f'RMST diff: {rmst_A - rmst_B:.2f} months')

# R: survRM2::rmst2(time, status, arm, tau=36) is the standard
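The area-under-the-KM-curve definition can be checked by hand. The following is a minimal sketch in plain numpy, not the lifelines or survRM2 implementation; the function names `km_curve` and `rmst` and the toy data are illustrative assumptions.

```python
import numpy as np

def km_curve(time, event):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each one."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv, dtype=float)

def rmst(time, event, tau):
    """Area under the KM step function from 0 to tau."""
    t, s = km_curve(time, event)
    keep = t < tau
    knots = np.concatenate(([0.0], t[keep], [tau]))   # interval boundaries
    heights = np.concatenate(([1.0], s[keep]))        # S(t) on each interval
    return float(np.sum(heights * np.diff(knots)))

# Hypothetical toy data (months); event = 1 means the event was observed
time_a, event_a = [6, 12, 18, 24, 30, 36, 40], [1, 0, 1, 0, 1, 0, 0]
time_b, event_b = [3, 6, 9, 15, 20, 28, 40], [1, 1, 1, 0, 1, 1, 0]
tau = 36  # pre-specified truncation time
diff = rmst(time_a, event_a, tau) - rmst(time_b, event_b, tau)
print(f"RMST difference at tau={tau}: {diff:.2f} months")
```

With no censoring the result reduces to the sample mean of min(T, tau), which is a convenient sanity check before trusting any RMST routine on real data.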

The tau (truncation time) choice

Statistical constraint: tau <= min(largest follow-up time in each arm) to avoid extrapolation (Tian et al 2020 Biometrics gives rigorous treatment)
Clinical constraint: tau should reflect a clinically meaningful horizon (5-year OS in adjuvant; 24-month PFS in metastatic)
Data-dependent tau inflates Type-I error — MUST be pre-specified in SAP. Post-hoc tau tuning to chase significance is p-hacking.

Pseudo-observation regression (Andersen-Hansen-Klein 2004)

For each subject i, compute the jackknife pseudo-value θ_i(tau) = n·RMST(tau) - (n-1)·RMST_{-i}(tau), where RMST_{-i} is the estimate with subject i left out. Regress the pseudo-values on baseline covariates via GEE: this gives RMST regression with covariate adjustment and no PH assumption.

R implementation: pseudo::pseudomean() + GEE via geepack::geeglm().
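The jackknife step can be sketched in a few lines of numpy. This is an illustration of the pseudo-value formula only, under the assumption of a simple KM-based RMST; the helper names and toy data are hypothetical, and the least-squares fit at the end gives point estimates only (a GEE with sandwich variance, as in geepack, is still needed for valid standard errors).

```python
import numpy as np

def rmst(time, event, tau):
    """Area under the Kaplan-Meier curve on [0, tau]."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    area, s, prev = 0.0, 1.0, 0.0
    for t in np.unique(time[(event == 1) & (time < tau)]):
        area += s * (t - prev)
        s *= 1.0 - np.sum((time == t) & (event == 1)) / np.sum(time >= t)
        prev = t
    return area + s * (tau - prev)

def rmst_pseudo_values(time, event, tau):
    """Jackknife pseudo-values: n * theta_hat - (n - 1) * theta_hat_minus_i."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    n = len(time)
    full = rmst(time, event, tau)
    idx = np.arange(n)
    loo = np.array([rmst(time[idx != i], event[idx != i], tau) for i in range(n)])
    return n * full - (n - 1) * loo

# Hypothetical toy data: arm 1 = experimental, arm 0 = control
time  = np.array([4, 9, 14, 20, 25, 30, 33, 36, 36, 36], float)
event = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])
arm   = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
pseudo = rmst_pseudo_values(time, event, tau=36)

# Identity-link regression of pseudo-values on arm (point estimates only)
X = np.column_stack([np.ones_like(arm, dtype=float), arm])
beta, *_ = np.linalg.lstsq(X, pseudo, rcond=None)
print(f"Estimated RMST difference (arm 1 - arm 0): {beta[1]:.2f}")
```

Without censoring each pseudo-value equals min(T_i, tau) exactly, which is why the regression coefficients read directly as differences in restricted mean time.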

Lambert critique (postdoc reading): Paul Lambert (stpm3 author) is on record that RMST is "not unambiguously simpler than HR for clinicians." HR has 50-year pedagogy lead. RMST is cumulative (not instantaneous) and cannot detect crossing hazards beyond chosen tau. The argument for RMST is hazard-free interpretability under non-PH; the argument against is communication and tau sensitivity.

Competing Risks -- The Andersen-Keiding Framework

Fine-Gray subdistribution hazard

# Python: scikit-survival or lifelines (limited support); R is the SOTA
# R: library(cmprsk); crr(time, fstatus, cov, failcode=1, cencode=0)
# R: library(riskRegression); FGR(formula, data=df, cause=1)

Fine-Gray 1999 introduced subdistribution hazard lambda^FG(t) = -d/dt log[1 - F_1(t)] where F_1 is CIF for cause 1. The proportional subdistribution hazards model places covariates on this hazard. Subjects who experience competing events remain in the risk set with IPCW weighting — mathematically convenient but conceptually weird.

The Andersen-Keiding 2012 critique (Stat Med 31:1074) rests on three principles for interpretable functionals: do not condition on the future, do not regard individuals as at risk after they have died, and stick to this world. For the Fine-Gray subdistribution hazard the practical consequences are:

A hazard should be a real instantaneous risk among those truly at risk -> FG keeps subjects who already had the competing event "at risk"
Covariate effects should be interpretable as effects on the event itself -> FG coefficients are confounded with the competing-event hazard
A hazard should support landmarking (conditioning on survival to s) -> FG does not

The competing-risk confounding trap (Putter-Schumacher-van Houwelingen 2020 Biom J 62:790; the reduction factor decomposition makes it explicit): any covariate that increases the cause-specific hazard of competing event A will decrease the subdistribution hazard for event B simply because A removes subjects from the population at risk of B. A Fine-Gray "protective" effect on B may be an iatrogenic killer via A.

Practical rule (Putter-Fiocco-Geskus 2007)

For prediction of CIF (cumulative incidence) -> Fine-Gray is fine; report CIF curves and SHR
For etiology / causal effect on the event -> use cause-specific Cox (treat competing events as censoring); report cause-specific HRs for BOTH event of interest AND competing events
For multi-state semantics (alive -> relapse -> death) -> use multi-state framework with transition-specific Cox models

The CIF is always estimable

The Aalen-Johansen estimator (multi-state generalisation of KM) gives the CIF non-parametrically. The naive alternative, 1 - KM with competing events treated as non-informative censoring, is biased upward as an estimate of the CIF: it estimates the event probability in a hypothetical world where the competing event cannot occur.

# R: library(mstate); msf <- msfit(coxph_fit, newdata, trans=tmat); probtrans(msf, predt=0)
# R: library(survival); survfit(Surv(time, event_factor) ~ 1) with multi-state Surv
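The gap between the Aalen-Johansen CIF and the naive 1 - KM can be seen on a toy dataset. This is a minimal numpy sketch written for illustration, not a replacement for mstate or survival::survfit; the function name `cif_and_naive` and the data are assumptions.

```python
import numpy as np

def cif_and_naive(time, cause, cause_of_interest=1):
    """cause: 0 = censored, 1, 2, ... = event type.
    Returns (event times, Aalen-Johansen CIF, 1 - KM with other causes censored)."""
    time, cause = np.asarray(time, float), np.asarray(cause, int)
    times = np.unique(time[cause > 0])
    surv_all, km_naive, cif = 1.0, 1.0, 0.0
    out_cif, out_naive = [], []
    for t in times:
        n = np.sum(time >= t)                                  # at risk just before t
        d_any = np.sum((time == t) & (cause > 0))              # events of any cause
        d_k = np.sum((time == t) & (cause == cause_of_interest))
        cif += surv_all * d_k / n       # S(t-) times cause-specific hazard increment
        surv_all *= 1.0 - d_any / n     # overall event-free survival
        km_naive *= 1.0 - d_k / n       # KM treating competing events as censoring
        out_cif.append(cif)
        out_naive.append(1.0 - km_naive)
    return times, np.array(out_cif), np.array(out_naive)

# Hypothetical data: cause 1 = event of interest, cause 2 = competing death
time  = [1, 2, 3, 4]
cause = [2, 1, 2, 1]
t, cif, naive = cif_and_naive(time, cause)
for ti, c, nv in zip(t, cif, naive):
    print(f"t={ti:.0f}  AJ CIF={c:.3f}  1-KM={nv:.3f}")
```

In this example two of four subjects have the event of interest, so the CIF ends at 0.5, while the naive 1 - KM climbs to 1.0 because every competing death is treated as if the subject could still have had the event later.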

Log-Rank Variants and MaxCombo

from lifelines.statistics import multivariate_logrank_test

# Weighted (not stratified) log-rank across arms; note the plural `weightings=`
results = multivariate_logrank_test(
    df['time'], df['arm'], df['event'], weightings='peto'
)
results.print_summary()
# weightings options (lifelines 0.27+): None, 'wilcoxon', 'tarone-ware',
# 'peto', 'fleming-harrington' (the last needs p= and q= keyword arguments)
# Stratified log-rank is not shown here; in R: survdiff(Surv(time, event) ~ arm + strata(s))

| Test | Citation | When to use |
|------|----------|-------------|
| Standard log-rank (G(0,0)) | Mantel 1966 | PH holds |
| Wilcoxon (Gehan-Breslow) | Gehan 1965; Breslow 1970 | Down-weights late events; early-effect detection |
| Tarone-Ware | Tarone-Ware 1977 | Compromise |
| Peto-Peto | Peto-Peto 1972, Prentice 1978 | Robust to ties and censoring distribution differences between arms |
| Fleming-Harrington G(rho, gamma) | Fleming-Harrington 1981 | Direction-specific: G(0,1) for late-emphasis (delayed effects), G(1,0) for early |
| MaxCombo | Karrison 2016 | Take maximum over family; control multiplicity via joint MVN |

The MaxCombo controversy

Magirr-Burman 2021 Stat Biopharm Res 15(2):295: MaxCombo can reject the null in opposite directions on the same dataset. Formally, it rejects the strong null H_0: S_A(t) = S_B(t) for all t, but the rejection direction is determined by the dominant weight, which can flip across portions of the curve. KEYNOTE-042 demonstration: MaxCombo simultaneously favouring pembrolizumab AND chemo depending on weight choice.

Cross-Pharma NPH Working Group recommendation: MaxCombo with directionality constraints — require positive z-statistic at the late-emphasis weight before declaring superiority; report the dominant weight and its direction.
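The components of a MaxCombo statistic and the directional check can be sketched as follows. This is an illustrative numpy implementation of the Fleming-Harrington G(rho, gamma) Z statistic for two arms, with hypothetical data and a sign convention chosen here (Z > 0 means fewer events than expected in arm 1). It computes the component Z values and their maximum only; a valid MaxCombo p-value additionally needs the joint multivariate-normal correlation of the components, which is not computed.

```python
import numpy as np

def fh_logrank_z(time, event, arm, rho=0.0, gamma=0.0):
    """Fleming-Harrington G(rho, gamma) weighted log-rank Z, arm coded 0/1."""
    time, event, arm = (np.asarray(a) for a in (time, event, arm))
    s_left, u, v = 1.0, 0.0, 0.0              # pooled KM just before t; score; variance
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (arm == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (arm == 1)).sum()
        w = s_left ** rho * (1.0 - s_left) ** gamma
        u += w * (n1 * d / n - d1)            # expected minus observed events in arm 1
        if n > 1:                             # hypergeometric variance term
            v += w ** 2 * n1 * (n - n1) * d * (n - d) / (n ** 2 * (n - 1))
        s_left *= 1.0 - d / n
    return u / np.sqrt(v)

# Hypothetical delayed-effect data: arms similar early, arm 1 better late
time  = [2, 4, 6, 9, 12, 14, 3, 5, 7, 16, 20, 24]
event = [1, 1, 1, 1, 1,  1,  1, 1, 1, 1,  0,  0]
arm   = [0, 0, 0, 0, 0,  0,  1, 1, 1, 1,  1,  1]

weights = [(0, 0), (0, 1), (1, 0), (1, 1)]
z = {w: fh_logrank_z(time, event, arm, *w) for w in weights}
z_max = max(z.values())
late_ok = z[(0, 1)] > 0   # directional constraint at the late-emphasis weight
print({k: round(float(val), 2) for k, val in z.items()}, "late-emphasis positive:", late_ok)
```

Reporting every component Z alongside the maximum makes the dominant weight and its direction visible, which is the information the directional constraint relies on.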

Interval Censoring -- When Standard Right-Censoring Is Wrong

PFS is really interval-censored: progression is known only to lie between two consecutive RECIST scans. The convention is to impute the event time as the interval midpoint or the first-PD-scan date and then apply right-censored methods. This is a long-standing methodological compromise, defensible only when scan intervals are short and balanced across arms.

# R: library(icenReg)
# fit_ic <- ic_par(cbind(left, right) ~ treatment + age, data=df, dist='weibull', model='ph')
# ic_sp() for semi-parametric; ic_np() for NPMLE

Sun 2006 The Statistical Analysis of Interval-Censored Failure Time Data (Springer) is the canonical reference. NPMLE via Turnbull's algorithm; Cox-like regression via Finkelstein 1986 / Pan 1999 EM algorithms.

When to switch from right-censored midpoint to interval-censored: scan intervals > 4 weeks; scan timing differs between arms (open-label trials with potential differential ascertainment); regulatory submission where 2-3% effect-size shift matters.
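Interval-censored routines need a (left, right) pair per subject rather than a single imputed date. A minimal sketch of deriving that pair from a scan schedule is shown below; the function name, the week units, and the assumption that time 0 counts as progression-free are illustrative, not part of any CDISC or icenReg convention.

```python
import math

def progression_interval(scan_times, pd_flags):
    """Return (left, right): last scan without PD and first scan with PD.
    right = math.inf means no PD was observed (right-censored at `left`)."""
    left = 0.0  # assumption: randomisation (time 0) counts as progression-free
    for t, has_pd in sorted(zip(scan_times, pd_flags)):
        if has_pd:
            return left, float(t)
        left = float(t)
    return left, math.inf

# Hypothetical subjects, scan times in weeks
subjects = {
    "S01": ([8, 16, 24], [False, False, True]),   # PD first seen at week 24
    "S02": ([8, 16], [False, False]),             # never progressed on study
    "S03": ([12], [True]),                        # PD at the first scan
}
for sid, (scans, flags) in subjects.items():
    left, right = progression_interval(scans, flags)
    midpoint = None if math.isinf(right) else (left + right) / 2
    print(sid, (left, right), "midpoint imputation:", midpoint)
```

The (left, right) columns map directly onto the cbind(left, right) response used by icenReg, while the midpoint column is what the right-censored convention would have analysed instead.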

Recurrent Events -- AG vs PWP vs WLW

| Model | Risk set | Baseline hazard | Best when |
|-------|----------|-----------------|-----------|
| Andersen-Gill 1982 | Total time; subject at risk continuously between events | Common (counting process) | Events exchangeable; rate model; no event-order effect |
| PWP gap-time (Prentice-Williams-Peterson 1981) | Stratified by event number; subject enters stratum k after event k-1 | Stratum-specific | Event-order matters; later events qualitatively different |
| WLW (Wei-Lin-Weissfeld 1989) | Marginal -- separate Cox per event order | Per-event-order | Multiple types of events; uses sandwich variance |

# Python: lifelines has no dedicated AG routine; CoxTimeVaryingFitter accepts (start, stop, event) rows
# R (AG): coxph(Surv(start, stop, event) ~ trt + cluster(id), data=long_df)
# R (PWP gap-time): coxph(Surv(stop - start, event) ~ trt + strata(enum) + cluster(id), data=long_df)

Box-Steffensmeier critique: WLW is often misused — fitting separate Cox to "time to 2nd event" treats it as a first-event problem rather than conditional on prior history, inflating effect estimates. PWP-gap-time is the cleanest conditional model. AG most efficient when exchangeability holds.

Modern advice (Rogers et al 2014; Cook-Lawless 2007 book): AG with robust variance as default; PWP if events qualitatively heterogeneous; avoid WLW unless events are truly distinct types.
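Both AG and PWP expect the data in counting-process (start, stop, event) form with one row per at-risk interval. A minimal sketch of that reshaping is given below; the function name and the column names `id`, `start`, `stop`, `event`, `enum` follow the R snippets above but are otherwise assumptions about how the long dataset is laid out.

```python
def to_counting_process(subject_id, event_times, end_of_followup):
    """Expand one subject's recurrent events into (id, start, stop, event, enum) rows."""
    rows, start = [], 0.0
    for k, t in enumerate(sorted(event_times), start=1):
        rows.append({"id": subject_id, "start": start, "stop": float(t),
                     "event": 1, "enum": k})
        start = float(t)
    if end_of_followup > start:  # final event-free interval up to censoring
        rows.append({"id": subject_id, "start": start, "stop": float(end_of_followup),
                     "event": 0, "enum": len(event_times) + 1})
    return rows

# Hypothetical subject with exacerbations at days 5 and 12, followed to day 20
long_rows = to_counting_process("S01", [5, 12], 20)
for row in long_rows:
    gap = row["stop"] - row["start"]   # gap time, the PWP gap-time response
    print(row, "gap:", gap)
```

AG uses (start, stop] directly with a common baseline, whereas PWP gap-time resets the clock via stop - start and stratifies on `enum`, so the same long table serves both models.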

Oncology PFS/OS Estimands -- The 2024 Reality

Per ICH E9(R1), the same PFS dataset yields different HRs depending on censoring rules. Each rule corresponds to a different intercurrent-event strategy:

| Censoring rule | ICE strategy | Estimand |
|----------------|-------------|----------|
| Censor at last assessment before missing visits | Hypothetical (had visit not been missed) | What would PFS be if visits never missed? |
| Censor at start of new anticancer therapy | Hypothetical (no subsequent therapy) | What would PFS be without rescue? |
| Count subsequent therapy as event | Composite (subsequent therapy = treatment failure) | Treatment-failure-free survival |
| No censoring at subsequent therapy (progression or death counted whenever observed) | Treatment policy | Real-world PFS with policy of allowing rescue |

The 2024 European Journal of Cancer demonstration (PMID 38547775): two sets of censoring rules — FDA-favoured vs trialist-favoured — applied to the same PFS data shifted median PFS from 32 to 43 months in the experimental arm with no change in control. This is the estimand changing, not analytic artefact.

Fleming 2025 argument: handle subsequent therapy by treatment-policy -- continue follow-up rather than censor at the switch -- keeping progression + death as the composite endpoint to preserve ITT. Controversial because not censoring at subsequent therapy blurs interpretation as "tumor growth control."

Informative censoring -- detection and handling

The mechanism: standard right-censored Cox / KM assumes censoring is non-informative (the censoring process is unrelated to the underlying failure time after conditioning on observed covariates). When this fails, KM is biased upward in the arm with informative censoring -- patients who drop out due to lack of efficacy or toxicity are precisely the ones who would have failed early.

TROPiCS-02 informative censoring (Li et al 2023 JCO 41:1629): "evaporative cooling" of progression events. Patients on the toxic arm discontinue and are censored BEFORE the next protocol-mandated scan that would have captured progression; KM biases PFS upward in the toxic arm. Templeton 2020 Nat Rev Clin Oncol and Campigotto-Weller 2014 JCO are foundational citations.

Detection workflow:

Tabulate censoring reasons by arm from CDISC DS (Disposition) or the ADaM ADTTE CNSR values. Positive CNSR codes are sponsor-defined, so confirm the mapping in define.xml or CNSDTDSC; the coding below is illustrative:
- CNSR=1 (lost to follow-up): symmetric across arms = OK
- CNSR=2 (withdrew consent): if asymmetric, investigate why
- CNSR=3 (admin EoS): symmetric by definition
- CNSR=4 (subsequent therapy initiated): differential is the canonical informative pattern
- CNSR=5+ (toxicity discontinuation): differential is highly informative
Compare KM curves for the censoring distribution by arm (reverse KM, with censoring as the event). If time-to-censoring differs by arm in the same direction as the outcome, suspect informative censoring.
Fit a Cox model with censoring as the event: if treatment or prognostic covariates are significantly associated with the censoring hazard, censoring is differential and plausibly informative. Non-informativeness itself cannot be proven from the observed data.
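The first detection step, tabulating censoring reasons by arm, could look like the sketch below. The data frame is a hypothetical toy ADTTE extract; TRT01P, CNSR and CNSDTDSC are used as column names on the assumption that the dataset carries them, and the CNSR codes shown are one sponsor's illustrative mapping rather than a standard.

```python
import pandas as pd

# Hypothetical ADTTE extract: CNSR = 0 is an event, > 0 is a censoring reason code
adtte = pd.DataFrame({
    "TRT01P": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "CNSR": [0, 1, 4, 4, 0, 0, 1, 3],
    "CNSDTDSC": [None, "Lost to follow-up", "New anticancer therapy",
                 "New anticancer therapy", None, None,
                 "Lost to follow-up", "Study end"],
})

censored = adtte[adtte["CNSR"] > 0]

# Counts of each censoring reason per arm
table = pd.crosstab(censored["CNSDTDSC"], censored["TRT01P"])

# Same table as a share of all randomised subjects in each arm
share = table / adtte.groupby("TRT01P").size()

print(table)
print(share.round(2))
```

A reason that concentrates in one arm, here censoring for new anticancer therapy in arm A, is the pattern that should trigger the sensitivity analyses rather than a formal test.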

Handling strategies (per ICH E9(R1) and Sun 2021):

| Strategy | When to use | Implementation |
|----------|-------------|----------------|
| Composite endpoint (composite strategy) | ICE has clinical meaning (subsequent therapy = treatment failure) | Re-define event as "first of (progression OR subsequent therapy OR death)"; eliminates need for censoring assumption |
| IPCW with stabilised weights (Robins 1992) | Hypothetical estimand under "no informative dropout" | R ipw::ipwtm or custom: weight = inverse probability of remaining uncensored given baseline + time-varying covariates |
| Sensitivity: worst-case / best-case imputation | Bounding the true effect under informative censoring | Censored patients = events at censoring date (worst case); or event-free to the end of follow-up (best case); report range |
| Multistate model | Multiple competing causes of censoring | mstate for transition-specific Cox; handles death + dropout + subsequent therapy as separate states |
| RPSFT / structural nested failure time | Crossover-induced informative censoring | R rpsftm package; FDA / EMA acceptable for OS hypothetical estimand under crossover |

IPCW workflow (stabilised weights per Robins-Finkelstein 2000):

# Stabilised IPCW: w(t) = S_num(t) / S_denom(t)
# where S_denom predicts remaining-uncensored given baseline + time-varying covariates,
# and S_num predicts remaining-uncensored given treatment alone (numerator for stabilisation)

# Step 1: fit censoring (NOT event) hazard models
# Outcome for these models: 1 if censored, 0 if the event occurred
library(survival)
df$censored <- 1 - df$event
denom_cens <- coxph(Surv(time, censored) ~ treatment + age + baseline_severity, data=df)
num_cens   <- coxph(Surv(time, censored) ~ treatment, data=df)

# Step 2: compute survival probabilities (NOT hazards) for "remaining uncensored"
# predict(type="survival") returns S_i(t_i), each subject's probability at their own observed time;
# summary(survfit(..., newdata=df), times=df$time)$surv would return a times-by-subjects matrix instead
S_denom <- predict(denom_cens, newdata=df, type="survival")
S_num   <- predict(num_cens,   newdata=df, type="survival")

# Step 3: stabilised IPCW weight per subject at their observed time
df$ipcw <- S_num / S_denom

# Step 4: weighted Cox for the event of interest with robust SE
fit_ipcw <- coxph(Surv(time, event) ~ treatment + age + baseline_severity,
                  data=df, weights=ipcw, robust=TRUE)

Note: in practice use the ipw::ipwtm (or ipwExt) wrapper which handles the time-varying weights and stabilisation correctly; the pseudocode above shows the underlying machinery. Verify weight distribution: median weight near 1, no extreme values (>10 indicates near-violation of positivity).

Operational rule: for any TTE primary analysis where DS shows differential censoring reasons by arm, the regulatory expectation is (1) tabulate reasons in CSR, (2) report sensitivity under IPCW or composite, (3) discuss whether primary HR/RMST changes under sensitivity.

ADaM ADTTE -- The CNSR Convention Trap

CDISC ADTTE convention: CNSR = 0 for events, positive integers for censoring (the specific codes are sponsor-defined, e.g. 1 = lost to follow-up, 2 = withdrew consent, 3 = admin EoS, 4 = subsequent therapy). This is the opposite of most stat packages, which use 1 = event.

# Convert ADTTE for Python / R
import pandas as pd

adtte = pd.read_csv('ADTTE.csv')
adtte['event'] = (adtte['CNSR'] == 0).astype(int)  # 1 = event for survival packages
adtte['time'] = adtte['AVAL']  # AVAL is in days/months per AVALU

This is a perpetual bug source. See clinical-biostatistics/cdisc-data-handling for the full ADTTE specification.

Per-Method Failure Modes

Cox PH violation undetected

Trigger: Significant treatment HR reported without cox.zph diagnostic
Mechanism: Under PH violation the Cox estimate is a censoring-dependent time-average of the log-HR; the single-HR interpretation breaks
Symptom: Hazard plots show crossing; cox.zph rejects PH; KM curves diverge then converge
Fix: Report cox.zph result; switch to RMST or time-varying Cox; cite Grambsch-Therneau 1994

Fine-Gray reported as causal effect

Trigger: Fine-Gray SHR interpreted as "treatment effect on event of interest"
Mechanism: SHR confounds with competing-event hazard (Andersen-Keiding 2012)
Symptom: SHR interpretation contradicts cause-specific Cox results; reviewer confusion
Fix: Use cause-specific Cox for etiology; Fine-Gray for CIF prediction only; report both per Putter-Fiocco-Geskus 2007

1 - KM overestimates CIF under competing risks

Trigger: KM applied to event of interest treating competing events as non-informative censoring
Mechanism: KM depends only on the cause-specific hazard of the event of interest, so 1 - KM is the event probability in a hypothetical world without the competing event
Symptom: 1 - KM exceeds the Aalen-Johansen CIF estimate
Fix: Use Aalen-Johansen estimator for CIF; never KM in competing-risk setting

MaxCombo direction flipping

Trigger: MaxCombo significant; direction not pre-specified
Mechanism: MaxCombo's family can include weights that favour opposite directions
Symptom: Different reports show opposite "winner" depending on weight choice
Fix: Pre-specify directional constraints; require positive z at late-emphasis weight before claiming superiority; cite Magirr-Burman 2021

Tau chosen post-hoc to favour significance

Trigger: RMST tau adjusted after seeing data
Mechanism: Selection bias; tau-tuning is a flavour of p-hacking
Symptom: Sponsor's RMST result differs from independently re-analysed with pre-specified tau
Fix: Pre-specify tau in SAP; cite Tian 2020 Biometrics

Right-censored midpoint for interval-censored PFS

Trigger: Scan intervals long or differential between arms; standard right-censored Cox applied
Mechanism: Midpoint imputation is biased; SE is underestimated
Symptom: Replication with interval-censored Cox gives different point estimate and wider CI
Fix: R icenReg::ic_par or ic_sp; Sun 2006

CNSR convention confusion

Trigger: ADTTE CNSR=1 passed to R survival::Surv(time, event) expecting event=1
Mechanism: Censoring/event role reversed
Symptom: "Event count" matches censoring count from CSR; nonsensical HR
Fix: Always convert: event = (CNSR == 0).astype(int); cite ADaM IG v1.3

Reconciliation: When Methods Disagree

| Pattern | Likely cause | Action |
|---------|--------------|--------|
| Cox HR significant, RMST difference non-significant | Cox estimate is a time-averaged log-HR under PH violation; RMST captures cumulative effect | If PH violated, RMST is the more interpretable summary; HR may be artifact of late-event preponderance |
| MaxCombo rejects null but direction depends on weight | MaxCombo's rejection direction is determined by dominant weight; can flip across weight choices (Magirr-Burman 2021) | Pre-specify directional constraint (positive z at late-emphasis weight) before declaring superiority; do NOT post-hoc select winning weight |
| Fine-Gray SHR vs cause-specific Cox HR conflict on direction | Fine-Gray confounds with competing-event hazard; cause-specific isolates causal effect on event of interest (Andersen-Keiding 2012) | For causal claims use cause-specific Cox; Fine-Gray for CIF prediction only; report both per Putter-Fiocco-Geskus 2007 |
| Stratified log-rank p < unstratified p | Stratification removes between-stratum variance; correct standard error smaller | Stratified analysis matches randomisation; unstratified is over-conservative (SE biased upward, power loss) per Kahan-Morris 2012 |
| 1 - KM vs Aalen-Johansen CIF disagree in competing-risks setting | KM treats competing events as non-informative censoring (1 - KM overestimates the CIF) | Use Aalen-Johansen for CIF; never KM in competing-risk setting |
| Schoenfeld-formula SS insufficient at trial end (events under-collected) | Non-PH (immuno-oncology delayed effect) violates Schoenfeld assumption; under-estimates events 20-50% | Re-power using Lakatos 1988 or simulation under expected HR(t); cite Lin 2020 NPH Working Group |
| Right-censored midpoint Cox HR vs interval-censored Cox HR differ | Midpoint imputation biased when scan intervals long or asymmetric | Switch to interval-censored Cox via icenReg when scan intervals > 4 weeks or differ by arm |
| ADTTE CNSR=1 produces nonsensical results | CDISC convention reversal: CNSR=0 means event in ADaM | Always convert: event = (CNSR == 0).astype(int) before stat-package call |

Quantitative Thresholds

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| cox.zph p > 0.05 is NOT proof of PH | Grambsch-Therneau 1994; Park-Hendry 2015 | Failure detector, not validator |
| RMST tau <= min(largest follow-up per arm) | Tian 2020 Biometrics | Avoid extrapolation; tau pre-specified in SAP |
| MaxCombo with directional constraint | Magirr-Burman 2021 Stat Biopharm Res 15(2):295 | Prevents opposite-direction rejections |
| Cause-specific Cox for etiology; FG for CIF prediction | Putter-Fiocco-Geskus 2007 | Andersen-Keiding 2012 critique |
| Scan intervals > 4 weeks -> interval-censored analysis | Sun 2006 | Midpoint right-censoring is biased |
| 10 events per covariate for Cox | Peduzzi 1995 J Clin Epidemiol | Below this, bias and overfitting |
| Schoenfeld 1981 events formula assumes PH | Schoenfeld 1981 | Under non-PH, under-estimates required events by 20-50%; cite Lakatos 1988 for non-PH SS |
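For reference, the PH-based events formula in the last row is d = (z_{1-alpha/2} + z_{1-beta})^2 / (p(1-p) (log HR)^2), where p is the allocation fraction. A small standard-library sketch follows; the function name and the example inputs (HR 0.7, two-sided alpha 0.05, 90% power, 1:1 allocation) are illustrative assumptions, and the result is only a PH baseline, not a substitute for simulation under an expected non-PH pattern.

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.90, alloc=0.5):
    """Required number of events under PH (two-sided alpha, allocation fraction alloc)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * log(hr) ** 2)

d = schoenfeld_events(hr=0.7)
print(f"Events required under PH: {ceil(d)}")

# Illustrative inflation range taken from the 20-50% figure quoted in the table
print(f"If PH fails, plausible range: {ceil(d * 1.2)} to {ceil(d * 1.5)} events")
```

The inflated range is only a rough bracket derived from the figure quoted in the table; a simulation under the expected HR(t), or the Lakatos 1988 method, is what actually sizes the trial.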

Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Cox HR reported with cox.zph p < 0.001 ignored | Diagnostic skipped | Report RMST or time-varying Cox; cite Therneau-Grambsch |
| KM curve labeled "survival from event of interest" with competing risks | 1 - KM overestimates the CIF | Aalen-Johansen CIF; never KM with competing risks |
| Fine-Gray HR reported as "treatment effect" | Misinterpretation of SHR | Cite Andersen-Keiding; report cause-specific too |
| MaxCombo with no direction restriction | Direction-flipping risk | Pre-specify constraints (Magirr-Burman 2021) |
| ADTTE CNSR=1 used as "1=event" | Convention reversal | event = (CNSR == 0).astype(int) |
| PFS midpoint right-censored without interval-censored sensitivity | Scan-schedule bias | Interval-censored analysis when scan intervals long |
| Stratified randomisation factor not in Cox | Over-conservative (SE biased upward, Type-I below nominal, power loss) | Include strata via strata() or as covariate |
| Schoenfeld SS calculation in immuno-oncology | PH assumption violated | Simulate under expected hazard pattern (Lakatos 1988); use MaxCombo SS |
| tau set to longest follow-up post-hoc | RMST p-hacking | Pre-specify tau in SAP; cite Tian 2020 |
| Recurrent-event WLW with naive "time to 2nd event" | Inflated effect | AG with robust SE, or PWP gap-time |

Anticipated Reviewer Pushback

| Pushback | Response |
|----------|----------|
| "PH assumption check?" | cox.zph result reported with caveat that p>0.05 doesn't prove PH; graphical diagnostic in supplement |
| "Why RMST not HR?" | PH violated; HR is misleading time-average; RMST is hazard-free in interpretable time units (Royston-Parmar 2013) |
| "How was tau chosen?" | Pre-specified in SAP based on clinical horizon AND statistical constraint min(largest follow-up); cite Tian 2020 |
| "Fine-Gray or cause-specific Cox?" | Both reported per Putter 2007: FG for CIF prediction, cause-specific Cox for etiology; cite Andersen-Keiding 2012 |
| "MaxCombo direction validation?" | Pre-specified directional constraint; positive z at late-emphasis weight required; cite Magirr-Burman 2021 |
| "Crossover handling?" | ITT primary (treatment policy); RPSFT or IPCW as sensitivity (hypothetical); cite Robins-Tsiatis |
| "Informative censoring?" | Censoring reasons tabulated by arm; symmetric -> no concern; differential -> sensitivity under composite or worst-case |
| "ADTTE CNSR convention?" | Explicit conversion documented: ADaM CNSR=0 means event; converted to event=1 for downstream R/Python |
| "Schoenfeld SS under expected NPH?" | Simulation-based SS using expected hazard pattern (Lakatos 1988); NOT Schoenfeld formula |
| "Multi-state vs Fine-Gray?" | Multi-state when alive->relapse->death is the disease model; Fine-Gray only for CIF prediction of one cause |

References

Andersen PK, Hansen MG, Klein JP. 2004. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Anal 10:335-350.
Andersen PK, Keiding N. 2012. Interpretability and importance of functionals in competing risks and multistate models. Stat Med 31:1074-1088.
Buyse M, Molenberghs G. 1998. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 54:1014-1029.
Cox DR. 1972. Regression models and life-tables. JRSS-B 34:187-220.
Fine JP, Gray RJ. 1999. A proportional hazards model for the subdistribution of a competing risk. JASA 94:496-509.
Grambsch PM, Therneau TM. 1994. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81:515-526.
Karrison TG. 2016. Versatile tests for comparing survival curves based on weighted log-rank statistics. Stat J 16:678-690.
Lakatos E. 1988. Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 44:229-241.
Magirr D, Burman CF. 2021. The strong null hypothesis and the MaxCombo test. Stat Biopharm Res 15(2):295-296. (Earlier "Cherry-picking in survival analysis" attribution was a blog post, Oct 2022.)
Mantel N. 1966. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports 50:163-170.
Peduzzi P et al. 1995. Importance of events per independent variable in proportional hazards analysis. J Clin Epidemiol 48:1503-1510.
Putter H, Fiocco M, Geskus RB. 2007. Tutorial in biostatistics: competing risks and multi-state models. Stat Med 26:2389-2430.
Putter H, Schumacher M, van Houwelingen HC. 2020. On the relation between the cause-specific hazard and the subdistribution rate for competing risks data: the Fine-Gray model revisited. Biom J 62:790-807.
Royston P, Parmar MKB. 2013. Restricted mean survival time: an alternative to the hazard ratio. BMC Med Res Methodol 13:152.
Schoenfeld DA. 1981. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68:316-319.
Sun J. 2006. The Statistical Analysis of Interval-Censored Failure Time Data. Springer.
Sun S, Weber HJ, Butler E, Rufibach K, Roychoudhury S. 2021. Estimands in hematologic oncology trials. Pharm Stat 20(4):793-805.
Tian L, Jin H, Uno H, Lu Y, Huang B, Anderson KM, Wei LJ. 2020. On the empirical choice of the time window for restricted mean survival time. Biometrics 76(4):1157-1166.
Uno H et al. 2014. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. JCO 32:2380-2385.
Xu R, O'Quigley J. 2000. Estimating average regression effect under non-proportional hazards. Biostatistics 1:423-439.

Related Skills

clinical-biostatistics/effect-measures - HR vs RMST as effect measures; CI methods
clinical-biostatistics/trial-reporting - ICH E9(R1) estimand framework for time-to-event
clinical-biostatistics/missing-data-sensitivity - Informative censoring and tipping-point for survival
clinical-biostatistics/cdisc-data-handling - ADTTE structure and CNSR convention
clinical-biostatistics/subgroup-analysis - Subgroup HTE for survival endpoints
clinical-biostatistics/power-and-sample-size - Schoenfeld and Lakatos sample size for TTE
clinical-biostatistics/multiplicity-graphical - Co-primary survival endpoints
machine-learning/survival-analysis - Predictive survival models and ML extensions


Postdoc reading list:

Royston-Parmar 2013 BMC Med Res Methodol 13:152 (RMST as primary)
Uno et al 2014 JCO 32:2380 (RMST in oncology)
Andersen-Keiding 2012 Stat Med 31:1074 (Fine-Gray semantic critique)
Putter-Schumacher-van Houwelingen 2020 Biom J 62:790 (Fine-Gray revisited; reduction factor)
Putter-Fiocco-Geskus 2007 Stat Med 26:2389 (competing risks tutorial)
Magirr-Burman 2021 Stat Biopharm Res 15(2):295 (MaxCombo critique)
Grambsch-Therneau 1994 Biometrika 81:515 (scaled Schoenfeld residuals)
Buyse-Molenberghs 1998 Biometrics 54:1014 (PFS-OS surrogacy framework)
Sun et al 2021 Pharm Stat 20:793 (estimands in oncology; ICH E9(R1) ICE strategies)
Sun 2006 The Statistical Analysis of Interval-Censored Failure Time Data (Springer)

Decision Tree by Scenario

Cox PH Diagnostics -- The Therneau-Grambsch Test (and Its Pitfalls)

Goal: Detect violations of the proportional hazards assumption that would invalidate the Cox HR as a meaningful summary statistic.

from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

cph = CoxPHFitter()
cph.fit(df, duration_col='time', event_col='event', formula='treatment + age + baseline_score')
cph.print_summary()

# Schoenfeld residuals PH test (lifelines)
results = proportional_hazard_test(cph, df, time_transform='rank')
print(results.summary)

# In R: cox.zph(coxph_fit) runs the analogous test; its default time transform is 'km',
# so p-values can differ from the 'rank' transform used above

# Plot scaled Schoenfeld residuals (lifelines)
cph.check_assumptions(df, p_value_threshold=0.05, show_plots=True)

Fixes when PH violated

Stratify on the violating covariate (loses inference on it)
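Time-dependent coefficients: coxph(Surv(t, d) ~ x + tt(x), tt=function(x,t,...) x*log(t)) in R; in lifelines, expand the data with lifelines.utils.to_episodic_format, add an explicit covariate-by-time interaction column, and fit CoxTimeVaryingFitter (a formula such as 'treatment * time' in CoxPHFitter does not create a time-varying coefficient)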
Royston-Parmar flexible parametric (R flexsurv::flexsurvspline or rstpm2::stpm2)
RMST as primary; HR as secondary

Restricted Mean Survival Time (RMST) -- The Modern Alternative

# Python: rmst via lifelines or manual
from lifelines.utils import restricted_mean_survival_time
from lifelines import KaplanMeierFitter

kmf_A = KaplanMeierFitter().fit(df[df['arm']=='A']['time'], event_observed=df[df['arm']=='A']['event'])
kmf_B = KaplanMeierFitter().fit(df[df['arm']=='B']['time'], event_observed=df[df['arm']=='B']['event'])
tau = 36  # months; pre-specified
rmst_A = restricted_mean_survival_time(kmf_A, t=tau)
rmst_B = restricted_mean_survival_time(kmf_B, t=tau)
print(f'RMST diff: {rmst_A - rmst_B:.2f} months')

# R: survRM2::rmst2(time, status, arm, tau=36) is the standard

The tau (truncation time) choice

Statistical constraint: tau <= min(largest follow-up time in each arm) to avoid extrapolation (Tian et al 2020 Biometrics gives rigorous treatment)
Clinical constraint: tau should reflect a clinically meaningful horizon (5-year OS in adjuvant; 24-month PFS in metastatic)
Data-dependent tau inflates Type-I error — MUST be pre-specified in SAP. Post-hoc tau tuning to chase significance is p-hacking.
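
The statistical constraint can be checked mechanically before tau is frozen in the SAP. The sketch below is a dependency-free illustration, not the lifelines or survRM2 implementation: `km_steps`, `rmst` and `max_admissible_tau` are hypothetical helper names, and the arm data are made up.

```python
def km_steps(times, events):
    """Kaplan-Meier estimate as a list of (event_time, S(t)) steps."""
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # handle tied times together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


def rmst(times, events, tau):
    """Area under the KM step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_steps(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


def max_admissible_tau(follow_up_by_arm):
    """Largest tau without extrapolation: min over arms of the longest follow-up."""
    return min(max(t) for t in follow_up_by_arm.values())


# Made-up example data (months); tau is assumed to come from the SAP
arm_times = {'A': [2, 4, 6, 8, 10], 'B': [1, 3, 5, 7, 9]}
arm_events = {'A': [1, 0, 1, 0, 1], 'B': [1, 1, 0, 1, 0]}
tau = 8
tau_limit = max_admissible_tau(arm_times)
if tau > tau_limit:
    raise ValueError(f'tau={tau} exceeds the admissible limit {tau_limit}')
rmst_diff = (rmst(arm_times['A'], arm_events['A'], tau)
             - rmst(arm_times['B'], arm_events['B'], tau))
print(f'RMST difference at tau={tau}: {rmst_diff:.2f} months')
```

The check fails loudly when the pre-specified tau would require extrapolating beyond the shorter arm's follow-up, which is the situation the statistical constraint is meant to prevent.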

Pseudo-observation regression (Andersen-Hansen-Klein 2004)

R implementation: pseudo::pseudomean() + GEE via geepack::geeglm().
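
To make the idea concrete: a pseudo-observation for subject i is the leave-one-out jackknife quantity n * theta - (n - 1) * theta_(-i), where theta is the KM-based RMST. The sketch below is a plain-Python illustration under that definition, not the `pseudo` package's code; `rmst_km` and `rmst_pseudo_obs` are hypothetical names and the data are made up.

```python
def rmst_km(times, events, tau):
    """RMST: area under the Kaplan-Meier curve on [0, tau]."""
    data = sorted(zip(times, events))
    at_risk, surv = len(data), 1.0
    area, prev_t, i = 0.0, 0.0, 0
    while i < len(data) and data[i][0] < tau:
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        area += surv * (t - prev_t)                # S(t-) holds on (prev_t, t]
        prev_t = t
        surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return area + surv * (tau - prev_t)


def rmst_pseudo_obs(times, events, tau):
    """Leave-one-out pseudo-observations: n*theta - (n-1)*theta_minus_i."""
    n = len(times)
    full = rmst_km(times, events, tau)
    pseudo = []
    for i in range(n):
        t_loo = times[:i] + times[i + 1:]
        e_loo = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * rmst_km(t_loo, e_loo, tau))
    return pseudo


# Made-up data with censoring (event = 0); tau assumed pre-specified
times = [2, 4, 6, 8, 10]
events = [1, 0, 1, 0, 1]
pseudo = rmst_pseudo_obs(times, events, tau=8)

# Sanity check: without censoring the pseudo-observation is just min(T_i, tau)
uncensored = rmst_pseudo_obs([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
```

The resulting values would then be used as the outcome in a GEE or linear model with treatment and covariates as regressors.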

Competing Risks -- The Andersen-Keiding Framework

Fine-Gray subdistribution hazard

# Python: support is limited (lifelines offers AalenJohansenFitter for the CIF but no Fine-Gray regression); R is the SOTA
# R: library(cmprsk); crr(time, fstatus, cov, failcode=1, cencode=0)
# R: library(riskRegression); FGR(formula, data=df, cause=1)

The Andersen-Keiding 2012 critique (Stat Med 31:1074): Fine-Gray subdistribution hazard violates the three principles for valid hazard functionals:

Hazard must be a real instantaneous risk among those truly at risk -> FG keeps dead competing-event subjects "at risk"
Covariate effects must be interpretable as causal -> FG coefficients confound the competing-event hazard
Hazard must support landmarking (conditioning on survival to s) -> FG does not

Practical rule (Putter-Fiocco-Geskus 2007)

For prediction of CIF (cumulative incidence) -> Fine-Gray is fine; report CIF curves and SHR
For etiology / causal effect on the event -> use cause-specific Cox (treat competing events as censoring); report cause-specific HRs for BOTH event of interest AND competing events
For multi-state semantics (alive -> relapse -> death) -> use multi-state framework with transition-specific Cox models

The CIF is always estimable

# R: library(mstate); msfit(coxph_fit, newdata, trans); probtrans(msfit_obj)
# R: library(survival); survfit(Surv(time, event_factor) ~ 1) with multi-state Surv
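
For intuition, the Aalen-Johansen CIF can be computed by hand and compared with the naive 1 - KM that censors competing events. This is a minimal dependency-free sketch with made-up data (`cif_and_naive` is a hypothetical helper); in practice lifelines' AalenJohansenFitter or the R calls above would be used.

```python
def cif_and_naive(times, causes, cause=1):
    """Return (Aalen-Johansen CIF, naive 1 - KM) at the end of follow-up.

    causes: 0 = censored, 1, 2, ... = event type.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    overall_surv = 1.0   # S(t-): free of ANY event just before t
    km_cause = 1.0       # KM for `cause`, competing events treated as censored
    cif = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = removed = 0
        while i < len(data) and data[i][0] == t:   # tied times
            c = data[i][1]
            d_cause += int(c == cause)
            d_any += int(c != 0)
            removed += 1
            i += 1
        cif += overall_surv * d_cause / at_risk    # Aalen-Johansen increment
        overall_surv *= 1.0 - d_any / at_risk
        km_cause *= 1.0 - d_cause / at_risk
        at_risk -= removed
    return cif, 1.0 - km_cause


# Made-up data: cause 1 = event of interest, cause 2 = competing event
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 2, 0, 1]
cif, naive = cif_and_naive(times, causes, cause=1)
print(f'Aalen-Johansen CIF = {cif:.3f}, naive 1 - KM = {naive:.3f}')
```

In this toy data set the naive estimate reaches 1.0 while the CIF stays at 2/3, because subjects removed by the competing event can never experience the event of interest.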

Log-Rank Variants and MaxCombo

from lifelines.statistics import multivariate_logrank_test

# Weighted log-rank (lifelines uses `weightings=` -- note plural)
results = multivariate_logrank_test(
    df['time'], df['arm'], df['event'], weightings='peto'
)
results.print_summary()
# weightings options (lifelines 0.27+): None, 'wilcoxon', 'tarone-ware',
# 'peto', 'fleming-harrington' (also accepts kwargs for Fleming-Harrington p, q)
# Stratified log-rank in R: survdiff(Surv(time, event) ~ arm + strata(stratum), data=df)
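
To show what the Fleming-Harrington weights do, here is a hand-rolled two-group G(rho, gamma) statistic with weight S(t-)^rho * (1 - S(t-))^gamma on the pooled KM. It is a sketch for intuition, not lifelines' implementation; `fh_logrank_z` is a hypothetical name, the data are made up, and a positive z here means group 1 has more events than expected. MaxCombo takes the maximum over several such statistics, and its p-value needs their joint normal distribution, which is not computed here.

```python
import math


def fh_logrank_z(times, events, groups, rho=0.0, gamma=0.0):
    """Two-group weighted log-rank z statistic (groups coded 0/1)."""
    data = sorted(zip(times, events, groups))
    n = len(data)
    n1 = sum(g for _, _, g in data)
    s_minus = 1.0            # pooled KM just before the current time
    num = var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:   # tied times
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d and n > 1:
            w = s_minus ** rho * (1.0 - s_minus) ** gamma
            num += w * (d1 - d * n1 / n)           # observed - expected in group 1
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        if d:
            s_minus *= 1.0 - d / n
        n -= removed
        n1 -= removed1
    return num / math.sqrt(var)


# Made-up data: group 1 has all the early events
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
z_by_weight = {
    (rho, gamma): fh_logrank_z(times, events, groups, rho, gamma)
    for rho, gamma in [(0, 0), (1, 0), (0, 1)]
}
z_swapped = fh_logrank_z(times, events, [1 - g for g in groups])
```

With rho = gamma = 0 this reduces to the standard log-rank; gamma > 0 up-weights late differences, which is why the choice has to be fixed before unblinding.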

The MaxCombo controversy

Interval Censoring -- When Standard Right-Censoring Is Wrong

# R: library(icenReg)
# fit_ic <- ic_par(cbind(left, right) ~ treatment + age, data=df, dist='weibull', model='ph')
# ic_sp() for semi-parametric; ic_np() for NPMLE

Recurrent Events -- AG vs PWP vs WLW

# Python: lifelines has no native AG implementation; fit in R with data in (start, stop, event) counting-process format
# R: coxph(Surv(start, stop, event) ~ trt + cluster(id), data=long_df) for AG
# R: + strata(enum) for PWP

Modern advice (Rogers et al 2014; Cook-Lawless 2007 book): AG with robust variance as default; PWP if events qualitatively heterogeneous; avoid WLW unless events are truly distinct types.
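
Whichever model is chosen, the data first have to be reshaped into counting-process rows. A minimal sketch of that reshaping, assuming one list of event times per subject and a known end of follow-up (`to_counting_process` and the column names are hypothetical, chosen to match the R calls above):

```python
def to_counting_process(subject_id, event_times, followup_end):
    """Build (start, stop, event) rows for one subject.

    `enum` is the event order, used as the stratum in PWP models.
    """
    rows, start = [], 0.0
    for k, t in enumerate(sorted(event_times), start=1):
        rows.append({'id': subject_id, 'start': start, 'stop': float(t),
                     'event': 1, 'enum': k})
        start = float(t)
    if followup_end > start:
        # Final at-risk interval ending in censoring
        rows.append({'id': subject_id, 'start': start, 'stop': float(followup_end),
                     'event': 0, 'enum': len(event_times) + 1})
    return rows


# Made-up subject: events at day 5 and day 12, followed until day 20
rows = to_counting_process('S1', [5, 12], 20)
gap_times = [r['stop'] - r['start'] for r in rows]   # PWP gap-time scale
for r in rows:
    print(r)
```

AG uses the (start, stop] intervals on the total-time scale; the PWP gap-time variant resets the clock after each event, which corresponds to `gap_times` above.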

Oncology PFS/OS Estimands -- The 2024 Reality

Per ICH E9(R1), the same PFS dataset yields different HRs depending on censoring rules. Each rule corresponds to a different intercurrent-event strategy:

Informative censoring -- detection and handling

Detection workflow:

Tabulate censoring reasons by arm from CDISC DS (Disposition) or ADaM ADTTE CNSR integer values (values above 0 are sponsor-defined; the coding below is illustrative):
- CNSR=1 (lost to follow-up): symmetric across arms = OK
- CNSR=2 (withdrew consent): if asymmetric, investigate why
- CNSR=3 (admin EoS): symmetric by definition
- CNSR=4 (subsequent therapy initiated): differential is the canonical informative pattern
- CNSR=5+ (toxicity discontinuation): differential is highly informative
Compare KM curves for censoring distribution by arm. If "time-to-censoring" KM differs by arm in same direction as outcome, suspect informative censoring.
Fit a Cox model with censoring as the event (check fit with Cox-Snell or Schoenfeld residuals): if treatment is significantly associated with the censoring hazard, censoring is differential by arm and plausibly informative.
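
The first step of the workflow can be sketched as a simple tabulation. The example assumes records of (arm, CNSR) pairs using the illustrative coding above; `censoring_by_arm` is a hypothetical helper and the records are made up.

```python
from collections import Counter


def censoring_by_arm(records):
    """Proportion of each arm's subjects censored for each reason (CNSR > 0)."""
    counts, totals = Counter(), Counter()
    for arm, cnsr in records:
        totals[arm] += 1
        if cnsr != 0:                      # CNSR = 0 is an event, not censoring
            counts[(arm, cnsr)] += 1
    return {key: counts[key] / totals[key[0]] for key in sorted(counts)}


# Made-up records: CNSR 1 = lost to follow-up, 4 = subsequent therapy initiated
records = [('A', 0), ('A', 0), ('A', 1), ('A', 4),
           ('B', 0), ('B', 4), ('B', 4), ('B', 4)]
table = censoring_by_arm(records)
gap_subsequent_therapy = table.get(('B', 4), 0.0) - table.get(('A', 4), 0.0)
for (arm, cnsr), prop in table.items():
    print(f'arm {arm}, CNSR={cnsr}: {prop:.0%}')
```

A large between-arm gap for a reason such as subsequent therapy is the cue to move on to the censoring-distribution KM and the censoring-as-event model.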

Handling strategies (per ICH E9(R1) and Sun 2021):

IPCW workflow (stabilised weights per Robins-Finkelstein 2000):

# Stabilised IPCW: w = S_num / S_denom
# where S_denom predicts remaining-uncensored given treatment + baseline covariates
# (time-varying covariates need counting-process data and per-interval weights),
# and S_num predicts remaining-uncensored given treatment alone (numerator for stabilisation)
library(survival)

# Step 1: fit censoring (NOT event) hazard models
# Note: outcome variable for these models is 1 if censored, 0 if the event occurred
df$censored <- 1 - df$event
denom_cens <- coxph(Surv(time, censored) ~ treatment + age + baseline_severity, data=df)
num_cens   <- coxph(Surv(time, censored) ~ treatment, data=df)

# Step 2: compute survival probabilities (NOT hazards) for "remaining uncensored"
# predict(type='survival') returns S_i(t_i) = exp(-expected), one value per subject at that
# subject's own observed time; summary(survfit(...))$surv would give a times-by-subjects matrix
S_denom <- predict(denom_cens, newdata=df, type='survival')
S_num   <- predict(num_cens,   newdata=df, type='survival')

# Step 3: stabilised IPCW weight per subject at their observed time
df$ipcw <- S_num / S_denom

# Step 4: weighted Cox for the event of interest with robust SE
fit_ipcw <- coxph(Surv(time, event) ~ treatment + age + baseline_severity,
                  data=df, weights=ipcw, robust=TRUE)

ADaM ADTTE -- The CNSR Convention Trap

# Convert ADTTE for Python / R
import pandas as pd

adtte = pd.read_csv('ADTTE.csv')
adtte['event'] = (adtte['CNSR'] == 0).astype(int)  # 1 = event for survival packages
adtte['time'] = adtte['AVAL']  # AVAL is in days/months per AVALU

This is a perpetual bug source. See clinical-biostatistics/cdisc-data-handling for the full ADTTE specification.
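
One way to make the conversion harder to get wrong is to route it through a small guard that rejects unexpected codes. The following is a sketch under the ADaM convention stated above (CNSR = 0 is an event, positive integers are censoring reasons); `cnsr_to_event` is a hypothetical helper, not part of any package.

```python
def cnsr_to_event(cnsr_values):
    """Map ADaM CNSR codes to event indicators (1 = event, 0 = censored)."""
    events = []
    for value in cnsr_values:
        # Reject missing, non-integer, or negative codes instead of guessing
        if value is None or value != int(value) or value < 0:
            raise ValueError(f'unexpected CNSR value: {value!r}')
        events.append(1 if value == 0 else 0)
    return events


# Made-up CNSR column: two events, three censored for different reasons
cnsr = [0, 1, 0, 2, 4]
event = cnsr_to_event(cnsr)
n_events = sum(event)
print(f'{n_events} events, {len(event) - n_events} censored')
```

Comparing `n_events` against the event count reported in the CSR is a quick cross-check that the roles have not been reversed.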

Per-Method Failure Modes

Cox PH violation undetected

Trigger: Significant treatment HR reported without cox.zph diagnostic
Mechanism: Under PH violation the Cox estimate is a time-averaged log-HR, so no single "the HR" describes the treatment effect
Symptom: Hazard plots show crossing; cox.zph rejects PH; KM curves diverge then converge
Fix: Report cox.zph result; switch to RMST or time-varying Cox; cite Grambsch-Therneau 1994

Fine-Gray reported as causal effect

Trigger: Fine-Gray SHR interpreted as "treatment effect on event of interest"
Mechanism: SHR confounds with competing-event hazard (Andersen-Keiding 2012)
Symptom: SHR interpretation contradicts cause-specific Cox results; reviewer confusion
Fix: Use cause-specific Cox for etiology; Fine-Gray for CIF prediction only; report both per Putter-Fiocco-Geskus 2007

KM curve biased upward under competing risks

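Trigger: KM applied to event of interest treating competing events as non-informative censoring
Mechanism: With competing events censored, 1 - KM depends on the cause-specific hazard alone and describes a hypothetical world without competing events; it ignores that subjects removed by a competing event can no longer have the event of interest
Symptom: 1 - KM exceeds the Aalen-Johansen CIF estimate (equivalently, KM "survival" falls below 1 - CIF)
Fix: Use Aalen-Johansen estimator for CIF; never KM for a single cause in a competing-risk setting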

MaxCombo direction flipping

Trigger: MaxCombo significant; direction not pre-specified
Mechanism: MaxCombo's family can include weights that favour opposite directions
Symptom: Different reports show opposite "winner" depending on weight choice
Fix: Pre-specify directional constraints; require positive z at late-emphasis weight before claiming superiority; cite Magirr-Burman 2021

Tau chosen post-hoc to favour significance

Trigger: RMST tau adjusted after seeing data
Mechanism: Selection bias; tau-tuning is a flavour of p-hacking
Symptom: Sponsor's RMST result differs from an independent re-analysis that uses the pre-specified tau
Fix: Pre-specify tau in SAP; cite Tian 2020 Biometrics

Right-censored midpoint for interval-censored PFS

Trigger: Scan intervals long or differential between arms; standard right-censored Cox applied
Mechanism: Midpoint imputation is biased and the SE is underestimated
Symptom: Replication with interval-censored Cox gives different point estimate and wider CI
Fix: R icenReg::ic_par or ic_sp; Sun 2006

CNSR convention confusion

Trigger: ADTTE CNSR=1 passed to R survival::Surv(time, event) expecting event=1
Mechanism: Censoring/event role reversed
Symptom: "Event count" matches censoring count from CSR; nonsensical HR
Fix: Always convert: event = (CNSR == 0).astype(int); cite ADaM IG v1.3

Common Errors

Anticipated Reviewer Pushback

References

Andersen PK, Keiding N. 2012. Interpretability and importance of functionals in competing risks and multistate models. Stat Med 31:1074-1088.
Andersen PK, Hansen MG, Klein JP. 2004. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Anal 10:335-350.
Buyse M, Molenberghs G. 1998. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics 54:1014-1029.
Cox DR. 1972. Regression models and life-tables. JRSS-B 34:187-220.
Fine JP, Gray RJ. 1999. A proportional hazards model for the subdistribution of a competing risk. JASA 94:496-509.
Karrison TG. 2016. Versatile tests for comparing survival curves based on weighted log-rank statistics. Stat J 16:678-690.
Lakatos E. 1988. Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 44:229-241.
Sun S, Weber HJ, Butler E, Rufibach K, Roychoudhury S. 2021. Estimands in hematologic oncology trials. Pharm Stat 20(4):793-805.
Magirr D, Burman CF. 2021. The strong null hypothesis and the MaxCombo test. Stat Biopharm Res 15(2):295-296. (Earlier "Cherry-picking in survival analysis" attribution was a blog post, Oct 2022.)
Mantel N. 1966. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports 50:163-170.
Peduzzi P et al. 1995. Importance of events per independent variable in proportional hazards analysis. J Clin Epidemiol 48:1503-1510.
Putter H, Fiocco M, Geskus RB. 2007. Tutorial in biostatistics: competing risks and multi-state models. Stat Med 26:2389-2430.
Putter H, Schumacher M, van Houwelingen HC. 2020. On the relation between the cause-specific hazard and the subdistribution rate for competing risks data: the Fine-Gray model revisited. Biom J 62:790-807.
Royston P, Parmar MKB. 2013. Restricted mean survival time: an alternative to the hazard ratio. BMC Med Res Methodol 13:152.
Schoenfeld DA. 1981. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 68:316-319.
Sun J. 2006. The Statistical Analysis of Interval-Censored Failure Time Data. Springer.
Grambsch PM, Therneau TM. 1994. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81:515-526.
Tian L, Jin H, Uno H, Lu Y, Huang B, Anderson KM, Wei LJ. 2020. On the empirical choice of the time window for restricted mean survival time. Biometrics 76(4):1157-1166.
Uno H et al. 2014. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. JCO 32:2380-2385.
Xu R, O'Quigley J. 2000. Estimating average regression effect under non-proportional hazards. Biostatistics 1:423-439.

Related Skills

clinical-biostatistics/effect-measures - HR vs RMST as effect measures; CI methods
clinical-biostatistics/trial-reporting - ICH E9(R1) estimand framework for time-to-event
clinical-biostatistics/missing-data-sensitivity - Informative censoring and tipping-point for survival
clinical-biostatistics/cdisc-data-handling - ADTTE structure and CNSR convention
clinical-biostatistics/subgroup-analysis - Subgroup HTE for survival endpoints
clinical-biostatistics/power-and-sample-size - Schoenfeld and Lakatos sample size for TTE
clinical-biostatistics/multiplicity-graphical - Co-primary survival endpoints
machine-learning/survival-analysis - Predictive survival models and ML extensions
