clinical-biostatistics/multiplicity-graphical/SKILL.md
Implements multiplicity control for confirmatory clinical trials using graphical procedures (Bretz-Maurer-Hommel), gatekeeping (parallel, serial, mixed), Hochberg/Hommel/Holm with PRDS, and the closed-testing principle (Marcus-Peritz-Gabriel; Goeman 2021 admissibility). Covers FDA Multiple Endpoints Final Guidance (October 2022), graphical procedures via R gMCP, primary + key-secondary + subgroup hierarchies, and FWER vs FDR distinction. Use when designing the multiplicity strategy for confirmatory trials with multiple primary or key secondary endpoints.
Reference examples tested with: R gMCP 0.8.16+, graphicalMCP 0.2+, gatekeeping, multcomp, multxpert; Python statsmodels 0.14+ for basic FDR/FWER methods.
Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name`
- Python: `pip show <package>` then `help(module.function)`

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Design the multiplicity strategy for my trial" -> Specify a closed-testing procedure (graphical, gatekeeping, hierarchical, or step-down Bonferroni-Holm) that controls family-wise error rate at the trial-wide level across primary endpoints, key secondary endpoints, and subgroup analyses, with provable strong FWER control.
Marcus, Peritz & Gabriel 1976 Biometrika 63:655: an intersection hypothesis H_I = ∩_{i∈I} H_i (I ⊆ {1,...,m}) is rejected iff every intersection hypothesis H_J with J ⊇ I is rejected by a valid α-level local test. Strong FWER control holds for ANY choice of valid α-level local tests.
Goeman, Hemerik & Solari 2021 Ann Stat 49:1218 tightens this: closed testing is not merely sufficient — it is necessary for admissibility under FDP/FWER/k-FWER. Every admissible multiplicity procedure is equivalent to some closed test. Graphical procedures, gatekeepers, Hommel, fixed-sequence, fallback — all are closed tests in disguise.
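The closure principle can be made concrete with a brute-force sketch. The function below is illustrative only: it assumes unweighted Bonferroni local tests (which makes the closed test coincide with Holm), and the name `closed_test_bonferroni` and the example p-values are hypothetical, not taken from any package or trial.

```python
from itertools import combinations


def closed_test_bonferroni(p_values, alpha=0.025):
    """Brute-force closed test with unweighted Bonferroni local tests.

    Returns the indices of elementary hypotheses H_i that are rejected,
    i.e. those for which every intersection H_J with i in J is rejected.
    Enumerates all 2^m - 1 intersections, so only suitable for small m.
    """
    m = len(p_values)
    indices = range(m)

    def local_reject(subset):
        # Bonferroni local test of the intersection hypothesis H_J:
        # reject if the smallest p-value in J is <= alpha / |J|.
        return min(p_values[j] for j in subset) <= alpha / len(subset)

    rejected = []
    for i in indices:
        supersets = (
            subset
            for size in range(1, m + 1)
            for subset in combinations(indices, size)
            if i in subset
        )
        if all(local_reject(subset) for subset in supersets):
            rejected.append(i)
    return rejected


# Illustrative p-values (assumed, not from a real trial)
print(closed_test_bonferroni([0.005, 0.020, 0.30], alpha=0.05))
```

Swapping `local_reject` for a Simes or weighted Bonferroni test changes the procedure while keeping the same closure logic, which is the sense in which graphical and gatekeeping procedures are closed tests.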
FWER vs FDR philosophical divide:
Confirmatory clinical trials use FWER essentially universally.
| Procedure | Type | FWER control | Power profile | Use case |
|-----------|------|--------------|---------------|----------|
| Bonferroni | Single-step | Yes, any dependence | Conservative; loses 30-50% power vs Hommel under positive dependence | Very small m; worst-case dependence |
| Holm 1979 | Step-down | Yes, any dependence | Better than Bonferroni; uniformly dominates | Default for any dependence pattern |
| Hochberg 1988 | Step-up | Yes under PRDS (Sarkar 1998) | Better than Holm under PRDS | Positive correlation; verify PRDS |
| Hommel 1988 | Step-up via closed tests | Yes under PRDS | Uniformly dominates Hochberg by 1-3% | Whenever Hochberg is valid |
| Fixed-sequence (hierarchical) | Sequential | Yes, any dependence | Full alpha for first; subsequent zero if any fail | When clear priority ordering; "key secondary" labelling |
| Parallel gatekeeping (Dmitrienko 2003) | Multi-family | Yes | Family-by-family; secondary tested if any primary rejects | Primary family + secondary family |
| Serial gatekeeping | Sequential families | Yes | Strict: family k tested only if ALL of family k-1 reject | Co-primary + secondary tiers |
| Mixed gatekeeping (Dmitrienko-Tamhane 2008) | Combination | Yes | Combines closed-testing local procedures across families | Complex hierarchies |
| Graphical procedures (Bretz-Maurer 2009) | Closed-test as directed graph | Yes by construction | Flexible; allocate alpha to hypotheses via graph weights | Modern standard for confirmatory SAPs |
| Graphical + Simes/parametric (Bretz et al 2011) | Closed-test with non-Bonferroni local tests | Yes when Simes valid | Gains power under correlation | Complex co-primary + key secondary + subgroup hierarchies |
| Maurer-Bretz 2013 entangled graphs | Memory-augmented graphs | Yes by construction | Alpha propagation depends on origin | Parent-descendant constraints |
| Benjamini-Hochberg 1995 | FDR | FDR controlled at level q | Higher power than FWER | Exploratory only; NOT for confirmatory regulatory |
Postdoc reading list:
| Scenario | Recommended procedure | Why |
|----------|----------------------|-----|
| 2 co-primary endpoints (both must succeed) | No alpha split needed; per-endpoint alpha-level test; cite FDA 2022 | Co-primary doesn't split alpha; inflates n via joint power |
| 2 multiple primary endpoints (any-wins) | Graphical procedure or Holm with weights | Alpha must be allocated; graphical is flexible |
| 1 primary + 2 key secondary endpoints | Hierarchical (serial gatekeeping) OR graphical with alpha propagation | Modern SAPs favour graphical |
| 1 primary + 3 secondary + 4 subgroup analyses | Graphical procedure via gMCP with pre-specified weights | Complex hierarchies benefit from graph visualisation |
| Primary endpoint + tipping-point sensitivity | No multiplicity adjustment needed for sensitivity | Sensitivity is "what if" not "another claim" |
| Many exploratory biomarker subgroups | Benjamini-Hochberg FDR | Exploratory; not for label claims |
| Win-ratio composite (cardiology) | Single test; no multiplicity | Composite captures multiple events in single hierarchy |
| Subgroup analysis (pre-specified) | Graphical alpha allocation; small budget (≤20%) per Dane 2019 | Confirmatory subgroup discovery requires explicit allocation |
| Adaptive trial with treatment arm dropping | Combination tests (Bauer-Köhne 1994) + closed testing | See clinical-biostatistics/adaptive-designs |
| Group-sequential with multiple endpoints | gsDesign or rpact with multivariate alpha spending | Hierarchical alpha across both time and endpoints |
The Bretz-Maurer-Brannath-Posch 2009 Stat Med 28:586 framework recast weighted Bonferroni-Holm closed tests as directed weighted graphs:
library(gMCP)
# Construct a graph for primary + 2 key secondary endpoints
# Primary endpoint at full alpha; if rejected, alpha propagates equally to secondaries
hypotheses <- c('Primary', 'Sec1', 'Sec2')
weights <- c(1, 0, 0) # initial alpha all on primary
# Transition matrix: rows = source, columns = target
# When Primary rejects, weight 0.5 goes to each secondary; when Sec1 (Sec2) rejects, its alpha passes to Sec2 (Sec1)
transitions <- matrix(c(
0, 0.5, 0.5,
0, 0, 1,
0, 1, 0
), nrow = 3, byrow = TRUE, dimnames = list(hypotheses, hypotheses))
graph <- matrix2graph(transitions, weights)
# Note: gMCP documents `matrix2graph(m, weights)` and `new('graphMCP', m =, weights =)`
# as graph constructors; hypothesis names are taken from the matrix dimnames.
# Verify with `?matrix2graph` / `?gMCP` in the installed gMCP release before scripting.
# Set p-values from the trial
p_vals <- c(Primary = 0.018, Sec1 = 0.042, Sec2 = 0.038)
# Run the graphical procedure at alpha = 0.025
result <- gMCP(graph, pvalues = p_vals, alpha = 0.025)
print(result)
# Hierarchical rejection: Primary rejects (0.018 <= 0.025) -> alpha splits to 0.0125 per secondary
# -> neither Sec1 (0.042) nor Sec2 (0.038) is rejected at 0.0125
| Pattern | Graph topology | Use |
|---------|----------------|-----|
| Pure hierarchical (fixed sequence) | H1 -> H2 -> H3 with weight 1 on each transition | Strict ordering |
| Holm graph (equal weights) | Each Hi -> Hj with weight 1/(m-1) | No priority ordering |
| Primary + secondaries | Primary -> Sec1 (0.5), Sec2 (0.5); Sec1 ↔ Sec2 (1) | Pivotal labeling claims |
| Co-primary chain | H1 -> H2 with full weight if BOTH H1a, H1b reject | Co-primary + secondary |
| Subgroup branch | Primary -> Subgroup_OS (0.2), Sec1 (0.4), Sec2 (0.4) | Discovery subgroup with budget |
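For readers who want to see the mechanics behind the graph patterns, here is a minimal pure-Python sketch of the sequentially rejective update rule described by Bretz et al. 2009 (Bonferroni-based, no Simes or parametric refinement). It is not the gMCP implementation; the function name `graphical_procedure` and the dictionary-based bookkeeping are assumptions made for illustration.

```python
def graphical_procedure(p_values, weights, transitions, alpha=0.025):
    """Sequentially rejective graphical procedure (Bonferroni-based sketch).

    p_values    : dict mapping hypothesis name -> p-value (insertion order = node order)
    weights     : initial alpha weights in node order, summing to at most 1
    transitions : square list of lists; transitions[i][j] = weight on edge i -> j
    Returns (rejected names in rejection order, final weights dict).
    """
    names = list(p_values)
    w = dict(zip(names, weights))
    g = {a: {b: transitions[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = set(names)
    rejected = []

    while True:
        # A hypothesis can be rejected when its p-value is within its local alpha
        candidates = [h for h in names if h in active and p_values[h] <= w[h] * alpha]
        if not candidates:
            break
        j = candidates[0]
        active.remove(j)
        rejected.append(j)

        # Pass the rejected node's weight along its outgoing edges
        for l in active:
            w[l] += w[j] * g[j][l]
        w[j] = 0.0

        # Rewire the remaining edges so they bypass the removed node
        new_g = {a: {b: 0.0 for b in names} for a in names}
        for l in active:
            for k in active:
                if l == k:
                    continue
                denom = 1.0 - g[l][j] * g[j][l]
                if denom > 0:
                    new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
        g = new_g

    return rejected, w


# "Primary + secondaries" pattern with illustrative p-values
transitions = [[0, 0.5, 0.5],
               [0, 0, 1],
               [0, 1, 0]]
p = {'Primary': 0.018, 'Sec1': 0.042, 'Sec2': 0.038}
print(graphical_procedure(p, [1, 0, 0], transitions, alpha=0.025))
```

The result does not depend on which eligible hypothesis is rejected first, so taking the first candidate in node order is a convenience rather than a design requirement.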
When endpoints are positively correlated, replace the Bonferroni-based intersection test with Simes (for positive dependence) or parametric (using known correlation):
library(gMCP)
# Use Simes-based local tests at each intersection
result_simes <- gMCP(graph, pvalues = p_vals, alpha = 0.025, test = 'Simes')
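# Or parametric with estimated correlation matrix
# (correlation_matrix must be supplied by the analyst: m x m, rows/columns in graph node order)
result_param <- gMCP(graph, pvalues = p_vals, alpha = 0.025, test = 'parametric', correlation = correlation_matrix)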
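To see why a Simes local test can gain power over Bonferroni, the sketch below computes both p-values for a single intersection hypothesis. It assumes the unweighted case for simplicity (gMCP works with weighted versions), and the helper names are hypothetical.

```python
def simes_p_value(p_values):
    """Unweighted Simes p-value for the intersection of the given hypotheses."""
    m = len(p_values)
    ordered = sorted(p_values)
    # min over ranks i of m * p_(i) / i
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def bonferroni_p_value(p_values):
    """Unweighted Bonferroni p-value for the same intersection hypothesis."""
    return min(1.0, len(p_values) * min(p_values))


p = [0.018, 0.042, 0.038]
print('Simes:', simes_p_value(p))
print('Bonferroni:', bonferroni_p_value(p))
```

The Simes p-value is never larger than the Bonferroni one, but its validity rests on the positive-dependence condition discussed for Hochberg and Hommel.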
Entangled graphs add memory: the alpha propagation can depend on the origin of the alpha. This allows parent-descendant constraints that a single non-entangled graph cannot express. Example: secondary endpoint Sec1 receives alpha only from Primary, never from Sec2.
Postdoc argument: purists argue memory makes the procedure non-coherent in Gabriel's sense; Glimm/Maurer/Bretz argue it matches real-world inferential intent.
Gabriel coherence in plain terms: a coherent procedure rejects a hypothesis H consistently regardless of which superset of H is being tested. Non-entangled graphs are coherent: if H1 is rejected via path A, it would also be rejected via path B. Entangled (memory-bearing) graphs sacrifice coherence: the same H may be rejected when alpha arrives from one parent but not from another, because the propagation history changes the available alpha. The trade-off is operational power -- entangled graphs can encode "secondary X is meaningful only if primary Y rejects, not if primary Z rejects" inferential intent that flat coherent procedures cannot express. Choose based on whether the SAP needs path-dependent priority.
Test H1 at full alpha; only if it rejects, test H2 at full alpha; etc. Maximises power for H1 but H_k becomes inferentially worthless once any H_j (j<k) fails.
# Hierarchical / serial: just a chain graph in gMCP
hyp <- c('H1', 'H2', 'H3', 'H4')
weights <- c(1, 0, 0, 0)
# byrow = TRUE so that rows = source, columns = target (H1 -> H2 -> H3 -> H4)
trans <- matrix(c(0, 1, 0, 0,
                  0, 0, 1, 0,
                  0, 0, 0, 1,
                  0, 0, 0, 0),
                nrow = 4, byrow = TRUE, dimnames = list(hyp, hyp))
graph <- matrix2graph(trans, weights)  # node names come from dimnames; verify with ?matrix2graph
Pre-specification of order is critical — based on clinical importance, NOT expected effect size. Ordering by expected effect is data-driven and inflates Type-I.
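The fixed-sequence rule is simple enough to write out directly. The sketch below is illustrative, with a hypothetical function name and made-up p-values; it shows how a later hypothesis goes untested once an earlier one fails, however small its own p-value.

```python
def fixed_sequence(ordered_p_values, alpha=0.025):
    """Fixed-sequence (hierarchical) test.

    ordered_p_values: list of (name, p_value) in the PRE-SPECIFIED order.
    Each hypothesis is tested at full alpha; testing stops at the first failure.
    """
    rejected = []
    for name, p in ordered_p_values:
        if p > alpha:
            break  # everything after the first failure stays unrejected
        rejected.append(name)
    return rejected


# H3 has a tiny p-value but is never reached because H2 fails
print(fixed_sequence([('H1', 0.010), ('H2', 0.030), ('H3', 0.001)]))
```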
Secondary family is tested only if at least one primary rejects. Bonferroni-based parallel gatekeeper has stepwise representation (Guilbaud 2007 Biom J 49:917).
Mixed gatekeeping permits any closed-testing local procedure within each family (e.g., Holm in family 1, Hommel in family 2), combined via the closure principle. In R, see gMCP::generalMixGatekeeping (confirm it is exported by the installed release) or the Mediana and multxpert packages.
Postdoc tradeoff: parallel gatekeeping power loss vs collapsing endpoints into a composite (which avoids multiplicity but dilutes effect if components move in opposite directions); whether tree gatekeeping (Dmitrienko 2008 Stat Med 27:2114) over-engineers vs equivalent graphical procedure.
Holm 1979 — step-down rejective Bonferroni; FWER controlled under any joint dependence. Conservative but robust.
Hochberg 1988 — step-up using ordered Simes critical values; needs Simes inequality which requires PRDS (Sarkar 1998, 2008). Under PRDS, Hochberg uniformly dominates Holm.
Hommel 1988 — also Simes-based but uses closed-testing tableau directly (not step-up shortcut). Uniformly more powerful than Hochberg (typically 1-3% gain).
Hochberg becomes Type-I-inflated under negative dependence — relevant when comparing endpoints mathematically constrained to move in opposite directions (LDL-C and HDL-C; complementary efficacy and safety endpoints).
Sarkar critique: when PRDS cannot be proven, fall back to Holm. The lost power is the price of robustness.
# Python: statsmodels supports Holm, Hochberg, Hommel
from statsmodels.stats.multitest import multipletests
p_vals = [0.018, 0.042, 0.038, 0.015]
# statsmodels names the Hochberg step-up procedure 'simes-hochberg'
for method in ['holm', 'simes-hochberg', 'hommel', 'bonferroni']:
    reject, adj_p, _, _ = multipletests(p_vals, alpha=0.05, method=method)
    print(f'{method}: reject={reject}, adjusted={adj_p}')
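The step-down versus step-up distinction is easier to see without a library. The sketch below uses the same critical values alpha/(m - step) for both procedures and differs only in the direction of the scan; the function names are hypothetical, and the Hochberg result is only valid under the PRDS condition described above.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down: start at the smallest p-value, stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break
    return reject


def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up: start at the largest p-value, reject everything
    at or below the first p-value that meets its critical value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step in range(m - 1, -1, -1):
        if p_values[order[step]] <= alpha / (m - step):
            for idx in order[:step + 1]:
                reject[idx] = True
            break
    return reject


p_vals = [0.018, 0.042, 0.038, 0.015]
print('Holm:    ', holm_reject(p_vals))
print('Hochberg:', hochberg_reject(p_vals))
```

With these p-values Holm stops at the first step because 0.015 exceeds 0.05/4, while Hochberg rejects all four because the largest p-value is below 0.05. That gap is the power gain, and also the exposure if PRDS fails.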
Federal Register 2022-22882 finalises 2017 draft. Key changes vs draft:
| Category | Approach | Note |
|----------|----------|------|
| Composite | Single test; no multiplicity | Win-ratio, DOOR/RADAR, time-to-first-event |
| Co-primary (all-win) | Each at full alpha; n inflated for joint power | Power = product of marginals (under independence) |
| Multiple primary (any-wins) | Alpha must be split (Bonferroni or graphical) | More n required than co-primary if effects similar |
| Primary + key secondary | Hierarchical or graphical | Modern preference: graphical for flexibility |
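The sample-size cost of co-primary endpoints can be approximated with a few lines of arithmetic. The sketch below assumes independent endpoints, equal marginal power, and a one-sided z-test in which n scales with the squared sum of the two normal quantiles. Positive correlation makes the true inflation smaller, so treat this as a rough upper bound; the function names are hypothetical.

```python
from statistics import NormalDist


def co_primary_joint_power(marginal_powers):
    """Joint power to win on ALL endpoints, assuming independence."""
    joint = 1.0
    for power in marginal_powers:
        joint *= power
    return joint


def sample_size_inflation(target_joint_power, n_endpoints, alpha=0.025):
    """Ratio of n needed for the joint-power target vs a single endpoint
    powered at the same level (one-sided z-test approximation)."""
    z = NormalDist()
    # Each endpoint needs this marginal power so the product hits the target
    marginal = target_joint_power ** (1 / n_endpoints)
    n_single = (z.inv_cdf(1 - alpha) + z.inv_cdf(target_joint_power)) ** 2
    n_joint = (z.inv_cdf(1 - alpha) + z.inv_cdf(marginal)) ** 2
    return n_joint / n_single


print(co_primary_joint_power([0.9, 0.9]))
print(sample_size_inflation(0.9, n_endpoints=2))
```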
Winner's bias warning: when post-hoc-selected endpoints are emphasised, bias-corrected effect estimates are recommended (same selection-bias issue as adaptive design).
Dmitrienko-D'Agostino 2017 Stat Med 36:4341 editorial + 2024 Pharm Stat discussion: the dogma of a single primary endpoint causes systematic Type-II error inflation in trials with broad multi-domain benefit (heart failure drugs with effects on mortality, hospitalisation, symptoms, biomarkers).
Win-ratio (Pocock-Ariti-Collier-Wang 2012) and hierarchical composite (DOOR/RADAR, Evans 2015) are responses — they preserve a single inferential test while letting multiple endpoints contribute.
FDA counter-position (Hung, O'Neill, Wang): without a designated primary, sponsors and regulators negotiate over secondary endpoints post hoc, destroying inferential meaning. Hence the FDA 2022 guidance reaffirms key-secondary hierarchies.
gMCP; cite Bretz-Maurer 2009.

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| FWER for confirmatory; FDR for exploratory | ICH E9; FDA 2022 Multiple Endpoints | Regulatory standard universally |
| Bonferroni: ~10 tests -> 30-50% power loss | Sarkar 1998 PRDS | Conservative under positive dependence |
| PRDS required for Hochberg validity | Sarkar 2008 Ann Stat | Otherwise Type-I inflated; fall back to Holm |
| Subgroup α budget <=20% of total | Dane 2019 EFSPI white paper | Discipline against subgroup fishing |
| Key secondary requires hierarchy in SAP | FDA 2022 Final | Labeling claims need Type-I-controlled test |
| Composite avoids multiplicity but dilutes effect | Pocock 2012 Eur Heart J | Win-ratio captures heterogeneity in single test |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Hochberg applied to negatively-dependent endpoints | PRDS not checked | Switch to Holm (cite Sarkar 1998) |
| Fixed-sequence ordering data-driven | Post-hoc selection | Pre-specify clinical priority in SAP |
| Bonferroni at 10 correlated endpoints | Default conservatism | Graphical procedure (gMCP); 30-50% power gain |
| FDR for confirmatory primary | Misunderstanding error rates | FWER mandatory for confirmatory; FDR exploratory only |
| Graphical procedure run with multiple weight schemes | Post-hoc graph tuning | Pre-specify single graph in SAP |
| Subgroups significant without multiplicity | Cherry-picking | Pre-specified allocation OR explicit hypothesis-generating label |
| Co-primary treated as multiple primary | Confused alpha allocation | Co-primary: no alpha split; inflate n. Multiple primary: split alpha |
| Win-ratio component priority unspecified | Data-driven choice | Pre-specify hierarchy with rationale; sensitivity over alternatives |
| multipletests default method='hs' (Holm-Sidak) | Common Python mistake | Always specify method='holm', 'hommel', etc., explicitly |
| Sensitivity analysis listed as a "key secondary" requiring alpha | Confusion about role | Sensitivity is "what if" not "another claim"; no alpha needed |
| Pushback | Response |
|----------|----------|
| "Why this multiplicity procedure?" | Closed testing per Marcus-Peritz-Gabriel; specific implementation is graphical (Bretz-Maurer 2009) with pre-specified weights in SAP |
| "Why Hommel not Holm?" | PRDS holds (positive correlation among endpoints); Hommel dominates Holm by 1-3% with no Type-I cost |
| "Why graph weights X, Y, Z?" | Clinical priority: primary > key secondary > exploratory; weights reflect labelling claim hierarchy |
| "Are these endpoints positively correlated?" | Sensitivity analyses provided: Bonferroni, Holm, Hochberg, Hommel results all in CSR appendix; concordant |
| "Where is alpha for the subgroup analysis?" | Pre-specified 20% of primary alpha allocated; cite Dane 2019 |
| "Why not just composite endpoint?" | Composite would dilute differential effect on mortality vs hospitalisation; key-secondary hierarchy preserves component-level claims |
| "PRDS check for Hochberg?" | Endpoints positively correlated via simulation under null; PRDS holds; Hochberg/Hommel valid |
| "Sensitivity in the hierarchy?" | No. Sensitivity is "what if" and does not require alpha. Listed as supportive not key secondary. |