experimental-design/randomization-blocking/SKILL.md
Structures biological experiments so inference is valid by construction, covering Fisher's principles (randomization, replication, local control), the experimental-vs-observational unit distinction and pseudoreplication (Hurlbert 1984; Lazic 2018), randomization mechanics (complete, restricted, stratified, rerandomization, run-order), blocking layouts (randomized complete block, Latin square, incomplete block), factorial designs and interactions, and the split-plot/nested error strata hidden inside multi-batch genomics. Use when deciding the experimental unit and what counts as a replicate, planning randomization and run order, choosing a blocked/factorial/split-plot/nested layout, avoiding pseudoreplication in cell-culture or animal studies, or specifying the random-effects structure of the analysis model. For assigning samples to sequencing batches/lanes/plates and batch-effect correction see experimental-design/batch-design; for regulated clinical-trial randomization see clinical-biostatistics.
npx skillsauth add GPTomics/bioSkills bio-experimental-design-randomization-blockingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: designit 0.5+, lme4 1.1-35+, lmerTest 3.1+, pwr 1.3+.
Before using code patterns, verify installed versions match. If versions differ:
packageVersion('<pkg>') then ?function_name to verify parametersIf code throws an error, introspect the installed package and adapt the example to the actual API rather than retrying. designit is an R6 package whose BatchContainer$new(), optimize_design(), and *_score_generator() signatures evolve between releases; confirm against the installed vignette (vignette(package = 'designit')) before relying on argument names.
"Design the experiment so the statistics will be valid" -> Decide what the experimental unit is, randomize treatments to those units, replicate the unit (not the measurement), and remove known nuisance variation by blocking — so that the analysis model mirrors how the experiment was actually run.
designit::optimize_design() for constrained randomization; lme4::lmer() / lmerTest for the matching mixed modelThe most consequential and most violated idea in biological design: the experimental unit (EU) is the smallest entity independently assigned to a treatment — and it, not the number of measurements, is the sample size for inference. Lazic 2018 PLoS Biol 16:e2005282 separates three entities: the biological unit (what conclusions are about), the experimental unit (what randomization acts on, = the n), and the observational unit (what is measured). When observational units are counted as independent replicates, the standard error shrinks illegitimately and p-values become meaningless — pseudoreplication (Hurlbert 1984 Ecol Monogr 54:187). Ten thousand cells from three mice are n = 3, not n = 10,000, for a between-mouse question; mice co-housed in a cage dosed through the chow make the cage the EU, not the mouse. Lazic et al. found ~46% of surveyed animal studies pseudoreplicated. The fix is structural: model the design's hierarchy (random effects) or aggregate to the EU before testing — pseudoreplication is, formally, an omitted random effect.
A second, deeper point (Fisher): randomization is what licenses the p-value. It supplies the physical basis for the error term and converts systematic lurking-variable bias into random error balanced in expectation. Model-based tests are approximations to the randomization distribution. Skip randomization and the causal claim rests entirely on assumptions.
| Design | Controls / estimates | When to use | Fails / costs when | |--------|----------------------|-------------|--------------------| | Completely randomized (CRD) | error variance only | units homogeneous; no known nuisance | inefficient if real nuisance structure exists | | Randomized complete block (RCBD) | one known nuisance (day, litter, chip, donor) | nuisance factor identifiable and blockable | costs error df; harmful if block variance is ~0 ("blocking on noise") | | Latin square | two orthogonal nuisances (day x technician) | n² runs affordable for n treatments | assumes no interaction among row/col/treatment | | (Balanced) incomplete block | one nuisance, block smaller than #treatments | plate/chip holds fewer samples than treatments | analysis more complex; needs balance for efficiency | | Factorial | main effects + interactions, "hidden replication" | >1 factor; interaction is of interest | #runs grows multiplicatively | | Fractional factorial / screening | main effects under sparsity-of-effects | many factors, few runs (Plackett-Burman) | aliases effects; cannot resolve all interactions | | Split-plot | two EU sizes, two error strata | one factor hard to randomize finely (lane, incubator, batch) | wrong error term if analyzed as a flat factorial -> anti-conservative | | Nested / hierarchical | variance components across levels | sub-sampling within units (cells in mice in cages) | pseudoreplication if the nesting is ignored | | Repeated measures | within-unit change over time | longitudinal sampling of the same EU | a split-plot in time; needs the within-unit error term |
| Scenario | Recommended structure | Why | |----------|----------------------|-----| | Treatment given per animal, one tissue measured each | CRD or RCBD; n = animals | EU = animal | | Many cells measured per animal, between-animal question | nested; aggregate to per-animal (pseudobulk) before testing | EU = animal, cells are observational units | | Treatment delivered per cage (chow/water), several mice/cage | EU = cage; block or model cage as random | randomization acted on the cage | | Two factors of interest (genotype x drug) | factorial; estimate the interaction | main effects uninterpretable if interaction is large | | One factor fixed per run (incubator temp, sequencing lane) | split-plot; whole-plot = run, sub-plot = sample | two error strata; test whole-plot against whole-plot error | | Known batch/day nuisance, all conditions fit per block | RCBD; include block in the model | removes nuisance from error; "analyze as randomized" | | Plate holds fewer samples than conditions | incomplete block + include block term | balance preserves estimability | | Assigning samples to sequencing batches/lanes | -> experimental-design/batch-design | constrained sample-to-batch allocation lives there | | Regulated clinical trial randomization | -> clinical-biostatistics | confirmatory/regulated regime out of scope |
Goal: Identify the EU and therefore the true n before any power or analysis decision.
Approach: Trace the randomization: the EU is the smallest entity to which a treatment level was independently assigned. Anything measured below that level is an observational unit and is summarized (mean/sum) up to the EU, or modeled as a nested random effect — never counted as an independent replicate.
# Between-condition question with multiple cells per donor:
# the donor is the experimental unit, NOT the cell.
# Correct: aggregate observational units to the EU, then test on EU-level values.
library(dplyr)
eu_level <- cells |>
group_by(donor, condition) |>
summarise(value = mean(measurement), .groups = 'drop') # one row per experimental unit
# n for inference = number of donors per condition, not number of cells
Goal: Assign treatments to units with a documented random mechanism, optionally restricted to guarantee balance on known factors.
Approach: Use a seeded pseudo-random generator (never "haphazard" order, which aliases treatment with processing position/time). For known prognostic factors, restrict the randomization (block/stratify) and then include those factors in the model. When finite-sample imbalance matters, rerandomize against a pre-specified balance criterion (Morgan & Rubin 2012 Ann Stat 40:1263) or use minimization for sequential enrollment (Pocock & Simon 1975 Biometrics 31:103).
set.seed(20260528) # record the seed for reproducibility
units <- data.frame(id = sprintf('S%02d', 1:24),
block = rep(c('day1','day2','day3'), each = 8))
# Restricted (block) randomization: randomize treatment WITHIN each block
units$treatment <- ave(units$id, units$block,
FUN = function(ids) sample(rep(c('ctrl','treat'),
length.out = length(ids))))
# Also randomize RUN ORDER so processing position is not confounded with treatment
units$run_order <- sample(nrow(units))
Goal: Remove a known nuisance source from the error term to sharpen the treatment comparison.
Approach: Group units into homogeneous blocks (day, litter, chip, donor), randomize treatments within block, and add the block as a term in the model. The paired t-test is the special case of an RCBD with block size 2. Block only on factors with real between-block variation; blocking on a noise factor spends error df for nothing.
library(designit) # constrained assignment; verify API vs installed vignette
bc <- BatchContainer$new(dimensions = list(block = 3, position = 8))
bc <- assign_in_order(bc, samples = units)
bc <- optimize_design(
bc,
scoring = osat_score_generator(batch_vars = 'block',
feature_vars = c('treatment'))) # balance treatment across blocks
A split-plot has two experimental-unit sizes and therefore two error strata: a whole-plot factor that is hard to randomize finely (incubator temperature, the sequencing run/lane, the 10x chip, the staining batch) and a sub-plot factor randomized within each whole plot (the individual sample, the genotype). Analyzing a split-plot as a flat factorial uses the wrong, too-small error term for the whole-plot factor and gives anti-conservative tests for exactly the factor that was hardest to replicate. In genomics the lane/run/chip is almost always a whole plot; "batch effects" are frequently a split-plot structure to be modeled, not a nuisance to scrub.
Goal: Match the model's random-effects structure to the design's randomization structure.
Approach: Encode each randomization level as a random effect; fixed effects carry the questions. Crossed vs nested structure determines the denominator for each fixed effect; with few EUs use Satterthwaite or Kenward-Roger degrees of freedom (lmerTest / pbkrtest).
library(lme4); library(lmerTest)
# Whole plot = run (random); sub-plot factor = condition (fixed); cells nested in sample
fit <- lmer(expression ~ condition + (1 | run/sample), data = df) # run, and sample within run
anova(fit) # Satterthwaite df via lmerTest
A factorial design crosses factors so every observation informs every main effect ("hidden replication") and, uniquely, estimates interactions — the joint action one-factor-at-a-time (OFAT) cannot see. When an interaction is large, main effects are not interpretable alone; reporting a main effect while ignoring a strong interaction is the most common misreading of a 2x2 design. OFAT is less efficient and silently assumes additivity.
(1 | run)).| Threshold | Source | Rationale | |-----------|--------|-----------| | EU = level of independent treatment assignment | Hurlbert 1984; Lazic 2018 | defines the n for inference | | Paired design = RCBD with block size 2 | Fisher; standard | pairing is blocking | | Latin square needs n² runs for n treatments | standard design theory | controls two nuisances orthogonally | | Use Kenward-Roger/Satterthwaite df when EU count is small (roughly < ~10/group) | Kenward & Roger 1997 Biometrics 53:983 | naive F df are anti-conservative with few units | | Maximal random effects for confirmatory; prune for small samples | Barr 2013; Matuschek 2017 | Type-I protection vs convergence/power tradeoff |
| Error / symptom | Cause | Solution | |-----------------|-------|----------| | Significant result that will not replicate | pseudoreplication (cells as n) | aggregate to EU or add EU random effect | | Whole-plot factor over-significant | split-plot analyzed flat | two-stratum mixed model | | Treatment effect tracks processing order | no run-order randomization | seeded randomization of run order | | Blocked design analyzed without block term | "design but don't analyze" | include block in the model; analyze as randomized | | Main effect reported despite strong interaction | factorial misread | interpret simple effects within the interaction | | Mixed model singular fit | over-specified random effects | prune to the design-justified structure (Matuschek 2017) |
tools
--- name: bio-phasing-imputation-foundations description: Frames the phasing/imputation pipeline before any tool runs: phasing and imputation are one Li-Stephens copying HMM (recombination is the transition, mutation the emission, the genetic map and Ne set the rates), imputation's honest output is a dosage with a self-estimated quality (INFO/R2/DR2) not a hard genotype, and the stages are ordered and each fails silently (QC, align build and strand to the panel, phase, impute per chromosome, fil
tools
Chooses the enrichment generation before any tool runs, mapping the input shape to a method class - a pre-selected gene list plus a background to over-representation analysis (ORA, hypergeometric), a ranked statistic for all genes to gene set enrichment (GSEA), a signed signaling topology to pathway-topology (SPIA) - then making the null explicit (competitive vs self-contained, gene vs subject sampling) and running a trustworthiness checklist (testable-gene universe, FDR, redundancy collapse, leading-edge check, version reporting). Covers why every clusterProfiler GSEA is the inter-gene-correlation-uncorrected competitive null, why the background not the gene list decides ORA significance, and why no method is universally best. Use when deciding ORA vs GSEA vs topology, which gene-set DB, whether a result is trustworthy, or which null a tool computes. For ORA see go-enrichment, GSEA see gsea, databases kegg-pathways/reactome-pathways/wikipathways; the ranking comes from differential-expression/de-results.
testing
End-to-end GWAS workflow from VCF to association results. Covers PLINK QC, population structure correction, and association testing for case-control or quantitative traits. Use when running genome-wide association studies.
development
Orchestrates the full path from differential expression results to redundancy-collapsed functional enrichment: choose ORA vs GSEA, convert gene IDs per method, run enrichGO/enrichKEGG/enrichPathway/enrichWP or gseGO/gseKEGG (clusterProfiler, ReactomePA, rWikiPathways), and visualize. Routes the ORA-vs-GSEA generation fork and the null/universe/reproducibility theory to pathway-analysis/enrichment-foundations. Use when a DESeq2/edgeR/limma result must become enriched GO terms, KEGG/Reactome/WikiPathways pathways, or a GSEA leading edge; when deciding whether a ranking exists for all genes (GSEA, named decreasing vector) or only a pre-selected list (ORA plus a defensible background universe); or when assembling DE-to-pathway end to end. The DE list and ranking statistic come from differential-expression/de-results; per-method nuance lives in the pathway-analysis skills.