flow-cytometry/gating-analysis/SKILL.md
Defines cell populations in flow and spectral cytometry through manual gates (rectangle, polygon, quadrant, boolean) and reproducible automated gating (openCyto gating templates, flowDensity data-driven thresholds, flowClust model-based gates), organized as a hierarchical GatingSet (flowWorkspace) and round-tripped with FlowJo via CytoML. Covers the canonical gate order (time -> debris -> singlets -> live -> lineage), FMO-vs-isotype boundary setting, gate-order dependence and recompute semantics, rare-event/MRD gating, and per-population statistics. Use when building a gating strategy, automating a manual FlowJo scheme across samples, choosing manual vs data-driven gates, or extracting population frequencies.
Reference examples tested with: flowWorkspace 4.14+, openCyto 2.14+, flowDensity 1.36+, flowCore 2.14+, CytoML 2.14+.
Before using code patterns, verify installed versions match. If versions differ:
- Run packageVersion('<pkg>'), then ?function_name, to verify parameters.
- openCyto gating-method names drift across versions - confirm with gt_list_methods() on the installed package (e.g. gate_flowclust_2d vs flowClust.2d). Adapt rather than retrying.
"Gate my data to identify cell populations" -> Define populations by drawing boundaries in marker space, organized as a hierarchy, manually or with reproducible data-driven methods.
- Manual: flowCore gates -> flowWorkspace::GatingSet -> gs_pop_add -> recompute
- Data-driven: openCyto gating template (CSV) or flowDensity::deGate

The position of a positive/negative boundary is governed by SPREADING ERROR - the variance that every other bright fluorophore spills into the channel of interest - NOT by nonspecific antibody binding (Roederer 2001 Cytometry 45:194). An FMO control (full panel minus the one channel) reproduces exactly that spreading and is the correct way to set the gate; an isotype control addresses only nonspecific binding, has a different total fluorochrome load, and sits in the wrong place. Isotypes are deprecated for boundary-setting (still fine for a qualitative new-reagent check).

Equally load-bearing is gate ORDER: time -> debris (FSC/SSC) -> singlets (FSC-A vs FSC-H) -> live/dead -> lineage. This is a funnel that removes the broadest, least-specific contaminants first (time instability corrupts ALL channels; doublets are scatter-normal AND viable AND double-positive; dead cells bind antibody nonspecifically) so each narrower downstream gate operates on clean input. Reorder it - gate lineage before singlets - and artifacts are baked into the result that no later gate can remove.
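The FMO argument above can be illustrated with a base-R sketch on synthetic data. It assumes, purely for illustration, that negative events are roughly normal on the transformed scale and that the boundary is placed at the 99.5th percentile of a control; the percentile, the simulated spreads, and the helper name `fmo_threshold` are assumptions, not values from any real panel.

```r
# Synthetic illustration only: both vectors are simulated, not measured.
set.seed(1)

# Negative events in the channel of interest when the other fluorophores are
# absent (narrow distribution, no spillover spreading).
narrow_negative <- rnorm(10000, mean = 0, sd = 0.2)

# The same negative events in an FMO tube: every other bright fluorophore
# spreads into the channel and widens the distribution.
fmo_negative <- rnorm(10000, mean = 0, sd = 0.6)

# Hypothetical helper: place the boundary at an upper percentile of a control.
fmo_threshold <- function(control_values, prob = 0.995) {
  unname(quantile(control_values, probs = prob))
}

threshold_narrow <- fmo_threshold(narrow_negative)
threshold_fmo <- fmo_threshold(fmo_negative)

# Fraction of truly negative, fully stained events miscalled positive when the
# boundary comes from the narrow control instead of the FMO.
false_positive_rate <- mean(fmo_negative > threshold_narrow)
```

In this toy setup the boundary taken from the narrow control sits well inside the spread of the fully stained negatives, so a sizeable share of negative events would be called positive; the FMO-derived boundary moves outward with the spreading.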
| Method | Citation | Mechanism | When to use |
|--------|----------|-----------|-------------|
| openCyto | Finak 2014 PLoS Comput Biol 10:e1003806 | CSV gatingTemplate + per-gate algorithms | reproduce a manual SOP across many samples; human-readable + automated |
| mindensity (openCyto) | - | KDE valley between two peaks | clear bimodal marker, 1D cut |
| tailgate (openCyto) | - | KDE-derivative tail onset | rare positive tail, no clean second peak |
| quantileGate (openCyto) | - | cut at a fixed event quantile | threshold should track a fraction |
| flowDensity | Malek 2015 Bioinformatics 31:606 | sequential bivariate density cutoffs | reproduce an entire predefined manual strategy |
| flowClust / gate_flowclust_2d | Lo 2009 BMC Bioinformatics 10:145 | t-mixture + Box-Cox, K by BIC | overlapping elliptical populations |
| DAFi | Lee 2018 Cytometry A 93:597 | recursive filter + clustering on a hierarchy | discovery WITH interpretability |
Rule of thumb: 1D bimodal -> mindensity; rare tail -> tailgate; overlapping ellipses -> flowClust (gate_flowclust_2d, named flowClust.2d in some openCyto versions); replicate a full manual SOP -> flowDensity; discovery-with-interpretability -> DAFi.
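To make the "KDE valley between two peaks" mechanism concrete, here is a minimal base-R sketch of the idea on simulated bimodal data. It is not openCyto's mindensity implementation; the helper name `valley_cut`, the bandwidth adjustment, and the simulated marker values are all assumptions for illustration.

```r
# Simulated 1D marker with a negative and a positive mode (synthetic data).
set.seed(1)
marker <- c(rnorm(5000, mean = 1, sd = 0.3), rnorm(3000, mean = 3, sd = 0.4))

# Hypothetical helper: cut at the density minimum between the two tallest peaks.
valley_cut <- function(x, adjust = 2) {
  d <- density(x, adjust = adjust, n = 1024)
  slope_sign <- sign(diff(d$y))
  # A peak is where the slope flips from rising (+1) to falling (-1).
  peaks <- which(diff(slope_sign) == -2) + 1
  if (length(peaks) < 2) return(NA_real_)  # not bimodal: no valley to find
  top_two <- sort(peaks[order(d$y[peaks], decreasing = TRUE)[1:2]])
  between <- top_two[1]:top_two[2]
  d$x[between[which.min(d$y[between])]]
}

cut_point <- valley_cut(marker)
positive_fraction <- mean(marker > cut_point)
```

The helper returns `NA` when only one peak is found, which mirrors the rule of thumb: without a clean second peak a valley-based cut has nothing to anchor on, and a tail-based method is the better fit.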
Goal: Apply gates in the canonical order and extract population statistics.
Approach: Build a GatingSet, add gates parent-by-parent, then recompute() - WITHOUT it, child populations are empty. Gates apply on the TRANSFORMED scale if the GatingSet is transformed.
library(flowWorkspace); library(flowCore)
# fs: a flowSet/cytoset loaded beforehand (compensated, and transformed if gating on that scale)
gs <- GatingSet(fs)
# matrix dimnames preserve 'FSC-A'/'FSC-H'; data.frame() would mangle them to FSC.A
singlet <- polygonGate(filterId = 'singlets', .gate = matrix(
  c(2e4, 1e4, 25e4, 2e5, 25e4, 26e4, 2e4, 4e4), ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c('FSC-A', 'FSC-H'))))
gs_pop_add(gs, singlet, parent = 'root')  # population takes its name from filterId
# 'CD3' must match a channel name in the data; bounds are on the transformed scale
gs_pop_add(gs, rectangleGate(filterId = 'CD3+', CD3 = c(1.5, Inf)), parent = 'singlets')
recompute(gs) # REQUIRED - else children are empty
gs_pop_get_stats(gs, type = 'count')
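As a conceptual model of what parent-by-parent gating computes, the sketch below builds membership as nested logical masks in base R. It does not use or reproduce flowWorkspace internals; the simulated events, the height/area ratio cutoff of 0.7, and the CD3 bound of 1.5 are assumptions chosen only to show that a child population is always evaluated inside its parent.

```r
# Synthetic events: 9000 singlet-like and 1000 doublet-like (area about 2x height).
set.seed(42)
n_single <- 9000
n_doublet <- 1000
n_total <- n_single + n_doublet
events <- data.frame(
  fsc_a = c(rnorm(n_single, 1.0e5, 1e4), rnorm(n_doublet, 2.0e5, 2e4)),
  fsc_h = c(rnorm(n_single, 9.5e4, 1e4), rnorm(n_doublet, 1.0e5, 1e4)),
  # Alternate CD3-negative (0.5) and CD3-positive (2.5) events, transformed scale.
  cd3 = rnorm(n_total, mean = rep(c(0.5, 2.5), length.out = n_total), sd = 0.3)
)

# Each population is its parent's mask AND its own gate.
in_root <- rep(TRUE, n_total)
in_singlets <- in_root & (events$fsc_h / events$fsc_a > 0.7)
in_cd3 <- in_singlets & (events$cd3 > 1.5)

stats <- data.frame(
  pop = c("root", "singlets", "CD3+"),
  count = c(sum(in_root), sum(in_singlets), sum(in_cd3))
)
# Percent of parent: root and singlets relative to root, CD3+ relative to singlets.
stats$percent_of_parent <- 100 * stats$count / stats$count[c(1, 1, 2)]

# Gating CD3 on all events instead would also count doublet-like events.
cd3_ungated <- sum(events$cd3 > 1.5)
```

Comparing `cd3_ungated` with the CD3+ count in `stats` shows the effect of gate order: skipping the singlet gate leaves doublet-like events in the CD3+ count, and no later gate removes them.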
Goal: Apply a reproducible, declarative gating strategy across all samples.
Approach: A CSV template (alias/pop/parent/dims/gating_method/gating_args) defines the hierarchy; gt_gating applies it. Confirm method names with gt_list_methods().
library(openCyto); library(data.table)
tmpl <- fread('
alias,pop,parent,dims,gating_method,gating_args
nonDebris,+,root,FSC-A,mindensity,
singlets,+,nonDebris,"FSC-A,FSC-H",singletGate,
live,-,singlets,"Live_Dead",mindensity,
CD3,+,live,CD3,mindensity,
CD4CD8,+,CD3,"CD4,CD8",gate_flowclust_2d,K=2
')
gt <- gatingTemplate(tmpl)
gs <- GatingSet(fs)
gt_gating(gt, gs)
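Before handing a template to openCyto, a quick base-R pre-flight check can catch a mistyped parent. The sketch below is an assumption-laden helper, not part of openCyto: `orphan_parents` is a hypothetical name, and it assumes each parent is given either as an alias or as a path whose last element is an alias.

```r
# Same template columns as above, read with base R instead of data.table.
template_csv <- 'alias,pop,parent,dims,gating_method,gating_args
nonDebris,+,root,FSC-A,mindensity,
singlets,+,nonDebris,"FSC-A,FSC-H",singletGate,
live,-,singlets,"Live_Dead",mindensity,
CD3,+,live,CD3,mindensity,
CD4CD8,+,CD3,"CD4,CD8",gate_flowclust_2d,K=2
'
tmpl_check <- read.csv(text = template_csv, stringsAsFactors = FALSE)

# Hypothetical helper: return aliases whose parent is neither 'root' nor an
# alias defined somewhere in the template.
orphan_parents <- function(tmpl) {
  known <- c("root", tmpl$alias)
  parent_alias <- basename(tmpl$parent)  # tolerate '/a/b' style parent paths
  tmpl$alias[!(parent_alias %in% known)]
}

orphans <- orphan_parents(tmpl_check)

# A deliberately broken copy: 'nonDebri' is a typo for 'nonDebris'.
broken <- tmpl_check
broken$parent[broken$alias == "singlets"] <- "nonDebri"
broken_orphans <- orphan_parents(broken)
```

An empty result means every gate hangs off a defined population; it says nothing about whether the gating_method names exist in the installed openCyto, which still needs gt_list_methods().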
Goal: Detect a rare population (e.g. MRD at 1e-4 to 1e-5).
Approach: Unsupervised clustering FAILS here (at a frequency of 1e-5, even 1e6 acquired cells yield only ~10 target events, invisible to density/SOM); MRD stays supervised/template-gated. Compute the acquisition depth needed from the target sensitivity and the ~50-event Poisson rule BEFORE acquiring; never downsample.
# Need ~50-60 target events for CV < ~15% (Poisson CV = 1/sqrt(N)); sensitivity 1e-5 => acquire ~5e6 cells.
target_sensitivity <- 1e-5
events_needed <- ceiling(50 / target_sensitivity) # cells to acquire
# Gate the rare population with a prespecified template; report observed LOD from cells acquired.
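The Poisson arithmetic behind the ~50-event rule can be worked through in a few lines of base R. The helper names (`poisson_cv`, `cells_needed`, `observed_lod`) are hypothetical, and the sketch assumes pure counting noise with CV = 1/sqrt(N), ignoring losses from upstream gates.

```r
# Counting CV for N target events under Poisson statistics.
poisson_cv <- function(n_events) 1 / sqrt(n_events)

# Cells to acquire so that a population at `frequency` yields `min_events`.
# round() guards against floating-point error before taking the ceiling.
cells_needed <- function(frequency, min_events = 50) {
  ceiling(round(min_events / frequency, 6))
}

# Lowest frequency reportable from the cells actually acquired.
observed_lod <- function(cells_acquired, min_events = 50) {
  min_events / cells_acquired
}

depth_table <- data.frame(sensitivity = c(1e-3, 1e-4, 1e-5))
depth_table$cells_needed <- cells_needed(depth_table$sensitivity)

cv_at_50 <- poisson_cv(50)        # about 0.14
lod_at_1e6 <- observed_lod(1e6)   # 5e-5: 1e6 cells do not support a 1e-5 claim
```

Because upstream gates (debris, doublets, dead cells) discard events, the acquired count should be sized on cells that survive to the parent gate, so the numbers here are a lower bound on what to collect.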
Trigger: querying stats right after gs_pop_add. Mechanism: membership not computed. Symptom: zero counts. Fix: recompute(gs).
Trigger: raw-scale gate values on a transformed GatingSet (or vice versa). Mechanism: scale mismatch. Symptom: gate in the wrong place / empty. Fix: set gate values on the same (transformed) scale the GS uses.
Trigger: isotype control to set positivity. Mechanism: spreading error, not nonspecific binding, sets the edge. Symptom: wrong negative boundary. Fix: use FMO.
Trigger: FlowSOM for a 1e-5 population. Mechanism: too few events. Symptom: rare pop absorbed into a neighbor. Fix: supervised/template gating; size acquisition for the Poisson floor.
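The scale-mismatch pitfall above can be shown with a small base-R sketch. It assumes an arcsinh transform with a cofactor of 150; both are assumptions for illustration. A GatingSet may instead use logicle or biexponential, in which case the same conversion should go through that transform's own function and inverse.

```r
cofactor <- 150  # assumption: arcsinh cofactor, not taken from any GatingSet

to_transformed <- function(raw, cofactor) asinh(raw / cofactor)
to_raw <- function(transformed, cofactor) sinh(transformed) * cofactor

# A boundary chosen on the raw scale ...
raw_threshold <- 1000
# ... must be converted before it is used on transformed data (about 2.6 here).
transformed_threshold <- to_transformed(raw_threshold, cofactor)

# Four example raw intensities and their transformed values.
raw_values <- c(50, 500, 5000, 50000)
transformed_values <- to_transformed(raw_values, cofactor)

# Wrong: raw-scale boundary applied to transformed data gates nothing.
n_mismatched <- sum(transformed_values > raw_threshold)
# Right: boundary on the same scale as the data recovers the raw-scale answer.
n_matched <- sum(transformed_values > transformed_threshold)
```

The mismatched gate returns zero events, which is the "empty gate" symptom; the matched gate selects the same two events that exceed 1000 on the raw scale.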
| Threshold | Source | Rationale |
|-----------|--------|-----------|
| ~50-60 events for CV < 15% | Poisson statistics | rare-event detection floor |
| sensitivity 1e-5 needs ~5e6 cells | Poisson floor | 50 events / 1e-5 = 5e6 cells to acquire |
| FMO for boundary, not isotype | Roederer 2001; Maecker & Trotter 2006 | spreading error dominates the boundary |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| zero counts in children | no recompute() | call it after adding gates |
| gt_gating method not found | version-renamed method | check gt_list_methods() |
| filter() vs Subset() confusion | filter() returns a membership result (mask); Subset() returns the gated events | use Subset(ff, gate) for the population |
| FlowJo .jo won't import | only .wsp supported | re-save as wsp; use CytoML |