imaging-mass-cytometry/quality-metrics/SKILL.md
Quality control for IMC/MIBI data across pixel, channel, image, slide, and batch levels, covering Poisson-count SNR (cell-level Gaussian-mixture and empty-channel comparison), spillover-matrix QC (the three physical sources), drift and the missing EQ-bead analog, acquisition artifacts, and sample-of-origin batch effects. Use when deciding whether to keep or drop a channel, ROI, or slide, distinguishing a dim antibody from a failed one, reading a spillover matrix, or diagnosing batch-driven clustering before analysis.
npx skillsauth add GPTomics/bioSkills bio-imaging-mass-cytometry-quality-metricsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: numpy 1.26+, scikit-learn 1.4+, scanpy 1.10+, CATALYST 1.28+ (R), spillR 1.0+ (R)
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signaturespackageVersion('<pkg>') then ?function_name to verify parametersIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Notes specific to this skill: IMC values are integer ion (dual) counts, so SNR must respect Poisson statistics, not fluorescence intuition; biology lives in 1-2 count differences. There is no EQ-bead in-line drift normalizer for ablated tissue. CATALYST normCytof is for SUSPENSION bead normalization, not IMC images -- do not apply it to image data. Compensate raw pixels before transformation.
"Assess the quality of my IMC acquisition" -> Gate the data at the level each failure lives at -- pixel, channel, image, slide, batch -- before analysis, not by normalizing after.
numpy/scikit-learn for SNR, artifacts, batch diagnosisCATALYST::plotSpillmat, spillR for spillover QCIMC/MIBI QC is not one number. The data live in a Poisson ion-count regime where "noise" has a defined statistical meaning, and the failures that actually destroy an experiment -- a dead antibody, an unbalanced batch, cells that cluster by which slide they came from rather than by phenotype -- are panel/staining/batch problems that are invisible to per-image SNR. So a single metric is always blind at some level, and the discipline is to gate (drop a channel, ROI, or slide) before analysis rather than normalize after, because correction moves a distribution but never creates the positive/negative separation that staining never produced. Three corollaries a postdoc internalizes. (1) Counts are Poisson: a real floor-abundance epitope genuinely yields a few counts, so "2 counts" is signal or noise depending on dwell, area, and the aggregation level -- SNR grows as sqrt(N) when pixels are pooled into a cell. (2) Dim is not failed: a correctly-titrated antibody to a sparse antigen is supposed to be dim; failure is INSEPARABILITY of positive and negative populations, judged against a known-empty channel, not low absolute intensity. (3) IMC has no EQ-bead drift normalizer -- ablated fixed tissue cannot be spiked with calibration beads, so the only honest cross-batch yardstick is an anchor reference sample included in every run (Casanova 2025), and the absence of an in-line standard is itself expert knowledge.
| Level | What to measure | Characteristic failure | Blind to | |-------|-----------------|------------------------|----------| | Pixel | hot pixels, shot noise, dynamic range | detector spikes; Poisson noise on dim signal | whether the channel is biologically real | | Channel (marker) | cell-level SNR, spillover in/out, vs empty channel | dead antibody, crosstalk, oxide/+-1 leak | spatial artifacts, batch | | Image / ROI | mean intensity, cell coverage, ablation completeness | failed ablation, folding, off-target ROI | cross-sample comparability | | Slide / acquisition | detector drift over time, tune (Lu duals) | within-run sensitivity decay, mis-tune | between-slide offset | | Batch / cohort | sample-of-origin clustering, lot effects | the cohort clusters by batch not biology | nothing -- the top level |
| Observation | Decision | Basis | |-------------|----------|-------| | Cell-level positive/negative mixture won't separate; signal ~ empty/80ArAr channel | DROP (failed antibody) | inseparability, not intensity | | Low absolute counts but clean separation, pattern matches biology + control tissue | KEEP (dim-but-real); use at aggregated levels | dim != failed | | High signal, low SNR (everything "positive") | DROP or re-titrate | non-specific binding | | Heavy +16 oxide or impurity from a co-expressed partner | DROP or re-mass the panel | unrescuable by compensation | | Striping / incomplete-ablation banding | DROP the ROI | physical failure, not correctable noise | | DNA/Ir dropout over a region | MASK the region, keep the rest | non-ablation/tissue loss | | Tune fails (Lu duals below panel criterion) | RE-TUNE / re-acquire | instrument not in spec | | Cells cluster by slide/patient not phenotype | batch-correct; if it won't mix, the contrast is confounded | sample-of-origin effect |
Goal: Judge marker adequacy at the unit of analysis (the cell), in a count-aware way.
Approach: Fit a two-component Gaussian mixture to per-cell mean counts (on non-transformed counts) and take mean(positive)/mean(negative); a failed antibody is one whose components do not separate, regardless of brightness. Compare the distribution to a known-empty channel as the operational "did this antibody work" test.
import numpy as np
from sklearn.mixture import GaussianMixture
def cell_snr(per_cell_counts):
# two-component mixture on raw per-cell means: separation, not brightness, defines success
gm = GaussianMixture(n_components=2, random_state=0).fit(per_cell_counts.reshape(-1, 1))
pos, neg = np.sort(gm.means_.ravel())[::-1]
return pos / neg if neg > 0 else np.inf
def matches_empty(channel_counts, empty_channel_counts, q=95, tol=2.0):
# compare the POSITIVE tail (q-th percentile), not the median: a real-but-sparse marker
# carries its signal in the tail while a failed channel's tail sits at the empty floor.
# True -> indistinguishable from 80ArAr / an unconjugated lanthanide -> the honest "failed" test
return np.percentile(channel_counts, q) - np.percentile(empty_channel_counts, q) <= tol
Goal: Decide whether a panel's crosstalk is acceptable before compensating.
Approach: Generate the matrix from single-stain controls and read whole rows, not just neighbors -- spillover has three physically distinct sources with different mass signatures and different fixes. Acceptability is co-expression-dependent: the same percentage is fine between unrelated markers and fatal between co-expressed ones.
library(CATALYST)
sce <- readSCEfromTXT('spillover/') # single-metal-spotted slides; filenames carry the metal
sce <- prepData(sce, transform = TRUE, cofactor = 5)
sce <- assignPrelim(sce); sce <- applyCutoffs(estCutoffs(sce))
sm <- computeSpillmat(sce)
plotSpillmat(sce, sm) # inspect M+-1 (abundance), M+16 (oxide), and
# any bright off-diagonal at a NON-adjacent mass (impurity)
Goal: Catch the dominant real-world failure -- cells clustering by slide/patient rather than phenotype -- which no per-image metric reports.
Approach: Embed cells and color by patient, slide, day, and antibody lot; if cells separate by sample, there is a batch problem. Diagnose before correcting, and correct at the batch layer with an anchor reference sample.
import scanpy as sc
sc.pp.pca(adata); sc.pp.neighbors(adata); sc.tl.umap(adata)
sc.pl.umap(adata, color=['patient', 'slide', 'acquisition_day', 'antibody_lot'])
# separation by these = batch, not biology; a dead/unbalanced channel cannot be normalized into life
Trigger: discarding low-count channels by fluorescence intuition. Mechanism: at 1 um^2/~1 ms dwell a floor-abundance epitope yields a few counts; biology lives in 1-2 count differences. Symptom: real dim markers dropped. Fix: judge adequacy at the aggregation level analyzed; SNR scales sqrt(N) with pooled pixels; mean expression > ~7 is effectively noise-immune.
Trigger: flagging high pixel-channel Pearson correlation as spillover. Mechanism: co-expressed real markers correlate too; spillover is a directional, mass-structured leak. Symptom: false spillover calls, missed real ones. Fix: read the single-stain spillover matrix; diagnose by mass signature (+-1, +16, named impurity mass), not correlation.
Trigger: trusting compensation on very bright donors or co-expressed pairs. Mechanism: the matrix is linear only in the linear range and cannot separate real co-expression from leak. Symptom: over/under-shoot; subtracted real biology. Fix: keep total per-pair spillover low by panel design; use NNLS (CATALYST) or flag-and-replace (spillR); the real fix is upstream mass assignment.
Trigger: passing per-image SNR and skipping cross-sample QC. Mechanism: FFPE/ischemia/lot variation shifts baselines by batch. Symptom: unsupervised analysis groups by sample-of-origin. Fix: UMAP/ridgeline by patient/slide/day/lot; include anchor samples; correct at the batch layer.
| Threshold | Source | Rationale | |-----------|--------|-----------| | Mean expression > ~7 ~ noise-immune | Lu 2023 Nat Commun 14:1601 | below it shot noise perturbs per-cell values | | Abundance sensitivity (M+-1) < 0.3% (Tb) | Han 2018 Nat Protoc 13:2121 | TOF peak-tail spec; tuning target, not guarantee | | Oxide (M+16) < 3% (La) | Han 2018 Nat Protoc 13:2121 | plasma-oxide spec; worst for abundant structural markers | | Isotopic impurity up to ~4% at a named mass | Han 2018 Nat Protoc 13:2121 | not predictable from mass proximity -- read the lot | | Tune ~ Lu >= 1500 dual counts | panel-specific convention | a pass criterion, stated in dual counts (unit matters) | | Pixel foreground (Otsu) signal < 2/image -> flag | steinbock/IMCDataAnalysis | image-level marker filter |
Where the field has NO accepted threshold (itself expert knowledge): a universal SNR cutoff for a "good marker"; a single spillover percentage defining an "acceptable panel" (acceptability is co-expression-dependent); an in-line pixel-level drift-normalization standard equivalent to EQ beads; a fixed hot-pixel count threshold (DIMR/KNN are adaptive precisely because a fixed cutoff fails across brightnesses).
| Error / symptom | Cause | Solution | |-----------------|-------|----------| | Dropped a real sparse marker | judged by absolute intensity | drop on inseparability + match-to-empty + wrong spatial pattern | | Spillover "fixed" but double-positives persist | compensated co-expressed/saturated channel | re-mass the panel; NNLS/spillR; compensate raw pixels | | Cohort clusters by patient | unaddressed batch | anchor reference sample; diagnose before correcting | | Threshold "1500" or "2" ambiguous | unit omitted | always state dual counts; thresholds are panel/instrument-specific | | Striped ROI "denoised" | physical ablation failure treated as noise | drop the ROI | | Indium nuclear signal taken as a marker (MIBI) | In localizes to nuclei | treat as artifact unless validated; use 197Au + background masking |
development
Find restriction enzyme cut sites in DNA sequences using Biopython Bio.Restriction. Search with single enzymes, batches of enzymes, or commercially available enzyme sets. Returns cut positions for linear or circular DNA. Use when finding restriction enzyme cut sites in sequences.
development
Create restriction maps showing enzyme cut positions on DNA sequences using Biopython Bio.Restriction. Visualize cut sites, calculate distances between sites, and generate text or graphical maps. Use when creating or analyzing restriction maps.
development
Analyze restriction digest fragments using Biopython Bio.Restriction. Predict fragment sizes, get fragment sequences, simulate gel electrophoresis patterns, and perform double digests. Use when analyzing restriction digest fragment patterns.
development
Select restriction enzymes by criteria using Biopython Bio.Restriction. Find enzymes that cut once, don't cut, produce specific overhangs, are commercially available, or have compatible ends for cloning. Use when selecting restriction enzymes for cloning or analysis.