imaging-mass-cytometry/spatial-analysis/SKILL.md
Analyze spatial cell-cell interactions, neighborhoods, and niches in IMC/MIBI data with squidpy and imcRtools, covering neighborhood-enrichment permutation nulls, the abundance-vs-density confound, inhomogeneous Ripley's K, cellular-neighborhood discovery, graph-construction (contact vs proximity), and edge effects. Use when testing whether cell types co-locate, choosing a spatial null, building a neighbor graph, discovering tissue niches, or deciding whether a spatial pattern is real or a density/segmentation artifact.
npx skillsauth add GPTomics/bioSkills bio-imaging-mass-cytometry-spatial-analysisInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: squidpy 1.3+, scanpy 1.10+, anndata 0.10+, numpy 1.26+, imcRtools 1.8+ (R), spatstat 3.0+ (R)
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signaturespackageVersion('<pkg>') then ?function_name to verify parametersIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Notes specific to this skill: squidpy's nhood_enrichment z-score is unbounded and scales with graph degree, so it is NOT comparable across images of different cell counts. squidpy does not implement Ripley's K (its ripley and co_occurrence are loose analogs); the inhomogeneous K that conditions on tissue density lives in R spatstat::Kinhom/Kcross.inhom. Build the graph with gr.spatial_neighbors(coord_type='generic') before any test.
"Analyze spatial cell interactions in my IMC data" -> Build a neighbor graph, then test cell-type co-location against an explicitly chosen null, or discover recurrent niches.
squidpy.gr.spatial_neighbors, squidpy.gr.nhood_enrichment, squidpy.gr.co_occurrenceimcRtools::buildSpatialGraph, spatstat::Kcross.inhomA "spatial interaction" or "niche" is not an observation; it is a test against a null model, and which null is chosen (label-shuffle vs CSR vs density-conditioned vs patient-level) decides whether the result is real biology, a density gradient, a segmentation artifact, or upstream clustering. The label-permutation null (histoCAT, squidpy nhood_enrichment) holds cell positions fixed and shuffles identities: it DOES control global abundance (a rare type cannot show spurious enrichment just from being rare) but it does NOT control local density or tissue architecture -- so two cell types that merely share an anatomical compartment (both enriched in a follicle or at an invasive margin) score as strongly "interacting" with no direct affinity. This density confound is the dominant false-positive engine, and a finding significant under CSR but null under inhomogeneous Ripley's K is a density artifact, not an interaction. Two further facts compound it: the nhood_enrichment z-score is unbounded and graph-degree-dependent, so a z of 30 in a 50k-cell image and a z of 8 in a 5k-cell image are not on the same scale (never threshold a fixed z across heterogeneous images); and the replicate is the patient, not the cell or the image, so a per-cell test over tens of thousands of cells reports p~0 for trivial effects (pseudoreplication). The skill forces two questions onto every analysis: against which null, and at which unit.
| Method | Measures | Null / inference | Confound it does NOT control |
|--------|----------|------------------|------------------------------|
| histoCAT / squidpy nhood_enrichment | A neighbors B more/less than chance | within-image label shuffle; z-score | local density, architecture, edges |
| squidpy interaction_matrix | raw cluster-cluster edge counts | none (descriptive) | everything -- just counts |
| squidpy co_occurrence | P(B | A at distance d) / P(B) | none (descriptive curve) | needs its own CSR null |
| Ripley's K/L, cross-K, K_inhom | clustering vs dispersion over radii | CSR; K_inhom conditions on intensity | tissue inhomogeneity (unless K_inhom); ROI shape |
| Cellular Neighborhoods (CN) | recurrent local compositions (niches) | none -- clustering, not a test | window size = scale; doubly-derived |
| Spatial-LDA / UTAG | microenvironment topics / tissue domains | generative / unsupervised | topic/domain count arbitrary |
| Moran's I / Geary's C | spatial autocorrelation of a continuous mark | permutation / analytic | graph definition; global stat masks local |
| Question | Method | Why |
|----------|--------|-----|
| Do two named types co-locate more than chance? (fast screen, one image) | nhood_enrichment / histoCAT | abundance-aware permutation z |
| Same, but worried about density/architecture | inhomogeneous cross-K (Kcross.inhom) | conditions on the tissue's own intensity surface |
| At what scale is a type clustered/dispersed? | Ripley's K/L over radii (edge-corrected) | second-order structure across distance |
| What recurrent multicellular niches exist? (discovery) | Cellular Neighborhoods (sweep window k) | recurrent compositions; no built-in test |
| Is a continuous marker/score spatially structured? | Moran's I (global) / Geary's C (local) | autocorrelation |
| Does an interaction/niche differ between conditions? | hand off to differential-analysis | per-image summary -> patient unit -> mixed model + FDR |
Goal: Construct the graph whose definition matches the biological claim.
Approach: Delaunay approximates physical adjacency (contact/juxtacrine) but invents long edges across lumen/necrosis, so prune by a max distance; fixed radius gives true proximity (paracrine) at a stated micron scale; kNN silently mixes contact and proximity because fixed k spans microns in dense regions and hundreds of microns in sparse ones. Build the graph per image.
import squidpy as sq
import numpy as np
# contact graph: Delaunay, then prune edges longer than a biological max distance (um)
sq.gr.spatial_neighbors(adata, coord_type='generic', delaunay=True)
dist = adata.obsp['spatial_distances']
keep = dist.copy(); keep.data[keep.data > 30] = 0; keep.eliminate_zeros() # cap at ~30 um
adata.obsp['spatial_connectivities'] = (keep > 0).astype(float)
# OR proximity graph: fixed radius at a justified micron scale (paracrine range)
# sq.gr.spatial_neighbors(adata, coord_type='generic', radius=30.0)
Goal: Test co-location per image with the abundance-aware permutation null, knowing its blind spot.
Approach: Run nhood_enrichment per image (so the shuffle is within-image), keep the z as a per-image summary, and never threshold a fixed z across images of different size. Cross-check density-driven hits against inhomogeneous K.
per_image_z = {}
for img_id, idx in adata.obs.groupby('image_id').groups.items():
sub = adata[idx].copy()
sq.gr.spatial_neighbors(sub, coord_type='generic', delaunay=True)
sq.gr.nhood_enrichment(sub, cluster_key='cell_type', seed=0)
per_image_z[img_id] = sub.uns['cell_type_nhood_enrichment']['zscore']
# aggregate these per-image summaries to the PATIENT unit in differential-analysis, not here
Goal: Find recurrent local cell-type compositions, treating them as exploratory.
Approach: Per cell, summarize the composition of its window of neighbors, then cluster the windows. The window size IS the spatial scale and is almost always unjustified, so sweep it and report that the biological conclusion survives k in {10, 20, 30}. A CN has no built-in significance test; significance enters only as a cross-condition comparison (differential-analysis).
from sklearn.cluster import KMeans
import pandas as pd
def cellular_neighborhoods(adata, k_window=20, n_cn=10):
sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=k_window) # window = k neighbors
conn = adata.obsp['spatial_connectivities']
onehot = pd.get_dummies(adata.obs['cell_type']).values
comp = (conn @ onehot) # neighbor composition per cell
comp = comp / comp.sum(axis=1, keepdims=True).clip(min=1)
return KMeans(n_clusters=n_cn, random_state=0).fit_predict(comp)
adata.obs['CN'] = cellular_neighborhoods(adata, k_window=20) # sweep k_window to check stability
Trigger: reporting a z-score as a "significant interaction." Mechanism: the label-shuffle null absorbs abundance but not local density, so co-compartmentalized types score as interacting. Symptom: every type pair sharing a region looks attracted. Fix: name the null; cross-check with inhomogeneous cross-K; or shuffle within a compartment, not the whole image.
Trigger: calling z>2 "significant" across images of different cell counts. Mechanism: the z is unbounded and scales with graph degree. Symptom: large images dominate; small images never reach threshold. Fix: rank within image or convert to an effect size; aggregate per-image summaries to the patient unit.
Trigger: kNN graph results called "contact," or Delaunay across tissue gaps. Mechanism: fixed k mixes contact and proximity by density; Delaunay invents long edges across voids. Symptom: phantom interactions across lumen/necrosis. Fix: prune Delaunay by a max distance; use fixed radius for paracrine claims; state the micron scale.
Trigger: homogeneous-Poisson K on structured tissue. Mechanism: CSR assumes constant intensity, so everything tests as clustered. Symptom: universal "clustering". Fix: inhomogeneous K (Kinhom/Kcross.inhom) that divides by the estimated intensity surface.
Trigger: treating a CN as established biology. Mechanism: it is doubly-derived (cluster cells -> cluster windows) with an arbitrary window and k, and no significance test. Symptom: different labs report different niches on the same tissue. Fix: sweep window k; report stability (NMI/ARI); validate niche-defining markers against raw images for spillover.
| Threshold | Source | Rationale | |-----------|--------|-----------| | CN window = 10 nearest neighbors, 9 CNs retained | Schurch 2020 Cell 182:1341 | the canonical CN convention -- sweep it, do not adopt blindly | | CODEX i-niches: Delaunay 1st-tier, k-means = 100 | Goltsev 2018 Cell 174:968 | window/scale is a choice, not a default | | n_perm >= 10,000 for small corrected p | Schapiro 2017 Nat Methods 14:873 | n_perm=1000 floors p at ~1/1001, too coarse after FDR | | BH-FDR across ~C(C+1)/2 type pairs x radii | multiplicity | ~200 pairs at p<0.05 guarantees false positives | | Prune Delaunay at a biological max distance (e.g. ~30 um) | graph hygiene | removes edges across acellular voids |
| Error / symptom | Cause | Solution | |-----------------|-------|----------| | "Significant interaction" that is two types in one compartment | density confound under label-shuffle | inhomogeneous cross-K; shuffle within compartment | | p~0 across 50k cells | cell-level pseudoreplication | per-image summary -> patient unit (differential-analysis) | | Niche changes with the window/k | window IS the scale | sweep k_window and n_cn; report stability | | Boundary cells dominate a small ROI | ignored edge effects | edge-corrected K; erode/buffer the ROI interior | | Many "significant" pairs | no multiple-testing correction | BH-FDR across all pairs and radii |
testing
Analyze multi-modal single-cell data (CITE-seq, Multiome, spatial). Use when working with data that measures multiple modalities per cell like RNA + protein or RNA + ATAC. Use when analyzing CITE-seq, Multiome, or other multi-modal single-cell data.
data-ai
Analyze metabolite-mediated cell-cell communication using MeboCost for metabolic signaling inference between cell types. Predict metabolite secretion and sensing patterns from scRNA-seq data. Use when studying metabolic crosstalk between cell populations or metabolite-receptor interactions.
development
Find marker genes and annotate cell types in single-cell RNA-seq using Seurat (R) and Scanpy (Python). Use for differential expression between clusters, identifying cluster-specific markers, scoring gene sets, and assigning cell type labels. Use when finding marker genes and annotating clusters.
development
Reconstruct cell lineage trees from CRISPR barcode tracing or mitochondrial mutations. Use when studying clonal dynamics, cell fate decisions, or developmental trajectories.