skills/spatial-transcriptomics/SKILL.md
ToolUniverse workflow — Spatial Transcriptomics
npx skillsauth add lamm-mit/scienceclaw spatial-transcriptomicsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Comprehensive analysis of spatially-resolved transcriptomics data to understand gene expression patterns in tissue architecture context. Combines expression profiling with spatial coordinates to reveal tissue organization, cell-cell interactions, and spatially variable genes.
Triggers:
Example Questions This Skill Solves:
| Capability | Description | |-----------|-------------| | Data Import | 10x Visium, MERFISH, seqFISH, Slide-seq, STARmap, Xenium formats | | Quality Control | Spot/cell QC, spatial alignment verification, tissue coverage | | Normalization | Spatial-aware normalization accounting for tissue heterogeneity | | Spatial Clustering | Identify spatial domains with similar expression profiles | | Spatial Variable Genes | Find genes with non-random spatial patterns | | Neighborhood Analysis | Cell-cell proximity, spatial neighborhoods, niche identification | | Spatial Patterns | Gradients, boundaries, hotspots, expression waves | | Integration | Merge with scRNA-seq for cell type mapping | | Ligand-Receptor Spatial | Map cell communication in tissue context | | Visualization | Spatial plots, heatmaps on tissue, 3D reconstruction |
Input: Spatial Transcriptomics Data + Tissue Image
|
v
Phase 1: Data Import & QC
|-- Load spatial coordinates + expression matrix
|-- Load tissue histology image
|-- Quality control per spot/cell
|-- Filter low-quality spots
|-- Align spatial coordinates to tissue
|
v
Phase 2: Preprocessing
|-- Normalization (spatial-aware methods)
|-- Highly variable gene selection
|-- Dimensionality reduction (PCA)
|-- Spatial lag smoothing (optional)
|
v
Phase 3: Spatial Clustering
|-- Identify spatial domains/regions
|-- Graph-based clustering with spatial constraints
|-- Annotate domains with marker genes
|-- Visualize domains on tissue
|
v
Phase 4: Spatial Variable Genes
|-- Test for spatial autocorrelation (Moran's I, Geary's C)
|-- Identify genes with spatial patterns
|-- Classify pattern types (gradient, hotspot, boundary)
|-- Rank by spatial significance
|
v
Phase 5: Neighborhood Analysis
|-- Define spatial neighborhoods (k-NN, radius)
|-- Calculate neighborhood composition
|-- Identify interaction zones
|-- Niche characterization
|
v
Phase 6: Integration with scRNA-seq
|-- Cell type deconvolution per spot
|-- Map cell types to spatial locations
|-- Predict cell type spatial distributions
|-- Validate with marker genes
|
v
Phase 7: Spatial Cell Communication
|-- Identify proximal cell type pairs
|-- Query ligand-receptor database (OmniPath)
|-- Score spatial interactions
|-- Map communication hotspots
|
v
Phase 8: Generate Spatial Report
|-- Tissue overview with domains
|-- Spatially variable genes
|-- Cell type spatial maps
|-- Interaction networks in tissue context
|-- 3D visualization (if applicable)
Objective: Load spatial data and assess quality.
Supported platforms:
10x Visium (most common):
MERFISH/seqFISH (imaging-based):
Slide-seq/Slide-seqV2:
Xenium (10x single-cell spatial):
Data loading (Visium):
def load_visium_data(data_dir):
"""
Load 10x Visium spatial transcriptomics data.
Expected structure:
data_dir/
├── filtered_feature_bc_matrix/
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ └── matrix.mtx.gz
├── spatial/
│ ├── tissue_positions_list.csv
│ ├── scalefactors_json.json
│ └── tissue_hires_image.png
Returns: AnnData object with spatial coordinates
"""
import scanpy as sc
import pandas as pd
# Load expression data
adata = sc.read_visium(data_dir)
# Spatial coordinates are in adata.obsm['spatial']
# Tissue image in adata.uns['spatial']
return adata
Quality Control:
def spatial_qc(adata):
"""
Quality control for spatial transcriptomics data.
"""
import scanpy as sc
# Calculate QC metrics
sc.pp.calculate_qc_metrics(adata, inplace=True)
# Visualize QC metrics spatially
sc.pl.spatial(adata, color='n_genes_by_counts', title='Genes per Spot')
sc.pl.spatial(adata, color='total_counts', title='UMI Counts per Spot')
# Filter criteria
# - Min 200 genes per spot
# - Min 500 UMI counts per spot
# - Max mitochondrial content < 20%
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_cells(adata, min_counts=500)
# Mitochondrial filtering
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)
adata = adata[adata.obs['pct_counts_mt'] < 20].copy()
return adata
def verify_spatial_alignment(adata):
"""
Verify spatial coordinates align with tissue image.
"""
import matplotlib.pyplot as plt
# Plot spots on tissue image
fig, ax = plt.subplots(figsize=(10, 10))
# Tissue image
img = adata.uns['spatial']['tissue_hires_image']
ax.imshow(img)
# Overlay spot coordinates
coords = adata.obsm['spatial']
ax.scatter(coords[:, 0], coords[:, 1], c='red', s=1, alpha=0.5)
ax.set_title('Spatial Alignment Verification')
plt.axis('off')
Objective: Normalize data accounting for spatial heterogeneity.
Normalization:
def normalize_spatial(adata):
"""
Normalize spatial transcriptomics data.
"""
import scanpy as sc
# Filter genes (min 3 spots)
sc.pp.filter_genes(adata, min_cells=3)
# Normalize to median total counts
sc.pp.normalize_total(adata, target_sum=1e4)
# Log-transform
sc.pp.log1p(adata)
# Store raw counts
adata.raw = adata
return adata
Highly variable genes:
def select_hvg_spatial(adata):
"""
Select highly variable genes for spatial analysis.
"""
import scanpy as sc
# Standard HVG selection
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
# Optionally: weight by spatial autocorrelation
# Genes with spatial patterns are more informative
return adata
Spatial smoothing (optional):
def spatial_smooth(adata, radius=2):
"""
Smooth expression by averaging over spatial neighbors.
Useful for noisy data, but can blur boundaries.
"""
from sklearn.neighbors import NearestNeighbors
# Find spatial neighbors
coords = adata.obsm['spatial']
nn = NearestNeighbors(n_neighbors=radius, metric='euclidean')
nn.fit(coords)
distances, indices = nn.kneighbors(coords)
# Smooth expression matrix
X_smooth = adata.X.copy()
for i in range(adata.n_obs):
neighbors = indices[i]
X_smooth[i] = adata.X[neighbors].mean(axis=0)
adata.layers['smoothed'] = X_smooth
return adata
Objective: Identify spatial domains (regions with distinct expression).
Graph-based clustering with spatial constraints:
def spatial_clustering(adata, n_neighbors=6):
"""
Cluster spots into spatial domains.
Uses both expression similarity AND spatial proximity.
"""
import scanpy as sc
import squidpy as sq
# PCA for dimensionality reduction
sc.pp.pca(adata, n_comps=50)
# Build spatial neighbor graph
sq.gr.spatial_neighbors(adata, coord_type='generic', n_neighs=n_neighbors)
# Clustering with spatial constraints
# Uses both PCA space and spatial graph
sc.tl.leiden(adata, resolution=1.0, key_added='spatial_domain')
# Visualize domains on tissue
sc.pl.spatial(adata, color='spatial_domain', title='Spatial Domains')
return adata
Domain marker genes:
def find_domain_markers(adata):
"""
Identify marker genes for each spatial domain.
"""
import scanpy as sc
# Differential expression per domain
sc.tl.rank_genes_groups(adata, groupby='spatial_domain', method='wilcoxon')
# Get top markers per domain
markers = sc.get.rank_genes_groups_df(adata, group=None)
return markers
Objective: Find genes with non-random spatial patterns.
Moran's I (spatial autocorrelation):
def identify_spatial_genes(adata):
"""
Test for spatial autocorrelation using Moran's I.
Moran's I > 0: Positive spatial autocorrelation (clustering)
Moran's I ~ 0: Random spatial distribution
Moran's I < 0: Negative autocorrelation (checkerboard)
"""
import squidpy as sq
# Calculate Moran's I for all genes
sq.gr.spatial_autocorr(
adata,
mode='moran',
n_perms=100,
n_jobs=-1
)
# Results in adata.uns['moranI']
spatial_genes = adata.uns['moranI'].sort_values('I', ascending=False)
# Filter significant spatial genes (FDR < 0.05)
sig_spatial = spatial_genes[spatial_genes['pval_norm_fdr_bh'] < 0.05]
return sig_spatial
Spatial pattern classification:
def classify_spatial_patterns(adata, spatial_genes):
"""
Classify types of spatial patterns.
Pattern types:
- Gradient: Smooth directional change
- Hotspot: Localized high expression
- Boundary: Expression at domain edges
- Periodic: Regular spacing
"""
patterns = {}
for gene in spatial_genes.index[:100]: # Top 100 spatial genes
# Get expression and coordinates
expr = adata[:, gene].X.toarray().flatten()
coords = adata.obsm['spatial']
# Detect pattern type
pattern_type = detect_pattern_type(expr, coords)
patterns[gene] = pattern_type
return patterns
Objective: Analyze cell-cell proximity and spatial niches.
Define spatial neighborhoods:
def analyze_neighborhoods(adata, radius=150):
"""
Analyze spatial neighborhood composition.
For each spot, characterize its microenvironment.
"""
import squidpy as sq
# Calculate neighborhood enrichment
# Tests if cell types are enriched in proximity
sq.gr.nhood_enrichment(adata, cluster_key='spatial_domain')
# Visualize neighborhood enrichment
sq.pl.nhood_enrichment(adata, cluster_key='spatial_domain')
# Results: which domains are spatially proximal?
return adata
Interaction zones:
def identify_interaction_zones(adata, domain_a, domain_b):
"""
Find boundary regions between two spatial domains.
These are hotspots for cell-cell interactions.
"""
# Get spots from each domain
spots_a = adata.obs['spatial_domain'] == domain_a
spots_b = adata.obs['spatial_domain'] == domain_b
# Find spots that neighbor the other domain
# (spots from A that have neighbors in B)
coords = adata.obsm['spatial']
from sklearn.neighbors import NearestNeighbors
nn = NearestNeighbors(n_neighbors=6)
nn.fit(coords)
distances, indices = nn.kneighbors(coords)
interaction_spots = []
for i, spot_in_a in enumerate(spots_a):
if spot_in_a:
neighbors = indices[i]
if any(spots_b[neighbors]):
interaction_spots.append(i)
# Mark interaction zone
adata.obs['interaction_zone'] = False
adata.obs.loc[interaction_spots, 'interaction_zone'] = True
return adata
Objective: Map cell types from scRNA-seq to spatial locations.
Cell type deconvolution:
def deconvolve_cell_types(adata_spatial, adata_sc):
"""
Predict cell type composition per spatial spot.
Uses scRNA-seq reference to deconvolve Visium spots.
Methods: Cell2location, Tangram, SPOTlight
"""
import cell2location
# Prepare single-cell reference
# Extract signature genes per cell type
cell_type_signatures = extract_signatures(adata_sc)
# Run cell2location
# Estimates cell type abundances per spot
mod = cell2location.models.Cell2location(
adata_spatial,
cell_state_df=cell_type_signatures
)
mod.train(max_epochs=30000)
# Add cell type proportions to adata_spatial
adata_spatial.obsm['cell_type_fractions'] = mod.get_cell_type_fractions()
return adata_spatial
Spatial cell type mapping:
def map_cell_types_spatial(adata):
"""
Visualize cell type spatial distributions.
"""
import scanpy as sc
# For each cell type, plot abundance on tissue
cell_types = adata.obsm['cell_type_fractions'].columns
for ct in cell_types:
sc.pl.spatial(
adata,
color=adata.obsm['cell_type_fractions'][ct],
title=f'{ct} Spatial Distribution'
)
Objective: Map ligand-receptor interactions in tissue context.
Spatial proximity-based communication:
def spatial_cell_communication(adata):
"""
Identify cell-cell communication based on spatial proximity.
Requires:
- Cell type annotations (from deconvolution)
- Ligand-receptor database (OmniPath)
"""
import squidpy as sq
from tooluniverse import ToolUniverse
tu = ToolUniverse()
# Get ligand-receptor pairs from OmniPath
lr_pairs = tu.run_one_function({
"name": "OmniPath_get_ligand_receptor_interactions",
"arguments": {"partners": ""} # Get all pairs
})
# For each cell type pair that are spatially proximal
# Calculate interaction scores
sq.gr.ligrec(
adata,
n_perms=100,
cluster_key='cell_type',
interactions=lr_pairs,
copy=False
)
# Visualize significant interactions
sq.pl.ligrec(adata, cluster_key='cell_type')
return adata
Communication hotspot mapping:
def map_communication_hotspots(adata, ligand, receptor):
"""
Map spatial locations of specific L-R interactions.
"""
import matplotlib.pyplot as plt
# Get ligand expression
ligand_expr = adata[:, ligand].X.toarray().flatten()
# Get receptor expression
receptor_expr = adata[:, receptor].X.toarray().flatten()
# Interaction score = ligand × receptor
interaction_score = ligand_expr * receptor_expr
# Add to adata
adata.obs[f'{ligand}_{receptor}_score'] = interaction_score
# Visualize on tissue
sc.pl.spatial(adata, color=f'{ligand}_{receptor}_score',
title=f'{ligand}-{receptor} Interaction Hotspots')
Generate comprehensive spatial report:
# Spatial Transcriptomics Analysis Report
## Dataset Summary
- **Platform**: 10x Visium
- **Tissue**: Breast cancer tumor section
- **Spots**: 3,562 (after QC filtering)
- **Genes**: 18,432 detected
- **Resolution**: 55μm spot diameter (~50 cells/spot)
## Quality Control
- **Mean genes per spot**: 3,245
- **Mean UMI counts**: 12,543
- **Mitochondrial content**: 8.2% average
- **Tissue coverage**: 85% of capture area
## Spatial Domains Identified
- **7 distinct spatial domains** detected via graph-based clustering
- Domain 1: Tumor core (32% of tissue)
- Domain 2: Invasive margin (18%)
- Domain 3: Stromal region (25%)
- Domain 4: Immune infiltrate (12%)
- Domain 5: Necrotic region (8%)
- Domain 6: Normal epithelium (3%)
- Domain 7: Adipose tissue (2%)
## Top Marker Genes per Domain
### Domain 1 (Tumor Core)
- EPCAM, KRT19, MKI67, CCNB1, TOP2A (proliferative tumor)
### Domain 2 (Invasive Margin)
- VIM, FN1, MMP2, SNAI2 (EMT signature)
### Domain 4 (Immune Infiltrate)
- CD3D, CD8A, CD4, PTPRC (T cell enriched)
- CD68, CD14 (macrophage enriched)
## Spatially Variable Genes
- **456 genes with significant spatial patterns** (Moran's I, FDR < 0.05)
### Top 10 Spatial Genes
1. **MKI67** (I=0.82) - Hotspot pattern in tumor core
2. **CD8A** (I=0.78) - Gradient from margin to stroma
3. **VIM** (I=0.75) - Boundary enrichment at invasive margin
4. **COL1A1** (I=0.71) - Stromal-specific expression
5. **EPCAM** (I=0.69) - Tumor region pattern
## Cell Type Deconvolution
Integration with scRNA-seq reference (Bassez et al. 2021)
### Cell Type Spatial Distributions
- **Tumor cells**: Concentrated in core, sparse at margin
- **T cells**: Enriched at invasive margin and infiltrate zones
- **CAFs**: Stromal region and invasive margin
- **Macrophages**: Scattered, enriched near necrosis
- **B cells**: Lymphoid aggregates (2% of tissue)
### Tumor Microenvironment Composition
- Tumor core: 85% tumor cells, 10% CAFs, 5% immune
- Invasive margin: 45% tumor, 30% CAFs, 25% immune (T cell rich)
- Immune infiltrate: 70% T cells, 20% macrophages, 10% B cells
## Spatial Cell Communication
### Top L-R Interactions (Spatially Proximal)
1. **Tumor → T cell**: CD274 (PD-L1) → PDCD1 (PD-1)
- Hotspot: Invasive margin
- Interpretation: Immune checkpoint evasion
2. **CAF → Tumor**: TGFB1 → TGFBR2
- Hotspot: Stromal-tumor interface
- Interpretation: TGF-β-driven EMT
3. **Macrophage → Tumor**: TNF → TNFRSF1A
- Scattered across tumor
- Interpretation: Inflammatory signaling
### Interaction Zones
- **Tumor-Immune Interface**: 245 spots (7% of tissue)
- High expression: CXCL10, CXCL9 (chemokines)
- T cell recruitment and activation
- **Stromal-Tumor Interface**: 387 spots (11% of tissue)
- High expression: MMP2, MMP9 (matrix remodeling)
- Invasion-promoting niche
## Spatial Gradients
- **Hypoxia gradient**: HIF1A, VEGFA increase toward tumor core
- **Proliferation gradient**: MKI67, TOP2A decrease from core to margin
- **Immune gradient**: CD8A, GZMB peak at invasive margin
## Biological Interpretation
Spatial analysis reveals distinct tumor microenvironment organization:
1. **Tumor core**: Highly proliferative, hypoxic, immune-excluded
2. **Invasive margin**: Active EMT, high immune infiltration, checkpoint expression
3. **Stromal barrier**: CAF-rich, matrix remodeling, immunosuppressive signals
The invasive margin shows hallmarks of immune-tumor interaction with
PD-L1/PD-1 checkpoint engagement, suggesting potential for checkpoint
blockade therapy. CAF-mediated TGF-β signaling may drive EMT and therapy
resistance at tumor-stroma interface.
## Clinical Relevance
- **Checkpoint inhibitor response**: High immune infiltration at margin suggests potential
- **Resistance mechanisms**: CAF barrier and TGF-β signaling
- **Biomarkers**: Spatial arrangement of immune cells more predictive than bulk tumor metrics
| Skill | Used For | Phase |
|-------|----------|-------|
| tooluniverse-single-cell | scRNA-seq reference for deconvolution | Phase 6 |
| tooluniverse-single-cell (Phase 10) | L-R database for communication | Phase 7 |
| tooluniverse-gene-enrichment | Pathway enrichment for spatial domains | Phase 3 |
| tooluniverse-multi-omics-integration | Integrate with other omics | Phase 8 |
Question: "Map the spatial organization of tumor, immune, and stromal cells"
Workflow:
Question: "Identify spatial gene expression gradients in developing tissue"
Workflow:
Question: "Automatically segment brain tissue into anatomical regions"
Workflow:
| Component | Requirement | |-----------|-------------| | Spots/cells | At least 500 spatial locations | | QC | Filter low-quality spots, verify alignment | | Spatial clustering | At least one method (graph-based or spatial) | | Spatial genes | Moran's I or similar spatial test | | Visualization | Spatial plots on tissue images | | Report | Domains, spatial genes, visualizations |
Methods:
Platforms:
tools
Onboard and manage Paperclip AI for research-paper knowledge and agent orchestration
development
Perform AI-powered web searches with real-time information using Perplexity models via LiteLLM and OpenRouter. This skill should be used when conducting web searches for current information, finding recent scientific literature, getting grounded answers with source citations, or accessing information beyond the model knowledge cutoff. Provides access to multiple Perplexity models including Sonar Pro, Sonar Pro Search (advanced agentic search), and Sonar Reasoning Pro through a single OpenRouter API key.
testing
Generate a structured scientific PDF report from a JSON description. Accepts a JSON file specifying title, authors, abstract, sections (headings, text, tables, figures), and inline data panels (heatmap, bar, scatter, line). Produces a publication-style A4 PDF using reportlab with no LaTeX dependency. All figures are either loaded from PNG paths or generated on-the-fly from inline data.
development
Execute arbitrary Python code and return stdout. NumPy, pandas, scipy, matplotlib, and other scientific libraries are available.