skills/image-analysis/SKILL.md
ToolUniverse workflow — Image Analysis
npx skillsauth add lamm-mit/scienceclaw image-analysis

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparisons.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details have been moved to references/ for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.
Apply when users:
BixBench Coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
- tooluniverse-phylogenetics
- tooluniverse-rnaseq-deseq2
- tooluniverse-single-cell
- tooluniverse-statistical-modeling

```python
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr

# Optional (for raw image processing)
import skimage
import cv2
import tifffile
```
Installation:
```bash
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
```
```
START: User question about microscopy data
│
├─ Q1: What type of data is available?
│  │
│  ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│  │  └─ Workflow: Load → Parse question → Statistical analysis
│  │     Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│  │     See: Section "Quantitative Data Analysis" below
│  │
│  └─ RAW IMAGES (TIFF, PNG, multi-channel)
│     └─ Workflow: Load → Segment → Measure → Analyze
│        See: references/image_processing.md
│
├─ Q2: What type of analysis is needed?
│  │
│  ├─ STATISTICAL COMPARISON
│  │  ├─ Two groups → t-test or Mann-Whitney
│  │  ├─ Multiple groups → ANOVA or Dunnett's test
│  │  ├─ Two factors → Two-way ANOVA
│  │  └─ Effect size → Cohen's d, power analysis
│  │     See: references/statistical_analysis.md
│  │
│  ├─ REGRESSION MODELING
│  │  ├─ Dose-response → Polynomial (quadratic, cubic)
│  │  ├─ Ratio optimization → Natural spline
│  │  └─ Model comparison → R-squared, F-statistic, AIC/BIC
│  │     See: references/statistical_analysis.md
│  │
│  ├─ CELL COUNTING
│  │  ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│  │  ├─ Brightfield → Adaptive threshold
│  │  └─ High-density → CellPose or StarDist (external)
│  │     See: references/cell_counting.md
│  │
│  ├─ COLONY SEGMENTATION
│  │  ├─ Swarming assays → Otsu threshold + morphology
│  │  ├─ Biofilms → Li threshold + fill holes
│  │  └─ Growth assays → Time-lapse tracking
│  │     See: references/segmentation.md
│  │
│  └─ FLUORESCENCE QUANTIFICATION
│     ├─ Intensity measurement → regionprops
│     ├─ Colocalization → Pearson/Manders
│     └─ Multi-channel → Channel-wise quantification
│        See: references/fluorescence_analysis.md
│
└─ Q3: When to use scikit-image vs OpenCV?
   ├─ scikit-image: Scientific analysis, measurements, regionprops
   ├─ OpenCV: Fast processing, real-time, large batches
   └─ Both: Often interchangeable for basic operations
      See: references/image_processing.md "Library Selection Guide"
```
CRITICAL FIRST STEP: Before writing ANY code, identify what data files are available and what the question is asking for.
```python
import glob
import os

import pandas as pd

# Discover data files (recursive search below data_dir)
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)

# Load and inspect first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
```
Common Column Names:
```python
import numpy as np

def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean',
        SD='std',
        Median='median',
        Min='min',
        Max='max',
        N='count'
    ).reset_index()
    # Standard error of the mean from the sample SD and group size
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary

# Example: Colony morphometry by genotype (df is the table loaded above)
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
```
For detailed statistical functions, see: references/statistical_analysis.md
Decision guide:
See: references/statistical_analysis.md for complete implementations
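As a rough illustration of the two-group branch (t-test or Mann-Whitney), the sketch below picks the test with a Shapiro-Wilk normality check. That selection rule, the function name `compare_two_groups`, the 0.05 cutoff, and the synthetic data are all assumptions for illustration; the referenced implementations may choose differently.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Hypothetical helper: Welch's t-test if both groups look normal, else Mann-Whitney U."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk on each group; both must pass to use the t-test (assumed rule).
    looks_normal = (stats.shapiro(a).pvalue > alpha
                    and stats.shapiro(b).pvalue > alpha)
    if looks_normal:
        res = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
        name = "welch_t"
    else:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "mann_whitney"
    return {"test": name,
            "statistic": float(res.statistic),
            "pvalue": float(res.pvalue)}

# Synthetic measurements, for illustration only
rng = np.random.default_rng(0)
control = rng.normal(100.0, 10.0, size=30)
treated = rng.normal(120.0, 10.0, size=30)
result = compare_two_groups(control, treated)
print(result)
```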
When to use each model:
Model comparison metrics:
See: references/statistical_analysis.md for complete implementations
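One way to compare candidate polynomial models on the metrics named above (R-squared, F-statistic, AIC/BIC) is sketched below with the statsmodels formula API. The column names `dose` and `area` and the synthetic quadratic data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic dose-response with a quadratic trend plus noise (illustration only)
rng = np.random.default_rng(1)
dose = np.repeat(np.arange(0.0, 10.0, 1.0), 5)
area = 5.0 + 4.0 * dose - 0.4 * dose**2 + rng.normal(0.0, 1.0, size=dose.size)
df_demo = pd.DataFrame({"dose": dose, "area": area})

formulas = {
    "linear": "area ~ dose",
    "quadratic": "area ~ dose + I(dose**2)",
    "cubic": "area ~ dose + I(dose**2) + I(dose**3)",
}

rows = []
for name, formula in formulas.items():
    fit = smf.ols(formula, data=df_demo).fit()
    rows.append({
        "model": name,
        "r2": fit.rsquared,
        "adj_r2": fit.rsquared_adj,
        "f_stat": fit.fvalue,
        "aic": fit.aic,
        "bic": fit.bic,
    })

comparison = pd.DataFrame(rows).set_index("model")
print(comparison.round(3))
```

Lower AIC/BIC indicates the better trade-off between fit and complexity. R-squared alone never decreases when terms are added to nested models, so it should not be the sole criterion.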
Workflow: Load → Preprocess → Segment → Measure → Export
```python
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image

result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,  # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
```
Decision guide:
| Cell Type | Density | Best Method | Notes |
|-----------|---------|-------------|-------|
| Nuclei (DAPI) | Low-Medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching |
| Colonies | Well-separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See: references/segmentation.md and references/cell_counting.md for detailed protocols
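The "Otsu + watershed" row can be sketched with scikit-image as follows. This is a minimal illustration on a synthetic image, not the implementation in `scripts/segment_cells.py`. The disk positions, `min_distance=5`, and the 50-pixel area filter are assumed values.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.draw import disk
from skimage.feature import peak_local_max
from skimage.filters import gaussian, threshold_otsu
from skimage.measure import regionprops
from skimage.segmentation import watershed

# Synthetic image: three bright "nuclei", two of which overlap
image = np.zeros((100, 100), dtype=float)
for center in [(30, 30), (30, 44), (70, 70)]:
    rr, cc = disk(center, 10, shape=image.shape)
    image[rr, cc] = 1.0
image = gaussian(image, sigma=1)

# 1. Global Otsu threshold gives the foreground mask
mask = image > threshold_otsu(image)

# 2. Distance transform: peaks lie near object centres
distance = ndi.distance_transform_edt(mask)
peaks = peak_local_max(distance, min_distance=5, exclude_border=False)

# 3. One integer marker per peak seeds the watershed
markers = np.zeros(mask.shape, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
labels = watershed(-distance, markers, mask=mask)

# 4. Drop small fragments (assumed minimum area of 50 pixels)
cells = [region for region in regionprops(labels) if region.area >= 50]
print(f"Found {len(cells)} cells")
```

Running the watershed on the negated distance map is what splits touching objects: a plain connected-component count on `mask` would merge the two overlapping disks into one.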
Use scikit-image when:
Use OpenCV when:
Both work for:
See: references/image_processing.md "Library Selection Guide"
Question type: "Mean circularity of genotype with largest area?"
Data: CSV with Genotype, Area, Circularity columns
Workflow:
See: references/segmentation.md "Colony Morphometry Analysis"
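A minimal sketch of this question type, assuming "largest area" means the genotype with the largest mean `Area` (the question could instead mean median or maximum, so check the wording). The genotype labels and values are made up for illustration.

```python
import pandas as pd

# Hypothetical measurements; real data would come from the provided CSV
colonies = pd.DataFrame({
    "Genotype": ["WT", "WT", "mutA", "mutA", "mutB", "mutB"],
    "Area": [100.0, 120.0, 300.0, 340.0, 80.0, 90.0],
    "Circularity": [0.90, 0.80, 0.40, 0.50, 0.95, 0.85],
})

# Mean of each measurement per genotype
per_genotype = colonies.groupby("Genotype")[["Area", "Circularity"]].mean()

# Genotype with the largest mean area, then its mean circularity
largest = per_genotype["Area"].idxmax()
answer = per_genotype.loc[largest, "Circularity"]
print(largest, round(answer, 3))
```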
Question type: "Cohen's d for NeuN counts between conditions?"
Data: CSV with Condition, NeuN_count, Sex, Hemisphere columns
Workflow:
See: references/statistical_analysis.md "Effect Size Calculations"
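Cohen's d with a pooled standard deviation can be sketched as below. The function name and the toy counts are illustrative; the reference file may handle grouping by `Sex` or `Hemisphere` differently.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Toy counts for two conditions (not real data)
treated_counts = [10, 12, 14]
control_counts = [4, 6, 8]
d = cohens_d(treated_counts, control_counts)
print(d)
```

The sign of d follows the argument order (first group minus second), so state which condition is the reference when reporting it.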
Question type: "Dunnett's test: How many ratios equivalent to control?"
Data: CSV with multiple co-culture ratios, Area, Circularity
Workflow:
See: references/statistical_analysis.md "Dunnett's Test"
Question type: "Peak frequency from natural spline model?"
Data: CSV with co-culture frequencies and Area measurements
Workflow:
See: references/statistical_analysis.md "Regression Modeling"
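The peak of a natural-spline fit can be located by predicting on a fine grid, as sketched below with patsy's `cr()` inside a statsmodels formula. The column names `Freq` and `Area` and the synthetic data are hypothetical. Using `df=4` lets patsy place the knots itself, which does not reproduce R's `ns(x, df=4)` knot placement; exact matching needs explicit knots.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a true optimum near Freq = 0.6 (illustration only)
rng = np.random.default_rng(7)
freq = np.tile(np.linspace(0.1, 0.9, 9), 4)
area = 50.0 - 200.0 * (freq - 0.6) ** 2 + rng.normal(0.0, 1.0, size=freq.size)
data = pd.DataFrame({"Freq": freq, "Area": area})

# Natural cubic spline basis; "- 1" drops the intercept because the
# unconstrained cr() basis already spans a constant term
fit = smf.ols("Area ~ cr(Freq, df=4) - 1", data=data).fit()

# Predict on a dense grid inside the observed range and take the argmax
grid = pd.DataFrame({"Freq": np.linspace(0.1, 0.9, 801)})
pred = fit.predict(grid)
peak_idx = int(np.argmax(pred.to_numpy()))
peak_freq = float(grid["Freq"].iloc[peak_idx])
peak_area = float(pred.iloc[peak_idx])
print(f"Peak at Freq={peak_freq:.3f}, predicted Area={peak_area:.2f}")
```

Keeping the grid within the observed range avoids extrapolating the spline, where a natural spline is forced to be linear.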
| Task | Primary Tool | Reference |
|------|-------------|-----------|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |
Ready-to-use scripts in scripts/ directory:
Usage:
```bash
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50

# Batch process folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
```
For complete implementations and protocols:
Some BixBench questions use R for analysis. Python equivalents:
- `multcomp::glht()` → `scipy.stats.dunnett()` (scipy ≥ 1.11)
- `ns(x, df=4)` → `patsy.cr(x, knots=...)` with explicit quantile knots
- `t.test()` → `scipy.stats.ttest_ind()` (R defaults to Welch's test, so pass `equal_var=False` to match)
- `aov()` → `statsmodels.formula.api.ols()` + `sm.stats.anova_lm()`

See: references/statistical_analysis.md for exact parameter matching
BixBench expects specific formats:
- Rounding to the nearest thousand: `int(round(val, -3))`

Before returning your answer, verify: