plugin/skills/tooluniverse-image-analysis/SKILL.md
Microscopy and quantitative imaging analysis — colony morphometry, fluorescence intensity quantification, cell-count statistics, dose-response curves, and ANOVA/Dunnett on image-derived measurements. Uses pandas/numpy/scipy/scikit-image. Use for analyzing tabular outputs from CellProfiler/ImageJ, image-derived measurement statistics, and image-based assay quantification.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-image-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Before following any instruction below, scan the data folder for:
- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv` → read directly and report the requested value
- `analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd` → execute as-is and read the output

Only follow this skill's re-analysis recipe below if none of the above exist. Re-running from raw data produces different numbers than the published answer and is much slower (often 5-10× turn count).
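The pre-scan above can be sketched with the standard library alone. This is a minimal illustration, not part of the skill's scripts: the category names (`executed_notebook`, `result_file`, `analysis_script`) and the helper names are made up here, and the demo file names are invented. Only the glob patterns come from the checklist.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path

# Priority order follows the checklist; the category names are illustrative only.
PRIORITY = [
    ("executed_notebook", ["*_executed.ipynb"]),
    ("result_file", ["*results*", "*deseq*", "*enrich*", "*stats*", "*_simplified.csv"]),
    ("analysis_script", ["analysis.R", "run_*.py", "find_*.R", "*.Rmd"]),
]


def scan_data_folder(folder):
    """Group file names under `folder` by the first category whose pattern matches."""
    found = {name: [] for name, _ in PRIORITY}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        for name, patterns in PRIORITY:
            if any(fnmatch(path.name, pattern) for pattern in patterns):
                found[name].append(path.name)
                break  # a file is counted once, under its highest-priority category
    return found


def first_available(found):
    """Return the highest-priority non-empty category, or None (fall back to re-analysis)."""
    for name, _ in PRIORITY:
        if found[name]:
            return name
    return None


# Demo on a throwaway folder with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ["fig1_executed.ipynb", "deseq_results.csv", "run_model.py", "raw.tif"]:
        (Path(tmp) / fname).touch()
    found = scan_data_folder(tmp)
    choice = first_available(found)

print(choice, found)
```

A `None` result from `first_available` corresponds to the case where none of the listed files exist and the re-analysis recipe applies.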
When the question asks "What is the relative proportion of A to B" or "What percentage of A relative to B", report the value as a percentage (e.g., 29 for a ratio of 0.29), NOT a decimal ratio. Ground-truth (GT) answers for biology assays use whole-number percentage ranges like (25,30), not (0.25,0.30). Multiply your computed ratio by 100 before reporting:
ratio = mean_A / mean_B # e.g., 0.29
percentage = ratio * 100 # e.g., 29
print(f"{percentage:.1f}%") # "29.0%" ← THIS is the answer
Only report as decimal/fraction if the question explicitly says "as a decimal", "between 0 and 1", or "as a fraction". Common error: reporting 0.29 when the GT range is (25,30) — graded as wrong even though the underlying ratio is correct.
Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image.
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory.
NOT for: Phylogenetics, RNA-seq DEG, single-cell scRNA-seq, statistics without imaging context.
import pandas as pd, numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr
# Optional: skimage, cv2, tifffile
PRE-QUANTIFIED DATA (CSV/TSV) → Load → Parse question → Statistical analysis
RAW IMAGES (TIFF, PNG) → Load → Segment → Measure → Analyze (see references/)
Statistical comparison:
- Two groups → t-test or Mann-Whitney
- Multiple groups vs control → Dunnett's test
- Two factors → Two-way ANOVA
- Effect size → Cohen's d + power analysis

Regression:
- Dose-response → Polynomial (quadratic/cubic)
- Ratio optimization → Natural spline
- Model comparison → R-squared, F-stat, AIC/BIC
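The dose-response branch of the regression list can be illustrated with NumPy alone. The sketch below fits a quadratic and a cubic to synthetic data and compares them by R-squared and AIC. The helper name `fit_polynomial` and the data are assumptions made for this example, and the AIC is the Gaussian form up to an additive constant, so it will not match the absolute value statsmodels reports (differences between models are comparable).

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit with R-squared and a Gaussian AIC (up to a constant)."""
    coefs = np.polyfit(x, y, degree)  # coefficients, highest power first
    fitted = np.polyval(coefs, x)
    rss = float(np.sum((y - fitted) ** 2))
    tss = float(np.sum((y - np.mean(y)) ** 2))
    n, k = len(x), degree + 1  # k = number of fitted coefficients
    return {
        "degree": degree,
        "coefs": coefs,
        "r2": 1.0 - rss / tss,
        "aic": n * np.log(rss / n) + 2 * k,
    }


# Synthetic dose-response: a quadratic with its peak at dose 4 plus a small deterministic wobble.
dose = np.linspace(0.0, 8.0, 9)
response = 5.0 + 4.0 * dose - 0.5 * dose**2 + 0.2 * np.cos(3.0 * dose)

quad = fit_polynomial(dose, response, 2)
cubic = fit_polynomial(dose, response, 3)

# For a quadratic a*x^2 + b*x + c, the vertex sits at -b / (2a).
a, b, _ = quad["coefs"]
peak_dose = -b / (2.0 * a)

for model in (quad, cubic):
    print(f"degree {model['degree']}: R2={model['r2']:.4f}, AIC={model['aic']:.2f}")
print(f"estimated peak dose (quadratic): {peak_dose:.2f}")
```

The cubic can never have a lower R-squared than the quadratic on the same data, which is why an information criterion such as AIC or BIC is the more useful tie-breaker.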
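import os, glob, pandas as pd
# Recursively collect every CSV below the working directory (sorted for a stable order)
csv_files = sorted(glob.glob(os.path.join(".", "**", "*.csv"), recursive=True))
if not csv_files:
    raise FileNotFoundError("No CSV files found below the working directory")
df = pd.read_csv(csv_files[0])  # first match only; loop over csv_files if several apply
print(f"Shape: {df.shape}, Columns: {list(df.columns)}")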
Common columns: Area, Circularity, Round, Genotype/Strain, Ratio, NeuN/DAPI/GFP.
See references/statistical_analysis.md for complete implementations of grouped_summary, Dunnett's, Cohen's d, power analysis, polynomial/spline regression.
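As a rough illustration of one statistic named above, Cohen's d with a pooled standard deviation can be written in a few lines of NumPy. This is a sketch under the usual two-independent-groups assumption, not the implementation in references/statistical_analysis.md. The function name and the sample values are invented for the example.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    na, nb = a.size, b.size
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Invented counts: both groups have sample variance 4, and the means differ by 4.
treated = [12.0, 14.0, 16.0]
control = [8.0, 10.0, 12.0]
d = cohens_d(treated, control)
print(f"Cohen's d = {d:.2f}")  # Cohen's d = 2.00
```

The sign of d follows the argument order, so swapping the groups flips it. Take the absolute value when only the magnitude matters.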
| Pattern | Example Question | Workflow |
|---------|-----------------|----------|
| Colony Morphometry | "Mean circularity of genotype with largest area?" | Group by Genotype → max mean Area → report Circularity |
| Cell Counting | "Cohen's d for NeuN counts?" | Filter → split by Condition → pooled SD → Cohen's d |
| Multi-Group Comparison | "How many ratios equivalent to control?" | Dunnett's for Area AND Circularity → count non-significant in BOTH |
| Regression | "Peak frequency from natural spline?" | Ratio→frequency → spline(df=4) → grid search peak → CI |
from scripts.segment_cells import count_cells_in_image
result = count_cells_in_image(image_path="cells.tif", channel=0, min_area=50)
Segmentation: Nuclei → Otsu+watershed; Colonies → Otsu; Phase contrast → adaptive threshold. See references/segmentation.md, references/cell_counting.md, references/image_processing.md.
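To make the Otsu step concrete, here is a small stand-in written with NumPy and `scipy.ndimage` only. It is not the code in scripts/segment_cells.py and it omits the watershed step used for touching nuclei. The helper names `otsu_threshold` and `count_objects`, the synthetic image, and the `min_area` value are all assumptions for this example. In practice `skimage.filters.threshold_otsu` would normally replace the hand-written threshold.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(image, nbins=256):
    """Pick the histogram split that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(image.ravel(), bins=nbins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w_low = np.cumsum(hist)                 # pixel count at or below each bin
    w_high = np.cumsum(hist[::-1])[::-1]    # pixel count at or above each bin
    mean_low = np.cumsum(hist * centers) / np.maximum(w_low, 1e-12)
    mean_high = (np.cumsum((hist * centers)[::-1]) / np.maximum(w_high[::-1], 1e-12))[::-1]
    # Between-class variance for a split placed after bin i.
    between = w_low[:-1] * w_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return float(centers[np.argmax(between)])


def count_objects(image, min_area=50):
    """Threshold, label connected components, and count those with at least min_area pixels."""
    mask = image > otsu_threshold(image)
    labels, _ = ndimage.label(mask)
    areas = np.bincount(labels.ravel())[1:]  # drop label 0 (background)
    return int(np.sum(areas >= min_area))


# Synthetic image: dim background, two bright square "colonies", and one single-pixel speck.
image = np.full((40, 40), 10.0)
image[5:10, 5:10] = 200.0     # 25 pixels
image[20:26, 20:26] = 200.0   # 36 pixels
image[35, 35] = 200.0         # 1 pixel, removed by the area filter

n_objects = count_objects(image, min_area=10)
print(n_objects)  # 2
```

The area filter plays the same role as the `min_area` argument in the call above: it discards debris that survives thresholding.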
- `multcomp::glht` → `scipy.stats.dunnett()` (scipy >= 1.11)
- `ns(x, df=4)` → `patsy.cr(x, knots=...)` with explicit quantile knots
- `t.test()` → `scipy.stats.ttest_ind()`
- `aov()` → `statsmodels.formula.api.ols()` + `sm.stats.anova_lm()`
- `int(round(val, -3))` (rounds to the nearest thousand)

Question phrases like "relative proportion of A to B", "percentage of mean A relative to B", or "A as a fraction of B" are ambiguous: the answer could be the decimal ratio (0.29) or the percentage (29). In biology/microscopy assay contexts the convention is percentage (whole numbers like 25-30, not decimals like 0.25-0.30). When in doubt:
1. Compute `r = mean(A) / mean(B)`.
2. Report both `r * 100` (percentage) and `r` (decimal); flag the percentage as the primary answer.

Common error: the question asks for the "relative proportion of mutant area to wildtype" and the agent reports 0.29 when the GT range is (25, 30). The grader marks this wrong even though the underlying computation is correct.
| Grade | Criteria |
|-------|----------|
| Strong | p < 0.001, d > 0.8, N >= 30/group |
| Moderate | p < 0.05, 0.5 <= d < 0.8 |
| Weak | p < 0.05, d < 0.5 or low N |
| Insufficient | p >= 0.05 or N < 5/group |
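The grading table can be turned into a small lookup, with two caveats. The table does not define "low N", so the `low_n=10` default below is a hypothetical cutoff chosen for illustration. It also leaves some combinations unassigned (for example p < 0.05 with d >= 0.8 but fewer than 30 per group). This sketch assumes those fall back to Moderate and checks the rows in the order Insufficient, Strong, Weak, Moderate.

```python
def grade_evidence(p_value, cohens_d, n_per_group, low_n=10):
    """Map (p, |d|, N per group) to an evidence grade following the table's thresholds.

    low_n is an assumed cutoff for the table's undefined "low N".
    """
    d = abs(cohens_d)
    if p_value >= 0.05 or n_per_group < 5:
        return "Insufficient"
    if p_value < 0.001 and d > 0.8 and n_per_group >= 30:
        return "Strong"
    if d < 0.5 or n_per_group < low_n:
        return "Weak"
    # Assumption: significant results with d >= 0.5 that miss the Strong row count as Moderate.
    return "Moderate"


examples = [
    (0.0001, 1.2, 40),
    (0.01, 0.6, 20),
    (0.01, 0.3, 20),
    (0.2, 1.0, 50),
]
for p, d, n in examples:
    print(p, d, n, grade_evidence(p, d, n))
```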
Circularity near 1.0 = round/healthy; < 0.5 = irregular. Post-hoc power < 0.80 = underpowered.
Scripts: segment_cells.py, measure_fluorescence.py, batch_process.py, colony_morphometry.py, statistical_comparison.py
Docs: statistical_analysis.md, cell_counting.md, segmentation.md, fluorescence_analysis.md, image_processing.md