hi-c-analysis/tad-detection/SKILL.md
Detects TAD boundaries from balanced Hi-C contact matrices via the diamond-window insulation score (cooltools insulation) and HiCExplorer hicFindTADs, returning a continuous log2 insulation track, valley-prominence boundary_strength, and Li/Otsu-thresholded is_boundary flags across a list of window sizes. Covers the multi-scale window sweep (sub-TAD to compartment-domain), why the boundary is reproducible but the domain partition is not, cross-condition comparison via differential SCORE not differential partition, and the insulation-vs-compartment orthogonality. Use when calling TADs or domain boundaries, computing insulation scores, choosing a window size, ranking boundary strength, comparing boundaries across conditions, or annotating CTCF-backed boundaries; route domain rendering to hic-visualization and boundary-feature overlap to genome-intervals.
npx skillsauth add GPTomics/bioSkills bio-hi-c-analysis-tad-detectionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: cooler 0.10+, cooltools 0.7+, bioframe 0.7+, HiCExplorer 3.7+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signatures<tool> --version then <tool> --help to confirm flagsIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
cooltools changed its API around 0.5 -> 0.7+ (functions standardized on view_df/viewframe arguments; insulation returns the boundary_strength_{W}/is_boundary_{W} columns). A .cool MUST be balanced (a stored weight column) before insulation; clr.matrix(balance=True) on an unbalanced cooler returns all-NaN. A .mcool is multi-resolution: pass a single-resolution URI (file.mcool::/resolutions/10000), never the bare .mcool.
"Where are the reproducible domain boundaries in my Hi-C matrix, and how strong?" -> Compute the diamond-window insulation score on the balanced matrix, take valley minima as boundaries and their prominence as strength, and report across a LIST of window sizes rather than a single magic scale.
cooltools.insulation(clr, [3*res, 5*res, 10*res, 25*res]) then rank by boundary_strength_{W}hicFindTADs -m corrected.cool --outPrefix tads --correctForMultipleTesting fdr (sweep --minDepth/--maxDepth/--step)A population Hi-C "TAD" is the ensemble average over a heterogeneous mixture of cell-specific, stochastic domains. Single-cell imaging (Bintu 2018 Science 362:eaau1783) shows individual cells DO have sharp domains, but the boundary POSITION varies cell to cell - the population boundary is a preferred position, not a wall. Cohesin depletion abolishes population TADs while leaving single-cell domains intact, removing only the preferred-position bias. Three consequences govern every decision in this skill:
| Method | Role | Mechanism | When |
|--------|------|-----------|------|
| cooltools insulation | boundary score + strength | diamond-window valleys; prominence = strength; Li/Otsu threshold | cooler-native pipeline, multi-scale, the modern default |
| HiCExplorer hicFindTADs | domains + boundaries + FDR | multi-window TAD-separation score with per-bin multiple-testing | CLI workflow, hierarchical sweep, FDR-controlled boundaries |
| directionality index (DI) | boundary direction | HMM on up/downstream interaction bias (Dixon 2012) | classic comparison, legacy reproducibility, gives a partition |
| Arrowhead (Juicer) | corner-score domains | arrowhead transform on the .hic; loop-anchored contact domains | Juicer/.hic ecosystems; calls fewer, sharper domains (Rao 2014, median ~185kb) |
| OnTAD / TADtree / rGMAP | nested/hierarchical | explicitly models meta-TADs > TADs > sub-TADs | when the question is about hierarchy or sub-TAD insulation (An 2019) |
| Stripenn / JOnTADS | stripe-aware | calls asymmetric stripes as first-class objects | when the map shows flames/stripes a flat caller mis-segments |
Insulation/DI/hicFindTADs are blind to stripes (asymmetric one-sided extrusion). Arrowhead "contact domains" are corner-anchored and categorically different from track-based boundaries - do NOT cross-compare their counts naively.
| Scenario | Recommended | Why |
|----------|-------------|-----|
| Matrix not yet balanced | cooler balance / cooler.balance_cooler first | unbalanced insulation = coverage-driven garbage valleys |
| Reproducible boundaries, one sample | insulation multi-window, rank by boundary_strength_{W} | strength is continuous and comparable; partition is brittle |
| "What scale of domain?" | run windows [3,5,10,25]x bin; ~10x is the mammalian sweet spot | the window sets sub-TAD vs TAD vs compartment-domain |
| Need FDR-controlled domains | hicFindTADs with --correctForMultipleTesting fdr, sweep depths | per-bin multiple-testing on the TAD-separation score |
| Hierarchical/nested structure | OnTAD/TADtree (or compare windows) | a flat caller picks ONE level set by its window |
| Asymmetric stripes/flames present | Stripenn/JOnTADS | insulation-only pipelines are blind to stripes |
| Two conditions, boundary change | differential SCORE at matched bins, NOT intersected domain BEDs | partitions are unstable; set-differencing manufactures spurious gain/loss |
| Annotate boundaries with CTCF | -> chip-seq/peak-annotation, genome-intervals/overlap-significance | ~76-85% of boundaries are convergent CTCF + cohesin |
| Overlap boundaries with features | -> genome-intervals/interval-arithmetic | boundary BED set operations live there |
| Render domains on the matrix | -> hic-visualization | the TAD square is a colormap/resolution choice as much as a measurement |
Goal: Produce a continuous boundary-strength track and threshold-flagged boundaries at several scales, so the analysis reports where insulation reproducibly dips rather than a single brittle partition.
Approach: Run cooltools.insulation on the balanced cooler with a LIST of window sizes (3-25x the bin). Each window appends its own log2_insulation_score_{W} (valleys = boundaries), boundary_strength_{W} (valley prominence - the quantitative, comparable strength), and is_boundary_{W} (the prominence passed through a Li histogram threshold). Rank and compare on boundary_strength, not on the boolean flag.
import cooler
import cooltools
clr = cooler.Cooler('matrix.mcool::/resolutions/10000') # single-resolution URI, must be balanced
res = clr.binsize
windows = [3 * res, 5 * res, 10 * res, 25 * res] # 30k,50k,100k,250k: sub-TAD -> compartment-domain
ins = cooltools.insulation(clr, windows, verbose=True) # clr_weight_name='weight' default -> needs ICE balancing
strong = ins[ins[f'is_boundary_{10 * res}']] # 100kb window: ~10x bin, mammalian interphase sweet spot
ranked = ins.dropna(subset=[f'boundary_strength_{10 * res}']).sort_values(f'boundary_strength_{10 * res}', ascending=False)
boundary_strength_{W} is the scipy-style PROMINENCE of the insulation valley - continuous, quantitative, and comparable across samples; use it for ranking and cross-condition deltas. is_boundary_{W} is just that prominence passed through threshold='Li' (skimage threshold_li, an Otsu-like histogram split that is MORE PERMISSIVE than Otsu). Because the Li cutoff is fit per dataset, is_boundary is dataset-dependent and NOT directly comparable across samples - compare boundary_strength, then threshold consistently. min_frac_valid_pixels (default 0.66) and min_dist_bad_bin gate which bins get a score; sparse/blacklisted regions silently drop boundaries, so inspect n_valid_pixels_{W} before trusting a boundary in a low-coverage locus.
Goal: Get an FDR-controlled boundary/domain set from a multi-window TAD-separation score when a CLI workflow or hierarchical depth sweep is preferred.
Approach: Feed a CORRECTED (balanced) matrix and sweep the diamond depths (--minDepth/--maxDepth/--step); hicFindTADs computes a TAD-separation score at each depth and applies per-bin multiple-testing. The docs explicitly warn to sweep parameters before claiming a TAD count or comparing conditions.
hicFindTADs -m corrected.cool --outPrefix tads \
--minDepth 30000 --maxDepth 100000 --step 10000 \
--correctForMultipleTesting fdr --thresholdComparisons 0.01 --delta 0.01
# minDepth >= ~3x bin, maxDepth <= ~10x range, step >= ~2x bin; --minBoundaryDistance defaults to 4x bin
# outputs: tads_boundaries.bed, tads_domains.bed, tads_score.bedgraph, tads_tad_separation.bm, tads_zscore_matrix.h5
Goal: Decide which boundaries strengthen or weaken between conditions without the spurious gain/loss that comes from intersecting unstable domain calls.
Approach: Because partitions are unstable (caller/resolution-dependent), do NOT call TADs in each condition and set-difference the domain BEDs. Instead match resolution AND down-sample to matched valid-pixel depth, compute the bin-matched continuous insulation track at a fixed window, take the per-bin delta of log2_insulation_score (or boundary_strength), and test against a permutation/replicate null. Report boundary STRENGTHENING/WEAKENING, treating a binary boundary gain/loss as real only when strength crosses threshold robustly across replicates.
ins_wt = cooltools.insulation(clr_wt, [10 * res])
ins_ko = cooltools.insulation(clr_ko, [10 * res])
key = f'log2_insulation_score_{10 * res}'
merged = ins_wt[['chrom', 'start', 'end', key]].merge(ins_ko[['chrom', 'start', 'end', key]], on=['chrom', 'start', 'end'], suffixes=('_wt', '_ko'))
merged['delta'] = merged[f'{key}_ko'] - merged[f'{key}_wt'] # negative = stronger insulation in KO; test vs a permutation null
Insulation (loop-extrusion barriers) and A/B compartmentalization (affinity/phase separation) are ORTHOGONAL mechanisms: CTCF degron erases insulation while compartments persist (Nora 2017 Cell 169:930); cohesin/RAD21 degron erases TADs+loops while compartments sharpen. Never read a boundary change as a compartment switch - a boundary can sit mid-compartment.
Trigger: insulation on a cooler with no stored weight (or clr_weight_name=None). Mechanism: the diamond sum is dominated by per-bin coverage bias, not topology. Symptom: valleys track sequencing depth/blacklist, not domains. Fix: cooler balance first; keep the default clr_weight_name='weight'.
Trigger: calling insulation with one window_bp and treating its partition as ground truth. Mechanism: the window IS the scale dial; one window picks one level of a nested hierarchy. Symptom: sub-TAD or compartment-domain structure invisible; "TAD count" irreproducible. Fix: sweep [3,5,10,25]x bin and report multi-scale; pick the scale that matches the biological question.
Trigger: calling TADs per condition and set-differencing the domain files. Mechanism: partitions are unstable, so set differences manufacture changes that are caller noise. Symptom: large "gained/lost TAD" lists that do not replicate. Fix: differential on the continuous bin-matched insulation/boundary-strength track with a permutation null.
Trigger: window_bp < 3 * binsize. Mechanism: the diamond spans too few pixels to average out noise. Symptom: dense spurious boundaries, no biological structure. Fix: set window >= 3x bin (10x is the mammalian sweet spot).
Trigger: counting is_boundary True in two libraries and subtracting. Mechanism: the Li threshold is fit per dataset; depth/strength-distribution differences shift the cutoff. Symptom: apparent boundary gain/loss driven by depth, not biology. Fix: compare continuous boundary_strength, then threshold consistently.
Trigger: a boundary expected in a low-coverage/blacklisted region is missing. Mechanism: min_frac_valid_pixels (0.66) and min_dist_bad_bin gate scoring; sparse diamonds get NaN. Symptom: no boundary where the biology predicts one. Fix: inspect n_valid_pixels_{W}; raise min_dist_bad_bin near bad bins or interpret cautiously.
Trigger: cooler uses chr1, a phasing/annotation track uses 1. Mechanism: chromosomes never match. Symptom: empty/zero output, no error. Fix: harmonize names across cooler, fasta, and CTCF/feature tracks.
| Threshold | Source | Rationale |
|-----------|--------|-----------|
| TAD/insulation resolution 10-40kb | domain scale | sub-Mb domains; bins must resolve boundaries without burning depth |
| Window 3-25x bin (sweep) | Open2C insulation notebook | <3x = noise; 25x = compartment-domain scale; the window is the scale dial |
| ~10x bin single window | mammalian interphase convention | e.g. 100kb window at 10kb bins for interphase TAD boundaries |
| min_frac_valid_pixels 0.66 | cooltools default | min valid-pixel fraction in a diamond for the bin to score |
| threshold='Li' | cooltools default | permissive (vs Otsu) histogram split; dataset-dependent, NOT cross-sample comparable |
| hicFindTADs depths: minDepth >=3x bin, step >=2x bin | HiCExplorer docs | the diamond depths must straddle real domain sizes; sweep before comparing |
| ~76-85% boundaries are CTCF (convergent) | Rao 2014; Vietri Rudan 2015 | strength scales with CTCF+cohesin occupancy; a sanity anchor, not a filter |
| match resolution + valid-pixel depth before gain/loss | resolution-confound | unequal depth shifts boundaries and merges sub-TADs; a false-positive engine |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| insulation output all NaN | cooler not balanced | cooler balance / cooler.balance_cooler first |
| KeyError on .mcool / wrong resolution | bare .mcool passed | use file.mcool::/resolutions/<bp> URI |
| Dense spurious boundaries | window < ~3x bin | raise window_bp to >= 3x bin (10x typical) |
| Boundary counts differ wildly between samples | comparing is_boundary (per-dataset Li threshold) | compare continuous boundary_strength, threshold consistently |
| Spurious "gained/lost TADs" | differential on intersected domain partitions | differential on the continuous bin-matched score with a null |
| Empty result / missing boundary | chrom naming mismatch or sparse locus | harmonize names; inspect n_valid_pixels_{W} |
| AttributeError on cooltools call | pre-0.7 vs 0.7+ API change | help(cooltools.insulation); update to the viewframe signature |
development
Find restriction enzyme cut sites in DNA sequences using Biopython Bio.Restriction. Search with single enzymes, batches of enzymes, or commercially available enzyme sets. Returns cut positions for linear or circular DNA. Use when finding restriction enzyme cut sites in sequences.
development
Create restriction maps showing enzyme cut positions on DNA sequences using Biopython Bio.Restriction. Visualize cut sites, calculate distances between sites, and generate text or graphical maps. Use when creating or analyzing restriction maps.
development
Analyze restriction digest fragments using Biopython Bio.Restriction. Predict fragment sizes, get fragment sequences, simulate gel electrophoresis patterns, and perform double digests. Use when analyzing restriction digest fragment patterns.
development
Select restriction enzymes by criteria using Biopython Bio.Restriction. Find enzymes that cut once, don't cut, produce specific overhangs, are commercially available, or have compatible ends for cloning. Use when selecting restriction enzymes for cloning or analysis.