clinical-databases/msi-detection/SKILL.md
Calls microsatellite instability from WES/WGS/targeted-panel with MSIsensor, MSIsensor-pro, MSIsensor-ct (panel-aware), MSIngs, MANTIS, MSIPanel, MSIDetect, and ngsMSI for FDA pembrolizumab MSI-H pan-tumor / Lynch syndrome / dMMR ICI biomarker. Use when stratifying ICI eligibility (Le 2015), pairing MSI with TMB-H (Sha 2020 / Salem 2018), screening Lynch syndrome (universal IHC + MSI), or distinguishing MSI-H tumors from POLE-exo hypermutator with overlapping signatures.
Reference examples tested with: MSIsensor-pro 1.2+, MSIsensor 0.6+, MSIngs 1.0+, MANTIS 1.0.5+, samtools 1.19+, mSINGS 5.6+, pandas 2.2+, cyvcf2 0.30+. FDA pembrolizumab MSI-H / dMMR pan-tumor approval is from 2017 (Le 2015 NEJM; KEYNOTE-016/164/158); approval extended to colorectal first-line in 2020.
Before using code patterns, verify installed versions match. If versions differ:
- `pip show <package>`, then `help(module.function)` to check signatures
- `<tool> --version`

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying. MSIsensor-pro replaces MSIsensor for tumor-only assays; MSIsensor-ct is the bTMB-equivalent for ctDNA panels.
'Detect MSI status from this somatic sequencing data' -> Profile microsatellite instability across canonical loci (Bethesda 5 panel + extended NGS-derived sites); classify MSI-H / MSS / MSI-L per Bethesda / FDA / KEYNOTE convention.
msisensor-pro msi -d microsatellites.list -t tumor.bam -o msi_out -b 16
msisensor msi -d microsatellites.list -n normal.bam -t tumor.bam -o msi_out
msisensor-ct ...
mantis -t tumor.bam -n normal.bam -b targets.bed --threads 8

| Event | Year | Threshold | Notes |
|-------|------|-----------|-------|
| Le 2015 NEJM | 2015 | MSI-H + ICI in CRC | The seminal paper: pembrolizumab in MSI-H CRC, ORR 40% vs 0% in MSS |
| FDA pembrolizumab MSI-H / dMMR pan-tumor | 2017 | MSI-H | First tissue-agnostic FDA approval (KEYNOTE-016/164/158) |
| FDA pembrolizumab first-line MSI-H CRC | 2020 | MSI-H + first-line CRC | KEYNOTE-177 |
| CheckMate 142 | 2017-2018 | MSI-H + nivolumab/ipilimumab | Previously treated MSI-H / dMMR metastatic CRC (not pan-tumor) |
| ESMO 2024 | 2024 | MSI-H | Maintained pan-tumor MSI-H biomarker |
| Universal Lynch screening | -- | IHC + MSI on all CRC <= 70 yr | NCCN / ACG / EGAPP guidelines |

| Term | Definition | Method | Relationship |
|------|-----------|--------|--------------|
| dMMR (deficient MMR) | Loss of MMR protein function | IHC (MLH1, MSH2, MSH6, PMS2) | Causes MSI |
| MSI-H | Microsatellite instability high | PCR-based Bethesda or NGS | Consequence of dMMR |
| Lynch syndrome | Germline MMR mutation | Germline sequencing | Causes ~50% of MSI-H CRC; rest are sporadic (MLH1 hyper-methylation) |
| TMB-H | >= 10 mut/Mb | NGS panel / WES | Statistical correlate of MSI-H |
| POLE-exo hypermutator | POLE proofreading defect | Sequencing / signatures | Hypermutator WITHOUT MMR-D; MSI-stable typically |

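MSI-H + TMB-H overlap (Salem ME et al 2018 Mol Cancer Res 16(5):805-812): roughly 80% overlap in CRC and endometrial cancer, while only about 16% of TMB-H tumors are MSI-H.
POLE-exo vs MMR-D: POLE-exo hypermutators carry SBS10a/10b and are typically MSI-stable; MMR-D tumors carry SBS6/15/26/44; concurrent POLE + MMR defects produce an ultra-hypermutator.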
| Tool | Paired | Tumor-only | ctDNA | Algorithm | Fails when |
|------|--------|-----------|-------|-----------|-----------|
| MSIsensor (Niu 2014 Bioinformatics) | Yes | No | No | Bayesian + read-length distribution | Tumor-only data (no baseline); cohort baseline missing |
| MSIsensor-pro (Jia 2020 Genom Proteom Bioinform) | Optional | Yes | No | Distribution comparison to baseline | Baseline cohort not provided; panel < 50 loci |
| MSIsensor-ct (Han 2021 Brief Bioinform) | -- | -- | Yes | cfDNA-aware | Tumor fraction < 3%; low ctDNA shed |
| MANTIS (Kautto 2017 Oncotarget) | Yes | No | No | Step-wise difference | Tumor-only; low coverage at microsatellites |
| MSIngs (Salipante 2014 Clin Chem) | Yes | No | No | Number of unstable loci | Tumor-only; outdated vs MSIsensor-pro |
| mSINGS (Salipante 2014) | -- | Yes | No | Background panel | Background panel poorly characterized for cohort |
| MSIPanel | -- | Yes | -- | Panel-specific calibration | Panel not in vendor calibration table |
| ngsMSI | Yes | Yes | -- | Algorithmic variations | Limited benchmarking; rarely first choice |
| MSIDetect | Various | -- | -- | DL-based | New tool; reproducibility data still maturing |
| MMR-MS | IHC concordance check | -- | -- | -- | Not a direct MSI caller; for orthogonal validation |

Operational consensus 2024-2026:
| Scenario | Recommended path | Why |
|----------|------------------|-----|
| Tumor + paired normal WES | MSIsensor (standard) | Reference paired-normal comparison |
| Tumor-only WES/panel | MSIsensor-pro with panel baseline | No matched normal needed |
| ctDNA / liquid biopsy | MSIsensor-ct | cfDNA-aware |
| Lynch syndrome screening | Universal IHC + MSI (NCCN) | IHC catches 90%+; MSI for IHC-equivocal |
| FDA pembrolizumab eligibility | Validate per FoCR PCR + IHC + NGS concordance | Cross-platform required |
| MSI-H + TMB-H concurrence | MSI-H is primary biomarker | Sha 2020; TMB-H not additive |
| POLE+MMR ultra-hypermutator | Sigprofiler signatures (SBS14, SBS20) | Mechanism beyond MSI alone |
| Sporadic MSI-H | Confirm MLH1 hypermethylation; rule out Lynch | Distinguishes sporadic vs germline |
| MSI-stable + TMB-H | Investigate POLE-exo signature (SBS10a/10b) | POLE-exo causes hypermutator without MSI |
| Pan-tumor screening | MSI + IHC + TMB combined | Multiple modalities for ICI eligibility |

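
The first three rows of the scenario table reduce to a small lookup. The sketch below is only an illustration of that mapping: the function name and the `'tissue'` / `'ctdna'` labels are hypothetical and not part of any of the tools.

```python
def recommend_msi_tool(sample_type, has_matched_normal):
    """Return the caller suggested by the scenario table for a sample.

    sample_type: 'tissue' or 'ctdna' (hypothetical labels for this sketch).
    has_matched_normal: True when a paired normal BAM is available.
    """
    if sample_type == "ctdna":
        # Liquid biopsy: the table recommends the cfDNA-aware caller.
        return "MSIsensor-ct"
    if sample_type == "tissue":
        # Paired normal -> MSIsensor; tumor-only -> MSIsensor-pro with a baseline.
        return "MSIsensor" if has_matched_normal else "MSIsensor-pro"
    raise ValueError(f"unknown sample_type: {sample_type!r}")
```

For example, `recommend_msi_tool("tissue", has_matched_normal=False)` returns `"MSIsensor-pro"`, which still needs a panel-matched baseline before scoring.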
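The Bethesda 5-locus panel (Boland 1998) is the PCR-based reference: two mononucleotide markers (BAT-25, BAT-26) and three dinucleotide markers (D2S123, D5S346, D17S250). Instability at >= 2 of 5 loci (40%) is MSI-H.
NGS-based MSI panels use 50-1000+ microsatellite loci. The MSI-H cutoff is panel-calibrated, typically 10-30% of informative loci unstable, rather than the 40% Bethesda fraction.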
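
As a minimal sketch of the Bethesda PCR convention (>= 2 of 5 unstable loci is MSI-H, exactly 1 is MSI-L, 0 is MSS), assuming the five standard Bethesda marker names; the function names are illustrative only.

```python
# The five Bethesda reference markers (Boland 1998).
BETHESDA_LOCI = ("BAT-25", "BAT-26", "D2S123", "D5S346", "D17S250")


def classify_bethesda(unstable_loci):
    """Classify a PCR result from the names of the unstable Bethesda loci."""
    unstable = set(unstable_loci)
    unknown = unstable - set(BETHESDA_LOCI)
    if unknown:
        raise ValueError(f"not Bethesda loci: {sorted(unknown)}")
    if len(unstable) >= 2:
        return "MSI-H"
    if len(unstable) == 1:
        return "MSI-L"
    return "MSS"


def unstable_percentage(n_unstable, n_informative):
    """Percentage of informative loci called unstable (the NGS-style metric)."""
    if n_informative <= 0:
        raise ValueError("need at least one informative locus")
    return 100.0 * n_unstable / n_informative
```

Two unstable Bethesda loci give `unstable_percentage(2, 5) == 40.0`, which is where the 40% figure comes from; NGS panels apply their own calibrated cutoff to the same percentage.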
Goal: Compute MSI status from tumor-only WES/panel.
Approach: Generate baseline from population reference; compare patient tumor.
# Generate microsatellite list from reference genome (one-time)
msisensor-pro scan -d /reference/GRCh38.fa -o microsatellites.list -p 1 -m 5
# Generate baseline from N normal control samples (one-time per panel)
msisensor-pro baseline -d microsatellites.list -i normal_samples.list -o baseline.list -b 16
# Score the tumor sample against the baseline. Flag names differ between
# msisensor-pro releases (the upstream README passes the baseline file through
# `-d`), so verify the flag set against `msisensor-pro pro --help` before running.
msisensor-pro pro \
-d microsatellites.list \
-t tumor.bam \
-o msi_output \
-b 16 \
--baseline baseline.list
# Output: msi_output_all (raw); msi_output_unstable (unstable loci); msi_output.txt (summary)
# Critical column: %_unstable. Threshold MSI-H typically >= 20-30% depending on panel.
# Paired tumor-normal scoring with MSIsensor
msisensor msi \
-d microsatellites.list \
-n normal.bam \
-t tumor.bam \
-o msi_paired_out \
-b 16
# Output: %_unstable in paired comparison
# MSI-H threshold: >= 20% by FoCR guidance; varies 10-30% across studies
# Paired tumor-normal scoring with MANTIS
mantis.py \
-t tumor.bam \
-n normal.bam \
-b microsatellite_targets.bed \
--threads 8 \
-o mantis_output
# Output: mantis_output.kmer_counts (raw), mantis_output (status)
# Threshold MSI-H: stepwise difference > 0.4 (default)
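
The summary file written by the commands above can be read back for downstream classification. This is a hedged sketch: it assumes a tab-separated file with one header row and one data row, and that the percentage column is headed `%`; the notes above call it `%_unstable`, so inspect the header of your own output and pass `pct_column` accordingly. The example text is made up for illustration.

```python
import csv
import io


def read_unstable_percentage(summary_text, pct_column="%"):
    """Extract the unstable-loci percentage from a one-row TSV summary.

    pct_column is an assumption about the header; check the real file.
    """
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    row = next(reader, None)
    if row is None:
        raise ValueError("summary has no data row")
    value = row.get(pct_column)
    if value is None:
        raise KeyError(f"column {pct_column!r} not found in {reader.fieldnames}")
    return float(value)


# Illustrative content only; not output from a real run.
example = "Total_Number_of_Sites\tNumber_of_Unstable_Sites\t%\n640\t161\t25.16\n"
pct = read_unstable_percentage(example)
```

In practice the text would come from `open("msi_output").read()` or whichever summary path the installed release writes.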
import pandas as pd


def classify_msi(unstable_percentage, panel_calibrated_cutoff=20.0):
    '''Classify MSI status from percentage of unstable loci.

    Bethesda PCR: >=2 of 5 unstable -> MSI-H (40% loci)
    NGS: panel-specific cutoffs typically 10-30%
    Concordance: MSI-PCR + IHC + NGS should agree (FoCR)
    '''
    if unstable_percentage >= panel_calibrated_cutoff:
        return 'MSI-H'
    elif unstable_percentage >= panel_calibrated_cutoff / 2:
        return 'MSI-L (intermediate; treat as MSS clinically per FDA)'
    else:
        return 'MSS'


def msi_lynch_workflow(msi_status, ihc_results, mlh1_methylation_status, germline_test):
    '''Standard Lynch syndrome workflow.

    Args:
        msi_status: 'MSI-H' / 'MSS' / 'MSI-L'
        ihc_results: dict {MLH1: 'retained' or 'loss', MSH2, MSH6, PMS2}
        mlh1_methylation_status: 'methylated' (sporadic) / 'unmethylated' (Lynch suspect)
        germline_test: 'positive' / 'negative' / 'not_performed'
    '''
    if msi_status != 'MSI-H':
        return 'No further Lynch screening indicated'
    # Collect the MMR proteins whose IHC staining is lost.
    ihc_loss = [gene for gene, status in ihc_results.items() if status == 'loss']
    if not ihc_loss:
        return 'MSI-H with retained IHC; consider Lynch with germline testing'
    if 'MLH1' in ihc_loss:
        if mlh1_methylation_status == 'methylated':
            return 'Sporadic MSI-H (MLH1 hypermethylation); not Lynch'
        elif mlh1_methylation_status == 'unmethylated':
            return 'Lynch suspect (MLH1 loss without methylation); proceed with germline testing'
        else:
            return 'MLH1 loss; perform methylation test'
    return f'MSH2/6/PMS2 loss ({", ".join(ihc_loss)}); strong Lynch suspect; germline testing'


def msi_tmb_ici_decision(msi_status, tmb_value, tumor_type=None, dmmr_ihc=None):
    '''Integrated ICI eligibility from MSI + TMB.

    Sha 2020: MSI-H is primary biomarker; TMB-H not additive.
    McGrail 2021: TMB-H NOT endorsed for breast/prostate/glioma alone.
    '''
    msi_high = msi_status == 'MSI-H'
    dmmr_positive = dmmr_ihc == 'positive'
    tmb_h = tmb_value >= 10
    if msi_high or dmmr_positive:
        return ('ICI eligible: MSI-H or dMMR (FDA pembrolizumab 2017 pan-tumor; KEYNOTE-016/164/158); '
                'TMB-H is not additive (Sha 2020).')
    if tmb_h and tumor_type and tumor_type.lower() in ('breast', 'prostate', 'glioma'):
        return ('TMB-H but tumor type excluded by ESMO 2024 / McGrail 2021. '
                'Consider tumor-type-specific cutoff.')
    if tmb_h:
        return 'TMB-H pan-tumor (FDA pembrolizumab 2020); ICI eligible.'
    return 'MSS + TMB-low. Standard chemo per tumor type.'
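
The docstring of `classify_msi` notes that MSI-PCR, IHC, and NGS should agree (FoCR). A minimal sketch of such a cross-platform check follows; the function name, argument names, and return strings are hypothetical, and a discordant result only flags the case for the manual review described in the discordance table.

```python
def check_concordance(pcr_status, ihc_dmmr, ngs_status):
    """Compare three MSI/MMR readouts and report whether they agree.

    pcr_status, ngs_status: 'MSI-H', 'MSI-L', or 'MSS'.
    ihc_dmmr: True when IHC shows loss of at least one MMR protein.
    Only MSI-H counts as positive; MSI-L is treated as negative.
    """
    calls = {
        "PCR": pcr_status == "MSI-H",
        "IHC": bool(ihc_dmmr),
        "NGS": ngs_status == "MSI-H",
    }
    positives = sorted(name for name, is_pos in calls.items() if is_pos)
    negatives = sorted(name for name, is_pos in calls.items() if not is_pos)
    if not negatives:
        return "concordant positive (MSI-H / dMMR)"
    if not positives:
        return "concordant negative (MSS / pMMR)"
    return (f"discordant: positive by {', '.join(positives)}; "
            f"negative by {', '.join(negatives)}")
```

A PCR-only positive, `check_concordance("MSI-H", False, "MSS")`, yields `"discordant: positive by PCR; negative by IHC, NGS"`.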
1. Tumor-only with paired-normal tool
2. Panel size too small
3. IHC vs MSI discordance not investigated
4. MSI-H + Lynch syndrome confusion
5. POLE-exo hypermutator labeled MSI
6. ctDNA MSI without sufficient tumor fraction
7. Universal screening missed
8. MSI-L treated as actionable
| Pattern | Likely cause | Action |
|---------|-------------|--------|
| PCR Bethesda MSI-H vs NGS MSS | Bethesda panel uses 5 loci only; less sensitive | Trust NGS with >=50 informative loci |
| NGS MSI-H vs IHC retained | Subtle MMR loss; MSH6-only subtype; or POLE-exo | Confirm with germline + POLE-exo signature analysis |
| Paired-normal MSI-H + tumor-only MSS | Sample swap or low tumor purity in tumor-only | Re-validate; check purity (>=20% required) |
| MSIsensor-pro vs MSIsensor (paired) | Different baseline thresholds | Apply panel-specific calibration |
| MSI-H suspected but tools differ | Borderline mutational burden | Use signature analysis (SBS6/15/26/44) as orthogonal evidence |
| ctDNA MSI vs tissue MSI | Tumor fraction low | Trust tissue; estimate ctDNA fraction |

| Threshold | Convention | Source |
|-----------|-----------|--------|
| Bethesda MSI-H | >= 2/5 unstable | Boland 1998 |
| NGS MSI-H cutoff | 10-30% unstable loci (panel-specific) | Various |
| MANTIS MSI-H threshold | Step-wise difference > 0.4 | Kautto 2017 |
| MSIsensor MSI-H threshold | >= 20% by FoCR | Friends of Cancer Research |
| Minimum informative loci | >= 50 NGS loci | Panel-design convention |
| ctDNA tumor fraction minimum | >= 3% for reliable cfDNA MSI | Han 2021 |
| Tumor purity minimum | >= 20% | Standard |
| FDA approval | MSI-H or dMMR pan-tumor (2017) | KEYNOTE-016/164/158 |
| First-line MSI-H CRC | KEYNOTE-177 (2020) | -- |
| MSI-H + TMB-H overlap (CRC, endometrial) | ~80% | Salem 2018 |
| TMB-H -> MSI-H rate | ~16% | Salem 2018 |
| Sporadic MSI-H mechanism | ~50% MLH1 hypermethylation | Various |
| Universal screening cutoff | CRC <= 70 yr | NCCN / ACG |

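
The three sample-level minimums in the threshold table (informative loci, tumor purity, ctDNA tumor fraction) can be applied as a pre-call QC gate. The sketch below takes its constants from the table; the function and argument names are hypothetical, and fractions are expressed as 0-1 values rather than percentages.

```python
MIN_INFORMATIVE_LOCI = 50          # >= 50 NGS loci
MIN_TUMOR_PURITY = 0.20            # >= 20% tumor purity
MIN_CTDNA_TUMOR_FRACTION = 0.03    # >= 3% for cfDNA MSI calling


def msi_qc_failures(informative_loci, tumor_purity=None, ctdna_tumor_fraction=None):
    """Return a list of QC failures; an empty list means the sample passes.

    tumor_purity and ctdna_tumor_fraction are optional because only one of
    them normally applies (tissue vs liquid biopsy).
    """
    failures = []
    if informative_loci < MIN_INFORMATIVE_LOCI:
        failures.append(
            f"informative loci {informative_loci} < {MIN_INFORMATIVE_LOCI}")
    if tumor_purity is not None and tumor_purity < MIN_TUMOR_PURITY:
        failures.append(
            f"tumor purity {tumor_purity:.2f} < {MIN_TUMOR_PURITY:.2f}")
    if (ctdna_tumor_fraction is not None
            and ctdna_tumor_fraction < MIN_CTDNA_TUMOR_FRACTION):
        failures.append(
            f"ctDNA tumor fraction {ctdna_tumor_fraction:.2f} "
            f"< {MIN_CTDNA_TUMOR_FRACTION:.2f}")
    return failures
```

A tissue sample with 120 informative loci and 35% purity passes (`msi_qc_failures(120, tumor_purity=0.35)` is empty), while a 30-locus ctDNA sample at 1% tumor fraction fails on two counts.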
| Symptom | Cause | Solution |
|---------|-------|----------|
| MSI-H + IHC retained discordance | Subtle loss; MSH6-only; or rare hypermutator | Cross-check germline + signatures |
| Borderline MSI call | Panel too small | Use >= 50 informative loci |
| Tumor-only MSI low confidence | Background subtraction needed | Use MSIsensor-pro with cohort baseline |
| MSI-H + TMB-H reported additive | Tautology per Sha 2020 | MSI-H is primary; TMB-H not additive |
| POLE-exo labeled MMR-D | Different mechanism; mutation count differs | Run Sigprofiler; SBS10a/10b is POLE-exo |
| Sporadic MSI-H mis-labeled Lynch | Need MLH1 methylation test | Confirm MLH1 methylation + germline |

| Pushback | Standard response |
|----------|-------------------|
| "MSI-H + TMB-H both reported additive" | Sha 2020 Cell Rep Med: MSI-H is the primary biomarker; TMB-H is statistical correlate. We report MSI-H first; TMB-H reported but noted not additive. |
| "Why MSIsensor-pro instead of MSIsensor?" | MSIsensor requires paired normal; MSIsensor-pro handles tumor-only via cohort baseline. Most commercial panels are tumor-only. |
| "MSI-PCR vs NGS discordant" | Bethesda 5-locus panel is less sensitive; we use NGS >=50 informative loci for confirmation. |
| "Universal Lynch screening?" | NCCN / ACG recommend reflex IHC + MSI on all CRC <= 70 yr; we implemented universal screening protocol. |
| "POLE-exo hypermutator with MSI-H?" | Sigprofiler signature analysis distinguishes: SBS10a/10b = POLE-exo (typically MSI-stable); SBS6/15/26/44 = MMR-D. POLE+MMR concurrent produces ultra-hypermutator. |
| "MSI-L?" | FDA approval specifies MSI-H; MSI-L = clinically MSS; we apply MSI-H threshold strictly. |
| "ctDNA MSI viability?" | MSIsensor-ct works if tumor fraction >= 3%; we estimate via ichorCNA; below threshold falls back to tissue. |

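
The signature-based distinction between POLE-exo and MMR-D in the tables above can be sketched as a lookup over signature exposures. This is only an illustration: the signature groupings follow the tables, but the input format (a dict of signature name to exposure fraction) and the `min_exposure` cutoff of 0.1 are assumptions, not SigProfiler outputs or defaults.

```python
POLE_EXO_SIGNATURES = {"SBS10a", "SBS10b"}
MMRD_SIGNATURES = {"SBS6", "SBS15", "SBS26", "SBS44"}
POLE_MMR_SIGNATURES = {"SBS14", "SBS20"}  # concurrent polymerase + MMR defect


def hypermutator_mechanism(signature_exposures, min_exposure=0.1):
    """Label the likely hypermutation mechanism from signature exposures.

    signature_exposures: dict of signature name -> fraction of mutations.
    min_exposure: hypothetical cutoff for calling a signature active.
    """
    active = {name for name, fraction in signature_exposures.items()
              if fraction >= min_exposure}
    has_pole = bool(active & POLE_EXO_SIGNATURES)
    has_mmrd = bool(active & MMRD_SIGNATURES)
    if active & POLE_MMR_SIGNATURES or (has_pole and has_mmrd):
        return "POLE-exo + MMR-D (ultra-hypermutator)"
    if has_pole:
        return "POLE-exo"
    if has_mmrd:
        return "MMR-D"
    return "unclassified"
```

An MSI-stable, TMB-H tumor with `{"SBS10a": 0.6, "SBS1": 0.2}` would be labeled `"POLE-exo"`, matching the "MSI-stable + TMB-H" row of the scenario table.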