GPTomics/bio-data-visualization-statistical-annotation

Name: bio-data-visualization-statistical-annotation
Author: GPTomics

data-visualization/statistical-annotation/SKILL.md

npx skillsauth add GPTomics/bioSkills bio-data-visualization-statistical-annotation

Version Compatibility

Reference examples tested with: ggpubr 0.6+, ggsignif 0.6+, rstatix 0.7+, statannotations 0.6+ (Python), seaborn 0.13+.

Before using code patterns, verify installed versions match. If versions differ:

R: packageVersion('<pkg>') then ?function_name
Python: pip show <package> then help(module.function) to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
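The Python half of this check can be scripted with the standard library alone. The sketch below is a minimal illustration: the helper name `installed_version` is hypothetical, and `json.dumps` merely stands in for whichever function signature needs confirming.

```python
from importlib import metadata
import inspect
import json


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report versions for the Python packages named above (None means not installed).
for pkg in ("statannotations", "seaborn"):
    print(pkg, installed_version(pkg))

# Inspect a signature before copying an example; json.dumps is only a stand-in
# for the function whose arguments need to be confirmed.
print(inspect.signature(json.dumps))
```

Returning None instead of raising keeps the check usable inside a larger script that decides whether to adapt an example.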

Statistical Annotation

"Add p-values to my plot" -> Render pairwise group comparisons as brackets with the correct statistical test (parametric vs non-parametric, paired vs unpaired, independent vs nested), adjusted for multiple testing, with rendering of significance as either numerical p OR asterisks. The choices that matter: which test is appropriate for the data, what multiple-testing adjustment applies, and whether to show n.s. (non-significant) results.

R: ggpubr::stat_compare_means, ggsignif::geom_signif, rstatix::t_test/wilcox_test
Python: statannotations.Annotator, scipy.stats directly

The Single Most Important Modern Insight -- The Test Must Match the Data

Tool defaults are not automatically appropriate for the data. ggpubr::stat_compare_means(method='t.test') runs Welch's two-sample t-test, which assumes approximate normality but does not assume equal variances. That is the wrong test when:

Data are non-normal and N is small (<30): use Mann-Whitney U (method='wilcox.test').
Data are paired: use paired t-test or paired Wilcoxon (paired = TRUE).
Comparing >2 groups: ANOVA / Kruskal-Wallis with post-hoc, not all-pairs t-test (multiple-testing penalty).
Data are nested (cells within patients, replicates within samples): linear mixed model, NOT pairwise test.

The bracket-and-asterisk visual is the same; the underlying statistics are not. Choose the test deliberately.

Decision Tree for Test Selection

| Question | Recommended test | Function |
|----------|------------------|----------|
| 2 unpaired groups, normal, N≥30 | Welch t-test | t.test(), stat_compare_means(method='t.test') |
| 2 unpaired groups, non-normal or small N | Mann-Whitney U (Wilcoxon rank-sum) | wilcox.test(), stat_compare_means(method='wilcox.test') |
| 2 paired groups | Paired t-test OR Wilcoxon signed-rank | paired = TRUE |
| 3+ groups, normal | One-way ANOVA + Tukey HSD post-hoc | aov(), TukeyHSD() |
| 3+ groups, non-normal | Kruskal-Wallis + Dunn post-hoc | kruskal.test(), dunn.test() |
| Nested data (cells in patients) | Linear mixed model | lme4::lmer() |
| Time-course / repeated measures | Repeated-measures ANOVA OR LMM | nlme::lme() |
| Two-way factorial | Two-way ANOVA + interaction term | aov(y ~ a*b) |
| Survival / time-to-event | Log-rank, NOT t-test | survdiff() |
| Categorical outcome | Chi-square OR Fisher exact | chisq.test(), fisher.test() |
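The continuous-outcome rows of this table can be written as a small lookup function. This is a sketch only: `choose_test` and its parameters are hypothetical, the split of the paired row by normality is an assumption (the table lists both paired tests without choosing), and the function does not check any assumption on real data.

```python
def choose_test(n_groups, paired=False, normal=True, min_n=30, nested=False):
    """Map a simple study design to a test name from the decision table.

    Hypothetical helper covering only the continuous-outcome rows.
    min_n is the smallest per-group sample size.
    """
    if nested:
        # Cells within patients, replicates within samples, etc.
        return "linear mixed model"
    if n_groups == 2:
        if paired:
            # Assumption: pick between the two paired tests by normality.
            return "paired t-test" if normal else "Wilcoxon signed-rank"
        if normal and min_n >= 30:
            return "Welch t-test"
        return "Mann-Whitney U"
    if n_groups >= 3:
        if normal:
            return "one-way ANOVA + Tukey HSD"
        return "Kruskal-Wallis + Dunn"
    raise ValueError("need at least two groups")


print(choose_test(2, normal=False))   # Mann-Whitney U
print(choose_test(3, normal=False))   # Kruskal-Wallis + Dunn
```

Checking `nested` first mirrors the table's priority: no pairwise test is appropriate once observations are not independent.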

Multiple Testing Adjustment

For pairwise comparisons among K groups, there are K(K-1)/2 unadjusted p-values. Without adjustment, the family-wise error rate inflates rapidly. Treating the m comparisons as independent, α_FW = 1 - (1 - α)^m, which gives:

3 groups: 3 comparisons; α_FW = 14% at nominal 5%
4 groups: 6 comparisons; α_FW = 26%
6 groups: 15 comparisons; α_FW = 54%
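The figures above can be reproduced in a few lines. The sketch assumes the comparisons are independent, which is only an approximation for pairwise tests that share groups; the function names are illustrative.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k groups: k(k-1)/2."""
    return comb(k, 2)


def fwer(m, alpha=0.05):
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 6):
    m = n_pairwise(k)
    print(f"{k} groups: {m} comparisons, FWER = {fwer(m):.0%}")
# 3 groups: 3 comparisons, FWER = 14%
# 4 groups: 6 comparisons, FWER = 26%
# 6 groups: 15 comparisons, FWER = 54%
```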

# rstatix supports per-comparison adjustment
library(rstatix)
df %>%
    pairwise_wilcox_test(value ~ group, p.adjust.method = 'bonferroni') %>%
    add_xy_position(x = 'group')

p.adjust.method options:

'bonferroni' — strictest; controls FWER
'holm' — stepwise Bonferroni; uniformly more powerful than Bonferroni
'BH' (Benjamini-Hochberg) — FDR; less strict than FWER; standard for genomics
'fdr' — alias for BH

For figure annotations, holm is the modern default — controls FWER and is more powerful than bonferroni. For a small number of pre-planned comparisons (≤3), Bonferroni is fine.
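To make the differences between these options concrete, the sketch below re-implements the three adjustments in plain Python. It is an illustration of the arithmetic, not the code that R's p.adjust() or rstatix runs, and the three raw p-values are made up.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Step-down Bonferroni: the i-th smallest p is multiplied by (m - i + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Enforce monotonicity: an adjusted p never drops below an earlier one.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: the i-th smallest p is multiplied by m / i."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # illustrative raw p-values
print("bonferroni:", bonferroni(raw))
print("holm:      ", holm(raw))
print("BH:        ", benjamini_hochberg(raw))
```

On this input the Holm-adjusted values are never larger than the Bonferroni ones, which is the sense in which Holm is at least as powerful.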

ggpubr -- Standard ggplot2 Workflow

Goal: Add per-comparison p-value brackets between groups on a distribution plot, using a test appropriate to data shape and adjusting for multiple comparisons.

Approach: Build the base plot with ggboxplot(); add stat_compare_means() with explicit method, comparisons, p.adjust.method, and label arguments; render as asterisks (p.signif) for terse display or numeric (p.format) for precise display.

library(ggpubr)

# Boxplot with jittered points + pairwise p-value brackets
ggboxplot(df, x = 'group', y = 'value', color = 'group',
          add = 'jitter', palette = 'npg') +
    stat_compare_means(method = 'wilcox.test',           # set explicitly rather than relying on the default
                       comparisons = list(c('Control', 'Treatment'),
                                          c('Control', 'Vehicle'),
                                          c('Treatment', 'Vehicle')),
                       label = 'p.signif',               # 'p.signif' for asterisks; 'p.format' for numeric
                       p.adjust.method = 'holm',
                       method.args = list(alternative = 'two.sided'))
# Check that the brackets show adjusted p-values in your ggpubr version; if they
# do not, compute them with rstatix and draw them with stat_pvalue_manual().

For an overall test plus pairwise:

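# Overall (omnibus) test + per-comparison brackets
comparisons <- list(c('Control', 'Treatment'),           # pairs to annotate
                    c('Control', 'Vehicle'),
                    c('Treatment', 'Vehicle'))
ggboxplot(df, x = 'group', y = 'value', color = 'group') +
    stat_compare_means(method = 'kruskal.test',          # overall test
                       label.y = 1.05 * max(df$value)) +
    stat_compare_means(comparisons = comparisons,
                       method = 'wilcox.test',
                       p.adjust.method = 'holm',
                       label = 'p.signif')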

ggsignif -- Lighter Alternative

library(ggsignif)
ggplot(df, aes(group, value, fill = group)) +
    geom_boxplot() +
    geom_signif(comparisons = list(c('Control', 'Treatment')),
                test = 'wilcox.test',
                map_signif_level = TRUE,                  # asterisks vs numeric p
                step_increase = 0.1) +
    scale_fill_manual(values = c('#0072B2', '#D55E00', '#009E73'))  # one color per group level

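map_signif_level = TRUE converts p-values to asterisks using the common convention below. These cutoffs are conventional, not grades of evidence; the ASA statement (Wasserstein and Lazar 2016) cautions against conclusions based only on whether p crosses a threshold.

*** p < 0.001
** p < 0.01
* p < 0.05
ns p ≥ 0.05

For literal p-values, set map_signif_level = FALSE.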
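The same mapping is easy to reproduce when labels are built by hand, for example for a supplementary table. The sketch below is a standalone illustration with a hypothetical function name; it is not ggsignif's internal code.

```python
def p_to_stars(p):
    """Convert a p-value to the conventional asterisk label."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


for p in (0.0004, 0.004, 0.04, 0.4):
    print(p, p_to_stars(p))
```

The comparisons are strict, so a p-value exactly at a cutoff falls into the weaker category (0.05 maps to "ns").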

statannotations (Python)

import seaborn as sns
from statannotations.Annotator import Annotator

ax = sns.boxplot(x='group', y='value', data=df, palette=['#0072B2', '#D55E00', '#009E73'])

pairs = [('Control', 'Treatment'),
         ('Control', 'Vehicle'),
         ('Treatment', 'Vehicle')]

annotator = Annotator(ax, pairs, data=df, x='group', y='value')
annotator.configure(test='Mann-Whitney',                    # 't-test_ind', 't-test_paired', 'Wilcoxon', etc
                    comparisons_correction='holm',
                    text_format='star',                     # 'star', 'simple', 'full'
                    line_height=0.02,
                    text_offset=0.5)
annotator.apply_and_annotate()

Per-Method Failure Modes

Default t-test on non-normal data

Trigger: stat_compare_means(method='t.test') on right-skewed (for example log-normal) expression values.

Mechanism: The t-test assumes approximate normality; with non-normal data and small N, the Type I error rate is no longer held at the nominal level.

Symptom: Significant p where rank test gives p > 0.05.

Fix: Switch to method='wilcox.test' for non-normal or small-N data. Verify normality with Shapiro-Wilk if borderline.

Pairwise tests without adjustment

Trigger: Multiple bracket annotations with raw p-values.

Mechanism: K(K-1)/2 comparisons inflate FWER without adjustment.

Symptom: Most or all pairwise comparisons are significant at nominal 0.05, but the findings do not replicate.

Fix: p.adjust.method = 'holm' (or 'bonferroni' or 'BH'). Document choice.

Paired data tested as independent

Trigger: Before/after measurements in same subjects, tested with unpaired t-test.

Mechanism: The unpaired test ignores within-subject correlation, which usually costs power.

Symptom: Non-significant p where the paired test gives a significant result.

Fix: paired = TRUE (R) or t-test_paired (statannotations). Verify subjects are correctly matched.
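A small numeric illustration shows how much the pairing can matter. The before/after values below are made up, and the two statistics are computed by hand with the standard library rather than through ggpubr or statannotations.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Welch t statistic for two independent samples (b minus a)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


def paired_t(a, b):
    """Paired t statistic: one-sample t on the within-subject differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Five subjects measured before and after; subjects differ a lot from each
# other, but every subject shifts by a similar small amount.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.5, 15.0, 17.5, 19.0]

print(f"Welch t  = {welch_t(before, after):.2f}")
print(f"paired t = {paired_t(before, after):.2f}")
```

The unpaired statistic is about 0.6 while the paired statistic is about 9.8: between-subject spread swamps the shift unless each subject serves as its own control.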

Nested data tested with pairwise t

Trigger: Hundreds of cells per patient, tested as if each cell is independent.

Mechanism: Pseudoreplication. Within-patient correlation is ignored, so p-values are dramatically over-significant.

Symptom: p < 1e-50 from a dataset where the actual N is ~10 patients.

Fix: Fit a linear mixed model (lme4::lmer(value ~ group + (1|patient_id))), or aggregate to one value per patient (for example the median) and test those values, or pseudobulk.
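The aggregation option can be sketched as follows. The record layout (patient_id, group, value) and the function name are assumptions made for illustration; the point is that the unit of analysis becomes the patient, not the cell.

```python
from collections import defaultdict
from statistics import median


def per_patient_medians(records):
    """Collapse per-cell records to one median per patient, listed by group.

    records: iterable of (patient_id, group, value) tuples, one per cell.
    """
    cells = defaultdict(list)
    for patient_id, group, value in records:
        cells[(patient_id, group)].append(value)
    by_group = defaultdict(list)
    for (patient_id, group), values in sorted(cells.items()):
        by_group[group].append(median(values))
    return dict(by_group)


# Nine illustrative cells from four patients.
records = [
    ("P1", "Control", 1.0), ("P1", "Control", 1.2), ("P1", "Control", 0.8),
    ("P2", "Control", 2.0), ("P2", "Control", 2.5),
    ("P3", "Treatment", 3.0), ("P3", "Treatment", 3.4), ("P3", "Treatment", 3.2),
    ("P4", "Treatment", 4.0),
]
summary = per_patient_medians(records)
print(summary)  # {'Control': [1.0, 2.25], 'Treatment': [3.2, 4.0]}
```

Any downstream test then sees two values per group, which reflects the real sample size.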

Asterisks shown but p-values not reported anywhere

Trigger: label = 'p.signif' exclusively.

Mechanism: Reader cannot recover the actual p-value.

Symptom: Reviewer asks for exact p; not in figure or supplementary.

Fix: Either show numeric p (label = 'p.format') or include test results table in supplementary.

n.s. annotation hidden

Trigger: Showing only significant brackets, omitting non-significant.

Mechanism: Selective reporting biases interpretation.

Symptom: Reader cannot tell whether an unannotated pair was non-significant or never tested.

Fix: Either annotate all comparisons (with n.s. for non-significant) OR pre-specify which pairs are tested in the legend/caption.

Reading effect size from p-value

Trigger: Conclusion "highly significant difference" from p = 1e-10 on a tiny effect.

Mechanism: Large N inflates significance for trivial differences.

Symptom: Effect size negligible despite extreme p.

Fix: Always report effect size (Cohen's d, Cliff's delta, median difference with CI) alongside p. The bracket should convey direction AND magnitude, not just significance.
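Two of the effect sizes named above are short enough to compute directly. The sketch below uses the textbook definitions (pooled-SD Cohen's d and all-pairs Cliff's delta) on made-up data; confidence intervals are left out for brevity.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var)


def cliffs_delta(a, b):
    """P(b > a) - P(b < a) over all cross-group pairs; ranges from -1 to 1."""
    greater = sum(1 for x in a for y in b if y > x)
    less = sum(1 for x in a for y in b if y < x)
    return (greater - less) / (len(a) * len(b))


control = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative values
treatment = [3.0, 4.0, 5.0, 6.0, 7.0]

print(f"Cohen's d     = {cohens_d(control, treatment):.2f}")
print(f"Cliff's delta = {cliffs_delta(control, treatment):.2f}")
```

Cliff's delta pairs naturally with rank tests such as Mann-Whitney, while Cohen's d pairs with t-tests.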

Reconciliation: When Tests Disagree

| Pattern | Cause | Action |
|---------|-------|--------|
| t-test significant; Wilcoxon n.s. | Outliers driving t-test; rank test robust | Trust Wilcoxon for non-normal data |
| Unpaired n.s.; paired significant | Within-subject correlation matters | Use paired if subjects are matched |
| Pairwise all-significant; ANOVA n.s. | Multiple-testing inflation in pairwise | ANOVA / Kruskal-Wallis is the overall test; pairwise post-hoc only after omnibus significant |
| Pseudo-replicated p < 1e-50; LMM p = 0.1 | Pseudoreplication | LMM is correct; pseudo-replicated p is meaningless |
| Bonferroni-adjusted n.s.; raw p < 0.05 | Adjustment correctly identified borderline | Trust adjusted; document the test family |

Quantitative Thresholds

| Threshold | Value | Source |
|-----------|-------|--------|
| α for FWER control | 0.05 family-wise | Standard |
| α for FDR control | 0.05 expected FDR (BH) | Benjamini-Hochberg 1995 |
| Asterisk convention | * <0.05, ** <0.01, *** <0.001 | Common practice |
| Bonferroni cutoff | 0.05 / (K(K-1)/2) for all pairwise comparisons among K groups | Standard |
| Holm step-down | never less powerful than Bonferroni, for any K | Holm 1979 |
| FDR (BH) | less strict than FWER | Genomics standard |

Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Reviewer asks "why t-test?" | Default not justified | Pre-justify test choice |
| Many pairwise-significant; doesn't replicate | No multiple-testing adjustment | Holm or BH |
| Effect "highly significant" but tiny | Large N inflates p | Report effect size |
| Asterisks only; no p-values | label = 'p.signif' exclusively | Show p.format OR provide table |
| Pseudoreplication inflated p | Cells treated as independent | LMM or pseudobulk |
| n.s. comparisons hidden | Selective reporting | Annotate all pre-specified pairs |
| Numeric p truncated to '<2.22e-16' | R default precision | Manual formatting or report as < 2e-16 |

References

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289-300.
Dunn OJ. 1964. Multiple comparisons using rank sums. Technometrics 6(3):241-252.
Holm S. 1979. A simple sequentially rejective multiple test procedure. Scand J Stat 6(2):65-70.
Kassambara A. 2020. Practical Statistics in R for Comparing Groups: Numerical Variables. (ggpubr / rstatix tutorial).
Wasserstein RL, Lazar NA. 2016. The ASA's statement on p-values: context, process, and purpose. Am Stat 70(2):129-133.

Related Skills

data-visualization/distribution-plots - Underlying box/violin/raincloud
clinical-biostatistics/categorical-tests - Chi-square / Fisher tests for categorical outcomes
clinical-biostatistics/effect-measures - Effect size to report alongside p
experimental-design/multiple-testing - Methods for controlling FWER and FDR

Add p-value brackets, significance asterisks, and effect-size annotations to distribution plots using ggpubr, ggsignif, and statannotations with correct test selection (parametric vs non-parametric vs paired), multiple-testing adjustment, and rendering of negative results. Use when a boxplot/violin/raincloud needs in-figure statistical comparisons between groups.

776 stars

testing

Updated May 25, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add GPTomics/bioSkills bio-data-visualization-statistical-annotation

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 25, 2026, 6:58 AM (147.9 s, 3 files scanned)

SKILL.md

name: bio-data-visualization-statistical-annotation
tool_type: mixed
primary_tool: ggpubr

Install manually

# Clone the repo
git clone https://github.com/GPTomics/bioSkills.git

# Copy into Claude Code skills folder (global); create the folder first if it is missing
mkdir -p ~/.claude/skills
cp -r bioSkills/data-visualization/statistical-annotation ~/.claude/skills/

Repository

GPTomics/bioSkills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT