GPTomics/bio-long-read-sequencing-basecalling

Name: bio-long-read-sequencing-basecalling
Author: GPTomics

long-read-sequencing/basecalling/SKILL.md

npx skillsauth add GPTomics/bioSkills bio-long-read-sequencing-basecalling


Version Compatibility

Reference examples tested with: Dorado 1.0+, pod5 0.3+, samtools 1.19+, chopper 0.7+.

Before using code patterns, verify installed versions match. If versions differ:

CLI: <tool> --version then <tool> --help to confirm flags

Results depend on inputs that outlive the binary version - record them:

The basecaller MODEL string (e.g. dna_r10.4.1_e8.2_400bps_sup@v5.2.0) sets the entire error profile and must be propagated to every downstream tool. Pin it.
Modified-base models carry a SECOND version (dna_r10.4.1_e8.2_400bps_sup@v5.2.0_5mCG_5hmCG@v3); the mod version can lag the simplex version, so check dorado download --list.
R9.4.1 and RNA002 models were removed from Dorado v1.0 defaults; legacy data needs an archived model path.
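
One way to record these inputs is a small provenance file written next to the BAM. The sketch below is a minimal illustration, not part of the skill itself: the function name record_provenance and the output file name are hypothetical, and it assumes each tool answers --version as described above. Pass the resolved model name rather than the bare tier once you know it.

```shell
#!/bin/sh
# Sketch: write the basecaller model string and tool versions to a TSV file.
# record_provenance and the output path are hypothetical names for illustration.

record_provenance() (
  model=$1   # model string given to dorado, ideally the fully resolved name
  out=$2     # where to write the provenance table
  {
    printf 'basecaller_model\t%s\n' "$model"
    for tool in dorado pod5 samtools chopper; do
      if command -v "$tool" >/dev/null 2>&1; then
        # Some tools print the version on stderr or over several lines;
        # merge the streams and keep the first line only.
        version=$("$tool" --version 2>&1 | head -n 1)
      else
        version=not-installed
      fi
      printf '%s\t%s\n' "$tool" "$version"
    done
  } > "$out"
)

# Example call (commented out so the sketch has no side effects):
# record_provenance 'dna_r10.4.1_e8.2_400bps_sup@v5.2.0' basecall_provenance.tsv
```

Keeping the file beside calls.bam means the model string travels with the data even after the Dorado binary has been upgraded.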

If code throws an error, introspect the installed tool (dorado --help, dorado basecaller --help) and adapt the example to the actual API rather than retrying.

Nanopore Basecalling

"Basecall my Nanopore data" -> Convert raw signal (POD5) into reads with Dorado using the chemistry-matched model, deciding the accuracy tier and whether to call modifications now - because the model choice is baked irreversibly into the output.

CLI: dorado basecaller sup pod5s/ > calls.bam (simplex), dorado basecaller sup,5mCG_5hmCG pod5s/ > calls.bam (with methylation), dorado duplex sup pod5s/ > duplex.bam (duplex)

PacBio note: PacBio "basecalling" (CCS -> HiFi reads) runs on-instrument/in SMRT Link; users receive HiFi BAMs already at Q20-Q30+. This skill is Oxford Nanopore / Dorado. HiFi assembly lives in genome-assembly/hifi-assembly.

The Single Most Important Modern Insight -- There Is No "The Reads," Only "The Reads As Called By This Model"

Basecalling is not fixed preprocessing that yields a neutral FASTQ. The model and version chosen are an analysis decision written permanently into the BAM, with three consequences a naive user misses:

Methylation is a basecalling decision, not a later analysis step. Modified bases are inferred from raw signal at basecall time by Remora models and emitted as MM/ML tags. A plain BAM/FASTQ with no MM/ML tags has thrown the signal away - mods CANNOT be recovered without re-basecalling from POD5. If methylation might ever matter, request it now (sup,5mCG_5hmCG) and KEEP the POD5. See nanopore-methylation.
Downstream polish/variant models must match the basecaller model+version. medaka and Clair3 ship per-model weights (Clair3 r1041_e82_400bps_sup_v500; medaka the dotted r1041_e82_400bps_sup_v5.2.0). A mismatched model silently degrades accuracy with no error. Propagate the basecaller model name to every downstream step.
Mixing model versions across a cohort is a batch effect. Different model versions have different identity and homopolymer-indel error profiles. Re-basecall the WHOLE cohort with ONE current model before joint or differential analysis.
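
Propagating the model name means translating it into each downstream tool's naming style. The sketch below is derived only from the two example names quoted above: dots are removed from the pore and chemistry tokens, medaka keeps the dotted version, and Clair3 drops the dots. The function names are hypothetical and the full Dorado names used as input are assumed examples. A translated name is only a candidate; whether medaka or Clair3 actually ships that model must be checked against the installed tool.

```shell
#!/bin/sh
# Sketch: rewrite a full Dorado simplex model name in the medaka and Clair3 styles.
# dorado_to_medaka and dorado_to_clair3 are hypothetical helper names.

dorado_to_medaka() (
  name=${1#dna_}                                   # drop the analyte prefix
  base=$(printf '%s' "${name%@v*}" | tr -d '.')    # r10.4.1_e8.2 -> r1041_e82
  ver=${name##*@v}                                 # keep the dotted version
  printf '%s_v%s\n' "$base" "$ver"
)

dorado_to_clair3() (
  name=${1#dna_}
  base=$(printf '%s' "${name%@v*}" | tr -d '.')
  ver=$(printf '%s' "${name##*@v}" | tr -d '.')    # 5.0.0 -> 500
  printf '%s_v%s\n' "$base" "$ver"
)

dorado_to_medaka 'dna_r10.4.1_e8.2_400bps_sup@v5.2.0'   # prints r1041_e82_400bps_sup_v5.2.0
dorado_to_clair3 'dna_r10.4.1_e8.2_400bps_sup@v5.0.0'   # prints r1041_e82_400bps_sup_v500
```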

Dorado Subcommand Taxonomy

Dorado (one GPU-first executable) replaced Guppy, which is end-of-life. Bonito is ONT's research/training basecaller (not production); Rerio hosts research-release models (niche mods, bacterial methylation).

| Subcommand | Purpose | Canonical invocation |
|------------|---------|----------------------|
| basecaller | simplex basecalling | dorado basecaller hac pod5s/ > calls.bam |
| duplex | template+complement duplex | dorado duplex sup pod5s/ > duplex.bam |
| demux | barcode classification/split | dorado demux --kit-name SQK-NBD114-24 --output-dir out/ calls.bam |
| trim | standalone adapter/primer trim | dorado trim reads.bam > trimmed.bam |
| aligner | minimap2 alignment (carries MM/ML) | dorado aligner ref.mmi reads.bam > aln.bam |
| correct | HERRO single-read correction | dorado correct reads.fastq > corrected.fasta |
| summary | sequencing-summary TSV from BAM | dorado summary calls.bam > summary.tsv |
| download | model management | dorado download --model <name> / --list |

Model Naming Scheme (load-bearing)

Format {analyte}_{pore}_{chemistry}_{speed}_{tier}@v{ver} + optional mod suffix, e.g. dna_r10.4.1_e8.2_400bps_sup@v5.2.0_5mCG_5hmCG@v3.

| Token | Meaning | Examples |
|-------|---------|----------|
| analyte | molecule | dna, rna004 |
| pore | flow-cell generation | r10.4.1 (current), r9.4.1 (legacy) |
| chemistry | kit chemistry | e8.2 (Kit 14) |
| speed | translocation speed -> sampling rate | 400bps (5 kHz DNA), 130bps (RNA004, 4 kHz) |
| tier | model size/accuracy | fast, hac, sup |
| version | model version | @v4.3.0, @v5.2.0, @v6.0.0 |

Passing the bare tier (sup) lets Dorado auto-detect chemistry from POD5 metadata and fetch the matching latest model; pin a version (sup@v5.2.0) or pass a full model path for reproducibility. Append mods comma-separated (sup,5mCG_5hmCG,6mA); only one mod model per canonical base may be active.
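
To make the shorthand concrete, the sketch below splits a model argument of the form tier[@version][,mod,...] into its parts. It is an illustration of the syntax described above, not Dorado's own parser; the function name parse_model_complex is hypothetical, and it does not check that a tier, version, or mod actually exists.

```shell
#!/bin/sh
# Sketch: split a Dorado model shorthand into tier, version, and mods.
# parse_model_complex is a hypothetical helper, not a Dorado command.

parse_model_complex() (
  complex=$1
  simplex=${complex%%,*}            # everything before the first comma
  case $complex in
    *,*) mods=${complex#*,} ;;      # everything after the first comma
    *)   mods=none ;;
  esac
  tier=${simplex%%@*}
  case $simplex in
    *@*) version=${simplex#*@} ;;   # pinned version
    *)   version=latest ;;          # bare tier: Dorado picks the latest match
  esac
  printf 'tier=%s\nversion=%s\nmods=%s\n' "$tier" "$version" "$mods"
)

parse_model_complex 'sup@v5.2.0,5mCG_5hmCG,6mA'
# prints:
# tier=sup
# version=v5.2.0
# mods=5mCG_5hmCG,6mA
```

A pipeline can use the parsed version field to refuse to run when it is "latest", which enforces pinning before any basecalling starts.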

Decision Tree by Scenario

| Scenario | Recommended | Why |
|----------|-------------|-----|
| Any analysis (variant/assembly/methylation) | sup + matched model, pinned version | fast/hac error profile leaks into calls |
| Live run / adaptive sampling / quick QC only | fast | speed; never for downstream analysis |
| Routine work, compute-limited | hac | strong accuracy/compute balance (v5.2 closed much of the gap to sup) |
| Methylation wanted now or maybe later | sup,5mCG_5hmCG (DNA), keep POD5 | mods are unrecoverable from a plain BAM -> nanopore-methylation |
| Per-molecule accuracy, low input, phasing | dorado duplex sup | ~Q30 reads, but expect <10% duplex yield |
| Diploid/phased T2T assembly from simplex | dorado correct (HERRO) before assembler | haplotype-aware Q22->Q40 -> genome-assembly/long-read-assembly |
| Barcoded multiplexed run | basecall --no-trim, then dorado demux | trimming first strips barcodes before demux sees them |
| Legacy R9.4.1 / RNA002 data | explicit archived model path | removed from Dorado v1.0 default downloads |
| PacBio data | already HiFi; no Dorado step | CCS runs on-instrument -> genome-assembly/hifi-assembly |

Core Commands

# Simplex, super-accuracy, auto-detected chemistry-matched model (BAM is the default output)
dorado basecaller sup pod5s/ > calls.bam

# Pin the model version for reproducibility
dorado basecaller sup@v5.2.0 pod5s/ > calls.bam

# Call methylation AT basecall time (CpG 5mC + 5hmC); KEEP pod5s/ - mods are unrecoverable later
dorado basecaller sup,5mCG_5hmCG pod5s/ > calls.bam
dorado basecaller sup,6mA pod5s/ > calls.bam               # all-context 6mA
# RNA004 direct RNA (cDNA CANNOT call mods - PCR erases the signal):
dorado basecaller sup@v5.2.0,m6A_DRACH pod5s/ > rna_mods.bam

# FASTQ output and a per-read quality floor (relative filter, not a calibrated accuracy)
dorado basecaller sup pod5s/ --emit-fastq --min-qscore 10 > calls.fastq

# Duplex (needs raw POD5; cannot be recovered from simplex FASTQ); dx tag marks read types
dorado duplex sup pod5s/ > duplex.bam

# Demultiplex: basecall WITHOUT trimming, then demux (demux trims barcodes itself)
dorado basecaller sup pod5s/ --no-trim > calls.bam
dorado demux --kit-name SQK-NBD114-24 --output-dir demux/ calls.bam
dorado demux --kit-name SQK-NBD114-24 --barcode-both-ends --output-dir demux/ calls.bam  # stringent

# HERRO read correction for diploid/phased assembly (input FASTQ of HAC/SUP R10 reads >=10kb -> FASTA)
dorado download --model herro-v1
dorado correct reads.fastq > corrected.fasta

POD5 is ONT's default raw format (faster random access than FAST5). Convert FAST5 first:

pod5 convert fast5 raw/*.fast5 --output pod5s/    # FAST5 is legacy; basecalling it directly is slow
pod5 view pod5s/                                   # summary table (replaces deprecated `pod5 inspect reads`)
pod5 merge pod5s/*.pod5 --output merged.pod5

Per-Method Failure Modes

Methylation gone forever

Trigger: basecalling without a mod model, then wanting 5mC later. Mechanism: Remora infers mods from raw signal at basecall time; a plain BAM has only bases. Symptom: no MM/ML tags; modkit pileup returns nothing. Fix: re-basecall from POD5 with sup,5mCG_5hmCG; keep POD5 archives.
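
This failure is cheap to detect right after basecalling. The sketch below counts how many SAM records carry both MM and ML tags. It reads SAM text on stdin, so it works on any stream; the function name count_mod_tagged is hypothetical, and the samtools pipeline in the comment assumes samtools is installed.

```shell
#!/bin/sh
# Sketch: count records that carry both MM and ML tags.
# Prints "tagged<TAB>total"; count_mod_tagged is a hypothetical helper name.

count_mod_tagged() {
  awk -F '\t' '
    /^@/ { next }                         # skip SAM header lines
    {
      total++
      has_mm = 0; has_ml = 0
      for (i = 12; i <= NF; i++) {        # optional tags start at column 12
        if ($i ~ /^MM:Z:/) has_mm = 1
        if ($i ~ /^ML:B:/) has_ml = 1
      }
      if (has_mm && has_ml) tagged++
    }
    END { printf "%d\t%d\n", tagged, total }
  '
}

# Typical use on the first 1000 reads of a fresh BAM:
# samtools view calls.bam | head -n 1000 | count_mod_tagged
```

A result of 0 tagged reads on a run that was meant to include methylation is the signal to re-basecall from POD5 before the raw data is archived or deleted.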

Barcodes land in unclassified

Trigger: default --trim all basecall, then a separate dorado demux. Mechanism: trimming removes the barcode before demux can read it. Symptom: most reads in unclassified.bam, low classification rate. Fix: basecall --no-trim, then demux (it trims barcodes itself).

Silent accuracy loss downstream

Trigger: polishing/calling with a medaka/Clair3 model that doesn't match the basecaller model+version. Mechanism: per-model neural weights expect a specific error profile. Symptom: no error, just quietly worse consensus/calls. Fix: propagate the basecaller model name; use medaka tools resolve_model --auto_model; pick the matching Clair3 model dir.

Duplex double-counting

Trigger: treating every read in a duplex BAM as an independent molecule. Mechanism: a simplex parent and its duplex offspring both appear. Symptom: inflated coverage/allele counts. Fix: the dx:i:-1 tag marks simplex parents of duplex reads - filter them when counting molecules (dx:i:1 = duplex, dx:i:0 = simplex-only).
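
The dx values can be tallied before any counting is done. The sketch below reads SAM text on stdin and reports each class, then a molecule count that excludes simplex parents (dx:i:-1). The function name count_dx is hypothetical, and treating duplex plus simplex-only reads as the molecule count is an assumption that follows the fix described above.

```shell
#!/bin/sh
# Sketch: tally dx tag values in a duplex BAM streamed as SAM text.
# count_dx is a hypothetical helper name.

count_dx() {
  awk -F '\t' '
    /^@/ { next }                              # skip SAM header lines
    {
      dx = "untagged"
      for (i = 12; i <= NF; i++) {             # optional tags start at column 12
        if ($i ~ /^dx:i:/) { dx = substr($i, 6) }
      }
      n[dx]++
    }
    END {
      printf "duplex=%d simplex_only=%d simplex_parent=%d\n", n["1"], n["0"], n["-1"]
      printf "untagged=%d\n", n["untagged"]
      # Parents (dx:i:-1) are left out so each duplex pair is counted once.
      printf "molecules=%d\n", n["1"] + n["0"]
    }
  '
}

# Typical use:
# samtools view duplex.bam | count_dx
```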

Cohort batch effect

Trigger: runs basecalled with different model versions joined for analysis. Mechanism: version-specific identity/indel error profiles confound a technical batch with biology. Symptom: spurious between-run differences. Fix: re-basecall the whole cohort with one model version.
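
A cohort can be screened for mixed models before any joint analysis. The sketch below lists the distinct basecaller models found in BAM headers. It assumes the model is recorded in the @RG DS field as basecall_model=<name>; confirm that layout with samtools view -H on your own files first. The function name list_basecall_models and the cohort/ path are hypothetical.

```shell
#!/bin/sh
# Sketch: list distinct basecall_model values found in @RG header lines.
# Assumes the header carries "basecall_model=<name>"; verify on your own BAMs.

list_basecall_models() {
  awk '/^@RG/ && match($0, /basecall_model=[^ \t]+/) {
         # Skip the 15 characters of "basecall_model=" and print the value.
         print substr($0, RSTART + 15, RLENGTH - 15)
       }' | sort -u
}

# Typical use across a cohort (more than one output line means mixed models):
# for bam in cohort/*.bam; do samtools view -H "$bam"; done | list_basecall_models
```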

Quantitative Thresholds

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| sup for any analysis | ONT model guidance | fast/hac error profiles contaminate variant/assembly/methylation calls |
| R10.4.1 SUP modal accuracy ~Q20 (99%) | Sereika 2022 | dual-reader head fixes homopolymers; enables nanopore-only near-finished genomes |
| Duplex read ~Q30; yield typically <10% of reads | community benchmarks | duplex is library-prep/loading-limited, not free accuracy |
| A "Q20" base errs at ~Q12.5 empirically | Delahaye 2021 | nanopore qscores >Q10 are overconfident posteriors; use for relative filtering only |
| HERRO input reads >=10 kbp, HAC/SUP R10 | Dorado correct docs | HERRO operates on 4096-bp chunks; shorter reads dropped |
| --min-qscore 10 as a permissive QC floor | convention | Q10 ~ 90% nominal; a starting filter, not a hard rule |
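
The Q values in this table map to error rates through the standard Phred relation, error = 10^(-Q/10). The sketch below makes the gap between a nominal Q20 and an empirical Q12.5 concrete; the function name phred_to_error is hypothetical.

```shell
#!/bin/sh
# Sketch: convert a Phred quality score to its per-base error probability.
# phred_to_error is a hypothetical helper name.

phred_to_error() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

phred_to_error 20     # prints 0.0100 (nominal Q20: 1 error in 100 bases)
phred_to_error 12.5   # prints 0.0562 (empirical rate quoted for a "Q20" base)
phred_to_error 10     # prints 0.1000 (the --min-qscore 10 floor)
```

A base reported as Q20 that behaves like Q12.5 is wrong more than five times as often as its score claims, which is why the table treats qscores as a relative filter only.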

Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| "Failed to determine sequencing chemistry from data" | R9/RNA002 or non-standard kit; bare tier can't auto-resolve | pass an explicit model path; for legacy chemistry use an archived model |
| No MM/ML tags in BAM | basecalled without a mod model | re-basecall from POD5 with sup,5mCG_5hmCG |
| Most reads unclassified after demux | trimmed before demux | basecall --no-trim, then demux |
| --model sup errors | model is the positional arg, not a flag | dorado basecaller sup pod5s/ |
| dorado correct reads.bam fails | input is FASTQ(.gz), output FASTA | dorado correct reads.fastq > corrected.fasta |
| Out of GPU memory | batch too large for VRAM (sup is heaviest) | lower --batchsize; or drop to hac |
| cDNA m6A calling returns nothing | PCR erased native modifications | use direct RNA (RNA004), not cDNA |

References

Sereika M, Kirkegaard RH, Karst SM, et al. 2022. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat Methods 19:823-826.
Stanojević D, Lin D, Nurk S, Florez de Sessions P, Šikić M. 2026. Telomere-to-telomere assembly using HERRO-corrected Nanopore simplex reads. Nature (online ahead of print). DOI 10.1038/s41586-026-10563-y.
Wick RR, Judd LM, Holt KE. 2019. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol 20:129.
Pagès-Gallego M, de Ridder J. 2023. Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling. Genome Biol 24:71.
Delahaye C, Nicolas J. 2021. Sequencing DNA with nanopores: troubles and biases. PLoS ONE 16(10):e0257521.
Gamaarachchi H, Samarakoon H, et al. 2025. The enduring advantages of the SLOW5 file format for raw nanopore sequencing data. GigaScience giaf118.

Related Skills

long-read-qc - Assess read length/quality and run health after basecalling
nanopore-methylation - Pile up the MM/ML tags this skill must request at basecall time
long-read-alignment - Map the reads; use -y to carry MM/ML tags through alignment
medaka-polishing - Consensus model that must match this basecaller model+version
clair3-variants - Variant model that must match this basecaller model+version
genome-assembly/long-read-assembly - Assemble the reads (HERRO-corrected for diploid/T2T)
genome-assembly/hifi-assembly - PacBio HiFi (basecalled on-instrument, not here)
epitranscriptomics/m6anet-analysis - ONT direct-RNA m6A from signal
workflows/longread-sv-pipeline - End-to-end basecall -> align -> SV call

Basecalls raw Oxford Nanopore signal (POD5/FAST5) into reads with Dorado, choosing the chemistry-matched model and accuracy tier (fast/hac/sup), requesting modified bases (5mCG_5hmCG, 6mA, m6A) at basecall time, and handling duplex, demultiplexing, trimming, and HERRO read correction. Covers why the model+version is an irreversible analysis decision, why methylation cannot be recovered later, and why downstream polish/variant models must match the basecaller. Use when converting POD5/FAST5 to reads, picking a Dorado model for R9/R10 or RNA004, enabling methylation calling, basecalling duplex, demultiplexing barcoded runs, or correcting reads for assembly.

860 stars

development

Updated Jun 7, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Check | Status |
|---------|-------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Jun 7, 2026, 2:06 AM (36.4s, 4 files scanned)

SKILL.md

name: bio-long-read-sequencing-basecalling
description: Basecalls raw Oxford Nanopore signal (POD5/FAST5) into reads with Dorado, choosing the chemistry-matched model and accuracy tier (fast/hac/sup), requesting modified bases (5mCG_5hmCG, 6mA, m6A) at basecall time, and handling duplex, demultiplexing, trimming, and HERRO read correction. Covers why the model+version is an irreversible analysis decision, why methylation cannot be recovered later, and why downstream polish/variant models must match the basecaller. Use when converting POD5/FAST5 to reads, picking a Dorado model for R9/R10 or RNA004, enabling methylation calling, basecalling duplex, demultiplexing barcoded runs, or correcting reads for assembly.
tool_type: cli
primary_tool: dorado


Install manually

# Clone the repo
git clone https://github.com/GPTomics/bioSkills.git

# Copy into Claude Code skills folder (global)
cp -r bioSkills/long-read-sequencing/basecalling ~/.claude/skills/


Repository: GPTomics/bioSkills (860 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT