GPTomics/bio-alignment-sorting

Name: bio-alignment-sorting
Author: GPTomics

alignment-files/alignment-sorting/SKILL.md

npx skillsauth add GPTomics/bioSkills bio-alignment-sorting

Version Compatibility

Reference examples tested with: pysam 0.22+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
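The version check described above can be sketched in a few lines. This is a minimal illustration, not part of the skill: `parse_version` and `meets_minimum` are hypothetical helper names, and the banner string stands in for the first line of `samtools --version` output.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number found in a version banner."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical banner; a real one would come from `samtools --version`.
banner = "samtools 1.19.2"
print(meets_minimum(banner, "1.19"))          # True
print(meets_minimum("samtools 1.9", "1.19"))  # False: 9 < 19 when compared as numbers
```

Comparing tuples of integers avoids the string-comparison trap in which "1.9" looks newer than "1.19".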

Alignment Sorting

Sort alignment files by coordinate or read name using samtools and pysam.

"Sort a BAM file" -> Reorder reads by genomic coordinate (for indexing/variant calling) or by name (for paired-end processing).

CLI: samtools sort -o sorted.bam input.bam
Python: pysam.sort('-o', 'sorted.bam', 'input.bam')

Sort Orders

| Order | Flag | Use Case |
|-------|------|----------|
| Coordinate | default | Indexing, visualization, variant calling |
| Name | -n | Paired-end processing, fixmate, markdup |
| Tag | -t TAG | Sort by specific tag value |

samtools sort

Sort by Coordinate (Default)

samtools sort -o sorted.bam input.bam

Sort by Read Name

samtools sort -n -o namesorted.bam input.bam

Multi-threaded Sorting

samtools sort -@ 8 -o sorted.bam input.bam

Control Memory Usage

samtools sort -m 4G -@ 4 -o sorted.bam input.bam
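Because -m is a per-thread limit, the total sort buffer grows with the thread count. The sketch below works through that arithmetic; `parse_size` and `sort_memory_budget` are hypothetical helpers, the K/M/G suffixes are assumed to be binary multiples, and the result is an approximation of buffer memory only, not of total process memory.

```python
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(size: str) -> int:
    """Convert a size such as '768M' or '4G' to bytes (binary multiples assumed)."""
    size = size.strip().upper()
    if size and size[-1] in _UNITS:
        return int(size[:-1]) * _UNITS[size[-1]]
    return int(size)


def sort_memory_budget(per_thread: str, threads: int) -> int:
    """Approximate sort-buffer memory: per-thread size times thread count (at least 1)."""
    return parse_size(per_thread) * max(1, threads)


# The command above (-m 4G -@ 4) would budget roughly 16 GiB for sort buffers.
total = sort_memory_budget("4G", 4)
print(total / 1024**3)  # 16.0
```

When a node has less RAM than this product, lowering -m or -@ keeps the sort within budget at the cost of more temporary files.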

Set Temporary Directory

samtools sort -T /tmp/sort_tmp -o sorted.bam input.bam

Specify Output Format

# Output as BAM (default)
samtools sort -O bam -o sorted.bam input.bam

# Output as CRAM
samtools sort -O cram --reference ref.fa -o sorted.cram input.bam

Sort by Tag

# Sort by cell barcode (10x Genomics)
samtools sort -t CB -o sorted_by_barcode.bam input.bam

Pipe from Aligner

bwa mem ref.fa reads.fq | samtools sort -o aligned.bam

samtools collate vs sort -n

| Tool | Algorithm | Speed | Memory | Output guarantee |
|------|-----------|-------|--------|------------------|
| sort -n | Full natural (alphanumeric) sort by QNAME; use -N for lexicographic order | Slowest | Spills to -T | Strict total order by name |
| collate | Hash-bucket grouping | ~3-10x faster | Bounded | Mates adjacent; between-mate order undefined |

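Use collate when extracting paired FASTQ, re-aligning, or streaming through markdup. Use sort -n only when a tool requires a full name sort rather than just adjacent mates (e.g. RSEM, Salmon alignment-mode). Note that sort -n orders names naturally (runs of digits compare numerically); when a tool expects strict lexicographic order, use sort -N.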

# Fast paired FASTQ extraction
samtools collate -O -u in.bam tmp_prefix | \
    samtools fastq -1 R1.fq.gz -2 R2.fq.gz -0 /dev/null -s /dev/null -n -

# Markdup pre-processing (collate beats sort -n here)
samtools collate -O -u in.bam tmp_prefix | \
    samtools fixmate -m -u - - | \
    samtools sort -u - | \
    samtools markdup - out.bam
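The difference between natural name order (samtools sort -n) and lexicographic order (samtools sort -N) matters when a downstream tool compares read names itself. The sketch below illustrates the two orderings on plain strings; `natural_key` is a simplified stand-in written for this example, not samtools' actual comparison function.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and digit runs so that digit runs compare as integers."""
    key = []
    # re.split with a capture group puts the digit runs at odd indices.
    for index, part in enumerate(re.split(r"(\d+)", name)):
        if part == "":
            continue
        if index % 2 == 1:
            key.append((0, int(part), ""))   # digit run: compare numerically
        else:
            key.append((1, 0, part))         # text run: compare as a string
    return key


names = ["read10", "read2", "read1"]
print(sorted(names, key=natural_key))  # ['read1', 'read2', 'read10']
print(sorted(names))                   # ['read1', 'read10', 'read2']
```

A file sorted one way can look unsorted to a tool that assumes the other, which is one reason to prefer collate when only mate adjacency is needed.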

Sort Order Required by Downstream Tool

| Operation | Required sort |
|-----------|---------------|
| samtools index | coordinate (hard requirement) |
| samtools fixmate -m | name (or collate; needs mates adjacent) |
| samtools markdup | coordinate (after fixmate) |
| GATK MarkDuplicatesSpark | coordinate or queryname |
| samtools mpileup / bcftools mpileup | coordinate |
| GATK HaplotypeCaller, Mutect2 | coordinate |
| featureCounts / HTSeq | coordinate or name (-p for paired) |
| umi_tools dedup | coordinate (with index) |
| fgbio GroupReadsByUmi | any order accepted (template-coordinate recommended to avoid an internal re-sort) |
| fgbio CallMolecularConsensusReads | grouped by MI tag (consumes GroupReadsByUmi output) |
| Sniffles, cuteSV, Manta, Delly | coordinate (need SA tags) |
| Salmon alignment-mode | name |
| RSEM (with STAR --quantMode TranscriptomeSAM) | name (hard requirement) |
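A table like this can be turned into a small pre-flight check. The sketch below transcribes only a few rows and maps "name" to the SAM header value `queryname`; `REQUIRED_ORDER` and `needs_resort` are hypothetical names for this illustration.

```python
# A few rows transcribed from the table above; extend as needed.
REQUIRED_ORDER = {
    "samtools index": {"coordinate"},
    "samtools markdup": {"coordinate"},
    "GATK MarkDuplicatesSpark": {"coordinate", "queryname"},
    "Salmon alignment-mode": {"queryname"},
}


def needs_resort(tool: str, current_so: str) -> bool:
    """True if the header's SO value is not one the tool accepts."""
    return current_so not in REQUIRED_ORDER[tool]


print(needs_resort("samtools index", "queryname"))             # True
print(needs_resort("GATK MarkDuplicatesSpark", "queryname"))   # False
```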

Check Sort Order

From Header

samtools view -H input.bam | grep "^@HD"
# SO:coordinate = coordinate sorted
# SO:queryname = name sorted
# SO:unsorted = not sorted
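When the header text is already in hand (for example, captured from `samtools view -H`), the SO field can be extracted without any BAM library. This is a minimal sketch; `sort_order_from_header` is a hypothetical helper, and it falls back to `unknown`, the default the SAM specification assigns when SO is absent.

```python
def sort_order_from_header(header_text: str) -> str:
    """Return the SO value from the @HD line, or 'unknown' if it is missing."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return "unknown"  # @HD present but no SO field
    return "unknown"          # no @HD line at all


header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422"
print(sort_order_from_header(header))  # coordinate
```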

Verify Sorted

# Check if coordinate sorted (returns 0 if sorted). Reset the position tracker
# on each new contig, else the POS reset at every chromosome boundary of a
# correctly sorted multi-contig BAM would falsely report "unsorted".
samtools view input.bam | awk '$3!=c {c=$3; prev=0} $4<prev {exit 1} {prev=$4}'
# Simpler: read the @HD SO: header shown above. It records the order the writing
# tool declared; it is not a re-verification of the records themselves.
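The same record-level check can be written in Python over SAM text lines. This sketch mirrors the awk logic (reset the position tracker on each new contig) and additionally rejects a contig that reappears after another one; it does not compare contig order against the @SQ lines. `is_coordinate_sorted` and `sam_record` are hypothetical helpers, and the records are made-up values.

```python
def is_coordinate_sorted(sam_lines) -> bool:
    """Check that POS never decreases within a contig and no contig reappears."""
    seen = set()
    current, prev_pos = None, 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname != current:
            if rname in seen:
                return False  # contig seen earlier, then left, then seen again
            seen.add(rname)
            current, prev_pos = rname, 0
        if pos < prev_pos:
            return False
        prev_pos = pos
    return True


def sam_record(name: str, contig: str, pos: int) -> str:
    """Build a minimal 11-column SAM line (illustrative values only)."""
    return "\t".join([name, "0", contig, str(pos), "60", "4M", "*", "0", "0", "ACGT", "IIII"])


good = [sam_record("a", "chr1", 100), sam_record("b", "chr1", 250), sam_record("c", "chr2", 5)]
bad = [sam_record("a", "chr1", 250), sam_record("b", "chr1", 100)]
print(is_coordinate_sorted(good))  # True
print(is_coordinate_sorted(bad))   # False
```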

pysam Python Alternative

Sort with pysam

import pysam

pysam.sort('-o', 'sorted.bam', 'input.bam')

Sort by Name

pysam.sort('-n', '-o', 'namesorted.bam', 'input.bam')

Sort with Options

pysam.sort('-@', '4', '-m', '2G', '-o', 'sorted.bam', 'input.bam')

Avoid In-Python Sorting

Do not load BAM records into a list and call sorted(). pysam.sort() runs samtools' external merge sort, which spills sorted chunks to disk, whereas holding every read in memory fails at around ~30M reads (~10 GB human BAM). Delegate to pysam.sort() instead:

import pysam

pysam.sort('-@', '4', '-m', '2G', '-T', '/tmp/sortpfx',
           '-o', 'sorted.bam', 'input.bam')
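To show why an external merge sort stays within a memory bound, here is a toy version over plain strings: sort fixed-size chunks, spill each to a temporary file, then stream a k-way merge. It is only an illustration of the technique, not samtools' implementation; `external_sort` and `chunk_size` are names chosen for this sketch, and items are assumed to contain no newlines.

```python
import heapq
import tempfile


def _read_lines(handle):
    """Yield lines from a spilled chunk without their trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def external_sort(items, chunk_size):
    """Yield items in sorted order while holding at most chunk_size items in memory."""
    chunk_files = []
    chunk = []

    def spill():
        # Sort the in-memory chunk and write it to its own temporary file.
        tmp = tempfile.TemporaryFile(mode="w+")
        tmp.writelines(f"{item}\n" for item in sorted(chunk))
        tmp.seek(0)
        chunk_files.append(tmp)
        chunk.clear()

    for item in items:
        chunk.append(item)
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    try:
        # heapq.merge streams a k-way merge over the already-sorted chunks.
        yield from heapq.merge(*(_read_lines(f) for f in chunk_files))
    finally:
        for handle in chunk_files:
            handle.close()


names = ["r3", "r1", "r2", "r5", "r4"]
print(list(external_sort(names, chunk_size=2)))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

In samtools the corresponding knobs are -m (chunk size per thread) and -T (where the spilled chunks go).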

Check Sort Order in pysam

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as bam:
    hd = bam.header.get('HD', {})
    sort_order = hd.get('SO', 'unknown')
    print(f'Sort order: {sort_order}')

Stream Sort from Aligner

For streaming from aligners, use shell pipes (simpler and more reliable):

import subprocess

subprocess.run(
    'bwa mem ref.fa reads.fq | samtools sort -o aligned.bam',
    shell=True, check=True
)
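When file names come from variables rather than literals, the shell string should be quoted before it reaches `shell=True`. The sketch below builds the same kind of pipe with `shlex.join`; `aligner_sort_command` is a hypothetical helper, and the paths are made up to show quoting of spaces.

```python
import shlex


def aligner_sort_command(ref: str, reads: list[str], out_bam: str, threads: int = 4) -> str:
    """Build a quoted 'bwa mem | samtools sort' shell pipeline string."""
    bwa = shlex.join(["bwa", "mem", "-t", str(threads), ref, *reads])
    sort = shlex.join(["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"])
    return f"{bwa} | {sort}"


cmd = aligner_sort_command("ref.fa", ["my reads.fq"], "out dir/aligned.bam")
print(cmd)
# bwa mem -t 4 ref.fa 'my reads.fq' | samtools sort -@ 4 -o 'out dir/aligned.bam' -

# To run it (requires bwa and samtools on PATH):
# import subprocess
# subprocess.run(cmd, shell=True, check=True)
```

Note that with a plain shell pipe, check=True only sees the exit status of the last command, so a failing aligner can go unnoticed unless the shell is configured to propagate it.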

samtools merge

Combine multiple BAM files into one. samtools merge does NOT validate sort-order consistency across inputs; mismatched inputs silently produce a malformed output.

Verify Sort Order Consistency First

for f in *.bam; do samtools view -H "$f" | head -1; done | sort -u
# Should print exactly ONE line, e.g. "@HD VN:1.6 SO:coordinate"

Safe Merge (dedup @RG and @PG)

# -c deduplicates @RG records; -p deduplicates @PG records (samtools-merge(1))
samtools merge -c -p -@ 8 merged.bam sample1.bam sample2.bam sample3.bam

When merging BAMs from different lanes / machines / aligners, RG IDs may collide. With -c, samtools keeps only the first @RG header it finds for a colliding ID, so read groups that genuinely differ are folded together (-p does the same for @PG). RG IDs that refer to different lane-level read groups must therefore be made unique upstream (samtools addreplacerg) before merging; otherwise GATK BQSR (which keys models by RGID/PU) silently produces wrong recalibration.
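A collision check on header text can flag the risky case before merging: the same RG ID appearing in several files with different fields. This is a sketch under the assumption that each file's header has already been read into a string; `colliding_read_groups` is a hypothetical helper and the headers are invented.

```python
from collections import defaultdict


def colliding_read_groups(headers: dict[str, str]) -> dict[str, list[str]]:
    """Map each @RG ID that differs between files to the files that carry it."""
    seen = defaultdict(dict)  # rg_id -> {path: frozenset of @RG fields}
    for path, header in headers.items():
        for line in header.splitlines():
            if not line.startswith("@RG"):
                continue
            fields = line.split("\t")[1:]
            rg_id = next((f[3:] for f in fields if f.startswith("ID:")), None)
            if rg_id is not None:
                seen[rg_id][path] = frozenset(fields)
    return {
        rg_id: sorted(files)
        for rg_id, files in seen.items()
        if len(set(files.values())) > 1  # same ID, different definitions
    }


headers = {
    "lane1.bam": "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:1\tSM:s1\tPU:flowcell.1",
    "lane2.bam": "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:1\tSM:s1\tPU:flowcell.2",
}
print(colliding_read_groups(headers))  # {'1': ['lane1.bam', 'lane2.bam']}
```

An empty result means any shared IDs are identical definitions, which is the case -c is designed for.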

Merge with Threads / from File List

samtools merge -@ 4 merged.bam sample1.bam sample2.bam sample3.bam
samtools merge -b files.txt merged.bam   # one BAM path per line

Force Overwrite

samtools merge -f merged.bam sample1.bam sample2.bam

Merge Specific Region

# -R needs indexed inputs (run samtools index on each BAM first)
samtools merge -R chr1:1000000-2000000 merged_region.bam sample1.bam sample2.bam

pysam Merge

import pysam

pysam.merge('-c', '-p', '-f', 'merged.bam', 'sample1.bam', 'sample2.bam', 'sample3.bam')

Common Workflows

Goal: Combine sorting with other alignment processing steps into efficient pipelines.

Approach: Pipe aligner output directly into samtools sort to avoid writing unsorted intermediates, then index for downstream access.

Align and Sort

bwa mem -t 8 ref.fa R1.fq R2.fq | samtools sort -@ 4 -o aligned.bam
samtools index aligned.bam

Re-sort by Name for Duplicate Marking

# Full workflow: sort by name, fixmate, sort by coord, markdup
samtools sort -n -o namesorted.bam input.bam
samtools fixmate -m namesorted.bam fixmate.bam
samtools sort -o sorted.bam fixmate.bam
samtools markdup sorted.bam marked.bam

Convert Name-sorted to Coordinate-sorted

samtools sort -o coord_sorted.bam name_sorted.bam
samtools index coord_sorted.bam

Extract FASTQ from Sorted BAM

# Collate first to group pairs
samtools collate -u -O input.bam /tmp/collate | \
    samtools fastq -1 R1.fq -2 R2.fq -0 /dev/null -s /dev/null -

Performance Tips

| Parameter | Effect |
|-----------|--------|
| -@ N | Use N additional threads |
| -m SIZE | Memory per thread (e.g., 4G) |
| -T PREFIX | Temp file location (use fast SSD scratch) |
| -l LEVEL | Compression level (0-9, default 6) |

Compression Level Decision

| Level | Use | Wall-time vs default | Size vs default |
|-------|-----|----------------------|------------------|
| -l 0 / -u | Pipe between samtools tools | 0% (skips BGZF) | +200-400% |
| -l 1 | Final output if disk is cheap | ~+10% | ~+30% |
| -l 6 | Default | baseline | baseline |
| -l 9 | Archival, write-once | ~+50-100% | ~-2-5% |

# WRONG -- pipe re-compresses then decompresses every step
samtools fixmate -m in.bam - | samtools sort -o out.bam

# RIGHT -- uncompressed (-u) between piped samtools commands
samtools fixmate -m -u in.bam - | samtools sort -o out.bam

Optimal Settings for Large Files

# 8 threads, 2GB per thread, low compression for output written to fast disk
samtools sort -@ 8 -m 2G -l 1 -T /scratch/sortpfx -o sorted.bam input.bam

Quick Reference

| Task | Command |
|------|---------|
| Sort by coordinate | samtools sort -o out.bam in.bam |
| Sort by name | samtools sort -n -o out.bam in.bam |
| Sort with threads | samtools sort -@ 8 -o out.bam in.bam |
| Collate pairs | samtools collate -o out.bam in.bam |
| Merge BAMs | samtools merge out.bam in1.bam in2.bam |
| Check sort order | samtools view -H in.bam \| grep "^@HD" |
| Sort + index | samtools sort -o out.bam in.bam && samtools index out.bam |

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| out of memory | Insufficient RAM | Use -m to limit per-thread memory |
| disk full | Temp files filling disk | Use -T to specify different location |
| truncated file | Interrupted sort | Re-run sort from original |

Related Skills

sam-bam-basics - View and convert alignment files
alignment-indexing - Index after coordinate sorting
duplicate-handling - Requires name-sorted input for fixmate
alignment-filtering - Filter before or after sorting


1,006 stars, category: tools, updated Jul 12, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add GPTomics/bioSkills bio-alignment-sorting

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report

Trivy - Container and dependency vulnerability scanner: Clean (95%)
Semgrep - Static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation: Clean (95%)
Snyk (dep) - Open source security scanning: Skipped (50%)
Socket.dev - Supply chain security analysis: Skipped (50%)
VirusTotal - Multi-engine malware detection: Skipped (50%)
CrowdStrike - Advanced threat intelligence: Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Jul 12, 2026, 2:05 AM (155.7s, 3 files scanned)

SKILL.md

name: bio-alignment-sorting
description: Sort alignment files by coordinate or read name using samtools and pysam. Use when preparing BAM files for indexing, variant calling, or paired-end analysis.
tool_type: cli
primary_tool: samtools

Install manually

# Clone the repo
git clone https://github.com/GPTomics/bioSkills.git

# Copy into Claude Code skills folder (global)
cp -r bioSkills/alignment-files/alignment-sorting ~/.claude/skills/

Repository: GPTomics/bioSkills (1,006 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT