cli-tool/components/skills/scientific/pysam/SKILL.md
Genomic file toolkit for NGS data processing pipelines: read and write SAM/BAM/CRAM alignments, VCF/BCF variants, and FASTA/FASTQ sequences, extract regions, and calculate coverage.
Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.
This skill should be used when:
uv pip install pysam
Read alignment file:
import pysam
# Open BAM file and fetch reads in region
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()
Read variant file:
# Open VCF file and iterate variants
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()
Query reference sequence:
# Open FASTA and extract sequence
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()
Use the AlignmentFile class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.
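As a minimal quality-control sketch of the kind of analysis described above, the function below tallies reads by category. It relies only on the `is_unmapped`, `is_duplicate`, and `mapping_quality` attributes of pysam's aligned reads; the function name `summarize_reads`, the `min_mapq` threshold of 20, and the stand-in records are assumptions made for illustration, not part of pysam.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_reads(reads, min_mapq=20):
    """Tally reads by category from any iterable of read-like objects.

    Hypothetical helper: each read is placed in exactly one category besides
    "total". The min_mapq default is an arbitrary illustrative threshold.
    """
    counts = Counter()
    for read in reads:
        counts["total"] += 1
        if read.is_unmapped:
            counts["unmapped"] += 1
        elif read.is_duplicate:
            counts["duplicate"] += 1
        elif read.mapping_quality < min_mapq:
            counts["low_mapq"] += 1
        else:
            counts["usable"] += 1
    return dict(counts)


# Stand-in records so the sketch runs without a BAM file. With pysam, the
# iterable would instead be something like samfile.fetch("chr1", 1000, 2000).
demo_reads = [
    SimpleNamespace(is_unmapped=False, is_duplicate=False, mapping_quality=60),
    SimpleNamespace(is_unmapped=False, is_duplicate=False, mapping_quality=5),
    SimpleNamespace(is_unmapped=False, is_duplicate=True, mapping_quality=60),
    SimpleNamespace(is_unmapped=True, is_duplicate=False, mapping_quality=0),
]
print(summarize_reads(demo_reads))
```

Because the function accepts any iterable, the same code should work on a region fetch or on a whole-file pass with `fetch(until_eof=True)`.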
Common operations:
Reference: See references/alignment_files.md for detailed documentation on:
Use the VariantFile class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.
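A simple filtering step might look like the sketch below, which keeps single-nucleotide variants above a quality threshold. It reads the `chrom`, `pos`, `ref`, `alts`, and `qual` attributes that pysam variant records expose; the function name `passing_snvs`, the threshold of 30, and the stand-in records are assumptions for illustration.

```python
from types import SimpleNamespace

BASES = {"A", "C", "G", "T"}


def passing_snvs(records, min_qual=30.0):
    """Yield (chrom, pos, ref, alt) for SNVs with QUAL >= min_qual.

    Hypothetical helper. Records with a missing QUAL are skipped, and
    records without ALT alleles (alts is None) contribute nothing.
    """
    for rec in records:
        if rec.qual is None or rec.qual < min_qual:
            continue
        for alt in rec.alts or ():
            if rec.ref in BASES and alt in BASES:
                yield (rec.chrom, rec.pos, rec.ref, alt)


# Stand-in records so the sketch runs without a VCF file. With pysam, the
# iterable would instead be an open pysam.VariantFile.
demo_records = [
    SimpleNamespace(chrom="chr1", pos=1001, ref="A", alts=("G",), qual=50.0),
    SimpleNamespace(chrom="chr1", pos=1500, ref="C", alts=("T",), qual=10.0),
    SimpleNamespace(chrom="chr1", pos=1800, ref="G", alts=("A",), qual=None),
    SimpleNamespace(chrom="chr1", pos=2000, ref="AT", alts=("A",), qual=99.0),
]
print(list(passing_snvs(demo_records)))
```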
Common operations:
Reference: See references/variant_files.md for detailed documentation on:
Use FastaFile for random access to reference sequences and FastxFile for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against reference, or processing raw reads.
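Once a sequence has been extracted, downstream processing is ordinary string handling. The sketch below computes GC content for a sequence such as the one returned by `FastaFile.fetch()`; the function name `gc_fraction` and the choice to ignore non-ACGT characters are assumptions for illustration.

```python
def gc_fraction(sequence):
    """Return the fraction of G/C among called bases (A, C, G, T).

    Hypothetical helper: ambiguous characters such as N are ignored, and a
    sequence with no called bases returns 0.0.
    """
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


# With pysam, the input would come from something like:
#   fasta = pysam.FastaFile("reference.fasta")
#   sequence = fasta.fetch("chr1", 1000, 2000)
print(gc_fraction("ACGTNNacgt"))
```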
Common operations:
Reference: See references/sequence_files.md for detailed documentation on:
Pysam excels at integrating multiple file types for comprehensive genomic analyses. Common workflows combine alignment files, variant files, and reference sequences.
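One such combined workflow is checking a variant's REF allele against the reference sequence. The sketch below assumes a FASTA object with a `fetch(reference, start, end)` method and a record with `chrom`, `start` (0-based), and `ref`; the `DemoFasta` class and the function name `ref_allele_matches` are stand-ins invented for illustration so the example runs without data files.

```python
from types import SimpleNamespace


class DemoFasta:
    """Tiny in-memory stand-in that mimics the fetch() call of a FASTA reader."""

    def __init__(self, sequences):
        self._sequences = sequences

    def fetch(self, reference, start, end):
        # 0-based, half-open slice, matching the coordinate convention
        # used for numeric start/end arguments.
        return self._sequences[reference][start:end]


def ref_allele_matches(fasta, record):
    """Return True if the record's REF allele equals the reference bases.

    Hypothetical helper: uses the 0-based record.start, not the 1-based pos.
    """
    end = record.start + len(record.ref)
    expected = fasta.fetch(record.chrom, record.start, end)
    return expected.upper() == record.ref.upper()


demo_fasta = DemoFasta({"chr1": "ACGTACGT"})
good = SimpleNamespace(chrom="chr1", start=2, ref="GT")
bad = SimpleNamespace(chrom="chr1", start=2, ref="AA")
print(ref_allele_matches(demo_fasta, good), ref_allele_matches(demo_fasta, bad))
```

With real data, `demo_fasta` would be replaced by an open `pysam.FastaFile` and the records by those from a `pysam.VariantFile`.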
Common workflows:
Reference: See references/common_workflows.md for detailed examples of:
Critical: Pysam uses 0-based, half-open coordinates (Python convention):
Exception: Region strings in fetch() follow samtools convention (1-based):
samfile.fetch("chr1", 999, 2000) # 0-based: positions 999-1999
samfile.fetch(region="chr1:1000-2000") # 1-based string: positions 1000-2000 (same interval as above)
VCF files: The file format uses 1-based coordinates (exposed as VariantRecord.pos), but VariantRecord.start is 0-based.
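The two conventions can be converted with simple arithmetic. The following is a minimal sketch in plain Python; the helper names `to_region_string`, `from_region_string`, and `vcf_pos_to_start` are hypothetical and not part of pysam.

```python
def to_region_string(contig, start, stop):
    """Convert a 0-based, half-open interval to a 1-based, inclusive region string."""
    if start < 0 or stop <= start:
        raise ValueError("expected 0 <= start < stop")
    # Only the start shifts by one: the half-open stop equals the inclusive end.
    return f"{contig}:{start + 1}-{stop}"


def from_region_string(region):
    """Convert a 'contig:first-last' region string to (contig, start, stop), 0-based half-open."""
    # rpartition keeps contig names that themselves contain ':' intact.
    contig, _, span = region.rpartition(":")
    first, _, last = span.partition("-")
    return contig, int(first) - 1, int(last)


def vcf_pos_to_start(pos):
    """Convert a 1-based VCF POS to the 0-based start coordinate."""
    return pos - 1


print(to_region_string("chr1", 999, 2000))
print(from_region_string("chr1:1000-2000"))
```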
Random access to specific genomic regions requires index files:
.bai index (create with pysam.index())
.crai index
.fai index (create with pysam.faidx())
.tbi tabix index (create with pysam.tabix_index())
.csi index
Without an index, use fetch(until_eof=True) for sequential reading.
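Before attempting random access, it can help to check whether an index sits next to the data file. The sketch below maps common file extensions to the usual index suffixes; the mapping is an assumption based on common naming conventions and is not exhaustive, and the names `INDEX_SUFFIXES`, `candidate_index_paths`, and `has_index` are hypothetical.

```python
from pathlib import Path

# Assumed, non-exhaustive mapping from data-file extension to the index
# suffixes conventionally appended to the full file name.
INDEX_SUFFIXES = {
    ".bam": (".bai",),
    ".cram": (".crai",),
    ".fa": (".fai",),
    ".fasta": (".fai",),
    ".vcf.gz": (".tbi", ".csi"),
    ".bcf": (".csi",),
}


def candidate_index_paths(data_path):
    """Return the index paths that would conventionally accompany data_path."""
    name = str(data_path)
    for extension, suffixes in INDEX_SUFFIXES.items():
        if name.endswith(extension):
            return [Path(name + suffix) for suffix in suffixes]
    return []


def has_index(data_path):
    """Return True if any conventional index file exists on disk."""
    return any(path.exists() for path in candidate_index_paths(data_path))


print(candidate_index_paths("sample.bam"))
print(candidate_index_paths("calls.vcf.gz"))
```

If `has_index()` returns False, the options described above apply: build the index first, or fall back to sequential reading with `fetch(until_eof=True)`.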
Specify format when opening files:
"rb" - Read BAM (binary)
"r" - Read SAM (text)
"rc" - Read CRAM
"wb" - Write BAM
"w" - Write SAM
"wc" - Write CRAM

Use pileup() for column-wise analysis instead of repeated fetch operations
Use count() for counting instead of iterating and counting manually
Use until_eof=True for sequential processing without index
Set multiple_iterators=True if needed (for example, when several iterators run over the same file at once)

fetch() returns reads overlapping region boundaries, not just those fully contained
Avoid modifying query_qualities in place after changing query_sequence; create a copy first

Pysam provides access to samtools and bcftools commands:
# Sort BAM file
pysam.samtools.sort("-o", "sorted.bam", "input.bam")
# Index BAM
pysam.samtools.index("sorted.bam")
# View specific region
pysam.samtools.view("-b", "-o", "region.bam", "input.bam", "chr1:1000-2000")
# BCF tools
pysam.bcftools.view("-O", "z", "-o", "output.vcf.gz", "input.vcf")
Error handling:
try:
    pysam.samtools.sort("-o", "output.bam", "input.bam")
except pysam.SamtoolsError as e:
    print(f"Error: {e}")
Detailed documentation for each major capability:
alignment_files.md - Complete guide to SAM/BAM/CRAM operations, including AlignmentFile class, AlignedSegment attributes, fetch operations, pileup analysis, and writing alignments
variant_files.md - Complete guide to VCF/BCF operations, including VariantFile class, VariantRecord attributes, genotype handling, INFO/FORMAT fields, and multi-sample operations
sequence_files.md - Complete guide to FASTA/FASTQ operations, including FastaFile and FastxFile classes, sequence extraction, quality score handling, and tabix-indexed file access
common_workflows.md - Practical examples of integrated bioinformatics workflows combining multiple file types, including quality control, coverage analysis, variant validation, and sequence extraction
For detailed information on specific operations, refer to the appropriate reference document:
alignment_files.md
variant_files.md
sequence_files.md
common_workflows.md
Official documentation: https://pysam.readthedocs.io/