scientific-skills/biopython/SKILL.md
Comprehensive molecular biology toolkit. Use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez). Best for batch processing, custom bioinformatics pipelines, BLAST automation. For quick lookups use gget; for multi-service integration use bioservices.
npx skillsauth add K-Dense-AI/claude-scientific-skills biopythonInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Biopython is a comprehensive set of freely available Python tools for biological computation. It provides functionality for sequence manipulation, file I/O, database access, structural bioinformatics, phylogenetics, and many other bioinformatics tasks. The current version is Biopython 1.85 (released January 2025), which supports Python 3 and requires NumPy.
Use this skill when:
Biopython is organized into modular sub-packages, each addressing specific bioinformatics domains:
Install Biopython using pip (requires Python 3 and NumPy):
uv pip install biopython
For NCBI database access, always set your email address (required by NCBI):
from Bio import Entrez
Entrez.email = "[email protected]"
# Optional: API key for higher rate limits (10 req/s instead of 3 req/s)
Entrez.api_key = "your_api_key_here"
This skill provides comprehensive documentation organized by functionality area. When working on a task, consult the relevant reference documentation:
Reference: references/sequence_io.md
Use for:
Quick example:
from Bio import SeqIO
# Read sequences from FASTA file
for record in SeqIO.parse("sequences.fasta", "fasta"):
print(f"{record.id}: {len(record.seq)} bp")
# Convert GenBank to FASTA
SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")
Reference: references/alignment.md
Use for:
Quick example:
from Bio import Align
# Pairwise alignment
aligner = Align.PairwiseAligner()
aligner.mode = 'global'
alignments = aligner.align("ACCGGT", "ACGGT")
print(alignments[0])
Reference: references/databases.md
Use for:
Quick example:
from Bio import Entrez
Entrez.email = "[email protected]"
# Search PubMed
handle = Entrez.esearch(db="pubmed", term="biopython", retmax=10)
results = Entrez.read(handle)
handle.close()
print(f"Found {results['Count']} results")
Reference: references/blast.md
Use for:
Quick example:
from Bio.Blast import NCBIWWW, NCBIXML
# Run BLAST search
result_handle = NCBIWWW.qblast("blastn", "nt", "ATCGATCGATCG")
blast_record = NCBIXML.read(result_handle)
# Display top hits
for alignment in blast_record.alignments[:5]:
print(f"{alignment.title}: E-value={alignment.hsps[0].expect}")
Reference: references/structure.md
Use for:
Quick example:
from Bio.PDB import PDBParser
# Parse structure
parser = PDBParser(QUIET=True)
structure = parser.get_structure("1crn", "1crn.pdb")
# Calculate distance between alpha carbons
chain = structure[0]["A"]
distance = chain[10]["CA"] - chain[20]["CA"]
print(f"Distance: {distance:.2f} Å")
Reference: references/phylogenetics.md
Use for:
Quick example:
from Bio import Phylo
# Read and visualize tree
tree = Phylo.read("tree.nwk", "newick")
Phylo.draw_ascii(tree)
# Calculate distance
distance = tree.distance("Species_A", "Species_B")
print(f"Distance: {distance:.3f}")
Reference: references/advanced.md
Use for:
Quick example:
from Bio.SeqUtils import gc_fraction, molecular_weight
from Bio.Seq import Seq
seq = Seq("ATCGATCGATCG")
print(f"GC content: {gc_fraction(seq):.2%}")
print(f"Molecular weight: {molecular_weight(seq, seq_type='DNA'):.2f} g/mol")
When a user asks about a specific Biopython task:
Example search patterns for reference files:
# Find information about specific functions
grep -n "SeqIO.parse" references/sequence_io.md
# Find examples of specific tasks
grep -n "BLAST" references/blast.md
# Find information about specific concepts
grep -n "alignment" references/alignment.md
Follow these principles when writing Biopython code:
Import modules explicitly
from Bio import SeqIO, Entrez
from Bio.Seq import Seq
Set Entrez email when using NCBI databases
Entrez.email = "[email protected]"
Use appropriate file formats - Check which format best suits the task
# Common formats: "fasta", "genbank", "fastq", "clustal", "phylip"
Handle files properly - Close handles after use or use context managers
with open("file.fasta") as handle:
records = SeqIO.parse(handle, "fasta")
Use iterators for large files - Avoid loading everything into memory
for record in SeqIO.parse("large_file.fasta", "fasta"):
# Process one record at a time
Handle errors gracefully - Network operations and file parsing can fail
try:
handle = Entrez.efetch(db="nucleotide", id=accession)
except HTTPError as e:
print(f"Error: {e}")
from Bio import Entrez, SeqIO
Entrez.email = "[email protected]"
# Fetch sequence
handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()
print(f"Description: {record.description}")
print(f"Sequence length: {len(record.seq)}")
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction
for record in SeqIO.parse("sequences.fasta", "fasta"):
# Calculate statistics
gc = gc_fraction(record.seq)
length = len(record.seq)
# Find ORFs, translate, etc.
protein = record.seq.translate()
print(f"{record.id}: {length} bp, GC={gc:.2%}")
from Bio.Blast import NCBIWWW, NCBIXML
from Bio import Entrez, SeqIO
Entrez.email = "[email protected]"
# Run BLAST
result_handle = NCBIWWW.qblast("blastn", "nt", sequence)
blast_record = NCBIXML.read(result_handle)
# Get top hit accessions
accessions = [aln.accession for aln in blast_record.alignments[:5]]
# Fetch sequences
for acc in accessions:
handle = Entrez.efetch(db="nucleotide", id=acc, rettype="fasta", retmode="text")
record = SeqIO.read(handle, "fasta")
handle.close()
print(f">{record.description}")
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
# Read alignment
alignment = AlignIO.read("alignment.fasta", "fasta")
# Calculate distances
calculator = DistanceCalculator("identity")
dm = calculator.get_distance(alignment)
# Build tree
constructor = DistanceTreeConstructor()
tree = constructor.nj(dm)
# Visualize
Phylo.draw_ascii(tree)
Solution: This is just a warning. Set Entrez.email to suppress it.
Solution: Check that IDs/accessions are valid and properly formatted.
Solution: Verify file format matches the specified format string.
Solution: Ensure sequences are aligned before using AlignIO or MultipleSeqAlignment.
Solution: Use local BLAST for large-scale searches, or cache results.
Solution: Use PDBParser(QUIET=True) to suppress warnings, or investigate structure quality.
To locate information in reference files, use these search patterns:
# Search for specific functions
grep -n "function_name" references/*.md
# Find examples of specific tasks
grep -n "example" references/sequence_io.md
# Find all occurrences of a module
grep -n "Bio.Seq" references/*.md
Biopython provides comprehensive tools for computational molecular biology. When using this skill:
references/ directoryThe modular reference documentation ensures detailed, searchable information for every major Biopython capability.
development
Spectral similarity and compound identification for metabolomics. Use for comparing mass spectra, computing similarity scores (cosine, modified cosine), and identifying unknown compounds from spectral libraries. Best for metabolite identification, spectral matching, library searching. For full LC-MS/MS proteomics pipelines use pyopenms.
development
Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.
development
Generate comprehensive market research reports (50+ pages) in the style of top consulting firms (McKinsey, BCG, Gartner). Features professional LaTeX formatting, extensive visual generation with scientific-schematics and generate-image, deep integration with research-lookup for data gathering, and multi-framework strategic analysis including Porter Five Forces, PESTLE, SWOT, TAM/SAM/SOM, and BCG Matrix.
testing
Comprehensive markdown and Mermaid diagram writing skill. Use when creating any scientific document, report, analysis, or visualization. Establishes text-based diagrams as the default documentation standard with full style guides (markdown + mermaid), 24 diagram type references, and 9 document templates.