skills/science/bio-tools/SKILL.md
Bioinformatics workflow guide for sequence analysis, QC, plotting, structure rendering, and literature search.
npx skillsauth add drugclaw/drugclaw bio-tools
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill for bioinformatics, genomics, transcriptomics, structural biology, or general biological data-analysis requests.
This skill focuses on DrugClaw's reproducible bioinformatics workflow pattern:
- bash
- write_file or edit_file
- read_file
- web_search / web_fetch for literature and database lookups

Do not assume the runtime already has the biology stack. Check first.
which python3 blastn blastp samtools bedtools bwa minimap2 fastqc seqtk pymol || true
python3 - <<'PY'
mods = ["Bio", "pandas", "numpy", "matplotlib", "pysam", "seaborn", "sklearn"]
for name in mods:
    try:
        __import__(name)
        print(f"{name}: ok")
    except Exception as exc:
        print(f"{name}: missing ({exc})")
PY
If key tools are missing, say so explicitly and recommend the optional drug-sandbox image documented in docs/operations/science-runtime.md.
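The availability check above can also be collected into a single report, which makes it easier to state explicitly what is missing. The following is a minimal sketch using only the standard library; the function names `find_missing` and `format_report` are hypothetical, and unlike the `__import__` loop it only tests whether a module can be located, not whether it imports cleanly.

```python
import importlib.util
import shutil

# Names mirror the shell and Python checks above; trim the lists per task.
CLI_TOOLS = ["python3", "blastn", "blastp", "samtools", "bedtools",
             "bwa", "minimap2", "fastqc", "seqtk", "pymol"]
PY_MODULES = ["Bio", "pandas", "numpy", "matplotlib", "pysam", "seaborn", "sklearn"]


def find_missing(cli_tools=CLI_TOOLS, py_modules=PY_MODULES):
    """Return (missing_cli, missing_modules) without importing any module."""
    missing_cli = [tool for tool in cli_tools if shutil.which(tool) is None]
    missing_mods = [mod for mod in py_modules
                    if importlib.util.find_spec(mod) is None]
    return missing_cli, missing_mods


def format_report(missing_cli, missing_mods):
    """Build the explicit message to pass on to the user (hypothetical wording)."""
    if not missing_cli and not missing_mods:
        return "All checked tools and modules are available."
    lines = []
    if missing_cli:
        lines.append("Missing command-line tools: " + ", ".join(missing_cli))
    if missing_mods:
        lines.append("Missing Python modules: " + ", ".join(missing_mods))
    lines.append("Consider the optional drug-sandbox image "
                 "(docs/operations/science-runtime.md).")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(*find_missing()))
```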
pwd
find . -maxdepth 3 -type f | sort
file sample.fastq.gz
gzip -dc sample.fastq.gz | head
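When the first lines need a structural check rather than a visual one, a small standard-library reader can confirm that the file really is four-line FASTQ. This is a sketch only: `peek_fastq` is a hypothetical helper, and it assumes unwrapped records (one sequence line and one quality line per read).

```python
import gzip
from itertools import islice


def peek_fastq(path, n_records=2):
    """Return the first n_records as (name, sequence, quality) tuples."""
    opener = gzip.open if str(path).endswith(".gz") else open
    records = []
    with opener(path, "rt") as handle:
        while len(records) < n_records:
            block = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(block) < 4:
                break  # end of file, or a truncated final record
            header, seq, plus, qual = block
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"not 4-line FASTQ near record {len(records) + 1}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header}")
            records.append((header[1:], seq, qual))
    return records


if __name__ == "__main__":
    import sys
    for name, seq, _qual in peek_fastq(sys.argv[1]):
        print(name, len(seq))
```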
Useful quick checks:
samtools --version | head -n 1
blastn -version
fastqc --version
python3 --version
# Nucleotide BLAST against a local FASTA database
blastn -query query.fa -subject reference.fa -outfmt 6 -evalue 1e-5 > blast.tsv
# Protein BLAST
blastp -query protein.fa -subject reference_proteins.fa -outfmt 6 > blastp.tsv
# Translate nucleotide query against proteins
blastx -query transcript.fa -subject proteins.fa -outfmt 6 > blastx.tsv
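The `-outfmt 6` files written above are headerless tab-separated tables. A minimal sketch for loading them and keeping the best hit per query follows; it assumes the default twelve-column layout (no custom format specifiers were passed), and the helper names `read_blast6` and `best_hit_per_query` are hypothetical.

```python
import csv

# Default column order for -outfmt 6 when no custom specifiers are given.
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def read_blast6(path):
    """Parse a tabular BLAST file into a list of dicts with typed values."""
    hits = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            hit = dict(zip(BLAST6_COLUMNS, row))
            for key in FLOAT_COLUMNS:
                hit[key] = float(hit[key])
            for key in INT_COLUMNS:
                hit[key] = int(hit[key])
            hits.append(hit)
    return hits


def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    for query, hit in best_hit_per_query(read_blast6("blast.tsv")).items():
        print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```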
For remote NCBI lookups, prefer Python so the workflow is easy to archive:
from Bio import Entrez, SeqIO
Entrez.email = "you@example.com"  # placeholder: set your own contact address for NCBI
handle = Entrez.efetch(db="nucleotide", id="NM_000546", rettype="fasta", retmode="text")
record = SeqIO.read(handle, "fasta")
print(record.id, len(record.seq))
# Build index
bwa index reference.fa
# Short-read alignment
bwa mem reference.fa reads_R1.fastq.gz reads_R2.fastq.gz > aligned.sam
# Long-read alignment
minimap2 -a reference.fa long_reads.fastq.gz > aligned.sam
# SAM -> sorted/indexed BAM
samtools view -bS aligned.sam | samtools sort -o aligned.sorted.bam
samtools index aligned.sorted.bam
samtools flagstat aligned.sorted.bam > aligned.flagstat.txt
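To quote a mapping rate in the reply, the flagstat text can be parsed rather than read by eye. The sketch below assumes the default text layout, where each line reads `<passed> + <failed> <label>`; the exact set of labels differs between samtools versions, and `parse_flagstat` and `mapped_fraction` are hypothetical helper names.

```python
import re

# Each default flagstat line looks like "<QC-passed> + <QC-failed> <label>".
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return a list of (passed, failed, label) tuples, one per flagstat line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows


def mapped_fraction(rows):
    """Fraction of QC-passed alignment records that are mapped, or None.

    The "in total" line counts alignment records (including secondary and
    supplementary ones), so this is not strictly a per-read rate.
    """
    total = mapped = None
    for passed, _failed, label in rows:
        if label.startswith("in total"):
            total = passed
        elif label.startswith("mapped ("):
            mapped = passed
    if not total or mapped is None:
        return None
    return mapped / total


if __name__ == "__main__":
    with open("aligned.flagstat.txt") as handle:
        fraction = mapped_fraction(parse_flagstat(handle.read()))
    print("mapped fraction:", fraction)
```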
mkdir -p qc
fastqc reads_R1.fastq.gz reads_R2.fastq.gz -o qc
# Quick sequence statistics
seqtk comp reads_R1.fastq.gz | head
seqtk size reads_R1.fastq.gz
Report at minimum:
bedtools intersect -a peaks.bed -b genes.bed > overlap.bed
bedtools coverage -a targets.bed -b aligned.sorted.bam > coverage.tsv
bedtools getfasta -fi reference.fa -bed targets.bed > targets.fa
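For spot-checking a handful of intervals, it can help to reproduce what a default `bedtools intersect` reports: the shared portion of each overlapping A/B pair, using BED's 0-based half-open coordinates. The following is an illustrative sketch, not a replacement for bedtools; it is quadratic in the number of intervals, ignores extra BED columns, and the function name `intersect` is hypothetical.

```python
def intersect(a_intervals, b_intervals):
    """Return the overlapping portion of every A/B pair on the same chromosome.

    Intervals are (chrom, start, end) tuples in BED convention: 0-based start,
    exclusive end. Book-ended intervals (end == start) do not overlap.
    """
    shared = []
    for chrom_a, start_a, end_a in a_intervals:
        for chrom_b, start_b, end_b in b_intervals:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # at least one base in common
                shared.append((chrom_a, start, end))
    return shared


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 300, 400)]
    genes = [("chr1", 150, 350), ("chr2", 100, 200)]
    for chrom, start, end in intersect(peaks, genes):
        print(chrom, start, end, sep="\t")
```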
from Bio import SeqIO
for record in SeqIO.parse("input.fa", "fasta"):
    print(record.id, len(record.seq))
import pandas as pd
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
counts = pd.read_csv("counts.csv", index_col=0)
meta = pd.read_csv("metadata.csv", index_col=0)
dds = DeseqDataSet(counts=counts, metadata=meta, design="~condition")
dds.deseq2()
stats = DeseqStats(dds, contrast=["condition", "treated", "control"])
stats.summary()
res = stats.results_df.sort_values("padj")
res.to_csv("deseq2_results.csv")
import scanpy as sc
adata = sc.read_h5ad("data.h5ad")
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
sc.pl.umap(adata, color="leiden", save="_leiden.png")
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("expression.csv")
sns.set_theme(style="whitegrid")
ax = sns.scatterplot(data=df, x="log2FoldChange", y="-log10_padj", hue="significant")
ax.figure.savefig("volcano.png", dpi=300, bbox_inches="tight")
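The scatterplot above expects `-log10_padj` and `significant` columns, which a DESeq2 results table does not carry by default. One way to derive them is sketched below with the standard library. It assumes the input has `log2FoldChange` and `padj` columns, that `expression.csv` is produced from `deseq2_results.csv` (the source does not say so), and that the cutoffs of padj < 0.05 and |log2FC| >= 1 are acceptable defaults; all helper names are hypothetical.

```python
import csv
import math
import sys

PADJ_CUTOFF = 0.05  # assumed significance threshold
LFC_CUTOFF = 1.0    # assumed absolute log2 fold-change threshold


def add_volcano_columns(rows, padj_cutoff=PADJ_CUTOFF, lfc_cutoff=LFC_CUTOFF):
    """Return rows extended with '-log10_padj' and 'significant'.

    Rows whose padj or log2FoldChange is missing or not numeric are dropped.
    """
    out = []
    for row in rows:
        try:
            padj = float(row["padj"])
            lfc = float(row["log2FoldChange"])
        except (TypeError, ValueError):
            continue  # e.g. "NA" or an empty cell
        if math.isnan(padj) or math.isnan(lfc):
            continue
        new_row = dict(row)
        # Clamp so log10 stays defined if padj underflowed to exactly 0.
        new_row["-log10_padj"] = -math.log10(max(padj, sys.float_info.min))
        new_row["significant"] = padj < padj_cutoff and abs(lfc) >= lfc_cutoff
        out.append(new_row)
    return out


def convert(src="deseq2_results.csv", dst="expression.csv"):
    """Read src, add the plotting columns, and write dst."""
    with open(src, newline="") as handle:
        rows = add_volcano_columns(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"no usable rows in {src}")
    with open(dst, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    convert()
```

State the cutoffs used in the reply, since the `significant` flag depends entirely on them.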
Fetch structures from PDB with web_fetch or direct download, then render with PyMOL if available.
curl -L https://files.rcsb.org/download/1M17.pdb -o 1M17.pdb
cat > render.pml <<'PML'
load 1M17.pdb, prot
hide everything
show cartoon, prot
spectrum count, rainbow, prot
bg_color white
png 1M17_rainbow.png, width=1600, height=1200, dpi=200, ray=1
quit
PML
pymol -cq render.pml
When PyMOL is unavailable, still provide the fetched structure, any residue/chain findings, and the exact rendering script the user can run later.
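Chain and residue findings can still be extracted without PyMOL by reading the fixed-width ATOM records directly. The sketch below relies on the legacy PDB column layout (chain ID in column 22, residue number in columns 23 to 26, insertion code in column 27); `summarize_chains` is a hypothetical helper, and it skips HETATM records, so ligands and waters are not counted.

```python
def summarize_chains(pdb_path):
    """Map chain ID -> number of distinct residues in ATOM records.

    Only the first model is read when the file contains several models.
    """
    residues = {}
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("ENDMDL"):
                break  # stop after the first model
            if not line.startswith("ATOM  "):
                continue
            chain = line[21:22]
            # Residue number plus insertion code identifies a residue in a chain.
            residue_key = (line[22:26].strip(), line[26:27].strip())
            residues.setdefault(chain, set()).add(residue_key)
    return {chain: len(keys) for chain, keys in residues.items()}


if __name__ == "__main__":
    for chain, count in summarize_chains("1M17.pdb").items():
        print(f"chain {chain}: {count} residues")
```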
Use web_search or PubMed APIs for recent papers. For structured PubMed workflows:
from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder: set your own contact address for NCBI
search = Entrez.esearch(db="pubmed", term="CRISPR off-target 2025[dp]", retmax=5)
ids = Entrez.read(search)["IdList"]
summary = Entrez.esummary(db="pubmed", id=",".join(ids))
print(Entrez.read(summary))
Summaries should include:
Good replies should mention:
Example closing pattern:
I aligned the reads against GRCh38 with bwa mem and generated `aligned.sorted.bam` plus `aligned.flagstat.txt`.
FastQC shows 3' quality decay after cycle 125 and adapter contamination in R2, so trimming before re-alignment is recommended.
Next step: run fastp or cutadapt, then repeat alignment and variant calling.
For remote biology database lookups across UniProt, PDB, AlphaFold, ClinVar, Ensembl, GEO, InterPro, KEGG, OpenTargets, Reactome, or STRING, activate bio-db-tools.
For AnnData, single-cell dataset profiling, alignment-region inspection, or mzML inventory, activate omics-tools.
For public drug-discovery database lookups across PubChem, ChEMBL, openFDA, ClinicalTrials.gov, or OpenAlex, activate pharma-db-tools.
For molecular docking or pose inspection, activate docking-tools.
For DeepChem, PySCF, RDKit descriptors, or chemistry-specific follow-up, activate chem-tools.