jaechang-hits

deseq2-differential-expression

Bulk RNA-seq DE with R/Bioconductor DESeq2. Negative binomial GLM, empirical Bayes shrinkage, Wald/LRT tests, multi-factor designs, Salmon tximeta import, apeglm LFC shrinkage, MA/volcano/heatmap viz. R gold standard. Use pydeseq2-differential-expression for Python; use edgeR for TMM normalization.

development202

deseq2-differential-expression

Bulk RNA-seq DE with R/Bioconductor DESeq2. Negative binomial GLM, empirical Bayes shrinkage, Wald/LRT tests, multi-factor designs, Salmon tximeta import, apeglm LFC shrinkage, MA/volcano/heatmap viz. R gold standard. Use pydeseq2-differential-expression for Python; use edgeR for TMM normalization.

development202

busco-status-interpretation

Guide to interpreting BUSCO completeness statuses: why Duplicated BUSCOs count as complete, parsing output files, computing/comparing completeness across proteomes/genomes, common counting mistakes. Use when running BUSCO QC, comparing assemblies, or reporting completeness. See also: prokka-genome-annotation for annotation workflows feeding BUSCO.

documentation199

kegg-database

KEGG REST API (academic only). Pathways, genes, compounds, enzymes, diseases, drugs via 7 ops (info/list/find/get/conv/link/ddi). ID conversion (NCBI/UniProt/PubChem). Use bioservices for multi-DB Python.

development199

mouse-phenome-database

Retrieve mouse phenotype data from the Jackson Laboratory Mouse Phenome Database (MPD) via its REST API. Browse 520+ projects, look up per-project measure metadata, pull strain-level means (raw or LS-mean adjusted) and per-animal values, find measures by MP/VT ontology terms, and resolve strain nomenclature or gene coordinates. Use for QTL support, cross-strain comparison, mouse model selection, and ontology-driven phenotype discovery. Use monarch-database for disease-gene-phenotype knowledge graphs; ensembl-database for mouse genome annotations.

development199

cellxgene-census

Query CELLxGENE Census (61M+ cells). Search by cell type/tissue/disease/organism; get AnnData, stream out-of-core, train PyTorch models. For your own data use scanpy; for annotated data use anndata.

data-ai199

uniprot-protein-database

Query UniProt REST API: search by gene/protein name, fetch FASTA, map IDs (Ensembl, PDB, RefSeq), access Swiss-Prot annotations. Use bioservices for multi-DB access; alphafold-database-access for structures.

development199

cellxgene-census

Query CELLxGENE Census (61M+ cells). Search by cell type/tissue/disease/organism; get AnnData, stream out-of-core, train PyTorch models. For your own data use scanpy; for annotated data use anndata.

data-ai199

cnvkit-copy-number

Detect somatic CNVs from WES/WGS/targeted BAMs (CNVkit v0.9.x). Bin coverage in target/antitarget regions, normalize vs reference, segment with CBS/HMM, call amps/dels, scatter/diagram plots, purity/ploidy, VCF/SEG export. CLI plus Python API (cnvlib). Use GATK CNV for deep WGS with population controls; use CNVkit for targeted/exome where antitarget bins matter.

tools199

scanpy-scrna-seq

scRNA-seq with Scanpy: QC, normalization, HVG selection, PCA, neighborhood graph, UMAP/t-SNE, Leiden clustering, markers, cell annotation, trajectory inference. Standard scRNA-seq exploration.

tools199

popv-cell-annotation

Consensus cell type annotation: runs 10+ algorithms (KNN-Harmony/BBKNN/Scanorama/scVI, CellTypist, ONCLASS, Random Forest, SCANVI, SVM, XGBoost) on a labeled reference and transfers labels via majority voting. Outputs per-method labels, consensus, agreement score. Use when single-method annotation is insufficient or you need ensemble uncertainty for novel states.

development199

harmony-batch-correction

Harmony batch correction for scRNA-seq and other omics. Removes batch effects from PCA embeddings while preserving biology. Run after PCA, before UMAP. Scales to millions of cells. Python (harmonypy, scanpy) and R (Seurat).

development199

anndata-data-structure

Annotated matrices for single-cell genomics. Stores X with obs/var metadata, layers, embeddings (obsm/varm), graphs (obsp/varp), uns. Use for .h5ad/.zarr I/O, concatenation, scverse integration. For analysis use scanpy; for probabilistic models use scvi-tools.

tools199

pydeseq2-differential-expression

Bulk RNA-seq DE with PyDESeq2: load counts, normalize, fit negative binomial models, Wald test (BH-FDR), LFC shrinkage, volcano/MA plots. Use for two-group comparisons, multi-factor designs with batch correction, multiple contrasts.

testing199

featurecounts-rna-counting

Counts RNA-seq reads overlapping GTF gene features. Takes sorted STAR BAMs plus GTF; outputs a per-gene tab-delimited matrix across samples. Handles strandedness (0/1/2), paired-end, multi-sample batch counting in one command, and outputs assignment statistics. Use Salmon for alignment-free quantification; use featureCounts when STAR BAMs already exist.

data-ai199

busco-status-interpretation

Guide to interpreting BUSCO completeness statuses: why Duplicated BUSCOs count as complete, parsing output files, computing/comparing completeness across proteomes/genomes, common counting mistakes. Use when running BUSCO QC, comparing assemblies, or reporting completeness. See also: prokka-genome-annotation for annotation workflows feeding BUSCO.

documentation199

deeptools-ngs-analysis

NGS CLI for ChIP/RNA/ATAC-seq. BAM→bigWig with RPGC/CPM/RPKM, sample correlation/PCA, heatmaps/profiles around features, fingerprints. For alignment use STAR/BWA; for peak calling use MACS2.

tools199

bedtools-genomic-intervals

Genomic interval ops on BED/BAM/GFF/VCF. Find overlaps, merge intervals, compute coverage, extract FASTA, find nearest features. Core for ChIP-seq peak annotation, region filtering, genome arithmetic. Use tabix for indexed single-region queries; use deeptools for normalized bigWig coverage.

tools199

ucsc-genome-browser

Query UCSC Genome Browser REST API for DNA sequences, tracks, gene models, and conservation across 100+ assemblies. Retrieve sequence by region, list/fetch BED/bigWig tracks, chromosome sizes, RefSeq/GENCODE gene structures, PhyloP/PhastCons scores. Use for UCSC annotations; Ensembl REST API for Ensembl gene IDs and VEP variant annotation.

development199

remap-database

Query ReMap 2022 TF ChIP-seq peak database via REST API and BED downloads. Retrieve TF peaks overlapping a region (chr:start-end), peaks near a gene, TFs by species, peaks filtered by biotype (promoter, enhancer), and BED files for a TF-cell type pair. Use for TF co-occupancy, regulatory annotation, and TF binding atlases. Use jaspar-database for PWM motifs; encode-database for ENCODE tracks.

development199

regulomedb-database

Query RegulomeDB v2 GET REST API to score variants for regulatory function and retrieve overlapping evidence (TF binding, histone marks, DNase peaks, footprints, motifs, eQTLs, chromatin state). Scores range 1a (strongest) to 7 (none). Use for GWAS hit prioritization, regulatory variant annotation, cis-regulatory discovery. Use clinvar-database for pathogenicity; gwas-database for trait associations.

tools199

quickgo-database

Query EBI QuickGO REST API for GO terms and protein annotations. Fetch term metadata by ID, search by keyword, walk ancestor/descendant hierarchies, download annotations filtered by taxon, evidence code, aspect. Use for GO resolution, ontology traversal, annotation retrieval before enrichment. Use gseapy-gene-enrichment for enrichment; uniprot-protein-database for proteins.

development199

jaspar-database

JASPAR 2024 TF binding profiles via REST API and pyJASPAR. Retrieve PFMs/PWMs by TF name, JASPAR ID, species, or structural class. Scan DNA for TFBS; browse by taxon (human, mouse) or TF family (bHLH, zinc finger). Use for motif enrichment input, TFBS scanning, and regulatory sequence analysis. For ChIP-seq peak motif discovery use homer-motif-analysis; for regulatory variant scoring use regulomedb-database.

development199

gget-genomic-databases

Unified CLI/Python interface to 20+ genomic databases. Gene lookups (Ensembl search/info/seq), BLAST/BLAT, AlphaFold, Enrichr enrichment, OpenTargets disease/drug, CELLxGENE single-cell, cBioPortal/COSMIC cancer, ARCHS4 expression. Spans genomics, proteomics, disease. For batch/advanced BLAST use biopython; for multi-DB Python SDK use bioservices.

tools199

ensembl-database

Ensembl REST API for gene/transcript/variant annotations in 300+ species. Gene info by symbol/ID, sequence, cross-refs (HGNC, RefSeq, UniProt), regulatory features. For bulk local use pyensembl; for pathways use kegg-database.

development199

geo-database

NCBI GEO access via GEOparse and E-utilities. Search by keyword/organism/platform, download GSE series matrices, parse GPL annotations, extract GSM metadata, load expression matrices into pandas. For single-cell use cellxgene-census; for multi-DB access use gget-genomic-databases.

data-ai199

bioservices-multi-database

Unified Python interface to 40+ bioinformatics web services: UniProt proteins, KEGG pathways, ChEMBL/ChEBI/PubChem, BLAST, cross-database ID mapping, GO annotations, PPI. For deep single-DB queries use dedicated tools (gget for Ensembl, pubchempy for PubChem); bioservices excels at cross-database workflows.

tools199

clinpgx-database

Query the ClinPGx (formerly PharmGKB) REST API plus the CPIC PostgREST companion API for pharmacogenomic clinical annotations, CPIC/DPWG dosing guidelines, gene-drug pairs, variant-drug associations, FDA/EMA drug labels, and PGx pathways. Two-host architecture: api.clinpgx.org for annotation records, api.cpicpgx.org for genotype→recommendation lookups. No auth. For germline pathogenicity use clinvar-database; for somatic cancer PGx use cosmic-database or opentargets-database; for drug bioactivity use chembl-database-bioactivity.

tools199

cbioportal-database

Cancer genomics (TCGA et al.) via cBioPortal REST API. Retrieve somatic mutations, CNAs, expression, clinical data (survival/stage/treatment) across thousands of studies. Use for TMB, oncoprints, survival analysis. For population frequencies use gnomad-database; for drug-gene interactions use opentargets-database.

tools199

gatk-variant-calling

GATK Best Practices for germline SNP/indel calling from WGS/WES BAMs. Per-sample HaplotypeCaller GVCFs, GenomicsDBImport, GenotypeGVCFs joint calling, VQSR or hard filters. Requires BWA-MEM2-aligned, markdup, BQSR BAMs. Use DeepVariant for a faster DL alternative; GATK is the NIH/ENCODE standard.

development199

archs4-database

Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr).

development199

roary-pangenome

Compute the bacterial pan-genome from Prokka/Bakta GFF3 annotations with Roary's CD-HIT + BLAST + MCL clustering pipeline. Builds gene presence/absence matrices, core/soft-core/shell/cloud partitions, multi-FASTA core gene alignments (with `-e`), and a pan-genome reference. Use Panaroo for higher-accuracy pan-genomes from highly fragmented assemblies, PIRATE for paralog-aware clustering, or PPanGGOLiN for graph-based partitioning.

development199

bakta-genome-annotation

Annotate bacterial and archaeal genomes and plasmids with Bakta's Prodigal/HMM/diamond pipeline. Identifies CDS, ncRNA, tRNA, rRNA, tmRNA, sORFs, CRISPR arrays, oriC/oriV/oriT, and gaps against a curated UniRef-derived database. Produces NCBI-compatible GFF3, GenBank, EMBL, JSON, FASTA, TSV, and a circular genome plot. Use Prokka for legacy pipelines or non-bacterial kingdoms; PGAP for NCBI GenBank submission.

testing199

pysam-genomic-files

Read/write SAM/BAM/CRAM, VCF/BCF, FASTA/FASTQ. Region queries, pileup, variant filtering, read groups. Python htslib wrapper exposing samtools/bcftools CLI. Use STAR/BWA for alignment; GATK/DeepVariant for variant calling.

tools199

cbioportal-database

Cancer genomics (TCGA et al.) via cBioPortal REST API. Retrieve somatic mutations, CNAs, expression, clinical data (survival/stage/treatment) across thousands of studies. Use for TMB, oncoprints, survival analysis. For population frequencies use gnomad-database; for drug-gene interactions use opentargets-database.

tools199

molfeat-molecular-featurization

Molecular featurization hub (100+ featurizers) for ML. SMILES to fingerprints (ECFP, MACCS, MAP4), descriptors (RDKit 2D, Mordred), pretrained embeddings (ChemBERTa, GIN, Graphormer), pharmacophores. Scikit-learn compatible with parallelization/caching. For QSAR, virtual screening, similarity, and molecular DL.

testing199

bioservices-multi-database

Unified Python interface to 40+ bioinformatics web services: UniProt proteins, KEGG pathways, ChEMBL/ChEBI/PubChem, BLAST, cross-database ID mapping, GO annotations, PPI. For deep single-DB queries use dedicated tools (gget for Ensembl, pubchempy for PubChem); bioservices excels at cross-database workflows.

tools199

esm-protein-language-model

Protein language models (ESM3, ESM C) for sequence generation, structure prediction, inverse folding, and embeddings. Design novel proteins, extract ML features, or fold sequences. Local GPU or EvolutionaryScale Forge API. Use AlphaFold for traditional folding; RDKit for small molecules.

development199

zarr-python

Chunked N-D arrays with compression and cloud storage. NumPy-style indexing. Backends: local, S3, GCS, ZIP, memory. Dask/Xarray integration for parallel and labeled computation. For lineage use lamindb; for labeled arrays use xarray.

development199

snpeff-variant-annotation

Annotate and filter VCF variants with SnpEff and SnpSift. SnpEff predicts functional effects (HIGH/MODERATE/LOW/MODIFIER), genes, transcripts, AA changes, HGVS; SnpSift filters and adds ClinVar/dbSNP. Java CLI with Python subprocess integration. Use ANNOVAR for multi-database annotation; Ensembl VEP for REST API; SnpEff for fast CLI with pre-built genomes.

tools199

cnvkit-copy-number

Detect somatic CNVs from WES/WGS/targeted BAMs (CNVkit v0.9.x). Bin coverage in target/antitarget regions, normalize vs reference, segment with CBS/HMM, call amps/dels, scatter/diagram plots, purity/ploidy, VCF/SEG export. CLI plus Python API (cnvlib). Use GATK CNV for deep WGS with population controls; use CNVkit for targeted/exome where antitarget bins matter.

tools199

popv-cell-annotation

Consensus cell type annotation: runs 10+ algorithms (KNN-Harmony/BBKNN/Scanorama/scVI, CellTypist, ONCLASS, Random Forest, SCANVI, SVM, XGBoost) on a labeled reference and transfers labels via majority voting. Outputs per-method labels, consensus, agreement score. Use when single-method annotation is insufficient or you need ensemble uncertainty for novel states.

development199

harmony-batch-correction

Harmony batch correction for scRNA-seq and other omics. Removes batch effects from PCA embeddings while preserving biology. Run after PCA, before UMAP. Scales to millions of cells. Python (harmonypy, scanpy) and R (Seurat).

development199

anndata-data-structure

Annotated matrices for single-cell genomics. Stores X with obs/var metadata, layers, embeddings (obsm/varm), graphs (obsp/varp), uns. Use for .h5ad/.zarr I/O, concatenation, scverse integration. For analysis use scanpy; for probabilistic models use scvi-tools.

tools199

pydeseq2-differential-expression

Bulk RNA-seq DE with PyDESeq2: load counts, normalize, fit negative binomial models, Wald test (BH-FDR), LFC shrinkage, volcano/MA plots. Use for two-group comparisons, multi-factor designs with batch correction, multiple contrasts.

testing199

scanpy-scrna-seq

scRNA-seq with Scanpy: QC, normalization, HVG selection, PCA, neighborhood graph, UMAP/t-SNE, Leiden clustering, markers, cell annotation, trajectory inference. Standard scRNA-seq exploration.

tools199

deeptools-ngs-analysis

NGS CLI for ChIP/RNA/ATAC-seq. BAM→bigWig with RPGC/CPM/RPKM, sample correlation/PCA, heatmaps/profiles around features, fingerprints. For alignment use STAR/BWA; for peak calling use MACS2.

tools199

bedtools-genomic-intervals

Genomic interval ops on BED/BAM/GFF/VCF. Find overlaps, merge intervals, compute coverage, extract FASTA, find nearest features. Core for ChIP-seq peak annotation, region filtering, genome arithmetic. Use tabix for indexed single-region queries; use deeptools for normalized bigWig coverage.

tools199

ucsc-genome-browser

Query UCSC Genome Browser REST API for DNA sequences, tracks, gene models, and conservation across 100+ assemblies. Retrieve sequence by region, list/fetch BED/bigWig tracks, chromosome sizes, RefSeq/GENCODE gene structures, PhyloP/PhastCons scores. Use for UCSC annotations; Ensembl REST API for Ensembl gene IDs and VEP variant annotation.

development199

remap-database

Query ReMap 2022 TF ChIP-seq peak database via REST API and BED downloads. Retrieve TF peaks overlapping a region (chr:start-end), peaks near a gene, TFs by species, peaks filtered by biotype (promoter, enhancer), and BED files for a TF-cell type pair. Use for TF co-occupancy, regulatory annotation, and TF binding atlases. Use jaspar-database for PWM motifs; encode-database for ENCODE tracks.

development199

regulomedb-database

Query RegulomeDB v2 GET REST API to score variants for regulatory function and retrieve overlapping evidence (TF binding, histone marks, DNase peaks, footprints, motifs, eQTLs, chromatin state). Scores range 1a (strongest) to 7 (none). Use for GWAS hit prioritization, regulatory variant annotation, cis-regulatory discovery. Use clinvar-database for pathogenicity; gwas-database for trait associations.

tools199

quickgo-database

Query EBI QuickGO REST API for GO terms and protein annotations. Fetch term metadata by ID, search by keyword, walk ancestor/descendant hierarchies, download annotations filtered by taxon, evidence code, aspect. Use for GO resolution, ontology traversal, annotation retrieval before enrichment. Use gseapy-gene-enrichment for enrichment; uniprot-protein-database for proteins.

development199

gget-genomic-databases

Unified CLI/Python interface to 20+ genomic databases. Gene lookups (Ensembl search/info/seq), BLAST/BLAT, AlphaFold, Enrichr enrichment, OpenTargets disease/drug, CELLxGENE single-cell, cBioPortal/COSMIC cancer, ARCHS4 expression. Spans genomics, proteomics, disease. For batch/advanced BLAST use biopython; for multi-DB Python SDK use bioservices.

tools199

ensembl-database

Ensembl REST API for gene/transcript/variant annotations in 300+ species. Gene info by symbol/ID, sequence, cross-refs (HGNC, RefSeq, UniProt), regulatory features. For bulk local use pyensembl; for pathways use kegg-database.

development199

roary-pangenome

Compute the bacterial pan-genome from Prokka/Bakta GFF3 annotations with Roary's CD-HIT + BLAST + MCL clustering pipeline. Builds gene presence/absence matrices, core/soft-core/shell/cloud partitions, multi-FASTA core gene alignments (with `-e`), and a pan-genome reference. Use Panaroo for higher-accuracy pan-genomes from highly fragmented assemblies, PIRATE for paralog-aware clustering, or PPanGGOLiN for graph-based partitioning.

development199

archs4-database

Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr).

development199

bakta-genome-annotation

Annotate bacterial and archaeal genomes and plasmids with Bakta's Prodigal/HMM/diamond pipeline. Identifies CDS, ncRNA, tRNA, rRNA, tmRNA, sORFs, CRISPR arrays, oriC/oriV/oriT, and gaps against a curated UniRef-derived database. Produces NCBI-compatible GFF3, GenBank, EMBL, JSON, FASTA, TSV, and a circular genome plot. Use Prokka for legacy pipelines or non-bacterial kingdoms; PGAP for NCBI GenBank submission.

testing199

pysam-genomic-files

Read/write SAM/BAM/CRAM, VCF/BCF, FASTA/FASTQ. Region queries, pileup, variant filtering, read groups. Python htslib wrapper exposing samtools/bcftools CLI. Use STAR/BWA for alignment; GATK/DeepVariant for variant calling.

tools199

lamindb-data-management

Open-source FAIR biology data framework. Version artifacts (AnnData, DataFrame, Zarr), track lineage, validate via ontologies (Bionty), query datasets. Integrates with Nextflow, Snakemake, W&B, scVI. For scRNA-seq use scanpy; for ontology lookups use bionty.

development199

pubchem-compound-search

Query PubChem (110M+ compounds) directly via the PUG-REST/JSON API with plain `requests` — no SDK install required. Search by name/CID/SMILES/InChIKey/formula, retrieve properties (MW, XLogP, TPSA, H-bond counts), do similarity/substructure searches with async ListKey polling, fetch synonyms, descriptions, assay summaries, and download SDF/PNG. For local cheminformatics use rdkit; for bioactivity-centric workflows use chembl-database-bioactivity.

development199

smina-molecular-docking

smina molecular docking CLI. AutoDock Vina fork with customizable scoring functions, native SDF/MOL2/PDB ligand input, autoboxing, local energy minimization, and per-atom score breakdowns. Pipeline: receptor PDBQT prep -> ligand prep (RDKit/OpenBabel) -> dock via autobox or explicit grid -> rescore/minimize with custom scoring -> rank poses by affinity. Choose smina over Vina when you need custom scoring terms (--custom_scoring), local optimization of an existing pose (--local_only), per-atom contributions (--atom_term_data), or SDF/MOL2 ligands without manual PDBQT conversion. For unknown binding sites use diffdock; for the Python-bindings/Vinardo workflow use autodock-vina-docking.

tools199

vcf-variant-filtering

Guide to quality filtering raw VCF files before computing summary stats (Ts/Tv ratio, variant counts, AF distributions). Covers detecting raw VCFs via FILTER column and QUAL inspection, QUAL-based filtering with bcftools, Ts/Tv interpretation, and when NOT to filter. Read before any variant-level QC task. See bcftools-variant-manipulation for advanced filters, gatk-variant-calling for caller config, samtools-bam-processing for upstream alignment QC.

tools199

vcf-variant-filtering

Guide to quality filtering raw VCF files before computing summary stats (Ts/Tv ratio, variant counts, AF distributions). Covers detecting raw VCFs via FILTER column and QUAL inspection, QUAL-based filtering with bcftools, Ts/Tv interpretation, and when NOT to filter. Read before any variant-level QC task. See bcftools-variant-manipulation for advanced filters, gatk-variant-calling for caller config, samtools-bam-processing for upstream alignment QC.

tools199

single-cell-annotation-guide

Decision framework for manual marker-based, automated (CellTypist), and reference-based (popV) cell type annotation in scRNA-seq. Three-tier strategy: Tier 1 manual markers, Tier 2 CellTypist, Tier 3 popV ensemble transfer. Use when planning or troubleshooting annotation.

development199

snpeff-variant-annotation

Annotate and filter VCF variants with SnpEff and SnpSift. SnpEff predicts functional effects (HIGH/MODERATE/LOW/MODIFIER), genes, transcripts, AA changes, HGVS; SnpSift filters and adds ClinVar/dbSNP. Java CLI with Python subprocess integration. Use ANNOVAR for multi-database annotation; Ensembl VEP for REST API; SnpEff for fast CLI with pre-built genomes.

tools199

single-cell-annotation-guide

Decision framework for manual marker-based, automated (CellTypist), and reference-based (popV) cell type annotation in scRNA-seq. Three-tier strategy: Tier 1 manual markers, Tier 2 CellTypist, Tier 3 popV ensemble transfer. Use when planning or troubleshooting annotation.

development199

smina-molecular-docking

smina molecular docking CLI. AutoDock Vina fork with customizable scoring functions, native SDF/MOL2/PDB ligand input, autoboxing, local energy minimization, and per-atom score breakdowns. Pipeline: receptor PDBQT prep -> ligand prep (RDKit/OpenBabel) -> dock via autobox or explicit grid -> rescore/minimize with custom scoring -> rank poses by affinity. Choose smina over Vina when you need custom scoring terms (--custom_scoring), local optimization of an existing pose (--local_only), per-atom contributions (--atom_term_data), or SDF/MOL2 ligands without manual PDBQT conversion. For unknown binding sites use diffdock; for the Python-bindings/Vinardo workflow use autodock-vina-docking.

tools199

monarch-database

Monarch Initiative knowledge graph REST API for disease-gene-phenotype associations and cross-species orthology. MONDO disease-to-gene/phenotype, HP phenotype profiles, cross-species comparisons. Use for rare disease gene prioritization and phenotype-based candidate ranking. For GWAS use gwas-database; for clinical pathogenicity use clinvar-database.

tools199

geniml

Python library for genomic interval ML. Train/apply region2vec embeddings turning BED regions into vectors, index interval datasets for ML, search embedding space with BEDSpace, and evaluate embedding quality. Use for chromatin accessibility clustering, regulatory element classification, and cross-sample region comparison.

development199

geo-database

NCBI GEO access via GEOparse and E-utilities. Search by keyword/organism/platform, download GSE series matrices, parse GPL annotations, extract GSM metadata, load expression matrices into pandas. For single-cell use cellxgene-census; for multi-DB access use gget-genomic-databases.

data-ai199

dbsnp-database

Query NCBI dbSNP for SNP records by rsID, gene, or region via E-utilities and Variation Services REST API. Retrieve alleles, MAF, variant class (SNV/indel/MNV), clinical links, cross-DB IDs (ClinVar, dbVar, 1000G). Free; 3 req/sec (10 with key). For clinical pathogenicity use clinvar-database; for population frequencies use gnomad-database.

tools199

fastp-fastq-preprocessing

All-in-one FASTQ QC and adapter trimming. Auto-detects Illumina adapters, filters low-quality reads, corrects paired-end overlaps, emits HTML+JSON QC in one pass. 3-10x faster than Trim Galore/Trimmomatic. First step before STAR, BWA-MEM2, or Salmon.

development199

monarch-database

Monarch Initiative knowledge graph REST API for disease-gene-phenotype associations and cross-species orthology. MONDO disease-to-gene/phenotype, HP phenotype profiles, cross-species comparisons. Use for rare disease gene prioritization and phenotype-based candidate ranking. For GWAS use gwas-database; for clinical pathogenicity use clinvar-database.

tools199

ena-database

ENA REST API for sequences, reads, assemblies, and annotations. Portal API search, Browser API retrieval (XML/FASTA/EMBL), file reports for FASTQ/BAM URLs, taxonomy, cross-refs. For multi-DB Python use bioservices; for NCBI-only use pubmed-database or Biopython Entrez.

development199

gwas-database

NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published GWAS. Query studies, associations, variants, traits, genes, summary stats. Build PRS candidates, analyze pleiotropy, fetch stats for Manhattan plots. No auth.

development199

bcftools-variant-manipulation

CLI for VCF/BCF: filter, merge, annotate, query, normalize, compute stats. Core post-variant-calling: quality filtering, multi-sample merging, rsID annotation, genotype extraction. Samtools companion in HTSlib. Use GATK for complex indel realignment during calling; use VCFtools for population genetics stats.

tools199

gseapy-gene-enrichment

GSEA and over-representation analysis (ORA) for RNA-seq and proteomics. Wraps Enrichr for ORA against MSigDB, KEGG, GO, and 200+ databases; runs preranked GSEA on ranked DE gene lists. Outputs enrichment tables and running-score plots. Use after DESeq2 or edgeR for pathway-level interpretation.

development199

samtools-bam-processing

CLI toolkit for SAM/BAM/CRAM: sort, index, convert, filter, QC alignments. Core commands: view, sort, index, flagstat, stats, depth, markdup, merge. Required between alignment and variant/peak calling. Use pysam for Python-native BAM access; deeptools for normalized coverage tracks.

tools199

esm-protein-language-model

Protein language models (ESM3, ESM C) for sequence generation, structure prediction, inverse folding, and embeddings. Design novel proteins, extract ML features, or fold sequences. Local GPU or EvolutionaryScale Forge API. Use AlphaFold for traditional folding; RDKit for small molecules.

development199

celltypist-cell-annotation

Automated scRNA-seq cell type annotation via pre-trained logistic regression. 45+ models: immune, gut, lung, brain, fetal, cancer microenvironments. Input normalized AnnData; outputs per-cell labels, majority-vote cluster labels, confidence scores. Use for fast, reference-backed annotation without manual marker inspection.

data-ai199

scvi-tools-single-cell

Deep generative models for single-cell omics: probabilistic batch correction (scVI), semi-supervised annotation (scANVI), CITE-seq RNA+protein (totalVI), transfer learning (scARCHES), and DE with uncertainty. Unified setup→train→extract API on AnnData. Use harmony-batch-correction for fast linear correction without deep learning; muon for multi-modal MuData workflows.

tools199

multiqc-qc-reports

Aggregates QC from 150+ bioinformatics tools into one interactive HTML report. Scans FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, GATK logs; merges per-sample stats with plots. For NGS pipeline-wide QC. Use FastQC directly for single-sample; MultiQC for multi-sample reporting.

tools199

gatk-variant-calling

GATK Best Practices for germline SNP/indel calling from WGS/WES BAMs. Per-sample HaplotypeCaller GVCFs, GenomicsDBImport, GenotypeGVCFs joint calling, VQSR or hard filters. Requires BWA-MEM2-aligned, markdup, BQSR BAMs. Use DeepVariant for a faster DL alternative; GATK is the NIH/ENCODE standard.

development199

gnomad-database

gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database.

tools199

geniml

Python library for genomic interval ML. Train/apply region2vec embeddings turning BED regions into vectors, index interval datasets for ML, search embedding space with BEDSpace, and evaluate embedding quality. Use for chromatin accessibility clustering, regulatory element classification, and cross-sample region comparison.

development199

gwas-database

NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published GWAS. Query studies, associations, variants, traits, genes, summary stats. Build PRS candidates, analyze pleiotropy, fetch stats for Manhattan plots. No auth.

development199

depmap-crispr-essentiality

DepMap CRISPR gene effect (Chronos) analysis: sign convention for essentiality, per-gene NaN-safe Spearman correlation, data loading/alignment. For general NaN-safe correlation see nan-safe-correlation; for quality filtering see degenerate-input-filtering.

testing199

gtars

Rust-backed Python library for fast genomic token arithmetic and BED processing. High-performance BED I/O, interval set ops (intersect, merge, complement, subtract), region tokenization against a universe, universe construction. Use for preprocessing large BED collections and ML token vocabularies.

development199

encode-database

ENCODE Portal REST API for regulatory genomics: TF ChIP-seq, ATAC-seq/DNase-seq peaks, histone marks, and RNA-seq across 1000+ cell types. Search experiments by assay/biosample/target; download BED/bigWig; retrieve SCREEN cCREs by region or gene. Use to annotate variants with regulatory tracks, find open chromatin in a cell type, or fetch peak files for ChIP/ATAC analysis. For regulatory variant scoring use regulomedb-database; for GWAS associations use gwas-database.

development199

cosmic-database

Query COSMIC for cancer somatic mutations, gene census, mutational signatures, drug resistance variants. REST API v3.1 supports gene/sample/variant queries; free registration. For germline use clinvar-database; for drug-target data use opentargets-database or chembl-database-bioactivity.

tools199

dbsnp-database

Query NCBI dbSNP for SNP records by rsID, gene, or region via E-utilities and Variation Services REST API. Retrieve alleles, MAF, variant class (SNV/indel/MNV), clinical links, cross-DB IDs (ClinVar, dbVar, 1000G). Free; 3 req/sec (10 with key). For clinical pathogenicity use clinvar-database; for population frequencies use gnomad-database.

tools199

jaspar-database

JASPAR 2024 TF binding profiles via REST API and pyJASPAR. Retrieve PFMs/PWMs by TF name, JASPAR ID, species, or structural class. Scan DNA for TFBS; browse by taxon (human, mouse) or TF family (bHLH, zinc finger). Use for motif enrichment input, TFBS scanning, and regulatory sequence analysis. For ChIP-seq peak motif discovery use homer-motif-analysis; for regulatory variant scoring use regulomedb-database.

development199

star-rna-seq-aligner

Splice-aware RNA-seq aligner producing sorted BAM and splice junction tables. Builds genome index, runs two-pass alignment for better junctions. Outputs sorted BAM, junctions (SJ.out.tab), stats (Log.final.out), optional gene counts. Use Salmon for fast pseudoalignment; STAR when a BAM is needed for variant calling, IGV, or ENCODE pipelines.

development199

scvi-tools-single-cell

Deep generative models for single-cell omics: probabilistic batch correction (scVI), semi-supervised annotation (scANVI), CITE-seq RNA+protein (totalVI), transfer learning (scARCHES), and DE with uncertainty. Unified setup→train→extract API on AnnData. Use harmony-batch-correction for fast linear correction without deep learning; muon for multi-modal MuData workflows.

tools199

clinvar-database

Query NCBI ClinVar via E-utilities for variant clinical significance, pathogenicity, disease associations. Search by gene/rsID/condition/review status; returns ClinSig, submitter data, conditions, HGVS. For GWAS use gwas-database; for variant consequence prediction use Ensembl VEP.

tools199

gene-database

NCBI Gene via E-utilities: curated records across 1M+ taxa. Official symbols, aliases, RefSeq IDs, summaries, coordinates, GO, interactions. Use for gene ID resolution and cross-species function queries. For sequences use Ensembl; for expression use geo-database.

development199

mouse-phenome-database

Retrieve mouse phenotype data from the Jackson Laboratory Mouse Phenome Database (MPD) via its REST API. Browse 520+ projects, look up per-project measure metadata, pull strain-level means (raw or LS-mean adjusted) and per-animal values, find measures by MP/VT ontology terms, and resolve strain nomenclature or gene coordinates. Use for QTL support, cross-strain comparison, mouse model selection, and ontology-driven phenotype discovery. Use monarch-database for disease-gene-phenotype knowledge graphs; ensembl-database for mouse genome annotations.

development199

prokka-genome-annotation

Annotate prokaryotic genomes (bacteria, archaea, viruses) via Prokka's BLAST/HMM pipeline. Identifies CDS, rRNA, tRNA, tmRNA, signal peptides against Pfam, TIGRFAMs, RefSeq. Outputs GFF3, GenBank, FASTA, TSV. Use PGAP for NCBI GenBank submission; Bakta for faster NCBI-compatible annotation.

devops199

celltypist-cell-annotation

Automated scRNA-seq cell type annotation via pre-trained logistic regression. 45+ models: immune, gut, lung, brain, fetal, cancer microenvironments. Input normalized AnnData; outputs per-cell labels, majority-vote cluster labels, confidence scores. Use for fast, reference-backed annotation without manual marker inspection.

data-ai199

multiqc-qc-reports

Aggregates QC from 150+ bioinformatics tools into one interactive HTML report. Scans FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, GATK logs; merges per-sample stats with plots. For NGS pipeline-wide QC. Use FastQC directly for single-sample; MultiQC for multi-sample reporting.

tools199

gtars

Rust-backed Python library for fast genomic token arithmetic and BED processing. High-performance BED I/O, interval set ops (intersect, merge, complement, subtract), region tokenization against a universe, universe construction. Use for preprocessing large BED collections and ML token vocabularies.

development199

gnomad-database

gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database.

tools199

lamindb-data-management

Open-source FAIR biology data framework. Version artifacts (AnnData, DataFrame, Zarr), track lineage, validate via ontologies (Bionty), query datasets. Integrates with Nextflow, Snakemake, W&B, scVI. For scRNA-seq use scanpy; for ontology lookups use bionty.

development199

salmon-rna-quantification

Ultra-fast RNA-seq transcript/gene quantification via quasi-mapping (no BAM). Builds a k-mer index from transcriptome FASTA, quantifies in minutes. Outputs TPM/count tables (quant.sf) with optional GC- and sequence-bias correction. Integrates with tximeta/tximport for DESeq2/edgeR. Use STAR when a genome-aligned BAM is needed.

development199

kegg-database

KEGG REST API (academic only). Pathways, genes, compounds, enzymes, diseases, drugs via 7 ops (info/list/find/get/conv/link/ddi). ID conversion (NCBI/UniProt/PubChem). Use bioservices for multi-DB Python.

development199

cosmic-database

Query COSMIC for cancer somatic mutations, gene census, mutational signatures, drug resistance variants. REST API v3.1 supports gene/sample/variant queries; free registration. For germline use clinvar-database; for drug-target data use opentargets-database or chembl-database-bioactivity.

tools199

clinpgx-database

Query the ClinPGx (formerly PharmGKB) REST API plus the CPIC PostgREST companion API for pharmacogenomic clinical annotations, CPIC/DPWG dosing guidelines, gene-drug pairs, variant-drug associations, FDA/EMA drug labels, and PGx pathways. Two-host architecture: api.clinpgx.org for annotation records, api.cpicpgx.org for genotype→recommendation lookups. No auth. For germline pathogenicity use clinvar-database; for somatic cancer PGx use cosmic-database or opentargets-database; for drug bioactivity use chembl-database-bioactivity.

tools199

encode-database

ENCODE Portal REST API for regulatory genomics: TF ChIP-seq, ATAC-seq/DNase-seq peaks, histone marks, and RNA-seq across 1000+ cell types. Search experiments by assay/biosample/target; download BED/bigWig; retrieve SCREEN cCREs by region or gene. Use to annotate variants with regulatory tracks, find open chromatin in a cell type, or fetch peak files for ChIP/ATAC analysis. For regulatory variant scoring use regulomedb-database; for GWAS associations use gwas-database.

development199

clinvar-database

Query NCBI ClinVar via E-utilities for variant clinical significance, pathogenicity, disease associations. Search by gene/rsID/condition/review status; returns ClinSig, submitter data, conditions, HGVS. For GWAS use gwas-database; for variant consequence prediction use Ensembl VEP.

tools199

star-rna-seq-aligner

Splice-aware RNA-seq aligner producing sorted BAM and splice junction tables. Builds genome index, runs two-pass alignment for better junctions. Outputs sorted BAM, junctions (SJ.out.tab), stats (Log.final.out), optional gene counts. Use Salmon for fast pseudoalignment; STAR when a BAM is needed for variant calling, IGV, or ENCODE pipelines.

development199

samtools-bam-processing

CLI toolkit for SAM/BAM/CRAM: sort, index, convert, filter, QC alignments. Core commands: view, sort, index, flagstat, stats, depth, markdup, merge. Required between alignment and variant/peak calling. Use pysam for Python-native BAM access; deeptools for normalized coverage tracks.

tools199

prokka-genome-annotation

Annotate prokaryotic genomes (bacteria, archaea, viruses) via Prokka's BLAST/HMM pipeline. Identifies CDS, rRNA, tRNA, tmRNA, signal peptides against Pfam, TIGRFAMs, RefSeq. Outputs GFF3, GenBank, FASTA, TSV. Use PGAP for NCBI GenBank submission; Bakta for faster NCBI-compatible annotation.

devops199

fastp-fastq-preprocessing

All-in-one FASTQ QC and adapter trimming. Auto-detects Illumina adapters, filters low-quality reads, corrects paired-end overlaps, emits HTML+JSON QC in one pass. 3-10x faster than Trim Galore/Trimmomatic. First step before STAR, BWA-MEM2, or Salmon.

development199

gseapy-gene-enrichment

GSEA and over-representation analysis (ORA) for RNA-seq and proteomics. Wraps Enrichr for ORA against MSigDB, KEGG, GO, and 200+ databases; runs preranked GSEA on ranked DE gene lists. Outputs enrichment tables and running-score plots. Use after DESeq2 or edgeR for pathway-level interpretation.

development199

bcftools-variant-manipulation

CLI for VCF/BCF: filter, merge, annotate, query, normalize, compute stats. Core post-variant-calling: quality filtering, multi-sample merging, rsID annotation, genotype extraction. Samtools companion in HTSlib. Use GATK for complex indel realignment during calling; use VCFtools for population genetics stats.

tools199

molfeat-molecular-featurization

Molecular featurization hub (100+ featurizers) for ML. SMILES to fingerprints (ECFP, MACCS, MAP4), descriptors (RDKit 2D, Mordred), pretrained embeddings (ChemBERTa, GIN, Graphormer), pharmacophores. Scikit-learn compatible with parallelization/caching. For QSAR, virtual screening, similarity, and molecular DL.

testing199

plink2-gwas-analysis

GWAS and population genetics tool. Processes PLINK (.bed/.bim/.fam), VCF, and BGEN; runs QC (MAF, HWE, missingness), IBD estimation, PCA, and linear/logistic regression GWAS. Outputs Manhattan-ready summary stats. Use regenie or SAIGE for biobanks (>100k samples) needing mixed models.

tools199

salmon-rna-quantification

Ultra-fast RNA-seq transcript/gene quantification via quasi-mapping (no BAM). Builds a k-mer index from transcriptome FASTA, quantifies in minutes. Outputs TPM/count tables (quant.sf) with optional GC- and sequence-bias correction. Integrates with tximeta/tximport for DESeq2/edgeR. Use STAR when a genome-aligned BAM is needed.

development199

ena-database

ENA REST API for sequences, reads, assemblies, and annotations. Portal API search, Browser API retrieval (XML/FASTA/EMBL), file reports for FASTQ/BAM URLs, taxonomy, cross-refs. For multi-DB Python use bioservices; for NCBI-only use pubmed-database or Biopython Entrez.

development199

depmap-crispr-essentiality

DepMap CRISPR gene effect (Chronos) analysis: sign convention for essentiality, per-gene NaN-safe Spearman correlation, data loading/alignment. For general NaN-safe correlation see nan-safe-correlation; for quality filtering see degenerate-input-filtering.

testing199

plink2-gwas-analysis

GWAS and population genetics tool. Processes PLINK (.bed/.bim/.fam), VCF, and BGEN; runs QC (MAF, HWE, missingness), IBD estimation, PCA, and linear/logistic regression GWAS. Outputs Manhattan-ready summary stats. Use regenie or SAIGE for biobanks (>100k samples) needing mixed models.

tools199

gene-database

NCBI Gene via E-utilities: curated records across 1M+ taxa. Official symbols, aliases, RefSeq IDs, summaries, coordinates, GO, interactions. Use for gene ID resolution and cross-species function queries. For sequences use Ensembl; for expression use geo-database.

development199

featurecounts-rna-counting

Counts RNA-seq reads overlapping GTF gene features. Takes sorted STAR BAMs plus GTF; outputs a per-gene tab-delimited matrix across samples. Handles strandedness (0/1/2), paired-end, multi-sample batch counting in one command, and outputs assignment statistics. Use Salmon for alignment-free quantification; use featureCounts when STAR BAMs already exist.

data-ai199

pubchem-compound-search

Query PubChem (110M+ compounds) directly via the PUG-REST/JSON API with plain `requests` — no SDK install required. Search by name/CID/SMILES/InChIKey/formula, retrieve properties (MW, XLogP, TPSA, H-bond counts), do similarity/substructure searches with async ListKey polling, fetch synonyms, descriptions, assay summaries, and download SDF/PNG. For local cheminformatics use rdkit; for bioactivity-centric workflows use chembl-database-bioactivity.

development199

zarr-python

Chunked N-D arrays with compression and cloud storage. NumPy-style indexing. Backends: local, S3, GCS, ZIP, memory. Dask/Xarray integration for parallel and labeled computation. For lineage use lamindb; for labeled arrays use xarray.

development199

uniprot-protein-database

Query UniProt REST API: search by gene/protein name, fetch FASTA, map IDs (Ensembl, PDB, RefSeq), access Swiss-Prot annotations. Use bioservices for multi-DB access; alphafold-database-access for structures.

development199

bwa-mem2-dna-aligner

Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.

tools185

bwa-mem2-dna-aligner

Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.

tools185

single-cell-annotation

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

tools183

pyhealth

Python library for healthcare ML on EHR data: process MIMIC-III/IV, eICU, OMOP-CDM; encode medical codes (ICD, ATC, NDC); build patient-level datasets; train Transformer, RETAIN, GRASP, MedBERT for mortality, drug recommendation, readmission, diagnosis prediction. Alternatives: FIDDLE (preprocessing), clinical-longformer (clinical NLP), ehr-ml (embeddings).

tools183

pubmed-database

Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.

development183

seaborn-statistical-visualization

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

development183

nan-safe-correlation

Per-feature NaN-safe Spearman/Pearson correlation across many features (genes, proteins, variants) with missing values. Covers why bulk matrix shortcuts fail, correct pairwise deletion, degenerate input filtering, and large-dataset performance. Use statistical-analysis for test choice; shap-model-explainability for interpretability.

testing183

opentrons-integration

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

development183

clinical-decision-support-documents

Guidelines for clinical decision support (CDS) documents: biomarker-stratified cohort analyses and GRADE-graded treatment reports. Covers structure, executive summaries, evidence grading (1A–2C), stats (HR, CI, survival), and biomarker integration. Use for pharma research docs, clinical guidelines, regulatory submissions.

tools183

pubmed-database

Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.

development183

shap-model-explainability

Model interpretability via SHAP (Shapley values from game theory). Covers explainer choice (Tree, Deep, Linear, Kernel, Gradient, Permutation), feature attribution, and plots (waterfall, beeswarm, bar, scatter, force, heatmap). Use to explain ML predictions, rank features, debug models, audit fairness, or compare models. Works with tree, deep, linear, and black-box models.

development183

mdtraj-trajectory-analysis

mdtraj molecular dynamics trajectory analysis (Python). Reads DCD/XTC/TRR/NetCDF/H5/PDB topologies and trajectories; computes RMSD vs time, radius of gyration, per-residue RMSF, residue-residue contact frequency maps, phi/psi torsions for Ramachandran plots (general + Gly/Pro), and 8-state DSSP secondary structure. Modules: trajectory I/O, geometry (distances/angles/dihedrals), structural analysis (RMSD/Rg/RMSF/SASA), contacts, hydrogen bonds, secondary structure (DSSP), NMR observables. For broader atom-selection grammar use mdanalysis-trajectory; for running MD simulations use OpenMM/GROMACS.

development183

latex-research-posters

Research posters in LaTeX using beamerposter, tikzposter, or baposter. Layout, typography, color schemes, figure integration, accessibility, and QA for conferences. Includes templates. For figure generation use matplotlib-scientific-plotting or plotly-interactive-plots.

testing183

scientific-manuscript-writing

Scientific manuscript writing: IMRAD, citation styles (APA/AMA/Vancouver/IEEE), figures/tables, reporting guidelines (CONSORT/STROBE/PRISMA/ARRIVE), writing principles (clarity/conciseness/accuracy), venue-specific style. For LaTeX see companion assets.

testing183

benchling-integration

Benchling R&D Python SDK: CRUD on registry entities (DNA, RNA, proteins, custom), inventory, ELN, workflow automation. Needs Benchling account and API key. Use biopython for local sequence analysis; pubchem for chemical DBs.

tools183

seaborn-statistical-plots

Statistical visualization on matplotlib with native pandas support. Auto aggregation, CIs, grouping for distributions (histplot, kdeplot), categorical (boxplot, violinplot), relational (scatterplot, lineplot), regression (regplot, lmplot), matrix (heatmap, clustermap), grids (pairplot, FacetGrid). Use for quick statistical summaries; matplotlib for fine control; plotly for interactive HTML.

development183

imaging-data-commons

Query and download NCI Imaging Data Commons (IDC) cancer radiology and pathology datasets via the idc-index Python client. No authentication required: the parquet index ships inside the pip wheel, SQL runs locally via DuckDB, and DICOM downloads stream from public S3/GCS buckets through s5cmd. Use sql_query() for DuckDB cohort selection, get_collections/get_patients/get_dicom_studies/get_dicom_series for hierarchical browsing, download_from_selection() for downloads, and get_viewer_URL() for OHIF/Slim links. Use pydicom-medical-imaging for local DICOM reading; histolab for whole-slide pathology preprocessing.

tools183

nan-safe-correlation

Per-feature NaN-safe Spearman/Pearson correlation across many features (genes, proteins, variants) with missing values. Covers why bulk matrix shortcuts fail, correct pairwise deletion, degenerate input filtering, and large-dataset performance. Use statistical-analysis for test choice; shap-model-explainability for interpretability.

testing183

matplotlib-scientific-plotting

Low-level Python plotting for scientific figures: publication-quality line, scatter, bar, heatmap, contour, 3D; multi-panel layouts; fine control of every element. PNG/PDF/SVG export. Use seaborn for quick stats, plotly for interactive.

development183

hypothesis-generation

Structured hypothesis formulation: turn observations into testable hypotheses with predictions, propose mechanisms, design experiments. Follows the scientific method. Use scientific-brainstorming for open ideation; hypogenic for automated LLM hypothesis testing on datasets.

testing183

networkx-graph-analysis

Graph and network analysis toolkit. Four graph types (directed, undirected, multi-edge), centrality, shortest paths, community detection, generators, I/O (GraphML, GML, edge list), matplotlib viz. For large graphs (100K+ nodes) use igraph or graph-tool; for GNNs use PyG.

tools183

shap-model-explainability

Model interpretability via SHAP (Shapley values from game theory). Covers explainer choice (Tree, Deep, Linear, Kernel, Gradient, Permutation), feature attribution, and plots (waterfall, beeswarm, bar, scatter, force, heatmap). Use to explain ML predictions, rank features, debug models, audit fairness, or compare models. Works with tree, deep, linear, and black-box models.

development183

degenerate-input-filtering

Filter degenerate, uninformative inputs before statistical tests: single-sequence alignments, empty files, constant features, zero-variance inputs, all-NaN columns. See nan-safe-correlation for NaN-aware correlation; statistical-analysis for test guidance.

development183

scientific-slides

Scientific presentations for conferences, seminars, thesis defenses, and grant pitches. Slide design, talk structure, timing, data viz for slides, QA. PowerPoint and LaTeX Beamer. For posters use latex-research-posters.

testing183

interpro-database

Query InterPro REST API for protein domain architecture, family classification, and member-DB integration. Search entries, retrieve a protein's domains, list family members, get taxonomic distribution, link to PDB. Unifies Pfam, PANTHER, PIRSF, PRINTS, PROSITE, SMART, CDD, NCBIfam. Use uniprot-protein-database for sequences; pdb-database for 3D structures.

development183

simpleitk-image-registration

Register, segment, filter, resample 3D medical images (MRI, CT, microscopy) via SimpleITK Python; DICOM, NIfTI, multi-modal. Rigid/affine/deformable registration, threshold/region-growing segmentation, Gaussian/morph filtering, label stats, format conversion. Use to align volumes across timepoints/modalities, segment fluorescence, or convert DICOM→NIfTI.

development183

pathml

Computational pathology toolkit for whole-slide images (WSIs): load slides, extract tiles, stain normalization, nuclear segmentation, feature extraction, and ML training. Supports H&E and multiplex. For end-to-end pipelines from raw WSIs to quantitative outputs.

tools183

omero-integration

Open-source bio-image data management. Use the omero-py client to connect to an OMERO server, retrieve images as numpy arrays, annotate with tags and key-value pairs, manage ROIs, and feed image data into Python analysis pipelines — programmatically, no GUI.

tools183

protocolsio-integration

protocols.io REST API: search and fetch wet-lab, bioinformatics, and clinical protocols by keyword, DOI, or category, with steps, reagents, materials, equipment, timing. Public access free; auth needed for private or publishing. Pair with opentrons-protocol-api or benchling-integration to execute.

tools183

histolab-wsi-processing

WSI processing for digital pathology. Tissue detection, tile extraction (random, grid, score-based), filter pipelines for H&E/IHC. For dataset prep, tile-based DL, slide QC. Use pathml for multiplexed imaging.

devops183

plotly-interactive-visualization

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

development183

benchling-integration

Benchling R&D Python SDK: CRUD on registry entities (DNA, RNA, proteins, custom), inventory, ELN, workflow automation. Needs Benchling account and API key. Use biopython for local sequence analysis; pubchem for chemical DBs.

tools183

scientific-manuscript-writing

Scientific manuscript writing: IMRAD, citation styles (APA/AMA/Vancouver/IEEE), figures/tables, reporting guidelines (CONSORT/STROBE/PRISMA/ARRIVE), writing principles (clarity/conciseness/accuracy), venue-specific style. For LaTeX see companion assets.

testing183

nnunet-segmentation

Medical image segmentation with nnU-Net's self-configuring framework — auto-selects architecture, preprocessing, training for any modality. CT, MRI, microscopy, ultrasound in 2D, 3D full-res, 3D low-res, cascade. Pipeline: convert → plan/preprocess → train (5-fold CV) → best config → predict → ensemble. Use when classical segmentation fails and annotated data exists.

development183

hypothesis-generation

Structured hypothesis formulation: turn observations into testable hypotheses with predictions, propose mechanisms, design experiments. Follows the scientific method. Use scientific-brainstorming for open ideation; hypogenic for automated LLM hypothesis testing on datasets.

testing183

seaborn-statistical-visualization

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

development183

networkx-graph-analysis

Graph and network analysis toolkit. Four graph types (directed, undirected, multi-edge), centrality, shortest paths, community detection, generators, I/O (GraphML, GML, edge list), matplotlib viz. For large graphs (100K+ nodes) use igraph or graph-tool; for GNNs use PyG.

tools183

scikit-learn-machine-learning

Classical ML in Python: classification, regression, clustering, dim reduction, evaluation, tuning, preprocessing pipelines. Linear models, tree ensembles, SVMs, K-Means, PCA, t-SNE. Use PyTorch/TF for deep learning; XGBoost/LightGBM for scale.

development183

degenerate-input-filtering

Filter degenerate, uninformative inputs before statistical tests: single-sequence alignments, empty files, constant features, zero-variance inputs, all-NaN columns. See nan-safe-correlation for NaN-aware correlation; statistical-analysis for test guidance.

development183

pyhealth

Python library for healthcare ML on EHR data: process MIMIC-III/IV, eICU, OMOP-CDM; encode medical codes (ICD, ATC, NDC); build patient-level datasets; train Transformer, RETAIN, GRASP, MedBERT for mortality, drug recommendation, readmission, diagnosis prediction. Alternatives: FIDDLE (preprocessing), clinical-longformer (clinical NLP), ehr-ml (embeddings).

tools183

simpleitk-image-registration

Register, segment, filter, resample 3D medical images (MRI, CT, microscopy) via SimpleITK Python; DICOM, NIfTI, multi-modal. Rigid/affine/deformable registration, threshold/region-growing segmentation, Gaussian/morph filtering, label stats, format conversion. Use to align volumes across timepoints/modalities, segment fluorescence, or convert DICOM→NIfTI.

development183

pydicom-medical-imaging

Pure Python DICOM for medical imaging (CT, MRI, X-ray, ultrasound). Read/write DICOM, pixels as NumPy, edit tags, windowing (VOI LUT), PHI anonymization, build DICOM, series→3D volumes. Use histolab for WSI pathology; nibabel for NIfTI.

development183

pathml

Computational pathology toolkit for whole-slide images (WSIs): load slides, extract tiles, stain normalization, nuclear segmentation, feature extraction, and ML training. Supports H&E and multiplex. For end-to-end pipelines from raw WSIs to quantitative outputs.

tools183

histolab-wsi-processing

WSI processing for digital pathology. Tissue detection, tile extraction (random, grid, score-based), filter pipelines for H&E/IHC. For dataset prep, tile-based DL, slide QC. Use pathml for multiplexed imaging.

devops183

omero-integration

Open-source bio-image data management. Use the omero-py client to connect to an OMERO server, retrieve images as numpy arrays, annotate with tags and key-value pairs, manage ROIs, and feed image data into Python analysis pipelines — programmatically, no GUI.

tools183

nnunet-segmentation

Medical image segmentation with nnU-Net's self-configuring framework — auto-selects architecture, preprocessing, training for any modality. CT, MRI, microscopy, ultrasound in 2D, 3D full-res, 3D low-res, cascade. Pipeline: convert → plan/preprocess → train (5-fold CV) → best config → predict → ensemble. Use when classical segmentation fails and annotated data exists.

development183

protocolsio-integration

protocols.io REST API: search and fetch wet-lab, bioinformatics, and clinical protocols by keyword, DOI, or category, with steps, reagents, materials, equipment, timing. Public access free; auth needed for private or publishing. Pair with opentrons-protocol-api or benchling-integration to execute.

tools183

mdtraj-trajectory-analysis

mdtraj molecular dynamics trajectory analysis (Python). Reads DCD/XTC/TRR/NetCDF/H5/PDB topologies and trajectories; computes RMSD vs time, radius of gyration, per-residue RMSF, residue-residue contact frequency maps, phi/psi torsions for Ramachandran plots (general + Gly/Pro), and 8-state DSSP secondary structure. Modules: trajectory I/O, geometry (distances/angles/dihedrals), structural analysis (RMSD/Rg/RMSF/SASA), contacts, hydrogen bonds, secondary structure (DSSP), NMR observables. For broader atom-selection grammar use mdanalysis-trajectory; for running MD simulations use OpenMM/GROMACS.

development183

clinical-decision-support-documents

Guidelines for clinical decision support (CDS) documents: biomarker-stratified cohort analyses and GRADE-graded treatment reports. Covers structure, executive summaries, evidence grading (1A–2C), stats (HR, CI, survival), and biomarker integration. Use for pharma research docs, clinical guidelines, regulatory submissions.

tools183

plotly-interactive-visualization

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

development183

opentrons-integration

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

development183

pydicom-medical-imaging

Pure Python DICOM for medical imaging (CT, MRI, X-ray, ultrasound). Read/write DICOM, pixels as NumPy, edit tags, windowing (VOI LUT), PHI anonymization, build DICOM, series→3D volumes. Use histolab for WSI pathology; nibabel for NIfTI.

development183

imaging-data-commons

Query and download NCI Imaging Data Commons (IDC) cancer radiology and pathology datasets via the idc-index Python client. No authentication required: the parquet index ships inside the pip wheel, SQL runs locally via DuckDB, and DICOM downloads stream from public S3/GCS buckets through s5cmd. Use sql_query() for DuckDB cohort selection, get_collections/get_patients/get_dicom_studies/get_dicom_series for hierarchical browsing, download_from_selection() for downloads, and get_viewer_URL() for OHIF/Slim links. Use pydicom-medical-imaging for local DICOM reading; histolab for whole-slide pathology preprocessing.

tools183

single-cell-annotation

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

tools183

scientific-slides

Scientific presentations for conferences, seminars, thesis defenses, and grant pitches. Slide design, talk structure, timing, data viz for slides, QA. PowerPoint and LaTeX Beamer. For posters use latex-research-posters.

testing183

latex-research-posters

Research posters in LaTeX using beamerposter, tikzposter, or baposter. Layout, typography, color schemes, figure integration, accessibility, and QA for conferences. Includes templates. For figure generation use matplotlib-scientific-plotting or plotly-interactive-plots.

testing183

matplotlib-scientific-plotting

Low-level Python plotting for scientific figures: publication-quality line, scatter, bar, heatmap, contour, 3D; multi-panel layouts; fine control of every element. PNG/PDF/SVG export. Use seaborn for quick stats, plotly for interactive.

development183

seaborn-statistical-plots

Statistical visualization on matplotlib with native pandas support. Auto aggregation, CIs, grouping for distributions (histplot, kdeplot), categorical (boxplot, violinplot), relational (scatterplot, lineplot), regression (regplot, lmplot), matrix (heatmap, clustermap), grids (pairplot, FacetGrid). Use for quick statistical summaries; matplotlib for fine control; plotly for interactive HTML.

development183

scikit-learn-machine-learning

Classical ML in Python: classification, regression, clustering, dim reduction, evaluation, tuning, preprocessing pipelines. Linear models, tree ensembles, SVMs, K-Means, PCA, t-SNE. Use PyTorch/TF for deep learning; XGBoost/LightGBM for scale.

development183

gnomad-database

gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database.

tools169

encode-database

ENCODE Portal REST API for regulatory genomics: TF ChIP-seq, ATAC-seq/DNase-seq peaks, histone marks, and RNA-seq across 1000+ cell types. Search experiments by assay/biosample/target; download BED/bigWig; retrieve SCREEN cCREs by region or gene. Use to annotate variants with regulatory tracks, find open chromatin in a cell type, or fetch peak files for ChIP/ATAC analysis. For regulatory variant scoring use regulomedb-database; for GWAS associations use gwas-database.

development169

mouse-phenome-database

Retrieve mouse phenotype data from the Jackson Laboratory Mouse Phenome Database (MPD) via its REST API. Browse 520+ projects, look up per-project measure metadata, pull strain-level means (raw or LS-mean adjusted) and per-animal values, find measures by MP/VT ontology terms, and resolve strain nomenclature or gene coordinates. Use for QTL support, cross-strain comparison, mouse model selection, and ontology-driven phenotype discovery. Use monarch-database for disease-gene-phenotype knowledge graphs; ensembl-database for mouse genome annotations.

development169

chembl-database-bioactivity

Query ChEMBL (2M+ compounds, 19M+ bioactivity measurements, 13K+ targets) via the public REST/JSON API with plain `requests` — no SDK install required. Search compounds, retrieve IC50/Ki/EC50 bioactivities, find target inhibitors, run SAR, access drug mechanism/indication data.

development169

clinvar-database

Query NCBI ClinVar via E-utilities for variant clinical significance, pathogenicity, disease associations. Search by gene/rsID/condition/review status; returns ClinSig, submitter data, conditions, HGVS. For GWAS use gwas-database; for variant consequence prediction use Ensembl VEP.

tools169

imaging-data-commons

Query and download NCI Imaging Data Commons (IDC) cancer radiology and pathology datasets via the idc-index Python client. No authentication required: the parquet index ships inside the pip wheel, SQL runs locally via DuckDB, and DICOM downloads stream from public S3/GCS buckets through s5cmd. Use sql_query() for DuckDB cohort selection, get_collections/get_patients/get_dicom_studies/get_dicom_series for hierarchical browsing, download_from_selection() for downloads, and get_viewer_URL() for OHIF/Slim links. Use pydicom-medical-imaging for local DICOM reading; histolab for whole-slide pathology preprocessing.

tools169

ena-database

ENA REST API for sequences, reads, assemblies, and annotations. Portal API search, Browser API retrieval (XML/FASTA/EMBL), file reports for FASTQ/BAM URLs, taxonomy, cross-refs. For multi-DB Python use bioservices; for NCBI-only use pubmed-database or Biopython Entrez.

development169

regulomedb-database

Query RegulomeDB v2 GET REST API to score variants for regulatory function and retrieve overlapping evidence (TF binding, histone marks, DNase peaks, footprints, motifs, eQTLs, chromatin state). Scores range 1a (strongest) to 7 (none). Use for GWAS hit prioritization, regulatory variant annotation, cis-regulatory discovery. Use clinvar-database for pathogenicity; gwas-database for trait associations.

tools169

metabolomics-workbench-database

Query Metabolomics Workbench REST API (4,200+ NIH studies) for metabolite ID, study discovery, RefMet standardization, m/z precursor searches, and gene/protein annotations. Quirks: compound input_item rejects `name` (use pubchem_cid/kegg_id/inchi_key/etc.); free-text → compound is a two-step refmet/match→refmet/name flow; moverz endpoint returns TSV text, not JSON. Use hmdb-database for local XML; pubchem-compound-search for general compound lookup.

development169

biorxiv-database

Query bioRxiv/medRxiv preprints via REST API. Search by DOI, category, or date range; retrieve metadata (title, abstract, authors, category, DOI, version history) and PDFs. No auth. For peer-reviewed biomedical use pubmed-database; broader scholarly search use openalex-database.

development169

openalex-database

Query OpenAlex REST API for 250M+ scholarly works, authors, institutions, journals, concepts. Search by keyword, author, DOI, ORCID, or ID; filter by year, OA, citations, field; retrieve citations, references, author disambiguation. Free, no auth. For PubMed use pubmed-database; preprints use biorxiv-database.

development169

chembl-database-bioactivity

Query ChEMBL (2M+ compounds, 19M+ bioactivity measurements, 13K+ targets) via the public REST/JSON API with plain `requests` — no SDK install required. Search compounds, retrieve IC50/Ki/EC50 bioactivities, find target inhibitors, run SAR, access drug mechanism/indication data.

development169

emdb-database

Look up EMDB cryo-EM density maps and fitted atomic models via the entry REST API + EBI Search WS. Fetch entry metadata (resolution, method, organism, sample), map download URLs, fitted PDB IDs, and citations. Keyword search via EBI Search. No auth. For atomic coordinates use pdb-database; for AlphaFold predictions use alphafold-database-access.

development169

unichem-database

Cross-reference compound IDs across 20+ databases (ChEMBL, DrugBank, PubChem, ChEBI, PDB, SureChEMBL, HMDB, DrugCentral, BindingDB) via UniChem REST API. Resolve InChIKeys to source IDs, translate between source-specific IDs, find structurally related compounds by connectivity. POST with a JSON body for all cross-reference queries; only /sources is GET. No auth required.

development169

imaging-data-commons

Query and download NCI Imaging Data Commons (IDC) cancer radiology and pathology datasets via the idc-index Python client. No authentication required: the parquet index ships inside the pip wheel, SQL runs locally via DuckDB, and DICOM downloads stream from public S3/GCS buckets through s5cmd. Use sql_query() for DuckDB cohort selection, get_collections/get_patients/get_dicom_studies/get_dicom_series for hierarchical browsing, download_from_selection() for downloads, and get_viewer_URL() for OHIF/Slim links. Use pydicom-medical-imaging for local DICOM reading; histolab for whole-slide pathology preprocessing.

tools169

dbsnp-database

Query NCBI dbSNP for SNP records by rsID, gene, or region via E-utilities and Variation Services REST API. Retrieve alleles, MAF, variant class (SNV/indel/MNV), clinical links, cross-DB IDs (ClinVar, dbVar, 1000G). Free; 3 req/sec (10 with key). For clinical pathogenicity use clinvar-database; for population frequencies use gnomad-database.

tools169

encode-database

ENCODE Portal REST API for regulatory genomics: TF ChIP-seq, ATAC-seq/DNase-seq peaks, histone marks, and RNA-seq across 1000+ cell types. Search experiments by assay/biosample/target; download BED/bigWig; retrieve SCREEN cCREs by region or gene. Use to annotate variants with regulatory tracks, find open chromatin in a cell type, or fetch peak files for ChIP/ATAC analysis. For regulatory variant scoring use regulomedb-database; for GWAS associations use gwas-database.

development169

regulomedb-database

Query RegulomeDB v2 GET REST API to score variants for regulatory function and retrieve overlapping evidence (TF binding, histone marks, DNase peaks, footprints, motifs, eQTLs, chromatin state). Scores range 1a (strongest) to 7 (none). Use for GWAS hit prioritization, regulatory variant annotation, cis-regulatory discovery. Use clinvar-database for pathogenicity; gwas-database for trait associations.

tools169

openalex-database

Query OpenAlex REST API for 250M+ scholarly works, authors, institutions, journals, concepts. Search by keyword, author, DOI, ORCID, or ID; filter by year, OA, citations, field; retrieve citations, references, author disambiguation. Free, no auth. For PubMed use pubmed-database; preprints use biorxiv-database.

development169

emdb-database

Look up EMDB cryo-EM density maps and fitted atomic models via the entry REST API + EBI Search WS. Fetch entry metadata (resolution, method, organism, sample), map download URLs, fitted PDB IDs, and citations. Keyword search via EBI Search. No auth. For atomic coordinates use pdb-database; for AlphaFold predictions use alphafold-database-access.

development169

opentargets-database

Query Open Targets GraphQL API for target-disease associations, evidence, drug links, safety. Search targets by gene, diseases by EFO ID; scores from 20+ sources, drug mechanisms, tractability. For ChEMBL use chembl-database-bioactivity; for trials use clinicaltrials-database-search.

tools169

pdb-database

Query RCSB PDB (200K+ structures) via the public REST + GraphQL APIs with plain `requests` (no SDK). Search by text, attribute, sequence, or 3D structure similarity (Search API); retrieve metadata via GraphQL (Data API); download PDB/mmCIF from files.rcsb.org. For AlphaFold predictions use alphafold-database-access; for protein sequences only use uniprot-protein-database.

development169

clinpgx-database

Query the ClinPGx (formerly PharmGKB) REST API plus the CPIC PostgREST companion API for pharmacogenomic clinical annotations, CPIC/DPWG dosing guidelines, gene-drug pairs, variant-drug associations, FDA/EMA drug labels, and PGx pathways. Two-host architecture: api.clinpgx.org for annotation records, api.cpicpgx.org for genotype→recommendation lookups. No auth. For germline pathogenicity use clinvar-database; for somatic cancer PGx use cosmic-database or opentargets-database; for drug bioactivity use chembl-database-bioactivity.

tools169

clinpgx-database

Query the ClinPGx (formerly PharmGKB) REST API plus the CPIC PostgREST companion API for pharmacogenomic clinical annotations, CPIC/DPWG dosing guidelines, gene-drug pairs, variant-drug associations, FDA/EMA drug labels, and PGx pathways. Two-host architecture: api.clinpgx.org for annotation records, api.cpicpgx.org for genotype→recommendation lookups. No auth. For germline pathogenicity use clinvar-database; for somatic cancer PGx use cosmic-database or opentargets-database; for drug bioactivity use chembl-database-bioactivity.

tools169

dbsnp-database

Query NCBI dbSNP for SNP records by rsID, gene, or region via E-utilities and Variation Services REST API. Retrieve alleles, MAF, variant class (SNV/indel/MNV), clinical links, cross-DB IDs (ClinVar, dbVar, 1000G). Free; 3 req/sec (10 with key). For clinical pathogenicity use clinvar-database; for population frequencies use gnomad-database.

tools169

mouse-phenome-database

Retrieve mouse phenotype data from the Jackson Laboratory Mouse Phenome Database (MPD) via its REST API. Browse 520+ projects, look up per-project measure metadata, pull strain-level means (raw or LS-mean adjusted) and per-animal values, find measures by MP/VT ontology terms, and resolve strain nomenclature or gene coordinates. Use for QTL support, cross-strain comparison, mouse model selection, and ontology-driven phenotype discovery. Use monarch-database for disease-gene-phenotype knowledge graphs; ensembl-database for mouse genome annotations.

development169

gwas-database

NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published GWAS. Query studies, associations, variants, traits, genes, summary stats. Build PRS candidates, analyze pleiotropy, fetch stats for Manhattan plots. No auth.

development169

metabolomics-workbench-database

Query Metabolomics Workbench REST API (4,200+ NIH studies) for metabolite ID, study discovery, RefMet standardization, m/z precursor searches, and gene/protein annotations. Quirks: compound input_item rejects `name` (use pubchem_cid/kegg_id/inchi_key/etc.); free-text → compound is a two-step refmet/match→refmet/name flow; moverz endpoint returns TSV text, not JSON. Use hmdb-database for local XML; pubchem-compound-search for general compound lookup.

development169

pride-database

Search the PRIDE Archive v3 REST API for proteomics datasets: discover projects by keyword + faceted filters (organism, instrument, disease, software), fetch project metadata, list and download RAW/PEAK/RESULT/FASTA files (with FTP/Aspera URLs), look up which projects mention a UniProt accession, and find similar projects. PRIDE v3 no longer exposes peptide/PSM-level identification endpoints — for spectrum-level data download the project's RESULT files. Use uniprot-protein-database for protein sequences; interpro-database for domain architecture.

development169

clinvar-database

Query NCBI ClinVar via E-utilities for variant clinical significance, pathogenicity, disease associations. Search by gene/rsID/condition/review status; returns ClinSig, submitter data, conditions, HGVS. For GWAS use gwas-database; for variant consequence prediction use Ensembl VEP.

tools169

ena-database

ENA REST API for sequences, reads, assemblies, and annotations. Portal API search, Browser API retrieval (XML/FASTA/EMBL), file reports for FASTQ/BAM URLs, taxonomy, cross-refs. For multi-DB Python use bioservices; for NCBI-only use pubmed-database or Biopython Entrez.

development169

pride-database

Search the PRIDE Archive v3 REST API for proteomics datasets: discover projects by keyword + faceted filters (organism, instrument, disease, software), fetch project metadata, list and download RAW/PEAK/RESULT/FASTA files (with FTP/Aspera URLs), look up which projects mention a UniProt accession, and find similar projects. PRIDE v3 no longer exposes peptide/PSM-level identification endpoints — for spectrum-level data download the project's RESULT files. Use uniprot-protein-database for protein sequences; interpro-database for domain architecture.

development169

gnomad-database

gnomAD v4 population variant frequencies via GraphQL API. Allele counts and frequencies stratified by ancestry (AFR, AMR, EAS, NFE, SAS, FIN, ASJ, MID), gene-level constraint (pLI, LOEUF, missense z), and coverage. Identify rare or constrained variants. For clinical pathogenicity use clinvar-database; for GWAS use gwas-database.

tools169

gwas-database

NHGRI-EBI GWAS Catalog REST API for SNP-trait associations from published GWAS. Query studies, associations, variants, traits, genes, summary stats. Build PRS candidates, analyze pleiotropy, fetch stats for Manhattan plots. No auth.

development169

gtopdb-database

Query IUPHAR/BPS Guide to Pharmacology (GtoPdb) for receptor-ligand interactions, target/ligand metadata, families, and approved drugs. Affinities (pKi/pIC50/pKd), action (Agonist/Antagonist/etc.), species, structures (SMILES/InChI). No auth. Always resolve targets via geneSymbol/accession; most metadata lives in sub-resources (/databaseLinks, /structure, /synonyms).

development169

pdb-database

Query RCSB PDB (200K+ structures) via the public REST + GraphQL APIs with plain `requests` (no SDK). Search by text, attribute, sequence, or 3D structure similarity (Search API); retrieve metadata via GraphQL (Data API); download PDB/mmCIF from files.rcsb.org. For AlphaFold predictions use alphafold-database-access; for protein sequences only use uniprot-protein-database.

development169

interpro-database

Query InterPro REST API for protein domain architecture, family classification, and member-DB integration. Search entries, retrieve a protein's domains, list family members, get taxonomic distribution, link to PDB. Unifies Pfam, PANTHER, PIRSF, PRINTS, PROSITE, SMART, CDD, NCBIfam. Use uniprot-protein-database for sequences; pdb-database for 3D structures.

development169

gtopdb-database

Query IUPHAR/BPS Guide to Pharmacology (GtoPdb) for receptor-ligand interactions, target/ligand metadata, families, and approved drugs. Affinities (pKi/pIC50/pKd), action (Agonist/Antagonist/etc.), species, structures (SMILES/InChI). No auth. Always resolve targets via geneSymbol/accession; most metadata lives in sub-resources (/databaseLinks, /structure, /synonyms).

development169

gene-database

NCBI Gene via E-utilities: curated records across 1M+ taxa. Official symbols, aliases, RefSeq IDs, summaries, coordinates, GO, interactions. Use for gene ID resolution and cross-species function queries. For sequences use Ensembl; for expression use geo-database.

development169

opentargets-database

Query Open Targets GraphQL API for target-disease associations, evidence, drug links, safety. Search targets by gene, diseases by EFO ID; scores from 20+ sources, drug mechanisms, tractability. For ChEMBL use chembl-database-bioactivity; for trials use clinicaltrials-database-search.

tools169

biorxiv-database

Query bioRxiv/medRxiv preprints via REST API. Search by DOI, category, or date range; retrieve metadata (title, abstract, authors, category, DOI, version history) and PDFs. No auth. For peer-reviewed biomedical use pubmed-database; broader scholarly search use openalex-database.

development169

unichem-database

Cross-reference compound IDs across 20+ databases (ChEMBL, DrugBank, PubChem, ChEBI, PDB, SureChEMBL, HMDB, DrugCentral, BindingDB) via UniChem REST API. Resolve InChIKeys to source IDs, translate between source-specific IDs, find structurally related compounds by connectivity. POST with a JSON body for all cross-reference queries; only /sources is GET. No auth required.

development169

gene-database

NCBI Gene via E-utilities: curated records across 1M+ taxa. Official symbols, aliases, RefSeq IDs, summaries, coordinates, GO, interactions. Use for gene ID resolution and cross-species function queries. For sequences use Ensembl; for expression use geo-database.

development169

roary-pangenome

Compute the bacterial pan-genome from Prokka/Bakta GFF3 annotations with Roary's CD-HIT + BLAST + MCL clustering pipeline. Builds gene presence/absence matrices, core/soft-core/shell/cloud partitions, multi-FASTA core gene alignments (with `-e`), and a pan-genome reference. Use Panaroo for higher-accuracy pan-genomes from highly fragmented assemblies, PIRATE for paralog-aware clustering, or PPanGGOLiN for graph-based partitioning.

development145

bakta-genome-annotation

Annotate bacterial and archaeal genomes and plasmids with Bakta's Prodigal/HMM/diamond pipeline. Identifies CDS, ncRNA, tRNA, rRNA, tmRNA, sORFs, CRISPR arrays, oriC/oriV/oriT, and gaps against a curated UniRef-derived database. Produces NCBI-compatible GFF3, GenBank, EMBL, JSON, FASTA, TSV, and a circular genome plot. Use Prokka for legacy pipelines or non-bacterial kingdoms; PGAP for NCBI GenBank submission.

testing145

bakta-genome-annotation

Annotate bacterial and archaeal genomes and plasmids with Bakta's Prodigal/HMM/diamond pipeline. Identifies CDS, ncRNA, tRNA, rRNA, tmRNA, sORFs, CRISPR arrays, oriC/oriV/oriT, and gaps against a curated UniRef-derived database. Produces NCBI-compatible GFF3, GenBank, EMBL, JSON, FASTA, TSV, and a circular genome plot. Use Prokka for legacy pipelines or non-bacterial kingdoms; PGAP for NCBI GenBank submission.

testing145

roary-pangenome

Compute the bacterial pan-genome from Prokka/Bakta GFF3 annotations with Roary's CD-HIT + BLAST + MCL clustering pipeline. Builds gene presence/absence matrices, core/soft-core/shell/cloud partitions, multi-FASTA core gene alignments (with `-e`), and a pan-genome reference. Use Panaroo for higher-accuracy pan-genomes from highly fragmented assemblies, PIRATE for paralog-aware clustering, or PPanGGOLiN for graph-based partitioning.

development145

omics-analysis-guide

Three-tiered approach to omics data analysis (transcriptomics, proteomics) covering validated pipelines, standard workflows, and custom methods

testing137

single-cell-annotation

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

tools137

single-cell-annotation

Best practices for single-cell RNA-seq cell type annotation including marker-based, reference-based, and automated classification approaches.

tools137

sgrna-design-guide

Three-tiered sgRNA design guide using validated Addgene sequences, CRISPick pre-computed datasets, or de novo design rules for CRISPR experiments

testing137

western-blot-quantification

Protocols and best practices for western blot quantification and analysis including band detection, normalization, and statistical methods.

research137

western-blot-quantification

Protocols and best practices for western blot quantification and analysis including band detection, normalization, and statistical methods.

research137

scientific-literature-search

Systematic strategies for searching scientific literature across PubMed, arXiv, Google Scholar, and AI-assisted tools. Covers PICO framework for clinical questions, three-tiered search (database-specific, AI-assisted, content extraction), PubMed field tags and MeSH, boolean query construction, and full-text extraction. Use when planning a literature search or choosing a search tier.

tools137

sar-analysis

Structure-activity relationship (SAR) analysis guide for drug discovery including molecular descriptor analysis, scaffold analysis, and activity cliff detection.

tools137

omics-analysis-guide

Three-tiered approach to omics data analysis (transcriptomics, proteomics) covering validated pipelines, standard workflows, and custom methods

testing137

scientific-literature-search

Systematic strategies for searching scientific literature across PubMed, arXiv, Google Scholar, and AI-assisted tools. Covers PICO framework for clinical questions, three-tiered search (database-specific, AI-assisted, content extraction), PubMed field tags and MeSH, boolean query construction, and full-text extraction. Use when planning a literature search or choosing a search tier.

tools137

sgrna-design-guide

Three-tiered sgRNA design guide using validated Addgene sequences, CRISPick pre-computed datasets, or de novo design rules for CRISPR experiments

testing137

sar-analysis

Structure-activity relationship (SAR) analysis guide for drug discovery including molecular descriptor analysis, scaffold analysis, and activity cliff detection.

tools137

mdanalysis-trajectory

Analyze MD trajectories from GROMACS, AMBER, NAMD, CHARMM, LAMMPS. Reads topology/trajectory into Universe objects; supports RMSD, RMSF, radius of gyration, contact maps, H-bonds, PCA, and custom distance/angle calculations. Use for post-simulation structural analysis; use OpenMM/GROMACS for running simulations.

research132

seaborn-statistical-visualization

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

development132

biopython-sequence-analysis

Biopython sequence analysis: parse FASTA/FASTQ/GenBank/GFF (SeqIO), NCBI Entrez (esearch/efetch/elink), remote/local BLAST, pairwise/MSA alignment (PairwiseAligner, MUSCLE/ClustalW), phylogenetic trees (Phylo). Use for gene family studies, phylogenomics, comparative genomics, NCBI pipelines. For PCR/restriction/cloning use biopython-molecular-biology; for SAM/BAM use pysam.

development132

cnvkit-copy-number

Detect somatic CNVs from WES/WGS/targeted BAMs (CNVkit v0.9.x). Bin coverage in target/antitarget regions, normalize vs reference, segment with CBS/HMM, call amps/dels, scatter/diagram plots, purity/ploidy, VCF/SEG export. CLI plus Python API (cnvlib). Use GATK CNV for deep WGS with population controls; use CNVkit for targeted/exome where antitarget bins matter.

tools132

pylabrobot

Hardware-agnostic Python liquid-handler library: portable scripts run on Hamilton STAR, Tecan Freedom EVO, Opentrons OT-2, or a simulator without vendor lock-in. For protocol automation, method dev, plate reformatting, serial dilutions, and Python lab workflows.

tools132

star-rna-seq-aligner

Splice-aware RNA-seq aligner producing sorted BAM and splice junction tables. Builds genome index, runs two-pass alignment for better junctions. Outputs sorted BAM, junctions (SJ.out.tab), stats (Log.final.out), optional gene counts. Use Salmon for fast pseudoalignment; STAR when a BAM is needed for variant calling, IGV, or ENCODE pipelines.

development132

mdanalysis-trajectory

Analyze MD trajectories from GROMACS, AMBER, NAMD, CHARMM, LAMMPS. Reads topology/trajectory into Universe objects; supports RMSD, RMSF, radius of gyration, contact maps, H-bonds, PCA, and custom distance/angle calculations. Use for post-simulation structural analysis; use OpenMM/GROMACS for running simulations.

research132

zinc-database

Query ZINC15/ZINC22 virtual compound libraries (1.4B compounds, 750M purchasable). Search lead/fragment/drug-like compounds by MW, logP, reactivity, or SMILES similarity; download 3D sets for docking. For bioactivity use chembl-database-bioactivity; for approved drugs use drugbank-database-access.

development132

viennarna-structure-prediction

Predict RNA secondary structure, MFE folding, base-pair probabilities, RNA-RNA interactions via ViennaRNA Python bindings. Pipeline: sequence → MFE → partition function and pair-probability matrix → dot-bracket → duplex. Use for siRNA/sgRNA targeting, ribozyme design, RNA accessibility. Use RNAfold CLI for batch use without Python.

tools132

vcf-variant-filtering

Guide to quality filtering raw VCF files before computing summary stats (Ts/Tv ratio, variant counts, AF distributions). Covers detecting raw VCFs via FILTER column and QUAL inspection, QUAL-based filtering with bcftools, Ts/Tv interpretation, and when NOT to filter. Read before any variant-level QC task. See bcftools-variant-manipulation for advanced filters, gatk-variant-calling for caller config, samtools-bam-processing for upstream alignment QC.

tools132

uspto-database

Access USPTO patent data via PatentsView REST API and Google Patents Public Data (BigQuery). Search by inventor, assignee, CPC, or keywords; download metadata and claims; analyze portfolios; track tech trends. For IP landscape analysis, competitor monitoring, prior art search, and tech forecasting in life sciences and biotech.

development132

uspto-database

Access USPTO patent data via PatentsView REST API and Google Patents Public Data (BigQuery). Search by inventor, assignee, CPC, or keywords; download metadata and claims; analyze portfolios; track tech trends. For IP landscape analysis, competitor monitoring, prior art search, and tech forecasting in life sciences and biotech.

development132

ucsc-genome-browser

Query UCSC Genome Browser REST API for DNA sequences, tracks, gene models, and conservation across 100+ assemblies. Retrieve sequence by region, list/fetch BED/bigWig tracks, chromosome sizes, RefSeq/GENCODE gene structures, PhyloP/PhastCons scores. Use for UCSC annotations; Ensembl REST API for Ensembl gene IDs and VEP variant annotation.

development132

ucsc-genome-browser

Query UCSC Genome Browser REST API for DNA sequences, tracks, gene models, and conservation across 100+ assemblies. Retrieve sequence by region, list/fetch BED/bigWig tracks, chromosome sizes, RefSeq/GENCODE gene structures, PhyloP/PhastCons scores. Use for UCSC annotations; Ensembl REST API for Ensembl gene IDs and VEP variant annotation.

development132

torch-geometric-graph-neural-networks

PyTorch Geometric (PyG) for graph neural networks: node/graph classification, link prediction with GCN, GAT, GraphSAGE, GIN. Message passing, mini-batches, heterogeneous graphs, neighbor sampling, explainability. Supports molecules (QM9, MoleculeNet), social/knowledge graphs, 3D point clouds. For non-graph DL use PyTorch; for classical graph algorithms use NetworkX.

development132

torch-geometric-graph-neural-networks

PyTorch Geometric (PyG) for graph neural networks: node/graph classification, link prediction with GCN, GAT, GraphSAGE, GIN. Message passing, mini-batches, heterogeneous graphs, neighbor sampling, explainability. Supports molecules (QM9, MoleculeNet), social/knowledge graphs, 3D point clouds. For non-graph DL use PyTorch; for classical graph algorithms use NetworkX.

development132

string-database-ppi

Query STRING REST API for PPIs (59M proteins, 20B interactions, 5000+ species). Retrieve networks, run GO/KEGG enrichment, find partners, test PPI significance, visualize networks, analyze homology. For chemical interactions use chembl-database-bioactivity; pathways use kegg-database.

development132

string-database-ppi

Query STRING REST API for PPIs (59M proteins, 20B interactions, 5000+ species). Retrieve networks, run GO/KEGG enrichment, find partners, test PPI significance, visualize networks, analyze homology. For chemical interactions use chembl-database-bioactivity; pathways use kegg-database.

development132

statistical-significance-annotation

Guide for annotating statistical significance (p-value asterisks) on comparison plots. Covers standard notation (ns, *, **, ***, ****), matplotlib bracket+asterisk implementation, and use with seaborn box/violin/bar plots. Use when preparing publication-ready figures with significance markers.

documentation132

statistical-analysis

Guided statistical analysis: test choice, assumption checks, effect sizes, power, APA reporting. Pick tests, verify assumptions, or format results for publication. Covers frequentist (t-test, ANOVA, chi-square, regression, correlation, survival, count, reliability) and Bayesian. Use statsmodels or pymc-bayesian-modeling to fit.

development132

spikeinterface-electrophysiology

Unified Python framework for extracellular electrophysiology. Load 20+ formats (SpikeGLX, OpenEphys, NWB, Intan, Maxwell, Blackrock), preprocess, run 10+ sorters (Kilosort4, SpykingCircus2, Tridesclous, MountainSort5) via one API, compute quality metrics (SNR, ISI, firing rate), compare sorters, export NWB/Phy. For format-agnostic multi-sorter workflows. For Neuropixels-specific PSTH/decoding use neuropixels.

development132

snakemake-workflow-engine

Python-based workflow manager for reproducible, scalable pipelines. Define rules with file-based dependencies; Snakemake resolves execution order and parallelism. Runs local, SLURM, LSF, AWS, GCP via profiles; per-rule conda/Singularity envs. For NGS pipelines, ML training, and multi-step file processing. Use Nextflow for Groovy dataflow or nf-core integration.

development132

single-cell-annotation-guide

Decision framework for manual marker-based, automated (CellTypist), and reference-based (popV) cell type annotation in scRNA-seq. Three-tier strategy: Tier 1 manual markers, Tier 2 CellTypist, Tier 3 popV ensemble transfer. Use when planning or troubleshooting annotation.

development132

scvi-tools-single-cell

Deep generative models for single-cell omics: probabilistic batch correction (scVI), semi-supervised annotation (scANVI), CITE-seq RNA+protein (totalVI), transfer learning (scARCHES), and DE with uncertainty. Unified setup→train→extract API on AnnData. Use harmony-batch-correction for fast linear correction without deep learning; muon for multi-modal MuData workflows.

tools132

scvi-tools-single-cell

Deep generative models for single-cell omics: probabilistic batch correction (scVI), semi-supervised annotation (scANVI), CITE-seq RNA+protein (totalVI), transfer learning (scARCHES), and DE with uncertainty. Unified setup→train→extract API on AnnData. Use harmony-batch-correction for fast linear correction without deep learning; muon for multi-modal MuData workflows.

tools132

scikit-survival-analysis

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

testing132

scikit-image-processing

Python image processing for microscopy and bioimage analysis. Read/write images, filter (Gaussian, median, LoG), segment (thresholding, watershed, active contours), measure region properties, detect features. SciPy/NumPy ecosystem. Use OpenCV for real-time video; CellPose for DL cell segmentation; napari for visualization.

development132

scikit-bio

Python library for biology: sequence manipulation (DNA/RNA/protein), pairwise/multiple alignment, phylogenetic trees (NJ, UPGMA), diversity (Shannon, Faith PD, Bray-Curtis, UniFrac), ordination (PCoA, CCA, RDA), stats (PERMANOVA, ANOSIM, Mantel), file I/O (FASTA, FASTQ, Newick, BIOM). Use for microbiome, community ecology, or phylogenetics.

development132

statsmodels-statistical-modeling

Python statistical modeling: regression (OLS, WLS, GLM), discrete (Logit, Poisson, NegBin), time series (ARIMA, SARIMAX, VAR), with rigorous inference, diagnostics, and hypothesis tests. Use scikit-learn for ML; statistical-analysis for test choice.

development132

scientific-critical-thinking

Evaluating scientific evidence and claims. Covers study design hierarchy (RCT to expert opinion), effect sizes (OR, RR, NNT, Cohen's d), confounding, p-value vs clinical significance, GRADE quality assessment, reproducibility, and bias types (selection, information, confounding, reporting). Use when reading a paper or assessing claims.

tools132

scientific-brainstorming

Structured ideation methods: SCAMPER, Six Thinking Hats, Morphological Analysis, TRIZ, Biomimicry, plus more. Decision framework for picking methods by challenge type (stuck, improving, systematic exploration, contradiction). Use when generating research ideas or exploring interdisciplinary connections.

development132

science-figure-guide

Science (AAAS) figure preparation: resolution (150-300+ DPI), formats (PDF/EPS/TIFF), RGB color, Myriad/Helvetica fonts, strict image manipulation policies including gamma adjustment disclosure.

development132

scanpy-scrna-seq

scRNA-seq with Scanpy: QC, normalization, HVG selection, PCA, neighborhood graph, UMAP/t-SNE, Leiden clustering, markers, cell annotation, trajectory inference. Standard scRNA-seq exploration.

tools132

rowan

Cloud quantum chemistry platform with Python SDK. Run geometry optimization, conformer generation, torsional scans, and energy minimization (DFT/semiempirical), and retrieve properties (dipole, partial charges, frontier orbitals) — no local QC software or HPC needed.

development132

remap-database

Query ReMap 2022 TF ChIP-seq peak database via REST API and BED downloads. Retrieve TF peaks overlapping a region (chr:start-end), peaks near a gene, TFs by species, peaks filtered by biotype (promoter, enhancer), and BED files for a TF-cell type pair. Use for TF co-occupancy, regulatory annotation, and TF binding atlases. Use jaspar-database for PWM motifs; encode-database for ENCODE tracks.

development132

reactome-database

Query Reactome pathways via REST: pathway queries, entity lookup, keyword search, gene list enrichment, hierarchy, cross-refs. Content + Analysis services. Python wrapper: reactome2py. For KEGG use kegg-database; for PPIs use string-database-ppi.

development132

reactome-database

Query Reactome pathways via REST: pathway queries, entity lookup, keyword search, gene list enrichment, hierarchy, cross-refs. Content + Analysis services. Python wrapper: reactome2py. For KEGG use kegg-database; for PPIs use string-database-ppi.

development132

rdkit-cheminformatics

Cheminformatics toolkit for molecular analysis and virtual screening: SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints (Morgan/ECFP, MACCS), Tanimoto similarity, SMARTS substructure filtering, Lipinski drug-likeness, reaction enumeration, 2D/3D coordinates. For simpler API use datamol; use RDKit for fine-grained sanitization, custom fingerprints, or SMARTS/reaction control.

tools132

quickgo-database

Query EBI QuickGO REST API for GO terms and protein annotations. Fetch term metadata by ID, search by keyword, walk ancestor/descendant hierarchies, download annotations filtered by taxon, evidence code, aspect. Use for GO resolution, ontology traversal, annotation retrieval before enrichment. Use gseapy-gene-enrichment for enrichment; uniprot-protein-database for proteins.

development132

quickgo-database

Query EBI QuickGO REST API for GO terms and protein annotations. Fetch term metadata by ID, search by keyword, walk ancestor/descendant hierarchies, download annotations filtered by taxon, evidence code, aspect. Use for GO resolution, ontology traversal, annotation retrieval before enrichment. Use gseapy-gene-enrichment for enrichment; uniprot-protein-database for proteins.

development132

pytdc-therapeutics-data-commons

Therapeutics Data Commons (TDC) AI-ready drug discovery datasets. Curated ADME, toxicity, DTI, DDI with scaffold/cold splits, standardized metrics, molecular oracles, and ADMET benchmarks for therapeutic ML and property prediction. For chemical database queries use chembl-database-bioactivity; for featurization use molfeat.

testing132

pymc-bayesian-modeling

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

testing132

pymc-bayesian-modeling

Bayesian modeling with PyMC 5: priors, likelihood, NUTS/ADVI sampling, diagnostics (R-hat, ESS), LOO/WAIC comparison, prediction. Hierarchical, logistic, GP variants; predictive checks.

testing132

pymatgen

Python Materials Genomics library for structure analysis, thermodynamics, and electronic properties. Parse/create crystal structures (CIF, POSCAR), query Materials Project for DFT-computed properties, analyze phase and Pourbaix diagrams, compute XRD patterns, generate DFT inputs for VASP, Quantum ESPRESSO, CP2K. Alternatives: ASE (MD/geometry), AFLOW (high-throughput), OVITO (visualization).

development132

pymatgen

Python Materials Genomics library for structure analysis, thermodynamics, and electronic properties. Parse/create crystal structures (CIF, POSCAR), query Materials Project for DFT-computed properties, analyze phase and Pourbaix diagrams, compute XRD patterns, generate DFT inputs for VASP, Quantum ESPRESSO, CP2K. Alternatives: ASE (MD/geometry), AFLOW (high-throughput), OVITO (visualization).

development132

pyimagej-fiji-bridge

Python bridge to ImageJ2/Fiji for macros, plugins (Bio-Formats, TrackMate, Analyze Particles), NumPy↔ImagePlus/ImgLib2 exchange, and ImageJ Ops. Automates Fiji headlessly from Python. Use scikit-image for pure Python without Fiji plugins; napari for visualization.

tools132

pyhealth

Python library for healthcare ML on EHR data: process MIMIC-III/IV, eICU, OMOP-CDM; encode medical codes (ICD, ATC, NDC); build patient-level datasets; train Transformer, RETAIN, GRASP, MedBERT for mortality, drug recommendation, readmission, diagnosis prediction. Alternatives: FIDDLE (preprocessing), clinical-longformer (clinical NLP), ehr-ml (embeddings).

tools132

pyhealth

Python library for healthcare ML on EHR data: process MIMIC-III/IV, eICU, OMOP-CDM; encode medical codes (ICD, ATC, NDC); build patient-level datasets; train Transformer, RETAIN, GRASP, MedBERT for mortality, drug recommendation, readmission, diagnosis prediction. Alternatives: FIDDLE (preprocessing), clinical-longformer (clinical NLP), ehr-ml (embeddings).

tools132

pydicom-medical-imaging

Pure Python DICOM for medical imaging (CT, MRI, X-ray, ultrasound). Read/write DICOM, pixels as NumPy, edit tags, windowing (VOI LUT), PHI anonymization, build DICOM, series→3D volumes. Use histolab for WSI pathology; nibabel for NIfTI.

development132

pydeseq2-differential-expression

Bulk RNA-seq DE with PyDESeq2: load counts, normalize, fit negative binomial models, Wald test (BH-FDR), LFC shrinkage, volcano/MA plots. Use for two-group comparisons, multi-factor designs with batch correction, multiple contrasts.

testing132

pydeseq2-differential-expression

Bulk RNA-seq DE with PyDESeq2: load counts, normalize, fit negative binomial models, Wald test (BH-FDR), LFC shrinkage, volcano/MA plots. Use for two-group comparisons, multi-factor designs with batch correction, multiple contrasts.

testing132

popv-cell-annotation

Consensus cell type annotation: runs 10+ algorithms (KNN-Harmony/BBKNN/Scanorama/scVI, CellTypist, ONCLASS, Random Forest, SCANVI, SVM, XGBoost) on a labeled reference and transfers labels via majority voting. Outputs per-method labels, consensus, agreement score. Use when single-method annotation is insufficient or you need ensemble uncertainty for novel states.

development132

popv-cell-annotation

Consensus cell type annotation: runs 10+ algorithms (KNN-Harmony/BBKNN/Scanorama/scVI, CellTypist, ONCLASS, Random Forest, SCANVI, SVM, XGBoost) on a labeled reference and transfers labels via majority voting. Outputs per-method labels, consensus, agreement score. Use when single-method annotation is insufficient or you need ensemble uncertainty for novel states.

development132

polars-dataframes

Fast in-memory DataFrame with lazy evaluation, parallel execution, Arrow backend. Use for tabular data in RAM (1–100 GB) when pandas is too slow. Expression API: select, filter, group_by, joins, pivots, window. Lazy mode enables predicate/projection pushdown. Reads CSV, Parquet, JSON, Excel, DBs, cloud. Larger-than-RAM: Dask; GPU: cuDF.

development132

pnas-figure-guide

PNAS figure preparation: resolution (300-1000 PPI), formats (TIFF/EPS/PDF), strict RGB-only color, Arial/Helvetica fonts, italicized uppercase panel labels, automated image screening.

development132

pnas-figure-guide

PNAS figure preparation: resolution (300-1000 PPI), formats (TIFF/EPS/PDF), strict RGB-only color, Arial/Helvetica fonts, italicized uppercase panel labels, automated image screening.

development132

plotly-interactive-visualization

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

development132

plotly-interactive-visualization

Interactive visualization with Plotly. 40+ chart types (scatter, line, heatmap, 3D, geographic) with hover, zoom, pan. Two APIs: Plotly Express (DataFrame) and Graph Objects (fine control). For static publication figures use matplotlib; for statistical grammar use seaborn.

development132

plotly-interactive-plots

Interactive scientific visualization with Plotly. Two APIs: plotly.express (px) for one-liner DataFrame plots, plotly.graph_objects (go) for trace-level control. 40+ chart types with hover, zoom, pan, animation. Exports HTML or static PNG/SVG/PDF via kaleido. Use for volcano plots with gene hover, dose-response dashboards, expression heatmaps, 3D molecular views. Use seaborn for stats; matplotlib for publication figures.

development132

plannotate-plasmid-annotation

Auto-annotate plasmids with features (promoters, terminators, resistance, origins, tags, fluorescent proteins) via BLAST against curated DBs (Addgene, fpbase, SnapGene). FASTA or raw sequence in; annotated GenBank, interactive HTML maps, CSV tables out. Handles circular topology. Use to verify synthetic constructs, prep Addgene submissions, share maps, or batch-annotate cloning libraries.

development132

plannotate-plasmid-annotation

Auto-annotate plasmids with features (promoters, terminators, resistance, origins, tags, fluorescent proteins) via BLAST against curated DBs (Addgene, fpbase, SnapGene). FASTA or raw sequence in; annotated GenBank, interactive HTML maps, CSV tables out. Handles circular topology. Use to verify synthetic constructs, prep Addgene submissions, share maps, or batch-annotate cloning libraries.

development132

pathml

Computational pathology toolkit for whole-slide images (WSIs): load slides, extract tiles, stain normalization, nuclear segmentation, feature extraction, and ML training. Supports H&E and multiplex. For end-to-end pipelines from raw WSIs to quantitative outputs.

tools132

opentrons-protocol-api

Python API v2 for Opentrons OT-2/Flex liquid handlers: protocols as Python files with metadata and run(); control pipettes, labware, and modules (thermocycler, heater-shaker, magnetic, temperature). Simulate via opentrons_simulate then upload. Use PyLabRobot for vendor-agnostic scripts (Hamilton, Tecan).

development132

opentrons-integration

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

development132

opencv-bioimage-analysis

Computer vision for bio-image preprocessing, feature detection, real-time microscopy. Color conversion, morphology, contour/blob detection, template matching, optical flow on fluorescence/brightfield. 10-100× faster than pure Python via C++. Use scikit-image for scientific morphometry/regionprops; OpenCV for real-time, video, classical feature extraction.

development132

omero-integration

Open-source bio-image data management. Use the omero-py client to connect to an OMERO server, retrieve images as numpy arrays, annotate with tags and key-value pairs, manage ROIs, and feed image data into Python analysis pipelines — programmatically, no GUI.

tools132

omero-integration

Open-source bio-image data management. Use the omero-py client to connect to an OMERO server, retrieve images as numpy arrays, annotate with tags and key-value pairs, manage ROIs, and feed image data into Python analysis pipelines — programmatically, no GUI.

tools132

nnunet-segmentation

Medical image segmentation with nnU-Net's self-configuring framework — auto-selects architecture, preprocessing, training for any modality. CT, MRI, microscopy, ultrasound in 2D, 3D full-res, 3D low-res, cascade. Pipeline: convert → plan/preprocess → train (5-fold CV) → best config → predict → ensemble. Use when classical segmentation fails and annotated data exists.

development132

nextflow-workflow-engine

Dataflow workflow engine for scalable bioinformatics pipelines. Defines processes (containerized tasks) connected by channels; runs local, HPC (SLURM/SGE), cloud (AWS/GCP/Azure), or Kubernetes via a single config change. Powers nf-core. Use Snakemake for rule-based Python workflows; use Nextflow for containerized, cloud-native, and nf-core pipelines.

development132

neuropixels-analysis

Pipeline for Neuropixels extracellular electrophysiology: probe geometry (ProbeInterface), Kilosort sorting via SpikeInterface, quality metrics, unit curation (ISI, firing rate, SNR), post-sort analysis (PSTH, tuning curves, population decoding). Supports Neuropixels 1.0/2.0/Ultra in rodent/primate experiments.

testing132

neurokit2

Python toolkit for neurophysiological signal processing: ECG (HR, HRV, R-peaks), EEG (complexity, PSD), EMG (activation onset), EDA/GSR (SCR decomposition), PPG, and RSP. Includes synthetic signal simulation. Alternatives: BioSPPy (less maintained), MNE (EEG/MEG specialist), heartpy (ECG only), scipy.signal (raw DSP).

tools132

nejm-figure-guide

NEJM figure preparation: resolution (300-1200 DPI), editable vector formats (AI/EPS/SVG), in-house medical illustration policy, and strict image integrity requirements.

development132

nature-figure-guide

Nature figure preparation: resolution (300+ DPI), formats (AI/EPS/TIFF), RGB color, Helvetica/Arial fonts, lowercase panel labels, image integrity requirements.

development132

clinical-decision-support-documents

Guidelines for clinical decision support (CDS) documents: biomarker-stratified cohort analyses and GRADE-graded treatment reports. Covers structure, executive summaries, evidence grading (1A–2C), stats (HR, CI, survival), and biomarker integration. Use for pharma research docs, clinical guidelines, regulatory submissions.

tools132

scikit-learn-machine-learning

Classical ML in Python: classification, regression, clustering, dim reduction, evaluation, tuning, preprocessing pipelines. Linear models, tree ensembles, SVMs, K-Means, PCA, t-SNE. Use PyTorch/TF for deep learning; XGBoost/LightGBM for scale.

development132

scikit-survival-analysis

Time-to-event modeling with scikit-survival: Cox PH (elastic net), Random Survival Forests, Boosting, SVMs for censored data. C-index, Brier, time-dependent AUC; Kaplan-Meier, Nelson-Aalen, competing risks. Pipeline/GridSearchCV compatible. Use statsmodels for frequentist, pymc for Bayesian, lifelines for parametric.

testing132

shap-model-explainability

Model interpretability via SHAP (Shapley values from game theory). Covers explainer choice (Tree, Deep, Linear, Kernel, Gradient, Permutation), feature attribution, and plots (waterfall, beeswarm, bar, scatter, force, heatmap). Use to explain ML predictions, rank features, debug models, audit fairness, or compare models. Works with tree, deep, linear, and black-box models.

development132

flowio-flow-cytometry

Parse/write FCS (Flow Cytometry) files v2.0-3.1. Events as NumPy, channel metadata, multi-dataset files, CSV/FCS export. Use FlowKit for gating/compensation.

documentation132

napari-image-viewer

Interactive viewer for microscopy. Displays 2D/3D/4D arrays as Image, Labels, Points, Shapes, Tracks layers; supports annotation, plugin analysis, headless screenshots. Core visualization for Python bioimage workflows. Use ImageJ/FIJI for macro processing; napari for Python-native interactive visualization and DL segmentation review.

tools132

trackpy-particle-tracking

Python library for single-particle tracking (SPT) in video microscopy via the Crocker-Grier algorithm. Locate particles (fluorescent spots, colloids, vesicles, cells) per frame, link into trajectories, filter short tracks, and compute MSD for diffusion analysis. 2D/3D with subpixel accuracy; reads TIF stacks, AVI, image series via pims. Use for quantitative SPT and diffusion coefficient extraction from fluorescence or brightfield video.

development132

simpleitk-image-registration

Register, segment, filter, resample 3D medical images (MRI, CT, microscopy) via SimpleITK Python; DICOM, NIfTI, multi-modal. Rigid/affine/deformable registration, threshold/region-growing segmentation, Gaussian/morph filtering, label stats, format conversion. Use to align volumes across timepoints/modalities, segment fluorescence, or convert DICOM→NIfTI.

development132

scientific-visualization

Guide for choosing and creating scientific visualizations for publications and talks. Covers chart-type selection by data structure, color theory for accessibility/print, figure composition, journal formatting (Nature, Cell, ACS), and common pitfalls. Consult when visualizing data or preparing submission figures.

development132

nan-safe-correlation

Per-feature NaN-safe Spearman/Pearson correlation across many features (genes, proteins, variants) with missing values. Covers why bulk matrix shortcuts fail, correct pairwise deletion, degenerate input filtering, and large-dataset performance. Use statistical-analysis for test choice; shap-model-explainability for interpretability.

testing132

monarch-database

Monarch Initiative knowledge graph REST API for disease-gene-phenotype associations and cross-species orthology. MONDO disease-to-gene/phenotype, HP phenotype profiles, cross-species comparisons. Use for rare disease gene prioritization and phenotype-based candidate ranking. For GWAS use gwas-database; for clinical pathogenicity use clinvar-database.

tools132

seaborn-statistical-visualization

Statistical visualization on matplotlib + pandas. Distributions (histplot, kdeplot, violin, box), relational (scatter, line), categorical, regression, correlation heatmaps. Auto aggregation/CIs. Use plotly for interactive; matplotlib for low-level.

development132

medchem

Medicinal chemistry filters for compound triage. Drug-likeness rules (Lipinski Ro5, Veber, Oprea, CNS, leadlike, REOS, Golden Triangle, Ro3), structural alerts (PAINS, NIBR, Lilly Demerits), chemical group detectors, complexity metrics, and filter composition query language. Built on RDKit/datamol. For hit-to-lead filtering, library design, ADMET pre-screening. For molecular I/O use rdkit-cheminformatics or datamol.

development132

maxquant-proteomics

MaxQuant + Perseus proteomics pipeline: run MaxQuant for LFQ and SILAC; parse proteinGroups.txt in Python; filter contaminants/decoys; log2 + median-normalize; impute MNAR; t-test with FDR; volcano plot; GO/pathway enrichment. Use Proteome Discoverer for Thermo-native processing; FragPipe/MSFragger for GPU-accelerated DB search.

development132

biopython-molecular-biology

Molecular biology toolkit: sequence manipulation, FASTA/GenBank/PDB I/O, NCBI Entrez, BLAST automation, pairwise/MSA alignment, Bio.PDB, phylogenetic trees. Use for batch processing, custom pipelines, format conversion, PubMed/GenBank queries. For quick gene lookups use gget; for multi-service REST APIs use bioservices.

tools132

bedtools-genomic-intervals

Genomic interval ops on BED/BAM/GFF/VCF. Find overlaps, merge intervals, compute coverage, extract FASTA, find nearest features. Core for ChIP-seq peak annotation, region filtering, genome arithmetic. Use tabix for indexed single-region queries; use deeptools for normalized bigWig coverage.

tools132

bwa-mem2-dna-aligner

Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.

tools132

cobrapy-metabolic-modeling

Constraint-based (COBRA) analysis of genome-scale metabolic models: FBA, FVA, knockouts, flux sampling, production envelopes, gapfilling, media optimization. Use for strain design, essential gene ID, flux analysis. For kinetic modeling use tellurium; for visualization use Escher.

data-ai132

ensembl-database

Ensembl REST API for gene/transcript/variant annotations in 300+ species. Gene info by symbol/ID, sequence, cross-refs (HGNC, RefSeq, UniProt), regulatory features. For bulk local use pyensembl; for pathways use kegg-database.

development132

anndata-data-structure

Annotated matrices for single-cell genomics. Stores X with obs/var metadata, layers, embeddings (obsm/varm), graphs (obsp/varp), uns. Use for .h5ad/.zarr I/O, concatenation, scverse integration. For analysis use scanpy; for probabilistic models use scvi-tools.

tools132

depmap-crispr-essentiality

DepMap CRISPR gene effect (Chronos) analysis: sign convention for essentiality, per-gene NaN-safe Spearman correlation, data loading/alignment. For general NaN-safe correlation see nan-safe-correlation; for quality filtering see degenerate-input-filtering.

testing132

cosmic-database

Query COSMIC for cancer somatic mutations, gene census, mutational signatures, drug resistance variants. REST API v3.1 supports gene/sample/variant queries; free registration. For germline use clinvar-database; for drug-target data use opentargets-database or chembl-database-bioactivity.

tools132

matplotlib-scientific-plotting

Low-level Python plotting for scientific figures: publication-quality line, scatter, bar, heatmap, contour, 3D; multi-panel layouts; fine control of every element. PNG/PDF/SVG export. Use seaborn for quick stats, plotly for interactive.

development132

matplotlib-scientific-plotting

Low-level Python plotting for scientific figures: publication-quality line, scatter, bar, heatmap, contour, 3D; multi-panel layouts; fine control of every element. PNG/PDF/SVG export. Use seaborn for quick stats, plotly for interactive.

development132

matlab-scientific-computing

MATLAB/GNU Octave numerical computing: matrices, linear algebra, ODEs, signal processing, optimization, statistics, scientific visualization. MATLAB-syntax examples run on both. For Python use numpy/scipy; for statistical modeling use statsmodels.

development132

matlab-scientific-computing

MATLAB/GNU Octave numerical computing: matrices, linear algebra, ODEs, signal processing, optimization, statistics, scientific visualization. MATLAB-syntax examples run on both. For Python use numpy/scipy; for statistical modeling use statsmodels.

development132

matchms-spectral-matching

MS spectral matching and metabolite ID with matchms. Import spectra (mzML, MGF, MSP, JSON), filter/normalize peaks, score similarity (cosine, modified cosine, fingerprint), build reproducible pipelines, identify unknowns vs spectral libraries. Use pyopenms for full LC-MS/MS proteomics.

development132

macs3-peak-calling

Poisson-model peak caller for ChIP-seq/ATAC-seq BAMs. MACS3 callpeak finds enriched regions (TF sites or histone marks) vs input/IgG; outputs BED narrowPeak/broadPeak for motif analysis, annotation, and differential binding. Use narrow peaks for TF ChIP-seq and ATAC-seq; broad for H3K27me3, H3K9me3, and other broad marks.

data-ai132

literature-review

Conducting systematic, scoping, and narrative literature reviews. Covers PRISMA/PRISMA-ScR protocols, search strategy (Boolean, MeSH), database selection (PubMed, Scopus, Web of Science, Embase), screening, data extraction, evidence synthesis (narrative, meta-analysis, thematic), and reporting. Use when planning or executing a formal literature review.

development132

libsbml-network-modeling

Build, read, validate, modify SBML biological network models via the libSBML Python API. SBML Levels 1–3, reactions/kinetic laws, species, rules, FBC extension for flux balance, conversion. Interoperates with COBRApy, Tellurium/RoadRunner, COPASI. Use when programmatically constructing ODE or constraint-based metabolic/signaling models in SBML.

tools132

lancet-figure-guide

The Lancet figure preparation: resolution (300+ DPI at 120%), preferred editable formats (PowerPoint/Word/SVG), column widths (75/154 mm), Times New Roman, in-house redraw policy.

development132

lancet-figure-guide

The Lancet figure preparation: resolution (300+ DPI at 120%), preferred editable formats (PowerPoint/Word/SVG), column widths (75/154 mm), Times New Roman, in-house redraw policy.

development132

kegg-database

KEGG REST API (academic only). Pathways, genes, compounds, enzymes, diseases, drugs via 7 ops (info/list/find/get/conv/link/ddi). ID conversion (NCBI/UniProt/PubChem). Use bioservices for multi-DB Python.

development132

kegg-database

KEGG REST API (academic only). Pathways, genes, compounds, enzymes, diseases, drugs via 7 ops (info/list/find/get/conv/link/ddi). ID conversion (NCBI/UniProt/PubChem). Use bioservices for multi-DB Python.

development132

hypothesis-generation

Structured hypothesis formulation: turn observations into testable hypotheses with predictions, propose mechanisms, design experiments. Follows the scientific method. Use scientific-brainstorming for open ideation; hypogenic for automated LLM hypothesis testing on datasets.

testing132

homer-motif-analysis

De novo and known TF motif enrichment in ChIP-seq/ATAC-seq peaks via HOMER. findMotifsGenome.pl finds over-represented patterns vs background; annotatePeaks.pl assigns context (TSS distance, gene, repeat). Use after MACS3 to identify enriched TFs, annotate peaks with nearest genes, and validate ChIP-seq via the target motif.

testing132

hmdb-database

Parse HMDB (Human Metabolome Database) local XML for metabolite info, chemical properties, biological context, disease links, spectra, and cross-DB mapping. No REST API — uses ~6 GB XML download. Use drugbank-database-access for drugs; pubchem-compound-search for live lookups.

development132

hmdb-database

Parse HMDB (Human Metabolome Database) local XML for metabolite info, chemical properties, biological context, disease links, spectra, and cross-DB mapping. No REST API — uses ~6 GB XML download. Use drugbank-database-access for drugs; pubchem-compound-search for live lookups.

development132

histolab-wsi-processing

WSI processing for digital pathology. Tissue detection, tile extraction (random, grid, score-based), filter pipelines for H&E/IHC. For dataset prep, tile-based DL, slide QC. Use pathml for multiplexed imaging.

devops132

histolab-wsi-processing

WSI processing for digital pathology. Tissue detection, tile extraction (random, grid, score-based), filter pipelines for H&E/IHC. For dataset prep, tile-based DL, slide QC. Use pathml for multiplexed imaging.

devops132

harmony-batch-correction

Harmony batch correction for scRNA-seq and other omics. Removes batch effects from PCA embeddings while preserving biology. Run after PCA, before UMAP. Scales to millions of cells. Python (harmonypy, scanpy) and R (Seurat).

development132

harmony-batch-correction

Harmony batch correction for scRNA-seq and other omics. Removes batch effects from PCA embeddings while preserving biology. Run after PCA, before UMAP. Scales to millions of cells. Python (harmonypy, scanpy) and R (Seurat).

development132

gtars

Rust-backed Python library for fast genomic token arithmetic and BED processing. High-performance BED I/O, interval set ops (intersect, merge, complement, subtract), region tokenization against a universe, universe construction. Use for preprocessing large BED collections and ML token vocabularies.

development132

gtars

Rust-backed Python library for fast genomic token arithmetic and BED processing. High-performance BED I/O, interval set ops (intersect, merge, complement, subtract), region tokenization against a universe, universe construction. Use for preprocessing large BED collections and ML token vocabularies.

development132

gget-genomic-databases

Unified CLI/Python interface to 20+ genomic databases. Gene lookups (Ensembl search/info/seq), BLAST/BLAT, AlphaFold, Enrichr enrichment, OpenTargets disease/drug, CELLxGENE single-cell, cBioPortal/COSMIC cancer, ARCHS4 expression. Spans genomics, proteomics, disease. For batch/advanced BLAST use biopython; for multi-DB Python SDK use bioservices.

tools132

geopandas-geospatial

Geospatial vector analysis extending pandas. Read/write spatial formats (Shapefile, GeoJSON, GeoPackage, Parquet, PostGIS), CRS handling, geometric ops (buffer, simplify, centroid, affine), spatial analysis (joins, overlays, dissolve, clipping, distance), visualization (choropleth, interactive maps, basemaps). Use for spatial joins, overlays, CRS transforms, area/distance, maps.

tools132

geopandas-geospatial

Geospatial vector analysis extending pandas. Read/write spatial formats (Shapefile, GeoJSON, GeoPackage, Parquet, PostGIS), CRS handling, geometric ops (buffer, simplify, centroid, affine), spatial analysis (joins, overlays, dissolve, clipping, distance), visualization (choropleth, interactive maps, basemaps). Use for spatial joins, overlays, CRS transforms, area/distance, maps.

tools132

geniml

Python library for genomic interval ML. Train/apply region2vec embeddings turning BED regions into vectors, index interval datasets for ML, search embedding space with BEDSpace, and evaluate embedding quality. Use for chromatin accessibility clustering, regulatory element classification, and cross-sample region comparison.

development132

general-figure-guide

Universal QA checklist for generated scientific plots: overlapping labels, clipped text, missing axes/legends, overcrowded data, and cross-journal resolution/format guidance.

tools132

general-figure-guide

Universal QA checklist for generated scientific plots: overlapping labels, clipped text, missing axes/legends, overcrowded data, and cross-journal resolution/format guidance.

tools132

gatk-variant-calling

GATK Best Practices for germline SNP/indel calling from WGS/WES BAMs. Per-sample HaplotypeCaller GVCFs, GenomicsDBImport, GenotypeGVCFs joint calling, VQSR or hard filters. Requires BWA-MEM2-aligned, markdup, BQSR BAMs. Use DeepVariant for a faster DL alternative; GATK is the NIH/ENCODE standard.

development132

flowio-flow-cytometry

Parse/write FCS (Flow Cytometry) files v2.0-3.1. Events as NumPy, channel metadata, multi-dataset files, CSV/FCS export. Use FlowKit for gating/compensation.

documentation132

fda-database

Query openFDA REST API for adverse events (FAERS), labeling, product info, recalls, enforcement. Search by drug name, ingredient, MedDRA, or NDC. 1k req/day no key; 120k with free key. For trials use clinicaltrials-database-search; for structures use drugbank-database-access or chembl-database-bioactivity.

tools132

fda-database

Query openFDA REST API for adverse events (FAERS), labeling, product info, recalls, enforcement. Search by drug name, ingredient, MedDRA, or NDC. 1k req/day no key; 120k with free key. For trials use clinicaltrials-database-search; for structures use drugbank-database-access or chembl-database-bioactivity.

tools132

etetoolkit

ETE Toolkit (ETE3): Python phylogenetic tree analysis and visualization. Parse Newick/NHX/PhyloXML, traverse/annotate nodes, render figures with TreeStyle/NodeStyle, integrate NCBI taxonomy, run PhyloTree comparative genomics. Use for species trees, gene family evolution, annotated tree figures.

tools132

ensembl-database

Ensembl REST API for gene/transcript/variant annotations in 300+ species. Gene info by symbol/ID, sequence, cross-refs (HGNC, RefSeq, UniProt), regulatory features. For bulk local use pyensembl; for pathways use kegg-database.

development132

elife-figure-guide

eLife figure preparation: file formats (TIFF/EPS/PDF), striking image requirements (1800x900 px), figure supplement naming, and image screening policy treating selective enhancement as misconduct.

development132

elife-figure-guide

eLife figure preparation: file formats (TIFF/EPS/PDF), striking image requirements (1800x900 px), figure supplement naming, and image screening policy treating selective enhancement as misconduct.

development132

drugbank-database-access

Parse local DrugBank XML for drug info, interactions, targets, and properties. Search by ID/name/CAS, extract DDIs with severity, map targets/enzymes/transporters, compute SMILES similarity. Primary via local XML; REST API rate-limited (3k/month dev). For live bioactivity use chembl-database-bioactivity; for compound properties use pubchem-compound-search.

development132

drugbank-database-access

Parse local DrugBank XML for drug info, interactions, targets, and properties. Search by ID/name/CAS, extract DDIs with severity, map targets/enzymes/transporters, compute SMILES similarity. Primary via local XML; REST API rate-limited (3k/month dev). For live bioactivity use chembl-database-bioactivity; for compound properties use pubchem-compound-search.

development132

diffdock

Diffusion-based docking that predicts protein-ligand poses without a predefined site. Use for blind docking, when traditional docking fails, or exploring multiple binding modes. Pipeline: prep protein (PDB) and ligand (SMILES/SDF), run inference, analyze confidence-ranked poses.

devops132

deseq2-differential-expression

Bulk RNA-seq DE with R/Bioconductor DESeq2. Negative binomial GLM, empirical Bayes shrinkage, Wald/LRT tests, multi-factor designs, Salmon tximeta import, apeglm LFC shrinkage, MA/volcano/heatmap viz. R gold standard. Use pydeseq2-differential-expression for Python; use edgeR for TMM normalization.

development132

deseq2-differential-expression

Bulk RNA-seq DE with R/Bioconductor DESeq2. Negative binomial GLM, empirical Bayes shrinkage, Wald/LRT tests, multi-factor designs, Salmon tximeta import, apeglm LFC shrinkage, MA/volcano/heatmap viz. R gold standard. Use pydeseq2-differential-expression for Python; use edgeR for TMM normalization.

development132

depmap-crispr-essentiality

DepMap CRISPR gene effect (Chronos) analysis: sign convention for essentiality, per-gene NaN-safe Spearman correlation, data loading/alignment. For general NaN-safe correlation see nan-safe-correlation; for quality filtering see degenerate-input-filtering.

testing132

degenerate-input-filtering

Filter degenerate, uninformative inputs before statistical tests: single-sequence alignments, empty files, constant features, zero-variance inputs, all-NaN columns. See nan-safe-correlation for NaN-aware correlation; statistical-analysis for test guidance.

development132

degenerate-input-filtering

Filter degenerate, uninformative inputs before statistical tests: single-sequence alignments, empty files, constant features, zero-variance inputs, all-NaN columns. See nan-safe-correlation for NaN-aware correlation; statistical-analysis for test guidance.

development132

sympy-symbolic-math

Symbolic math in Python: exact algebra, calculus (derivatives, integrals, limits), equation solving, symbolic matrices, ODEs, code gen (lambdify, C/Fortran). Use for exact symbolic results. For numerical use numpy/scipy; for stats use statsmodels.

development132

samtools-bam-processing

CLI toolkit for SAM/BAM/CRAM: sort, index, convert, filter, QC alignments. Core commands: view, sort, index, flagstat, stats, depth, markdup, merge. Required between alignment and variant/peak calling. Use pysam for Python-native BAM access; deeptools for normalized coverage tracks.

tools132

pysam-genomic-files

Read/write SAM/BAM/CRAM, VCF/BCF, FASTA/FASTQ. Region queries, pileup, variant filtering, read groups. Python htslib wrapper exposing samtools/bcftools CLI. Use STAR/BWA for alignment; GATK/DeepVariant for variant calling.

tools132

scikit-bio

Python library for biology: sequence manipulation (DNA/RNA/protein), pairwise/multiple alignment, phylogenetic trees (NJ, UPGMA), diversity (Shannon, Faith PD, Bray-Curtis, UniFrac), ordination (PCoA, CCA, RDA), stats (PERMANOVA, ANOSIM, Mantel), file I/O (FASTA, FASTQ, Newick, BIOM). Use for microbiome, community ecology, or phylogenetics.

development132

snpeff-variant-annotation

Annotate and filter VCF variants with SnpEff and SnpSift. SnpEff predicts functional effects (HIGH/MODERATE/LOW/MODIFIER), genes, transcripts, AA changes, HGVS; SnpSift filters and adds ClinVar/dbSNP. Java CLI with Python subprocess integration. Use ANNOVAR for multi-database annotation; Ensembl VEP for REST API; SnpEff for fast CLI with pre-built genomes.

tools132

star-rna-seq-aligner

Splice-aware RNA-seq aligner producing sorted BAM and splice junction tables. Builds genome index, runs two-pass alignment for better junctions. Outputs sorted BAM, junctions (SJ.out.tab), stats (Log.final.out), optional gene counts. Use Salmon for fast pseudoalignment; STAR when a BAM is needed for variant calling, IGV, or ENCODE pipelines.

development132

pylabrobot

Hardware-agnostic Python liquid-handler library: portable scripts run on Hamilton STAR, Tecan Freedom EVO, Opentrons OT-2, or a simulator without vendor lock-in. For protocol automation, method dev, plate reformatting, serial dilutions, and Python lab workflows.

tools132

nextflow-workflow-engine

Dataflow workflow engine for scalable bioinformatics pipelines. Defines processes (containerized tasks) connected by channels; runs local, HPC (SLURM/SGE), cloud (AWS/GCP/Azure), or Kubernetes via a single config change. Powers nf-core. Use Snakemake for rule-based Python workflows; use Nextflow for containerized, cloud-native, and nf-core pipelines.

development132

multiqc-qc-reports

Aggregates QC from 150+ bioinformatics tools into one interactive HTML report. Scans FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, GATK logs; merges per-sample stats with plots. For NGS pipeline-wide QC. Use FastQC directly for single-sample; MultiQC for multi-sample reporting.

tools132

pyopenms-mass-spectrometry

MS data processing with PyOpenMS for LC-MS/MS proteomics and metabolomics — mzML/mzXML I/O, signal processing (smoothing, peak picking, centroiding), feature detection/linking, peptide/protein ID with FDR, untargeted metabolomics. Use matchms for simple spectral matching.

data-ai132

nejm-figure-guide

NEJM figure preparation: resolution (300-1200 DPI), editable vector formats (AI/EPS/SVG), in-house medical illustration policy, and strict image integrity requirements.

development132

umap-learn

UMAP dimensionality reduction for visualization, clustering prep, and feature engineering. Fast nonlinear manifold learning preserving local and global structure. Standard UMAP (fit/transform, sklearn-compatible), supervised/semi-supervised, Parametric UMAP (NN encoder/decoder, TensorFlow), DensMAP (density), AlignedUMAP (temporal/batch). 15+ distance metrics, custom Numba metrics, precomputed distances. For linear reduction use PCA; for neighborhood graphs use sklearn NearestNeighbors.

development132

vaex-dataframes

Out-of-core DataFrame for billion-row data via lazy evaluation and memory-mapped files. Use when data exceeds RAM (10 GB–TB) for fast aggregation, filtering, virtual columns, and visualization without loading. Supports HDF5, Arrow, Parquet, CSV with cloud (S3, GCS, Azure). Built-in ML transformers (scaling, PCA, K-means). In-memory: polars; distributed: dask.

devops132

transformers-bio-nlp

HuggingFace Transformers with biomedical LMs (BioBERT, PubMedBERT, BioGPT, BioMedLM) for scientific NLP: NER (genes, diseases, chemicals), relation extraction, QA, text classification, abstract summarization. Covers loading, biomedical tokenization, inference pipelines, fine-tuning. Alternatives: spaCy en_core_sci_lg (rule-based NER), Stanza (biomedical models), NLTK.

testing132

citation-management

Selecting a reference manager and applying citation styles. Compares Zotero, Mendeley, EndNote, Paperpile; covers APA/Vancouver/ACS/Nature styles, DOI management, citation tracking, and Word/Google Docs/LaTeX integration. Use when setting up a reference workflow or fixing citation formatting.

development132

mofaplus-multi-omics

Multi-Omics Factor Analysis v2 (MOFA+) with mofapy2. Jointly decompose omics layers (scRNA, ATAC, proteomics, methylation) into latent factors capturing major variation. Multi-group designs. AnnData views → MOFA object → train → variance explained → correlate factors with metadata → visualize/cluster → enrich top loadings.

data-ai132

gseapy-gene-enrichment

GSEA and over-representation analysis (ORA) for RNA-seq and proteomics. Wraps Enrichr for ORA against MSigDB, KEGG, GO, and 200+ databases; runs preranked GSEA on ranked DE gene lists. Outputs enrichment tables and running-score plots. Use after DESeq2 or edgeR for pathway-level interpretation.

development132

peer-review-methodology

Structured peer review of manuscripts and grants. 7-stage evaluation: initial assessment, section review, statistical rigor, reproducibility, figure integrity, ethics, writing. Covers CONSORT/STROBE/PRISMA and report structure. For evidence quality see scientific-critical-thinking; scoring see scholar-evaluation.

development132

nnunet-segmentation

Medical image segmentation with nnU-Net's self-configuring framework — auto-selects architecture, preprocessing, training for any modality. CT, MRI, microscopy, ultrasound in 2D, 3D full-res, 3D low-res, cascade. Pipeline: convert → plan/preprocess → train (5-fold CV) → best config → predict → ensemble. Use when classical segmentation fails and annotated data exists.

development132

statistical-significance-annotation

Guide for annotating statistical significance (p-value asterisks) on comparison plots. Covers standard notation (ns, *, **, ***, ****), matplotlib bracket+asterisk implementation, and use with seaborn box/violin/bar plots. Use when preparing publication-ready figures with significance markers.

documentation132

diffdock

Diffusion-based docking that predicts protein-ligand poses without a predefined site. Use for blind docking, when traditional docking fails, or exploring multiple binding modes. Pipeline: prep protein (PDB) and ligand (SMILES/SDF), run inference, analyze confidence-ranked poses.

devops132

medchem

Medicinal chemistry filters for compound triage. Drug-likeness rules (Lipinski Ro5, Veber, Oprea, CNS, leadlike, REOS, Golden Triangle, Ro3), structural alerts (PAINS, NIBR, Lilly Demerits), chemical group detectors, complexity metrics, and filter composition query language. Built on RDKit/datamol. For hit-to-lead filtering, library design, ADMET pre-screening. For molecular I/O use rdkit-cheminformatics or datamol.

development132

autodock-vina-docking

Molecular docking with AutoDock Vina (Python API). Receptor/ligand prep (Meeko + RDKit), grid box, docking, pose and binding energy analysis, and batch virtual screening.

development132

torchdrug

PyTorch-based ML platform for drug discovery: graph molecular representation learning, property prediction (ADMET, activity), retrosynthesis, drug-target interaction (DTI), and pretraining on large molecular datasets. Provides GNN layers (GraphConv, GAT, MPNN), pretrained models, and benchmark datasets.

data-ai132

zinc-database

Query ZINC15/ZINC22 virtual compound libraries (1.4B compounds, 750M purchasable). Search lead/fragment/drug-like compounds by MW, logP, reactivity, or SMILES similarity; download 3D sets for docking. For bioactivity use chembl-database-bioactivity; for approved drugs use drugbank-database-access.

development132

scientific-brainstorming

Structured ideation methods: SCAMPER, Six Thinking Hats, Morphological Analysis, TRIZ, Biomimicry, plus more. Decision framework for picking methods by challenge type (stuck, improving, systematic exploration, contradiction). Use when generating research ideas or exploring interdisciplinary connections.

development132

scientific-critical-thinking

Evaluating scientific evidence and claims. Covers study design hierarchy (RCT to expert opinion), effect sizes (OR, RR, NNT, Cohen's d), confounding, p-value vs clinical significance, GRADE quality assessment, reproducibility, and bias types (selection, information, confounding, reporting). Use when reading a paper or assessing claims.

tools132

napari-image-viewer

Interactive viewer for microscopy. Displays 2D/3D/4D arrays as Image, Labels, Points, Shapes, Tracks layers; supports annotation, plugin analysis, headless screenshots. Core visualization for Python bioimage workflows. Use ImageJ/FIJI for macro processing; napari for Python-native interactive visualization and DL segmentation review.

tools132

pydicom-medical-imaging

Pure Python DICOM for medical imaging (CT, MRI, X-ray, ultrasound). Read/write DICOM, pixels as NumPy, edit tags, windowing (VOI LUT), PHI anonymization, build DICOM, series→3D volumes. Use histolab for WSI pathology; nibabel for NIfTI.

development132

nan-safe-correlation

Per-feature NaN-safe Spearman/Pearson correlation across many features (genes, proteins, variants) with missing values. Covers why bulk matrix shortcuts fail, correct pairwise deletion, degenerate input filtering, and large-dataset performance. Use statistical-analysis for test choice; shap-model-explainability for interpretability.

testing132

statistical-analysis

Guided statistical analysis: test choice, assumption checks, effect sizes, power, APA reporting. Pick tests, verify assumptions, or format results for publication. Covers frequentist (t-test, ANOVA, chi-square, regression, correlation, survival, count, reliability) and Bayesian. Use statsmodels or pymc-bayesian-modeling to fit.

development132

kegg-pathway-analysis

Guide to KEGG pathway enrichment for DEG results. Covers ORA vs GSEA, mandatory directionality splitting, KEGG organism codes, API failure handling with offline fallbacks, cross-condition comparisons, and answer-first reporting. Consult when running enrichment with clusterProfiler or gseapy.

development132

scikit-learn-machine-learning

Classical ML in Python: classification, regression, clustering, dim reduction, evaluation, tuning, preprocessing pipelines. Linear models, tree ensembles, SVMs, K-Means, PCA, t-SNE. Use PyTorch/TF for deep learning; XGBoost/LightGBM for scale.

development132

statsmodels-statistical-modeling

Python statistical modeling: regression (OLS, WLS, GLM), discrete (Logit, Poisson, NegBin), time series (ARIMA, SARIMAX, VAR), with rigorous inference, diagnostics, and hypothesis tests. Use scikit-learn for ML; statistical-analysis for test choice.

development132

scikit-image-processing

Python image processing for microscopy and bioimage analysis. Read/write images, filter (Gaussian, median, LoG), segment (thresholding, watershed, active contours), measure region properties, detect features. SciPy/NumPy ecosystem. Use OpenCV for real-time video; CellPose for DL cell segmentation; napari for visualization.

development132

scientific-visualization

Guide for choosing and creating scientific visualizations for publications and talks. Covers chart-type selection by data structure, color theory for accessibility/print, figure composition, journal formatting (Nature, Cell, ACS), and common pitfalls. Consult when visualizing data or preparing submission figures.

development132

trackpy-particle-tracking

Python library for single-particle tracking (SPT) in video microscopy via the Crocker-Grier algorithm. Locate particles (fluorescent spots, colloids, vesicles, cells) per frame, link into trajectories, filter short tracks, and compute MSD for diffusion analysis. 2D/3D with subpixel accuracy; reads TIF stacks, AVI, image series via pims. Use for quantitative SPT and diffusion coefficient extraction from fluorescence or brightfield video.

development132

bcftools-variant-manipulation

CLI for VCF/BCF: filter, merge, annotate, query, normalize, compute stats. Core post-variant-calling: quality filtering, multi-sample merging, rsID annotation, genotype extraction. Samtools companion in HTSlib. Use GATK for complex indel realignment during calling; use VCFtools for population genetics stats.

tools132

bwa-mem2-dna-aligner

Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.

tools132

plotly-interactive-plots

Interactive scientific visualization with Plotly. Two APIs: plotly.express (px) for one-liner DataFrame plots, plotly.graph_objects (go) for trace-level control. 40+ chart types with hover, zoom, pan, animation. Exports HTML or static PNG/SVG/PDF via kaleido. Use for volcano plots with gene hover, dose-response dashboards, expression heatmaps, 3D molecular views. Use seaborn for stats; matplotlib for publication figures.

development132

bioservices-multi-database

Unified Python interface to 40+ bioinformatics web services: UniProt proteins, KEGG pathways, ChEMBL/ChEBI/PubChem, BLAST, cross-database ID mapping, GO annotations, PPI. For deep single-DB queries use dedicated tools (gget for Ensembl, pubchempy for PubChem); bioservices excels at cross-database workflows.

tools132

gget-genomic-databases

Unified CLI/Python interface to 20+ genomic databases. Gene lookups (Ensembl search/info/seq), BLAST/BLAT, AlphaFold, Enrichr enrichment, OpenTargets disease/drug, CELLxGENE single-cell, cBioPortal/COSMIC cancer, ARCHS4 expression. Spans genomics, proteomics, disease. For batch/advanced BLAST use biopython; for multi-DB Python SDK use bioservices.

tools132

pyopenms-mass-spectrometry

MS data processing with PyOpenMS for LC-MS/MS proteomics and metabolomics — mzML/mzXML I/O, signal processing (smoothing, peak picking, centroiding), feature detection/linking, peptide/protein ID with FDR, untargeted metabolomics. Use matchms for simple spectral matching.

data-ai132

etetoolkit

ETE Toolkit (ETE3): Python phylogenetic tree analysis and visualization. Parse Newick/NHX/PhyloXML, traverse/annotate nodes, render figures with TreeStyle/NodeStyle, integrate NCBI taxonomy, run PhyloTree comparative genomics. Use for species trees, gene family evolution, annotated tree figures.

tools132

monarch-database

Monarch Initiative knowledge graph REST API for disease-gene-phenotype associations and cross-species orthology. MONDO disease-to-gene/phenotype, HP phenotype profiles, cross-species comparisons. Use for rare disease gene prioritization and phenotype-based candidate ranking. For GWAS use gwas-database; for clinical pathogenicity use clinvar-database.

tools132

geniml

Python library for genomic interval ML. Train/apply region2vec embeddings turning BED regions into vectors, index interval datasets for ML, search embedding space with BEDSpace, and evaluate embedding quality. Use for chromatin accessibility clustering, regulatory element classification, and cross-sample region comparison.

development132

homer-motif-analysis

De novo and known TF motif enrichment in ChIP-seq/ATAC-seq peaks via HOMER. findMotifsGenome.pl finds over-represented patterns vs background; annotatePeaks.pl assigns context (TSS distance, gene, repeat). Use after MACS3 to identify enriched TFs, annotate peaks with nearest genes, and validate ChIP-seq via the target motif.

testing132

multiqc-qc-reports

Aggregates QC from 150+ bioinformatics tools into one interactive HTML report. Scans FastQC, samtools, STAR, HISAT2, Trim Galore, featureCounts, Kallisto, Salmon, Picard, GATK logs; merges per-sample stats with plots. For NGS pipeline-wide QC. Use FastQC directly for single-sample; MultiQC for multi-sample reporting.

tools132

macs3-peak-calling

Poisson-model peak caller for ChIP-seq/ATAC-seq BAMs. MACS3 callpeak finds enriched regions (TF sites or histone marks) vs input/IgG; outputs BED narrowPeak/broadPeak for motif analysis, annotation, and differential binding. Use narrow peaks for TF ChIP-seq and ATAC-seq; broad for H3K27me3, H3K9me3, and other broad marks.

data-ai132

single-cell-annotation-guide

Decision framework for manual marker-based, automated (CellTypist), and reference-based (popV) cell type annotation in scRNA-seq. Three-tier strategy: Tier 1 manual markers, Tier 2 CellTypist, Tier 3 popV ensemble transfer. Use when planning or troubleshooting annotation.

development132

pysam-genomic-files

Read/write SAM/BAM/CRAM, VCF/BCF, FASTA/FASTQ. Region queries, pileup, variant filtering, read groups. Python htslib wrapper exposing samtools/bcftools CLI. Use STAR/BWA for alignment; GATK/DeepVariant for variant calling.

tools132

salmon-rna-quantification

Ultra-fast RNA-seq transcript/gene quantification via quasi-mapping (no BAM). Builds a k-mer index from transcriptome FASTA, quantifies in minutes. Outputs TPM/count tables (quant.sf) with optional GC- and sequence-bias correction. Integrates with tximeta/tximport for DESeq2/edgeR. Use STAR when a genome-aligned BAM is needed.

development132

snpeff-variant-annotation

Annotate and filter VCF variants with SnpEff and SnpSift. SnpEff predicts functional effects (HIGH/MODERATE/LOW/MODIFIER), genes, transcripts, AA changes, HGVS; SnpSift filters and adds ClinVar/dbSNP. Java CLI with Python subprocess integration. Use ANNOVAR for multi-database annotation; Ensembl VEP for REST API; SnpEff for fast CLI with pre-built genomes.

tools132

samtools-bam-processing

CLI toolkit for SAM/BAM/CRAM: sort, index, convert, filter, QC alignments. Core commands: view, sort, index, flagstat, stats, depth, markdup, merge. Required between alignment and variant/peak calling. Use pysam for Python-native BAM access; deeptools for normalized coverage tracks.

tools132

exploratory-data-analysis

Methodology for exploratory data analysis on scientific files. Decision frameworks by data type (tabular, sequence, image, spectral, structural, omics), quality assessment, report generation, format detection across 200+ formats. Use when given a data file for initial exploration or to pick an analysis before a pipeline.

development132

hypogenic-hypothesis-generation

LLM-driven hypothesis generation/testing on tabular data. Three methods: HypoGeniC (data-driven), HypoRefine (literature+data), Union. Iterative refinement, Redis caching, multi-hypothesis inference. Manual: hypothesis-generation; ideation: scientific-brainstorming.

testing132

cobrapy-metabolic-modeling

Constraint-based (COBRA) analysis of genome-scale metabolic models: FBA, FVA, knockouts, flux sampling, production envelopes, gapfilling, media optimization. Use for strain design, essential gene ID, flux analysis. For kinetic modeling use tellurium; for visualization use Escher.

data-ai132

remap-database

Query ReMap 2022 TF ChIP-seq peak database via REST API and BED downloads. Retrieve TF peaks overlapping a region (chr:start-end), peaks near a gene, TFs by species, peaks filtered by biotype (promoter, enhancer), and BED files for a TF-cell type pair. Use for TF co-occupancy, regulatory annotation, and TF binding atlases. Use jaspar-database for PWM motifs; encode-database for ENCODE tracks.

development132

viennarna-structure-prediction

Predict RNA secondary structure, MFE folding, base-pair probabilities, RNA-RNA interactions via ViennaRNA Python bindings. Pipeline: sequence → MFE → partition function and pair-probability matrix → dot-bracket → duplex. Use for siRNA/sgRNA targeting, ribozyme design, RNA accessibility. Use RNAfold CLI for batch use without Python.

tools132

matchms-spectral-matching

MS spectral matching and metabolite ID with matchms. Import spectra (mzML, MGF, MSP, JSON), filter/normalize peaks, score similarity (cosine, modified cosine, fingerprint), build reproducible pipelines, identify unknowns vs spectral libraries. Use pyopenms for full LC-MS/MS proteomics.

development132

pymoo

Python framework for single- and multi-objective optimization with evolutionary algorithms. Define vectorized objectives and constraints; solve with NSGA-II, NSGA-III, MOEA/D, GAs, or differential evolution. Analyze Pareto fronts, visualize trade-offs, customize operators and callbacks. For engineering design, hyperparameter search, and conflicting objectives. Alternatives: scipy.optimize (single-objective, gradient), platypus, jMetalPy (Java).

development132

neuropixels-analysis

Pipeline for Neuropixels extracellular electrophysiology: probe geometry (ProbeInterface), Kilosort sorting via SpikeInterface, quality metrics, unit curation (ISI, firing rate, SNR), post-sort analysis (PSTH, tuning curves, population decoding). Supports Neuropixels 1.0/2.0/Ultra in rodent/primate experiments.

testing132

literature-review

Conducting systematic, scoping, and narrative literature reviews. Covers PRISMA/PRISMA-ScR protocols, search strategy (Boolean, MeSH), database selection (PubMed, Scopus, Web of Science, Embase), screening, data extraction, evidence synthesis (narrative, meta-analysis, thematic), and reporting. Use when planning or executing a formal literature review.

development132

cnvkit-copy-number

Detect somatic CNVs from WES/WGS/targeted BAMs (CNVkit v0.9.x). Bin coverage in target/antitarget regions, normalize vs reference, segment with CBS/HMM, call amps/dels, scatter/diagram plots, purity/ploidy, VCF/SEG export. CLI plus Python API (cnvlib). Use GATK CNV for deep WGS with population controls; use CNVkit for targeted/exome where antitarget bins matter.

tools132

deepchem

Deep learning for drug discovery. 60+ models (GCN, GAT, AttentiveFP, MPNN, ChemBERTa, GROVER), 50+ featurizers, MoleculeNet benchmarks, HPO, transfer learning. Unified load-featurize-split-train-evaluate API. For fingerprints use rdkit-cheminformatics; for featurization-only use molfeat.

development132

spikeinterface-electrophysiology

Unified Python framework for extracellular electrophysiology. Load 20+ formats (SpikeGLX, OpenEphys, NWB, Intan, Maxwell, Blackrock), preprocess, run 10+ sorters (Kilosort4, SpykingCircus2, Tridesclous, MountainSort5) via one API, compute quality metrics (SNR, ISI, firing rate), compare sorters, export NWB/Phy. For format-agnostic multi-sorter workflows. For Neuropixels-specific PSTH/decoding use neuropixels.

development132

polars-dataframes

Fast in-memory DataFrame with lazy evaluation, parallel execution, Arrow backend. Use for tabular data in RAM (1–100 GB) when pandas is too slow. Expression API: select, filter, group_by, joins, pivots, window. Lazy mode enables predicate/projection pushdown. Reads CSV, Parquet, JSON, Excel, DBs, cloud. Larger-than-RAM: Dask; GPU: cuDF.

development132

peer-review-methodology

Structured peer review of manuscripts and grants. 7-stage evaluation: initial assessment, section review, statistical rigor, reproducibility, figure integrity, ethics, writing. Covers CONSORT/STROBE/PRISMA and report structure. For evidence quality see scientific-critical-thinking; scoring see scholar-evaluation.

development132

scientific-schematics

Designing scientific schematics, diagrams, and graphical abstracts. Covers tool selection (BioRender, Inkscape, Affinity, PowerPoint), design principles for pathway diagrams, mechanism schematics, experimental workflows, and journal graphical abstracts. Includes composition, icon sourcing, color for biological entities, and accessibility. Use when creating illustrative (not data-driven) scientific figures.

tools132

vaex-dataframes

Out-of-core DataFrame for billion-row data via lazy evaluation and memory-mapped files. Use when data exceeds RAM (10 GB–TB) for fast aggregation, filtering, virtual columns, and visualization without loading. Supports HDF5, Arrow, Parquet, CSV with cloud (S3, GCS, Azure). Built-in ML transformers (scaling, PCA, K-means). In-memory: polars; distributed: dask.

devops132

umap-learn

UMAP dimensionality reduction for visualization, clustering prep, and feature engineering. Fast nonlinear manifold learning preserving local and global structure. Standard UMAP (fit/transform, sklearn-compatible), supervised/semi-supervised, Parametric UMAP (NN encoder/decoder, TensorFlow), DensMAP (density), AlignedUMAP (temporal/batch). 15+ distance metrics, custom Numba metrics, precomputed distances. For linear reduction use PCA; for neighborhood graphs use sklearn NearestNeighbors.

development132

torchdrug

PyTorch-based ML platform for drug discovery: graph molecular representation learning, property prediction (ADMET, activity), retrosynthesis, drug-target interaction (DTI), and pretraining on large molecular datasets. Provides GNN layers (GraphConv, GAT, MPNN), pretrained models, and benchmark datasets.

data-ai132

libsbml-network-modeling

Build, read, validate, modify SBML biological network models via the libSBML Python API. SBML Levels 1–3, reactions/kinetic laws, species, rules, FBC extension for flux balance, conversion. Interoperates with COBRApy, Tellurium/RoadRunner, COPASI. Use when programmatically constructing ODE or constraint-based metabolic/signaling models in SBML.

tools132

ddinter-database

Query DDInter drug-drug interactions via REST API (1.7M+ interactions, 2,400+ drugs). Search by drug name/ID for severity (major/moderate/minor), mechanisms, and clinical recommendations. No auth. For FDA labeling use dailymed-database; for pharmacogenomics use clinpgx-database.

tools132

mofaplus-multi-omics

Multi-Omics Factor Analysis v2 (MOFA+) with mofapy2. Jointly decompose omics layers (scRNA, ATAC, proteomics, methylation) into latent factors capturing major variation. Multi-group designs. AnnData views → MOFA object → train → variance explained → correlate factors with metadata → visualize/cluster → enrich top loadings.

data-ai132

kegg-pathway-analysis

Guide to KEGG pathway enrichment for DEG results. Covers ORA vs GSEA, mandatory directionality splitting, KEGG organism codes, API failure handling with offline fallbacks, cross-condition comparisons, and answer-first reporting. Consult when running enrichment with clusterProfiler or gseapy.

development132

rdkit-cheminformatics

Cheminformatics toolkit for molecular analysis and virtual screening: SMILES/SDF parsing, descriptors (MW, LogP, TPSA), fingerprints (Morgan/ECFP, MACCS), Tanimoto similarity, SMARTS substructure filtering, Lipinski drug-likeness, reaction enumeration, 2D/3D coordinates. For simpler API use datamol; use RDKit for fine-grained sanitization, custom fingerprints, or SMARTS/reaction control.

tools132

rowan

Cloud quantum chemistry platform with Python SDK. Run geometry optimization, conformer generation, torsional scans, and energy minimization (DFT/semiempirical), and retrieve properties (dipole, partial charges, frontier orbitals) — no local QC software or HPC needed.

development132

science-figure-guide

Science (AAAS) figure preparation: resolution (150-300+ DPI), formats (PDF/EPS/TIFF), RGB color, Myriad/Helvetica fonts, strict image manipulation policies including gamma adjustment disclosure.

development132

clinicaltrials-database-search

Query ClinicalTrials.gov API v2 for trial data. Search by condition, drug/intervention, location, sponsor, or phase; fetch details by NCT ID; filter by status; paginate; export CSV. For clinical research, patient matching, and trial portfolio analysis.

tools132

clinicaltrials-database-search

Query ClinicalTrials.gov API v2 for trial data. Search by condition, drug/intervention, location, sponsor, or phase; fetch details by NCT ID; filter by status; paginate; export CSV. For clinical research, patient matching, and trial portfolio analysis.

tools132

clinical-decision-support-documents

Guidelines for clinical decision support (CDS) documents: biomarker-stratified cohort analyses and GRADE-graded treatment reports. Covers structure, executive summaries, evidence grading (1A–2C), stats (HR, CI, survival), and biomarker integration. Use for pharma research docs, clinical guidelines, regulatory submissions.

tools132

citation-management

Selecting a reference manager and applying citation styles. Compares Zotero, Mendeley, EndNote, Paperpile; covers APA/Vancouver/ACS/Nature styles, DOI management, citation tracking, and Word/Google Docs/LaTeX integration. Use when setting up a reference workflow or fixing citation formatting.

development132

cellxgene-census

Query CELLxGENE Census (61M+ cells). Search by cell type/tissue/disease/organism; get AnnData, stream out-of-core, train PyTorch models. For your own data use scanpy; for annotated data use anndata.

data-ai132

cellxgene-census

Query CELLxGENE Census (61M+ cells). Search by cell type/tissue/disease/organism; get AnnData, stream out-of-core, train PyTorch models. For your own data use scanpy; for annotated data use anndata.

data-ai132

celltypist-cell-annotation

Automated scRNA-seq cell type annotation via pre-trained logistic regression. 45+ models: immune, gut, lung, brain, fetal, cancer microenvironments. Input normalized AnnData; outputs per-cell labels, majority-vote cluster labels, confidence scores. Use for fast, reference-backed annotation without manual marker inspection.

data-ai132

celltypist-cell-annotation

Automated scRNA-seq cell type annotation via pre-trained logistic regression. 45+ models: immune, gut, lung, brain, fetal, cancer microenvironments. Input normalized AnnData; outputs per-cell labels, majority-vote cluster labels, confidence scores. Use for fast, reference-backed annotation without manual marker inspection.

data-ai132

cellpose-cell-segmentation

DL cell/nucleus segmentation for fluorescence and brightfield microscopy. Pre-trained models (cyto3, nuclei, tissuenet) and a generalist flow-based algorithm segment cells without retraining. Outputs label masks for morphology and tracking. Use scikit-image watershed for rule-based; Cellpose when DL generalization across staining is needed.

development132

cellpose-cell-segmentation

DL cell/nucleus segmentation for fluorescence and brightfield microscopy. Pre-trained models (cyto3, nuclei, tissuenet) and a generalist flow-based algorithm segment cells without retraining. Outputs label masks for morphology and tracking. Use scikit-image watershed for rule-based; Cellpose when DL generalization across staining is needed.

development132

cell-figure-guide

Cell (Cell Press) figure preparation: resolution (300-1000 DPI), formats (TIFF/PDF), RGB color, Avenir/Arial fonts, uppercase panel labels, strict image manipulation policies.

development132

cellchat-cell-communication

Infer and visualize intercellular communication from scRNA-seq with CellChat (R). Build CellChat from Seurat/counts → subset CellChatDB ligand-receptor pairs → over-expressed genes per group → communication probabilities → pathway signaling → network centrality (senders/receivers/influencers) → chord/heatmap/bubble plots → cross-condition compare. Human, mouse. Use liana for pure-Python.

development132

cbioportal-database

Cancer genomics (TCGA et al.) via cBioPortal REST API. Retrieve somatic mutations, CNAs, expression, clinical data (survival/stage/treatment) across thousands of studies. Use for TMB, oncoprints, survival analysis. For population frequencies use gnomad-database; for drug-gene interactions use dgidb-database.

tools132

cbioportal-database

Cancer genomics (TCGA et al.) via cBioPortal REST API. Retrieve somatic mutations, CNAs, expression, clinical data (survival/stage/treatment) across thousands of studies. Use for TMB, oncoprints, survival analysis. For population frequencies use gnomad-database; for drug-gene interactions use dgidb-database.

tools132

cancer-research-figure-guide

Cancer Research (AACR) figures: resolution (300-1200 DPI), formats (EPS/TIFF/AI), hierarchical panel labels (Ai, Aii, Bi), figure/table limits, legend requirements with replicate counts.

development132

cancer-research-figure-guide

Cancer Research (AACR) figures: resolution (300-1200 DPI), formats (EPS/TIFF/AI), hierarchical panel labels (Ai, Aii, Bi), figure/table limits, legend requirements with replicate counts.

development132

busco-status-interpretation

Guide to interpreting BUSCO completeness statuses: why Duplicated BUSCOs count as complete, parsing output files, computing/comparing completeness across proteomes/genomes, common counting mistakes. Use when running BUSCO QC, comparing assemblies, or reporting completeness. See also: prokka-genome-annotation for annotation workflows feeding BUSCO.

documentation132

brenda-database

BRENDA Enzyme DB SOAP/REST queries: kinetic parameters (Km, Vmax, kcat, Ki), EC classes, substrate specificity, inhibitors, cofactors, organism data. 80K+ enzymes, 7M+ values. Free academic registration. For metabolic modeling use cobrapy-metabolic-modeling; metabolites use hmdb-database.

testing132

brenda-database

BRENDA Enzyme DB SOAP/REST queries: kinetic parameters (Km, Vmax, kcat, Ki), EC classes, substrate specificity, inhibitors, cofactors, organism data. 80K+ enzymes, 7M+ values. Free academic registration. For metabolic modeling use cobrapy-metabolic-modeling; metabolites use hmdb-database.

testing132

bioservices-multi-database

Unified Python interface to 40+ bioinformatics web services: UniProt proteins, KEGG pathways, ChEMBL/ChEBI/PubChem, BLAST, cross-database ID mapping, GO annotations, PPI. For deep single-DB queries use dedicated tools (gget for Ensembl, pubchempy for PubChem); bioservices excels at cross-database workflows.

tools132

deeptools-ngs-analysis

NGS CLI for ChIP/RNA/ATAC-seq. BAM→bigWig with RPGC/CPM/RPKM, sample correlation/PCA, heatmaps/profiles around features, fingerprints. For alignment use STAR/BWA; for peak calling use MACS2.

tools132

biopython-sequence-analysis

Biopython sequence analysis: parse FASTA/FASTQ/GenBank/GFF (SeqIO), NCBI Entrez (esearch/efetch/elink), remote/local BLAST, pairwise/MSA alignment (PairwiseAligner, MUSCLE/ClustalW), phylogenetic trees (Phylo). Use for gene family studies, phylogenomics, comparative genomics, NCBI pipelines. For PCR/restriction/cloning use biopython-molecular-biology; for SAM/BAM use pysam.

development132

biopython-molecular-biology

Molecular biology toolkit: sequence manipulation, FASTA/GenBank/PDB I/O, NCBI Entrez, BLAST automation, pairwise/MSA alignment, Bio.PDB, phylogenetic trees. Use for batch processing, custom pipelines, format conversion, PubMed/GenBank queries. For quick gene lookups use gget; for multi-service REST APIs use bioservices.

tools132

bedtools-genomic-intervals

Genomic interval ops on BED/BAM/GFF/VCF. Find overlaps, merge intervals, compute coverage, extract FASTA, find nearest features. Core for ChIP-seq peak annotation, region filtering, genome arithmetic. Use tabix for indexed single-region queries; use deeptools for normalized bigWig coverage.

tools132

bcftools-variant-manipulation

CLI for VCF/BCF: filter, merge, annotate, query, normalize, compute stats. Core post-variant-calling: quality filtering, multi-sample merging, rsID annotation, genotype extraction. Samtools companion in HTSlib. Use GATK for complex indel realignment during calling; use VCFtools for population genetics stats.

tools132

autodock-vina-docking

Molecular docking with AutoDock Vina (Python API). Receptor/ligand prep (Meeko + RDKit), grid box, docking, pose and binding energy analysis, and batch virtual screening.

development132

astropy-astronomy

Core Python library for astronomy/astrophysics: units with dimensional analysis, celestial coordinate transforms (ICRS/Galactic/AltAz/FK5), FITS I/O, tables (FITS/HDF5/VOTable/CSV), cosmology (Planck18, distance/age), precise time (UTC/TAI/TT/TDB, Julian, barycentric), WCS pixel-world mapping, model fitting. For general tables use pandas/polars; for radio interferometry use CASA.

development132

astropy-astronomy

Core Python library for astronomy/astrophysics: units with dimensional analysis, celestial coordinate transforms (ICRS/Galactic/AltAz/FK5), FITS I/O, tables (FITS/HDF5/VOTable/CSV), cosmology (Planck18, distance/age), precise time (UTC/TAI/TT/TDB, Julian, barycentric), WCS pixel-world mapping, model fitting. For general tables use pandas/polars; for radio interferometry use CASA.

development132

archs4-database

Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr).

development132

archs4-database

Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr).

development132

arboreto-grn-inference

GRN inference from expression via GRNBoost2 (gradient boosting) or GENIE3 (Random Forest). Load matrix, filter by TFs, infer TF-target-importance links, save network. Dask-parallelized to single-cell scale. Core SCENIC component.

tools132

arboreto-grn-inference

GRN inference from expression via GRNBoost2 (gradient boosting) or GENIE3 (Random Forest). Load matrix, filter by TFs, infer TF-target-importance links, save network. Dask-parallelized to single-cell scale. Core SCENIC component.

tools132

anndata-data-structure

Annotated matrices for single-cell genomics. Stores X with obs/var metadata, layers, embeddings (obsm/varm), graphs (obsp/varp), uns. Use for .h5ad/.zarr I/O, concatenation, scverse integration. For analysis use scanpy; for probabilistic models use scvi-tools.

tools132

alphafold-database-access

Access AlphaFold DB's 200M+ predicted structures by UniProt ID. Download PDB/mmCIF, analyze pLDDT/PAE, bulk-fetch proteomes via Google Cloud. For experimental structures use PDB; for prediction use ColabFold or ESMFold.

development132

alphafold-database-access

Access AlphaFold DB's 200M+ predicted structures by UniProt ID. Download PDB/mmCIF, analyze pLDDT/PAE, bulk-fetch proteomes via Google Cloud. For experimental structures use PDB; for prediction use ColabFold or ESMFold.

development132

scanpy-scrna-seq

scRNA-seq with Scanpy: QC, normalization, HVG selection, PCA, neighborhood graph, UMAP/t-SNE, Leiden clustering, markers, cell annotation, trajectory inference. Standard scRNA-seq exploration.

tools132

aeon

scikit-learn compatible Python toolkit for time series ML: classify, cluster, regress, segment, transform with 30+ algorithms (ROCKET, InceptionTime, KNN-DTW, HIVE-COTE, WEASEL). Handles panel, multivariate, and unequal-length series. Maintained successor to sktime. Alternatives: sktime (larger ecosystem), tslearn (fewer algorithms), catch22 (features only).

tools132

aeon

scikit-learn compatible Python toolkit for time series ML: classify, cluster, regress, segment, transform with 30+ algorithms (ROCKET, InceptionTime, KNN-DTW, HIVE-COTE, WEASEL). Handles panel, multivariate, and unequal-length series. Maintained successor to sktime. Alternatives: sktime (larger ecosystem), tslearn (fewer algorithms), catch22 (features only).

tools132

adaptyv-bio

API + Python SDK for ordering cell-free protein expression and binding assays. Submit sequences for expression (10–100 µg), measure binding affinity (KD) against targets, track status, and retrieve results programmatically — no wet-lab setup. Built for ML-guided directed evolution and antibody/nanobody optimization. Requires Adaptyv account and API key.

development132

adaptyv-bio

API + Python SDK for ordering cell-free protein expression and binding assays. Submit sequences for expression (10–100 µg), measure binding affinity (KD) against targets, track status, and retrieve results programmatically — no wet-lab setup. Built for ML-guided directed evolution and antibody/nanobody optimization. Requires Adaptyv account and API key.

development132

deepchem

Deep learning for drug discovery. 60+ models (GCN, GAT, AttentiveFP, MPNN, ChemBERTa, GROVER), 50+ featurizers, MoleculeNet benchmarks, HPO, transfer learning. Unified load-featurize-split-train-evaluate API. For fingerprints use rdkit-cheminformatics; for featurization-only use molfeat.

development132

ddinter-database

Query DDInter drug-drug interactions via REST API (1.7M+ interactions, 2,400+ drugs). Search by drug name/ID for severity (major/moderate/minor), mechanisms, and clinical recommendations. No auth. For FDA labeling use dailymed-database; for pharmacogenomics use clinpgx-database.

tools132

datamol-cheminformatics

Pythonic RDKit wrapper with sensible defaults for drug discovery. SMILES parsing, standardization, descriptors, fingerprints, similarity, clustering, diversity selection, scaffold analysis, BRICS/RECAP fragmentation, 3D conformers, and visualization. Returns native rdkit.Chem.Mol. Prefer datamol for standard workflows; use RDKit directly for advanced control.

development132

dask-parallel-computing

Parallel/distributed computing for larger-than-RAM data. Components: DataFrames (parallel pandas), Arrays (parallel NumPy), Bags, Futures, Schedulers. Scales laptop to HPC cluster. For single-machine speed use polars; for out-of-core without cluster use vaex.

data-ai132

dask-parallel-computing

Parallel/distributed computing for larger-than-RAM data. Components: DataFrames (parallel pandas), Arrays (parallel NumPy), Bags, Futures, Schedulers. Scales laptop to HPC cluster. For single-machine speed use polars; for out-of-core without cluster use vaex.

data-ai132

dailymed-database

Query FDA drug labels (DailyMed) via REST API. Search structured product labels (SPLs) by name, NDC, set ID, or RxCUI; get indications, dosage, warnings, adverse reactions, packaging. No auth. For adverse events use fda-database; for DDIs use ddinter-database.

development132

dailymed-database

Query FDA drug labels (DailyMed) via REST API. Search structured product labels (SPLs) by name, NDC, set ID, or RxCUI; get indications, dosage, warnings, adverse reactions, packaging. No auth. For adverse events use fda-database; for DDIs use ddinter-database.

development132

cosmic-database

Query COSMIC for cancer somatic mutations, gene census, mutational signatures, drug resistance variants. REST API v3.1 supports gene/sample/variant queries; free registration. For germline use clinvar-database; for drug-target data use opentargets-database or chembl-database-bioactivity.

tools132

salmon-rna-quantification

Ultra-fast RNA-seq transcript/gene quantification via quasi-mapping (no BAM). Builds a k-mer index from transcriptome FASTA, quantifies in minutes. Outputs TPM/count tables (quant.sf) with optional GC- and sequence-bias correction. Integrates with tximeta/tximport for DESeq2/edgeR. Use STAR when a genome-aligned BAM is needed.

development132

gseapy-gene-enrichment

GSEA and over-representation analysis (ORA) for RNA-seq and proteomics. Wraps Enrichr for ORA against MSigDB, KEGG, GO, and 200+ databases; runs preranked GSEA on ranked DE gene lists. Outputs enrichment tables and running-score plots. Use after DESeq2 or edgeR for pathway-level interpretation.

development132

prokka-genome-annotation

Annotate prokaryotic genomes (bacteria, archaea, viruses) via Prokka's BLAST/HMM pipeline. Identifies CDS, rRNA, tRNA, tmRNA, signal peptides against Pfam, TIGRFAMs, RefSeq. Outputs GFF3, GenBank, FASTA, TSV. Use PGAP for NCBI GenBank submission; Bakta for faster NCBI-compatible annotation.

devops132

fastp-fastq-preprocessing

All-in-one FASTQ QC and adapter trimming. Auto-detects Illumina adapters, filters low-quality reads, corrects paired-end overlaps, emits HTML+JSON QC in one pass. 3-10x faster than Trim Galore/Trimmomatic. First step before STAR, BWA-MEM2, or Salmon.

development132

plink2-gwas-analysis

GWAS and population genetics tool. Processes PLINK (.bed/.bim/.fam), VCF, and BGEN; runs QC (MAF, HWE, missingness), IBD estimation, PCA, and linear/logistic regression GWAS. Outputs Manhattan-ready summary stats. Use regenie or SAIGE for biobanks (>100k samples) needing mixed models.

tools132

sympy-symbolic-math

Symbolic math in Python: exact algebra, calculus (derivatives, integrals, limits), equation solving, symbolic matrices, ODEs, code gen (lambdify, C/Fortran). Use for exact symbolic results. For numerical use numpy/scipy; for stats use statsmodels.

development132

fastp-fastq-preprocessing

All-in-one FASTQ QC and adapter trimming. Auto-detects Illumina adapters, filters low-quality reads, corrects paired-end overlaps, emits HTML+JSON QC in one pass. 3-10x faster than Trim Galore/Trimmomatic. First step before STAR, BWA-MEM2, or Salmon.

development132

jaspar-database

JASPAR 2024 TF binding profiles via REST API and pyJASPAR. Retrieve PFMs/PWMs by TF name, JASPAR ID, species, or structural class. Scan DNA for TFBS; browse by taxon (human, mouse) or TF family (bHLH, zinc finger). Use for motif enrichment input, TFBS scanning, and regulatory sequence analysis. For ChIP-seq peak motif discovery use homer-motif-analysis; for regulatory variant scoring use regulomedb-database.

development132

hypothesis-generation

Structured hypothesis formulation: turn observations into testable hypotheses with predictions, propose mechanisms, design experiments. Follows the scientific method. Use scientific-brainstorming for open ideation; hypogenic for automated LLM hypothesis testing on datasets.

testing132

pubmed-database

Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.

development132

jaspar-database

JASPAR 2024 TF binding profiles via REST API and pyJASPAR. Retrieve PFMs/PWMs by TF name, JASPAR ID, species, or structural class. Scan DNA for TFBS; browse by taxon (human, mouse) or TF family (bHLH, zinc finger). Use for motif enrichment input, TFBS scanning, and regulatory sequence analysis. For ChIP-seq peak motif discovery use homer-motif-analysis; for regulatory variant scoring use regulomedb-database.

development132

prokka-genome-annotation

Annotate prokaryotic genomes (bacteria, archaea, viruses) via Prokka's BLAST/HMM pipeline. Identifies CDS, rRNA, tRNA, tmRNA, signal peptides against Pfam, TIGRFAMs, RefSeq. Outputs GFF3, GenBank, FASTA, TSV. Use PGAP for NCBI GenBank submission; Bakta for faster NCBI-compatible annotation.

devops132

pubmed-database

Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.

development132

geo-database

NCBI GEO access via GEOparse and E-utilities. Search by keyword/organism/platform, download GSE series matrices, parse GPL annotations, extract GSM metadata, load expression matrices into pandas. For single-cell use cellxgene-census; for multi-DB access use gget-genomic-databases.

data-ai132

geo-database

NCBI GEO access via GEOparse and E-utilities. Search by keyword/organism/platform, download GSE series matrices, parse GPL annotations, extract GSM metadata, load expression matrices into pandas. For single-cell use cellxgene-census; for multi-DB access use gget-genomic-databases.

data-ai132

opentrons-protocol-api

Python API v2 for Opentrons OT-2/Flex liquid handlers: protocols as Python files with metadata and run(); control pipettes, labware, and modules (thermocycler, heater-shaker, magnetic, temperature). Simulate via opentrons_simulate then upload. Use PyLabRobot for vendor-agnostic scripts (Hamilton, Tecan).

development132

opencv-bioimage-analysis

Computer vision for bio-image preprocessing, feature detection, real-time microscopy. Color conversion, morphology, contour/blob detection, template matching, optical flow on fluorescence/brightfield. 10-100× faster than pure Python via C++. Use scikit-image for scientific morphometry/regionprops; OpenCV for real-time, video, classical feature extraction.

development132

cellchat-cell-communication

Infer and visualize intercellular communication from scRNA-seq with CellChat (R). Build CellChat from Seurat/counts → subset CellChatDB ligand-receptor pairs → over-expressed genes per group → communication probabilities → pathway signaling → network centrality (senders/receivers/influencers) → chord/heatmap/bubble plots → cross-condition compare. Human, mouse. Use liana for pure-Python.

development132

transformers-bio-nlp

HuggingFace Transformers with biomedical LMs (BioBERT, PubMedBERT, BioGPT, BioMedLM) for scientific NLP: NER (genes, diseases, chemicals), relation extraction, QA, text classification, abstract summarization. Covers loading, biomedical tokenization, inference pipelines, fine-tuning. Alternatives: spaCy en_core_sci_lg (rule-based NER), Stanza (biomedical models), NLTK.

testing132

simpy-discrete-event-simulation

Process-based discrete-event simulation. Model queues, shared resources, timed events: manufacturing, service ops, network traffic, logistics. Processes are Python generators yielding events. Resources: capacity-limited (Resource/Priority/Preemptive), bulk (Container), objects (Store, FilterStore). For continuous use SciPy ODEs; for agent-based use Mesa.

development132

maxquant-proteomics

MaxQuant + Perseus proteomics pipeline: run MaxQuant for LFQ and SILAC; parse proteinGroups.txt in Python; filter contaminants/decoys; log2 + median-normalize; impute MNAR; t-test with FDR; volcano plot; GO/pathway enrichment. Use Proteome Discoverer for Thermo-native processing; FragPipe/MSFragger for GPU-accelerated DB search.

development132

shap-model-explainability

Model interpretability via SHAP (Shapley values from game theory). Covers explainer choice (Tree, Deep, Linear, Kernel, Gradient, Permutation), feature attribution, and plots (waterfall, beeswarm, bar, scatter, force, heatmap). Use to explain ML predictions, rank features, debug models, audit fairness, or compare models. Works with tree, deep, linear, and black-box models.

development132

pathml

Computational pathology toolkit for whole-slide images (WSIs): load slides, extract tiles, stain normalization, nuclear segmentation, feature extraction, and ML training. Supports H&E and multiplex. For end-to-end pipelines from raw WSIs to quantitative outputs.

tools132

busco-status-interpretation

Guide to interpreting BUSCO completeness statuses: why Duplicated BUSCOs count as complete, parsing output files, computing/comparing completeness across proteomes/genomes, common counting mistakes. Use when running BUSCO QC, comparing assemblies, or reporting completeness. See also: prokka-genome-annotation for annotation workflows feeding BUSCO.

documentation132

deeptools-ngs-analysis

NGS CLI for ChIP/RNA/ATAC-seq. BAM→bigWig with RPGC/CPM/RPKM, sample correlation/PCA, heatmaps/profiles around features, fingerprints. For alignment use STAR/BWA; for peak calling use MACS2.

tools132

pytdc-therapeutics-data-commons

Therapeutics Data Commons (TDC) AI-ready drug discovery datasets. Curated ADME, toxicity, DTI, DDI with scaffold/cold splits, standardized metrics, molecular oracles, and ADMET benchmarks for therapeutic ML and property prediction. For chemical database queries use chembl-database-bioactivity; for featurization use molfeat.

testing132

muon-multiomics-singlecell

Multi-modal single-cell analysis with muon/MuData. Joint RNA+ATAC (10x Multiome), CITE-seq (RNA+protein), other multi-omics. MuData holds per-modality AnnData with shared obs. WNN joint embedding, per-modality preprocessing, MOFA factor analysis. Use scanpy-scrna-seq for single-modality RNA; use muon when combining 2+ omics from the same cells.

testing132

vcf-variant-filtering

Guide to quality filtering raw VCF files before computing summary stats (Ts/Tv ratio, variant counts, AF distributions). Covers detecting raw VCFs via FILTER column and QUAL inspection, QUAL-based filtering with bcftools, Ts/Tv interpretation, and when NOT to filter. Read before any variant-level QC task. See bcftools-variant-manipulation for advanced filters, gatk-variant-calling for caller config, samtools-bam-processing for upstream alignment QC.

tools132

featurecounts-rna-counting

Counts RNA-seq reads overlapping GTF gene features. Takes sorted STAR BAMs plus GTF; outputs a per-gene tab-delimited matrix across samples. Handles strandedness (0/1/2), paired-end, multi-sample batch counting in one command, and outputs assignment statistics. Use Salmon for alignment-free quantification; use featureCounts when STAR BAMs already exist.

data-ai132

muon-multiomics-singlecell

Multi-modal single-cell analysis with muon/MuData. Joint RNA+ATAC (10x Multiome), CITE-seq (RNA+protein), other multi-omics. MuData holds per-modality AnnData with shared obs. WNN joint embedding, per-modality preprocessing, MOFA factor analysis. Use scanpy-scrna-seq for single-modality RNA; use muon when combining 2+ omics from the same cells.

testing132

neurokit2

Python toolkit for neurophysiological signal processing: ECG (HR, HRV, R-peaks), EEG (complexity, PSD), EMG (activation onset), EDA/GSR (SCR decomposition), PPG, and RSP. Includes synthetic signal simulation. Alternatives: BioSPPy (less maintained), MNE (EEG/MEG specialist), heartpy (ECG only), scipy.signal (raw DSP).

tools132

nature-figure-guide

Nature figure preparation: resolution (300+ DPI), formats (AI/EPS/TIFF), RGB color, Helvetica/Arial fonts, lowercase panel labels, image integrity requirements.

development132

datamol-cheminformatics

Pythonic RDKit wrapper with sensible defaults for drug discovery. SMILES parsing, standardization, descriptors, fingerprints, similarity, clustering, diversity selection, scaffold analysis, BRICS/RECAP fragmentation, 3D conformers, and visualization. Returns native rdkit.Chem.Mol. Prefer datamol for standard workflows; use RDKit directly for advanced control.

development132

plink2-gwas-analysis

GWAS and population genetics tool. Processes PLINK (.bed/.bim/.fam), VCF, and BGEN; runs QC (MAF, HWE, missingness), IBD estimation, PCA, and linear/logistic regression GWAS. Outputs Manhattan-ready summary stats. Use regenie or SAIGE for biobanks (>100k samples) needing mixed models.

tools132

featurecounts-rna-counting

Counts RNA-seq reads overlapping GTF gene features. Takes sorted STAR BAMs plus GTF; outputs a per-gene tab-delimited matrix across samples. Handles strandedness (0/1/2), paired-end, multi-sample batch counting in one command, and outputs assignment statistics. Use Salmon for alignment-free quantification; use featureCounts when STAR BAMs already exist.

data-ai132

simpleitk-image-registration

Register, segment, filter, resample 3D medical images (MRI, CT, microscopy) via SimpleITK Python; DICOM, NIfTI, multi-modal. Rigid/affine/deformable registration, threshold/region-growing segmentation, Gaussian/morph filtering, label stats, format conversion. Use to align volumes across timepoints/modalities, segment fluorescence, or convert DICOM→NIfTI.

development132

snakemake-workflow-engine

Python-based workflow manager for reproducible, scalable pipelines. Define rules with file-based dependencies; Snakemake resolves execution order and parallelism. Runs local, SLURM, LSF, AWS, GCP via profiles; per-rule conda/Singularity envs. For NGS pipelines, ML training, and multi-step file processing. Use Nextflow for Groovy dataflow or nf-core integration.

development132

pyimagej-fiji-bridge

Python bridge to ImageJ2/Fiji for macros, plugins (Bio-Formats, TrackMate, Analyze Particles), NumPy↔ImagePlus/ImgLib2 exchange, and ImageJ Ops. Automates Fiji headlessly from Python. Use scikit-image for pure Python without Fiji plugins; napari for visualization.

tools132

gatk-variant-calling

GATK Best Practices for germline SNP/indel calling from WGS/WES BAMs. Per-sample HaplotypeCaller GVCFs, GenomicsDBImport, GenotypeGVCFs joint calling, VQSR or hard filters. Requires BWA-MEM2-aligned, markdup, BQSR BAMs. Use DeepVariant for a faster DL alternative; GATK is the NIH/ENCODE standard.

development132

opentrons-integration

Opentrons Protocol API v2 for OT-2/Flex: Python protocols for pipetting, serial dilutions, PCR, plate replication; control thermocycler, heater-shaker, magnetic, temperature modules. Use pylabrobot for multi-vendor.

development132

exploratory-data-analysis

Methodology for exploratory data analysis on scientific files. Decision frameworks by data type (tabular, sequence, image, spectral, structural, omics), quality assessment, report generation, format detection across 200+ formats. Use when given a data file for initial exploration or to pick an analysis before a pipeline.

development132

pymoo

Python framework for single- and multi-objective optimization with evolutionary algorithms. Define vectorized objectives and constraints; solve with NSGA-II, NSGA-III, MOEA/D, GAs, or differential evolution. Analyze Pareto fronts, visualize trade-offs, customize operators and callbacks. For engineering design, hyperparameter search, and conflicting objectives. Alternatives: scipy.optimize (single-objective, gradient), platypus, jMetalPy (Java).

development132

hypogenic-hypothesis-generation

LLM-driven hypothesis generation/testing on tabular data. Three methods: HypoGeniC (data-driven), HypoRefine (literature+data), Union. Iterative refinement, Redis caching, multi-hypothesis inference. Manual: hypothesis-generation; ideation: scientific-brainstorming.

testing132

simpy-discrete-event-simulation

Process-based discrete-event simulation. Model queues, shared resources, timed events: manufacturing, service ops, network traffic, logistics. Processes are Python generators yielding events. Resources: capacity-limited (Resource/Priority/Preemptive), bulk (Container), objects (Store, FilterStore). For continuous use SciPy ODEs; for agent-based use Mesa.

development132

cell-figure-guide

Cell (Cell Press) figure preparation: resolution (300-1000 DPI), formats (TIFF/PDF), RGB color, Avenir/Arial fonts, uppercase panel labels, strict image manipulation policies.

development132

scientific-schematics

Designing scientific schematics, diagrams, and graphical abstracts. Covers tool selection (BioRender, Inkscape, Affinity, PowerPoint), design principles for pathway diagrams, mechanism schematics, experimental workflows, and journal graphical abstracts. Includes composition, icon sourcing, color for biological entities, and accessibility. Use when creating illustrative (not data-driven) scientific figures.

tools132

Adoption

deseq2-differential-expression

deseq2-differential-expression

busco-status-interpretation

kegg-database

mouse-phenome-database

cellxgene-census

uniprot-protein-database

cellxgene-census

cnvkit-copy-number

scanpy-scrna-seq

popv-cell-annotation

harmony-batch-correction

anndata-data-structure

pydeseq2-differential-expression

featurecounts-rna-counting

busco-status-interpretation

deeptools-ngs-analysis

bedtools-genomic-intervals

ucsc-genome-browser

remap-database

regulomedb-database

quickgo-database

jaspar-database

gget-genomic-databases

ensembl-database

geo-database

bioservices-multi-database

clinpgx-database

cbioportal-database

gatk-variant-calling

archs4-database

roary-pangenome

bakta-genome-annotation

pysam-genomic-files

cbioportal-database

molfeat-molecular-featurization

bioservices-multi-database

esm-protein-language-model

zarr-python

snpeff-variant-annotation

cnvkit-copy-number

popv-cell-annotation

harmony-batch-correction

anndata-data-structure

pydeseq2-differential-expression

scanpy-scrna-seq

deeptools-ngs-analysis

bedtools-genomic-intervals

ucsc-genome-browser

remap-database

regulomedb-database

quickgo-database

gget-genomic-databases

ensembl-database

roary-pangenome

archs4-database

bakta-genome-annotation

pysam-genomic-files

lamindb-data-management

pubchem-compound-search

smina-molecular-docking

vcf-variant-filtering

vcf-variant-filtering

single-cell-annotation-guide

snpeff-variant-annotation

single-cell-annotation-guide

smina-molecular-docking

monarch-database

geniml

geo-database

dbsnp-database

fastp-fastq-preprocessing

monarch-database

ena-database

gwas-database

bcftools-variant-manipulation

gseapy-gene-enrichment

samtools-bam-processing