scientific-skills/scvi-tools/SKILL.md
Deep generative models for single-cell omics. Use when you need probabilistic batch correction (scVI), transfer learning, differential expression with uncertainty, or multi-modal integration (TOTALVI, MultiVI). Best for advanced modeling, batch effects, multimodal data. For standard analysis pipelines use scanpy.
npx skillsauth add K-Dense-AI/claude-scientific-skills scvi-toolsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.
Use this skill when:
scvi-tools provides models organized by data modality:
Core models for expression analysis, batch correction, and integration. See references/models-scrna-seq.md for:
Models for analyzing single-cell chromatin data. See references/models-atac-seq.md for:
Joint analysis of multiple data types. See references/models-multimodal.md for:
Spatially-resolved transcriptomics analysis. See references/models-spatial.md for:
Additional specialized analysis tools. See references/models-specialized.md for:
All scvi-tools models follow a consistent API pattern:
# 1. Load and preprocess data (AnnData format)
import scvi
import scanpy as sc
adata = scvi.data.heart_cell_atlas_subsampled()
sc.pp.filter_genes(adata, min_counts=3)
sc.pp.highly_variable_genes(adata, n_top_genes=1200)
# 2. Register data with model (specify layers, covariates)
scvi.model.SCVI.setup_anndata(
adata,
layer="counts", # Use raw counts, not log-normalized
batch_key="batch",
categorical_covariate_keys=["donor"],
continuous_covariate_keys=["percent_mito"]
)
# 3. Create and train model
model = scvi.model.SCVI(adata)
model.train()
# 4. Extract latent representations and normalized values
latent = model.get_latent_representation()
normalized = model.get_normalized_expression(library_size=1e4)
# 5. Store in AnnData for downstream analysis
adata.obsm["X_scVI"] = latent
adata.layers["scvi_normalized"] = normalized
# 6. Downstream analysis with scanpy
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)
Key Design Principles:
Probabilistic DE analysis using the learned generative models:
de_results = model.differential_expression(
groupby="cell_type",
group1="TypeA",
group2="TypeB",
mode="change", # Use composite hypothesis testing
delta=0.25 # Minimum effect size threshold
)
See references/differential-expression.md for detailed methodology and interpretation.
Save and load trained models:
# Save model
model.save("./model_directory", overwrite=True)
# Load model
model = scvi.model.SCVI.load("./model_directory", adata=adata)
Integrate datasets across batches or studies:
# Register batch information
scvi.model.SCVI.setup_anndata(adata, batch_key="study")
# Model automatically learns batch-corrected representations
model = scvi.model.SCVI(adata)
model.train()
latent = model.get_latent_representation() # Batch-corrected
scvi-tools is built on:
See references/theoretical-foundations.md for detailed background on the mathematical framework.
references/workflows.md contains common workflows, best practices, hyperparameter tuning, and GPU optimizationreferences/ directoryuv pip install scvi-tools
# For GPU support
uv pip install scvi-tools[cuda]
min_counts=3)setup_anndataaccelerator="gpu")development
Spectral similarity and compound identification for metabolomics. Use for comparing mass spectra, computing similarity scores (cosine, modified cosine), and identifying unknown compounds from spectral libraries. Best for metabolite identification, spectral matching, library searching. For full LC-MS/MS proteomics pipelines use pyopenms.
development
Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.
development
Generate comprehensive market research reports (50+ pages) in the style of top consulting firms (McKinsey, BCG, Gartner). Features professional LaTeX formatting, extensive visual generation with scientific-schematics and generate-image, deep integration with research-lookup for data gathering, and multi-framework strategic analysis including Porter Five Forces, PESTLE, SWOT, TAM/SAM/SOM, and BCG Matrix.
testing
Comprehensive markdown and Mermaid diagram writing skill. Use when creating any scientific document, report, analysis, or visualization. Establishes text-based diagrams as the default documentation standard with full style guides (markdown + mermaid), 24 diagram type references, and 9 document templates.