skills/lamindb/SKILL.md
Use when working with LaminDB, the open-source lineage-native lakehouse for biological datasets and models. Covers setup, artifact registration, query/search, lineage tracking, validation, ontology-backed annotation with Bionty, collections, branches, storage, and workflow integrations.
npx skillsauth add K-Dense-AI/claude-scientific-skills lamindb
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
LaminDB is an open-source, lineage-native lakehouse for biology. It makes datasets and models queryable, traceable, validated, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable) while storing data in open formats across local filesystems, S3, GCS, Hugging Face, SQLite, and Postgres.
Core Value Proposition:
Use this skill when:
LaminDB provides six interconnected capability areas, each documented in detail in the references folder.
Core entities:
Key workflows:
ln.track() and ln.finish()
@ln.flow() and @ln.step()
artifact.view_lineage()
Reference: references/core-concepts.md - Read this for detailed information on artifacts, records, runs, transforms, features, versioning, and lineage tracking.
Query capabilities:
get(), one(), one_or_none()
__gt, __lte, __contains, __startswith
Feature objects
ln.Q objects (AND, OR, NOT)
Key workflows:
Reference: references/data-management.md - Read this for comprehensive query patterns, filtering examples, streaming strategies, and data organization best practices.
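The lookup suffixes listed above follow the familiar `field__operator=value` convention. As a rough illustration of what each suffix means, the sketch below re-implements four of them over plain dictionaries. It is not LaminDB's implementation and does not touch a database; the `matches` helper, the `records` list, and the `size` field are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict

# Illustrative re-implementation of the lookup suffixes; NOT LaminDB's code.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "exact": lambda value, target: value == target,
    "gt": lambda value, target: value > target,
    "lte": lambda value, target: value <= target,
    "contains": lambda value, target: target in value,
    "startswith": lambda value, target: str(value).startswith(target),
}


def matches(record: Dict[str, Any], **lookups: Any) -> bool:
    """Return True if `record` satisfies every `field__op=value` lookup (AND)."""
    for lookup, target in lookups.items():
        field, _, op = lookup.partition("__")
        op = op or "exact"  # a bare field name means equality
        if field not in record or not _OPS[op](record[field], target):
            return False
    return True


# Hypothetical metadata rows standing in for registry records.
records = [
    {"key": "scrna/batch_0.h5ad", "size": 1200},
    {"key": "scrna/batch_1.h5ad", "size": 80},
    {"key": "bulk/run_7.parquet", "size": 4000},
]

# Multiple keyword lookups combine with AND, as in a filter() call.
large_scrna = [
    r["key"] for r in records if matches(r, key__startswith="scrna/", size__gt=100)
]
print(large_scrna)
```

In a real instance the same keyword arguments would be passed to a registry's filter call, and the database would evaluate them instead of Python.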
Curation process:
Schema types:
Supported data types:
Key workflows:
DataFrameCurator, AnnDataCurator, SpatialDataCurator, or TiledbsomaExperimentCurator for validation
.cat.standardize()
.cat.add_ontology()
Reference: references/annotation-validation.md - Read this for detailed curation workflows, schema design patterns, handling validation errors, and best practices.
Available ontologies (via Bionty):
Key workflows:
bt.CellType.import_source()
Reference: references/ontologies.md - Read this for comprehensive ontology operations, standardization strategies, hierarchy navigation, and annotation workflows.
Workflow managers:
MLOps platforms:
Storage systems:
Array stores:
Visualization:
Version control:
Reference: references/integrations.md - Read this for integration patterns, code examples, and troubleshooting for third-party systems.
Installation:
lamindb==2.5.1 (released 2026-06-01; Python >=3.10, <=3.14)
uv pip install 'lamindb==2.5.1'
uv pip install 'lamindb[gcp,zarr-v2,fcs]==2.5.1'
uv pip install 'lamindb-core==2.5.1'
uv pip install 'bionty==2.4.0'
Instance types:
Storage options:
Configuration:
Deployment patterns:
Reference: references/setup-deployment.md - Read this for detailed installation, configuration, storage setup, database management, security best practices, and troubleshooting.
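When confirming that database and storage credentials are configured, it is usually enough to report whether each environment variable is set, without ever reading its value into output. The following is a minimal sketch using only the standard library; the helper name `presence_report` is hypothetical, and treating an empty string as "missing" is an assumption made for this example.

```python
import os
from typing import Dict, Iterable

# Credential-bearing variables relevant to LaminDB setup.
CREDENTIAL_VARS = (
    "LAMIN_DB_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def presence_report(names: Iterable[str] = CREDENTIAL_VARS) -> Dict[str, bool]:
    """Map each name to True if it is set and non-empty; values are never returned."""
    return {name: bool(os.environ.get(name)) for name in names}


# Print only the variable name and a set/missing flag, never the value.
for name, is_set in presence_report().items():
    print(f"{name}: {'set' if is_set else 'missing'}")
```

Because the report holds only booleans, it can be logged or pasted into an issue without exposing secrets.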
When helping with LaminDB setup or integrations:
LAMIN_DB_URL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and GOOGLE_APPLICATION_CREDENTIALS; only check whether a named variable is present, not its value.
import lamindb as ln
import bionty as bt
import anndata as ad
# Start tracking a notebook/script run
ln.track(params={"analysis": "scRNA-seq QC and annotation"})
# Import cell type ontology
bt.CellType.import_source()
# Load data
adata = ad.read_h5ad("raw_counts.h5ad")
# Validate and standardize cell types
adata.obs["cell_type"] = bt.CellType.standardize(adata.obs["cell_type"])
# Curate with schema
# `schema` is assumed to be an existing ln.Schema defined or retrieved beforehand
curator = ln.curators.AnnDataCurator(adata, schema)
curator.validate()
artifact = curator.save_artifact(key="scrna/validated.h5ad")
# Link ontology-backed annotations for queryability
cell_types = bt.CellType.from_values(adata.obs["cell_type"])
artifact.cell_types.add(*cell_types)
ln.finish()
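import anndata as ad
import lamindb as ln
# data_files, tissues, and conditions are placeholders for your own
# file paths and per-file metadata lists (same length, same order)
# Register multiple experiments
for i, file in enumerate(data_files):
    artifact = ln.Artifact.from_anndata(
        ad.read_h5ad(file),
        key=f"scrna/batch_{i}.h5ad",
        description=f"scRNA-seq batch {i}"
    ).save()
    # Annotate with features (define the typed features first)
    artifact.features.set_values({
        "batch": i,
        "tissue": tissues[i],
        "condition": conditions[i]
    })
# Query across all experiments by annotated features
immune_datasets = ln.Artifact.filter(
    key__startswith="scrna/",
    tissue="PBMC",
    condition="treated"
)
# Inspect the matching metadata as a DataFrame before loading any files
print(immune_datasets.to_dataframe())
# Load specific datasets (iterate the query set, not the DataFrame)
for artifact in immune_datasets:
    adata = artifact.load()
    # Analyze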
import joblib
import lamindb as ln
import wandb
# Initialize both systems
wandb.init(project="drug-response", name="exp-42")
ln.track(params={"model": "random_forest", "n_estimators": 100})
# Load training data from LaminDB
train_artifact = ln.Artifact.get(key="datasets/train.parquet")
train_data = train_artifact.load()
# Train model (train_model is a placeholder for your own training function)
model = train_model(train_data)
# Log to W&B (example value; log the metric you actually computed)
wandb.log({"accuracy": 0.95})
# Save model in LaminDB with W&B linkage
joblib.dump(model, "model.pkl")
model_artifact = ln.Artifact("model.pkl", key="models/exp-42.pkl").save()
model_artifact.features.set_values({"wandb_run_id": wandb.run.id})
ln.finish()
wandb.finish()
# In Nextflow process script
import lamindb as ln
ln.track()
# Load input artifact
input_artifact = ln.Artifact.get(key="raw/batch_${batch_id}.fastq.gz")
input_path = input_artifact.cache()
# Process (alignment, quantification, etc.)
# ... Nextflow process logic ...
# Save output
output_artifact = ln.Artifact(
    "counts.csv",
    key="processed/batch_${batch_id}_counts.csv"
).save()
ln.finish()
For native Nextflow projects, prefer the nf-lamin plugin and current nextflow.config patterns when available; use inline Python tracking for small or custom pipeline steps.
To start using LaminDB effectively:
Installation & Setup (references/setup-deployment.md)
lamin login
lamin init --storage ...
Learn Core Concepts (references/core-concepts.md)
ln.track()/ln.finish() or @ln.flow()/@ln.step() in workflows
Master Querying (references/data-management.md)
Set Up Validation (references/annotation-validation.md)
Integrate Ontologies (references/ontologies.md)
Connect Tools (references/integrations.md)
Follow these principles when working with LaminDB:
Track everything: Use ln.track() at the start of every analysis for automatic lineage capture
Validate early: Define schemas and validate data before extensive analysis
Use ontologies: Leverage public biological ontologies for standardized annotations
Organize with keys: Structure artifact keys hierarchically (e.g., project/experiment/batch/file.h5ad)
Query metadata first: Filter and search before loading large files
Version, don't duplicate: Use built-in versioning instead of creating new keys for modifications
Annotate with features: Define typed features and use artifact.features.set_values() for queryable metadata
Document thoroughly: Add descriptions to artifacts, schemas, and transforms
Leverage lineage: Use view_lineage() to understand data provenance
Start local, scale cloud: Develop locally with SQLite, deploy to cloud with PostgreSQL
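The "Organize with keys" principle above can be made harder to break by building keys through one small helper instead of ad hoc string formatting. This is a sketch under assumptions: the function name `artifact_key` and the four-level project/experiment/batch/file layout are illustrative choices, not a LaminDB requirement.

```python
from pathlib import PurePosixPath


def artifact_key(project: str, experiment: str, batch: str, filename: str) -> str:
    """Join the parts into a forward-slash key like project/experiment/batch/file.h5ad."""
    parts = [project, experiment, batch, filename]
    for part in parts:
        # Reject empty segments and embedded slashes so the hierarchy stays regular.
        if not part or "/" in part:
            raise ValueError(f"invalid key segment: {part!r}")
    return str(PurePosixPath(*parts))


key = artifact_key("atlas", "exp01", "batch_3", "counts.h5ad")
print(key)
```

A regular key layout also keeps prefix queries such as `key__startswith="atlas/exp01/"` predictable.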
This skill includes comprehensive reference documentation organized by capability:
references/core-concepts.md - Artifacts, records, runs, transforms, features, versioning, lineage
references/data-management.md - Querying, filtering, searching, streaming, organizing data
references/annotation-validation.md - Schema design, curation workflows, validation strategies
references/ontologies.md - Biological ontology management, standardization, hierarchies
references/integrations.md - Workflow managers, MLOps platforms, storage systems, tools
references/setup-deployment.md - Installation, configuration, deployment, troubleshooting
Read the relevant reference file(s) based on the specific LaminDB capability needed for the task at hand.