skills_all/claude-scientific-skills/scientific-skills/lamindb/SKILL.md
This skill should be used when working with LaminDB, an open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR. Use when managing biological datasets (scRNA-seq, spatial, flow cytometry, etc.), tracking computational workflows, curating and validating data with biological ontologies, building data lakehouses, or ensuring data lineage and reproducibility in biological research. Covers data management, annotation, ontologies (genes, cell types, diseases, tissues), schema validation, integrations with workflow managers (Nextflow, Snakemake) and MLOps platforms (W&B, MLflow), and deployment strategies.
npx skillsauth add activer007/ordinary-claude-skills lamindb
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
LaminDB is an open-source data framework for biology designed to make data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). It provides a unified platform that combines lakehouse architecture, lineage tracking, feature stores, biological ontologies, LIMS (Laboratory Information Management System), and ELN (Electronic Lab Notebook) capabilities through a single Python API.
Core Value Proposition:
Use this skill when:
LaminDB provides six interconnected capability areas, each documented in detail in the references folder.
Core entities:
Key workflows:
ln.track() and ln.finish()
artifact.view_lineage()
Reference: references/core-concepts.md - Read this for detailed information on artifacts, records, runs, transforms, features, versioning, and lineage tracking.
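To make the idea of lineage concrete, here is a minimal pure-Python sketch of walking provenance upstream from one artifact. It is not LaminDB's implementation of view_lineage(); the PARENTS mapping and the upstream() helper are hypothetical, and the keys are only borrowed from the examples in this skill for illustration.

```python
from collections import deque

# Hypothetical provenance edges: each output key maps to the inputs it was derived from.
PARENTS = {
    "processed/counts.csv": ["raw/batch_1.fastq.gz"],
    "scrna/validated.h5ad": ["processed/counts.csv"],
    "models/exp-42.pkl": ["scrna/validated.h5ad", "datasets/train.parquet"],
}


def upstream(key, parents):
    """Return every ancestor of `key`, nearest first, without duplicates."""
    seen = set()
    order = []
    queue = deque(parents.get(key, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Continue the breadth-first walk through this node's own inputs.
        queue.extend(parents.get(node, []))
    return order


ancestors = upstream("models/exp-42.pkl", PARENTS)
```

In LaminDB itself these edges are recorded automatically between ln.track() and ln.finish(), so no such mapping has to be maintained by hand.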
Query capabilities:
get(), one(), one_or_none()
__gt, __lte, __contains, __startswith
Key workflows:
Reference: references/data-management.md - Read this for comprehensive query patterns, filtering examples, streaming strategies, and data organization best practices.
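The lookup suffixes listed above follow the Django convention of field__lookup=value. The following toy sketch shows how such suffixes can be interpreted against plain dictionaries. It is an illustration only, not LaminDB's query engine; the matches() helper, the LOOKUPS table, and the sample records are all hypothetical.

```python
import operator

# Toy interpretation of Django-style lookup suffixes; not LaminDB's implementation.
LOOKUPS = {
    "gt": operator.gt,
    "lte": operator.le,
    "contains": lambda value, arg: arg in value,
    "startswith": lambda value, arg: value.startswith(arg),
}


def matches(record, **conditions):
    """Return True if `record` (a dict) satisfies every keyword condition."""
    for name, arg in conditions.items():
        # "size__gt" -> field "size", lookup "gt"; "key" -> field "key", lookup "".
        field, _, lookup = name.partition("__")
        value = record[field]
        if lookup == "":
            ok = value == arg  # no suffix means exact match
        else:
            ok = LOOKUPS[lookup](value, arg)
        if not ok:
            return False
    return True


# Hypothetical metadata records standing in for registered artifacts.
records = [
    {"key": "scrna/batch_0.h5ad", "size": 1200},
    {"key": "scrna/batch_1.h5ad", "size": 300},
    {"key": "models/exp-42.pkl", "size": 5000},
]

large_scrna = [
    r["key"] for r in records if matches(r, key__startswith="scrna/", size__gt=1000)
]
```

Multiple conditions are combined with AND, which mirrors how keyword arguments to a filter call are usually combined.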
Curation process:
Schema types:
Supported data types:
Key workflows:
DataFrameCurator or AnnDataCurator for validation
.cat.standardize()
.cat.add_ontology()
Reference: references/annotation-validation.md - Read this for detailed curation workflows, schema design patterns, handling validation errors, and best practices.
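The standardize-then-validate order of curation can be sketched without any LaminDB objects. The example below is a simplified stand-in, assuming a hand-written synonym table and vocabulary; in practice these come from Bionty ontologies, and the standardize() and validate() functions here are hypothetical helpers, not the curator API.

```python
# Hypothetical synonym table and controlled vocabulary; real ones come from Bionty.
SYNONYMS = {"T-cell": "T cell", "t cell": "T cell", "B-cell": "B cell"}
VOCABULARY = {"T cell", "B cell", "monocyte"}


def standardize(values, synonyms):
    """Map known synonyms to their canonical name; leave other values unchanged."""
    return [synonyms.get(value, value) for value in values]


def validate(values, vocabulary):
    """Return the sorted list of distinct values that are not in the vocabulary."""
    return sorted(set(values) - vocabulary)


raw_labels = ["T-cell", "B cell", "monocyte", "Tcell"]
clean_labels = standardize(raw_labels, SYNONYMS)
invalid_labels = validate(clean_labels, VOCABULARY)
```

Standardizing first keeps the validation report short: only values with no known synonym (here "Tcell") remain to be fixed by hand or added to the vocabulary.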
Available ontologies (via Bionty):
Key workflows:
bt.CellType.import_source()
Reference: references/ontologies.md - Read this for comprehensive ontology operations, standardization strategies, hierarchy navigation, and annotation workflows.
Workflow managers:
MLOps platforms:
Storage systems:
Array stores:
Visualization:
Version control:
Reference: references/integrations.md - Read this for integration patterns, code examples, and troubleshooting for third-party systems.
Installation:
uv pip install lamindb
uv pip install 'lamindb[gcp,zarr,fcs]'
Instance types:
Storage options:
Configuration:
Deployment patterns:
Reference: references/setup-deployment.md - Read this for detailed installation, configuration, storage setup, database management, security best practices, and troubleshooting.
import lamindb as ln
import bionty as bt
import anndata as ad
# Start tracking
ln.track(params={"analysis": "scRNA-seq QC and annotation"})
# Import cell type ontology
bt.CellType.import_source()
# Load data
adata = ad.read_h5ad("raw_counts.h5ad")
# Validate and standardize cell types
adata.obs["cell_type"] = bt.CellType.standardize(adata.obs["cell_type"])
# Curate with schema (`schema` is assumed to be an ln.Schema defined beforehand)
curator = ln.curators.AnnDataCurator(adata, schema)
curator.validate()
artifact = curator.save_artifact(key="scrna/validated.h5ad")
# Link ontology annotations
cell_types = bt.CellType.from_values(adata.obs.cell_type)
artifact.feature_sets.add_ontology(cell_types)
ln.finish()
import anndata as ad
import lamindb as ln

# data_files, tissues, and conditions are assumed to be parallel lists defined by the caller

# Register multiple experiments
for i, file in enumerate(data_files):
    artifact = ln.Artifact.from_anndata(
        ad.read_h5ad(file),
        key=f"scrna/batch_{i}.h5ad",
        description=f"scRNA-seq batch {i}"
    ).save()
    # Annotate with features
    artifact.features.add_values({
        "batch": i,
        "tissue": tissues[i],
        "condition": conditions[i]
    })

# Query across all experiments (metadata only, nothing is loaded yet)
immune_datasets = ln.Artifact.filter(
    key__startswith="scrna/",
    tissue="PBMC",
    condition="treated"
)
# Inspect the matches as a table before loading any files
print(immune_datasets.to_dataframe())

# Load specific datasets by iterating over the query result, not the DataFrame
for artifact in immune_datasets:
    adata = artifact.load()
    # Analyze
import joblib
import lamindb as ln
import wandb

# Initialize both systems
wandb.init(project="drug-response", name="exp-42")
ln.track(params={"model": "random_forest", "n_estimators": 100})

# Load training data from LaminDB
train_artifact = ln.Artifact.get(key="datasets/train.parquet")
train_data = train_artifact.load()

# Train model (train_model is a user-defined function, not part of LaminDB or W&B)
model = train_model(train_data)

# Log to W&B
wandb.log({"accuracy": 0.95})

# Save model in LaminDB with W&B linkage
joblib.dump(model, "model.pkl")
model_artifact = ln.Artifact("model.pkl", key="models/exp-42.pkl").save()
model_artifact.features.add_values({"wandb_run_id": wandb.run.id})

ln.finish()
wandb.finish()
# In Nextflow process script
import lamindb as ln
ln.track()
# Load input artifact
input_artifact = ln.Artifact.get(key="raw/batch_${batch_id}.fastq.gz")
input_path = input_artifact.cache()
# Process (alignment, quantification, etc.)
# ... Nextflow process logic ...
# Save output
output_artifact = ln.Artifact(
"counts.csv",
key="processed/batch_${batch_id}_counts.csv"
).save()
ln.finish()
To start using LaminDB effectively:
Installation & Setup (references/setup-deployment.md)
lamin login
lamin init --storage ...
Learn Core Concepts (references/core-concepts.md)
ln.track() and ln.finish() in workflows
Master Querying (references/data-management.md)
Set Up Validation (references/annotation-validation.md)
Integrate Ontologies (references/ontologies.md)
Connect Tools (references/integrations.md)
Follow these principles when working with LaminDB:
Track everything: Use ln.track() at the start of every analysis for automatic lineage capture
Validate early: Define schemas and validate data before extensive analysis
Use ontologies: Leverage public biological ontologies for standardized annotations
Organize with keys: Structure artifact keys hierarchically (e.g., project/experiment/batch/file.h5ad)
Query metadata first: Filter and search before loading large files
Version, don't duplicate: Use built-in versioning instead of creating new keys for modifications
Annotate with features: Define typed features for queryable metadata
Document thoroughly: Add descriptions to artifacts, schemas, and transforms
Leverage lineage: Use view_lineage() to understand data provenance
Start local, scale cloud: Develop locally with SQLite, deploy to cloud with PostgreSQL
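As an illustration of the "Organize with keys" principle, the sketch below builds hierarchical keys of the form project/experiment/batch/file.h5ad and derives the prefix that a key__startswith filter could use. The make_key() helper and the example names are hypothetical; LaminDB accepts any string key and does not require this helper.

```python
from pathlib import PurePosixPath


def make_key(project, experiment, batch, filename):
    """Join the parts into a forward-slash key: project/experiment/batch/filename."""
    parts = [project, experiment, batch, filename]
    for part in parts:
        # Reject empty parts and embedded slashes so the hierarchy stays four levels deep.
        if not part or "/" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return str(PurePosixPath(*parts))


# Hypothetical project and experiment names, for illustration only.
key = make_key("pbmc-atlas", "exp-42", "batch_0", "counts.h5ad")

# The parent "folder" doubles as a prefix for metadata queries.
batch_prefix = str(PurePosixPath(key).parent) + "/"
```

Keeping the levels in a fixed order means a single prefix selects a whole project, experiment, or batch without loading any files.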
This skill includes comprehensive reference documentation organized by capability:
references/core-concepts.md - Artifacts, records, runs, transforms, features, versioning, lineage
references/data-management.md - Querying, filtering, searching, streaming, organizing data
references/annotation-validation.md - Schema design, curation workflows, validation strategies
references/ontologies.md - Biological ontology management, standardization, hierarchies
references/integrations.md - Workflow managers, MLOps platforms, storage systems, tools
references/setup-deployment.md - Installation, configuration, deployment, troubleshooting
Read the relevant reference file(s) based on the specific LaminDB capability needed for the task at hand.