
harsh040506/ml-engineering

Name: ml-engineering
Author: harsh040506

engineering/ai-ml-engineering/skills/ml-engineering/SKILL.md

npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library ml-engineering


ML Engineering

Production-grade guidance for training, optimizing, and deploying machine learning models.

The ML Development Lifecycle

Problem definition → Data collection → EDA → Feature engineering
→ Model selection → Training → Evaluation → Hyperparameter tuning
→ Error analysis → Deployment → Monitoring → Retraining

Never skip the problem definition or data stages. The most common ML failures are:

Solving the wrong problem
Bad data quality masquerading as a model quality problem
Data leakage causing inflated offline metrics
Ignoring production distribution shift

Data Best Practices

Train/Val/Test Splits

For time-series or temporal data: Split by time, not randomly.

Training: oldest data
Validation: middle period
Test: most recent data (held out completely until final evaluation)

Random splits on time-series cause data leakage — future information leaks into training.
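A minimal sketch of the time-ordered split described above, using only the standard library. The function name `chronological_split`, the 70/10/20 fractions, and the toy records are illustrative assumptions, not part of the original skill.

```python
from datetime import date

def chronological_split(rows, time_key, train_frac=0.7, val_frac=0.1):
    """Sort rows by time and cut them into train/val/test without shuffling."""
    ordered = sorted(rows, key=time_key)
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

# Hypothetical data: one record per day, deliberately stored out of order.
rows = [{"day": date(2024, 1, d), "value": d * 1.5} for d in range(20, 0, -1)]
train, val, test = chronological_split(rows, time_key=lambda r: r["day"])

# Every training record predates every validation record, which predates every test record.
print(train[-1]["day"], val[0]["day"], test[0]["day"])
```

Because the cut points are positions in the sorted sequence, no future record can end up in an earlier split.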

For i.i.d. data:

70/10/20 or 80/10/10 split
Stratified split for classification to preserve class ratios
Group-aware split when multiple samples share an entity (same user, document, patient)
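The group-aware split can be sketched as follows: whole entities, rather than individual samples, are assigned to a split. The helper `group_split` and the toy `user` field are hypothetical names chosen for illustration. In practice a library splitter would usually be used instead.

```python
import random

def group_split(samples, group_key, test_frac=0.2, seed=0):
    """Assign whole groups to train or test so no entity appears in both."""
    groups = sorted({group_key(s) for s in samples})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_key(s) not in test_groups]
    test = [s for s in samples if group_key(s) in test_groups]
    return train, test

# Hypothetical data: 50 samples belonging to 10 users (5 samples each).
samples = [{"user": f"u{i % 10}", "x": i} for i in range(50)]
train, test = group_split(samples, group_key=lambda s: s["user"])

train_users = {s["user"] for s in train}
test_users = {s["user"] for s in test}
print(len(train), len(test), train_users & test_users)
```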

Preventing Data Leakage

Leakage occurs when information from the validation or test set (or from the future) influences training. It inflates offline metrics, so the model looks better than it will be in production.

Common leakage sources:

Applying scaling/normalization using statistics from the full dataset before splitting
Target-encoding using the target variable from test samples
Temporal overlap in time-series splits
Duplicated rows that appear in both train and test

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# WRONG: leaks test statistics
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Fits on the entire dataset, test rows included
X_train, X_test = train_test_split(X_scaled)

# CORRECT: split first, then fit the scaler on the training split only
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # Fit only on train
X_test_scaled = scaler.transform(X_test)          # Transform test with train statistics
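One of the leakage sources listed above, duplicated rows shared between train and test, can be checked directly. The sketch below assumes rows are small lists of hashable values. The function name `cross_split_duplicates` and the toy rows are made up for illustration.

```python
def cross_split_duplicates(train_rows, test_rows):
    """Return the distinct rows that occur in both splits."""
    train_set = {tuple(row) for row in train_rows}
    test_set = {tuple(row) for row in test_rows}
    return sorted(train_set & test_set)

# Hypothetical rows: two features followed by a label.
train_rows = [[1.0, 2.0, 0], [3.0, 4.0, 1], [5.0, 6.0, 0]]
test_rows = [[3.0, 4.0, 1], [7.0, 8.0, 1]]

dupes = cross_split_duplicates(train_rows, test_rows)
print(dupes)  # [(3.0, 4.0, 1)]
```

Exact matching only catches verbatim duplicates. Near-duplicates (for example, the same record with a different timestamp) need a domain-specific key.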

Model Selection

Starting Point Heuristic

| Task | Start here | Then try |
|------|-----------|---------|
| Tabular classification | LightGBM / XGBoost | Neural network if still underperforming |
| Tabular regression | LightGBM / XGBoost | Neural network |
| Image classification | ResNet50 (fine-tune) | EfficientNet, ConvNeXt |
| Text classification | DistilBERT / RoBERTa (fine-tune) | Larger model if needed |
| Text generation | Fine-tune instruction-tuned LLM (Mistral 7B, Llama 3) | GPT-4 API if quality > cost |
| Sequence-to-sequence | T5/mT5 fine-tune | LLM fine-tune |
| Tabular anomaly detection | Isolation Forest | Autoencoder |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | Fine-tune on domain data |

Always establish a simple baseline first (logistic regression, rule-based heuristic, random). Beat the baseline before adding complexity.
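As an illustration of the simplest possible baseline for classification, the sketch below predicts the majority class from the training labels and scores it on held-out labels. The function name and the toy label counts are assumptions for the example.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, eval_labels):
    """Predict the most frequent training label for every evaluation sample."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for y in eval_labels if y == majority)
    return majority, hits / len(eval_labels)

# Hypothetical imbalanced labels: 80% class 0 in train, 75% class 0 in validation.
train_labels = [0] * 80 + [1] * 20
val_labels = [0] * 15 + [1] * 5

majority, acc = majority_baseline_accuracy(train_labels, val_labels)
print(majority, acc)  # 0 0.75
```

A model that reports 78% accuracy on this data is only three points above doing nothing, which is why the baseline number belongs next to every reported metric.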

PyTorch Training Loop (Production Template)

import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler, autocast
import wandb
from pathlib import Path

def train(
    model: nn.Module,
    train_loader,
    val_loader,
    config: dict,
    output_dir: str = "./checkpoints",
):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=config["lr"],
        weight_decay=config.get("weight_decay", 0.01),
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=config["epochs"]
    )
    scaler = GradScaler()  # Mixed precision training
    criterion = nn.CrossEntropyLoss()

    wandb.init(project=config["project"], config=config)
    
    best_val_loss = float("inf")
    patience_counter = 0
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    for epoch in range(config["epochs"]):
        # ── Training ──────────────────────────────
        model.train()
        train_loss = 0.0
        for batch_idx, (inputs, targets) in enumerate(train_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()

            with autocast():  # Mixed precision; the speedup depends on the GPU and model
                outputs = model(inputs)
                loss = criterion(outputs, targets)

            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Gradient clipping
            scaler.step(optimizer)
            scaler.update()
            train_loss += loss.item()

        # ── Validation ────────────────────────────
        model.eval()
        val_loss = 0.0
        correct = 0
        total = 0
        with torch.no_grad():
            for inputs, targets in val_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                with autocast():
                    outputs = model(inputs)
                    loss = criterion(outputs, targets)
                val_loss += loss.item()
                _, predicted = outputs.max(1)
                correct += predicted.eq(targets).sum().item()
                total += targets.size(0)

        train_loss /= len(train_loader)
        val_loss /= len(val_loader)
        val_acc = correct / total

        wandb.log({
            "epoch": epoch,
            "train/loss": train_loss,
            "val/loss": val_loss,
            "val/accuracy": val_acc,
            "lr": scheduler.get_last_lr()[0],
        })

        print(f"Epoch {epoch+1}/{config['epochs']} | "
              f"Train Loss: {train_loss:.4f} | "
              f"Val Loss: {val_loss:.4f} | "
              f"Val Acc: {val_acc:.4f}")

        # ── Checkpointing ─────────────────────────
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
            torch.save({
                "epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
                "val_loss": val_loss,
                "config": config,
            }, output_path / "best_model.pt")
            wandb.save(str(output_path / "best_model.pt"))
        else:
            patience_counter += 1
            if patience_counter >= config.get("patience", 10):
                print(f"Early stopping at epoch {epoch+1}")
                break

        scheduler.step()

    wandb.finish()
    return output_path / "best_model.pt"
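The patience logic in the checkpointing step can be isolated into a small helper, which makes it easy to test without a GPU. This is a sketch under the same rule as the template (strictly lower validation loss counts as an improvement). The class name `EarlyStopping` and the loss sequence are illustrative.

```python
class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Return (improved, should_stop) for this epoch's validation loss."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
history = []
for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.75, 0.6]):
    improved, stop = stopper.step(loss)
    history.append((epoch, improved, stop))
    if stop:
        break

print(history)
```

With `patience=2` the loop stops at epoch index 3, so the 0.6 loss is never seen. That is the usual trade-off of a small patience value.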

Diagnosing Training Problems

Loss is NaN

Check for NaN in data — df.isnull().any(), np.isnan(X).any()
Learning rate too high — try 1/10th the current LR
Exploding gradients — add gradient clipping: clip_grad_norm_(model.parameters(), 1.0)
Log of zero — numerical instability in loss function; add epsilon: log(x + 1e-8)
Infinity in features — check for division by zero in preprocessing
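The data checks above can be combined into one pass that reports which feature columns contain NaN or infinite values. This is a standard-library sketch over rows of floats. The function name `non_finite_columns` and the sample rows are hypothetical.

```python
import math

def non_finite_columns(rows):
    """Return {column_index: count} for values that are NaN, +inf, or -inf."""
    counts = {}
    for row in rows:
        for j, value in enumerate(row):
            if not math.isfinite(value):
                counts[j] = counts.get(j, 0) + 1
    return counts

# Hypothetical feature rows with one NaN and two infinities.
rows = [
    [1.0, 2.0, 3.0],
    [float("nan"), 5.0, float("inf")],
    [7.0, 8.0, float("-inf")],
]

bad = non_finite_columns(rows)
print(bad)  # {0: 1, 2: 2}
```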

Loss Not Decreasing

Learning rate too low: try 10× higher, or use a learning rate finder
Learning rate too high: loss oscillates or diverges
Bug in data loading: inspect one batch with next(iter(dataloader)) and confirm the labels match the inputs
Weight initialization: use an appropriate init (Xavier for tanh, Kaiming for ReLU)
Model capacity too low: try a larger model
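For the weight-initialization item, the two schemes differ only in how they set the standard deviation of the initial weights. The sketch below computes the normal-distribution variants from the layer's fan-in and fan-out. The default gains (1.0 for Xavier, sqrt(2) for Kaiming with ReLU) are assumptions of this example, and frameworks may recommend a different gain for tanh.

```python
import math

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Xavier/Glorot: std = gain * sqrt(2 / (fan_in + fan_out))."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

def kaiming_normal_std(fan_in, gain=math.sqrt(2.0)):
    """Kaiming/He (fan-in mode): std = gain / sqrt(fan_in)."""
    return gain / math.sqrt(fan_in)

# Hypothetical linear layer with 256 inputs and 128 outputs.
xavier = xavier_normal_std(256, 128)
kaiming = kaiming_normal_std(256)
print(f"xavier std={xavier:.4f}, kaiming std={kaiming:.4f}")
```

Kaiming uses a larger scale because ReLU zeroes roughly half of the activations, and the extra factor compensates for the lost variance.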

Overfitting

Gap between training and validation metrics is growing:

More data: usually the most effective fix when more can be collected
Data augmentation: random crop, flip, color jitter for images; back-translation, synonym replacement for text
Regularization: increase dropout rate (0.1 → 0.3) or L2 weight decay
Reduce model size: fewer layers or smaller hidden dim
Early stopping: stop when val loss stops improving

Training Too Slow

Enable mixed precision: GradScaler + autocast, often a 2–3× speedup on GPUs with Tensor Cores
Increase batch size: better GPU utilization (check current utilization first with nvidia-smi)
DataLoader workers: set num_workers=4 as a starting point and tune toward the number of CPU cores
Pin memory: DataLoader(pin_memory=True) for faster CPU→GPU transfer
Profile: use torch.profiler to identify the bottleneck

Hyperparameter Search

Use Optuna for efficient hyperparameter search. Prefer its default Tree-structured Parzen Estimator (TPE) sampler over grid search: it concentrates trials in promising regions, so it usually finds good parameters in far fewer trials.

import optuna

def objective(trial):
    config = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [16, 32, 64, 128]),
        "dropout": trial.suggest_float("dropout", 0.1, 0.5),
        "hidden_dim": trial.suggest_categorical("hidden_dim", [128, 256, 512]),
        "epochs": 20,  # Short runs for search
        "patience": 5,
    }
    
    # train_and_evaluate is a placeholder for your own function:
    # it trains with this config and returns the validation loss
    val_loss = train_and_evaluate(config)
    return val_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50, timeout=3600)  # 50 trials or 1 hour

print(f"Best params: {study.best_params}")
print(f"Best val loss: {study.best_value}")
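The `log=True` flag on the learning-rate range matters because learning rates span orders of magnitude. The sketch below illustrates the idea of sampling in the log domain with the standard library only. It is not Optuna's internal implementation, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Sample so that each decade between low and high is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
lrs = [sample_log_uniform(rng, 1e-5, 1e-2) for _ in range(1000)]

# The range 1e-5..1e-2 covers three decades, so about a third of the
# samples should fall in the lowest decade (below 1e-4).
frac_lowest_decade = sum(lr < 1e-4 for lr in lrs) / len(lrs)
print(f"{frac_lowest_decade:.2f}")
```

With plain uniform sampling over the same range, roughly 99% of the samples would land above 1e-4 and the small learning rates would almost never be tried.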

Model Deployment Patterns

ONNX Export (Framework-Agnostic Inference)

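import numpy as np
import onnxruntime as ort
import torch

# Assumes `model` is a trained nn.Module on the CPU and
# `input_size` is the number of input features it expects
model.eval()
dummy_input = torch.randn(1, input_size)
torch.onnx.export(
    model, dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=17,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)

# Validate export: ONNX Runtime output should match PyTorch within tolerance
session = ort.InferenceSession("model.onnx")
onnx_output = session.run(None, {"input": dummy_input.numpy()})[0]
with torch.no_grad():
    torch_output = model(dummy_input).numpy()
np.testing.assert_allclose(torch_output, onnx_output, rtol=1e-3, atol=1e-5)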

FastAPI Serving

from fastapi import FastAPI
from pydantic import BaseModel
import onnxruntime as ort
import numpy as np

app = FastAPI()
# Providers are tried in order; CPU is the fallback when CUDA is unavailable
session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: int
    confidence: float

@app.post("/predict", response_model=PredictResponse)
def predict(request: PredictRequest):
    # Add a batch dimension: the exported model expects shape (batch_size, n_features)
    input_array = np.array([request.features], dtype=np.float32)
    logits = session.run(None, {"input": input_array})[0]
    probs = softmax(logits[0])
    return PredictResponse(
        prediction=int(probs.argmax()),
        confidence=float(probs.max()),
    )

Deeper Reference

For complete training recipes and hyperparameter tuning guides, see:

references/training-recipes.md — end-to-end PyTorch and HuggingFace training scripts with mixed-precision, gradient checkpointing, and distributed training
references/hyperparameter-guide.md — learning rate schedules, batch size scaling rules, regularization strategies, and Optuna/Ray Tune search configurations


This skill should be used when the user asks about "machine learning", "deep learning", "neural network", "train a model", "PyTorch", "TensorFlow", "JAX", "scikit-learn", "XGBoost", "LightGBM", "fine-tune", "transfer learning", "model architecture", "loss function", "optimizer", "learning rate", "batch size", "epoch", "overfitting", "underfitting", "regularization", "dropout", "batch normalization", "gradient descent", "backpropagation", "training loop", "validation", "hyperparameter tuning", "Optuna", "Ray Tune", "Weights & Biases", "MLflow", "model checkpoint", "early stopping", "mixed precision", "distributed training", "GPU training", "CUDA", "model serving", "TorchServe", "ONNX", or "model deployment". Also trigger for "my model isn't converging", "loss is NaN", "training is slow", "model is overfitting", or "how do I improve my model accuracy".

2 stars

testing

Updated Apr 5, 2026


npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library ml-engineering

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 5, 2026, 5:10 PM (4.3s, 3 files scanned)




Install manually


# Clone the repo
git clone https://github.com/harsh040506/claude-code-unified-skill-plugin-library.git

# Copy into Claude Code skills folder (global)
cp -r claude-code-unified-skill-plugin-library/engineering/ai-ml-engineering/skills/ml-engineering ~/.claude/skills/


Repository

harsh040506/claude-code-unified-skill-plugin-library


Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT