harsh040506/self-healing-models

Name: self-healing-models
Author: harsh040506

engineering/advanced-ml-engineering/skills/self-healing-models/SKILL.md

npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library self-healing-models

Self-Healing Models — Drift Detection and Autonomous Recovery

Provides the complete framework for detecting statistical distribution shifts in production ML systems and autonomously responding with shadow retraining and zero-downtime model updates. Implements the "Self-Healing Model" concept where concept drift triggers an end-to-end recovery cycle without manual intervention.

Concept Drift Taxonomy

| Drift Type | Description | Detection Method |
|---|---|---|
| Sudden drift | Abrupt change at a specific time point (e.g., system change, world event) | Immediate KS test comparison |
| Gradual drift | Slow, continuous shift (e.g., user behavior evolution) | Rolling window PSI trend |
| Recurrent drift | Seasonal or cyclic patterns reappear (e.g., holiday behavior) | Season-aware baseline comparison |
| Incremental drift | Monotonic directional drift over time | Linear trend test on PSI |
| Feature drift | Input distribution changes but relationship to target is stable | Input-only KS/PSI (re-calibrate, don't retrain) |
| Label drift | Relationship between features and target changes (true concept drift) | Requires label monitoring + performance tracking |

Statistical Tests

Kolmogorov-Smirnov (KS) Test for continuous features:

D = max|F_ref(x) − F_current(x)| (maximum CDF difference)
p-value < 0.05 → statistically significant drift
Sensitive to any shape of distribution change (shift, scale, shape)
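As an illustration of the KS statistic above, here is a minimal pure-Python sketch. The function name `ks_two_sample` is hypothetical, and the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, which is an assumption of this sketch rather than something the skill specifies. A production setup would more likely call an established statistics library.

```python
import math
from bisect import bisect_right


def ks_two_sample(reference, current):
    """Return (D, approximate p-value) for a two-sample KS test.

    Hypothetical helper: D is the largest gap between the two empirical
    CDFs; the p-value is an asymptotic approximation for large samples.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    # Evaluate both empirical CDFs at every observed value, keep the largest gap.
    d = max(
        abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m)
        for x in ref + cur
    )
    # Effective sample size with a small-sample correction (assumption).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The series converges slowly here and the true value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)
```

A feature would be flagged when the returned p-value falls below 0.05, matching the threshold above.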

Chi-Square Test for categorical features:

χ² = Σ (O_i − E_i)² / E_i
O_i = observed frequency in current period; E_i = expected frequency from reference
df = k - 1 (k = number of categories); p-value < 0.05 → drift
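The chi-square statistic can be sketched as follows. This assumes category counts arrive as plain dictionaries, that every reference category has a non-zero count, and that categories unseen in the reference are ignored. The name `chi_square_drift` and the small critical-value table are illustrative only. The table holds the standard 0.05 critical values for 1 to 3 degrees of freedom.

```python
# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_drift(reference_counts, current_counts):
    """Return (chi2, df) comparing current counts to reference proportions.

    Hypothetical helper: expected counts E_i are the reference proportions
    scaled to the size of the current sample.
    """
    categories = sorted(reference_counts)
    ref_total = sum(reference_counts.values())
    cur_total = sum(current_counts.get(c, 0) for c in categories)
    chi2 = 0.0
    for c in categories:
        expected = cur_total * reference_counts[c] / ref_total
        observed = current_counts.get(c, 0)
        chi2 += (observed - expected) ** 2 / expected
    return chi2, len(categories) - 1


def is_categorical_drift(reference_counts, current_counts):
    """Flag drift when chi2 exceeds the 0.05 critical value for its df."""
    chi2, df = chi_square_drift(reference_counts, current_counts)
    return chi2 > CRITICAL_0_05[df]
```

Comparing against a critical value is equivalent to checking p < 0.05 for the listed degrees of freedom. More categories would need a larger table or a library that computes the p-value directly.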

Population Stability Index (PSI) for magnitude quantification:

PSI = Σ (P_current_i − P_reference_i) · ln(P_current_i / P_reference_i)
PSI < 0.1: negligible shift — no action
0.1 ≤ PSI < 0.2: moderate shift — increase monitoring frequency, alert team
PSI ≥ 0.2: major shift — trigger retraining pipeline
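A minimal sketch of the PSI computation and its thresholds follows. The equal-width binning over the reference range, the default of 10 bins, and the `eps` floor that avoids `ln(0)` are all assumptions of this sketch. The function names are hypothetical.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index with equal-width bins (assumed binning)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range land in the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


def psi_action(value):
    """Map a PSI value to the action bands listed above."""
    if value < 0.1:
        return "no action"
    if value < 0.2:
        return "increase monitoring"
    return "trigger retraining"
```

PSI is sensitive to the binning choice, so the same bin edges should be reused for every comparison against a given reference window.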

Drift aggregation rule (avoids false positives from individual feature noise):

Trigger retraining only when either (a) at least 3 features have KS p < 0.05 AND at least 1 feature has PSI ≥ 0.2, or (b) the performance metric degrades by more than 5% relative.
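The aggregation rule can be expressed as a small predicate. This sketch assumes per-feature KS p-values and PSI values are already available as dictionaries and that the performance drop is given as a relative fraction (0.05 for 5%). The function and parameter names are hypothetical.

```python
def should_retrain(
    ks_p_values,
    psi_values,
    relative_metric_drop,
    alpha=0.05,
    psi_major=0.2,
    min_ks_features=3,
    max_metric_drop=0.05,
):
    """Return True when the drift rule or the performance rule fires."""
    ks_hits = sum(1 for p in ks_p_values.values() if p < alpha)
    psi_hits = sum(1 for v in psi_values.values() if v >= psi_major)
    # Drift rule: several significant KS results AND at least one major PSI shift.
    drift_rule = ks_hits >= min_ks_features and psi_hits >= 1
    # Performance rule: relative degradation strictly above the threshold.
    performance_rule = relative_metric_drop > max_metric_drop
    return drift_rule or performance_rule
```

Requiring both conditions in the drift rule is what keeps a single noisy feature from triggering a retrain on its own.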

See references/drift-detection.md for complete test implementations, window size selection, and rolling-window monitoring patterns.

Self-Healing Pipeline Sequence

1. Drift Detected (KS + PSI thresholds exceeded)
          ↓
2. Shadow Training Triggered (new data distribution)
          ↓
3. Challenger Model Evaluated vs. Champion
          ↓
4. Statistical Significance Test (paired t-test, p < 0.05)
          ↓
5. Blue-Green Deployment (10% canary → 100% cutover)
          ↓
6. Champion model archived (24-hour rollback window)
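One way to picture the sequence is as a short orchestration function with each stage injected as a callable. Everything here is a hypothetical sketch: `PipelineHooks`, its field names, and the returned status strings are not part of the skill. Steps 3 and 4 are folded into a single `challenger_wins` hook.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineHooks:
    """Hypothetical stage callbacks supplied by the surrounding platform."""
    drift_detected: Callable[[], bool]
    train_challenger: Callable[[], object]
    challenger_wins: Callable[[object], bool]
    canary_healthy: Callable[[object], bool]
    promote: Callable[[object], None]
    archive_champion: Callable[[], None]


def run_self_healing_cycle(hooks):
    """Walk the six steps in order, stopping at the first failed gate."""
    if not hooks.drift_detected():              # step 1
        return "no_drift"
    challenger = hooks.train_challenger()       # step 2
    if not hooks.challenger_wins(challenger):   # steps 3 and 4
        return "challenger_rejected"
    if not hooks.canary_healthy(challenger):    # step 5, 10% canary phase
        return "canary_failed"
    hooks.promote(challenger)                   # step 5, 100% cutover
    hooks.archive_champion()                    # step 6
    return "promoted"
```

The production champion is only touched in the last two calls, so every earlier failure leaves serving unchanged.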

Shadow Training

Shadow training runs the full training pipeline on the drifted data distribution without disrupting the production model:

- Isolate: use the current data window (typically last 30–90 days) as the new training distribution
- Architecture: reuse the same architecture unless drift analysis suggests a fundamentally different data signature
- HPO: reuse the last best hyperparameters; re-run HPO only if drift severity is "critical" (PSI ≥ 0.5)
- Parallelism: shadow model trains on a separate compute partition, with no production impact
- Hardware: prefer spot instances for cost efficiency; configure checkpointing for preemption resilience
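The choices above could be captured in a small planning object. This is a sketch under stated assumptions: `ShadowTrainingPlan`, `plan_shadow_training`, and their fields are hypothetical names. Only the PSI ≥ 0.5 rule for re-running HPO and the 30-day default window come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowTrainingPlan:
    """Hypothetical description of one shadow training run."""
    window_days: int
    rerun_hpo: bool
    reuse_architecture: bool = True
    use_spot_instances: bool = True
    checkpointing: bool = True


def plan_shadow_training(max_psi, window_days=30, critical_psi=0.5):
    """Re-run HPO only when the worst PSI reaches the 'critical' level."""
    return ShadowTrainingPlan(
        window_days=window_days,
        rerun_hpo=max_psi >= critical_psi,
    )
```

For example, `plan_shadow_training(0.25)` reuses the previous hyperparameters, while `plan_shadow_training(0.6)` requests a fresh HPO run.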

Champion-Challenger Evaluation

Before any deployment, the challenger must beat the champion on the current data distribution:

1. Set aside 20% of the current data window as a holdout evaluation set, kept out of shadow training so the comparison is not biased toward the challenger.
2. Evaluate both champion and challenger on the holdout set.
3. Run a paired t-test (or permutation test) on per-sample losses:
   - H₀: challenger performance = champion performance
   - Reject H₀ if p < 0.05 and the mean loss difference favors the challenger → challenger is statistically better
4. Require: challenger primary metric improvement ≥ 2% relative AND p < 0.05.
5. Check all quality gates: latency SLA, fairness thresholds, no regression on reference distribution.
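The significance gate can be sketched with the permutation-test alternative mentioned above, which needs no distribution tables. This is a paired sign-flip permutation test, not a t-test. It assumes lower loss is better, that the primary metric is mean loss, and that the champion's mean loss is positive. The function name, the one-sided alternative, the permutation count, and the seed are all illustrative choices.

```python
import random


def challenger_beats_champion(
    champion_losses,
    challenger_losses,
    min_relative_gain=0.02,
    alpha=0.05,
    n_permutations=2000,
    seed=0,
):
    """Return (accepted, p_value, relative_gain) for paired per-sample losses."""
    if len(champion_losses) != len(challenger_losses) or not champion_losses:
        raise ValueError("need two non-empty, equally sized loss lists")
    # Positive differences mean the challenger had the lower loss.
    diffs = [a - b for a, b in zip(champion_losses, challenger_losses)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the sign of each paired difference is arbitrary.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if permuted >= observed:
            extreme += 1
    p_value = (extreme + 1) / (n_permutations + 1)
    champion_mean = sum(champion_losses) / n
    relative_gain = observed / champion_mean
    accepted = relative_gain >= min_relative_gain and p_value < alpha
    return accepted, p_value, relative_gain
```

The function only covers the statistical and effect-size requirements. Latency, fairness, and reference-distribution checks would be separate gates.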

See references/deployment-strategies.md for blue-green, canary, and shadow traffic routing patterns with Kubernetes ingress configurations.

Rollback Protocol

If post-deployment monitoring detects regression within 24 hours:

1. Immediately route 100% of traffic back to the previous champion (now in "green" slot).
2. Log the regression with full diagnostics: which metrics degraded, by how much.
3. File an incident report and increase shadow training data window (e.g., 30 → 90 days).
4. Re-evaluate whether drift was correctly classified (feature drift vs. true label drift).
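The rollback decision can be sketched as a pure function over timestamps. The function name, the returned dictionary keys, and the behavior outside the 24-hour window (no automatic rollback) are assumptions of this sketch. The 24-hour window and the 30 → 90 day widening come from the protocol above.

```python
from datetime import datetime, timedelta

ROLLBACK_WINDOW = timedelta(hours=24)


def rollback_decision(
    deployed_at,
    now,
    regression_detected,
    current_window_days=30,
    widened_window_days=90,
):
    """Decide whether to roll back and how to adjust the next training window."""
    in_window = now - deployed_at <= ROLLBACK_WINDOW
    if regression_detected and in_window:
        return {
            "traffic_to_previous_champion": 1.0,  # route 100% back
            "file_incident": True,
            "next_training_window_days": max(current_window_days, widened_window_days),
        }
    # No regression, or the rollback window has closed (handled elsewhere).
    return {
        "traffic_to_previous_champion": 0.0,
        "file_incident": False,
        "next_training_window_days": current_window_days,
    }
```

Logging the diagnostics and re-classifying the drift type remain manual or separate steps. This sketch only covers the traffic switch and the window adjustment.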

This skill should be used when the user asks about "concept drift", "data drift", "model degradation", "model staleness", "production model monitoring", "self-healing", "automatic retraining", "shadow model", "champion-challenger", "blue-green deployment", "canary deployment", "Kolmogorov-Smirnov test", "PSI", "Population Stability Index", "model refresh", "continuous learning", "online learning", or when a production model's performance has degraded over time due to distribution shift.

2 stars

development

Updated Apr 5, 2026

npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library self-healing-models

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 5, 2026, 5:10 PM · 6.5s · 3 files scanned

SKILL.md

name: self-healing-models
description: This skill should be used when the user asks about "concept drift", "data drift", "model degradation", "model staleness", "production model monitoring", "self-healing", "automatic retraining", "shadow model", "champion-challenger", "blue-green deployment", "canary deployment", "Kolmogorov-Smirnov test", "PSI", "Population Stability Index", "model refresh", "continuous learning", "online learning", or when a production model's performance has degraded over time due to distribution shift.
version: 1.0.0

Related Skills

harsh040506/single-cell-rna-qc
Category: testing. Updated Apr 5, 2026.
Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations. Use when users request QC analysis, filtering low-quality cells, assessing data quality, or following scverse/scanpy best practices for single-cell analysis.

harsh040506/scvi-tools
Category: tools. Updated Apr 5, 2026.
Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.

harsh040506/scientific-problem-selection
Category: testing. Updated Apr 5, 2026.
This skill should be used when scientists need help with research problem selection, project ideation, troubleshooting stuck projects, or strategic scientific decisions. Use this skill when users ask to pitch a new research idea, work through a project problem, evaluate project risks, plan research strategy, navigate decision trees, or get help choosing what scientific problem to work on. Typical requests include "I have an idea for a project", "I'm stuck on my research", "help me evaluate this project", "what should I work on", or "I need strategic advice about my research".

harsh040506/nextflow-development
Category: development. Updated Apr 5, 2026.
Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data, either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.

Install manually

# Clone the repo
git clone https://github.com/harsh040506/claude-code-unified-skill-plugin-library.git

# Create the global Claude Code skills folder if it does not exist, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r claude-code-unified-skill-plugin-library/engineering/advanced-ml-engineering/skills/self-healing-models ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

harsh040506/claude-code-unified-skill-plugin-library

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT