fmind/MLOps Observability

Name: MLOps Observability
Author: fmind

.gemini/skills/MLOps Observability/SKILL.md

npx skillsauth add fmind/mlops-python-package MLOps Observability

MLOps Observability

Goal

Implement a "Glass Box" system in which every result is Reproducible, every asset has Lineage, and system health is Monitored, Alerted on, and Explained.

Prerequisites

Language: Python
Context: Production monitoring and debugging.
Platform Suggestion: MLflow, SHAP, Evidently, ...

Instructions

1. Guarantee Reproducibility

Consistency is key. For instance:

Randomness: Set seeds for random, numpy, torch, tensorflow.
Environment: Use docker and locked dependencies (uv.lock).
Builds: Use justfile with uv build --build-constraint for deterministic wheels.
Code: Track git commit hash for every run.
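
The seeding and commit-tracking items above can be sketched as follows. This is a minimal illustration, not part of the skill itself: the helper names set_seeds and git_commit_hash are hypothetical, and numpy, torch, and tensorflow are only seeded when they happen to be installed.

```python
import os
import random
import subprocess


def set_seeds(seed: int = 42) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    # PYTHONHASHSEED only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch is optional in this sketch
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # tensorflow is optional in this sketch


def git_commit_hash(default: str = "unknown") -> str:
    """Return the current commit hash, or `default` outside a git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return default
```

Calling set_seeds once at the start of a run and storing git_commit_hash() with the run (for example as a tag) ties each result to a seed and a code version.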

2. Track Data Lineage

Know the origin of your data. For instance:

Datasets: Create MLflow Datasets with mlflow.data.from_pandas.
Logging: Log inputs to MLflow context with mlflow.log_input.
Versioning: Version data files (e.g., data/v1.csv) or use DVC.
Transformations: Log preprocessing parameters mapping data versions to model versions.
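
One way to combine the dataset, logging, and versioning items above is sketched below. It assumes mlflow and pandas are installed; the helper names file_digest and log_training_data, the "training" context label, and the data_sha256 parameter name are illustrative choices, not something the skill prescribes.

```python
import hashlib
from pathlib import Path


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 of a data file so a run can be tied to exact bytes."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def log_training_data(path: str, target: str) -> None:
    """Log a CSV file as an MLflow dataset input (hypothetical helper)."""
    # Imported lazily so file_digest stays usable without these packages.
    import mlflow
    import pandas as pd

    frame = pd.read_csv(path)
    dataset = mlflow.data.from_pandas(
        frame, source=path, name=Path(path).stem, targets=target
    )
    with mlflow.start_run():
        mlflow.log_input(dataset, context="training")
        mlflow.log_param("data_sha256", file_digest(path))
```

For example, log_training_data("data/v1.csv", target="label") would record the dataset and its hash on a new run; the target column name here is an assumption.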

3. Monitoring & Drift Detection

Watch for silent failures. For instance:

Validation: Use MLflow Evaluate to gate models against quality thresholds.
Drift: Use evidently to compare reference (training) vs current (production) data.
- Detect Data Drift (the input distribution changes) and Concept Drift (the relationship between inputs and target changes).
System: Enable MLflow System Metrics (log_system_metrics=True) for CPU/GPU.
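
To make the reference-versus-current comparison concrete, here is a dependency-free sketch of one common data drift statistic, the Population Stability Index (PSI). It is a stand-in for illustration and does not use the Evidently API; the function name, the equal-width binning, and the 0.2 cut-off (a widely quoted rule of thumb) are assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp values that fall outside the reference range.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor at a tiny value so the logarithm below is always defined.
        return [max(count / len(values), 1e-6) for count in counts]

    ref = proportions(reference)
    cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


training = [i / 100 for i in range(100)]
production = [x + 0.5 for x in training]  # simulated shift in production
drifted = population_stability_index(training, production) > 0.2
```

This only covers data drift on a single numeric feature. Concept drift needs labels or model outputs, which is where a dedicated tool is more appropriate.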

4. Alerting

Don't stare at dashboards. For instance:

Local: Use plyer for desktop notifications during long training runs.
Production: Use PagerDuty (critical) or Slack (warnings).
Thresholds: Use Static (fixed value) or Dynamic (anomaly detection) rules.
Action: Alerts must link to a dashboard or playbook.
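
The static and dynamic threshold rules above can be sketched with the standard library alone. The Alert class, the rule functions, the severity labels, and the z-score limit of 3.0 are hypothetical choices for illustration; routing to PagerDuty, Slack, or plyer is left out.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class Alert:
    severity: str      # e.g. "critical" for paging, "warning" for chat
    message: str
    playbook_url: str  # every alert links to a dashboard or playbook


def static_rule(value: float, threshold: float, playbook_url: str) -> Optional[Alert]:
    """Static rule: fire when the value exceeds a fixed threshold."""
    if value > threshold:
        message = f"value {value:.3f} exceeds fixed threshold {threshold:.3f}"
        return Alert("critical", message, playbook_url)
    return None


def dynamic_rule(
    value: float, history: Sequence[float], playbook_url: str, z_max: float = 3.0
) -> Optional[Alert]:
    """Dynamic rule: fire when the value is an outlier relative to recent history."""
    if len(history) < 2:
        return None  # not enough data to estimate a spread
    mu, sigma = mean(history), stdev(history)
    if sigma > 0:
        z = abs(value - mu) / sigma
    else:
        z = 0.0 if value == mu else float("inf")
    if z > z_max:
        return Alert("warning", f"value {value:.3f} is {z:.1f} sigma from the mean", playbook_url)
    return None
```

A caller would pass the returned Alert, if any, to whichever notification channel fits its severity.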

5. Explainability (XAI)

Trust but verify. For instance:

Global: Use Feature Importance (e.g., Random Forest) to understand overall logic.
Local: Use SHAP values to explain individual predictions.
Artifacts: Save explanations (plots/tables) as MLflow artifacts.
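
To show what a local explanation contains, the sketch below computes exact Shapley values by brute force for a tiny model. It is for illustration only and is not the shap library's API: shap estimates these values far more efficiently, while this version enumerates every feature subset and is only practical for a handful of features. The function name and the baseline-replacement scheme are assumptions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    predict: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Attribute predict(x) - predict(baseline) to each feature of x."""
    n = len(x)

    def value(subset: Set[int]) -> float:
        # Features outside the subset are replaced by their baseline value.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                contributions[i] += weight * (value(members | {i}) - value(members))
    return contributions


def linear_model(v: List[float]) -> float:
    return 2.0 * v[0] - 3.0 * v[1] + 0.5 * v[2] + 1.0


explanation = shapley_values(linear_model, [1.0, 2.0, 4.0], [0.0, 0.0, 0.0])
```

For a linear model each contribution equals the weight times the distance from the baseline, which makes the result easy to check by hand. The resulting list could then be stored with the run, for instance through mlflow.log_dict.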

6. Infrastructure & Costs

Optimize resources. For instance:

Tags: Tag runs with project, env, user.
Costs: Log run_time and instance type to estimate ROI.
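
A rough sketch of the tagging and cost items above, assuming a hand-maintained price table. The instance names, hourly rates, and helper names are hypothetical placeholders, not real prices or a real API.

```python
from typing import Dict

# Hypothetical hourly prices in USD; replace with your provider's rates.
HOURLY_RATE_USD: Dict[str, float] = {"cpu-small": 0.10, "gpu-large": 3.00}


def estimate_cost(run_time_s: float, instance_type: str) -> float:
    """Estimate the cost of one run from its duration and instance type."""
    return run_time_s / 3600 * HOURLY_RATE_USD[instance_type]


def run_metadata(
    project: str, env: str, user: str, instance_type: str, run_time_s: float
) -> Dict[str, Dict]:
    """Bundle the tags and metrics to attach to a run (hypothetical helper)."""
    return {
        "tags": {
            "project": project,
            "env": env,
            "user": user,
            "instance_type": instance_type,
        },
        "metrics": {
            "run_time": run_time_s,
            "estimated_cost_usd": estimate_cost(run_time_s, instance_type),
        },
    }
```

With MLflow, the two dictionaries could be passed to mlflow.set_tags and mlflow.log_metrics inside an active run.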

Self-Correction Checklist

[ ] Seeds: Are random seeds fixed?
[ ] Inputs: Are input datasets logged to MLflow?
[ ] System Metrics: Is log_system_metrics enabled?
[ ] Explanations: Are SHAP values generated?
[ ] Alerts: Are thresholds defined for failures?

Description: Guide to implement full-stack observability, including reproducibility, lineage, monitoring, alerting, and explainability.
Stars: 1,402
Category: testing
Updated: Apr 5, 2026

Install (global):

npx skillsauth add fmind/mlops-python-package MLOps Observability

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%
Semgrep: Static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): Open source security scanning. Skipped, 50%
Socket.dev: Supply chain security analysis. Skipped, 50%
VirusTotal: Multi-engine malware detection. Skipped, 50%
CrowdStrike: Advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 5, 2026, 2:01 AM. Duration: 3.8s. Files scanned: 1.

Related Skills

fmind/MLOps Validation (development): Guide to implement rigorous validation layers including static analysis, automated testing, structured logging, and security scanning.
fmind/MLOps Prototyping (testing): Guide to create structured, reproducible Jupyter notebooks for MLOps prototyping, emphasizing configuration management and pipeline integrity.
fmind/MLOps Initialization (tools): Guide to initialize a new MLOps project with standard tools (uv, git, VS Code) and best practices.
fmind/MLOps Industrialization (development): Guide to transform prototypes into robust, distributable Python packages using the src layout, hybrid paradigm, and strict configuration management.

Install manually

# Clone the repo
git clone https://github.com/fmind/mlops-python-package.git

# Copy into the Claude Code skills folder (global).
# The path is quoted because the skill directory name contains a space.
cp -r "mlops-python-package/.gemini/skills/MLOps Observability" ~/.claude/skills/

Repository: fmind/mlops-python-package (1,402 stars)
Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT