Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

a5c-ai/mlflow-experiment-tracker

Name: mlflow-experiment-tracker
Author: a5c-ai

library/specializations/data-science-ml/skills/mlflow-experiment-tracker/SKILL.md

npx skillsauth add a5c-ai/babysitter mlflow-experiment-tracker

MLflow Experiment Tracker

Integrate with MLflow for comprehensive ML experiment tracking, model registry operations, and artifact management.

Overview

This skill provides capabilities for interacting with MLflow's tracking server and model registry. It enables automated experiment logging, run comparison, model versioning, and artifact retrieval within ML workflows.

Capabilities

Experiment Management

Create and manage experiments
Start and end runs programmatically
Set experiment tags and descriptions
List and search experiments

Parameter and Metric Logging

Log hyperparameters for reproducibility
Track metrics during training (loss, accuracy, etc.)
Log batch metrics with timestamps
Set run tags for organization
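
Training configurations are often nested, while MLflow parameters are flat key/value pairs. The following is a minimal sketch of one way to flatten a nested config before logging it. The helper name `flatten_params` and the sample `config` are illustrative and not part of the skill or of MLflow.

```python
from typing import Any, Dict


def flatten_params(params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"opt": {"lr": 0.01}} -> {"opt.lr": 0.01}."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical nested training config
config = {"optimizer": {"name": "adam", "lr": 0.01}, "batch_size": 32}
flat = flatten_params(config)
print(flat)

# Inside an active run, the flattened dict can be logged in one call:
#   mlflow.log_params(flat)
```

Logging the full flattened config, rather than only the tuned values, keeps runs comparable later.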

Artifact Management

Log model artifacts (serialized models, checkpoints)
Store datasets and data samples
Save plots and visualizations
Retrieve artifacts from completed runs

Model Registry Operations

Register trained models
Manage model versions
Transition models between stages (Staging, Production, Archived)
Add model descriptions and tags

Run Comparison and Analysis

Compare metrics across runs
Search runs by parameters/metrics
Retrieve best performing runs
Generate comparison visualizations

Prerequisites

MLflow Installation

# Quote the requirement so the shell does not treat ">" as output redirection
pip install "mlflow>=2.0.0"

MLflow Tracking Server

Configure tracking URI:

import mlflow
mlflow.set_tracking_uri("http://localhost:5000")  # or remote server
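
Hard-coding the URI makes it awkward to move between a local and a remote server. As a sketch, the URI could be resolved from an explicit argument first, then from the `MLFLOW_TRACKING_URI` environment variable, then from a local default. The helper `resolve_tracking_uri` is a hypothetical name introduced here for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_tracking_uri(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    default: str = "http://localhost:5000",
) -> str:
    """Pick a tracking URI: explicit argument, then MLFLOW_TRACKING_URI, then the default."""
    return explicit or env.get("MLFLOW_TRACKING_URI") or default


uri = resolve_tracking_uri()
print(uri)

# The resolved value can then be passed to MLflow:
#   mlflow.set_tracking_uri(uri)
```

Passing `env` as a parameter keeps the function easy to test without touching the real environment.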

Optional: MLflow MCP Server

For enhanced LLM integration, install the MLflow MCP server:

# Quote the requirement so the shell does not treat ">" as output redirection
pip install "mlflow>=3.4"  # Official MCP support
# or
pip install mlflow-mcp     # Community server

Usage Patterns

Starting an Experiment Run

import mlflow
import mlflow.sklearn  # scikit-learn flavor used by log_model below

# `model` and `train_one_epoch()` are placeholders for your own
# fitted scikit-learn estimator and training step.

# Set experiment (created if it does not exist)
mlflow.set_experiment("my-classification-experiment")

# Start run with context manager; the run ends when the block exits
with mlflow.start_run(run_name="baseline-model"):
    # Log hyperparameters in one call
    mlflow.log_params({"learning_rate": 0.01, "batch_size": 32, "epochs": 100})

    # Log metrics during training
    for epoch in range(100):
        train_loss = train_one_epoch()
        mlflow.log_metric("train_loss", train_loss, step=epoch)

    # Log final metrics
    mlflow.log_metrics({"accuracy": 0.95, "f1_score": 0.93})

    # Log model artifact
    mlflow.sklearn.log_model(model, "model")
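
The loop above logs a metric on every epoch. For long runs it can be cheaper to log at a fixed interval while still recording the final step. The sketch below computes which steps to log. `steps_to_log` is an illustrative helper, not an MLflow function.

```python
from typing import List


def steps_to_log(total_steps: int, every: int) -> List[int]:
    """Return the step indices to log: every `every`-th step plus the final step."""
    if every <= 0:
        raise ValueError("every must be a positive integer")
    if total_steps <= 0:
        return []
    steps = list(range(0, total_steps, every))
    last_step = total_steps - 1
    if steps[-1] != last_step:
        # Always include the last step so the final value is recorded
        steps.append(last_step)
    return steps


print(steps_to_log(100, 25))  # [0, 25, 50, 75, 99]

# In a training loop, assuming an active run:
#   log_at = set(steps_to_log(100, 25))
#   if epoch in log_at:
#       mlflow.log_metric("train_loss", train_loss, step=epoch)
```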

Searching and Comparing Runs

import mlflow

# Search runs with filter; returns a pandas DataFrame by default
runs = mlflow.search_runs(
    experiment_names=["my-classification-experiment"],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10
)

# Get best run (the DataFrame is empty when no run matches the filter)
if runs.empty:
    print("No runs with accuracy above 0.9")
else:
    best_run = runs.iloc[0]
    print(f"Best run ID: {best_run.run_id}")
    print(f"Best accuracy: {best_run['metrics.accuracy']}")
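
Filter strings get harder to write by hand once several conditions are combined. The sketch below builds one from metric thresholds and parameter values. `build_filter` is a hypothetical helper; it assumes parameter values are compared as quoted strings and does not escape quotes inside values.

```python
from typing import Mapping, Optional, Union

Number = Union[int, float]


def build_filter(
    min_metrics: Mapping[str, Number],
    params: Optional[Mapping[str, str]] = None,
) -> str:
    """Join 'metrics.<name> > <value>' and "params.<name> = '<value>'" clauses with 'and'."""
    clauses = [f"metrics.{name} > {value}" for name, value in min_metrics.items()]
    clauses += [f"params.{name} = '{value}'" for name, value in (params or {}).items()]
    return " and ".join(clauses)


filter_string = build_filter({"accuracy": 0.9}, {"batch_size": "32"})
print(filter_string)  # metrics.accuracy > 0.9 and params.batch_size = '32'

# The result can be passed as the filter_string argument of mlflow.search_runs.
```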

Model Registry Operations

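import mlflow
from mlflow import MlflowClient

# `run_id` is the ID of a finished run that logged a model under "model"
model_uri = f"runs:/{run_id}/model"
model_version = mlflow.register_model(model_uri, "production-classifier")

# Transition the version that was just registered
# (model stages are deprecated in recent MLflow releases in favor of aliases)
client = MlflowClient()
client.transition_model_version_stage(
    name="production-classifier",
    version=model_version.version,
    stage="Production"
)

# Load production model
model = mlflow.pyfunc.load_model("models:/production-classifier/Production")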
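
Registered models can be addressed by version number, by stage, or (in newer MLflow versions) by alias, each with its own `models:/` URI form. The sketch below builds the URI for whichever selector is given. `model_uri_for` is an illustrative helper, and the alias name `champion` is only an example.

```python
from typing import Optional


def model_uri_for(
    name: str,
    *,
    version: Optional[int] = None,
    stage: Optional[str] = None,
    alias: Optional[str] = None,
) -> str:
    """Build a models:/ URI from exactly one of version, stage, or alias."""
    chosen = [value for value in (version, stage, alias) if value is not None]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of version, stage, or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    selector = version if version is not None else stage
    return f"models:/{name}/{selector}"


print(model_uri_for("production-classifier", stage="Production"))
print(model_uri_for("production-classifier", alias="champion"))

# Either URI can be passed to mlflow.pyfunc.load_model.
```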

Integration with Babysitter SDK

Task Definition Example

const mlflowTrackingTask = defineTask({
  name: 'mlflow-experiment-tracking',
  description: 'Track ML experiment with MLflow',

  inputs: {
    experimentName: { type: 'string', required: true },
    runName: { type: 'string', required: true },
    parameters: { type: 'object', required: true },
    metrics: { type: 'object', required: true },
    modelPath: { type: 'string' }
  },

  outputs: {
    runId: { type: 'string' },
    experimentId: { type: 'string' },
    artifactUri: { type: 'string' }
  },

  async run(inputs, taskCtx) {
    return {
      kind: 'skill',
      title: `Track experiment: ${inputs.experimentName}/${inputs.runName}`,
      skill: {
        name: 'mlflow-experiment-tracker',
        context: {
          operation: 'log_run',
          experimentName: inputs.experimentName,
          runName: inputs.runName,
          parameters: inputs.parameters,
          metrics: inputs.metrics,
          modelPath: inputs.modelPath
        }
      },
      io: {
        inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
        outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
      }
    };
  }
});
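
The task definition above declares four required inputs and one optional input. The source does not say whether the Babysitter SDK validates these itself, so the following Python sketch only illustrates what such a check could look like on the receiving side. `validate_inputs` and the sample payload are hypothetical.

```python
from typing import Any, Dict, List

# Field names and types taken from the `inputs` section of the task definition
REQUIRED = {"experimentName": str, "runName": str, "parameters": dict, "metrics": dict}
OPTIONAL = {"modelPath": str}


def validate_inputs(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the payload; an empty list means it is valid."""
    errors: List[str] = []
    for field, expected in REQUIRED.items():
        if field not in payload:
            errors.append(f"missing required field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    for field, expected in OPTIONAL.items():
        if field in payload and not isinstance(payload[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    return errors


payload = {
    "experimentName": "my-classification-experiment",
    "runName": "baseline-model",
    "parameters": {"learning_rate": 0.01},
    "metrics": {"accuracy": 0.95},
}
print(validate_inputs(payload))  # []
```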

MCP Server Integration

Using mlflow-mcp Server

{
  "mcpServers": {
    "mlflow": {
      "command": "uvx",
      "args": ["mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "http://localhost:5000"
      }
    }
  }
}
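
When the tracking URI differs between environments, the same configuration can be generated rather than edited by hand. This is a minimal sketch that reproduces the JSON above for a given URI. The function name `mcp_config` is illustrative, and where the resulting file should be saved depends on the MCP client and is not specified here.

```python
import json
from typing import Any, Dict


def mcp_config(tracking_uri: str) -> Dict[str, Any]:
    """Build the mcpServers entry for the mlflow-mcp server with the given tracking URI."""
    return {
        "mcpServers": {
            "mlflow": {
                "command": "uvx",
                "args": ["mlflow-mcp"],
                "env": {"MLFLOW_TRACKING_URI": tracking_uri},
            }
        }
    }


print(json.dumps(mcp_config("http://localhost:5000"), indent=2))
```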

Available MCP Tools

mlflow_list_experiments - List all experiments
mlflow_search_runs - Search runs with filters
mlflow_get_run - Get run details
mlflow_log_metric - Log a metric
mlflow_log_param - Log a parameter
mlflow_list_artifacts - List run artifacts
mlflow_get_model_version - Get model version details

Best Practices

Consistent Naming: Use descriptive experiment and run names
Complete Logging: Log all hyperparameters, not just tuned ones
Metric Granularity: Log metrics at appropriate intervals
Artifact Organization: Use consistent artifact paths
Model Documentation: Add descriptions to registered models
Stage Management: Use proper staging workflow (None -> Staging -> Production)
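
The staging workflow in the last item can be enforced in code before calling the registry. The sketch below encodes one possible policy. The transition table is an assumption made for illustration (including the moves to Archived), and `can_transition` is a hypothetical helper rather than an MLflow API.

```python
from typing import Dict, Set

# Illustrative policy: None -> Staging -> Production, with archiving allowed
# from Staging or Production. Adjust to match your team's process.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "None": {"Staging"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving a model version from `current` to `target` follows the policy."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("None", "Staging"))     # True
print(can_transition("None", "Production"))  # False
```

A check like this could run before `transition_model_version_stage` so that a version cannot skip Staging by accident.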

MLflow integration skill for experiment tracking, model registry, and artifact management. Enables LLMs to log experiments, compare runs, manage model lifecycle, and retrieve artifacts through the MLflow API.

680 stars

development

Updated Apr 28, 2026

npx skillsauth add a5c-ai/babysitter mlflow-experiment-tracker

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 28, 2026, 5:04 PM · 247.9s · 2 files scanned

SKILL.md

name: mlflow-experiment-tracker
description: MLflow integration skill for experiment tracking, model registry, and artifact management. Enables LLMs to log experiments, compare runs, manage model lifecycle, and retrieve artifacts through the MLflow API.
allowed-tools: Read, Grep, Write, Bash, Edit, Glob, WebFetch

Related Skills

a5c-ai/model-card-generator (development): Model documentation skill for generating model cards following Google's model card framework. 680 stars, updated Apr 28, 2026.
a5c-ai/lime-explainer (data-ai): LIME-based local explanation skill for individual predictions across tabular, text, and image data. 680 stars, updated Apr 28, 2026.
a5c-ai/kubeflow-pipeline-executor (devops): Kubeflow Pipelines skill for ML workflow orchestration, component management, and Kubernetes-native ML. 680 stars, updated Apr 28, 2026.
a5c-ai/jupyter-notebook-executor (tools): Jupyter notebook execution skill for running notebooks programmatically and extracting outputs. 680 stars, updated Apr 28, 2026.


Install manually

# Clone the repo
git clone https://github.com/a5c-ai/babysitter.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r babysitter/library/specializations/data-science-ml/skills/mlflow-experiment-tracker ~/.claude/skills/

Repository: a5c-ai/babysitter (680 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT