MLflow integration skill for experiment tracking, model registry, and artifact management. Enables LLMs to log experiments, compare runs, manage model lifecycle, and retrieve artifacts through the MLflow API.
Integrate with MLflow for comprehensive ML experiment tracking, model registry operations, and artifact management.
This skill provides capabilities for interacting with MLflow's tracking server and model registry. It enables automated experiment logging, run comparison, model versioning, and artifact retrieval within ML workflows.
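Install MLflow:

```bash
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "mlflow>=2.0.0"
```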
Configure the tracking URI:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # or the URI of a remote tracking server
```
For tighter LLM integration, install an MLflow MCP server:

```bash
pip install "mlflow>=3.4"  # Official MCP support
# or
pip install mlflow-mcp     # Community server
```
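Log parameters, metrics, and a model within a run:

```python
import mlflow
import mlflow.sklearn

# Set the active experiment (created if it does not exist)
mlflow.set_experiment("my-classification-experiment")

# Start a run with a context manager so it ends automatically
with mlflow.start_run(run_name="baseline-model"):
    # Log parameters
    epochs = 100
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)
    mlflow.log_param("epochs", epochs)

    # Log metrics during training; train_one_epoch() stands for your own training step
    for epoch in range(epochs):
        train_loss = train_one_epoch()
        mlflow.log_metric("train_loss", train_loss, step=epoch)

    # Log final metrics (placeholder values)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.93)

    # Log the fitted scikit-learn estimator `model` as a run artifact under "model"
    mlflow.sklearn.log_model(model, "model")
```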
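Compare runs by searching with a filter:

```python
import mlflow

# Search runs with a filter, best accuracy first (returns a pandas DataFrame)
runs = mlflow.search_runs(
    experiment_names=["my-classification-experiment"],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10,
)

# Get the best run, guarding against an empty result
if runs.empty:
    print("No runs matched the filter")
else:
    best_run = runs.iloc[0]
    print(f"Best run ID: {best_run.run_id}")
    print(f"Best accuracy: {best_run['metrics.accuracy']}")
```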
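Register a model version, promote it, and load it:

```python
import mlflow

# run_id identifies a run that logged a model under "model" (e.g. best_run.run_id above)
model_uri = f"runs:/{run_id}/model"
model_version = mlflow.register_model(model_uri, "production-classifier")

# Transition the newly registered version to the Production stage.
# Note: stages are deprecated in recent MLflow releases in favor of model version aliases.
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="production-classifier",
    version=model_version.version,
    stage="Production",
)

# Load the current Production model
model = mlflow.pyfunc.load_model("models:/production-classifier/Production")
```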
Example task definition that invokes this skill:

```javascript
const mlflowTrackingTask = defineTask({
  name: 'mlflow-experiment-tracking',
  description: 'Track ML experiment with MLflow',
  inputs: {
    experimentName: { type: 'string', required: true },
    runName: { type: 'string', required: true },
    parameters: { type: 'object', required: true },
    metrics: { type: 'object', required: true },
    modelPath: { type: 'string' }
  },
  outputs: {
    runId: { type: 'string' },
    experimentId: { type: 'string' },
    artifactUri: { type: 'string' }
  },
  async run(inputs, taskCtx) {
    return {
      kind: 'skill',
      title: `Track experiment: ${inputs.experimentName}/${inputs.runName}`,
      skill: {
        name: 'mlflow-experiment-tracker',
        context: {
          operation: 'log_run',
          experimentName: inputs.experimentName,
          runName: inputs.runName,
          parameters: inputs.parameters,
          metrics: inputs.metrics,
          modelPath: inputs.modelPath
        }
      },
      io: {
        inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
        outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
      }
    };
  }
});
```
Example MCP client configuration for the community server, launched with `uvx`:

```json
{
  "mcpServers": {
    "mlflow": {
      "command": "uvx",
      "args": ["mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "http://localhost:5000"
      }
    }
  }
}
```
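If the configuration is generated rather than edited by hand, for example to point the server at a remote tracking server, the same structure can be built programmatically. A small sketch; the remote URL is a placeholder, and where the resulting JSON has to be saved depends on the MCP client in use:

```python
import json


def mlflow_mcp_config(tracking_uri: str) -> dict:
    """Return the mcpServers entry above with a caller-supplied tracking URI."""
    return {
        "mcpServers": {
            "mlflow": {
                "command": "uvx",
                "args": ["mlflow-mcp"],
                "env": {"MLFLOW_TRACKING_URI": tracking_uri},
            }
        }
    }


# Placeholder URL for a remote tracking server.
config = mlflow_mcp_config("https://mlflow.example.com")
print(json.dumps(config, indent=2))
```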
Available MCP tools:

- `mlflow_list_experiments` - List all experiments
- `mlflow_search_runs` - Search runs with filters
- `mlflow_get_run` - Get run details
- `mlflow_log_metric` - Log a metric
- `mlflow_log_param` - Log a parameter
- `mlflow_list_artifacts` - List run artifacts
- `mlflow_get_model_version` - Get model version details