# Skill: Experiment Tracking (MLflow)

## When to load
When running training experiments, comparing runs, or reproducing a historical experiment.
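## MLflow Tracking Pattern

```python
import subprocess
import mlflow
import mlflow.xgboost
from sklearn.metrics import f1_score, roc_auc_score

# dataset_version, train_model and the X_/y_ train/test splits are defined elsewhere.
hyperparams = {"learning_rate": 0.01, "max_depth": 6}

with mlflow.start_run(run_name="xgboost-lr-0.01-depth-6") as run:
    # Log the full configuration so the run can be reproduced later.
    mlflow.log_params({
        "model_type": "xgboost",
        **hyperparams,
        "data_version": dataset_version,
        "random_seed": 42,
    })
    # Pin the exact code revision used for this run.
    mlflow.set_tags({
        "git_commit": subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip(),
    })
    model = train_model(X_train, y_train, hyperparams)
    # Score held-out data (assumes a binary classifier with a scikit-learn style API).
    mlflow.log_metrics({
        "test_auc_roc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
        "test_f1": f1_score(y_test, model.predict(X_test)),
    })
    # Store the input/output schema with the model; the scorecard file must already exist.
    signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
    mlflow.xgboost.log_model(model, "model", signature=signature)
    mlflow.log_artifact("evaluation_scorecard.json")
```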
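Comparing two runs, or checking a reproduction against the original, mostly comes down to diffing what each run logged. The helper below is a hypothetical sketch (`diff_params` is not part of MLflow). It assumes plain dicts such as `run.data.params` or `run.data.tags` from `mlflow.get_run(run_id)`; MLflow stores parameter values as strings, so the example uses strings too.

```python
from typing import Any, Dict, Tuple


def diff_params(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key missing from one side is reported as None for that side.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical params from two runs that differ only in learning rate.
baseline = {"model_type": "xgboost", "learning_rate": "0.01", "max_depth": "6", "random_seed": "42"}
candidate = {"model_type": "xgboost", "learning_rate": "0.05", "max_depth": "6", "random_seed": "42"}
print(diff_params(baseline, candidate))  # {'learning_rate': ('0.01', '0.05')}
```

To reproduce a historical run, the same `mlflow.get_run(run_id)` call exposes the logged `git_commit` tag and `data_version` param to check out, and the logged model can be reloaded with `mlflow.xgboost.load_model("runs:/<run_id>/model")`.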