nvidia-nemo/accessing-mlflow

Name: accessing-mlflow
Author: nvidia-nemo

packages/nemo-evaluator-launcher/.claude/skills/accessing-mlflow/SKILL.md

npx skillsauth add nvidia-nemo/evaluator accessing-mlflow

Accessing MLflow

MCP Server

mlflow-mcp gives agents direct access to MLflow — query runs, compare metrics, browse artifacts, all through natural language.

ID Convention

When the user provides a hex ID (e.g. 71f3f3199ea5e1f0) without specifying what it is, assume it is an invocation_id (not an MLflow run_id). An invocation_id identifies a launcher invocation and is stored as both a tag and a param on MLflow runs. One invocation can produce multiple MLflow runs (one per task). You may need to search across multiple experiments if you don't know which experiment the run belongs to.
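The convention above can be sketched with the MLflow Python client as a fallback to the MCP tools. This is a minimal sketch, not the skill's own implementation: the helper names `invocation_filter` and `find_runs_for_invocation` are hypothetical, and it assumes the `mlflow` package is installed and `MLFLOW_TRACKING_URI` points at your server.

```python
import re
from typing import Any, Callable, Optional


def invocation_filter(invocation_id: str) -> str:
    """Build an MLflow filter string matching runs tagged with this invocation_id."""
    candidate = invocation_id.strip()
    # Only accept plain hex strings, so the value is safe to embed in the filter.
    if not re.fullmatch(r"[0-9a-fA-F]+", candidate):
        raise ValueError(f"not a hex ID: {invocation_id!r}")
    return f"tags.invocation_id = '{candidate}'"


def find_runs_for_invocation(
    invocation_id: str,
    search: Optional[Callable[..., Any]] = None,
) -> Any:
    """Search every experiment, since the owning experiment may be unknown."""
    if search is None:
        # Imported lazily so the filter helper works without mlflow installed.
        import mlflow

        search = mlflow.search_runs
    return search(
        search_all_experiments=True,
        filter_string=invocation_filter(invocation_id),
    )
```

Because one invocation can produce several runs (one per task), expect the result to contain more than one row.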

Querying Runs

# Find runs by invocation_id
MLflow:search_runs_by_tags(experiment_id, {"invocation_id": "<invocation_id>"})

# Query for example model/task runs
MLflow:query_runs(experiment_id, "tags.model LIKE '%<model>%'")
MLflow:query_runs(experiment_id, "tags.task_name LIKE '%<task_name>%'")

# Get a config from run's artifacts
MLflow:get_artifact_content(run_id, "config.yml")

# Get nested stats from run's artifacts
MLflow:get_artifact_content(run_id, "artifacts/eval_factory_metrics.json")

NOTE: You WILL NOT find PENDING, RUNNING, KILLED, or FAILED runs in MLflow! Only SUCCESSFUL runs are exported to MLflow.
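The LIKE queries above take ordinary MLflow filter strings, which can also be combined with AND. The sketch below builds such a string; the helper names are hypothetical, and it assumes the `query_runs` tool passes the filter through to MLflow unchanged, which the skill does not state.

```python
from typing import Optional


def like_clause(tag: str, fragment: str) -> str:
    """Return a substring match on one tag, e.g. tags.model LIKE '%foo%'."""
    # Quote escaping is out of scope for this sketch, so reject quotes outright.
    if "'" in fragment or '"' in fragment:
        raise ValueError("quotes are not supported in this sketch")
    return f"tags.{tag} LIKE '%{fragment}%'"


def model_task_filter(
    model: Optional[str] = None, task_name: Optional[str] = None
) -> str:
    """Combine the model and task_name tag filters used in the queries above."""
    clauses = []
    if model:
        clauses.append(like_clause("model", model))
    if task_name:
        clauses.append(like_clause("task_name", task_name))
    return " AND ".join(clauses)


# "my-model" and "my-task" are placeholder values, not real tag contents.
print(model_task_filter(model="my-model", task_name="my-task"))
```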

Workflow Tips

When comparing metrics across runs, fetch the data via MCP, then run the computation in Python for exact results rather than doing math in-context:

uv run --with pandas python3 << 'EOF'
import pandas as pd
# ... compute deltas, averages, etc.
EOF
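As one way to fill in the placeholder above, the following sketch computes per-task deltas between two models. The task names, model names, and scores are made-up illustration values, and `compare_models` is a hypothetical helper; real values would come from the runs fetched via MCP.

```python
import pandas as pd

# Hypothetical metric values, shaped as they might be after fetching runs via MCP.
runs = pd.DataFrame(
    {
        "task": ["task_a", "task_b", "task_a", "task_b"],
        "model": ["baseline", "baseline", "candidate", "candidate"],
        "score": [0.50, 0.25, 0.75, 0.25],
    }
)


def compare_models(df: pd.DataFrame, baseline: str, candidate: str) -> pd.DataFrame:
    """Return one row per task with both scores and candidate-minus-baseline delta."""
    wide = df.pivot(index="task", columns="model", values="score")
    out = wide[[baseline, candidate]].copy()
    out["delta"] = out[candidate] - out[baseline]
    return out


table = compare_models(runs, "baseline", "candidate")
print(table)
print("mean delta:", table["delta"].mean())
```

Pivoting to one row per task keeps missing task/model combinations visible as NaN instead of silently dropping them from an average.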

Artifacts Structure

<harness>.<task>/
├── artifacts/
│   ├── config.yml                      # Fully resolved config used during the evaluation
│   ├── launcher_unresolved_config.yaml # Unresolved config passed to the launcher
│   ├── results.yml                     # All results in YAML format
│   ├── eval_factory_metrics.json       # Runtime stats (latency, token counts, memory)
│   ├── report.html                     # Request-response pair samples in HTML format (if enabled)
│   └── report.json                     # Request-response pair samples in JSON format (if enabled)
└── logs/
    ├── client-*.log              # Evaluation client
    ├── server-*-N.log            # Deployment per node
    ├── slurm-*.log               # Slurm job
    └── proxy-*.log               # Request proxy
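When browsing a run's logs, the filename patterns in the tree above are enough to tell which component wrote a file. A small sketch follows; `classify_log` and the example filenames are hypothetical, and `server-*-N.log` is interpreted here as "any suffix after the last dash".

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and component names taken from the logs/ listing above.
LOG_PATTERNS = [
    ("client-*.log", "evaluation client"),
    ("server-*-*.log", "deployment (per node)"),
    ("slurm-*.log", "Slurm job"),
    ("proxy-*.log", "request proxy"),
]


def classify_log(filename: str) -> Optional[str]:
    """Return the component that produced a log file, or None if unrecognized."""
    for pattern, component in LOG_PATTERNS:
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatchcase(filename, pattern):
            return component
    return None


# Example filenames are invented for illustration.
for name in ["client-12345.log", "server-12345-0.log", "results.yml"]:
    print(name, "->", classify_log(name))
```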

Troubleshooting

If the MLflow MCP server fails to load or its tools are unavailable:

uvx not found — install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

MCP server not configured — add the config and restart the agent:

For Claude Code — add to .claude/settings.json (project or user level), under "mcpServers":

"MLflow": {
  "command": "uvx",
  "args": ["mlflow-mcp"],
  "env": {
    "MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
  }
}

For Cursor — edit ~/.cursor/mcp.json (Settings > Tools & MCP > New MCP Server):

{
  "mcpServers": {
    "MLflow": {
      "command": "uvx",
      "args": ["mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "https://<your-mlflow-server>/"
      }
    }
  }
}
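Both troubleshooting steps can be checked from a script. The sketch below tests whether `uvx` is on the PATH and merges the "MLflow" entry shown above into an existing JSON config without overwriting other servers. The function names are hypothetical, and the path and tracking URI you pass in are your own; nothing here is part of `mlflow-mcp` itself.

```python
import json
import shutil
from pathlib import Path


def uvx_available() -> bool:
    """Return True if the uvx executable can be found on the PATH."""
    return shutil.which("uvx") is not None


def add_mlflow_server(config_path: Path, tracking_uri: str) -> dict:
    """Add or update the "MLflow" entry under "mcpServers", keeping other entries."""
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command, args, and env key as in the config snippets above.
    servers["MLflow"] = {
        "command": "uvx",
        "args": ["mlflow-mcp"],
        "env": {"MLFLOW_TRACKING_URI": tracking_uri},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For Cursor this could be called with `Path.home() / ".cursor" / "mcp.json"`. Restart the agent afterwards so it picks up the new server.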

nvidia-nemo/accessing-mlflow

Query and browse evaluation results stored in MLflow. Use when the user wants to look up runs by invocation ID, compare metrics across models, fetch artifacts (configs, logs, results), or set up the MLflow MCP server. ALWAYS triggers on mentions of MLflow, experiment results, run comparison, invocation IDs in the context of results, or MLflow MCP setup.

255 stars

Category: tools

Updated Apr 15, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add nvidia-nemo/evaluator accessing-mlflow

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%
Semgrep: Static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): Open source security scanning. Skipped, 50%
Socket.dev: Supply chain security analysis. Skipped, 50%
VirusTotal: Multi-engine malware detection. Skipped, 50%
CrowdStrike: Advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 15, 2026, 11:28 AM (17.4 s, 1 file scanned)

Related Skills

nvidia-nemo/byob

development

Verified | Trusted | Community

Create custom LLM evaluation benchmarks using the BYOB decorator framework. Use when the user wants to (1) create a new benchmark from a dataset, (2) pick or write a scorer, (3) compile and run a BYOB benchmark, (4) containerize a benchmark, or (5) use LLM-as-Judge evaluation. Triggers on mentions of BYOB, custom benchmark, bring your own benchmark, scorer, or benchmark compilation.

255 | SKILL.md | Updated Apr 15, 2026

nvidia-nemo/nel-assistant

development

Verified | Trusted | Community

Interactive config wizard for NeMo Evaluator Launcher (NEL). Use when the user wants to create a new evaluation config from scratch, set up an evaluation from existing configs, or modify a NEL config (deployment, tasks, multi-node, interceptors). ALWAYS triggers on mentions of creating configs, setting up evaluations, configuring models for evaluation, or modifying NEL YAML files. Do NOT use for monitoring, debugging, or analyzing already-running evaluations.

255 | SKILL.md | Updated Apr 15, 2026

nvidia-nemo/launching-evals

tools

Verified | Trusted | Community

Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher. Covers running evaluations, checking status and live progress, debugging failed runs, exporting artifacts and logs, and analyzing results. ALWAYS triggers on mentions of running evaluations, checking progress, debugging failed evals, analyzing or analysing runs or results, run directories or artifact paths on clusters, Slurm job issues, invocation IDs, or inspecting logs (client logs, server logs, SSH to cluster, tail logs, grep logs). Do NOT use for creating or modifying evaluation configs.

255 | SKILL.md | Updated Apr 15, 2026

openclaw/taskflow

tools

Verified | Trusted | Community

Use when work should span one or more detached tasks but still behave like one job with a single owner context. TaskFlow is the durable flow substrate under authoring layers like Lobster, ACPX, plugins, or plain code. Keep conditional logic in the caller; use TaskFlow for flow identity, child-task linkage, waiting state, revision-checked mutations, and user-facing emergence.

357,764 | SKILL.md | Updated Apr 10, 2026

Install manually

# Clone the repo
git clone https://github.com/nvidia-nemo/evaluator.git

# Make sure the global Claude Code skills folder exists, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r evaluator/packages/nemo-evaluator-launcher/.claude/skills/accessing-mlflow ~/.claude/skills/

Repository: nvidia-nemo/evaluator (255 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT