skills/codex/databricks-mlflow-evaluation/SKILL.md
<!-- AUTO-GENERATED by export-skills.py - DO NOT EDIT -->

---
name: databricks-mlflow-evaluation
---

# MLflow 3 GenAI Evaluation

## Before Writing Any Code

1. **Read GOTCHAS.md** - 15+ common mistakes that cause failures
2. **Read CRITICAL-interfaces.md** - Exact API signatures and data schemas

## End-to-End Workflows

Follow these workflows based on your goal. Each step indicates which reference files to read.

### Workflow 1: First-Time Evaluation Setup

For users new to MLflow GenAI evaluation or setting up evaluation for a new agent.

| Step | Action | Reference Files |
| ---- | --------------------------- | ---------------------------------------------------------------- |
| 1 | Understand what to evaluate | user-journeys.md (Journey 0: Strategy) |
| 2 | Learn API patterns | GOTCHAS.md + CRITICAL-interfaces.md |
| 3 | Build initial dataset | patterns-datasets.md (Patterns 1-4) |
| 4 | Choose/create scorers | patterns-scorers.md + CRITICAL-interfaces.md (built-in list) |
| 5 | Run evaluation | patterns-evaluation.md (Patterns 1-3) |
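
Steps 3-5 above can be sketched roughly as follows. The `my_agent` function, the sample questions, and the choice of the `RelevanceToQuery` built-in scorer are illustrative assumptions, not part of this skill; confirm the exact signatures in CRITICAL-interfaces.md before relying on them. The `mlflow` import is deferred into the function so the data-preparation part runs without MLflow installed.

```python
from typing import Any

# Step 3: a tiny hand-written dataset (hypothetical sample data).
# Each record nests the agent's arguments under "inputs".
eval_data: list[dict[str, Any]] = [
    {"inputs": {"query": "What is MLflow?"}},
    {"inputs": {"query": "How do I log a trace?"}},
]


def my_agent(query: str) -> str:
    """Stand-in for the real agent (hypothetical); takes unpacked kwargs."""
    return f"Echo: {query}"


def run_first_evaluation():
    """Steps 4-5: choose a scorer and run the evaluation.

    Assumes MLflow 3 is installed and a judge model is available
    for the LLM-based built-in scorer.
    """
    import mlflow
    from mlflow.genai.scorers import RelevanceToQuery

    return mlflow.genai.evaluate(
        data=eval_data,
        predict_fn=my_agent,
        scorers=[RelevanceToQuery()],
    )


# Local smoke test of the agent against the dataset; no MLflow required.
smoke_outputs = [my_agent(**row["inputs"]) for row in eval_data]
```

Running the smoke test first is a cheap way to catch argument-name mismatches between the dataset and the agent before spending judge-model calls.
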
For building evaluation datasets from production traces.
| Step | Action | Reference Files |
| ---- | ----------------------------- | ------------------------------------------------ |
| 1 | Search and filter traces | patterns-trace-analysis.md (MCP tools section) |
| 2 | Analyze trace quality | patterns-trace-analysis.md (Patterns 1-7) |
| 3 | Tag traces for inclusion | patterns-datasets.md (Patterns 16-17) |
| 4 | Build dataset from traces | patterns-datasets.md (Patterns 6-7) |
| 5 | Add expectations/ground truth | patterns-datasets.md (Pattern 2) |
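
The shape of steps 3-5 can be illustrated without touching a tracking server. The sketch below works on simplified, hypothetical trace dictionaries (real MLflow traces carry many more fields), and the `eval_candidate` tag name and the `expected_response` expectation key are assumptions chosen for illustration; patterns-datasets.md remains the authority on the actual patterns.

```python
from typing import Any

# Hypothetical, simplified trace exports.
traces: list[dict[str, Any]] = [
    {"tags": {"eval_candidate": "true"}, "request": {"query": "Reset my password"}},
    {"tags": {}, "request": {"query": "Hi"}},
    {"tags": {"eval_candidate": "true"}, "request": {"query": "Cancel my order"}},
]

# Hypothetical ground truth supplied by a reviewer, keyed by query text.
ground_truth = {
    "Reset my password": "Go to Settings > Security and choose Reset.",
}


def build_dataset(traces, ground_truth, tag="eval_candidate"):
    """Keep tagged traces and attach expectations where ground truth exists."""
    records = []
    for trace in traces:
        if trace["tags"].get(tag) != "true":
            continue  # step 3: only traces tagged for inclusion
        record = {"inputs": dict(trace["request"])}  # step 4: nested inputs
        expected = ground_truth.get(trace["request"].get("query"))
        if expected is not None:
            # step 5: add ground truth for records that have it
            record["expectations"] = {"expected_response": expected}
        records.append(record)
    return records


dataset = build_dataset(traces, ground_truth)
```
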
For debugging slow or expensive agent execution.
| Step | Action | Reference Files |
| ---- | ----------------------------- | ---------------------------------------------------- |
| 1 | Profile latency by span | patterns-trace-analysis.md (Patterns 4-6) |
| 2 | Analyze token usage | patterns-trace-analysis.md (Pattern 9) |
| 3 | Detect context issues | patterns-context-optimization.md (Section 5) |
| 4 | Apply optimizations | patterns-context-optimization.md (Sections 1-4, 6) |
| 5 | Re-evaluate to measure impact | patterns-evaluation.md (Patterns 6-7) |
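
To make steps 1-2 concrete, here is a minimal sketch that aggregates latency and token usage per span name. The span dictionaries and their field names (`start_ms`, `end_ms`, `total_tokens`) are hypothetical stand-ins for whatever the trace export actually provides; the aggregation logic is the point, not the schema.

```python
from collections import defaultdict

# Hypothetical flattened spans from one trace.
spans = [
    {"name": "retrieve", "start_ms": 0, "end_ms": 120, "total_tokens": 0},
    {"name": "llm_call", "start_ms": 120, "end_ms": 920, "total_tokens": 1500},
    {"name": "retrieve", "start_ms": 920, "end_ms": 1000, "total_tokens": 0},
    {"name": "llm_call", "start_ms": 1000, "end_ms": 1400, "total_tokens": 700},
]


def profile_spans(spans):
    """Sum latency, tokens, and call count per span name, slowest first."""
    totals = defaultdict(lambda: {"latency_ms": 0, "total_tokens": 0, "calls": 0})
    for span in spans:
        entry = totals[span["name"]]
        entry["latency_ms"] += span["end_ms"] - span["start_ms"]
        entry["total_tokens"] += span["total_tokens"]
        entry["calls"] += 1
    return sorted(totals.items(), key=lambda item: item[1]["latency_ms"], reverse=True)


profile = profile_spans(spans)
for name, stats in profile:
    print(f"{name}: {stats['latency_ms']} ms, {stats['total_tokens']} tokens, {stats['calls']} calls")
```

Sorting by total latency puts the most promising optimization target at the top of the list.
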
For comparing agent versions and finding regressions.
| Step | Action | Reference Files |
| ---- | ----------------------- | ------------------------------------------------ |
| 1 | Establish baseline | patterns-evaluation.md (Pattern 4: named runs) |
| 2 | Run current version | patterns-evaluation.md (Pattern 1) |
| 3 | Compare metrics | patterns-evaluation.md (Patterns 6-7) |
| 4 | Analyze failing traces | patterns-trace-analysis.md (Pattern 7) |
| 5 | Debug specific failures | patterns-trace-analysis.md (Patterns 8-9) |
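
Step 3 boils down to a per-metric diff between two runs. The sketch below assumes the aggregate metrics have already been pulled from each run into plain dictionaries; the metric names, values, and the 0.02 tolerance are made up for illustration, and every metric is assumed to be higher-is-better.

```python
# Hypothetical aggregate metrics from a baseline run and the current run.
baseline = {"correctness": 0.82, "relevance": 0.90, "safety": 1.00}
current = {"correctness": 0.74, "relevance": 0.91, "safety": 0.99}


def find_regressions(baseline, current, tolerance=0.02):
    """Return metrics whose score dropped by more than `tolerance`."""
    regressions = {}
    for metric, base_value in baseline.items():
        if metric not in current:
            continue  # metric not computed in the current run
        delta = current[metric] - base_value
        if delta < -tolerance:
            regressions[metric] = round(delta, 4)
    return regressions


regressions = find_regressions(baseline, current)
print(regressions)
```

A tolerance keeps judge-model noise from being reported as a regression; the right value depends on dataset size and scorer variance.
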
For creating project-specific evaluation metrics.
| Step | Action | Reference Files |
| ---- | --------------------------- | ----------------------------------------- |
| 1 | Understand scorer interface | CRITICAL-interfaces.md (Scorer section) |
| 2 | Choose scorer pattern | patterns-scorers.md (Patterns 4-11) |
| 3 | Add multi-agent scorers if needed | patterns-scorers.md (Patterns 13-16) |
| 4 | Test with evaluation | patterns-evaluation.md (Pattern 1) |
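
One way to approach steps 1-2 is to write the metric as a plain function first, test it locally, and only then wrap it for MLflow. The `is_concise` metric and its `MAX_WORDS` threshold below are hypothetical; the wrapping step assumes the `scorer` decorator from `mlflow.genai.scorers`, so check CRITICAL-interfaces.md for the exact scorer interface.

```python
MAX_WORDS = 50  # hypothetical project-specific threshold


def is_concise(outputs: str) -> bool:
    """Pass when the agent's response stays within the word budget."""
    return len(str(outputs).split()) <= MAX_WORDS


def as_mlflow_scorer():
    """Wrap the plain function as an MLflow scorer.

    Assumes MLflow 3 is installed; the import is deferred so the
    metric itself can be unit-tested without MLflow.
    """
    from mlflow.genai.scorers import scorer

    return scorer(is_concise)


# Local checks before running a full evaluation (step 4).
short_ok = is_concise(outputs="A short answer.")
long_ok = is_concise(outputs=" ".join(["word"] * 51))
```

Keeping the metric logic in an ordinary function makes it easy to cover with unit tests independently of the evaluation harness.
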
| Reference | Purpose | When to Read |
| ---------------------------------- | ------------------------ | ----------------------------------------- |
| GOTCHAS.md | Common mistakes | Always read first before writing code |
| CRITICAL-interfaces.md | API signatures, schemas | When writing any evaluation code |
| patterns-evaluation.md | Running evals, comparing | When executing evaluations |
| patterns-scorers.md | Custom scorer creation | When built-in scorers aren't enough |
| patterns-datasets.md | Dataset building | When preparing evaluation data |
| patterns-trace-analysis.md | Trace debugging | When analyzing agent behavior |
| patterns-context-optimization.md | Token/latency fixes | When agent is slow or expensive |
| user-journeys.md | High-level workflows | When starting a new evaluation project |
- Use `mlflow.genai.evaluate()` (NOT `mlflow.evaluate()`)
- Data records use `{"inputs": {"query": "..."}}` (nested structure required)
- Inputs are passed as `**`-unpacked kwargs (not a dict)

See GOTCHAS.md for the complete list.
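
The data-shape and kwargs points can be checked locally before any evaluation run. The following is a minimal sketch in plain Python; `validate_record` and `predict_fn` are hypothetical helpers written for this illustration, and the final call only mimics how unpacked inputs reach a prediction function.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems with one evaluation record (empty if fine)."""
    problems = []
    inputs = record.get("inputs")
    if not isinstance(inputs, dict):
        problems.append('record must nest arguments under an "inputs" dict')
    return problems


def predict_fn(query: str) -> str:
    """Hypothetical agent entry point: named parameters, not a single dict."""
    return f"Answer to: {query}"


good = {"inputs": {"query": "What is a trace?"}}
bad = {"query": "What is a trace?"}  # flat structure, missing "inputs"

good_problems = validate_record(good)
bad_problems = validate_record(bad)

# Inputs arrive as keyword arguments, so parameter names must match the keys.
answer = predict_fn(**good["inputs"])
```
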