bundled/skills/evaluating-machine-learning-models/SKILL.md
Evaluate trained machine learning models with the right metrics and comparison logic. Use for benchmark review, threshold selection, calibration, validation, and model comparison; not for feature engineering or leakage auditing.
npx skillsauth add foryourhealth111-pixel/vco-skills-codex evaluating-machine-learning-modelsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Use this skill when the model exists and the question is whether it is good enough.
This skill focuses on choosing and interpreting the right evaluation metrics for the problem, then comparing candidate models or thresholds.
training-machine-learning-modelsengineering-features-for-machine-learningml-data-leakage-guardconfusion-matrix-generator for class-level error breakdownsscientific-reporting when the evaluation must become a deliverabledevelopment
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.
development
Use when the user asks to inspect Sentry issues or events, summarize recent production errors, or pull basic Sentry health data via the Sentry API; perform read-only queries with the bundled script and require `SENTRY_AUTH_TOKEN`.
development
World-class prompt engineering skill for LLM optimization, prompt patterns, structured outputs, and AI product development. Expertise in Claude, GPT-4, prompt design patterns, few-shot learning, chain-of-thought, and AI evaluation. Includes RAG optimization, agent design, and LLM system architecture. Use when building AI products, optimizing LLM performance, designing agentic systems, or implementing advanced prompting techniques.
development
World-class ML engineering skill for productionizing ML models, MLOps, and building scalable ML systems. Expertise in PyTorch, TensorFlow, model deployment, feature stores, model monitoring, and ML infrastructure. Includes LLM integration, fine-tuning, RAG systems, and agentic AI. Use when deploying ML models, building ML platforms, implementing MLOps, or integrating LLMs into production systems.