skills/model-evaluation/SKILL.md
Use when selecting evaluation metrics, detecting bias, or validating model readiness for production
npx skillsauth add kienbui1995/magic-powers model-evaluation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill when assessing whether a model is good enough to deploy, fairly represents all user groups, and won't degrade in production.
Match metric to problem type:
The business metric matters more than the ML metric: always connect model performance to a business outcome.
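One way to connect a classifier's performance to a business outcome is to weight its errors by what they cost. The sketch below is illustrative only: the `Costs` values, the `expected_cost` helper, and the two toy models are assumptions made for the example, not part of the skill. It shows a model with higher accuracy producing a worse business result.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Hypothetical per-error costs; real values must come from the business.
    false_positive: float
    false_negative: float


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def expected_cost(y_true, y_pred, costs):
    """Total cost of the errors, weighting each error type by its cost."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * costs.false_positive + fn * costs.false_negative


# Toy data: 2 positives out of 10 examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses both positives
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # catches both, 3 false alarms

# Assumed costs: a missed positive is 20x as expensive as a false alarm.
costs = Costs(false_positive=5.0, false_negative=100.0)

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, preds), expected_cost(y_true, preds, costs))
```

Under these assumed costs, model A wins on accuracy while model B is far cheaper to run, which is why the choice of metric should follow the business outcome.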
Don't report only aggregate metrics. Slice by:
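As a minimal sketch of per-slice reporting, the example below computes accuracy overall and per group. The `segment` field, the record layout, and the `accuracy_by_slice` helper are hypothetical names chosen for illustration; substitute whichever slice dimensions apply to your data.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Return {slice value: accuracy} for records grouped by slice_key."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        group = r[slice_key]
        totals[group] += 1
        correct[group] += int(r["label"] == r["prediction"])
    return {group: correct[group] / totals[group] for group in totals}


# Toy records; "segment" is a hypothetical slice dimension.
records = [
    {"segment": "a", "label": 1, "prediction": 1},
    {"segment": "a", "label": 0, "prediction": 0},
    {"segment": "a", "label": 1, "prediction": 1},
    {"segment": "b", "label": 1, "prediction": 0},
    {"segment": "b", "label": 0, "prediction": 0},
]

overall = sum(r["label"] == r["prediction"] for r in records) / len(records)
per_slice = accuracy_by_slice(records, "segment")

print("overall:", overall)
for group, acc in sorted(per_slice.items()):
    print(group, acc)
```

In this toy data the aggregate accuracy looks acceptable while one segment performs at chance, which is the kind of gap an aggregate-only report hides.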