# Skill: Model Monitoring

## When to load
When setting up monitoring for a deployed model or responding to drift alerts.
## Monitoring Dimensions

```
1. Operational health
   - Latency: p50, p95, p99
   - Error rate: prediction failures, input validation failures
2. Data drift (vs. training baseline)
   - PSI (Population Stability Index) per feature
   - PSI > 0.2 = significant shift → retrain likely needed
3. Model quality (when labels are available)
   - Accuracy metrics after ground truth arrives
   - Correlation with business outcomes
```
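The operational-health and model-quality dimensions can be computed from logged requests. This is a minimal sketch under three assumptions:

- Each request is logged with a latency in milliseconds.
- Each request also carries a hypothetical status string: `"ok"`, `"prediction_error"`, or `"validation_error"`.
- Delayed ground-truth labels are joined to predictions by a hypothetical request id.

`HealthSummary`, `summarize_health`, and `delayed_accuracy` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping

import numpy as np


@dataclass
class HealthSummary:
    p50_ms: float
    p95_ms: float
    p99_ms: float
    prediction_error_rate: float
    validation_error_rate: float


def summarize_health(latencies_ms: Iterable[float], statuses: Iterable[str]) -> HealthSummary:
    """Summarize one monitoring window (operational health).

    `statuses` holds one hypothetical status string per request:
    "ok", "prediction_error", or "validation_error".
    """
    lat = np.asarray(list(latencies_ms), dtype=float)
    st = np.asarray(list(statuses))
    # Latency percentiles over the window (NumPy's default linear interpolation)
    p50, p95, p99 = np.percentile(lat, [50, 95, 99])
    return HealthSummary(
        p50_ms=float(p50),
        p95_ms=float(p95),
        p99_ms=float(p99),
        # Fraction of requests in the window with each failure type
        prediction_error_rate=float(np.mean(st == "prediction_error")),
        validation_error_rate=float(np.mean(st == "validation_error")),
    )


def delayed_accuracy(predictions: Mapping[str, int], labels: Mapping[str, int]) -> float:
    """Accuracy over requests whose ground-truth label has arrived (model quality).

    Both mappings are keyed by a hypothetical request id. Predictions that
    have no label yet are skipped rather than counted as wrong.
    """
    joined = [(predictions[rid], y) for rid, y in labels.items() if rid in predictions]
    if not joined:
        return float("nan")  # no ground truth yet for this window
    return sum(p == y for p, y in joined) / len(joined)
```

Skipping unlabeled predictions keeps accuracy from being dragged down by label delay. The trade-off is that accuracy for a recent window will shift as more labels arrive.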
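```python
import numpy as np

def calculate_psi(expected: np.ndarray, actual: np.ndarray, buckets: int = 10) -> float:
    """PSI < 0.1: stable. 0.1-0.2: monitor. > 0.2: retrain."""
    # Bucket edges come from percentiles of the training (expected) distribution
    breakpoints = np.percentile(expected, np.linspace(0, 100, buckets + 1))
    # Fold production values outside the training range into the outer buckets
    actual = np.clip(actual, breakpoints[0], breakpoints[-1])
    # Bucket proportions, floored at 1e-4 to avoid log(0) and division by zero
    exp_pct = np.clip(np.histogram(expected, breakpoints)[0] / len(expected), 1e-4, None)
    act_pct = np.clip(np.histogram(actual, breakpoints)[0] / len(actual), 1e-4, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))
```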
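Applying PSI per feature means attaching the threshold bands from the docstring to each score. This is a minimal sketch with the following assumptions:

- `drift_report` and `psi_status` are hypothetical helpers.
- The PSI function is passed in, so `calculate_psi` (or a replacement) can be plugged in.
- Baseline and current data are dicts mapping feature names to 1-D NumPy arrays.

```python
from typing import Callable, Dict, Mapping, Tuple

import numpy as np


def psi_status(psi: float) -> str:
    """Map a PSI value to the bands used in this skill."""
    if psi < 0.1:
        return "stable"
    if psi <= 0.2:
        return "monitor"
    return "retrain"  # significant shift, retrain likely needed


def drift_report(
    baseline: Mapping[str, np.ndarray],
    current: Mapping[str, np.ndarray],
    psi_fn: Callable[[np.ndarray, np.ndarray], float],
) -> Dict[str, Tuple[float, str]]:
    """Compute PSI per feature and attach a status band.

    Features present in the baseline but missing from `current` are
    skipped rather than treated as drift.
    """
    report: Dict[str, Tuple[float, str]] = {}
    for name, expected in baseline.items():
        if name not in current:
            continue
        psi = float(psi_fn(expected, current[name]))
        report[name] = (psi, psi_status(psi))
    return report
```

For example, `drift_report(train_features, prod_features, calculate_psi)` returns `{feature: (psi, status)}`, where `train_features` and `prod_features` are hypothetical dicts of arrays. Features with status `"retrain"` are the ones to investigate first.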