skills/sheriff-triage/SKILL.md
Use when performing comprehensive failure triage for sheriffs and image maintainers. Automatically determines if a failure is caused by code changes, image changes, or is a known intermittent. Combines data from Taskcluster, Treeherder, and worker image analysis. Triggers on "triage", "sheriff", "why did this fail", "is this an image regression", "failure analysis".
npx skillsauth add jwmossmoz/agent-skills sheriff-triage
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Comprehensive failure triage that automatically determines the likely cause of CI failures.
cd /Users/jwmoss/github_moz/agent-skills/skills/sheriff-triage/scripts
# Full triage for a failing task
uv run triage.py <TASK_ID>
# Triage with Taskcluster URL
uv run triage.py https://firefox-ci-tc.services.mozilla.com/tasks/Xcac5C8gRqiOT13YsVRX8A
# JSON output for scripting
uv run triage.py <TASK_ID> --json
# Skip cross-branch search (faster)
uv run triage.py <TASK_ID> --skip-treeherder
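The commands above accept either a bare task ID or a full Taskcluster URL. As a minimal sketch of how such input might be normalized (this is not the actual `triage.py` code; the function name `extract_task_id` is hypothetical, and the sketch assumes task IDs are 22-character URL-safe slugs that follow a `/tasks/` path segment, as in the example URL):

```python
import re
from urllib.parse import urlparse

# Assumption: a Taskcluster task ID is a 22-character URL-safe base64 slug.
TASK_ID_RE = re.compile(r"^[A-Za-z0-9_-]{22}$")


def extract_task_id(value: str) -> str:
    """Return the task ID from a bare ID or a Taskcluster task URL (hypothetical helper)."""
    value = value.strip()
    if TASK_ID_RE.match(value):
        return value  # already a bare task ID

    # Otherwise treat the input as a URL and look for the segment after "tasks".
    parts = [p for p in urlparse(value).path.split("/") if p]
    if "tasks" in parts:
        idx = parts.index("tasks")
        if idx + 1 < len(parts) and TASK_ID_RE.match(parts[idx + 1]):
            return parts[idx + 1]

    raise ValueError(f"could not find a task ID in {value!r}")


url = "https://firefox-ci-tc.services.mozilla.com/tasks/Xcac5C8gRqiOT13YsVRX8A"
print(extract_task_id(url))
```

Raising `ValueError` on unrecognized input keeps a malformed argument from being passed on to later lookups.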
The triage command performs a comprehensive analysis and reports one of the following verdicts:
| Verdict | Meaning | Evidence |
|---------|---------|----------|
| CODE_REGRESSION | Likely caused by code change | Same failure on production branches |
| IMAGE_REGRESSION | Likely caused by image change | Only fails on alpha, different image version |
| INTERMITTENT | Known flaky test | Classified as intermittent in Treeherder |
| INFRA | Infrastructure issue | Classified as infra in Treeherder |
| NEEDS_INVESTIGATION | Unclear cause | No strong signals either way |
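The verdict table can be read as a small decision procedure over the collected signals. The sketch below is one possible reading, not the logic in `triage.py`: the `Signals` type, the `classify` function, the field names, and the order in which the rules are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical container for the signals named in the verdict table."""
    alpha_pool: bool                      # task ran on an alpha (staging) pool
    image_version_differs: bool           # image version differs from production
    production_failures: int              # similar failures on production branches
    classification: Optional[str] = None  # Treeherder classification, None if unclassified


def classify(s: Signals) -> str:
    """Map signals to a verdict; the rule precedence here is an assumption."""
    # An existing Treeherder classification is treated as the strongest signal.
    if s.classification == "intermittent":
        return "INTERMITTENT"
    if s.classification == "infra":
        return "INFRA"
    # The same failure on production branches points at a code change.
    if s.production_failures > 0:
        return "CODE_REGRESSION"
    # Fails only on alpha, with a different image version.
    if s.alpha_pool and s.image_version_differs:
        return "IMAGE_REGRESSION"
    return "NEEDS_INVESTIGATION"


# Alpha pool, image version differs, no production failures, not classified.
print(classify(Signals(alpha_pool=True, image_version_differs=True, production_failures=0)))
```

Checking the Treeherder classification first is a design choice in this sketch: a prior human classification is given priority over inferred signals. A real implementation may weigh the evidence differently.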
## Triage Report: Xcac5C8gRqiOT13YsVRX8A
**Test**: mochitest-chrome-1proc
**Status**: failed
### Signals
| Signal | Value | Implication |
|--------|-------|-------------|
| Alpha Pool | Yes | Using new/staging image |
| Image Version Differs | Yes (1.0.9 vs 1.0.8) | Image change detected |
| Similar Failures on autoland | 0 | Not failing on production |
| Similar Failures on mozilla-central | 0 | Not failing on production |
| Treeherder Classification | not classified | No prior triage |
### Verdict: IMAGE_REGRESSION
**Confidence**: High
**Rationale**: Task failed on alpha pool with different image version than production,
and no similar failures found on production branches.
### Recommended Actions
1. Notify image maintainer
2. Check SBOM for image changes
3. Consider rolling back image
- taskcluster CLI: `brew install taskcluster`
- uv for running scripts