latestaiagents/on-call-best-practices

Name: on-call-best-practices
Author: latestaiagents

plugins/devops-sre/skills/on-call/on-call-best-practices/SKILL.md

npx skillsauth add latestaiagents/agent-skills on-call-best-practices

On-Call Best Practices

Sustainable on-call that protects engineers and keeps systems reliable.

On-Call Philosophy

"On-call should be a learning opportunity, not a punishment."

Core Principles

Fair distribution - Burden shared equally
Sustainable pace - No burnout
Clear expectations - Everyone knows their role
Continuous improvement - Learn from every incident

Rotation Design

Recommended Structure

Primary On-Call (24/7):
- First responder for all pages
- 1-week shifts (max)
- Clear handoff process

Secondary On-Call (24/7):
- Backup if primary unavailable
- Can be shadow for training
- Steps in if primary overloaded

Business Hours Escalation:
- Subject matter experts
- Available for complex issues
- Not paged at night

Rotation Schedule Example

Week    Mon    Tue    Wed    Thu    Fri    Sat    Sun
───────────────────────────────────────────────────────
Jan 6   Alice  Alice  Alice  Alice  Alice  Alice  Alice
Jan 13  Bob    Bob    Bob    Bob    Bob    Bob    Bob
Jan 20  Carol  Carol  Carol  Carol  Carol  Carol  Carol
Jan 27  Dave   Dave   Dave   Dave   Dave   Dave   Dave
Feb 3   Alice  ...

The secondary follows the same pattern, offset by 1 week.
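The schedule above can be generated rather than maintained by hand. The following is a minimal sketch, not part of the original skill: `build_rotation` is a hypothetical helper, the year 2025 is assumed because January 6 falls on a Monday that year, and "offset by 1 week" is assumed to mean the secondary is the previous week's primary.

```python
from datetime import date, timedelta


def build_rotation(engineers, start, weeks):
    """Return (week_start, primary, secondary) tuples for consecutive weekly shifts."""
    schedule = []
    for i in range(weeks):
        primary = engineers[i % len(engineers)]
        # Assumption: the secondary is the same rotation shifted by one week,
        # i.e. last week's primary backs up this week's primary.
        secondary = engineers[(i - 1) % len(engineers)]
        schedule.append((start + timedelta(weeks=i), primary, secondary))
    return schedule


# The year is an assumption; the source only gives month and day.
rotation = build_rotation(["Alice", "Bob", "Carol", "Dave"], date(2025, 1, 6), 5)
for week_start, primary, secondary in rotation:
    print(f"{week_start:%b %d}  primary={primary:<6} secondary={secondary}")
```

Using last week's primary as secondary keeps recent context close at hand; shifting the other way (next week's primary) would instead give the incoming engineer a preview week.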

Scheduling Guidelines

| Guideline | Recommendation |
|-----------|----------------|
| Shift length | 1 week max, shorter if high volume |
| Gap between shifts | 2+ weeks minimum |
| Consecutive nights | Comp time if >2 pages |
| Holidays | Volunteer-based, compensated |
| Team size | 4+ people for sustainable rotation |
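The shift-length, gap, and team-size guidelines are related by simple arithmetic: in a round-robin, each engineer is off primary duty while everyone else takes a turn. A small sketch of that check, assuming a plain round-robin of primary shifts only (`check_rotation` is a hypothetical name, and secondary duty is not counted):

```python
def check_rotation(team_size, shift_weeks=1):
    """Return a list of guideline violations for a simple round-robin rotation."""
    problems = []
    if shift_weeks > 1:
        problems.append("shift longer than 1 week")
    # In a round-robin, everyone else takes a shift before yours comes around again.
    gap_weeks = (team_size - 1) * shift_weeks
    if gap_weeks < 2:
        problems.append(f"only {gap_weeks} week(s) between shifts, want 2+")
    if team_size < 4:
        problems.append(f"team of {team_size} is below the 4-person minimum")
    return problems


for size in (2, 3, 4):
    print(size, check_rotation(size) or "ok")
```

A team of three meets the 2-week gap on paper but still falls short of the recommended size, which leaves no slack for vacation or illness.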

Response Expectations

Response Time SLAs

| Severity | Acknowledge | Respond | Escalate If |
|----------|-------------|---------|-------------|
| SEV1 | 5 min | Immediate | No ack in 5 min |
| SEV2 | 15 min | 30 min | No ack in 15 min |
| SEV3 | 1 hour | 4 hours | Business hours |
| SEV4 | Best effort | Next day | N/A |
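The acknowledgement column can be turned into an escalation rule. This is a sketch under stated assumptions, not how any particular paging tool works: `should_escalate` and `ACK_DEADLINE_MIN` are hypothetical names, "Business hours" for SEV3 is read as "escalate only during business hours once the 1-hour acknowledgement window has passed", and SEV4 never escalates.

```python
# Minutes allowed to acknowledge, taken from the table above (SEV4 is best effort).
ACK_DEADLINE_MIN = {"SEV1": 5, "SEV2": 15, "SEV3": 60, "SEV4": None}


def should_escalate(severity, minutes_since_page, acknowledged, business_hours=True):
    """Decide whether an unacknowledged page should go to the next responder."""
    deadline = ACK_DEADLINE_MIN[severity]  # raises KeyError on an unknown severity
    if acknowledged or deadline is None:
        return False
    if severity == "SEV3" and not business_hours:
        # Assumption: SEV3 escalations wait until business hours.
        return False
    return minutes_since_page >= deadline


print(should_escalate("SEV1", 6, acknowledged=False))
print(should_escalate("SEV3", 90, acknowledged=False, business_hours=False))
```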

What "On-Call" Means

During your shift:
✓ Phone charged and with you
✓ Laptop accessible within 15 min
✓ Reliable internet access
✓ Not impaired (alcohol, etc.)
✓ Able to focus if paged

You are NOT expected to:
✗ Be at your desk 24/7
✗ Respond instantly to Slack
✗ Fix everything yourself
✗ Work a full normal schedule on top of on-call

Handoff Protocol

End of Shift Checklist

## On-Call Handoff

**Outgoing:** @alice
**Incoming:** @bob
**Date:** 2026-01-13 09:00 UTC

### Active Issues
- [ ] INC-123: Monitoring elevated error rate (context: ...)
- [ ] Deployment in progress: api-service v2.3.4

### Watch Items
- Payment processor maintenance tonight 02:00-04:00 UTC
- New monitoring rolled out, may be noisy

### Recent Incidents
- INC-121: Resolved, postmortem scheduled Friday
- INC-122: Resolved, no action needed

### Runbook Updates
- Updated: database/connection-pool-reset (added step 3)
- Outdated: search/reindex (needs review)

### Notes
- Had 3 pages this week, all during business hours
- Nothing woke me up at night
- Good luck! 🍀

Verbal Handoff (5-10 minutes)

1. Walk through active issues
2. Highlight anything unusual
3. Share context not in writing
4. Confirm contact info is current
5. Send a test page to verify setup

Reducing On-Call Burden

Metrics to Track

| Metric | Healthy | Action Needed |
|--------|---------|---------------|
| Pages/week | <5 | Review alert thresholds |
| Night pages/week | <1 | Investigate or fix root causes |
| MTTA | <5 min | Check notification settings |
| Time to resolve | <30 min avg | Improve runbooks |
| % actionable | >80% | Reduce noisy alerts |
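These thresholds are easy to check automatically at the end of each shift. The sketch below assumes a hypothetical `Page` record with four fields; a real paging tool will export different field names, so treat the structure as illustrative only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Page:
    at_night: bool
    ack_minutes: float
    resolve_minutes: float
    actionable: bool


def weekly_report(pages):
    """Map each metric to (value, healthy) using the 'Healthy' column above."""
    n = len(pages)
    night = sum(p.at_night for p in pages)
    mtta = mean([p.ack_minutes for p in pages]) if pages else 0.0
    resolve = mean([p.resolve_minutes for p in pages]) if pages else 0.0
    pct_actionable = 100 * sum(p.actionable for p in pages) / n if n else 100.0
    return {
        "pages_per_week": (n, n < 5),
        "night_pages_per_week": (night, night < 1),
        "mtta_min": (mtta, mtta < 5),
        "avg_resolve_min": (resolve, resolve < 30),
        "pct_actionable": (pct_actionable, pct_actionable > 80),
    }


week = [Page(False, 3, 20, True), Page(True, 8, 50, False)]
for metric, (value, healthy) in weekly_report(week).items():
    print(f"{metric}: {value} ({'healthy' if healthy else 'action needed'})")
```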

Improvement Strategies

1. Fix the root cause
   - Every incident should have action items
   - Track action item completion

2. Improve detection
   - Catch issues before they page
   - Add canary deployments

3. Automate remediation
   - Auto-restart crashed services
   - Auto-scale on high load
   - Self-healing infrastructure

4. Improve runbooks
   - Clear, tested procedures
   - One-click remediation where possible

5. Reduce noise
   - Tune alert thresholds
   - Add deduplication
   - Use proper severity levels
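As one illustration of "Add deduplication", repeated alerts for the same problem can be suppressed within a time window. This is a minimal sketch, assuming alerts arrive as `(timestamp_seconds, service, alert_name)` tuples and that a 5-minute window is acceptable; both the tuple shape and the window are assumptions, and most alerting tools provide their own grouping features that should be preferred.

```python
def deduplicate(alerts, window_seconds=300):
    """Keep an alert only if the same (service, name) has not paged within the window."""
    last_paged = {}
    kept = []
    for ts, service, name in sorted(alerts):
        key = (service, name)
        if key in last_paged and ts - last_paged[key] < window_seconds:
            continue  # same alert paged recently, so suppress this repeat
        last_paged[key] = ts
        kept.append((ts, service, name))
    return kept


alerts = [
    (0, "api", "5xx-rate"),
    (60, "api", "5xx-rate"),
    (120, "db", "connections"),
    (400, "api", "5xx-rate"),
]
print(deduplicate(alerts))
```

The window is measured from the last alert that actually paged, so a problem that keeps firing still re-pages once per window instead of going silent.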

Compensation & Support

Fair Compensation

Recommended compensation models:

1. Stipend Model
   - Fixed amount per on-call week
   - Example: $500/week on-call

2. Per-Page Model
   - Base stipend + per-page bonus
   - Example: $200/week + $50/page

3. Comp Time Model
   - Time off for night/weekend pages
   - Example: 2 hours off per night page

4. Combined Model
   - Stipend + comp time for disruption
   - Most engineer-friendly
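The example figures above can be compared for a given week with a few lines of arithmetic. The function names below are hypothetical, and the rates are only the illustrative numbers from the list, not recommendations.

```python
def stipend_pay(weeks, rate=500):
    """Stipend model: fixed amount per on-call week."""
    return weeks * rate


def per_page_pay(weeks, pages, base=200, bonus=50):
    """Per-page model: base stipend plus a bonus for every page."""
    return weeks * base + pages * bonus


def comp_time_hours(night_pages, hours_per_page=2):
    """Comp time model: hours off earned for night pages."""
    return night_pages * hours_per_page


# One on-call week with 3 pages, 1 of them at night.
print("stipend:", stipend_pay(1))
print("per-page:", per_page_pay(1, pages=3))
print("comp time (hours):", comp_time_hours(1))
print("combined:", (stipend_pay(1), comp_time_hours(1)))
```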

Support Structures

✓ Clear escalation paths
✓ Secondary on-call backup
✓ Manager support for difficult situations
✓ Mental health resources
✓ Training and shadowing for new on-callers
✓ Blameless postmortem culture

Training New On-Callers

Shadow Program

Week 1: Observe
- Shadow primary on-call
- Read all runbooks
- Review recent incidents

Week 2: Assisted
- Take some pages with backup
- Primary available immediately
- Debrief after each incident

Week 3: Primary with Safety Net
- Primary on-call
- Experienced shadow
- Extended escalation time

Week 4+: Full Primary
- Normal on-call duties
- Standard escalation paths

Required Knowledge

□ Access to all systems
□ Can reach all tools (VPN, etc.)
□ Know escalation paths
□ Reviewed all runbooks
□ Understand SLO/SLA
□ Know who SMEs are
□ Have done a test page
□ Know how to declare an incident

Well-Being

Signs of On-Call Burnout

Dreading on-call shifts
Anxiety about phone notifications
Sleep disruption even when not paged
Decreased job satisfaction
Avoidance of learning new systems

Prevention

1. Sustainable rotation size (4+ people)
2. Enforce gap between shifts
3. Comp time for disruption
4. Regular feedback loops
5. Continuously reduce burden
6. Leadership does on-call too

If You're Struggling

- Talk to your manager
- Request temporary rotation skip
- Ask for additional support
- Suggest rotation improvements
- It's okay to ask for help

Manage on-call rotations with sustainable practices, fair scheduling, and effective handoffs. Use this skill when setting up on-call, improving on-call experience, or managing rotations. Activate when: on-call, pagerduty, rotation, schedule, handoff, on-call burden, being paged, night pages, weekend on-call, on-call fatigue.

2 stars

data-ai

Updated Apr 23, 2026

npx skillsauth add latestaiagents/agent-skills on-call-best-practices

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 24, 2026, 2:54 AM (19.0s, 1 file scanned)

Install manually

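# Clone the repo
git clone https://github.com/latestaiagents/agent-skills.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r agent-skills/plugins/devops-sre/skills/on-call/on-call-best-practices ~/.claude/skills/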

Repository

latestaiagents/agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT