Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

sawrus/slo-sli-design

Name: slo-sli-design
Author: sawrus

areas/devops/sre/skills/slo-sli-design/SKILL.md

npx skillsauth add sawrus/agent-guides slo-sli-design

Skill: SLO/SLI Design

Expertise: SLI selection, SLO target setting, error budget calculation, burn rate alerting, Sloth/Pyrra integration.

When to load

When defining SLOs for a new service, setting up error budget tracking, or reviewing existing SLOs after an incident.

SLI Selection Framework

Step 1: What does the user care about?
  → "The checkout completes successfully and quickly"

Step 2: What CAN we measure?
  → HTTP 2xx responses, p99 latency

Step 3: Define the SLI formula
  → Availability SLI: good_requests / total_requests
     where good = status < 500 AND latency < 500ms

Step 4: Pick SLO target (start conservative, tighten later)
  → 99.5% (don't chase 99.99% without data: a tighter target shrinks the error budget and slows releases out of caution)

Step 5: Calculate error budget
  → 100% - 99.5% = 0.5% of a 28-day window = 0.005 × 28 × 24 × 60 = 201.6 minutes (about 3.4 hours)
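
Steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration: the function names, the `(status, latency_ms)` tuples, and the sample data are assumptions made for the example, not part of the skill.

```python
from typing import Iterable, Tuple


def availability_sli(requests: Iterable[Tuple[int, float]],
                     latency_threshold_ms: float = 500.0) -> float:
    """Fraction of good requests: status < 500 AND latency below the threshold."""
    total = 0
    good = 0
    for status, latency_ms in requests:
        total += 1
        if status < 500 and latency_ms < latency_threshold_ms:
            good += 1
    if total == 0:
        raise ValueError("no requests in window")
    return good / total


def error_budget_minutes(slo_percent: float, window_days: int = 28) -> float:
    """Minutes of full unavailability the SLO tolerates over the window."""
    window_minutes = window_days * 24 * 60
    return (100.0 - slo_percent) / 100.0 * window_minutes


# Hypothetical sample: two good requests, one too slow, one server error.
sample = [(200, 120.0), (200, 480.0), (200, 650.0), (503, 90.0)]
print(availability_sli(sample))                # 0.5
print(round(error_budget_minutes(99.5), 1))    # 201.6
```

The second value matches the Step 5 arithmetic for a 99.5% target over 28 days.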

Prometheus SLO Implementation (manual)

# Recording rules for SLO tracking
groups:
  - name: slo.checkout-service
    interval: 30s
    rules:
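      # Good requests (2xx, latency <= 500ms).
      # Assumption: latency is exported as a histogram named
      # http_request_duration_seconds with service and status labels and a
      # 0.5s bucket boundary; a plain counter has no latency label to filter on.
      - record: slo:http_requests_good:rate5m
        expr: |
          sum by (service) (rate(http_request_duration_seconds_bucket{
            service="checkout-service",
            status=~"2..",
            le="0.5"
          }[5m]))

      # Total requests (keep the service label so alert selectors can match)
      - record: slo:http_requests_total:rate5m
        expr: |
          sum by (service) (rate(http_requests_total{service="checkout-service"}[5m]))

      # SLI = good / total
      - record: slo:http_availability:ratio_rate5m
        expr: slo:http_requests_good:rate5m / slo:http_requests_total:rate5m

      # Longer windows referenced by the burn rate alerts (traffic-weighted)
      - record: slo:http_availability:ratio_rate30m
        expr: |
          sum_over_time(slo:http_requests_good:rate5m[30m])
            / sum_over_time(slo:http_requests_total:rate5m[30m])
      - record: slo:http_availability:ratio_rate1h
        expr: |
          sum_over_time(slo:http_requests_good:rate5m[1h])
            / sum_over_time(slo:http_requests_total:rate5m[1h])
      - record: slo:http_availability:ratio_rate6h
        expr: |
          sum_over_time(slo:http_requests_good:rate5m[6h])
            / sum_over_time(slo:http_requests_total:rate5m[6h])

      # 28-day rolling availability. Both series are sampled at the same
      # interval, so no hard-coded sample count is needed.
      - record: slo:http_availability:ratio_rate28d
        expr: |
          sum_over_time(slo:http_requests_good:rate5m[28d])
            / sum_over_time(slo:http_requests_total:rate5m[28d])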
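
A long-window availability figure can be built either by averaging the short-window ratios or by dividing summed good requests by summed total requests. The two disagree whenever traffic varies between windows. The sketch below illustrates the difference; the function names and the `(good, total)` pairs are made up for the example.

```python
from typing import List, Tuple


def average_of_ratios(windows: List[Tuple[float, float]]) -> float:
    """Mean of per-window good/total ratios; every window counts equally."""
    ratios = [good / total for good, total in windows if total > 0]
    return sum(ratios) / len(ratios)


def ratio_of_sums(windows: List[Tuple[float, float]]) -> float:
    """Summed good over summed total; windows are weighted by their traffic."""
    good_sum = sum(good for good, _ in windows)
    total_sum = sum(total for _, total in windows)
    return good_sum / total_sum


# Hypothetical (good, total) pairs: a quiet window with many failures,
# followed by a busy window that is almost entirely healthy.
windows = [(5.0, 10.0), (990.0, 1000.0)]
print(average_of_ratios(windows))        # 0.745
print(round(ratio_of_sums(windows), 3))  # 0.985
```

The traffic-weighted figure reflects what users experienced: 995 of 1010 requests succeeded, even though one quiet window looked bad in isolation.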

Burn Rate Alerts (multiwindow)

# Multi-window, multi-burn-rate alerting (Google SRE Workbook pattern)
groups:
  - name: slo.checkout-service.burn-rate
    rules:
      # Fast burn: 14.4× rate (about 2% of the 28-day budget per hour;
      # 1h long window, 5m short window)
      - alert: SLOFastBurn
        expr: |
          (
            slo:http_availability:ratio_rate1h{service="checkout-service"} < (1 - 14.4 * 0.005)
          ) and (
            slo:http_availability:ratio_rate5m{service="checkout-service"} < (1 - 14.4 * 0.005)
          )
        labels:
          severity: critical
          slo: checkout-service-availability
        annotations:
          summary: "Fast error budget burn — checkout-service (> 14.4× rate)"
          runbook_url: "https://runbooks.internal/slo-fast-burn"

      # Slow burn: 3× rate (about 2.7% of the 28-day budget per 6h).
      # Note: the SRE Workbook pairs the 6h/30m windows with a 6× rate.
      - alert: SLOSlowBurn
        expr: |
          (
            slo:http_availability:ratio_rate6h{service="checkout-service"} < (1 - 3 * 0.005)
          ) and (
            slo:http_availability:ratio_rate30m{service="checkout-service"} < (1 - 3 * 0.005)
          )
        labels:
          severity: warning
          slo: checkout-service-availability
        annotations:
          summary: "Slow error budget burn — checkout-service (> 3× rate)"
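
The thresholds in these alerts follow from the burn rate definition: the observed error rate divided by the error rate the SLO allows. A minimal sketch of that arithmetic, with hypothetical function names and availability values chosen only for illustration:

```python
def burn_rate(availability: float, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    return (1.0 - availability) / (1.0 - slo)


def budget_consumed(rate: float, window_hours: float, period_days: int = 28) -> float:
    """Fraction of the period's error budget used if `rate` holds for `window_hours`."""
    return rate * window_hours / (period_days * 24)


def should_alert(long_avail: float, short_avail: float, slo: float, factor: float) -> bool:
    """Multiwindow condition: both the long and the short window exceed the factor."""
    return burn_rate(long_avail, slo) > factor and burn_rate(short_avail, slo) > factor


SLO = 0.995
print(round(budget_consumed(14.4, 1), 4))     # 0.0214
print(round(budget_consumed(3, 6), 4))        # 0.0268
print(should_alert(0.90, 0.91, SLO, 14.4))    # True
print(should_alert(0.90, 0.999, SLO, 14.4))   # False
```

The last call shows why the short window matters: the long window still looks bad, but the short window has already recovered, so the alert does not fire.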

Sloth (SLO as Code)

# slo/checkout-service.yaml — Sloth generates all recording rules + alerts
version: "prometheus/v1"
service: checkout-service
labels:
  team: backend
slos:
  - name: requests-availability
    objective: 99.5
    description: "99.5% of checkout requests succeed (non-5xx); latency is not covered by this SLI"
    sli:
      events:
        error_query: |
          sum(rate(http_requests_total{service="checkout-service", status=~"5.."}[{{.window}}]))
        total_query: |
          sum(rate(http_requests_total{service="checkout-service"}[{{.window}}]))
    alerting:
      name: CheckoutServiceSLO
      page_alert:
        labels: { severity: critical }
      ticket_alert:
        labels: { severity: warning }

# Generate Prometheus rules from Sloth definition
sloth generate -i slo/checkout-service.yaml -o prometheus-rules/slo-checkout.yaml

Error Budget Dashboard (Grafana)

Key panels:

SLI over 28 days: current ratio vs SLO target line
Error budget remaining: percentage, plus time until exhaustion at the current burn rate
Burn rate: 1h, 6h, 1d, 7d windows
Events causing budget consumption: top error causes by count
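
The "error budget remaining" panel reduces to two small calculations. The sketch below assumes the budget is measured against the requests seen so far in the window; the function names and request counts are hypothetical.

```python
from typing import Optional


def budget_remaining(good: float, total: float, slo: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_bad = (1.0 - slo) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def hours_until_exhausted(remaining: float, current_burn_rate: float,
                          period_days: int = 28) -> Optional[float]:
    """Hours until the budget hits zero if the current burn rate holds."""
    if current_burn_rate <= 0:
        return None  # nothing is burning, so the budget never runs out
    return max(remaining, 0.0) * period_days * 24 / current_burn_rate


# Hypothetical counts: 3,000 bad requests out of 1,000,000 against a 99.5% SLO.
remaining = budget_remaining(good=997_000, total=1_000_000, slo=0.995)
print(round(remaining, 3))                               # 0.4
print(round(hours_until_exhausted(remaining, 2.0), 1))   # 134.4
```

A burn rate of 1 spends the whole budget in exactly one period, so the remaining fraction scales the period length and the burn rate divides it.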

Define SLIs, SLOs, and error budgets; implement burn rate alerts; integrate with Prometheus.

12 stars

content-media

Updated Apr 18, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add sawrus/agent-guides slo-sli-design

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: container and dependency vulnerability scanner. Clean (95%)
Semgrep: static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): open source security scanning. Skipped (50%)
Socket.dev: supply chain security analysis. Skipped (50%)
VirusTotal: multi-engine malware detection. Skipped (50%)
CrowdStrike: advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 20, 2026, 4:06 PM (47.0s, 1 file scanned)

SKILL.md

name: slo-sli-design
type: skill
description: Define SLIs, SLOs, and error budgets; implement burn rate alerts; integrate with Prometheus.
allowed-tools: Read, Write, Edit

Install manually

# Clone the repo
git clone https://github.com/sawrus/agent-guides.git

# Copy into Claude Code skills folder (global)
cp -r agent-guides/areas/devops/sre/skills/slo-sli-design ~/.claude/skills/

Repository

sawrus/agent-guides

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT