skills/prometheus-patterns/SKILL.md
PromQL queries, alerting rules, recording rules, Grafana dashboard JSON, SLO
npx skillsauth add rubicanjr/FinCognis prometheus-patternsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
# Request rate (per second, 5m window)
rate(http_requests_total[5m])
# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100
# P99 latency from histogram
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# P50 latency by endpoint
histogram_quantile(0.50,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, handler)
)
# Saturation: CPU usage per pod
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
/ sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod) * 100
# SLO: 99.9% availability over 30 days
# Error budget = 0.1% = 43.2 minutes/month
# Current burn rate (how fast consuming budget)
1 - (
sum(rate(http_requests_total{status!~"5.."}[1h]))
/ sum(rate(http_requests_total[1h]))
) / (1 - 0.999)
# Remaining error budget (percentage)
1 - (
sum(increase(http_requests_total{status=~"5.."}[30d]))
/ (sum(increase(http_requests_total[30d])) * 0.001)
)
groups:
- name: sli_rules
interval: 30s
rules:
- record: job:http_request_rate:5m
expr: sum(rate(http_requests_total[5m])) by (job)
- record: job:http_error_rate:5m
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
/ sum(rate(http_requests_total[5m])) by (job)
- record: job:http_latency_p99:5m
expr: |
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job)
)
groups:
- name: slo_alerts
rules:
- alert: HighErrorRate
expr: job:http_error_rate:5m > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "Error rate above 1% for {{ $labels.job }}"
runbook: "https://wiki.internal/runbooks/high-error-rate"
- alert: HighLatency
expr: job:http_latency_p99:5m > 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "P99 latency above 500ms for {{ $labels.job }}"
- alert: ErrorBudgetBurn
expr: |
(
sum(rate(http_requests_total{status=~"5.."}[1h]))
/ sum(rate(http_requests_total[1h]))
) > 14.4 * 0.001
for: 2m
labels:
severity: critical
annotations:
summary: "Burning error budget 14.4x faster than allowed"
import "github.com/prometheus/client_golang/prometheus"
var (
httpRequests = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Total HTTP requests",
},
[]string{"method", "handler", "status"},
)
httpDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "HTTP request duration",
Buckets: []float64{0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5},
},
[]string{"method", "handler"},
)
)
avg() for latency instead of histograms/quantilesfor clause in alerts causing alert stormsdevelopment
Goal-based workflow orchestration - routes tasks to specialist agents based on user goals
tools
Wiring Verification
development
Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.
development
Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.