.claude/skills/monitoring-specialist/SKILL.md
Infrastructure-level monitoring configuration for metrics, dashboards, alerting, logging backends, and SLO/SLI policy. Use when asked to set up monitoring, create a Grafana dashboard, write Prometheus alerting rules, define SLOs, configure Alertmanager routing, set up centralized logging with Loki or Elasticsearch, configure tracing backends such as Jaeger or Tempo, or write an on-call runbook.
npx skillsauth add daryllundy/claude-skills-library monitoring-specialist
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
set up monitoring, Grafana dashboard, Prometheus alert.
Glob('**/prometheus/**', '**/grafana/**', '**/alertmanager/**', '**/monitoring/**', '**/loki/**'): find existing monitoring config
severity label, runbook_url annotation, and for: duration to avoid flapping
Identify gaps in scrape targets, dashboards, alert coverage, logging backends, and tracing backends.
For each signal path: metrics collection, log storage and querying, tracing backend, alert routing, and SLO reporting.
Prometheus: scrape configs, recording rules, alert rules. Grafana: dashboard JSON with proper variables for multi-environment use. Alertmanager: routing tree with correct receiver grouping.
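As an illustration of "dashboard JSON with proper variables for multi-environment use", the following is a minimal Python sketch that generates a dashboard skeleton with `environment` and `service` template variables. The field names follow the Grafana dashboard JSON model as commonly exported; the metric `http_requests_total`, the `status` label, the threshold value, and the helper name `build_dashboard` are assumptions made for illustration and should be adapted to the metrics actually scraped.

```python
import json


def label_variable(name: str) -> dict:
    """Query-type template variable filled from a Prometheus label (assumed label name)."""
    return {
        "name": name,
        "type": "query",
        "query": f"label_values(up, {name})",
        "refresh": 1,  # re-query the variable values when the dashboard loads
    }


def build_dashboard(uid: str, title: str) -> dict:
    """Return a minimal dashboard dict with a stable uid and two template variables."""
    selector = 'environment="$environment", service="$service"'
    # Hypothetical error-ratio query; http_requests_total and status are assumed names.
    expr = (
        f'sum(rate(http_requests_total{{{selector}, status=~"5.."}}[5m]))'
        f" / sum(rate(http_requests_total{{{selector}}}[5m]))"
    )
    return {
        "uid": uid,
        "title": title,
        "templating": {
            "list": [label_variable("environment"), label_variable("service")]
        },
        "panels": [
            {
                "type": "timeseries",
                "title": "Error rate",
                "targets": [{"refId": "A", "expr": expr}],
                "fieldConfig": {
                    "defaults": {
                        "thresholds": {
                            "mode": "absolute",
                            "steps": [
                                {"color": "green", "value": None},
                                {"color": "red", "value": 0.05},  # assumed 5% threshold
                            ],
                        }
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard("api-overview", "API Overview"), indent=2))
```

Because every query filters on `$environment` and `$service`, the same JSON can be reused across environments instead of maintaining one dashboard per environment.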
# Validate Prometheus config
promtool check config prometheus.yml
# Validate alert rules
promtool check rules alerts/*.yml
# Validate Grafana dashboard JSON (look for syntax errors)
python3 -m json.tool dashboard.json > /dev/null
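promtool checks syntax, but not the conventions mentioned earlier (a severity label, a runbook_url annotation, and a for: duration on every alert). A minimal sketch of such a convention check follows. It assumes the rules file has already been parsed into a Python dict (for example with a YAML library, which is not part of the standard library), and the function names are hypothetical.

```python
def lint_alert_rule(rule: dict) -> list[str]:
    """Return a list of convention problems for one alerting rule."""
    name = rule.get("alert", "<unnamed>")
    labels = rule.get("labels") or {}
    annotations = rule.get("annotations") or {}
    problems = []
    if "severity" not in labels:
        problems.append(f"{name}: missing severity label")
    if "runbook_url" not in annotations:
        problems.append(f"{name}: missing runbook_url annotation")
    if not rule.get("for"):
        problems.append(f"{name}: missing for: duration")
    return problems


def lint_rule_file(doc: dict) -> list[str]:
    """Check every alerting rule in a parsed rules file; recording rules are skipped."""
    problems = []
    for group in doc.get("groups") or []:
        for rule in group.get("rules") or []:
            if "alert" in rule:
                problems.extend(lint_alert_rule(rule))
    return problems


if __name__ == "__main__":
    # Hypothetical rules document; metric names and the URL are placeholders.
    sample = {
        "groups": [
            {
                "name": "api",
                "rules": [
                    {
                        "alert": "APIHighErrorRate",
                        "expr": "job:http_errors:ratio5m > 0.05",
                        "for": "5m",
                        "labels": {"severity": "critical"},
                        "annotations": {"runbook_url": "https://example.com/runbooks/api"},
                    },
                    {"alert": "APIHighLatency", "expr": "job:http_latency:p99 > 1"},
                ],
            }
        ]
    }
    for problem in lint_rule_file(sample):
        print(problem)
```

A check like this could run in CI next to `promtool check rules`, so that rules which are valid but incomplete are caught before they reach Alertmanager.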
[Service][Metric][Condition] (e.g., APIHighErrorRate)
uid field, template variables for environment and service, and thresholds on panels
User says: "Set up alerting for my API - I need to know when error rate is high or latency is bad" Actions:
Alerts firing but Alertmanager not sending notifications
Cause: Alertmanager routing misconfiguration or receiver auth failure
Fix: Inspect the routing tree with amtool config routes, then send a test alert to the receiver with amtool alert add
Grafana dashboard showing "No data"
Cause: Datasource misconfigured, wrong label selectors, or metric doesn't exist
Fix: Test query directly in Prometheus UI; check label names with {__name__=~"metric_name"}
references/legacy-agent.md: Prometheus patterns, PromQL reference, Grafana dashboard patterns, SLO/SLI frameworks, ELK stack, Loki, distributed tracing, incident response