skills/observability-monitor/SKILL.md
Comprehensive observability and monitoring workflow that orchestrates metrics collection, logging, distributed tracing, and alerting systems. Handles everything from monitoring architecture design and implementation to APM integration, anomaly detection, and incident response automation.
Install: `npx skillsauth add ajianaz/skills-collection observability-monitor` (installs the skill globally; works with Claude Code, Cursor, and Windsurf).
This skill provides end-to-end observability and monitoring by orchestrating specialist personas: Observability Architect, SRE Specialist, Performance Analyst, Data Analytics Expert, and Automation Engineer. It turns monitoring requirements into observability systems with real-time insight, proactive alerting, and intelligent incident response.
Key Capabilities:
Perfect for:
Triggers:
Use when: Starting observability implementation or monitoring modernization
Tools Used:
/sc:analyze observability-requirements
Observability Architect: observability strategy and requirements analysis
SRE Specialist: reliability requirements and SLO definition
Performance Analyst: performance monitoring requirements
Activities:
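SLO definition usually starts from error-budget arithmetic. The sketch below assumes an availability SLO, measured as good requests over total requests across a 30-day rolling window. The `Slo` class is hypothetical and is not part of any `/sc:` command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Availability SLO: target fraction of good events over a rolling window."""
    target: float          # e.g. 0.999 means "99.9% of requests succeed"
    window_days: int = 30  # rolling compliance window (assumed 30 days)

    def error_budget_fraction(self) -> float:
        # The error budget is whatever the target leaves over.
        return 1.0 - self.target

    def error_budget_minutes(self) -> float:
        # Time-based view: minutes of full outage the window can absorb.
        return self.error_budget_fraction() * self.window_days * 24 * 60

    def budget_remaining(self, good: int, total: int) -> float:
        """Fraction of the error budget still unspent, given request counts."""
        if total == 0:
            return 1.0  # no traffic yet, so nothing has been spent
        observed_error_rate = 1.0 - good / total
        return 1.0 - observed_error_rate / self.error_budget_fraction()
```

Take a 99.9% target over 30 days. `Slo(0.999).error_budget_minutes()` comes to about 43.2 minutes of full outage. Serving 999,500 good requests out of 1,000,000 leaves roughly half the budget.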
Use when: Designing monitoring infrastructure and data collection systems
Tools Used:
/sc:design --type monitoring observability-architecture
Observability Architect: comprehensive monitoring architecture design
Data Analytics Expert: data collection and analysis strategy
Automation Engineer: monitoring automation and integration design
Activities:
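A key design decision in this phase is how metrics, logs, and traces link together. One common approach, sketched here as an assumption rather than a prescribed design, is to emit structured JSON logs that carry the active trace and span IDs. A log backend such as the ELK stack can then join a log line to its distributed trace. The field names `trace_id` and `span_id` are illustrative.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation fields passed via `extra=`; None when the record
            # was logged outside a traced request.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A service would call something like `make_logger("checkout").info("payment authorized", extra={"trace_id": tid, "span_id": sid})`. Here `tid` and `sid` come from whatever tracing library is in use.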
Use when: Setting up monitoring tools and infrastructure components
Tools Used:
/sc:implement monitoring-infrastructure
Observability Architect: monitoring tools implementation and configuration
Automation Engineer: monitoring automation and integration setup
Performance Analyst: performance monitoring implementation
Activities:
Use when: Implementing alerting systems and incident response automation
Tools Used:
/sc:implement alerting-incident-response
SRE Specialist: alerting strategy and incident response design
Data Analytics Expert: anomaly detection and smart alerting
Automation Engineer: incident response automation and workflows
Activities:
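Anomaly detection can range from simple statistics to ML models. This is a minimal sketch of the simplest end. The detector flags a sample whose rolling z-score (distance from the recent mean, in standard deviations) exceeds a threshold. The window of 60 samples and threshold of 3 are illustrative defaults, not tuned values, and the class name is hypothetical.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag a value that deviates from the recent mean by more than
    `threshold` standard deviations."""

    def __init__(self, window: int = 60, threshold: float = 3.0, min_samples: int = 10) -> None:
        self.samples: deque[float] = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples  # stay quiet until a baseline exists

    def observe(self, value: float) -> bool:
        is_anomaly = False
        if len(self.samples) >= self.min_samples:
            mean = sum(self.samples) / len(self.samples)
            var = sum((x - mean) ** 2 for x in self.samples) / len(self.samples)
            std = math.sqrt(var)
            if std > 0:
                is_anomaly = abs(value - mean) / std > self.threshold
            else:
                # Perfectly flat baseline: any change at all is unusual.
                is_anomaly = value != mean
        # Score first, then append, so a spike cannot mask itself.
        self.samples.append(value)
        return is_anomaly
```

Anomalous points are still appended to the baseline. As a result, a sustained level shift stops alerting once the window absorbs it; whether that is desirable depends on the metric.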
Use when: Setting up performance monitoring and optimization systems
Tools Used:
/sc:implement performance-monitoring
Performance Analyst: performance monitoring and optimization implementation
Observability Architect: performance visibility and analysis setup
Data Analytics Expert: performance analytics and insights
Activities:
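Performance monitoring usually reports tail-latency percentiles rather than averages, and some APM tools also report an Apdex score. This sketch computes nearest-rank percentiles from raw samples. It also computes Apdex against a target threshold `t`: a request is satisfied at or below `t` and tolerating up to `4t`. Production APM systems typically derive percentiles from histograms or streaming sketches instead of sorting raw samples, so treat this as illustrative only.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of
    samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def apdex(samples: list[float], t: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total, with
    satisfied <= t, tolerating <= 4t, frustrated > 4t."""
    if not samples:
        raise ValueError("no samples")
    satisfied = sum(1 for s in samples if s <= t)
    tolerating = sum(1 for s in samples if t < s <= 4 * t)
    return (satisfied + tolerating / 2) / len(samples)
```

For latencies of `[11, 12, 13, 14, 15, 16, 17, 18, 240, 900]` ms, p50 is 15 ms and p90 is 240 ms. Apdex at `t=100` is 0.85.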
Use when: Implementing advanced analytics and predictive monitoring capabilities
Tools Used:
/sc:implement predictive-monitoring
Data Analytics Expert: advanced analytics and machine learning implementation
Observability Architect: predictive monitoring architecture
SRE Specialist: predictive incident prevention and response
Activities:
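The simplest form of predictive monitoring is trend extrapolation: fit a line to recent samples and estimate when it will cross a limit. Prometheus's `predict_linear()` function applies the same idea using simple linear regression. This sketch assumes samples arrive as `(hours, value)` pairs sorted by time, such as disk usage in percent. The helper names are hypothetical.

```python
from __future__ import annotations


def fit_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_until_threshold(points: list[tuple[float, float]], threshold: float) -> float | None:
    """Hours from the last sample until the fitted trend reaches `threshold`,
    or None if the trend is flat or falling."""
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None
    last_x = points[-1][0]  # assumes points are sorted by time
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - last_x)
```

Suppose disk usage is 50, 52, 54, and 56% at hours 0 through 3. The fitted slope is then 2% per hour, so `time_until_threshold(points, 90)` returns 17.0 hours.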
| Command | Use Case | Output |
|---------|---------|--------|
| /sc:design --type monitoring | Monitoring design | Complete monitoring architecture |
| /sc:implement observability | Observability system | Comprehensive observability implementation |
| /sc:implement alerting | Alerting system | Intelligent alerting and incident response |
| /sc:implement apm | APM system | Application performance monitoring |
| /sc:implement predictive-monitoring | Predictive monitoring | Advanced analytics and prediction |
| Tool | Role | Capabilities |
|------|------|--------------|
| Prometheus | Metrics collection | Time-series metrics collection and storage |
| Grafana | Visualization | Monitoring dashboards and visualization |
| ELK Stack | Log analysis | Log aggregation and analysis |
| Jaeger/Zipkin | Distributed tracing | End-to-end request tracing |
| Server | Expertise | Use Case |
|--------|-----------|----------|
| Sequential | Observability reasoning | Complex monitoring design and problem-solving |
| Web Search | Monitoring trends | Latest monitoring practices and tools |
| Firecrawl | Documentation | Monitoring tool documentation and best practices |
User: "Implement comprehensive observability for our microservices architecture with intelligent alerting"
Workflow:
1. Phase 1: Analyze observability requirements and define monitoring strategy
2. Phase 2: Design monitoring architecture with metrics, logs, and traces
3. Phase 3: Implement monitoring infrastructure with Prometheus, Grafana, and ELK
4. Phase 4: Set up intelligent alerting and incident response automation
5. Phase 5: Configure APM and performance monitoring
6. Phase 6: Implement predictive analytics and anomaly detection
Output: Complete observability system with intelligent alerting and predictive monitoring
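One widely used SLO-based technique for the intelligent alerting in this workflow is multi-window, multi-burn-rate alerting, described in Google's SRE Workbook. Burn rate is the observed error ratio divided by the error budget `1 - SLO`. The thresholds below follow the Workbook's example parameters for a 99.9% SLO over 30 days. This is a sketch: such rules are usually written as Prometheus alerting rules rather than application code, and the names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str   # window label, e.g. "1h"
    short_window: str  # shorter confirmation window, e.g. "5m"
    factor: float      # burn-rate threshold
    severity: str      # "page" or "ticket"


# Example parameters for a 99.9% SLO over a 30-day window.
RULES = [
    BurnRateRule("1h", "5m", 14.4, "page"),   # 2% of budget spent in 1 hour
    BurnRateRule("6h", "30m", 6.0, "page"),   # 5% of budget spent in 6 hours
    BurnRateRule("3d", "6h", 1.0, "ticket"),  # 10% of budget spent in 3 days
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """1.0 means the budget would be used up exactly at the end of the window."""
    return error_ratio / (1.0 - slo_target)


def evaluate(rules: list[BurnRateRule], error_ratios: dict[str, float], slo_target: float) -> list[str]:
    """Return the severities of the rules that fire.

    `error_ratios` maps a window label to the error ratio observed over that
    window (assumed to come from the metrics backend). A rule fires only if
    BOTH windows exceed the factor: the long window filters out brief blips,
    and the short window lets the alert clear soon after recovery.
    """
    fired = []
    for rule in rules:
        long_br = burn_rate(error_ratios[rule.long_window], slo_target)
        short_br = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_br > rule.factor and short_br > rule.factor:
            fired.append(rule.severity)
    return fired
```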
User: "Set up APM for our web application to identify performance bottlenecks and optimize user experience"
Workflow:
1. Phase 1: Analyze performance monitoring requirements and objectives
2. Phase 2: Design APM architecture with distributed tracing
3. Phase 3: Implement APM tools and instrumentation
4. Phase 4: Set up performance dashboards and alerting
5. Phase 5: Configure user experience monitoring and analysis
6. Phase 6: Implement performance optimization recommendations
Output: Comprehensive APM system with performance optimization and user experience monitoring
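Distributed tracing depends on propagating trace context across service boundaries. The W3C Trace Context specification defines a `traceparent` header of the form `version-traceid-parentid-flags`. The sketch below parses and generates version `00` headers with the standard library only. In a real service an instrumentation library such as OpenTelemetry would do this work; the `SpanContext` type and `child_context` helper are hypothetical.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass

# Version 00 only: 32-hex trace ID, 16-hex parent (span) ID, 2-hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in one request
    span_id: str   # unique per span
    sampled: bool

    def to_traceparent(self) -> str:
        flags = "01" if self.sampled else "00"
        return f"00-{self.trace_id}-{self.span_id}-{flags}"


def parse_traceparent(header: str) -> SpanContext | None:
    m = _TRACEPARENT.fullmatch(header.strip())
    if m is None:
        return None
    trace_id, span_id, flags = m.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid per the spec
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_context(incoming: str | None) -> SpanContext:
    """Continue the caller's trace if the header is valid, else start a new one."""
    parent = parse_traceparent(incoming) if incoming else None
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    # Assumption: new root traces are sampled; real samplers are configurable.
    sampled = parent.sampled if parent else True
    return SpanContext(trace_id, secrets.token_hex(8), sampled)
```

Each service calls `child_context()` on the incoming header and sends `to_traceparent()` downstream. Every hop therefore shares one trace ID while getting its own span ID.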
User: "Create intelligent alerting system with automated incident response for our production systems"
Workflow:
1. Phase 1: Analyze alerting requirements and incident response needs
2. Phase 2: Design intelligent alerting strategy with anomaly detection
3. Phase 3: Implement alerting system with smart thresholds and correlation
4. Phase 4: Set up automated incident response workflows
5. Phase 5: Configure escalation procedures and on-call management
6. Phase 6: Implement incident communication and reporting
Output: Intelligent alerting system with automated incident response and management
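Alert correlation often starts with deduplication and grouping, similar in spirit to Alertmanager's `group_by`, plus a time-based escalation policy. The sketch below assumes each alert carries a name and string labels. The `Alert` type, the `group_alerts` helper, and the escalation tiers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    labels: tuple[tuple[str, str], ...]  # sorted (key, value) pairs

    @staticmethod
    def make(name: str, **labels: str) -> Alert:
        return Alert(name, tuple(sorted(labels.items())))


def group_alerts(alerts: list[Alert], group_by: list[str]) -> dict[tuple, set[Alert]]:
    """Drop exact duplicates and bucket alerts by their `group_by` label
    values, so one notification covers a correlated burst."""
    groups: dict[tuple, set[Alert]] = defaultdict(set)
    for alert in alerts:
        labels = dict(alert.labels)
        key = tuple((k, labels.get(k, "")) for k in group_by)
        groups[key].add(alert)  # a set silently drops exact duplicates
    return dict(groups)


# Hypothetical policy: (minutes unacknowledged, who gets notified).
ESCALATION_POLICY = [(0, "primary-oncall"), (15, "secondary-oncall"), (30, "incident-commander")]


def escalation_targets(minutes_unacked: float) -> list[str]:
    """Everyone who should have been notified after `minutes_unacked` minutes."""
    return [who for after, who in ESCALATION_POLICY if minutes_unacked >= after]
```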
```
observability-system/
├── monitoring-infrastructure/
│   ├── metrics/ # Metrics collection and storage
│   ├── logs/ # Log aggregation and analysis
│   ├── traces/ # Distributed tracing infrastructure
│   └── events/ # Event collection and processing
├── alerting-system/
│   ├── rules/ # Alerting rules and thresholds
│   ├── anomaly-detection/ # Anomaly detection algorithms
│   ├── escalation/ # Escalation procedures and policies
│   └── automation/ # Alerting automation and workflows
├── dashboards/
│   ├── system-overview/ # System-wide monitoring dashboards
│   ├── application-performance/ # Application performance dashboards
│   ├── business-metrics/ # Business metrics and KPIs
│   └── incident-response/ # Incident response dashboards
├── analytics/
│   ├── machine-learning/ # ML models for anomaly detection
│   ├── trend-analysis/ # Trend analysis and forecasting
│   ├── root-cause-analysis/ # Automated root cause analysis
│   └── predictive-analytics/ # Predictive monitoring and forecasting
├── incident-response/
│   ├── playbooks/ # Incident response playbooks
│   ├── automation/ # Incident response automation
│   ├── communication/ # Incident communication templates
│   └── post-mortem/ # Post-incident analysis and learning
└── configuration/
    ├── data-retention/ # Data retention and archival policies
    ├── security/ # Monitoring security and access control
    ├── integration/ # System integration configurations
    └── backup-recovery/ # Backup and disaster recovery procedures
```
The observability-monitor skill turns observability implementation into a guided, six-phase workflow backed by specialist personas. It covers system-wide visibility, intelligent alerting, and proactive incident management, supported by analytics and automation.