awesome/skills/azure-resource-health-diagnose/SKILL.md
Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems.
npx skillsauth add gabeujin/workspace-init-mcp azure-resource-health-diagnoseInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This workflow analyzes a specific Azure resource to assess its health status, diagnose potential issues using logs and telemetry data, and develop a comprehensive remediation plan for any problems discovered.
azmcp-*) over direct Azure CLI when availableAction: Retrieve diagnostic and troubleshooting best practices Tools: Azure MCP best practices tool Process:
Action: Locate and identify the target Azure resource Tools: Azure MCP tools + Azure CLI fallback Process:
Resource Lookup:
azmcp-subscription-listaz resource list --name <resource-name> to find matching resourcesResource Type Detection:
Action: Evaluate current resource health and availability Tools: Azure MCP monitoring tools + Azure CLI Process:
Basic Health Check:
Service-Specific Health Indicators:
Action: Analyze logs and telemetry to identify issues and patterns Tools: Azure MCP monitoring tools for Log Analytics queries Process:
Find Monitoring Sources:
azmcp-monitor-workspace-list to identify Log Analytics workspacesazmcp-monitor-table-listExecute Diagnostic Queries:
Use azmcp-monitor-log-query with targeted KQL queries based on resource type:
General Error Analysis:
// Recent errors and exceptions
union isfuzzy=true
AzureDiagnostics,
AppServiceHTTPLogs,
AppServiceAppLogs,
AzureActivity
| where TimeGenerated > ago(24h)
| where Level == "Error" or ResultType != "Success"
| summarize ErrorCount=count() by Resource, ResultType, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
Performance Analysis:
// Performance degradation patterns
Perf
| where TimeGenerated > ago(7d)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| where avg_CounterValue > 80
Application-Specific Queries:
// Application Insights - Failed requests
requests
| where timestamp > ago(24h)
| where success == false
| summarize FailureCount=count() by resultCode, bin(timestamp, 1h)
| order by timestamp desc
// Database - Connection failures
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.SQL"
| where Category == "SQLSecurityAuditEvents"
| where action_name_s == "CONNECTION_FAILED"
| summarize ConnectionFailures=count() by bin(TimeGenerated, 1h)
Pattern Recognition:
Action: Categorize identified issues and determine root causes Process:
Issue Classification:
Root Cause Analysis:
Impact Assessment:
Action: Create a comprehensive plan to address identified issues Process:
Immediate Actions (Critical issues):
Short-term Fixes (High/Medium issues):
Long-term Improvements (All issues):
Implementation Steps:
Action: Present findings and get approval for remediation actions Process:
Display Health Assessment Summary:
🏥 Azure Resource Health Assessment
📊 Resource Overview:
• Resource: [Name] ([Type])
• Status: [Healthy/Warning/Critical]
• Location: [Region]
• Last Analyzed: [Timestamp]
🚨 Issues Identified:
• Critical: X issues requiring immediate attention
• High: Y issues affecting performance/reliability
• Medium: Z issues for optimization
• Low: N informational items
🔍 Top Issues:
1. [Issue Type]: [Description] - Impact: [High/Medium/Low]
2. [Issue Type]: [Description] - Impact: [High/Medium/Low]
3. [Issue Type]: [Description] - Impact: [High/Medium/Low]
🛠️ Remediation Plan:
• Immediate Actions: X items
• Short-term Fixes: Y items
• Long-term Improvements: Z items
• Estimated Resolution Time: [Timeline]
❓ Proceed with detailed remediation plan? (y/n)
Generate Detailed Report:
# Azure Resource Health Report: [Resource Name]
**Generated**: [Timestamp]
**Resource**: [Full Resource ID]
**Overall Health**: [Status with color indicator]
## 🔍 Executive Summary
[Brief overview of health status and key findings]
## 📊 Health Metrics
- **Availability**: X% over last 24h
- **Performance**: [Average response time/throughput]
- **Error Rate**: X% over last 24h
- **Resource Utilization**: [CPU/Memory/Storage percentages]
## 🚨 Issues Identified
### Critical Issues
- **[Issue 1]**: [Description]
- **Root Cause**: [Analysis]
- **Impact**: [Business impact]
- **Immediate Action**: [Required steps]
### High Priority Issues
- **[Issue 2]**: [Description]
- **Root Cause**: [Analysis]
- **Impact**: [Performance/reliability impact]
- **Recommended Fix**: [Solution steps]
## 🛠️ Remediation Plan
### Phase 1: Immediate Actions (0-2 hours)
```bash
# Critical fixes to restore service
[Azure CLI commands with explanations]
# Performance and reliability improvements
[Azure CLI commands with explanations]
# Architectural and preventive measures
[Azure CLI commands and configuration changes]
documentation
Write a coding standards document for a project using the coding styles from the file(s) and/or folder(s) passed as arguments in the prompt.
testing
Safely upgrades legacy or older initialized workspaces to the latest managed harness structure with dry-run, backup, restore, and review discipline.
tools
Guides the Copilot CLI on how to use the WorkIQ CLI/MCP server to query Microsoft 365 Copilot data (emails, meetings, docs, Teams, people) for live context, summaries, and recommendations.
tools
Windows App Development CLI (winapp) for building, packaging, and deploying Windows applications. Use when asked to initialize Windows app projects, create MSIX packages, generate AppxManifest.xml, manage development certificates, add package identity for debugging, sign packages, or access Windows SDK build tools. Supports .NET, C++, Electron, Rust, Tauri, and cross-platform frameworks targeting Windows.