labs/vm-cosmosdb/skills/vm-performance-diagnostics/SKILL.md
# VM Performance Diagnostics

You are an SRE Agent skill specialized in diagnosing and remediating VM performance issues for SAP workloads running on Azure VMs.

## When to Use This Skill

Activate this skill when:

- A CPU or memory alert fires on a VM
- A user reports slow application performance
- A scheduled health check detects performance degradation
- VM disk I/O or network throughput anomalies are detected

## Investigation Procedure

### Step 1: Gather Current Metrics
Run the following KQL query against the Log Analytics Workspace to get the current performance snapshot:
```kql
Perf
| where TimeGenerated > ago(30m)
| where Computer in ("vm-sap-app-01", "vm-sap-db-01")
| where (ObjectName == "Processor" and CounterName == "% Processor Time")
    or (ObjectName == "Memory" and CounterName == "% Committed Bytes In Use")
    or (ObjectName == "LogicalDisk" and CounterName == "% Free Space")
| summarize AvgValue = avg(CounterValue), MaxValue = max(CounterValue) by Computer, ObjectName, CounterName
| order by Computer asc, ObjectName asc
```
Compare against the baseline (last 7 days):
```kql
Perf
| where TimeGenerated > ago(7d)
| where Computer in ("vm-sap-app-01", "vm-sap-db-01")
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize
    AvgCPU = avg(CounterValue),
    P95CPU = percentile(CounterValue, 95),
    MaxCPU = max(CounterValue)
    by Computer, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```
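The comparison of a current reading against its baseline can be sketched as a small shell helper. This is a minimal illustration, not part of the skill: the function name `classify_metric` and the 1.0x and 1.2x multipliers are assumptions chosen for the example, and the logic only suits metrics where higher is worse (CPU %, Memory %), not Disk Free %.

```shell
# Hypothetical helper: classify a current value against its 7-day P95 baseline.
# The thresholds (above baseline = WARNING, above 1.2x baseline = CRITICAL)
# are illustrative assumptions, not values defined by the skill.
classify_metric() {
  current="$1"
  baseline="$2"
  # awk handles the floating-point comparison that plain shell arithmetic cannot.
  awk -v c="$current" -v b="$baseline" 'BEGIN {
    if (c > b * 1.2)  print "CRITICAL"
    else if (c > b)   print "WARNING"
    else              print "OK"
  }'
}

# Sample readings (made-up numbers) against a baseline P95 of 70
classify_metric 97.5 70
classify_metric 72 70
classify_metric 41.3 70
```

The three sample calls fall into the CRITICAL, WARNING, and OK branches respectively; the resulting label maps onto the Status column of the report template.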
Identify the processes consuming the most CPU over the last 15 minutes:

```kql
VMProcess
| where TimeGenerated > ago(15m)
| where Computer in ("vm-sap-app-01", "vm-sap-db-01")
| summarize TotalCPU = sum(PercentProcessorTime) by Computer, ExecutableName
| top 10 by TotalCPU desc
```
Query Activity Logs for recent modifications:
```kql
AzureActivity
| where TimeGenerated > ago(24h)
| where ResourceGroup has "vm-perf"
| where OperationNameValue has "Microsoft.Compute/virtualMachines"
| project TimeGenerated, Caller, OperationNameValue, ActivityStatusValue
| order by TimeGenerated desc
```
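Remediation commands:

```shell
# Kill the runaway stress process on the VM.
# Single quotes keep $(pgrep stress) from expanding in the local shell;
# the substitution must run on the VM itself.
az vm run-command invoke --resource-group {rg} --name {vm} \
  --command-id RunShellScript --scripts 'kill -9 $(pgrep stress)'

# Restart the VM.
az vm restart --resource-group {rg} --name {vm}

# Resize the VM to a larger SKU.
az vm resize --resource-group {rg} --name {vm} --size Standard_B4ms
```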
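Because these commands change a running VM, it can help to preview them before executing. The wrapper below is a minimal sketch under stated assumptions: `run_az` and the `DRY_RUN` variable are hypothetical names introduced here, and `rg-example` is a placeholder resource group, none of which come from the skill.

```shell
# Hypothetical wrapper: print the az command instead of running it
# unless DRY_RUN=0 is set explicitly. Defaults to dry-run for safety.
run_az() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "DRY RUN: az $*"
  else
    az "$@"
  fi
}

# Preview a restart (placeholder resource group and VM names)
run_az vm restart --resource-group rg-example --name vm-sap-app-01
```

With `DRY_RUN` unset the call only echoes the command line; setting `DRY_RUN=0` passes the same arguments to the real `az` CLI.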
When reporting findings, use this structure:
## VM Performance Report
**VM:** {vmName}
**Time:** {timestamp}
**Severity:** {High/Medium/Low}
### Current State
| Metric | Current | Baseline (P95) | Status |
|--------|---------|-----------------|--------|
| CPU % | {val} | {baseline} | {OK/WARNING/CRITICAL} |
| Memory % | {val} | {baseline} | {OK/WARNING/CRITICAL} |
| Disk Free % | {val} | {baseline} | {OK/WARNING/CRITICAL} |
### Root Cause Analysis
{description of what's causing the issue}
### Recommended Actions
1. {action 1} — {impact}
2. {action 2} — {impact}
### Risk Assessment
{what could go wrong if we remediate vs. if we don't}
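As a rough illustration of filling in the template, the sketch below renders the header and the CPU row from positional arguments. The function name `render_report` and all sample values are assumptions made for the example; the remaining rows and sections would follow the same pattern.

```shell
# Hypothetical helper: render the report header and CPU row.
# Arguments: vm, timestamp, severity, cpu, cpu_baseline, cpu_status
render_report() {
  vm="$1"; ts="$2"; severity="$3"
  cpu="$4"; cpu_base="$5"; cpu_status="$6"
  cat <<EOF
## VM Performance Report
**VM:** ${vm}
**Time:** ${ts}
**Severity:** ${severity}
### Current State
| Metric | Current | Baseline (P95) | Status |
|--------|---------|-----------------|--------|
| CPU % | ${cpu} | ${cpu_base} | ${cpu_status} |
EOF
}

# Sample invocation with made-up values
render_report "vm-sap-app-01" "2024-01-01T00:00:00Z" "High" 97.5 70 CRITICAL
```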