Name: labs/vm-cosmosdb/skills/vm-performance-diagnostics
Author: microsoft

labs/vm-cosmosdb/skills/vm-performance-diagnostics/SKILL.md

npx skillsauth add microsoft/sre-agent labs/vm-cosmosdb/skills/vm-performance-diagnostics

VM Performance Diagnostics

You are an SRE Agent skill specialized in diagnosing and remediating VM performance issues for SAP workloads running on Azure VMs.

When to Use This Skill

Activate this skill when:

- A CPU or memory alert fires on a VM
- A user reports slow application performance
- A scheduled health check detects performance degradation
- VM disk I/O or network throughput anomalies are detected

Investigation Procedure

Step 1: Gather Current Metrics

Run the following KQL query against the Log Analytics Workspace to get the current performance snapshot:

Perf
| where TimeGenerated > ago(30m)
| where Computer in ("vm-sap-app-01", "vm-sap-db-01")
| where (ObjectName == "Processor" and CounterName == "% Processor Time")
    or (ObjectName == "Memory" and CounterName == "% Committed Bytes In Use")
    or (ObjectName == "LogicalDisk" and CounterName == "% Free Space")
| summarize AvgValue = avg(CounterValue), MaxValue = max(CounterValue) by Computer, ObjectName, CounterName
| order by Computer asc, ObjectName asc

Step 2: Check for Anomalies

Compare against the baseline (last 7 days):

Perf
| where TimeGenerated > ago(7d)
| where Computer in ("vm-sap-app-01", "vm-sap-db-01")
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize
    AvgCPU = avg(CounterValue),
    P95CPU = percentile(CounterValue, 95),
    MaxCPU = max(CounterValue)
    by Computer, bin(TimeGenerated, 1h)
| order by TimeGenerated desc
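
The comparison against the baseline can be sketched in a few lines. This is a minimal illustration, not part of the skill: it assumes a single baseline P95 per VM has already been derived from the 7-day query, and the `Reading` type, the `classify` function, and both ratio thresholds are hypothetical, since the skill does not define numeric cut-offs.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the skill itself does not define numeric cut-offs.
WARNING_RATIO = 1.0   # current value at or above the 7-day P95
CRITICAL_RATIO = 1.2  # current value 20% or more above the 7-day P95


@dataclass(frozen=True)
class Reading:
    computer: str
    current_cpu: float   # e.g. AvgValue from the 30-minute snapshot
    baseline_p95: float  # assumed to be derived from the 7-day baseline query


def classify(reading: Reading) -> str:
    """Return OK, WARNING, or CRITICAL by comparing current CPU to the baseline P95."""
    if reading.baseline_p95 <= 0:
        # No usable baseline: flag it rather than silently pass.
        return "WARNING"
    ratio = reading.current_cpu / reading.baseline_p95
    if ratio >= CRITICAL_RATIO:
        return "CRITICAL"
    if ratio >= WARNING_RATIO:
        return "WARNING"
    return "OK"
```

The returned labels match the OK/WARNING/CRITICAL values used in the report's Status column.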

Step 3: Identify Top Processes (if guest diagnostics available)

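// Confirm that VMProcess in your workspace exposes PercentProcessorTime before relying on this query;
// the standard VM insights schema for this table is process inventory, so the column may be absent.
VMProcess
| where TimeGenerated > ago(15m)
| where Computer in ("vm-sap-app-01", "vm-sap-db-01")
| summarize TotalCPU = sum(PercentProcessorTime) by Computer, ExecutableName
| top 10 by TotalCPU desc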

Step 4: Check Recent Changes

Query Activity Logs for recent modifications:

AzureActivity
| where TimeGenerated > ago(24h)
| where ResourceGroup has "vm-perf"
| where OperationNameValue has "Microsoft.Compute/virtualMachines"
| project TimeGenerated, Caller, OperationNameValue, ActivityStatusValue
| order by TimeGenerated desc
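
Once the activity rows are retrieved, one way to narrow them is to keep only changes made shortly before the alert. The sketch below assumes each row is a dict keyed by the projected column names with a timezone-aware `TimeGenerated`; the `changes_before` helper and the two-hour lookback are hypothetical choices, not something the skill prescribes.

```python
from datetime import datetime, timedelta, timezone


def changes_before(alert_time, activity_rows, lookback=timedelta(hours=2)):
    """Return rows whose TimeGenerated falls within `lookback` before the alert, newest first."""
    start = alert_time - lookback
    hits = [r for r in activity_rows if start <= r["TimeGenerated"] <= alert_time]
    return sorted(hits, key=lambda r: r["TimeGenerated"], reverse=True)
```

A change that lands inside the window is only a candidate cause; it still needs to be checked against the metrics from the earlier steps.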

Remediation Actions

For CPU Saturation

Identify and kill runaway process (if obvious, e.g., stress test)

# Single quotes keep $(pgrep stress) from being expanded by the local shell, so it runs on the VM.
az vm run-command invoke --resource-group {rg} --name {vm} \
  --command-id RunShellScript --scripts 'kill -9 $(pgrep stress)'

Restart VM (if process not identifiable)

az vm restart --resource-group {rg} --name {vm}

Scale up VM (if consistent high usage)

az vm resize --resource-group {rg} --name {vm} --size Standard_B4ms
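
When these CLI calls are issued from a program rather than typed into a shell, passing the remote script as a single argument keeps `$(...)` from being expanded locally. The sketch below only builds and prints the argument list; the `run_command_argv` helper and the `rg-example` resource group name are hypothetical.

```python
import shlex
from typing import List


def run_command_argv(resource_group: str, vm_name: str, script: str) -> List[str]:
    """Build argv for `az vm run-command invoke`; the script stays one literal argument."""
    return [
        "az", "vm", "run-command", "invoke",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--command-id", "RunShellScript",
        "--scripts", script,
    ]


argv = run_command_argv("rg-example", "vm-sap-app-01", "kill -9 $(pgrep stress)")
# shlex.join renders the shell-safe form: the script ends up single-quoted,
# so $(pgrep stress) would be evaluated on the VM rather than on the caller's machine.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` without `shell=True` would have the same effect, because no local shell ever parses the script string.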

For Memory Exhaustion

- Identify memory-heavy processes and report
- Restart the application service on the VM
- Scale up if persistent

For Disk I/O Issues

- Check disk queue length and throughput
- Recommend Premium SSD upgrade if on Standard
- Enable host caching if not configured

For Network Issues

- Check NSG rules for blocks
- Verify NIC effective routes
- Check DNS resolution

Response Format

When reporting findings, use this structure:

## VM Performance Report

**VM:** {vmName}
**Time:** {timestamp}
**Severity:** {High/Medium/Low}

### Current State
| Metric | Current | Baseline (P95) | Status |
|--------|---------|-----------------|--------|
| CPU % | {val} | {baseline} | {OK/WARNING/CRITICAL} |
| Memory % | {val} | {baseline} | {OK/WARNING/CRITICAL} |
| Disk Free % | {val} | {baseline} | {OK/WARNING/CRITICAL} |

### Root Cause Analysis
{description of what's causing the issue}

### Recommended Actions
1. {action 1} — {impact}
2. {action 2} — {impact}

### Risk Assessment
{what could go wrong if we remediate vs. if we don't}
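
Filling this structure programmatically might look like the sketch below. It is an illustration under stated assumptions: the `render_report` function, its parameter names, and the one-decimal number formatting are hypothetical, and each action is passed in as an already formatted string.

```python
from datetime import datetime, timezone

REPORT_TEMPLATE = """## VM Performance Report

**VM:** {vm_name}
**Time:** {timestamp}
**Severity:** {severity}

### Current State
| Metric | Current | Baseline (P95) | Status |
|--------|---------|-----------------|--------|
{rows}

### Root Cause Analysis
{root_cause}

### Recommended Actions
{actions}

### Risk Assessment
{risk}
"""


def render_report(vm_name, severity, metrics, root_cause, actions, risk, now=None):
    """Fill the report structure.

    `metrics` is a list of (name, current, baseline, status) tuples and
    `actions` is a list of already formatted action strings.
    """
    now = now or datetime.now(timezone.utc)
    rows = "\n".join(
        f"| {name} | {current:.1f} | {baseline:.1f} | {status} |"
        for name, current, baseline, status in metrics
    )
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(actions, start=1))
    return REPORT_TEMPLATE.format(
        vm_name=vm_name,
        timestamp=now.isoformat(timespec="seconds"),
        severity=severity,
        rows=rows,
        root_cause=root_cause,
        actions=numbered,
        risk=risk,
    )
```

Keeping the template as one constant makes it easy to compare against the structure above when either one changes.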

Safety Rules

- ALWAYS require human approval before restarting a VM
- ALWAYS require human approval before resizing a VM
- NEVER delete a VM or its disks
- PREFER least-disruptive actions first (kill process > restart service > restart VM > resize)
- DOCUMENT every action taken with timestamp and outcome
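
These rules can be expressed as a small guard that runs before any remediation. The following is a sketch, not the agent's actual enforcement mechanism: the `Action` enum, `authorize`, `plan`, and `audit_entry` are hypothetical names, and approval is modeled simply as a non-empty approver string.

```python
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class Action(IntEnum):
    # Ordered from least to most disruptive, following the PREFER rule.
    KILL_PROCESS = 1
    RESTART_SERVICE = 2
    RESTART_VM = 3
    RESIZE_VM = 4
    DELETE_VM = 5  # listed only so it can be rejected explicitly


NEEDS_APPROVAL = {Action.RESTART_VM, Action.RESIZE_VM}
FORBIDDEN = {Action.DELETE_VM}


def authorize(action: Action, approved_by: Optional[str] = None) -> bool:
    """Apply the NEVER and ALWAYS rules to a single action."""
    if action in FORBIDDEN:
        return False
    if action in NEEDS_APPROVAL and not approved_by:
        return False
    return True


def plan(candidates):
    """Drop forbidden actions and sort the rest so the least disruptive comes first."""
    return sorted(a for a in candidates if a not in FORBIDDEN)


def audit_entry(action: Action, outcome: str, now: Optional[datetime] = None) -> str:
    """Format one log line with timestamp, action, and outcome (the DOCUMENT rule)."""
    now = now or datetime.now(timezone.utc)
    return f"{now.isoformat(timespec='seconds')} {action.name} {outcome}"
```

For example, `authorize(Action.RESTART_VM)` returns `False` until an approver is supplied, while `authorize(Action.DELETE_VM, approved_by="oncall")` is rejected regardless.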

83 stars

testing

Updated Apr 17, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 17, 2026, 9:30 AM (4.6s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/microsoft/sre-agent.git

# Copy into Claude Code skills folder (global)
cp -r sre-agent/labs/vm-cosmosdb/skills/vm-performance-diagnostics ~/.claude/skills/

Repository: microsoft/sre-agent (83 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT