infrastructure/cloud-azure/azure-vms/SKILL.md
Manage Azure Virtual Machines and scale sets. Configure availability sets and managed disks. Use when deploying compute resources on Azure.
npx skillsauth add bagelhole/devops-security-agent-skills azure-vmsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Deploy and manage Azure VMs, availability sets, scale sets, custom images, and managed disks. Covers VM creation, sizing, disk management, auto-scaling, and Terraform configurations for production environments.
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Login and set subscription
az login
az account set --subscription "my-subscription-id"
# Create resource group
az group create --name compute-rg --location eastus
# List available VM sizes in a region
az vm list-sizes --location eastus --output table
# List available VM images
az vm image list --output table
az vm image list --publisher Canonical --offer 0001-com-ubuntu-server-jammy --all --output table
az vm create \
--resource-group compute-rg \
--name myapp-vm \
--image Ubuntu2204 \
--size Standard_D4s_v5 \
--admin-username azureuser \
--generate-ssh-keys \
--vnet-name myapp-vnet \
--subnet app-subnet \
--nsg "" \
--public-ip-address "" \
--os-disk-size-gb 64 \
--os-disk-caching ReadWrite \
--storage-sku Premium_LRS \
--zone 1 \
--assign-identity \
--tags environment=prod team=platform app=myapp
# SSH into the VM (if public IP assigned)
ssh azureuser@$(az vm show -g compute-rg -n myapp-vm -d --query publicIps -o tsv)
az vm create \
--resource-group compute-rg \
--name myapp-win-vm \
--image Win2022Datacenter \
--size Standard_D4s_v5 \
--admin-username azureadmin \
--admin-password 'S3cur3P@ssw0rd!' \
--vnet-name myapp-vnet \
--subnet app-subnet \
--public-ip-address "" \
--os-disk-size-gb 128 \
--storage-sku Premium_LRS \
--zone 1
# cloud-init.yaml
# #cloud-config
# package_update: true
# packages:
# - nginx
# - docker.io
# runcmd:
# - systemctl enable nginx
# - systemctl start nginx
# - usermod -aG docker azureuser
az vm create \
--resource-group compute-rg \
--name web-vm \
--image Ubuntu2204 \
--size Standard_B2s \
--admin-username azureuser \
--generate-ssh-keys \
--custom-data cloud-init.yaml \
--tags role=web
| Family | Example Sizes | Use Case | |--------|---------------|----------| | B-series | Standard_B1s, Standard_B2s | Dev/test, low-traffic web servers | | D-series | Standard_D4s_v5, Standard_D8s_v5 | General purpose, most production workloads | | E-series | Standard_E4s_v5, Standard_E16s_v5 | Memory-intensive (databases, caching) | | F-series | Standard_F4s_v2, Standard_F16s_v2 | CPU-intensive (batch processing, analytics) | | L-series | Standard_L8s_v3, Standard_L32s_v3 | Storage-optimized (big data, SQL) | | N-series | Standard_NC6s_v3, Standard_NC24ads_A100_v4 | GPU workloads (ML training, rendering) | | M-series | Standard_M128s | SAP HANA, large in-memory workloads |
# Find VM sizes with specific capabilities
az vm list-sizes --location eastus \
--query "[?numberOfCores >= \`4\` && memoryInMb >= \`16000\`]" \
--output table
# Check VM size availability in a zone
az vm list-skus --location eastus \
--size Standard_D4s_v5 \
--output table
# Add a data disk to existing VM
az vm disk attach \
--resource-group compute-rg \
--vm-name myapp-vm \
--name myapp-data-disk \
--size-gb 256 \
--sku Premium_LRS \
--new \
--lun 0
# Create a standalone managed disk
az disk create \
--resource-group compute-rg \
--name shared-data-disk \
--size-gb 512 \
--sku Premium_LRS \
--zone 1
# Resize a disk (VM must be deallocated)
az vm deallocate --resource-group compute-rg --name myapp-vm
az disk update \
--resource-group compute-rg \
--name myapp-data-disk \
--size-gb 512
az vm start --resource-group compute-rg --name myapp-vm
# Snapshot a disk for backup
az snapshot create \
--resource-group compute-rg \
--name myapp-disk-snapshot \
--source myapp-data-disk
# Create disk from snapshot
az disk create \
--resource-group compute-rg \
--name myapp-disk-from-snap \
--source myapp-disk-snapshot \
--sku Premium_LRS
# List disks attached to a VM
az vm show \
--resource-group compute-rg \
--name myapp-vm \
--query "storageProfile.dataDisks" \
--output table
# Generalize the VM (run inside the VM first)
# sudo waagent -deprovision+user -force
# Deallocate and generalize
az vm deallocate --resource-group compute-rg --name myapp-vm
az vm generalize --resource-group compute-rg --name myapp-vm
# Create image from VM
az image create \
--resource-group compute-rg \
--name myapp-golden-image \
--source myapp-vm \
--os-type Linux
# Create VM from custom image
az vm create \
--resource-group compute-rg \
--name myapp-from-image \
--image myapp-golden-image \
--size Standard_D4s_v5 \
--admin-username azureuser \
--generate-ssh-keys
# Use Azure Compute Gallery for shared images
az sig create \
--resource-group compute-rg \
--gallery-name myAppGallery
az sig image-definition create \
--resource-group compute-rg \
--gallery-name myAppGallery \
--gallery-image-definition myapp-image \
--publisher myorg \
--offer myapp \
--sku 1.0 \
--os-type Linux \
--os-state Generalized
az sig image-version create \
--resource-group compute-rg \
--gallery-name myAppGallery \
--gallery-image-definition myapp-image \
--gallery-image-version 1.0.0 \
--managed-image myapp-golden-image \
--target-regions eastus westus \
--replica-count 2
# Create availability set
az vm availability-set create \
--resource-group compute-rg \
--name myapp-avset \
--platform-fault-domain-count 3 \
--platform-update-domain-count 5
# Create VM in availability set
az vm create \
--resource-group compute-rg \
--name myapp-vm-1 \
--image Ubuntu2204 \
--size Standard_D4s_v5 \
--availability-set myapp-avset \
--admin-username azureuser \
--generate-ssh-keys
# Create VMs across availability zones
for zone in 1 2 3; do
az vm create \
--resource-group compute-rg \
--name "myapp-vm-zone${zone}" \
--image Ubuntu2204 \
--size Standard_D4s_v5 \
--zone "$zone" \
--admin-username azureuser \
--generate-ssh-keys \
--no-wait
done
# Create VMSS with autoscaling
az vmss create \
--resource-group compute-rg \
--name myapp-vmss \
--image Ubuntu2204 \
--vm-sku Standard_D4s_v5 \
--instance-count 2 \
--admin-username azureuser \
--generate-ssh-keys \
--vnet-name myapp-vnet \
--subnet app-subnet \
--upgrade-policy-mode Rolling \
--health-probe "/" \
--load-balancer myapp-lb \
--zones 1 2 3 \
--custom-data cloud-init.yaml \
--tags environment=prod
# Configure autoscale rules
az monitor autoscale create \
--resource-group compute-rg \
--resource myapp-vmss \
--resource-type Microsoft.Compute/virtualMachineScaleSets \
--name myapp-autoscale \
--min-count 2 \
--max-count 20 \
--count 3
# Scale out when CPU > 70%
az monitor autoscale rule create \
--resource-group compute-rg \
--autoscale-name myapp-autoscale \
--condition "Percentage CPU > 70 avg 5m" \
--scale out 2
# Scale in when CPU < 30%
az monitor autoscale rule create \
--resource-group compute-rg \
--autoscale-name myapp-autoscale \
--condition "Percentage CPU < 30 avg 10m" \
--scale in 1
# Manual scale
az vmss scale \
--resource-group compute-rg \
--name myapp-vmss \
--new-capacity 5
# Update VMSS image
az vmss update \
--resource-group compute-rg \
--name myapp-vmss \
--set virtualMachineProfile.storageProfile.imageReference.version=latest
# Rolling upgrade of instances
az vmss rolling-upgrade start \
--resource-group compute-rg \
--name myapp-vmss
# List VMSS instances
az vmss list-instances \
--resource-group compute-rg \
--name myapp-vmss \
--output table
# Start / Stop / Restart / Deallocate
az vm start --resource-group compute-rg --name myapp-vm
az vm stop --resource-group compute-rg --name myapp-vm
az vm restart --resource-group compute-rg --name myapp-vm
az vm deallocate --resource-group compute-rg --name myapp-vm
# Run command on a VM
az vm run-command invoke \
--resource-group compute-rg \
--name myapp-vm \
--command-id RunShellScript \
--scripts "df -h && free -m && uptime"
# Enable boot diagnostics
az vm boot-diagnostics enable \
--resource-group compute-rg \
--name myapp-vm
# Get boot diagnostics log
az vm boot-diagnostics get-boot-log \
--resource-group compute-rg \
--name myapp-vm
# Enable Azure Backup
az backup protection enable-for-vm \
--resource-group compute-rg \
--vault-name myapp-vault \
--vm myapp-vm \
--policy-name DefaultPolicy
# Resize a VM
az vm resize \
--resource-group compute-rg \
--name myapp-vm \
--size Standard_D8s_v5
resource "azurerm_linux_virtual_machine" "main" {
name = "myapp-vm"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
size = "Standard_D4s_v5"
admin_username = "azureuser"
zone = "1"
network_interface_ids = [azurerm_network_interface.main.id]
admin_ssh_key {
username = "azureuser"
public_key = file("~/.ssh/id_rsa.pub")
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Premium_LRS"
disk_size_gb = 64
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts-gen2"
version = "latest"
}
identity {
type = "SystemAssigned"
}
tags = var.tags
}
resource "azurerm_managed_disk" "data" {
name = "myapp-data-disk"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
storage_account_type = "Premium_LRS"
create_option = "Empty"
disk_size_gb = 256
zone = "1"
tags = var.tags
}
resource "azurerm_virtual_machine_data_disk_attachment" "data" {
managed_disk_id = azurerm_managed_disk.data.id
virtual_machine_id = azurerm_linux_virtual_machine.main.id
lun = 0
caching = "ReadOnly"
}
resource "azurerm_linux_virtual_machine_scale_set" "main" {
name = "myapp-vmss"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
sku = "Standard_D4s_v5"
instances = 3
admin_username = "azureuser"
zones = [1, 2, 3]
admin_ssh_key {
username = "azureuser"
public_key = file("~/.ssh/id_rsa.pub")
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts-gen2"
version = "latest"
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Premium_LRS"
}
network_interface {
name = "vmss-nic"
primary = true
ip_configuration {
name = "internal"
primary = true
subnet_id = azurerm_subnet.app.id
}
}
automatic_os_upgrade_policy {
disable_automatic_rollback = false
enable_automatic_os_upgrade = true
}
rolling_upgrade_policy {
max_batch_instance_percent = 20
max_unhealthy_instance_percent = 20
max_unhealthy_upgraded_instance_percent = 5
pause_time_between_batches = "PT0S"
}
tags = var.tags
}
| Symptom | Cause | Fix |
|---------|-------|-----|
| VM fails to start | Quota exceeded in region | Check quota with az vm list-usage --location eastus; request increase |
| SSH connection refused | NSG blocking port 22 or VM not running | Check NSG rules and VM power state; use Azure Bastion for private VMs |
| VM disk full | OS disk too small or logs not rotated | Resize disk after deallocating; configure log rotation |
| VM performance is slow | Wrong VM size or disk throttling | Check metrics with az monitor metrics list; upgrade size or disk tier |
| Scale set not scaling out | Autoscale rule threshold not met | Review autoscale settings; verify metric thresholds match workload |
| Custom image VM boot fails | Image not properly generalized | Re-run waagent -deprovision before capturing; check boot diagnostics |
| VMSS rolling upgrade stuck | Health probe failing on new instances | Fix application health endpoint; check az vmss rolling-upgrade get-latest |
| Spot VM evicted unexpectedly | Azure reclaimed capacity | Use eviction policy Deallocate and set up eviction notifications |
azure-networking -- VNet and NSG configuration for VM connectivity.azure-aks -- Container alternative when VMs are not required.arm-templates -- Bicep-based VM deployment templates.terraform-azure -- Terraform-based VM and VMSS provisioning.development
Design and operationalize SRE dashboards that surface reliability, latency, error, saturation, and capacity signals across services. Use when building observability views for SLOs, incident response, and executive reliability reporting.
testing
Harden OpenClaw self-hosted environments with baseline host controls, auth tightening, secret handling, network segmentation, and safe update/rollback workflows. Use when deploying OpenClaw in home labs, startups, or production-like local AI infrastructure.
devops
Deploy, manage, and optimize vector databases for AI applications. Covers Qdrant, Weaviate, pgvector, and Pinecone — collection management, indexing strategies, backup, and performance tuning for production RAG and semantic search workloads.
testing
Deploy ML models on Kubernetes with KServe (formerly KFServing) and NVIDIA Triton Inference Server. Includes canary deployments, autoscaling, model versioning, A/B testing, and GPU resource management for production model serving.