.github/plugins/azure-skills/skills/azure-aigateway/SKILL.md
Configure Azure API Management as an AI Gateway for AI models, MCP tools, and agents. WHEN: semantic caching, token limit, content safety, load balancing, AI model governance, MCP rate limiting, jailbreak detection, add Azure OpenAI backend, add AI Foundry model, test AI gateway, LLM policies, configure AI backend, token metrics, AI cost control, convert API to MCP, import OpenAPI to gateway.
npx skillsauth add microsoft/azure-skills azure-aigatewayInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
4 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Configure Azure API Management (APIM) as an AI Gateway for governing AI models, MCP tools, and agents.
To deploy APIM, use the azure-prepare skill. See APIM deployment guide.
| Category | Triggers | |----------|----------| | Model Governance | "semantic caching", "token limits", "load balance AI", "track token usage" | | Tool Governance | "rate limit MCP", "protect my tools", "configure my tool", "convert API to MCP" | | Agent Governance | "content safety", "jailbreak detection", "filter harmful content" | | Configuration | "add Azure OpenAI backend", "configure my model", "add AI Foundry model" | | Testing | "test AI gateway", "call OpenAI through gateway" |
| Policy | Purpose | Details |
|--------|---------|---------|
| azure-openai-token-limit | Cost control | Model Policies |
| azure-openai-semantic-cache-lookup/store | 60-80% cost savings | Model Policies |
| azure-openai-emit-token-metric | Observability | Model Policies |
| llm-content-safety | Safety & compliance | Agent Policies |
| rate-limit-by-key | MCP/tool protection | Tool Policies |
# Get gateway URL
az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv
# List backends (AI models)
az apim backend list --service-name <apim-name> --resource-group <rg> \
--query "[].{id:name, url:url}" -o table
# Get subscription key
az apim subscription keys list \
--service-name <apim-name> --resource-group <rg> --subscription-id <sub-id>
GATEWAY_URL=$(az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv)
curl -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <key>" \
-d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
See references/patterns.md for full steps.
# Discover AI resources
az cognitiveservices account list --query "[?kind=='OpenAI']" -o table
# Create backend
az apim backend create --service-name <apim> --resource-group <rg> \
--backend-id openai-backend --protocol http --url "https://<aoai>.openai.azure.com/openai"
# Grant access (managed identity)
az role assignment create --assignee <apim-principal-id> \
--role "Cognitive Services User" --scope <aoai-resource-id>
Recommended policy order in <inbound>:
See references/policies.md for complete example.
| Issue | Solution |
|-------|----------|
| Token limit 429 | Increase tokens-per-minute or add load balancing |
| No cache hits | Lower score-threshold to 0.7 |
| Content false positives | Increase category thresholds (5-6) |
| Backend auth 401 | Grant APIM "Cognitive Services User" role |
See references/troubleshooting.md for details.
tools
Deploy, evaluate, fine-tune, and manage Foundry agents end-to-end: Docker build, ACR push, hosted/prompt agent create, batch eval, continuous eval, prompt optimizer, Agent Optimizer scaffold, agent.yaml, dataset curation from traces, model fine-tuning (SFT/DPO/RFT). USE FOR: deploy agent, hosted agent, create agent, add tool to agent, invoke agent, evaluate agent, continuous eval, continuous monitoring, optimize prompt, improve prompt, optimize agent instructions, agent optimizer, deploy model, Foundry project, RBAC, role assignment, permissions, quota, capacity, region, troubleshoot agent, deployment failure, AI Services, create Foundry resource, provision, knowledge index, customize deployment, onboard, availability, fine-tune, SFT, DPO, RFT, training-data, grader, distillation, fine-tuned model, large file upload. DO NOT USE FOR: Azure Functions, App Service, general Azure deploy (use azure-deploy), general Azure prep (use azure-prepare).
testing
Architect and provision enterprise Azure infrastructure from workload descriptions. For cloud architects and platform engineers planning networking, identity, security, compliance, and multi-resource topologies with WAF alignment. Generates Bicep or Terraform directly (no azd). WHEN: 'plan Azure infrastructure', 'architect Azure landing zone', 'design hub-spoke network', 'plan multi-region DR topology', 'set up VNets firewalls and private endpoints', 'subscription-scope Bicep deployment', 'Azure Backup for VM workloads'. PREFER azure-prepare FOR app-centric workflows.
testing
Azure cost management: query costs, forecast spending, optimize to reduce waste. WHEN: "Azure costs", "Azure bill", "cost breakdown", "how much am I spending", "forecast spending", "optimize costs", "reduce spending", "orphaned resources", "rightsize VMs", "cost spike", "reduce storage costs", "AKS cost". DO NOT USE FOR: deploying resources, provisioning, diagnostics, or security audits.
development
Assess and upgrade Azure workloads between plans, tiers, or SKUs, or modernize Azure SDK dependencies in source code. WHEN: upgrade Consumption to Flex Consumption, upgrade Azure Functions plan, change hosting plan, function app SKU, migrate App Service to Container Apps, modernize legacy Azure Java SDKs (com.microsoft.azure to com.azure), migrate Azure Cache for Redis (ACR/ACRE) to Azure Managed Redis (AMR).