skills/a6-plugin-ai-proxy/SKILL.md
Skill for configuring the Apache APISIX ai-proxy plugin via the a6 CLI. Covers proxying requests to LLM providers (OpenAI, Azure OpenAI, DeepSeek, Anthropic, Gemini, Vertex AI, and more), authentication per provider, model configuration, streaming, logging, and load balancing with ai-proxy-multi.
The ai-proxy plugin turns APISIX into an AI gateway. It proxies requests in
OpenAI-compatible format to LLM providers, handling authentication, endpoint
routing, and response streaming. Clients send a standard chat-completion
request; the plugin translates and forwards it to the configured provider.
Combine it with ai-prompt-template, ai-prompt-decorator, or content
moderation plugins for a full AI gateway pipeline.

| Provider | Value | Default Endpoint |
|----------|-------|------------------|
| OpenAI | openai | https://api.openai.com/v1/chat/completions |
| DeepSeek | deepseek | https://api.deepseek.com/chat/completions |
| Azure OpenAI | azure-openai | Custom via override.endpoint |
| Anthropic | anthropic | https://api.anthropic.com/v1/chat/completions |
| AIMLAPI | aimlapi | https://api.aimlapi.com/v1/chat/completions |
| OpenRouter | openrouter | https://openrouter.ai/api/v1/chat/completions |
| Gemini | gemini | https://generativelanguage.googleapis.com/v1beta/openai/chat/completions |
| Vertex AI | vertex-ai | https://aiplatform.googleapis.com |
| OpenAI-Compatible | openai-compatible | Custom via override.endpoint |

The plugin accepts the following configuration fields:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| provider | string | Yes | — | One of the 9 supported providers |
| auth | object | Yes | — | Authentication config (see below) |
| options | object | No | — | Model and generation parameters |
| options.model | string | No | — | Model name (provider-specific) |
| options.temperature | number | No | — | Sampling temperature |
| options.top_p | number | No | — | Nucleus sampling |
| options.max_tokens | integer | No | — | Maximum tokens to generate |
| options.stream | boolean | No | false | Enable SSE streaming |
| override | object | No | — | Override default endpoint |
| override.endpoint | string | No | — | Full URL for the provider API |
| provider_conf | object | No | — | Provider-specific config (Vertex AI) |
| provider_conf.project_id | string | No | — | GCP project ID (Vertex AI) |
| provider_conf.region | string | No | — | GCP region (Vertex AI) |
| logging | object | No | — | Logging options |
| logging.summaries | boolean | No | false | Log model, duration, tokens |
| logging.payloads | boolean | No | false | Log request/response bodies |
| timeout | integer | No | 30000 | Request timeout (ms) |
| keepalive | boolean | No | true | Keep connection alive |
| keepalive_timeout | integer | No | 60000 | Keepalive timeout (ms) |
| keepalive_pool | integer | No | 30 | Keepalive pool size |
| ssl_verify | boolean | No | true | Verify SSL certificate |

Authentication is configured per provider.

OpenAI, DeepSeek, and other bearer-token providers:

{
  "auth": {
    "header": {
      "Authorization": "Bearer sk-your-api-key"
    }
  }
}

Azure OpenAI (api-key header plus a custom endpoint):

{
  "auth": {
    "header": {
      "api-key": "your-azure-key"
    }
  },
  "override": {
    "endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
  }
}

Gemini:

{
  "auth": {
    "header": {
      "Authorization": "Bearer your-gemini-key"
    }
  }
}

Vertex AI (GCP service account):

{
  "auth": {
    "gcp": {
      "service_account_json": "{ ... }",
      "max_ttl": 3600,
      "expire_early_secs": 60
    }
  },
  "provider_conf": {
    "project_id": "your-project-id",
    "region": "us-central1"
  }
}

The service_account_json can also be set via the GCP_SERVICE_ACCOUNT
environment variable.
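
One way to populate that variable from a key file is sketched below. The file path and the helper name `load_gcp_service_account` are hypothetical. The variable only takes effect if it is present in the environment of the APISIX process itself, and how to arrange that depends on how APISIX is launched in your deployment.

```shell
# Hypothetical location of a downloaded service account key file.
KEY_FILE="${KEY_FILE:-./service-account.json}"

# Read the key file into GCP_SERVICE_ACCOUNT and export it.
# Returns non-zero if the file is missing or empty.
load_gcp_service_account() {
  if [ ! -s "$1" ]; then
    echo "key file missing or empty: $1" >&2
    return 1
  fi
  GCP_SERVICE_ACCOUNT="$(cat "$1")"
  export GCP_SERVICE_ACCOUNT
}
```

Calling `load_gcp_service_account "$KEY_FILE"` in the shell that starts APISIX keeps the key out of the route configuration.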

OpenAI-compatible providers (bearer token plus a custom endpoint):

{
  "auth": {
    "header": {
      "Authorization": "Bearer your-token"
    }
  },
  "override": {
    "endpoint": "https://your-custom-llm.com/v1/chat/completions"
  }
}

Create a route that proxies chat completions to OpenAI:

a6 route create -f - <<'EOF'
{
  "id": "openai-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-openai-key"
        }
      },
      "options": {
        "model": "gpt-4",
        "temperature": 0.7,
        "max_tokens": 1024
      }
    }
  }
}
EOF

Send a request through the route:

curl http://127.0.0.1:9080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is 1+1?"}
    ]
  }'

The gateway adds authentication and forwards to OpenAI. The client never sees the API key.

Set options.stream to true to enable streaming:

{
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4",
        "stream": true
      }
    }
  }
}

The client receives Server-Sent Events (SSE). To get token counts in
streaming mode, the client should include stream_options.include_usage: true
in the request body.
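
A client request along those lines might look like the sketch below. The gateway address and the `/v1/chat/completions` route path are assumptions carried over from the earlier curl example, and `stream_chat` is a hypothetical helper name.

```shell
# Assumed gateway address; override by exporting GATEWAY_URL.
GATEWAY_URL="${GATEWAY_URL:-http://127.0.0.1:9080}"

# Request body that asks for token usage to be included in the stream.
STREAM_BODY='{
  "messages": [
    {"role": "user", "content": "What is 1+1?"}
  ],
  "stream_options": {"include_usage": true}
}'

# Send the request. -N turns off curl output buffering so SSE chunks
# are printed as they arrive; -s hides the progress meter.
stream_chat() {
  curl -sN "$GATEWAY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$STREAM_BODY"
}
```

Running `stream_chat` against a route with streaming enabled prints the SSE lines as they arrive.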

Azure OpenAI with a custom endpoint and a 60-second timeout:

{
  "plugins": {
    "ai-proxy": {
      "provider": "azure-openai",
      "auth": {
        "header": {
          "api-key": "your-azure-key"
        }
      },
      "options": {
        "model": "gpt-4"
      },
      "override": {
        "endpoint": "https://myresource.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
      },
      "timeout": 60000
    }
  }
}

Proxy embeddings requests by overriding the endpoint:

a6 route create -f - <<'EOF'
{
  "id": "embeddings",
  "uri": "/v1/embeddings",
  "methods": ["POST"],
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "text-embedding-3-small"
      },
      "override": {
        "endpoint": "https://api.openai.com/v1/embeddings"
      }
    }
  }
}
EOF

Log request summaries while keeping request and response bodies out of the logs:

{
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      },
      "logging": {
        "summaries": true,
        "payloads": false
      }
    }
  }
}

The plugin does not natively route by model. Use separate routes with vars
matching on request body fields:

# Route requests for gpt-4 to OpenAI
a6 route create -f - <<'EOF'
{
  "id": "openai-gpt4",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "vars": [["post_arg.model", "==", "gpt-4"]],
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": { "header": { "Authorization": "Bearer sk-openai-key" } },
      "options": { "model": "gpt-4" }
    }
  }
}
EOF

# Route requests for deepseek-chat to DeepSeek
a6 route create -f - <<'EOF'
{
  "id": "deepseek-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "vars": [["post_arg.model", "==", "deepseek-chat"]],
  "plugins": {
    "ai-proxy": {
      "provider": "deepseek",
      "auth": { "header": { "Authorization": "Bearer sk-deepseek-key" } },
      "options": { "model": "deepseek-chat" }
    }
  }
}
EOF
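
To exercise the two routes, the client only needs to change the `model` field in the request body. The sketch below assumes the gateway listens on the address used in the earlier curl example; `model_body` and `chat_with_model` are hypothetical helper names.

```shell
# Assumed gateway address; override by exporting GATEWAY_URL.
GATEWAY_URL="${GATEWAY_URL:-http://127.0.0.1:9080}"

# Build a chat request body whose "model" field selects the route.
model_body() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "What is 1+1?"}]}' "$1"
}

# Send the request; $1 is the model name matched by the route's vars rule.
chat_with_model() {
  curl -s "$GATEWAY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(model_body "$1")"
}
```

With both routes in place, `chat_with_model gpt-4` should be served by the OpenAI route and `chat_with_model deepseek-chat` by the DeepSeek route. A model name that matches neither rule matches no route.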
For load balancing, failover, and priority-based routing across providers,
use ai-proxy-multi instead:

{
  "plugins": {
    "ai-proxy-multi": {
      "balancer": {
        "algorithm": "roundrobin"
      },
      "fallback_strategy": ["rate_limiting", "http_429", "http_5xx"],
      "instances": [
        {
          "name": "openai-primary",
          "provider": "openai",
          "priority": 1,
          "weight": 8,
          "auth": {
            "header": { "Authorization": "Bearer sk-openai-key" }
          },
          "options": { "model": "gpt-4" }
        },
        {
          "name": "deepseek-backup",
          "provider": "deepseek",
          "priority": 0,
          "weight": 2,
          "auth": {
            "header": { "Authorization": "Bearer sk-deepseek-key" }
          },
          "options": { "model": "deepseek-chat" }
        }
      ]
    }
  }
}

To log LLM metrics, reference these variables in the APISIX access log format:

| Variable | Description |
|----------|-------------|
| $request_type | traditional_http, ai_chat, or ai_stream |
| $llm_time_to_first_token | Time to first token (ms) |
| $llm_model | Actual model used by provider |
| $request_llm_model | Model requested by client |
| $llm_prompt_tokens | Prompt token count |
| $llm_completion_tokens | Completion token count |
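
As a sketch of how those variables might be wired into the access log, the snippet below writes a config fragment that sets `nginx_config.http.access_log_format`. The scratch file name `llm-access-log.yaml` is hypothetical and the selection of fields is illustrative. Merge the fragment into your APISIX `conf/config.yaml` by hand and reload APISIX for it to take effect.

```shell
# Write an illustrative config fragment to a scratch file (hypothetical name).
# The quoted 'EOF' stops the shell from expanding the $variables.
cat > ./llm-access-log.yaml <<'EOF'
nginx_config:
  http:
    # Standard NGINX fields followed by the LLM variables from the table above.
    access_log_format: '$remote_addr [$time_local] "$request" $status $request_time "$request_type" "$llm_time_to_first_token" "$llm_model" "$request_llm_model" "$llm_prompt_tokens" "$llm_completion_tokens"'
EOF
```

Quoting each LLM variable keeps the log line easy to split even when a value is empty, as it would be for traditional_http requests.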

version: "1"
routes:
  - id: openai-chat
    uri: /v1/chat/completions
    methods:
      - POST
    plugins:
      ai-proxy:
        provider: openai
        auth:
          header:
            Authorization: Bearer sk-your-openai-key
        options:
          model: gpt-4
          max_tokens: 1024
          temperature: 0.7
        logging:
          summaries: true

| Symptom | Cause | Fix |
|---------|-------|-----|
| 502 Bad Gateway | Wrong endpoint or provider value | Verify provider matches your API; check override.endpoint for Azure/custom |
| 401 from upstream | Invalid API key | Check auth.header value; ensure key is active with the provider |
| Timeout errors | Slow LLM response | Increase timeout (default 30000ms); use streaming for long completions |
| No token counts in streaming | Missing stream_options | Client should send stream_options.include_usage: true |
| Azure 404 | Missing api-version in URL | Include ?api-version=YYYY-MM-DD-preview in override.endpoint |
| Vertex AI auth failure | Bad service account JSON | Set via auth.gcp.service_account_json or GCP_SERVICE_ACCOUNT env var |
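
The status-code rows of the table can be turned into a quick check, sketched below. The gateway address and route path are assumptions taken from the earlier curl example, and `probe_status` and `hint_for_status` are hypothetical helper names. The hints only restate the table and are not an exhaustive diagnosis.

```shell
# Assumed gateway address; override by exporting GATEWAY_URL.
GATEWAY_URL="${GATEWAY_URL:-http://127.0.0.1:9080}"

# Send a small chat request and print only the HTTP status code.
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' "$GATEWAY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "ping"}]}'
}

# Map a status code to the matching fix from the troubleshooting table.
hint_for_status() {
  case "$1" in
    502) echo "verify provider and override.endpoint" ;;
    401) echo "check auth.header and that the key is active" ;;
    404) echo "for Azure, include api-version in override.endpoint" ;;
    *)   echo "no hint for status $1" ;;
  esac
}
```

For example, `hint_for_status "$(probe_status)"` prints the suggested fix for whatever status the route currently returns.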