a6-plugin-ai-content-moderation

Overview

APISIX provides two content moderation plugins that filter harmful content in LLM requests and responses:

| Plugin | Provider | Request | Response | Streaming | |--------|----------|---------|----------|-----------| | ai-aws-content-moderation | AWS Comprehend | ✅ | ❌ | ❌ | | ai-aliyun-content-moderation | Aliyun Moderation Plus | ✅ | ✅ | ✅ |

Both must be used alongside ai-proxy or ai-proxy-multi.

When to Use

Block toxic, hateful, or sexual content before it reaches the LLM
Filter harmful LLM responses before they reach clients (Aliyun only)
Enforce content policies with configurable thresholds
Comply with content safety regulations

Plugin Execution Order

ai-prompt-template           (priority 1071)
ai-prompt-decorator          (priority 1070)
ai-aws-content-moderation    (priority 1050) ← runs BEFORE ai-proxy
ai-proxy                     (priority 1040)
ai-aliyun-content-moderation (priority 1029) ← runs AFTER ai-proxy

The AWS plugin blocks requests before they reach the LLM. The Aliyun plugin runs after ai-proxy sets context and can check both requests and responses.

Plugin 1: ai-aws-content-moderation

Uses the AWS Comprehend detectToxicContent API to score request content.

Configuration Reference

| Field | Type | Required | Default | Description | |-------|------|----------|---------|-------------| | comprehend.access_key_id | string | Yes | — | AWS access key ID | | comprehend.secret_access_key | string | Yes | — | AWS secret access key | | comprehend.region | string | Yes | — | AWS region (e.g. us-east-1) | | comprehend.endpoint | string | No | Auto | Custom Comprehend endpoint | | comprehend.ssl_verify | boolean | No | true | Verify SSL certificate | | moderation_categories | object | No | — | Per-category thresholds (0-1) | | moderation_threshold | number | No | 0.5 | Overall toxicity threshold (0-1) |

Moderation Categories

| Category | Description | |----------|-------------| | PROFANITY | Profane language | | HATE_SPEECH | Hateful content | | INSULT | Insulting language | | HARASSMENT_OR_ABUSE | Harassment or abusive content | | SEXUAL | Sexual content | | VIOLENCE_OR_THREAT | Violent or threatening content |

Each category accepts a score threshold from 0 (strictest, blocks nearly everything) to 1 (most permissive). If moderation_categories is set, each category is checked individually. Otherwise, the moderation_threshold is used as an overall toxicity check.

Step-by-Step: AWS Content Moderation

a6 route create -f - <<'EOF'
{
  "id": "moderated-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-aws-content-moderation": {
      "comprehend": {
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1"
      },
      "moderation_categories": {
        "HATE_SPEECH": 0.3,
        "VIOLENCE_OR_THREAT": 0.2,
        "SEXUAL": 0.5
      }
    },
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      }
    }
  }
}
EOF

Toxic requests are rejected with HTTP 400:

request body exceeds HATE_SPEECH threshold

Overall threshold (no per-category filtering)

{
  "plugins": {
    "ai-aws-content-moderation": {
      "comprehend": {
        "access_key_id": "AKIA...",
        "secret_access_key": "secret...",
        "region": "us-east-1"
      },
      "moderation_threshold": 0.7
    }
  }
}

Plugin 2: ai-aliyun-content-moderation

Uses Aliyun Machine-Assisted Moderation Plus. Supports request moderation, response moderation, and real-time streaming moderation.

Configuration Reference

| Field | Type | Required | Default | Description | |-------|------|----------|---------|-------------| | endpoint | string | Yes | — | Aliyun service endpoint URL | | region_id | string | Yes | — | Aliyun region (e.g. cn-shanghai) | | access_key_id | string | Yes | — | Aliyun access key ID | | access_key_secret | string | Yes | — | Aliyun access key secret | | check_request | boolean | No | true | Enable request moderation | | check_response | boolean | No | false | Enable response moderation | | stream_check_mode | string | No | final_packet | realtime or final_packet | | stream_check_cache_size | integer | No | 128 | Max chars per batch (realtime) | | stream_check_interval | number | No | 3 | Seconds between batch checks (realtime) | | request_check_service | string | No | llm_query_moderation | Aliyun service for request checks | | request_check_length_limit | number | No | 2000 | Max chars per request chunk | | response_check_service | string | No | llm_response_moderation | Aliyun service for response checks | | response_check_length_limit | number | No | 5000 | Max chars per response chunk | | risk_level_bar | string | No | high | Threshold: none, low, medium, high, max | | deny_code | number | No | 200 | HTTP status code for rejected content | | deny_message | string | No | — | Custom rejection message | | timeout | integer | No | 10000 | Request timeout (ms) | | ssl_verify | boolean | No | true | Verify SSL certificate |

Risk Level System

Content is blocked when its risk level meets or exceeds the risk_level_bar:

none (0) < low (1) < medium (2) < high (3) < max (4)

Setting risk_level_bar: "high" blocks content rated high or max. Setting risk_level_bar: "low" blocks everything rated low or above.

Streaming Modes

| Mode | Behavior | |------|----------| | final_packet | Buffers entire response, checks at end | | realtime | Checks content in batches during streaming, can interrupt mid-response |

Step-by-Step: Aliyun Request + Response Moderation

a6 route create -f - <<'EOF'
{
  "id": "aliyun-moderated-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      }
    },
    "ai-aliyun-content-moderation": {
      "endpoint": "https://green.cn-shanghai.aliyuncs.com",
      "region_id": "cn-shanghai",
      "access_key_id": "your-aliyun-key-id",
      "access_key_secret": "your-aliyun-key-secret",
      "check_request": true,
      "check_response": true,
      "risk_level_bar": "high",
      "deny_code": 400,
      "deny_message": "Content policy violation"
    }
  }
}
EOF

Realtime streaming moderation

{
  "plugins": {
    "ai-aliyun-content-moderation": {
      "endpoint": "https://green.cn-shanghai.aliyuncs.com",
      "region_id": "cn-shanghai",
      "access_key_id": "key-id",
      "access_key_secret": "key-secret",
      "check_request": true,
      "check_response": true,
      "stream_check_mode": "realtime",
      "stream_check_cache_size": 256,
      "stream_check_interval": 2,
      "risk_level_bar": "medium"
    }
  }
}

Integration Patterns

Pattern A: Request-only filtering (AWS)

Client → [AWS Comprehend blocks toxic] → ai-proxy → LLM → Response → Client

plugins:
  ai-aws-content-moderation:
    comprehend:
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      region: us-east-1
    moderation_threshold: 0.5
  ai-proxy:
    provider: openai
    auth:
      header:
        Authorization: "Bearer ${OPENAI_API_KEY}"

Pattern B: Request + response filtering (Aliyun)

Client → ai-proxy [sets context] → [Aliyun checks request] → LLM
       → [Aliyun checks response] → Client

plugins:
  ai-proxy:
    provider: openai
    auth:
      header:
        Authorization: "Bearer ${OPENAI_API_KEY}"
  ai-aliyun-content-moderation:
    endpoint: "https://green.cn-shanghai.aliyuncs.com"
    region_id: cn-shanghai
    access_key_id: "${ALIYUN_KEY_ID}"
    access_key_secret: "${ALIYUN_KEY_SECRET}"
    check_request: true
    check_response: true
    risk_level_bar: high

Secret Management

Both plugins support APISIX secret management for credentials:

plugins:
  ai-aws-content-moderation:
    comprehend:
      access_key_id: "$secret://vault/aws_key_id"
      secret_access_key: "$secret://vault/aws_secret_key"
      region: us-east-1

Config Sync Example

version: "1"
routes:
  - id: moderated-chat
    uri: /v1/chat/completions
    methods:
      - POST
    plugins:
      ai-aws-content-moderation:
        comprehend:
          access_key_id: AKIAIOSFODNN7EXAMPLE
          secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
          region: us-east-1
        moderation_categories:
          HATE_SPEECH: 0.3
          VIOLENCE_OR_THREAT: 0.2
        moderation_threshold: 0.5
      ai-proxy:
        provider: openai
        auth:
          header:
            Authorization: Bearer sk-your-key
        options:
          model: gpt-4

Troubleshooting

| Symptom | Cause | Fix | |---------|-------|-----| | "no ai instance picked" | Aliyun plugin used without ai-proxy | Always configure ai-proxy or ai-proxy-multi on the same route | | AWS plugin not blocking | Threshold too permissive | Lower moderation_threshold or per-category thresholds | | Aliyun response moderation inactive | check_response defaults to false | Explicitly set check_response: true | | "Specified signature is not matched" | Wrong Aliyun credentials | Verify access_key_id and access_key_secret | | High latency | Double moderation (both plugins) | Use one moderation provider per route, not both | | Streaming interrupted mid-response | Aliyun realtime mode detected violation | Expected behavior; adjust risk_level_bar or use final_packet mode |

a6-plugin-ai-content-moderation

Overview

APISIX provides two content moderation plugins that filter harmful content in LLM requests and responses:

Both must be used alongside ai-proxy or ai-proxy-multi.

When to Use

Block toxic, hateful, or sexual content before it reaches the LLM
Filter harmful LLM responses before they reach clients (Aliyun only)
Enforce content policies with configurable thresholds
Comply with content safety regulations

Plugin Execution Order

ai-prompt-template           (priority 1071)
ai-prompt-decorator          (priority 1070)
ai-aws-content-moderation    (priority 1050) ← runs BEFORE ai-proxy
ai-proxy                     (priority 1040)
ai-aliyun-content-moderation (priority 1029) ← runs AFTER ai-proxy

The AWS plugin blocks requests before they reach the LLM. The Aliyun plugin runs after ai-proxy sets context and can check both requests and responses.

Plugin 1: ai-aws-content-moderation

Uses the AWS Comprehend detectToxicContent API to score request content.

Configuration Reference

Moderation Categories

Step-by-Step: AWS Content Moderation

a6 route create -f - <<'EOF'
{
  "id": "moderated-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-aws-content-moderation": {
      "comprehend": {
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region": "us-east-1"
      },
      "moderation_categories": {
        "HATE_SPEECH": 0.3,
        "VIOLENCE_OR_THREAT": 0.2,
        "SEXUAL": 0.5
      }
    },
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      }
    }
  }
}
EOF

Toxic requests are rejected with HTTP 400:

request body exceeds HATE_SPEECH threshold

Overall threshold (no per-category filtering)

{
  "plugins": {
    "ai-aws-content-moderation": {
      "comprehend": {
        "access_key_id": "AKIA...",
        "secret_access_key": "secret...",
        "region": "us-east-1"
      },
      "moderation_threshold": 0.7
    }
  }
}

Plugin 2: ai-aliyun-content-moderation

Uses Aliyun Machine-Assisted Moderation Plus. Supports request moderation, response moderation, and real-time streaming moderation.

Configuration Reference

Risk Level System

Content is blocked when its risk level meets or exceeds the risk_level_bar:

none (0) < low (1) < medium (2) < high (3) < max (4)

Setting risk_level_bar: "high" blocks content rated high or max. Setting risk_level_bar: "low" blocks everything rated low or above.

Streaming Modes

| Mode | Behavior | |------|----------| | final_packet | Buffers entire response, checks at end | | realtime | Checks content in batches during streaming, can interrupt mid-response |

Step-by-Step: Aliyun Request + Response Moderation

a6 route create -f - <<'EOF'
{
  "id": "aliyun-moderated-chat",
  "uri": "/v1/chat/completions",
  "methods": ["POST"],
  "plugins": {
    "ai-proxy": {
      "provider": "openai",
      "auth": {
        "header": {
          "Authorization": "Bearer sk-your-key"
        }
      },
      "options": {
        "model": "gpt-4"
      }
    },
    "ai-aliyun-content-moderation": {
      "endpoint": "https://green.cn-shanghai.aliyuncs.com",
      "region_id": "cn-shanghai",
      "access_key_id": "your-aliyun-key-id",
      "access_key_secret": "your-aliyun-key-secret",
      "check_request": true,
      "check_response": true,
      "risk_level_bar": "high",
      "deny_code": 400,
      "deny_message": "Content policy violation"
    }
  }
}
EOF

Realtime streaming moderation

{
  "plugins": {
    "ai-aliyun-content-moderation": {
      "endpoint": "https://green.cn-shanghai.aliyuncs.com",
      "region_id": "cn-shanghai",
      "access_key_id": "key-id",
      "access_key_secret": "key-secret",
      "check_request": true,
      "check_response": true,
      "stream_check_mode": "realtime",
      "stream_check_cache_size": 256,
      "stream_check_interval": 2,
      "risk_level_bar": "medium"
    }
  }
}

Integration Patterns

Pattern A: Request-only filtering (AWS)

Client → [AWS Comprehend blocks toxic] → ai-proxy → LLM → Response → Client

plugins:
  ai-aws-content-moderation:
    comprehend:
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      region: us-east-1
    moderation_threshold: 0.5
  ai-proxy:
    provider: openai
    auth:
      header:
        Authorization: "Bearer ${OPENAI_API_KEY}"

Pattern B: Request + response filtering (Aliyun)

Client → ai-proxy [sets context] → [Aliyun checks request] → LLM
       → [Aliyun checks response] → Client

plugins:
  ai-proxy:
    provider: openai
    auth:
      header:
        Authorization: "Bearer ${OPENAI_API_KEY}"
  ai-aliyun-content-moderation:
    endpoint: "https://green.cn-shanghai.aliyuncs.com"
    region_id: cn-shanghai
    access_key_id: "${ALIYUN_KEY_ID}"
    access_key_secret: "${ALIYUN_KEY_SECRET}"
    check_request: true
    check_response: true
    risk_level_bar: high

Secret Management

Both plugins support APISIX secret management for credentials:

plugins:
  ai-aws-content-moderation:
    comprehend:
      access_key_id: "$secret://vault/aws_key_id"
      secret_access_key: "$secret://vault/aws_secret_key"
      region: us-east-1

Config Sync Example

version: "1"
routes:
  - id: moderated-chat
    uri: /v1/chat/completions
    methods:
      - POST
    plugins:
      ai-aws-content-moderation:
        comprehend:
          access_key_id: AKIAIOSFODNN7EXAMPLE
          secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
          region: us-east-1
        moderation_categories:
          HATE_SPEECH: 0.3
          VIOLENCE_OR_THREAT: 0.2
        moderation_threshold: 0.5
      ai-proxy:
        provider: openai
        auth:
          header:
            Authorization: Bearer sk-your-key
        options:
          model: gpt-4

Adoption

moonming/a6-plugin-ai-content-moderation

$ install --global

Security Scan Results

SKILL.md

a6-plugin-ai-content-moderation

Overview

When to Use

Plugin Execution Order

Plugin 1: ai-aws-content-moderation

Configuration Reference

Moderation Categories

Step-by-Step: AWS Content Moderation

Overall threshold (no per-category filtering)

Plugin 2: ai-aliyun-content-moderation

Configuration Reference

Risk Level System

Streaming Modes

Step-by-Step: Aliyun Request + Response Moderation

Realtime streaming moderation

Integration Patterns

Pattern A: Request-only filtering (AWS)

Pattern B: Request + response filtering (Aliyun)

Secret Management

Config Sync Example

Troubleshooting

Related Skills

moonming/a6-shared

moonming/a6-recipe-multi-tenant

moonming/a6-recipe-mtls

moonming/a6-recipe-health-check

moonming/a6-plugin-ai-content-moderation

$ install --global

Security Scan Results

SKILL.md

a6-plugin-ai-content-moderation

Overview

When to Use

Plugin Execution Order

Plugin 1: ai-aws-content-moderation

Configuration Reference

Moderation Categories

Step-by-Step: AWS Content Moderation

Overall threshold (no per-category filtering)

Plugin 2: ai-aliyun-content-moderation

Configuration Reference

Risk Level System

Streaming Modes

Step-by-Step: Aliyun Request + Response Moderation

Realtime streaming moderation

Integration Patterns

Pattern A: Request-only filtering (AWS)

Pattern B: Request + response filtering (Aliyun)

Secret Management

Config Sync Example

Troubleshooting

Related Skills

moonming/a6-shared

moonming/a6-recipe-multi-tenant

moonming/a6-recipe-mtls

moonming/a6-recipe-health-check