Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

pkuppens/agent-guardrails

Name: agent-guardrails
Author: pkuppens

skills/ai-agent-development/agent-guardrails/SKILL.md

npx skillsauth add pkuppens/pkuppens agent-guardrails

Agent Guardrails

Patterns for securing AI agent input and output, controlling costs, protecting privacy, and maintaining audit trails. Guardrails wrap the agent core — they do not replace good prompt engineering.

When to use

Adding input validation or output filtering to an agent
Implementing PII detection and redaction
Setting up cost controls (token budgets, rate limits)
Adding audit logging for compliance or debugging
Reviewing agent security posture

Guardrail architecture

Guardrails apply at two points in the agent pipeline:

User Input → [Input Guardrails] → Agent Core → [Output Guardrails] → User Output
                                       │
                                  [Audit Log]
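The pipeline in the diagram can be sketched as a thin wrapper around the agent core. This is a minimal illustration, not the skill author's implementation: `GuardedAgent`, the `Guardrail` callable type, and the demo functions (`reject_empty`, `mask_secret`, `echo_agent`) are hypothetical names, and the convention that a guardrail returns `None` to block is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical convention: a guardrail returns the (possibly modified) text,
# or None to block the request or response.
Guardrail = Callable[[str], Optional[str]]


@dataclass
class GuardedAgent:
    agent_core: Callable[[str], str]
    input_guardrails: List[Guardrail] = field(default_factory=list)
    output_guardrails: List[Guardrail] = field(default_factory=list)
    audit_log: List[dict] = field(default_factory=list)

    def _apply(self, stage: str, guardrails: List[Guardrail], text: str) -> Optional[str]:
        for guard in guardrails:
            name = getattr(guard, "__name__", repr(guard))
            result = guard(text)
            if result is None:
                self.audit_log.append({"stage": stage, "guardrail": name, "action": "block"})
                return None
            if result != text:
                self.audit_log.append({"stage": stage, "guardrail": name, "action": "modify"})
            text = result
        return text

    def run(self, user_input: str) -> str:
        checked = self._apply("input", self.input_guardrails, user_input)
        if checked is None:
            return "Request blocked by input guardrails."
        raw_output = self.agent_core(checked)
        filtered = self._apply("output", self.output_guardrails, raw_output)
        if filtered is None:
            return "Response withheld by output guardrails."
        return filtered


# Demo guardrails and a stand-in agent core (all hypothetical).
def reject_empty(text: str) -> Optional[str]:
    return text if text.strip() else None


def mask_secret(text: str) -> Optional[str]:
    return text.replace("s3cr3t", "[REDACTED]")


def echo_agent(prompt: str) -> str:
    return f"You said: {prompt} (token: s3cr3t)"


agent = GuardedAgent(echo_agent, [reject_empty], [mask_secret])
print(agent.run("hello"))
print(agent.run("   "))
```

Note that the audit entries record only the stage, the guardrail name, and the action taken, never the text itself.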

Input guardrails

| Guardrail | Purpose | Implementation |
|-----------|---------|----------------|
| Prompt injection detection | Block attempts to override system instructions | Pattern matching, classifier model, or dedicated guardrail LLM |
| Input sanitisation | Strip dangerous content (scripts, SQL) | Regex/allowlist filtering |
| Topic restriction | Reject off-topic or prohibited queries | Classifier or keyword filter |
| Rate limiting | Prevent abuse and control costs | Token bucket, sliding window |
| Authentication | Verify caller identity | OAuth2, API key, JWT validation |
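Two of the rows above, pattern-based prompt injection detection and token-bucket rate limiting, can be sketched with the standard library alone. The patterns below are illustrative and far from exhaustive; `INJECTION_PATTERNS`, `looks_like_injection`, and `TokenBucket` are names invented for this example, and a production system would likely combine pattern matching with a classifier.

```python
import re
import time
from typing import Callable

# Illustrative patterns only; real injection attempts are far more varied.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


def looks_like_injection(text: str) -> bool:
    """Return True if any known injection pattern appears in the text."""
    return any(pattern.search(text) for pattern in INJECTION_PATTERNS)


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = capacity
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In practice one bucket per authenticated caller is a common choice, so that a single noisy client cannot exhaust the budget for everyone.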

Output guardrails

| Guardrail | Purpose | Implementation |
|-----------|---------|----------------|
| PII filtering | Redact personal data from responses | Named entity recognition (spaCy, Presidio) |
| Content policy | Block harmful, biased, or inappropriate output | Content classifier, keyword blocklist |
| Hallucination detection | Flag unsupported claims | Cross-reference with retrieved sources |
| Format validation | Ensure structured output matches expected schema | JSON schema validation, Pydantic parsing |
| Citation enforcement | Require source references in responses | Post-processing check against retrieved docs |
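As a rough illustration of the format validation and citation enforcement rows, the sketch below uses only the standard library rather than Pydantic or a JSON Schema validator. The schema (`answer` as a string, `sources` as a list of document IDs) and the function names are assumptions made for this example.

```python
import json
from typing import Any, Dict, Optional, Set

# Hypothetical schema for illustration: field name -> required Python type.
EXPECTED_FIELDS: Dict[str, type] = {"answer": str, "sources": list}


def validate_output(raw: str) -> Optional[Dict[str, Any]]:
    """Return the parsed object if it matches EXPECTED_FIELDS, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None
    return data


def has_valid_citations(data: Dict[str, Any], retrieved_ids: Set[str]) -> bool:
    """Require at least one source, and require every source to be a retrieved document.

    Assumes each entry in data["sources"] is a string document ID.
    """
    sources = data["sources"]
    return bool(sources) and all(source in retrieved_ids for source in sources)
```

A response that fails either check could be retried, repaired, or blocked, depending on the policy the guardrail is meant to enforce.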

PII filtering

Use Microsoft Presidio or spaCy NER for on-premises PII detection:

Detection — identify PII entities (names, emails, phone numbers, SSNs, medical IDs)
Redaction — replace with placeholders ([PERSON], [EMAIL])
Pseudonymisation — replace with consistent fake values for testing
Audit — log what was redacted (type, position) without logging the actual PII

Healthcare-specific: filter patient identifiers, medical record numbers, and clinical data per HIPAA/GDPR.
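The detect, redact, pseudonymise, and audit steps can be sketched as follows. This example deliberately swaps in simple regular expressions for the NER-based detection that Presidio or spaCy would provide, so it only catches emails and US-style SSNs. `PII_PATTERNS`, `redact`, and `pseudonymise` are hypothetical names, and the code assumes matches do not overlap.

```python
import hashlib
import hmac
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; a real deployment would use an NER-based detector.
PII_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> Tuple[str, List[dict]]:
    """Replace matches with [TYPE] placeholders.

    Returns the redacted text plus audit entries holding only the entity
    type and position, never the matched value.
    """
    findings = []
    for entity_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((match.start(), match.end(), entity_type))
    redacted = text
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, entity_type in sorted(findings, reverse=True):
        redacted = redacted[:start] + f"[{entity_type}]" + redacted[end:]
    audit = [{"type": t, "start": s, "end": e} for s, e, t in sorted(findings)]
    return redacted, audit


def pseudonymise(value: str, entity_type: str, key: bytes) -> str:
    """Map a value to a stable fake identifier; the same input and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{entity_type}_{digest[:8]}"


redacted_text, audit_entries = redact("Contact jane@example.com, SSN 123-45-6789.")
print(redacted_text)
print(audit_entries)
```

Keying the pseudonym with a secret (HMAC) rather than a bare hash makes it harder to recover low-entropy values such as phone numbers by brute force.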

Cost control

Token budgets — set max input + output tokens per request and per session
Model routing — use cheaper models for simple tasks, expensive models for complex ones
Caching — cache identical or semantically similar queries to avoid redundant LLM calls
Circuit breaker — disable the agent if spend exceeds a threshold

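class CostGuard:
    """Rejects requests that exceed the per-request token cap or the daily spend cap."""

    def __init__(self, max_tokens_per_request: int, max_cost_per_day: float):
        self.max_tokens_per_request = max_tokens_per_request
        self.max_cost_per_day = max_cost_per_day
        self.daily_cost = 0.0  # running spend; the caller resets this each day

    def check(self, estimated_tokens: int, cost_per_token: float) -> bool:
        """Return True if the request fits both the token budget and the daily budget."""
        if estimated_tokens > self.max_tokens_per_request:
            return False
        estimated_cost = estimated_tokens * cost_per_token
        if self.daily_cost + estimated_cost > self.max_cost_per_day:
            return False
        return True

    def record(self, actual_tokens: int, cost_per_token: float) -> None:
        """Add the cost of a completed call so later checks see the real spend."""
        self.daily_cost += actual_tokens * cost_per_token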
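The caching item from the list above can be sketched as an exact-match cache keyed on a hash of the normalised prompt. `ResponseCache` and its `llm_call` parameter are hypothetical names; matching semantically similar queries would additionally need an embedding model and a similarity threshold, which this sketch does not attempt.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed on a hash of the normalised prompt."""

    def __init__(self, llm_call: Callable[[str], str]):
        self.llm_call = llm_call  # stand-in for whatever function calls the model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivially different prompts share a key.
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self.llm_call(prompt)
        self._store[key] = response
        return response
```

Cached responses may contain user-specific data, so the cache deserves the same access controls and retention rules as the audit log.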

Audit logging

Every agent interaction should produce an audit record:

| Field | Description |
|-------|-------------|
| request_id | Unique identifier for the interaction |
| timestamp | ISO 8601 timestamp |
| user_id | Authenticated caller identity |
| model | LLM model used |
| input_tokens | Token count for the input |
| output_tokens | Token count for the output |
| tools_called | List of tools invoked with arguments (redacted) |
| guardrails_triggered | Which guardrails fired and their action (block, redact, warn) |
| latency_ms | End-to-end response time |
| cost | Estimated cost of the interaction |

Store audit logs separately from application logs
Never log raw user input or LLM output in production (PII risk) — log hashes or redacted versions
Align the retention policy with compliance requirements (GDPR: purpose-limited, HIPAA: 6 years)
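An audit record with the fields listed above might be modelled like this. The `AuditRecord` class and `hash_text` helper are hypothetical, and the `input_hash` field is an addition for this sketch that follows the advice to log hashes rather than raw input; it is not one of the fields in the table.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


def hash_text(text: str) -> str:
    """SHA-256 hex digest, so records can be correlated without storing raw text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class AuditRecord:
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    cost: float
    input_hash: str  # assumption: hash of the raw input instead of the input itself
    tools_called: List[str] = field(default_factory=list)
    guardrails_triggered: List[str] = field(default_factory=list)
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json_line(self) -> str:
        """Serialise as one JSON object per line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    user_id="u-1",
    model="example-model",
    input_tokens=12,
    output_tokens=30,
    latency_ms=450,
    cost=0.0021,
    input_hash=hash_text("my email is jane@example.com"),
    guardrails_triggered=["pii_filter:redact"],
)
print(record.to_json_line())
```

Writing these lines to a dedicated sink, separate from application logs, keeps retention and access policies independent of the rest of the system.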

Frameworks and libraries

| Tool | Language | Scope |
|------|----------|-------|
| Guardrails AI | Python | Input/output validation with RAIL specs |
| NeMo Guardrails (NVIDIA) | Python | Programmable guardrails for LLM apps |
| Microsoft Presidio | Python | PII detection and anonymisation |
| LangChain output parsers | Python | Structured output validation |
| Semantic Kernel filters | C# | Pre/post-processing in the SK pipeline |

Integration with other skills

agent-types — guardrails apply to all agent types
agent-on-premises — on-prem PII filtering avoids sending data to cloud
operations-audit — audit trail integration
architecture-crosscutting — guardrails as crosscutting concern

Guides implementation of AI agent guardrails: input/output validation, PII filtering, cost control, safety policies, and audit logging. Use when securing agent pipelines or adding compliance and observability.

testing

Updated May 15, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add pkuppens/pkuppens agent-guardrails

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 15, 2026, 5:39 AM (24.7s, 1 file scanned)

Related Skills

pkuppens/sync-branch

testing

Syncs remote default branch locally (checkout, fetch --prune, pull) and returns to the previous branch when it still exists. Reports stashes and worktrees not yet handled. Use when the user asks to sync main, update default branch, fetch/pull origin, or run /sync-branch.

SKILL.md · Updated Jun 6, 2026

pkuppens/azure-devops-work-items

tools

Creates, queries, updates, and links Azure Boards work items via az boards CLI. Use when filing ADO work items, running WIQL queries, or setting area path, iteration, tags, and assignee.

SKILL.md · Updated May 29, 2026

pkuppens/azure-devops-repos

tools

Creates, reviews, and completes Azure Repos pull requests and branch policies via az repos CLI. Use when opening ADO PRs, setting required reviewers, or configuring build validation policies.

SKILL.md · Updated May 29, 2026

pkuppens/azure-devops-pipelines

development

Guides Azure Pipelines YAML structure, build validation on PRs, and staged deployment with environments and approvals. Use when authoring azure-pipelines.yml or configuring CI/CD on Azure DevOps.

SKILL.md · Updated May 29, 2026

Install manually

# Clone the repo
git clone https://github.com/pkuppens/pkuppens.git

# Copy into Claude Code skills folder (global)
cp -r pkuppens/skills/ai-agent-development/agent-guardrails ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

pkuppens/pkuppens

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT