Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

mikeparcewski/wicked-garden-agentic-trust-and-safety

Name: wicked-garden-agentic-trust-and-safety
Author: mikeparcewski

skills/agentic/trust-and-safety/SKILL.md

npx skillsauth add mikeparcewski/wicked-garden wicked-garden-agentic-trust-and-safety

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Trust and Safety

Essential patterns for building safe, trustworthy, production-ready agentic systems.

Core Principles

Least Privilege: Agents have minimum necessary permissions
Defense in Depth: Multiple layers of safety checks
Fail Safe: Errors should fail toward safety, not capability
Human Oversight: High-stakes decisions require human approval
Auditability: All decisions and actions are logged and traceable
Graceful Degradation: System remains safe even when components fail

Human-in-the-Loop Gates

When to Add Human Gates

Always require approval for:

Production data modifications (delete, update critical data)
Financial transactions above threshold
Communications to external parties
Credential or security changes
Irreversible operations

Consider approval for:

First-time operations
Operations outside normal patterns
Low-confidence decisions
Operations near resource limits

Implementation Pattern

async def execute_with_approval(action, threshold=0.8):
    if action.confidence < threshold or action.is_high_stakes():
        approval = await request_human_approval(action)
        if not approval.approved:
            raise ApprovalDenied(approval.reason)
    return await action.execute()

Approval Workflow Design

Synchronous Approval: Block until human responds (for urgent decisions) Asynchronous Approval: Queue for later review (for batch operations) Escalation Chains: Route to higher authority if primary approver unavailable Timeout Handling: Define what happens if no approval received

See refs/guardrails-input-output.md, refs/guardrails-actions.md, and refs/guardrails-resources.md for detailed implementation patterns.

Output Validation

Structured Output Validation

Force outputs into validated schemas using Pydantic or similar.

Content Validation

Check outputs before acting on them:

Format validation: Matches expected structure
Range validation: Numeric values within acceptable bounds
Completeness validation: Required fields are present
Consistency validation: Outputs are internally consistent

Hallucination Detection

Cross-Validation: Multiple agents check same fact Source Verification: Verify claims against ground truth Confidence Thresholds: Reject low-confidence outputs Fact Checking: Use retrieval to verify factual claims

See refs/guardrails-input-output.md for code examples.

Action Constraints and Sandboxing

Whitelisting

Safer than blacklisting. Define allowed commands/actions explicitly.

Sandboxing

Isolate agent execution:

Containerization: Run agents in Docker containers
Virtual environments: Separate Python/Node environments
File system restrictions: Limit access to specific directories
Network isolation: Control network access

Resource Limits

Prevent runaway resource usage:

Max runtime (timeouts)
Max memory
Max tokens per request/session
Max API calls

See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation details.

Prompt Injection Defense

Input Sanitization

Clean user inputs before passing to LLM. Remove instruction-like patterns.

Delimiter-Based Protection

Use clear delimiters to separate system instructions from user input.

Privilege Separation

Separate instruction and data contexts using role-based message formatting.

See refs/guardrails-input-output.md for defense patterns and code examples.

PII Detection and Protection

Pattern-Based Detection

Regex patterns for email, SSN, credit cards, phone numbers, etc.

Redaction

Replace detected PII with [REDACTED_TYPE] tokens.

PII Policies

Never log PII in plain text
Encrypt PII at rest and in transit
Minimize PII collection (only collect what's needed)
Retention limits (delete after specified period)

See refs/guardrails-input-output.md for detection and redaction code.

Hallucination Mitigation

Grounding Techniques

Retrieval-Augmented Generation (RAG): Retrieve facts before generation Citation Requirements: Require source citations for all claims

Verification Strategies

Multi-Agent Verification: Independent verification by multiple agents Confidence Calibration: Require confidence scores, reject low-confidence outputs

See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation patterns.

Kill Switches and Circuit Breakers

Kill Switch: Emergency stop that halts all operations and alerts administrators. Circuit Breaker: Opens circuit after threshold failures to prevent cascading failures. Rate Limiting: Limits requests per user/time window to prevent abuse.

See refs/guardrails-actions.md and refs/guardrails-resources.md for complete implementations.

Safety Checklist

See refs/safety-checklist-core.md for the full pre-deployment checklist covering human gates, validation, whitelisting, resource limits, PII, prompt injection, hallucination, circuit breakers, rate limiting, kill switches, audit logging, and rollback. See refs/safety-checklist-advanced.md for monitoring, incidents, testing, and ops checklists.

References

refs/safety-checklist-core.md - Core safety checklist (input, output, action, auth, privacy)
refs/safety-checklist-advanced.md - Advanced safety checklist (monitoring, incidents, testing, ops)
refs/guardrails-input-output.md - Input validation, sanitization, prompt injection, output filtering
refs/guardrails-actions.md - Action whitelisting, approvals, sandboxed execution
refs/guardrails-resources.md - Resource limiting, monitoring, complete guardrail architecture

mikeparcewski/wicked-garden-agentic-trust-and-safety

skills/agentic/trust-and-safety/SKILL.md

Trust, safety, and control patterns for production agentic systems with human-in-the-loop gates and guardrails. Use when: designing guardrails or human-in-the-loop gates for an agent, or hardening an agentic system against prompt injection.

8 stars

development

Updated Jul 12, 2026

$ install --global

skillsauth

npx skillsauth add mikeparcewski/wicked-garden wicked-garden-agentic-trust-and-safety

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jul 12, 2026, 7:05 AM101.2s6 files scanned

SKILL.md

name:: wicked-garden-agentic-trust-and-safety
description:: |
Use when:: designing guardrails or human-in-the-loop gates for an agent, or
portability:: portable
phase_relevance:: ["design", "review"]
archetype_relevance:: ["*"]

Trust and Safety

Essential patterns for building safe, trustworthy, production-ready agentic systems.

Core Principles

Least Privilege: Agents have minimum necessary permissions
Defense in Depth: Multiple layers of safety checks
Fail Safe: Errors should fail toward safety, not capability
Human Oversight: High-stakes decisions require human approval
Auditability: All decisions and actions are logged and traceable
Graceful Degradation: System remains safe even when components fail

Human-in-the-Loop Gates

When to Add Human Gates

Always require approval for:

Production data modifications (delete, update critical data)
Financial transactions above threshold
Communications to external parties
Credential or security changes
Irreversible operations

Consider approval for:

First-time operations
Operations outside normal patterns
Low-confidence decisions
Operations near resource limits

Implementation Pattern

async def execute_with_approval(action, threshold=0.8):
    if action.confidence < threshold or action.is_high_stakes():
        approval = await request_human_approval(action)
        if not approval.approved:
            raise ApprovalDenied(approval.reason)
    return await action.execute()

Approval Workflow Design

See refs/guardrails-input-output.md, refs/guardrails-actions.md, and refs/guardrails-resources.md for detailed implementation patterns.

Output Validation

Structured Output Validation

Force outputs into validated schemas using Pydantic or similar.

Content Validation

Check outputs before acting on them:

Format validation: Matches expected structure
Range validation: Numeric values within acceptable bounds
Completeness validation: Required fields are present
Consistency validation: Outputs are internally consistent

Hallucination Detection

See refs/guardrails-input-output.md for code examples.

Action Constraints and Sandboxing

Whitelisting

Safer than blacklisting. Define allowed commands/actions explicitly.

Sandboxing

Isolate agent execution:

Containerization: Run agents in Docker containers
Virtual environments: Separate Python/Node environments
File system restrictions: Limit access to specific directories
Network isolation: Control network access

Resource Limits

Prevent runaway resource usage:

Max runtime (timeouts)
Max memory
Max tokens per request/session
Max API calls

See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation details.

Prompt Injection Defense

Input Sanitization

Clean user inputs before passing to LLM. Remove instruction-like patterns.

Delimiter-Based Protection

Use clear delimiters to separate system instructions from user input.

Privilege Separation

Separate instruction and data contexts using role-based message formatting.

See refs/guardrails-input-output.md for defense patterns and code examples.

PII Detection and Protection

Pattern-Based Detection

Regex patterns for email, SSN, credit cards, phone numbers, etc.

Redaction

Replace detected PII with [REDACTED_TYPE] tokens.

PII Policies

Never log PII in plain text
Encrypt PII at rest and in transit
Minimize PII collection (only collect what's needed)
Retention limits (delete after specified period)

See refs/guardrails-input-output.md for detection and redaction code.

Hallucination Mitigation

Grounding Techniques

Retrieval-Augmented Generation (RAG): Retrieve facts before generation Citation Requirements: Require source citations for all claims

Verification Strategies

Multi-Agent Verification: Independent verification by multiple agents Confidence Calibration: Require confidence scores, reject low-confidence outputs

See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation patterns.

Kill Switches and Circuit Breakers

See refs/guardrails-actions.md and refs/guardrails-resources.md for complete implementations.

Safety Checklist

References

refs/safety-checklist-core.md - Core safety checklist (input, output, action, auth, privacy)
refs/safety-checklist-advanced.md - Advanced safety checklist (monitoring, incidents, testing, ops)
refs/guardrails-input-output.md - Input validation, sanitization, prompt injection, output filtering
refs/guardrails-actions.md - Action whitelisting, approvals, sandboxed execution
refs/guardrails-resources.md - Resource limiting, monitoring, complete guardrail architecture

Related Skills

mikeparcewski/wicked-garden-engineering-conformance-reviewer

development

VerifiedTrustedCommunity

Pattern-conformance agent-half: evaluates a produced artifact or diff against a set of architectural/design pattern rules from the conformance-rule store (wicked_governance schema). Returns structured findings with rule ID, severity, and rationale — the deterministic half (mechanical rule recall) is done by the guard pipeline; this is the semantic evaluation step. Triggered by: the guard_pipeline `outgov_pattern` check (session-close), or explicitly by an engineering review when WICKED_OUTGOV_RULES_DIR is populated. NOT a replacement for the full `engineering` review skill — focuses only on conformance to stored Pattern rules; architecture and code-quality checks live in the `engineering` skill. Semantic evaluation reuses `wicked-garden-qe-semantic-reviewer` as the designated agent-half evaluator (per garden#983 spec). This skill is the orchestrating wrapper that loads applicable Pattern rules and delegates the per-rule semantic judgment to qe-semantic-reviewer.

8SKILL.mdUpdated Jul 22, 2026

mikeparcewski/wicked-garden-engineering-conformance-reviewer

mikeparcewski/wicked-garden-domain

tools

VerifiedTrustedCommunity

The FOUNDATIONAL domain-model capability: extract a codebase's domain — testable business rules (with confidence + provenance), entities, requirements — as a schema-conformant model on the estate graph. The workers annotate the store; wicked-core reads it and builds the requirements graph, coverage-gating fail-closed. Steers three fork workers. A shared substrate, not a modernization tool. The `modernize` archetype DERIVES from it; build / migrate / review / specify / explore consume the SAME domain model — none OWN it. Understanding a codebase's domain is upstream of almost everything else garden does. Use when: "extract the business rules / domain model from this codebase", "build a requirements graph from the code", "what does this system actually require", "reverse-engineer the domain before we build/port/migrate". Works on ANY codebase (modern or legacy) — the value is the domain model, not the porting. NOT the code transform itself (that is the archetype consuming this model). This skill produces the DOMAIN MODEL, not new code.

8SKILL.mdUpdated Jul 15, 2026

mikeparcewski/wicked-garden-domain

mikeparcewski/wicked-garden-domain-modeler

development

VerifiedTrustedCommunity

Domain-graph fork worker for the modernize archetype. Groups the estate's Louvain communities into business domains, attaches each requirement to its cluster (advisory cluster_id provenance), and invokes wicked-core's domain-graph build (which reads the annotated estate store, recomputes coverage fail-closed, and builds the requirements graph) — then validates core's output against the vendored schema. Use when: dispatched by wicked-garden-domain after rule extraction to turn a flat rule set into cluster-keyed domains; "group these into domains", "build the requirements graph", "translate clusters into a domain model". NOT for mining the rules themselves (that is domain-extractor) or threat-modeling (that is domain-coverage).

8SKILL.mdUpdated Jul 15, 2026

mikeparcewski/wicked-garden-domain-modeler

mikeparcewski/wicked-garden-domain-extractor

tools

VerifiedTrustedCommunity

Rule-extraction fork worker for the FOUNDATIONAL domain-model capability. Mines testable business rules from a codebase — each with a numeric confidence and a provenance{source, ref, source_kinds} — and annotates them into the estate store so wicked-core can build the domain-model requirements graph (coverage-gated). This is a substrate, not a modernization tool: the `modernize` archetype DERIVES from it, and build / migrate / review / specify / explore can consume the same domain model — none OWN it. Use when: dispatched by wicked-garden-domain to mine the business_rules of a codebase (or a module); "extract the domain rules", "what does this system require", building the requirements half of a domain model. NOT for grouping into domains (that is domain-modeler) or judging coverage (that is domain-coverage — a seat-distinct evaluator).

8SKILL.mdUpdated Jul 15, 2026

mikeparcewski/wicked-garden-domain-extractor

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/mikeparcewski/wicked-garden.git

# Copy into Claude Code skills folder (global)
cp -r wicked-garden/skills/agentic/trust-and-safety ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

mikeparcewski/wicked-garden

8 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT