skills/agentic/trust-and-safety/SKILL.md
Trust, safety, and control patterns for production agentic systems with human-in-the-loop gates and guardrails. Use when: designing guardrails or human-in-the-loop gates for an agent, or hardening an agentic system against prompt injection.
npx skillsauth add mikeparcewski/wicked-garden trust-and-safetyInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Essential patterns for building safe, trustworthy, production-ready agentic systems.
Always require approval for:
Consider approval for:
async def execute_with_approval(action, threshold=0.8):
if action.confidence < threshold or action.is_high_stakes():
approval = await request_human_approval(action)
if not approval.approved:
raise ApprovalDenied(approval.reason)
return await action.execute()
Synchronous Approval: Block until human responds (for urgent decisions) Asynchronous Approval: Queue for later review (for batch operations) Escalation Chains: Route to higher authority if primary approver unavailable Timeout Handling: Define what happens if no approval received
See refs/guardrails-input-output.md, refs/guardrails-actions.md, and refs/guardrails-resources.md for detailed implementation patterns.
Force outputs into validated schemas using Pydantic or similar.
Check outputs before acting on them:
Cross-Validation: Multiple agents check same fact Source Verification: Verify claims against ground truth Confidence Thresholds: Reject low-confidence outputs Fact Checking: Use retrieval to verify factual claims
See refs/guardrails-input-output.md for code examples.
Safer than blacklisting. Define allowed commands/actions explicitly.
Isolate agent execution:
Prevent runaway resource usage:
See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation details.
Clean user inputs before passing to LLM. Remove instruction-like patterns.
Use clear delimiters to separate system instructions from user input.
Separate instruction and data contexts using role-based message formatting.
See refs/guardrails-input-output.md for defense patterns and code examples.
Regex patterns for email, SSN, credit cards, phone numbers, etc.
Replace detected PII with [REDACTED_TYPE] tokens.
See refs/guardrails-input-output.md for detection and redaction code.
Retrieval-Augmented Generation (RAG): Retrieve facts before generation Citation Requirements: Require source citations for all claims
Multi-Agent Verification: Independent verification by multiple agents Confidence Calibration: Require confidence scores, reject low-confidence outputs
See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation patterns.
Kill Switch: Emergency stop that halts all operations and alerts administrators. Circuit Breaker: Opens circuit after threshold failures to prevent cascading failures. Rate Limiting: Limits requests per user/time window to prevent abuse.
See refs/guardrails-actions.md and refs/guardrails-resources.md for complete implementations.
See refs/safety-checklist-core.md for the full pre-deployment checklist covering human gates, validation, whitelisting, resource limits, PII, prompt injection, hallucination, circuit breakers, rate limiting, kill switches, audit logging, and rollback. See refs/safety-checklist-advanced.md for monitoring, incidents, testing, and ops checklists.
refs/safety-checklist-core.md - Core safety checklist (input, output, action, auth, privacy)refs/safety-checklist-advanced.md - Advanced safety checklist (monitoring, incidents, testing, ops)refs/guardrails-input-output.md - Input validation, sanitization, prompt injection, output filteringrefs/guardrails-actions.md - Action whitelisting, approvals, sandboxed executionrefs/guardrails-resources.md - Resource limiting, monitoring, complete guardrail architecturedevelopment
--- name: large-scale-migration description: How to execute a LARGE MECHANICAL change across any codebase with LEVERAGE instead of an agent-grind or hand-edits — a cross-cutting migration, refactor, rename, dialect/framework/DB port, library adoption, or bulk transform. The map→transform→gate pattern: a deterministic transform driven by a source-of-truth map, proven by a differential-equivalence gate. Use when the work is "migrate all X to Y", "rename Z everywhere", "port to a new DB/dialect/fra
testing
v11 LLM-based work-shape classifier. Replaces the regex archetype detector with the model's own reasoning. Reads the user's prompt, picks the right archetype(s) from the catalog, identifies signals (blast_radius, novelty, reversibility, etc.), and persists to SessionState so subsequent turns steer correctly. Use when: the prompt_submit hook emitted a `<wg classify-due />` directive, OR explicitly invoked at session start, OR when re-classifying after the user changes scope mid-session.
tools
v11 work-shape archetype runner. When a prompt has been routed to one of the 9 archetypes (triage, explore, specify, decide, ship, review, incident, build, migrate), this skill is the entry point. It picks the right per-archetype playbook from refs/ and executes the phase shape declared in `.claude-plugin/archetypes.json`. Use when: a `<wg archetype="X">` or `<wg archetypes>` system-reminder tag appears, an explicit "let's run the X archetype" request, or when one of the per-archetype slash commands resolves to this skill.
development
Show or set the session intent variable. Intent gates how loud the framework is — simple-edit (silent), feature/research (synthesis directive), rigor (full crew context). Auto-detected on turn 1; this skill overrides explicitly. Sticky for the session. Use when: "set intent", "intent override", "/wicked-garden:intent", "make the framework quiet", "force rigor", "what's my intent".