nWave/skills/nw-agent-testing/SKILL.md
5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance
npx skillsauth add nwave-ai/nwave nw-agent-testing
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Validate agent produces correct, well-structured outputs for typical inputs.
Test: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds
How: Manual invocation with representative inputs. Check against acceptance criteria in agent description.
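Part of this check can be scripted once a representative output has been captured. The sketch below is illustrative only: `check_output`, the section names, and the use of word count as a rough proxy for token count are assumptions, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class OutputCheck:
    """Result of checking one agent output against acceptance criteria."""
    missing_sections: list[str] = field(default_factory=list)
    over_budget: bool = False

    @property
    def passed(self) -> bool:
        return not self.missing_sections and not self.over_budget


def check_output(text: str, required_sections: list[str], max_words: int) -> OutputCheck:
    """Check that each required heading appears and the output stays within a size bound.

    Word count stands in for token count here (an assumption of this sketch).
    """
    # Normalize each line: drop leading '#' heading markers and compare case-insensitively.
    lines = {line.strip().lstrip("#").strip().lower() for line in text.splitlines()}
    missing = [s for s in required_sections if s.lower() not in lines]
    return OutputCheck(missing_sections=missing, over_budget=len(text.split()) > max_words)


# Hypothetical agent output and acceptance criteria.
sample = "# Summary\nAll phases completed.\n\n# Findings\nNo issues found.\n"
result = check_output(sample, ["Summary", "Findings", "Next Steps"], max_words=200)
print(result.passed, result.missing_sections)
```

A scripted check like this complements manual invocation; it does not judge whether domain-specific rules were applied correctly.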
Validate correct input/output between agents in workflows.
Test: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greet, execute autonomously)
How: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).
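The handoff checks above can be pictured as a small contract test. This is a minimal sketch that assumes agents exchange a JSON object; the keys `status` and `artifacts` and the functions `parse_handoff` and `run_chain` are hypothetical and not defined by the skill.

```python
import json

# Hypothetical handoff contract; the real format depends on the agents involved.
REQUIRED_HANDOFF_KEYS = {"status", "artifacts"}


def parse_handoff(raw: str) -> dict:
    """Parse an upstream agent's handoff and fail loudly on contract violations."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"handoff is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("handoff must be a JSON object")
    missing = REQUIRED_HANDOFF_KEYS - payload.keys()
    if missing:
        raise ValueError(f"handoff missing keys: {sorted(missing)}")
    return payload


def run_chain(raw_handoff: str) -> str:
    """Simulate a downstream agent that propagates upstream errors instead of continuing."""
    payload = parse_handoff(raw_handoff)
    if payload["status"] != "ok":
        return f"halted: upstream reported {payload['status']}"
    return f"proceeding with {len(payload['artifacts'])} artifact(s)"


print(run_chain('{"status": "ok", "artifacts": ["design.md"]}'))
print(run_chain('{"status": "error", "artifacts": []}'))
```

The point of the sketch is the failure path: a malformed or failed upstream result should stop the chain rather than be silently passed along.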
Challenge validity of agent outputs rather than accepting at face value.
Test: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)
How: Peer review by -reviewer agent using structured critique dimensions.
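One possible way to record a structured critique is shown below. The dimension names come from the test list above; the `Finding` type, the severity levels, and the approval rule are assumptions made for illustration.

```python
from dataclasses import dataclass

# Dimensions mirror the test list above; everything else here is hypothetical.
DIMENSIONS = ("source_verification", "bias_detection", "edge_case_coverage", "completeness")


@dataclass(frozen=True)
class Finding:
    dimension: str
    severity: str  # "blocker" or "note" (assumed levels)
    detail: str


def summarize(findings: list[Finding]) -> dict:
    """Group findings by dimension and approve only when no blocker exists."""
    unknown = [f.dimension for f in findings if f.dimension not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    by_dimension = {d: [f for f in findings if f.dimension == d] for d in DIMENSIONS}
    approved = all(f.severity != "blocker" for f in findings)
    return {"approved": approved, "by_dimension": by_dimension}


review = summarize([
    Finding("source_verification", "blocker", "Cited URL does not resolve"),
    Finding("completeness", "note", "Risks section is brief"),
])
print(review["approved"])
```

Keeping every dimension in the summary, even when empty, makes it visible which dimensions the reviewer actually examined.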
Independent review to catch biases and blind spots in agent design.
Test: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?
How: @nw-agent-builder validates via 11-point checklist or @agent-builder-reviewer runs structured review.
Test resilience against misuse and prompt injection.
Test: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope
How: Frontmatter fields are enforced at the platform level, so verify the configuration.
The Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | tool restriction via frontmatter tools | permission modes via permissionMode | hook-based validation (PreToolUse, PostToolUse)
Do NOT add prose-based injection defenses. Configure platform features instead:
---
tools: Read, Glob, Grep # Only tools this agent needs
maxTurns: 30 # Prevents runaway execution
permissionMode: default # User approves dangerous actions
---
Checklist: tools restricted to minimum necessary (least privilege) | maxTurns set to prevent runaway execution | permissionMode appropriate for risk level | No Bash unless agent requires command execution | No Write unless agent creates/modifies files
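The configuration check can be automated with a small audit script. The following is a sketch under stated assumptions: the parser handles only flat `key: value` frontmatter, and `audit`, its parameters, and the cap of 50 turns are hypothetical choices, not values from the skill or the platform.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between two `---` lines.

    Deliberately minimal: no nesting, no quoting, `#` starts a comment.
    """
    lines = text.strip().splitlines()
    if len(lines) < 2 or lines[0].strip() != "---":
        raise ValueError("frontmatter must start with ---")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        body = line.split("#", 1)[0].strip()
        if body:
            key, _, value = body.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is not closed with ---")


def audit(fields, needs_shell=False, needs_write=False, max_turns_cap=50):
    """Return checklist violations; an empty list means the config passes."""
    problems = []
    tools = {t.strip() for t in fields.get("tools", "").split(",") if t.strip()}
    if not tools:
        problems.append("tools not restricted")
    if "Bash" in tools and not needs_shell:
        problems.append("Bash granted without need")
    if "Write" in tools and not needs_write:
        problems.append("Write granted without need")
    turns = fields.get("maxTurns", "")
    # max_turns_cap is an arbitrary threshold chosen for this sketch.
    if not turns.isdigit() or int(turns) > max_turns_cap:
        problems.append("maxTurns missing or above cap")
    if "permissionMode" not in fields:
        problems.append("permissionMode not set")
    return problems


config = """---
tools: Read, Glob, Grep # Only tools this agent needs
maxTurns: 30 # Prevents runaway execution
permissionMode: default # User approves dangerous actions
---"""
print(audit(parse_frontmatter(config)))
```

Passing `needs_shell=True` or `needs_write=True` records an explicit justification for granting Bash or Write, which keeps exceptions to least privilege visible in the test itself.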