Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

nwave-ai/nw-agent-testing

Name: nw-agent-testing
Author: nwave-ai

nWave/skills/nw-agent-testing/SKILL.md

npx skillsauth add nwave-ai/nwave nw-agent-testing

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Agent Testing Framework

5-Layer Testing Approach

Layer 1: Output Quality (Unit-Level)

Validate agent produces correct, well-structured outputs for typical inputs.

Test: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds

How: Manual invocation with representative inputs. Check against acceptance criteria in agent description.

Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

Test: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greet, execute autonomously)

How: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).

Layer 3: Adversarial Output Validation

Challenge validity of agent outputs rather than accepting at face value.

Test: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)

How: Peer review by -reviewer agent using structured critique dimensions.

Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

Test: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?

How: @nw-agent-builder validates via 11-point checklist or @agent-builder-reviewer runs structured review.

Layer 5: Security Validation

Test resilience against misuse and prompt injection.

Test: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope

How: Frontmatter fields enforce at platform level. Verify configuration.

Prompt Injection Resistance

Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | Tool restriction via frontmatter tools | Permission modes via permissionMode | Hook-based validation (PreToolUse, PostToolUse)

Do NOT add prose-based injection defense. Configure platform features:

---
tools: Read, Glob, Grep           # Only tools this agent needs
maxTurns: 30                       # Prevents runaway execution
permissionMode: default            # User approves dangerous actions
---

Security Validation Checklist

[ ] tools restricted to minimum necessary (least privilege)
[ ] maxTurns set to prevent runaway execution
[ ] permissionMode appropriate for risk level
[ ] No Bash unless agent requires command execution
[ ] No Write unless agent creates/modifies files
[ ] Description accurately describes scope
[ ] Subagent mode handles autonomous execution correctly
[ ] No sensitive data hardcoded in definition

Testing Workflow for New Agents

Create with minimal definition
Layer 1: Invoke with 2-3 representative inputs, check outputs
Layer 2: Run in workflow chain if applicable
Fix failures observed
Validate: Run 11-point checklist
Iterate: Add instructions only for observed failure modes

nwave-ai/nw-agent-testing

nWave/skills/nw-agent-testing/SKILL.md

5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance

324 stars

testing

Updated Apr 9, 2026

$ install --global

skillsauth

npx skillsauth add nwave-ai/nwave nw-agent-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 9, 2026, 2:41 AM4.8s1 file scanned

SKILL.md

name:: nw-agent-testing
description:: 5-layer testing approach for agent validation including adversarial testing, security validation, and prompt injection resistance
user-invocable:: false
disable-model-invocation:: true

Agent Testing Framework

5-Layer Testing Approach

Layer 1: Output Quality (Unit-Level)

Validate agent produces correct, well-structured outputs for typical inputs.

Test: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds

How: Manual invocation with representative inputs. Check against acceptance criteria in agent description.

Layer 2: Integration / Handoff Validation

Validate correct input/output between agents in workflows.

Test: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skip greet, execute autonomously)

How: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).

Layer 3: Adversarial Output Validation

Challenge validity of agent outputs rather than accepting at face value.

Test: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)

How: Peer review by -reviewer agent using structured critique dimensions.

Layer 4: Adversarial Verification (Peer Review)

Independent review to catch biases and blind spots in agent design.

Test: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?

How: @nw-agent-builder validates via 11-point checklist or @agent-builder-reviewer runs structured review.

Layer 5: Security Validation

Test resilience against misuse and prompt injection.

Test: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope

How: Frontmatter fields enforce at platform level. Verify configuration.

Prompt Injection Resistance

Do NOT add prose-based injection defense. Configure platform features:

---
tools: Read, Glob, Grep           # Only tools this agent needs
maxTurns: 30                       # Prevents runaway execution
permissionMode: default            # User approves dangerous actions
---

Security Validation Checklist

[ ] tools restricted to minimum necessary (least privilege)
[ ] maxTurns set to prevent runaway execution
[ ] permissionMode appropriate for risk level
[ ] No Bash unless agent requires command execution
[ ] No Write unless agent creates/modifies files
[ ] Description accurately describes scope
[ ] Subagent mode handles autonomous execution correctly
[ ] No sensitive data hardcoded in definition

Testing Workflow for New Agents

Create with minimal definition
Layer 1: Invoke with 2-3 representative inputs, check outputs
Layer 2: Run in workflow chain if applicable
Fix failures observed
Validate: Run 11-point checklist
Iterate: Add instructions only for observed failure modes

Related Skills

nwave-ai/nw-distill

testing

VerifiedTrustedCommunity

Acceptance test creation methodology for the DISTILL wave. Domain knowledge for the acceptance designer agent: port-to-port principle, prior wave reading, wave-decision reconciliation, graceful degradation, and document back-propagation.

563SKILL.mdUpdated May 21, 2026

nwave-ai/nw-collaboration-and-handoffs

development

VerifiedTrustedCommunity

Cross-agent collaboration protocols, workflow handoff patterns, and commit message formats for TDD/Mikado/refactoring workflows

563SKILL.mdUpdated Apr 16, 2026

nwave-ai/nw-collaboration-and-handoffs

nwave-ai/nw-roadmap

development

VerifiedTrustedCommunity

Creates a phased roadmap.json for a feature goal with acceptance criteria and TDD steps. Use when planning implementation steps before execution.

563SKILL.mdUpdated Apr 9, 2026

nwave-ai/nw-distill

testing

VerifiedTrustedCommunity

563SKILL.mdUpdated Apr 9, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/nwave-ai/nwave.git

# Copy into Claude Code skills folder (global)
cp -r nwave/nWave/skills/nw-agent-testing ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

nwave-ai/nwave

324 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT