liza-mas/white-box-red-testing

Name: white-box-red-testing
Author: liza-mas

skills/white-box-red-testing/SKILL.md

npx skillsauth add liza-mas/liza white-box-red-testing


White-Box Red Testing

Hard Boundaries

NEVER modify source code. Analyze and classify only.
NEVER weaken existing tests.
Unexported/private members: test only if explicitly scoped.
Output to user-specified directory or language-appropriate default.

When to Use

You have specific targets: changed files, a commit range, or low-coverage functions
You need classified findings (confirmed-bug / likely-bug / specification-gap)
Contracts exist (docstrings, types, tests) and you want to verify code matches them
After a commit or PR — "did these changes introduce bugs?"

If you don't know where to look or want to explore broadly, use the black-box-red-testing skill instead.

Scope Modes

User chooses scope. Never run unsolicited.

| Mode | Trigger | Targets |
|------|---------|---------|
| Commits | --commits HEAD~3..HEAD | Changed/added functions in diff |
| Files | --files <path> or --module <dir> | All public callables in specified paths |
| Coverage | --coverage-below 70 [--branch] | Functions below threshold via scripts/discover_targets.py |
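For Python sources, the Files mode target list ("all public callables") could be approximated with the standard `ast` module. This is a minimal sketch, not the skill's own implementation: the function name `list_public_callables` is hypothetical, and treating a leading underscore as "private" is an assumption based on common Python convention.

```python
import ast


def list_public_callables(source: str) -> list[str]:
    """Return public top-level functions and public methods of public classes.

    Assumption: a name starting with "_" is private and is skipped.
    """
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    names: list[str] = []
    for node in tree.body:
        if isinstance(node, func_types):
            if not node.name.startswith("_"):
                names.append(node.name)
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            for item in node.body:
                if isinstance(item, func_types) and not item.name.startswith("_"):
                    names.append(f"{node.name}.{item.name}")
    return names


# Hypothetical module source used only for illustration.
SAMPLE = '''
def parse(text): ...
def _helper(): ...
class Client:
    def send(self, msg): ...
    def _retry(self): ...
'''

print(list_public_callables(SAMPLE))  # ['parse', 'Client.send']
```

Skipping underscore-prefixed names by default matches the hard boundary that private members are tested only when explicitly scoped.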

Note on coverage targeting: Low coverage indicates under-tested code, not necessarily buggy code. Use as a targeting heuristic to prioritize where to look, not as a bug predictor.
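As an illustration of using coverage purely as a prioritization heuristic, the sketch below ranks functions under a threshold from lowest to highest coverage. It does not describe what scripts/discover_targets.py actually does; the function name `targets_below` and the input mapping of function name to percent covered are hypothetical.

```python
def targets_below(coverage_by_function: dict[str, float], threshold: float) -> list[str]:
    """Return function names with coverage strictly below threshold, least covered first.

    The input mapping is a hypothetical stand-in for whatever a coverage tool reports.
    """
    below = [(pct, name) for name, pct in coverage_by_function.items() if pct < threshold]
    return [name for _pct, name in sorted(below)]


# Hypothetical coverage figures, for illustration only.
sample = {"load": 90.0, "parse": 40.0, "render": 69.9, "save": 70.0}
print(targets_below(sample, 70))  # ['parse', 'render']
```

The ordering only suggests where to look first; a function in this list is under-tested, not known to be buggy.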

Workflow

1. IDENTIFY TARGETS
   - commits: git diff → parse changed functions
   - files: AST parse → list public callables
   - coverage: run scripts/discover_targets.py

2. GATHER CONTRACT EVIDENCE per target
   - docstrings, type annotations, existing passing tests
   - function/param names, assertions, call sites, commit messages
   - No evidence? → findings become "specification-gap"

3. FORM HYPOTHESES per target as structured one-liners:
   [code_path] × [defect_class] → [observable_symptom]

   Defect classes to consider:
   - boundary inputs (empty, zero, None, unicode, tz-naive)
   - state/mutation (shared state, call sequences, input mutation)
   - implicit contracts (name promises vs actual behavior)
   - error paths (timeouts, missing data, malformed input)

4. GENERATE ADVERSARIAL TESTS — one test per hypothesis

5. SELF-VALIDATE (mandatory)
   - Run all generated tests
   - Red → candidate finding
   - Green → record hypothesis in confidence section of report
   - Broken → fix or discard

6. CLASSIFY per references/finding-classification.md
   - confirmed-bug | likely-bug | specification-gap

7. APPLY DISTINCTNESS FILTER
   Each finding must differ from others in at least one of:
   - The code path exercised
   - The category of defect found
   - The component boundary tested
   Shallow variations of the same finding → consolidate into one.

8. OUTPUT
   - Test files to output directory (test code only, no classification in tests)
   - Summary report to stdout (classification, evidence, impact per finding)
   - Format: see references/output-format.md
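The hypothesis one-liner from step 3 and the distinctness filter from step 7 can be sketched as below. This is one possible reading, under stated assumptions: the `Finding` class, its field names, and `consolidate` are hypothetical, and two findings are treated as shallow variations when they share the same code path, defect category, and component boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    code_path: str     # code path exercised
    defect_class: str  # category of defect (boundary, state, contract, error path)
    boundary: str      # component boundary tested
    symptom: str       # observable symptom

    def one_liner(self) -> str:
        # Renders the step 3 shape: code_path × defect_class → observable_symptom
        return f"{self.code_path} × {self.defect_class} → {self.symptom}"


def consolidate(findings: list[Finding]) -> list[Finding]:
    """Keep the first finding for each (code_path, defect_class, boundary) triple."""
    kept: dict[tuple[str, str, str], Finding] = {}
    for finding in findings:
        key = (finding.code_path, finding.defect_class, finding.boundary)
        kept.setdefault(key, finding)  # later shallow variations are dropped
    return list(kept.values())


# Hypothetical findings, for illustration only.
a = Finding("parse_date: tz branch", "boundary", "parser/api", "naive datetime accepted")
b = Finding("parse_date: tz branch", "boundary", "parser/api", "naive datetime with microseconds accepted")
c = Finding("parse_date: tz branch", "error path", "parser/api", "ValueError escapes caller")

for finding in consolidate([a, b, c]):
    print(finding.one_liner())
```

Here `b` is folded into `a` because it differs only in symptom, while `c` survives because its defect category differs.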

Constraints

Coverage mode requires Python and pytest-cov. Commits and Files modes are language-agnostic.
Coverage mode uses scripts/discover_targets.py; run it with --help for options.
If more than 50 targets are identified, ask the user to narrow the scope.

Stop Conditions

Max 5 generation attempts per target. If 3 attempts yield nothing, shift the defect category (boundary → state → contract → error path) rather than retrying the same angle.
If 50% of targets yield no findings, report this as a confidence signal for the tested areas and continue with the remaining targets.
If 15 findings accumulate before all targets are analyzed, pause for triage.
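The attempt budget and triage pause can be expressed as two small checks. This is a sketch of one interpretation, not the skill's own logic: the names below are hypothetical, and it assumes that attempts 1 to 3 use the starting category and attempts 4 and 5 use the next category in the listed order.

```python
CATEGORIES = ["boundary", "state", "contract", "error path"]
MAX_ATTEMPTS = 5   # generation attempts per target
SHIFT_AFTER = 3    # fruitless attempts before shifting category
TRIAGE_AT = 15     # findings that trigger a pause


def category_for_attempt(attempt: int, start: str = "boundary") -> str:
    """Return the defect category for a 1-based attempt number."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError(f"attempt must be between 1 and {MAX_ATTEMPTS}")
    shifts = (attempt - 1) // SHIFT_AFTER
    index = (CATEGORIES.index(start) + shifts) % len(CATEGORIES)
    return CATEGORIES[index]


def should_pause(findings_so_far: int, targets_done: int, targets_total: int) -> bool:
    """Pause for triage when 15 findings accumulate before all targets are analyzed."""
    return findings_so_far >= TRIAGE_AT and targets_done < targets_total


print([category_for_attempt(n) for n in range(1, MAX_ATTEMPTS + 1)])
# ['boundary', 'boundary', 'boundary', 'state', 'state']
```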

Find bugs by writing tests that should pass but don't. Invoke manually on user-chosen scope (commits, files, or coverage threshold). Outputs red tests with structured rationale. Use when user asks to "stress-test", "find bugs in", "attack", or "break" code.

138 stars

development

Updated Apr 18, 2026

npx skillsauth add liza-mas/liza white-box-red-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 18, 2026, 7:16 AM · 37.2s · 4 files scanned

Related Skills

liza-mas/adversarial-pairing
development · Verified · Trusted · Community
Coordinate Pairing-mode doer/reviewer sessions through a Markdown blackboard. Use when the user invokes /adversarial-pairing with role and blackboard-path arguments or asks multiple pairing agents to coordinate plan review, implementation, staged code review, and follow-up review rounds without Liza multi-agent mode.
257 · SKILL.md · Updated May 29, 2026

liza-mas/liza-logs
data-ai · Verified · Trusted · Community
Analyze Liza agents logs
255 · SKILL.md · Updated Apr 18, 2026

liza-mas/code-review
development · Verified · Trusted · Community
Code Review Protocol
235 · SKILL.md · Updated Apr 18, 2026

liza-mas/context-engineering
tools · Verified · Trusted · Community
Analyze Liza `.liza/agent-prompts/` and `.liza/agent-outputs/` from a context-engineering perspective: prompt payload shape, context budget use, cacheability, duplicated or missing context, instruction hierarchy, tool-output pressure, role-specific context fit, and prompt-output feedback loops. Use when diagnosing agent context bloat, prompt drift, poor agent handoffs, repeated misunderstandings, excessive tool output, or whether Liza agents received the right information at the right time.
229 · SKILL.md · Updated Apr 28, 2026

Install manually

# Clone the repo
git clone https://github.com/liza-mas/liza.git

# Copy into Claude Code skills folder (global)
cp -r liza/skills/white-box-red-testing ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

liza-mas/liza

138 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT