skills/white-box-red-testing/SKILL.md
Find bugs by writing tests that should pass but don't. Invoke manually on user-chosen scope (commits, files, or coverage threshold). Outputs red tests with structured rationale. Use when user asks to "stress-test", "find bugs in", "attack", or "break" code.
If you don't know where to look or want to explore broadly, use the black-box-red-testing skill instead.
User chooses scope. Never run unsolicited.
| Mode | Trigger | Targets |
|------|---------|---------|
| Commits | `--commits HEAD~3..HEAD` | Changed/added functions in diff |
| Files | `--files <path>` or `--module <dir>` | All public callables in specified paths |
| Coverage | `--coverage-below 70 [--branch]` | Functions below threshold via `scripts/discover_targets.py` |
Note on coverage targeting: Low coverage indicates under-tested code, not necessarily buggy code. Use as a targeting heuristic to prioritize where to look, not as a bug predictor.
1. IDENTIFY TARGETS
   - commits: `git diff` → parse changed functions
   - files: AST parse → list public callables
   - coverage: run `scripts/discover_targets.py`
2. GATHER CONTRACT EVIDENCE per target
   - docstrings, type annotations, existing passing tests
   - function/param names, assertions, call sites, commit messages
   - No evidence? → findings become "specification-gap"
3. FORM HYPOTHESES per target as structured one-liners:

   `[code_path] × [defect_class] → [observable_symptom]`

   Defect classes to consider:
   - boundary inputs (empty, zero, None, unicode, tz-naive)
   - state/mutation (shared state, call sequences, input mutation)
   - implicit contracts (name promises vs actual behavior)
   - error paths (timeouts, missing data, malformed input)
4. GENERATE ADVERSARIAL TESTS — one test per hypothesis
5. SELF-VALIDATE (mandatory)
   - Run all generated tests
   - Red → candidate finding
   - Green → record the hypothesis in the confidence section of the report
   - Broken → fix or discard
6. CLASSIFY per `references/finding-classification.md`
   - confirmed-bug | likely-bug | specification-gap
7. APPLY DISTINCTNESS FILTER

   Each finding must differ from the others in at least one of:
   - The code path exercised
   - The category of defect found
   - The component boundary tested

   Shallow variations of the same finding → consolidate into one.
8. OUTPUT
   - Test files to the output directory (test code only, no classification in tests)
   - Summary report to stdout (classification, evidence, and impact per finding)
   - Format: see `references/output-format.md`
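The distinctness filter in step 7 can be read as a grouping rule: findings that agree on code path, defect class, and component boundary are shallow variations of one another. The sketch below uses a hypothetical `Finding` record. The `boundary` field is an assumed free-text representation of "component boundary tested", and keeping the first finding as the group's representative is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    code_path: str     # function or path exercised, e.g. "chunk"
    defect_class: str  # e.g. "boundary inputs"
    boundary: str      # component boundary tested (assumed free-text label)
    symptom: str       # observable symptom
    test_name: str

    def one_liner(self):
        # Same shape as the step 3 hypotheses.
        return f"{self.code_path} × {self.defect_class} → {self.symptom}"


def consolidate(findings):
    """Group findings that are identical on all three distinctness axes.

    Returns (representative, variants) pairs in first-seen order.
    """
    groups = {}
    for finding in findings:
        key = (finding.code_path, finding.defect_class, finding.boundary)
        groups.setdefault(key, []).append(finding)
    return [(group[0], group[1:]) for group in groups.values()]
```

A finding that differs on any one axis gets its own group, which matches the "at least one of" rule. The variants stay attached to the representative so the report can still cite them as supporting evidence.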
Coverage mode requires pytest-cov; Commits and Files modes are language-agnostic.

`scripts/discover_targets.py`: run with `--help` for options.

- 50% of targets yield no findings → report this as a confidence signal for tested areas and continue with the remaining targets
- 15 findings before all targets are analyzed → pause for triage
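Here are the two stop rules written as a small decision function. The function and parameter names are hypothetical. The sketch also assumes two readings that the text leaves open: the 15-finding threshold is inclusive, and the 50% ratio is measured over the targets analyzed so far.

```python
def stop_check(analyzed, total, without_findings, findings,
               max_findings=15, quiet_ratio=0.5):
    """Return (action, note) after each analyzed target."""
    # Too many findings before finishing: stop and let the user triage.
    if findings >= max_findings and analyzed < total:
        return "pause", "pause for triage"
    # Many quiet targets: record this as a confidence signal, but keep going.
    if analyzed and without_findings / analyzed >= quiet_ratio:
        return "continue", "report quiet targets as a confidence signal"
    return "continue", None
```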