plugins/nw/skills/nw-property-based-testing/SKILL.md
Property-based testing strategies, mutation testing, shrinking, and combined PBT+mutation workflow for test quality validation
Deferred to Phase 2.25: Mutation testing runs ONCE per feature as final quality gate at orchestrator Phase 2.25 (after all steps complete). Do NOT run mutation testing during inner TDD loop.
Instead of examples ("given X, expect Y"), write properties ("for all valid inputs, condition Z holds"). Framework generates hundreds/thousands of inputs checking property. Dramatically expands test coverage.
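To make the contrast with example-based tests concrete, here is a minimal sketch using Hypothesis. The run-length encoder and the test names are hypothetical, chosen only for illustration; the two properties (round trip and "adjacent runs differ") are assumptions about what a useful specification for such a function might look like.

```python
from hypothesis import given, settings, strategies as st


def run_length_encode(text: str) -> list[tuple[str, int]]:
    """Collapse runs of equal characters into (char, count) pairs."""
    encoded: list[tuple[str, int]] = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    """Inverse of run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


@settings(max_examples=200)
@given(st.text())
def test_round_trip(text: str) -> None:
    # Property: for all strings, decoding the encoding returns the input.
    assert run_length_decode(run_length_encode(text)) == text


@given(st.text())
def test_adjacent_runs_differ(text: str) -> None:
    # Property: two neighbouring runs never share the same character.
    pairs = run_length_encode(text)
    assert all(a[0] != b[0] for a, b in zip(pairs, pairs[1:]))
```

Under pytest these run like ordinary tests. Calling `test_round_trip()` directly also works, since `@given` supplies the argument.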
When property fails, framework auto-finds minimal failing input. Dramatically accelerates debugging. Algorithm: find failing input -> try simpler variants -> if still fails, use as new candidate -> repeat.
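The shrinking loop described above can be sketched in a few lines of plain Python. This is an illustrative greedy shrinker for lists of integers, not how Hypothesis or any other framework actually implements shrinking; the function names and the choice of "simpler" (shorter list first, then values closer to zero) are assumptions.

```python
from typing import Callable, Iterator


def simpler_variants(xs: list[int]) -> Iterator[list[int]]:
    """Yield candidates simpler than xs: shorter lists first, then smaller values."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]  # drop one element
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [int(x / 2)] + xs[i + 1:]  # halve toward zero
            step = x - 1 if x > 0 else x + 1
            yield xs[:i] + [step] + xs[i + 1:]  # move one step toward zero


def shrink(failing: list[int], fails: Callable[[list[int]], bool]) -> list[int]:
    """Greedily replace the failing input with any simpler variant that still fails."""
    current = failing
    progress = True
    while progress:
        progress = False
        for candidate in simpler_variants(current):
            if fails(candidate):
                current = candidate  # still fails: use as the new candidate
                progress = True
                break
    return current


# Hypothetical broken property: "the sum of the list stays below 100".
minimal = shrink([3, 250, -7, 42], lambda xs: sum(xs) >= 100)
```

Here the noisy four-element input reduces to `[100]`, the smallest list that still violates the property, which points straight at the boundary.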
| Language | Framework |
|----------|-----------|
| Python | Hypothesis |
| JavaScript/TypeScript | fast-check |
| Haskell | QuickCheck |
| Rust | quickcheck |
| Java | jqwik |
| C# | FsCheck |
Adopted by Amazon, Volvo, Stripe, Jane Street (ICSE 2024 study).
HIGH value: algorithms | data structures | serialization | business rules (validation, calculations) | protocols/state machines. LOW value: simple CRUD | UI logic | external API integrations. PBT complements example-based testing, doesn't replace it.
Properties = higher-level spec that survives refactoring better than examples.
Evaluates test suite quality by introducing artificial bugs (mutations) and checking if tests catch them. Mutation score = killed mutants / total mutants. Stronger metric than code coverage.
| Score | Quality |
|-------|---------|
| < 60% | Weak suite, significant gaps |
| 60-80% | Moderate, some gaps |
| > 80% | Strong, few gaps |
Target: 75-80% minimum. Not all survivors indicate bad tests (equivalent mutants exist).
Change == to != | + to - | remove method call | change constant | modify loop boundary | alter comparison.
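A hand-rolled sketch of how these operators and the mutation score fit together. Real tools such as mutmut generate mutants automatically from the source; here the mutants are written by hand, and `is_adult`, the two suites, and `mutation_score` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """Original code under test."""
    return age >= 18


# Hand-written mutants, one per operator.
MUTANTS: Dict[str, Predicate] = {
    "alter comparison (>= to >)": lambda age: age > 18,
    "change constant (18 to 21)": lambda age: age >= 21,
    "negate comparison (>= to <)": lambda age: age < 18,
}


def weak_suite(fn: Predicate) -> bool:
    """Return True when every check passes. Only probes values far from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn: Predicate) -> bool:
    """Adds the boundary values 17 and 18."""
    return weak_suite(fn) and fn(18) is True and fn(17) is False


def mutation_score(suite: Callable[[Predicate], bool], mutants: Dict[str, Predicate]) -> float:
    """killed mutants / total mutants; a mutant is killed when the suite fails on it."""
    killed = sum(1 for mutant in mutants.values() if not suite(mutant))
    return killed / len(mutants)


weak_score = mutation_score(weak_suite, MUTANTS)      # only the negated comparison is killed
strong_score = mutation_score(strong_suite, MUTANTS)  # boundary checks kill all three
```

Both suites pass against the original `is_adult`, so they would look equivalent on a coverage report. Only the mutation score separates them.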
| Language | Tool |
|----------|------|
| Java | PIT |
| JavaScript/TypeScript/C# | Stryker |
| Python | mutmut, Cosmic Ray |
Computationally expensive. Run incrementally: changed code only in PRs, full codebase weekly.
Quality ratchet: each technique exposes gaps others miss. Prioritize critical paths and complex algorithms.
Modern frameworks allow configuring example count per context.
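With Hypothesis, one way to do this is through settings profiles, typically registered in `conftest.py`. The profile names, the example counts, and the `NW_PBT_PROFILE` environment variable below are assumptions chosen for illustration, not project conventions.

```python
import os

from hypothesis import settings

# Fast feedback for the inner development loop.
settings.register_profile("dev", max_examples=50)
# More thorough exploration on CI.
settings.register_profile("ci", max_examples=1000)
# Long-running scheduled job; the per-example deadline is disabled.
settings.register_profile("nightly", max_examples=10_000, deadline=None)

# NW_PBT_PROFILE is a hypothetical variable name; fall back to the fast profile.
settings.load_profile(os.environ.get("NW_PBT_PROFILE", "dev"))
```

A test can still override the active profile locally with `@settings(max_examples=...)`.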
Combines the delta-first paradigm (see nw-tdd-methodology::Delta-First Test Paradigm) with Hypothesis shrinking to cover production code that branches on input shape.
path_strategy(): composite Hypothesis strategy

Location: `nwave_ai/state_delta/strategies/path_strategy.py`
Generates realistic PATH string shapes covering 4 production branches:
- `$HOME/bin` literal (unexpanded shell variable)
- [...] `/usr/local/bin` only)

Lazy-import boundary: `hypothesis` is NOT imported at `import nwave_ai.state_delta.matcher` time. It is loaded only when `path_strategy()` is called. This is verified by a subprocess-isolated test at `tests/state_delta/unit/test_lazy_import.py`: importing the matcher in a hypothesis-free environment must not raise ImportError.
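The lazy-import boundary and its subprocess-isolated check can be sketched as follows. This is not the contents of `test_lazy_import.py`. The `lazy_demo` module, `MODULE_SOURCE`, `PROBE`, and `import_is_lazy` are hypothetical stand-ins, and a hypothesis-free environment is simulated by setting `sys.modules["hypothesis"]` to `None` in a fresh interpreter.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a module that exposes a strategy factory.
MODULE_SOURCE = textwrap.dedent(
    '''
    def path_strategy():
        # Deferred import: hypothesis is loaded only when a strategy is requested.
        from hypothesis import strategies as st
        return st.text()
    '''
)

# Executed in a fresh interpreter so modules cached by the parent process
# cannot mask an accidental top-level import.
PROBE = textwrap.dedent(
    '''
    import sys
    sys.path.insert(0, sys.argv[1])
    sys.modules["hypothesis"] = None  # any later import of hypothesis raises ImportError
    import lazy_demo                  # must succeed with hypothesis unavailable
    try:
        lazy_demo.path_strategy()
    except ImportError:
        print("deferred")
    '''
)


def import_is_lazy() -> bool:
    """True when importing the module works without hypothesis and the import is deferred."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "lazy_demo.py").write_text(MODULE_SOURCE)
        proc = subprocess.run(
            [sys.executable, "-c", PROBE, tmp],
            capture_output=True,
            text=True,
            check=False,
        )
        return proc.returncode == 0 and proc.stdout.strip() == "deferred"
```

The subprocess matters because an in-process check can pass by accident if another test has already imported `hypothesis`.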
```python
from hypothesis import given, settings
from nwave_ai.state_delta.strategies.path_strategy import path_strategy
from nwave_ai.state_delta import assert_state_delta, prepended_with, unchanged

# inject_nwave_bin is the production function under test; its import is not shown in the source.


@given(path_strategy())
@settings(max_examples=500)
def test_path_injection_all_shapes(initial_path):
    before = {"env.PATH": initial_path, "env.OTHER": "x"}
    result_path = inject_nwave_bin(initial_path)
    after = {"env.PATH": result_path, "env.OTHER": "x"}
    assert_state_delta(
        before,
        after,
        universe={"env.PATH", "env.OTHER"},
        expected={
            "env.PATH": prepended_with("/home/user/.nwave/bin"),
            "env.OTHER": unchanged(),
        },
    )
```
Hypothesis shrinking finds the minimal failing PATH shape automatically when a branch is broken.
`@given` replaces N parametrized example tests covering the same branches.

`tests/state_delta/integration/test_pilot_bug48.py::test_pilot_bug48_post_fix_validated`: 500 examples, GREEN in 0.88s.