nWave/skills/nw-property-based-testing/SKILL.md
Property-based testing strategies, mutation testing, shrinking, and combined PBT+mutation workflow for test quality validation
npx skillsauth add nwave-ai/nwave nw-property-based-testingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Deferred to Phase 2.25: Mutation testing runs ONCE per feature as final quality gate at orchestrator Phase 2.25 (after all steps complete). Do NOT run mutation testing during inner TDD loop.
Instead of examples ("given X, expect Y"), write properties ("for all valid inputs, condition Z holds"). Framework generates hundreds/thousands of inputs checking property. Dramatically expands test coverage.
When property fails, framework auto-finds minimal failing input. Dramatically accelerates debugging. Algorithm: find failing input -> try simpler variants -> if still fails, use as new candidate -> repeat.
| Language | Framework | |----------|-----------| | Python | Hypothesis | | JavaScript/TypeScript | fast-check | | Haskell | QuickCheck | | Rust | quickcheck | | Java | jqwik | | C# | FsCheck |
Adopted by Amazon, Volvo, Stripe, Jane Street (ICSE 2024 study).
HIGH value: algorithms | data structures | serialization | business rules (validation, calculations) | protocols/state machines | unbounded input domain with universal invariant. LOW value: simple CRUD | UI logic | external API integrations | closed-world finite domain (use parametrize instead — see falsifier-gate below). PBT complements example-based testing, doesn't replace it.
If the input domain is finite + enumerable (N known files, M known event types, K known skill names, fixed Python versions), PBT is the wrong tool:
Hypothesis import (~457ms) + per-example bookkeeping > @pytest.mark.parametrize overheadDecision rule: enumerate the domain. If listable ([a, b, c, ...]), use parametrize-collapse or dict-iteration (see nw-test-optimization §3.1, §3.2). Reserve PBT for "for all X in DOMAIN, P(X) holds" where DOMAIN is infinite (all strings, all integers, all valid JSON, all sorted lists).
Empirical anchor 2026-05-18: 155-file closed-world skill registry PBT migration was correctly aborted at recon stage by the falsifier-gate. Solution: set-difference parametrize-collapse (commit c2637f6c8), 5.42s → 0.71s (8.9× faster). Mass-migrating closed-world tests to PBT would have made the suite slower, not faster.
See nw-test-optimization §4-bis Paradigm-Match Decision Rule for the full shape-to-paradigm table.
Properties = higher-level spec that survives refactoring better than examples.
Evaluates test suite quality by introducing artificial bugs (mutations) and checking if tests catch them. Mutation score = killed mutants / total mutants. Stronger metric than code coverage.
| Score | Quality | |-------|---------| | < 60% | Weak suite, significant gaps | | 60-80% | Moderate, some gaps | | > 80% | Strong, few gaps |
Target: 75-80% minimum. Not all survivors indicate bad tests (equivalent mutants exist).
Change == to != | + to - | remove method call | change constant | modify loop boundary | alter comparison.
| Language | Tool | |----------|------| | Java | PIT | | JavaScript/TypeScript/C# | Stryker | | Python | mutmut, Cosmic Ray |
Computationally expensive. Use incremental: on changed code in PRs, full codebase weekly.
Quality ratchet: each technique exposes gaps others miss. Prioritize critical paths and complex algorithms.
Modern frameworks allow configuring example count per context.
Combines the delta-first paradigm (see nw-tdd-methodology::Delta-First Test Paradigm) with Hypothesis shrinking to cover production code that branches on input shape.
path_strategy() — composite Hypothesis strategyLocation: nwave_ai/state_delta/strategies/path_strategy.py
Generates realistic PATH string shapes covering 4 production branches:
$HOME/bin literal (unexpanded shell variable)/usr/local/bin only)Lazy-import boundary: hypothesis is NOT imported at import nwave_ai.state_delta.matcher time. It is loaded only when path_strategy() is called. This is verified by a subprocess-isolated test at tests/state_delta/unit/test_lazy_import.py — importing the matcher in a hypothesis-free environment must not raise ImportError.
from hypothesis import given, settings
from nwave_ai.state_delta.strategies.path_strategy import path_strategy
from nwave_ai.state_delta import assert_state_delta, prepended_with, unchanged
@given(path_strategy())
@settings(max_examples=500)
def test_path_injection_all_shapes(initial_path):
before = {"env.PATH": initial_path, "env.OTHER": "x"}
result_path = inject_nwave_bin(initial_path)
after = {"env.PATH": result_path, "env.OTHER": "x"}
assert_state_delta(
before,
after,
universe={"env.PATH", "env.OTHER"},
expected={"env.PATH": prepended_with("/home/user/.nwave/bin"),
"env.OTHER": unchanged()},
)
Hypothesis shrinking finds the minimal failing PATH shape automatically when a branch is broken.
@given replaces N parametrized example tests covering the same branches.tests/state_delta/integration/test_pilot_bug48.py::test_pilot_bug48_post_fix_validated — 500 examples, GREEN in 0.88s.testing
Runs feature-scoped mutation testing to validate test suite quality. Use after implementation to verify tests catch real bugs (kill rate >= 80%).
development
Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.
development
Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.
testing
Methodology for minimizing test count while maximizing behavioral coverage - behavior definition, anti-pattern catalog, consolidation patterns, stopping criterion, coverage-preserving validation