plugins/nw/skills/nw-formal-verification-tlaplus/SKILL.md
TLA+ and PlusCal for specifying distributed system invariants. Decision heuristics for when formal verification adds value, key patterns, state explosion management, and alternatives comparison.
npx skillsauth add nwave-ai/nwave nw-formal-verification-tlaplusInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Is the system distributed or concurrent?
|
+-- No --> Complex state machine with high failure cost?
| +-- No --> NOT cost-effective. Use property-based testing.
| +-- Yes --> CONSIDER TLA+
|
+-- Yes --> Consensus, coordination, or distributed transactions?
| +-- Yes --> RECOMMEND TLA+
| +-- No --> Could concurrency bug cause data loss or safety issues?
| +-- Yes --> RECOMMEND TLA+
| +-- No --> OFFER as option
| Domain | Why TLA+ Adds Value | Evidence | |--------|-------------------|----------| | Distributed consensus (Paxos, Raft) | Subtle interleaving bugs in leader election | Raft TLA+ spec ~400 lines, found implementation bugs | | Financial distributed transactions | Atomicity violations cause monetary loss | AWS DynamoDB replication verified | | Leader election, distributed locking | Split-brain, deadlock, stale-lock | AWS lock manager verified | | Eventual consistency / CRDTs | Convergence proofs required | TLA+ CRDT framework verifies SEC | | Safety-critical state machines | Regulatory requirements | DO-178C, CENELEC recognize formal methods | | Multi-party coordination (sagas, 2PC) | Compensation ordering, partial failure | 2PC is canonical TLA+ example | | Data replication protocols | Ordering, consistency under failure | Elasticsearch, MongoDB, Cosmos DB verified |
TLA+ describes what a system should do (allowed behaviors), not how to implement it. Specifications are mathematical objects checked for correctness before any code exists.
| Property Type | Meaning | Expression | Example |
|--------------|---------|------------|---------|
| Safety | Nothing bad happens | Invariant: predicate true in every reachable state | "Two processes never hold same lock" |
| Liveness | Something good eventually happens | Temporal: <> (eventually), []<> (infinitely often) | "Every request eventually gets response" |
Safety violations produce counterexample traces (the debugging artifact). Liveness requires fairness conditions.
PlusCal compiles to TLA+ with programming-like syntax. Start with PlusCal for first 2-3 specs, then learn raw TLA+ for cases PlusCal cannot express.
Key PlusCal constructs: variables (state) | labels (atomic action boundaries) | either/or (nondeterministic choice) | await (blocking) | process \in 1..N (concurrent processes) | fair process (weak fairness)
Labels define concurrency granularity: everything between two labels is one atomic step. Two processes interleave only at label boundaries.
State space grows exponentially: (states per node)^(nodes) x (message permutations).
| Strategy | Technique | Impact |
|----------|-----------|--------|
| Bound parameters | Start with 2-3 nodes, 2-4 messages | Most bugs appear at small N |
| Symmetry reduction | SYMMETRY Permutations(Nodes) | Up to N! reduction |
| Reduce labels | Merge labels where fine-grained atomicity unnecessary | Orders of magnitude |
| State constraints | CONSTRAINT Len(log[n]) < MaxLogLength | Prune uninteresting states |
| Abstraction | Model protocol not implementation (TCP -> message set) | Dramatic reduction |
| Decomposition | Multiple focused specs, not one monolith | Each independently checkable |
| Progressive refinement | 2 nodes -> 3 nodes -> add failures -> add liveness | Incremental verification |
| Simulation mode | java -jar tla2tools.jar -simulate -depth 100 | Trades completeness for speed |
| Unique States | Expected Time | Memory | Approach |
|--------------|---------------|--------|----------|
| < 10K | Seconds | < 1 GB | Exhaustive, single thread |
| 10K - 1M | Minutes | 1-4 GB | Exhaustive, -workers auto |
| 1M - 100M | Hours | 4-32 GB | Exhaustive with constraints |
| 100M - 1B | Days | 32-64 GB | Large instance or simulation |
| > 1B | Weeks | 60+ GB | Simulation, TLAPS, or decompose |
Consistency)ElectionSafety)LogMatching)OrderInvariant)CompensationComplete)MutualExclusion)| Tool | Best For | Learning | Distributed Systems | Temporal Properties | |------|----------|----------|--------------------|--------------------| | TLA+/PlusCal | Distributed protocols, consensus | 2-3 weeks | Excellent | Native | | Alloy | Data models, structural properties | 1-2 weeks | Adequate | Limited (Alloy 6) | | Property-Based Testing | Implementation correctness | Hours-days | With stateful testing | None | | XState/Statecharts | UI workflows, single-process | Days | Not applicable | None | | Session Types/Scribble | Communication patterns | Moderate | Good (message patterns) | Implicit | | TLAPS (proofs) | Critical certification | Months | Excellent | Full |
## Decision: Use TLA+ for [Component Name]
### Context
[Component] implements [protocol] with [N] participants
and [concurrency/distribution characteristics].
### Problem
Informal reasoning about [failure/interleaving scenario]
is insufficient because [reason].
### Decision
Formally specify [component] in TLA+/PlusCal and verify:
- Safety: [specific invariants]
- Liveness: [specific temporal properties]
### Model Parameters
- Nodes: [2-3 for initial verification]
- Messages: [bounded to N]
- Failure modes: [crash, partition, message loss]
### Estimated Effort
- Specification: [1-2 weeks]
- Model checking: [hours to days]
### Not Modeling (out of scope)
- [performance, serialization, UI, etc.]
| Organization | Systems Verified | Outcome | |-------------|-----------------|---------| | AWS | 14 projects across 10 systems (DynamoDB, S3, EBS) | Found subtle bugs in every system; management actively encourages adoption | | Azure Cosmos DB | All 5 consistency levels | Specs became authoritative reference, replaced ambiguous docs | | MongoDB | Replication, reconfiguration, transactions | Logless reconfig deployed since 4.4, no protocol bugs | | Elasticsearch | Cluster coordination, data replication | 4 TLA+ specs + Isabelle proofs, open-sourced | | CockroachDB | Transaction layer | TLA+ specs in repository under docs/tla-plus/ |
testing
Runs feature-scoped mutation testing to validate test suite quality. Use after implementation to verify tests catch real bugs (kill rate >= 80%).
development
Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.
development
Canonical AT completeness gate — research-anchored 7-category taxonomy (C1-C7) + 15-item mechanical checklist. Paradigm-neutral. Drives acceptance-designer reviewer verdict deterministically.
testing
Methodology for minimizing test count while maximizing behavioral coverage - behavior definition, anti-pattern catalog, consolidation patterns, stopping criterion, coverage-preserving validation