
outlinedriven/proof-driven

Name: proof-driven
Author: outlinedriven

skills/proof-driven/SKILL.md

npx skillsauth add outlinedriven/odin-codex-plugin proof-driven


Proof-driven development

Prove properties from requirements before writing code. Proofs guide implementation, not the reverse. Zero unproven properties in final code.

Modern insight (2025): Pairing property-based testing (PBT) with example tests is the standard practice -- properties discover edge cases, while example tests prevent regressions and serve as documentation. Counterexamples found by shrinking should always become permanent regression tests. AI-assisted PBT (Anthropic 2025) can generate properties from docstrings, but human judgment remains essential for property selection.

See frameworks for language-specific PBT and stateful testing tools. See examples for brief property test patterns per language. See formal-tools for theorem provers and bounded model checkers.

Property Categories

| Category | Description | Example |
|----------|-------------|---------|
| Postcondition | Output satisfies contract | sorted(sort(xs)) |
| Invariant | Property preserved by operation | len(xs) == len(sort(xs)) |
| Idempotence | f(f(x)) == f(x) | deduplicate(deduplicate(xs)) |
| Inverse / Round-trip | g(f(x)) == x | decode(encode(x)) == x |
| Model-based | Implementation matches reference | my_sort(xs) == stdlib_sort(xs) |
| Commutativity | Order doesn't matter | a + b == b + a |
| Metamorphic | Relationship between outputs | sin(-x) == -sin(x) |

Most effective (OOPSLA 2025): Model-based properties (~80% bug detection), postconditions (~65%). Least effective: properties that reimplement the logic under test.

Anti-pattern: Don't reimplement the function in the property. Properties should be simpler than the code they test.
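As a rough illustration of the categories above, here is a minimal sketch that checks several of them against one function. It uses only Python's standard library; the `for_all` loop is a hand-rolled stand-in for a real PBT framework, and `my_sort`, `int_lists`, and `for_all` are hypothetical names introduced for this example.

```python
import random


def for_all(gen, prop, runs=200, seed=0):
    """Check prop on `runs` generated inputs; return the first failing input or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def int_lists(rng):
    """Generate a short list of small integers, including the empty list."""
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 12))]


def my_sort(xs):
    """Hypothetical function under test: insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


# One property per category; each is simpler than the code it tests.
properties = {
    "postcondition": lambda xs: is_sorted(my_sort(xs)),
    "invariant": lambda xs: len(my_sort(xs)) == len(xs),
    "idempotence": lambda xs: my_sort(my_sort(xs)) == my_sort(xs),
    "model-based": lambda xs: my_sort(xs) == sorted(xs),
    "metamorphic": lambda xs: my_sort(xs) == my_sort(list(reversed(xs))),
}

results = {name: for_all(int_lists, prop) for name, prop in properties.items()}
for name, counterexample in results.items():
    print(name, "ok" if counterexample is None else f"falsified by {counterexample}")
```

A real framework would add shrinking and smarter generation; the point here is only that each property states a fact about the output rather than recomputing it.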

When to Apply

Critical algorithms (sort, search, crypto, compression)
Financial calculations (rounding, currency conversion)
Consensus/distributed protocols (invariants across nodes)
Safety-critical systems (medical, automotive, aerospace)
Data structure invariants (balanced tree, heap property)
Serialization round-trip (encode/decode fidelity)
Stateful systems (databases, queues, caches) -- via stateful PBT

Reasoning approach

Before attempting a proof, reason through the property. Keep internal scratchwork to short-form keywords. Break the property down into its hypothesis and assumptions, critically review which proof strategy fits (induction, case analysis, contradiction, construction), and validate each candidate strategy against the structure of the property. Then work through the proof step by step, verifying each step against the axioms. If a step fails, diagnose why before revising the strategy. For any numeric calculation that arises in the proof (e.g., bound arithmetic, complexity sums), invoke fend per the baseline rule; never self-calculate. Symbolic reasoning, case enumeration, and induction structure are done in-head -- they are not arithmetic.

Protocol

When NOT to Apply

UI rendering, visual layout
Simple CRUD endpoints
Configuration management
Non-critical utility code
Rapidly changing requirements (properties are expensive to maintain)

Anti-patterns

Happy-path-only properties: Properties must cover edge cases -- that's their primary value
Skipping stateful testing for stateful systems: Use model-based stateful PBT (Hypothesis RuleBasedStateMachine, jqwik stateful)
Ignoring counterexamples: Shrunk counterexamples are gold -- always convert to permanent regression tests
Properties that test the framework: Asserting that fast_check works is not the same as asserting that my_code works
Permanently skipped/pending properties: Zero-skip policy -- skip = unfinished work
Conflating PBT with unit testing: PBT explores input space; unit tests verify known examples. Use both.
Not using shrinking: If the counterexample is a 500-line input, it's useless. Shrinking finds the minimal failing case.
Reimplementing logic in properties: A property should be simpler than the code. If the property is as complex as the implementation, it adds no confidence.
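For the stateful-testing item in the list above, the idea can be sketched without a framework: apply the same random command sequence to the real system and to a trivially correct model, and compare after every step. This is a simplified stand-in for tools such as Hypothesis RuleBasedStateMachine, not their API; `RingQueue` and `run_against_model` are hypothetical names used only for illustration.

```python
import random
from collections import deque


class RingQueue:
    """Hypothetical system under test: fixed-capacity FIFO queue on a ring buffer."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0
        self._size = 0

    def push(self, item):
        if self._size == len(self._buf):
            return False  # full: reject the item
        tail = (self._head + self._size) % len(self._buf)
        self._buf[tail] = item
        self._size += 1
        return True

    def pop(self):
        if self._size == 0:
            return None  # empty
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return item

    def __len__(self):
        return self._size


def run_against_model(seed, steps=100, capacity=4):
    """Run one random command sequence against the queue and a deque model.

    Returns the command history up to the first disagreement, or None.
    """
    rng = random.Random(seed)
    real, model, history = RingQueue(capacity), deque(), []
    for _ in range(steps):
        if rng.random() < 0.6:
            item = rng.randint(0, 9)
            history.append(("push", item))
            expected = len(model) < capacity
            if expected:
                model.append(item)
            ok = real.push(item) == expected
        else:
            history.append(("pop", None))
            expected = model.popleft() if model else None
            ok = real.pop() == expected
        if not ok or len(real) != len(model):
            return history
    return None


failures = [h for h in (run_against_model(seed) for seed in range(50)) if h is not None]
print("disagreements:", len(failures))
```

The returned history is what a framework would then shrink to a minimal failing command sequence.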

Shrinking

Shrinking transforms a failing complex input into the minimal input that still fails. This is the most valuable feature of PBT frameworks.

Integrated shrinking (Hypothesis, Hedgehog): Generates shrink tree during generation. Preserves generator invariants. Superior approach.
Type-based shrinking (QuickCheck): Separate shrinker functions. Can violate generator constraints.
Always investigate shrunk counterexamples: They reveal the essential failure, stripped of noise.
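To make the shrinking idea concrete, here is a minimal greedy shrinker for lists of integers. It follows the separate-shrinker style (simpler to show than integrated shrinking) and is only a sketch; `buggy_max`, `shrink_candidates`, and `shrink` are hypothetical names, and real frameworks use more sophisticated search.

```python
def buggy_max(xs):
    """Hypothetical function under test with a seeded bug."""
    best = 0  # bug: wrong starting value when every element is negative
    for x in xs:
        if x > best:
            best = x
    return best


def fails(xs):
    """True when the property 'buggy_max(xs) == max(xs)' is violated for non-empty xs."""
    return len(xs) > 0 and buggy_max(xs) != max(xs)


def shrink_candidates(xs):
    """Yield inputs simpler than xs: drop one element, or move one element toward 0."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        for smaller in (0, int(x / 2), x - 1 if x > 0 else x + 1):
            if abs(smaller) < abs(x):
                yield xs[:i] + [smaller] + xs[i + 1:]


def shrink(failing, still_fails):
    """Greedy shrinking: keep taking the first simpler candidate that still fails."""
    current = failing
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(current):
            if still_fails(candidate):
                current, improved = candidate, True
                break
    return current


minimal = shrink([-37, -5, -12], fails)
print(minimal)  # [-1]
```

The shrunk input `[-1]` points straight at the cause (a single negative element), and it is the value that would be pinned as a permanent regression test once the bug is fixed.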

PBT vs Fuzzing (decision guidance)

| Aspect | PBT | Fuzzing |
|--------|-----|---------|
| Input generation | Guided by properties | Guided by code coverage |
| Oracle | User-written property assertions | Crashes/exceptions/timeouts |
| Best for | Correctness, algorithms, contracts | Security, memory safety, crash detection |

Convergence (2025): Hybrid tools (Bolero, Antithesis) combine both approaches.

Proof Strategies

Simplification: Reduce by known rules, use shrinking to find minimal counterexamples
Arithmetic: Generate numeric edge cases (0, 1, MAX, negative, overflow boundaries)
Case analysis: Split on constructors/variants, test each branch independently
Induction: Recursive/sequential properties via stateful testing
Fuzzing: Empirical exploration when properties are hard to specify formally
Metamorphic relations: When oracle is unknown, test relationships between outputs
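Two of the strategies above, arithmetic edge cases and metamorphic relations, can be sketched in a few lines of standard-library Python. `saturating_add` and the 32-bit bounds are hypothetical choices made for this example, not part of the skill itself.

```python
import itertools
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
EDGE_CASES = [0, 1, -1, INT32_MAX, INT32_MAX - 1, INT32_MIN, INT32_MIN + 1]


def saturating_add(a, b):
    """Hypothetical function under test: 32-bit add that clamps instead of wrapping."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


# Arithmetic strategy: exercise every pair of boundary values.
for a, b in itertools.product(EDGE_CASES, repeat=2):
    result = saturating_add(a, b)
    assert INT32_MIN <= result <= INT32_MAX  # postcondition: result stays in range
    assert result == saturating_add(b, a)    # commutativity
    if INT32_MIN <= a + b <= INT32_MAX:
        assert result == a + b               # exact whenever no overflow occurs

# Metamorphic strategy: there is no simple oracle for sin(x),
# but sin(-x) must equal -sin(x) for every x.
for x in [0.0, 1e-9, 0.5, math.pi / 2, 3.0, 1e6]:
    assert math.isclose(math.sin(-x), -math.sin(x), rel_tol=1e-12, abs_tol=1e-12)

print("edge-case and metamorphic checks passed")
```

In a PBT framework the fixed edge-case list would usually be combined with random generation, so that boundaries are always hit and the rest of the input space is still explored.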

Theorem Hierarchy

Main Property (Goal)
|-- Supporting Property 1
|   +-- Helper Property 1a
|-- Supporting Property 2
+-- Edge Case Property 3

Workflow (language-neutral)

PLAN -- Identify correctness, safety, invariant, and termination properties. Design hierarchy. Choose property categories.
CREATE -- Write property test files. One property per concern. Tag by category (postcondition, invariant, inverse, etc.).
VERIFY -- Run all properties. Count unproven (skipped/pending). Analyze counterexamples via shrinking.
REMEDIATE -- Fill in each skipped property using proof strategies. Convert every counterexample to a permanent regression test.

Constitutional Rules (Non-Negotiable)

CREATE First: Generate all property test artifacts from plan design before verification
Complete All Proofs: Zero skipped/pending properties in final code
Totality Required: All definitions must terminate
Target Mirrors Model: Implementation structure corresponds to proven model
Iterative Remediation: Fix proof failures, don't abandon verification

Validation Gates

| Gate | Pass Criteria | Blocking |
|------|---------------|----------|
| Framework | PBT framework available and configured | Yes |
| Properties | All property tests pass | Yes |
| Unproven | Zero skipped/pending properties | Yes |
| Coverage | >= 80% line coverage | If present |

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All properties pass, zero unproven/skipped |
| 11 | Property testing framework not available |
| 12 | No property tests created |
| 13 | Property tests failed or proofs incomplete |
| 14 | Coverage gaps (properties missing) |
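One way the validation gates could map onto these exit codes is sketched below. The `RunReport` fields, the order in which the gates are checked, and the pairing of the coverage gate with code 14 are assumptions made for illustration; the source only lists the gates and the codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunReport:
    """Hypothetical summary of one verification run."""
    framework_available: bool
    properties_total: int
    properties_failed: int
    properties_skipped: int
    line_coverage: Optional[float] = None  # percent; None means no coverage data


def exit_code(report: RunReport) -> int:
    """Return the first failing gate's exit code, or 0 when every gate passes."""
    if not report.framework_available:
        return 11  # framework gate
    if report.properties_total == 0:
        return 12  # nothing to verify
    if report.properties_failed > 0 or report.properties_skipped > 0:
        return 13  # failed or unproven properties
    # Assumed: the coverage gate blocks only when coverage data is present.
    if report.line_coverage is not None and report.line_coverage < 80.0:
        return 14
    return 0


print(exit_code(RunReport(True, 12, 0, 0, line_coverage=91.5)))
print(exit_code(RunReport(True, 12, 0, 2)))
```

Checking the gates in a fixed order means a run reports the most fundamental problem first, for example a missing framework before any coverage gap.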


Proof-driven development. Use when implementing with formal verification using property-based testing, theorem proving, or proof tactics; zero unproven property policy enforced.

11 stars

development

Updated May 4, 2026

npx skillsauth add outlinedriven/odin-codex-plugin proof-driven

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 4, 2026, 5:51 AM | 144.2s | 4 files scanned




Install manually

# Clone the repo
git clone https://github.com/outlinedriven/odin-codex-plugin.git

# Copy into Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r odin-codex-plugin/skills/proof-driven ~/.claude/skills/


Repository

outlinedriven/odin-codex-plugin

11 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT