codex/skills/lean/SKILL.md
Use for Lean 4 work: programs, proof fixes, formal specs, implementation correctness proofs, external behavior models, termination proofs, trust-boundary audits, mathlib, and Lake/toolchain diagnosis. Do not use for Coq, Isabelle, Agda, or generic pseudocode unless comparison is requested.
npx skillsauth add tkersey/dotfiles leanInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are working in Lean 4.
Produce compilable Lean code first, then give a concise explanation of what changed, what was proved, and what remains outside the proof boundary.
Write Lean programs, specifications, models, and proofs that:
The goal is not merely to make Lean accept a file. The goal is to leave behind an auditable proof artifact whose claim is precise.
Before proving anything about software, state the verification boundary.
Identify the artifact under proof:
Identify the exact theorem claim:
Identify what is not proved:
Identify trust assumptions:
native_decide, code generation, @[implemented_by], or @[csimp] participates in the claimDo not write “the software is proved correct” unless the production implementation itself is inside the checked Lean artifact, generated from the checked Lean artifact under stated assumptions, or connected to the checked Lean artifact by a checked refinement theorem.
Prefer precise claims:
Before editing, inspect:
lean-toolchainlakefile.lean or lakefile.tomllake-manifest.json, if relevantMathlibTreat the pinned toolchain as authoritative. Do not assume code from newer docs, newer mathlib docs, examples, blog posts, or release notes will work unchanged in the current repo.
Then classify the task.
Lean-internal task types:
Verification mode:
Lean-native verified implementation
The executable function is written in Lean.
Prove one or more of:
impl_eq_specimpl_soundimpl_completeimpl_refines_specimpl_preserves_invariantExternal-code model proof
The implementation is not Lean code.
Formalize the relevant behavior in Lean. Prove properties of that model. Do not claim the external implementation is proved unless there is a checked refinement link from the implementation to the model.
Contract or test-case proof
Formalize declared behavior, examples, fixtures, or test cases as Lean definitions and theorem obligations. Prove concrete cases and general laws when possible.
Generated-code path
Prefer writing the verified pure core in Lean and generating or wrapping code from it. State compiler, runtime, and wrapper trust assumptions explicitly.
Stateful, protocol, or trace proof
Model state, transitions, inputs, outputs, errors, and traces explicitly. Prove preservation, safety, idempotency, budget, authorization, ordering, or forbidden-action properties.
Trust-boundary audit
Inspect for placeholders, axioms, unsafe features, nontermination, noncomputability, native-code trust, and unchecked replacement implementations.
Do not leave any of the following unless the user explicitly asks for placeholders or a sketch:
sorryadmitaxiomsDo not invent theorem names.
Confirm every library lemma with at least one of:
#check#printAvoid partial unless the user explicitly wants runtime partiality or possible nontermination. Prefer total definitions and prove termination.
Keep implementation and theorem statement aligned. If the specification is wrong, say so and change it deliberately.
Match existing namespaces, notation, imports, and file style unless a small refactor clearly improves the result.
Do not silently weaken the theorem to make the proof easier. If the original theorem is false, explain the counterexample or mismatch and propose the corrected theorem.
Do not use a broad automation block as a substitute for understanding the proof architecture when an invariant or helper lemma is needed.
Reproduce the issue on the smallest declaration, theorem, or example block that still fails.
Interrogate the environment:
#check name
#print name
#print axioms theoremName
#eval expression
Use #eval only for executable pure code or harmless exploration. It is not a proof.
Normalize before searching for fancy tactics:
rfl
simp
simpa
simp_all
Use targeted rewriting only when you want a specific transformation:
rw [h]
rw [← h]
nth_rewrite 1 [h]
conv => ...
Structure the proof explicitly:
intro
constructor
cases
rcases
refine
have
suffices
change
show
Match recursive code with the right induction principle:
Escalate to automation only after the goal is simplified:
exact?
apply?
aesop?
grind
omega
linarith
ring
norm_num
Use only tactics available in the pinned toolchain and imported libraries.
Replace broad, opaque automation with clearer local lemmas when that makes the proof more stable.
Run the project-aware command that actually checks the changed file or target.
Always separate these concerns.
Write the clearest pure definition, relation, predicate, invariant, or theorem statement that captures correctness.
Prefer mathematical clarity over implementation efficiency in the specification.
Examples:
def spec (input : Input) : Output := ...
def SpecRel (input : Input) (output : Output) : Prop := ...
def StateInvariant (s : State) : Prop := ...
Write the executable function.
If performance matters, an optimized implementation is fine, but keep a simple specification nearby.
Examples:
def impl (input : Input) : Output := ...
def implFast (input : Input) : Output := ...
Prove the implementation agrees with, refines, or satisfies the specification.
Examples:
theorem impl_eq_spec (input : Input) :
impl input = spec input := by
...
theorem impl_sound (input : Input) (output : Output) :
impl input = output -> SpecRel input output := by
...
theorem implFast_eq_impl (input : Input) :
implFast input = impl input := by
...
Use this pattern aggressively:
Keep IO at the boundary. Prove correctness for the pure core first, then wrap it in IO.
For invariants, prefer:
SubtypeDo not reach for highly indexed encodings unless they genuinely simplify the API and proof burden.
For arrays, loops, mutable refs, or imperative-looking do code:
Use theorem names that make the claim auditable.
Good suffixes:
_eq_spec_refines_spec_sound_complete_preserves_invariant_roundtrip_idempotent_normalized_terminates_case_...def spec (i : Input) : Output := ...
def impl (i : Input) : Output := ...
theorem impl_eq_spec (i : Input) :
impl i = spec i := by
...
inductive Error
| invalidInput
| outOfRange
| unsupported
deriving DecidableEq, Repr
def spec (i : Input) : Except Error Output := ...
def impl (i : Input) : Except Error Output := ...
theorem impl_eq_spec (i : Input) :
impl i = spec i := by
...
Use this when the spec is nondeterministic, abstract, or easier to state as a predicate.
def SpecRel (i : Input) (o : Output) : Prop := ...
def impl (i : Input) : Except Error Output := ...
theorem impl_sound (i : Input) (o : Output) :
impl i = .ok o -> SpecRel i o := by
...
theorem impl_complete
(i : Input) (o : Output) :
Preconditions i ->
SpecRel i o ->
impl i = .ok o := by
...
Only prove completeness when the implementation really selects every output admitted by the relation, or when the relation is functional enough for the statement to be true.
def SpecPred (i : Input) : Prop := ...
def decidePred (i : Input) : Bool := ...
theorem decidePred_sound (i : Input) :
decidePred i = true -> SpecPred i := by
...
theorem decidePred_complete (i : Input) :
SpecPred i -> decidePred i = true := by
...
theorem decidePred_correct (i : Input) :
decidePred i = true ↔ SpecPred i := by
constructor
· exact decidePred_sound i
· exact decidePred_complete i
def Normalized (x : α) : Prop := ...
def normalize (x : α) : α := ...
theorem normalize_normalized (x : α) :
Normalized (normalize x) := by
...
theorem normalize_idempotent (x : α) :
normalize (normalize x) = normalize x := by
...
Be precise about direction.
def parse : String -> Except Error α := ...
def serialize : α -> String := ...
theorem parse_serialize_roundtrip (x : α) :
parse (serialize x) = .ok x := by
...
The reverse direction usually requires a normalization condition:
theorem serialize_parse_normalizes (s : String) (x : α) :
parse s = .ok x ->
serialize x = normalizeString s := by
...
structure State where
-- state fields
structure StepResult where
output : Output
state' : State
def stepSpec (s : State) (i : Input) : Except Error StepResult := ...
def StateInvariant (s : State) : Prop := ...
theorem initial_invariant :
StateInvariant initialState := by
...
theorem step_preserves_invariant
(s : State) (i : Input) (r : StepResult) :
StateInvariant s ->
stepSpec s i = .ok r ->
StateInvariant r.state' := by
...
inductive Event where
| allowed
| forbidden
deriving DecidableEq, Repr
def traceSpec (input : Input) : List Event := ...
def TraceSafe (events : List Event) : Prop :=
Event.forbidden ∉ events
theorem traceSpec_safe (input : Input) :
TraceSafe (traceSpec input) := by
...
Use concrete theorem names that preserve the case identity.
/-- case_id: normalize_trims_ascii_space -/
theorem case_normalize_trims_ascii_space :
normalize " abc " = "abc" := by
rfl
Ground cases are useful, but they are not a substitute for general correctness theorems unless the domain is finite and coverage is proved.
When the implementation is not Lean code, use Lean as a formal contract and model workbench.
Workflow:
Do not conflate these claims:
P.P.The last claim requires the implementation to be in Lean, generated from verified Lean under stated assumptions, or connected to the Lean spec by a checked semantics/refinement proof.
Default path:
StateExceptIO at the boundaryTypical shape:
def step : State -> Input -> Except Error (Output × State) := ...
def Inv : State -> Prop := ...
theorem step_preserves_inv
(s s' : State) (i : Input) (o : Output) :
Inv s ->
step s i = .ok (o, s') ->
Inv s' := by
...
If the pinned toolchain exposes Hoare-logic or monadic verification support, you may use it, but only after checking that the local project has the relevant modules, syntax, and tactics.
If such support is unavailable, do not upgrade the project just to use it. Fall back to explicit pure state-transition modeling.
Prefer total definitions.
Use structural recursion when possible.
When structural recursion is not obvious, consider:
termination_by ...
decreasing_by ...
If proving termination is difficult, first ask whether the recursion should be refactored.
Good repairs include:
Avoid partial for logic-facing definitions.
A partial function is usually unsuitable as the subject of a correctness theorem. If runtime partiality is genuinely intended, isolate it at the boundary and prove properties about a total model.
Automation is allowed, but final proofs should remain auditable.
Preferred order:
Definitional equality and simplification
rfl
simp
simpa
simp_all
Structural proof
intro
constructor
cases
rcases
induction
fun_induction
refine
have
suffices
Domain tactics
omega
linarith
ring
norm_num
decide
native_decide
Use native_decide only when the extra trust in native compilation is acceptable for the claim.
Search and solver tactics
exact?
apply?
aesop?
grind
When automation solves a proof:
Prefer kernel-reduction, ordinary simplification, explicit induction, and small helper lemmas for foundational claims.
Never fabricate library facts.
Routine:
Identify the key constants, constructors, and head symbols in the goal.
Inspect candidates:
#check candidate
#print candidate
Search the current file and nearby files.
Search imported project files.
Search dependency sources under .lake/packages/... when available.
Search version-matched docs only after checking the local environment.
Search by symbol names and constructors more often than English paraphrases.
If a theorem name is uncertain, qualify the namespace or search again instead of guessing.
simp hygieneUse simp with intent.
Good:
simp [foo, bar]
simpa using h
simp at h
simp_all
For stability:
simp only [lemma1, lemma2, theorem3]
Add [simp] only when:
Do not mark expansive, reversible, or context-specific rewrites as [simp].
rw hygieneUse rw when the selected rewrite is the proof idea.
Good uses:
If a proof becomes a long chain of rw steps merely to expose a normal form, switch to simp, simp only, or a helper lemma.
When the user asks for high assurance, production verification, foundational trust, or proof audit, inspect the changed Lean files for:
sorryadmitaxiomunsafepartialnoncomputablenative_decideimplemented_by@[implemented_by]csimp@[csimp]For each top-level theorem that supports the final claim, run or add temporarily:
#print axioms theorem_name
Report the axiom footprint.
Distinguish:
Treat these carefully:
native_decide adds trust in native compilation@[implemented_by] does not by itself prove equivalence between the logical definition and the implementation used for execution@[csimp] can affect compiled behavior and should be audited when executable correctness mattersunsafe implementation details are not part of a kernel-checked correctness proof unless isolated behind a checked specification boundarynoncomputable may be fine for pure mathematics but is usually inappropriate for executable verified codeFor high-assurance executable claims, prefer proofs that do not depend on native-code execution or unchecked replacement implementations.
Prefer project-aware commands through Lake.
Build the whole workspace:
lake build
Check one file inside the project environment:
lake env lean path/to/File.lean
Run a small Lean script or executable file inside the project environment:
lake env lean --run path/to/File.lean
Inspect the Lake environment:
lake env
For a single-file toy example outside a Lake project:
lean --run Hello.lean
For mathlib-heavy projects, after cloning or when compiled dependencies are missing:
lake exe cache get
Use lake update only when changing dependency resolution is appropriate for the task. Do not casually update a project when the user asked for a proof or local fix.
The pinned toolchain wins.
Use these as local facts:
lean-toolchainWhen a modern feature is rejected by the local toolchain, do not fight the repo.
Fallback ladder:
cases / inductionsimp / rwDo not upgrade Lean, mathlib, or dependencies unless the user explicitly asks for an upgrade or the task is impossible without it and you clearly explain the tradeoff.
Use this order:
Restate the goal:
show ...
change ...
Expose definitions selectively:
simp [foo, bar]
unfold foo
Inspect hypotheses and constructors.
Move the failing shape into a small local example.
Prove a helper lemma on the exact recursive or algebraic shape needed.
Strengthen the theorem:
Switch from data induction to functional induction if recursion drives the theorem.
Search the imported library again.
Try heavier automation only after the goal is normalized.
If a tactic finds a proof script, simplify it before finalizing unless the generated script is already short and stable.
Before finishing:
For program-correctness, model-checking, invariant, state-machine, or external-code verification tasks, finish with:
Keep the explanation concise, but make the claim exact.
Read these files selectively when present:
references/verification-boundaries.mdreferences/external-code-verification.mdreferences/trust-audit.mdreferences/setup-and-workflow.mdreferences/proof-playbook.mdreferences/program-correctness.mdreferences/mathlib-search-and-style.mdreferences/version-sensitive-features.mdThese references are advisory. The pinned project, local imports, and actual Lean errors are authoritative.
tools
Convert markdown plans into beads with dependencies using br CLI. Use when creating task graphs, polishing beads before implementation, or bridging planning to agent swarm execution.
development
Orchestrate Codex skill optimization during active sessions through $cas goal control, $shadow single-session evidence, $tune diagnosis/refinement briefs, and the skill-optimizer custom subagent. Trigger for $opt, skill optimization loops, session-driven skill tuning, meta-skill audits, or explicit validated skill edits. Do not use for general code optimization, product optimization, or performance tuning.
development
Run a targeted fresh-eyes blunder pass over code, specs, plans, adjudications, closure gates, skill edits, or negative-evidence ledgers. Trigger when asked to reread with fresh eyes, find obvious bugs, catch mistakes/oversights/omissions, check for embarrassing misses, or perform a second independent blunder pass before closure. Do not use as a substitute for implementation, adjudication, or verification; use it as the final falsification/check pass for those workflows.
development
Explicitly shadow, tail, watch, follow, monitor, supervise, or companion exactly one Codex session id/path through `$seq`, then apply a named target skill as an interpretation/reporting/proposal/action lens until the watched session stops.