skills/skills-codex/proof-checker/SKILL.md
Rigorous mathematical proof verification and fixing workflow. Reads a LaTeX proof, identifies gaps via cross-model review (Codex GPT-5.5 xhigh), fixes each gap with full derivations, re-reviews, and generates an audit report. Use when user says "检查证明", "verify proof", "proof check", "审证明", "check this proof", or wants rigorous mathematical verification of a theory paper.
npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep proof-checkerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematically verify a mathematical proof via cross-model adversarial review, fix identified gaps, re-review until convergence, and generate a detailed audit report with proof-obligation accounting.
gpt-5.5 via Codex reviewer agent, reasoning effort always xhighcodex — Default: Codex reviewer agent (spawn_agent, xhigh). Override with — reviewer: oracle-pro for GPT-5.5 Pro via Oracle MCP. See shared-references/reviewer-routing.md.PROOF_AUDIT.md at the paper directory root, alongside main.tex (cumulative log; when invoked via /paper-writing, this is paper/PROOF_AUDIT.md)proof_audit_report.tex (formal before/after PDF)PROOF_CHECK_STATE.json (for recovery)PROOF_SKELETON.md (micro-claim inventory)true (default), auto-render PROOF_AUDIT.md to HTML at workflow end via /render-html. Uses full review gate (audit-class, math-heavy — render-fidelity check protects against MathJax breakage). Set false to skip, or pass — render html: false. Non-blocking: failures don't invalidate the proof audit.The proof passes when ALL of the following hold:
| Category | Description | Example | |----------|-------------|---------| | UNJUSTIFIED_ASSERTION | Claim stated without proof or reference | "The Hessian splits into Gram blocks" | | UNPROVEN_SUBCLAIM | "Clearly" / "it follows" hides a nontrivial lemma | "By symmetry, the cross-terms vanish" without checking | | QUANTIFIER_ERROR | Wrong order ∀/∃, missing "for sufficiently small κ" | "For all π, there exists ε" vs "there exists ε for all π" | | IMPLICATION_REVERSAL | Uses (A⇒B) as (B⇒A), or claims equivalence with only one direction | | | CASE_INCOMPLETE | Misses boundary/degenerate cases | Singular covariance, zero weight, non-unique argmin | | CIRCULAR_DEPENDENCY | Lemma uses theorem that depends on it | | | LOGICAL_GAP | A step is not justified by what precedes it | B=Θ(1) → β_K=0 without analyzing W |
| Category | Description | Example | |----------|-------------|---------| | ILLEGAL_INTERCHANGE | Swaps limit/expectation/derivative/integral without DCT/MCT/Fubini | Differentiating under E without domination | | NONUNIFORM_CONVERGENCE | Pointwise convergence used as uniform | sup and limit swapped | | MISSING_DOMINATION | DCT cited but no dominating function given | | | INTEGRABILITY_GAP | Uses E|X|^p without proving/assuming finite moments | | | REGULARITY_GAP | Differentiability/Lipschitz/convexity used but not established | | | STOCHASTIC_MODE_CONFUSION | Mixes a.s./in prob./in L²/in expectation | |
| Category | Description | Example | |----------|-------------|---------| | MISSING_DERIVATION | A quantity is used but never derived from the model | Risk functional with undefined B, W | | HIDDEN_ASSUMPTION | Proof silently uses a condition not in the theorem | Gaussianity assumed but not stated | | INSUFFICIENT_ASSUMPTION | Hypotheses too weak for proof (counterexample exists) | Moment conditions admitting 2-point distributions | | DIMENSION_TRACKING | Parameter dependence (d, n, K, ...) not explicit | d enters only through κ | | NORMALIZATION_MISMATCH | Coordinate/scaling conventions inconsistent | Rescaled vs raw coordinates | | CONSTANT_DEPENDENCE_HIDDEN | "C" depends on d,n,K but treated as universal | |
| Category | Description | Example | |----------|-------------|---------| | SCOPE_OVERCLAIM | Conclusion stated more broadly than proof supports | "β_K=0" with only generic overlap | | REFERENCE_MISMATCH | Cited theorem's hypotheses not verified at point of use | |
| Status | Meaning | |--------|---------| | INVALID | Statement false as written (counterexample exists or contradiction) | | UNJUSTIFIED | Could be true, but current proof does not establish it | | UNDERSTATED | True only after strengthening assumptions | | OVERSTATED | True only after weakening conclusion / adding qualifiers | | UNCLEAR | Ambiguous notation / definition drift (not wrong per se) |
| Impact | Meaning | |--------|---------| | GLOBAL | Breaks main theorem or core dependency chain | | LOCAL | Affects a side result but not the main theorem | | COSMETIC | Exposition only |
| Label | Definition | |-------|------------| | FATAL | INVALID + GLOBAL | | CRITICAL | (INVALID + LOCAL) or (UNJUSTIFIED + GLOBAL) | | MAJOR | (UNJUSTIFIED + LOCAL) or (UNDERSTATED/OVERSTATED + GLOBAL) | | MINOR | Clarity / notation / dimension bookkeeping that doesn't change claims |
When the proof invokes any of the following, require explicit verification of ALL listed conditions:
| Theorem | Required Conditions | |---------|-------------------| | DCT (Dominated Convergence) | Pointwise a.e. convergence + integrable dominating function | | MCT (Monotone Convergence) | Monotone increasing + non-negative | | Fubini/Tonelli | Product measurability + integrability (Fubini) or non-negative (Tonelli) | | Leibniz integral rule | Continuity of integrand + dominating function for derivative | | Implicit Function Theorem | Continuous differentiability + non-singular Jacobian | | Taylor with remainder | Sufficient differentiability + remainder form (Lagrange/integral) | | Jensen's inequality | Convexity of function + integrability | | Cauchy-Schwarz | Correct inner product space + integrability of both factors | | Weyl/Davis-Kahan | Symmetry/Hermiticity + perturbation bound conditions | | Analytic continuation | Domain connectivity + identity theorem conditions | | WLOG reduction | Invariance under claimed symmetry + reduction is reversible |
.tex file(s).Build formal accounting artifacts. Save to PROOF_SKELETON.md:
Nodes = Definitions / Assumptions / Lemmas / Theorems. Edges = "uses". Detect cycles (including semantic circularity where Lemma A uses a corollary that quietly depends on A).
For each theorem/lemma, list every hypothesis with WHERE each is verified (or mark "UNVERIFIED"). Track usage-minimal assumption sets — which assumptions were actually used vs merely stated.
Each symbol must have a type signature:
κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v
Flag any symbol whose meaning changes or whose type is inconsistent across uses.
For each theorem/lemma, rewrite the statement with explicit quantifiers, domains, and limit order:
∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
h_act^{(K,π)} = Θ(κ^{α_K^act}) [uniform in π on compact subsets]
If you cannot restate a theorem this precisely, mark it UNCLEAR — needs disambiguation.
Every nontrivial step becomes a numbered micro-claim in sequent form:
MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
⊢ Goal: P̂_0 is positive definite
Rule: monomials linearly independent on support of continuous distribution
Side-conditions: positive density near origin ✓ (by GMM weak convergence)
Each micro-claim has: justification rule name + required conditions + where conditions are proven.
Track every asymptotic statement's limit order and uniformity scope:
h_act = Θ(κ^α) [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n [as n→∞, for fixed κ,K,π with x_K ≪ 1]
Flag any statement where limit order is ambiguous or uniformity is unclear.
Submit the complete proof content with the following mandatory reviewer checklist in the prompt:
spawn_agent:
model: gpt-5.5
reasoning_effort: xhigh
message: |
You are performing a rigorous mathematical proof review. For EVERY theorem,
lemma, and proposition, check ALL of the following:
## MANDATORY CHECKS
A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
list each hypothesis and whether it was verified, with location.
C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
absolute values, missing conditions (convexity, PSD, integrability).
D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
and which conditions are verified/missing.
E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
w.h.p. Ensure transitions are justified.
F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
it is uniform over all parameters. List hidden parameter dependence.
G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
low-rank, or extreme-parameter construction.
H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
results.
## OUTPUT FORMAT (per issue)
For each issue found, provide:
- id: sequential number
- status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
- impact: GLOBAL / LOCAL / COSMETIC
- category: [from taxonomy]
- location: section/equation/line
- statement: what the proof claims
- why_invalid: why this is wrong or unjustified
- counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
- affects: which downstream results break if this is wrong
- minimal_fix: how to fix it
[FULL PROOF CONTENT HERE]
Save the reviewer agent_id. Parse into structured issue list. Write to PROOF_AUDIT.md.
For each CRITICAL or MAJOR issue, and for every key lemma that introduces:
Systematically attempt to construct counterexamples using:
| Strategy | Description | |----------|-------------| | Dimensional collapse | Set d=1 or 2, K=2, n small | | Degeneracy | Singular covariance, tiny weight, overlapping means, identical components | | Extremal distributions | Two-point ±a, bounded non-subGaussian, heavy tails | | Adversarial parameter scaling | Pick parameters making neglected terms dominate | | Numeric falsification | Translate lemma to a function, brute-force optimize over small domain |
Rule: Label "counterexample found" ONLY if algebraically verified. Otherwise log as "candidate counterexample — needs verification."
Record all attempts (successful or not) in PROOF_AUDIT.md.
For each issue, ordered by severity (FATAL → CRITICAL → MAJOR → MINOR):
For each issue, explicitly choose one of:
Log this choice — it is a scope-changing decision when it alters theorem statements.
.tex file\label references where possible### Fix N: [SHORT TITLE]
**Issue**: [id] [CATEGORY] — [description]
**Severity**: FATAL / CRITICAL / MAJOR / MINOR
**Status**: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED
**Impact**: GLOBAL / LOCAL / COSMETIC
**Fix strategy**: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE
**Location**: Section X, Lines Y-Z
**BEFORE**: [what the proof originally did]
**WHY WRONG**: [mathematical problem, with counterexample if applicable]
**AFTER**: [what the fix does]
**KEY EQUATION**: [central new equation]
**PROOF OBLIGATIONS ADDED**: [new conditions/lemmas introduced]
**DOWNSTREAM EFFECTS**: [which results now need re-checking]
pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"
Launch a fresh reviewer agent for the next review round. Do not use send_input here; proof-checker keeps each round independent. Request the same mandatory checklist.
Check acceptance gate. If not met, repeat Phases 2-3 (up to MAX_REVIEW_ROUNDS).
After all fixes, verify the proof as a whole:
For any fix that resolved a FATAL or CRITICAL issue, submit the fixed section alone (without showing the previous critique) to a fresh Codex thread:
spawn_agent:
model: gpt-5.5
reasoning_effort: xhigh
message: |
Blind review of the following proof section. You have NOT seen any prior
review or discussion. Check every step for correctness, hidden assumptions,
illegal interchanges, and counterexamples.
[FIXED SECTION ONLY]
If the blind reviewer finds new issues, re-enter Phase 2.
After fixes, re-run:
If acceptance gate is not met after MAX_REVIEW_ROUNDS, output a Proof Unrecoverable Report:
Do NOT silently declare success. The report must be honest.
Generate proof_audit_report.tex with:
Compile: pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex
Write PROOF_CHECK_STATE.json:
{
"status": "completed",
"rounds": 2,
"review_agent_ids": ["..."],
"fatal_fixed": 0,
"critical_fixed": 3,
"major_fixed": 2,
"minor_fixed": 1,
"counterexamples_found": 1,
"counterexample_candidates": 2,
"acceptance_gate": "PASS",
"timestamp": "..."
}
agent_id for traceability, but launch a new spawn_agent for each review round. Do not use send_input across proof-checker rounds.| File | Content | When |
|------|---------|------|
| PROOF_SKELETON.md | Dependency DAG + assumption ledger + micro-claims | Phase 0.5 |
| PROOF_AUDIT.md | Cumulative round-by-round audit log | Updated each round |
| PROOF_AUDIT.json | Machine-readable submission verdict (see below) | Always emitted |
| proof_audit_report.tex/.pdf | Formal before/after report | Phase 4 |
| PROOF_CHECK_STATE.json | State for recovery | Phase 5 |
| PROOF_AUDIT.html (+ .review.json sidecar) | Single-file HTML view auto-rendered via /render-html "PROOF_AUDIT.md" --json "PROOF_AUDIT.json". Non-blocking — if /render-html fails the audit still counts as complete. | Workflow end (when RENDER_HTML = true, default) |
This skill always writes PROOF_AUDIT.json at the paper directory
root (i.e. paper/PROOF_AUDIT.json when invoked from /paper-writing
with paper-dir paper/; <your-paper-dir>/PROOF_AUDIT.json when invoked
standalone), regardless of caller or whether the paper contains theorems.
A paper with no \begin{theorem} / \begin{lemma} / \begin{proof} emits
verdict NOT_APPLICABLE; silent skip is forbidden. paper-writing
Phase 6 and verify_paper_audits.sh both rely on this artifact
existing at <paper-dir>/PROOF_AUDIT.json.
The artifact conforms to the schema in shared-references/assurance-contract.md:
{
"audit_skill": "proof-checker",
"verdict": "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
"reason_code": "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
"summary": "One-line human-readable verdict summary.",
"audited_input_hashes": {
"main.tex": "sha256:...",
"sections/4.theory.tex": "sha256:..."
},
"trace_path": ".aris/traces/proof-checker/<date>_run<NN>/",
"thread_id": "<codex mcp thread id>",
"reviewer_model": "gpt-5.5",
"reviewer_reasoning": "xhigh",
"generated_at": "<UTC ISO-8601>",
"details": {
"theorems_audited": <int>,
"issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
"category": "quantifier|domination|...",
"location": "sections/4.theory.tex:L182",
"note": "..." }, ... ]
}
}
audited_input_hashes scopeHash the declared input set actually reviewed — the theorem-bearing
.tex files passed into this invocation — not a repo-wide union and not
the reviewer's self-reported opened subset. The external verifier rehashes
these entries; any mismatch flags STALE.
Path convention (must match verify_paper_audits.sh): keys are
paths relative to the paper directory (no paper/ prefix — the
verifier resolves relative to the paper dir; prefixing produces
paper/paper/... and false-fails as STALE). Use absolute paths for
files outside the paper dir.
| Input state | Verdict | reason_code example |
|-------------------------------------------------------|------------------|-----------------------|
| No theorems / lemmas / proofs in paper | NOT_APPLICABLE | no_theorems |
| Theorems present but referenced files unreadable | BLOCKED | source_unreadable |
| All proof obligations discharged, no gaps | PASS | all_proofs_complete |
| Only MINOR issues (notation / exposition) | WARN | minor_gaps |
| Any FATAL or CRITICAL issue (logic gap, wrong claim) | FAIL | critical_gap |
| Reviewer invocation failed (network / malformed) | ERROR | reviewer_error |
MAJOR issues alone map to WARN or FAIL at the reviewer's discretion and
must carry an explicit justification in summary + details.issues.
Every invocation uses a fresh reviewer agent. Never use send_input across
proof-checker runs. Do not accept prior audit outputs
(PAPER_CLAIM_AUDIT, CITATION_AUDIT, EXPERIMENT_LOG) as input — the fresh
thread preserves reviewer independence per
shared-references/reviewer-independence.md.
This skill never blocks by itself; paper-writing Phase 6 plus the
verifier decide whether the verdict blocks finalization based on the
assurance level.
/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"
data-ai
Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
development
Get a deep critical review of research from GPT using a secondary Codex agent. Use when user says "review my research", "help me review", "get external review", or wants critical feedback on research ideas, papers, or experimental results.
data-ai
Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
development
Autonomous multi-round research review loop. Repeatedly reviews using a secondary Codex agent, implements fixes, and re-reviews until positive assessment or max rounds reached. Use when user says "auto review loop", "review until it passes", or wants autonomous iterative improvement.