skills/proof-checker/SKILL.md
Rigorous mathematical proof verification and fixing workflow. Reads a LaTeX proof, identifies gaps via cross-model review (Codex GPT-5.4 xhigh), fixes each gap with full derivations, re-reviews, and generates an audit report. Use when user says "检查证明", "verify proof", "proof check", "审证明", "check this proof", or wants rigorous mathematical verification of a theory paper.
npx skillsauth add shaun-z/auto-claude-code-research-in-sleep proof-checkerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Systematically verify a mathematical proof via cross-model adversarial review, fix identified gaps, re-review until convergence, and generate a detailed audit report with proof-obligation accounting.
gpt-5.4 via Codex MCP, reasoning effort always xhighcodex — Default: Codex MCP (xhigh). Override with — reviewer: oracle-pro for GPT-5.4 Pro via Oracle MCP. See shared-references/reviewer-routing.md.PROOF_AUDIT.md at the paper directory root, alongside main.tex (cumulative log; when invoked via /paper-writing, this is paper/PROOF_AUDIT.md)proof_audit_report.tex (formal before/after PDF)PROOF_CHECK_STATE.json (for recovery)PROOF_SKELETON.md (micro-claim inventory)The proof passes when ALL of the following hold:
| Category | Description | Example | |----------|-------------|---------| | UNJUSTIFIED_ASSERTION | Claim stated without proof or reference | "The Hessian splits into Gram blocks" | | UNPROVEN_SUBCLAIM | "Clearly" / "it follows" hides a nontrivial lemma | "By symmetry, the cross-terms vanish" without checking | | QUANTIFIER_ERROR | Wrong order ∀/∃, missing "for sufficiently small κ" | "For all π, there exists ε" vs "there exists ε for all π" | | IMPLICATION_REVERSAL | Uses (A⇒B) as (B⇒A), or claims equivalence with only one direction | | | CASE_INCOMPLETE | Misses boundary/degenerate cases | Singular covariance, zero weight, non-unique argmin | | CIRCULAR_DEPENDENCY | Lemma uses theorem that depends on it | | | LOGICAL_GAP | A step is not justified by what precedes it | B=Θ(1) → β_K=0 without analyzing W |
| Category | Description | Example | |----------|-------------|---------| | ILLEGAL_INTERCHANGE | Swaps limit/expectation/derivative/integral without DCT/MCT/Fubini | Differentiating under E without domination | | NONUNIFORM_CONVERGENCE | Pointwise convergence used as uniform | sup and limit swapped | | MISSING_DOMINATION | DCT cited but no dominating function given | | | INTEGRABILITY_GAP | Uses E|X|^p without proving/assuming finite moments | | | REGULARITY_GAP | Differentiability/Lipschitz/convexity used but not established | | | STOCHASTIC_MODE_CONFUSION | Mixes a.s./in prob./in L²/in expectation | |
| Category | Description | Example | |----------|-------------|---------| | MISSING_DERIVATION | A quantity is used but never derived from the model | Risk functional with undefined B, W | | HIDDEN_ASSUMPTION | Proof silently uses a condition not in the theorem | Gaussianity assumed but not stated | | INSUFFICIENT_ASSUMPTION | Hypotheses too weak for proof (counterexample exists) | Moment conditions admitting 2-point distributions | | DIMENSION_TRACKING | Parameter dependence (d, n, K, ...) not explicit | d enters only through κ | | NORMALIZATION_MISMATCH | Coordinate/scaling conventions inconsistent | Rescaled vs raw coordinates | | CONSTANT_DEPENDENCE_HIDDEN | "C" depends on d,n,K but treated as universal | |
| Category | Description | Example | |----------|-------------|---------| | SCOPE_OVERCLAIM | Conclusion stated more broadly than proof supports | "β_K=0" with only generic overlap | | REFERENCE_MISMATCH | Cited theorem's hypotheses not verified at point of use | |
| Status | Meaning | |--------|---------| | INVALID | Statement false as written (counterexample exists or contradiction) | | UNJUSTIFIED | Could be true, but current proof does not establish it | | UNDERSTATED | True only after strengthening assumptions | | OVERSTATED | True only after weakening conclusion / adding qualifiers | | UNCLEAR | Ambiguous notation / definition drift (not wrong per se) |
| Impact | Meaning | |--------|---------| | GLOBAL | Breaks main theorem or core dependency chain | | LOCAL | Affects a side result but not the main theorem | | COSMETIC | Exposition only |
| Label | Definition | |-------|------------| | FATAL | INVALID + GLOBAL | | CRITICAL | (INVALID + LOCAL) or (UNJUSTIFIED + GLOBAL) | | MAJOR | (UNJUSTIFIED + LOCAL) or (UNDERSTATED/OVERSTATED + GLOBAL) | | MINOR | Clarity / notation / dimension bookkeeping that doesn't change claims |
When the proof invokes any of the following, require explicit verification of ALL listed conditions:
| Theorem | Required Conditions | |---------|-------------------| | DCT (Dominated Convergence) | Pointwise a.e. convergence + integrable dominating function | | MCT (Monotone Convergence) | Monotone increasing + non-negative | | Fubini/Tonelli | Product measurability + integrability (Fubini) or non-negative (Tonelli) | | Leibniz integral rule | Continuity of integrand + dominating function for derivative | | Implicit Function Theorem | Continuous differentiability + non-singular Jacobian | | Taylor with remainder | Sufficient differentiability + remainder form (Lagrange/integral) | | Jensen's inequality | Convexity of function + integrability | | Cauchy-Schwarz | Correct inner product space + integrability of both factors | | Weyl/Davis-Kahan | Symmetry/Hermiticity + perturbation bound conditions | | Analytic continuation | Domain connectivity + identity theorem conditions | | WLOG reduction | Invariance under claimed symmetry + reduction is reversible |
.tex file(s).Build formal accounting artifacts. Save to PROOF_SKELETON.md:
Nodes = Definitions / Assumptions / Lemmas / Theorems. Edges = "uses". Detect cycles (including semantic circularity where Lemma A uses a corollary that quietly depends on A).
For each theorem/lemma, list every hypothesis with WHERE each is verified (or mark "UNVERIFIED"). Track usage-minimal assumption sets — which assumptions were actually used vs merely stated.
Each symbol must have a type signature:
κ : scalar ∈ (0,1), depends on (d, α_t, Σ, μ)
u* : vector ∈ ℝ^d, u* = C^{-1}m
B^even : matrix ∈ ℝ^{(L+1)×(L+1)}, symmetric PSD
Ψ_v : function ℝ → ℝ, analytic in (ζ,κ), parity determined by v
Flag any symbol whose meaning changes or whose type is inconsistent across uses.
For each theorem/lemma, rewrite the statement with explicit quantifiers, domains, and limit order:
∀K ≥ 3, ∀π ∈ Π_K^{ms,∘} \ E_K, ∃κ_0 > 0 such that ∀κ ∈ (0, κ_0):
h_act^{(K,π)} = Θ(κ^{α_K^act}) [uniform in π on compact subsets]
If you cannot restate a theorem this precisely, mark it UNCLEAR — needs disambiguation.
Every nontrivial step becomes a numbered micro-claim in sequent form:
MC-17: Context: [Lemma 3.1, κ < κ_0, Z_κ has bounded moments up to order 2m+2]
⊢ Goal: P̂_0 is positive definite
Rule: monomials linearly independent on support of continuous distribution
Side-conditions: positive density near origin ✓ (by GMM weak convergence)
Each micro-claim has: justification rule name + required conditions + where conditions are proven.
Track every asymptotic statement's limit order and uniformity scope:
h_act = Θ(κ^α) [as κ→0, uniform in π on compact subsets of Π_K, for fixed K]
τ_act ~ (b/a)n [as n→∞, for fixed κ,K,π with x_K ≪ 1]
Flag any statement where limit order is ambiguous or uniformity is unclear.
Submit the complete proof content with the following mandatory reviewer checklist in the prompt:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
You are performing a rigorous mathematical proof review. For EVERY theorem,
lemma, and proposition, check ALL of the following:
## MANDATORY CHECKS
A. DEFINITIONS: List any symbol whose meaning is ambiguous or changes.
B. HYPOTHESIS DISCHARGE: For each lemma/theorem APPLICATION (not statement),
list each hypothesis and whether it was verified, with location.
C. INEQUALITY AUDIT: For each inequality chain, verify direction, missing
absolute values, missing conditions (convexity, PSD, integrability).
D. INTERCHANGE AUDIT: Flag every limit/derivative/expectation/integral
interchange. State which theorem justifies it (DCT/MCT/Fubini/Leibniz)
and which conditions are verified/missing.
E. PROBABILITY MODE: Track whether claims are a.s./in prob./in expectation/
w.h.p. Ensure transitions are justified.
F. UNIFORMITY & CONSTANTS: For every O(·), o(·), Θ(·), ≲, state whether
it is uniform over all parameters. List hidden parameter dependence.
G. EDGE/DEGENERATE CASES: Attempt to break each key lemma with a 1D,
low-rank, or extreme-parameter construction.
H. DEPENDENCY CONSISTENCY: Detect cycles or forward references to unproven
results.
## OUTPUT FORMAT (per issue)
For each issue found, provide:
- id: sequential number
- status: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED / UNCLEAR
- impact: GLOBAL / LOCAL / COSMETIC
- category: [from taxonomy]
- location: section/equation/line
- statement: what the proof claims
- why_invalid: why this is wrong or unjustified
- counterexample: YES (describe) / NO / CANDIDATE (describe attempt)
- affects: which downstream results break if this is wrong
- minimal_fix: how to fix it
[FULL PROOF CONTENT HERE]
Save the threadId. Parse into structured issue list. Write to PROOF_AUDIT.md.
For each CRITICAL or MAJOR issue, and for every key lemma that introduces:
Systematically attempt to construct counterexamples using:
| Strategy | Description | |----------|-------------| | Dimensional collapse | Set d=1 or 2, K=2, n small | | Degeneracy | Singular covariance, tiny weight, overlapping means, identical components | | Extremal distributions | Two-point ±a, bounded non-subGaussian, heavy tails | | Adversarial parameter scaling | Pick parameters making neglected terms dominate | | Numeric falsification | Translate lemma to a function, brute-force optimize over small domain |
Rule: Label "counterexample found" ONLY if algebraically verified. Otherwise log as "candidate counterexample — needs verification."
Record all attempts (successful or not) in PROOF_AUDIT.md.
For each issue, ordered by severity (FATAL → CRITICAL → MAJOR → MINOR):
For each issue, explicitly choose one of:
Log this choice — it is a scope-changing decision when it alters theorem statements.
.tex file\label references where possible### Fix N: [SHORT TITLE]
**Issue**: [id] [CATEGORY] — [description]
**Severity**: FATAL / CRITICAL / MAJOR / MINOR
**Status**: INVALID / UNJUSTIFIED / UNDERSTATED / OVERSTATED
**Impact**: GLOBAL / LOCAL / COSMETIC
**Fix strategy**: ADD_DERIVATION / STRENGTHEN_ASSUMPTION / WEAKEN_CLAIM / ADD_REFERENCE
**Location**: Section X, Lines Y-Z
**BEFORE**: [what the proof originally did]
**WHY WRONG**: [mathematical problem, with counterexample if applicable]
**AFTER**: [what the fix does]
**KEY EQUATION**: [central new equation]
**PROOF OBLIGATIONS ADDED**: [new conditions/lemmas introduced]
**DOWNSTREAM EFFECTS**: [which results now need re-checking]
pdflatex -interaction=nonstopmode <file>.tex 2>&1 | grep -E "Error|Warning|undefined"
Use codex-reply with saved threadId. Include fix summaries. Request the same mandatory checklist.
Check acceptance gate. If not met, repeat Phases 2-3 (up to MAX_REVIEW_ROUNDS).
After all fixes, verify the proof as a whole:
For any fix that resolved a FATAL or CRITICAL issue, submit the fixed section alone (without showing the previous critique) to a fresh Codex thread:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
Blind review of the following proof section. You have NOT seen any prior
review or discussion. Check every step for correctness, hidden assumptions,
illegal interchanges, and counterexamples.
[FIXED SECTION ONLY]
If the blind reviewer finds new issues, re-enter Phase 2.
After fixes, re-run:
If acceptance gate is not met after MAX_REVIEW_ROUNDS, output a Proof Unrecoverable Report:
Do NOT silently declare success. The report must be honest.
Generate proof_audit_report.tex with:
Compile: pdflatex proof_audit_report.tex && pdflatex proof_audit_report.tex
Write PROOF_CHECK_STATE.json:
{
"status": "completed",
"rounds": 2,
"threadId": "...",
"fatal_fixed": 0,
"critical_fixed": 3,
"major_fixed": 2,
"minor_fixed": 1,
"counterexamples_found": 1,
"counterexample_candidates": 2,
"acceptance_gate": "PASS",
"timestamp": "..."
}
codex-reply for follow-up rounds.| File | Content | When |
|------|---------|------|
| PROOF_SKELETON.md | Dependency DAG + assumption ledger + micro-claims | Phase 0.5 |
| PROOF_AUDIT.md | Cumulative round-by-round audit log | Updated each round |
| PROOF_AUDIT.json | Machine-readable submission verdict (see below) | Always emitted |
| proof_audit_report.tex/.pdf | Formal before/after report | Phase 4 |
| PROOF_CHECK_STATE.json | State for recovery | Phase 5 |
This skill always writes PROOF_AUDIT.json at the paper directory
root (i.e. paper/PROOF_AUDIT.json when invoked from /paper-writing
with paper-dir paper/; <your-paper-dir>/PROOF_AUDIT.json when invoked
standalone), regardless of caller or whether the paper contains theorems.
A paper with no \begin{theorem} / \begin{lemma} / \begin{proof} emits
verdict NOT_APPLICABLE; silent skip is forbidden. paper-writing
Phase 6 and tools/verify_paper_audits.sh both rely on this artifact
existing at <paper-dir>/PROOF_AUDIT.json.
The artifact conforms to the schema in shared-references/assurance-contract.md:
{
"audit_skill": "proof-checker",
"verdict": "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
"reason_code": "all_proofs_complete | minor_gaps | critical_gap | no_theorems | ...",
"summary": "One-line human-readable verdict summary.",
"audited_input_hashes": {
"main.tex": "sha256:...",
"sections/4.theory.tex": "sha256:..."
},
"trace_path": ".aris/traces/proof-checker/<date>_run<NN>/",
"thread_id": "<codex mcp thread id>",
"reviewer_model": "gpt-5.4",
"reviewer_reasoning": "xhigh",
"generated_at": "<UTC ISO-8601>",
"details": {
"theorems_audited": <int>,
"issues": [ { "id": "T1-H3", "severity": "FATAL|CRITICAL|MAJOR|MINOR",
"category": "quantifier|domination|...",
"location": "sections/4.theory.tex:L182",
"note": "..." }, ... ]
}
}
audited_input_hashes scopeHash the declared input set actually reviewed — the theorem-bearing
.tex files passed into this invocation — not a repo-wide union and not
the reviewer's self-reported opened subset. The external verifier rehashes
these entries; any mismatch flags STALE.
Path convention (must match tools/verify_paper_audits.sh): keys are
paths relative to the paper directory (no paper/ prefix — the
verifier resolves relative to the paper dir; prefixing produces
paper/paper/... and false-fails as STALE). Use absolute paths for
files outside the paper dir.
| Input state | Verdict | reason_code example |
|-------------------------------------------------------|------------------|-----------------------|
| No theorems / lemmas / proofs in paper | NOT_APPLICABLE | no_theorems |
| Theorems present but referenced files unreadable | BLOCKED | source_unreadable |
| All proof obligations discharged, no gaps | PASS | all_proofs_complete |
| Only MINOR issues (notation / exposition) | WARN | minor_gaps |
| Any FATAL or CRITICAL issue (logic gap, wrong claim) | FAIL | critical_gap |
| Reviewer invocation failed (network / malformed) | ERROR | reviewer_error |
MAJOR issues alone map to WARN or FAIL at the reviewer's discretion and
must carry an explicit justification in summary + details.issues.
Every invocation uses a fresh mcp__codex__codex thread. Never
codex-reply across proof-checker runs. Do not accept prior audit outputs
(PAPER_CLAIM_AUDIT, CITATION_AUDIT, EXPERIMENT_LOG) as input — the fresh
thread preserves reviewer independence per
shared-references/reviewer-independence.md.
This skill never blocks by itself; paper-writing Phase 6 plus the
verifier decide whether the verdict blocks finalization based on the
assurance level.
/proof-checker "neurips_2025.tex"
/proof-checker "check the GMM generalization proof, focus on dimension dependence"
/proof-checker "verify proof in paper.tex — difficulty: nightmare"
development
Generate publication-quality academic illustrations through a local Codex app-server bridge that uses Codex native image generation. This is a separate experimental alternative to `paper-illustration`, intended for Claude Code users who want a GPT-image-style renderer without modifying the original skill.
development
Two-way sync between a local paper directory and an Overleaf project via the Overleaf Git bridge (Premium feature). Lets you keep ARIS audit/edit workflows on the local copy while collaborators edit in the Overleaf web UI. Token never touches the agent — user does the one-time auth via macOS Keychain. Use when user says "同步 overleaf", "overleaf sync", "推送到 overleaf", "connect overleaf", "Overleaf 桥接", "pull overleaf", "push overleaf", or wants to bridge their ARIS paper directory with an Overleaf project.
development
Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.
data-ai
Paragraph-level structural blueprint for 10-12 page systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. Provides page allocation, paragraph templates, and writing patterns. Use when user says "写系统论文", "systems paper structure", "OSDI paper", "SOSP paper", or wants fine-grained structural guidance for a systems conference submission.