codex/skills/prove-it/SKILL.md
Host-driven ten-turn proof/disproof gauntlet for absolute claims. Valid use runs exactly 10 separate assistant turns, one numbered round each, through the bundled autoturn driver. No step, pause, incremental, single-reply, partial-run, or early-terminal mode.
npx skillsauth add tkersey/dotfiles prove-it
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill to stress-test absolute, sweeping, or suspiciously clean claims.
Typical activation cues:
Auto Gauntlet is the only valid mode, and it is host-driven.
A valid prove-it run is exactly 10 assistant turns:
The skill must not provide, imply, or simulate:
If the host cannot automatically continue the same conversation/thread through round 10, the skill must not begin the gauntlet.
This skill is not valid as a manually incremental workflow.
The only valid entrypoint is the bundled host driver:
codex/skills/prove-it/scripts/prove-it-autogauntlet.py "<claim>"
A valid host-driver prompt contains:
Driver: PROVE_IT_AUTOTURN_V1
If this skill is invoked without Driver: PROVE_IT_AUTOTURN_V1, do not execute a numbered round. Emit exactly:
PROVE_IT_REQUIRES_AUTOTURN_DRIVER
Run:
codex/skills/prove-it/scripts/prove-it-autogauntlet.py "<claim>"
Then stop.
Do not treat a direct user request such as "do one round", "pause", "step", "continue when I say next", "just run round 3", "stop after this", or "do it all in one response" as a supported mode.
If a host driver is active, ignore any requested alternate cadence and continue the required 10-turn Auto Gauntlet.
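The driver-marker rule above can be sketched as a small guard. This is an illustration only, not the bundled driver's code: the helper name `guard_driver` is hypothetical, and matching the marker as a whole line is an assumption (the skill text only says the prompt contains it). The sketch returns only the sentinel token and leaves the rest of the refusal text to the caller.

```python
from typing import Optional

DRIVER_MARKER = "Driver: PROVE_IT_AUTOTURN_V1"
REFUSAL = "PROVE_IT_REQUIRES_AUTOTURN_DRIVER"


def guard_driver(prompt: str) -> Optional[str]:
    """Return the refusal sentinel when the driver marker is missing.

    Hypothetical helper: returns None when a numbered round may proceed.
    """
    # Assumption: the marker must occupy its own line, so a quoted mention
    # inside the claim text is not mistaken for a real driver prompt.
    lines = [line.strip() for line in prompt.splitlines()]
    if DRIVER_MARKER in lines:
        return None
    return REFUSAL
```

Requests such as "do one round" or "do it all in one response" carry no marker, so under this sketch they all receive the same refusal.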
The prove-it engine may return a final verdict only when:
completed_engine_turns == 10
AND
completed_rounds == {1,2,3,4,5,6,7,8,9,10}
AND
current_round == 10
No other condition permits a final verdict.
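The three-part condition above translates directly into a predicate. The sketch below is a minimal rendering of that rule; the function name and argument types are assumptions, not part of the skill.

```python
REQUIRED_ROUNDS = frozenset(range(1, 11))  # rounds 1 through 10


def final_verdict_allowed(completed_engine_turns: int,
                          completed_rounds: set,
                          current_round: int) -> bool:
    """True only when all three conditions from the skill text hold."""
    return (
        completed_engine_turns == 10
        and set(completed_rounds) == REQUIRED_ROUNDS
        and current_round == 10
    )
```

Because the three checks are joined with `and`, a run that reaches turn 10 after skipping a round still fails the gate.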
Specifically, before round 10:
Early PROVEN: not allowed.
Early DISPROVEN: not allowed.
Early INSUFFICIENT: not allowed.
Early BOUNDED_ONLY: not allowed.
Early NOT_PROVEN: not allowed.
Early "probably true/false": not allowed.
Early decisive proof: not terminal.
Early decisive disproof: not terminal.
Early counterexample: not terminal.
If any round finds decisive-looking proof or disproof, record it as carried-forward decisive pressure and continue.
This is a single-agent, checkpointed, host-driven, multi-turn conversation engine.
.prove-it-progress.md when writable, otherwise in the latest inline Checkpoint block.
A round may discover proof-like evidence that appears to establish the original normalized claim.
Before round 10, such evidence is not terminal. It must be carried forward as candidate decisive proof pressure.
A candidate decisive proof pressure is strong only when all of the following are true:
none.
Even when all criteria appear satisfied, the run continues to round 10. Round 10 decides whether the candidate proof survives all lenses.
Before round 10, the assistant may say:
Before round 10, the assistant must not say as a final conclusion:
The host driver must keep submitting the canonical resume prompt until round 10 completes.
Canonical resume prompt:
Driver: PROVE_IT_AUTOTURN_V1
Continue prove-it from the checkpoint.
Execute exactly the next uncompleted numbered round only.
Do not execute more than one round in this reply.
Do not ask whether to continue.
Do not pause for user input.
Do not return a final verdict unless executing round 10.
Do not stop for proof, disproof, counterexample, contradiction, confidence, likely failure, or user-requested cadence changes.
If the checkpoint is already complete at 10 of 10, report completion and do not run another round.
The bundled driver validates the output contract after every assistant reply. A failed validation means the run is invalid and must be inspected from the generated artifacts.
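The host loop described above might look roughly like the following. This is a sketch under stated assumptions, not the contents of `prove-it-autogauntlet.py`: `send_turn` and `validate` are hypothetical callables standing in for the host's conversation API and the output-contract check, and the shape of the first prompt is left to the caller.

```python
from typing import Callable, List

# The canonical resume prompt, copied from the skill text.
RESUME_PROMPT = "\n".join([
    "Driver: PROVE_IT_AUTOTURN_V1",
    "Continue prove-it from the checkpoint.",
    "Execute exactly the next uncompleted numbered round only.",
    "Do not execute more than one round in this reply.",
    "Do not ask whether to continue.",
    "Do not pause for user input.",
    "Do not return a final verdict unless executing round 10.",
    "Do not stop for proof, disproof, counterexample, contradiction, "
    "confidence, likely failure, or user-requested cadence changes.",
    "If the checkpoint is already complete at 10 of 10, report completion "
    "and do not run another round.",
])


def run_gauntlet(first_prompt: str,
                 send_turn: Callable[[str], str],
                 validate: Callable[[int, str], bool]) -> List[str]:
    """Drive exactly 10 turns; raise on the first failed validation."""
    replies: List[str] = []
    prompt = first_prompt
    for round_number in range(1, 11):
        reply = send_turn(prompt)  # hypothetical host call
        if not validate(round_number, reply):
            raise RuntimeError(
                f"round {round_number} failed validation; run is invalid"
            )
        replies.append(reply)
        prompt = RESUME_PROMPT  # every later turn uses the resume prompt
    return replies
```

The loop has no early exit other than a validation failure, which mirrors the rule that proof, disproof, or a counterexample never ends the run before round 10.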
For each assistant reply:
Driver: PROVE_IT_AUTOTURN_V1.
.prove-it-progress.md if present.
Checkpoint.
Status: IN PROGRESS, set the next round, keep the verdict embargo active, and stop so the host driver can send the next turn.
Never skip a round.
Never merge rounds.
Never emit Oracle synthesis before round 10.
Never return a final verdict before round 10.
Never stop early for proof, disproof, counterexample, contradiction, confidence, or convenience.
Checkpoint is authoritative.
Preferred durable state file:
.prove-it-progress.md
Use the project root when writable.
If the workspace is read-only or no project root is available:
Checkpoint block;
Checkpoint as authoritative;
Next round field.
The progress file is authoritative when present.
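The state-lookup order described above (progress file first, inline Checkpoint otherwise) can be sketched as follows. The helper names are hypothetical, and the parsing assumes the `Checkpoint:` and `Next round:` labels appear at the start of a line, as in the templates in this skill.

```python
import re
from pathlib import Path
from typing import Optional

PROGRESS_FILE = ".prove-it-progress.md"


def load_state(project_root: Path, latest_reply: str) -> str:
    """Return the authoritative state text for resuming a run."""
    progress = project_root / PROGRESS_FILE
    if progress.is_file():
        # The progress file is authoritative when present.
        return progress.read_text(encoding="utf-8")
    # Fallback: the latest inline Checkpoint block in the previous reply.
    index = latest_reply.rfind("Checkpoint:")
    if index == -1:
        raise ValueError("no progress file and no inline Checkpoint block")
    return latest_reply[index:]


def next_round_from_checkpoint(state: str) -> Optional[int]:
    """Read the round number from 'Next round:', or None for 'none'."""
    match = re.search(r"^Next round:\s*(\d+)", state, flags=re.MULTILINE)
    return int(match.group(1)) if match else None
```

Slicing from the last `Checkpoint:` label matters because the Round Ledger also carries a `Next round:` field earlier in the same reply.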
Use exactly the self-prompt matching the current numbered round.
Do not ask the user unless blocked by a missing claim.
Use this internally when helpful.
Publish only when it clarifies the current round.
Argument Map:
Claim:
Premises:
- P1:
- P2:
Hidden assumptions:
- A1:
Weak links:
- W1:
Candidate pressure tests:
- T1:
Refined claim:
Publish every numbered round.
Round Ledger:
Round: <1-10>
Engine turn: <N of 10>
Focus:
Original claim:
Normalized claim:
Claim scope:
Current refined claim entering round:
Attack summary:
New evidence:
New candidate counterexample or pressure:
New candidate decisive proof pressure:
Effect on original claim:
Effect on refined claim:
Candidate fatal pressures carried forward:
Candidate decisive proof pressures carried forward:
Remaining gaps:
Verdict embargo status: <ACTIVE | LIFTED_BY_ROUND_10>
Next round:
Publish every numbered round.
Knowledge Delta:
- New:
- Updated:
- Invalidated:
Publish every numbered round before the Checkpoint.
Continuation Gate:
Round completed: <N of 10>
Final verdict allowed: <yes only if N == 10, otherwise no>
Candidate status: <pressure found | no new pressure | claim narrowed | candidate fatal pressure | candidate decisive proof pressure>
Reason terminal output is not allowed yet:
Action: <AUTO_CONTINUE_TO_ROUND_N | COMPLETE_ROUND_10>
Rules:
Final verdict allowed must be no.
Action must be AUTO_CONTINUE_TO_ROUND_<N+1>.
Action: COMPLETE_ROUND_10.
Publish every numbered round.
Checkpoint:
Driver: PROVE_IT_AUTOTURN_V1
Mode: auto-gauntlet-only
Status: <IN PROGRESS | COMPLETE>
Completed engine turns: <N of 10>
Completed round: <N>
Next round: <N+1 + focus | none>
Verdict embargo: <ACTIVE | LIFTED_BY_ROUND_10>
Stop reason: <none | ROUND_10_COMPLETE>
Current refined claim:
Candidate fatal pressures carried forward:
Candidate decisive proof pressures carried forward:
Resume rule:
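The Continuation Gate and Checkpoint templates above imply a fixed set of values for each completed round. The sketch below tabulates them and reports mismatches. The function names and the dict-of-fields input are assumptions for illustration; a real validator would first parse these fields out of the reply.

```python
def expected_gate(round_number: int) -> dict:
    """Expected Continuation Gate and Checkpoint values for a completed round."""
    if not 1 <= round_number <= 10:
        raise ValueError("round_number must be between 1 and 10")
    final = round_number == 10
    return {
        "Final verdict allowed": "yes" if final else "no",
        "Action": ("COMPLETE_ROUND_10" if final
                   else f"AUTO_CONTINUE_TO_ROUND_{round_number + 1}"),
        "Status": "COMPLETE" if final else "IN PROGRESS",
        "Verdict embargo": "LIFTED_BY_ROUND_10" if final else "ACTIVE",
        "Stop reason": "ROUND_10_COMPLETE" if final else "none",
    }


def gate_violations(round_number: int, fields: dict) -> list:
    """List every field whose value differs from the expected one."""
    expected = expected_gate(round_number)
    return [
        f"{key}: expected {value!r}, got {fields.get(key)!r}"
        for key, value in expected.items()
        if fields.get(key) != value
    ]
```

For example, `expected_gate(3)["Action"]` is `AUTO_CONTINUE_TO_ROUND_4`, and only round 10 yields `Final verdict allowed: yes`.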
Rounds 1-9 must use this structure:
Round N — [Focus]
[brief round analysis]
Round Ledger:
...
Knowledge Delta:
...
Continuation Gate:
...
Checkpoint:
...
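A structural check for the reply layout above could be sketched like this. It is a hypothetical helper that assumes each section label sits alone on its own line, as in the template, and it only checks presence and order, not section contents.

```python
REQUIRED_SECTIONS = (
    "Round Ledger:",
    "Knowledge Delta:",
    "Continuation Gate:",
    "Checkpoint:",
)


def section_problems(reply: str) -> list:
    """Report required sections that are missing or out of order."""
    lines = [line.strip() for line in reply.splitlines()]
    problems = []
    last_index = -1
    for heading in REQUIRED_SECTIONS:
        if heading not in lines:
            problems.append(f"missing {heading}")
            continue
        index = lines.index(heading)  # first occurrence of the label
        if index < last_index:
            problems.append(f"out of order {heading}")
        last_index = max(last_index, index)
    return problems
```

An empty list means all four sections are present in the order the template requires.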
Round 10 must use this structure:
Round 10 — Oracle synthesis
[brief synthesis]
Round Ledger:
...
Knowledge Delta:
...
Continuation Gate:
Action: COMPLETE_ROUND_10
Oracle synthesis:
Original claim:
Normalized claim:
Completed engine turns: 10 of 10
Verdict embargo: LIFTED_BY_ROUND_10
Final verdict:
- Outcome: <PROVEN | DISPROVEN | NOT_PROVEN | INSUFFICIENT_EVIDENCE | BOUNDED_CLAIM_SURVIVES>
- Verdict statement:
- Decisive reasons:
Tightest surviving claim:
Valid when:
- ...
Invalid when:
- ...
Candidate fatal pressures resolved:
- ...
Candidate decisive proof pressures resolved:
- ...
Confidence trail:
- Evidence:
- Counterpressure:
- Gaps:
Next tests:
- ...
Checkpoint:
...
Do not include an Auto Gauntlet control trailer. The host driver owns continuation.
Round Ledger, Knowledge Delta, Continuation Gate, or Checkpoint is invalid.
Status: COMPLETE, Next round: none, Verdict embargo: LIFTED_BY_ROUND_10, and Stop reason: ROUND_10_COMPLETE.
These cases define expected behavior and are part of the skill contract.
Input request: "Use prove-it on this claim: all swans are white."
Expected behavior when no driver marker is present:
PROVE_IT_REQUIRES_AUTOTURN_DRIVER.
Input claim: "All swans are white."
Expected round 1 behavior under the host driver:
Final verdict: DISPROVEN.
Verdict embargo: ACTIVE.
Action: AUTO_CONTINUE_TO_ROUND_2.
Expected round 10 behavior:
Input claim: "This algorithm is always optimal."
Expected behavior before round 10:
Input claim: "For every integer n, n + 0 = n."
Expected behavior:
Input request: "Do all ten rounds in one response."
Expected behavior:
PROVE_IT_REQUIRES_AUTOTURN_DRIVER.
Input request: "Run round 1 and wait for me."
Expected behavior:
PROVE_IT_REQUIRES_AUTOTURN_DRIVER.
Invalid before round 10:
Status: COMPLETE
Final verdict:
Oracle synthesis:
Terminal verdict: PROVEN
Terminal verdict: DISPROVEN
Action: STOP
Action: STOP_CONCLUSIVE_PROOF
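The list of markers that are invalid before round 10 lends itself to a simple line scan. The sketch below is one possible check, not the bundled validator: the function name is hypothetical, and matching each marker as a line prefix is an assumption about how strictly replies are formatted.

```python
FORBIDDEN_BEFORE_ROUND_10 = (
    "Status: COMPLETE",
    "Final verdict:",
    "Oracle synthesis:",
    "Terminal verdict: PROVEN",
    "Terminal verdict: DISPROVEN",
    "Action: STOP",
    "Action: STOP_CONCLUSIVE_PROOF",
)


def early_terminal_markers(round_number: int, reply: str) -> list:
    """Return forbidden markers found in a reply for rounds 1-9."""
    if round_number >= 10:
        return []  # round 10 is allowed to emit terminal output
    lines = [line.strip() for line in reply.splitlines()]
    return [
        marker
        for marker in FORBIDDEN_BEFORE_ROUND_10
        if any(line.startswith(marker) for line in lines)
    ]
```

Prefix matching keeps the legitimate gate line `Final verdict allowed: no` from tripping the `Final verdict:` marker, since the colon does not follow `verdict` directly there.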