codex/skills/context-bounded-verification/SKILL.md
Use for nontrivial code changes, refactors, bug fixes, PR reviews, AI-generated edits, blast-radius analysis, verification planning, regression tests, rollout/rollback, closure/readiness claims, implementation handoffs, or correctness under incomplete context/hidden constraints. Do not use for textual edits or trivial formatting unless risk analysis is requested. Alias: context-bound-verification.
Install this skill globally with one command:

    npx skillsauth add tkersey/dotfiles context-bounded-verification

Works with Claude Code, Cursor, and Windsurf.
Convert an unbounded software-correctness question into a bounded, reviewable, testable, and observable claim about the current artifact state.
The goal is not global proof. The goal is to reduce the verification gap by making scope, artifact state, assumptions, blast radius, evidence, validation, and residual risk explicit enough that a human can audit the next action.
Never claim arbitrary code is globally "correct", "safe", "ready", or "done". Make a bounded claim:
Given the current artifact state `[artifact_state_id]`, the inspected evidence `[evidence refs]`, the stated scope `[scope]`, and the checks actually run `[commands/results]`, this appears to satisfy `[specific property]`. Remaining uncertainty: `[specific gaps]`.
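The template above can be rendered from structured fields so that no slot is silently omitted. This is a minimal sketch: the `BoundedClaim` class and its field names are hypothetical illustrations, not part of the skill.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedClaim:
    """Hypothetical container for the slots of the bounded-claim template."""

    artifact_state_id: str
    evidence_refs: list[str]
    scope: str
    checks_run: list[str]
    property_claimed: str
    remaining_uncertainty: list[str]

    def render(self) -> str:
        # Refuse to render a claim that is not bound to an artifact state,
        # cites no evidence, or hides its residual uncertainty.
        if not self.artifact_state_id.strip():
            raise ValueError("claim must name the current artifact state")
        if not self.evidence_refs:
            raise ValueError("claim must cite inspected evidence")
        if not self.remaining_uncertainty:
            raise ValueError("claim must state remaining uncertainty")
        checks = "; ".join(self.checks_run) or "none run"
        return (
            f"Given the current artifact state {self.artifact_state_id}, "
            f"the inspected evidence {', '.join(self.evidence_refs)}, "
            f"the stated scope {self.scope}, "
            f"and the checks actually run ({checks}), "
            f"this appears to satisfy {self.property_claimed}. "
            f"Remaining uncertainty: {'; '.join(self.remaining_uncertainty)}."
        )
```

Raising on an empty uncertainty list is a deliberate choice in this sketch: a claim with no stated gaps reads like a global claim, which the rule above forbids.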
A claim is not actionable merely because it is plausible. A finding, fix, review comment, test result, subagent packet, or prior-session memory becomes actionable only when it is bound to the current artifact state and current workflow objective.
Always separate these dimensions before implementation, handoff, closure, or a pass/fail decision:
| Dimension | Meaning | Anti-laundering rule |
| --- | --- | --- |
| Claim validity | The claim may be true. | Validity alone does not authorize code changes or closure. |
| Actionability | The current workflow should act on the claim now. | Requires current-state evidence, scope fit, and a permitted route. |
| Evidence | Concrete artifact, command, test, diff, log, or manual inspection. | Reviewer authority, intuition, memory, and prior sessions are not proof by themselves. |
| Artifact state | The exact tree/diff/PR/checkout that was inspected. | Old-tree evidence cannot prove the delivered tree after edits. |
| Direction/scope fit | The claim advances the current objective and does not violate non-goals. | Locally correct work is blocked if it pursues the wrong objective. |
| Criticality/severity | The accepted materiality after evidence review. | Asserted severity, CAS labels, or reviewer pressure do not auto-escalate to implementation. |
| Proof/validation | Checks that can confirm or falsify the claim. | Green unrelated checks are not proof for the changed behavior. |
| Closure/readiness | A bounded decision that enough proof exists for the next step. | Closure is not actual correctness; it must state residual risk. |
| Material findings | Issues that can change correctness, safety, compatibility, or closure. | Preference-only or review-closure-only concerns do not become implementation work. |
| Root authority | Decisions owned by the primary agent. | Subagents provide evidence, not final permission. |
Classify before implementation, review closure, handoff, or broad validation.
Default to the lowest tier that honestly fits the evidence. Escalate when the change touches public contracts, persisted data, auth, billing, infrastructure, migrations, async jobs, scheduled work, irreversible behavior, customer-visible semantics, proof/certificate surfaces, generated artifacts, or cross-repo compatibility.
| Tier | Shape | Required output |
| --- | --- | --- |
| Tier 0: Mechanical | Formatting, comment cleanup, compiler-supported rename, dead-code removal with no behavioral surface. | One compact verification note. No full packet unless closure/handoff is requested. |
| Tier 1: Local behavioral | Small bug fix, single-function behavior, isolated UI/parser logic, narrow test repair. | Compact Verification Note; include callers/tests checked and residual risk. |
| Tier 2: System-affecting | API/schema/serialization/proof/generator/checker/queue/cache/cron/authz/billing-adjacent/runtime behavior, broad PR review, readiness/merge claims. | Full Verification Packet and mechanical gate before closure or implementation handoff. |
| Tier 3: High-risk or irreversible | Security controls, permissions, data deletion, privacy/compliance, payments, irreversible migrations, production infra, public contract breakage. | Plan first. Require explicit authorization before destructive or production-affecting implementation. Full packet plus owner questions/approval points. |
Tier is root-owned. A subagent may recommend escalation or de-escalation, but the root must declare the final tier and rationale.
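As a rough illustration of "lowest tier that honestly fits, escalate on touched surfaces", the sketch below maps a set of touched surfaces to a recommended tier. The surface labels and the `recommend_tier` function are hypothetical, loosely derived from the tier table. The result is only a recommendation, since the root still declares the final tier and rationale.

```python
from __future__ import annotations

# Hypothetical surface labels, loosely following the tier table.
TIER3_SURFACES = {
    "security-control", "permissions", "data-deletion", "privacy-compliance",
    "payments", "irreversible-migration", "production-infra",
    "public-contract-breakage",
}
TIER2_SURFACES = {
    "api", "schema", "serialization", "proof", "generator", "checker",
    "queue", "cache", "cron", "authz", "billing-adjacent",
}


def recommend_tier(touched_surfaces: set[str], changes_behavior: bool) -> str:
    """Recommend the lowest tier that still covers every touched surface."""
    if touched_surfaces & TIER3_SURFACES:
        return "tier3"
    if touched_surfaces & TIER2_SURFACES:
        return "tier2"
    # No escalating surface: distinguish local behavior from mechanical edits.
    return "tier1" if changes_behavior else "tier0"
```

For example, `recommend_tier({"cache"}, changes_behavior=True)` yields `"tier2"`, while a pure formatting change with no touched surfaces yields `"tier0"`.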
When this skill is active:
The root agent owns and cannot delegate:
Subagents may supply bounded evidence packets. A subagent packet can clear a narrow uncertainty, veto a route, or mark an issue unresolved. It cannot authorize implementation, broaden scope, close the task, or waive missing proof.
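One way to keep that boundary mechanical is to accept only the three permitted effects and reject everything else. This is a sketch under assumptions: the effect labels and the `accept_subagent_effect` function are hypothetical names chosen for illustration.

```python
from enum import Enum


class PacketEffect(Enum):
    """The only effects a subagent packet may have (hypothetical labels)."""

    CLEARS_UNCERTAINTY = "clears-uncertainty"
    VETOES_ROUTE = "vetoes-route"
    UNRESOLVED = "unresolved"


# Effects reserved for the root agent (hypothetical labels).
ROOT_ONLY_EFFECTS = {
    "authorize-implementation", "broaden-scope", "close-task", "waive-proof",
}


def accept_subagent_effect(requested: str) -> PacketEffect:
    """Return the permitted effect, or raise if the packet oversteps."""
    if requested in ROOT_ONLY_EFFECTS:
        raise PermissionError(f"subagent packets cannot {requested}")
    # Enum lookup by value raises ValueError for any unknown effect.
    return PacketEffect(requested)
```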
Use read-only subagents only when parallelization or authority separation materially improves the decision. Do not spawn generic reviewers. Use bounded roles:
- `context_evidence_authority`: confirms whether claims are grounded in current artifacts and current evidence.
- `context_scope_direction_authority`: checks current objective fit, non-goals, and stale/wrong-objective plan pressure.
- `context_blast_radius_authority`: maps contract, operational, downstream, and rollback surfaces.
- `context_closure_authority`: audits proof adequacy, residual risk, and closure/readiness consistency.

Fanout is recommended for Tier 2/Tier 3 work when any of these are true:
Fanout is not needed for routine Tier 0/Tier 1 work with direct evidence and low residual risk. If fanout is used, the final packet must include each accepted subagent packet and all vetoes/unresolved items.
Before editing or declaring readiness, establish:

    verification_preflight:
      packet_version: CBV-GATE-v1
      mode: implementation | review | closure | verification | validation-only | no-change | handoff | audit
      artifact_state_id: "commit/diff/checksum/PR/head/base or explicit current-tree label"
      current_workflow_objective: "..."
      semantic_change: "..."
      tier: tier0 | tier1 | tier2 | tier3
      scope_fit: aligned | partial | conflicting | unknown
      proof_route: tests | typecheck | build | manual-diff | runtime-observation | validation-only | blocked
      gate_required: yes | no

gate_required is yes for Tier 2/Tier 3, for any implementation handoff, for closure/readiness/pass claims, for review findings promoted into changes, and whenever current-state evidence is materially contested.
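The `gate_required` rule can be written as a single predicate. The sketch below is illustrative only: the function signature and flag names are assumptions, and treating the `handoff` and `closure` modes as implying a handoff or closure claim is an interpretation of the rule rather than something the skill states.

```python
def gate_required(
    tier: str,
    mode: str,
    claims_closure: bool = False,
    promotes_review_finding: bool = False,
    evidence_contested: bool = False,
) -> bool:
    """Return True when the mechanical gate must run before proceeding."""
    if tier not in {"tier0", "tier1", "tier2", "tier3"}:
        raise ValueError(f"unknown tier: {tier!r}")
    return (
        tier in {"tier2", "tier3"}
        # Assumption: these two modes always carry a handoff or closure claim.
        or mode in {"handoff", "closure"}
        or claims_closure
        or promotes_review_finding
        or evidence_contested
    )
```

Any single trigger is sufficient, so a Tier 0 review still requires the gate once a finding is promoted into a change.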
Inspect the surrounding implementation before changing or closing:
Use repository evidence (rg, language references, package scripts, test discovery, diff/commit/PR inspection) over prompt-only assumptions.
For each proposed claim or change, check affected surfaces:
If blast radius is unknown and material, do not close as pass. Narrow, validate, or block.
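A small sketch of that rule, using the surface `status` values from the packet format (`checked`, `not-applicable`, `unchecked`, `unknown`). The `material` key and the `surfaces_blocking_pass` function are hypothetical. A surface with no `material` flag is treated as material, which is the conservative default.

```python
from __future__ import annotations


def surfaces_blocking_pass(surfaces: list[dict]) -> list[str]:
    """Return names of material surfaces that prevent closing as pass."""
    return [
        surface["name"]
        for surface in surfaces
        # Hypothetical "material" flag; missing means material.
        if surface.get("material", True)
        and surface["status"] in {"unchecked", "unknown"}
    ]
```

An empty result does not prove the change is safe. It only means this one rule does not forbid a pass.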
Look for constraints not encoded in the local code path:
If organizational context is unavailable, list human questions rather than inventing answers.
Allowed routes:
| Route | Use when | Required evidence |
| --- | --- | --- |
| implement | A current-state, in-scope defect/change is actionable now. | Current artifact evidence, scope fit, proof route, no blocking veto. |
| validate-only | Claim is plausible but not yet actionable. | Specific falsifiable validation step and expected decision delta. |
| proof-only | Artifact already changed or issue is already fixed, but closure proof is needed. | Current-state proof target and no mutation agenda. |
| no-change | Concern is unsupported, out of scope, stale, preference-only, or defeated by current artifacts. | Evidence or explicit missing-evidence rationale. |
| defer | Valid concern belongs outside current objective or requires owner decision. | Owner/condition needed before action. |
| blocked | Safe action cannot proceed without missing context, authorization, or proof. | Concrete blocker and required unblock condition. |
Do not use a valid concern to launder a wrong fix, broad refactor, unrelated cleanup, or premature closure.
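The route table can be treated as a lookup from route to required evidence, so a route is selectable only when nothing is missing. This is a sketch: the evidence keys are hypothetical shorthand for the table's "Required evidence" column, and `missing_evidence` is an illustrative helper.

```python
from __future__ import annotations

# Hypothetical shorthand keys paraphrasing the route table.
REQUIRED_EVIDENCE = {
    "implement": {
        "current-artifact-evidence", "scope-fit", "proof-route", "no-blocking-veto",
    },
    "validate-only": {"falsifiable-validation-step", "expected-decision-delta"},
    "proof-only": {"current-state-proof-target", "no-mutation-agenda"},
    "no-change": {"evidence-or-missing-evidence-rationale"},
    "defer": {"owner-or-condition"},
    "blocked": {"concrete-blocker", "unblock-condition"},
}


def missing_evidence(route: str, provided: set[str]) -> set[str]:
    """Return the evidence still needed before the route may be taken."""
    if route not in REQUIRED_EVIDENCE:
        raise ValueError(f"not an allowed route: {route!r}")
    return REQUIRED_EVIDENCE[route] - provided
```

For instance, a valid concern with scope fit but no proof route leaves `implement` unavailable, which is the point of the rule above: validity alone does not select the route.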
While editing:
git diff before finalizing;

For generated artifacts, certificates, source maps, quota counters, policy switches, allowlists, capability gates, and verifier/checker changes, verify the exact evidence surface carried by the delivered artifact. Presence-only witnesses, caller-provided override hashes, unrelated summary counts, broad allow flags, or cardinality-only checks are not proof of identity, ownership, route position, target coordinate, or policy support.
When verifier strength is the finding, a green suite is insufficient by itself. Re-read the changed predicate and add or identify a negative fixture that would fail the too-weak version.
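To make the negative-fixture idea concrete, the sketch below contrasts a presence-only check with one that verifies the digest actually carried by the artifact. All names (`weak_check`, `strict_check`, the `sha256` key, the sample payload) are hypothetical.

```python
import hashlib


def weak_check(artifact: dict) -> bool:
    # Too weak: a presence-only witness. Any value under the key passes.
    return "sha256" in artifact


def strict_check(artifact: dict, payload: bytes) -> bool:
    # Compares the carried digest against the delivered payload itself.
    return artifact.get("sha256") == hashlib.sha256(payload).hexdigest()


PAYLOAD = b"generated output"
GOOD = {"sha256": hashlib.sha256(PAYLOAD).hexdigest()}
# Negative fixture: the field is present but does not match the payload.
TAMPERED = {"sha256": "0" * 64}

assert weak_check(TAMPERED)                  # the weak version is fooled
assert not strict_check(TAMPERED, PAYLOAD)   # the strict version rejects it
assert strict_check(GOOD, PAYLOAD)
```

A suite containing only `GOOD` would be green for both predicates. The `TAMPERED` fixture is what distinguishes them.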
Use the strongest practical evidence available:
If tests cannot be run, state exactly why and downgrade closure to validation-only, defer, or blocked unless manual evidence is genuinely sufficient for the tier.
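The downgrade rule might be sketched as follows. The function name and parameters are hypothetical. The readiness labels come from the packet format, and returning `pass-with-residual-risk` when manual evidence suffices is an assumption of this sketch.

```python
ALLOWED_DOWNGRADES = {"validate-only", "defer", "blocked"}


def readiness_without_tests(
    reason: str,
    manual_evidence_sufficient: bool,
    downgrade: str = "blocked",
) -> str:
    """Pick a readiness value when the test suite could not be run."""
    if not reason.strip():
        # The rule requires stating exactly why tests were not run.
        raise ValueError("state why tests could not be run")
    if downgrade not in ALLOWED_DOWNGRADES:
        raise ValueError(f"not a permitted downgrade: {downgrade!r}")
    if manual_evidence_sufficient:
        # Assumption: untested closure still carries residual risk.
        return "pass-with-residual-risk"
    return downgrade
```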
For Tier 2/Tier 3, implementation handoff, closure/readiness/pass, or contested evidence, emit or save a verification_packet and run:
python codex/skills/context-bounded-verification/tools/context_verification_gate.py path/to/packet.md
The checker validates output shape, current-state binding, evidence/actionability separation, direction fit, blast-radius coverage, validation proof, authority packet handling, closure consistency, and handoff safety. It cannot prove semantic correctness; it blocks laundering weak or stale claims into pass, implementation, or handoff.
Verified: checked current diff `[artifact_state_id]`; no behavioral files changed. Remaining risk: low, limited to review oversight.
## Compact Verification Note
- Artifact state: `[current tree/diff/commit]`
- Intended change: `[specific behavior]`
- Scope checked: `[files/callers/tests]`
- Verification performed: `[commands/tests/manual checks]`
- Actionability: `implement | validate-only | proof-only | no-change | defer | blocked`
- Remaining risk: `[specific residual risk]`
Use this shape for gated outputs. YAML or JSON is acceptable.

    verification_packet:
      packet_version: CBV-GATE-v1
      skill: context-bounded-verification
      mode: implementation | review | closure | verification | validation-only | no-change | handoff | audit
      objective:
        current_workflow_objective: "..."
        semantic_change: "..."
        done_condition: "..."
      artifact_state:
        state_id: "commit/diff/checksum/PR/head/base/current-tree label"
        source: current-tree | current-diff | pull-request | supplied-snippet | prior-session | unknown
        freshness: current | stale | superseded | unknown
        dirty_state: clean | dirty | unknown | not-applicable
        evidence_refs: []
      tier:
        declared: tier0 | tier1 | tier2 | tier3
        drivers: []
        rationale: "..."
      scope:
        in_scope: []
        out_of_scope: []
        must_not_change: []
      direction_fit:
        current_objective_fit: aligned | partial | conflicting | unknown
        direction_source: user-current-instruction | plan | pr-body | issue | repo-convention | current-artifact | unknown
        stale_or_wrong_objective_pressure: []
      evidence_ledger:
        - id: E1
          claim: "..."
          claim_type: validity | actionability | scope | risk | proof | closure
          evidence_kind: current-artifact | current-diff | current-test | current-ci | current-command | manual-inspection | runtime-observation | prior-session | memory | reviewer-claim | none
          evidence_ref: "file:line, command, test, log, diff, or concrete artifact ref"
          artifact_state_match: yes | no | unknown
          supports: yes | partial | no | unknown
          actionability: implement | validate-only | proof-only | no-change | defer | blocked | closure-pass | none
      blast_radius:
        surfaces_checked:
          - name: "public-api | schema | serialization | generated-artifact | auth | billing | queue | cache | rollout | rollback | ..."
            status: checked | not-applicable | unchecked | unknown
            evidence_ref: "..."
        unchecked_material_surfaces: []
      validation:
        commands:
          - command: "..."
            result: pass | fail | not-run | skipped
            evidence_ref: "..."
            artifact_state_match: yes | no | unknown
        tests_added_or_updated: []
        negative_or_counterexample_checks: []
        proof_surface_changed: yes | no | unknown
        test_gap_reason: "..."
      authority:
        root_owned:
          - tier-decision
          - scope-boundary
          - artifact-state-acceptance
          - final-readiness
          - implementation-authorization
          - handoff-routing
        fanout:
          required: yes | no
          reason: "..."
        subagent_packets: []
      closure:
        readiness: pass | pass-with-residual-risk | validate-only | proof-only | no-change | defer | blocked
        closure_claim: "..."
        blockers: []
        remaining_risk: []
        next_action: "..."
      handoff:
        allowed: yes | no
        target: none | accretive-implementer | fixed-point-driver | review-adjudication | verification-closure | human-owner | other
        agenda: []
        must_not_do: []

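To illustrate how a packet in this shape could be screened for laundering, the sketch below applies three rules stated elsewhere in this skill to an already-parsed packet (the mapping under `verification_packet`). It is not the `context_verification_gate.py` checker, whose actual logic is not shown here. `find_laundering` and its messages are hypothetical.

```python
from __future__ import annotations

# Evidence kinds that are not proof by themselves.
WEAK_KINDS = {"prior-session", "memory", "reviewer-claim", "none"}
# Actionability values that authorize a change or a pass.
ACTING = {"implement", "closure-pass"}


def find_laundering(packet: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for entry in packet.get("evidence_ledger", []):
        if entry.get("actionability") not in ACTING:
            continue
        entry_id = entry.get("id", "?")
        if entry.get("evidence_kind") in WEAK_KINDS:
            problems.append(f"{entry_id}: weak evidence kind used to act")
        if entry.get("artifact_state_match") != "yes":
            problems.append(f"{entry_id}: not bound to current artifact state")
    closure = packet.get("closure", {})
    if str(closure.get("readiness", "")).startswith("pass"):
        if closure.get("blockers"):
            problems.append("closure: pass declared while blockers are listed")
        blast = packet.get("blast_radius", {})
        if blast.get("unchecked_material_surfaces"):
            problems.append("closure: pass declared with unchecked material surfaces")
    return problems
```

The function takes a plain `dict` so that it works the same whether the packet was written as YAML or JSON. Parsing is left to the caller.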
When reviewing an existing diff or PR, do not rewrite immediately. First produce:
Focus review comments on semantic risk, proof weakness, and contract drift, not style preferences. A finding may be material but still validate-only, proof-only, no-change, defer, or blocked.
Before this is safe to merge, a human owner should answer:
1. Does any downstream system depend on [contract/format/side effect]?
2. Is [odd existing behavior] intentional or historical accident?
3. Are there deployment windows, customer commitments, or compliance rules for this path?
4. Are staging fixtures representative of production for [edge case]?
5. Who owns rollback if [failure mode] appears after deploy?
Pause, block, or provide only a plan when the requested action would:
A successful use leaves the user with: