MJ-skill/claude-decision-discipline/SKILL.md
Use BEFORE proposing any multi-hour task, recommending compromise-vs-clean choice, citing a "named" rule/mechanism, or skipping plan iteration. Five guards against Claude's training-time biases that systematically override user-aligned principles. Triggered by proposal, recommendation, citation, or plan authoring.
npx skillsauth add develata/deve-skills claude-decision-disciplineInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Claude has documented biases that override user-aligned principles when not consciously checked:
| Bias | Symptom | Guard | |---|---|---| | Generate-strong / verify-weak asymmetry | Proposes without grep/arithmetic/artifact-scan | G1 V1-V4 | | Risk-aversion prior beats user long-term framing | "Save 3-7h via compromise" wins over clean design | G2 4-question | | Path-of-least-resistance shortcuts | sys.path widening / outside-fork imports / re-run instead of frozen pin | G3 long-term durability | | Self-authorization (paraphrase → "named rule") | "按 X 机制" where X is Claude's own framing | G4 no-self-auth | | Single-shot plan → missed gaps | codex finds 3 BRIEF_GAPS in Stage 3 Class B | G5 plan*do iteration |
Before ANY multi-hour sub-task / new source change / new cowork dispatch / new paper framing / new methodology choice, run all 4:
| Verify | Question | Fail action | |---|---|---| | V1 source-grep | Did I grep every file:line / function / constant I'm about to cite? | grep + verify, then propose | | V2 arithmetic close | Does the arithmetic actually support my "X% lift" / "Y hours saved" claim? | self-downgrade target | | V3 existing-artifact scan | Has this measurement/computation already been done in some run_dir / dataset / ladder probe? | cite existing, don't redo | | V4 paper-defense pre-mortem | What reviewer attack surface does this create? (simulation-only / single-seed / no real-data / LOCKED-CORE touch) | adjust framing or downgrade |
Any V fails → NOT propose. Fix first.
Trigger: when recommending between cleanest-design path (matches user instinct, preserves LOCKED-CORE, single-cohort/single-method) and compromise path (mixes cohorts/methods, partial-credit). Ask IN ORDER (not weighted-mixed):
The order is the rule. Permuting it re-introduces the risk-aversion bias. Q1+Q2 must weigh before Q3.
Skip for: tactical-only decisions (which fold first / which test to write) — no cleanness-axis differential.
Trigger amplifier: user instinct contradicting first-pass Claude recommendation = strongest signal Claude is running on training-prior. Stop, re-run 4-question in writing.
Architectural / dependency / integration trade-offs: pick cleanest principled option that preserves long-term archivability + isolation, NOT shortest path.
Apply:
legacy_fork/ over sys.path widening or direct cross-package import_paths.pyReject: "just import from outside" / "just patch over there" / "just add features ahead of empirical need".
Before citing any named rule / mechanism / protocol as decision authority, verify source:
CLAUDE.md / AGENTS.md / memory written by user instruction?If (3) → forbidden:
Counter-example: "按 paper-grade 防御机制" (Claude paraphrase) vs "你 instructed 过长期 clean; 第一性原理这一行 fix 风险接近零" (user's actual words + direct reasoning).
Each "按 X 机制 / 按 Y 协议 / 按 Z 规则" must pass source check. If unsure → drop the named reference, state principle directly.
Two modes based on task type:
Trigger: design-judgment-heavy task / plan with ambiguity / Spec or AGENTS.md condition cite involved.
Phase 1: Claude draft plan (NOT full code — plan only)
Phase 2: codex critique plan (independent BRIEF_GAPS check; no Claude anchor in prompt)
Phase 3: converge 1-3 rounds (≥4 rounds → escalate user)
Phase 4: coder by task type (codex implementer | Claude inline | Claude skeleton + codex fill)
Phase 5: independent review (Claude code → codex review; codex code → Claude review; NEVER cowork audit)
Skip for: trivial / pure measurement / lit review (use websearch-cowork) / patch-verify dry-run.
Trigger: binary / N-ary decision (Option A/B/C); selecting scope items; ACCEPT/REVISE/FAIL verdict on draft.
Round 1: Claude + codex independent analysis (brief lists options + decision dimensions; no mutual anchor)
→ verdict match? CONVERGE / Round 2
Round 2: each updates given other's Round 1 verdict + objections
→ verdict match? CONVERGE / escalate user (do NOT Round 3)
Convergence signal: both verdicts AGREE + evidence sources OVERLAP → high confidence. Verdicts agree but evidence disjoint → mid confidence (may be coincidence; Round 2 to verify).
Any guard fails → STOP. Fix first. "But I still think the direction is worth doing" = fail expression, not justification.
| Scenario | Skip | |---|---| | Trivial bug fix (1-line typo) | All 5 | | Single-line answer / Q&A no propose | G1, G5 | | User explicit instruction overriding | G2, G4 (user instinct already stated) | | Already-running task's micro-step | G1, G5 (gate is task boundary) |
When ambiguous → default APPLY (over-apply cost ~1-2 min; miss cost = hours of rework).
| Guard | Pre-correction (failed) | Post-correction |
|---|---|---|
| G1 V1 | Proposed "3-6h Oracle Bayes T3" without grep | grep gamma_sensor_pipeline.run_sensor_cycle → forward-only → withdraw or redirect |
| G2 4Q | Recommended Option C (~5h compromise) using E[cost] framing | Q1+Q2 favor STAGED_GO; Q3 sunk cost FLOOR not RANKING → CHOSEN STAGED_GO |
| G3 | "Just sys.path widen to a-sibling-data-repo/code/" | Fork into legacy_fork/; pay one-time isolation cost |
| G4 | "按 paper-grade 防御机制 sidecar 协议字面要求 audit" | "第一性原理 + 你 instructed 过长期 clean, 风险接近零, 直接 sign-off" |
| G5 | Stage 3 Class B brief solo draft → executed → 3 GAPS in output | Phase 2 codex critique caught 3 GAPS before execution |
dual-agent-original-request-review — Verification Discipline is the cowork-side mirror of G1artifact-grounded-review Rule 4 — "claims require artifacts" is G1 V1 enforcementcodex-orchestration § 2 — role assignment for G5 Phase 4 coder selectionuser-explanation-5step — Step 3 (risk-if-wrong) is G2 Q4 (failure-knowledge) operationalized for user-facing reportstools
Collect and audit Codex token usage with a bundled Python CLI and optional Windows batch launchers. Use this skill when the user asks to check Codex token usage, generate daily token audit logs, calculate monthly CostUSD totals, review Codex spending, or run Codex token usage scripts on Windows, Bash, WSL, or Linux.
tools
Use when giving the user an INLINE reply that carries a trade-off, a decision, a verdict, or a non-trivial finding (decision brief / round verdict / failure root-cause). NOT for "done"/status confirmations, one-line answers, or pure data dumps. Forces a compact decision-brief shape and blocks internal tool-name / file-path bleed into user-facing text.
development
Use for cross-file or cross-chapter terminology audits and corpus-wide term unification in thesis/paper sources — extract candidate term drift, build a decision queue, classify each occurrence, apply accepted replacements safely, and verify counts/build. Trigger on "术语审计", "术语统一", "术语一致性", "逐词审", "这个词全文怎么用", "把 X 全文改成 Y", "terminology audit", or "unify term X". Do NOT use for ordinary prose drafting or a single known-location edit; use academic-writing for prose quality and claim-boundary judgment.
tools
Use for ANY codex CLI dispatch via dispatch wrapper (no time threshold; presence-of-risk triggers, not estimated wall — stdin-EOF stalls occur at <60s). Combines internal log-inactivity watchdog wrapper + external Claude-session cron probe + sentinel hook enforcement. Detects stalls in ≤60-270s vs hours-without-detection failure mode.