MJ-skill/artifact-grounded-review/SKILL.md
Enforces that Claude (and Codex when applicable) must read actual code and artifacts before any claim — covers both REVIEW tasks (dual analysis) and AUTHORING tasks (writing briefs / specs that reference code state).
npx skillsauth add develata/deve-skills artifact-grounded-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This protocol applies to two complementary task families:
Any user-facing or artifact-persisted text that makes a claim about current code / data / manifest / artifact state:
All of the above trigger Rule 0 below.
When Claude feeds Codex a pre-digested summary instead of raw file paths, two failures occur:
Trigger: any user-facing claim OR artifact-persisted text making an assertion about current code / data / manifest / artifact state. Covers brief / spec / RFC AND analysis-question answers / progress reports / verdict explanations.
Principle: Before saying or writing "code is X" / "no code change to Y" / quoting a threshold / API contract / manifest fact / runtime behavior, the author MUST Read the relevant file(s) and cite file:line. Text-level reasoning over remembered facts (including the author's own prior text within this session) is insufficient.
Why: text-level reasoning systematically misses generator call order, manifest cardinality drift, API schema details, silent threshold shifts, and post-commit state changes. In review tasks codex catches these as "blocking spec bugs"; in direct user analysis, unverified claims become false answers the user acts on. Both failure modes were preventable at claim time.
How to apply (during drafting / answering):
[TODO-VERIFY: <file>] inline (briefs) OR explicitly disclosed as "from memory, may be stale" (user analysis) — they do NOT survive to finalized text without Read+citeTrigger calibration (RCA 2026-05-29, Claude+Codex cowork): Rule 0 fires by fact type and artifact / user-facing consequence, not by felt uncertainty. High-confidence convention facts (signatures, CLI flags, config keys, exact quotes, numbers) are still Read+cite facts when they enter durable text or a final answer; "I'm sure" is a reason to open the source, not to skip it. (You are most confident on convention-bound facts — exactly where a project-specific tail you cannot know hides.) Scope = Rule 0's Trigger + Counter-example list above; this is not a new trigger, it removes "I'm certain" as a valid reason to skip an existing one.
Calibration example (authoring): v2 brief said "drop _assign_envelopes_length_aware() from builder is the only change". Author should have Read r3_cohort_builder.py:292-305 to discover the function runs AFTER full-100 HP10 generation, meaning v2's "Group A 1:1 inheritance" requires reversing the entire generator call order. Codex round-2 caught this with read-only re-derivation showing 20/144 units would violate EOL constraint.
Calibration example (user analysis): user asks "现在 Component-C 输出的 envelope schema 是什么". Wrong: answer from memory of P2 v2 brief text. Right: Read component_c_io.py:119-122 first, then cite the actual schema with file:line.
Counter-example (where this rule does NOT apply):
CLAUDE.md / AGENTS.md LOCKED anchors — by-construction verified at lock time (but cite the anchor)Both Claude and Codex must read primary sources before forming any analytical conclusion.
When Claude dispatches Codex for any analytical task:
DO:
DO NOT:
/tmp/claude_pre_review_<ID>.md (even if brief). This creates an immutable record that prevents unconscious anchoring to Codex's framing. The pre-review file is referenced in the final report's evidence trail.dual-agent-original-request-review/SKILL.md § Standard Process step 5 — the pre-review file serves as Claude's independent baseline for the audit.Behavior claims need code evidence (file:line); result claims need artifact evidence (file:key). Script existence ≠ result production; design intent without executed artifact = 0 credit. Trail format and verification cost rules: see dual-agent-original-request-review/SKILL.md § Verification Discipline.
Before trusting any result artifact, verify it is not stale:
stat -f '%m' <artifact> vs git log -1 --format='%ct' -- <script>Staleness verdict:
STALE in the report; do not use for scoring or conclusions without re-running.assumption-dependent.Before emitting any score:
Codex dispatch template for scoring:
Read these files, then evaluate:
Source: [paths]
Artifacts: [paths]
Paper: [path]
Checklist: [specific verification questions]
Rubric: [dimensions]
Rules: cite file:key for every justification. Flag UNVERIFIED for unbacked claims.
Before forming interpretive conclusions:
Before diagnosing or judging:
For run-level errors in dispatched experiments. Composes Rules 1-4 above + dual-agent-original-request-review § Conflict Resolution.
Trigger: codex dispatch FAIL / .stall marker / artifact missing / acceptance gate FAIL / metric outside expected band / DISPATCH_STATUS=stalled.
Round 1 — Parallel independent RCA
Both parties read failing artifacts directly (Rule 1); each writes its own _rca_<side>.md BEFORE reading the other (Rule 3). Each RCA must include:
| Field | Content | |---|---| | Root cause | One-sentence smallest causal claim | | Evidence | path:line / artifact:key (Rule 4) | | Class | code-bug / data-corruption / spec-drift / dep-change / seed / env / external | | Cost | Minutes-to-hours remediation estimate | | Confidence | HIGH / MEDIUM / LOW |
Round 2 — Convergence (verdict labels per websearch-cowork § Stage 4):
_RCA_CONVERGE.md + remediation plan, exit coworkRound 3 — Focused re-investigation on disputed axis only
Both sides re-read the single artifact / line range that disambiguates the dispute. Do NOT expand scope.
Round 4 (arbitration) — Claude decides 1-shot
Per autonomous-mode governance (≤3 cowork rounds hard cap + Claude arbitration on convergence failure):
_rca_*.md outputs side by side + any additional evidence cited by either side_RCA_ARBITRATION.md with final verdict + reasoning + evidence citedDeliverables (one per error event):
<run_dir>/_rca_claude.md
<run_dir>/_rca_codex.md
<run_dir>/_RCA_CONVERGE.md (if R1/R2/R3 converges)
<run_dir>/_RCA_ARBITRATION.md (if Round 4 arbitration fires)
<run_dir>/_REMEDIATION_PLAN.md (always, after verdict)
Anti-patterns:
## Evidence Gaps (place first)
- [claim] — [missing artifact or code evidence]
## Dimension Scores
| Dimension | Claude | Codex | Converged | Evidence |
|---|---:|---:|---:|---|
## Overall
- Thesis: X/10
- Journal: X/10
## Integrity Risks
1. [risk + specific file:line or artifact:key]
## Evidence Gaps (place first)
- [claim] — [missing artifact or code evidence]
- [artifact] — staleness status: Fresh / Stale / Unknown
## Findings (ordered by impact)
| # | Finding | Evidence (file:line or artifact:key) | Confidence |
|---|---------|--------------------------------------|------------|
## Points of Agreement
- [both agents agree on X — shared evidence trail]
## Points of Divergence
- [Claude: X (evidence)] vs [Codex: Y (evidence)] → resolution or escalation
## Verdict-Affecting Claims Audit
| Claim | Executor status | Reviewer disposition | Trail |
|-------|----------------|---------------------|-------|
## Residual Unknowns
- [what remains unverified and why]
## Recommendation
- [actionable next step with justification]
tools
Collect and audit Codex token usage with a bundled Python CLI and optional Windows batch launchers. Use this skill when the user asks to check Codex token usage, generate daily token audit logs, calculate monthly CostUSD totals, review Codex spending, or run Codex token usage scripts on Windows, Bash, WSL, or Linux.
tools
Use when giving the user an INLINE reply that carries a trade-off, a decision, a verdict, or a non-trivial finding (decision brief / round verdict / failure root-cause). NOT for "done"/status confirmations, one-line answers, or pure data dumps. Forces a compact decision-brief shape and blocks internal tool-name / file-path bleed into user-facing text.
development
Use for cross-file or cross-chapter terminology audits and corpus-wide term unification in thesis/paper sources — extract candidate term drift, build a decision queue, classify each occurrence, apply accepted replacements safely, and verify counts/build. Trigger on "术语审计", "术语统一", "术语一致性", "逐词审", "这个词全文怎么用", "把 X 全文改成 Y", "terminology audit", or "unify term X". Do NOT use for ordinary prose drafting or a single known-location edit; use academic-writing for prose quality and claim-boundary judgment.
tools
Use for ANY codex CLI dispatch via dispatch wrapper (no time threshold; presence-of-risk triggers, not estimated wall — stdin-EOF stalls occur at <60s). Combines internal log-inactivity watchdog wrapper + external Claude-session cron probe + sentinel hook enforcement. Detects stalls in ≤60-270s vs hours-without-detection failure mode.