skills/skills-codex/citation-audit/SKILL.md
Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports — catching hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations. Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.
npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep citation-auditInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Verify every \cite{...} in a paper against three independent layers:
This skill is the fourth layer of \aris{}'s evidence-and-claim assurance, complementing experiment-audit (code), result-to-claim (science verdict), and paper-claim-audit (numerical claims). Together they form a bottom-up integrity stack from raw evaluation code to manuscript bibliography.
Run before submission. The right gating point is:
paper-write has produced the LaTeX draft and bib filepaper-claim-audit has verified numerical claimspaper-compile for submissionDo not run this on a half-written draft — most of the work is in cross-checking each \cite against context, which is wasted on placeholder text.
The dangerous citation problems are not wildly fake citations — those are easy to spot. The dangerous ones are:
gpt-5.5 — Used via Codex MCP. Default for cross-model review with web access.fresh — Each audit run uses a new reviewer thread (REVIEWER_BIAS_GUARD). Continue only with send_input when explicitly resuming the same audit.CITATION_AUDIT.md — Human-readable per-entry verdict report.CITATION_AUDIT.json — Machine-readable verdict ledger consumable by downstream tools.false — When true (set via — soft-only / — soft_only flag), the audit runs all three layers normally but forbids any .bib file mutation. Findings that would otherwise mutate the bib (FIX / REPLACE / REMOVE) are translated into per-occurrence sentence-rewrite proposals against the citing *.tex files. Used by /resubmit-pipeline Phase 1 to honor the user's hard "freeze the bib" constraint.true (default), auto-render CITATION_AUDIT.md to HTML after writing the report. Uses full review gate (audit-class artifact). Set false to skip, or pass — render html: false. Non-blocking: failures don't invalidate the audit verdict.Locate:
references.bib (or paper.bib / similar) under the paper directory*.tex files containing \cite{...} calls (typically sec/ or sections/)If multiple bib files exist, audit each separately.
For each \cite{key1,key2,...} invocation in the paper:
Output a flat list of (key, file, line, surrounding_sentence) tuples.
Also build the inverse: for each bib entry, the list of all places it is cited.
Define two protocol sets used throughout the rest of the workflow: cited_keys is the set of unique cite keys appearing in any \cite{...} invocation across the audited *.tex files (de-duplicated), and bib_keys is the set of keys parsed from the audited bib file(s). cited_keys drives Step 3 (audit only cited entries); bib_keys \ cited_keys is the uncited residual surfaced by the --uncited opt-in.
If the user passed --uncited, also compute the set difference bib_keys \ cited_keys here and stash it for use in Steps 5 and the JSON aggregation; see "Uncited Entry Detection (opt-in)" below for the protocol. The set-diff is a string operation only and does not consume reviewer budget.
Save the extracted contexts to paper/.aris/citation-audit/contexts.txt so the reviewer can read it directly. Use the paper-dir-relative path .aris/citation-audit/contexts.txt when recording the file in audited_input_hashes; do not stage under /tmp or other transient locations that the verifier cannot rehash later.
For each cited bib entry — i.e., each key in cited_keys with at least one extracted citation context — launch a fresh Codex reviewer agent. Do not reuse the same reviewer across entries. Do not spawn an agent for entries in bib_keys \ cited_keys; those are detect-only and surface only when --uncited is explicitly enabled (see "Uncited Entry Detection" below).
spawn_agent:
model: gpt-5.5
reasoning_effort: xhigh
message: |
You are auditing a bibliographic entry. Use web/DBLP/arXiv search.
## Bib entry
@article{key2024example,
author = {...}, title = {...}, journal = {...}, year = {...}, ...
}
## Where this entry is cited in the paper
[paste extracted contexts]
For this entry, verify:
1. EXISTENCE: does this paper exist at the claimed arXiv ID / DOI / venue?
Output: YES / NO / UNCERTAIN, with the verifying URL.
2. METADATA: are author names, year, venue, title correct?
For each, output: correct / wrong: should be ... / typo: ...
3. CONTEXT: for each use, does the cited paper actually support the surrounding claim?
Output per-use: SUPPORTS / WEAK / WRONG, with one-sentence reasoning.
VERDICT: KEEP / FIX / REPLACE / REMOVE
- KEEP: entry is clean, all uses are appropriate
- FIX: metadata needs correction; uses are appropriate
- REPLACE: cite is wrong-context, find a different paper that actually supports the claim
- REMOVE: entry is hallucinated or unsupportable
Be honest. If you cannot verify online, say UNCERTAIN; do not guess.
Save the response to .aris/traces/citation-audit/<date>_runNN/<key>.md per the review-tracing protocol.
Build CITATION_AUDIT.json following the schema defined in "Submission
Artifact Emission" below (single authoritative schema for this file).
Per-entry ledger data goes under details.per_entry, not under a
top-level entries field. The top-level verdict is a single overall
value (PASS / WARN / FAIL / NOT_APPLICABLE / BLOCKED / ERROR) derived
from per-entry verdicts per the decision table in "Submission Artifact
Emission"; the top-level summary is a one-line human-readable string.
Concretely, details carries the per-entry ledger:
"details": {
"total_entries": 29,
"counts": { "KEEP": 11, "FIX": 14, "REPLACE": 3, "REMOVE": 1 },
"per_entry": [
{
"key": "lu2024aiscientist",
"verdict": "KEEP",
"axis_failures": [],
"uses": [
{"file": "sections/1.intro.tex", "line": 11, "verdict": "SUPPORTS"},
{"file": "sections/6.related.tex", "line": 8, "verdict": "SUPPORTS"}
]
},
{
"key": "madaan2023selfrefine",
"verdict": "FIX",
"axis_failures": ["CONTEXT"],
"uses": [
{"file": "sections/2.overview.tex", "line": 42, "verdict": "WRONG",
"note": "Self-Refine demonstrates iterative improvement, not correlated errors"},
{"file": "sections/6.related.tex", "line": 13, "verdict": "SUPPORTS"}
]
}
]
}
See "Submission Artifact Emission" for the full artifact (top-level
fields audit_skill, verdict, reason_code, summary,
audited_input_hashes, trace_path, thread_id, reviewer_model,
reviewer_reasoning, generated_at, details).
Write CITATION_AUDIT.md:
# Citation Audit Report
**Date**: 2026-04-19
**Bib file(s)**: references.bib
**Total entries**: 29
## Summary
| Verdict | Count |
|---------|------|
| KEEP | 11 |
| FIX | 14 |
| REPLACE | 3 |
| REMOVE | 1 |
## Priority Fixes (CRITICAL — apply before submission)
### REMOVE: anon2025placeholder
- Author listed as "Anonymous" — canonical record exists with real authors and full title
- Title is incomplete
- ACTION: Replace key with the canonical citekey, update authors and title
### REPLACE-CONTEXT: example2023priorwork in sec/2.overview.tex:42
- Cited to support a specific technical claim
- The cited paper actually demonstrates a different (related but distinct) phenomenon
- ACTION: Rewrite the sentence; cite the prior work for what it actually establishes
[... continues for each entry ...]
## All-Clean Entries (no action needed)
[list of KEEP keys]
When --uncited is set, append the following section after "All-Clean Entries":
## Uncited Entries (opt-in)
The following bib entries are present in the audited bib file(s) but are not referenced by any `\cite{...}` in the paper body:
- `author2010example` — suggestion: prune (uncited; no local evidence of intent)
- `someone2015othercite` — suggestion: prune (uncited; no local evidence of intent)
- `third2024todo` — suggestion: check (a `% TODO: cite third2024todo` comment was found in `sections/3.related.tex`)
This section is detect-only; it does not change the top-level verdict.
For each FIX/REPLACE/REMOVE verdict, prompt the user:
Fix [key]?
Change: <description of change>
Files affected: references.bib + sec/X.tex:Y
[Apply / Skip / Defer]
If AUTO_APPLY = true, apply all FIX-level changes (metadata corrections only). REPLACE and REMOVE always require human approval — they involve content changes.
latexmk -C && latexmk -pdf -interaction=nonstopmode main.tex
Confirm:
Citation undefined warningsReference undefined warningsDefault: disabled. Existing users see no behavior change — only \cite{...} keys are audited, and bib entries with no \cite reference in the manuscript are silently ignored.
Opt-in: pass --uncited on invocation. The skill then performs a set-diff after Step 2 and reports bib entries that appear in any audited bib file(s) but are not cited anywhere in the paper. Detect-only — uncited entries are not sent to the reviewer agent, so there is no extra reviewer/web-lookup cost.
This skill's headline output is the three-axis audit on cited entries. Surfacing uncited bib entries by default would (a) change long-form output for every existing run, and (b) noise up the verdict for users who intentionally maintain a superset bib file (e.g., shared lab bib, in-progress section reorder where the cite has been removed but the entry intentionally retained). The flag preserves zero behavior change for existing callers.
When --uncited is set:
CITATION_AUDIT.md gains a ## Uncited Entries (opt-in) section listing the keys with a one-line suggestion each: prune (entry is dead weight; recommend deleting) or check (entry might be intentional; flag for user review). Default suggestion is prune; only emit check when there is concrete local evidence (e.g., a TODO comment in a .tex file mentioning the key, or a recently removed \cite visible in git diff). Do not infer intent from the bib key string alone.CITATION_AUDIT.json details gains an uncited_entries array; see "Submission Artifact Emission" below for the schema.verdict is unchanged: uncited entries do not upgrade or downgrade the PASS / WARN / FAIL / etc. classification. The reason_code and summary are likewise unchanged in shape; only the details.uncited_entries field appears.paper-writing Phase 6, verify_paper_audits.sh) MUST NOT treat the presence of uncited_entries as a blocking signal.If --uncited is enabled but full bib-key enumeration fails (e.g., malformed bib syntax that the parser cannot recover), the cited-entry audit must still proceed if at all possible. In that case:
verdict, reason_code, or summary.details.uncited_entries as an empty array [].details.uncited_entries_status: "unavailable" plus a one-line note explaining why (e.g., "bib parser could not enumerate keys; cited-entry audit completed normally").unavailable the same as the field being absent: not blocking.If the bib file cannot be read well enough to audit even the cited entries, fall back to the existing BLOCKED / bib_unreadable path defined in the verdict decision table; this is the same behavior as the no-flag default.
CITATION_AUDIT.json with a verdict; the decision to block finalization lives in paper-writing Phase 6 + verify_paper_audits.sh, driven by the assurance level. See "Submission Artifact Emission" below.--uncited--soft-only, citation-audit emits text-rewrite proposals only; bib files are never mutated regardless of finding severity. The audit semantics (existence + metadata + context) and the per-entry KEEP/FIX/REPLACE/REMOVE ledger are preserved verbatim; only the action layer is translated to per-occurrence sentence rewrites in the citing *.tex files. Refuse any downstream-proposed bib edit while --soft-only is set.| Skill | What it audits | What it catches |
|-------|---------------|-----------------|
| /experiment-audit | Evaluation code | Fake ground truth, self-normalized scores, phantom results |
| /result-to-claim | Result-to-claim mapping | Claims unsupported by evidence |
| /paper-claim-audit | Numerical claims in manuscript | Number inflation, best-seed cherry-pick, config mismatch |
| /citation-audit | Bibliographic entries | Hallucinated refs, wrong-context citations, metadata errors |
Together: code → result → numerical claim → cited claim. Each layer has cross-family review with no executor in the validator path.
and others are conventional and not flagged unless the truncation hides a co-author the user explicitly cares about.After each reviewer agent call, save the trace following shared-references/review-tracing.md (Policy C — forensic; never silently skip). Use save_trace.sh (resolved per the chain in shared-references/integration-contract.md §2) or write files directly to .aris/traces/citation-audit/<date>_run<NN>/. Respect the --- trace: parameter (default: full).
CITATION_AUDIT.md (human-readable report) at paper rootCITATION_AUDIT.json (machine-readable ledger; schema below) at paper root.aris/traces/citation-audit/<date>_runNN/ (per-entry review traces)references.bib + sec/*.tex (with --apply flag)details.uncited_entries field in JSON + ## Uncited Entries (opt-in) MD section (with --uncited flag; field absent and section omitted when flag is unset)CITATION_AUDIT.html (when RENDER_HTML = true, default) — auto-rendered single-file HTML view via /render-html "CITATION_AUDIT.md" --json "CITATION_AUDIT.json". Full review gate. Sidecar .review.json carries render-fidelity verdict. Non-blocking: failures are logged and the audit MD + JSON remain the canonical outputs.Default: disabled. The audit emits the standard KEEP / FIX / REPLACE / REMOVE per-entry verdicts and a downstream caller (or the --apply path of Step 6) is free to mutate the bib.
Opt-in: pass — soft-only (also accepts — soft_only) on invocation. This mode is designed for callers — notably /resubmit-pipeline Phase 1 — that operate under a hard "freeze the bib" constraint: if a citation is wrong-context, soften the surrounding sentence; do not change, add, or remove the cite itself.
The audit semantics are unchanged: existence + metadata + context-appropriateness checks all run, the reviewer is still invoked once per cited entry, and the per-entry KEEP/FIX/REPLACE/REMOVE verdicts are still computed and emitted exactly as in default mode. Only the action layer changes — soft-only translates each base verdict into a text-rewrite proposal instead of a bib mutation.
| Base verdict | Soft-only translation | Notes |
|---|---|---|
| KEEP | keep_unchanged | No action. Cite + sentence are both fine. |
| FIX (metadata wrong) | keep_metadata_drift_acknowledged | Bib stays as-is. Flag for human review at submission time. Append note: "metadata drift detected but not fixed under --soft-only". |
| REPLACE (wrong-context cite) | soften_citing_sentence | Per-occurrence sentence-rewrite proposal. For each \cite{X} in the body, locate the surrounding sentence and propose a softened version that does not claim what X actually establishes. |
| REMOVE (cite refers to nonexistent paper — i.e., hallucinated citation) | drop_cite_in_body_only | The bib entry is left untouched (per the --soft-only invariant), but the inline \cite{X} references in the body MUST be removed and the surrounding sentence rewritten so it no longer relies on a nonexistent paper. Two sub-strategies: (a) drop the inline \cite{X} entirely and rephrase the sentence; (b) re-attribute to a different in-bib source that genuinely supports the claim. Never leave a \cite{X} to a hallucinated paper in the body. The bib entry itself stays (uncited entries are harmless and surfaced separately by uncited-detection). |
--soft-only)When the flag is set, the standard top-level fields (audit_skill, verdict, reason_code, summary, etc.) and the existing details.per_entry ledger are emitted exactly as in default mode. In addition:
soft_only_mode: true boolean is added.details gains a soft_only_actions array — one entry per audited bib key, in the same order as details.per_entry.{
"audit_skill": "citation-audit",
"verdict": "...",
"soft_only_mode": true,
"details": {
"soft_only_actions": [
{
"citekey": "smith2023example",
"base_verdict": "REPLACE",
"soft_action": "soften_citing_sentence",
"occurrences": [
{
"file": "sec/3.method.tex",
"line": 142,
"current_sentence": "Smith et al. [2023] proves a generic result that...",
"proposed_rewrite": "Smith et al. [2023] discusses a related setting; while not directly applicable, the framing motivates...",
"rationale": "Original sentence claims smith2023 'proves' a result, but smith2023 actually only conjectures it. Softened to 'discusses ... motivates' to remove the unsupported claim."
}
]
}
]
}
}
soft_action is one of keep_unchanged | keep_metadata_drift_acknowledged | soften_citing_sentence | drop_cite_in_body_only. For keep_unchanged and keep_metadata_drift_acknowledged, occurrences MAY be omitted or emitted as []. For soften_citing_sentence and drop_cite_in_body_only, occurrences MUST list one object per \cite{X} site in the body that triggered the verdict.
For drop_cite_in_body_only, the proposed_rewrite field shows the sentence with the inline \cite{X} removed (or replaced by a \cite{Y} to an alternate in-bib source). The bib_entry_action field is fixed to "leave_as_is_per_soft_only" — the bib record itself is never modified by the audit.
CITATION_AUDIT.md gains a new section ## Soft-Only Rewrites (— soft-only mode) listing each occurrence with the proposed sentence rewrite for human approval. Example:
## Soft-Only Rewrites (— soft-only mode)
The bib file is frozen. The following sentence rewrites are proposed in lieu of bib edits.
### `smith2023example` — base verdict REPLACE → `soften_citing_sentence`
- **File**: `sec/3.method.tex:142`
- **Current**: "Smith et al. [2023] proves a generic result that..."
- **Proposed**: "Smith et al. [2023] discusses a related setting; while not directly applicable, the framing motivates..."
- **Rationale**: Original sentence claims smith2023 "proves" a result, but smith2023 actually only conjectures it. Softened to "discusses ... motivates" to remove the unsupported claim.
The existing per-entry verdict table in the Summary block is kept but FIX/REPLACE/REMOVE rows are annotated with a 🔒 bib frozen by --soft-only badge so downstream readers see immediately why the bib was not mutated.
--soft-only.bib file mutations under any circumstance. Step 6 ("Apply fixes (interactive)") is bypassed for the bib file; only *.tex rewrite proposals are produced (and still require human approval before any text edit).paper-writing Phase 6 or any wrapper — proposes a bib edit while --soft-only is set, refuse it: emit a one-line refusal in the trace and continue to the next finding.verdict decision table is unchanged: a wrong-context cite still produces FAIL with reason_code: wrong_context. Soft-only does not silence the finding; it only constrains the action layer.--soft-only composes with --uncited: both flags can be set together. Uncited entries remain detect-only and are not subject to soft-only translation (there is no citing sentence to soften).This skill always writes paper/CITATION_AUDIT.json, regardless of
caller or detector outcome. A paper with no .bib file or no \cite{...}
usage emits verdict NOT_APPLICABLE; silent skip is forbidden.
paper-writing Phase 6 and verify_paper_audits.sh both rely on
this artifact existing at a predictable path.
The artifact conforms to the schema in shared-references/assurance-contract.md:
{
"audit_skill": "citation-audit",
"verdict": "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
"reason_code": "all_entries_keep | metadata_drift | wrong_context | hallucinated | ...",
"summary": "One-line human-readable verdict summary.",
"audited_input_hashes": {
"references.bib": "sha256:...",
"main.tex": "sha256:...",
"sections/3.related.tex": "sha256:..."
},
"trace_path": ".aris/traces/citation-audit/<date>_run<NN>/",
"thread_id": "<codex mcp thread id>",
"reviewer_model": "gpt-5.5",
"reviewer_reasoning": "xhigh",
"generated_at": "<UTC ISO-8601>",
"details": {
"total_entries": <int>, // count of audited cited entries (= |cited_keys|), NOT the bib-file size
"per_entry": [ { "key": "madaan2023selfrefine",
"verdict": "KEEP | FIX | REPLACE | REMOVE",
"axis_failures": [ "CONTEXT" | "METADATA" | "EXISTENCE" ],
"note": "..." }, ... ]
}
}
details.uncited_entries (only when --uncited is set)"details": {
...
"uncited_entries": [
{"key": "<bibkey>", "suggestion": "prune" | "check", "note": "..."}
],
"uncited_entries_status": "ok" | "unavailable"
}
Field semantics:
uncited_entries_status is "ok" and uncited_entries lists the detected keys (possibly empty if every bib entry is cited).uncited_entries_status is "unavailable" and uncited_entries is []. Downstream consumers MUST treat "unavailable" identically to the field being absent: not blocking.suggestion is advisory only; the verifier and paper-writing Phase 6 do not block on it.audited_input_hashes scopeHash the declared input set actually passed to this audit: the .bib
file, main.tex, and every sections/*.tex file that supplied citation
contexts. Do NOT hash extracted contexts from /tmp or other transient
paths — if you need to stage extracted contexts, materialize them under
paper/.aris/ so the verifier can rehash reproducibly. Do NOT hash
repo-wide unions or the reviewer's self-reported opened subset.
Path convention (must match verify_paper_audits.sh): keys are
paths relative to the paper directory (no paper/ prefix — the
verifier already resolves relative to the paper dir; prefixing produces
paper/paper/... and false-fails as STALE). Use absolute paths for
any file outside the paper dir.
| Input state | Verdict | reason_code example |
|----------------------------------------------------------------|------------------|-----------------------|
| No .bib file or no \cite{...} usage | NOT_APPLICABLE | no_citations |
| .bib file referenced but unreadable / missing | BLOCKED | bib_unreadable |
| Every entry KEEP, all three axes green | PASS | all_entries_keep |
| Only FIX verdicts (metadata drift, no context errors) | WARN | metadata_drift |
| Any REPLACE or REMOVE (wrong-context or hallucinated entry) | FAIL | wrong_context |
| Web lookups timed out / reviewer invocation failed | ERROR | reviewer_error |
The --uncited flag does not appear in this table: uncited entries are advisory only and never alter the top-level verdict or reason_code. They surface exclusively through details.uncited_entries and the optional MD section.
Every invocation uses a fresh reviewer agent. Never reuse send_input across
different bibliography entries. Do not accept prior audit outputs (PROOF_AUDIT,
PAPER_CLAIM_AUDIT, EXPERIMENT_LOG) as input — the fresh thread preserves
reviewer independence per shared-references/reviewer-independence.md.
This skill never blocks by itself; paper-writing Phase 6 plus the
verifier decide whether the verdict blocks finalization based on the
assurance level.
/paper-claim-audit — sibling skill for numerical claim verification/experiment-audit — sibling skill for evaluation code integrity/result-to-claim — claim verdict assignment from resultsshared-references/citation-discipline.md — protocol document for citation hygieneshared-references/reviewer-independence.md — cross-model review constraintsdata-ai
Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
development
Get a deep critical review of research from GPT using a secondary Codex agent. Use when user says "review my research", "help me review", "get external review", or wants critical feedback on research ideas, papers, or experimental results.
data-ai
Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
development
Autonomous multi-round research review loop. Repeatedly reviews using a secondary Codex agent, implements fixes, and re-reviews until positive assessment or max rounds reached. Use when user says "auto review loop", "review until it passes", or wants autonomous iterative improvement.