skills/citation-audit/SKILL.md
Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.
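Verify every \cite{...} in a paper against three independent layers:

- Existence: the cited paper exists at the claimed arXiv ID, DOI, or venue.
- Metadata: author names, year, venue, and title are correct.
- Context: the cited paper actually supports the claim it is attached to.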
This skill is the fourth layer of ARIS's evidence-and-claim assurance, complementing experiment-audit (code), result-to-claim (science verdict), and paper-claim-audit (numerical claims). Together they form a bottom-up integrity stack from raw evaluation code to manuscript bibliography.
Run before submission. The right gating point is:

- After paper-write has produced the LaTeX draft and bib file
- After paper-claim-audit has verified numerical claims
- Before paper-compile for submission

Do not run this on a half-written draft: most of the work is in cross-checking each \cite against context, which is wasted on placeholder text.
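The dangerous citation problems are not wildly fake citations, which are easy to spot. The dangerous ones are the subtle errors this audit targets: hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (the cite is present, but the cited paper does not establish the claim).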
- gpt-5.4: Used via Codex MCP. Default for cross-model review with web access.
- fresh: Each audit run uses a new reviewer thread (REVIEWER_BIAS_GUARD). Never codex-reply.
- CITATION_AUDIT.md: Human-readable per-entry verdict report.
- CITATION_AUDIT.json: Machine-readable verdict ledger consumable by downstream tools.

Locate:

- references.bib (or paper.bib / similar) under the paper directory
- *.tex files containing \cite{...} calls (typically sec/ or sections/)

If multiple bib files exist, audit each separately.
For each \cite{key1,key2,...} invocation in the paper:
Output a flat list of (key, file, line, surrounding_sentence) tuples.
Also build the inverse: for each bib entry, the list of all places it is cited.
Save the extracted contexts to paper/.aris/citation-audit/contexts.txt so the reviewer can read it directly. Use the paper-dir-relative path .aris/citation-audit/contexts.txt when recording the file in audited_input_hashes; do not stage under /tmp or other transient locations that the verifier cannot rehash later.
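The extraction step above can be sketched as follows. This is a minimal illustration, not the skill's actual extractor: the function names `extract_contexts` and `invert` are hypothetical, the regex assumes the paper uses `\cite`-family commands with optional bracket arguments, and the "surrounding sentence" is approximated by the source line. Commented-out LaTeX lines are not filtered.

```python
import re
from collections import defaultdict

# Matches \cite, \citep, \citet, ... with optional [..] arguments.
# Which cite commands a paper uses is an assumption of this sketch.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def extract_contexts(tex_by_file):
    """Return a flat list of (key, file, line, surrounding_sentence) tuples.

    tex_by_file maps a paper-relative path to that file's text.
    """
    contexts = []
    for path, text in tex_by_file.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in CITE_RE.finditer(line):
                # Crude context window: the whole source line, stripped.
                sentence = line.strip()
                for key in match.group(1).split(","):
                    key = key.strip()
                    if key:
                        contexts.append((key, path, lineno, sentence))
    return contexts


def invert(contexts):
    """Build the inverse index: bib key -> list of (file, line) uses."""
    uses = defaultdict(list)
    for key, path, lineno, _ in contexts:
        uses[key].append((path, lineno))
    return dict(uses)
```

A multi-key call such as `\cite{key1,key2}` yields one tuple per key, so each bib entry can be reviewed with its own list of uses.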
For each bib entry, invoke mcp__codex__codex (NOT codex-reply — fresh thread per entry, or batch with explicit per-entry isolation):
    mcp__codex__codex:
      model: gpt-5.4
      config: {"model_reasoning_effort": "xhigh"}
      sandbox: read-only
      prompt: |
        You are auditing a bibliographic entry. Use web/DBLP/arXiv search.

        ## Bib entry
        @article{key2024example,
          author = {...}, title = {...}, journal = {...}, year = {...}, ...
        }

        ## Where this entry is cited in the paper
        [paste extracted contexts]

        For this entry, verify:

        1. EXISTENCE: does this paper exist at the claimed arXiv ID / DOI / venue?
           Output: YES / NO / UNCERTAIN, with the verifying URL.
        2. METADATA: are author names, year, venue, title correct?
           For each, output: correct / wrong: should be ... / typo: ...
        3. CONTEXT: for each use, does the cited paper actually support the surrounding claim?
           Output per-use: SUPPORTS / WEAK / WRONG, with one-sentence reasoning.

        VERDICT: KEEP / FIX / REPLACE / REMOVE
        - KEEP: entry is clean, all uses are appropriate
        - FIX: metadata needs correction; uses are appropriate
        - REPLACE: cite is wrong-context, find a different paper that actually supports the claim
        - REMOVE: entry is hallucinated or unsupportable

        Be honest. If you cannot verify online, say UNCERTAIN; do not guess.
Save the response to .aris/traces/citation-audit/<date>_runNN/<key>.md per the review-tracing protocol.
Build CITATION_AUDIT.json following the schema defined in "Submission
Artifact Emission" below (single authoritative schema for this file).
Per-entry ledger data goes under details.per_entry, not under a
top-level entries field. The top-level verdict is a single overall
value (PASS / WARN / FAIL / NOT_APPLICABLE / BLOCKED / ERROR) derived
from per-entry verdicts per the decision table in "Submission Artifact
Emission"; the top-level summary is a one-line human-readable string.
Concretely, details carries the per-entry ledger:

    "details": {
      "total_entries": 29,
      "counts": { "KEEP": 11, "FIX": 14, "REPLACE": 3, "REMOVE": 1 },
      "per_entry": [
        {
          "key": "lu2024aiscientist",
          "verdict": "KEEP",
          "axis_failures": [],
          "uses": [
            {"file": "sections/1.intro.tex", "line": 11, "verdict": "SUPPORTS"},
            {"file": "sections/6.related.tex", "line": 8, "verdict": "SUPPORTS"}
          ]
        },
        {
          "key": "madaan2023selfrefine",
          "verdict": "REPLACE",
          "axis_failures": ["CONTEXT"],
          "uses": [
            {"file": "sections/2.overview.tex", "line": 42, "verdict": "WRONG",
             "note": "Self-Refine demonstrates iterative improvement, not correlated errors"},
            {"file": "sections/6.related.tex", "line": 13, "verdict": "SUPPORTS"}
          ]
        }
      ]
    }

See "Submission Artifact Emission" for the full artifact (top-level
fields audit_skill, verdict, reason_code, summary,
audited_input_hashes, trace_path, thread_id, reviewer_model,
reviewer_reasoning, generated_at, details).
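As a rough sketch of how the `details` block could be assembled from per-entry results, the helper below derives `total_entries` and `counts` from the ledger instead of tracking them by hand. The name `build_details` is hypothetical; the field names come from the example above.

```python
from collections import Counter

ENTRY_VERDICTS = ("KEEP", "FIX", "REPLACE", "REMOVE")


def build_details(per_entry):
    """Assemble the `details` object from a list of per-entry ledger dicts."""
    tally = Counter(entry["verdict"] for entry in per_entry)
    unknown = set(tally) - set(ENTRY_VERDICTS)
    if unknown:
        raise ValueError(f"unexpected per-entry verdicts: {sorted(unknown)}")
    return {
        "total_entries": len(per_entry),
        # Always emit all four verdicts so downstream tools see a stable shape.
        "counts": {v: tally.get(v, 0) for v in ENTRY_VERDICTS},
        "per_entry": per_entry,
    }
```

Deriving the counts this way keeps them consistent with `per_entry` by construction.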
Write CITATION_AUDIT.md:
    # Citation Audit Report

    **Date**: 2026-04-19
    **Bib file**: references.bib
    **Total entries**: 29

    ## Summary

    | Verdict | Count |
    |---------|-------|
    | KEEP    | 11    |
    | FIX     | 14    |
    | REPLACE | 3     |
    | REMOVE  | 1     |

    ## Priority Fixes (CRITICAL: apply before submission)

    ### REMOVE: hidden2025aiscientistpitfalls
    - Author listed as "Anonymous"; actual authors are Luo, Kasirzadeh, Shah
    - Title is incomplete
    - ACTION: Replace key with `luo2025aiscientistpitfalls`, update authors and title

    ### REPLACE-CONTEXT: madaan2023selfrefine in sec/2.overview.tex:42
    - Cited to support: "single-model self-refinement can produce correlated errors"
    - Self-Refine paper actually demonstrates iterative IMPROVEMENT, not correlated errors
    - ACTION: Rewrite the sentence; cite Self-Refine for "self-feedback loop" framing instead

    [... continues for each entry ...]

    ## All-Clean Entries (no action needed)

    [list of KEEP keys]

For each FIX/REPLACE/REMOVE verdict, prompt the user:
    Fix [key]?
      Change: <description of change>
      Files affected: references.bib + sec/X.tex:Y
      [Apply / Skip / Defer]

If AUTO_APPLY = true, apply all FIX-level changes (metadata corrections only). REPLACE and REMOVE always require human approval — they involve content changes.
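The approval rule can be illustrated with a small partition function. This is a sketch under stated assumptions: `plan_changes` is a hypothetical name, and each entry is assumed to be a dict with `key` and `verdict` fields as in the ledger.

```python
def plan_changes(per_entry, auto_apply=False):
    """Split non-KEEP entries into (auto_applied_keys, needs_approval_keys).

    Only FIX (metadata-only) changes may be applied without a prompt, and
    only when auto_apply is set. REPLACE and REMOVE always go to the user.
    """
    auto, ask = [], []
    for entry in per_entry:
        verdict = entry["verdict"]
        if verdict == "KEEP":
            continue  # clean entry, nothing to change
        if auto_apply and verdict == "FIX":
            auto.append(entry["key"])
        else:
            ask.append(entry["key"])
    return auto, ask
```

With `auto_apply=False`, every non-KEEP entry lands in the approval list, which matches the per-entry prompt shown earlier.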
    latexmk -C && latexmk -pdf -interaction=nonstopmode main.tex

Confirm:
- No Citation undefined warnings
- No Reference undefined warnings

This skill emits CITATION_AUDIT.json with a verdict; the decision to block finalization lives in paper-writing Phase 6 + tools/verify_paper_audits.sh, driven by the assurance level. See "Submission Artifact Emission" below.

| Skill | What it audits | What it catches |
|-------|---------------|-----------------|
| /experiment-audit | Evaluation code | Fake ground truth, self-normalized scores, phantom results |
| /result-to-claim | Result-to-claim mapping | Claims unsupported by evidence |
| /paper-claim-audit | Numerical claims in manuscript | Number inflation, best-seed cherry-pick, config mismatch |
| /citation-audit | Bibliographic entries | Hallucinated refs, wrong-context citations, metadata errors |
Together: code → result → numerical claim → cited claim. Each layer has cross-family review with no executor in the validator path.
"and others" truncations in author lists are conventional and not flagged unless the truncation hides a co-author the user explicitly cares about.

After each mcp__codex__codex reviewer call, save the trace following shared-references/review-tracing.md. Use tools/save_trace.sh or write files directly to .aris/traces/citation-audit/<date>_run<NN>/. Respect the --- trace: parameter (default: full).

- CITATION_AUDIT.md (human-readable report) at paper root
- CITATION_AUDIT.json (machine-readable ledger; schema below) at paper root
- .aris/traces/citation-audit/<date>_runNN/ (per-entry review traces)
- references.bib + sec/*.tex (with --apply flag)

This skill always writes paper/CITATION_AUDIT.json, regardless of
caller or detector outcome. A paper with no .bib file or no \cite{...}
usage emits verdict NOT_APPLICABLE; silent skip is forbidden.
paper-writing Phase 6 and tools/verify_paper_audits.sh both rely on
this artifact existing at a predictable path.
The artifact conforms to the schema in shared-references/assurance-contract.md:

    {
      "audit_skill": "citation-audit",
      "verdict": "PASS | WARN | FAIL | NOT_APPLICABLE | BLOCKED | ERROR",
      "reason_code": "all_entries_keep | metadata_drift | wrong_context | hallucinated | ...",
      "summary": "One-line human-readable verdict summary.",
      "audited_input_hashes": {
        "references.bib": "sha256:...",
        "main.tex": "sha256:...",
        "sections/3.related.tex": "sha256:..."
      },
      "trace_path": ".aris/traces/citation-audit/<date>_run<NN>/",
      "thread_id": "<codex mcp thread id>",
      "reviewer_model": "gpt-5.4",
      "reviewer_reasoning": "xhigh",
      "generated_at": "<UTC ISO-8601>",
      "details": {
        "total_entries": <int>,
        "per_entry": [ { "key": "madaan2023selfrefine",
                         "verdict": "KEEP | FIX | REPLACE | REMOVE",
                         "axis_failures": [ "CONTEXT" | "METADATA" | "EXISTENCE" ],
                         "note": "..." }, ... ]
      }
    }

audited_input_hashes scope

Hash the declared input set actually passed to this audit: the .bib
file, main.tex, and every sections/*.tex file that supplied citation
contexts. Do NOT hash extracted contexts from /tmp or other transient
paths — if you need to stage extracted contexts, materialize them under
paper/.aris/ so the verifier can rehash reproducibly. Do NOT hash
repo-wide unions or the reviewer's self-reported opened subset.
Path convention (must match tools/verify_paper_audits.sh): keys are
paths relative to the paper directory (no paper/ prefix — the
verifier already resolves relative to the paper dir; prefixing produces
paper/paper/... and false-fails as STALE). Use absolute paths for
any file outside the paper dir.
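The path convention above can be sketched as follows. This is an illustration only: the function name `audited_input_hashes` mirrors the field but is hypothetical, and encoding the digest as lowercase hex after the `sha256:` prefix is an assumption, since the verifier's exact encoding is not shown here.

```python
import hashlib
from pathlib import Path


def audited_input_hashes(paper_dir, input_files):
    """Map each declared input file to a "sha256:<hex>" digest.

    Keys are paths relative to paper_dir (no "paper/" prefix); files outside
    the paper directory are keyed by absolute path.
    """
    paper_dir = Path(paper_dir).resolve()
    hashes = {}
    for name in input_files:
        path = Path(name).resolve()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        try:
            key = path.relative_to(paper_dir).as_posix()
        except ValueError:
            # Not under the paper dir: fall back to the absolute path.
            key = path.as_posix()
        hashes[key] = f"sha256:{digest}"
    return hashes
```

Only the declared inputs are passed in, so the result never covers a repo-wide union or files staged in transient locations.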
| Input state | Verdict | reason_code example |
|----------------------------------------------------------------|------------------|-----------------------|
| No .bib file or no \cite{...} usage | NOT_APPLICABLE | no_citations |
| .bib file referenced but unreadable / missing | BLOCKED | bib_unreadable |
| Every entry KEEP, all three axes green | PASS | all_entries_keep |
| Only FIX verdicts (metadata drift, no context errors) | WARN | metadata_drift |
| Any REPLACE or REMOVE (wrong-context or hallucinated entry) | FAIL | wrong_context |
| Web lookups timed out / reviewer invocation failed | ERROR | reviewer_error |
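One way to read the decision table is as a function from audit state to a `(verdict, reason_code)` pair. The sketch below is an assumption-laden illustration: `overall_verdict` and the `bib_state` labels ("absent", "unreadable", "ok") are hypothetical, the order in which rows are checked is a guess where the table does not state precedence, and the reason codes are the example values from the table.

```python
def overall_verdict(bib_state, cite_count, entry_verdicts, reviewer_ok=True):
    """Derive the top-level (verdict, reason_code) from the audit state.

    bib_state: "absent", "unreadable", or "ok" (hypothetical labels).
    entry_verdicts: per-entry verdicts (KEEP / FIX / REPLACE / REMOVE).
    """
    if bib_state == "absent" or cite_count == 0:
        return "NOT_APPLICABLE", "no_citations"
    if bib_state == "unreadable":
        return "BLOCKED", "bib_unreadable"
    if not reviewer_ok:
        return "ERROR", "reviewer_error"
    if "REPLACE" in entry_verdicts or "REMOVE" in entry_verdicts:
        return "FAIL", "wrong_context"
    if "FIX" in entry_verdicts:
        return "WARN", "metadata_drift"
    return "PASS", "all_entries_keep"
```

The worst per-entry verdict wins: a single REPLACE or REMOVE yields FAIL even when every other entry is KEEP.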
Every invocation uses a fresh mcp__codex__codex thread. Never
codex-reply. Do not accept prior audit outputs (PROOF_AUDIT,
PAPER_CLAIM_AUDIT, EXPERIMENT_LOG) as input — the fresh thread preserves
reviewer independence per shared-references/reviewer-independence.md.
This skill never blocks by itself; paper-writing Phase 6 plus the
verifier decide whether the verdict blocks finalization based on the
assurance level.
- /paper-claim-audit: sibling skill for numerical claim verification
- /experiment-audit: sibling skill for evaluation code integrity
- /result-to-claim: claim verdict assignment from results
- shared-references/citation-discipline.md: protocol document for citation hygiene
- shared-references/reviewer-independence.md: cross-model review constraints