skills/verify-citation-claims/SKILL.md
Audit whether a paper's prose actually matches the content of its cited references. Use before submitting / publishing a paper, after substantial prose revisions, when adding a new citation, when reviewing someone else's paper, or any time a user asks to "check the citations" / "verify claims" / "make sure X is supported." Dispatches parallel subagents that fetch each cited paper and verify the specific claim against specific sections. Complements `bibliography-from-ids`, which prevents fabricated metadata; this skill prevents miscited claims and factual drift in surrounding prose.
Install this skill globally with one command: `npx skillsauth add AMindToThink/claude-code-settings verify-citation-claims`. Works with Claude Code, Cursor, and Windsurf.
Core rule: every \cite{…} in a paper must be backed by a claim that the cited paper actually makes. This skill is how you check.
It is the companion to bibliography-from-ids:
| Problem | Skill | When |
|---|---|---|
| Fabricated author lists / titles / years | bibliography-from-ids | write time |
| Miscited claims: citation exists, but paper doesn't support the claim | verify-citation-claims | review time |
| Factual errors in prose near a citation (wrong context length, wrong volume) | verify-citation-claims | review time |
Use both. The first prevents the bib from lying about who wrote what. The second prevents the body text from lying about what they said.
Fire this skill whenever any of these apply:
bibliography-from-ids has fixed the metadata, but before a final PR / merge

Do NOT fire for:

import-content)

bibliography-from-ids)

A citation is supported iff:
If any of (1)–(3) fails, there is a problem that must be reported.
Parse the paper for all \cite{…}, \citep{…}, \citet{…} etc. For each, record:
If paper/refs_ids.toml exists (from bibliography-from-ids), the claim = "..." field is a useful starting point but must not be trusted alone — it can be stale, and the surrounding prose may have drifted.
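The inventory step can be sketched in Python as follows. This is a minimal illustration, not the skill's own tooling: the `CiteSite` fields (key, line number, source line) and the function name `find_cite_sites` are assumptions chosen for the example, and the regex only handles the common `\cite`-family forms with optional bracketed arguments.

```python
import re
from dataclasses import dataclass

# Matches \cite, \citep, \citet, \citeauthor, starred variants, and
# optional [..] arguments before the {key1, key2} group.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")


@dataclass
class CiteSite:
    """One citation occurrence (hypothetical record layout)."""
    key: str      # BibTeX key, e.g. "smith2024"
    line: int     # 1-based line number in the .tex source
    context: str  # the source line containing the citation


def find_cite_sites(tex: str) -> list[CiteSite]:
    """Collect every cite key together with where it appears."""
    sites = []
    for lineno, line in enumerate(tex.splitlines(), start=1):
        if line.lstrip().startswith("%"):
            continue  # skip lines that are entirely a LaTeX comment
        for match in CITE_RE.finditer(line):
            for key in match.group(1).split(","):
                key = key.strip()
                if key:
                    sites.append(CiteSite(key, lineno, line.strip()))
    return sites
```

The `context` field is deliberately just the source line; for real verification you would likely widen it to the full sentence or paragraph, since the claim being checked often spans several lines.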
For each citation, record the authoritative URL:
- `https://ar5iv.labs.arxiv.org/html/<id>` (first choice: version-agnostic, readable HTML). Fall back to `https://arxiv.org/abs/<id>` only if ar5iv is down.
- `https://doi.org/<id>`
- `https://aclanthology.org/<id>/` (and `.pdf`)
- `https://openreview.net/forum?id=<id>`

arXiv URL guidance (lessons from the 2026-04-22 first-invocation test):

- `https://ar5iv.labs.arxiv.org/html/<id>`: Use this first. It does not require a version suffix (no `v1` / `v2`) and returns readable HTML that WebFetch can parse section-by-section.
- `https://arxiv.org/html/<id>v<N>`: Avoid. Agents routinely guess the wrong version number and get 404s.
- `https://arxiv.org/abs/<id>`: Returns the abstract only. Too thin to verify section-level claims; use only as a last-resort fallback if ar5iv is unreachable.
- `https://arxiv.org/pdf/<id>`: Avoid via WebFetch. Comes back as unreadable binary.

This is the heavy-lifting step. Use the `dispatching-parallel-agents` pattern. A dozen citations verified sequentially takes hours; in parallel, minutes.
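Before dispatching anything, the URL preferences from the resolution step can be captured in a small helper. The sketch below is illustrative only: the function name `candidate_urls` and the `kind` labels are hypothetical, and the ACL Anthology `.pdf` form is an assumption about what "(and .pdf)" refers to.

```python
import re


def candidate_urls(kind: str, ident: str) -> list[str]:
    """Return URLs to try, in preference order, for one cited work.

    `kind` is a hypothetical label: "arxiv", "doi", "acl", or "openreview".
    """
    if kind == "arxiv":
        # ar5iv is version-agnostic, so drop any trailing v1 / v2 suffix.
        bare = re.sub(r"v\d+$", "", ident)
        return [
            f"https://ar5iv.labs.arxiv.org/html/{bare}",  # first choice
            f"https://arxiv.org/abs/{bare}",              # abstract-only fallback
        ]
    if kind == "doi":
        return [f"https://doi.org/{ident}"]
    if kind == "acl":
        # Assumption: the PDF lives at <id>.pdf next to the landing page.
        return [
            f"https://aclanthology.org/{ident}/",
            f"https://aclanthology.org/{ident}.pdf",
        ]
    if kind == "openreview":
        return [f"https://openreview.net/forum?id={ident}"]
    raise ValueError(f"unknown identifier kind: {kind!r}")
```

Note that `arxiv.org/html/<id>v<N>` and `arxiv.org/pdf/<id>` are intentionally never generated, matching the guidance to avoid them.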
Task-tool fallback: If the Task / parallel-agent tool is not
available in your harness (as was the case during the 2026-04-22
first-invocation test), substitute parallel WebFetch calls issued
in a single message from the main conversation. The work still
parallelizes at the HTTP level; you lose the isolated-context benefit
(the main agent now carries every paper's content in its window), but
the verification itself still works. Pass the same prompt-template
content as inline instructions, and synthesize the findings yourself.
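Outside an agent harness, the same fan-out idea (many fetches in flight at once, results synthesized in one place) can be sketched with Python's standard thread pool. This is an analogy to the WebFetch fallback, not a description of how any harness implements it; `fetch_first` and `fetch_all` are hypothetical names, and the `fetch` callable is injected so the sketch stays independent of any particular HTTP client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def fetch_first(urls: list[str], fetch: Callable[[str], str]) -> Optional[str]:
    """Try each URL in order; return the first body fetched, else None."""
    for url in urls:
        try:
            return fetch(url)
        except Exception:
            # Any failure (404, timeout, paywall) just moves to the next URL.
            continue
    return None


def fetch_all(
    urls_by_key: dict[str, list[str]],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> dict[str, Optional[str]]:
    """Fetch every cited paper concurrently; None marks 'unable to fetch'."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            key: pool.submit(fetch_first, urls, fetch)
            for key, urls in urls_by_key.items()
        }
        return {key: future.result() for key, future in futures.items()}
```

A `None` result should feed the dead-end handling rather than be treated as a verdict: it means the paper could not be read, not that the claim is unsupported.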
Per-citation subagent prompt template (also usable as inline guidance
for the WebFetch fallback):
Task: Verify a citation in a research paper by reading the actual cited
paper and checking that its claims match what our paper says.
**Our paper cites:**
<author(s)> (<year>), "<title>" — <venue>. <identifier>.
**Our paper claims this reference supports:**
1. In <section/paragraph>: "<exact quoted sentence from our paper>"
2. <additional claims if any>
**Your task:**
1. Find the paper via <URL>.
   - For **arXiv** papers, use `https://ar5iv.labs.arxiv.org/html/<id>` as
     the first-try URL. It is version-agnostic (no `v1` / `v2` suffix)
     and returns readable HTML. Do NOT try `arxiv.org/html/<id>v<N>`
     (agents routinely guess the wrong version and get 404s), and do NOT
     WebFetch `arxiv.org/pdf/<id>` (comes back as unreadable binary).
     Fall back to `arxiv.org/abs/<id>` only if ar5iv is down; the
     abstract alone is too thin to verify section-level claims.
   - For DOIs, OpenReview, and ACL Anthology, use the canonical URLs.
2. Read the abstract, introduction, and methods. Then locate the
   specific section(s) relevant to each claim above.
3. Verify:
   - Does the paper actually make this claim?
   - If a specific number is mentioned (e.g., "0.72 bits/byte"), does it
     match what the paper reports? Within rounding?
   - If a specific method is mentioned (e.g., "self-BLEU"), does the
     paper actually use that method?
4. Identify a specific section / equation / table reference that most
   directly supports each claim. Recommend a citation more precise than
   "[cite]" (e.g., "Smith et al. 2024, §3.2" or "Smith et al. 2024,
   Theorem 1").
**If direct fetch fails** (paywall, 404, unreadable PDF), try a
**WebSearch for the specific claim phrase** — a numerical figure, a
section title, or a distinctive quote. It often surfaces the fact on a
cache or citing secondary source (Semantic Scholar, OpenReview
discussion, NASA ADS for physics, DBLP for CS venues, EleutherAI
leaderboards for LM benchmarks). This is particularly useful for
published-venue metadata (journal volumes, page counts) that appears in
indices like ADS, DBLP, or Semantic Scholar.
**If you still cannot access the paper** after trying ar5iv, the
fallback URL, and a WebSearch for the claim phrase, flag clearly:
"UNABLE TO VERIFY — needs manual review." Do NOT guess.
**Return a short report (~300 words) with:**
- arXiv / DOI / canonical URL
- Whether each claim is supported, partially supported, or unsupported
- Specific section / page / equation reference for each claim
- Any concerns (e.g., paper doesn't make the claim we attribute to it,
  numbers off by more than rounding, wrong method name)
- Any additional relevant information worth flagging (while reading,
  did you notice OTHER nearby claims in our paper that this paper could
  support or contradict?)
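The "within rounding" question in item 3 of the template can be made concrete. One reasonable reading, assumed here rather than stated by the template, is: round the number the cited paper reports to the precision our prose uses, and check that it equals the number we wrote. The helper name `matches_within_rounding` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_within_rounding(claimed: str, reported: str) -> bool:
    """True if `reported`, rounded to the precision of `claimed`, equals it.

    Both arguments are decimal strings as they appear in the text, so
    "0.72" carries two decimal places and "0.720" carries three.
    """
    claimed_d = Decimal(claimed)
    reported_d = Decimal(reported)
    # Smallest unit expressed by the claimed figure, e.g. 0.01 for "0.72".
    quantum = Decimal(1).scaleb(claimed_d.as_tuple().exponent)
    return reported_d.quantize(quantum, rounding=ROUND_HALF_UP) == claimed_d


print(matches_within_rounding("0.72", "0.7185"))  # True
print(matches_within_rounding("0.72", "0.726"))   # False
```

Working on strings with `Decimal` rather than floats keeps the claimed precision intact and avoids binary floating-point surprises at the rounding boundary.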
Critical prompt details:
bibliography-from-ids). Include both the user-supplied fields AND ask the subagent to verify the author list against the actual paper.

For each citation, report: Verified / Discrepancy / Unable to Verify. Group by severity:
Save the report to paper/citation_verification_report.md (or reports/…) so it's a diffable artifact. Include a summary table at the top, detailed findings per citation below, and explicitly flag items needing manual review.
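A report with that shape (summary table first, per-citation findings below, manual-review items flagged) could be rendered along these lines. This is a sketch under assumptions: the `Finding` fields, the heading text, and the ordering of statuses (discrepancies first) are illustrative choices, not a format prescribed by the skill.

```python
from dataclasses import dataclass

# Assumed ordering: most urgent first.
STATUS_ORDER = {"Discrepancy": 0, "Unable to Verify": 1, "Verified": 2}


@dataclass
class Finding:
    """One citation's verdict (hypothetical record layout)."""
    key: str     # cite key, e.g. "smith2024"
    status: str  # "Verified", "Discrepancy", or "Unable to Verify"
    note: str    # one-line explanation or section reference


def render_report(findings: list[Finding]) -> str:
    """Build a Markdown report: summary table, details, manual-review list."""
    rows = sorted(findings, key=lambda f: (STATUS_ORDER[f.status], f.key))
    lines = [
        "# Citation verification report",
        "",
        "| Citation | Status | Note |",
        "|---|---|---|",
    ]
    for f in rows:
        note = f.note.replace("|", "\\|")  # keep table cells intact
        lines.append(f"| {f.key} | {f.status} | {note} |")
    lines += ["", "## Detailed findings", ""]
    for f in rows:
        lines += [f"### {f.key}", "", f"{f.status}: {f.note}", ""]
    manual = [f.key for f in rows if f.status == "Unable to Verify"]
    lines += ["## Needs manual review", ""]
    lines += [f"- {key}" for key in manual] or ["- none"]
    return "\n".join(lines) + "\n"
```

Writing the result with `pathlib.Path("paper/citation_verification_report.md").write_text(...)` keeps it a plain-text artifact that diffs cleanly between audit runs.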
For each flagged item, the fix belongs in one of three places — be clear about which:
- `bibliography-from-ids` territory. Add / fix / swap the identifier in `refs_ids.toml`. Don't hand-edit the `.bib`.
- `\cite{}` targets the wrong paper → swap the citation. If replacing a preprint with a conference version, usually keep the key and just change the identifier in the TOML. If swapping to a different paper entirely, add a new TOML entry and change the `\cite{}` site.

When in doubt, ask the user which of the three to apply. Some disagreements are judgment calls (e.g., whether a paper's Shannon-entropy diversity measure "typically operates at the surface level": arguably yes in spirit, arguably no in letter).
A sanitized example audit report ships with this skill at
examples/example_report.md. It shows the expected output format:
summary table, per-citation detailed findings, items-needing-review,
items-that-are-fine, and papers-unable-to-access sections.
The real 2026-04-21 audit that motivated this skill (12 citations,
2 minutes wall-clock via parallel subagents, 4 fabricated author lists
found) followed the same template. Findings were routed per Step 5:
bib-side fixes via bibliography-from-ids, prose-side factual errors
(e.g. a wrong context-window number) via hand-edit of the .tex, and one
cite-side fix (citation didn't support its claim) via a paper-swap that
touched both the bib and the \cite{} site.
bibliography-from-ids

If the user sets up the bibliography with `bibliography-from-ids` from the start, this skill becomes cheaper and more targeted:

- `claim = "..."` TOML field is pre-written, so Step 1 is mostly done.

If the project doesn't use `bibliography-from-ids` yet, you can do Step 1 yourself by reading the `.tex` and grep'ing for `\cite{}` sites. Consider recommending `bibliography-from-ids` to the user as part of the findings.
Citation verification routinely hits dead ends — a PDF is paywalled, the HTML link is stale, arXiv times out. Before giving up and marking "UNABLE TO VERIFY," try in this order:
arxiv.org/html/<id>v<N> (prone to version mismatch) or arxiv.org/pdf/<id> (unreadable binary). Redirect it to ar5iv.labs.arxiv.org/html/<id>, which is version-agnostic.

import-content.

- `bibliography-from-ids`: the write-time companion. Prevents fabricated metadata.
- `import-content`: the table-and-numbers analog. Same "script-generated, never hand-typed" principle.
- `dispatching-parallel-agents`: the underlying mechanism for Step 3. Use its conventions for the subagent prompts.