skills/citation-audit/SKILL.md
Verify that every citation in a manuscript is real, correctly attributed, and accurately described. Detects ghost papers, wrong arXiv IDs, inverted claims, and dead links by fetching each cited work. Optional fix mode applies bib metadata corrections and surfaces prose rewrites for claim errors. Triggers on: "check my citations", "verify references", "citation audit", "are my references real", "check bib", "reference check", "bib audit", "citation verification". Companion to manuscript-review (Pass 5 hygiene); this skill audits factual truth.
Verify every citation in a manuscript against its actual source. LLMs hallucinate citations, invent arXiv IDs, misattribute findings, and confuse authors. This skill targets each of these failures by fetching and reading every cited work.
LLMs are unreliable with citations in three distinct ways:
All three are invisible to structural audits (cross-reference checks, compilation tests). They require reading the actual cited work.
- `.tex` file(s)
- `.bib` file

For each `\citep{}`, `\citet{}`, `\cite{}` in the manuscript:
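A minimal sketch of collecting those keys, in Python. It assumes citations use only `\cite`, `\citep`, and `\citet` (optionally starred, with up to two `[...]` arguments) and strips `%` comments first; `strip_comments` and `extract_citation_keys` are illustrative names, not part of the skill. A regex will miss keys passed through custom macros, so treat the result as a starting list rather than a guarantee.

```python
import re

# \cite, \citep, or \citet, optionally starred, with up to two [..] optional
# arguments; group 1 captures the comma-separated key list inside {..}.
CITE_PATTERN = re.compile(r"\\cite[pt]?\*?(?:\[[^\]]*\]){0,2}\{([^}]*)\}")


def strip_comments(tex_source: str) -> str:
    """Drop everything from an unescaped % to the end of its line."""
    return re.sub(r"(?<!\\)%.*", "", tex_source)


def extract_citation_keys(tex_source: str) -> list[str]:
    """Return each cited key once, in order of first appearance."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in CITE_PATTERN.finditer(strip_comments(tex_source)):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                seen.setdefault(key, None)
    return list(seen)


if __name__ == "__main__":
    sample = r"As \citet{smith2020} showed \citep[see][p.~3]{doe2021, smith2020}. % \cite{draft}"
    print(extract_citation_keys(sample))  # prints ['smith2020', 'doe2021']
```

Deduplicating while keeping first-appearance order means the audit report follows the order in which a reader meets each citation.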
For each bib entry, verify against the actual source:
For arXiv papers (`eprint` field present):

- `https://arxiv.org/abs/{eprint_id}`

For conference/journal papers (`booktitle` or `journal` field):

For web resources (`howpublished` with URL):
For each entry, check:
- Entry type (`@inproceedings` for conferences, `@article` for journals, `@misc` for preprints/blogs)

For each FACTUAL or POSITIONAL claim:
For INACCURATE and INVERTED findings, provide:
Scan the manuscript for:
### [bib_key] — [VERDICT]
**Bib entry:** [title] by [authors] ([year])
**Actual paper:** [actual title] by [actual authors] ([actual year])
**Metadata match:** title [✓/✗] | authors [✓/✗] | year [✓/✗] | venue [✓/✗]
**Claim in manuscript (line N):** "[exact text]"
**What the paper actually says:** "[summary of actual finding]"
**Claim accuracy:** [ACCURATE / INACCURATE / INVERTED / OVERCLAIMED / UNDERCLAIMED]
**Fix required:** [description of what needs to change, or "None"]
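One way to hold a per-citation result and fill in the template above, as a sketch: `CitationFinding` and `render_finding` are hypothetical names, and the metadata flags are assumed to be booleans produced by comparing the bib entry against the fetched source.

```python
from dataclasses import dataclass

# Claim-accuracy labels allowed by the per-citation template.
CLAIM_VERDICTS = {"ACCURATE", "INACCURATE", "INVERTED", "OVERCLAIMED", "UNDERCLAIMED"}
METADATA_FIELDS = ("title", "authors", "year", "venue")


def mark(ok: bool) -> str:
    return "✓" if ok else "✗"


@dataclass
class CitationFinding:
    bib_key: str
    verdict: str               # overall verdict, e.g. PASS / FAIL / GHOST
    bib_entry: str             # "[title] by [authors] ([year])" from the .bib
    actual_paper: str          # same shape, taken from the fetched source
    metadata: dict[str, bool]  # one flag per name in METADATA_FIELDS
    claim_line: int
    claim_text: str
    actual_finding: str
    claim_accuracy: str
    fix_required: str = "None"

    def __post_init__(self) -> None:
        # Reject labels outside the template's five claim verdicts.
        if self.claim_accuracy not in CLAIM_VERDICTS:
            raise ValueError(f"unknown claim accuracy: {self.claim_accuracy}")


def render_finding(f: CitationFinding) -> str:
    """Fill the per-citation template, one bold-labelled line per field."""
    meta = " | ".join(f"{name} {mark(f.metadata[name])}" for name in METADATA_FIELDS)
    return "\n".join([
        f"### {f.bib_key} \u2014 {f.verdict}",  # same separator as the template heading
        f"**Bib entry:** {f.bib_entry}",
        f"**Actual paper:** {f.actual_paper}",
        f"**Metadata match:** {meta}",
        f'**Claim in manuscript (line {f.claim_line}):** "{f.claim_text}"',
        f'**What the paper actually says:** "{f.actual_finding}"',
        f"**Claim accuracy:** {f.claim_accuracy}",
        f"**Fix required:** {f.fix_required}",
    ])
```

Validating `claim_accuracy` at construction time keeps a typo such as `INVERTD` from slipping into the report.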
| Bib Key | Exists | Metadata | Claim | Verdict |
|---------|--------|----------|-------|---------|
| key1 | ✓ | ✓ | ✓ | PASS |
| key2 | ✓ | ✗ | ✗ | FAIL |
| key3 | ✗ | — | — | GHOST |
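The summary table can be generated from three booleans per entry. The verdict rule below (a missing source is GHOST, any other ✗ is FAIL) is an assumption inferred from the three example rows, not a rule the skill states; `summary_verdict` and `summary_table` are illustrative names.

```python
# Column headers and "not applicable" dash copied from the summary table.
HEADER = "| Bib Key | Exists | Metadata | Claim | Verdict |"
RULE = "|---------|--------|----------|-------|---------|"
NOT_APPLICABLE = "\u2014"


def mark(ok: bool) -> str:
    return "✓" if ok else "✗"


def summary_verdict(exists: bool, metadata_ok: bool, claim_ok: bool) -> str:
    # Assumed rule, inferred from the example rows.
    if not exists:
        return "GHOST"
    return "PASS" if metadata_ok and claim_ok else "FAIL"


def summary_table(rows: list[tuple[str, bool, bool, bool]]) -> str:
    """rows: (bib_key, exists, metadata_ok, claim_ok) for each entry."""
    lines = [HEADER, RULE]
    for key, exists, metadata_ok, claim_ok in rows:
        if exists:
            cells = [mark(True), mark(metadata_ok), mark(claim_ok)]
        else:
            # A ghost entry has no source to compare metadata or claims against.
            cells = [mark(False), NOT_APPLICABLE, NOT_APPLICABLE]
        verdict = summary_verdict(exists, metadata_ok, claim_ok)
        lines.append(f"| {key} | {' | '.join(cells)} | {verdict} |")
    return "\n".join(lines)
```

Calling `summary_table([("key1", True, True, True), ("key2", True, False, False), ("key3", False, False, False)])` reproduces the example table row for row.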
If a bib entry has both an `eprint` (arXiv ID) and a `booktitle` (venue), verify both independently.

When the user invokes with an argument containing "fix" or "on", execute Phases 1–4 as above, then apply fixes for every non-PASS citation.
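As a sketch of one such fix, correcting an `eprint` field, the code below checks an ID's shape before writing it into the entry. It assumes the two arXiv identifier schemes (new-style `YYMM.NNNN` or `YYMM.NNNNN`, old-style `archive/YYMMNNN`, each with an optional `vN` suffix) and flat, brace- or quote-delimited field values; the helper names and example IDs are hypothetical. A well-formed ID can still point at the wrong paper, so the fetched abstract page remains the real check.

```python
import re

# New-style IDs (2007 onward): YYMM.NNNN or YYMM.NNNNN, optional version.
NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")
# Old-style IDs: archive[.subject]/YYMMNNN, optional version.
OLD_STYLE = re.compile(r"[a-z-]+(?:\.[A-Za-z-]+)?/\d{7}(?:v\d+)?")
PREFIXES = ("https://arxiv.org/abs/", "http://arxiv.org/abs/", "arXiv:", "arxiv:")


def normalize_arxiv_id(raw: str) -> str:
    """Strip common prefixes and reject strings that are not arXiv IDs."""
    candidate = raw.strip()
    for prefix in PREFIXES:
        if candidate.startswith(prefix):
            candidate = candidate[len(prefix):]
            break
    if NEW_STYLE.fullmatch(candidate) or OLD_STYLE.fullmatch(candidate):
        return candidate
    raise ValueError(f"not a well-formed arXiv ID: {raw!r}")


def arxiv_abs_url(eprint_id: str) -> str:
    return f"https://arxiv.org/abs/{normalize_arxiv_id(eprint_id)}"


def set_bib_field(entry: str, field: str, value: str) -> str:
    """Replace a flat {..} or ".." field value, or add the field if absent."""
    pattern = re.compile(
        rf'(\b{re.escape(field)}\s*=\s*)(\{{[^{{}}]*\}}|"[^"]*")', re.IGNORECASE
    )
    if pattern.search(entry):
        return pattern.sub(lambda m: f"{m.group(1)}{{{value}}}", entry, count=1)
    if re.search(rf"\b{re.escape(field)}\s*=", entry, re.IGNORECASE):
        # Present but with nested braces: too risky to rewrite automatically.
        raise ValueError(f"{field} has a nested value; edit it by hand")
    head, _, rest = entry.partition("\n")  # insert right after "@type{key,"
    return f"{head}\n  {field} = {{{value}}},\n{rest}"
```

Refusing nested values, rather than guessing, keeps the fix mechanical: anything the sketch cannot rewrite safely falls back to the user.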
These are mechanical corrections with a single correct answer:
METADATA errors (paper exists, bib entry wrong):
- `eprint` with the correct ID
- `@misc`/`@inproceedings` as appropriate

DEAD links:

- `howpublished` URL to the final destination

Minor author corrections:
Present these and wait for the user:
GHOST papers:
INVERTED or INACCURATE claims:
Dead URLs with no replacement found:
- `.bib` file
- `.tex` file
- `.bib` and `.tex` to confirm all fixes applied
- `[FIXED]`, `[RESOLVED]`, or `[DEFERRED]`
- `[name]-citation-audit.md` in the manuscript directory.
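A sketch of the closing bookkeeping. It assumes `[FIXED]` marks a mechanical correction applied without asking, `[RESOLVED]` a user-reviewed finding the user approved, and `[DEFERRED]` one left open, and that `[name]` is the manuscript file's stem. The category labels and function names are hypothetical stand-ins for the two groups of findings described above.

```python
from pathlib import Path

# Hypothetical labels for the auto-fix group and the user-review group.
AUTO_FIX = {"METADATA", "DEAD_LINK", "AUTHOR_MINOR"}
NEEDS_USER = {"GHOST", "INVERTED", "INACCURATE", "DEAD_LINK_NO_REPLACEMENT"}


def final_status(category: str, user_approved: bool = False) -> str:
    """Map a finding to its closing marker (assumed semantics)."""
    if category in AUTO_FIX:
        return "[FIXED]"  # mechanical correction applied directly
    if category in NEEDS_USER:
        # Judgment calls count as done only once the user approves them.
        return "[RESOLVED]" if user_approved else "[DEFERRED]"
    raise ValueError(f"unknown finding category: {category}")


def report_path(manuscript: Path) -> Path:
    """Put [name]-citation-audit.md beside the manuscript, name = file stem."""
    return manuscript.with_name(f"{manuscript.stem}-citation-audit.md")
```

For a manuscript at `paper/main.tex`, `report_path` returns `paper/main-citation-audit.md`.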