skills/bibliography-from-ids/SKILL.md
Set up script-generated bibliography entries so citation metadata is never hand-typed. Use when writing or editing a paper's bibliography / references / related-work, adding a citation, verifying an existing bibliography's correctness, or setting up a new paper project. Prevents fabricated author lists, titles, years, and venues by requiring every entry to be fetched from arXiv / Crossref / ACL Anthology via a build script.
Core rule: Never hand-type citation metadata. You list identifiers; a script fetches canonical BibTeX. Author lists, titles, years, and venues are ALL derived from authoritative APIs — never typed from memory.
This is the citation analog of import-content (which does the same for tables and numbers).
LLMs, including you, fabricate citation metadata at rates of 14% to 95% across domains (see CheckIfExist 2026, GhostCite 2026, BibTeX Citation Hallucinations 2026). The characteristic failure mode is a real paper title and a real year paired with a completely wrong author list. The author list is what gets rendered in every in-text citation, so this is a load-bearing error.
A real incident during paper writing: an audit of 12 citations found 4 fabricated author lists, 1 unsupported claim, 1 wrong volume number, and 4 wrong titles. Every fabricated entry pointed to a real paper — the identifier would have been correct if recorded. Identifier-first authoring would have prevented every single case.
Use this skill before writing any @article{...}, @inproceedings{...}, \bibitem{...}, or similar BibTeX entry. Triggers include:
If the user types \cite{foo} and foo isn't in the bib yet, this skill applies.
Do not use for:
.tex source, in grep, in reviewers' mental models. A key like lam2025noveltybench on a paper whose first author is Yiming Zhang is a hallucinated attribution preserved in amber, even if the rendered PDF says "Zhang et al." correctly. Hallucinations must be eliminated wherever they occur, including keys. If a canonical community key exists (ACL Anthology format, DBLP style, conference-preferred), prefer that; otherwise construct <firstauthor><year><shorttitleword> from real metadata. If you find a key that encodes a wrong attribution, rename it (both in refs_ids.toml and every \cite{} site); don't leave the lie in the source.

Look for these files, in order:
1. refs_ids.toml / refs_ids.yaml / refs.bib with a % GENERATED FILE banner
2. scripts/build_bib.py or similar
3. CITATIONS.md / BIBLIOGRAPHY.md workflow doc

If the pattern exists: add new entries to the TOML, run the build script, done.
If not: set it up (Step 2).
Minimum four files. Below is a compact reference template. Full working copies of the scripts ship with this skill in examples/ (see the Reference Implementation section below); copy them into the target project to get started faster.
paper/refs_ids.toml: the only human-editable file

# Source of truth. Each [[cite]] = citation key + ONE identifier + claim.
# Run: uv run python scripts/build_bib.py
[[cite]]
key = "vaswani2017attention"
arxiv = "1706.03762"
claim = "Transformer architecture; §3.2 scaled dot-product attention."
[[cite]]
key = "devlin2019bert"
acl = "N19-1423" # ACL Anthology ID if published there
claim = "Pretraining via masked LM."
[[cite]]
key = "someauthor2024doi"
doi = "10.1234/xyz.5678"
claim = "Specific claim in §X."
[[cite]]
key = "radford2019language" # OpenAI tech report (no arXiv / DOI)
manual = true
entry = """
@techreport{radford2019language,
author = {Radford, Alec and Wu, Jeffrey and ...},
...
}
"""
claim = "GPT-2 architecture / training."
Precedence when multiple identifiers are given: acl > doi > arxiv > manual. The script picks the first that applies, preferring published metadata over preprint.
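The precedence rule can be sketched as a single lookup over the entry's fields. This is a minimal illustration, not the shipped script: `pick_source` and `PRECEDENCE` are hypothetical names, and the entry is assumed to be a plain dict as produced by parsing one `[[cite]]` table.

```python
# Hypothetical helper: choose which identifier to resolve for one [[cite]] entry.
PRECEDENCE = ("acl", "doi", "arxiv", "manual")  # published metadata before preprint


def pick_source(entry: dict) -> tuple:
    """Return (kind, value) for the highest-precedence identifier present."""
    for kind in PRECEDENCE:
        if entry.get(kind):
            return kind, entry[kind]
    # Fail loud: an entry with no identifier must never be skipped silently.
    raise ValueError(f"{entry.get('key', '<no key>')}: no acl/doi/arxiv/manual field")


# An entry that carries both a preprint ID and an ACL Anthology ID.
chosen = pick_source(
    {"key": "devlin2019bert", "arxiv": "1810.04805", "acl": "N19-1423"}
)
print(chosen)
```

With both identifiers present, the ACL Anthology ID wins, so the generated entry reflects the published venue rather than the preprint.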
scripts/build_bib.py: fetches canonical BibTeX

Core responsibilities:

- Read refs_ids.toml (tomllib is in the Python stdlib from 3.11 onward, so no extra dependency).
- acl:<id> → GET https://aclanthology.org/<id>.bib (returns BibTeX directly)
- doi:<id> → GET https://doi.org/<id> with Accept: application/x-bibtex (Crossref content negotiation)
- arxiv:<id> → GET https://export.arxiv.org/api/query?id_list=<id> (returns Atom XML; parse manually)
- manual → emit the entry = """...""" body verbatim
- Rewrite the @type{KEY,...} key to match the TOML key= field (ACL/Crossref emit their own keys).
- Emit % source: and % claim: comment lines.
- Write paper/refs.bib atomically (temp-file + os.replace) so a failed run leaves the previous refs.bib intact.

Fail-loud philosophy: never silently skip errors. A bare continue on a missing field is forbidden.
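The arxiv: branch is the only one that does not return BibTeX, so it needs hand-written parsing. Below is a minimal sketch using only the stdlib, raising on any missing field in the spirit of the fail-loud rule. The function name `parse_arxiv_entry`, its return shape, and the sample feed are all hypothetical; the sample is hand-written and deliberately not a real paper.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"a": "http://www.w3.org/2005/Atom"}


def parse_arxiv_entry(atom_xml: str, skip_authors: tuple = ()) -> dict:
    """Extract title, year, and authors from an arXiv Atom response."""
    entry = ET.fromstring(atom_xml).find("a:entry", ATOM_NS)
    if entry is None:
        raise ValueError("arXiv response contains no <entry>")

    def required(tag: str) -> str:
        node = entry.find(f"a:{tag}", ATOM_NS)
        if node is None or not (node.text or "").strip():
            raise ValueError(f"arXiv entry is missing <{tag}>")  # never skip
        return node.text

    title = " ".join(required("title").split())  # collapse multi-line whitespace
    year = required("published")[:4]             # ISO 8601 timestamp -> year
    names = [(n.text or "").strip() for n in entry.findall("a:author/a:name", ATOM_NS)]
    # Drop names with no letters, plus any corporate authors the TOML asks to skip.
    authors = [
        n for n in names
        if any(c.isalpha() for c in n) and n not in skip_authors
    ]
    if not authors:
        raise ValueError("arXiv entry has no usable authors")
    return {"title": title, "year": year, "authors": authors}


# Hand-written sample feed (not real metadata) for an offline check.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An Example
        Title</title>
    <published>2024-01-15T00:00:00Z</published>
    <author><name>Example Team</name></author>
    <author><name>:</name></author>
    <author><name>Ada Example</name></author>
  </entry>
</feed>"""

fields = parse_arxiv_entry(SAMPLE, skip_authors=("Example Team",))
print(fields)
```

Parsing from a string rather than fetching inside the function keeps the network call separate, which makes the parser testable offline.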
- Wrap corporate authors in braces ({Qwen}) in BibTeX so name-parsing doesn't split them. Also, if a corporate author appears first, natbib will render "Qwen et al. (2024)" instead of the conventional "Yang et al. (2024)"; provide a skip_authors = ["Qwen"] override in the TOML to filter them.
- arXiv can list a bare : as an "author" between corporate and individual authors; filter out any name with no alphabetic characters.
- published is in ISO 8601; extract year with [:4].
- title may contain multi-line whitespace; collapse with " ".join(title.split()).
- If the arXiv submission year differs from the venue year, use acl: or doi: instead, or add a year_override field.

scripts/verify_cites.py: offline linter

Pure stdlib, no network:
- Scan .tex for all \cite, \citep, \citet, \citealp, \citeauthor, etc. variants. Strip comments before matching so % \cite{dead} is ignored.
- Extract refs.bib entry keys with @\w+\s*\{\s*([^,\s}]+).
- Fail if any \cite{} key is missing from the bib; exit 3 if any bib entry is unused (waivable with --ignore-unused).

Regex for \cite (handles optional bracket arguments and comma-separated keys):
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])?(?:\[[^\]]*\])?\{([^}]+)\}")
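Put together, the linter's core comparison might look like the sketch below. It reuses the two regexes given above; `COMMENT_RE`, `cited_keys`, `bib_keys`, and `check` are hypothetical names, and the comment stripping is deliberately simple (it keeps an escaped `\%` but does not handle verbatim environments).

```python
import re

CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])?(?:\[[^\]]*\])?\{([^}]+)\}")
BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s}]+)")
COMMENT_RE = re.compile(r"(?<!\\)%.*")  # unescaped % through end of line


def cited_keys(tex: str) -> set:
    """Collect every key used in a \\cite-family command, ignoring comments."""
    body = COMMENT_RE.sub("", tex)
    return {
        key.strip()
        for match in CITE_RE.finditer(body)
        for key in match.group(1).split(",")
        if key.strip()
    }


def bib_keys(bib: str) -> set:
    """Collect every entry key defined in a .bib file."""
    return set(BIB_KEY_RE.findall(bib))


def check(tex: str, bib: str) -> tuple:
    """Return (missing, unused) as sorted lists of keys."""
    cited, defined = cited_keys(tex), bib_keys(bib)
    return sorted(cited - defined), sorted(defined - cited)


tex = r"""
See \citep[p.~3]{vaswani2017attention, devlin2019bert} and \citet*{ghost2020}.
% \cite{dead}
50\% of runs \cite{vaswani2017attention}
"""
bib = "@article{vaswani2017attention,\n}\n@inproceedings{devlin2019bert,\n}\n@misc{unused2021,\n}\n"

missing, unused = check(tex, bib)
print(missing, unused)
```

A real script would map the two lists onto its exit codes. Note that `BIB_KEY_RE` also matches `@string{...}` and `@comment{...}` blocks, so a generated refs.bib that avoids them keeps the check exact.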
paper/CITATIONS.md: workflow sidecar

Every project that uses the pattern needs this. It explains:
Users should be able to pick up the pattern from CITATIONS.md alone.
tests/test_bib_pipeline.py: unit tests

- --offline-manual-only flag for CI / smoke-testing without network.
- Rewriting of @type{KEY,...} when a manual entry's key differs from the TOML's key.
- verify_cites.py happy path, missing key, unused key, \cite variant handling, commented-out cites.

Replace any \begin{thebibliography}...\end{thebibliography} block with \bibliography{refs}:
\bibliographystyle{plainnat}
\bibliography{refs}
Then run latexmk -pdf — bibtex picks up refs.bib automatically.
Be explicit with the user about the system's limits:
- The claim = "..." TOML field is the hook for the sister skill verify-citation-claims, which dispatches parallel subagents to check each claim against the actual cited paper.
- verify-citation-claims as part of span-level review.
- refs_ids.toml, the tool cannot suggest one.

| Symptom | Cause | Fix |
|---|---|---|
| author = {{Qwen}} and ... literal braces showing in PDF | over-bracing corporate author | single braces {Qwen} inside field delimiter |
| First in-text citation rendering as "Team et al." instead of first human author | corporate author recorded by arXiv | skip_authors = ["Team"] in TOML |
| \cite{key} rendering as [?] | key missing from bib | re-run build_bib.py; if still missing, add to TOML |
| Two different "Smith et al. 2025" papers cited | genuine year+author collision | natbib adds a/b suffixes automatically — no action needed |
| arXiv submission year differs from venue year | paper published at later conference | prefer acl: / doi: over arxiv:, or accept the submission year |
| Crossref returns volume 13 for a paper you thought was volume 15 | the arxiv: preprint had wrong metadata | trust the DOI — the published volume is canonical |
| \cite{lam2025X} renders as "Zhang et al. 2025" (key and rendered author don't match) | key was hand-typed against a hallucinated first author and never corrected | rename the key to match the real first author; update refs_ids.toml and every \cite{} site; rebuild refs.bib and re-run verify_cites.py |
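For the last row, the rename across \cite{} sites can be scripted rather than done by hand. The sketch below is one possible approach, not part of the shipped examples: `rename_cite_key` is a hypothetical helper, and `zhang2025noveltybench` is an illustrative replacement key built from the <firstauthor><year><shorttitleword> pattern.

```python
import re

# Same shape as the linter's CITE_RE, with the command prefix and braces captured
# so the surrounding text can be re-emitted unchanged.
CITE_PARTS_RE = re.compile(
    r"(\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])?(?:\[[^\]]*\])?\{)([^}]+)(\})"
)


def rename_cite_key(tex: str, old: str, new: str) -> str:
    """Replace `old` with `new` inside every \\cite-family key list."""

    def fix(match: re.Match) -> str:
        keys = [k.strip() for k in match.group(2).split(",")]
        if old not in keys:
            return match.group(0)  # leave untouched cites byte-for-byte intact
        keys = [new if k == old else k for k in keys]
        return match.group(1) + ", ".join(keys) + match.group(3)

    return CITE_PARTS_RE.sub(fix, tex)


before = r"Prior work \citep[see][]{lam2025noveltybench,other2024} and \cite{other2024}."
after = rename_cite_key(before, "lam2025noveltybench", "zhang2025noveltybench")
print(after)
```

Only whole keys are matched, so a key that merely contains the old name as a substring is left alone. The key in refs_ids.toml still has to be changed separately, followed by a rebuild and a verify_cites.py run.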
Portable reference implementation ships with this skill in examples/:
- examples/build_bib.py: the resolver (arXiv / Crossref / ACL Anthology → paper/refs.bib, atomic writes, fail-loud).
- examples/verify_cites.py: the offline linter. \cite{} ↔ refs.bib correspondence.
- examples/test_bib_pipeline.py: unit tests (10 tests; uses --offline-manual-only so CI doesn't need network).
- examples/refs_ids.toml.example: sanitized demo TOML with all four identifier types (arxiv / doi / acl / manual) and the skip_authors override.

To use: copy examples/build_bib.py, examples/verify_cites.py, examples/test_bib_pipeline.py into the target project's scripts/ and tests/. Rename examples/refs_ids.toml.example to paper/refs_ids.toml and populate with the project's own citations. Add paper/CITATIONS.md with the project-specific workflow (this SKILL.md is a template for its content).
The absolute paths in these files assume scripts/ and paper/ live at the project root (matching the project layout described in this skill). Tweak the paths at the top of each script if your project uses a different layout.
Don't reimplement from scratch unless the project's stack (e.g. non-Python) requires it.
import-content: the table/numbers analog. Uses the same "identifier → script → generated artifact" pattern.