skills/43-wentorai-research-plugins/skills/tools/document/anystyle-api/SKILL.md
Citation reference parser using machine learning
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research anystyle-apiInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
AnyStyle is a fast and smart citation reference parser that uses machine learning (specifically conditional random fields, CRFs) to extract structured bibliographic data from unformatted citation strings. It can parse raw reference text into structured fields such as author, title, journal, volume, pages, year, and DOI, handling the enormous variety of citation formats found in academic literature.
The AnyStyle service provides both a web interface and an API endpoint for programmatic citation parsing. Unlike rule-based parsers that rely on specific citation style templates, AnyStyle uses a trained machine learning model that generalizes across citation formats, making it effective for parsing references from diverse disciplines and publication traditions where citation styles vary widely.
Researchers, librarians, digital humanists, and research software developers use AnyStyle to extract structured references from PDF documents, legacy bibliographies, dissertation reference lists, and scanned documents. It is particularly valuable for building citation networks, enriching bibliographic databases, migrating references between management tools, and processing large volumes of unstructured citation data that would be impractical to parse manually.
No authentication required. The AnyStyle web service is freely accessible without any API key, token, or registration. The service can be used via the web interface at https://anystyle.io/ or through its API endpoint. For heavy usage or private deployments, AnyStyle is also available as an open-source Ruby gem that can be installed locally.
Submit raw citation text and receive structured bibliographic data extracted by the machine learning parser. The endpoint accepts one or more citation strings and returns parsed fields for each reference.
POST https://anystyle.io/parse| Parameter | Type | Required | Description |
|-----------|--------|----------|--------------------------------------------------------------|
| body | string | Yes | Raw citation text (one reference per line in the POST body) |
| format | string | No | Output format: json (default), xml, bib (BibTeX) |
# Parse a single citation
curl -X POST "https://anystyle.io/parse" \
-H "Content-Type: text/plain" \
-d "Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008."
# Parse multiple citations (one per line)
curl -X POST "https://anystyle.io/parse" \
-H "Content-Type: text/plain" \
-d "Vaswani, A. et al. (2017). Attention is all you need. NeurIPS 30, 5998-6008.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. NeurIPS 25."
[
{
"author": [{"family": "Vaswani", "given": "A."}, {"family": "Shazeer", "given": "N."}],
"title": ["Attention is all you need"],
"date": ["2017"],
"container-title": ["Advances in Neural Information Processing Systems"],
"volume": ["30"],
"pages": ["5998-6008"],
"type": "article-journal"
}
]
Key response fields include author (array of name objects), title, date, container-title (journal/conference name), volume, issue, pages, doi, url, publisher, location, and type (inferred reference type).
No formal rate limits are documented for the AnyStyle web service. However, the service is provided as a free community resource, so users should exercise responsible usage patterns. For high-volume parsing tasks (thousands of citations or more), it is strongly recommended to install the AnyStyle Ruby gem locally:
gem install anystyle
The local installation provides the same parsing capabilities without any network dependencies or rate concerns, and supports batch processing of large reference lists and PDF files directly.
Extract structured data from a raw reference list copied from a PDF:
import requests
references = """Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL-HLT, 4171-4186.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., et al. (2020). Language models are few-shot learners. NeurIPS 33, 1877-1901."""
resp = requests.post(
"https://anystyle.io/parse",
headers={"Content-Type": "text/plain"},
data=references
)
for ref in resp.json():
authors = ", ".join(
f"{a.get('family', '')} {a.get('given', '')}" for a in ref.get("author", [])
)
title = ref.get("title", [""])[0]
year = ref.get("date", [""])[0]
journal = ref.get("container-title", [""])[0]
print(f"{authors} ({year}). {title}. {journal}")
Process reference lists from multiple papers for citation network analysis:
import requests
def parse_references(raw_text):
"""Parse raw citation text into structured records."""
resp = requests.post(
"https://anystyle.io/parse",
headers={"Content-Type": "text/plain"},
data=raw_text
)
if resp.status_code == 200:
return resp.json()
return []
# Process references from multiple source documents
documents = {
"paper_A": "Smith, J. (2020). Title A. Journal X, 1, 1-10.\nDoe, J. (2019). Title B. Journal Y, 2, 20-30.",
"paper_B": "Jones, K. (2021). Title C. Conference Z, 100-110.\nSmith, J. (2020). Title A. Journal X, 1, 1-10."
}
citation_graph = {}
for doc_id, refs in documents.items():
parsed = parse_references(refs)
citation_graph[doc_id] = parsed
print(f"{doc_id}: parsed {len(parsed)} references")
Transform unstructured references into BibTeX entries for use with LaTeX:
import requests
citation = "Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press."
resp = requests.post(
"https://anystyle.io/parse",
headers={"Content-Type": "text/plain"},
data=citation
)
parsed = resp.json()[0]
# Build a BibTeX entry from parsed fields
authors = " and ".join(
f"{a.get('family', '')}, {a.get('given', '')}" for a in parsed.get("author", [])
)
bib_key = parsed.get("author", [{}])[0].get("family", "unknown").lower() + parsed.get("date", ["0000"])[0]
print(f"@book{{{bib_key},")
print(f" author = {{{authors}}},")
print(f" title = {{{parsed.get('title', [''])[0]}}},")
print(f" year = {{{parsed.get('date', [''])[0]}}},")
print(f" publisher = {{{parsed.get('publisher', [''])[0] if parsed.get('publisher') else 'Unknown'}}}")
print("}")
For large-scale processing, install AnyStyle locally as a Ruby gem:
# Install the gem
gem install anystyle
# Parse references from command line
anystyle parse "Smith, J. (2020). My Paper. Journal, 1, 1-10."
# Parse references from a text file
anystyle parse references.txt --format json > parsed.json
# Parse references directly from a PDF
anystyle find document.pdf --format json > extracted_refs.json
development
Conduct rigorous thematic analysis (TA) of qualitative data following Braun and Clarke's (2006) six-phase framework. Use whenever the user mentions 'thematic analysis', 'TA', 'Braun and Clarke', 'qualitative coding', 'identifying themes', or asks for help analysing interviews, focus groups, open-ended survey responses, or transcripts to identify patterns. Also trigger for questions about inductive vs theoretical coding, semantic vs latent themes, essentialist vs constructionist epistemology, building a thematic map, or writing up a qualitative findings section. Covers all six phases, the four upfront analytic decisions, the 15-point quality checklist, and the five common pitfalls. Produces a Word document write-up and an annotated thematic map. Does NOT cover IPA, grounded theory, discourse analysis, conversation analysis, or narrative analysis — use a different method for those.
development
Guide users through writing a systematic literature review (SLR) following the PRISMA 2020 framework. Use this skill whenever the user mentions 'systematic review', 'systematic literature review', 'SLR', 'PRISMA', 'PRISMA 2020', 'PRISMA flow diagram', 'PRISMA checklist', or asks for help writing, structuring, or auditing a literature review that follows reporting guidelines. Also trigger when the user asks about inclusion/exclusion criteria for a review, search strategies for databases like Scopus/WoS/PubMed, study selection processes, risk of bias assessment, or narrative synthesis for a review paper. This skill covers the full PRISMA 2020 checklist (27 items), produces a Word document manuscript in strict journal article format, generates an annotated PRISMA flow diagram, and enforces APA 7th Edition referencing throughout. It does NOT cover meta-analysis or statistical pooling. By Chuah Kee Man.
testing
Performs placebo-in-time sensitivity analysis with hierarchical null model and optional Bayesian assurance. Use when checking model robustness, verifying lack of pre-intervention effects, or estimating study power.
data-ai
Fit, summarize, plot, and interpret a chosen CausalPy experiment. Use after the causal method has been selected, including when configuring PyMC/sklearn models and scale-aware custom priors.