skills/openalex/SKILL.md
Query OpenAlex API from the command line with curl and jq for publication discovery, filtering, sorting, pagination, and PDF availability checks. Use when searching scholarly works/authors/sources, building or debugging OpenAlex queries, extracting results, or downloading available PDFs using OPENALEX_API_KEY.
npx skillsauth add ondata/skills openalex

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill to run reliable OpenAlex API workflows from shell.
IMPORTANT: Always write `curl` commands on a single line. Multi-line `\` continuation breaks argument parsing in agent environments and will cause errors.

SECURITY: Never expose the actual value of `OPENALEX_API_KEY` anywhere: not in text responses, not in echoed commands, not in logs. Always reference it as `$OPENALEX_API_KEY`. If the key appears in any output, stop immediately and do not repeat it.
A task is complete when:
Results

- Title (`display_name`), year, citation count

Process

- `curl` written on a single line
- `api_key` included in every request
- `select=` used to limit returned fields
- `jq` used to format output

PDF download (when requested)
export OPENALEX_API_KEY='...'
To verify it is set without printing the value:
[[ -n "${OPENALEX_API_KEY:-}" ]] && echo "key is set" || echo "ERROR: OPENALEX_API_KEY not set"
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'search="data quality" AND "open government data"' --data-urlencode 'filter=type:article,from_publication_date:2023-01-01' --data-urlencode 'sort=relevance_score:desc' --data-urlencode 'per-page=200' --data-urlencode 'select=display_name,publication_year,cited_by_count,doi' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, year:.publication_year, cited_by:.cited_by_count, doi}'
1. Choose the entity (`works`, `authors`, `sources`, etc.).
2. Build the `search` block with boolean logic (`AND`, `OR`, `NOT`, quotes, parentheses).
3. Add `filter` constraints (type/date/language/OA/citation fields).
4. Limit returned fields with `select` (root-level fields only).
5. Paginate with `page` or `cursor=*`.
6. Format the output with `jq` and save/transform as needed.

Use this when building or debugging non-trivial queries.
1. Start with a small sample (`per-page=5` or `per-page=10`) and a minimal `select=`.
2. Check the key fields (`display_name`, year, DOI).
3. Compare `filter=title.search:"..."` against `search=...` with the same filters.
4. Record the final parameters (`search`, `filter`, `sort`, `per-page`, pagination mode).
5. Scale up (`per-page=200`, then `cursor=*` for deep pagination).

Avoid jumping directly from a paper/spec to a full extraction script without this short validation loop.
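The scale-up step can be sketched as a small cursor loop. This is a sketch under stated assumptions, not one of the skill's scripts: `fetch_all_pages` is a hypothetical helper name, and it assumes each response carries the next cursor in `.meta.next_cursor`, which is null on the last page. The `curl` call stays on a single line, as required above.

```shell
# Hypothetical helper (not one of the skill's scripts): stream every work
# matching a filter as one JSON object per line, following cursor pagination.
# Assumption: the next cursor is found in .meta.next_cursor (null at the end).
fetch_all_pages() {
  filter=$1
  cursor='*'   # cursor=* starts deep pagination
  while [ -n "$cursor" ]; do
    page=$(curl -sS --get 'https://api.openalex.org/works' --data-urlencode "filter=$filter" --data-urlencode 'per-page=200' --data-urlencode 'select=display_name,publication_year,cited_by_count,doi' --data-urlencode "cursor=$cursor" --data-urlencode "api_key=$OPENALEX_API_KEY") || return 1
    # Emit the results of this page; stop if the response has no results array
    printf '%s\n' "$page" | jq -c '.results[]' || return 1
    # "// empty" turns a null cursor into an empty string, which ends the loop
    cursor=$(printf '%s\n' "$page" | jq -r '.meta.next_cursor // empty')
  done
}
```

Once the small-sample run looks right, a call such as `fetch_all_pages 'type:article,from_publication_date:2023-01-01' | jq -r '[.display_name, .publication_year] | @tsv'` would stream the full result set. Each page is a billed list request, so it is worth confirming the filter on a `per-page=5` sample first.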
- `title.search=`: searches only in the title; use this by default for focused results. Must be passed inside `filter=`, not as a standalone parameter: `filter=title.search:"your query"`.
- `search=`: full-text search across the entire document; use only when title-only matching is too restrictive.
- `search.semantic=`: semantic/conceptual search (costs $0.001/request; requires API key).
- `filter=`: exact/structured constraints; comma means AND.
- `sort=`: `relevance_score:desc`, `cited_by_count:desc`, `publication_date:desc`, etc.
- `per-page=`: 1..200. Default is 25; always set `per-page=200` for bulk queries (8× fewer API calls).
- `page=`: page number for standard pagination.
- `cursor=*`: deep pagination beyond the first 10k records.
- `select=`: reduce payload; nested paths are not allowed in `select`.
- `group_by=`: aggregate results by a field (e.g. `group_by=publication_year`, `group_by=topics.id`).
- `sample=`: random sample of N results (e.g. `sample=20`). Add `seed=42` for reproducibility.

Filters are comma-separated AND conditions. Within a single attribute:
| Logic | Syntax | Example |
|-------|--------|---------|
| AND (comma) | filter=a:x,b:y | filter=type:article,is_oa:true |
| OR (pipe) | filter=type:article\|book | multiple values for same field |
| NOT (exclamation) | filter=type:!journal-article | negation |
| Greater than | filter=cited_by_count:>100 | comparison |
| Less than | filter=publication_year:<2020 | comparison |
| Range | filter=publication_year:2020-2023 | inclusive range |
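The filter syntax in the table can be composed programmatically instead of by hand. The following is a minimal sketch; `join_with`, `or_values`, and `and_filters` are hypothetical helper names introduced here for illustration, and the field values are the ones used in the table.

```shell
# Hypothetical helpers for composing a filter= value.

# join_with SEP ITEM...: join the items with the given separator
join_with() {
  sep=$1; shift
  joined=''; first=1
  for item in "$@"; do
    if [ "$first" -eq 1 ]; then joined=$item; first=0; else joined="$joined$sep$item"; fi
  done
  printf '%s\n' "$joined"
}

or_values()   { join_with '|' "$@"; }   # OR: several values for the same field
and_filters() { join_with ',' "$@"; }   # AND: several field:value conditions

filter=$(and_filters "type:$(or_values article book)" 'is_oa:true' 'cited_by_count:>100' 'publication_year:2020-2023')
printf '%s\n' "$filter"
# prints: type:article|book,is_oa:true,cited_by_count:>100,publication_year:2020-2023
```

The resulting string can then be passed as `--data-urlencode "filter=$filter"`, which also takes care of encoding the `|`, `>`, and `!` characters.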
Combine up to 50 IDs in one request using the pipe operator — avoid sequential calls:
# Batch DOI lookup (up to 50 per request)
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'filter=doi:https://doi.org/10.1/abc|https://doi.org/10.2/def' --data-urlencode 'per-page=50' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, doi}'
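For longer DOI lists, the 50-per-request limit means the list has to be split into batches first. A possible sketch, assuming one DOI per line on standard input; `batch_dois` is a hypothetical helper name, and the DOIs shown are placeholders in the same style as the example above.

```shell
# Hypothetical helper: read one DOI per line on stdin and print pipe-joined
# batches, one batch per output line (default: 50 per batch). Blank lines are skipped.
batch_dois() {
  awk -v size="${1:-50}" 'NF { printf "%s%s", (n % size ? "|" : (n ? "\n" : "")), $0; n++ } END { if (n) printf "\n" }'
}

# Example with a batch size of 2 so the grouping is visible (placeholder DOIs)
printf '%s\n' 'https://doi.org/10.1/abc' 'https://doi.org/10.2/def' 'https://doi.org/10.3/ghi' | batch_dois 2
# prints:
#   https://doi.org/10.1/abc|https://doi.org/10.2/def
#   https://doi.org/10.3/ghi
```

Each output line can be used directly as the value after `doi:` in the filter, for example `batch_dois < dois.txt | while IFS= read -r batch; do curl -sS --get 'https://api.openalex.org/works' --data-urlencode "filter=doi:$batch" --data-urlencode 'per-page=50' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq -r '.results[] | [.display_name, .doi] | @tsv'; done`, where `dois.txt` is a hypothetical input file.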
Names are ambiguous; always resolve to an OpenAlex ID first, then filter.
Step 1 — find the entity ID:
curl -sS --get 'https://api.openalex.org/authors' --data-urlencode 'search=Heather Piwowar' --data-urlencode 'per-page=5' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {id, display_name}'
Step 2 — use the ID in a filter:
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'filter=authorships.author.id:A5023888391' --data-urlencode 'per-page=200' --data-urlencode 'select=id,display_name,publication_year,cited_by_count' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, year:.publication_year}'
Applies to: authors (authorships.author.id), institutions (authorships.institutions.id), sources/journals (primary_location.source.id). External IDs are also accepted: ORCID, ROR, ISSN, DOI.
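The hand-off from step 1 to step 2 can be sketched as follows. The helper names `short_id` and `works_filter` are hypothetical, and the sketch assumes that the `id` returned in step 1 is a full URL such as `https://openalex.org/A5023888391`.

```shell
# Hypothetical helpers: turn the "id" returned by step 1 into a works filter for step 2.

# Strip the URL prefix from an OpenAlex ID (a short ID passes through unchanged)
short_id() { printf '%s\n' "${1##*/}"; }

# Map an entity kind to the works filter key listed above
works_filter() {
  case "$1" in
    author)      key='authorships.author.id' ;;
    institution) key='authorships.institutions.id' ;;
    source)      key='primary_location.source.id' ;;
    *) echo "unknown entity kind: $1" >&2; return 1 ;;
  esac
  printf '%s:%s\n' "$key" "$(short_id "$2")"
}

works_filter author 'https://openalex.org/A5023888391'
# prints: authorships.author.id:A5023888391
```

The printed string is what goes after `filter=` in the step 2 request.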
For a work ID:
1. `.content_urls.pdf`
2. `.best_oa_location.pdf_url`
3. `.primary_location.pdf_url`
4. `.locations[].pdf_url`
5. Add the `api_key` query parameter when the source is `content.openalex.org`.

When displaying results, always show `display_name` as the title; never use `doi` or `id` in its place.
Minimal jq for a results table:
| jq -r '.results[] | [.display_name, .publication_year, .cited_by_count, .doi] | @tsv'
Or as structured objects:
| jq '.results[] | {title: .display_name, year: .publication_year, cited_by: .cited_by_count, doi}'
To save results as a CSV file, use jq with @csv and include a header row:
curl -sS --get 'https://api.openalex.org/works' ... --data-urlencode "api_key=$OPENALEX_API_KEY" | jq -r '["title","year","cited_by","doi"], (.results[] | [.display_name, .publication_year, .cited_by_count, (.doi // "")]) | @csv' > results.csv
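Rules:

- Use `// ""` for fields that may be null (e.g. `doi`) so that missing values are written as explicit empty strings. `@csv` renders a bare null as an empty cell and fails only on arrays and objects, so flatten nested fields before passing them to it.
- Use `-r` (raw output) so `@csv` produces plain text, not JSON strings.

Implement exponential backoff on 403 (rate limit) and 500 (server error):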
attempt 1 → wait 1s → attempt 2 → wait 2s → attempt 3 → wait 4s → attempt 4 → wait 8s
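The schedule above could be implemented along these lines. This is a sketch, not the skill's own script: `backoff_delay`, `should_retry`, and `fetch_with_backoff` are hypothetical names, the retry set is limited to the 403 and 500 codes named in this document, and five attempts are assumed so that the waits come out as 1, 2, 4, and 8 seconds. Only the status code is ever printed, never the key.

```shell
# Hypothetical helpers sketching the backoff schedule above.

# Seconds to wait after attempt N (1-based): 1, 2, 4, 8, ...
backoff_delay() {
  delay=1; i=1
  while [ "$i" -lt "$1" ]; do delay=$((delay * 2)); i=$((i + 1)); done
  echo "$delay"
}

# Only 403 (rate limit) and 500 (server error) are retried
should_retry() { case "$1" in 403|500) return 0 ;; *) return 1 ;; esac; }

# fetch_with_backoff OUTFILE URL [extra curl args...]
fetch_with_backoff() {
  out=$1; url=$2; shift 2
  attempt=1; max_attempts=5
  while [ "$attempt" -le "$max_attempts" ]; do
    code=$(curl -sS -o "$out" -w '%{http_code}' --get "$url" "$@" --data-urlencode "api_key=$OPENALEX_API_KEY")
    if [ "$code" = 200 ]; then return 0; fi
    if ! should_retry "$code"; then echo "HTTP $code: not retrying" >&2; return 1; fi
    # Wait only if another attempt follows
    if [ "$attempt" -lt "$max_attempts" ]; then sleep "$(backoff_delay "$attempt")"; fi
    attempt=$((attempt + 1))
  done
  echo "giving up after $max_attempts attempts" >&2
  return 1
}
```

A call might look like `fetch_with_backoff page.json 'https://api.openalex.org/works' --data-urlencode 'filter=type:article' --data-urlencode 'per-page=200' && jq '.results[] | {title: .display_name}' page.json`, where `page.json` is a hypothetical output file.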
HTTP codes:

- 200: success
- 400: invalid parameter or filter syntax; fix the query
- 403: rate limit exceeded; back off and retry
- 404: entity not found
- 500: temporary server error; retry with backoff

With the free $1/day budget:
| Request type | Cost | Daily limit |
|---|---|---|
| Singleton (/works/W123) | free | unlimited |
| List / filter | $0.0001 | ~10,000 requests |
| Search (full-text or semantic) | $0.001 | ~1,000 requests |
| PDF download (content.openalex.org) | $0.01 | ~100 downloads |
Use select= and per-page=200 to minimize request count.
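A quick calculation shows why the page size matters. The sketch below uses the list/filter price from the table ($0.0001 per request); `requests_needed` and `list_cost_usd` are hypothetical helper names, and the total of 10,000 results is an illustrative figure.

```shell
# Hypothetical helpers for estimating request count and cost.

# Number of list requests needed to page through TOTAL results at PER_PAGE each
requests_needed() { echo $(( ($1 + $2 - 1) / $2 )); }   # ceiling division

# Cost in USD at the list/filter price from the table ($0.0001 per request)
list_cost_usd() { awk -v n="$1" 'BEGIN { printf "%.4f\n", n * 0.0001 }'; }

total=10000   # illustrative result count
for per_page in 25 200; do
  n=$(requests_needed "$total" "$per_page")
  echo "per-page=$per_page: $n requests, \$$(list_cost_usd "$n")"
done
# prints:
#   per-page=25: 400 requests, $0.0400
#   per-page=200: 50 requests, $0.0050
```

At the default page size the same extraction takes 8 times as many requests, which is the ratio quoted for `per-page=200` earlier.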
- Do not use `.id` or `.doi` as the title field in `jq` output: `.id` is an OpenAlex URL and `.doi` is a DOI URL; always use `.display_name` for human-readable titles.
- Do not include `id` in `select=` unless you need the OpenAlex URL for follow-up lookups; it is a URL, not a title, and confuses output.
- Do not sort by `relevance_score` without a search query.
- Do not use nested paths in `select` (example: use `open_access`, then parse `.open_access.is_oa` with `jq`).
- Do not use `per-page=25` (default) for bulk extraction; always set `per-page=200`.
- `search=` searches full text and can return loosely related results. Use `title.search=` when the topic must appear in the title.
- Always write `curl` commands on a single line; multi-line `\` continuation breaks argument parsing in agent environments.
- `title.search` is NOT a valid standalone parameter; always pass it inside `filter=`: `filter=title.search:"your query"`.
- Include `api_key=$OPENALEX_API_KEY` in every request.
- Never print the value of `$OPENALEX_API_KEY`. To confirm it is set, run: `[[ -n "${OPENALEX_API_KEY:-}" ]] && echo "key is set" || echo "ERROR: OPENALEX_API_KEY not set"`.

- references/query-recipes.md
- scripts/openalex_query.sh
- scripts/openalex_download_pdf.sh