
lyndonkl/fetch-preprint-recent

Name: fetch-preprint-recent
Author: lyndonkl

skills/fetch-preprint-recent/SKILL.md

npx skillsauth add lyndonkl/claude fetch-preprint-recent


fetch-preprint-recent

Fetch preprints from bioRxiv or medRxiv for a date window, normalize the records, and keyword-filter them.

Workflow

- [ ] Step 1: Validate inputs (server, from, to, keywords)
- [ ] Step 2: Page through the details endpoint until the cursor reaches the total reported in messages
- [ ] Step 3: Normalize each record to the canonical shape
- [ ] Step 4: Dedupe within the window (keep latest version per DOI)
- [ ] Step 5: Keyword-filter on lowercased title + abstract
- [ ] Step 6: Return matched records + a summary (total fetched, total matched, pages fetched)

Step 1 — Validate inputs

Required:

- server: one of biorxiv or medrxiv (the API uses these literal strings)
- from: YYYY-MM-DD, inclusive
- to: YYYY-MM-DD, inclusive, must be ≥ from
- keywords: list of strings; may include multi-word phrases; case-insensitive matching

If the window exceeds 31 days, do not fetch straight away: flag it and ask for confirmation. The API accepts longer windows, but dumping more than a month of preprints in one call is rarely useful without a stronger filter.
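The checks above can be sketched as a small validator. This is an illustrative sketch, not part of the skill: the function name `validate_inputs` and its return shape are assumptions, and the over-31-day case is returned as a flag so the caller can ask for confirmation instead of failing outright.

```python
from datetime import date

VALID_SERVERS = {"biorxiv", "medrxiv"}
MAX_WINDOW_DAYS = 31  # soft cap from Step 1


def validate_inputs(server, from_date, to_date, keywords):
    """Return (start, end, needs_confirmation); raise ValueError on bad input."""
    if server not in VALID_SERVERS:
        raise ValueError(f"server must be one of {sorted(VALID_SERVERS)}, got {server!r}")
    # date.fromisoformat raises ValueError on strings it cannot parse as an ISO date.
    start = date.fromisoformat(from_date)
    end = date.fromisoformat(to_date)
    if end < start:
        raise ValueError("`to` must be on or after `from`")
    if not keywords or not all(isinstance(k, str) and k.strip() for k in keywords):
        raise ValueError("keywords must be a non-empty list of non-empty strings")
    # Both endpoints are inclusive, so a same-day window counts as 1 day.
    window_days = (end - start).days + 1
    return start, end, window_days > MAX_WINDOW_DAYS
```

A caller would stop and confirm with the user whenever the third value is true.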

Step 2 — Page through the endpoint

The endpoint is:

https://api.biorxiv.org/details/{server}/{from}/{to}/{cursor}

Start with cursor=0.

The response shape is:

{
  "messages": [{"status": "ok", "interval": "2026-05-04/2026-05-10", "cursor": "0", "count": 100, "total": 327}],
  "collection": [ { record }, { record }, ... ]
}

After consuming collection, increment cursor by 100 (the page size is fixed) and refetch until cursor + count >= total.
Cap pages at 20 (2,000 records) as a safety; if you hit the cap, surface a "window may be over-broad" warning and return what you have.

Use WebFetch with the URL. If WebFetch returns malformed JSON or a 5xx, retry once with a 2-second backoff; on second failure, return partial results with a fetch_errors field listing the failed cursors.
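The paging, cap, and retry rules above can be sketched as one loop. This is a sketch under assumptions: `fetch_window`, `http_get_json`, and the returned dictionary keys are hypothetical names, and the standard-library fetcher stands in for WebFetch. The fetcher is injectable so the loop can be exercised without touching the network.

```python
import json
import time
import urllib.request

PAGE_SIZE = 100  # fixed by the API
MAX_PAGES = 20   # safety cap: 2,000 records


def http_get_json(url):
    """Stand-in fetcher: GET the URL and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_window(server, start, end, fetch=http_get_json, backoff=2.0):
    """Page through the details endpoint; return records plus fetch bookkeeping."""
    records, fetch_errors, warnings = [], [], []
    cursor, pages, total = 0, 0, None
    while pages < MAX_PAGES:
        url = f"https://api.biorxiv.org/details/{server}/{start}/{end}/{cursor}"
        page = None
        for attempt in range(2):  # first try, then a single retry
            try:
                page = fetch(url)
                break
            except (OSError, ValueError):  # network/HTTP errors, malformed JSON
                if attempt == 0:
                    time.sleep(backoff)
        if page is None:
            fetch_errors.append(cursor)  # give up and return partial results
            break
        pages += 1
        message = page["messages"][0]
        batch = page.get("collection", [])
        records.extend(batch)
        total = int(message.get("total", 0))
        count = int(message.get("count", len(batch)))
        if cursor + count >= total:
            break  # last page consumed
        cursor += PAGE_SIZE
    else:
        # The loop ended because the page cap was reached with results remaining.
        warnings.append("window may be over-broad")
    return {"records": records, "pages_fetched": pages, "total": total,
            "fetch_errors": fetch_errors, "warnings": warnings}
```

The `int(...)` conversions are defensive: the sample response shows `cursor` as a string, so the numeric fields are not assumed to arrive as integers.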

Step 3 — Normalize each record

The API returns fields like doi, title, authors, author_corresponding, author_corresponding_institution, date, version, type, license, category, jatsxml, abstract, published, server. Reduce to:

{
  "id": "10.1101/2026.05.07.123456",          // doi
  "title": "...",
  "authors": ["Smith J", "Doe A", ...],        // split the API's `authors` string on `;`
  "abstract": "...",
  "date": "2026-05-07",
  "server": "biorxiv",                          // or "medrxiv"
  "version": 1,
  "category": "neuroscience",                   // bioRxiv subject area
  "url": "https://www.biorxiv.org/content/10.1101/2026.05.07.123456v1",
  "published_doi": null                         // populated if the preprint has been published; from `published` field
}

URL pattern: https://www.{server}.org/content/{doi}v{version} (no https://doi.org/ redirect — direct to the preprint server keeps the abstract page accessible).
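A minimal sketch of the reduction, assuming each raw record is a dictionary with the field names listed above. The function name `normalize` is hypothetical. Two further assumptions are labeled in the code: the caller passes the requested server string rather than trusting the record's own `server` field, and an unpublished preprint is signalled by an empty or `"NA"` value in `published`.

```python
def normalize(raw, server):
    """Reduce one raw API record to the canonical shape from Step 3."""
    doi = raw["doi"]  # kept exactly as returned, never transformed
    version = int(raw.get("version", 1))
    # The API delivers authors as one ';'-separated string.
    authors = [a.strip() for a in (raw.get("authors") or "").split(";") if a.strip()]
    # Assumption: "NA" or an empty value means "not yet published".
    published = raw.get("published")
    return {
        "id": doi,
        "title": raw.get("title", ""),
        "authors": authors,
        "abstract": raw.get("abstract") or "",
        "date": raw.get("date"),
        # Assumption: use the server string from the request ("biorxiv" or "medrxiv").
        "server": server,
        "version": version,
        "category": raw.get("category"),
        "url": f"https://www.{server}.org/content/{doi}v{version}",
        "published_doi": None if published in (None, "", "NA") else published,
    }
```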

Step 4 — Dedupe within the window

The same DOI can appear with multiple version values if the authors revised mid-window. Keep the highest version per DOI. Drop the rest.
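One way to sketch this, assuming records are already normalized with an `id` (the DOI) and an integer `version`; the name `dedupe_latest` is hypothetical.

```python
def dedupe_latest(records):
    """Keep only the highest-version record for each DOI."""
    latest = {}
    for rec in records:
        kept = latest.get(rec["id"])
        if kept is None or rec["version"] > kept["version"]:
            latest[rec["id"]] = rec
    # Dict order follows the first appearance of each DOI.
    return list(latest.values())
```

Comparing integers matters here: if `version` were still a string, "10" would sort below "9".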

Step 5 — Keyword-filter

For each kept record, check whether any keyword (or phrase) appears in lowercase(title + " " + abstract). Match logic:

- Multi-word keyword like "protein language model" → must appear as a contiguous substring.
- Single-word keyword like "crispr" → must appear with word boundaries (don't match "crisper").
- OR across all keywords (paper kept if any keyword matches).

Track which keyword(s) matched per paper — downstream paper-relevance-filter will use that signal.
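The match rules can be sketched with the standard `re` module. The function name `match_keywords` is an assumption; it returns the list of keywords that hit, so an empty list means the paper is dropped.

```python
import re


def match_keywords(record, keywords):
    """Return the keywords that match the record's lowercased title + abstract."""
    text = f"{record.get('title') or ''} {record.get('abstract') or ''}".lower()
    matched = []
    for kw in keywords:
        needle = kw.lower().strip()
        if len(needle.split()) > 1:
            # Multi-word phrase: contiguous substring match.
            hit = needle in text
        else:
            # Single word: require word boundaries on both sides.
            hit = re.search(rf"\b{re.escape(needle)}\b", text) is not None
        if hit:
            matched.append(kw)
    return matched
```

Because `\b` treats a hyphen as a boundary, "crispr" still matches "CRISPR-Cas9" while "crisper" does not match.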

Step 6 — Return

Return a payload like:

{
  "server": "biorxiv",
  "window": "2026-05-04/2026-05-10",
  "fetched_total": 327,
  "matched_total": 14,
  "pages_fetched": 4,
  "fetch_errors": [],
  "records": [ {normalized record + "matched_keywords": [...]} , ... ]
}

Cache the raw API JSON (pre-normalization) to the agent's .cache/ directory under {YYYY-WW}-{server}.json so a re-run can skip the network if the user wants to re-synthesize without re-fetching.
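A sketch of the cache file handling follows. The text does not say which date defines `{YYYY-WW}`, so this sketch assumes the ISO year and week of the window's start date; the helper names `cache_path`, `write_cache`, and `load_cache` are hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def cache_path(cache_dir, window_start, server):
    """Build {YYYY-WW}-{server}.json from the window start (assumed: ISO week)."""
    iso = window_start.isocalendar()  # (ISO year, ISO week, weekday)
    return Path(cache_dir) / f"{iso[0]}-{iso[1]:02d}-{server}.json"


def write_cache(cache_dir, window_start, server, raw_pages):
    """Store the raw, pre-normalization API pages as JSON."""
    path = cache_path(cache_dir, window_start, server)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(raw_pages), encoding="utf-8")
    return path


def load_cache(cache_dir, window_start, server):
    """Return the cached raw pages, or None when nothing is cached."""
    path = cache_path(cache_dir, window_start, server)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A re-run would call `load_cache` first and only go to the network when it returns `None`.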

Common Patterns

Pattern A — One server, one window: standard call. Use this in a weekly digest.

Pattern B — Multi-week catch-up: call once per week, never one giant 28-day window. Cursor pagination can handle the larger window, but keyword matches are easier to review at weekly granularity, which is also how papers are released and discussed.
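For Pattern B, the catch-up range can be split into inclusive 7-day windows before calling the skill once per window. A minimal sketch; the helper name `weekly_windows` is an assumption.

```python
from datetime import date, timedelta


def weekly_windows(start, end):
    """Split an inclusive date range into consecutive windows of at most 7 days."""
    windows = []
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=6), end)
        windows.append((cur.isoformat(), stop.isoformat()))
        cur = stop + timedelta(days=1)
    return windows
```

Each tuple maps directly onto the `from` and `to` inputs of one call.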

Pattern C — Preprint-only follow-up of a known paper: if you already have a DOI, do not use this skill. Use a direct WebFetch on https://api.biorxiv.org/details/{server}/{doi} instead.

Guardrails

- Don't fetch without a window. "Recent" must always resolve to specific from/to dates before the call.
- Don't keyword-filter server-side. The API doesn't support it; attempting via URL params silently returns everything.
- Don't trust the abstract field to be present. Some entries have empty abstracts; treat those as title-only matches and flag in the output.
- Don't dedupe across servers in this skill. Cross-server dedupe (bioRxiv ↔ medRxiv ↔ PubMed) belongs to the calling agent, not here.
- Don't transform DOIs. Keep them as-returned; downstream tools rely on the exact 10.1101/... string.
- Don't claim "no papers" on a fetch error. Distinguish "fetched and filtered to zero" from "fetch failed." The first is a thin week; the second is a bug.
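The last guardrail can be made explicit in code. This sketch assumes the Step 6 payload keys (`fetch_errors`, `pages_fetched`, `matched_total`); the function name and the outcome labels are hypothetical.

```python
def describe_outcome(payload):
    """Classify a payload so a failed fetch is never reported as an empty week."""
    if payload["fetch_errors"]:
        # Some pages never arrived, so zero matches proves nothing.
        return "fetch_failed" if payload["pages_fetched"] == 0 else "partial_results"
    if payload["matched_total"] == 0:
        return "no_matches"  # fetched fine and filtered to zero: a thin week
    return "ok"
```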

Quick Reference

| Field | Source | Notes |
| --- | --- | --- |
| Endpoint | api.biorxiv.org/details/{server}/{from}/{to}/{cursor} | Same host serves both bioRxiv and medRxiv; only {server} varies |
| Auth | None | Public API. Be polite; don't hammer. |
| Page size | 100, fixed | Cursor is the offset into the window's results |
| Window cap | 31 days (soft); 7 days is the typical weekly call | Wider windows = thousands of records before keyword filter |
| Rate limit | Not formally documented; ~1 req/sec is safe | Backoff on 5xx |
| URL pattern | https://www.{server}.org/content/{doi}v{version} | Linkable to abstract page |
| Server values | biorxiv, medrxiv | Case-sensitive in path |

Fetches preprints posted to bioRxiv or medRxiv within a given date window, then keyword-filters the results client-side. Wraps the public `api.biorxiv.org/details/{server}/{from}/{to}/{cursor}` endpoint, handles cursor pagination, normalizes records to a stable shape (doi, title, authors, abstract, date, server, version, url), and applies a keyword-OR match against title + abstract. Domain-neutral — usable for any biology / clinical preprint scan, not just one project. Use when user mentions bioRxiv, medRxiv, weekly preprint scan, fetch preprints, last-N-days preprints, or when a literature-scan agent needs structured preprint records.

94 stars

tools

Updated May 10, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add lyndonkl/claude fetch-preprint-recent

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: May 10, 2026, 6:20 AM (195.5s, 1 file scanned)


Install manually

# Clone the repo
git clone https://github.com/lyndonkl/claude.git

# Copy into Claude Code skills folder (global)
cp -r claude/skills/fetch-preprint-recent ~/.claude/skills/


Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT