skills_all/biorxiv-database/SKILL.md
Efficient database search tool for bioRxiv preprint server. Use this skill when searching for life sciences preprints by keywords, authors, date ranges, or categories, retrieving paper metadata, downloading PDFs, or conducting literature reviews.
npx skillsauth add microck/ordinary-claude-skills biorxiv-database
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides efficient Python-based tools for searching and retrieving preprints from the bioRxiv database. It enables comprehensive searches by keywords, authors, date ranges, and categories, returning structured JSON metadata that includes titles, abstracts, DOIs, and citation information. The skill also supports PDF downloads for full-text analysis.
Use this skill when:
Search for preprints containing specific keywords in titles, abstracts, or author lists.
Basic Usage:
python scripts/biorxiv_search.py \
--keywords "CRISPR" "gene editing" \
--start-date 2024-01-01 \
--end-date 2024-12-31 \
--output results.json
With Category Filter:
python scripts/biorxiv_search.py \
--keywords "neural networks" "deep learning" \
--days-back 180 \
--category neuroscience \
--output recent_neuroscience.json
Search Fields:
By default, keywords are searched in both title and abstract. Customize with --search-fields:
python scripts/biorxiv_search.py \
--keywords "AlphaFold" \
--search-fields title \
--days-back 365
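To illustrate what restricting --search-fields means in practice, here is a minimal sketch of keyword matching over a chosen set of fields. The helper name `matches_keywords` and the case-insensitive, any-keyword substring rule are assumptions made for illustration; the script's actual matching logic may differ.

```python
def matches_keywords(paper, keywords, search_fields=("title", "abstract")):
    """Return True if any keyword occurs in any of the selected fields.

    Hypothetical helper: case-insensitive substring matching is an assumption,
    not necessarily what biorxiv_search.py does internally.
    """
    haystack = " ".join(str(paper.get(field, "")) for field in search_fields).lower()
    return any(keyword.lower() in haystack for keyword in keywords)


# Illustrative record; only the abstract mentions the keyword.
paper = {
    "title": "Benchmarking structure prediction",
    "abstract": "We compare AlphaFold models against experimental structures.",
}

# Default fields (title and abstract) versus title only.
print(matches_keywords(paper, ["AlphaFold"]))
print(matches_keywords(paper, ["AlphaFold"], search_fields=("title",)))
```

Under these assumptions, a title-only search skips papers that mention the keyword only in the abstract, which narrows the result set.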
Find all papers by a specific author within a date range.
Basic Usage:
python scripts/biorxiv_search.py \
--author "Smith" \
--start-date 2023-01-01 \
--end-date 2024-12-31 \
--output smith_papers.json
Recent Publications:
# Last year by default if no dates specified
python scripts/biorxiv_search.py \
--author "Johnson" \
--output johnson_recent.json
Retrieve all preprints posted within a specific date range.
Basic Usage:
python scripts/biorxiv_search.py \
--start-date 2024-01-01 \
--end-date 2024-01-31 \
--output january_2024.json
With Category Filter:
python scripts/biorxiv_search.py \
--start-date 2024-06-01 \
--end-date 2024-06-30 \
--category genomics \
--output genomics_june.json
Days Back Shortcut:
# Last 30 days
python scripts/biorxiv_search.py \
--days-back 30 \
--output last_month.json
Retrieve detailed metadata for a specific preprint.
Basic Usage:
python scripts/biorxiv_search.py \
--doi "10.1101/2024.01.15.123456" \
--output paper_details.json
Full DOI URLs Accepted:
python scripts/biorxiv_search.py \
--doi "https://doi.org/10.1101/2024.01.15.123456"
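Accepting full DOI URLs implies that the resolver prefix is stripped before lookup. A minimal sketch of such normalization follows; the function name `normalize_doi` and the list of handled prefixes are assumptions, not the script's confirmed behavior.

```python
def normalize_doi(doi):
    """Strip a leading resolver URL or 'doi:' prefix, leaving the bare DOI.

    Hypothetical helper; the prefixes handled here are an assumption.
    """
    doi = doi.strip()
    prefixes = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


print(normalize_doi("https://doi.org/10.1101/2024.01.15.123456"))
print(normalize_doi("10.1101/2024.01.15.123456"))
```

Both calls print the same bare DOI, so either form can be used as a consistent key when caching or naming files.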
Download the full-text PDF of any preprint.
Basic Usage:
python scripts/biorxiv_search.py \
--doi "10.1101/2024.01.15.123456" \
--download-pdf paper.pdf
Batch Processing: For multiple PDFs, extract DOIs from a search result JSON and download each paper:
import json
from biorxiv_search import BioRxivSearcher
# Load search results
with open('results.json') as f:
    data = json.load(f)
searcher = BioRxivSearcher(verbose=True)
# Download each paper
for i, paper in enumerate(data['results'][:10]):  # First 10 papers
    doi = paper['doi']
    searcher.download_pdf(doi, f"papers/paper_{i+1}.pdf")
Filter searches by bioRxiv subject categories:
animal-behavior-and-cognition, biochemistry, bioengineering, bioinformatics, biophysics, cancer-biology, cell-biology, clinical-trials, developmental-biology, ecology, epidemiology, evolutionary-biology, genetics, genomics, immunology, microbiology, molecular-biology, neuroscience, paleontology, pathology, pharmacology-and-toxicology, physiology, plant-biology, scientific-communication-and-education, synthetic-biology, systems-biology, zoology
All searches return structured JSON with the following format:
{
  "query": {
    "keywords": ["CRISPR"],
    "start_date": "2024-01-01",
    "end_date": "2024-12-31",
    "category": "genomics"
  },
  "result_count": 42,
  "results": [
    {
      "doi": "10.1101/2024.01.15.123456",
      "title": "Paper Title Here",
      "authors": "Smith J, Doe J, Johnson A",
      "author_corresponding": "Smith J",
      "author_corresponding_institution": "University Example",
      "date": "2024-01-15",
      "version": "1",
      "type": "new results",
      "license": "cc_by",
      "category": "genomics",
      "abstract": "Full abstract text...",
      "pdf_url": "https://www.biorxiv.org/content/10.1101/2024.01.15.123456v1.full.pdf",
      "html_url": "https://www.biorxiv.org/content/10.1101/2024.01.15.123456v1",
      "jatsxml": "https://www.biorxiv.org/content/...",
      "published": ""
    }
  ]
}
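Because downstream steps rely on this structure, a quick sanity check on a saved result file can catch truncated or malformed output early. The sketch below is one possible check, not part of the skill itself; the helper name `check_results` and the choice of `REQUIRED_KEYS` are assumptions based on the sample format above.

```python
import json

# Assumed minimal set of fields; adjust to whatever your analysis needs.
REQUIRED_KEYS = {"doi", "title", "authors", "date", "version", "category"}


def check_results(data):
    """Return a list of human-readable problems found in a result document."""
    problems = []
    results = data.get("results", [])
    if data.get("result_count") != len(results):
        problems.append(
            f"result_count is {data.get('result_count')} "
            f"but {len(results)} results are present"
        )
    for index, paper in enumerate(results):
        missing = REQUIRED_KEYS - paper.keys()
        if missing:
            problems.append(f"result {index} is missing: {sorted(missing)}")
    return problems


# Illustrative document with one well-formed record.
sample = json.loads("""
{
  "query": {"keywords": ["CRISPR"]},
  "result_count": 1,
  "results": [
    {"doi": "10.1101/2024.01.15.123456", "title": "Paper Title Here",
     "authors": "Smith J, Doe J", "date": "2024-01-15",
     "version": "1", "category": "genomics"}
  ]
}
""")

print(check_results(sample))
```

An empty list means the document passed both checks; any entries describe what to investigate before continuing.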
python scripts/biorxiv_search.py \
--keywords "organoids" "tissue engineering" \
--start-date 2023-01-01 \
--end-date 2024-12-31 \
--category bioengineering \
--output organoid_papers.json
import json
with open('organoid_papers.json') as f:
    data = json.load(f)
print(f"Found {data['result_count']} papers")
for paper in data['results'][:5]:
    print(f"\nTitle: {paper['title']}")
    print(f"Authors: {paper['authors']}")
    print(f"Date: {paper['date']}")
    print(f"DOI: {paper['doi']}")
from biorxiv_search import BioRxivSearcher
searcher = BioRxivSearcher()
selected_dois = ["10.1101/2024.01.15.123456", "10.1101/2024.02.20.789012"]
for doi in selected_dois:
    filename = doi.replace("/", "_").replace(".", "_") + ".pdf"
    searcher.download_pdf(doi, f"papers/{filename}")
Track research trends by analyzing publication frequencies over time:
python scripts/biorxiv_search.py \
--keywords "machine learning" \
--start-date 2020-01-01 \
--end-date 2024-12-31 \
--category bioinformatics \
--output ml_trends.json
Then analyze the temporal distribution in the results.
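One simple way to look at the temporal distribution is to count papers per month using the `date` field of each result. The sketch below assumes dates are formatted as YYYY-MM-DD, as in the output format shown earlier; the helper name `papers_per_month` and the sample records are illustrative placeholders, not real search output.

```python
from collections import Counter


def papers_per_month(results):
    """Count papers by the YYYY-MM prefix of their 'date' field."""
    return Counter(paper["date"][:7] for paper in results if paper.get("date"))


# Placeholder records standing in for data['results'] from ml_trends.json.
results = [
    {"doi": "10.1101/example-1", "date": "2020-03-14"},
    {"doi": "10.1101/example-2", "date": "2020-03-29"},
    {"doi": "10.1101/example-3", "date": "2021-07-02"},
]

for month, count in sorted(papers_per_month(results).items()):
    print(month, count)
```

To run it on real output, load ml_trends.json with `json.load` and pass `data['results']` in place of the placeholder list.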
Monitor specific researchers' preprints:
# Track multiple authors
for author in Smith Johnson Williams; do
python scripts/biorxiv_search.py \
--author "$author" \
--days-back 365 \
--output "${author}_papers.json"
done
For more complex workflows, import and use the BioRxivSearcher class directly:
from scripts.biorxiv_search import BioRxivSearcher
# Initialize
searcher = BioRxivSearcher(verbose=True)
# Multiple search operations
keywords_papers = searcher.search_by_keywords(
keywords=["CRISPR", "gene editing"],
start_date="2024-01-01",
end_date="2024-12-31",
category="genomics"
)
author_papers = searcher.search_by_author(
author_name="Smith",
start_date="2023-01-01",
end_date="2024-12-31"
)
# Get specific paper details
paper = searcher.get_paper_details("10.1101/2024.01.15.123456")
# Download PDF
success = searcher.download_pdf(
doi="10.1101/2024.01.15.123456",
output_path="paper.pdf"
)
# Format results consistently
formatted = searcher.format_result(paper, include_abstract=True)
Use appropriate date ranges: Smaller date ranges return faster. For keyword searches over long periods, consider splitting into multiple queries.
Filter by category: When possible, use --category to reduce data transfer and improve search precision.
Respect rate limits: The script includes automatic delays (0.5s between requests). For large-scale data collection, add additional delays.
Cache results: Save search results to JSON files to avoid repeated API calls.
Version tracking: Preprints can have multiple versions. The version field indicates which version is returned. PDF URLs include the version number.
Handle errors gracefully: Check the result_count in output JSON. Empty results may indicate date range issues or API connectivity problems.
Verbose mode for debugging: Use --verbose flag to see detailed logging of API requests and responses.
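The advice to split long keyword searches into multiple queries can be sketched as a small date-window generator. This is an illustrative helper, not part of the skill: the name `split_date_range`, the 30-day default window, and the assumption that both start and end dates are inclusive are all choices made here.

```python
from datetime import date, timedelta


def split_date_range(start, end, window_days=30):
    """Yield (start, end) ISO date pairs covering [start, end] without overlap."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while current <= last:
        # Each window spans window_days days, clipped to the overall end date.
        window_end = min(current + timedelta(days=window_days - 1), last)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


for window_start, window_end in split_date_range("2024-01-01", "2024-03-31"):
    print(window_start, window_end)
```

Each pair can be passed as --start-date and --end-date to a separate run, with each run's output saved to its own JSON file so that completed windows do not need to be fetched again.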
import subprocess
from datetime import datetime, timedelta
# Last quarter
end_date = datetime.now()
start_date = end_date - timedelta(days=90)
# Pass the computed dates to the command-line script
subprocess.run([
    "python", "scripts/biorxiv_search.py",
    "--start-date", start_date.strftime("%Y-%m-%d"),
    "--end-date", end_date.strftime("%Y-%m-%d"),
], check=True)
Limit the number of results returned:
python scripts/biorxiv_search.py \
--keywords "COVID-19" \
--days-back 30 \
--limit 50 \
--output covid_top50.json
When only metadata is needed:
# Note: Abstract inclusion is controlled in Python API
from scripts.biorxiv_search import BioRxivSearcher
searcher = BioRxivSearcher()
papers = searcher.search_by_keywords(keywords=["AI"], days_back=30)
formatted = [searcher.format_result(p, include_abstract=False) for p in papers]
Integrate search results into downstream analysis pipelines:
import json
import pandas as pd
# Load results
with open('results.json') as f:
    data = json.load(f)
# Convert to DataFrame for analysis
df = pd.DataFrame(data['results'])
# Analyze
print(f"Total papers: {len(df)}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
print(f"\nTop authors by paper count:")
print(df['authors'].str.split(',').explode().str.strip().value_counts().head(10))
# Filter and export
recent = df[df['date'] >= '2024-06-01']
recent.to_csv('recent_papers.csv', index=False)
To verify that the bioRxiv database skill is working correctly, run the comprehensive test suite.
Prerequisites:
uv pip install requests
Run tests:
python tests/test_biorxiv_search.py
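The test suite validates initialization, date range search, category filtering, keyword search, DOI lookup, result formatting, and interval search.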
Expected Output:
🧬 bioRxiv Database Search Skill Test Suite
======================================================================
🧪 Test 1: Initialization
✅ BioRxivSearcher initialized successfully
🧪 Test 2: Date Range Search
✅ Found 150 papers between 2024-01-01 and 2024-01-07
First paper: Novel CRISPR-based approach for genome editing...
[... additional tests ...]
======================================================================
📊 Test Summary
======================================================================
✅ PASS: Initialization
✅ PASS: Date Range Search
✅ PASS: Category Filtering
✅ PASS: Keyword Search
✅ PASS: DOI Lookup
✅ PASS: Result Formatting
✅ PASS: Interval Search
======================================================================
Results: 7/7 tests passed (100%)
======================================================================
🎉 All tests passed! The bioRxiv database skill is working correctly.
Note: Some tests may show warnings if no papers are found in specific date ranges or categories. This is normal and does not indicate a failure.
For detailed API specifications, endpoint documentation, and response schemas, refer to:
references/api_reference.md - Complete bioRxiv API documentation