skills/literature/search/base-academic-search/SKILL.md
Search 400M+ open access documents via the BASE search engine API
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wentorai/research-plugins base-academic-search
BASE (Bielefeld Academic Search Engine) is one of the world's largest search engines for open access academic web resources. Operated by Bielefeld University Library, it indexes 400M+ documents from 11,000+ content providers, including institutional repositories, preprint servers, and digital libraries. Unlike Google Scholar, BASE provides structured metadata, license information, and full-text links. The API is free to use after registration.
https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi
# Basic keyword search (JSON response)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=climate+change+adaptation&format=json&hits=20"
# Search with field filters
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=dctitle:transformer+AND+dcsubject:NLP&format=json"
# Filter by document type and year
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=deep+learning+AND+dctypenorm:121+AND+dcyear:2024&format=json"
# Open access only
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=CRISPR+AND+dcoa:1&format=json"
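The same kind of request URL can be assembled in Python. The sketch below is an illustration only: `build_search_url` is a hypothetical helper, and it assumes that field filters belong inside the `query` value (as in the field-filter example above) while `format` and `hits` are ordinary URL parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def build_search_url(query: str, fmt: str = "json", hits: int = 20) -> str:
    """Return a BASE search URL (hypothetical helper, illustration only)."""
    params = {
        "func": "PerformSearch",
        "query": query,  # field filters such as dcyear:2024 go in here
        "format": fmt,
        "hits": hits,
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("dctitle:transformer AND dcsubject:NLP")
print(url)
```

Letting `urlencode` handle escaping avoids hand-writing `+` and `%3A` sequences, which is where hand-built URLs tend to go wrong.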
| Field | Description | Example |
|-------|-------------|---------|
| dctitle | Title | dctitle:attention+mechanism |
| dccreator | Author | dccreator:vaswani |
| dcsubject | Subject/keywords | dcsubject:machine+learning |
| dcdescription | Abstract | dcdescription:neural+network |
| dcyear | Publication year | dcyear:2024 |
| dctype | Document type text | dctype:article |
| dctypenorm | Normalized type code | dctypenorm:121 (journal article) |
| dcrights | Access rights | dcrights:open |
| dclang | Language | dclang:eng |
| dclink | Source URL | dclink:arxiv.org |
| dcoa | Open access status | dcoa:1 (OA), dcoa:2 (restricted) |
| dcprovider | Content provider | dcprovider:arxiv.org |
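Fielded queries can also be composed programmatically. The sketch below uses a hypothetical `fielded_query` helper. Joining clauses with `AND` follows the curl examples, while wrapping multi-word values in double quotes is an assumption about phrase syntax that should be checked against the BASE documentation.

```python
def fielded_query(terms: str = "", **fields: object) -> str:
    """Combine free-text terms and field:value filters with AND (hypothetical helper)."""
    clauses = [terms] if terms else []
    for name, value in fields.items():
        text = str(value)
        # Assumption: multi-word values need quoting to stay attached to the field.
        if " " in text:
            text = f'"{text}"'
        clauses.append(f"{name}:{text}")
    return " AND ".join(clauses)


q1 = fielded_query("deep learning", dcyear=2024, dclang="eng")
q2 = fielded_query(dctitle="attention mechanism", dccreator="vaswani")
print(q1)
print(q2)
```

Keyword arguments keep their call order, so the filters appear in the query in the order they were passed.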
| Code | Type |
|------|------|
| 121 | Journal article |
| 122 | Book / monograph |
| 14 | Conference paper |
| 15 | Thesis / dissertation |
| 17 | Report |
| 18 | Preprint |
| Parameter | Description | Default |
|-----------|-------------|---------|
| func | Must be PerformSearch | Required |
| query | Search query with optional field prefixes | Required |
| format | Response format: json or xml | xml |
| hits | Results per page (max 125) | 10 |
| offset | Pagination offset | 0 |
| sortby | Sort: dcyear desc, score desc | relevance |
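Because `hits` is capped at 125 per request, larger result sets have to be fetched page by page with `offset`. The following is a minimal sketch of the parameter arithmetic, assuming `offset` counts records rather than pages (the table does not say which). `page_params` is a hypothetical helper that only builds parameter dictionaries and makes no network calls.

```python
from typing import Iterator

MAX_HITS = 125  # per-request cap from the parameter table


def page_params(query: str, total: int, hits: int = MAX_HITS) -> Iterator[dict]:
    """Yield one parameter dict per page needed to cover `total` records."""
    hits = max(1, min(hits, MAX_HITS))  # clamp to the documented maximum
    for offset in range(0, total, hits):
        yield {
            "func": "PerformSearch",
            "query": query,
            "format": "json",
            "hits": min(hits, total - offset),  # last page may be shorter
            "offset": offset,
        }


pages = list(page_params("CRISPR AND dcoa:1", total=300, hits=200))
for p in pages:
    print(p["offset"], p["hits"])
```

In practice `total` would come from `numFound` in the first response, and each dictionary would be passed as `params` to the HTTP client.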
{
  "response": {
    "numFound": 45200,
    "start": 0,
    "docs": [
      {
        "dctitle": "Attention Is All You Need",
        "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
        "dcyear": "2017",
        "dcsubject": ["machine learning", "attention mechanism"],
        "dcdescription": "The dominant sequence transduction models...",
        "dcidentifier": "https://arxiv.org/abs/1706.03762",
        "dcsource": "arXiv.org",
        "dcprovider": "arxiv.org",
        "dcdocid": "abc123xyz",
        "dcoa": 1,
        "dctypenorm": ["18"],
        "dclang": ["eng"]
      }
    ]
  }
}
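A response shaped like the sample can be mapped onto a small typed record. This is a sketch under the assumption that real responses follow the sample's structure. `BaseRecord` and `parse_response` are hypothetical names, the embedded JSON is a trimmed copy of the sample, and fields missing from a document fall back to empty defaults.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaseRecord:
    title: Optional[str]
    authors: List[str] = field(default_factory=list)
    year: Optional[int] = None
    url: Optional[str] = None
    open_access: bool = False


def parse_response(payload: dict) -> List[BaseRecord]:
    """Convert a decoded JSON response into BaseRecord objects."""
    records = []
    for doc in payload.get("response", {}).get("docs", []):
        year = doc.get("dcyear")
        records.append(BaseRecord(
            title=doc.get("dctitle"),
            authors=list(doc.get("dccreator", [])),
            # dcyear is a string in the sample; convert only when it is numeric.
            year=int(year) if str(year).isdigit() else None,
            url=doc.get("dcidentifier"),
            open_access=doc.get("dcoa") == 1,
        ))
    return records


sample = json.loads("""
{"response": {"numFound": 1, "start": 0, "docs": [
  {"dctitle": "Attention Is All You Need",
   "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
   "dcyear": "2017",
   "dcidentifier": "https://arxiv.org/abs/1706.03762",
   "dcoa": 1}
]}}
""")
records = parse_response(sample)
print(records[0])
```

Using `.get` with defaults throughout means a document that lacks an optional field yields a partially filled record instead of raising `KeyError`.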
from typing import Optional

import requests

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def search_base(query: str, hits: int = 20,
                doc_type: Optional[int] = None, oa_only: bool = False) -> list:
    """Search BASE for academic open access documents."""
    # Field filters are appended to the query string itself.
    q = query
    if doc_type:
        q += f" AND dctypenorm:{doc_type}"
    if oa_only:
        q += " AND dcoa:1"
    params = {
        "func": "PerformSearch",
        "query": q,
        "format": "json",
        "hits": hits,
        "sortby": "dcyear desc",  # newest first
    }
    # A timeout keeps the call from hanging on an unresponsive server.
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    results = []
    for doc in data.get("response", {}).get("docs", []):
        results.append({
            "title": doc.get("dctitle"),
            "authors": doc.get("dccreator", []),
            "year": doc.get("dcyear"),
            "source": doc.get("dcsource"),
            "url": doc.get("dcidentifier"),
            "abstract": (doc.get("dcdescription") or "")[:300],
            "open_access": doc.get("dcoa") == 1,
            "type": doc.get("dctypenorm", []),
        })
    return results


def search_dissertations(topic: str, lang: str = "eng") -> list:
    """Find dissertations and theses on a topic."""
    query = f"{topic} AND dctypenorm:15 AND dclang:{lang}"
    return search_base(query, hits=50)


def search_by_provider(query: str, provider: str) -> list:
    """Search within a specific content provider."""
    full_query = f"{query} AND dcprovider:{provider}"
    return search_base(full_query)


# Example: find recent open access ML papers
papers = search_base("transformer self-attention", hits=10, oa_only=True)
for p in papers:
    oa = "OA" if p["open_access"] else "restricted"
    print(f"[{p['year']}] {p['title']} ({oa}) - {p['source']}")

# Example: find dissertations on climate modeling
theses = search_dissertations("climate modeling ocean")
for t in theses:
    print(f"[{t['year']}] {t['title']} - {', '.join(t['authors'][:2])}")
| Feature | BASE | Google Scholar | OpenAlex |
|---------|------|----------------|----------|
| Records | 400M+ | Unknown | 250M+ |
| Open access focus | Yes | No | Yes |
| Structured API | Yes | No official API | Yes |
| License metadata | Yes | No | Partial |
| Dissertation coverage | Excellent | Good | Limited |
| Repository-level filtering | Yes | No | No |