.claude/skills/arxiv-mcp/SKILL.md
Search and retrieve academic papers from arXiv.org using WebFetch and Exa. No MCP server required - uses existing tools to access arXiv API directly.
npx skillsauth add oimiragieo/agent-studio arxiv-mcpInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Mode: Cognitive/Prompt-Driven — No standalone utility script; use via agent context.
This skill uses existing tools to access arXiv:
Works immediately - no MCP server, no restart needed.
<capabilities> - Search academic papers by keywords, authors, categories, or date ranges - Retrieve detailed paper metadata (title, authors, abstract, categories, PDF link) - Get specific papers by arXiv ID - Find related papers based on categories and keywords - Filter by arXiv categories (cs.AI, cs.LG, cs.CV, math.*, physics.*, etc.) - No API key required - uses public arXiv API </capabilities>arxiv-mcp returns academic papers. To prevent memory exhaustion:
Why the limit?
The arXiv API is publicly accessible at http://export.arxiv.org/api/query.
// ✓ GOOD: Limit results to 20
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});
// ✓ GOOD: Use specific filters to reduce result set
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention+2025&max_results=20&sortBy=submittedDate',
prompt: 'Extract recent papers on transformer attention',
});
// ✗ BAD: Old behavior - unlimited or >20 results
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:neural+networks',
// Too broad - will get 100s of results
});
// ✗ BAD: Exceeds memory limit
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:deep+learning&max_results=100',
// Over limit - memory risk
});
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=au:LeCun&max_results=10&sortBy=submittedDate',
prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=15&sortBy=submittedDate',
prompt: 'Extract paper titles, authors, abstracts, categories, and arXiv IDs',
});
WebFetch({
url: 'http://export.arxiv.org/api/query?id_list=2301.07041',
prompt:
'Extract full details: title, all authors, abstract, categories, published date, PDF link',
});
| Parameter | Description | Example |
| -------------- | ----------------------------------------------------------- | --------------------------------------------- |
| search_query | Search terms with field prefixes | all:transformer, au:LeCun, ti:attention |
| id_list | Comma-separated arXiv IDs | 2301.07041,2302.13971 |
| max_results | Number of results (default 10, max 100) | max_results=20 |
| start | Offset for pagination | start=10 |
| sortBy | Sort order: relevance, lastUpdatedDate, submittedDate | sortBy=submittedDate |
| sortOrder | ascending or descending | sortOrder=descending |
| Prefix | Field | Example |
| ------ | ---------- | ------------------------- |
| all: | All fields | all:machine+learning |
| ti: | Title | ti:transformer |
| au: | Author | au:Vaswani |
| abs: | Abstract | abs:attention+mechanism |
| cat: | Category | cat:cs.LG |
| co: | Comment | co:accepted |
Combine terms with AND, OR, ANDNOT:
search_query=ti:transformer+AND+abs:attention
search_query=au:LeCun+OR+au:Bengio
search_query=cat:cs.LG+ANDNOT+ti:survey
pnpm search:code or ripgrep skill on codebase (Grep/Glob as fallback)arxiv-mcp is best for:
Use Exa for more natural language queries with arXiv filtering:
mcp__Exa__web_search_exa({
query: 'site:arxiv.org transformer architecture attention mechanism deep learning',
numResults: 10,
});
mcp__Exa__web_search_exa({
query: 'site:arxiv.org large language model scaling laws 2024',
numResults: 15,
});
mcp__Exa__web_search_exa({
query: 'site:arxiv.org author:"Yann LeCun" deep learning',
numResults: 10,
});
| Category | Field | | ---------- | ------------------------------- | | cs.AI | Artificial Intelligence | | cs.LG | Machine Learning | | cs.CL | Computation and Language (NLP) | | cs.CV | Computer Vision | | cs.SE | Software Engineering | | cs.CR | Cryptography and Security | | stat.ML | Machine Learning (Statistics) | | math.* | Mathematics (all subcategories) | | physics.* | Physics (all subcategories) | | q-bio.* | Quantitative Biology | | econ.* | Economics |
// Start with broad Exa search for semantic matching
mcp__Exa__web_search_exa({
query: 'site:arxiv.org transformer attention mechanism neural networks',
numResults: 10,
});
// Get details for interesting papers by ID
WebFetch({
url: 'http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971',
prompt: 'Extract full metadata for each paper: title, authors, abstract, categories, PDF URL',
});
// Search by category of interesting paper
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+ti:attention&max_results=10&sortBy=submittedDate',
prompt: 'Find related papers, extract titles and abstracts',
});
// Latest papers in the field
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
prompt: 'Extract the 20 most recent machine learning papers',
});
</execution_process>
<best_practices>
sortBy=submittedDate&sortOrder=descending</best_practices> </instructions>
<examples> <usage_example> **Example 1: Search for transformer papers**:WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance',
prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
Example 2: Find papers by researcher:
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15',
prompt: 'List all papers by this author with titles and dates',
});
Example 3: Get recent ML papers:
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts',
});
Example 4: Semantic search with Exa:
mcp__Exa__web_search_exa({
query: 'site:arxiv.org multimodal large language models vision 2024',
numResults: 10,
});
Example 5: Get specific paper details:
WebFetch({
url: 'http://export.arxiv.org/api/query?id_list=1706.03762',
prompt: "Extract complete details for the 'Attention Is All You Need' paper",
});
</usage_example> </examples>
This skill is automatically assigned to:
search_query=neural+networks returns thousands of results; always scope with ti:, au:, cat:, or abs: prefixes to target the query.| Anti-Pattern | Why It Fails | Correct Approach |
| ---------------------------------------- | ----------------------------------------------------------- | ---------------------------------------------- |
| Using max_results=100 or no limit | Context explosion; 100 papers × 300 bytes = 30KB+ metadata | Always set max_results=20 (hard limit) |
| Fetching full paper PDFs | Single paper can be 100KB+; kills context budget | Extract abstract + metadata only via API |
| Broad query without field prefix | Returns irrelevant results across all fields | Use ti:, au:, cat:, or abs: prefix |
| Using only WebFetch for discovery | Misses semantically related papers not matching exact terms | Use Exa for semantic discovery first |
| Citing paper titles instead of arXiv IDs | Titles can be ambiguous or duplicated | Always include the arXiv ID (e.g., 1706.03762) |
Before starting:
cat .claude/context/memory/learnings.md
After completing:
.claude/context/memory/learnings.md.claude/context/memory/issues.md.claude/context/memory/decisions.mdASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
tools
Comprehensive biosignal processing toolkit for analyzing physiological data including ECG, EEG, EDA, RSP, PPG, EMG, and EOG signals. Use this skill when processing cardiovascular signals, brain activity, electrodermal responses, respiratory patterns, muscle activity, or eye movements. Applicable for heart rate variability analysis, event-related potentials, complexity measures, autonomic nervous system assessment, psychophysiology research, and multi-modal physiological signal integration.
tools
Comprehensive toolkit for creating, analyzing, and visualizing complex networks and graphs in Python. Use when working with network/graph data structures, analyzing relationships between entities, computing graph algorithms (shortest paths, centrality, clustering), detecting communities, generating synthetic networks, or visualizing network topologies. Applicable to social networks, biological networks, transportation systems, citation networks, and any domain involving pairwise relationships.
data-ai
Molecular featurization for ML (100+ featurizers). ECFP, MACCS, descriptors, pretrained models (ChemBERTa), convert SMILES to features, for QSAR and molecular ML.
development
Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.