paper2skill/paper2skill-v0.0.3/SKILL.md
Convert arXiv papers into ready-to-use agent skills using category-aware extraction. First classifies the paper into one or more of 11 research categories, then applies a specialized extraction pipeline for each category — because different types of papers produce different types of usable knowledge. A single paper can yield multiple skills if it spans categories. Use this skill whenever the user wants to turn a paper into a skill, extract practical techniques from research, build a skill library from papers, convert arXiv papers into reusable agent instructions, or batch-process multiple papers into skills. Also trigger when someone asks about extracting actionable knowledge from papers, making research practical for LLM agents, or systematically converting academic contributions into structured agent capabilities.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add ADu2021/skillXiv paper2skill-v0.0.3
This skill converts arXiv papers into agent skills by first classifying what kind of contribution a paper makes, then applying the right extraction pipeline for that contribution type. The key insight: a scaling-law paper and a dataset paper contain fundamentally different kinds of useful knowledge, so they should be extracted differently.
A single paper can produce multiple skills if it genuinely spans categories (e.g., a paper that introduces both a new method and a new benchmark).
Paper (arXiv link)
→ Step 1: Categorize (which of 11 types?)
→ Step 2: For each category, extract with the specialized pipeline
→ Step 3: Tag each skill (assign 1-3 broad-area tags from the registry)
→ Output: 1+ skills, each tailored to the knowledge type
Read the paper's title and abstract, then follow the classification process in references/paper-categorizer.md.
The categorizer assigns:
The 11 categories and their extraction targets:
| # | Category | What to Extract | Reference File |
|---|----------|----------------|----------------|
| 1 | Application Transfer | Domain adaptation recipe, deployment lessons | references/paper2skill-application-transfer.md |
| 2 | Evaluation Infrastructure | Dataset collection protocol OR benchmark design | references/paper2skill-evaluation-infrastructure.md |
| 3 | Paradigm Challenge | Prior belief → falsifying experiment → revised principle | references/paper2skill-paradigm-challenge.md |
| 4 | Systematic Empiricism | Ranked tricks, ablations, conditions of applicability | references/paper2skill-systematic-empiricism.md |
| 5 | Component Innovation | What was swapped, why, when it helps, performance delta | references/paper2skill-component-innovation.md |
| 6 | Insight-Driven | The "aha" observation + minimal reproduction recipe | references/paper2skill-insight-driven.md |
| 7 | Research Infrastructure | Design decisions, API patterns, trade-offs | references/paper2skill-research-infrastructure.md |
| 8 | Field Foundation | Problem definition, vocabulary, opened directions | references/paper2skill-field-foundation.md |
| 9 | Mechanistic Analysis | Analytical methodology (not just findings) | references/paper2skill-mechanistic-analysis.md |
| 10 | Survey & Synthesis | Taxonomy, decision trees, open problems | references/paper2skill-survey-synthesis.md |
| 11 | Scaling & Efficiency | Empirical laws, budget-performance trade-offs | references/paper2skill-scaling-efficiency.md |
For each category assigned to the paper (primary + any secondaries), load the corresponding reference file and follow its extraction pipeline. Each reference contains:
Only load the reference files you need. If a paper is classified as Category 5 (Component Innovation) with no secondaries, only read references/paper2skill-component-innovation.md. Don't load the other 10.
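A minimal Python sketch of the load-only-what-you-need rule, assuming categories are identified by the numbers in the table above. The `REFERENCE_FILES` mapping and the `reference_files_for` helper are illustrative names introduced here, not part of the skill itself.

```python
# Hypothetical helper: category numbers (from the table above) mapped to reference files.
REFERENCE_FILES = {
    1: "references/paper2skill-application-transfer.md",
    2: "references/paper2skill-evaluation-infrastructure.md",
    3: "references/paper2skill-paradigm-challenge.md",
    4: "references/paper2skill-systematic-empiricism.md",
    5: "references/paper2skill-component-innovation.md",
    6: "references/paper2skill-insight-driven.md",
    7: "references/paper2skill-research-infrastructure.md",
    8: "references/paper2skill-field-foundation.md",
    9: "references/paper2skill-mechanistic-analysis.md",
    10: "references/paper2skill-survey-synthesis.md",
    11: "references/paper2skill-scaling-efficiency.md",
}


def reference_files_for(primary, secondaries=()):
    """Return the reference files to load: primary first, then secondaries, no duplicates."""
    ordered = []
    for category in (primary, *secondaries):
        if category not in REFERENCE_FILES:
            raise ValueError(f"unknown category: {category}")
        path = REFERENCE_FILES[category]
        if path not in ordered:
            ordered.append(path)
    return ordered
```

Under these assumptions, `reference_files_for(5)` yields only the Component Innovation file, and `reference_files_for(11, [5])` yields the Scaling & Efficiency file followed by the Component Innovation file.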
When a paper has secondary categories:
- A secondary category with low confidence often doesn't warrant its own skill; the primary skill can mention it briefly instead.
- For example, one paper can yield flash-attention-efficiency (primary: Scaling & Efficiency) and flash-attention-architecture (secondary: Component Innovation).

Skip extraction for a category if:

- Its confidence is low and the primary skill already covers the insight

After extraction, assign 1-3 broad-area tags to each skill from the tag registry at tags.json in the project root. Tags categorize skills by research area for browsing and filtering.
The registry contains ~20 broad research-area tags such as: Reinforcement Learning, Large Language Models, Computer Vision, Natural Language Processing, Multimodal Learning, Generative Models, Agents, Robotics, Optimization, Inference Efficiency, Representation Learning, Graph Learning, AI Safety, Evaluation, ML Systems, Speech, Information Retrieval, Time Series, Recommender Systems, Science.
- Reinforcement Learning (not "Policy Optimization")
- Large Language Models (not "Parameter Efficient Fine-tuning")
- AI Safety or the most relevant area (not "Calibration")
- Generative Models, Computer Vision
- Write tags as an inline YAML list, e.g. tags: [Computer Vision, Generative Models]. Place the tags field immediately after keywords in the frontmatter.

All generated skills follow this frontmatter format:
---
name: meaningful-kebab-case-name
title: "Actual Paper Title Here"
version: 0.0.3
engine: skillxiv-v0.0.3-claude-opus-4.6
license: MIT
url: "https://arxiv.org/abs/XXXX.XXXXX"
keywords: [Keyword One, Keyword Two, Keyword Three]
tags: [Broad Area Tag One, Broad Area Tag Two]
description: "Outcome-focused description under 1024 chars, plain text only, no angle brackets"
category: "Category Name"
---
The name field (also the folder name) must be descriptive kebab-case that communicates the skill's purpose. Strictly prohibited: raw arXiv IDs (2505-00212), generic names (paper-skill), acronyms without context.
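The mechanical parts of this naming rule can be sketched as a small Python check. The patterns and the `GENERIC_NAMES` deny-list below are one illustrative reading of the rule, not an official validator, and "acronyms without context" still needs human judgment.

```python
import re

# Assumption: lowercase words or digits joined by single hyphens count as kebab-case.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Assumption: a raw arXiv ID used as a name looks like 2505-00212 (dot replaced by hyphen).
RAW_ARXIV_ID = re.compile(r"^\d{4}-\d{4,5}(?:v\d+)?$")
# Hypothetical deny-list; extend it as generic names turn up.
GENERIC_NAMES = {"paper-skill", "paper", "skill"}


def name_problems(name):
    """Return a list of rule violations for a skill name (empty list means it passes)."""
    problems = []
    if not KEBAB_CASE.match(name):
        problems.append("not kebab-case")
    if RAW_ARXIV_ID.match(name):
        problems.append("raw arXiv ID")
    if name in GENERIC_NAMES:
        problems.append("generic name")
    return problems
```

For instance, `name_problems("flash-attention-efficiency")` returns an empty list, while `name_problems("2505-00212")` flags a raw arXiv ID.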
The description field has the structure [What it does: outcome] + [When to use: triggers]. Keep it under 1024 characters, plain text only (no < > tags), as a double-quoted string on one line. Focus on outcomes, not features.
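A hedged sketch of how the length and plain-text constraints on the description could be checked in Python. The function name `description_problems` is an assumption made for illustration; whether the text is outcome-focused cannot be checked this way.

```python
def description_problems(description):
    """Return a list of violations of the description constraints (empty means it passes)."""
    problems = []
    # "Under 1024 characters" is read here as strictly fewer than 1024.
    if len(description) >= 1024:
        problems.append("1024 characters or longer")
    if "<" in description or ">" in description:
        problems.append("contains angle brackets")
    if "\n" in description:
        problems.append("spans multiple lines")
    return problems
```

A description that passes returns an empty list; each failing constraint adds one entry, so several problems can be reported at once.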
The url must be a verified, working arXiv link. Construct as https://arxiv.org/abs/XXXX.XXXXX and verify it resolves. Never use placeholders.
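One way to construct and verify the link, sketched in Python with the standard library only. The helper names are assumptions, the ID pattern covers new-style arXiv identifiers only, and `url_resolves` needs network access. Some servers treat HEAD requests or default client headers differently, so a `False` result is worth rechecking by hand.

```python
import re
import urllib.request

# New-style arXiv identifiers look like 2505.00212, optionally with a version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(?:v\d+)?$")


def arxiv_abs_url(arxiv_id):
    """Build the abstract-page URL, rejecting placeholders such as XXXX.XXXXX."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return f"https://arxiv.org/abs/{arxiv_id}"


def url_resolves(url, timeout=10.0):
    """Return True if the URL answers with HTTP 200 (requires network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False
```

Because the placeholder `XXXX.XXXXX` fails the pattern, `arxiv_abs_url` raises instead of emitting a placeholder URL.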
5-10 keywords in Title Case, inline YAML list: keywords: [Model Architecture, Mamba, State Space Models]. Never use YAML block list syntax.
1-3 broad research-area tags from the tag registry (tags.json), inline YAML list: tags: [Large Language Models, Inference Efficiency]. See Step 3 above for selection rules. The tags line goes immediately after keywords in the frontmatter.
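The keyword and tag rules can be sketched in Python as follows. The schema of tags.json is not shown here, so the registry is passed in as a plain set of tag names; the function names are illustrative assumptions. Values containing commas, colons, or brackets would need YAML quoting, which this sketch does not handle.

```python
def inline_yaml_list(field, values):
    """Render a field as an inline YAML list, e.g. tags: [A, B]."""
    return f"{field}: [{', '.join(values)}]"


def keyword_problems(keywords):
    """Check the 5-10 keyword count; Title Case is left to human review."""
    if not 5 <= len(keywords) <= 10:
        return ["expected 5-10 keywords"]
    return []


def tag_problems(tags, registry):
    """Check that there are 1-3 tags and that each one appears in the registry."""
    problems = []
    if not 1 <= len(tags) <= 3:
        problems.append("expected 1-3 tags")
    unknown = [tag for tag in tags if tag not in registry]
    if unknown:
        problems.append("not in registry: " + ", ".join(unknown))
    return problems
```

With a registry containing Computer Vision and Generative Models, a tag such as "Policy Optimization" is reported as not in the registry, which matches the rule of choosing broad-area tags over narrow topic labels.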
- (… python, not bare)
- … scripts/, referenced from SKILL.md

Always read the original arXiv paper. Never generate skills from summaries, blog posts, or secondary sources.
Preferred access order:
1. HTML (https://arxiv.org/html/XXXX.XXXXX): best source, try first
2. Abstract page (https://arxiv.org/abs/XXXX.XXXXX): metadata + abstract
3. PDF (https://arxiv.org/pdf/XXXX.XXXXX): fallback if no HTML

When converting multiple papers:
Run these checks on every generated skill:
All reference files are in the references/ directory. Load only what you need:
- references/paper-categorizer.md: Full categorization logic with 11 category definitions, signals, and classification process
- references/paper2skill-application-transfer.md: Category 1 extraction pipeline
- references/paper2skill-evaluation-infrastructure.md: Category 2 extraction pipeline
- references/paper2skill-paradigm-challenge.md: Category 3 extraction pipeline
- references/paper2skill-systematic-empiricism.md: Category 4 extraction pipeline
- references/paper2skill-component-innovation.md: Category 5 extraction pipeline
- references/paper2skill-insight-driven.md: Category 6 extraction pipeline
- references/paper2skill-research-infrastructure.md: Category 7 extraction pipeline
- references/paper2skill-field-foundation.md: Category 8 extraction pipeline
- references/paper2skill-mechanistic-analysis.md: Category 9 extraction pipeline
- references/paper2skill-survey-synthesis.md: Category 10 extraction pipeline
- references/paper2skill-scaling-efficiency.md: Category 11 extraction pipeline
Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.
testing
Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. 41% performance improvement while allowing users to opt out of specific data sources without retraining.
data-ai
Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.
devops
Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures—achieving 23% improvement on reasoning without gradient-based parameter updates or external training.