Convert research infrastructure papers into design pattern guides. Extracts capability gaps addressed, API design decisions, performance/usability trade-offs, and integration patterns. Use this skill when extracting skills from Category 7 (Research Infrastructure) papers — PyTorch-style framework papers, evaluation harness tooling, or any paper where the tool itself is the contribution.
npx skillsauth add ADu2021/skillXiv paper2skill-research-infrastructure
This skill specializes in converting research infrastructure papers (frameworks, libraries, tools, evaluation systems) into structured agent skills that teach design patterns and integration strategies.
Infrastructure papers are fundamentally different from algorithmic or theoretical papers. They address capability gaps by introducing new tools or frameworks that others build on. The extractable knowledge isn't a single algorithm — it's a collection of design decisions, API patterns, performance/usability trade-offs, and integration guidance that practitioners need to make informed choices.
Infrastructure papers introduce tools, frameworks, or systems designed to enable other research:
Not infrastructure: Papers with a single core algorithm (MAML, Adam, GPT), theoretical analyses, or empirical studies that don't introduce a reusable tool.
Ask these before extracting:
Capability Gap: What was broken or missing before this tool? What problem does it solve? (e.g., "PyTorch filled the gap between NumPy simplicity and CUDA complexity")
Design Decisions: What architectural choices does the paper document? API design? Modularity trade-offs? Why these choices over alternatives?
Performance/Usability Trade-offs: What does this tool optimize for? What does it sacrifice? (e.g., "Simplicity over raw performance", "Automatic differentiation convenience vs memory overhead")
Integration Patterns: How does this tool fit with the ecosystem? Does it require specific dependencies? Is it interoperable with alternatives?
| Infrastructure Type | Extractable Knowledge | Output Skill Teaches |
|---|---|---|
| Framework | Core abstractions, plugin architecture, module lifecycle | Architecture patterns, when to extend vs when to fork, API design decisions |
| Library | Algorithm library design, dependency strategy, caching patterns | Which library functions to use when, performance profiles, integration challenges |
| Evaluation System | Benchmark design, metric computation, infrastructure for fairness | How to design robust evaluations, avoiding benchmark gaming, extensibility patterns |
| Data Pipeline | Data loading abstractions, preprocessing stages, parallelization | Composing pipelines, handling distributed I/O, data validation strategies |
| Experiment Management | Logging patterns, hyperparameter management, result aggregation | Tracking experiment metadata, avoiding common pitfalls, reproducibility practices |
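If you triage papers with a script, the table above can be kept as a lookup so that the right emphasis is chosen once a paper's infrastructure type is known. This is a minimal sketch that uses the five type names from the table as keys. `ExtractionFocus`, `EXTRACTION_FOCUS`, and `focus_for` are hypothetical names and are not part of any existing tool.

```python
from __future__ import annotations

from typing import NamedTuple


class ExtractionFocus(NamedTuple):
    """One row of the infrastructure-type table (hypothetical structure)."""
    extractable: tuple[str, ...]
    teaches: tuple[str, ...]


# Hypothetical lookup mirroring the table; keys are the infrastructure types.
EXTRACTION_FOCUS: dict[str, ExtractionFocus] = {
    "Framework": ExtractionFocus(
        ("core abstractions", "plugin architecture", "module lifecycle"),
        ("architecture patterns", "when to extend vs when to fork", "API design decisions"),
    ),
    "Library": ExtractionFocus(
        ("algorithm library design", "dependency strategy", "caching patterns"),
        ("which library functions to use when", "performance profiles", "integration challenges"),
    ),
    "Evaluation System": ExtractionFocus(
        ("benchmark design", "metric computation", "infrastructure for fairness"),
        ("how to design robust evaluations", "avoiding benchmark gaming", "extensibility patterns"),
    ),
    "Data Pipeline": ExtractionFocus(
        ("data loading abstractions", "preprocessing stages", "parallelization"),
        ("composing pipelines", "handling distributed I/O", "data validation strategies"),
    ),
    "Experiment Management": ExtractionFocus(
        ("logging patterns", "hyperparameter management", "result aggregation"),
        ("tracking experiment metadata", "avoiding common pitfalls", "reproducibility practices"),
    ),
}


def focus_for(infra_type: str) -> ExtractionFocus:
    """Return the extraction focus for a type, matching case-insensitively."""
    wanted = infra_type.strip().lower()
    for name, focus in EXTRACTION_FOCUS.items():
        if name.lower() == wanted:
            return focus
    # Papers that fit no row are probably not infrastructure papers.
    raise KeyError(f"unknown infrastructure type: {infra_type!r}")
```

A `KeyError` here is a useful signal. It often means the paper belongs to the "not infrastructure" bucket and should use a different extraction template.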
Read the introduction and any architecture diagrams. Ask:
Read the design/implementation section carefully:
Look for:
Fill this before writing the skill:
PAPER: [title]
ARXIV: [verified arXiv ID]
URL: [full verified arXiv URL]
CAPABILITY GAP:
What was missing or broken before?
What user problem does this solve?
CORE ABSTRACTIONS:
Main classes/concepts (e.g., Module, Tensor, DataLoader)
How they compose together
Key lifecycle events (init, forward, cleanup)
DESIGN DECISIONS & RATIONALE:
Decision 1: [choice] because [reasoning]
Decision 2: [choice] because [reasoning]
(typically 3-5 major decisions)
PERFORMANCE vs USABILITY TRADE-OFFS:
Optimization axis 1: [what was prioritized] at cost of [what was sacrificed]
Optimization axis 2: [what was prioritized] at cost of [what was sacrificed]
API PATTERNS:
Key method signatures and their usage
Configuration patterns
Extension points (how to customize)
INTEGRATION CHALLENGES:
Known incompatibilities or awkward integrations with other tools
When to prefer alternative tools
Dependency management gotchas
CODE AVAILABLE: [yes/no, URL]
KEYWORDS: [5-10 infrastructure-focused keywords]
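If you keep worksheets as structured data, the quantitative hints in the template can be checked before writing starts. The template calls for 3-5 design decisions, 5-10 keywords, and a well-formed arXiv ID. This is a minimal sketch. `InfraWorksheet` and `problems` are hypothetical names, and the ID check covers only the format of new-style IDs (YYMM.NNNNN). It does not verify that the paper exists, so the "verified" step still has to be done by hand.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# New-style arXiv IDs look like 2301.12345 or 2301.12345v2.
# Assumption: old-style IDs (e.g. hep-th/9901001) are out of scope, and a
# format match does NOT prove the ID was verified against arXiv.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")


@dataclass
class InfraWorksheet:
    """Hypothetical container for the pre-writing worksheet fields."""
    paper: str
    arxiv_id: str
    capability_gap: str
    core_abstractions: list[str]
    design_decisions: list[tuple[str, str]]  # (choice, reasoning)
    tradeoffs: list[tuple[str, str]]         # (prioritized, sacrificed)
    integration_challenges: list[str] = field(default_factory=list)
    code_url: str | None = None
    keywords: list[str] = field(default_factory=list)

    @property
    def url(self) -> str:
        return f"https://arxiv.org/abs/{self.arxiv_id}"

    def problems(self) -> list[str]:
        """List worksheet gaps; an empty list means it is ready to write up."""
        issues = []
        if not ARXIV_ID.fullmatch(self.arxiv_id):
            issues.append(f"arXiv ID {self.arxiv_id!r} is not in YYMM.NNNNN form")
        if not self.capability_gap.strip():
            issues.append("capability gap is empty")
        if not 3 <= len(self.design_decisions) <= 5:
            issues.append("expected 3-5 major design decisions")
        if any(not reason.strip() for _, reason in self.design_decisions):
            issues.append("every design decision needs its reasoning")
        if not self.tradeoffs:
            issues.append("no performance/usability trade-offs recorded")
        if not 5 <= len(self.keywords) <= 10:
            issues.append("expected 5-10 keywords")
        return issues
```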
Title section:
# [Tool/Framework Name]: [Outcome — what users can accomplish with it]
Example: "PyTorch: Express ML Models in Python and Execute on Any Device"
Problem Statement (1-2 paragraphs): Ground the problem in what existed before this tool and why it was insufficient. Use concrete pain points.
Example: "Before PyTorch, ML practitioners faced a stark choice: use NumPy and hand-code GPU kernels (painful), or use static-graph frameworks like TensorFlow 1.x (inflexible). PyTorch solved this by making dynamic computation graphs the default, so researchers could debug and iterate as they would with ordinary Python code, while operations still ran as efficient GPU kernels."
Core Abstractions section: Explain the 2-3 main concepts users interact with. Use bullet points, not ASCII diagrams.
Example for PyTorch:
Design Decisions section: Cover 3-5 architectural choices. For each, explain the alternative they rejected and why.
Example structure:
Integration Patterns section: How does this tool fit with others? When to use vs alternatives?
Performance/Usability Trade-offs: Create a table or list of key decisions and their costs:
| Decision | What You Get | What You Sacrifice |
|---|---|---|
| Dynamic graphs | Easy debugging, Pythonic API | Ahead-of-time graph optimization |
| Automatic differentiation built-in | No manual backprop code | Slightly higher memory overhead |
| GPU-agnostic API | Write once, run on CPU/GPU/TPU | Small abstraction overhead |
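Every row has to name a cost. One way to enforce that is to generate the table from structured data and reject any row whose "sacrifice" column is empty. This is a minimal sketch, and `tradeoff_table` is a hypothetical helper.

```python
from __future__ import annotations


def tradeoff_table(rows: list[tuple[str, str, str]]) -> str:
    """Render (decision, what you get, what you sacrifice) rows as a Markdown table."""
    header = ("Decision", "What You Get", "What You Sacrifice")
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    for row in rows:
        if len(row) != 3 or not all(cell.strip() for cell in row):
            # A trade-off with no stated cost is not a trade-off; refuse it.
            raise ValueError(f"incomplete trade-off row: {row!r}")
        # Escape pipes so cell text cannot break the table layout.
        cells = [cell.replace("|", r"\|") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

For example, passing `("Dynamic graphs", "Easy debugging, Pythonic API", "Ahead-of-time graph optimization")` reproduces the first row of the table above.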
Implementation Considerations (if applicable): If the skill focuses on a specific design pattern (e.g., "how to build a custom Module"), show 1-2 concrete examples of ~20-30 lines each.
Example: "Creating a Custom Module with Gradient Checkpointing"
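```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class GradCheckpointedModule(nn.Module):
    """Save activation memory at the cost of extra compute during backprop.
    Useful for large models where activation storage is the bottleneck."""

    def __init__(self, layer: nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Intermediate activations of self.layer are not kept during forward;
        # backward recomputes them by re-running self.layer(x).
        return checkpoint(self.layer, x, use_reentrant=False)


block = GradCheckpointedModule(nn.Sequential(nn.Linear(64, 64), nn.ReLU()))
block(torch.randn(4, 64, requires_grad=True)).sum().backward()
```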
Ecosystem & Extensions: Document how users typically extend or integrate this tool:
Common Pitfalls & Anti-patterns: From the paper and experience, document what practitioners get wrong:
Reference:
Paper: https://arxiv.org/abs/XXXX.XXXXX
Code: https://github.com/pytorch/pytorch
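The section order above can be scaffolded so that no section gets skipped while drafting. This is a minimal sketch that assumes each section is rendered as a level-2 heading under the `# Tool: Outcome` title. `skill_skeleton` is a hypothetical helper, and "TODO" is only a placeholder body.

```python
from __future__ import annotations

SECTIONS = (
    "Problem Statement",
    "Core Abstractions",
    "Design Decisions",
    "Integration Patterns",
    "Performance/Usability Trade-offs",
    "Implementation Considerations",
    "Ecosystem & Extensions",
    "Common Pitfalls & Anti-patterns",
)


def skill_skeleton(tool: str, outcome: str, arxiv_id: str,
                   code_url: str | None = None,
                   include_implementation: bool = True) -> str:
    """Return an empty Markdown outline in the recommended section order.

    Assumption: sections are level-2 headings under a level-1 title.
    """
    parts = [f"# {tool}: {outcome}", ""]
    for section in SECTIONS:
        if section == "Implementation Considerations" and not include_implementation:
            continue  # this section is optional ("if applicable")
        parts += [f"## {section}", "", "TODO", ""]
    parts += ["## Reference", "", f"Paper: https://arxiv.org/abs/{arxiv_id}"]
    if code_url:
        parts.append(f"Code: {code_url}")
    return "\n".join(parts) + "\n"
```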
Focus on design, not just features. Don't enumerate all the modules in PyTorch. Explain the design philosophy and why each core abstraction exists.
Include performance numbers. Infrastructure papers typically benchmark their tool, so report concrete speedup claims or memory profiles.
Document the ecosystem. Practitioners want to know what builds on top of this tool and how to combine them.
Explain trade-offs explicitly. Infrastructure always trades something off. Be honest about what this tool prioritizes vs sacrifices.
Give "when NOT to use" guidance. No tool is universally best. State clearly when alternatives are better.
Keep code examples focused on design patterns. Show how the tool's abstractions are meant to be used, not basic tutorials.
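When transcribing a paper's benchmark results, derive every ratio the same way and always name the baseline. Otherwise "2x" can silently mean different things in different sections. This is a minimal sketch with hypothetical helper names. Any timings or byte counts you pass in should come from the paper itself, not from estimates.

```python
def speedup_claim(tool: str, baseline: str, baseline_s: float, tool_s: float) -> str:
    """Phrase a wall-clock result as an explicit ratio against a named baseline."""
    if baseline_s <= 0 or tool_s <= 0:
        raise ValueError("timings must be positive")
    ratio = baseline_s / tool_s
    if ratio >= 1:
        return f"{tool} is {ratio:.2f}x faster than {baseline}"
    # Report slowdowns honestly instead of hiding them as "0.5x faster".
    return f"{tool} is {1 / ratio:.2f}x slower than {baseline}"


def memory_reduction_pct(baseline_bytes: int, tool_bytes: int) -> float:
    """Percent of peak memory saved relative to the baseline (negative = more memory)."""
    if baseline_bytes <= 0:
        raise ValueError("baseline memory must be positive")
    return 100.0 * (baseline_bytes - tool_bytes) / baseline_bytes
```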
When triaging multiple infrastructure papers:
Infrastructure skill extraction adapted from Anthropic's skills guide and patterns in the Orchestra-Research repository. Infrastructure papers demand different extraction templates than algorithmic or theoretical papers because they teach architecture and design patterns, not step-by-step procedures.