skills/replication/SKILL.md
Plan and execute a structured replication workflow for a paper, claim, or benchmark with environment selection and integrity checks.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add moralespanitz/research-loop replication
EXTRACT → PLAN → ENVIRONMENT → EXECUTE → LOG → REPORT
Derive a short slug from the paper/claim name (lowercase, hyphens, no filler words, ≤5 words). Use this slug for all files.
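The slug rule above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the function name `make_slug` and the `FILLER_WORDS` set are assumptions, since the skill does not say which words count as filler.

```python
import re

# Hypothetical filler-word list; the skill does not define one.
FILLER_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with", "is", "are"}


def make_slug(title: str, max_words: int = 5) -> str:
    """Lowercase the title, drop filler words, keep at most max_words, join with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [w for w in words if w not in FILLER_WORDS]
    return "-".join(kept[:max_words])


print(make_slug("Attention Is All You Need"))
```

The same slug is then reused in every session path, so it is worth deriving once and passing around instead of recomputing it per file.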
Use the researcher subagent (.claude/agents/researcher.md) to pull implementation details from the target paper and any linked code repositories.
The researcher should extract:
Save extraction to .research-loop/sessions/<slug>/extraction.md.
If CHANGELOG.md (in .research-loop/) exists and this is a continuation, read the most recent relevant entries before planning.
Create .research-loop/sessions/<slug>/plan.md with:
Be explicit about what is verified, what is inferred, what is still missing, and which checks or test oracles will be used to decide whether the replication succeeded.
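One way to make a test oracle concrete is to record, for each planned check, the reported value, the tolerance, and whether the value was verified or inferred. The sketch below assumes an absolute-tolerance comparison; the `Check` class, its field names, and the numbers are hypothetical placeholders, not values from any paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    reported: float   # value stated in the source being replicated
    tolerance: float  # absolute tolerance fixed in the plan, before execution
    source: str       # "verified" or "inferred"

    def passes(self, reproduced: float) -> bool:
        """True when the reproduced value is within the planned absolute tolerance."""
        return math.isclose(reproduced, self.reported, rel_tol=0.0, abs_tol=self.tolerance)


# Placeholder numbers for illustration only.
check = Check(name="top1_accuracy", reported=0.760, tolerance=0.005, source="verified")
print(check.passes(0.757))
print(check.passes(0.740))
```

Fixing the tolerance in the plan, before any run, keeps the success criterion from drifting toward whatever result was obtained.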
Before running anything, ask the user where to execute:
| Environment | When to use | Setup |
|-------------|-------------|-------|
| Local | Simple experiments in the current working directory | No setup needed |
| Virtual environment | Isolated Python environment needed | python -m venv .venv && source .venv/bin/activate |
| Docker | Full isolation, reproducible environment | Write a Dockerfile and docker build/run |
| Modal | Serverless GPU for burst jobs | pip install modal && modal setup; write a Modal-decorated script |
| RunPod | Long-running GPU experiments with SSH | runpodctl CLI + RUNPOD_API_KEY; provision pod, transfer files, execute |
| Plan only | No execution — produce the plan without running | — |
Do not proceed without user confirmation of the environment.
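The confirmation gate can be expressed as a small guard that refuses to continue until the user has named one of the environments in the table. This is a sketch under assumptions: the dictionary keys and the function name `resolve_environment` are hypothetical, and the setup strings are copied from the table, not executed.

```python
from typing import Dict, Optional, Tuple

# Keys are hypothetical short names; values are the setup notes from the table.
ENVIRONMENTS: Dict[str, Optional[str]] = {
    "local": "No setup needed",
    "venv": "python -m venv .venv && source .venv/bin/activate",
    "docker": "Write a Dockerfile and docker build/run",
    "modal": "pip install modal && modal setup",
    "runpod": "runpodctl CLI + RUNPOD_API_KEY",
    "plan-only": None,
}


def resolve_environment(choice: Optional[str]) -> Tuple[str, bool]:
    """Return (environment, may_execute); raise if nothing valid was confirmed."""
    if choice is None:
        raise RuntimeError("No environment confirmed; ask the user before executing.")
    key = choice.strip().lower()
    if key not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {choice!r}")
    # "plan-only" is a valid choice but never permits execution.
    return key, key != "plan-only"


print(resolve_environment("Docker"))
```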
Implement and run the replication steps in the chosen environment.
For each step:
- Write the script to .research-loop/sessions/<slug>/scripts/step-N-<name>.py or equivalent.
- Save the output to .research-loop/sessions/<slug>/outputs/step-N-output.txt.

Do not call the outcome replicated unless the planned checks actually passed.
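The per-step file layout and the rule about when an outcome counts as replicated can be sketched together. The helper names `step_paths` and `verdict`, and the verdict labels, are assumptions for illustration; only the path patterns come from the text above.

```python
from pathlib import Path
from typing import Dict, Tuple


def step_paths(slug: str, step: int, name: str,
               root: Path = Path(".research-loop")) -> Tuple[Path, Path]:
    """Build the script and output paths for one replication step."""
    session = root / "sessions" / slug
    script = session / "scripts" / f"step-{step}-{name}.py"
    output = session / "outputs" / f"step-{step}-output.txt"
    return script, output


def verdict(check_results: Dict[str, bool]) -> str:
    """Report 'replicated' only when at least one planned check ran and all passed."""
    if check_results and all(check_results.values()):
        return "replicated"
    return "not replicated"


script, output = step_paths("attention-all-you-need", 1, "train")
print(script.as_posix())
print(output.as_posix())
print(verdict({"top1_accuracy": True, "loss_curve_shape": False}))
```

Treating an empty set of checks as "not replicated" is a deliberate choice in this sketch: a run with no executed checks provides no evidence either way.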
For multi-step or resumable replication work, append concise entries to .research-loop/CHANGELOG.md after:
Each entry must record: the active objective, what changed, what was checked, and the next step.
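A changelog entry with the four required fields could be formatted and appended as follows. The entry layout (dated heading plus four bullets) and the function names are hypothetical; the skill only specifies which fields each entry must record and where the file lives.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def format_entry(objective: str, changed: str, checked: str, next_step: str,
                 when: Optional[date] = None) -> str:
    """Render one entry containing the four fields the skill requires."""
    when = when or date.today()
    return (
        f"## {when.isoformat()}\n"
        f"- Objective: {objective}\n"
        f"- Changed: {changed}\n"
        f"- Checked: {checked}\n"
        f"- Next step: {next_step}\n"
    )


def append_entry(entry: str, changelog: Path = Path(".research-loop/CHANGELOG.md")) -> None:
    """Append the entry to the changelog, creating the directory if needed."""
    changelog.parent.mkdir(parents=True, exist_ok=True)
    with changelog.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")


print(format_entry("Reproduce the baseline", "Added training script",
                   "Script runs to completion", "Compare metrics with the plan"))
```

Opening the file in append mode preserves earlier entries, which matters when a later session reads the most recent ones before planning.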
Write the final replication report to .research-loop/sessions/<slug>/report.md.
Include:
End with a Sources section containing direct URLs for all primary references.
| Agent | When dispatched |
|-------|-----------------|
| researcher | Step 1: extract implementation details from paper and linked code |
The researcher agent definition lives in .claude/agents/researcher.md.