skills/ai-research-reproduction/SKILL.md
Rigor Reproduce-compatible skill slug for README-first deep learning repository reproduction. Use when the user wants an end-to-end, minimal, trustworthy flow that reads the repository first; selects the smallest documented inference or evaluation target; coordinates intake, setup, trusted execution, optional trusted training, optional repository analysis, and optional paper-gap resolution; enforces conservative patch rules; records evidence, assumptions, deviations, and human decision points; and writes the standardized `repro_outputs/` bundle. Do not use for paper summaries, generic environment setup, isolated repo scanning, standalone command execution, silent protocol changes, score chasing, or broad research assistance outside repository-grounded reproduction.
npx skillsauth add lllllllama/ai-research-workflow-skills ai-research-reproduction
Use this as the Rigor Reproduce-compatible skill slug for README-first deep
learning repository reproduction. The installed slug remains
`ai-research-reproduction` for compatibility. The skill guides the agent toward
a minimal, trustworthy run with auditable evidence; it should not micromanage
implementation details that the model can infer from the repository.
Reproduction is not "make it run by changing anything"; it means faithfully
reading the README, environment, weights, datasets, and documented commands,
then recording results and deviations.
Start from the shared operating principles in
../../references/agent-operating-principles.md, then load
../../references/research-rigor-principles.md and
../../references/deep-learning-experiment-principles.md when scientific
meaning, comparability, or experiment details are at stake.
Use this skill when all are true:
Do not use this skill for paper summaries, generic environment setup, isolated repo scanning, standalone command execution, open-ended research design, or explicit candidate-only exploration.
Choose the smallest target that can honestly demonstrate repository-grounded reproduction:
Treat README guidance as the primary reproduction intent. Use repository files
to clarify the README, not to silently replace it. When the README and paper
conflict, record the conflict and use paper-context-resolver only for the
narrow reproduction-critical gap.
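One way to keep such a conflict auditable is to capture it as a structured record before writing it into the output bundle. The sketch below assumes a Markdown bullet format. The `Deviation` class and its field names are hypothetical, and the authoritative field rules live in `references/output-spec.md`.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    """One README/paper conflict or protocol deviation (hypothetical fields)."""
    topic: str          # what the conflict is about, e.g. "evaluation split"
    readme_says: str    # what the README documents
    other_source: str   # what the paper or repository files say instead
    decision: str       # what the run actually followed
    needs_human: bool   # True if a human decision point remains open

    def to_markdown(self) -> str:
        """Render the record as a nested Markdown bullet."""
        flag = " (needs human decision)" if self.needs_human else ""
        return (
            f"- {self.topic}{flag}\n"
            f"  - README: {self.readme_says}\n"
            f"  - Other source: {self.other_source}\n"
            f"  - Followed: {self.decision}\n"
        )


# Usage with illustrative values only (not taken from any real repository).
entry = Deviation(
    topic="evaluation split",
    readme_says="evaluate on val",
    other_source="paper reports test-dev",
    decision="README (val); paper number not comparable",
    needs_human=True,
)
print(entry.to_markdown())
```

The record keeps the README position and the conflicting source side by side. This lets a reader see what was followed without re-reading the paper.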
Coordinate the helper skills as follows:
- `repo-intake-and-plan` to extract documented commands and candidate targets.
- `env-and-assets-bootstrap` only for target-specific environment, checkpoint, dataset, and cache assumptions.
- `analyze-project` only when structure, insertion points, or suspicious implementation patterns need read-only clarification.
- `minimal-run-and-audit` for documented inference, evaluation, smoke, or sanity execution.
- `run-train` instead when the selected trusted target is training startup, short-run verification, full kickoff, or resume.
Prefer no repository edits. If edits are needed, keep them conservative and auditable: work on a `repro/YYYY-MM-DD-short-task` branch, keep verified patch commits sparse, and record README-fidelity impact in `PATCHES.md`. See `references/patch-policy.md`.
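The helper coordination described above can be sketched as a small routing function. This is a hypothetical illustration, not part of the skill. The target-kind labels and `plan_helpers` are assumptions, while the helper slugs are the ones named in this document. The optional `paper-context-resolver` step is omitted for brevity.

```python
# Hypothetical target-kind labels mapped to the execution helper named in the skill.
EXECUTION_SKILL = {
    "inference": "minimal-run-and-audit",
    "evaluation": "minimal-run-and-audit",
    "smoke": "minimal-run-and-audit",
    "sanity": "minimal-run-and-audit",
    "train-startup": "run-train",
    "train-short-run": "run-train",
    "train-full-kickoff": "run-train",
    "train-resume": "run-train",
}


def plan_helpers(target_kind: str, needs_assets: bool, needs_structure: bool) -> list[str]:
    """Return helper skills for one selected target, in the order the skill lists them."""
    if target_kind not in EXECUTION_SKILL:
        raise ValueError(f"unknown target kind: {target_kind!r}")
    steps = ["repo-intake-and-plan"]              # always read the README first
    if needs_assets:
        steps.append("env-and-assets-bootstrap")  # target-specific env and assets only
    if needs_structure:
        steps.append("analyze-project")           # read-only clarification
    steps.append(EXECUTION_SKILL[target_kind])    # exactly one execution helper
    return steps


print(plan_helpers("evaluation", needs_assets=True, needs_structure=False))
# ['repo-intake-and-plan', 'env-and-assets-bootstrap', 'minimal-run-and-audit']
```

Rejecting unknown target kinds mirrors the skill's preference for documented targets over improvised ones.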
Always target `repro_outputs/`:
- `SUMMARY.md`
- `COMMANDS.md`
- `LOG.md`
- `SCIENTIFIC_CHANGELOG.md`
- `COMPARABILITY_REPORT.md`
- `status.json`
- `PATCHES.md` (only if patches were applied)
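The bundle layout above can be initialized with a short script. This is a minimal sketch, assuming placeholder content: a real run would use the templates under `assets/`. The `init_bundle` function, the placeholder headings, and the `status.json` fields are all hypothetical, and the real schema lives in `references/output-spec.md`.

```python
import json
import tempfile
from pathlib import Path

# File names come from the skill; headings and status fields are hypothetical.
CORE_FILES = [
    "SUMMARY.md",
    "COMMANDS.md",
    "LOG.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
]


def init_bundle(root: Path, patched: bool) -> Path:
    """Create repro_outputs/ with placeholder files, never overwriting existing ones."""
    out = root / "repro_outputs"
    out.mkdir(parents=True, exist_ok=True)
    names = CORE_FILES + (["PATCHES.md"] if patched else [])  # PATCHES.md only if patched
    for name in names:
        path = out / name
        if not path.exists():
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
    status_path = out / "status.json"
    if not status_path.exists():
        # Hypothetical minimal status record, not the real schema.
        record = {"status": "not_started", "patched": patched}
        status_path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return out


# Usage: build a throwaway bundle in a temporary directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    out = init_bundle(Path(tmp), patched=True)
    created = sorted(p.name for p in out.iterdir())
    status_record = json.loads((out / "status.json").read_text(encoding="utf-8"))

print(created)
print(status_record)
```

Skipping existing files keeps a re-run from silently erasing evidence that an earlier step already recorded.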
Use the templates under assets/ and the field rules in
references/output-spec.md.
- `SUMMARY.md`
- `COMMANDS.md`
- `LOG.md`
- `SCIENTIFIC_CHANGELOG.md`
- `COMPARABILITY_REPORT.md`
- `status.json`
- `PATCHES.md` when needed
Load these references when they apply:
- `references/language-policy.md` when writing human-readable outputs.
- `../../references/research-rigor-principles.md` before making comparability, contribution, or research-result claims.
- `../../references/deep-learning-experiment-principles.md` when dataset, split, metric, checkpoint, training, or evaluation details matter.
- `references/research-safety-principles.md` before protocol-sensitive decisions.
- `references/patch-policy.md` before modifying repository files.
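The load-on-demand rule for references can be sketched as a lookup. This is a hypothetical helper, not part of the skill. The decision labels and `references_for` are assumptions, while the file paths are the ones named in this document. The shared operating principles always come first, as the opening instructions require.

```python
# Hypothetical decision labels mapped to the reference files named in the skill.
SHARED_PRINCIPLES = "../../references/agent-operating-principles.md"
REFERENCES = {
    "writing_outputs": "references/language-policy.md",
    "research_claims": "../../references/research-rigor-principles.md",
    "experiment_details": "../../references/deep-learning-experiment-principles.md",
    "protocol_decision": "references/research-safety-principles.md",
    "repo_edit": "references/patch-policy.md",
}


def references_for(*decisions: str) -> list[str]:
    """Return reference files to load, shared principles first, without duplicates."""
    to_load = [SHARED_PRINCIPLES]  # always start from the shared operating principles
    for decision in decisions:
        path = REFERENCES[decision]  # a KeyError flags an unknown decision label
        if path not in to_load:
            to_load.append(path)
    return to_load


print(references_for("repo_edit", "protocol_decision", "repo_edit"))
```

Loading only what a decision needs keeps the agent's context focused, in line with the skill's aim of not micromanaging what the model can infer.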