Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

lllllllama/ai-research-reproduction

Name: ai-research-reproduction
Author: lllllllama

skills/ai-research-reproduction/SKILL.md

npx skillsauth add lllllllama/ai-research-workflow-skills ai-research-reproduction


ai-research-reproduction

Purpose

Use this as the Rigor Reproduce-compatible skill slug for README-first reproduction of deep learning repositories. The installed slug remains ai-research-reproduction for compatibility. The skill guides the agent toward a minimal, trustworthy run backed by auditable evidence; it should not micromanage implementation details that the model can infer from the repository. Reproduction does not mean "make it run by changing anything". It means faithfully reading the README, environment, weights, datasets, and documented commands, then recording results and deviations.

Start from the shared operating principles in ../../references/agent-operating-principles.md, then load ../../references/research-rigor-principles.md and ../../references/deep-learning-experiment-principles.md when scientific meaning, comparability, or experiment details are at stake.

Fit

Use this skill when all of the following are true:

The target is an AI code repository with a README, scripts, configs, or documented commands.
The request spans multiple trusted phases such as intake, setup, execution, training verification, analysis, paper-gap resolution, and reporting.
The desired result is a small reproducible target, not broad experimentation.

Do not use this skill for paper summaries, generic environment setup, isolated repo scanning, standalone command execution, open-ended research design, or explicit candidate-only exploration.

Trusted Target Selection

Choose the smallest target that can honestly demonstrate repository-grounded reproduction:

documented inference
documented evaluation
documented training startup or partial verification
full training only after explicit user confirmation
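As a rough illustration, the preference order above can be expressed as a small Bash helper. This is a sketch under stated assumptions and is not shipped with the skill. The function name select_target, the target labels, and the USER_CONFIRMED_FULL_TRAINING variable are hypothetical stand-ins for the agent's own judgment and the user's explicit confirmation.

```bash
# Hypothetical helper: return the smallest documented target, in the order
# inference, evaluation, training startup. Arguments are the target kinds the
# README actually documents (labels are assumptions of this sketch).
# Full training is returned only if the user explicitly confirmed it, modelled
# here by the assumed variable USER_CONFIRMED_FULL_TRAINING=yes.
select_target() {
  local wanted documented
  for wanted in inference evaluation training-startup; do
    for documented in "$@"; do
      if [[ "$documented" == "$wanted" ]]; then
        echo "$wanted"
        return 0
      fi
    done
  done
  for documented in "$@"; do
    if [[ "$documented" == full-training && "${USER_CONFIRMED_FULL_TRAINING:-no}" == yes ]]; then
      echo full-training
      return 0
    fi
  done
  echo "no trustworthy documented target; ask the user" >&2
  return 1
}

select_target training-startup evaluation   # prints: evaluation
```

Failing with a message, rather than guessing a larger target, keeps the decision with the user. This matches the rule that full training needs explicit confirmation.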

Treat README guidance as the primary reproduction intent. Use repository files to clarify the README, not to silently replace it. When the README and paper conflict, record the conflict and use paper-context-resolver only for the narrow reproduction-critical gap.

Workflow

Read the README and nearby repo signals.
Use repo-intake-and-plan to extract documented commands and candidate targets.
Select and justify the minimum trustworthy target.
Use env-and-assets-bootstrap only for target-specific environment, checkpoint, dataset, and cache assumptions.
Use analyze-project only when structure, insertion points, or suspicious implementation patterns need read-only clarification.
Use minimal-run-and-audit for documented inference, evaluation, smoke, or sanity execution.
Use run-train instead when the selected trusted target is training startup, short-run verification, full kickoff, or resume.
Pause for human review before making claims about fuller training, and before any change that could alter the dataset, split, checkpoint, preprocessing, metric, loss, model semantics, or result interpretation.
Write the standardized outputs and give a concise final note in the user's language when practical.
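The routing and pause rules in this workflow can be sketched as follows, assuming hypothetical target-kind and aspect labels. route_target and needs_human_review are illustrative names only; the skill does not provide these scripts.

```bash
# Hypothetical router: map the selected target kind to the sub-skill named in
# the workflow. The target-kind labels are assumptions of this sketch.
route_target() {
  case "$1" in
    inference|evaluation|smoke|sanity) echo minimal-run-and-audit ;;
    training-startup|short-run|full-kickoff|resume) echo run-train ;;
    *) echo "unknown target kind: $1" >&2; return 1 ;;
  esac
}

# Hypothetical pause check: succeed (exit 0) when any proposed change touches
# an aspect that requires human review before proceeding.
needs_human_review() {
  local aspect
  for aspect in "$@"; do
    case "$aspect" in
      dataset|split|checkpoint|preprocessing|metric|loss|model-semantics|result-interpretation)
        return 0 ;;
    esac
  done
  return 1
}

route_target evaluation   # prints: minimal-run-and-audit
if needs_human_review cache-path metric; then
  echo "pause: ask the user before continuing"
fi
```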

Patch Boundary

Prefer no repository edits. If edits are needed, keep them conservative and auditable:

Try command-line arguments, environment variables, path fixes, dependency version fixes, or dependency-file fixes before code changes.
Reproduction fixes are allowed when needed, but they must not be hidden. State what changed, why it was necessary, whether it changes scientific meaning, and whether it affects comparability with the paper, README, or baseline.
Avoid changing model architecture, core inference semantics, training logic, loss functions, or experiment meaning.
If repository files must change, create a branch named repro/YYYY-MM-DD-short-task, keep verified patch commits sparse, and record README-fidelity impact in PATCHES.md.
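The branch naming rule can be sketched as a Bash function. Reducing the task description to lowercase letters, digits, and single hyphens is an assumption of this sketch; references/patch-policy.md remains the authority on naming.

```bash
# Build a branch name following repro/YYYY-MM-DD-short-task.
# Reducing the task description to lowercase letters, digits, and single
# hyphens is an assumption of this sketch, not a documented rule.
repro_branch_name() {
  local slug
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-')
  slug=${slug#-}   # drop a leading hyphen, if any
  slug=${slug%-}   # drop a trailing hyphen, if any
  printf 'repro/%s-%s\n' "$(date +%F)" "$slug"
}

repro_branch_name "Fix CUDA paths!"   # prints repro/<today's date>-fix-cuda-paths
```

Inside the target repository, something like `git checkout -b "$(repro_branch_name "fix cuda paths")"` would create the branch. The resulting name then belongs in PATCHES.md.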

See references/patch-policy.md.

Outputs

Always write the outputs to repro_outputs/:

SUMMARY.md
COMMANDS.md
LOG.md
SCIENTIFIC_CHANGELOG.md
COMPARABILITY_REPORT.md
status.json
PATCHES.md (only if patches were applied)

Use the templates under assets/ and the field rules in references/output-spec.md.

Put the shortest high-value summary in SUMMARY.md.
Put copyable commands in COMMANDS.md.
Put process evidence, assumptions, failures, and decisions in LOG.md.
Put scientific meaning and change effects in SCIENTIFIC_CHANGELOG.md.
Put comparison anchors and protocol deviations in COMPARABILITY_REPORT.md.
Put durable machine-readable state in status.json.
Put branch, commit, validation, and README-fidelity impact in PATCHES.md when needed.
Distinguish verified facts from inferred guesses.
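Below is a hypothetical scaffold for the bundle, assuming Bash and empty placeholder files. The status.json keys (target, status, patches_applied) are invented for illustration only. The real fields come from the templates under assets/ and from references/output-spec.md.

```bash
# Hypothetical scaffold for the repro_outputs/ bundle. Files start empty; the
# real templates live under assets/ and the field rules in
# references/output-spec.md. The status.json keys are placeholders only.
scaffold_repro_outputs() {
  local root=$1 patches_applied=${2:-no}
  local dir="$root/repro_outputs" f
  mkdir -p "$dir"
  for f in SUMMARY.md COMMANDS.md LOG.md SCIENTIFIC_CHANGELOG.md COMPARABILITY_REPORT.md; do
    [[ -e "$dir/$f" ]] || : > "$dir/$f"   # never overwrite existing evidence
  done
  if [[ ! -e "$dir/status.json" ]]; then
    local flag=false
    [[ $patches_applied == yes ]] && flag=true
    printf '{\n  "target": null,\n  "status": "not_started",\n  "patches_applied": %s\n}\n' \
      "$flag" > "$dir/status.json"
  fi
  # PATCHES.md is created only when repository files were actually changed.
  if [[ $patches_applied == yes && ! -e "$dir/PATCHES.md" ]]; then
    : > "$dir/PATCHES.md"
  fi
}
```

Calling `scaffold_repro_outputs /path/to/repo` creates the five Markdown files and status.json. Passing `yes` as a second argument also creates PATCHES.md.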

Reference Loading

Load references/language-policy.md when writing human-readable outputs.
Load ../../references/research-rigor-principles.md before making comparability, contribution, or research-result claims.
Load ../../references/deep-learning-experiment-principles.md when dataset, split, metric, checkpoint, training, or evaluation details matter.
Load references/research-safety-principles.md before protocol-sensitive decisions.
Load references/patch-policy.md before modifying repository files.
Keep specialized logic in sub-skills, scripts, templates, or references rather than expanding this entrypoint.
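The loading rules can be read as a lookup from the kind of decision at hand to the reference to read first. The sketch below uses illustrative event names, while the file paths are copied from the list above. It is not a script that ships with the skill.

```bash
# Hypothetical lookup: given the kind of decision about to be made, print the
# reference to load first. Event names are illustrative; paths are copied
# from the loading rules above.
references_for() {
  case "$1" in
    write-human-output) echo references/language-policy.md ;;
    research-claim)     echo ../../references/research-rigor-principles.md ;;
    experiment-details) echo ../../references/deep-learning-experiment-principles.md ;;
    protocol-decision)  echo references/research-safety-principles.md ;;
    modify-repo)        echo references/patch-policy.md ;;
    *) echo "no reference rule for: $1" >&2; return 1 ;;
  esac
}

references_for modify-repo   # prints: references/patch-policy.md
```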


Rigor Reproduce-compatible skill slug for README-first reproduction of deep learning repositories. Use it when the user wants an end-to-end, minimal, trustworthy flow with these properties:

- It reads the repository first and selects the smallest documented inference or evaluation target.
- It coordinates intake, setup, trusted execution, optional trusted training, optional repository analysis, and optional paper-gap resolution.
- It enforces conservative patch rules and records evidence, assumptions, deviations, and human decision points.
- It writes the standardized `repro_outputs/` bundle.

Do not use it for paper summaries, generic environment setup, isolated repo scanning, standalone command execution, silent protocol changes, score chasing, or broad research assistance outside repository-grounded reproduction.

112 stars

development

Updated May 27, 2026


npx skillsauth add lllllllama/ai-research-workflow-skills ai-research-reproduction

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each entry below.

Trivy: container and dependency vulnerability scanner. Clean, 95%
Semgrep: static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): open source security scanning. Skipped, 50%
Socket.dev: supply chain security analysis. Skipped, 50%
VirusTotal: multi-engine malware detection. Skipped, 50%
CrowdStrike: advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 27, 2026, 2:12 AM (9.8 s, 13 files scanned)


Related Skills

lllllllama/safe-debug

development

Verified, Trusted, Community

Rigor Debug / Rigor Audit skill for deep learning research work. Use when the user pastes a traceback, terminal error, CUDA OOM, checkpoint load failure, shape mismatch, NaN loss symptom, or training failure and wants conservative diagnosis before any patching, with debug fixes clearly separated from research contributions. Do not use for broad refactoring, speculative adaptation, automatic exploratory patching, or general repository familiarization.

112 stars, SKILL.md, updated Apr 20, 2026

lllllllama/run-train

development

Verified, Trusted, Community

Rigor Train skill for deep learning research repositories. Use when a documented or selected training command should be run conservatively for startup verification, short-run verification, full kickoff, or resume, with command, config, seed, log, checkpoint, status, and metric evidence written to standardized `train_outputs/`. Do not use for environment setup, exploratory sweeps, speculative idea implementation, or end-to-end orchestration.

112 stars, SKILL.md, updated Apr 20, 2026

lllllllama/repo-intake-and-plan

tools

Verified, Trusted, Community

Rigor Intake helper for README-first deep learning repo reproduction. Use when the task is specifically to scan a repository, read the README and common project files, extract documented commands, classify inference, evaluation, and training candidates, and return the smallest trustworthy reproduction plan to the main orchestrator. Do not use for environment setup, asset download, command execution, final reporting, paper lookup, or end-to-end orchestration.

112 stars, SKILL.md, updated Apr 20, 2026

lllllllama/paper-context-resolver

tools

Verified, Trusted, Community

Rigor Paper Context helper for README-first deep learning repo reproduction. Use only when the README and repository files leave a narrow reproduction-critical gap and the task is to resolve a specific paper detail such as dataset split, preprocessing, evaluation protocol, checkpoint mapping, or runtime assumption from primary paper sources while recording conflicts. Do not use for general paper summary, repo scanning, environment setup, command execution, title-only paper lookup, or replacing README guidance by default.

112 stars, SKILL.md, updated Apr 20, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


# Clone the repo
git clone https://github.com/lllllllama/ai-research-workflow-skills.git

# Copy into Claude Code skills folder (global)
cp -r ai-research-workflow-skills/skills/ai-research-reproduction ~/.claude/skills/

See the Claude Code Skills documentation for the official skills path.

Repository

lllllllama/ai-research-workflow-skills

112 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT