a-green-hand-jack/artifact-evaluation-prep

Name: artifact-evaluation-prep
Author: a-green-hand-jack

skills/artifact-evaluation-prep/SKILL.md

npx skillsauth add a-green-hand-jack/ml-research-skills artifact-evaluation-prep

Artifact Evaluation Prep

Prepare a paper's code, data, checkpoints, scripts, and instructions so an external artifact reviewer can reproduce the paper-facing claims with minimal ambiguity.

Use this skill when:

a venue requires or offers artifact evaluation, reproducibility badges, or artifact appendices
the user needs reviewer-facing install, quickstart, demo, or reproduction instructions
a camera-ready or accepted paper needs an artifact package handoff
code, data, checkpoints, models, Docker images, or external services must be packaged
runtime, hardware, random seeds, expected outputs, or troubleshooting notes need to be made explicit
claims in the paper need to be mapped to runnable scripts or released artifacts

Do not use this skill as a general code-release skill. Use release-code for public repository hygiene, licensing, CITATION files, tags, and GitHub releases. Use this skill for reviewer-facing artifact execution and claim reproduction.

Pair this skill with:

camera-ready-finalizer to recover accepted-paper obligations and final claim/evidence state
release-code to prepare public repository hygiene after artifact obligations are clear
reproducibility-audit when environment, data, or execution drift needs a broader audit
run-experiment for generating or testing reproduction commands
figure-results-review when artifact outputs must match paper figures or tables
citation-audit when artifact metadata cites datasets, code, or prior artifacts
research-project-memory when artifact status, blockers, and reviewer-facing instructions should persist

Skill Directory Layout

<installed-skill-dir>/
├── SKILL.md
└── references/
    ├── artifact-audit.md
    ├── memory-writeback.md
    ├── package-manifest.md
    ├── report-template.md
    └── reviewer-instructions.md

Progressive Loading

Always read references/artifact-audit.md, references/package-manifest.md, and references/reviewer-instructions.md.
Read references/report-template.md before writing a saved artifact evaluation report.
Read references/memory-writeback.md when the project has memory/, component .agent/ folders, or the user asks for persistent state.
If venue rules matter, verify current official artifact evaluation instructions before asserting deadlines, badge names, anonymity rules, upload fields, page limits, or required formats.

Core Principles

Artifact evaluation is a reviewer workflow, not just a code dump.
The artifact must reproduce the paper's important claims at an acceptable cost, or clearly document what it cannot reproduce.
Prefer one reliable quickstart and one complete reproduction path over many fragile commands.
Every command should state expected runtime, hardware, input, output, and success criteria.
Package only redistributable data, checkpoints, and dependencies; document restricted assets precisely.
Keep anonymity, licensing, and external-service assumptions explicit.
Treat smoke tests as required. An untested instruction file is not an artifact package.

Step 1 - Recover Evaluation Context

Collect:

venue and artifact evaluation track, if known
official artifact instructions, badge criteria, anonymity policy, and upload mechanism
accepted or submitted paper, appendix, supplementary material, and checklist
code repository, commit hash, branches, and worktrees
datasets, checkpoints, pretrained models, generated outputs, and external dependencies
hardware expectations: CPU/GPU type, memory, disk, runtime, network access
paper claims, figures, tables, and experiments that the artifact should support
constraints: private data, license limits, large files, cloud dependencies, nondeterminism, or reviewer time budget

If no venue is specified, produce a venue-agnostic artifact package but mark venue-specific fields as unresolved.
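As a rough illustration of recording hardware context, the sketch below gathers a few host facts with the Python standard library and leaves unknown fields marked as unresolved. The function name `describe_host` and the dictionary keys are hypothetical, not part of this skill; GPU details are left for manual entry because the standard library cannot see them.

```python
import os
import platform
import shutil


def describe_host(path="."):
    """Collect basic host facts worth recording next to hardware expectations."""
    disk = shutil.disk_usage(path)
    return {
        "os": platform.system(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "disk_free_gb": round(disk.free / 1e9, 1),
        # GPU type and memory are not visible to the standard library;
        # fill them in by hand or from a vendor tool.
        "gpu": "unresolved",
        # No venue given: keep venue-specific fields explicitly unresolved.
        "venue": "unresolved",
    }


host = describe_host()
for key, value in host.items():
    print(f"{key}: {value}")
```

These values describe the machine the package was prepared on; they are a starting point for the stated expectations, not a substitute for them.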

Step 2 - Map Claims to Artifact Paths

For each paper-facing claim or result, record:

claim or result ID
paper location
script, notebook, config, or command that supports it
input data or checkpoint
expected output file, metric, table, or figure
approximate runtime and hardware
deterministic tolerance or expected variance
reviewer priority: quickstart, core, optional, or not reproducible in package

Do not imply full reproducibility if only a smoke test or cached output is provided.
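One possible shape for a claim-to-artifact record is sketched below. The class name, the `evidence` field and its values, the example command, and all sample data are hypothetical placeholders; only the recorded fields and the four reviewer priorities come from the list above.

```python
from dataclasses import dataclass

PRIORITIES = {"quickstart", "core", "optional", "not reproducible in package"}


@dataclass
class ClaimEntry:
    claim_id: str
    paper_location: str
    command: str
    inputs: str
    expected_output: str
    runtime_and_hardware: str
    tolerance: str
    priority: str
    # Hypothetical field: "full-run", "smoke-test", or "cached-output".
    evidence: str

    def __post_init__(self):
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown reviewer priority: {self.priority!r}")

    def reproducibility_label(self):
        """Describe the claim without overstating what the package supports."""
        if self.priority == "not reproducible in package":
            return "not reproducible in package"
        if self.evidence == "full-run":
            return "fully reproducible"
        return f"partially supported ({self.evidence} only)"


# All values below are made-up placeholders.
entry = ClaimEntry(
    claim_id="C1",
    paper_location="Table 2",
    command="python scripts/eval.py --config configs/main.yaml",
    inputs="checkpoints/main.pt",
    expected_output="results/table2.csv",
    runtime_and_hardware="about 10 minutes on one GPU",
    tolerance="accuracy within 0.5 points",
    priority="core",
    evidence="smoke-test",
)
print(entry.reproducibility_label())  # partially supported (smoke-test only)
```

Keeping the evidence type separate from the priority makes it harder to present a smoke-tested claim as fully reproduced.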

Step 3 - Build the Artifact Manifest

Read references/package-manifest.md.

Create or update a manifest that lists:

repository URL or archive path
exact commit, tag, or checksum
directory layout
environment files and Docker images
data and checkpoint locations
reproduction scripts and configs
expected generated outputs
license and citation metadata
known limitations and unsupported claims

Prefer short, stable file names such as ARTIFACT.md, REPRODUCE.md, or docs/artifact_evaluation.md unless the venue requires a specific filename.
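The sketch below shows one way to generate the file and checksum portion of a manifest with SHA-256 digests. The function names, the JSON layout, and the placeholder commit string are assumptions for illustration; references/package-manifest.md remains the authority on the actual manifest format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, commit, limitations):
    """List every file under root with its size and checksum."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        "commit": commit,
        "files": [
            {
                "path": p.relative_to(root).as_posix(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
        "known_limitations": limitations,
    }


# Demonstration on a throwaway directory; the file content is a placeholder.
content = b"quickstart: see ARTIFACT.md\n"
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "REPRODUCE.md").write_bytes(content)
    manifest = build_manifest(
        tmp,
        commit="<commit-hash>",
        limitations=["placeholder: list unsupported claims here"],
    )
print(json.dumps(manifest, indent=2))
```

Recording checksums alongside the commit lets a reviewer confirm that downloaded data and checkpoints match what was tested.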

Step 4 - Write Reviewer Instructions

Read references/reviewer-instructions.md.

Provide:

setup commands
quick smoke test under a short runtime budget
core reproduction commands for main paper claims
expected outputs and how to compare them with the paper
troubleshooting for common failures
hardware, storage, network, and time requirements
contact policy or anonymous support channel if allowed
limitations and optional extended runs

Instructions should be copy-pasteable and should not require the reviewer to infer hidden paths or environment variables.
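A simple check for hidden environment variables might look like the following. It is a sketch only: the regular expression covers `$NAME` and `${NAME}` references in shell-style commands, and the example commands, paths, and variable names are invented for illustration.

```python
import re

# Matches $NAME and ${NAME} references in a shell-style command string.
ENV_REF = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?")


def undocumented_variables(commands, documented):
    """Return variables that commands use but the instructions never define."""
    referenced = set()
    for command in commands:
        referenced.update(ENV_REF.findall(command))
    return sorted(referenced - set(documented))


# Hypothetical reviewer-facing commands.
commands = [
    "python scripts/eval.py --data $DATA_ROOT/test --ckpt ${CKPT_DIR}/model.pt",
    "python scripts/plot.py --out results/",
]
missing = undocumented_variables(commands, documented={"DATA_ROOT"})
print(missing)  # ['CKPT_DIR']
```

Any name this reports should either be defined in the setup commands or replaced with an explicit path.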

Step 5 - Smoke Test the Artifact

When allowed by the user and environment, run at least:

environment creation or dependency resolution
import or CLI sanity check
quickstart command
one representative data/checkpoint load
one expected-output comparison

If commands are too expensive, record the exact reason and create a minimal substitute test.
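The checks above can be scripted so that each one records a pass or fail status and its runtime. The sketch below is illustrative: the two commands are trivial stand-ins that only invoke the current Python interpreter, and the metric value and tolerance are invented placeholders rather than results from any paper.

```python
import math
import subprocess
import sys
import time


def run_check(name, argv, timeout_s=60):
    """Run one command and record whether it succeeded and how long it took."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        passed = proc.returncode == 0
        output = proc.stdout.strip()
    except subprocess.TimeoutExpired:
        passed, output = False, f"timed out after {timeout_s}s"
    seconds = round(time.monotonic() - start, 2)
    return {"name": name, "passed": passed, "seconds": seconds, "output": output}


def within_tolerance(observed, expected, rel_tol=0.01):
    """Compare a reproduced metric with the paper value under a relative tolerance."""
    return math.isclose(observed, expected, rel_tol=rel_tol)


# Stand-in commands; a real package would call its own quickstart script here.
checks = [
    run_check("import sanity check", [sys.executable, "-c", "import json; print('ok')"]),
    run_check("quickstart stand-in", [sys.executable, "-c", "print(0.912)"]),
]

# Placeholder comparison: observed 0.912 against a made-up expected value of 0.91.
metric = float(checks[1]["output"])
comparison_ok = within_tolerance(metric, expected=0.91)

for check in checks:
    status = "PASS" if check["passed"] else "FAIL"
    print(f"{status} {check['name']} ({check['seconds']}s)")
print("PASS" if comparison_ok else "FAIL", "expected-output comparison")
```

Saving the status lines with the package gives the report a concrete smoke-test record instead of an untested claim.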

Step 6 - Handle Packaging Risks

Audit:

anonymization vs public release state
licenses for code, data, pretrained weights, and third-party assets
large-file strategy and checksums
private paths, credentials, API keys, and machine-specific assumptions
random seeds and nondeterminism
version pinning and dependency conflicts
reviewer time budget and failure recovery

Route public release issues to release-code; route environment drift to reproducibility-audit if available.
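Part of this audit can be automated with a text scan for private paths and credential-like strings. The patterns below are a minimal, assumed starting set and will miss many cases; the sample text and the key in it are fabricated for the demonstration.

```python
import re

# Hypothetical starter patterns; extend them for the project at hand.
RISK_PATTERNS = {
    "absolute home path": re.compile(r"(/home/|/Users/)[^\s/'\"]+"),
    "possible API key assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token)\b\s*[=:]\s*['\"][^'\"]{8,}['\"]"
    ),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line number, label) pairs for lines that match a risk pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Fabricated sample config; the key is not real.
sample = '''data_root = "/home/alice/datasets/imagenet"
api_key = "sk-not-a-real-key-123"
seed = 0
'''
findings = scan_text(sample)
for lineno, label in findings:
    print(f"line {lineno}: {label}")
```

A scan like this flags candidates for manual review; it does not prove that a package is free of secrets or machine-specific assumptions.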

Step 7 - Write the Artifact Evaluation Report

Read references/report-template.md.

If saving to a project and no path is given, use:

docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md

The report must include:

readiness decision
blocking issues
claim-to-artifact map
package manifest summary
smoke-test status
reviewer instruction status
risks, limitations, and reviewer-facing caveats
handoff to release, camera-ready, or memory
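The default path and the required sections can be checked mechanically. The sketch below assumes the report names each section in plain text; the helper names and the `my-project` directory are hypothetical, and references/report-template.md defines the real layout.

```python
from datetime import date
from pathlib import Path

REQUIRED_SECTIONS = [
    "Readiness decision",
    "Blocking issues",
    "Claim-to-artifact map",
    "Package manifest summary",
    "Smoke-test status",
    "Reviewer instruction status",
    "Risks, limitations, and reviewer-facing caveats",
    "Handoff to release, camera-ready, or memory",
]


def default_report_path(project_dir, today=None):
    """Build docs/submission/artifact_evaluation_prep_YYYY-MM-DD.md under project_dir."""
    today = today or date.today()
    name = f"artifact_evaluation_prep_{today.isoformat()}.md"
    return Path(project_dir) / "docs" / "submission" / name


def missing_sections(report_text):
    """Return required section names that do not appear in the report text."""
    lowered = report_text.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in lowered]


path = default_report_path("my-project", today=date(2026, 5, 5))
print(path.as_posix())  # my-project/docs/submission/artifact_evaluation_prep_2026-05-05.md

# Placeholder draft that covers only the first two sections.
draft = "Readiness decision: not ready\nBlocking issues: checkpoint license unclear\n"
print(missing_sections(draft))
```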

Step 8 - Write Back to Project Memory

Read references/memory-writeback.md when memory exists.

Update artifact status, reproduction commands, blockers, claim support, release actions, and final handoff notes without copying full command logs into memory.

Final Sanity Check

Before finalizing:

every important paper claim is either reproducible, smoke-tested, cached with explanation, or explicitly out of scope
quickstart instructions have expected outputs and runtime
hardware, data, checkpoints, licenses, and anonymity state are clear
package paths and links are stable
reviewer-facing failure modes are documented
public-release and camera-ready obligations are routed
project memory records artifact readiness and open blockers

Prepare research artifact packages for evaluation or public release. Use for reproduction commands, environment checks, data packaging, and artifact forms.

3 stars

testing

Updated May 5, 2026

Install this skill globally with one command:

npx skillsauth add a-green-hand-jack/ml-research-skills artifact-evaluation-prep

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 5, 2026, 4:22 AM · 160.6s · 6 files scanned

SKILL.md

name: artifact-evaluation-prep
description: Prepare research artifact packages for evaluation or public release. Use for reproduction commands, environment checks, data packaging, and artifact forms.
argument-hint: [project-dir] [--venue <venue>] [--mode audit|package|instructions|smoke-test]
allowed-tools: Read, Write, Edit, Bash, Glob, WebSearch, WebFetch

Related Skills

a-green-hand-jack/ml-research-bootstrap

testing

Verified, Trusted, Community

Bootstrap project-local ml-research-skills. Use from global installs when creating a new ML research project, enabling this collection in an existing ML research repo, or deciding whether to install the full bundle locally. Route to project-init for new projects; do not handle paper or experiment work directly.

4 · SKILL.md · Updated May 26, 2026

a-green-hand-jack/project-ops-router

development

Verified, Trusted, Community

Route project operations tasks — git, memory, bootstrap, remote, workspace, code review, timeline, ops — to the correct skill. Use when the task involves commits, pushes, worktrees, project memory, enabling project-local skills, SSH/server coordination, sidecar runners, or audits. Do not solve the ops task directly.

4 · SKILL.md · Updated May 19, 2026

a-green-hand-jack/paper-writing-router

testing

Verified, Trusted, Community

Route ML/AI paper writing tasks to the correct skill — contract planning, prose drafting, section writing, consistency editing, review simulation, rebuttal, submission, or citation work. Use when the task involves writing, revising, reviewing, or submitting a paper instead of guessing between paper-writing-assistant, paper-writing-contract-planner, paper-reviewer-simulator, auto-paper-improvement-loop, or citation skills. Do not draft prose directly.

4 · SKILL.md · Updated May 19, 2026

a-green-hand-jack/ml-research-router

data-ai

Verified, Trusted, Community

Project-local router for ML research skill selection. Use inside an initialized ML research project, or while maintaining this skill repo, when the user describes an ML research/paper/experiment/discovery/ops/release workflow and may not know the skill; route to a domain router or high-signal leaf. Do not use for generic non-ML projects.

4 · SKILL.md · Updated May 19, 2026

Install manually

# Clone the repo
git clone https://github.com/a-green-hand-jack/ml-research-skills.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r ml-research-skills/skills/artifact-evaluation-prep ~/.claude/skills/

Repository

a-green-hand-jack/ml-research-skills

3 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT