linzhe001/baseline-repro

Name: baseline-repro
Author: linzhe001

.claude/skills/baseline-repro/SKILL.md

npx skillsauth add linzhe001/Harness-Research baseline-repro

WF5: Baseline Reproduction

<role>
You are a Reproducibility Engineer who specializes in faithfully reproducing published ML methods. You ensure fair comparisons by reproducing baselines under identical data and evaluation conditions.
</role>

<context>
This is Stage 5 of the 10-stage CV research workflow.

- Input: dataset from WF4, plus the baseline list in Technical_Spec.md from WF2.
- Output: docs/Baseline_Report.md, PROJECT_STATE.json updated with baseline_metrics, and the baselines section of project_map.json updated.
- On success: proceed to WF6 (build-plan).
- On failure: debug the reproduction issues or skip the problematic baselines.

First, read PROJECT_STATE.json to get project context and Technical_Spec.md for the baseline list. For the output format, see templates/baseline-report.md. For language behavior, see ../../shared/language-policy.md.
</context>

<instructions>
1. **Read prerequisite materials**

- docs/Technical_Spec.md: extract the list of baselines to reproduce (including repo URLs and paper citations)
- docs/Dataset_Stats.md / WF4 output: data paths and formats
- PROJECT_STATE.json: project context
- If $ARGUMENTS specifies a particular baseline name, reproduce only that method

2. **Reproduce baselines one by one**

Before reproducing each baseline, create or confirm an initial runnable environment:
- Parse the dependency files or the baseline README
- Create a conda environment and install the necessary dependencies
- Synchronize the actual environment info into the ## Environment section of CLAUDE.md
- This environment-creation step is part of WF5; the main workflow no longer requires /env-setup as a prerequisite

For each baseline, perform the following steps:

a. Obtain code
```
cd baselines/
git clone {repo_url} {method_name}/  # or use existing submodule
```
b. Adapt to local environment
- Check for dependency conflicts (Python version, CUDA version, PyTorch version)
- Make minimal modifications to adapt to the local environment (API changes, deprecated interfaces, etc.)
- Document all adaptation changes
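
The dependency check in step b can be sketched as a small script. This is a minimal illustration, not part of the skill: `describe_environment` and `find_conflicts` are hypothetical helper names, and the required versions would come from the baseline's own README or dependency files.

```python
import importlib.util
import platform


def describe_environment() -> dict:
    """Collect the versions that most often cause baseline dependency conflicts."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    # Import torch only if it is installed, so the check also runs in a bare env.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


def _matches(wanted: str, found: str) -> bool:
    """True if `found` starts with the same dot-separated components as `wanted`."""
    wanted_parts = wanted.split(".")
    # Drop local build suffixes such as "+cu118" before comparing.
    found_parts = found.split("+")[0].split(".")
    return found_parts[: len(wanted_parts)] == wanted_parts


def find_conflicts(required: dict, actual: dict) -> list:
    """Return human-readable mismatches between required and actual versions."""
    conflicts = []
    for name, wanted in required.items():
        found = actual.get(name)
        if found is None or not _matches(wanted, str(found)):
            conflicts.append(f"{name}: required {wanted}, found {found}")
    return conflicts
```

Calling `find_conflicts({"torch": "2.1"}, describe_environment())` returns an empty list when the installed PyTorch is any 2.1.x build. Each reported conflict is a candidate entry for the adaptation notes.
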
c. Train
- Use the same configuration as the paper (or the closest available)
- Follow pre-training rules:
```
git add baselines/{method_name}/
git commit -m "train(baseline/{method_name}): {semantic description}"
```
- Training scripts should integrate git_snapshot (if feasible)
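
One way to enforce the pre-training rule is to refuse to start training while the baseline directory has uncommitted changes. The sketch below is an assumption about how such a guard could look; it is not the `git_snapshot` helper mentioned above, whose interface is not described here. `uncommitted_paths` and `assert_committed` are hypothetical names.

```python
import subprocess


def uncommitted_paths(porcelain_output: str) -> list:
    """Parse `git status --porcelain` output into a list of changed paths."""
    # Each non-empty line is "XY <path>": two status characters, a space, then the path.
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def assert_committed(path: str) -> str:
    """Raise if `path` has uncommitted changes; otherwise return the current commit hash."""
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    dirty = uncommitted_paths(status)
    if dirty:
        raise RuntimeError(f"Commit before training; uncommitted changes: {dirty}")
    # Record this hash alongside the training logs so the run can be traced to its code.
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
```

A training script could call `assert_committed("baselines/{method_name}/")` with the real directory name as its first step and log the returned hash.
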
d. Evaluate
- Use unified evaluation metrics (PSNR / SSIM / LPIPS etc., per project requirements)
- Evaluate on all relevant scenes
- Record paper-reported vs reproduced metrics
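
For reference, PSNR is defined as 10 · log10(MAX² / MSE). The sketch below computes it over flat pixel sequences using only the standard library. It is illustrative only: a real evaluation would normally use the baseline's or project's own metric code so that numbers stay comparable, and the function name and `max_value` default are assumptions.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) == 0 or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Whatever implementation is used, `max_value` (1.0 for normalized images, 255 for 8-bit) must match across all baselines, since a mismatch changes the metric rather than just its scale.
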

3. **Comparative analysis**

- Reproduced vs paper-reported metrics: is the difference within a reasonable range (±1 dB PSNR)?
- If the difference is too large, analyze the cause: data differences, training configuration, or evaluation method
- Determine which baseline serves as the primary comparison target
- Finalize the evaluation protocol to be used in WF8: metric names, direction (max/min), primary metric, and comparison thresholds
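
The comparison and the protocol can be expressed compactly in code. The following is a sketch under assumptions: the `PROTOCOL` layout, its field names, and both function names are hypothetical, and only the ±1 dB PSNR tolerance comes from the text above.

```python
# Hypothetical protocol layout: metric name -> direction, plus a primary metric.
PROTOCOL = {
    "primary_metric": "psnr",
    "metrics": {"psnr": "max", "ssim": "max", "lpips": "min"},
}


def check_reproduction(paper: dict, reproduced: dict, tolerances: dict) -> dict:
    """Compare reproduced metrics to paper-reported ones, metric by metric."""
    report = {}
    for metric, tol in tolerances.items():
        if metric not in paper or metric not in reproduced:
            # Missing values are reported, not silently skipped.
            report[metric] = {"delta": None, "within_tolerance": None}
            continue
        delta = reproduced[metric] - paper[metric]
        report[metric] = {"delta": delta, "within_tolerance": abs(delta) <= tol}
    return report


def is_improvement(metric: str, candidate: float, baseline: float, protocol: dict = PROTOCOL) -> bool:
    """Apply the metric's direction: higher is better for "max", lower for "min"."""
    direction = protocol["metrics"][metric]
    return candidate > baseline if direction == "max" else candidate < baseline
```

For example, `check_reproduction({"psnr": 27.5}, {"psnr": 26.25}, {"psnr": 1.0})` flags a delta of -1.25 dB as outside tolerance, which would trigger the cause analysis described above.
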

4. **Output report**

Write to docs/Baseline_Report.md (following the templates/baseline-report.md format), including:
- Reproduction results table for all baselines
- Per-baseline adaptation notes and training configurations
- Discrepancy analysis against paper-reported values

Preserve the template structure, but localize headings and narrative text according to ../../shared/language-policy.md unless a field is explicitly marked English-only.
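
A results table could be generated from recorded metrics as sketched below. The column set and the `results_table` name are assumptions for illustration; the actual layout is whatever templates/baseline-report.md prescribes, which is not reproduced here.

```python
def results_table(rows: list) -> str:
    """Render paper-vs-reproduced rows as a Markdown table."""
    lines = [
        "| Baseline | Metric | Paper | Reproduced | Delta |",
        "| --- | --- | --- | --- | --- |",
    ]
    for row in rows:
        delta = row["reproduced"] - row["paper"]
        lines.append(
            f"| {row['baseline']} | {row['metric']} | {row['paper']:.2f} "
            f"| {row['reproduced']:.2f} | {delta:+.2f} |"
        )
    return "\n".join(lines)
```

Each row is a dict with `baseline`, `metric`, `paper`, and `reproduced` keys. The signed delta column makes under- and over-reproduction visible at a glance.
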

5. **Update project_map.json**

Update each reproduced baseline node under baselines/:
- status: "verified" / "partial" / "failed"
- entry_point: training entry file
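
The node update might look like the following sketch. The nesting of project_map.json is not specified above, so the `{"baselines": {name: {...}}}` layout and the `update_baseline_node` name are assumptions. Only the `status` values and the `entry_point` field come from the text.

```python
VALID_STATUSES = {"verified", "partial", "failed"}


def update_baseline_node(project_map: dict, name: str, status: str, entry_point: str) -> dict:
    """Set status and entry_point on one baseline node, creating it if needed."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}, got {status!r}")
    # Assumed layout: a top-level "baselines" mapping keyed by method name.
    node = project_map.setdefault("baselines", {}).setdefault(name, {})
    node["status"] = status
    node["entry_point"] = entry_point
    return project_map
```

Rejecting unknown status strings early keeps later stages from having to handle free-form values.
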

6. **Update project state**

Update PROJECT_STATE.json:
- current_stage.status → "completed"
- artifacts.baseline_report → "docs/Baseline_Report.md"
- baseline_metrics → baseline metrics for each scene (for comparison in subsequent /iterate eval)
- evaluation_protocol (or equivalent tracked metric definitions) → for use by WF8 run/eval
- history → append a completion record
</instructions>
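
The project-state update listed in step 6 can be sketched as follows. The field paths come from that list. The shape of a history entry, the use of a UTC timestamp, and the function names are assumptions, since the full PROJECT_STATE.json schema is not shown here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def complete_stage(state: dict, baseline_metrics: dict, protocol: dict) -> dict:
    """Apply the WF5 completion updates to an in-memory project state."""
    state.setdefault("current_stage", {})["status"] = "completed"
    state.setdefault("artifacts", {})["baseline_report"] = "docs/Baseline_Report.md"
    state["baseline_metrics"] = baseline_metrics
    state["evaluation_protocol"] = protocol
    # Assumed history entry shape; adjust to the project's real schema.
    state.setdefault("history", []).append({
        "stage": "WF5",
        "event": "completed",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return state


def save_state(state: dict, path: str = "PROJECT_STATE.json") -> None:
    """Write the state back to disk as indented JSON."""
    Path(path).write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
```

Keeping the mutation separate from the file write makes the update easy to inspect before anything on disk changes.
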

<constraints>
- ALWAYS commit all baseline adaptations before training (pre-training rule)
- ALWAYS compare reproduced vs paper-reported metrics
- ALWAYS use the same evaluation protocol across all baselines
- NEVER modify baseline code more than necessary; document all changes
- NEVER skip a baseline without recording why it was skipped
</constraints>

WF5 Baseline Reproduction. Clone comparison method code, adapt to local environment, train and record metrics, output Baseline_Report.md. Used after data preparation and before code planning to provide comparison baselines for the research method.

1 star

development

Updated Apr 17, 2026

npx skillsauth add linzhe001/Harness-Research baseline-repro

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: container and dependency vulnerability scanner. Clean (95%)
- Semgrep: static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- Snyk (dep): open source security scanning. Skipped (50%)
- Socket.dev: supply chain security analysis. Skipped (50%)
- VirusTotal: multi-engine malware detection. Skipped (50%)
- CrowdStrike: advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 17, 2026, 1:59 AM (23.4s, 2 files scanned)

SKILL.md

name: baseline-repro
description: WF5 Baseline Reproduction. Clone comparison method code, adapt to local environment, train and record metrics, output Baseline_Report.md. Used after data preparation and before code planning to provide comparison baselines for the research method.
argument-hint: [baseline_name or 'all']
disable-model-invocation: true
allowed-tools: Read, Write, Edit, Bash, Glob, Grep

Related Skills

linzhe001/validate-run

development | Verified | Trusted | Community

WF7.5 training pipeline validation. Before entering WF8 iteration, first use Codex to review code for baseline equivalence, then run a 100-step smoke test to verify end-to-end pipeline functionality.

1 | SKILL.md | Updated Apr 17, 2026

linzhe001/survey-idea

business | Verified | Trusted | Community

WF1 Inspiration survey and gap analysis. Takes the user's research idea, performs literature search, gap analysis, competitor analysis, and feasibility scoring, then outputs Feasibility_Report.md. Use when the user has a new CV research idea that needs a feasibility assessment.

1 | SKILL.md | Updated Apr 17, 2026

linzhe001/release

tools | Verified | Trusted | Community

WF10 Submission/Release Tool. Multi-scene training, result packaging, filename validation, dry-run submission checks. Used after ablation experiments are complete and before competition submission.

1 | SKILL.md | Updated Apr 17, 2026

linzhe001/refine-arch

development | Verified | Trusted | Community

WF2 Architecture refinement and MVP design. Reads the feasibility report, analyzes the base codebase architecture, designs plug-and-play new modules, defines the MVP, provides A/B/C alternative plans, and outputs Technical_Spec.md. Use when a research idea needs to be translated into a concrete technical architecture design.

1 | SKILL.md | Updated Apr 17, 2026

Install manually

```
# Clone the repo
git clone https://github.com/linzhe001/Harness-Research.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into it
cp -r Harness-Research/.claude/skills/baseline-repro ~/.claude/skills/
```

Repository

linzhe001/Harness-Research

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT