.claude/skills/baseline-repro/SKILL.md
WF5 Baseline Reproduction. Clone the comparison methods' code, adapt it to the local environment, train and record metrics, and output Baseline_Report.md. Used after data preparation and before code planning to provide comparison baselines for the research method.
`npx skillsauth add linzhe001/Harness-Research baseline-repro` installs this skill globally with one command; it works with Claude Code, Cursor, and Windsurf.
<context>
First, read PROJECT_STATE.json for project context and docs/Technical_Spec.md for the baseline list. For the output format, see templates/baseline-report.md. For language behavior, see ../../shared/language-policy.md.
</context>
<instructions>
1. **Read prerequisite materials**
- docs/Technical_Spec.md: Extract the list of baselines to reproduce (including repo URLs and paper citations)
- docs/Dataset_Stats.md / WF4 output: Data paths and formats
- If $ARGUMENTS specifies a particular baseline name, only reproduce that method

2. **Reproduce baselines one by one**
Before reproducing each one, first create or confirm the initial runnable environment:
- the `## Environment` section of CLAUDE.md
- `/env-setup` as a prerequisite in the main workflow

For each baseline, perform the following steps:
a. Obtain code
```shell
cd baselines/
git clone {repo_url} {method_name}/  # or use an existing submodule
cd ..  # return to the project root before the git add/commit in step c
```
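The clone step can be made idempotent, so that re-running the skill, or a baseline already present as a checked-out submodule, does not trigger a second clone. Below is a minimal POSIX `sh` sketch, assuming the `baselines/{method_name}/` layout above and that it is run from the project root; the function name `ensure_baseline_repo` is hypothetical.

```shell
# Hypothetical helper (not part of the skill): reuse baselines/<method_name>/
# if it is already a git checkout, otherwise clone <repo_url> into it.
# A regular clone has a .git directory; an initialized submodule has a .git file.
ensure_baseline_repo() {
  # $1 = repo URL, $2 = method name
  dest="baselines/$2"
  if [ -d "$dest/.git" ] || [ -f "$dest/.git" ]; then
    echo "reusing existing checkout: $dest"
    return 0
  fi
  mkdir -p baselines
  git clone "$1" "$dest"
}
```

An uninitialized submodule is just an empty directory, so in that case `git submodule update --init baselines/<method_name>` is the better choice than cloning.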
b. Adapt to local environment
c. Train
```shell
# Commit the adapted baseline code used for this training run
git add baselines/{method_name}/
git commit -m "train(baseline/{method_name}): {semantic description}"
```
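Keeping the full training log makes the later evaluation and report steps traceable. Building the commit subject in one place also keeps it in the `train(baseline/{method_name}): ...` form. This is a POSIX `sh` sketch: `run_logged`, `baseline_commit_subject`, and any log location you pass in are illustrative assumptions, and the real training command depends on each baseline's repository.

```shell
# Hypothetical helper: run a training command, capture stdout and stderr in a
# log file, and return the command's own exit status (no pipe, so no pipefail needed).
run_logged() {
  log_file=$1
  shift
  mkdir -p "$(dirname "$log_file")"
  "$@" >"$log_file" 2>&1
}

# Hypothetical helper: build the commit subject used after training.
baseline_commit_subject() {
  # $1 = method name, $2 = short semantic description
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: baseline_commit_subject <method_name> <description>" >&2
    return 1
  fi
  printf 'train(baseline/%s): %s\n' "$1" "$2"
}
```

For example, `run_logged "baselines/$method_name/logs/train.log" python train.py` (placeholder command) followed by `git commit -m "$(baseline_commit_subject "$method_name" "adapt data paths")"`. If logs should not be committed, place them outside `baselines/{method_name}/` or ignore them in git.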
d. Evaluate
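Taken together, step 1's rule that $ARGUMENTS narrows the run to a single baseline and the one-by-one loop over steps a to d can be sketched as a filter feeding a loop. This assumes the baseline names have already been extracted from docs/Technical_Spec.md into a newline-separated list. `select_baselines` is a hypothetical name, and names are matched exactly and case-sensitively.

```shell
# Hypothetical helper: print the baselines to reproduce, one per line.
# $1    = requested baseline name (empty when no $ARGUMENTS value was given)
# stdin = all baseline names from docs/Technical_Spec.md, one per line
select_baselines() {
  if [ -z "$1" ]; then
    # No specific baseline requested: keep every non-blank name
    sed '/^[[:space:]]*$/d'
    return 0
  fi
  # -F: literal string, -x: whole-line match
  if ! grep -Fx -- "$1"; then
    echo "baseline '$1' not found in the baseline list" >&2
    return 1
  fi
}
```

A loop such as `select_baselines "$requested" < baseline_names.txt | while IFS= read -r method_name; do ...; done` (with `baseline_names.txt` as a placeholder file) then runs steps a to d for each selected method in turn.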
3. **Comparative analysis**

4. **Output report**
Write to docs/Baseline_Report.md (following the templates/baseline-report.md format), including:
Preserve the template structure, but localize headings and narrative text according to ../../shared/language-policy.md unless a field is explicitly marked English-only.
5. **Update project_map.json**
Update each reproduced baseline node under baselines/:
- status: "verified" / "partial" / "failed"
- entry_point: Training entry file

6. **Update project state**
Update PROJECT_STATE.json:
- current_stage.status → "completed"
- artifacts.baseline_report → "docs/Baseline_Report.md"
- baseline_metrics → Baseline metrics for each scene (for comparison in subsequent /iterate eval)
- evaluation_protocol (or equivalent tracked metric definitions) → for use by WF8 run/eval
- history: append a completion record
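The PROJECT_STATE.json update can be applied with `jq` by writing to a temporary file and then renaming it, so a failed run leaves the original file untouched. This sketch makes several assumptions: `jq` is installed, the keys listed above sit at the top level of the file, `baseline_metrics` is passed in as a JSON object, and the `history` entry fields (`stage`, `status`, `timestamp`) are illustrative rather than the project's actual schema. `evaluation_protocol` is left out because its format is project-specific.

```shell
# Hypothetical helper: mark the baseline stage complete in PROJECT_STATE.json.
# $1 = path to PROJECT_STATE.json, $2 = baseline metrics as a JSON object string
update_project_state() {
  state_file=$1
  tmp_file="$state_file.tmp"
  if jq --argjson metrics "$2" \
       --arg report "docs/Baseline_Report.md" \
       --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" '
      .current_stage.status = "completed"
      | .artifacts.baseline_report = $report
      | .baseline_metrics = $metrics
      | .history = ((.history // []) + [{stage: "baseline-repro", status: "completed", timestamp: $ts}])
    ' "$state_file" >"$tmp_file"; then
    mv "$tmp_file" "$state_file"
  else
    # jq failed (bad input or invalid metrics JSON): keep the original file
    rm -f "$tmp_file"
    return 1
  fi
}
```

Existing keys that the filter does not touch, such as other fields under `current_stage`, are preserved.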
</instructions>