.claude/skills/orchestrator/SKILL.md
--- name: orchestrator description: CV research project orchestrator. Coordinates the 10-stage research workflow (WF1-WF10 + WF7.5 gate), tracks progress, and manages PROJECT_STATE.json. Supported commands: init (initialize), status (view state), next (advance stage), rollback (revert), decision (record decisions). Use when the user wants to manage CV research project progress, initialize a project, view status, or switch workflow stages. argument-hint: "[command: init|status|next|rollback|decis
For the PROJECT_STATE.json schema, see templates/project-state-schema.json.
- PROJECT_STATE.json: the single source of truth for stage/transition state. Only orchestrator and individual WF skills can write to it.
- iteration_log.json: the single source of truth for experiments. Only the iterate skill can write to it.
- project_map.json: the single source of truth for code structure. Only build-plan and code-debug can write to it.

orchestrator reads WF8 iteration state (best_iteration, latest decision) from iteration_log.json, but does not write to iteration_log.json.
| ID | Name | Skill | Required Artifacts |
|----|------|-------|--------------------|
| 1 | survey_idea | /survey-idea | docs/Feasibility_Report.md |
| 2 | refine_arch | /refine-arch | docs/Technical_Spec.md |
| 3 | deep_check | /deep-check | docs/Sanity_Check_Log.md |
| 4 | data_prep | /data-prep | docs/Dataset_Stats.md |
| 5 | baseline_repro | /baseline-repro | docs/Baseline_Report.md, baseline_metrics in PROJECT_STATE |
| 6 | build_plan | /build-plan | docs/Implementation_Roadmap.md, project_map.json |
| 7 | code_expert | /code-expert | src/ code files |
| 7.5 | validate_run | /validate-run | 100-step smoke test passed |
| 8 | iterate | /iterate | iteration_log.json |
| 9 | final_exp | /final-exp | docs/Final_Experiment_Matrix.md |
| 10 | release | /release | submission package |
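
The file-based rows of the table above lend themselves to a simple existence check before a stage is advanced. The following is a minimal sketch, not the skill's actual implementation: the function name `missing_artifacts` is hypothetical, and it assumes that `project_map.json` and `iteration_log.json` live in the project root. Requirements that are not a single file (baseline_metrics, src/ code, the smoke test, the submission package) are deliberately left out.

```python
from pathlib import Path

# Required file artifacts per stage, copied from the stage table.
# Stages 7, 7.5 and 10 have non-file requirements and are not covered here.
# Assumption: project_map.json and iteration_log.json sit in the project root.
REQUIRED_FILES = {
    "1": ["docs/Feasibility_Report.md"],
    "2": ["docs/Technical_Spec.md"],
    "3": ["docs/Sanity_Check_Log.md"],
    "4": ["docs/Dataset_Stats.md"],
    "5": ["docs/Baseline_Report.md"],
    "6": ["docs/Implementation_Roadmap.md", "project_map.json"],
    "8": ["iteration_log.json"],
    "9": ["docs/Final_Experiment_Matrix.md"],
}


def missing_artifacts(project_root, stage_id):
    """Return the required files for a stage that do not exist yet."""
    root = Path(project_root)
    required = REQUIRED_FILES.get(str(stage_id), [])
    return [rel for rel in required if not (root / rel).is_file()]
```

An empty list means every file-based requirement for that stage is present; stage-specific checks such as a non-empty baseline_metrics still have to be done separately.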
WF8 iteration loop: /iterate plan → /iterate code → /iterate run → /iterate eval → decision branch:
Utility Skills (non-numbered stages, can be called independently or invoked by WF8):
- /code-debug: code fixes, called by /iterate code
- /evaluate: result analysis, called by /iterate eval
- /env-setup: maintenance-type environment refresh, not a prerequisite step in the main flow

Key outputs: WF6 generates project_map.json (architecture blueprint), which both WF7 and code-debug depend on.
For language behavior, see ../../shared/language-policy.md.
</context>
Execute the corresponding command based on $ARGUMENTS.
init - Initialize a new project

Ask the user for the following information (using the AskUserQuestion tool):
Create the project directory structure:
{project_root}/
├── docs/
│ └── iterations/ # per-iteration eval reports
├── src/
├── baselines/
├── configs/
├── scripts/
├── tests/
├── experiments/
└── PROJECT_STATE.json
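
The layout above, together with the initial field values this document lists for `init`, can be sketched in a few lines of Python. This is an illustration under assumptions rather than the skill's real behavior: `init_project` is a hypothetical helper, and the nesting of `current_stage` and `tracking` as JSON objects is a guess. The authoritative shape is whatever templates/project-state-schema.json defines.

```python
import json
from pathlib import Path

# Directories from the project tree (docs/iterations implies docs/).
DIRECTORIES = [
    "docs/iterations",
    "src",
    "baselines",
    "configs",
    "scripts",
    "tests",
    "experiments",
]


def init_project(project_root, project_meta):
    """Create the directory layout and write an initial PROJECT_STATE.json."""
    root = Path(project_root)
    for rel in DIRECTORIES:
        (root / rel).mkdir(parents=True, exist_ok=True)

    # Assumed nesting; check it against templates/project-state-schema.json.
    state = {
        "project_meta": project_meta,
        "current_stage": {
            "workflow_id": 1,
            "workflow_name": "survey_idea",
            "status": "not_started",
        },
        "artifacts": {},
        "baseline_metrics": {},
        "decisions": [],
        "history": [],
        "active_experiments": [],
        "tracking": {"backend": "none"},
    }
    state_path = root / "PROJECT_STATE.json"
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state_path
```
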
Call /init-project init to generate a minimal CLAUDE.md
Only generate Environment (virtual environment, Python, GPU, dependencies) and Workflow overview.
Idea, Tech Stack, Project Structure, and other content will be filled in at later stages.
Generate the initial PROJECT_STATE.json file according to templates/project-state-schema.json:
- project_meta: fill in user-provided information
- current_stage: workflow_id=1, workflow_name="survey_idea", status="not_started"
- artifacts: empty object {}
- baseline_metrics: empty object {}
- decisions: empty array []
- history: empty array []
- active_experiments: empty array []
- tracking: backend="none"

status - View current state

If .auto_iterate/state.json exists, include loop state (current round, goal, progress) in the report (read-only: orchestrator never writes to that file).

next - Advance to next stage

Before advancing, verify the following conditions:
WF5 (baseline_repro) special validation:
- docs/Baseline_Report.md must exist
- baseline_metrics must be non-empty
- status in project_map.json must be verified or partial (cannot be untested)
- Mark unreproducible baselines as partial with an explanation in the report

WF7.5 (validate_run) gate:
- Run /validate-run to verify: 100-step training passes, eval passes, checkpoint can be saved, wandb can connect
- After /validate-run passes, orchestrator should auto-trigger /auto-iterate-goal check so that an iteration goal is set before WF8 begins

If validation passes:
If validation fails:
CLAUDE.md auto-update (after stage completion):
- /init-project update (fill in the confirmed idea description)
- /init-project update (fill in Tech Stack details)
- /init-project update (fill in Dataset paths and statistics)
- /init-project update (fill in Baseline metric references)
- /init-project update (fill in Project Structure + Core Artifacts)
- /init-project update (lock Entry Scripts into CLAUDE.md)

WF8 → WF9 transition:
WF7/WF8 special logic:
- WF7: /code-expert all (first-time full generation)
- WF8: /iterate sub-commands to manage the iteration loop:
  - /iterate plan → /iterate code → /iterate run → /iterate eval
  - /iterate ablate for intra-iteration ablation experiments

rollback - Roll back to a specified stage

Parameter: target workflow_id (parsed from $ARGUMENTS)
decision - Record a key decision

Status reports, direct questions, and summaries should follow ../../shared/language-policy.md. Keep workflow IDs, schema keys, and explicitly English-only fields unchanged.
| Current Stage | On Success | On Failure |
|---------------|------------|------------|
| WF1 survey-idea | WF2 refine-arch | Terminate project or redefine idea |
| WF2 refine-arch | WF3 deep-check | Roll back to WF1 for re-survey |
| WF3 deep-check | WF4 data-prep | Flag as high-risk, roll back to WF2 for alternative approach |
| WF4 data-prep | WF5 baseline-repro | Manual intervention for data issues |
| WF5 baseline-repro | WF6 build-plan | Mark unreproducible baselines as partial, continue |
| WF6 build-plan | WF7 code-expert | Roll back to WF2 to adjust architecture |
| WF7 code-expert | WF7.5 validate-run | First-time generation fails → check Roadmap |
| WF7.5 validate-run | WF8 iterate | Smoke test fails → debug |
| WF8 iterate (NEXT_ROUND) | Stay in WF8: ordinary improvement round | Loop until CONTINUE/PIVOT/ABORT |
| WF8 iterate (DEBUG) | Stay in WF8: fix technical issues | Loop until CONTINUE/PIVOT/ABORT |
| WF8 iterate (CONTINUE) | WF9 final-exp | - |
| WF8 iterate (PIVOT) | Roll back to WF2 for alternative approach | - |
| WF8 iterate (ABORT) | Terminate project | - |
| WF9 final-exp | WF10 release | Run additional experiments or adjust design |
| WF10 release | Project complete | Fix submission issues |
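
The "On Success" column of the table above amounts to a small lookup, with WF8 branching on the latest iteration decision. A minimal sketch of that lookup follows; the function name `next_stage` and the sentinel strings `"TERMINATED"` and `"COMPLETE"` are hypothetical and are not part of PROJECT_STATE.json.

```python
# Success transitions for the linear stages, taken from the transition table.
ON_SUCCESS = {
    "1": "2",
    "2": "3",
    "3": "4",
    "4": "5",
    "5": "6",
    "6": "7",
    "7": "7.5",
    "7.5": "8",
    "9": "10",
    "10": "COMPLETE",  # hypothetical sentinel for "Project complete"
}

# WF8 outcome depends on the decision recorded for the latest iteration.
WF8_DECISIONS = {
    "NEXT_ROUND": "8",      # stay in WF8, ordinary improvement round
    "DEBUG": "8",           # stay in WF8, fix technical issues
    "CONTINUE": "9",        # advance to final experiments
    "PIVOT": "2",           # roll back to WF2 for an alternative approach
    "ABORT": "TERMINATED",  # hypothetical sentinel for "Terminate project"
}


def next_stage(current, wf8_decision=None):
    """Return the stage ID that follows `current` on success."""
    current = str(current)
    if current == "8":
        if wf8_decision not in WF8_DECISIONS:
            raise ValueError(f"WF8 needs a decision, got {wf8_decision!r}")
        return WF8_DECISIONS[wf8_decision]
    if current not in ON_SUCCESS:
        raise ValueError(f"unknown stage {current!r}")
    return ON_SUCCESS[current]
```

Failure handling is left out on purpose: most entries in the "On Failure" column call for human judgment (manual intervention, redefining the idea) rather than a fixed target stage.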
Branch strategy: Single-person projects can develop directly on master/main. For team collaboration, branches per stage are optional.
Commit format (choose by scenario):
- train(research): {description} or train(baseline/{name}): {description} (see pre-training rule)
- [WF{n}] {type}: {message}, type = feat / fix / docs / refactor / exp
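
The two commit formats can be checked mechanically, for example in a commit-msg hook. The sketch below is one possible reading of the formats, not a rule the skill enforces: `is_valid_commit` is a hypothetical name, and it assumes that `{n}` ranges over 1-10 plus 7.5 and that a baseline `{name}` contains no whitespace or closing parenthesis.

```python
import re

# [WF{n}] {type}: {message}
# Assumption: n is 1..10 or 7.5, matching the stage table.
STAGE_COMMIT = re.compile(
    r"\[WF(10|7\.5|[1-9])\] (feat|fix|docs|refactor|exp): \S.*"
)

# train(research): {description}  or  train(baseline/{name}): {description}
# Assumption: {name} has no whitespace and no ")".
TRAIN_COMMIT = re.compile(r"train\((research|baseline/[^)\s]+)\): \S.*")


def is_valid_commit(subject):
    """Check a commit subject line against either documented format."""
    return bool(
        STAGE_COMMIT.fullmatch(subject) or TRAIN_COMMIT.fullmatch(subject)
    )
```

For instance, `is_valid_commit("[WF7.5] fix: handle NaN loss")` is accepted, while a subject with an unknown type or a stage number outside the table is rejected.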
</instructions>