.claude/skills/deep-check/SKILL.md
WF3 Second-pass validation and value assessment. Acts as a "devil's advocate" to critically review the technical proposal, search for failure cases, assess risks, and make a Go/No-Go decision. Use after architecture design is complete but before investing in data engineering and coding, to avoid wasting subsequent effort.
npx skillsauth add linzhe001/Harness-Research deep-checkInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Input: Technical_Spec.md from WF2. Output: Sanity_Check_Log.md. On GO → WF4 (data-prep). On NO-GO → rollback to WF2.
For the output format, see templates/sanity-check.md. For language behavior, see ../../shared/language-policy.md. </context>
<instructions> 1. **Read Prerequisite Materials**Read Technical_Spec.md and extract:
Search for Failure Cases
Use WebSearch to specifically search for negative results:
"{method_name} failure" OR "limitation""{method_name} does not work""why {method_name} fails""{core_technique} training instability"Record all failure modes and negative results found.
Theoretical Analysis
<thinking> As the devil's advocate, challenge each key assumption: - Assumption 1: [description] → Are there counterexamples? Under what conditions would it fail? - Assumption 2: [description] → Is there mathematical proof or experimental validation? - If the core assumption is invalidated, can the overall approach still work? - Are there edge cases the authors may have overlooked? </thinking>Specifically check:
Performance Estimation
Based on results from similar work, estimate this method's:
Risk Matrix
| Risk Item | Probability (1-5) | Impact (1-5) | Risk Score | Mitigation Strategy | |-----------|-------------------|-------------|------------|---------------------| | Training divergence | ... | ... | ... | ... | | Performance below expectations | ... | ... | ... | ... | | Insufficient compute resources | ... | ... | ... | ... | | ... | ... | ... | ... | ... |
Go/No-Go Decision
<thinking> Synthesize all analyses to make a final judgment: - Did the failure case search reveal any fatal negative evidence? - Does the risk matrix contain any high-probability and high-impact risks? - If proceeding, what is the most likely failure mode? - If rolling back, are the alternative plans more promising? </thinking>Codex Cross-Validation (always attempt)
WF3 is a critical gate, so always attempt Codex cross-validation (unlike WF8's selective triggering).
If Codex MCP is available (mcp__codex__codex tool exists):
a. Format the Technical_Spec core plan + the above risk analysis into a prompt:
"Review this CV research approach. Find risks or failure modes I may have missed."
b. Call the mcp__codex__codex MCP tool to submit the review request
c. Parse the returned concerns/suggestions
d. If new issues are found: WebSearch to investigate → update risk matrix → mcp__codex__codex-reply to confirm
e. Maximum 3 iteration rounds, until consensus is reached or rounds are exhausted
f. Record codex_review: "used" + content
If Codex MCP is unavailable: Record codex_review: "unavailable" and note it in the report.
Add a ## Codex Cross-Validation section to the output.
Output Report
Write to docs/Sanity_Check_Log.md, including:
Preserve the template structure and decision labels, but localize headings and narrative text according to ../../shared/language-policy.md unless a field is explicitly marked English-only.
Update Project State
Update PROJECT_STATE.json:
current_stage.status → "completed"artifacts.sanity_check_log → file pathhistory append completion recorddecisions record Go/No-Go decision
</instructions>
development
WF7.5 training pipeline validation. Before entering WF8 iteration, first use Codex to review code for baseline equivalence, then run a 100-step smoke test to verify end-to-end pipeline functionality.
business
WF1 Inspiration survey and gap analysis. Takes the user's research idea, performs literature search, gap analysis, competitor analysis, and feasibility scoring, then outputs Feasibility_Report.md. Use when the user has a new CV research idea that needs a feasibility assessment.
tools
WF10 Submission/Release Tool. Multi-scene training, result packaging, filename validation, dry-run submission checks. Used after ablation experiments are complete and before competition submission.
development
WF2 Architecture refinement and MVP design. Reads the feasibility report, analyzes the base codebase architecture, designs plug-and-play new modules, defines the MVP, provides A/B/C alternative plans, and outputs Technical_Spec.md. Use when a research idea needs to be translated into a concrete technical architecture design.