skills/skills-codex/result-to-claim/SKILL.md
Use when experiments complete to judge what claims the results support, what they don't, and what evidence is still missing. A secondary Codex agent evaluates results against intended claims and routes to next action (pivot, supplement, or confirm). Use after experiments finish — before writing the paper or running ablations.
npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep result-to-claim
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Experiments produce numbers; this gate decides what those numbers mean. Collect results from available sources, get a secondary Codex judgment, then auto-route based on the verdict.
Gather experiment data from whatever sources are available in the project:
- wandb.Api().run("<entity>/<project>/<run_id>").history(): metrics, training curves, comparisons
- ssh server "tail -100 /path/to/training.log" if no other source

Assemble the key information:
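As a rough sketch of the gathering step, the W&B history can be pulled as plain rows and reduced to the few numbers a reviewer needs. The helper names `fetch_history_rows` and `summarize_metric` are hypothetical, and treating the maximum as "best" assumes a higher-is-better metric such as accuracy.

```python
from statistics import mean


def fetch_history_rows(run_path: str) -> list[dict]:
    """Fetch logged metric rows for one W&B run as a list of dicts."""
    import wandb  # imported lazily so summarize_metric works without wandb installed

    run = wandb.Api().run(run_path)  # run_path is "<entity>/<project>/<run_id>"
    return run.history(pandas=False)  # pandas=False returns a list of dict rows


def summarize_metric(rows: list[dict], metric: str) -> dict:
    """Summarize one metric, skipping rows where it was not logged.

    Assumption: higher is better, so "best" is the maximum value.
    """
    values = [row[metric] for row in rows if row.get(metric) is not None]
    if not values:
        return {"metric": metric, "n": 0}
    return {
        "metric": metric,
        "n": len(values),
        "final": values[-1],
        "best": max(values),
        "mean": mean(values),
    }
```

Calling `summarize_metric(fetch_history_rows("<entity>/<project>/<run_id>"), "<metric>")` for the method and for each baseline gives comparable rows to paste into the evaluation prompt.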
Send the collected results to a secondary Codex agent for objective evaluation:
spawn_agent:
  reasoning_effort: xhigh
  message: |
    RESULT-TO-CLAIM EVALUATION

    I need you to judge whether experimental results support the intended claim.

    Intended claim: [the claim these experiments test]

    Experiments run:
    [list experiments with method, dataset, metrics]

    Results:
    [paste key numbers, comparison deltas, significance]

    Baselines:
    [baseline numbers and sources: reproduced or from paper]

    Known caveats:
    [any confounding factors, limited datasets, missing comparisons]

    Please evaluate:
    1. claim_supported: yes | partial | no
    2. what_results_support: what the data actually shows
    3. what_results_dont_support: where the data falls short of the claim
    4. missing_evidence: specific evidence gaps
    5. suggested_claim_revision: if the claim should be strengthened, weakened, or reframed
    6. next_experiments_needed: specific experiments to fill gaps (if any)
    7. confidence: high | medium | low

    Be honest. Do not inflate claims beyond what the data supports.
    A single positive result on one dataset does not support a general claim.
Extract structured fields from the secondary Codex response:
- claim_supported: yes | partial | no
- what_results_support: "..."
- what_results_dont_support: "..."
- missing_evidence: "..."
- suggested_claim_revision: "..."
- next_experiments_needed: "..."
- confidence: high | medium | low
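The extraction step could look roughly like the sketch below. It assumes the reviewer answers with one `field: value` entry per field, optionally prefixed by a list number or bullet as in the prompt; `parse_verdict` is a hypothetical helper, and a real response may need looser matching.

```python
import re

FIELDS = [
    "claim_supported",
    "what_results_support",
    "what_results_dont_support",
    "missing_evidence",
    "suggested_claim_revision",
    "next_experiments_needed",
    "confidence",
]
# Fields restricted to a fixed vocabulary; anything else is treated as unparsed.
ALLOWED = {
    "claim_supported": {"yes", "partial", "no"},
    "confidence": {"high", "medium", "low"},
}


def parse_verdict(response: str) -> dict:
    """Extract the verdict fields from a reviewer response."""
    names = "|".join(FIELDS)
    # A field starts on a line such as "1. claim_supported: ..." or "- confidence: ...".
    pattern = re.compile(
        rf"^[ \t]*(?:\d+\.|[-*])?[ \t]*({names})[ \t]*:[ \t]*", re.MULTILINE
    )
    matches = list(pattern.finditer(response))
    verdict = {}
    for i, m in enumerate(matches):
        # A value runs until the next field header or the end of the response.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        verdict[m.group(1)] = response[m.end():end].strip().strip('"')
    for name, allowed in ALLOWED.items():
        value = verdict.get(name, "").lower()
        verdict[name] = value if value in allowed else None
    return verdict
```

Returning `None` for an out-of-vocabulary `claim_supported` or `confidence` keeps a malformed response from being silently routed as if it were a real verdict.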
Skip this step if EXPERIMENT_AUDIT.json does not exist.
if EXPERIMENT_AUDIT.json exists:
    read integrity_status from file
    attach to verdict output:
        integrity_status: pass | warn | fail
    if integrity_status == "fail":
        append to verdict: "[INTEGRITY CONCERN] - audit found issues, see EXPERIMENT_AUDIT.md"
        downgrade confidence to "low" regardless of Codex judgment
    if integrity_status == "warn":
        append to verdict: "[INTEGRITY: WARN] - audit flagged potential issues"
else:
    integrity_status = "unavailable"
    verdict is labeled "provisional - no integrity audit run"
    (this does NOT block anything; the pipeline continues normally)
See shared-references/experiment-integrity.md for the full integrity protocol.
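A minimal Python sketch of the integrity check follows. It assumes `integrity_status` is a top-level key in EXPERIMENT_AUDIT.json, and it collects the appended labels in a hypothetical `notes` list; the real audit schema is defined by the integrity protocol, not by this example.

```python
import json
from pathlib import Path


def apply_integrity(verdict: dict, audit_path: str = "EXPERIMENT_AUDIT.json") -> dict:
    """Return a copy of the verdict with integrity_status and labels attached."""
    result = dict(verdict)
    notes = list(result.get("notes", []))  # "notes" is a hypothetical field
    path = Path(audit_path)
    if path.exists():
        # Assumption: integrity_status is a top-level key in the audit file.
        audit = json.loads(path.read_text(encoding="utf-8"))
        status = audit.get("integrity_status")
        result["integrity_status"] = status
        if status == "fail":
            notes.append("[INTEGRITY CONCERN]")
            result["confidence"] = "low"  # overrides the reviewer's confidence
        elif status == "warn":
            notes.append("[INTEGRITY: WARN]")
    else:
        # A missing audit never blocks the pipeline; the verdict is only labeled.
        result["integrity_status"] = "unavailable"
        notes.append("provisional")
    result["notes"] = notes
    return result
```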
no: Claim not supported
- [...] AGENTS.md or project notes

partial: Claim partially supported
- [...] partial on the same claim → record analysis in findings.md, consider whether to narrow the claim scope or switch ideas

yes: Claim supported
- [...] /ablation-planner

Skip this step entirely if research-wiki/ does not exist.
if research-wiki/ exists:
    # 1. Create experiment page
    Create research-wiki/experiments/<exp_id>.md with:
        - node_id: exp:<id>
        - idea_id: idea:<active_idea>
        - date, hardware, duration, metrics
        - verdict, confidence, reasoning summary

    # 2. Update claim status
    for each claim resolved by this verdict:
        if verdict == "yes":
            Update claim page: status → supported
            run the installed ARIS research_wiki.py helper to add a supports edge from "exp:<id>" to "claim:<cid>"
        elif verdict == "partial":
            Update claim page: status → partial
            run the installed ARIS research_wiki.py helper to add a partial supports edge from "exp:<id>" to "claim:<cid>"
        else:
            Update claim page: status → invalidated
            run the installed ARIS research_wiki.py helper to add an invalidates edge from "exp:<id>" to "claim:<cid>"

    # 3. Update idea outcome
    Update research-wiki/ideas/<idea_id>.md:
        - outcome: positive | mixed | negative
        - If negative: fill "Failure / Risk Notes" and "Lessons Learned"
        - If positive: fill "Actual Outcome" and "Reusable Components"

    # 4. Rebuild + log
    rebuild the query pack with the installed ARIS research_wiki.py helper
    log "result-to-claim: exp:<id> verdict=<verdict> for idea:<idea_id>" with the installed ARIS research_wiki.py helper

    # 5. Re-ideation suggestion
    Count failed/partial ideas since last /idea-creator run.
    If >= 3: print "💡 3+ ideas tested since last ideation. Consider re-running /idea-creator; the wiki now knows what doesn't work."
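The verdict-to-wiki bookkeeping above can be sketched as two small lookups. The edge labels are illustrative strings rather than the identifiers used by the research_wiki.py helper, and counting `negative` and `mixed` outcomes as "failed/partial" is an assumption about how idea outcomes map onto that rule.

```python
# verdict -> (claim status, edge label from exp:<id> to claim:<cid>)
CLAIM_UPDATE = {
    "yes": ("supported", "supports"),
    "partial": ("partial", "partial supports"),
}


def claim_update(verdict: str) -> tuple[str, str]:
    """Map a verdict to a claim status and edge label.

    Anything other than "yes" or "partial" falls through to invalidated,
    mirroring the else branch of the pseudocode.
    """
    return CLAIM_UPDATE.get(verdict, ("invalidated", "invalidates"))


def should_suggest_reideation(outcomes: list[str], threshold: int = 3) -> bool:
    """Decide whether to suggest re-running /idea-creator.

    `outcomes` holds the idea outcomes recorded since the last /idea-creator run.
    Assumption: "negative" and "mixed" are the failed/partial ideas to count.
    """
    failed_or_partial = sum(1 for o in outcomes if o in {"negative", "mixed"})
    return failed_or_partial >= threshold
```

Keeping the mapping in one table makes it easy to check that every verdict value ends in exactly one claim status and one edge.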
- If confidence is low, treat the judgment as inconclusive and add experiments rather than committing to a claim.
- [...] [pending external review] - do not block the pipeline.

After the secondary Codex judgment, save a trace following ../shared-references/review-tracing.md. Write files directly to .aris/traces/result-to-claim/<date>_run<NN>/ and include the prompt, raw reviewer response, parsed verdict, routing action, and whether the result is [pending external review]. Respect the --- trace: parameter when present (default: full).
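A trace writer along these lines would cover the listed contents. The file names (`prompt.md`, `response.md`, `verdict.json`), the ISO date format, and the run-numbering scheme are all assumptions for illustration; review-tracing.md defines the actual layout, and this sketch does not handle the trace parameter.

```python
import json
from datetime import date
from pathlib import Path


def save_trace(
    prompt: str,
    raw_response: str,
    verdict: dict,
    routing_action: str,
    pending_external_review: bool = False,
    root: str = ".aris/traces/result-to-claim",
) -> Path:
    """Write one review trace to <root>/<date>_run<NN>/ and return that directory."""
    base = Path(root)
    base.mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()  # assumed date format, e.g. 2025-01-31
    # Number runs by counting today's existing trace directories.
    existing = [p for p in base.glob(f"{today}_run*") if p.is_dir()]
    run_dir = base / f"{today}_run{len(existing) + 1:02d}"
    run_dir.mkdir()
    # Hypothetical file names; adjust to whatever review-tracing.md specifies.
    (run_dir / "prompt.md").write_text(prompt, encoding="utf-8")
    (run_dir / "response.md").write_text(raw_response, encoding="utf-8")
    meta = {
        "verdict": verdict,
        "routing_action": routing_action,
        "pending_external_review": pending_external_review,
    }
    (run_dir / "verdict.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return run_dir
```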