skills/result-diagnosis/SKILL.md
Use when results are valid but surprising, negative, unstable, or ambiguous — to decide debug/rerun/ablate/revise/park. Not for engineering failures like NaN/OOM (use experiment-debugger). Not for confound or claim-drift audit before locking results into the paper (use research-results-auditor).
Diagnose what an experiment result means for the project. This skill is for decision-making after results exist, especially when they are negative, surprising, unstable, or hard to interpret.
Use this skill when:
Do not use this skill to write a polished report. Pair it with experiment-report-writer after the diagnosis is clear.
Pair this skill with:
- research-project-memory when the diagnosis should update claims, evidence, risks, actions, or worktree status
- experiment-report-writer when results need a shareable report
- algorithm-design-planner when the diagnosis points to method revision
- experiment-design-planner when the diagnosis requires a new controlled experiment
- run-experiment when the next step is a rerun, sanity check, or ablation
- conference-writing-adapter when the right action is to narrow or reframe paper claims

<installed-skill-dir>/
├── SKILL.md
└── references/
    ├── diagnosis-taxonomy.md
    ├── evidence-audit.md
    ├── next-decision-rules.md
    ├── report-template.md
    └── triage-protocol.md
- Read references/diagnosis-taxonomy.md, references/triage-protocol.md, and references/next-decision-rules.md.
- Read references/evidence-audit.md when inspecting logs, configs, metrics, plots, runs, or code state.
- Use references/report-template.md for full diagnosis reports.

Extract:
Rewrite vague input into:
Expected [method] to improve [metric/diagnostic] over [baseline] on [setting], but observed [result] under [controls].
If expected behavior was never defined, route back to experiment-design-planner.
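As a minimal sketch, the template sentence above can be filled in programmatically. The `ResultStatement` class and its field names are hypothetical and not part of this skill; the only rule taken from the text is that an undefined expectation is not rendered and should be routed back to experiment-design-planner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultStatement:
    """Hypothetical container for the slots of the template sentence."""
    method: str
    metric: str
    baseline: str
    setting: str
    observed: str
    controls: str
    expected_defined: bool = True

    def render(self) -> Optional[str]:
        # Without a defined expectation there is nothing to diagnose against,
        # so return None and let the caller route to experiment-design-planner.
        if not self.expected_defined:
            return None
        return (
            f"Expected {self.method} to improve {self.metric} over "
            f"{self.baseline} on {self.setting}, but observed "
            f"{self.observed} under {self.controls}."
        )
```

Returning `None` instead of a partially filled sentence keeps a vague input from passing as a diagnosable statement.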
Read references/diagnosis-taxonomy.md.
Classify the primary symptom:
Then classify likely diagnosis categories:
Read references/evidence-audit.md.
Prefer primary artifacts:
Mark missing evidence rather than guessing.
Read references/triage-protocol.md.
Use this order:
Stop early only when a blocking bug or invalid comparison is found.
For each plausible explanation, state:
At minimum consider:
Read references/next-decision-rules.md.
Choose one primary decision:
- debug: result is not trustworthy until a bug or provenance issue is resolved
- rerun: result is plausible but underpowered or missing controls
- ablate: result needs mechanism isolation
- revise-method: mechanism likely needs design change
- narrow-claim: evidence supports a smaller or different claim
- write: evidence is trustworthy enough to report
- park: result is inconclusive and not worth immediate compute
- kill: claim or direction is falsified under fair controls

Do not pick write if basic provenance or fairness is unresolved.
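The decision list and its one hard constraint can be sketched as an enum with a guard. The names `Decision` and `check_decision`, and the two boolean flags, are illustrative assumptions; the skill itself defines only the eight decision labels and the rule about write.

```python
from enum import Enum


class Decision(Enum):
    """The eight primary decisions named in this skill."""
    DEBUG = "debug"
    RERUN = "rerun"
    ABLATE = "ablate"
    REVISE_METHOD = "revise-method"
    NARROW_CLAIM = "narrow-claim"
    WRITE = "write"
    PARK = "park"
    KILL = "kill"


def check_decision(
    decision: Decision,
    provenance_resolved: bool,
    fairness_resolved: bool,
) -> Decision:
    """Reject `write` while basic provenance or fairness is unresolved.

    The two flags are hypothetical inputs; how they are established is
    left to the evidence audit and triage steps.
    """
    if decision is Decision.WRITE and not (provenance_resolved and fairness_resolved):
        raise ValueError("write is not allowed while provenance or fairness is unresolved")
    return decision
```

Only write is gated here, matching the single constraint stated above; the other decisions pass through unchanged.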
Use references/report-template.md for full reports.
If saving to a project and no path is given, use:
docs/diagnosis/result_diagnosis_YYYY-MM-DD_<short-name>.md
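A small helper can build this default path. This is a sketch: the function name `default_report_path` is hypothetical, and it assumes `YYYY-MM-DD` means an ISO-formatted calendar date.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def default_report_path(short_name: str, on: Optional[date] = None) -> PurePosixPath:
    """Build docs/diagnosis/result_diagnosis_YYYY-MM-DD_<short-name>.md."""
    # Fall back to today's date when the caller does not supply one.
    day = on or date.today()
    filename = f"result_diagnosis_{day.isoformat()}_{short_name}.md"
    return PurePosixPath("docs/diagnosis") / filename
```

Passing the date explicitly keeps the path reproducible when a report is regenerated later.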
Required output:
# Result Diagnosis: [Short Name]
## Result Snapshot
## Expected vs Observed
## Symptom Classification
## Evidence Checked
## Competing Explanations
## Most Likely Diagnosis
## Decision
## Next Checks or Actions
## Claim Impact
## Project Memory Writeback
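Before saving, a draft report can be checked against the required headings listed above. This is a sketch; `missing_sections` is a hypothetical helper, and it assumes each required section appears as a level-two heading exactly as written.

```python
from typing import List

# Section titles copied from the required output outline.
REQUIRED_SECTIONS = [
    "Result Snapshot",
    "Expected vs Observed",
    "Symptom Classification",
    "Evidence Checked",
    "Competing Explanations",
    "Most Likely Diagnosis",
    "Decision",
    "Next Checks or Actions",
    "Claim Impact",
    "Project Memory Writeback",
]


def missing_sections(report_text: str) -> List[str]:
    """Return required section titles that have no '## ' heading in the report."""
    headings = {
        line[3:].strip()
        for line in report_text.splitlines()
        if line.startswith("## ")
    }
    return [title for title in REQUIRED_SECTIONS if title not in headings]
```

An empty list means every required section is present; it says nothing about whether the sections are filled in.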
If the project uses research-project-memory, update:
- memory/evidence-board.md: observed result, limitations, and source paths
- memory/provenance-board.md: mark result provenance verified, stale, contradictory, or missing when diagnosis depends on source validity
- memory/claim-board.md: claims supported, weakened, revised, evidence-needed, provisional, parked, or cut
- memory/risk-board.md: bugs, metric risks, baseline risks, mechanism risks, or claim risks
- memory/action-board.md: debug, rerun, ablation, method revision, writing, park, or kill actions
- memory/handoff-board.md: create handoffs to method design, experiment design, paper evidence, or writing when diagnosis changes downstream work
- memory/phase-dashboard.md: update the active gate when diagnosis advances evidence production or regresses the project to debugging, method revision, or claim narrowing
- .agent/worktree-status.md "Local Hot Results": update here first when in a code-worktree; mark confirmed/invalidated/superseded status locally before any graduation
- <ProjectRoot>/memory/hot-results.md: graduate here only when the result is confirmed and changes a project-level claim; do not write here while diagnosis is still in progress
- memory/decision-log.md: durable decisions such as killing a claim, changing method, or narrowing scope

Use observed for verified results and inferred for explanations. Mark stale claims explicitly.
Before finalizing: