Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

wanshuiyin/ablation-planner

Name: ablation-planner
Author: wanshuiyin

skills/skills-codex/ablation-planner/SKILL.md

npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep ablation-planner

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Ablation Planner

Systematically design ablation studies that answer the questions reviewers will ask. The reviewer agent leads the design; the local executor reviews feasibility and implements.

Context: $ARGUMENTS

When to Use

Main results pass /result-to-claim with claim_supported = yes or partial
The user explicitly requests ablation planning
/auto-review-loop identifies missing ablations

Workflow

Step 1: Prepare Context

Read available project files to build the full picture:

Method description and components (from idea-stage/docs/research_contract.md, legacy docs/research_contract.md, project notes, or method docs)
Current experiment results (from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&B)
Confirmed and intended claims (from /result-to-claim output or project notes)
Available compute resources (from server notes, run configs, or user-provided budget)

Step 2: Codex Designs Ablations

spawn_agent:
  model: gpt-5.6-sol
  reasoning_effort: xhigh
  message: |
    You are a rigorous ML reviewer planning ablation studies.
    Given this method and results, design ablations that:

    1. Isolate the contribution of each novel component
    2. Answer questions reviewers will definitely ask
    3. Test sensitivity to key hyperparameters
    4. Compare against natural alternative design choices

    Method: [description from project files]
    Components: [list of removable or replaceable components]
    Current results: [key metrics from experiments]
    Claims: [what we claim and current evidence]

    For each ablation, specify:
    - name: what to change (for example, "remove module X", "replace Y with Z")
    - what_it_tests: the specific question this answers
    - expected_if_component_matters: what we predict if the component is important
    - priority: 1 (must-run) to 5 (nice-to-have)

    Also provide:
    - coverage_assessment: what reviewer questions these ablations answer
    - unnecessary_ablations: experiments that seem useful but will not add insight
    - suggested_order: run order optimized for maximum early information
    - estimated_compute: total GPU-hours estimate

If delegation is unavailable, generate the same plan locally and mark it [pending external review].

Step 3: Parse Ablation Plan

Normalize the response into a structured format:

## Ablation Plan

### Component Ablations (highest priority)
| # | Name | What It Tests | Expected If Matters | Priority |
|---|------|---------------|---------------------|----------|
| 1 | remove module X | contribution of X | performance drops on metric Y | 1 |
| 2 | replace X with simpler Z | value of learned vs fixed | drops, especially on dataset A | 2 |

### Hyperparameter Sensitivity
| # | Parameter | Values to Test | What It Tests | Priority |
|---|-----------|----------------|---------------|----------|
| 3 | lambda | [0.01, 0.1, 1.0] | sensitivity to regularization | 3 |

### Design Choice Comparisons
| # | Name | What It Tests | Priority |
|---|------|---------------|----------|
| 4 | joint vs separate matching | whether joint adds value | 4 |

### Coverage Assessment
[What reviewer questions these ablations answer]

### Unnecessary Ablations
[Experiments that seem useful but will not add insight - skip these]

### Run Order
[Optimized for maximum early information]

### Estimated Compute
[Total GPU-hours]

Step 4: CC Reviews Feasibility

Before running anything, the local executor checks:

Compute budget - Can you afford all ablations with available GPUs?
Code changes - Which ablations need code modifications vs config-only changes?
Dependencies - Which ablations can run in parallel?
Cuts - If budget is tight, propose removing lower-priority ablations and ask the reviewer agent to re-prioritize when possible

Step 5: Implement and Run

Create configs or scripts for each ablation (config-only changes first)
Smoke test each ablation before the full run
Run in the suggested order, using descriptive names (for example, ablation-no-module-X)
Track results in EXPERIMENT_LOG.md
After all ablations complete, update findings.md with insights

Rules

The reviewer agent leads the design. Do not pre-filter or bias the ablation list before external review sees it. The reviewer thinks like a reviewer; the local executor thinks like an engineer.
Every ablation must have a clear what_it_tests and expected_if_component_matters. No "just try it" experiments.
Config-only ablations take priority over those needing code changes (faster, less error-prone).
If total compute exceeds budget, propose cuts and ask for re-prioritization - do not silently drop ablations.
Component ablations (remove or replace) take priority over hyperparameter sweeps.
Do not generate ablations for components identical to the baseline (no-op ablations).
Record all ablation results in EXPERIMENT_LOG.md, including negative results (for example, component removal had no effect).

wanshuiyin/ablation-planner

skills/skills-codex/ablation-planner/SKILL.md

Use when main results pass result-to-claim (`claim_supported = yes` or `partial`) and ablation studies are needed for paper submission. A secondary Codex agent designs ablations from a reviewer's perspective; the local executor reviews feasibility and implements.

13,243 stars

development

Updated Jul 11, 2026

$ install --global

skillsauth

npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep ablation-planner

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jul 11, 2026, 5:14 AM112.9s1 file scanned

SKILL.md

name:: ablation-planner
description:: Use when main results pass result-to-claim (`claim_supported = yes` or `partial`) and ablation studies are needed for paper submission. A secondary Codex agent designs ablations from a reviewer's perspective; the local executor reviews feasibility and implements.
allowed-tools:: Bash(*), Read, Grep, Glob, Write, Edit

Ablation Planner

Systematically design ablation studies that answer the questions reviewers will ask. The reviewer agent leads the design; the local executor reviews feasibility and implements.

Context: $ARGUMENTS

When to Use

Main results pass /result-to-claim with claim_supported = yes or partial
The user explicitly requests ablation planning
/auto-review-loop identifies missing ablations

Workflow

Step 1: Prepare Context

Read available project files to build the full picture:

Method description and components (from idea-stage/docs/research_contract.md, legacy docs/research_contract.md, project notes, or method docs)
Current experiment results (from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&B)
Confirmed and intended claims (from /result-to-claim output or project notes)
Available compute resources (from server notes, run configs, or user-provided budget)

Step 2: Codex Designs Ablations

spawn_agent:
  model: gpt-5.6-sol
  reasoning_effort: xhigh
  message: |
    You are a rigorous ML reviewer planning ablation studies.
    Given this method and results, design ablations that:

    1. Isolate the contribution of each novel component
    2. Answer questions reviewers will definitely ask
    3. Test sensitivity to key hyperparameters
    4. Compare against natural alternative design choices

    Method: [description from project files]
    Components: [list of removable or replaceable components]
    Current results: [key metrics from experiments]
    Claims: [what we claim and current evidence]

    For each ablation, specify:
    - name: what to change (for example, "remove module X", "replace Y with Z")
    - what_it_tests: the specific question this answers
    - expected_if_component_matters: what we predict if the component is important
    - priority: 1 (must-run) to 5 (nice-to-have)

    Also provide:
    - coverage_assessment: what reviewer questions these ablations answer
    - unnecessary_ablations: experiments that seem useful but will not add insight
    - suggested_order: run order optimized for maximum early information
    - estimated_compute: total GPU-hours estimate

If delegation is unavailable, generate the same plan locally and mark it [pending external review].

Step 3: Parse Ablation Plan

Normalize the response into a structured format:

## Ablation Plan

### Component Ablations (highest priority)
| # | Name | What It Tests | Expected If Matters | Priority |
|---|------|---------------|---------------------|----------|
| 1 | remove module X | contribution of X | performance drops on metric Y | 1 |
| 2 | replace X with simpler Z | value of learned vs fixed | drops, especially on dataset A | 2 |

### Hyperparameter Sensitivity
| # | Parameter | Values to Test | What It Tests | Priority |
|---|-----------|----------------|---------------|----------|
| 3 | lambda | [0.01, 0.1, 1.0] | sensitivity to regularization | 3 |

### Design Choice Comparisons
| # | Name | What It Tests | Priority |
|---|------|---------------|----------|
| 4 | joint vs separate matching | whether joint adds value | 4 |

### Coverage Assessment
[What reviewer questions these ablations answer]

### Unnecessary Ablations
[Experiments that seem useful but will not add insight - skip these]

### Run Order
[Optimized for maximum early information]

### Estimated Compute
[Total GPU-hours]

Step 4: CC Reviews Feasibility

Before running anything, the local executor checks:

Compute budget - Can you afford all ablations with available GPUs?
Code changes - Which ablations need code modifications vs config-only changes?
Dependencies - Which ablations can run in parallel?
Cuts - If budget is tight, propose removing lower-priority ablations and ask the reviewer agent to re-prioritize when possible

Step 5: Implement and Run

Create configs or scripts for each ablation (config-only changes first)
Smoke test each ablation before the full run
Run in the suggested order, using descriptive names (for example, ablation-no-module-X)
Track results in EXPERIMENT_LOG.md
After all ablations complete, update findings.md with insights

Rules

The reviewer agent leads the design. Do not pre-filter or bias the ablation list before external review sees it. The reviewer thinks like a reviewer; the local executor thinks like an engineer.
Every ablation must have a clear what_it_tests and expected_if_component_matters. No "just try it" experiments.
Config-only ablations take priority over those needing code changes (faster, less error-prone).
If total compute exceeds budget, propose cuts and ask for re-prioritization - do not silently drop ablations.
Component ablations (remove or replace) take priority over hyperparameter sweeps.
Do not generate ablations for components identical to the baseline (no-op ablations).
Record all ablation results in EXPERIMENT_LOG.md, including negative results (for example, component removal had no effect).

Related Skills

wanshuiyin/web-debug-search

development

VerifiedTrustedCommunity

Search GitHub Issues and Discussions for software errors, version compatibility problems, and exact error-string matches. Use for debugging and discovery only; results are not paper-citation evidence.

13,732SKILL.mdUpdated Jul 23, 2026

wanshuiyin/web-debug-search

wanshuiyin/web-debug-search

development

VerifiedTrustedCommunity

13,732SKILL.mdUpdated Jul 23, 2026

wanshuiyin/web-debug-search

wanshuiyin/integrity-forensics

testing

VerifiedTrustedCommunity

Run the Anti-Autoresearch integrity-forensics sweep (span-anchored evidence ledger → GPT auditors propose findings → deterministic rules-only adjudicator) against a paper via a SHA-pinned thin launcher — then convert the verdict into a typed policy gate (BLOCK/WARN/NO_NEW_BLOCKER) and an append-only obligations ledger. Use when user says "integrity forensics", "forensic audit this paper", "投稿前自查诚信", "审这篇论文的诚信", or says "anti-autoresearch" when the upstream repo's own skills are not installed. Also invoked by /paper-writing (submission self-forensics, default ON), /peer-review (forensic appendix), /resubmit-pipeline.

13,401SKILL.mdUpdated Jul 13, 2026

wanshuiyin/integrity-forensics

wanshuiyin/meta-apply

testing

VerifiedTrustedCommunity

Privileged applier that LANDS meta-optimize / corpus-audit patches the user approved — the ONLY skill permitted to mutate the skill corpus from a self-modification proposal, with cross-model jury and human approval at landing. Use when the user says "meta apply", "/meta-apply", "land the staged patches", "应用优化", after a /meta-optimize run.

13,401SKILL.mdUpdated May 31, 2026

wanshuiyin/meta-apply

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git

# Copy into Claude Code skills folder (global)
cp -r Auto-claude-code-research-in-sleep/skills/skills-codex/ablation-planner ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

wanshuiyin/Auto-claude-code-research-in-sleep

13,243 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT