Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

sundial-org/icml-reviewer

Name: icml-reviewer
Author: sundial-org

skills/icml-reviewer/SKILL.md

npx skillsauth add sundial-org/skills icml-reviewer

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

ICML Paper Reviewer

Enables rigorous review of ML research papers following official ICML guidelines.

Workflow

Step 1: Input Analysis & Mode Selection

Determine input type:

Complete paper: PDF/text with abstract, methodology, experiments, results → Full Review Mode
Incomplete document: Missing major sections, labeled draft/proposal, or user indicates early stage → Early-Stage Feedback Mode
Code repository: User points to folder/repo path → Repository Review Mode

For complete papers, extract: title, abstract, main claims, methodology, experiments, results. Identify paper type: theoretical, methodological, algorithmic, empirical, bridge paper, or application-driven.

For code repositories, first explore: read README, scan code structure, find experiment scripts/results, identify the research question and what's implemented.

Step 2: Prior Work Grounding (Critical - All Modes)

This step applies to ALL input types. Grounding in reality is essential for any meaningful feedback.

Generate 3-5 search queries based on the research topic: benchmarks/baselines, same problem, related techniques
Use WebSearch to find recent arXiv papers and published work
Fetch abstracts of 5-10 most relevant papers
Critically synthesize:
- What specific claims in this paper are already addressed by prior work?
- What are the actual quantitative improvements over recent baselines?
- Are claimed "novelties" actually novel given the literature?
- What gaps truly exist vs. what the authors claim exists?

Critical mindset:

Your job is to verify claims against reality, not accept them at face value
Most papers overclaim—your review should ground their contributions in what the literature actually shows
Default to skepticism: Assume claims are overstated until proven otherwise by evidence
Authors have selection bias toward their own work; you represent the community's interests
Be the critical voice that ensures published work actually advances the field

Then proceed to mode-specific evaluation.

Full Review Mode (Complete Papers)

Step 3: Systematic Evaluation

Evaluate across 7 dimensions (see references/evaluation-criteria.md). Default to skepticism—require strong evidence to score highly.

| Dimension | Key Questions (Answer with Literature Evidence) | |-----------|---------------| | Originality | Is this truly novel given recent work X, Y, Z? What specific aspects are incremental vs. novel? | | Importance | Why does this problem matter? What's the real-world impact? Who will care? | | Claims Support | Do experiments actually prove the claims? What alternative explanations exist? | | Experimental Soundness | Are baselines from 2023+? Are comparisons fair? What's missing? | | Clarity | Can I reproduce this from the paper? Are claims precisely stated? | | Community Value | Will this change how people work? Or just add noise? | | Prior Work Context | Are comparisons accurate? What recent work (last 2 years) is missing? |

Evaluation mindset:

Start from neutral and require evidence to move up or down
Compare every claim against what you found in the literature search
Most papers are incremental—high originality scores are rare
Weak baselines or missing comparisons are critical flaws, not minor issues

Step 4: Critical Cross-Check Against Literature

Before writing the review, explicitly verify:

Baselines check: List baselines used in paper. List baselines from your literature search of adjacent papers. What's missing?
Methodology check: How do 2-3 adjacent papers approach this problem? Does this paper follow similar methodology? If not, why not?
Claims check: List main claims. For each, cite specific evidence from experiments or proofs. If insufficient, note it.
Citations check: Which papers from your search are cited? Which are missing? Why?
Novelty check: List claimed novelties. For each, cite specific prior work that does or doesn't do this.

This step is not optional. Your review must reference specific findings from your literature search.

Step 5: Generate Review

Follow the ICML review form (see references/review-template.md):

Summary - Neutral, factual (should not be disputed by authors)
Claims and Evidence - Are claims supported? Compare to what literature shows
Relation to Prior Work - Proper context? Missing citations? List specific missing papers
Strengths - Specific and substantive, compared to standards in adjacent work
Weaknesses - Constructive, explain severity, cite specific literature for comparison
Questions for Authors - Numbered, explain impact on evaluation
Minor Issues - Typos, suggestions
Overall Recommendation - 1-5 scale with justification grounded in literature comparison
Confidence Score - 1-5 scale

Step 6: Quality Check

Verify all claims in review are substantiated
Ensure constructive tone
Check specificity of strengths/weaknesses
Confirm questions are actionable

Key Principles

Be Rigorous AND Constructive

Your primary duty is to the research community—publishing weak papers dilutes the literature.

Be honest: Don't inflate scores to be nice. If baselines are weak, say so clearly.
Be specific: Always cite which literature contradicts or supports claims.
Be fair: Criticism should be substantiated by evidence or literature.
Be actionable: Tell authors exactly what would fix the issues.

"Review the papers of others as you would wish your own to be reviewed"—with rigor, honesty, and specific feedback grounded in the literature.

Be Specific

Bad: "The experiments are weak" Good: "Experiments compare only against [X] from 2019, but recent baselines [Y] (2024) and [Z] (2024) should be included."

Fair Novelty Assessment

Originality may arise from: creative combinations, new domains, removing restrictive assumptions, novel datasets, new problem formulations.

But: Most claimed novelty is actually incremental. Verify against literature before accepting novelty claims.

Score Calibration

Use this reference frame:

5s are rare: Reserve for papers that will clearly influence the field
4s are uncommon: Solid papers with rigorous execution and clear contributions
3s are common: Papers with merit but significant limitations
2s are common: Incremental work or work with major methodological issues
1s indicate fundamental problems: Wrong results, no contribution, or severe ethical issues

If you find yourself giving mostly 4s and 5s, you're likely being too generous. Re-calibrate against what the literature shows is standard.

Application-Driven Papers

For application-driven ML: methods should fit real-world constraints, non-standard datasets acceptable if documented, compare against domain baselines.

Rating Scales

Overall (1-5): Use the full range. Most papers should be 2-3.

5 (Strong Accept): Significant contribution, will be influential, no major flaws
4 (Accept): Solid contribution, rigorous execution, minor issues only
3 (Weak Accept): Contribution exists but limited; or good idea with execution flaws
2 (Weak Reject): Incremental contribution insufficient for venue; or significant methodological issues
1 (Reject): Fundamental flaws, not ready, or no meaningful contribution

Red flags that should lower scores:

Baselines older than 2 years (unless explicitly justified)
Missing comparisons to obvious related work from literature search
Claims not directly supported by presented experiments
Novelty claims contradicted by prior work

Confidence (1-5): 5=Expert/certain, 4=Confident, 3=Fairly confident, 2=Uncertain, 1=Not in area

Early-Stage Feedback Mode

Use this mode for incomplete drafts, research proposals, or code repositories. Focus shifts from "accept/reject evaluation" to "constructive guidance on how to make this publishable."

After completing Steps 1-2 (input analysis and prior work grounding), proceed here.

Step 3: Generate Formative Feedback

Use the Early-Stage Feedback Template (see references/review-template.md). No numerical scores—focus on constructive guidance.

For code repositories, additionally address:

Code quality and organization
Experiment design and reproducibility
What's missing for a paper (baselines, ablations, analysis)

References

references/evaluation-criteria.md - Detailed criteria for each dimension
references/review-template.md - Full template with examples
references/common-issues.md - Common paper issues to identify

sundial-org/icml-reviewer

skills/icml-reviewer/SKILL.md

Paper reviewer that evaluates machine learning research projects following official ICML reviewer guidelines. Provides comprehensive reviews with actionable feedback across all key dimensions: claims/evidence, relation to prior work, originality, significance, clarity, and reproducibility. Also provides formative feedback on incomplete drafts, proposals, and research code repositories. MANDATORY TRIGGERS: review paper, ICML review, paper review, evaluate paper, research paper feedback, ML paper review, conference review, academic review, paper critique, NeurIPS review, ICLR review, project proposal, research proposal, paper draft, early feedback, incomplete paper, work in progress, WIP review, review repo, review codebase, research project review

148 stars

development

Updated Apr 15, 2026

$ install --global

skillsauth

npx skillsauth add sundial-org/skills icml-reviewer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 15, 2026, 8:58 AM178.1s1 file scanned

SKILL.md

name:: icml-reviewer
description:: |
Paper reviewer that evaluates machine learning research projects following official ICML reviewer guidelines. Provides comprehensive reviews with actionable feedback across all key dimensions:: claims/evidence, relation to prior work, originality, significance, clarity, and reproducibility. Also provides formative feedback on incomplete drafts, proposals, and research code repositories.
MANDATORY TRIGGERS:: review paper, ICML review, paper review, evaluate paper, research paper feedback, ML paper review, conference review, academic review, paper critique, NeurIPS review, ICLR review, project proposal, research proposal, paper draft, early feedback, incomplete paper, work in progress, WIP review, review repo, review codebase, research project review

ICML Paper Reviewer

Enables rigorous review of ML research papers following official ICML guidelines.

Workflow

Step 1: Input Analysis & Mode Selection

Determine input type:

Complete paper: PDF/text with abstract, methodology, experiments, results → Full Review Mode
Incomplete document: Missing major sections, labeled draft/proposal, or user indicates early stage → Early-Stage Feedback Mode
Code repository: User points to folder/repo path → Repository Review Mode

For code repositories, first explore: read README, scan code structure, find experiment scripts/results, identify the research question and what's implemented.

Step 2: Prior Work Grounding (Critical - All Modes)

This step applies to ALL input types. Grounding in reality is essential for any meaningful feedback.

Generate 3-5 search queries based on the research topic: benchmarks/baselines, same problem, related techniques
Use WebSearch to find recent arXiv papers and published work
Fetch abstracts of 5-10 most relevant papers
Critically synthesize:
- What specific claims in this paper are already addressed by prior work?
- What are the actual quantitative improvements over recent baselines?
- Are claimed "novelties" actually novel given the literature?
- What gaps truly exist vs. what the authors claim exists?

Critical mindset:

Your job is to verify claims against reality, not accept them at face value
Most papers overclaim—your review should ground their contributions in what the literature actually shows
Default to skepticism: Assume claims are overstated until proven otherwise by evidence
Authors have selection bias toward their own work; you represent the community's interests
Be the critical voice that ensures published work actually advances the field

Then proceed to mode-specific evaluation.

Full Review Mode (Complete Papers)

Step 3: Systematic Evaluation

Evaluate across 7 dimensions (see references/evaluation-criteria.md). Default to skepticism—require strong evidence to score highly.

Evaluation mindset:

Start from neutral and require evidence to move up or down
Compare every claim against what you found in the literature search
Most papers are incremental—high originality scores are rare
Weak baselines or missing comparisons are critical flaws, not minor issues

Step 4: Critical Cross-Check Against Literature

Before writing the review, explicitly verify:

Baselines check: List baselines used in paper. List baselines from your literature search of adjacent papers. What's missing?
Methodology check: How do 2-3 adjacent papers approach this problem? Does this paper follow similar methodology? If not, why not?
Claims check: List main claims. For each, cite specific evidence from experiments or proofs. If insufficient, note it.
Citations check: Which papers from your search are cited? Which are missing? Why?
Novelty check: List claimed novelties. For each, cite specific prior work that does or doesn't do this.

This step is not optional. Your review must reference specific findings from your literature search.

Step 5: Generate Review

Follow the ICML review form (see references/review-template.md):

Summary - Neutral, factual (should not be disputed by authors)
Claims and Evidence - Are claims supported? Compare to what literature shows
Relation to Prior Work - Proper context? Missing citations? List specific missing papers
Strengths - Specific and substantive, compared to standards in adjacent work
Weaknesses - Constructive, explain severity, cite specific literature for comparison
Questions for Authors - Numbered, explain impact on evaluation
Minor Issues - Typos, suggestions
Overall Recommendation - 1-5 scale with justification grounded in literature comparison
Confidence Score - 1-5 scale

Step 6: Quality Check

Verify all claims in review are substantiated
Ensure constructive tone
Check specificity of strengths/weaknesses
Confirm questions are actionable

Key Principles

Be Rigorous AND Constructive

Your primary duty is to the research community—publishing weak papers dilutes the literature.

Be honest: Don't inflate scores to be nice. If baselines are weak, say so clearly.
Be specific: Always cite which literature contradicts or supports claims.
Be fair: Criticism should be substantiated by evidence or literature.
Be actionable: Tell authors exactly what would fix the issues.

"Review the papers of others as you would wish your own to be reviewed"—with rigor, honesty, and specific feedback grounded in the literature.

Be Specific

Bad: "The experiments are weak" Good: "Experiments compare only against [X] from 2019, but recent baselines [Y] (2024) and [Z] (2024) should be included."

Fair Novelty Assessment

Originality may arise from: creative combinations, new domains, removing restrictive assumptions, novel datasets, new problem formulations.

But: Most claimed novelty is actually incremental. Verify against literature before accepting novelty claims.

Score Calibration

Use this reference frame:

5s are rare: Reserve for papers that will clearly influence the field
4s are uncommon: Solid papers with rigorous execution and clear contributions
3s are common: Papers with merit but significant limitations
2s are common: Incremental work or work with major methodological issues
1s indicate fundamental problems: Wrong results, no contribution, or severe ethical issues

If you find yourself giving mostly 4s and 5s, you're likely being too generous. Re-calibrate against what the literature shows is standard.

Application-Driven Papers

For application-driven ML: methods should fit real-world constraints, non-standard datasets acceptable if documented, compare against domain baselines.

Rating Scales

Overall (1-5): Use the full range. Most papers should be 2-3.

5 (Strong Accept): Significant contribution, will be influential, no major flaws
4 (Accept): Solid contribution, rigorous execution, minor issues only
3 (Weak Accept): Contribution exists but limited; or good idea with execution flaws
2 (Weak Reject): Incremental contribution insufficient for venue; or significant methodological issues
1 (Reject): Fundamental flaws, not ready, or no meaningful contribution

Red flags that should lower scores:

Baselines older than 2 years (unless explicitly justified)
Missing comparisons to obvious related work from literature search
Claims not directly supported by presented experiments
Novelty claims contradicted by prior work

Confidence (1-5): 5=Expert/certain, 4=Confident, 3=Fairly confident, 2=Uncertain, 1=Not in area

Early-Stage Feedback Mode

Use this mode for incomplete drafts, research proposals, or code repositories. Focus shifts from "accept/reject evaluation" to "constructive guidance on how to make this publishable."

After completing Steps 1-2 (input analysis and prior work grounding), proceed here.

Step 3: Generate Formative Feedback

Use the Early-Stage Feedback Template (see references/review-template.md). No numerical scores—focus on constructive guidance.

For code repositories, additionally address:

Code quality and organization
Experiment design and reproducibility
What's missing for a paper (baselines, ablations, analysis)

References

references/evaluation-criteria.md - Detailed criteria for each dimension
references/review-template.md - Full template with examples
references/common-issues.md - Common paper issues to identify

Related Skills

sundial-org/cs448b-visualization

development

VerifiedTrustedCommunity

Data visualization design based on Stanford CS448B. Use for: (1) choosing chart types, (2) selecting visual encodings, (3) critiquing visualizations, (4) building D3.js visualizations, (5) designing interactions/animations, (6) choosing colors, (7) visualizing networks, (8) visualizing text. Covers Bertin, Mackinlay, Cleveland & McGill.

148SKILL.mdUpdated Apr 15, 2026

sundial-org/cs448b-visualization

sundial-org/training-data-curation

testing

VerifiedTrustedCommunity

Guidelines for creating high-quality datasets for LLM post-training (SFT/DPO/RLHF). Use when preparing data for fine-tuning, evaluating data quality, or designing data collection strategies.

148SKILL.mdUpdated Apr 15, 2026

sundial-org/training-data-curation

sundial-org/tinker

development

VerifiedTrustedCommunity

Fine-tune LLMs using the Tinker API. Covers supervised fine-tuning, reinforcement learning, LoRA training, vision-language models, and both high-level Cookbook patterns and low-level API usage.

148SKILL.mdUpdated Apr 15, 2026

sundial-org/tinker-training-cost

data-ai

VerifiedTrustedCommunity

Calculate training costs for Tinker fine-tuning jobs. Use when estimating costs for Tinker LLM training, counting tokens in datasets, or comparing Tinker model training prices. Tokenizes datasets using the correct model tokenizer and provides accurate cost estimates.

148SKILL.mdUpdated Apr 15, 2026

sundial-org/tinker-training-cost

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/sundial-org/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/skills/icml-reviewer ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

sundial-org/skills

148 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT