wanshuiyin/auto-review-loop-llm

Name: auto-review-loop-llm
Author: wanshuiyin

skills/auto-review-loop-llm/SKILL.md

npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep auto-review-loop-llm

Auto Review Loop (Generic LLM): Autonomous Research Improvement

🔒 Do not wrap this skill in /loop, /schedule, or CronCreate. Like /auto-review-loop, it already loops internally (review → fix → re-review), feeding each round's prior-round summary into the next review prompt (the backend is a stateless per-round API/MCP call, not a shared thread). An external timer re-enters from the top each tick, dropping that accumulated context and firing the verdict on wall-clock time instead of on artifact change — zero new signal, full token cost. Schedule the external wait that precedes it, not the verdict. See shared-references/external-cadence.md.

Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.

Context: $ARGUMENTS

Constants

- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10 AND verdict ∈ {"ready", "almost"}. Both conditions must hold, matching the STOP check in Phase B. The verdict vocabulary is {"ready", "almost", "not ready"}. (Earlier wording used OR and a stale verdict set; the AND form is authoritative.)
- REVIEW_DOC: review-stage/AUTO_REVIEW.md (cumulative log); fall back to ./AUTO_REVIEW.md for legacy projects.
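The threshold above can be read as a two-part predicate. The following is a minimal Python sketch, not code shipped with the skill: the names `is_positive`, `MIN_SCORE`, and `POSITIVE_VERDICTS` are illustrative assumptions, as is the whitespace and case normalization of the verdict.

```python
# Hypothetical helper; none of these names come from the skill itself.
MAX_ROUNDS = 4
MIN_SCORE = 6.0
POSITIVE_VERDICTS = {"ready", "almost"}


def is_positive(score: float, verdict: str) -> bool:
    """Return True only when BOTH the score and the verdict qualify."""
    # Assumption: the verdict is normalized before the exact membership test.
    return score >= MIN_SCORE and verdict.strip().lower() in POSITIVE_VERDICTS


print(is_positive(6.0, "almost"))     # True
print(is_positive(7.5, "not ready"))  # False
print(is_positive(5.0, "ready"))      # False
```

A high score with a "not ready" verdict does not stop the loop, which is the practical difference between the AND form and the earlier OR wording.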

LLM Configuration

This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.

Configuration via MCP Server (Recommended)

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}
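Before the first round it can help to confirm that the three environment values are actually present in the settings file. The check below is a small sketch under that assumption; `missing_llm_env` and `REQUIRED_ENV` are hypothetical names, and the example deliberately omits `LLM_MODEL`.

```python
import json

# The three keys shown in the configuration above.
REQUIRED_ENV = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL")


def missing_llm_env(settings: dict) -> list:
    """Return the required env keys that are absent or empty for llm-chat."""
    env = settings.get("mcpServers", {}).get("llm-chat", {}).get("env", {})
    return [key for key in REQUIRED_ENV if not env.get(key)]


# Example settings with LLM_MODEL left out on purpose.
example = json.loads("""
{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1"
      }
    }
  }
}
""")
print(missing_llm_env(example))  # ['LLM_MODEL']
```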

Supported Providers

| Provider | LLM_BASE_URL | LLM_MODEL |
|----------|--------------|-----------|
| OpenAI | https://api.openai.com/v1 | gpt-4o, o3 |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| MiniMax | https://api.minimax.io/v1 | MiniMax-M3 |
| Kimi (Moonshot) | https://api.moonshot.cn/v1 | moonshot-v1-8k, moonshot-v1-32k |
| ZhiPu (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4, glm-4-plus |
| SiliconFlow | https://api.siliconflow.cn/v1 | Qwen/Qwen2.5-72B-Instruct |
| Alibaba Cloud Bailian (阿里云百炼) | https://dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max |
| 01.AI (零一万物) | https://api.lingyiwanwu.com/v1 | yi-large |

API Call Method

Primary: MCP Tool

mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."

Fallback: curl

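# The JSON body is passed through an unquoted heredoc so the shell expands ${LLM_MODEL}.
# Inside single quotes the variable would be sent literally. Escape any $ or backticks in the prompt text.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d @- <<EOF
{
  "model": "${LLM_MODEL}",
  "messages": [
    {"role": "system", "content": "You are a senior ML reviewer..."},
    {"role": "user", "content": "[review prompt]"}
  ],
  "max_tokens": 4096
}
EOF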
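When the review prompt contains quotes or newlines, building the JSON body programmatically avoids shell-quoting problems. The sketch below mirrors the fallback request with Python's standard library; `build_review_request` is a hypothetical helper, and it only constructs the request so it can be inspected before anything is sent.

```python
import json
import os
import urllib.request
from typing import Mapping


def build_review_request(
    prompt: str,
    system: str = "You are a senior ML reviewer...",
    env: Mapping[str, str] = os.environ,
) -> urllib.request.Request:
    """Build (but do not send) the chat-completions request used as fallback."""
    base_url = env["LLM_BASE_URL"].rstrip("/")
    payload = {
        "model": env["LLM_MODEL"],
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        # json.dumps handles quoting and escaping of the prompt text.
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {env['LLM_API_KEY']}",
        },
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)`; with an OpenAI-compatible backend the reply text is normally found under `choices[0].message.content` in the JSON response.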

State Persistence (Compact Recovery)

Persist state to review-stage/REVIEW_STATE.json after each round:

{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}

Write this file at the end of every Phase E (after documenting the round).

On completion, set "status": "completed".
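A minimal sketch of the state write, assuming plain Python with the standard library. `save_state` is a hypothetical helper, and the write-to-temp-then-rename step is an added precaution against a half-written file, not something the skill prescribes.

```python
import json
from datetime import datetime
from pathlib import Path

STATE_PATH = Path("review-stage/REVIEW_STATE.json")


def save_state(round_no, score, verdict, pending=(), status="in_progress",
               path=STATE_PATH):
    """Write the recovery state with the fields shown in the example above."""
    state = {
        "round": round_no,
        "status": status,
        "last_score": score,
        "last_verdict": verdict,
        "pending_experiments": list(pending),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: write to a temp file first, then rename over the target.
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)
    return state
```

Calling `save_state(4, 6.5, "almost", status="completed")` at termination covers the final status update.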

Workflow

Initialization

1. Check review-stage/REVIEW_STATE.json for recovery (fall back to the legacy path ./REVIEW_STATE.json if not found).
2. Read project context and prior reviews.
3. Initialize the round counter.
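The recovery check can be sketched as follows. `load_state` and `starting_round` are hypothetical helpers; resuming at the round after the last recorded one, and starting over when the previous run is marked completed, are assumptions based on the state being written at the end of each round.

```python
import json
from pathlib import Path

# Preferred location first, then the legacy path.
STATE_CANDIDATES = (Path("review-stage/REVIEW_STATE.json"), Path("REVIEW_STATE.json"))


def load_state(root=Path(".")):
    """Return the saved state dict, or None when no state file exists."""
    for rel in STATE_CANDIDATES:
        candidate = Path(root) / rel
        if candidate.is_file():
            return json.loads(candidate.read_text(encoding="utf-8"))
    return None


def starting_round(state):
    """Assumption: resume at the round after the last documented one."""
    if state is None or state.get("status") == "completed":
        return 1
    return int(state["round"]) + 1
```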

Loop (up to MAX_ROUNDS)

Phase A: Review

If MCP available:

mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.

If MCP NOT available:

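# The JSON body is passed through an unquoted heredoc so the shell expands ${LLM_MODEL}.
# Inside single quotes the variable would be sent literally. Escape any $ or backticks in the prompt text.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d @- <<EOF
{
  "model": "${LLM_MODEL}",
  "messages": [
    {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
    {"role": "user", "content": "[Full review prompt]"}
  ],
  "max_tokens": 4096
}
EOF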

Phase B: Parse Assessment

CRITICAL: Save the FULL raw response verbatim. Then extract:

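- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready"). The review prompt asks for Yes/No/Almost, so map Yes to "ready", Almost to "almost", and No to "not ready".
- Action items (ranked list of fixes)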

STOP: If score >= 6 AND verdict ∈ {"ready", "almost"}, end the loop and go to Termination. The match is exact: "not ready" does NOT qualify. Otherwise continue to Phase C.
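Extraction from free-form reviewer text is inherently fragile, so the following is only a sketch. It assumes the reviewer writes the score as "N/10" and answers the readiness question on one line as "ready for submission" followed by Yes, Almost, or No; `parse_assessment` and `VERDICT_MAP` are hypothetical names. When either pattern is missing the field is returned as None, so the caller can treat the round as not positive.

```python
import re

# Assumed mapping from the prompt's Yes/Almost/No to the verdict vocabulary.
VERDICT_MAP = {"yes": "ready", "almost": "almost", "no": "not ready"}

SCORE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*10")
ANSWER_RE = re.compile(r"ready for submission\W+(yes|almost|no)\b", re.IGNORECASE)


def parse_assessment(raw: str):
    """Return (score, verdict); either is None when it cannot be found."""
    score_match = SCORE_RE.search(raw)
    score = float(score_match.group(1)) if score_match else None
    answer_match = ANSWER_RE.search(raw)
    verdict = VERDICT_MAP[answer_match.group(1).lower()] if answer_match else None
    return score, verdict


print(parse_assessment("Score: 6.5/10\nREADY for submission? Almost"))
# (6.5, 'almost')
```

The full raw response should still be saved verbatim before parsing, so a failed extraction can be resolved by reading the original text.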

Phase C: Implement Fixes

Priority: metric additions > reframing > new experiments

Phase D: Wait for Results

Monitor remote experiments

Phase E: Document Round

Append to review-stage/AUTO_REVIEW.md:

## Round N (timestamp)

### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]

### Reviewer Raw Response

<details>
<summary>Click to expand full reviewer response</summary>

[Paste the COMPLETE raw response here — verbatim, unedited.]

</details>

### Actions Taken
- [what was implemented/changed]

### Results
- [experiment outcomes, if any]

### Status
- [continuing to round N+1 / stopping]

Write review-stage/REVIEW_STATE.json with current state.
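The round entry can be rendered from the parsed fields and appended to the log. This is a sketch following the template above; `render_round` and `append_round` are hypothetical helpers, and the "(none)" placeholder for empty sections is an assumption.

```python
from datetime import datetime
from pathlib import Path


def render_round(round_no, score, verdict, criticisms, raw_response,
                 actions, results, status, timestamp=None):
    """Build one round entry following the AUTO_REVIEW.md template."""
    timestamp = timestamp or datetime.now().isoformat(timespec="seconds")

    def bullets(items):
        # Assumption: empty sections get an explicit placeholder bullet.
        return [f"- {item}" for item in items] or ["- (none)"]

    lines = [
        f"## Round {round_no} ({timestamp})",
        "",
        "### Assessment (Summary)",
        f"- Score: {score}/10",
        f"- Verdict: {verdict}",
        "- Key criticisms:",
        *[f"  - {item}" for item in criticisms],
        "",
        "### Reviewer Raw Response",
        "",
        "<details>",
        "<summary>Click to expand full reviewer response</summary>",
        "",
        raw_response,  # inserted verbatim, unedited
        "",
        "</details>",
        "",
        "### Actions Taken",
        *bullets(actions),
        "",
        "### Results",
        *bullets(results),
        "",
        "### Status",
        f"- {status}",
        "",
    ]
    return "\n".join(lines)


def append_round(entry, path=Path("review-stage/AUTO_REVIEW.md")):
    """Append the entry to the cumulative log, creating the folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
```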

Termination

Set review-stage/REVIEW_STATE.json status to "completed"
Write final summary
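Put together, the control flow of the phases can be sketched as a small driver. The three callables are hypothetical stand-ins for Phases A to E, and documenting the final round before stopping is an assumption based on the "stopping" status in the round template.

```python
MAX_ROUNDS = 4
POSITIVE_VERDICTS = {"ready", "almost"}


def run_review_loop(review, implement_fixes, document,
                    start_round=1, max_rounds=MAX_ROUNDS):
    """Run review rounds until the STOP check passes or rounds run out.

    review(round_no) -> (score, verdict, action_items)   # Phases A and B
    implement_fixes(action_items)                         # Phases C and D
    document(round_no, score, verdict, stopping)          # Phase E
    """
    outcome = {"rounds_run": 0, "score": None, "verdict": None, "positive": False}
    for round_no in range(start_round, max_rounds + 1):
        score, verdict, actions = review(round_no)
        positive = (
            score is not None and score >= 6 and verdict in POSITIVE_VERDICTS
        )
        if not positive:
            implement_fixes(actions)
        # Assumption: the round is documented even when the loop stops here.
        document(round_no, score, verdict, positive)
        outcome = {
            "rounds_run": round_no - start_round + 1,
            "score": score,
            "verdict": verdict,
            "positive": positive,
        }
        if positive:
            break
    return outcome
```

Passing `start_round` from a recovered state lets an interrupted run continue without repeating documented rounds.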

Key Rules

- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Anti-hallucination citations: When adding references, NEVER fabricate BibTeX. Use the DBLP → CrossRef → [VERIFY] chain. Do NOT generate BibTeX from memory.
- Be honest about weaknesses.
- Implement fixes BEFORE re-reviewing.
- Document everything.
- Include previous context in round 2+ prompts.
- Prefer the MCP tool over curl when available.

Prompt Template for Round 2+

mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
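Because the backend is stateless, the previous round's summary has to be rebuilt into every follow-up prompt. A minimal sketch of filling the template, where `build_followup_prompt` and its parameter names are illustrative assumptions:

```python
def build_followup_prompt(round_no, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results):
    """Fill the round 2+ template; `changes` is a list of (action, result)."""
    change_lines = [
        f"{index}. {action}: {result}"
        for index, (action, result) in enumerate(changes, start=1)
    ]
    parts = [
        f"[Round {round_no}/{max_rounds} of autonomous review loop]",
        "",
        f"## Previous Review Summary (Round {round_no - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        f"- Previous Key Weaknesses: {'; '.join(prev_weaknesses)}",
        "",
        "## Changes Since Last Review",
        *change_lines,
        "",
        "## Updated Results",
        updated_results,
        "",
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "",
        "Be brutally honest. If the work is ready, say so clearly.",
    ]
    return "\n".join(parts)
```

The previous score, verdict, and weaknesses can come straight from the last entry in the review log or from the saved state file.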

Output Protocols

Follow these shared protocols for all output files:

- Output Versioning Protocol: write the timestamped file first, then copy it to the fixed name.
- Output Manifest Protocol: log every output to MANIFEST.md.
- Output Language Protocol: respect the project's language setting.
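The versioning and manifest steps might look like the sketch below. The shared protocol documents are not reproduced here, so the timestamp format, the versioned filename pattern, and the manifest line layout are all assumptions, and `write_versioned` is a hypothetical helper.

```python
import shutil
from datetime import datetime
from pathlib import Path


def write_versioned(content, fixed_path, manifest=Path("MANIFEST.md"), now=None):
    """Write a timestamped copy first, then copy it to the fixed name."""
    now = now or datetime.now()
    fixed_path = Path(fixed_path)
    # Assumed pattern: <stem>_<YYYYmmdd_HHMMSS><suffix> next to the fixed file.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    versioned = fixed_path.with_name(f"{fixed_path.stem}_{stamp}{fixed_path.suffix}")
    versioned.parent.mkdir(parents=True, exist_ok=True)
    versioned.write_text(content, encoding="utf-8")
    shutil.copyfile(versioned, fixed_path)
    # Assumed manifest format: one bullet per output, appended in order.
    with Path(manifest).open("a", encoding="utf-8") as fh:
        fh.write(f"- {now.isoformat(timespec='seconds')} {versioned} -> {fixed_path}\n")
    return versioned
```

Writing the timestamped file first means the fixed name always points at a complete copy, and earlier versions stay on disk for comparison.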

Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".

13,323 stars

tools

Updated Jul 13, 2026

npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep auto-review-loop-llm

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| VirusTotal | Multi-engine malware detection | Error (70%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Jul 13, 2026, 4:39 AM · 366.0s · 1 file scanned

SKILL.md

name: auto-review-loop-llm
description: Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
argument-hint: [topic-or-scope]
allowed-tools: Bash(*), Read, Grep, Glob, Write, Edit, Skill

Related Skills

wanshuiyin/web-debug-search

development

Search GitHub Issues and Discussions for software errors, version compatibility problems, and exact error-string matches. Use for debugging and discovery only; results are not paper-citation evidence.

13,732 · SKILL.md · Updated Jul 23, 2026

wanshuiyin/integrity-forensics

testing

Run the Anti-Autoresearch integrity-forensics sweep (span-anchored evidence ledger → GPT auditors propose findings → deterministic rules-only adjudicator) against a paper via a SHA-pinned thin launcher, then convert the verdict into a typed policy gate (BLOCK/WARN/NO_NEW_BLOCKER) and an append-only obligations ledger. Use when the user says "integrity forensics", "forensic audit this paper", "投稿前自查诚信" ("pre-submission integrity self-check"), or "审这篇论文的诚信" ("audit this paper's integrity"), or says "anti-autoresearch" when the upstream repo's own skills are not installed. Also invoked by /paper-writing (submission self-forensics, default ON), /peer-review (forensic appendix), and /resubmit-pipeline.

13,401 · SKILL.md · Updated Jul 13, 2026

wanshuiyin/meta-apply

testing

Privileged applier that LANDS the meta-optimize / corpus-audit patches the user approved. It is the ONLY skill permitted to mutate the skill corpus from a self-modification proposal, with a cross-model jury and human approval at landing. Use when the user says "meta apply", "/meta-apply", "land the staged patches", or "应用优化" ("apply optimizations"), after a /meta-optimize run.

13,401 · SKILL.md · Updated May 31, 2026

Install manually

# Clone the repo
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git

# Copy into Claude Code skills folder (global)
cp -r Auto-claude-code-research-in-sleep/skills/auto-review-loop-llm ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

wanshuiyin/Auto-claude-code-research-in-sleep

13,323 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT