
shaun-z/auto-review-loop-llm

Name: auto-review-loop-llm
Author: shaun-z

skills/auto-review-loop-llm/SKILL.md

npx skillsauth add shaun-z/auto-claude-code-research-in-sleep auto-review-loop-llm


Auto Review Loop (Generic LLM): Autonomous Research Improvement

Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.

Context: $ARGUMENTS

Constants

MAX_ROUNDS = 4
POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
REVIEW_DOC: review-stage/AUTO_REVIEW.md (cumulative log; falls back to ./AUTO_REVIEW.md for legacy projects)

LLM Configuration

This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.

Configuration via MCP Server (Recommended)

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "llm-chat": {
      "command": "/usr/bin/python3",
      "args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
      "env": {
        "LLM_API_KEY": "your-api-key",
        "LLM_BASE_URL": "https://api.deepseek.com/v1",
        "LLM_MODEL": "deepseek-chat"
      }
    }
  }
}

Supported Providers

| Provider | LLM_BASE_URL | LLM_MODEL |
|----------|--------------|-----------|
| OpenAI | https://api.openai.com/v1 | gpt-4o, o3 |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| MiniMax | https://api.minimax.io/v1 | MiniMax-M2.7 |
| Kimi (Moonshot) | https://api.moonshot.cn/v1 | moonshot-v1-8k, moonshot-v1-32k |
| ZhiPu (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4, glm-4-plus |
| SiliconFlow | https://api.siliconflow.cn/v1 | Qwen/Qwen2.5-72B-Instruct |
| Alibaba Cloud Bailian (阿里云百炼) | https://dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max |
| 01.AI (零一万物) | https://api.lingyiwanwu.com/v1 | yi-large |
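As a rough illustration of how a row of the provider table maps onto the settings.json entry shown earlier, here is a minimal Python sketch. The helper name `llm_chat_entry` and the `PROVIDERS` dictionary are hypothetical and not part of the skill; only three rows of the table are copied in, each with its first listed model.

```python
import json

# Base URL and first listed model per provider, copied from the table above.
PROVIDERS = {
    "OpenAI": ("https://api.openai.com/v1", "gpt-4o"),
    "DeepSeek": ("https://api.deepseek.com/v1", "deepseek-chat"),
    "MiniMax": ("https://api.minimax.io/v1", "MiniMax-M2.7"),
}


def llm_chat_entry(provider, api_key, server_path, python="/usr/bin/python3"):
    """Build the "llm-chat" block for ~/.claude/settings.json (hypothetical helper)."""
    base_url, model = PROVIDERS[provider]
    return {
        "mcpServers": {
            "llm-chat": {
                "command": python,
                "args": [server_path],
                "env": {
                    "LLM_API_KEY": api_key,
                    "LLM_BASE_URL": base_url,
                    "LLM_MODEL": model,
                },
            }
        }
    }


if __name__ == "__main__":
    entry = llm_chat_entry(
        "DeepSeek",
        "your-api-key",
        "/Users/yourname/.claude/mcp-servers/llm-chat/server.py",
    )
    print(json.dumps(entry, indent=2))
```

The printed JSON should have the same shape as the configuration example above, so it can be merged into an existing settings file by hand.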

API Call Method

Primary: MCP Tool

mcp__llm-chat__chat:
  prompt: |
    [Review prompt content]
  model: "deepseek-chat"
  system: "You are a senior ML reviewer..."

Fallback: curl

curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer..."},
      {"role": "user", "content": "[review prompt]"}
    ],
    "max_tokens": 4096
  }'
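For readers who prefer to avoid shell quoting altogether, the same fallback request can be sketched in Python with only the standard library. This is an illustration, not part of the skill: the function names are hypothetical, and reading the reply from `choices[0].message.content` assumes the provider follows the OpenAI chat-completions response shape.

```python
import json
import os
import urllib.request


def build_review_request(prompt, system="You are a senior ML reviewer..."):
    """Build the POST request that the curl fallback sends (hypothetical helper)."""
    base_url = os.environ["LLM_BASE_URL"].rstrip("/")
    payload = {
        "model": os.environ["LLM_MODEL"],
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        # json.dumps handles all escaping, so no manual quoting is needed.
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['LLM_API_KEY']}",
        },
        method="POST",
    )


def send_review(prompt, timeout=120):
    """Send the request and return the reviewer text (assumes an OpenAI-style reply)."""
    with urllib.request.urlopen(build_review_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Building the payload as a dictionary keeps the model name and prompt out of any quoting context, which is the usual source of errors in the curl form.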

State Persistence (Compact Recovery)

Persist state to review-stage/REVIEW_STATE.json after each round:

{
  "round": 2,
  "status": "in_progress",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": [],
  "timestamp": "2026-03-15T10:00:00"
}

Write this file at the end of every Phase E (after documenting the round).

On completion, set "status": "completed".
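A minimal sketch of the state write, assuming Python is available in the agent environment. The field names come from the JSON example above; the function name `save_state` and the write-to-temp-then-rename step are assumptions added here so that an interrupted write does not leave a truncated file.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path

STATE_PATH = Path("review-stage/REVIEW_STATE.json")


def save_state(round_no, last_score, last_verdict, pending_experiments=(),
               status="in_progress", path=STATE_PATH):
    """Persist the round state (hypothetical helper; fields mirror the example)."""
    state = {
        "round": round_no,
        "status": status,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: write to a temp file in the same directory, then rename,
    # so a crash mid-write never leaves a half-written state file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp, path)
    return state
```

Calling `save_state(n, score, verdict, status="completed")` on the final round covers the completion case.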

Workflow

Initialization

1. Check review-stage/REVIEW_STATE.json for recovery (fall back to the legacy path ./REVIEW_STATE.json if not found)
2. Read project context and prior reviews
3. Initialize round counter
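The recovery check can be sketched as follows. The two candidate paths are the ones named above; the helper names and the resume policy (continue at the round after the last recorded one, start over if the previous run completed) are assumptions, since the text does not spell out how the round counter is derived from a recovered state.

```python
import json
from pathlib import Path

# Preferred path first, then the legacy location.
CANDIDATES = (Path("review-stage/REVIEW_STATE.json"), Path("REVIEW_STATE.json"))


def load_state(candidates=CANDIDATES):
    """Return the first state file found, or None when starting fresh."""
    for path in candidates:
        if path.is_file():
            return json.loads(path.read_text(encoding="utf-8"))
    return None


def starting_round(state):
    """Pick the round to start from (assumed policy, see lead-in)."""
    if state is None or state.get("status") == "completed":
        return 1
    # State is written at the end of a round, so resume with the next one.
    return state["round"] + 1
```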

Loop (up to MAX_ROUNDS)

Phase A: Review

If MCP available:

mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round, if any]

    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.

If MCP NOT available:

curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
      {"role": "user", "content": "[Full review prompt]"}
    ],
    "max_tokens": 4096
  }'

Phase B: Parse Assessment

CRITICAL: Save the FULL raw response verbatim. Then extract:

- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready")
- Action items (ranked list of fixes)

STOP: If score >= 6 AND verdict contains "ready" or "almost", exit the loop and go to Termination.
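One way to make the extraction and the stop check concrete is sketched below. It is an assumption-laden illustration: the regular expression expects the score to appear as "N/10", the verdict is derived from the reviewer's Yes/No/Almost answer to the readiness question, and unknown answers are treated conservatively as "not ready". All function names are hypothetical.

```python
import re

# Maps the reviewer's answer to the readiness question onto a verdict label.
VERDICT_BY_ANSWER = {"yes": "ready", "almost": "almost", "no": "not ready"}


def extract_score(raw):
    """Return the first 'N/10' score found in the reviewer text, or None."""
    m = re.search(r"\b(\d{1,2}(?:\.\d+)?)\s*/\s*10\b", raw)
    return float(m.group(1)) if m else None


def normalize_verdict(answer):
    """Map a Yes/No/Almost answer to a verdict; unknown answers count as not ready."""
    return VERDICT_BY_ANSWER.get(answer.strip().lower(), "not ready")


def should_stop(score, verdict, threshold=6.0):
    """Stop only when the score clears the threshold and the verdict is positive."""
    return score is not None and score >= threshold and verdict in ("ready", "almost")
```

Comparing against the normalized label rather than searching for a substring matters here, because the string "not ready" itself contains "ready".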

Phase C: Implement Fixes

Priority: metric additions > reframing > new experiments

Phase D: Wait for Results

Monitor remote experiments

Phase E: Document Round

Append to review-stage/AUTO_REVIEW.md:

## Round N (timestamp)

### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]

### Reviewer Raw Response

<details>
<summary>Click to expand full reviewer response</summary>

[Paste the COMPLETE raw response here — verbatim, unedited.]

</details>

### Actions Taken
- [what was implemented/changed]

### Results
- [experiment outcomes, if any]

### Status
- [continuing to round N+1 / stopping]

Write review-stage/REVIEW_STATE.json with current state.
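The round entry can be rendered from the template above with a small helper. This is a sketch: `render_round` and `append_round` are hypothetical names, and printing "- (none)" for an empty list is a choice made here rather than something the template specifies.

```python
from datetime import datetime
from pathlib import Path


def _bullets(items):
    """Render a list as Markdown bullets, or a placeholder when empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def render_round(n, score, verdict, criticisms, raw_response, actions, results,
                 status, timestamp=None):
    """Fill the Phase E template for one round (hypothetical helper)."""
    timestamp = timestamp or datetime.now().isoformat(timespec="seconds")
    return "\n".join([
        f"## Round {n} ({timestamp})",
        "",
        "### Assessment (Summary)",
        f"- Score: {score}/10",
        f"- Verdict: {verdict}",
        "- Key criticisms:",
        *[f"  - {c}" for c in criticisms],
        "",
        "### Reviewer Raw Response",
        "",
        "<details>",
        "<summary>Click to expand full reviewer response</summary>",
        "",
        raw_response,  # inserted verbatim, never summarized
        "",
        "</details>",
        "",
        "### Actions Taken",
        _bullets(actions),
        "",
        "### Results",
        _bullets(results),
        "",
        "### Status",
        f"- {status}",
        "",
    ])


def append_round(entry, path="review-stage/AUTO_REVIEW.md"):
    """Append one rendered round to the cumulative log."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
```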

Termination

Set review-stage/REVIEW_STATE.json status to "completed"
Write final summary
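Taken together, the phases form a bounded loop. The sketch below shows the control flow only: each phase is passed in as a callable, all of which are hypothetical hooks, and the choice to document the stopping round before exiting is an assumption based on the "continuing / stopping" status line in the round template.

```python
MAX_ROUNDS = 4


def run_review_loop(review, implement_fixes, wait_for_results, document_round,
                    start_round=1, max_rounds=MAX_ROUNDS):
    """Drive Phases A-E; every callable is a hook supplied by the caller."""
    last = None
    for n in range(start_round, max_rounds + 1):
        # Phases A and B: get the review and the parsed (score, verdict, actions).
        score, verdict, actions = review(n)
        stop = score is not None and score >= 6 and verdict in ("ready", "almost")
        if not stop:
            implement_fixes(actions)   # Phase C
            wait_for_results()         # Phase D
        document_round(n, score, verdict, stop)  # Phase E
        last = (n, score, verdict)
        if stop:
            break
    return last
```

The loop ends either on a positive assessment or after `max_rounds` rounds, whichever comes first; the caller then performs the termination steps.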

Key Rules

- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Anti-hallucination citations: When adding references, NEVER fabricate BibTeX. Use the DBLP → CrossRef → [VERIFY] chain. Do NOT generate BibTeX from memory.
- Be honest about weaknesses.
- Implement fixes BEFORE re-reviewing.
- Document everything.
- Include previous context in round 2+ prompts.
- Prefer the MCP tool over curl when available.

Prompt Template for Round 2+

mcp__llm-chat__chat:
  system: "You are a senior ML reviewer (NeurIPS/ICML level)."
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    ## Previous Review Summary (Round N-1)
    - Previous Score: X/10
    - Previous Verdict: [ready/almost/not ready]
    - Previous Key Weaknesses: [list]

    ## Changes Since Last Review
    1. [Action 1]: [result]
    2. [Action 2]: [result]

    ## Updated Results
    [paste updated metrics/tables]

    Please re-score and re-assess:
    1. Score this work 1-10 for a top venue
    2. List remaining critical weaknesses (ranked by severity)
    3. For each weakness, specify the MINIMUM fix
    4. State clearly: is this READY for submission? Yes/No/Almost

    Be brutally honest. If the work is ready, say so clearly.
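Filling this template programmatically might look like the sketch below. The function name and parameters are hypothetical; `changes` is assumed to be a list of (action, result) pairs, and weaknesses are joined on one line for brevity.

```python
def build_followup_prompt(n, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results):
    """Fill the round 2+ template; `changes` is a list of (action, result) pairs."""
    change_lines = "\n".join(
        f"{i}. {action}: {result}" for i, (action, result) in enumerate(changes, 1)
    )
    weaknesses = "; ".join(prev_weaknesses)
    return "\n".join([
        f"[Round {n}/{max_rounds} of autonomous review loop]",
        "",
        f"## Previous Review Summary (Round {n - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        f"- Previous Key Weaknesses: {weaknesses}",
        "",
        "## Changes Since Last Review",
        change_lines,
        "",
        "## Updated Results",
        updated_results,
        "",
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "",
        "Be brutally honest. If the work is ready, say so clearly.",
    ])
```

The returned string is what would be passed as `prompt` to the MCP tool or placed in the user message of the curl fallback.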

Output Protocols

Follow these shared protocols for all output files:

- Output Versioning Protocol: write the timestamped file first, then copy it to the fixed name
- Output Manifest Protocol: log every output to MANIFEST.md
- Output Language Protocol: respect the project's language setting
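The versioning and manifest protocols are only named here, not specified, so the following is a guess at their simplest form. The timestamp format, the `_<stamp>` file-name suffix, and the manifest line layout are all assumptions, as is the helper name `write_versioned`.

```python
import shutil
from datetime import datetime
from pathlib import Path


def write_versioned(content, fixed_path, manifest="MANIFEST.md"):
    """Write a timestamped copy, copy it to the fixed name, and log it (sketch)."""
    fixed = Path(fixed_path)
    fixed.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Assumed naming scheme: REPORT.md -> REPORT_<stamp>.md
    versioned = fixed.with_name(f"{fixed.stem}_{stamp}{fixed.suffix}")
    versioned.write_text(content, encoding="utf-8")  # timestamped file first
    shutil.copyfile(versioned, fixed)                # then copy to the fixed name
    with Path(manifest).open("a", encoding="utf-8") as fh:
        # Assumed manifest line format; the protocol itself is not shown here.
        fh.write(f"- {stamp} {versioned} -> {fixed}\n")
    return versioned
```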


Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".

tools

Updated Apr 17, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add shaun-z/auto-claude-code-research-in-sleep auto-review-loop-llm

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 15, 2026, 6:49 PM · 6.6s · 1 file scanned

SKILL.md

name: auto-review-loop-llm
description: Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
argument-hint: [topic-or-scope]
allowed-tools: Bash(*), Read, Grep, Glob, Write, Edit, Agent, Skill


Related Skills

shaun-z/paper-illustration-image2

development


Generate publication-quality academic illustrations through a local Codex app-server bridge that uses Codex native image generation. This is a separate experimental alternative to `paper-illustration`, intended for Claude Code users who want a GPT-image-style renderer without modifying the original skill.

SKILL.md · Updated Apr 25, 2026

shaun-z/overleaf-sync

development


Two-way sync between a local paper directory and an Overleaf project via the Overleaf Git bridge (Premium feature). Lets you keep ARIS audit/edit workflows on the local copy while collaborators edit in the Overleaf web UI. Token never touches the agent — user does the one-time auth via macOS Keychain. Use when user says "同步 overleaf", "overleaf sync", "推送到 overleaf", "connect overleaf", "Overleaf 桥接", "pull overleaf", "push overleaf", or wants to bridge their ARIS paper directory with an Overleaf project.

SKILL.md · Updated Apr 25, 2026

shaun-z/citation-audit

development


Zero-context verification that every bibliographic entry in the paper is real, correctly attributed, and used in a context the cited paper actually supports. Uses a fresh cross-model reviewer with web/DBLP/arXiv lookup to catch hallucinated authors, wrong years, fabricated venues, version mismatches, and wrong-context citations (cite present but the cited paper does not establish the claim). Use when user says "审查引用", "check citations", "citation audit", "verify references", "引用核对", or before submission to ensure bibliography integrity.

SKILL.md · Updated Apr 20, 2026

shaun-z/writing-systems-papers

data-ai


Paragraph-level structural blueprint for 10-12 page systems papers targeting OSDI, SOSP, ASPLOS, NSDI, and EuroSys. Provides page allocation, paragraph templates, and writing patterns. Use when user says "写系统论文", "systems paper structure", "OSDI paper", "SOSP paper", or wants fine-grained structural guidance for a systems conference submission.

SKILL.md · Updated Apr 17, 2026


Install manually


# Clone the repo
git clone https://github.com/shaun-z/auto-claude-code-research-in-sleep.git

# Copy into Claude Code skills folder (global)
cp -r auto-claude-code-research-in-sleep/skills/auto-review-loop-llm ~/.claude/skills/


Repository

shaun-z/auto-claude-code-research-in-sleep

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT