skills/skills-codex/auto-review-loop-llm/SKILL.md
Autonomous research review loop using any OpenAI-compatible LLM API. Configure via llm-chat MCP server or environment variables. Trigger with "auto review loop llm" or "llm review".
npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep auto-review-loop-llmInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
review-stage/AUTO_REVIEW.md (cumulative log) (fall back to ./AUTO_REVIEW.md for legacy projects)This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.
Add to ~/.codex/settings.json:
{
"mcpServers": {
"llm-chat": {
"command": "/usr/bin/python3",
"args": ["/Users/yourname/.codex/mcp-servers/llm-chat/server.py"],
"env": {
"LLM_API_KEY": "your-api-key",
"LLM_BASE_URL": "https://api.deepseek.com/v1",
"LLM_MODEL": "deepseek-chat"
}
}
}
}
| Provider | LLM_BASE_URL | LLM_MODEL |
|----------|--------------|-----------|
| OpenAI | https://api.openai.com/v1 | gpt-4o, o3 |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| MiniMax | https://api.minimax.io/v1 | MiniMax-M3 |
| Kimi (Moonshot) | https://api.moonshot.cn/v1 | moonshot-v1-8k, moonshot-v1-32k |
| ZhiPu (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4, glm-4-plus |
| SiliconFlow | https://api.siliconflow.cn/v1 | Qwen/Qwen2.5-72B-Instruct |
| 阿里云百炼 | https://dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max |
| 零一万物 | https://api.lingyiwanwu.com/v1 | yi-large |
Primary: MCP Tool
mcp__llm-chat__chat:
message: |
[Review prompt content]
model: "deepseek-chat"
system: "You are a senior ML reviewer..."
Fallback: curl
curl -s "${LLM_BASE_URL}/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${LLM_API_KEY}" \
-d '{
"model": "${LLM_MODEL}",
"messages": [
{"role": "system", "content": "You are a senior ML reviewer..."},
{"role": "user", "content": "[review prompt]"}
],
"max_tokens": 4096
}'
Persist state to review-stage/REVIEW_STATE.json after each round:
{
"round": 2,
"status": "in_progress",
"last_score": 5.0,
"last_verdict": "not ready",
"pending_experiments": [],
"timestamp": "2026-03-15T10:00:00"
}
Write this file at the end of every Phase E (after documenting the round).
On completion, set "status": "completed".
review-stage/REVIEW_STATE.json for recovery (fall back to ./REVIEW_STATE.json if not found — legacy path)If MCP available:
mcp__llm-chat__chat:
system: "You are a senior ML reviewer (NeurIPS/ICML level)."
message: |
[Round N/MAX_ROUNDS of autonomous review loop]
[Full research context: claims, methods, results, known weaknesses]
[Changes since last round, if any]
1. Score this work 1-10 for a top venue
2. List remaining critical weaknesses (ranked by severity)
3. For each weakness, specify the MINIMUM fix
4. State clearly: is this READY for submission? Yes/No/Almost
Be brutally honest. If the work is ready, say so clearly.
If MCP NOT available:
curl -s "${LLM_BASE_URL}/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${LLM_API_KEY}" \
-d '{
"model": "${LLM_MODEL}",
"messages": [
{"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
{"role": "user", "content": "[Full review prompt]"}
],
"max_tokens": 4096
}'
CRITICAL: Save the FULL raw response verbatim. Then extract:
STOP: If score >= 6 AND verdict ∈ {"ready", "almost"} (exact — "not ready" does NOT qualify)
Priority: metric additions > reframing > new experiments
Monitor remote experiments
Append to review-stage/AUTO_REVIEW.md:
## Round N (timestamp)
### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]
### Reviewer Raw Response
<details>
<summary>Click to expand full reviewer response</summary>
[Paste the COMPLETE raw response here — verbatim, unedited.]
</details>
### Actions Taken
- [what was implemented/changed]
### Results
- [experiment outcomes, if any]
### Status
- [continuing to round N+1 / stopping]
Write review-stage/REVIEW_STATE.json with current state.
review-stage/REVIEW_STATE.json status to "completed"Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
Be honest about weaknesses
Implement fixes BEFORE re-reviewing
Document everything
Include previous context in round 2+ prompts
Prefer MCP tool over curl when available
mcp__llm-chat__chat:
system: "You are a senior ML reviewer (NeurIPS/ICML level)."
message: |
[Round N/MAX_ROUNDS of autonomous review loop]
## Previous Review Summary (Round N-1)
- Previous Score: X/10
- Previous Verdict: [ready/almost/not ready]
- Previous Key Weaknesses: [list]
## Changes Since Last Review
1. [Action 1]: [result]
2. [Action 2]: [result]
## Updated Results
[paste updated metrics/tables]
Please re-score and re-assess:
1. Score this work 1-10 for a top venue
2. List remaining critical weaknesses (ranked by severity)
3. For each weakness, specify the MINIMUM fix
4. State clearly: is this READY for submission? Yes/No/Almost
Be brutally honest. If the work is ready, say so clearly.
Follow these shared protocols for all output files:
- Output Versioning Protocol — write timestamped file first, then copy to fixed name
- Output Manifest Protocol — log every output to MANIFEST.md
- Output Language Protocol — respect the project's language setting
research
Generate a structured paper outline from review conclusions and experiment results. Use when user says \"写大纲\", \"paper outline\", \"plan the paper\", \"论文规划\", or wants to create a paper plan before writing.
research
Generate a structured paper outline from review conclusions and experiment results. Use when user says "写大纲", "paper outline", "plan the paper", "论文规划", or wants to create a paper plan before writing.
development
Get a deep critical review of research from an external reviewer backend (Codex or manual). Use when user says "review my research", "help me review", "get external review", or wants critical feedback on research ideas, papers, or experimental results.
research
Turn a vague research direction into a problem-anchored, elegant, frontier-aware, implementation-oriented method plan via iterative GPT-5.5 review. Use when the user says "refine my approach", "帮我细化方案", "decompose this problem", "打磨idea", "refine research plan", "细化研究方案", or wants a concrete research method that stays simple, focused, and top-venue ready instead of a vague or overbuilt idea.