skills/skills-codex-claude-review/auto-paper-improvement-loop/SKILL.md
Autonomously improve a generated paper via Claude review through claude-review MCP → implement fixes → recompile, for 2 rounds. Use when user says "改论文", "improve paper", "论文润色循环", "auto improve", or wants to iteratively polish a generated paper.
npx skillsauth add wanshuiyin/Auto-claude-code-research-in-sleep auto-paper-improvement-loopInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Override for Codex users who want Claude Code, not a second Codex agent, to act as the reviewer. Install this package after
skills/skills-codex/*.
Autonomously improve the paper at: $ARGUMENTS
This skill is designed to run after Workflow 3 (/paper-plan → /paper-figure → /paper-write → /paper-compile). It takes a compiled paper and iteratively improves it through external LLM review.
Unlike /auto-review-loop (which iterates on research — running experiments, collecting data, rewriting narrative), this skill iterates on paper writing quality — fixing theoretical inconsistencies, softening overclaims, adding missing content, and improving presentation.
claude-review — Claude reviewer invoked through the local claude-review MCP bridge. Set CLAUDE_REVIEW_MODEL if you need a specific Claude model override.PAPER_IMPROVEMENT_LOG.md — Cumulative log of all rounds, stored in paper directory.true, pause after each round's review and present score + weaknesses to the user. The user can approve fixes, provide custom modification instructions, skip specific fixes, or stop early. When false (default), runs fully autonomously.💡 Override:
/auto-paper-improvement-loop "paper/" — human checkpoint: true
paper/main.pdf + LaTeX source files.tex files — concatenated for review promptIf the context window fills up mid-loop, Codex auto-compacts. To recover, this skill writes PAPER_IMPROVEMENT_STATE.json after each round:
{
"current_round": 1,
"thread_id": "019ce736-...",
"last_score": 6,
"status": "in_progress",
"timestamp": "2026-03-13T21:00:00"
}
On startup: if PAPER_IMPROVEMENT_STATE.json exists with "status": "in_progress" AND timestamp is within 24 hours, read it + PAPER_IMPROVEMENT_LOG.md to recover context, then resume from the next round. Otherwise (file absent, "status": "completed", or older than 24 hours), start fresh.
After each round: overwrite the state file. On completion: set "status": "completed".
cp paper/main.pdf paper/main_round0_original.pdf
Concatenate all section files into a single text block for the review prompt:
# Collect all sections in order
for f in paper/sections/*.tex; do
echo "% === $(basename $f) ==="
cat "$f"
done > /tmp/paper_full_text.txt
Send the full paper text to Claude review:
mcp__claude-review__review_start:
prompt: |
You are reviewing a [VENUE] paper. Please provide a detailed, structured review.
## Full Paper Text:
[paste concatenated sections]
## Review Instructions
Please act as a senior ML reviewer ([VENUE] level). Provide:
1. **Overall Score** (1-10, where 6 = weak accept, 7 = accept)
2. **Summary** (2-3 sentences)
3. **Strengths** (bullet list, ranked)
4. **Weaknesses** (bullet list, ranked: CRITICAL > MAJOR > MINOR)
5. **For each CRITICAL/MAJOR weakness**: A specific, actionable fix
6. **Missing References** (if any)
7. **Verdict**: Ready for submission? Yes / Almost / No
Focus on: theoretical rigor, claims vs evidence alignment, writing clarity,
self-containedness, notation consistency.
After this start call, immediately save the returned jobId and poll mcp__claude-review__review_status with a bounded waitSeconds until done=true. Treat the completed status payload's response as the reviewer output, and save the completed threadId for any follow-up round.
Save the returned jobId, poll mcp__claude-review__review_status until done=true, then save the completed threadId for Round 2.
Skip if HUMAN_CHECKPOINT = false.
Present the review results and wait for user input:
📋 Round 1 review complete.
Score: X/10 — [verdict]
Key weaknesses (by severity):
1. [CRITICAL] ...
2. [MAJOR] ...
3. [MINOR] ...
Reply "go" to implement all fixes, give custom instructions, "skip 2" to skip specific fixes, or "stop" to end.
Parse user response same as /auto-review-loop: approve / custom instructions / skip / stop.
Parse the review and implement fixes by severity:
Priority order:
Common fix patterns:
| Issue | Fix Pattern |
|-------|-------------|
| Assumption-model mismatch | Rewrite assumption to match the model, add formal proposition bridging the gap |
| Overclaims | Soften language: "validate" → "demonstrate practical relevance", "comparable" → "qualitatively competitive" |
| Missing metrics | Add quantitative table with honest parameter counts and caveats |
| Theorem not self-contained | Add "Interpretation" paragraph listing all dependencies |
| Notation confusion | Rename conflicting symbols globally, add Notation paragraph |
| Missing references | Add to references.bib, cite in appropriate locations |
| Theory-practice gap | Explicitly frame theory as idealized; add synthetic validation subsection |
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round1.pdf
Verify: 0 undefined references, 0 undefined citations.
Use mcp__claude-review__review_reply_start with the saved completed threadId:
mcp__claude-review__review_reply_start:
threadId: [saved from Round 1]
prompt: |
[Round 2 update]
Since your last review, we have implemented:
1. [Fix 1]: [description]
2. [Fix 2]: [description]
...
Please re-score and re-assess. Same format:
Score, Summary, Strengths, Weaknesses, Actionable fixes, Verdict.
After this start call, immediately save the returned jobId and poll mcp__claude-review__review_status with a bounded waitSeconds until done=true. Treat the completed status payload's response as the reviewer output, and save the completed threadId for any follow-up round.
Skip if HUMAN_CHECKPOINT = false. Same as Step 2b — present Round 2 review, wait for user input.
Same process as Step 3. Typical Round 2 fixes:
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round2.pdf
After the final recompilation, run a format compliance check:
# 1. Page count vs venue limit
PAGES=$(pdfinfo paper/main.pdf | grep Pages | awk '{print $2}')
echo "Pages: $PAGES (limit: 9 main body for ICLR/NeurIPS)"
# 2. Overfull hbox warnings (content exceeding margins)
OVERFULL=$(grep -c "Overfull" paper/main.log 2>/dev/null || echo 0)
echo "Overfull hbox warnings: $OVERFULL"
grep "Overfull" paper/main.log 2>/dev/null | head -10
# 3. Underfull hbox warnings (loose spacing)
UNDERFULL=$(grep -c "Underfull" paper/main.log 2>/dev/null || echo 0)
echo "Underfull hbox warnings: $UNDERFULL"
# 4. Bad boxes summary
grep -c "badness" paper/main.log 2>/dev/null || echo "0 badness warnings"
Auto-fix patterns:
| Issue | Fix |
|-------|-----|
| Overfull hbox in equation | Wrap in \resizebox or split with \split/aligned |
| Overfull hbox in table | Reduce font (\small/\footnotesize) or use \resizebox{\linewidth}{!}{...} |
| Overfull hbox in text | Rephrase sentence or add \allowbreak / \- hints |
| Over page limit | Move content to appendix, compress tables, reduce figure sizes |
| Underfull hbox (loose) | Rephrase for better line filling or add \looseness=-1 |
If any overfull hbox > 10pt is found, fix it and recompile before documenting.
Create PAPER_IMPROVEMENT_LOG.md in the paper directory:
# Paper Improvement Log
## Score Progression
| Round | Score | Verdict | Key Changes |
|-------|-------|---------|-------------|
| Round 0 (original) | X/10 | No/Almost/Yes | Baseline |
| Round 1 | Y/10 | No/Almost/Yes | [summary of fixes] |
| Round 2 | Z/10 | No/Almost/Yes | [summary of fixes] |
## Round 1 Review & Fixes
<details>
<summary>Claude review Review (Round 1)</summary>
[Full raw review text, verbatim]
</details>
### Fixes Implemented
1. [Fix description]
2. [Fix description]
...
## Round 2 Review & Fixes
<details>
<summary>Claude review Review (Round 2)</summary>
[Full raw review text, verbatim]
</details>
### Fixes Implemented
1. [Fix description]
2. [Fix description]
...
## PDFs
- `main_round0_original.pdf` — Original generated paper
- `main_round1.pdf` — After Round 1 fixes
- `main_round2.pdf` — Final version after Round 2 fixes
Report to user:
After each round's review AND at final completion, check ~/.codex/feishu.json:
review_scored — "Round N: X/10 — [key changes]"pipeline_done — score progression table + final page count"off": skip entirely (no-op)paper/
├── main_round0_original.pdf # Original
├── main_round1.pdf # After Round 1
├── main_round2.pdf # After Round 2 (final)
├── main.pdf # = main_round2.pdf
└── PAPER_IMPROVEMENT_LOG.md # Full review log with scores
Follow these shared protocols for all output files:
- Output Versioning Protocol — write timestamped file first, then copy to fixed name
- Output Manifest Protocol — log every output to MANIFEST.md
- Output Language Protocol — respect the project's language setting
Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
Preserve all PDF versions — user needs to compare progression
Save FULL raw review text — do not summarize or truncate Claude reviewer responses
Use mcp__claude-review__review_reply_start plus mcp__claude-review__review_status for Round 2 to maintain conversation context
Always recompile after fixes — verify 0 errors before proceeding
Do not fabricate experimental results — synthetic validation must describe methodology, not invent numbers
Respect the paper's claims — soften overclaims rather than adding unsupported new claims
Global consistency — when renaming notation or softening claims, check ALL files (abstract, intro, method, experiments, theory sections, conclusion, tables, figure captions)
Based on end-to-end testing on a real theory-paper run:
| Round | Score | Key Improvements | |-------|-------|-----------------| | Round 0 | 4/10 (content) | Baseline: assumption-model mismatch, overclaims, notation issues | | Round 1 | 6/10 (content) | Fixed assumptions, softened claims, added interpretation, renamed notation | | Round 2 | 7/10 (content) | Added synthetic validation, formal truncation proposition, stronger limitations | | Round 3 | 5→8.5/10 (format) | Removed hero fig, appendix, compressed conclusion, fixed overfull hbox |
+4.5 points across 3 rounds (2 content + 1 format) is typical for a well-structured but rough first draft. Final state at submission: clean overfull-hbox count and venue-format-compliant length.
research
Generate a structured paper outline from review conclusions and experiment results. Use when user says \"写大纲\", \"paper outline\", \"plan the paper\", \"论文规划\", or wants to create a paper plan before writing.
research
Generate a structured paper outline from review conclusions and experiment results. Use when user says "写大纲", "paper outline", "plan the paper", "论文规划", or wants to create a paper plan before writing.
development
Get a deep critical review of research from an external reviewer backend (Codex or manual). Use when user says "review my research", "help me review", "get external review", or wants critical feedback on research ideas, papers, or experimental results.
research
Turn a vague research direction into a problem-anchored, elegant, frontier-aware, implementation-oriented method plan via iterative GPT-5.5 review. Use when the user says "refine my approach", "帮我细化方案", "decompose this problem", "打磨idea", "refine research plan", "细化研究方案", or wants a concrete research method that stays simple, focused, and top-venue ready instead of a vague or overbuilt idea.