skill/SKILL.md
Generate publication-quality academic diagrams from paper methodology text
npx skillsauth add dwzhu-pku/paperbanana paperbanana
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate publication-quality academic diagrams and pipeline figures from a paper's methodology section and figure caption. PaperBanana orchestrates a multi-agent pipeline (Retriever, Planner, Stylist, Visualizer, Critic) to produce camera-ready figures suitable for venues like NeurIPS, ICML, and ACL.
cd <repo-root>
uv pip install -r requirements.txt
Set your API key via environment variable or in configs/model_config.yaml.
Option 1 (Recommended): OpenRouter API key — one key for both text reasoning and image generation:
export OPENROUTER_API_KEY="sk-or-v1-..."
Option 2: Google API key — direct access to Gemini API:
export GOOGLE_API_KEY="your-key-here"
If both keys are configured, OpenRouter is used by default.
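The precedence described above can be sketched as a small helper. This is a minimal illustration, not the project's actual code: the function name `pick_provider` and its return values are hypothetical, and it only inspects environment variables (it does not read configs/model_config.yaml).

```python
import os
from typing import Mapping, Optional


def pick_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the provider implied by the configured keys (hypothetical helper)."""
    # OpenRouter takes precedence when both keys are present.
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("GOOGLE_API_KEY"):
        return "google"
    # No key in the environment; one may still be set in
    # configs/model_config.yaml, which this sketch does not read.
    return None
```

Calling `pick_provider()` before launching the pipeline is one way to confirm which key would be picked up in the current shell.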
python skill/run.py \
--content "METHOD_TEXT" \
--caption "FIGURE_CAPTION" \
--task diagram \
--output output.png
| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| --content | Yes* | | Method section text to visualize |
| --content-file | Yes* | | Path to a file containing the method text (alternative to --content) |
| --caption | Yes | | Figure caption or visual intent |
| --task | No | diagram | Task type: diagram |
| --output | No | output.png | Output image file path |
| --aspect-ratio | No | 21:9 | Aspect ratio: 21:9, 16:9, or 3:2 |
| --max-critic-rounds | No | 3 | Maximum critic refinement iterations |
| --num-candidates | No | 10 | Number of parallel candidates to generate |
| --retrieval-setting | No | auto | Retrieval mode: auto, manual, random, or none |
| --main-model-name | No | gemini-3.1-pro-preview | Main model for VLM agents. Provider auto-detected from configured API key |
| --image-gen-model-name | No | gemini-3.1-flash-image-preview | Model for image generation. Also supports gemini-3-pro-image-preview |
| --exp-mode | No | demo_full | Pipeline: demo_full (with Stylist) or demo_planner_critic (without Stylist) |
*One of --content or --content-file is required.
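For readers who want to see how the table maps onto a command-line interface, here is a rough `argparse` sketch that mirrors the flags and defaults listed above. It is not the real skill/run.py parser. In particular, treating --content and --content-file as mutually exclusive is an assumption; the table only says that one of them is required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the parameter table (not the real run.py)."""
    parser = argparse.ArgumentParser(prog="run.py")

    # Assumption: exactly one of the two content sources may be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--content")
    source.add_argument("--content-file")

    parser.add_argument("--caption", required=True)
    parser.add_argument("--task", default="diagram", choices=["diagram"])
    parser.add_argument("--output", default="output.png")
    parser.add_argument("--aspect-ratio", default="21:9",
                        choices=["21:9", "16:9", "3:2"])
    parser.add_argument("--max-critic-rounds", type=int, default=3)
    parser.add_argument("--num-candidates", type=int, default=10)
    parser.add_argument("--retrieval-setting", default="auto",
                        choices=["auto", "manual", "random", "none"])
    parser.add_argument("--main-model-name", default="gemini-3.1-pro-preview")
    parser.add_argument("--image-gen-model-name",
                        default="gemini-3.1-flash-image-preview")
    parser.add_argument("--exp-mode", default="demo_full",
                        choices=["demo_full", "demo_planner_critic"])
    return parser
```

With this sketch, `build_parser().parse_args(["--content", "x", "--caption", "c"])` fills in every default from the table, while omitting both content flags makes `argparse` exit with a usage error.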
When --num-candidates > 1, output files are named <stem>_0.png, <stem>_1.png, etc.
The absolute path of each saved image is printed to stdout, one per line.
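The naming rule can be written out as a short helper, which is handy when a calling script needs to know the expected file names in advance. The function `candidate_paths` is hypothetical. It assumes the candidates keep the extension and directory of --output, and that a single candidate is written to the --output path unchanged; the text above only specifies the multi-candidate case.

```python
from pathlib import Path
from typing import List


def candidate_paths(output: str, num_candidates: int) -> List[Path]:
    """Predict output file names from --output and --num-candidates (sketch)."""
    out = Path(output)
    if num_candidates <= 1:
        # Assumption: a single candidate uses the --output path as given.
        return [out]
    # <stem>_0.<ext>, <stem>_1.<ext>, ... next to the requested output path.
    return [out.with_name(f"{out.stem}_{i}{out.suffix}")
            for i in range(num_candidates)]
```

For example, `candidate_paths("architecture.png", 3)` yields `architecture_0.png`, `architecture_1.png`, and `architecture_2.png`.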
python skill/run.py \
--content "We propose a transformer-based encoder-decoder architecture. The encoder consists of 12 self-attention layers with residual connections. The decoder uses cross-attention to attend to encoder outputs and generates the target sequence autoregressively." \
--caption "Figure 1: Overview of the proposed transformer architecture" \
--task diagram \
--output architecture.png
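The same invocation can be driven from Python. The wrapper below is a sketch under assumptions: the helper names are hypothetical, it must be run from the repository root so that skill/run.py resolves, and it assumes stdout carries only the saved image paths (one per line) with no other log output mixed in.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(content: str, caption: str,
                  output: str = "output.png") -> List[str]:
    """Assemble the argument list for skill/run.py using the documented flags."""
    return [
        sys.executable, "skill/run.py",
        "--content", content,
        "--caption", caption,
        "--task", "diagram",
        "--output", output,
    ]


def parse_saved_paths(stdout: str) -> List[Path]:
    """Turn stdout (one path per line) into Path objects, skipping blank lines."""
    return [Path(line.strip()) for line in stdout.splitlines() if line.strip()]


def generate(content: str, caption: str,
             output: str = "output.png") -> List[Path]:
    """Run the skill and return the image paths it reports."""
    result = subprocess.run(
        build_command(content, caption, output),
        capture_output=True, text=True, check=True,
    )
    return parse_saved_paths(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with long method text; `check=True` raises `subprocess.CalledProcessError` if the script exits with a non-zero status.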
PaperBanana is based on the PaperVizAgent framework, a reference-driven multi-agent system for automated academic illustration. It was developed as part of the research paper:
PaperBanana: Automating Academic Illustration for AI Scientists. Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon. arXiv:2601.23265.
The framework introduces a collaborative team of five specialized agents — Retriever, Planner, Stylist, Visualizer, and Critic — to transform raw scientific content into publication-quality diagrams. Evaluation is conducted on the PaperBananaBench benchmark.
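To make the division of labor concrete, here is a purely illustrative control-flow sketch of how five such agents could be chained. Nothing here is taken from the PaperBanana code. The agent order, the shape of the critic loop, the `accepted` flag, and the function `run_pipeline` are all assumptions; only the agent names, the `demo_full` / `demo_planner_critic` modes, and the default of 3 critic rounds come from this document.

```python
from typing import Callable, Dict

# Each agent is modeled as a function from pipeline state to updated state.
State = Dict[str, object]
Agent = Callable[[State], State]


def run_pipeline(state: State, agents: Dict[str, Agent],
                 exp_mode: str = "demo_full",
                 max_critic_rounds: int = 3) -> State:
    """Hypothetical orchestration: retrieve, plan, style, draw, then refine."""
    state = agents["retriever"](state)
    state = agents["planner"](state)
    if exp_mode == "demo_full":
        # demo_planner_critic skips the Stylist step.
        state = agents["stylist"](state)
    state = agents["visualizer"](state)

    for _ in range(max_critic_rounds):
        state = agents["critic"](state)
        if state.get("accepted"):
            # Assumed stop condition: the critic marks the figure as good enough.
            break
        # Otherwise redraw using the critic's feedback.
        state = agents["visualizer"](state)
    return state
```

Stub agents (plain functions that return the state unchanged) are enough to exercise the loop and check that the Stylist is skipped in `demo_planner_critic` mode.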