skills/figure-agent/SKILL.md
Publication-quality figure generation for research papers. Decision agent selects figure type (code plot vs architecture diagram). Generates Matplotlib/Seaborn code for quantitative figures with iterative improvement loop. Style-matches conference templates (NeurIPS, ICML, ICLR). Use when the paper-pipeline reaches the figure generation phase, or when a user requests figures for an existing draft.
Install this skill globally with one command: `npx skillsauth add moralespanitz/research-loop figure-agent`. Works with Claude Code, Cursor, and Windsurf.
Decision-driven figure generation for conference papers. Analyzes the paper content to determine figure types, generates Matplotlib/Seaborn code for quantitative figures, and iterates on quality via critic feedback.
When generating figures for a paper, first analyze the content to determine the appropriate figure type: a code plot or an architecture diagram.
Use a code plot when the figure shows actual experimental results, metrics, or data:
| Figure Type | Best For | Matplotlib Function |
|-------------|----------|-------------------|
| Bar chart | Comparing discrete conditions, ablations | plt.bar() |
| Line plot | Convergence curves, training trajectories | plt.plot() |
| Scatter plot | Correlation analysis, parameter sensitivity | plt.scatter() |
| Heatmap | Ablation matrices, hyperparameter grids | plt.imshow() or sns.heatmap() |
| Box plot / Violin | Distribution of results across seeds | plt.boxplot() or sns.violinplot() |
| Histogram | Result distributions | plt.hist() |
| Stacked bar | Comparison across multiple metrics | plt.bar() with bottom= |
| Grouped bar | Multiple methods across conditions | plt.bar() with offset positions |
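The grouped and stacked rows of the table can be sketched as follows. This is a minimal sketch, not the skill's own generator: the method names, condition names, and scores are made-up placeholders, not results. Grouped bars shift each method's x positions by a multiple of the bar width, and stacked bars pass the running total through the `bottom` argument of `bar()`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values for illustration only (not experimental results).
conditions = ["cond_a", "cond_b", "cond_c"]
scores = {
    "method_1": np.array([0.60, 0.70, 0.65]),
    "method_2": np.array([0.55, 0.75, 0.70]),
}

x = np.arange(len(conditions))
width = 0.8 / len(scores)  # the methods share 80% of each x slot

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(7.0, 3.0))

# Grouped: offset each method so the group is centred on the tick.
for i, (name, vals) in enumerate(scores.items()):
    offset = (i - (len(scores) - 1) / 2) * width
    ax_grouped.bar(x + offset, vals, width, label=name)

# Stacked: each series starts where the previous one ended.
bottom = np.zeros(len(conditions))
for name, vals in scores.items():
    ax_stacked.bar(x, vals, 0.6, bottom=bottom, label=name)
    bottom += vals

for ax in (ax_grouped, ax_stacked):
    ax.set_xticks(x)
    ax.set_xticklabels(conditions)
    ax.legend()

n_grouped_bars = len(ax_grouped.patches)
n_stacked_bars = len(ax_stacked.patches)
plt.close(fig)
```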
Use an architecture diagram when the figure shows the proposed method, system architecture, or conceptual framework. Draw these with Excalidraw or a similar diagramming tool; for a code-generated alternative, use Matplotlib patches.
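As an illustration of the Matplotlib patches route, the sketch below draws a left-to-right pipeline of rounded boxes joined by arrows. The stage names and dimensions are hypothetical and only demonstrate the layout; a real diagram would take them from the paper.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import FancyBboxPatch

# Hypothetical stage names, used only to illustrate the layout.
stages = ["Input", "Encoder", "Decoder", "Output"]
box_w, box_h, gap = 1.6, 0.8, 0.6

fig, ax = plt.subplots(figsize=(5.0, 1.6))
for i, name in enumerate(stages):
    x0 = i * (box_w + gap)
    ax.add_patch(FancyBboxPatch(
        (x0, 0.0), box_w, box_h,
        boxstyle="round,pad=0.02,rounding_size=0.1",
        facecolor="#56B4E9", edgecolor="black", linewidth=0.8,
    ))
    ax.text(x0 + box_w / 2, box_h / 2, name,
            ha="center", va="center", fontsize=9)
    if i < len(stages) - 1:
        # Arrow from the right edge of this box to the left edge of the next.
        ax.annotate(
            "",
            xy=(x0 + box_w + gap, box_h / 2),
            xytext=(x0 + box_w, box_h / 2),
            arrowprops=dict(arrowstyle="->", linewidth=1.0),
        )

ax.set_xlim(-0.2, len(stages) * (box_w + gap) - gap + 0.2)
ax.set_ylim(-0.3, box_h + 0.3)
ax.set_aspect("equal")  # keep boxes from being stretched
ax.axis("off")          # diagrams need no ticks or spines

n_boxes = sum(isinstance(p, FancyBboxPatch) for p in ax.patches)
plt.close(fig)
```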
```
Is the figure based on experimental data?
├── YES → Is it a comparison (method A vs B) or relationship (parameter vs metric)?
│   ├── Bar/Line/Scatter/Box → Generate Matplotlib code
│   └── Architecture/Flowchart → Generate diagram code
└── NO → It must be conceptual or illustrative
    └── Architecture/Flowchart → Generate diagram code
```
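The decision tree can be read as a small routing function. The sketch below is one possible encoding, assuming the decision is made from two inputs (whether the figure is data-driven, and the requested figure type); `FigureKind`, `PLOT_TYPES`, and `choose_figure_kind` are hypothetical names, not part of the skill.

```python
from enum import Enum


class FigureKind(Enum):
    CODE_PLOT = "code_plot"  # generated with Matplotlib/Seaborn code
    DIAGRAM = "diagram"      # architecture diagram or flowchart


# Plot types taken from the figure-type table.
PLOT_TYPES = {"bar", "line", "scatter", "heatmap", "box", "violin", "histogram"}


def choose_figure_kind(based_on_data: bool, requested_type: str) -> FigureKind:
    """Return which generator should handle the figure (hypothetical helper)."""
    if based_on_data and requested_type.lower() in PLOT_TYPES:
        return FigureKind.CODE_PLOT
    # Conceptual or illustrative figures, and any type outside PLOT_TYPES.
    return FigureKind.DIAGRAM
```

For example, `choose_figure_kind(True, "line")` routes to the code-plot path, while a conceptual overview figure routes to the diagram path regardless of the requested type.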
```python
import json

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

# === LOAD DATA ===
with open("results.json") as f:
    data = json.load(f)

# === STYLE SETUP ===
plt.rcParams.update({
    "figure.figsize": (5.0, 3.5),  # NeurIPS column width
    "font.family": "serif",
    "font.size": 10,
    "axes.titlesize": 11,
    "axes.labelsize": 10,
    "xtick.labelsize": 9,
    "ytick.labelsize": 9,
    "legend.fontsize": 9,
    "lines.linewidth": 1.5,
    "axes.linewidth": 0.8,
    "grid.alpha": 0.3,
    "savefig.dpi": 300,
    "savefig.bbox": "tight",
})

# === PLOT COLORS (color-blind friendly) ===
CB_COLORS = ["#0072B2", "#E69F00", "#009E73", "#CC79A7", "#56B4E9", "#F0E442"]

# === FIGURE ===
fig, ax = plt.subplots(1, 1)

# ... plotting code ...

# === STYLE ===
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_title("Title")
ax.legend()  # needs at least one artist plotted with label=...
ax.grid(True, alpha=0.3)

# === SAVE ===
fig.tight_layout()
fig.savefig("figure_1.pdf", dpi=300)  # vector format for the paper
fig.savefig("figure_1.png", dpi=300)  # raster format for preview
plt.close(fig)
print("FIGURE_SAVED: figure_1.pdf")
```
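To show how the `# ... plotting code ...` step of the template might be filled in, here is a sketch of a convergence plot with a mean line and a one-standard-deviation band across seeds. The `results.json` schema used here (`{method: [[value per step for each seed]]}`) is an assumption, since the skill does not specify one, and the numbers are synthetic placeholders written to a temporary directory, not experimental results.

```python
import json
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

CB_COLORS = ["#0072B2", "#E69F00", "#009E73", "#CC79A7", "#56B4E9", "#F0E442"]

# Assumed schema: {method_name: [[loss per step for seed 0], [seed 1], ...]}.
# Synthetic placeholder curves, not experimental results.
rng = np.random.default_rng(0)
steps = np.arange(50)
synthetic = {
    name: (np.exp(-steps / tau)
           + 0.02 * rng.standard_normal((3, steps.size))).tolist()
    for name, tau in [("method_a", 10.0), ("method_b", 20.0)]
}

workdir = Path(tempfile.mkdtemp())
results_path = workdir / "results.json"
results_path.write_text(json.dumps(synthetic))

# Load the data the same way the template does.
with open(results_path) as f:
    data = json.load(f)

fig, ax = plt.subplots(figsize=(5.0, 3.5))
for color, (name, runs) in zip(CB_COLORS, data.items()):
    runs = np.asarray(runs)  # shape: (n_seeds, n_steps)
    mean, std = runs.mean(axis=0), runs.std(axis=0)
    ax.plot(steps, mean, color=color, label=name)
    ax.fill_between(steps, mean - std, mean + std, color=color, alpha=0.2)

ax.set_xlabel("Training step")
ax.set_ylabel("Loss")
ax.legend()
ax.grid(True, alpha=0.3)
fig.tight_layout()

pdf_path = workdir / "figure_1.pdf"
fig.savefig(pdf_path, dpi=300)
plt.close(fig)
```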
NeurIPS:
```python
plt.rcParams.update({
    "figure.figsize": (5.0, 3.5),  # Single column
    "font.family": "serif",
    "font.size": 10,
})
```
ICML:
```python
plt.rcParams.update({
    "figure.figsize": (5.5, 3.5),  # Slightly wider
    "font.family": "serif",
    "font.size": 9,
})
```
ICLR:
```python
plt.rcParams.update({
    "figure.figsize": (5.0, 3.5),
    "font.family": "serif",
    "font.size": 10,
})
```
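One way to switch between the presets above without leaking settings into later figures is `plt.rc_context`, which restores the previous rcParams on exit. This is a sketch under assumptions: `VENUE_STYLES` and `venue_style` are hypothetical names, and the venue attached to each preset is inferred from the order NeurIPS, ICML, ICLR used elsewhere in this skill.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Values copied from the three presets; the venue names are an assumption
# based on the order NeurIPS, ICML, ICLR.
VENUE_STYLES = {
    "NeurIPS": {"figure.figsize": (5.0, 3.5), "font.family": "serif", "font.size": 10},
    "ICML": {"figure.figsize": (5.5, 3.5), "font.family": "serif", "font.size": 9},
    "ICLR": {"figure.figsize": (5.0, 3.5), "font.family": "serif", "font.size": 10},
}


def venue_style(venue: str):
    """Return a context manager that applies one preset temporarily."""
    if venue not in VENUE_STYLES:
        raise ValueError(
            f"unknown venue {venue!r}; expected one of {sorted(VENUE_STYLES)}"
        )
    return plt.rc_context(VENUE_STYLES[venue])


with venue_style("ICML"):
    fig, ax = plt.subplots()
    icml_size = tuple(fig.get_size_inches())
    icml_font = plt.rcParams["font.size"]
    plt.close(fig)
```

Using a context manager rather than `plt.rcParams.update` keeps each figure's style local, which matters when one session renders figures for more than one venue.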
Use color-blind friendly palettes by default:
```python
# Qualitative (for categories)
CB_COLORS = ["#0072B2", "#E69F00", "#009E73", "#CC79A7",
             "#56B4E9", "#F0E442", "#D55E00", "#000000"]

# Sequential (for heatmaps)
CB_SEQUENTIAL = ["#FFFFFF", "#F0E442", "#D55E00", "#000000"]

# Diverging (for difference plots)
CB_DIVERGING = ["#0072B2", "#FFFFFF", "#D55E00"]
```
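The hex lists above are anchor colors, so they need to be turned into Matplotlib objects before use. A minimal sketch, assuming the qualitative list is meant as the default color cycle and the other two as continuous colormaps: `cycler` sets the cycle and `LinearSegmentedColormap.from_list` interpolates between the anchors. The colormap names and the grid values are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler
from matplotlib.colors import LinearSegmentedColormap

CB_COLORS = ["#0072B2", "#E69F00", "#009E73", "#CC79A7",
             "#56B4E9", "#F0E442", "#D55E00", "#000000"]
CB_SEQUENTIAL = ["#FFFFFF", "#F0E442", "#D55E00", "#000000"]
CB_DIVERGING = ["#0072B2", "#FFFFFF", "#D55E00"]

# Categorical: make the palette the default color cycle for new axes.
plt.rcParams["axes.prop_cycle"] = cycler(color=CB_COLORS)

# Continuous: interpolate linearly between the anchor colors.
seq_cmap = LinearSegmentedColormap.from_list("cb_sequential", CB_SEQUENTIAL)
div_cmap = LinearSegmentedColormap.from_list("cb_diverging", CB_DIVERGING)

grid = np.linspace(-1.0, 1.0, 25).reshape(5, 5)  # placeholder values
fig, ax = plt.subplots(figsize=(3.5, 3.0))
# Symmetric limits keep the white midpoint of the diverging map at zero.
im = ax.imshow(grid, cmap=div_cmap, vmin=-1.0, vmax=1.0)
fig.colorbar(im, ax=ax)
plt.close(fig)
```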
For each figure, run a 3-step improvement loop:
1. Generate: produce the initial figure code from the data and figure specification.
2. Critique: analyze the generated figure.
3. Refine: apply the critic feedback. Iterate up to 3 times or until the critic judges the figure "publishable."
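The loop can be sketched as a small driver that takes the three steps as callables. This is an illustrative sketch only: `Critique`, `improve_figure`, and the stub callables are hypothetical, and the real generator and critic are whatever the agent uses. It also assumes "up to 3 times" means at most three critique rounds.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ITERATIONS = 3  # assumed reading of "iterate up to 3 times"


@dataclass
class Critique:
    publishable: bool
    feedback: str = ""


def improve_figure(
    generate: Callable[[], str],
    critique: Callable[[str], Critique],
    refine: Callable[[str, str], str],
) -> Tuple[str, List[Critique]]:
    """Run generate -> critique -> refine until publishable or out of budget."""
    code = generate()
    history: List[Critique] = []
    for _ in range(MAX_ITERATIONS):
        verdict = critique(code)
        history.append(verdict)
        if verdict.publishable:
            break
        code = refine(code, verdict.feedback)
    return code, history


# Stub callables standing in for the real generator and critic.
def _generate() -> str:
    return "v0"


def _critique(code: str) -> Critique:
    return Critique(publishable=code.endswith("+labels"), feedback="add axis labels")


def _refine(code: str, feedback: str) -> str:
    return code + "+labels"


final_code, rounds = improve_figure(_generate, _critique, _refine)
```

Note that in this sketch the output of the final refinement is returned without another critique once the budget is exhausted; a stricter variant could run one last critique and report its verdict.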
Each figure must be specified with:
```yaml
figure:
  id: 1
  type: "bar"  # bar | line | scatter | heatmap | box | histogram
  caption: "Comparison of convergence rates across optimizers. SGD-Adam achieves 15% faster convergence than baselines."
  data_source: "results.json"  # or inline data
  x: "optimizer"  # column or variable name
  y: "convergence_time"  # column or variable name
  hue: "dataset"  # optional grouping variable
  style: "NeurIPS"  # NeurIPS | ICML | ICLR
  width: 5.0  # inches
  height: 3.5
  color_scheme: "colorblind"
```
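A specification like this is easier to consume once it has been validated. The sketch below mirrors the fields in a dataclass and rejects unknown `type` and `style` values. `FigureSpec` and `spec_from_mapping` are hypothetical names, and the default values are assumptions (the specification only marks `hue` as optional). The input is a plain mapping, such as the result of parsing the YAML with a YAML library.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_TYPES = {"bar", "line", "scatter", "heatmap", "box", "histogram"}
ALLOWED_STYLES = {"NeurIPS", "ICML", "ICLR"}


@dataclass(frozen=True)
class FigureSpec:
    id: int
    type: str
    caption: str
    data_source: str
    x: str
    y: str
    hue: Optional[str] = None          # optional grouping variable
    style: str = "NeurIPS"             # assumed default
    width: float = 5.0                 # inches, assumed default
    height: float = 3.5                # inches, assumed default
    color_scheme: str = "colorblind"   # assumed default

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported figure type: {self.type!r}")
        if self.style not in ALLOWED_STYLES:
            raise ValueError(f"unsupported style: {self.style!r}")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")


def spec_from_mapping(doc: dict) -> FigureSpec:
    """Build a spec from the mapping under the top-level `figure` key."""
    return FigureSpec(**doc["figure"])


spec = spec_from_mapping({
    "figure": {
        "id": 1,
        "type": "bar",
        "caption": "Comparison of convergence rates across optimizers.",
        "data_source": "results.json",
        "x": "optimizer",
        "y": "convergence_time",
        "hue": "dataset",
    }
})
```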
Figures are saved to sessions/<slug>/figures/:
```
figures/
├── figure_1.pdf          # Vector format for paper
├── figure_1.png          # Raster format for preview
├── figure_1_caption.md   # Caption text
├── figure_1_code.py      # Regeneratable code
├── figure_2.pdf
├── figure_2.png
├── figure_2_caption.md
└── figure_2_code.py
```
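The naming scheme in this layout can be captured in a small path helper so that every figure gets the same four files. A minimal sketch using only the standard library; `figure_paths` and `write_sidecars` are hypothetical names, and the session directory here is a temporary directory standing in for `sessions/<slug>`.

```python
import tempfile
from pathlib import Path
from typing import Dict


def figure_paths(session_dir: Path, figure_id: int) -> Dict[str, Path]:
    """Return the four output paths for one figure."""
    figures = session_dir / "figures"
    stem = f"figure_{figure_id}"
    return {
        "pdf": figures / f"{stem}.pdf",
        "png": figures / f"{stem}.png",
        "caption": figures / f"{stem}_caption.md",
        "code": figures / f"{stem}_code.py",
    }


def write_sidecars(session_dir: Path, figure_id: int,
                   caption: str, code: str) -> Dict[str, Path]:
    """Write the caption and regeneration script next to the figure files."""
    paths = figure_paths(session_dir, figure_id)
    paths["caption"].parent.mkdir(parents=True, exist_ok=True)
    paths["caption"].write_text(caption + "\n", encoding="utf-8")
    paths["code"].write_text(code, encoding="utf-8")
    return paths


# Temporary directory standing in for sessions/<slug>.
session = Path(tempfile.mkdtemp())
written = write_sidecars(session, 1, "Placeholder caption.", "print('regenerate')\n")
```

The PDF and PNG themselves would be written by the plotting code, for example with `fig.savefig(written["pdf"])`.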