skills/data-visualization/scientific-visualization/SKILL.md
Guide for choosing and creating scientific visualizations for publications and talks. Covers chart-type selection by data structure, color theory for accessibility/print, figure composition, journal formatting (Nature, Cell, ACS), and common pitfalls. Consult when visualizing data or preparing submission figures.
Effective scientific visualization communicates data clearly, honestly, and accessibly. Poor chart choices, misleading axes, or inaccessible color palettes can obscure findings or introduce bias. This guide covers the full workflow of scientific figure preparation: from selecting the right chart type for your data structure through color theory, accessibility, and journal submission formatting requirements.
Every chart type is optimized for a specific data structure. Mismatches (e.g., pie charts for continuous distributions, bar charts for time series) hide structure and distort perception.
| Data Type | Recommended Chart | Avoid |
|-----------|------------------|-------|
| Continuous distribution (1 group) | Histogram, violin plot, ridge plot | Bar chart with mean only |
| Continuous distribution (2–5 groups) | Violin + boxplot overlay, beeswarm | Grouped bar chart |
| Two continuous variables, correlation | Scatter plot, hexbin (large N) | Line chart without temporal order |
| Categorical counts / proportions | Bar chart (horizontal for long labels) | Pie chart (>4 categories) |
| Change over time (continuous) | Line chart | Bar chart |
| Change over time (sparse events) | Step chart, event raster | Connected scatter |
| Part-to-whole (≤5 parts) | Stacked bar, waffle chart | 3D pie chart |
| High-dimensional (>5 variables) | Heatmap (clustered), parallel coordinates | 3D scatter |
| Spatial data | Map, spatial heatmap | Bubble chart |
| Survival / time-to-event | Kaplan-Meier curve | Bar chart of median survival |
Color encodes information. Misused color introduces artifacts and fails readers with color vision deficiency (CVD; ~8% of males).
Sequential palettes encode ordered numeric data from low to high (e.g., expression level, concentration). Use perceptually uniform palettes: viridis, magma, cividis. Their lightness increases monotonically, so they remain readable when printed in grayscale.
Diverging palettes encode data with a meaningful midpoint (e.g., fold-change centered at 0, correlation from -1 to +1). Use RdBu, coolwarm, or vlag. Always ensure the midpoint maps to white/neutral.
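One way to keep the midpoint on the neutral color is to make the color limits symmetric around it. The following is a minimal sketch in plain Python; the helper name `symmetric_limits` is hypothetical and not part of any plotting library.

```python
def symmetric_limits(values, center=0.0):
    """Return (vmin, vmax) placed symmetrically around `center`.

    Hypothetical helper: with symmetric limits, a diverging colormap
    maps `center` to its neutral middle color.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    # Largest distance of any value from the midpoint.
    half_range = max(abs(v - center) for v in values)
    if half_range == 0:
        # Assumption: fall back to an arbitrary non-zero span for constant data.
        half_range = 1.0
    return center - half_range, center + half_range


# Example: log2 fold-changes skewed towards positive values.
log2_fold_changes = [-0.5, 0.2, 1.0, 3.0]
vmin, vmax = symmetric_limits(log2_fold_changes)
```

The resulting pair can be passed as `vmin` and `vmax` to plotting calls that accept them, such as matplotlib's `imshow` or `seaborn.heatmap`, together with a diverging colormap like `RdBu_r`.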
Qualitative palettes encode unordered categories. Use Okabe-Ito (CVD-safe), tab10 (matplotlib default), or ColorBrewer qualitative palettes. Limit to ≤8 distinguishable colors; use shape or pattern as redundant encoding beyond that.
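As a sketch of redundant encoding, the snippet below assigns each category an Okabe-Ito color and switches the marker shape once the eight colors are used up. The function name `assign_styles` and the marker order are illustrative assumptions; the marker codes follow matplotlib's notation.

```python
# Okabe-Ito palette (hex codes), a CVD-safe qualitative palette.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]

# Marker codes in matplotlib notation; the order is an arbitrary choice.
MARKERS = ["o", "s", "^", "D", "v", "P", "X", "*"]


def assign_styles(categories):
    """Map each category to a (color, marker) pair.

    The first 8 categories share one marker and differ by color.
    After that the colors repeat and the marker changes, so no two
    categories rely on color alone to be told apart.
    """
    styles = {}
    for index, category in enumerate(categories):
        color = OKABE_ITO[index % len(OKABE_ITO)]
        marker = MARKERS[(index // len(OKABE_ITO)) % len(MARKERS)]
        styles[category] = (color, marker)
    return styles


styles = assign_styles(["ctrl", "drugA", "drugB"])
```

Each pair can then be supplied to a plotting call, for example as the `color` and `marker` arguments of matplotlib's `Axes.scatter`.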
Color don'ts:
Scientific figures are typically multi-panel. Panel layout and labeling affect how readers parse information.
Major journals specify exact figure requirements for submission. Violating them can lead to desk rejection or delays.
| Journal/Style | Max Width | Resolution | Color Mode | Font | File Format |
|---------------|-----------|------------|------------|------|-------------|
| Nature family | 89 mm (1-col), 183 mm (2-col) | 300 dpi (photos), 600 dpi (line art) | RGB or CMYK | Arial 5–7 pt | PDF, TIFF, EPS |
| Cell/iScience | 85 mm (1-col), 170 mm (2-col) | 300 dpi raster, 600 dpi halftone | RGB | Helvetica 6–8 pt | PDF, EPS, TIFF |
| ACS journals | 3.25 in (1-col), 7 in (2-col) | 600 dpi (color), 1200 dpi (b&w line art) | RGB (screen), CMYK (print) | Arial/Helvetica 4.5–7 pt | TIFF, EPS, PDF |
| PLOS ONE | No strict width | 300 dpi (raster), 600–1200 dpi (line art) | RGB | Any | TIFF, EPS, PDF |
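Plotting libraries usually expect figure sizes in inches, while most of the widths above are given in millimetres. The sketch below converts a column width to a figure size and to the pixel width of a raster export. The function names and the default aspect ratio are assumptions for illustration; the widths are copied from the table above and should be checked against the journal's current author guidelines.

```python
MM_PER_INCH = 25.4

# Column widths in millimetres, copied from the table above.
COLUMN_WIDTH_MM = {
    ("nature", 1): 89,
    ("nature", 2): 183,
    ("cell", 1): 85,
    ("cell", 2): 170,
}


def figure_size_inches(journal, columns=1, aspect=0.75):
    """Return (width, height) in inches for a journal column width.

    `aspect` is height / width; 0.75 is an arbitrary default.
    """
    width_in = COLUMN_WIDTH_MM[(journal, columns)] / MM_PER_INCH
    return width_in, width_in * aspect


def pixels_at(width_in, dpi):
    """Pixel width of a raster export at the given resolution."""
    return round(width_in * dpi)


width, height = figure_size_inches("nature", columns=1)
print(f"{width:.2f} x {height:.2f} in, {pixels_at(width, 300)} px wide at 300 dpi")
```

The tuple can be passed to matplotlib as `plt.figure(figsize=(width, height))`.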
Use this tree to select the right visualization for your analysis goal:
What is the primary message of this figure?
|
+-- Show a distribution or spread of values
|   +-- One group --> Histogram or violin plot
|   +-- 2-5 groups --> Violin + jitter (show all points if N < 100)
|   +-- Many groups --> Ridge plot (joy plot)
|
+-- Compare quantities between categories
|   +-- Few categories (2-5) --> Bar chart with error bars + individual points
|   +-- Many categories (>8) --> Lollipop chart or dot plot (horizontal)
|   +-- Paired measurements --> Slopegraph or paired dot plot
|
+-- Show a relationship between two continuous variables
|   +-- N < 1000 --> Scatter plot
|   +-- N > 1000 --> Hexbin or 2D density plot
|   +-- Time ordered --> Line chart
|
+-- Show composition or part-to-whole
|   +-- 2-4 parts --> Stacked bar or waffle chart
|   +-- Over time --> Stacked area chart
|   +-- Avoid pie chart unless <= 3 parts and proportions are obvious
|
+-- Show high-dimensional data
|   +-- Genes x samples --> Clustered heatmap (seaborn.clustermap)
|   +-- Embeddings (UMAP, PCA) --> Scatter colored by metadata
|   +-- Feature importance --> Horizontal bar chart (sorted)
|
+-- Show spatial or geographic data
    +-- Microscopy --> Image overlay with colorbar
    +-- Geographic --> Choropleth map
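Two branches of the tree above (distribution and relationship) can be written as small lookup functions, which is handy when a script has to pick a plot type automatically. This is only a sketch: the function names are hypothetical, and treating N of exactly 1000 as "large" and more than 5 groups as "many" are assumptions, since the tree does not specify those boundaries.

```python
def suggest_distribution_chart(n_groups):
    """Suggest a chart for showing the spread of values in n_groups groups."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return "histogram or violin plot"
    if n_groups <= 5:
        return "violin + jitter"
    # Assumption: anything above 5 groups counts as "many groups".
    return "ridge plot"


def suggest_relationship_chart(n_points, time_ordered=False):
    """Suggest a chart for two continuous variables."""
    if time_ordered:
        return "line chart"
    if n_points < 1000:
        return "scatter plot"
    # Assumption: N == 1000 is treated like N > 1000.
    return "hexbin or 2D density plot"


print(suggest_distribution_chart(3))
print(suggest_relationship_chart(50_000))
```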
| Analysis Goal | Chart Type | Library | Key Consideration |
|---------------|-----------|---------|-------------------|
| Gene expression across groups | Violin + jitter | seaborn, plotnine | Show all points if N < 50; never bar+SEM only |
| Differential expression | Volcano plot | matplotlib | Log2FC on x-axis, -log10(p) on y-axis |
| Clustering results | UMAP scatter | scanpy, matplotlib | One plot per annotation variable |
| Correlation matrix | Clustered heatmap | seaborn.clustermap | Use diverging palette centered at 0 |
| Protein structure | Ribbon diagram | PyMOL, ChimeraX | Not covered here — use dedicated molecular graphics tools |
| Survival analysis | Kaplan-Meier | lifelines | Include confidence bands and at-risk table |
| Time course | Line chart with CI | matplotlib | Show uncertainty; connect group means, not individual points |
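For the volcano plot row, the axis transforms are simple enough to sketch in plain Python. The function name `volcano_point` is hypothetical, and the thresholds (absolute log2 fold-change of at least 1, p below 0.05) are common conventions used here as assumptions, not values taken from this guide. The input `fold_change` is assumed to be a positive ratio such as treated / control.

```python
import math


def volcano_point(fold_change, p_value, fc_threshold=1.0, alpha=0.05):
    """Return (x, y, label) for one feature on a volcano plot.

    x is log2 fold-change, y is -log10(p). The thresholds are
    illustrative defaults, not recommendations.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    x = math.log2(fold_change)
    y = -math.log10(p_value)
    if p_value < alpha and abs(x) >= fc_threshold:
        label = "up" if x > 0 else "down"
    else:
        label = "not significant"
    return x, y, label


x, y, label = volcano_point(fold_change=4.0, p_value=0.001)
```

The label can drive point color so that up-regulated, down-regulated, and non-significant features are visually distinct.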
Show the data, not just summaries: For N < 100, overlay individual data points on violin or box plots using jitter or beeswarm. Bar charts with only mean ± SEM conceal distribution shape, outliers, and bimodality.
Choose CVD-safe color palettes by default: Use Okabe-Ito for categorical data and viridis or cividis for sequential data. Test your figure with a CVD simulator (e.g., Coblis) before submission.
Design at final publication size from the start: Set your figure canvas to the exact column width of the target journal (e.g., 89 mm for Nature single-column). Rescaling after the fact makes fonts too small or too large, and changes aspect ratios.
Label axes with units and use descriptive titles: Every axis must have a label with units in parentheses (e.g., "Expression level (log2 CPM)"). Avoid cryptic abbreviations without legend entries.
Use vector formats for line art and text: Save figures as PDF or SVG when they contain text and lines. Rasterize only when submitting to a journal that requires TIFF. Vector figures scale without pixelation and remain editable.
Match statistical annotations to the test performed: If you annotate significance stars (*), state in the caption which test was used, the exact p-value, and the sample size. "n.s." should still report the p-value.
Avoid dual y-axes: Two different y-axes on one plot are almost always misleading — the apparent relationship depends on scale choices. Use two separate panels instead.
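Several of these practices (final-size canvas, readable fonts, editable vector text) can be collected in one shared style dictionary. The sketch below assumes matplotlib; the specific values are illustrative choices within the Nature font range quoted earlier, not official journal settings, and `new_figure` is a hypothetical helper name.

```python
# Illustrative style values; assumptions, not official journal settings.
PUBLICATION_STYLE = {
    "font.family": "sans-serif",
    "font.sans-serif": ["Arial", "Helvetica", "DejaVu Sans"],
    "font.size": 7,
    "axes.labelsize": 7,
    "axes.titlesize": 7,
    "xtick.labelsize": 6,
    "ytick.labelsize": 6,
    "legend.fontsize": 6,
    "axes.linewidth": 0.5,
    "lines.linewidth": 1.0,
    "pdf.fonttype": 42,  # embed TrueType fonts so text stays editable
    "savefig.dpi": 300,
}

MM_PER_INCH = 25.4


def new_figure(width_mm=89, height_mm=60):
    """Create a figure at final print size with the shared style applied.

    matplotlib is imported lazily so the style dictionary can be
    inspected or reused without a plotting backend installed.
    """
    import matplotlib

    matplotlib.rcParams.update(PUBLICATION_STYLE)
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(
        figsize=(width_mm / MM_PER_INCH, height_mm / MM_PER_INCH)
    )
    return fig, ax
```

A typical use would be `fig, ax = new_figure()`, followed by the plotting calls and `fig.savefig("fig1.pdf", bbox_inches="tight")`. Because every panel script starts from the same dictionary, fonts and line weights stay consistent across figures.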
- Bar chart for continuous distributions (dynamite plot)
- Truncated or broken y-axis that exaggerates differences
- Using rainbow/jet colormap for heatmaps: use viridis, magma, or inferno for sequential data; RdBu or coolwarm for diverging data. viridis has been the matplotlib default colormap since version 2.0.
- Overlapping data points without jitter or transparency: add jitter (`seaborn.stripplot(jitter=True)`), use transparency (`alpha=0.3`), or switch to a hexbin / 2D density plot for large N.
- P-value annotations without effect size or sample size
- Figure text too small at publication size
- Inconsistent style across panels in a multi-panel figure: define a `matplotlib.rcParams` style dictionary at the top of your figure script and apply it to all panels.

1. Define the message first
2. Prepare the data
3. Prototype at screen resolution
4. Apply journal-specific styling
5. Add annotations and labels
6. Export at correct resolution, e.g., `plt.savefig("fig1.pdf", bbox_inches="tight")` in matplotlib.
7. Accessibility check

- matplotlib: set `rcParams` at the top of every figure script; use `fig.set_size_inches()` to enforce journal dimensions; export with `dpi=300` minimum.
- ggplot2 (R): use `theme_classic()` or a custom theme; set `ggsave(width=..., units="mm", dpi=300)`.
- Multi-panel layout: use `matplotlib.gridspec`, patchworklib (Python), or cowplot/patchwork (R) for aligned panel grids.

- matplotlib-figures: Python implementation of publication-quality figures with matplotlib and seaborn
- data-visualization: general Python plotting recipes
- biostatistics: statistical test selection to accompany figure annotations