skills/data-visualization/statistical-significance-annotation/SKILL.md
Guide for annotating statistical significance (p-value asterisks) on comparison plots. Covers standard notation (ns, *, **, ***, ****), matplotlib bracket+asterisk implementation, and use with seaborn box/violin/bar plots. Use when preparing publication-ready figures with significance markers.
npx skillsauth add jaechang-hits/sciagent-skills statistical-significance-annotation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Statistical significance annotations (asterisk notation) are visual markers placed on comparison plots to indicate the results of hypothesis tests between groups. They consist of brackets connecting two groups and asterisk symbols denoting the p-value range. Proper annotation ensures that the visual claims in a figure match the quantitative evidence, making plots publication-ready and scientifically rigorous. This guide covers the standard conventions, when and how to annotate, and a reusable matplotlib implementation.
The widely adopted convention maps p-value ranges to asterisk symbols:
| Symbol | P-value range | Meaning |
|--------|---------------|---------|
| `ns` | p > 0.05 | Not significant |
| `*` | 0.01 < p <= 0.05 | Significant |
| `**` | 0.001 < p <= 0.01 | Highly significant |
| `***` | 0.0001 < p <= 0.001 | Very highly significant |
| `****` | p <= 0.0001 | Extremely significant |
The conversion function:
def pvalue_to_asterisk(p: float) -> str:
    """Convert a p-value to standard asterisk notation."""
    if p <= 0.0001:
        return "****"
    elif p <= 0.001:
        return "***"
    elif p <= 0.01:
        return "**"
    elif p <= 0.05:
        return "*"
    else:
        return "ns"
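As a quick check of the boundary behaviour, the sketch below uses a table-driven variant of the same conversion and applies it to a few example p-values. The `THRESHOLDS` list and the sample values are illustrative assumptions, not part of the original guide; the cutoffs themselves follow the table above.

```python
# Cutoffs from the notation table, strictest first (THRESHOLDS is an illustrative name).
THRESHOLDS = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]


def pvalue_to_asterisk(p: float) -> str:
    """Return the first symbol whose cutoff the p-value meets, else 'ns'."""
    for cutoff, symbol in THRESHOLDS:
        if p <= cutoff:
            return symbol
    return "ns"


# Made-up p-values chosen to hit each band, including the inclusive boundaries.
sample_pvalues = [0.2, 0.05, 0.03, 0.01, 0.0005, 0.00001]
labels = [pvalue_to_asterisk(p) for p in sample_pvalues]
print(labels)  # ['ns', '*', '*', '**', '***', '****']
```

Note that the boundaries are inclusive: a p-value of exactly 0.05 is labelled `*`, and exactly 0.01 is labelled `**`.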
padj, ANOVA post-hoc): Use the adjusted values already provided.
Not every pair of groups needs annotation. Select comparisons that:
Does the plot compare groups?
├── No (scatter, heatmap, PCA, line trend) → Do NOT annotate
└── Yes (box, violin, bar, strip)
    ├── Does the analysis claim significance? → Annotate the claimed comparisons
    ├── Exploratory (no specific claim) → Annotate vs control only, or skip
    └── Too many groups (>6 pairwise) → Annotate key comparisons only
| Scenario | Annotate? | Which pairs |
|----------|-----------|-------------|
| DEG box plot: treatment vs control | Yes | Treatment vs Control |
| Multi-group ANOVA with post-hoc | Yes | Significant post-hoc pairs only |
| Gene expression across 10 cell types | Selectively | vs reference cell type only |
| PCA or UMAP | No | N/A |
| Heatmap or volcano plot | No | N/A |
| Correlation scatter | No | Report r and p in text/legend |
| Exploratory bar plot, no hypothesis | Optional | vs control if applicable |
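The decision tree can be expressed as a small helper, shown here as a rough sketch. The function name `annotation_plan`, its arguments, and the choice to check the "too many pairwise comparisons" branch first are assumptions made for illustration; the guide itself does not prescribe an order among the three "Yes" branches.

```python
# Plot types that compare groups, taken from the "Yes" branch of the tree.
COMPARISON_PLOTS = {"box", "violin", "bar", "strip"}


def annotation_plan(plot_type: str, claims_significance: bool, n_pairwise: int) -> str:
    """Return a recommendation mirroring the decision tree (hypothetical helper)."""
    if plot_type not in COMPARISON_PLOTS:
        return "do not annotate"
    if n_pairwise > 6:  # assumption: ">6 pairwise" means more than six comparisons
        return "annotate key comparisons only"
    if claims_significance:
        return "annotate the claimed comparisons"
    return "annotate vs control only, or skip"


print(annotation_plan("pca", True, 1))
print(annotation_plan("box", True, 1))
print(annotation_plan("violin", False, 3))
```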
fontweight='bold' on figure titles for publication readiness.
Annotating all pairwise comparisons in a multi-group plot
Using raw p-values when multiple comparisons were performed
statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh') or use adjusted p-values from upstream tools (DESeq2 padj).
Bracket overlap with data or other brackets
Asterisks without stating which test was used
Inconsistent notation across figures
pvalue_to_asterisk() function throughout the analysis. Define it once and reuse.
Annotating "ns" on every non-significant pair
Placing annotations below the data
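One way to avoid brackets overlapping each other is to assign each comparison a stacking level before drawing. The following is a minimal sketch of a greedy assignment, not a method taken from the guide: `assign_bracket_levels` is a hypothetical helper, and it assumes that brackets sharing an endpoint should not sit on the same level.

```python
def assign_bracket_levels(pairs):
    """Map each (x1, x2) pair to a stacking level so same-level brackets never overlap."""
    levels = []      # levels[k] holds the (lo, hi) x-intervals already placed on level k
    assignment = {}
    # Place narrow brackets first so they end up closest to the data.
    for x1, x2 in sorted(pairs, key=lambda p: (abs(p[1] - p[0]), min(p))):
        lo, hi = min(x1, x2), max(x1, x2)
        for k, taken in enumerate(levels):
            # Strict inequalities: touching endpoints count as an overlap.
            if all(hi < a or lo > b for a, b in taken):
                taken.append((lo, hi))
                assignment[(x1, x2)] = k
                break
        else:
            levels.append([(lo, hi)])
            assignment[(x1, x2)] = len(levels) - 1
    return assignment


bracket_levels = assign_bracket_levels([(0, 1), (1, 2), (0, 2), (2, 3)])
print(bracket_levels)
```

Each level can then be converted to a y-coordinate, for example the data maximum plus a fixed spacing times `level + 1`.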
Run the appropriate test and collect p-values before plotting:
from scipy import stats
# Two-group comparison
stat, pval = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
# or for normal data:
stat, pval = stats.ttest_ind(group_a, group_b)
# Multi-group: ANOVA + post-hoc
from scipy.stats import f_oneway
stat, pval_anova = f_oneway(group_a, group_b, group_c)
# Post-hoc pairwise (if ANOVA significant)
from itertools import combinations
from statsmodels.stats.multitest import multipletests
pairs = list(combinations(["A", "B", "C"], 2))
groups = {"A": group_a, "B": group_b, "C": group_c}
raw_pvals = []
for g1, g2 in pairs:
    _, p = stats.mannwhitneyu(groups[g1], groups[g2], alternative='two-sided')
    raw_pvals.append(p)
# Adjust for multiple comparisons
rejected, adj_pvals, _, _ = multipletests(raw_pvals, method='fdr_bh')
Use this helper function to draw brackets with asterisks on any matplotlib axes:
def add_significance_bracket(ax, x1, x2, y, p_value, dh=0.02, barh=0.015, fontsize=11):
    """Draw a significance bracket with asterisk notation between two x positions.

    Args:
        ax: matplotlib Axes object.
        x1, x2: x-axis positions of the two groups (0-indexed).
        y: y-coordinate for the bracket (top of bracket line).
        p_value: p-value for the comparison.
        dh: vertical offset above bracket for the text (in data units).
        barh: height of the bracket tips (in data units).
        fontsize: font size for the asterisk text.

    Note: dh and barh are applied in data coordinates, so scale them to the
    y-axis range of the plot.
    """
    asterisk = pvalue_to_asterisk(p_value)
    # Draw bracket: two tips and a connecting line
    ax.plot([x1, x1, x2, x2], [y - barh, y, y, y - barh],
            lw=1.2, color='black')
    # Place asterisk text centered above the bracket
    ax.text((x1 + x2) / 2, y + dh, asterisk,
            ha='center', va='bottom', fontsize=fontsize, fontweight='bold')
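Because `ax.plot` and `ax.text` use data coordinates by default, fixed values such as 0.02 can look very different on axes with different ranges. The sketch below shows one possible way to derive the bracket geometry from the y-axis span instead; `bracket_geometry` and its `*_frac` parameters are hypothetical names introduced for illustration.

```python
def bracket_geometry(x1, x2, y, y_span, dh_frac=0.02, barh_frac=0.015):
    """Return bracket line coordinates and the text anchor, scaled to the y-axis span."""
    barh = barh_frac * y_span  # tip height in data units
    dh = dh_frac * y_span      # text offset in data units
    xs = [x1, x1, x2, x2]
    ys = [y - barh, y, y, y - barh]
    text_xy = ((x1 + x2) / 2, y + dh)
    return xs, ys, text_xy


# Example values are made up: a bracket at y=10 on an axis spanning 100 units.
xs, ys, text_xy = bracket_geometry(x1=0, x2=1, y=10.0, y_span=100.0)
print(xs, ys, text_xy)
```

The returned `xs` and `ys` can be passed to `ax.plot`, and `text_xy` to `ax.text`, in the same way as in the helper above.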
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Example: box plot with significance annotation
fig, ax = plt.subplots(figsize=(6, 5))
sns.boxplot(data=df, x="group", y="value", ax=ax, palette="Set2")
sns.stripplot(data=df, x="group", y="value", ax=ax,
              color="black", alpha=0.4, size=3, jitter=True)
# Determine bracket y-position from data
y_max = df["value"].max()
y_range = df["value"].max() - df["value"].min()
offset = y_range * 0.08 # spacing between brackets
# Add brackets for each significant comparison
# pairs_with_pvals: list of (group1_idx, group2_idx, p_value)
pairs_with_pvals = [(0, 1, 0.003), (0, 2, 0.042)]
for i, (x1, x2, pval) in enumerate(pairs_with_pvals):
    bracket_y = y_max + offset * (i + 1)
    add_significance_bracket(ax, x1, x2, bracket_y, pval)
ax.set_title("Gene Expression by Treatment", fontweight='bold', fontsize=14)
ax.set_ylabel("Expression (log2 CPM)")
# Extend y-axis to fit brackets
ax.set_ylim(top=y_max + offset * (len(pairs_with_pvals) + 1.5))
plt.tight_layout()
plt.savefig("expression_comparison.png", dpi=150, bbox_inches='tight')
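The bracket placement arithmetic in the example can be isolated into a small function to make the numbers easier to check. This is only a sketch that mirrors the calculation above; `stacked_bracket_heights` is a hypothetical helper and the input values are made up.

```python
def stacked_bracket_heights(y_min, y_max, n_brackets, spacing_frac=0.08):
    """Return one y-position per bracket plus a y-axis upper limit that fits them all."""
    offset = (y_max - y_min) * spacing_frac  # spacing between brackets
    heights = [y_max + offset * (i + 1) for i in range(n_brackets)]
    ylim_top = y_max + offset * (n_brackets + 1.5)  # headroom for the top label
    return heights, ylim_top


# Data spanning 2 to 12 with two brackets: spacing is 8% of the range, i.e. 0.8.
heights, ylim_top = stacked_bracket_heights(y_min=2.0, y_max=12.0, n_brackets=2)
print([round(h, 2) for h in heights], round(ylim_top, 2))  # [12.8, 13.6] 14.8
```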
For bar plots with error bars, position brackets above the error bars:
fig, ax = plt.subplots(figsize=(7, 5))
# errorbar="sd" requires seaborn >= 0.12; older versions use ci="sd" instead
bar_plot = sns.barplot(data=df, x="gene", y="fold_change", hue="condition",
                       ax=ax, palette="Set2", errorbar="sd", capsize=0.05)
# For grouped bars, calculate x positions manually
# Each gene has multiple bars offset by group
n_groups = df["condition"].nunique()
genes = df["gene"].unique()  # assumes bars are drawn in order of first appearance
bar_width = 0.8 / n_groups
for gene_idx, gene in enumerate(genes):
    # x positions of the two bars within this gene group (two conditions assumed)
    x1 = gene_idx - bar_width / 2
    x2 = gene_idx + bar_width / 2
    # Tallest bar plus its error bar (mean + SD per condition) for this gene
    per_condition = (df[df["gene"] == gene]
                     .groupby("condition")["fold_change"].agg(["mean", "std"]))
    y_top = (per_condition["mean"] + per_condition["std"]).max()
    p_val = pvals_per_gene[gene_idx]  # pre-computed
    if p_val <= 0.05:  # only annotate significant results
        add_significance_bracket(ax, x1, x2, y_top + 0.1, p_val)
ax.set_title("Fold Change by Condition", fontweight='bold', fontsize=14)
plt.tight_layout()
plt.savefig("fold_change_comparison.png", dpi=150, bbox_inches='tight')
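The manual x-position calculation generalizes to any number of hue levels. The sketch below assumes that each category's bars share a total width of 0.8 (the value used in the example above) and are evenly spaced within it; `grouped_bar_centers` is a hypothetical helper, and the layout should be checked against the actual figure before relying on it.

```python
def grouped_bar_centers(category_idx, n_hue, total_width=0.8):
    """Return the x-centers of the n_hue bars drawn at one category position."""
    bar_width = total_width / n_hue
    left_edge = category_idx - total_width / 2
    return [left_edge + bar_width * (k + 0.5) for k in range(n_hue)]


# With two hue levels the centers sit at category_idx -/+ bar_width / 2,
# which matches x1 and x2 in the loop above.
centers_two = grouped_bar_centers(category_idx=1, n_hue=2)
centers_three = grouped_bar_centers(category_idx=0, n_hue=3)
print([round(c, 3) for c in centers_two])    # [0.8, 1.2]
print([round(c, 3) for c in centers_three])  # [-0.267, 0.0, 0.267]
```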
add_significance_bracket function and pvalue_to_asterisk conversion across all figures in an analysis.
ax.set_ylim(top=...) or ax.margins(y=0.15).
padj (adjusted p-value) directly. Do not re-test the raw counts.
seaborn-statistical-plots: Seaborn plotting fundamentals; use this guide's annotation workflow on top of seaborn figures
scientific-visualization: General scientific figure design principles