Version Compatibility

Reference examples tested with: ggseqlogo 0.2 (CRAN; per Wagih 2017), Logomaker 0.8+ (Python), WebLogo 3.7+ (CLI), Biopython 1.83+ (motif parsing), MEME suite 5.5+.

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
R: packageVersion('<pkg>') then ?function_name

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Sequence Logos

"Plot a sequence motif" -> Render a per-position stack of letters whose total height encodes information content (Schneider-Stephens 1990 Nucleic Acids Res 18:6097) and within which each letter's height is proportional to its base or amino-acid frequency. Because stack height encodes information content, conserved positions are visually tall and variable positions visually short: the logo is the conservation profile.

R: ggseqlogo::ggseqlogo (Wagih 2017 Bioinformatics 33:3645)
Python: logomaker.Logo
CLI: weblogo (Crooks 2004 Genome Res 14:1188)

The Single Most Important Modern Insight -- Bits vs Probability Are Different Visualizations

A sequence logo can encode each position as bits (information content) or probability (raw frequency). They look superficially similar; they communicate different things.

Bits (Schneider-Stephens 1990): position height = R = log2(K) − H(p), where K is the alphabet size (4 for DNA, 20 for protein) and H(p) is the Shannon entropy of the position's letter frequencies. The maximum is 2 bits for DNA and log2(20) ≈ 4.32 bits for protein. A fully conserved DNA position reaches 2 bits; a uniform position is 0. This is the canonical motif encoding.
Probability: every stack has total height 1.0 and each letter's height equals its frequency. Because all positions are equally tall, a near-uniform position (25/25/25/25) occupies as much space as a fully conserved one; conservation can only be read from how a stack is split, not from its height.
EDLogo (enrichment-depletion; Dey et al. 2018): uses log-odds of observed vs background frequency, so depleted residues can be drawn below the axis.
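
A minimal sketch of the bits arithmetic above in plain Python, with no logo library involved. `shannon_entropy`, `information_content`, and `letter_heights` are hypothetical helper names; the columns reuse the frequencies from the ggseqlogo PWM example, and no small-sample correction is applied. A fully conserved column reaches 2 bits and a uniform column 0, whereas a probability logo would draw all four stacks at height 1.0.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H(p) in bits; letters with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(probs):
    """Uncorrected R = log2(K) - H(p) for one position over a K-letter alphabet."""
    return math.log2(len(probs)) - shannon_entropy(probs)


def letter_heights(probs):
    """Bits-logo letter heights: each letter gets p_i * R of the stack."""
    r = information_content(probs)
    return [p * r for p in probs]


# Frequencies in A, C, G, T order (same values as the ggseqlogo PWM example)
columns = {
    "conserved A": [1.0, 0.0, 0.0, 0.0],
    "mostly A": [0.7, 0.1, 0.1, 0.1],
    "A or G": [0.4, 0.1, 0.4, 0.1],
    "uniform": [0.25, 0.25, 0.25, 0.25],
}

for name, p in columns.items():
    heights = [round(h, 3) for h in letter_heights(p)]
    # Bits: stack height = R. Probability: every stack would be height 1.0.
    print(f"{name:12s} R = {information_content(p):.3f} bits  heights = {heights}")
```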

Default to bits unless there is a specific reason not to. Bits is what reviewers expect for a TF binding site or splice site; position-composition summaries such as CRISPR sgRNA libraries are the usual exception and are typically drawn in probability mode.

Decision Tree by Use Case

| Use case | Encoding | Background | Tool |
|----------|----------|------------|------|
| TF binding motif (JASPAR/CIS-BP PWM) | bits | uniform OR genome composition | ggseqlogo, Logomaker |
| Splice-site motif (5'SS, 3'SS) | bits | uniform | ggseqlogo |
| CRISPR sgRNA position-composition | probability | – | Logomaker |
| Protein motif (kinase substrate) | bits | proteome composition | Logomaker (transform_matrix, to_type='information') |
| Alignment-conservation cartoon | bits OR probability | depends on intent | WebLogo |
| Differential motif (TF-A vs TF-B) | EDLogo-style log-odds | TF-B | Logomaker (transform_matrix, to_type='weight') |

ggseqlogo (R) -- Canonical Bioinformatics Default

Goal: Render a sequence motif as a per-position letter stack whose total height encodes information content (Schneider-Stephens 1990) and individual letter heights reflect frequency, optionally corrected for genome background.

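Approach: Pass a PWM matrix (rows = letters, columns = positions) or a vector of aligned, same-length sequences to ggseqlogo() with method = 'bits'; stack multiple motifs by passing a named list. The bits method assumes a uniform background; for a genome-specific background, check ?ggseqlogo in the installed version or precompute corrected heights and plot them with method = 'custom'.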

library(ggseqlogo)

# Input: PWM matrix (rows = letters A/C/G/T, columns = positions)
# or aligned sequence vector

# From a vector of aligned sequences (same length)
seqs <- c('ATGCAA', 'ATGCAC', 'ATGCAG', 'ATGCAT', 'ACGCAA')
ggseqlogo(seqs, method = 'bits')

# From a PWM matrix (probability or counts)
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.7, 0.1, 0.1,
                0.4, 0.1, 0.4, 0.1), ncol = 3,
              dimnames = list(c('A', 'C', 'G', 'T'), NULL))
ggseqlogo(pwm, method = 'bits')           # 'bits' OR 'probability'

# Multiple logos stacked (e.g., compare TF-A and TF-B)
# seqs_a, seqs_b: placeholders for your own aligned, equal-length site vectors
ggseqlogo(list(TFA = seqs_a, TFB = seqs_b),
          method = 'bits',
          col_scheme = 'nucleotide')

# Custom color scheme (protein motif, kinase substrate)
# protein_pwm: placeholder amino-acid PWM (rows = residues, columns = positions)
ggseqlogo(protein_pwm,
          method = 'bits',
          seq_type = 'aa',                      # auto-detected usually
          col_scheme = make_col_scheme(
              chars = c('S','T','Y','K','R','H','D','E','A','V','L','I','M'),
              cols  = c('#D55E00','#D55E00','#D55E00',          # phospho-acceptors
                        '#0072B2','#0072B2','#0072B2',          # basic
                        '#CC79A7','#CC79A7',                    # acidic
                        '#009E73','#009E73','#009E73','#009E73','#009E73')))  # hydrophobic

Logomaker (Python) -- Most Flexible

import logomaker
import pandas as pd

# Counts matrix (rows = position, columns = ACGT)
counts_df = pd.DataFrame({'A': [10, 0, 5, 8],
                          'C': [0, 8, 5, 1],
                          'G': [0, 2, 0, 0],
                          'T': [0, 0, 0, 1]})

# Convert counts -> information (bits)
ic_df = logomaker.transform_matrix(counts_df,
                                    from_type='counts',
                                    to_type='information',
                                    background=[0.25] * 4)    # uniform; pass real background for corrected IC

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 2))
logo = logomaker.Logo(ic_df,
                      ax=ax,                            # draw into the axes created above
                      color_scheme='classic',           # 'NajafabadiEtAl2017' for protein
                      font_name='Arial Rounded MT Bold')  # must be installed; see logomaker.list_font_names()
logo.style_xticks(rotation=0)
logo.ax.set_ylabel('Bits')

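# Weight matrix (signed log-odds) -- enrichment vs depletion
genome_composition = [0.29, 0.21, 0.21, 0.29]   # A, C, G, T; approximate human background
weight_df = logomaker.transform_matrix(counts_df,
                                        from_type='counts',
                                        to_type='weight',
                                        background=genome_composition)
logo = logomaker.Logo(weight_df, color_scheme='classic',
                       flip_below=True,                  # draw below-axis (depleted) letters upside-down
                       shade_below=0.5,                  # darken negative-weight letters
                       fade_below=0.5)                   # make negative-weight letters translucent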
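
To make the arithmetic behind a signed weight logo concrete, here is a minimal pandas/NumPy sketch of plain log-odds, log2(p / b), with a pseudocount of 1 per letter. `log_odds_matrix` and `human_bg` are hypothetical names. Logomaker's transform_matrix performs a comparable conversion, but its pseudocount and centering options may differ, so treat this as an illustration rather than a reimplementation; it is also not the exact EDLogo scaling of Dey et al.

```python
import numpy as np
import pandas as pd


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Signed weights log2(p / b): > 0 enriched, < 0 depleted vs background.

    counts: DataFrame, rows = positions, columns = letters (Logomaker orientation).
    background: dict mapping each letter to its background frequency.
    """
    smoothed = counts + pseudocount                     # avoid log2(0) for unseen letters
    probs = smoothed.div(smoothed.sum(axis=1), axis=0)  # row-normalize to frequencies
    bg = pd.Series(background)[counts.columns]          # align background to column order
    return np.log2(probs / bg)                          # DataFrame / Series aligns on columns


counts_df = pd.DataFrame({'A': [10, 0, 5, 8],
                          'C': [0, 8, 5, 1],
                          'G': [0, 2, 0, 0],
                          'T': [0, 0, 0, 1]})
human_bg = {'A': 0.29, 'C': 0.21, 'G': 0.21, 'T': 0.29}   # approximate human composition

weights = log_odds_matrix(counts_df, human_bg)
print(weights.round(2))
```

Positive cells become letters above the axis and negative cells letters below it, which is what flip_below, shade_below, and fade_below style.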

WebLogo (CLI / web)

weblogo --format pdf --sequence-type dna \
        --color-scheme classic --units bits \
        --composition equiprobable \
        --fineprint '' \
        --size large \
        < aligned.fasta > logo.pdf

WebLogo (Crooks 2004) is the long-standing web and command-line implementation; it reads many input formats and is scriptable. For reproducible multi-panel figures, prefer ggseqlogo or Logomaker, which return plot objects you can compose programmatically.

Background Composition Correction

The bits encoding assumes a uniform background by default. For genome-derived motifs, the background should match the genome:

Human genome: A=0.29, C=0.21, G=0.21, T=0.29 (approx)
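GC-rich genomes (e.g. Streptomyces, ~72% GC): A=0.14, C=0.36, G=0.36, T=0.14 (approx)

Without correction, bases that are rare in the genome are under-credited and common bases over-credited: a GC-preferring motif in an AT-rich (GC-poor) genome carries more information than a uniform background reports, while an A-rich motif in an AT-rich genome carries less.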
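
A minimal sketch of background correction in plain Python, using the standard relative-entropy form sum_i p_i log2(p_i / q_i) as the corrected information content; with a uniform background it reduces to log2(K) − H(p). `relative_entropy_bits` is a hypothetical helper name. With the approximate human background, a fully conserved A falls to about 1.79 bits and a fully conserved G rises to about 2.25 bits, compared with 2 bits each under a uniform background.

```python
import math


def relative_entropy_bits(probs, background):
    """Background-corrected IC: sum_i p_i * log2(p_i / q_i), in bits.

    With a uniform background this equals log2(K) - H(p).
    """
    return sum(p * math.log2(p / q) for p, q in zip(probs, background) if p > 0)


UNIFORM = [0.25, 0.25, 0.25, 0.25]
HUMAN = [0.29, 0.21, 0.21, 0.29]    # approximate A, C, G, T composition

all_a = [1.0, 0.0, 0.0, 0.0]        # position fully conserved for A
all_g = [0.0, 0.0, 1.0, 0.0]        # position fully conserved for G

for name, col in [("all A", all_a), ("all G", all_g)]:
    print(f"{name}: uniform bg {relative_entropy_bits(col, UNIFORM):.2f} bits, "
          f"human bg {relative_entropy_bits(col, HUMAN):.2f} bits")
```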

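# ggseqlogo: the background argument is version-dependent (assumed here as `bg_freq`);
# confirm with ?ggseqlogo, or precompute corrected heights and plot with method = 'custom'
ggseqlogo(pwm, method = 'bits',
          bg_freq = c(A = 0.29, C = 0.21, G = 0.21, T = 0.29))

# Logomaker: background is given in the column order of counts_df (A, C, G, T)
import logomaker
logomaker.transform_matrix(counts_df, from_type='counts', to_type='information',
                           background=[0.29, 0.21, 0.21, 0.29])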

Per-Method Failure Modes

Probability encoding mistaken for bits

Trigger: Default method = 'probability' in some implementations.

Mechanism: Every stack has total height 1, so the logo looks flat and no position stands out as more conserved than another.

Symptom: Reviewer asks "why doesn't the logo show conservation gradient?"

Fix: Use method = 'bits' for the standard motif encoding.

Background uniform when genome composition matters

Trigger: Uniform bg_freq = c(0.25, 0.25, 0.25, 0.25) for a non-uniform genome.

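Mechanism: Against a uniform background, bases that are common in the genome are over-credited and bases that are rare are under-credited.

Symptom: Motifs rich in genome-common bases (e.g. A/T in human) look more conserved than they are; motifs rich in genome-rare bases look less conserved.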

Fix: Pass genome composition to bg_freq / background parameter.

PWM rows/columns reversed

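Trigger: A letters-as-rows matrix (the ggseqlogo convention) passed to Logomaker, which expects positions as rows and letters as columns, or the reverse.

Mechanism: Logo renders the wrong dimension as "position."

Symptom: The x-axis shows as many positions as there are letters (4 for DNA) instead of the motif length, or the tool rejects the column labels.

Fix: Transpose (df.T in pandas, t() in R) and check the shape (df.shape / dim()) before plotting.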
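
A minimal pandas sketch of the orientation check, assuming a DNA alphabet and a letters-as-rows input built from the same values as the ggseqlogo PWM example. `to_positions_by_letters` is a hypothetical helper: it transposes only when letters sit on the rows and refuses anything whose columns are not alphabet letters.

```python
import pandas as pd

# Letters-as-rows PWM (ggseqlogo orientation): 4 letters x 3 positions
pwm_letters_by_pos = pd.DataFrame(
    [[0.7, 0.1, 0.4],
     [0.1, 0.7, 0.1],
     [0.1, 0.1, 0.4],
     [0.1, 0.1, 0.1]],
    index=["A", "C", "G", "T"],
)


def to_positions_by_letters(pwm, alphabet=("A", "C", "G", "T")):
    """Return a positions-as-rows, letters-as-columns matrix (Logomaker orientation)."""
    letters = set(alphabet)
    if set(pwm.index) <= letters and not set(pwm.columns) <= letters:
        pwm = pwm.T                      # letters were on the rows: transpose
    if not set(pwm.columns) <= letters:
        raise ValueError(f"unexpected column labels: {list(pwm.columns)}")
    return pwm.reset_index(drop=True)    # positions numbered 0..L-1


df = to_positions_by_letters(pwm_letters_by_pos)
print(df.shape)   # (3, 4): 3 positions x 4 letters
```

An already correctly oriented matrix passes through unchanged, so the check is safe to run unconditionally before plotting.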

Custom alphabet not recognized

Trigger: RNA logo with U instead of T; protein logo with J or Z.

Mechanism: ggseqlogo and logomaker auto-detect alphabet from input; unusual characters may fail.

Symptom: Letters render as boxes or missing entirely.

Fix: Explicit seq_type = 'rna' (ggseqlogo) or pass custom color scheme (Logomaker).
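
A quick pre-flight check can catch alphabet problems before plotting. This is a heuristic sketch; `guess_seq_type` and `unexpected_chars` are hypothetical helpers whose labels mirror the seq_type values used in this document ('dna', 'rna', 'aa'). Order matters: a string made only of A/C/G/T is called DNA even though those letters are also amino acids, and DNA containing the ambiguity code N would be classed as protein (N is asparagine), so pass seq_type explicitly when in doubt.

```python
DNA = set("ACGT")
RNA = set("ACGU")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")   # the 20 standard residues
GAPS = {"-", "."}


def guess_seq_type(seqs):
    """Guess 'dna', 'rna', 'aa', or 'other' from the characters present."""
    chars = set("".join(seqs).upper()) - GAPS
    if chars <= DNA:
        return "dna"
    if chars <= RNA:
        return "rna"
    if chars <= AMINO_ACIDS:
        return "aa"
    return "other"


def unexpected_chars(seqs, alphabet):
    """Characters outside `alphabet` (likely to render as boxes or vanish)."""
    return sorted(set("".join(seqs).upper()) - alphabet - GAPS)


print(guess_seq_type(["ACGUAC", "AUGCAU"]))      # rna
print(unexpected_chars(["MKJZ"], AMINO_ACIDS))   # ['J', 'Z']
```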

Aligned sequences of unequal length

Trigger: Vector of motif instances with different lengths.

Mechanism: Most tools require equal-length input.

Symptom: Error or only first N positions plotted.

Fix: Re-extract fixed-width sites (e.g. the aligned sites reported by MEME) or trim to a common length.
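
A minimal sketch of the length check and the simplest trim, in plain Python. `common_length` and `trim_to_shortest` are hypothetical helpers, and the left-anchored trim assumes every site is already aligned at its first position; if sites are offset, re-extracting fixed-width windows is the safer fix.

```python
from collections import Counter


def common_length(seqs):
    """Return the shared length of aligned sequences, or raise if lengths differ."""
    if not seqs:
        raise ValueError("no sequences given")
    lengths = Counter(len(s) for s in seqs)
    if len(lengths) > 1:
        raise ValueError(f"unequal lengths (length: count) {dict(lengths)}")
    return next(iter(lengths))


def trim_to_shortest(seqs):
    """Keep the first min-length characters of every sequence.

    Only valid when all sites are already anchored at their first position.
    """
    n = min(len(s) for s in seqs)
    return [s[:n] for s in seqs]


sites = ["ATGCAA", "ATGCA", "ATGCAAT"]
trimmed = trim_to_shortest(sites)
print(trimmed, common_length(trimmed))   # ['ATGCA', 'ATGCA', 'ATGCA'] 5
```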

Logo for too few input sequences

Trigger: PWM from N=5 sequences plotted as if N=500.

Mechanism: Information content has small-N bias; even random sequences look "conserved" at N=5.

Symptom: Logo appears more meaningful than the input warrants.

Fix: Apply the small-sample correction (Schneider 1986; standard in MEME), state N in the caption, and require N ≥ 20 before presenting a motif as credible.
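
The small-N bias is easy to see by simulation. This plain-Python sketch draws alignment columns uniformly at random (true information content 0) and averages the uncorrected IC; at N=5 the average is roughly half a bit, shrinking toward 0 as N grows. `column_ic`, `schneider_correction`, and `mean_random_ic` are hypothetical helpers, and schneider_correction uses the large-n approximation e(n) ≈ (s − 1) / (2 ln(2) n), which is only approximate at very small n.

```python
import math
import random


def column_ic(column, alphabet="ACGT"):
    """Uncorrected IC (bits) of one alignment column from observed frequencies."""
    n = len(column)
    h = 0.0
    for letter in alphabet:
        p = column.count(letter) / n
        if p > 0:
            h -= p * math.log2(p)
    return math.log2(len(alphabet)) - h


def schneider_correction(n, k=4):
    """Large-n approximation of the small-sample correction: (k - 1) / (2 ln(2) n) bits."""
    return (k - 1) / (2 * math.log(2) * n)


def mean_random_ic(n, trials=1000, seed=0):
    """Average uncorrected IC of uniformly random columns of n sequences (true IC = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        column = "".join(rng.choice("ACGT") for _ in range(n))
        total += column_ic(column)
    return total / trials


for n in (5, 20, 100, 500):
    print(f"N={n:4d}  mean IC of random columns = {mean_random_ic(n):.3f} bits  "
          f"e(n) approx = {schneider_correction(n):.3f} bits")
```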

Stacked logos with different alphabets compared

Trigger: Stacking a DNA logo above a protein logo for visual comparison.

Mechanism: Maximum information content differs (2 bits DNA vs 4.3 bits protein); y-axes are not comparable.

Symptom: One logo looks weaker or stronger only because its alphabet has a different ceiling (2 vs 4.32 bits).

Fix: Normalize each to fractional information (IC / log2(K), range 0-1) OR present separately.
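
A minimal sketch of the normalization in plain Python: divide each position's IC by log2(K) so DNA and protein logos share a 0-1 axis. `fractional_ic` and both IC vectors are hypothetical illustrations, not data.

```python
import math


def fractional_ic(ic_bits, alphabet_size):
    """Scale per-position information content to 0-1 by dividing by log2(K)."""
    return [ic / math.log2(alphabet_size) for ic in ic_bits]


dna_ic = [2.0, 1.0, 0.5]        # hypothetical per-position IC, DNA (max 2 bits)
protein_ic = [4.0, 1.0, 0.5]    # hypothetical per-position IC, protein (max ~4.32 bits)

print(fractional_ic(dna_ic, 4))        # [1.0, 0.5, 0.25]
print(fractional_ic(protein_ic, 20))
```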

Reconciliation: When Logos Differ

| Pattern | Cause | Action |
|---------|-------|--------|
| ggseqlogo vs Logomaker show different heights | Different default backgrounds (uniform vs explicit) | Standardize background; recompute |
| WebLogo vs Logomaker differ at low-conservation positions | Small-sample correction differs | Use consistent N; report sample-corrected IC |
| JASPAR vs MEME PWM look different | JASPAR uses observed counts; MEME has Dirichlet prior | Document source; cite version |

Quantitative Thresholds

| Threshold | Value | Source |
|-----------|-------|--------|
| Max IC per DNA position | 2 bits (log2(4)) | Schneider-Stephens 1990 |
| Max IC per protein position | 4.32 bits (log2(20)) | Schneider-Stephens 1990 |
| Min N for credible motif | ≥ 20 instances; ≥ 100 ideal | Common practice |
| Small-sample correction | Schneider 1986 entropy correction | MEME default |
| Typical TF binding-site length | 6-20 bp | Biology |

Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Logo flat, every stack height = 1 | Probability mode | Switch to bits |
| Motif looks too conserved | Uniform background in non-uniform genome | Pass genome background (bg_freq / background) |
| Number of positions = alphabet size, not motif length | Matrix transposed | Verify shape; transpose |
| RNA U renders as box | Alphabet not recognized | seq_type = 'rna' |
| Logos at different scales overlaid | Different alphabets | Normalize OR separate |
| Logo from N=5 looks meaningful | Small-sample bias | Require N ≥ 20; annotate N |

References

Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res 14(6):1188-1190.
Dey KK, Xie D, Stephens M. 2018. A new sequence logo plot to highlight enrichment and depletion. bioRxiv.
Schneider TD. 1986. Information content of binding sites on nucleotide sequences. J Mol Biol 188(3):415-431.
Schneider TD, Stephens RM. 1990. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 18(20):6097-6100.
Tareen A, Kinney JB. 2020. Logomaker: beautiful sequence logos in Python. Bioinformatics 36(7):2272-2274.
Wagih O. 2017. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33(22):3645-3647.

Related Skills

chip-seq/motif-analysis - Discover the PWM that becomes the logo
atac-seq/footprinting - Footprinting motifs to visualize
clip-seq/clip-motif-analysis - CLIP-derived motifs
alignment/multiple-alignment - Aligned sequences as logo input
data-visualization/color-palettes - Custom alphabet color schemes
