ribo-seq/ribosome-stalling/SKILL.md
Detect ribosome pausing and stalling sites from Ribo-seq data at codon resolution. Use when studying translational regulation, identifying pause sites, or analyzing codon-specific translation dynamics.
npx skillsauth add GPTomics/bioSkills bio-ribo-seq-ribosome-stalling
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Reference examples tested with: BioPython 1.83+, numpy 1.26+, scipy 1.12+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Find ribosome pause sites in my data" -> Detect codon-level ribosome stalling and pausing events from Ribo-seq footprint density, identifying positions with abnormally high ribosome occupancy.
Tools: plastid for codon-resolution density calculation, scipy for statistical scoring.

Ribosome stalling/pausing occurs when ribosomes slow or stop at specific codons.
Goal: Quantify ribosome occupancy at each codon position across all transcripts.
Approach: Map reads to P-sites using a fixed offset, then bin counts into codons along each CDS.
from collections import defaultdict

import numpy as np
from plastid import BAMGenomeArray, GTF2_TranscriptAssembler, FivePrimeMapFactory

def get_codon_occupancy(bam_path, gtf_path, psite_offset=12):
    '''Calculate ribosome occupancy per codon'''
    # Load reads, mapping each to its P-site at a fixed offset from the 5' end
    alignments = BAMGenomeArray(
        bam_path,
        mapping=FivePrimeMapFactory(offset=psite_offset)
    )
    transcripts = list(GTF2_TranscriptAssembler(gtf_path))
    codon_counts = defaultdict(lambda: defaultdict(int))
    for tx in transcripts:
        if tx.cds_start is None:
            continue  # skip non-coding transcripts
        cds = tx.get_cds()
        # Per-nucleotide P-site counts along the CDS, in 5' to 3' order
        counts = cds.get_counts(alignments)
        # Sum the three nucleotides of each complete codon
        for i in range(0, len(counts) - 2, 3):
            codon_pos = i // 3
            codon_counts[tx.get_name()][codon_pos] = counts[i:i+3].sum()
    return codon_counts
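The codon-binning step can be checked on a plain array without any alignment files. A minimal sketch, assuming the per-nucleotide counts are already in 5' to 3' CDS order; `bin_counts_into_codons` is a hypothetical helper for illustration, not part of plastid:

```python
import numpy as np

def bin_counts_into_codons(nt_counts):
    """Sum per-nucleotide P-site counts into per-codon counts.

    Trailing nucleotides that do not form a complete codon are dropped.
    """
    nt_counts = np.asarray(nt_counts, dtype=float)
    n_codons = len(nt_counts) // 3
    return nt_counts[: n_codons * 3].reshape(n_codons, 3).sum(axis=1)

# Ten nucleotides: three complete codons plus one leftover position
codon_counts = bin_counts_into_codons([0, 2, 1, 5, 0, 0, 3, 3, 3, 7])
print(codon_counts)
```

Reshaping to `(n_codons, 3)` avoids a Python loop and makes the frame assumption explicit: position 0 of the vector must be the first nucleotide of the start codon.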
Goal: Detect codon positions with significantly elevated ribosome occupancy indicative of translational pausing.
Approach: Z-score normalize occupancy per transcript and flag positions exceeding a threshold (default z > 3).
def find_pause_sites(codon_occupancy, threshold_zscore=3):
    '''Find positions with significantly elevated ribosome occupancy

    Pause sites have much higher occupancy than surrounding codons
    '''
    pause_sites = []
    for tx, occupancy in codon_occupancy.items():
        values = np.array(list(occupancy.values()))
        if len(values) < 10 or values.sum() < 100:
            continue
        # Z-score normalization
        mean_occ = values.mean()
        std_occ = values.std()
        if std_occ == 0:
            continue
        zscores = (values - mean_occ) / std_occ
        # Find positions above threshold
        for pos, zscore in enumerate(zscores):
            if zscore > threshold_zscore:
                pause_sites.append({
                    'transcript': tx,
                    'codon_position': pos,
                    'occupancy': values[pos],
                    'zscore': zscore
                })
    return pause_sites
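The z-score thresholding above can be tried on synthetic data before running it on real footprints. A minimal sketch with a hypothetical helper `zscore_pause_positions` and made-up counts (50 codons at uniform coverage with one spike):

```python
import numpy as np

def zscore_pause_positions(codon_counts, threshold=3.0):
    """Return (position, z-score) pairs whose z-score exceeds threshold."""
    values = np.asarray(codon_counts, dtype=float)
    std = values.std()  # population standard deviation, as in np.std default
    if std == 0:
        return []  # flat profile: no position stands out
    z = (values - values.mean()) / std
    hits = np.flatnonzero(z > threshold)
    return [(int(i), float(z[i])) for i in hits]

# Synthetic transcript: 50 codons with 10 reads each, one spike at codon 20
counts = np.full(50, 10.0)
counts[20] = 200.0
print(zscore_pause_positions(counts))
```

With the population standard deviation, no single position in a transcript of n codons can have a z-score above sqrt(n - 1). A transcript with exactly 10 codons therefore can never exceed z = 3, so short transcripts are effectively excluded at the default threshold.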
Goal: Calculate average ribosome occupancy for each of the 64 codon types across all genes.
Approach: Aggregate read density per codon identity across all CDS positions and compute per-codon mean occupancy.
from Bio import SeqIO
from Bio.Seq import Seq

def codon_occupancy_table(bam_path, gtf_path, genome_fasta, psite_offset=12):
    '''Calculate average occupancy per codon type

    genome_fasta: path to the genome FASTA; plastid's get_sequence() needs
    the chromosome sequences to splice out each CDS
    '''
    genome = SeqIO.to_dict(SeqIO.parse(genome_fasta, 'fasta'))
    alignments = BAMGenomeArray(bam_path,
                                mapping=FivePrimeMapFactory(offset=psite_offset))
    transcripts = list(GTF2_TranscriptAssembler(gtf_path))
    # Reads summed per codon instance, grouped by codon identity
    codon_reads = defaultdict(list)
    for tx in transcripts:
        if tx.cds_start is None:
            continue
        cds = tx.get_cds()
        cds_seq = str(cds.get_sequence(genome)).upper()
        # Per-nucleotide P-site counts along the CDS
        density = cds.get_counts(alignments)
        for i in range(0, len(cds_seq) - 2, 3):
            codon = cds_seq[i:i+3]
            if len(density) > i + 2:
                codon_reads[codon].append(sum(density[i:i+3]))
    # Calculate mean occupancy per codon
    codon_means = {codon: np.mean(reads) for codon, reads in codon_reads.items()}
    return codon_means
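The aggregation itself does not depend on plastid and can be exercised on in-memory data. A minimal sketch, assuming each CDS is supplied as a (sequence, per-nucleotide counts) pair; `mean_occupancy_by_codon` and the toy records are hypothetical:

```python
from collections import defaultdict

import numpy as np

def mean_occupancy_by_codon(cds_records):
    """Average summed counts per codon identity.

    cds_records: iterable of (cds_sequence, per_nucleotide_counts) pairs.
    """
    codon_reads = defaultdict(list)
    for seq, counts in cds_records:
        counts = np.asarray(counts, dtype=float)
        # Use only complete codons covered by both sequence and counts
        usable = min(len(seq), len(counts)) // 3 * 3
        for i in range(0, usable, 3):
            codon_reads[seq[i:i + 3].upper()].append(counts[i:i + 3].sum())
    return {codon: float(np.mean(reads)) for codon, reads in codon_reads.items()}

records = [
    ("ATGCCGAAA", [1, 0, 2, 4, 4, 4, 0, 1, 0]),
    ("ATGAAACCG", [0, 0, 0, 1, 1, 1, 2, 2, 2]),
]
table = mean_occupancy_by_codon(records)
print(table)
```

Raw means weight highly expressed genes more heavily. One option, not shown here, is to divide each transcript's counts by its own mean codon count before pooling.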
Goal: Test whether ribosome pausing correlates with tRNA availability across codons.
Approach: Compute Spearman rank correlation between per-codon occupancy and tRNA abundance; expect a negative relationship.
def correlate_with_trna(codon_occupancy, trna_abundance):
    '''Test if pausing correlates with tRNA availability

    Rare codons (low tRNA) should have higher occupancy
    '''
    from scipy import stats
    # Compare only codons present in both tables
    codons = list(set(codon_occupancy.keys()) & set(trna_abundance.keys()))
    occ = [codon_occupancy[c] for c in codons]
    trna = [trna_abundance[c] for c in codons]
    corr, pval = stats.spearmanr(occ, trna)
    return corr, pval  # Expect negative correlation
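To illustrate the expected sign of the correlation, here is a small sketch on made-up numbers. The occupancy and tRNA values below are invented for illustration only; real tRNA abundance would come from an external measurement or a proxy chosen by the analyst:

```python
from scipy import stats

# Hypothetical per-codon mean occupancy and tRNA abundance (arbitrary units)
codon_occupancy = {"AAA": 1.0, "CCG": 4.0, "GGT": 2.5, "CGA": 6.0, "TTT": 1.8}
trna_abundance = {"AAA": 30.0, "CCG": 5.0, "GGT": 12.0, "CGA": 1.0,
                  "TTT": 20.0, "ATG": 9.0}

# Only codons present in both tables can be compared
shared = sorted(set(codon_occupancy) & set(trna_abundance))
rho, pval = stats.spearmanr(
    [codon_occupancy[c] for c in shared],
    [trna_abundance[c] for c in shared],
)
print(shared, rho, pval)
```

In this toy input the two rankings are exact opposites, so rho is -1. Spearman is rank-based, so it captures any monotonic relationship without assuming linearity between occupancy and abundance.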
Goal: Extract amino acid sequence context around identified pause sites to discover recurrent motifs.
Approach: Translate the coding region flanking each pause site and collect fixed-width windows for motif analysis.
def extract_pause_motifs(pause_sites, sequences, window=10):
    '''Extract amino acid context around pause sites'''
    motifs = []
    for site in pause_sites:
        tx = site['transcript']
        pos = site['codon_position']
        seq = sequences.get(tx, '')
        if len(seq) > pos * 3 + window * 3:
            # Codon-aligned nucleotide window around the pause site
            start = max(0, (pos - window) * 3)
            end = min(len(seq), (pos + window + 1) * 3)
            aa_seq = str(Seq(seq[start:end]).translate())
            motifs.append(aa_seq)
    return motifs
| Motif | Description |
|-------|-------------|
| PPP | Polyproline (ribosome tunnel interaction) |
| XPX | Proline-containing |
| D/E-rich | Negatively charged nascent chain |
| Stop codon context | Influenced by nucleotides around stop |
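Once amino acid windows have been extracted, they can be scanned for the sequence motifs in the table. A minimal sketch using only the standard library; the regular expressions are assumptions made for illustration (in particular, "D/E-rich" is taken here to mean three or more consecutive acidic residues), and stop codon context is omitted because it is defined at the nucleotide level:

```python
import re
from collections import Counter

# Hypothetical patterns for the amino acid motifs in the table
MOTIF_PATTERNS = {
    "PPP": re.compile(r"PPP"),
    "XPX": re.compile(r".P."),            # proline flanked by any residues
    "D/E-rich": re.compile(r"[DE]{3,}"),  # assumed: run of >= 3 acidic residues
}

def count_motif_windows(windows):
    """Count how many windows contain each motif at least once."""
    hits = Counter()
    for window in windows:
        for name, pattern in MOTIF_PATTERNS.items():
            if pattern.search(window):
                hits[name] += 1
    return hits

# Made-up amino acid windows standing in for extracted pause-site contexts
windows = ["MKPPPGLA", "AAEDDEKL", "GLAVKTS", "RSPGQQ"]
motif_hits = count_motif_windows(windows)
print(motif_hits)
```

Comparing these counts against the same scan over windows drawn from non-pause positions gives a rough sense of whether a motif is enriched at pause sites.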