immunoinformatics/epitope-prediction/SKILL.md
Predict B-cell and T-cell epitopes for vaccine antigen design and epitope mapping with BepiPred-3.0, DiscoTope-3.0, the IEDB tools, and EL-mode MHC presentation. Encodes the load-bearing asymmetry that T-cell epitope prediction is mature (it reduces to MHC presentation, AUC>0.9) while B-cell prediction is unreliable (linear predictors ~AUC 0.6 because ~90% of real epitopes are conformational) — so structure-based DiscoTope-3.0 on AlphaFold models is the only defensible B-cell path, propensity scales are obsolete, and NetChop is largely redundant on EL-trained models. Use when mapping epitopes or selecting vaccine antigens. MHC binding lives in mhc-binding-prediction.
Reference examples tested with: BepiPred-3.0, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- pip show <package> then help(module.function) to check signatures
- <tool> --version then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Notes specific to this skill: BepiPred-3.0 ships as the bepipred3 package and auto-downloads ESM-2 weights on first run; its default threshold is 0.1512 (NOT 0.5). DiscoTope-3.0, ElliPro, SEPPA, NetChop, and NetCTLpan are standalone/web (IEDB or DTU). The IEDB classic and next-generation REST APIs wrap most predictors. Re-verify thresholds and the supported-method list against current docs.
"Predict the B-cell and T-cell epitopes in my antigen" -> Identify antibody-binding (B-cell) and MHC-presented (T-cell) immunogenic regions, with appropriately different confidence for each.
bepipred3 for linear B-cell epitopes; IEDB REST API for B-cell/T-cell tools
T-cell epitope prediction is mature and trustworthy because it reduces to MHC binding/presentation: a sharply constrained problem (a peptide fits the groove or it does not) with an enormous mass-spec eluted-ligand training corpus. NetMHCpan-4.1 and MHCflurry routinely exceed AUC 0.9 for class I. B-cell epitope prediction is unreliable: linear sequence-based predictors land around AUC 0.6, and even the ESM-2-based BepiPred-3.0 falls to AUC 0.663 on the real IEDB external test set. This is structural, not a tuning problem the next network will fix: ~90% of natural B-cell epitopes are conformational/discontinuous (residues clustered in 3D but far apart in sequence), and a sequence-only model is blind to them by construction. The single most damaging mistake in this domain is letting the well-deserved confidence in MHC/T-cell prediction leak into unwarranted confidence in B-cell prediction. Write down which problem is being solved before running anything.
| Tool | Citation | Target | Input | When |
|------|----------|--------|-------|------|
| NetMHCpan-4.1 EL / MHCflurry | Reynisson 2020; O'Donnell 2020 | T-cell (MHC-I presentation) | sequence + HLA | Default T-cell path; EL encodes processing |
| NetMHCIIpan / NetCTLpan | Nilsson 2023; Stranzl 2010 | T-cell (CD4 / integrated CTL) | sequence + HLA | CD4 epitopes; integrated cleavage+TAP+MHC |
| DiscoTope-3.0 | Høie 2024 | B-cell (conformational) | 3D structure (AlphaFold OK) | The only defensible B-cell method when a structure exists |
| BepiPred-3.0 | Clifford 2022 | B-cell (linear) | sequence | Linear/denatured-target reagents; misses ~90% native |
| ElliPro / SEPPA 3.0 | Ponomarenko 2008; Zhou 2019 | B-cell (conformational) | 3D structure | Fast geometric baseline; SEPPA for glycoproteins |
| Propensity scales | Kolaskar 1990 etc. | B-cell (linear) | sequence | Obsolete; decoration, not data |
| Scenario | Recommended | Why |
|----------|-------------|-----|
| T-cell (CD8) epitopes | NetMHCpan-4.1 EL / MHCflurry | Mature; defer to mhc-binding-prediction |
| T-cell (CD4) epitopes | NetMHCIIpan-4.3 | Defer to mhc-class-ii-prediction; less reliable |
| B-cell, structure available or foldable | DiscoTope-3.0 on AlphaFold model | Conformational; ~no penalty for predicted structures |
| B-cell glycoprotein (Env/S/HA) | SEPPA 3.0 | Models glycan shielding |
| B-cell, sequence only, peptide/denatured target | BepiPred-3.0 (linear/top-X%) | Legitimate narrow use; state the conformational caveat |
| B-cell, sequence only, native antibody response | Fold a structure first, then DiscoTope-3.0 | Linear prediction structurally cannot see native epitopes |
| Broadly-protective vaccine | + conservation + HLA population coverage | A high-scoring epitope in a hypervariable loop is worthless |
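The conservation requirement in the last row can be illustrated with a minimal sketch. It scores a candidate peptide by the fraction of strain sequences that contain it as an exact substring. The function name `epitope_conservation`, the toy strain names, and the toy sequences are all hypothetical, and exact matching is a simplification: a dedicated conservancy tool would typically allow an identity threshold and work from an alignment.

```python
def epitope_conservation(peptide, strain_sequences):
    """Fraction of strains whose sequence contains `peptide` exactly.

    strain_sequences: dict mapping strain name -> protein sequence.
    Exact substring matching is an assumption made for this sketch.
    """
    if not strain_sequences:
        raise ValueError("need at least one strain sequence")
    hits = sum(peptide in seq for seq in strain_sequences.values())
    return hits / len(strain_sequences)


# Toy, made-up sequences used only to exercise the function.
strains = {
    "strain_A": "MKTIIALSYIFCLVFADYKDDDDK",
    "strain_B": "MKTIIALSYIFCLVFAQYKDDDDK",
    "strain_C": "MKTIIALSYIFCLALADYKDDDDK",
}

variable_site = epitope_conservation("ALSYIFCLV", strains)   # absent from strain_C
conserved_site = epitope_conservation("YKDDDDK", strains)    # present in all three
print(f"ALSYIFCLV: {variable_site:.2f}, YKDDDDK: {conserved_site:.2f}")
```

A candidate that scores well for presentation or surface exposure but has low conservation would be deprioritized for a broadly-protective design.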
Goal: Score per-residue linear B-cell epitope probability from sequence, for a linear/denatured-target use case.
Approach: Run the bepipred3 CLI (or package) on a FASTA; it emits per-residue probabilities, a binary FASTA (upper = epitope), and top-X% selections. Use the default threshold 0.1512 or the top-X% mode; treat output as a hypothesis that misses most native conformational epitopes.
# bepipred3 auto-downloads ESM-2 weights on first run; default threshold 0.1512 (NOT 0.5)
python bepipred3_CLI.py -i antigen.fasta -o bp3_out/ -pred vt_pred -t 0.1512
# or select the top 20% scoring residues per sequence instead of a fixed cutoff:
python bepipred3_CLI.py -i antigen.fasta -o bp3_out/ -pred vt_pred -top 20
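To make the two calling modes concrete, the sketch below applies either a fixed cutoff or a top-fraction cutoff to a list of per-residue scores and merges adjacent called residues into segments. It does not read BepiPred-3.0 output files; the function name `call_linear_epitopes`, the use of `>=` at the cutoff, and the toy scores are assumptions made for illustration, so check the tool's own output before relying on any of them.

```python
DEFAULT_THRESHOLD = 0.1512  # BepiPred-3.0 default quoted above (not 0.5)


def call_linear_epitopes(scores, threshold=DEFAULT_THRESHOLD, top_fraction=None):
    """Return 1-based inclusive (start, end) segments of called residues.

    scores: per-residue scores in sequence order.
    top_fraction: if given (e.g. 0.2), the cutoff becomes the score of the
    lowest residue inside the top-scoring fraction; ties at the cutoff are kept.
    """
    scores = [float(s) for s in scores]
    if not scores:
        return []
    if top_fraction is not None:
        n_keep = max(1, round(len(scores) * top_fraction))
        cutoff = sorted(scores, reverse=True)[n_keep - 1]
    else:
        cutoff = threshold

    segments, start = [], None
    for pos, score in enumerate(scores, start=1):
        if score >= cutoff and start is None:
            start = pos                        # open a new segment
        elif score < cutoff and start is not None:
            segments.append((start, pos - 1))  # close the open segment
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments


toy_scores = [0.05, 0.20, 0.30, 0.10, 0.16, 0.02]  # made-up values
fixed_calls = call_linear_epitopes(toy_scores)
top_calls = call_linear_epitopes(toy_scores, top_fraction=0.2)
print(fixed_calls, top_calls)
```

Note how a cutoff of 0.5 would call nothing on these toy scores, which mirrors the "few/no epitopes" symptom that comes from assuming a 0.5 threshold.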
Goal: Identify antibody-accessible surface patches from a 3D structure (the defensible B-cell path).
Approach: Provide a single antigen chain (experimental or AlphaFold). DiscoTope-3.0 scores per-residue conformational propensity and was trained on predicted structures, so AF2 models incur essentially no penalty (AUC 0.799 vs 0.807). Gate trust by pLDDT — accuracy drops ~5 percentile points per 10-point pLDDT decrease — and remember AUC-PR is only ~0.22 (low precision, many false positives).
def gate_discotope_by_plddt(df, plddt_col='pLDDT', score_col='DiscoTope-3.0 score', min_plddt=70):
    '''Keep DiscoTope-3.0 calls only in confidently-folded regions; low-pLDDT loops
    (where antibodies often bind) are exactly where structure-based calls are least
    reliable. df: per-residue DiscoTope-3.0 output joined with model pLDDT.'''
    return df[df[plddt_col] >= min_plddt].sort_values(score_col, ascending=False)
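The function above expects scores already joined with pLDDT. A minimal sketch of that join is shown below, assuming the two tables share hypothetical `chain` and `res_id` key columns; the actual DiscoTope-3.0 output layout may differ (and may already carry a pLDDT column), so treat the column names and the helper `join_scores_with_plddt` as placeholders.

```python
import pandas as pd


def join_scores_with_plddt(disco, plddt, keys=("chain", "res_id")):
    """Left-join per-residue scores with per-residue pLDDT on residue keys.

    Raises if any scored residue has no pLDDT, which usually signals a
    residue-numbering mismatch between the two tables.
    """
    merged = disco.merge(plddt, on=list(keys), how="left", validate="one_to_one")
    missing = int(merged["pLDDT"].isna().sum())
    if missing:
        raise ValueError(f"{missing} residues have no pLDDT; check residue numbering")
    return merged


# Toy, made-up values used only to exercise the join and the gate.
disco = pd.DataFrame({
    "chain": ["A"] * 4,
    "res_id": [10, 11, 12, 13],
    "DiscoTope-3.0 score": [0.9, 0.4, 0.7, 0.2],
})
plddt = pd.DataFrame({
    "chain": ["A"] * 4,
    "res_id": [10, 11, 12, 13],
    "pLDDT": [55.0, 92.0, 81.0, 88.0],
})

joined = join_scores_with_plddt(disco, plddt)
# Same gate as above: keep pLDDT >= 70, rank by score.
confident = joined[joined["pLDDT"] >= 70].sort_values(
    "DiscoTope-3.0 score", ascending=False
)
print(confident[["res_id", "DiscoTope-3.0 score", "pLDDT"]])
```

In this toy case the top-scoring residue (10) is dropped because its pLDDT is 55, which is the behavior the gate is meant to enforce.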
Goal: Nominate CD8/CD4 epitopes from an antigen.
Approach: Tile the antigen and score with EL-mode MHC presentation (class I: mhc-binding-prediction; class II: mhc-class-ii-prediction). Do NOT add NetChop by default — EL models are trained on eluted ligands that already survived proteasomal cleavage and TAP, so the processing signal is implicit; explicit cleavage prediction is largely redundant and can double-penalize. Reserve NetChop/NetCTLpan for long source proteins as a cleavage sanity check or alleles lacking EL coverage.
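The tiling step can be sketched as a sliding window over the antigen. The default lengths of 8 to 11 residues are a common class I choice and are an assumption here, as is the helper name `tile_peptides`; class II workflows typically tile longer peptides (15-mers are a frequent choice). Scoring itself is left to the downstream presentation predictor.

```python
def tile_peptides(sequence, lengths=(8, 9, 10, 11)):
    """Yield (start, peptide) for every window of each requested length.

    start is the 1-based position of the peptide's first residue in `sequence`.
    Lengths longer than the sequence simply yield nothing.
    """
    seq = sequence.strip().upper()
    for k in lengths:
        for i in range(len(seq) - k + 1):
            yield i + 1, seq[i:i + k]


antigen = "ACDEFGHIKL"  # toy 10-residue sequence
nine_mers = list(tile_peptides(antigen, lengths=(9,)))
all_class_i = list(tile_peptides(antigen))
print(nine_mers)
print(len(all_class_i))
```

Keeping the start position with each peptide makes it straightforward to map scored peptides back onto the antigen later.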
Trigger: running BepiPred on a folded viral spike to predict neutralizing epitopes. Mechanism: native epitopes are conformational; sequence models cannot see them. Symptom: "predicted epitopes" that no native antibody targets. Fix: fold a structure and use DiscoTope-3.0; reserve linear predictors for peptide/denatured targets.
Trigger: DiscoTope on a low-confidence AlphaFold surface loop or a monomer of an oligomeric antigen. Mechanism: a subtly wrong surface moves the predicted epitope; an oligomer interface looks exposed in the monomer. Symptom: false-positive epitopes at buried/flexible sites. Fix: gate by pLDDT; model the biological assembly when the antigen oligomerizes.
Trigger: reporting Kolaskar-Tongaonkar/Parker/Emini "antigenic regions" as data. Mechanism: these are coarse 1980s physicochemical descriptors at/near random. Symptom: confident-looking but uninformative B-cell calls. Fix: treat as obsolete decoration; everything they encode is subsumed by BepiPred/structure methods.
Trigger: ranking vaccine epitopes purely by binding/presentation score. Mechanism: immunodominance depends on repertoire, competition, processing kinetics, immune history — none modeled. Symptom: a strong predicted binder that is subdominant or ignored in vivo. Fix: treat presentation as necessary-not-sufficient; validate by ELISpot/tetramer.
| Threshold | Source | Rationale |
|-----------|--------|-----------|
| BepiPred-3.0 default 0.1512 | Clifford 2022 | Balances sens/spec on their benchmark; NOT 0.5 |
| Linear B-cell AUC ~0.6 | Field benchmarks | Barely above random; report as hypothesis |
| DiscoTope-3.0 AUC-ROC ~0.80, AUC-PR ~0.22 | Høie 2024 | Moderate ranking, low precision (minority class) |
| pLDDT >= 70 to trust DiscoTope calls | Høie 2024 | ~5 percentile-point drop per 10-point pLDDT loss |
| ~90% of B-cell epitopes conformational | B-cell literature | Why sequence-only prediction has a low ceiling |
| Skip NetChop on EL-mode predictions | Reynisson 2020 | EL training already encodes cleavage/TAP |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Over-trusting B-cell predictions | Conflated with mature T-cell prediction | State the maturity asymmetry; treat B-cell as hypothesis |
| Few/no BepiPred epitopes | Applied 0.5 threshold | Use default 0.1512 or top-X% mode |
| False epitopes in flexible loops | Low-pLDDT AlphaFold model | Gate by pLDDT; assess model quality |
| Epitope worthless across strains | No conservation analysis | Add IEDB Epitope Conservancy + MSA |
| Redundant/over-penalized T-cell calls | NetChop stacked on EL model | Use EL presentation as the primary filter |
| Vaccine "designed" in silico | Over-trusting reverse-vaccinology scores | Treat VaxiJen/Vaxign as candidate funnels; validate experimentally |