skills/tooluniverse-vaccine-design/SKILL.md
Computational vaccine candidate design: peptide/subunit vaccines via MHC-I/MHC-II epitope prediction (IEDB), population HLA coverage optimization, B-cell epitope identification, and cross-strain conservation analysis. Use for vaccine epitope prediction, HLA allele coverage, multi-epitope construct design, and immunogenicity assessment. Combines predicted MHC binding with experimentally validated IEDB epitopes for higher-confidence designs.
Computational pipeline for designing peptide/subunit vaccine candidates through epitope prediction, population coverage optimization, and immunogenicity assessment.
Vaccine design requires presenting the right epitopes to elicit protective immunity — not just any immune response, but one that is neutralizing, durable, and broadly applicable. For T-cell vaccines, the core tool is MHC binding prediction (IEDB tools): predict peptide-MHC affinity across multiple HLA alleles, then select epitopes with broad coverage of the target population. For antibody vaccines, prioritize surface-exposed conserved regions — a deeply buried or hypervariable region makes a poor antibody target. MHC binding does not equal immunogenicity; many good binders are not immunogenic in vivo due to tolerance, poor processing, or lack of T-cell help. A multi-epitope strategy (combining MHC-I for CD8+ CTL response, MHC-II for CD4+ helper response, and B-cell epitopes for antibody induction) is more robust than any single epitope. Conservation across pathogen strains is critical — an epitope that mutates under immune pressure (like HIV envelope hypervariable regions) is a poor vaccine target.
LOOK UP DON'T GUESS: Do not predict MHC binding or population coverage from memory — use IEDB_predict_mhci_binding and IEDB_predict_mhcii_binding for predictions and iedb_search_epitopes for validated experimental data. Do not assume what's on the pathogen surface; retrieve annotated sequences from UniProt or BVBRC.
Key principles:
Not this skill: For HLA typing or allele frequency only, use tooluniverse-hla-immunogenomics. For antibody engineering, use tooluniverse-antibody-engineering.
| Tool | Use For |
|------|---------|
| iedb_search_epitopes | Search experimentally validated epitopes |
| iedb_get_epitope_mhc | Get detailed epitope data (assay results, MHC restriction) |
| iedb_search_mhc | Search validated MHC binding assay data |
| IEDB_predict_mhci_binding | Predict MHC-I binding (NetMHCpan EL; rank < 0.5% = strong binder) |
| IEDB_predict_mhcii_binding | Predict MHC-II binding (NetMHCIIpan EL; CD4+ helper epitopes) |
| UniProt_get_entry_by_accession | Get antigen protein sequence |
| UniProt_search | Find pathogen protein sequences |
| BVBRC_search_genome_features | Search pathogen proteomes |
| alphafold_get_prediction | Get/predict antigen 3D structure |
| EnsemblVEP_annotate_hgvs | Check epitope conservation across variants |
| PubMed_search_articles | Find published vaccine studies |
| search_clinical_trials | Find ongoing vaccine clinical trials |
Phase 0: Antigen Selection
Pathogen → essential surface proteins → sequence retrieval
|
Phase 1: T-Cell Epitope Prediction
MHC-I (CD8+ CTL) and MHC-II (CD4+ helper) binding prediction
|
Phase 2: B-Cell Epitope Prediction
Linear and conformational B-cell epitopes for antibody response
|
Phase 3: Population Coverage
HLA allele frequencies → design for target population
|
Phase 4: Conservation Analysis
Cross-strain epitope conservation → broad protection
|
Phase 5: Candidate Assembly & Report
Multi-epitope construct design → immunogenicity assessment
Best antigens for vaccines: Surface-exposed, essential for pathogen function, conserved across strains.
# Find pathogen surface proteins
UniProt_search(query="[organism] AND locations:(location:cell surface) AND reviewed:true")
# Or search BVBRC for annotated pathogen proteomes
BVBRC_search_genome_features(keyword="surface protein", genome_id="[genome_id]")
Antigen prioritization: prefer surface-exposed (secreted/outer membrane) over cytoplasmic; >95% conserved across strains; essential for pathogen viability; known immunogen in natural infection. Use UniProt subcellular location annotations and PubMed to verify these properties.
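The prioritization criteria above can be applied as a simple checklist. The sketch below is illustrative only: `AntigenCandidate`, `criteria_met`, and `rank_antigens` are hypothetical names, and the field values are assumed to come from manual review of UniProt annotations and the literature, not from any ToolUniverse call.

```python
from dataclasses import dataclass


@dataclass
class AntigenCandidate:
    name: str
    surface_exposed: bool   # secreted or outer membrane per UniProt annotation
    conservation: float     # fraction of strains sharing the sequence, 0.0-1.0
    essential: bool         # required for pathogen viability
    known_immunogen: bool   # immunogenic in natural infection


def criteria_met(c: AntigenCandidate, min_conservation: float = 0.95) -> int:
    """Count how many of the four prioritization criteria a candidate meets."""
    return sum([
        c.surface_exposed,
        c.conservation > min_conservation,
        c.essential,
        c.known_immunogen,
    ])


def rank_antigens(candidates):
    """Sort by number of criteria met, breaking ties by conservation."""
    return sorted(
        candidates,
        key=lambda c: (criteria_met(c), c.conservation),
        reverse=True,
    )
```

Equal weighting of the four criteria is an assumption; a real project may treat surface exposure or conservation as hard filters instead.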
MHC-I epitopes (CD8+ cytotoxic T cells — kill infected cells):
# Option A: Search for KNOWN validated epitopes from IEDB
iedb_search_mhc(
    mhc_class="I",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},  # SARS-CoV-2
    select=["linear_sequence", "mhc_restriction", "qualitative_measure"],
    limit=50
)
# Option B: PREDICT novel peptide binding (recommended for new proteins)
IEDB_predict_mhci_binding(
    sequence="YOUR_PROTEIN_SEQUENCE",  # full protein or peptide
    allele="HLA-A*02:01",              # or H-2-Kd for mouse
    method="netmhcpan_el",             # EL = eluted ligand (recommended)
    length=9                           # 8-11 for MHC-I
)
# Returns peptides ranked by percentile_rank:
# < 0.5% = strong binder (include in vaccine)
# 0.5-2% = moderate binder (consider)
# > 2% = weak/non-binder (exclude)
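The percentile-rank cutoffs above can be applied after running the prediction once per allele. This is a minimal sketch that assumes the results have already been flattened into dictionaries with `peptide`, `allele`, and `percentile_rank` keys. That row shape is an assumption for illustration, not the documented response format of `IEDB_predict_mhci_binding`.

```python
def classify_rank(percentile_rank: float) -> str:
    """Map a percentile rank to the binder classes listed above."""
    if percentile_rank < 0.5:
        return "strong"
    if percentile_rank <= 2.0:
        return "moderate"
    return "weak"


def select_binders(rows, keep=("strong",)):
    """Return {peptide: set of alleles} for rows whose class is in `keep`."""
    selected = {}
    for row in rows:
        if classify_rank(row["percentile_rank"]) in keep:
            selected.setdefault(row["peptide"], set()).add(row["allele"])
    return selected


# Placeholder rows, not real predictions
rows = [
    {"peptide": "AAAAAAAAA", "allele": "HLA-A*02:01", "percentile_rank": 0.1},
    {"peptide": "AAAAAAAAA", "allele": "HLA-B*07:02", "percentile_rank": 1.5},
    {"peptide": "CCCCCCCCC", "allele": "HLA-A*02:01", "percentile_rank": 5.0},
]
strong_only = select_binders(rows)
strong_or_moderate = select_binders(rows, keep=("strong", "moderate"))
```

Peptides that map to several alleles in the result are candidates for broad coverage, subject to the caveat that binding does not guarantee immunogenicity.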
MHC-II epitopes (CD4+ helper T cells — activate B cells and CD8+ T cells):
iedb_search_mhc(
    mhc_class="II",
    qualitative_measure="Positive",
    filters={"source_organism_iri": "eq.NCBITaxon:2697049"},
    limit=50
)
Binding affinity interpretation:
| IC50 (nM) | Classification | Vaccine Relevance |
|-----------|----------------|-------------------|
| < 50 | Strong binder | Include: high presentation probability |
| 50-500 | Moderate binder | Consider: may contribute to response |
| 500-5000 | Weak binder | Exclude: unlikely to be presented |
| > 5000 | Non-binder | Exclude |
HLA supertype strategy: For broad coverage, predict against HLA supertypes:
B-cell epitopes trigger antibody production. Look for:
# Check for known B-cell epitopes
iedb_search_epitopes(query="[protein_name]", epitope_type="B cell")
# Get structure for conformational epitope prediction
alphafold_get_prediction(uniprot_id="[accession]")
B-cell epitope criteria: Surface-exposed loops, hydrophilic regions, flexible regions (high B-factor). Combine computational prediction with structural analysis.
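Of the criteria above, hydrophilicity is the one that can be screened from sequence alone. The sketch below uses a sliding-window average of the Kyte-Doolittle hydropathy scale as a rough proxy. This technique is swapped in for illustration and is not named by this skill; the window size and threshold are arbitrary assumptions, and surface exposure and flexibility still need structural analysis.

```python
# Kyte-Doolittle hydropathy values (more negative = more hydrophilic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilic_windows(sequence, window=7, threshold=-1.0):
    """Return (start, peptide, mean_hydropathy) for windows at or below threshold.

    `start` is a 0-based index. Windows containing non-standard residues
    are skipped.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        peptide = seq[start:start + window]
        if any(aa not in KD for aa in peptide):
            continue
        mean = sum(KD[aa] for aa in peptide) / window
        if mean <= threshold:
            hits.append((start, peptide, round(mean, 2)))
    return hits
```

Windows flagged here are only starting points; cross-check them against the AlphaFold structure and any validated epitopes returned by `iedb_search_epitopes`.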
# Search for epitopes restricted to common HLA alleles in target population
# NOTE: No HLA frequency tool exists in ToolUniverse. For population coverage:
# 1. Use IEDB Analysis Resource (tools.iedb.org/population) for population coverage calculation
# 2. Use the HLA supertype strategy (see above) to ensure broad coverage
# 3. Search PubMed for published HLA frequency data: PubMed_search_articles(query="HLA allele frequency [population]")
Population coverage targets:
| Coverage Level | Interpretation | Action |
|----------------|----------------|--------|
| >90% | Excellent: vaccine will work in most individuals | Proceed to development |
| 70-90% | Good: most people covered; some populations underserved | Add more epitopes for uncovered HLA types |
| 50-70% | Moderate: significant gaps | Redesign with broader HLA coverage |
| <50% | Poor: vaccine will miss too many people | Fundamental redesign needed |
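For a quick sanity check before running the IEDB population coverage tool, coverage can be approximated from allele frequencies. The sketch below assumes Hardy-Weinberg proportions within a locus and independence between loci (it ignores linkage disequilibrium), so treat it as a rough estimate, not a substitute for the IEDB calculation. The function names and the example frequencies are hypothetical.

```python
def locus_coverage(allele_freqs, covered_alleles):
    """Fraction of individuals carrying at least one covered allele at one locus.

    Assumes Hardy-Weinberg proportions: P(no covered allele) = (1 - p)^2,
    where p is the summed frequency of covered alleles.
    """
    p = sum(allele_freqs.get(a, 0.0) for a in set(covered_alleles))
    p = min(p, 1.0)
    return 1.0 - (1.0 - p) ** 2


def combined_coverage(per_locus):
    """Combine per-locus coverages, assuming loci are independent."""
    uncovered = 1.0
    for cov in per_locus:
        uncovered *= 1.0 - cov
    return 1.0 - uncovered


def coverage_tier(cov):
    """Map a coverage fraction (0.0-1.0) to the tiers in the table above."""
    if cov > 0.90:
        return "Excellent"
    if cov >= 0.70:
        return "Good"
    if cov >= 0.50:
        return "Moderate"
    return "Poor"


# Hypothetical frequencies for illustration only, not real population data
hla_a = {"HLA-A*02:01": 0.30, "HLA-A*01:01": 0.20}
cov_a = locus_coverage(hla_a, ["HLA-A*02:01", "HLA-A*01:01"])
total = combined_coverage([cov_a, 0.50])
```

Real allele frequencies should come from published population data, as noted in the comments above.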
Check if epitopes are conserved across pathogen strains/variants:
# Search for protein variants across strains
PubMed_search_articles(query="[pathogen] [protein] sequence variation strains")
# Check specific mutations in epitope regions
EnsemblVEP_annotate_hgvs(hgvs_notation="[variant_in_epitope]")
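When strain sequences are available locally (for example, downloaded from UniProt or BVBRC), conservation can also be measured directly as the fraction of strains that contain the exact epitope. This is a minimal sketch with a hypothetical helper and toy sequences; exact substring matching is an assumption and will count any single substitution as non-conserved.

```python
def epitope_conservation(epitope, strain_sequences):
    """Fraction of strain sequences containing the exact epitope string.

    `strain_sequences` maps a strain label to its protein sequence.
    """
    if not strain_sequences:
        raise ValueError("strain_sequences must not be empty")
    hits = sum(1 for seq in strain_sequences.values() if epitope in seq)
    return hits / len(strain_sequences)


# Toy sequences for illustration only
strains = {
    "strain_1": "MKTAYIAKQR",
    "strain_2": "MKTAYIAKQR",
    "strain_3": "MKTAYIAKQR",
    "strain_4": "MKTAYLAKQR",  # I -> L substitution inside the epitope
}
conservation = epitope_conservation("TAYIAK", strains)
meets_target = conservation > 0.95  # the >95% target from antigen prioritization
```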
Conservation interpretation:
Multi-epitope construct design principles:
Report structure: