plugin/skills/tooluniverse-protein-structure-prediction/SKILL.md
Protein 3D structure prediction from sequence — ESMFold de novo prediction, AlphaFold database retrieval, experimental structures from RCSB, ProtVar variant impact assessment, ProtParam sequence properties. Use for structure prediction when no experimental structure exists, fold-confidence scoring, and structure-guided variant interpretation.
End-to-end workflow for protein structure prediction starting from a sequence or UniProt accession. Combines ESMFold de novo prediction, AlphaFold database retrieval, experimental structure benchmarking from RCSB, ProtVar variant impact assessment, and ProtParam sequence property calculation.
KEY PRINCIPLES:
- AlphaFold tools take the UniProt accession through the qualifier parameter.
- When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is more reliable than a guess.
- When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do; execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply when users ask:
Not for (use tooluniverse-protein-structure-retrieval instead): retrieval-only tasks where user provides a PDB ID or wants to browse experimental structures without prediction.
| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| sequence | Yes (for ESMFold) | Amino acid sequence (single-letter code, no FASTA header) | MVLSPADKTNVK... |
| uniprot_id | Yes (for AlphaFold) | UniProt accession | P04637, P69905 |
| variant | No | Variant notation for structural impact | P04637 R175H, TP53 R175H |
| max_length | No | ESMFold limit: ~800 residues recommended | — |

- Phase 0: Input preparation (sequence retrieval if needed)
- Phase 1: Sequence properties (ProtParam_calculate)
- Phase 2: De novo prediction (ESMFold_predict_structure)
- Phase 3: AlphaFold reference (alphafold_get_prediction + alphafold_get_summary)
- Phase 4: Experimental structure comparison (RCSBAdvSearch_search_structures, RCSBData_get_entry)
- Phase 5: Variant structural impact (ProtVar_map_variant + ProtVar_get_function) [if variant provided]
- Phase 6: Quality synthesis and interpretation
Objective: Obtain or verify the protein sequence needed for ESMFold prediction.
If the user provides a sequence, use it directly for ESMFold_predict_structure. Check its length:
- Over 800 residues: ESMFold may fail or produce lower-quality output; recommend using AlphaFold instead.

If the user provides a UniProt accession, retrieve the sequence with UniProt_get_entry_by_accession:
- accession: UniProt accession
- Read the sequence.value field from the response.

Note: If only a protein name is given (not an accession), first resolve it with UniProt_search or MyGene_query_genes to get the UniProt accession, then fetch the sequence.
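A minimal Python sketch of this input preparation, assuming the UniProt response has already been loaded as a dict in which the sequence sits at sequence.value (the path named above; nothing else about the response shape is assumed). The helper names are hypothetical, not ToolUniverse tools, and the 800-residue cut-off mirrors this skill's guidance rather than a hard ESMFold limit.

```python
import re

# Length cut-off taken from this skill's ~800-residue ESMFold recommendation.
ESMFOLD_MAX_LEN = 800
# The 20 standard amino acids in single-letter code.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace, uppercase, and validate residues."""
    body = [ln for ln in raw.strip().splitlines() if not ln.startswith(">")]
    seq = re.sub(r"\s+", "", "".join(body)).upper()
    unexpected = set(seq) - STANDARD_AA
    if unexpected:
        # Strict by choice: X, B, Z, U, etc. are rejected rather than passed through.
        raise ValueError(f"non-standard residues: {sorted(unexpected)}")
    return seq


def sequence_from_entry(entry: dict) -> str:
    """Read the sequence from a UniProt entry via the sequence.value path."""
    return clean_sequence(entry["sequence"]["value"])


def prediction_route(seq: str) -> str:
    """Suggest ESMFold up to the length cut-off, AlphaFold beyond it."""
    return "esmfold" if len(seq) <= ESMFOLD_MAX_LEN else "alphafold"
```

Rejecting non-standard letters is a deliberate strictness choice; widen STANDARD_AA if your sequences legitimately contain them.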
Objective: Calculate physicochemical properties before prediction to contextualize results.
ProtParam_calculate:
- sequence: amino acid sequence string (single-letter code)

Objective: Predict the 3D structure from sequence with ESMFold, which is built on Meta's ESM-2 protein language model.
ESMFold_predict_structure:
- sequence: amino acid sequence string

Call ESMFold_predict_structure with the sequence, then interpret the per-residue pLDDT and the global pTM score:

| pLDDT Range | Interpretation | Reliability |
|-------------|----------------|-------------|
| >90 | Very high confidence | Often comparable to experimental accuracy |
| 70-90 | High confidence | Backbone reliable, side chains approximate |
| 50-70 | Low confidence | Potentially disordered or flexible region |
| <50 | Very low confidence | Likely intrinsically disordered; do not interpret |

| pTM Score | Fold Confidence |
|-----------|-----------------|
| >0.8 | High confidence global fold |
| 0.5-0.8 | Moderate; some domains may be uncertain |
| <0.5 | Low global fold confidence |
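
When a predicted model comes back as PDB text, per-residue pLDDT can be read from the B-factor column of the CA atoms, which is where AlphaFold and ESMFold models conventionally store it. This is a sketch under assumptions: a single-chain model without alternate locations, and a model whose scores are all at most 1 is treated as being on a 0-1 scale (some ESMFold outputs use that scale) and rescaled. The bands mirror the two tables above; the disordered-segment threshold (pLDDT < 50) and the 5-residue minimum run length are illustrative choices, and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def per_residue_plddt(pdb_text: str) -> list[float]:
    """Per-residue pLDDT read from the B-factor field (columns 61-66) of CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # One CA atom per residue; the atom name occupies columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    # Assumption: if every score is <= 1, the model uses a 0-1 scale; rescale to 0-100.
    if scores and max(scores) <= 1.0:
        scores = [s * 100 for s in scores]
    return scores


def plddt_band(score: float) -> str:
    """pLDDT bands from the table: >90, 70-90, 50-70, <50."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "low"
    return "very low"


def ptm_band(ptm: float) -> str:
    """pTM bands from the table: >0.8, 0.5-0.8, <0.5."""
    if ptm > 0.8:
        return "high"
    if ptm >= 0.5:
        return "moderate"
    return "low"


def disordered_segments(scores: list[float], threshold: float = 50.0,
                        min_len: int = 5) -> list[tuple[int, int]]:
    """1-based (start, end) runs of at least min_len residues below threshold."""
    segments, start = [], None
    for i, score in enumerate(scores, start=1):
        if score < threshold:
            if start is None:
                start = i  # a low-confidence run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


def summarize(scores: list[float]) -> dict:
    """Mean pLDDT (for the summary table) plus residue counts per band."""
    return {
        "mean_plddt": round(mean(scores), 1),
        "bands": dict(Counter(plddt_band(s) for s in scores)),
    }
```

For example, summarize(per_residue_plddt(open("model.pdb").read())) yields the Mean pLDDT value for the report; the file name is a placeholder.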
Objective: Retrieve the precomputed AlphaFold2 model as a comparison and a higher-accuracy reference.

alphafold_get_prediction:
- qualifier (or alias uniprot_id / uniprot_accession): UniProt accession (e.g., "P04637")

alphafold_get_summary:
- qualifier (or alias uniprot_id / uniprot_accession): UniProt accession

alphafold_get_annotations (optional):
- qualifier: UniProt accession

AlphaFill_get_transplants (optional, for ligands/cofactors):
- uniprot: UniProt accession (e.g., "P00520", mouse ABL1)

Call alphafold_get_prediction and alphafold_get_summary.

Objective: Check whether experimental structures exist in PDB and how the predictions compare.
RCSBAdvSearch_search_structures (search by protein/gene name):
- query: protein name or gene symbol
- limit: number of results (default 10)

RCSBData_get_entry (details for a specific PDB ID):
- pdb_id: 4-character PDB identifier

Objective: Assess how a specific amino acid substitution affects the predicted structure.
ProtVar_map_variant:
- variant: string notation like "P04637 R175H", or HGVS notation

ProtVar_get_function:
- accession: UniProt accession
- position: integer residue position
- variant_aa: mutant amino acid (single letter)

Call ProtVar_map_variant to resolve the variant and confirm its position, then call ProtVar_get_function with the wild-type position to get domain context. Rank the evidence by tier:

| Tier | Evidence |
|------|----------|
| T1 | Clinical/functional data for this exact variant (from ProtVar) |
| T2 | Variant at experimentally characterized active site or binding interface |
| T3 | Computational pathogenicity prediction (PolyPhen, SIFT from ProtVar) |
| T4 | Position in predicted structured region only |
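
One way to apply the tier table consistently is to record which kinds of evidence were found and report only the strongest. The sketch below is a hypothetical helper: the VariantEvidence field names are not part of ProtVar's output schema (you set them after reading the tool results), and treating pLDDT ≥ 70 at the variant position as "structured" for T4 is an assumed cut-off borrowed from the pLDDT table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantEvidence:
    """Which kinds of evidence were found (hypothetical field names, set by hand
    after reading ProtVar_map_variant / ProtVar_get_function output)."""
    clinical_or_functional_data: bool = False  # T1: data for this exact variant
    at_characterized_site: bool = False        # T2: active site or binding interface
    computational_prediction: bool = False     # T3: e.g. PolyPhen, SIFT
    in_structured_region: bool = False         # T4: structured position only


def is_structured(plddt_at_position: float, cutoff: float = 70.0) -> bool:
    """Assumed T4 criterion: high or very high pLDDT at the variant position."""
    return plddt_at_position >= cutoff


def strongest_tier(evidence: VariantEvidence) -> Optional[str]:
    """Return the strongest supported tier (T1 is strongest), or None."""
    checks = [
        ("T1", evidence.clinical_or_functional_data),
        ("T2", evidence.at_characterized_site),
        ("T3", evidence.computational_prediction),
        ("T4", evidence.in_structured_region),
    ]
    for tier, present in checks:
        if present:
            return tier
    return None
```

Reporting the single strongest tier keeps the summary short; list the weaker supporting evidence alongside it in prose.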
- Protein summary: name, length, pI, instability index (from ProtParam)
- Structure prediction summary table:

| Method | Mean pLDDT | pTM/Global Score | Coverage | Notes |
|--------|------------|------------------|----------|-------|
| ESMFold | X.X | X.X | 100% (full seq) | N/A |
| AlphaFold | X.X | N/A | 100% | version vN |
| Experimental (best) | N/A | N/A | XX% | PDB: XXXX, X-ray, X.X Å |

- Confidence map: regions of high vs low confidence; highlight disordered regions
- Experimental structure comparison: does PDB have coverage? How does the prediction align?
- Variant impact (if applicable): domain context, pathogenicity, structural consequence
Recommendations:
| Tool | Key Parameter | Notes |
|------|--------------|-------|
| ESMFold_predict_structure | sequence | Raw amino acid string, no spaces, no FASTA header |
| alphafold_get_prediction | qualifier or uniprot_id | UniProt accession (e.g., "P04637") |
| alphafold_get_summary | qualifier or uniprot_id | Same UniProt accession |
| ProtParam_calculate | sequence | Same sequence string |
| ProtVar_map_variant | variant | Format: "<UniProt_ID> <AA><pos><AA>" e.g., "P04637 R175H" |
| ProtVar_get_function | position | Integer residue number |
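
Because ProtVar_get_function takes the position and mutant residue as separate arguments, the short notation in the table above has to be split first. A minimal regex-based sketch, assuming only the "<identifier> <AA><pos><AA>" form with single-letter residue codes (HGVS strings are not handled); the helper names are hypothetical. The identifier may be a gene symbol such as TP53, which still needs resolving to a UniProt accession. Checking the wild-type residue against the sequence catches numbering mismatches before any tool call.

```python
import re
from typing import Optional

_AA = "ACDEFGHIKLMNPQRSTVWY"
# "<identifier> <WT><position><MUT>", e.g. "P04637 R175H" or "TP53 R175H".
VARIANT_RE = re.compile(
    rf"^\s*(?P<ident>[A-Z0-9_-]+)\s+(?P<wt>[{_AA}])(?P<pos>[1-9]\d*)(?P<mut>[{_AA}])\s*$"
)


def parse_variant(text: str) -> dict:
    """Split a short variant string into the pieces ProtVar_get_function expects."""
    match: Optional[re.Match] = VARIANT_RE.match(text.upper())
    if match is None:
        raise ValueError(f"unrecognized variant notation: {text!r}")
    return {
        "identifier": match["ident"],   # resolve to a UniProt accession if it is a gene symbol
        "wild_type": match["wt"],
        "position": int(match["pos"]),  # integer residue number
        "variant_aa": match["mut"],     # mutant amino acid, single letter
    }


def wild_type_matches(sequence: str, wild_type: str, position: int) -> bool:
    """Check that the reference sequence has the stated wild-type residue (1-based)."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type
```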

| Situation | Fallback |
|-----------|----------|
| ESMFold fails (sequence > 800 aa) | Use AlphaFold model only; note length limitation |
| No AlphaFold entry for the UniProt ID | Use ESMFold prediction only |
| RCSB search returns no results | Note no experimental structure; proceed with predictions |
| No UniProt accession available | Use ESMFold from raw sequence; skip AlphaFold |
| ProtVar variant not found | Manually assess position from domain annotation in Phase 4 |
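
The fallback rules can be written as a small decision function so the report always states which sources were used and why others were skipped. This is a sketch under stated assumptions: whether an AlphaFold entry exists is passed in as a flag (in practice you learn it from the alphafold_get_prediction response), the 800-residue cut-off follows this skill's guidance rather than a hard limit, and plan_sources is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional

ESMFOLD_MAX_LEN = 800  # this skill's ~800-residue guidance, not a hard limit


@dataclass
class SourcePlan:
    sources: list[str] = field(default_factory=list)  # predictions to report
    notes: list[str] = field(default_factory=list)    # limitations to state


def plan_sources(seq_len: Optional[int], uniprot_id: Optional[str],
                 alphafold_has_entry: bool) -> SourcePlan:
    """Apply the fallback table to decide which predictions to report."""
    plan = SourcePlan()
    if seq_len is not None and seq_len <= ESMFOLD_MAX_LEN:
        plan.sources.append("ESMFold")
    elif seq_len is not None:
        plan.notes.append(f"Sequence is {seq_len} aa (> {ESMFOLD_MAX_LEN}); ESMFold skipped")
    if uniprot_id is None:
        plan.notes.append("No UniProt accession; AlphaFold skipped")
    elif alphafold_has_entry:
        plan.sources.append("AlphaFold")
    else:
        plan.notes.append(f"No AlphaFold DB entry for {uniprot_id}")
    if not plan.sources:
        plan.notes.append("No predicted model available from either source")
    return plan
```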

| Database | Coverage | What it provides |
|----------|----------|------------------|
| ESMFold | Any protein sequence (up to ~800 aa) | De novo structure prediction from sequence alone |
| AlphaFold DB | Most of UniProt, including unreviewed entries (>200M entries) | Precomputed predictions with per-residue pLDDT |
| RCSB PDB | ~220,000 experimental structures | Ground-truth experimental coordinates for comparison |
| ProtVar | Human UniProt proteins | Variant impact, domain context, clinical annotations |
| ProtParam | Any sequence | Physicochemical sequence properties |