plugin/skills/tooluniverse-protein-therapeutic-design/SKILL.md
AI-guided de novo protein design — RFdiffusion backbone generation, ProteinMPNN sequence design, structure validation (pLDDT, pTM, MPNN scores). Use for designing therapeutic protein binders, novel scaffolds, enzyme variants, and miniprotein/protein-interface design before experimental validation.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-protein-therapeutic-design

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.
KEY PRINCIPLES:
Therapeutic protein design starts with the target interaction. What binding surface do you need to cover? A small pocket = nanobody or peptide. A large flat surface = designed protein. Stability, immunogenicity, and manufacturability constrain the design space.
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply when user asks to:
Phase 1: Target Characterization
Get structure (PDB, EMDB cryo-EM, AlphaFold), identify binding epitope
Phase 2: Backbone Generation (RFdiffusion)
Define constraints, generate >= 5 backbones, filter by geometry
Phase 3: Sequence Design (ProteinMPNN)
Design >= 8 sequences per backbone, sample with temperature control
Phase 4: Structure Validation (ESMFold/AlphaFold2)
Predict structure, compare to backbone, assess pLDDT/pTM
Phase 5: Developability Assessment
Aggregation, pI, expression prediction
Phase 6: Report Synthesis
Ranked candidates, FASTA, experimental recommendations
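The phases above can be sketched as a small driver that enforces the stated minimums (at least 5 backbones, at least 8 sequences per backbone). This is only an illustrative skeleton: `run_design_round` and the three injected callables are hypothetical names, not ToolUniverse APIs, and ranking by pLDDT alone is a simplifying assumption.

```python
MIN_BACKBONES = 5          # Phase 2 minimum stated in the workflow
MIN_SEQS_PER_BACKBONE = 8  # Phase 3 minimum stated in the workflow


def run_design_round(generate_backbones, design_sequences, validate,
                     n_backbones=MIN_BACKBONES,
                     seqs_per_backbone=MIN_SEQS_PER_BACKBONE):
    """Run Phases 2-4 with caller-supplied functions (all hypothetical).

    generate_backbones(n)       -> list of backbones (e.g. PDB strings)
    design_sequences(bb, n)     -> list of sequences for one backbone
    validate(sequence)          -> dict containing at least a "plddt" key
    """
    if n_backbones < MIN_BACKBONES:
        raise ValueError(f"need >= {MIN_BACKBONES} backbones")
    if seqs_per_backbone < MIN_SEQS_PER_BACKBONE:
        raise ValueError(f"need >= {MIN_SEQS_PER_BACKBONE} sequences per backbone")

    candidates = []
    for index, backbone in enumerate(generate_backbones(n_backbones)):
        for sequence in design_sequences(backbone, seqs_per_backbone):
            metrics = validate(sequence)
            candidates.append({"backbone": index, "sequence": sequence, **metrics})

    # Assumption: rank by pLDDT only; a real ranking would also weigh
    # pTM, MPNN score, and developability.
    candidates.sort(key=lambda c: c["plddt"], reverse=True)
    return candidates
```

With the default minimums, one round yields at least 5 x 8 = 40 candidates to validate, which is worth budgeting for before calling the structure-prediction tools.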
Create [TARGET]_protein_design_report.md first, with section headers.

Then generate [TARGET]_designed_sequences.fasta and [TARGET]_top_candidates.csv.

Every design MUST include: Sequence, Length, Target, Method, and Quality Metrics (pLDDT, pTM, MPNN score, binding prediction).
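One way to keep the required fields together is a small record type that serializes to both output formats. The sketch below is an assumption about layout, not the skill's actual output schema: the `Design` class, its `name` field, the FASTA header format, and the CSV column names are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Design:
    name: str                 # hypothetical identifier, not a required field
    sequence: str
    target: str
    method: str
    plddt: float
    ptm: float
    mpnn_score: float
    binding_prediction: str


def to_fasta(designs, width=60):
    """Render designs as FASTA text, wrapping sequences at `width` residues."""
    lines = []
    for d in designs:
        lines.append(
            f">{d.name} target={d.target} length={len(d.sequence)} "
            f"pLDDT={d.plddt:.1f} pTM={d.ptm:.2f}"
        )
        for start in range(0, len(d.sequence), width):
            lines.append(d.sequence[start:start + width])
    return "\n".join(lines) + "\n"


def to_csv(designs):
    """Render designs as CSV text with one row per design."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "sequence", "length", "target", "method",
                     "plddt", "ptm", "mpnn_score", "binding_prediction"])
    for d in designs:
        writer.writerow([d.name, d.sequence, len(d.sequence), d.target, d.method,
                         d.plddt, d.ptm, d.mpnn_score, d.binding_prediction])
    return buffer.getvalue()
```

Length is derived from the sequence rather than stored, so the two can never disagree. The returned strings can be written to the `.fasta` and `.csv` files with ordinary file I/O.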
| Tool | Purpose | Key Parameter |
|------|---------|---------------|
| NvidiaNIM_rfdiffusion (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Backbone generation | diffusion_steps (NOT num_steps) |
| NvidiaNIM_proteinmpnn (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Sequence design | pdb_string (NOT pdb) |
| ESMFold_predict_structure | Fast validation | sequence (NOT seq) |
| NvidiaNIM_alphafold2 (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | High-accuracy structure inference from sequence | sequence, algorithm |
| NvidiaNIM_esm2_650m (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Sequence embeddings | sequences, format |
| Tool | Wrong | Correct |
|------|-------|---------|
| NvidiaNIM_rfdiffusion (requires NVIDIA_API_KEY) | num_steps=50 | diffusion_steps=50 |
| NvidiaNIM_proteinmpnn (requires NVIDIA_API_KEY) | pdb=content | pdb_string=content |
| ESMFold_predict_structure | seq="MVLS..." | sequence="MVLS..." |
| NvidiaNIM_alphafold2 (requires NVIDIA_API_KEY) | seq="MVLS..." | sequence="MVLS..." |
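The wrong-to-correct mapping in the table above lends itself to a guard that renames known-bad keys before a tool call. A minimal sketch, assuming arguments are passed as a plain dict; `PARAM_FIXES` and `normalize_params` are hypothetical helper names, not part of ToolUniverse.

```python
# Wrong -> correct parameter names, copied from the table above.
PARAM_FIXES = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "ESMFold_predict_structure": {"seq": "sequence"},
    "NvidiaNIM_alphafold2": {"seq": "sequence"},
}


def normalize_params(tool_name, params):
    """Return a copy of `params` with known-wrong keys renamed.

    Raises ValueError if both the wrong and the correct key are present,
    since it is unclear which value the caller intended.
    """
    fixes = PARAM_FIXES.get(tool_name, {})
    fixed = {}
    for key, value in params.items():
        new_key = fixes.get(key, key)
        if new_key != key and new_key in params:
            raise ValueError(f"{tool_name}: both '{key}' and '{new_key}' given")
        fixed[new_key] = value
    return fixed
```

For example, `normalize_params("NvidiaNIM_rfdiffusion", {"num_steps": 50})` returns `{"diffusion_steps": 50}`, while arguments for tools not listed in the table pass through unchanged.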
NVIDIA_API_KEY environment variable required.

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| PDBe_get_uniprot_mappings | Find PDB structures | uniprot_id |
| RCSBData_get_entry | Download PDB file | pdb_id |
| alphafold_get_prediction | Get AlphaFold DB structure | accession |
| EMDB_search_structures | Search cryo-EM maps | query |
| EMDB_get_structure | Get entry details | entry_id |
| UniProt_get_entry_by_accession | Get target sequence | accession |
| InterPro_get_protein_domains | Get domains | accession |
| Tier | Criteria |
|------|----------|
| T1 (best) | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | pLDDT >70, pTM >0.65, developability concerns |
| T4 | Failed validation or major developability issues |
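The tier thresholds can be applied mechanically once the metrics are in hand. The sketch below assumes pLDDT is on a 0-100 scale and collapses the qualitative criteria (aggregation, pI, developability) into a single label; `assign_tier` and the labels `"good"`, `"acceptable"`, `"concerns"`, and `"major"` are hypothetical and would need to be mapped from real developability checks.

```python
def assign_tier(plddt, ptm, developability):
    """Map validation metrics to a tier using the thresholds in the table.

    developability is a hypothetical summary label:
      "good"       -> low aggregation and neutral pI
      "acceptable" -> acceptable developability
      "concerns"   -> developability concerns
      "major"      -> major developability issues
    """
    if developability == "major":
        return "T4"
    if plddt > 85 and ptm > 0.8 and developability == "good":
        return "T1"
    if plddt > 75 and ptm > 0.7 and developability in ("good", "acceptable"):
        return "T2"
    if plddt > 70 and ptm > 0.65:
        return "T3"
    # Anything below the T3 thresholds counts as failed validation.
    return "T4"
```

Checks run from the strictest tier downward, so a design with excellent structure metrics but developability concerns lands in T3 rather than T1.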