skills/tooluniverse-protein-therapeutic-design/SKILL.md
AI-guided de novo protein design — RFdiffusion backbone generation, ProteinMPNN sequence design, structure validation (pLDDT, pTM, MPNN scores). Use for designing therapeutic protein binders, novel scaffolds, enzyme variants, and miniprotein/protein-interface design before experimental validation.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-protein-therapeutic-design

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.
KEY PRINCIPLES:
Therapeutic protein design starts with the target interaction. What binding surface do you need to cover? A small pocket = nanobody or peptide. A large flat surface = designed protein. Stability, immunogenicity, and manufacturability constrain the design space.
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
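Apply when user asks to design therapeutic protein binders, novel scaffolds, enzyme variants, or miniprotein/protein-interface designs.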
Phase 1: Target Characterization
Get structure (PDB, EMDB cryo-EM, AlphaFold), identify binding epitope
Phase 2: Backbone Generation (RFdiffusion)
Define constraints, generate >= 5 backbones, filter by geometry
Phase 3: Sequence Design (ProteinMPNN)
Design >= 8 sequences per backbone, sample with temperature control
Phase 4: Structure Validation (ESMFold/AlphaFold2)
Predict structure, compare to backbone, assess pLDDT/pTM
Phase 5: Developability Assessment
Aggregation, pI, expression prediction
Phase 6: Report Synthesis
Ranked candidates, FASTA, experimental recommendations
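
The sampling minimums in Phases 2 and 3 imply a lower bound on how many sequences reach validation. A minimal sketch of that check follows; `DesignPlan` and its method names are hypothetical helpers for illustration, not part of ToolUniverse.

```python
from dataclasses import dataclass
from typing import List

MIN_BACKBONES = 5           # Phase 2: generate >= 5 backbones
MIN_SEQS_PER_BACKBONE = 8   # Phase 3: design >= 8 sequences per backbone


@dataclass
class DesignPlan:
    """Hypothetical container for the sampling budget of one design run."""
    n_backbones: int
    seqs_per_backbone: int

    def problems(self) -> List[str]:
        """Return a list of violated workflow minimums (empty if none)."""
        issues = []
        if self.n_backbones < MIN_BACKBONES:
            issues.append(f"need >= {MIN_BACKBONES} backbones, got {self.n_backbones}")
        if self.seqs_per_backbone < MIN_SEQS_PER_BACKBONE:
            issues.append(
                f"need >= {MIN_SEQS_PER_BACKBONE} sequences per backbone, "
                f"got {self.seqs_per_backbone}"
            )
        return issues

    def total_candidates(self) -> int:
        """Number of sequences that will go to structure validation."""
        return self.n_backbones * self.seqs_per_backbone


plan = DesignPlan(n_backbones=5, seqs_per_backbone=8)
print(plan.problems(), plan.total_candidates())
```

At the stated minimums this gives 5 x 8 = 40 sequences to validate in Phase 4, before any geometry filtering of backbones.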
Report files: [TARGET]_protein_design_report.md first, with section headers; then [TARGET]_designed_sequences.fasta and [TARGET]_top_candidates.csv.

Every design MUST include: Sequence, Length, Target, Method, and Quality Metrics (pLDDT, pTM, MPNN score, binding prediction).
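
One way to enforce the required fields when producing the FASTA and CSV outputs is sketched below. The dictionary keys, the `id` field, and the FASTA header layout are assumptions made for illustration; the source only names the fields and the output files. The example record uses placeholder values, not real predictions.

```python
import csv
import io

# Field names are hypothetical keys for the items every design MUST include.
REQUIRED_FIELDS = [
    "sequence", "length", "target", "method",
    "plddt", "ptm", "mpnn_score", "binding_prediction",
]


def check_design(design):
    """Raise ValueError if a required field is missing or the length is inconsistent."""
    missing = [f for f in REQUIRED_FIELDS if design.get(f) is None]
    if missing:
        raise ValueError(f"design {design.get('id')} is missing: {missing}")
    if design["length"] != len(design["sequence"]):
        raise ValueError(f"design {design.get('id')}: length does not match sequence")


def to_fasta(designs):
    """Render designs as FASTA text, with key metrics in the header line."""
    records = []
    for d in designs:
        check_design(d)
        header = (f">{d['id']} target={d['target']} method={d['method']} "
                  f"pLDDT={d['plddt']} pTM={d['ptm']}")
        records.append(header + "\n" + d["sequence"])
    return "\n".join(records) + "\n"


def to_csv(designs):
    """Render designs as CSV text with one row per design."""
    columns = ["id"] + REQUIRED_FIELDS
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for d in designs:
        check_design(d)
        writer.writerow({k: d[k] for k in columns})
    return buf.getvalue()


# Placeholder record for illustration only; the numbers are not real results.
example = {
    "id": "design_001", "sequence": "MKTAYIAKQR", "length": 10,
    "target": "TARGET", "method": "RFdiffusion+ProteinMPNN",
    "plddt": 0.0, "ptm": 0.0, "mpnn_score": 0.0, "binding_prediction": "not run",
}
print(to_fasta([example]))
print(to_csv([example]))
```

The returned strings can be written to [TARGET]_designed_sequences.fasta and [TARGET]_top_candidates.csv; a design that lacks any required metric fails before anything is written.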
| Tool | Purpose | Key Parameter |
|------|---------|---------------|
| NvidiaNIM_rfdiffusion (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Backbone generation | diffusion_steps (NOT num_steps) |
| NvidiaNIM_proteinmpnn (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Sequence design | pdb_string (NOT pdb) |
| ESMFold_predict_structure | Fast validation | sequence (NOT seq) |
| NvidiaNIM_alphafold2 (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | High-accuracy structure inference from sequence | sequence, algorithm |
| NvidiaNIM_esm2_650m (requires NVIDIA_API_KEY env var; free key at build.nvidia.com) | Sequence embeddings | sequences, format |
| Tool | Wrong | Correct |
|------|-------|---------|
| NvidiaNIM_rfdiffusion (requires NVIDIA_API_KEY) | num_steps=50 | diffusion_steps=50 |
| NvidiaNIM_proteinmpnn (requires NVIDIA_API_KEY) | pdb=content | pdb_string=content |
| ESMFold_predict_structure | seq="MVLS..." | sequence="MVLS..." |
| NvidiaNIM_alphafold2 (requires NVIDIA_API_KEY) | seq="MVLS..." | sequence="MVLS..." |
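
The wrong/correct pairs in the table above can be applied mechanically before a tool call is sent. The sketch below only rewrites argument names and checks for the API key; it does not call any tool. The function names are hypothetical, and treating every tool whose name starts with `NvidiaNIM_` as requiring `NVIDIA_API_KEY` is an assumption drawn from the tables.

```python
import os

# Wrong -> correct parameter names, copied from the table above.
PARAM_RENAMES = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "ESMFold_predict_structure": {"seq": "sequence"},
    "NvidiaNIM_alphafold2": {"seq": "sequence"},
}


def normalize_arguments(tool_name, arguments):
    """Return a copy of `arguments` with known wrong parameter names corrected."""
    renames = PARAM_RENAMES.get(tool_name, {})
    return {renames.get(key, key): value for key, value in arguments.items()}


def api_key_available(tool_name, env=None):
    """True if the tool needs no key, or NVIDIA_API_KEY is set and non-empty."""
    env = os.environ if env is None else env
    if not tool_name.startswith("NvidiaNIM_"):  # assumption: only NIM tools need the key
        return True
    return bool(env.get("NVIDIA_API_KEY"))


print(normalize_arguments("NvidiaNIM_rfdiffusion", {"num_steps": 50}))
print(api_key_available("NvidiaNIM_alphafold2", env={}))
```

Arguments that are already correct, or that belong to tools not listed in the table, pass through unchanged.
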
NVIDIA_API_KEY environment variable required for the NvidiaNIM tools.

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| PDBe_get_uniprot_mappings | Find PDB structures | uniprot_id |
| RCSBData_get_entry | Download PDB file | pdb_id |
| alphafold_get_prediction | Get AlphaFold DB structure | accession |
| emdb_search | Search cryo-EM maps | query |
| emdb_get_entry | Get entry details | entry_id |
| UniProt_get_entry_by_accession | Get target sequence | accession |
| InterPro_get_protein_domains | Get domains | accession |
| Tier | Criteria |
|------|----------|
| T1 (best) | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | pLDDT >70, pTM >0.65, developability concerns |
| T4 | Failed validation or major developability issues |
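
The tier criteria above can be sketched as a small classifier. The thresholds come from the table; the four developability labels and the function name are hypothetical, since the source does not define numeric cutoffs for aggregation or pI. pLDDT is assumed to be on the 0-100 scale, so rescale first if a predictor reports 0-1.

```python
# Hypothetical developability labels, ordered from best to worst:
#   "good"         low aggregation and neutral pI
#   "acceptable"   acceptable developability
#   "concerns"     developability concerns
#   "major_issues" major developability issues
DEVELOPABILITY_LEVELS = ("good", "acceptable", "concerns", "major_issues")


def assign_tier(plddt, ptm, developability):
    """Map validation metrics to T1-T4 using the thresholds in the tier table.

    plddt is assumed to be on a 0-100 scale and ptm on a 0-1 scale.
    """
    if developability not in DEVELOPABILITY_LEVELS:
        raise ValueError(f"unknown developability label: {developability!r}")
    if developability == "major_issues":
        return "T4"
    if plddt > 85 and ptm > 0.8 and developability == "good":
        return "T1"
    if plddt > 75 and ptm > 0.7 and developability in ("good", "acceptable"):
        return "T2"
    if plddt > 70 and ptm > 0.65:
        return "T3"
    return "T4"  # failed structure validation


print(assign_tier(90, 0.85, "good"))        # T1
print(assign_tier(80, 0.75, "acceptable"))  # T2
```

A design with strong structural metrics but developability concerns falls to T3 here, which is one reading of the table; adjust the ordering if the tiers are meant to be applied differently.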