skills/protein-therapeutic-design/SKILL.md
ToolUniverse workflow — Protein Therapeutic Design
AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.
KEY PRINCIPLES:
Apply when user asks:
Create the report file FIRST:
- `[TARGET]_protein_design_report.md`
- `[Designing...]` marks sections that are not yet filled
- Progressively update as designs are generated
Output separate files:
- `[TARGET]_designed_sequences.fasta` - All designed sequences
- `[TARGET]_top_candidates.csv` - Ranked candidates with metrics

Every design MUST include:
### Design: Binder_001
**Sequence**: MVLSPADKTN...
**Length**: 85 amino acids
**Target**: PD-L1 (UniProt: Q9NZQ7)
**Method**: RFdiffusion → ProteinMPNN → ESMFold validation
**Quality Metrics**:
| Metric | Value | Interpretation |
|--------|-------|----------------|
| pLDDT | 88.5 | High confidence |
| pTM | 0.82 | Good fold |
| ProteinMPNN score | -2.3 | Favorable |
| Predicted binding | Strong | Based on interface pLDDT |
*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`, `NvidiaNIM_proteinmpnn`, `NvidiaNIM_esmfold`*
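The two separate output files named earlier (`[TARGET]_designed_sequences.fasta` and `[TARGET]_top_candidates.csv`) could be written along the following lines. This is a minimal sketch, not part of the skill itself: the function name `write_design_outputs`, the keys of each design dict (`id`, `sequence`, `plddt`, `ptm`), the CSV columns, and ranking by pLDDT are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_design_outputs(target, designs, out_dir="."):
    """Write the FASTA and CSV output files for a list of design dicts.

    Hypothetical helper. Each design is assumed to look like
    {"id": str, "sequence": str, "plddt": float, "ptm": float}.
    """
    out = Path(out_dir)
    fasta_path = out / f"{target}_designed_sequences.fasta"
    csv_path = out / f"{target}_top_candidates.csv"

    # FASTA: every designed sequence, in the order given
    with fasta_path.open("w") as fh:
        for design in designs:
            fh.write(f">{design['id']}\n{design['sequence']}\n")

    # CSV: candidates ranked by pLDDT, highest first (ranking key is an assumption)
    ranked = sorted(designs, key=lambda d: d["plddt"], reverse=True)
    fields = ["rank", "id", "length", "plddt", "ptm", "sequence"]
    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for rank, design in enumerate(ranked, start=1):
            writer.writerow({
                "rank": rank,
                "id": design["id"],
                "length": len(design["sequence"]),
                "plddt": design["plddt"],
                "ptm": design["ptm"],
                "sequence": design["sequence"],
            })

    return fasta_path, csv_path
```

Calling `write_design_outputs("PDL1", designs, "results")` would produce `results/PDL1_designed_sequences.fasta` and `results/PDL1_top_candidates.csv`, provided the `results` directory already exists.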
| Tool | Purpose | API Key Required |
|------|---------|------------------|
| NvidiaNIM_rfdiffusion | Backbone generation | Yes |
| NvidiaNIM_proteinmpnn | Sequence design | Yes |
| NvidiaNIM_esmfold | Fast structure validation | Yes |
| NvidiaNIM_alphafold2 | High-accuracy validation | Yes |
| NvidiaNIM_esm2_650m | Sequence embeddings | Yes |
| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| NvidiaNIM_rfdiffusion | num_steps | diffusion_steps |
| NvidiaNIM_proteinmpnn | pdb | pdb_string |
| NvidiaNIM_esmfold | seq | sequence |
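One way to guard against the wrong parameter names listed above is to rename them before a tool call is made. The sketch below is illustrative only: `PARAMETER_CORRECTIONS` simply restates the table, and `normalize_params` is a hypothetical helper, not a ToolUniverse function.

```python
# Mapping restated from the table above: tool -> {wrong name: correct name}
PARAMETER_CORRECTIONS = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "NvidiaNIM_esmfold": {"seq": "sequence"},
}


def normalize_params(tool_name, params):
    """Return a copy of params with known-wrong names renamed (hypothetical helper)."""
    corrections = PARAMETER_CORRECTIONS.get(tool_name, {})
    fixed = {}
    for name, value in params.items():
        correct_name = corrections.get(name, name)
        if correct_name in fixed:
            # Both the wrong and the correct name were supplied
            raise ValueError(f"duplicate parameter after renaming: {correct_name}")
        fixed[correct_name] = value
    return fixed


print(normalize_params("NvidiaNIM_esmfold", {"seq": "MVLSPADKTN"}))
```

Parameters for tools that are not in the mapping pass through unchanged.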
Phase 1: Target Characterization
├── Get target structure (PDB, EMDB cryo-EM, or AlphaFold)
├── Identify binding epitope
├── Analyze existing binders
├── Check EMDB for membrane protein structures (NEW)
└── OUTPUT: Target profile
↓
Phase 2: Backbone Generation (RFdiffusion)
├── Define design constraints
├── Generate multiple backbones
├── Filter by geometry quality
└── OUTPUT: Candidate backbones
↓
Phase 3: Sequence Design (ProteinMPNN)
├── Design sequences for each backbone
├── Sample multiple sequences per backbone
├── Score by ProteinMPNN likelihood
└── OUTPUT: Designed sequences
↓
Phase 4: Structure Validation
├── Predict structure (ESMFold/AlphaFold2)
├── Compare to designed backbone
├── Assess fold quality (pLDDT, pTM)
└── OUTPUT: Validated designs
↓
Phase 5: Developability Assessment
├── Aggregation propensity
├── Expression likelihood
├── Immunogenicity prediction
└── OUTPUT: Developability scores
↓
Phase 6: Report Synthesis
├── Ranked candidate list
├── Experimental recommendations
├── Next steps
└── OUTPUT: Final report
```python
def get_target_structure(tu, target_id):
    """Get target structure from PDB, EMDB, or predict."""

    # Try PDB first (X-ray/NMR)
    pdb_results = tu.tools.PDB_search_by_uniprot(uniprot_id=target_id)
    if pdb_results:
        # Pick the best-resolved structure (lowest value in Å).
        # Entries without a resolution (e.g. NMR models) sort last.
        best_pdb = min(pdb_results, key=lambda x: x.get('resolution') or 99)
        structure = tu.tools.PDB_get_structure(pdb_id=best_pdb['pdb_id'])
        return {'source': 'PDB', 'pdb_id': best_pdb['pdb_id'],
                'resolution': best_pdb.get('resolution'), 'structure': structure}

    # Try EMDB for cryo-EM structures (valuable for membrane proteins)
    protein_info = tu.tools.UniProt_get_protein_by_accession(accession=target_id)
    emdb_results = tu.tools.emdb_search(
        query=protein_info['proteinDescription']['recommendedName']['fullName']['value']
    )
    if emdb_results:
        # Pick the best-resolved cryo-EM entry
        best_emdb = min(emdb_results, key=lambda x: x.get('resolution') or 99)
        # Get associated PDB model if available
        emdb_details = tu.tools.emdb_get_entry(entry_id=best_emdb['emdb_id'])
        if emdb_details.get('pdb_ids'):
            structure = tu.tools.PDB_get_structure(pdb_id=emdb_details['pdb_ids'][0])
            return {'source': 'EMDB cryo-EM', 'emdb_id': best_emdb['emdb_id'],
                    'pdb_id': emdb_details['pdb_ids'][0],
                    'resolution': best_emdb.get('resolution'), 'structure': structure}

    # Fallback to AlphaFold prediction
    sequence = tu.tools.UniProt_get_protein_sequence(accession=target_id)
    structure = tu.tools.NvidiaNIM_alphafold2(
        sequence=sequence['sequence'],
        algorithm="mmseqs2"
    )
    return {'source': 'AlphaFold2 (predicted)', 'structure': structure}
```
When to prioritize EMDB: Membrane proteins, large complexes, and targets where conformational states matter.
```python
def get_cryoem_structures(tu, target_name):
    """Get cryo-EM structures for membrane proteins/complexes."""

    # Search EMDB
    emdb_results = tu.tools.emdb_search(
        query=f"{target_name} membrane OR receptor"
    )

    structures = []
    for entry in emdb_results[:5]:
        details = tu.tools.emdb_get_entry(entry_id=entry['emdb_id'])
        structures.append({
            'emdb_id': entry['emdb_id'],
            'resolution': entry.get('resolution', 'N/A'),
            'title': entry.get('title', 'N/A'),
            'conformational_state': details.get('state', 'Unknown'),
            'pdb_models': details.get('pdb_ids', [])
        })

    return structures
```
Output for Report:
### 1.1b Cryo-EM Structures (EMDB)
| EMDB ID | Resolution | PDB Model | Conformation |
|---------|------------|-----------|--------------|
| EMD-12345 | 2.8 Å | 7ABC | Active state |
| EMD-23456 | 3.1 Å | 8DEF | Inactive state |
**Note**: Cryo-EM structures can capture distinct conformational states (for example, active vs. inactive) that are relevant for membrane protein targets.
*Source: EMDB*
```python
def identify_epitope(tu, target_structure, epitope_residues=None):
    """Identify or validate binding epitope."""

    if epitope_residues:
        # User-specified epitope
        return {'residues': epitope_residues, 'source': 'user-defined'}

    # Find surface-exposed regions.
    # analyze_surface is a placeholder for a structural analysis step
    # that identifies potential epitopes; it is not defined in this skill.
    return analyze_surface(target_structure)
```
## 1. Target Characterization
### 1.1 Target Information
| Property | Value |
|----------|-------|
| **Target** | PD-L1 (Programmed death-ligand 1) |
| **UniProt** | Q9NZQ7 |
| **Structure source** | PDB: 4ZQK (2.0 Å resolution) |
| **Binding epitope** | IgV domain, residues 19-127 |
| **Known binders** | Atezolizumab, durvalumab, avelumab |
### 1.2 Epitope Analysis
| Residue Range | Type | Surface Area | Druggability |
|---------------|------|--------------|--------------|
| 54-68 | Loop | 850 Ų | High |
| 115-125 | Beta strand | 420 Ų | Medium |
| 19-30 | N-terminus | 380 Ų | Medium |
**Selected Epitope**: Residues 54-68 (PD-1 binding interface)
*Source: PDB 4ZQK, surface analysis*
```python
def generate_backbones(tu, design_params):
    """Generate de novo backbones using RFdiffusion."""

    backbones = tu.tools.NvidiaNIM_rfdiffusion(
        diffusion_steps=design_params.get('steps', 50),
        # Additional parameters depending on design type
    )

    return backbones
```
| Mode | Use Case | Key Parameters |
|------|----------|----------------|
| Unconditional | De novo scaffold | diffusion_steps only |
| Binder design | Target-guided binder | target_structure, hotspot_residues |
| Motif scaffolding | Functional motif embedding | motif_sequence, motif_structure |
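Before calling RFdiffusion, it can help to check that the key parameters for the chosen mode are present. The following is a rough sketch based only on the table above: the mode keys (`unconditional`, `binder`, `motif`) and the helper `missing_params` are hypothetical, and the parameter names are taken from the table rather than verified against the NIM API.

```python
# Key parameters per design mode, restated from the table above.
# The mode names used as dictionary keys are illustrative labels.
REQUIRED_PARAMS = {
    "unconditional": {"diffusion_steps"},
    "binder": {"target_structure", "hotspot_residues"},
    "motif": {"motif_sequence", "motif_structure"},
}


def missing_params(mode, params):
    """Return the sorted list of key parameters that params lacks for this mode."""
    if mode not in REQUIRED_PARAMS:
        raise ValueError(f"unknown design mode: {mode!r}")
    return sorted(REQUIRED_PARAMS[mode] - set(params))


print(missing_params("binder", {"target_structure": "<pdb text>"}))
```

An empty list means every key parameter for that mode was supplied.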
## 2. Backbone Generation
### 2.1 Design Parameters
| Parameter | Value |
|-----------|-------|
| **Method** | RFdiffusion via NVIDIA NIM |
| **Design mode** | Unconditional scaffold generation |
| **Diffusion steps** | 50 |
| **Number generated** | 10 backbones |
### 2.2 Generated Backbones
| Backbone | Length | Topology | Quality |
|----------|--------|----------|---------|
| BB_001 | 85 aa | 3-helix bundle | Good |
| BB_002 | 92 aa | Beta sandwich | Good |
| BB_003 | 78 aa | Alpha-beta | Good |
| BB_004 | 88 aa | All-alpha | Moderate |
| BB_005 | 95 aa | Mixed | Good |
**Selected for sequence design**: BB_001, BB_002, BB_003, BB_005 (top 4)
*Source: NVIDIA NIM via `NvidiaNIM_rfdiffusion`*
```python
def design_sequences(tu, backbone_pdb, num_sequences=8):
    """Design sequences for backbone using ProteinMPNN."""

    sequences = tu.tools.NvidiaNIM_proteinmpnn(
        pdb_string=backbone_pdb,
        num_sequences=num_sequences,
        temperature=0.1  # Lower = more conservative
    )

    return sequences
```
| Parameter | Conservative | Moderate | Diverse |
|-----------|--------------|----------|---------|
| Temperature | 0.1 | 0.2 | 0.5 |
| Sequences per backbone | 4 | 8 | 16 |
| Use case | Validated scaffold | Exploration | Diversity |
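The sampling presets above can be kept in one place and used to estimate how many sequences a run will produce. This is a small sketch: `SAMPLING_PRESETS` restates the table, while the preset keys and the helper `total_sequences` are names chosen here for illustration.

```python
# Sampling presets restated from the table above
SAMPLING_PRESETS = {
    "conservative": {"temperature": 0.1, "num_sequences": 4},
    "moderate": {"temperature": 0.2, "num_sequences": 8},
    "diverse": {"temperature": 0.5, "num_sequences": 16},
}


def total_sequences(num_backbones, preset):
    """Number of sequences produced when every backbone uses the same preset."""
    return num_backbones * SAMPLING_PRESETS[preset]["num_sequences"]


# Four selected backbones with eight sequences each
print(total_sequences(4, "moderate"))  # 32
```

The preset values can be unpacked into the ProteinMPNN call, for example `temperature=SAMPLING_PRESETS["moderate"]["temperature"]`.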
## 3. Sequence Design
### 3.1 Design Parameters
| Parameter | Value |
|-----------|-------|
| **Method** | ProteinMPNN via NVIDIA NIM |
| **Temperature** | 0.1 (conservative) |
| **Sequences per backbone** | 8 |
| **Total sequences** | 32 |
### 3.2 Designed Sequences (Top 10 by Score)
| Rank | Backbone | Sequence ID | Length | MPNN Score | Predicted pI |
|------|----------|-------------|--------|------------|--------------|
| 1 | BB_001 | Seq_001_A | 85 | -1.89 | 6.2 |
| 2 | BB_002 | Seq_002_C | 92 | -1.95 | 5.8 |
| 3 | BB_001 | Seq_001_B | 85 | -2.01 | 7.1 |
| 4 | BB_003 | Seq_003_A | 78 | -2.08 | 6.5 |
| 5 | BB_005 | Seq_005_B | 95 | -2.12 | 5.4 |
### 3.3 Top Sequence: Seq_001_A
```
>Seq_001_A (85 aa, MPNN score: -1.89)
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL
```
*Source: NVIDIA NIM via `NvidiaNIM_proteinmpnn`*
```python
import numpy as np


def validate_structure(tu, sequence):
    """Validate designed sequence by structure prediction."""

    # Fast validation with ESMFold
    predicted = tu.tools.NvidiaNIM_esmfold(sequence=sequence)

    # Extract quality metrics.
    # extract_plddt and extract_ptm are placeholders for parsing the
    # prediction output; they are not defined in this skill.
    plddt = extract_plddt(predicted)
    ptm = extract_ptm(predicted)
    mean_plddt = float(np.mean(plddt))

    return {
        'structure': predicted,
        'mean_plddt': mean_plddt,
        'ptm': ptm,
        'passes': mean_plddt > 70 and ptm > 0.7
    }
```
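`validate_structure` relies on an `extract_plddt` helper that is not defined here. One possible sketch is shown below, under a loud assumption: that the prediction is available as PDB-format text with per-residue pLDDT stored in the B-factor column, as ESMFold's own PDB output does. The actual NIM response format may differ (and the scale may be 0-1 rather than 0-100), so the function name `extract_plddt_from_pdb` and its input are illustrative.

```python
def extract_plddt_from_pdb(pdb_text):
    """Return per-residue pLDDT values read from C-alpha B-factors.

    Assumes fixed-column PDB ATOM records with pLDDT in the
    temperature-factor field (columns 61-66).
    """
    values = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16; keep one atom (CA) per residue
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values
```

The mean of the returned list corresponds to the `mean_plddt` value that `validate_structure` compares against the threshold of 70.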
| Metric | Threshold | Interpretation |
|--------|-----------|----------------|
| Mean pLDDT | >70 | Confident fold |
| pTM | >0.7 | Good global topology |
| RMSD to backbone | <2 Å | Design recapitulated |
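The RMSD-to-backbone criterion is listed above but not computed by `validate_structure`. A common way to obtain it is to superimpose the predicted and designed C-alpha coordinates with the Kabsch algorithm and then measure the RMSD. The sketch below assumes the coordinates have already been extracted as two N×3 arrays with residues in matching order; the function name `kabsch_rmsd` is chosen here and is not part of the skill.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two N x 3 coordinate sets after optimal superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove translation by centering both sets on their centroids
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix
    covariance = a_centered.T @ b_centered
    u, _, vt = np.linalg.svd(covariance)

    # Flip the last axis if needed so the result is a proper rotation
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    # Rotate set A onto set B and measure the remaining deviation
    a_rotated = a_centered @ rotation.T
    squared_dist = np.sum((a_rotated - b_centered) ** 2, axis=1)
    return float(np.sqrt(squared_dist.mean()))
```

A design would meet the criterion in the table when `kabsch_rmsd(predicted_ca, designed_ca) < 2.0`, with both inputs in Å.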
## 4. Structure Validation
### 4.1 Validation Results
| Sequence | pLDDT | pTM | RMSD to Design | Status |
|----------|-------|-----|----------------|--------|
| Seq_001_A | 88.5 | 0.85 | 1.2 Å | ✓ PASS |
| Seq_002_C | 82.3 | 0.79 | 1.5 Å | ✓ PASS |
| Seq_001_B | 85.1 | 0.82 | 1.3 Å | ✓ PASS |
| Seq_003_A | 79.8 | 0.76 | 1.8 Å | ✓ PASS |
| Seq_005_B | 68.2 | 0.65 | 2.8 Å | ✗ FAIL |
### 4.2 Top Validated Design: Seq_001_A
| Region | Residues | pLDDT | Interpretation |
|--------|----------|-------|----------------|
| Helix 1 | 1-28 | 92.3 | Very high confidence |
| Loop 1 | 29-35 | 78.4 | Moderate confidence |
| Helix 2 | 36-58 | 91.8 | Very high confidence |
| Loop 2 | 59-65 | 75.2 | Moderate confidence |
| Helix 3 | 66-85 | 90.1 | Very high confidence |
**Overall**: Well-folded 3-helix bundle with high confidence core
*Source: NVIDIA NIM via `NvidiaNIM_esmfold`*
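```python
def assess_aggregation(sequence):
    """Assess aggregation propensity."""

    # The scoring steps are not implemented in this skill. They would:
    #   - calculate hydrophobic patches
    #   - calculate isoelectric point
    #   - identify aggregation-prone motifs
    # compute_aggregation_metrics is a placeholder for that analysis.
    score, patches = compute_aggregation_metrics(sequence)

    return {
        'aggregation_score': score,
        'hydrophobic_patches': patches,
        'risk_level': 'Low' if score < 0.5 else 'Medium' if score < 0.7 else 'High'
    }
```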
| Metric | Favorable | Marginal | Unfavorable |
|--------|-----------|----------|-------------|
| Aggregation score | <0.5 | 0.5-0.7 | >0.7 |
| Isoelectric point | 5-9 | 4-5 or 9-10 | <4 or >10 |
| Hydrophobic patches | <3 | 3-5 | >5 |
| Cysteine count | 0 or even | Odd | Multiple unpaired |
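The thresholds above can be applied mechanically once the underlying metrics are known. The sketch below only classifies values that are passed in; it does not compute the aggregation score, pI, or patch count. All function names are hypothetical. Two assumptions are worth flagging: an aggregation score of exactly 0.7 is treated as unfavorable (matching `assess_aggregation`), and cysteines are rated from the count alone, since deciding whether cysteines are unpaired needs structural information.

```python
def classify_aggregation(score):
    if score < 0.5:
        return "Favorable"
    if score < 0.7:
        return "Marginal"
    return "Unfavorable"


def classify_pi(pi):
    if 5 <= pi <= 9:
        return "Favorable"
    if 4 <= pi <= 10:
        return "Marginal"
    return "Unfavorable"


def classify_patches(count):
    if count < 3:
        return "Favorable"
    if count <= 5:
        return "Marginal"
    return "Unfavorable"


def classify_cysteines(sequence):
    # Count only: zero or an even number is favorable, an odd number marginal
    n_cys = sequence.upper().count("C")
    return "Favorable" if n_cys % 2 == 0 else "Marginal"


def assess_developability(sequence, aggregation_score, pi, hydrophobic_patches):
    """Rate each metric and report the worst rating as the overall result."""
    ratings = {
        "aggregation": classify_aggregation(aggregation_score),
        "pi": classify_pi(pi),
        "hydrophobic_patches": classify_patches(hydrophobic_patches),
        "cysteines": classify_cysteines(sequence),
    }
    order = ["Favorable", "Marginal", "Unfavorable"]
    overall = max(ratings.values(), key=order.index)
    ratings["overall"] = overall
    return ratings
```

Taking the worst individual rating as the overall result is a design choice made for this sketch; a weighted score would be an equally reasonable alternative.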
## 5. Developability Assessment
### 5.1 Developability Scores
| Design | Aggregation | pI | Cysteines | Expression | Overall |
|--------|-------------|-----|-----------|------------|---------|
| Seq_001_A | 0.32 (Low) | 6.2 | 0 | High | ★★★ |
| Seq_002_C | 0.45 (Low) | 5.8 | 2 (paired) | Medium | ★★☆ |
| Seq_001_B | 0.38 (Low) | 7.1 | 0 | High | ★★★ |
| Seq_003_A | 0.58 (Med) | 6.5 | 0 | Medium | ★★☆ |
### 5.2 Recommendations
**Best candidate for expression**: Seq_001_A
- Low aggregation propensity
- Near-neutral pI of 6.2 (within the favorable 5-9 range)
- No cysteines (no risk of disulfide mispairing)
- Predicted high E. coli expression
*Source: Sequence analysis*
# Therapeutic Protein Design Report: [TARGET]
**Generated**: [Date] | **Query**: [Original query] | **Status**: In Progress
---
## Executive Summary
[Designing...]
---
## 1. Target Characterization
### 1.1 Target Information
[Designing...]
### 1.2 Binding Epitope
[Designing...]
---
## 2. Backbone Generation
### 2.1 Design Parameters
[Designing...]
### 2.2 Generated Backbones
[Designing...]
---
## 3. Sequence Design
### 3.1 ProteinMPNN Results
[Designing...]
### 3.2 Top Sequences
[Designing...]
---
## 4. Structure Validation
### 4.1 ESMFold Validation
[Designing...]
### 4.2 Quality Metrics
[Designing...]
---
## 5. Developability Assessment
### 5.1 Scores
[Designing...]
### 5.2 Recommendations
[Designing...]
---
## 6. Final Candidates
### 6.1 Ranked List
[Designing...]
### 6.2 Sequences for Testing
[Designing...]
---
## 7. Experimental Recommendations
[Designing...]
---
## 8. Data Sources
[Will be populated...]
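Progressive updating of the report can be done by replacing the `[Designing...]` placeholder under a given heading once that section's results are ready. The helper below is a minimal sketch, assuming the report is held as a string and that headings match the template text exactly; the name `fill_section` is hypothetical.

```python
PLACEHOLDER = "[Designing...]"


def fill_section(report_text, heading, content):
    """Replace the placeholder directly under `heading` with `content`.

    Only works for headings whose placeholder appears before the next
    heading line (for example "### 1.1 Target Information").
    """
    lines = report_text.splitlines()
    if heading not in lines:
        raise KeyError(f"heading not found: {heading!r}")
    start = lines.index(heading)

    for i in range(start + 1, len(lines)):
        if lines[i].startswith("#"):
            break  # reached the next heading without finding a placeholder
        if lines[i].strip() == PLACEHOLDER:
            lines[i] = content
            updated = "\n".join(lines)
            # Preserve a trailing newline if the input had one
            return updated + "\n" if report_text.endswith("\n") else updated

    raise ValueError(f"no placeholder under heading: {heading!r}")
```

A typical loop would read the report file, call `fill_section` for the section that has just been completed, and write the result back, so the file on disk always reflects current progress.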
| Tier | Symbol | Criteria |
|------|--------|----------|
| T1 | ★★★ | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | ★★☆ | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | ★☆☆ | pLDDT >70, pTM >0.65, developability concerns |
| T4 | ☆☆☆ | Failed validation or major developability issues |
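The tier criteria above could be applied with a small function like the one below. It is a sketch under one simplifying assumption: developability is summarized beforehand as one of four labels (`good`, `acceptable`, `concerns`, `major_issues`), which are names invented here to stand in for "low aggregation, neutral pI", "acceptable developability", and so on.

```python
TIER_SYMBOLS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}


def assign_tier(plddt, ptm, developability):
    """Return (tier, symbol) for one design.

    developability is a hypothetical summary label:
    "good", "acceptable", "concerns", or "major_issues".
    """
    if developability not in {"good", "acceptable", "concerns", "major_issues"}:
        raise ValueError(f"unknown developability label: {developability!r}")

    # T4: failed the minimum structural thresholds or has major issues
    if developability == "major_issues" or plddt <= 70 or ptm <= 0.65:
        tier = "T4"
    elif plddt > 85 and ptm > 0.8 and developability == "good":
        tier = "T1"
    elif plddt > 75 and ptm > 0.7 and developability in {"good", "acceptable"}:
        tier = "T2"
    else:
        tier = "T3"
    return tier, TIER_SYMBOLS[tier]


print(assign_tier(88.5, 0.85, "good"))
```

Designs that clear the T3 structural thresholds but miss the T1 or T2 criteria fall through to T3.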
| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| NvidiaNIM_rfdiffusion | Manual backbone design | Scaffold from PDB |
| NvidiaNIM_proteinmpnn | Rosetta ProteinMPNN | Manual sequence design |
| NvidiaNIM_esmfold | NvidiaNIM_alphafold2 | AlphaFold DB |
| PDB structure | NvidiaNIM_alphafold2 | AlphaFold DB |
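A fallback chain like the ones in the table can be expressed as an ordered list of options that are tried until one succeeds. The helper below is a generic sketch: `call_with_fallbacks` is a hypothetical name, and each option is supplied by the caller as a zero-argument callable, so no tool signatures are assumed.

```python
def call_with_fallbacks(options):
    """Try each (name, callable) in order; return (name, result) of the first success.

    An option counts as failed if it raises or returns None.
    """
    errors = []
    for name, func in options:
        try:
            result = func()
        except Exception as exc:  # keep going, but remember why this option failed
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: returned no result")
    raise RuntimeError("all options failed: " + "; ".join(errors))
```

For structure validation, for example, the list could hold one lambda wrapping the `NvidiaNIM_esmfold` call followed by one wrapping `NvidiaNIM_alphafold2`. Returning the name of the option that succeeded makes it easy to record the actual source in the report.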
See TOOLS_REFERENCE.md for complete tool documentation.