# SolubleMPNN Solubility-Optimized Sequence Design
ProteinMPNN variant trained to design sequences with higher solubility in aqueous solution. Reduces aggregation propensity and improves expression yields in E. coli and cell-free systems.
## Installation

```bash
git clone https://github.com/dauparas/LigandMPNN
cd LigandMPNN
pip install -e .
bash get_model_params.sh
```
## When to Use SolubleMPNN vs ProteinMPNN

| Scenario | Use |
|----------|-----|
| E. coli expression optimization | SolubleMPNN |
| Reducing inclusion body formation | SolubleMPNN |
| High-yield cell-free synthesis | SolubleMPNN |
| Standard binder design (expression not bottleneck) | ProteinMPNN |
| Membrane proteins | LigandMPNN (membrane model) |
| Ligand-binding sites | LigandMPNN |
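```bash
# 8 batches x 4 sequences per batch = 32 designed sequences.
# The SolubleMPNN weights are passed with --checkpoint_soluble_mpnn,
# the checkpoint flag LigandMPNN's run.py defines for this model type.
python3 LigandMPNN/run.py \
    --model_type "soluble_mpnn" \
    --checkpoint_soluble_mpnn "model_params/solublempnn_v_48_002.pt" \
    --pdb_path structure.pdb \
    --out_folder output/ \
    --number_of_batches 8 \
    --batch_size 4 \
    --temperature 0.1
```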
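```bash
# Redesign chain B of a complex. Chains not listed in --chains_to_design
# (here, chain A) are kept fixed, so no separate fixed-chain flag is needed.
# Several chains are given as a comma-separated list, e.g. "B,C".
python3 LigandMPNN/run.py \
    --model_type "soluble_mpnn" \
    --checkpoint_soluble_mpnn "model_params/solublempnn_v_48_002.pt" \
    --pdb_path complex.pdb \
    --out_folder output/ \
    --chains_to_design "B" \
    --number_of_batches 16
```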
```python
import subprocess
from typing import Optional


def design_soluble(pdb_path: str, out_folder: str, n_seqs: int = 64,
                   chains_to_design: Optional[list] = None,
                   temperature: float = 0.1) -> str:
    """Run SolubleMPNN and return the output folder path.

    Chains not listed in chains_to_design are left fixed by run.py,
    so no separate fixed-chain argument is passed.
    """
    batch_size = 4
    # Round up so that at least n_seqs sequences are generated.
    n_batches = max(1, -(-n_seqs // batch_size))
    cmd = [
        "python3", "LigandMPNN/run.py",
        "--model_type", "soluble_mpnn",
        "--checkpoint_soluble_mpnn", "model_params/solublempnn_v_48_002.pt",
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--number_of_batches", str(n_batches),
        "--batch_size", str(batch_size),
        "--temperature", str(temperature),
    ]
    if chains_to_design:
        # run.py expects a comma-separated chain list, e.g. "A,B".
        cmd += ["--chains_to_design", ",".join(chains_to_design)]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"SolubleMPNN failed: {result.stderr}")
    return out_folder
```
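Once a run finishes, the designed sequences have to be read back before they can be filtered. The following is a minimal sketch, assuming the designs are written as FASTA files (`*.fa`) somewhere below the output folder and that multi-chain designs are joined with `:`. Both are assumptions to check against your own output, and `parse_fasta` and `load_designs` are illustrative names, not part of LigandMPNN.

```python
from pathlib import Path


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def load_designs(out_folder: str, chain_sep: str = ":") -> list[dict]:
    """Collect sequences from every .fa file found below out_folder.

    Assumption: multi-chain sequences are joined with chain_sep.
    """
    designs = []
    for fasta in sorted(Path(out_folder).rglob("*.fa")):
        for header, seq in parse_fasta(fasta.read_text()):
            designs.append({
                "file": fasta.name,
                "header": header,
                "chains": seq.split(chain_sep),
            })
    return designs
```

The first record in a file may be the input sequence rather than a design, so inspect the headers before passing everything on to the filters.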
Filter designed sequences using biophysical predictors:
```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis


def analyze_sequence(seq: str) -> dict:
    analysis = ProteinAnalysis(seq)
    return {
        "instability_index": analysis.instability_index(),  # <40 = stable
        "gravy": analysis.gravy(),  # <0 = hydrophilic
        "isoelectric_point": analysis.isoelectric_point(),
        "molecular_weight": analysis.molecular_weight(),
    }


# Flag likely insoluble sequences
def is_likely_soluble(seq: str) -> bool:
    props = analyze_sequence(seq)
    return (
        props["instability_index"] < 40 and
        props["gravy"] < 0.1 and  # not too hydrophobic
        not any(seq[i:i+5].count("C") >= 3 for i in range(len(seq)-4))  # no cysteine clusters
    )
```
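The GRAVY value used in the filter above is the mean Kyte-Doolittle hydropathy over all residues. The dependency-free sketch below shows what that number measures; `KD_HYDROPATHY` and `gravy_score` are illustrative names, and Biopython's `gravy()` remains the reference implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_score(seq: str) -> float:
    """Mean hydropathy of seq; positive values indicate a hydrophobic sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KD_HYDROPATHY)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KD_HYDROPATHY[aa] for aa in seq) / len(seq)
```

A poly-isoleucine stretch scores 4.5 and a poly-lysine stretch scores -3.9, so the `< 0.1` cutoff in `is_likely_soluble` rejects sequences whose average residue leans hydrophobic.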
```python
import re


def check_liabilities(seq: str) -> list:
    liabilities = []
    # N-x-S/T sequon, where x is any residue except proline
    if re.search(r"N[^P][ST]", seq):
        liabilities.append("N-glycosylation site")
    # NG is prone to Asn deamidation, DG to Asp isomerization
    if re.search(r"NG|DG", seq):
        liabilities.append("Deamidation/isomerization motif")
    if re.search(r"[KR]{3,}", seq):
        liabilities.append("Polybasic cluster (proteolysis risk)")
    if seq.count("C") > 3 and seq.count("C") % 2 != 0:
        liabilities.append("Odd number of cysteines (unpaired disulfide)")
    if re.search(r"[FWYI]{4,}", seq):
        liabilities.append("Hydrophobic cluster (aggregation risk)")
    return liabilities
```
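Knowing that a liability exists is often less useful than knowing where it sits, for example to fix or bias that position in a follow-up design round. The sketch below reuses the same regular expressions to report positions; `LIABILITY_PATTERNS` and `locate_liabilities` are hypothetical helper names introduced here for illustration.

```python
import re

# Same motifs as the liability check above, keyed by a short label.
LIABILITY_PATTERNS = {
    "N-glycosylation sequon": r"N[^P][ST]",
    "NG/DG motif": r"NG|DG",
    "Polybasic cluster": r"[KR]{3,}",
    "Hydrophobic cluster": r"[FWYI]{4,}",
}


def locate_liabilities(seq: str) -> list[tuple[str, int, str]]:
    """Return (label, 1-based start position, matched text) per hit.

    Matches are non-overlapping within each pattern, as reported
    by re.finditer.
    """
    hits = []
    for label, pattern in LIABILITY_PATTERNS.items():
        for match in re.finditer(pattern, seq):
            hits.append((label, match.start() + 1, match.group(0)))
    # Sort by position so hits read left to right along the sequence.
    return sorted(hits, key=lambda hit: hit[1])
```

Positions are 1-based to line up with the residue numbering conventionally used for protein sequences; adjust if your structure file uses a different numbering.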
python-codon-tables)

Published results (Dauparas et al., 2023):