skills/boltzgen/SKILL.md
All-atom protein design using BoltzGen diffusion model. Use this skill when: (1) Need side-chain aware design from the start, (2) Designing around small molecules or ligands, (3) Want all-atom diffusion (not just backbone), (4) Require precise binding geometries, (5) Using YAML-based configuration. For backbone-only generation, use rfdiffusion. For sequence-only design, use proteinmpnn. For structure validation, use boltz.
npx skillsauth add adaptyvbio/protein-design-skills boltzgenInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
| Requirement | Minimum | Recommended | |-------------|---------|-------------| | Python | 3.10+ | 3.11 | | CUDA | 12.0+ | 12.1+ | | GPU VRAM | 24GB | 48GB (L40S) | | RAM | 32GB | 64GB |
First time? See Installation Guide to set up Modal and biomodals.
# Clone biomodals
git clone https://github.com/hgbrian/biomodals && cd biomodals
# Run BoltzGen (requires YAML config file)
modal run modal_boltzgen.py \
--input-yaml binder_config.yaml \
--protocol protein-anything \
--num-designs 50
# With custom GPU
GPU=L40S modal run modal_boltzgen.py \
--input-yaml binder_config.yaml \
--protocol protein-anything \
--num-designs 100
GPU: L40S (48GB) recommended | Timeout: 120min default
Available protocols: protein-anything, peptide-anything, protein-small_molecule, nanobody-anything, antibody-anything
git clone https://github.com/HannesStark/boltzgen.git
cd boltzgen
pip install -e .
python sample.py config=config.yaml
from boltzgen import BoltzGen
model = BoltzGen.load_pretrained()
designs = model.sample(
target_pdb="target.pdb",
num_samples=50,
binder_length=80
)
GPU: L40S (48GB) | Time: ~30-60s per design
| Parameter | Default | Description |
|-----------|---------|-------------|
| --input-yaml | required | Path to YAML design specification |
| --protocol | protein-anything | Design protocol |
| --num-designs | 10 | Number of designs to generate |
| --steps | all | Pipeline steps to run (e.g., design inverse_folding) |
BoltzGen uses an entity-based YAML format where you specify designed proteins and target structures as entities.
Important notes:
label_seq_id (1-indexed), not author residue numbersboltzgen check config.yaml to verify your specification before runningentities:
# Designed protein (variable length 80-140 residues)
- protein:
id: B
sequence: 80..140
# Target from structure file
- file:
path: target.cif
include:
- chain:
id: A
# Specify binding site residues (optional but recommended)
binding_types:
- chain:
id: A
binding: 45,67,89
entities:
- protein:
id: G
sequence: 60..100
- file:
path: 5cqg.cif
include:
- chain:
id: A
binding_types:
- chain:
id: A
binding: 343,344,251
structure_groups: "all"
entities:
- protein:
id: S
sequence: 10..14C6C3 # With cysteines for disulfide
- file:
path: target.cif
include:
- chain:
id: A
constraints:
- bond:
atom1: [S, 11, SG]
atom2: [S, 18, SG] # Disulfide bond
| Protocol | Use Case |
|----------|----------|
| protein-anything | Design proteins to bind proteins or peptides |
| peptide-anything | Design cyclic peptides to bind proteins |
| protein-small_molecule | Design proteins to bind small molecules |
| nanobody-anything | Design nanobody CDRs |
| antibody-anything | Design antibody CDRs |
output/
├── sample_0/
│ ├── design.cif # All-atom structure (CIF format)
│ ├── metrics.json # Confidence scores
│ └── sequence.fasta # Sequence
├── sample_1/
│ └── ...
└── summary.csv
Note: BoltzGen outputs CIF format. Convert to PDB if needed:
from Bio.PDB import MMCIFParser, PDBIO
parser = MMCIFParser()
structure = parser.get_structure("design", "design.cif")
io = PDBIO()
io.set_structure(structure)
io.save("design.pdb")
$ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10
Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10
[INFO] Loading BoltzGen model...
[INFO] Generating designs...
[INFO] Running inverse folding...
[INFO] Running structure prediction...
[INFO] Filtering and ranking...
[INFO] Pipeline complete
Results saved to: ./out/boltzgen/2501161234/
Output directory structure:
out/boltzgen/2501161234/
├── intermediate_designs/ # Raw diffusion outputs
│ ├── design_0.cif
│ └── design_0.npz
├── intermediate_designs_inverse_folded/
│ ├── refold_cif/ # Refolded complexes
│ └── aggregate_metrics_analyze.csv
└── final_ranked_designs/
├── final_10_designs/ # Top designs
└── results_overview.pdf # Summary plots
What good output looks like:
Should I use BoltzGen?
│
├─ What type of design?
│ ├─ All-atom precision needed → BoltzGen ✓
│ ├─ Ligand binding pocket → BoltzGen ✓
│ └─ Standard miniprotein → RFdiffusion (faster)
│
├─ What matters most?
│ ├─ Side-chain packing → BoltzGen ✓
│ ├─ Speed / diversity → RFdiffusion
│ ├─ Highest success rate → BindCraft
│ └─ AF2 optimization → ColabDesign
│
└─ Compute resources?
├─ Have L40S/A100 (48GB+) → BoltzGen ✓
└─ Only A10G (24GB) → Consider RFdiffusion
| Campaign Size | Time (L40S) | Cost (Modal) | Notes | |---------------|-------------|--------------|-------| | 50 designs | 30-45 min | ~$8 | Quick exploration | | 100 designs | 1-1.5h | ~$15 | Standard campaign | | 500 designs | 5-8h | ~$70 | Large campaign |
Per-design: ~30-60s for typical binder.
find output -name "*.cif" | wc -l # Should match num_samples
Verify config first: Always run boltzgen check config.yaml before running the full pipeline
Slow generation: Use fewer designs for initial testing, then scale up
OOM errors: Use A100-80GB or reduce --num-designs
Wrong binding site: Residue indices use label_seq_id (1-indexed), check in Molstar viewer
| Error | Cause | Fix |
|-------|-------|-----|
| RuntimeError: CUDA out of memory | Large design or long protein | Use A100-80GB or reduce designs |
| FileNotFoundError: *.cif | Target file not found | File paths are relative to YAML location |
| ValueError: invalid chain | Chain not in target | Verify chain IDs with Molstar or PyMOL |
| modal: command not found | Modal CLI not installed | Run pip install modal && modal setup |
Next: Validate with boltz or chai → protein-qc for filtering.
testing
Access UniProt for protein sequence and annotation retrieval. Use this skill when: (1) Looking up protein sequences by accession, (2) Finding functional annotations, (3) Getting domain boundaries, (4) Finding homologs and variants, (5) Cross-referencing to PDB structures. For structure retrieval, use pdb. For sequence design, use proteinmpnn.
development
Solubility-optimized protein sequence design using SolubleMPNN. Use this skill when: (1) Designing for E. coli expression, (2) Optimizing solubility of designed proteins, (3) Reducing aggregation propensity, (4) Need high-yield expression, (5) Avoiding inclusion body formation. For standard design, use proteinmpnn. For ligand-aware design, use ligandmpnn.
tools
First-time setup for protein design tools. Use this skill when: (1) User is new and hasn't run any tools yet, (2) Commands fail with "file not found" or "modal: command not found", (3) Modal authentication errors occur, (4) User asks how to get started or set up the environment, (5) biomodals directory is missing or tools aren't working.
development
Generate protein backbones using RFdiffusion, a diffusion-based generative model for de novo protein structure generation. Use this skill when: (1) Designing binder scaffolds for a target protein, (2) Generating novel protein backbones from scratch, (3) Scaffolding functional motifs into new proteins, (4) Specifying hotspot residues for interface design, (5) Creating symmetric oligomers. For sequence design after backbone generation, use proteinmpnn. For structure validation, use alphafold or chai. For QC thresholds, use protein-qc.