public/SKILLS/Scientific & Research Tools/torchdrug/SKILL.md
PyTorch-native graph neural networks for molecules and proteins. Use when building custom GNN architectures for drug discovery, protein modeling, or knowledge graph reasoning. Best for custom model development, protein property prediction, retrosynthesis. For pre-trained models and diverse featurizers use deepchem; for benchmark datasets use pytdc.
TorchDrug is a PyTorch-based machine learning toolbox for drug discovery and molecular science. Apply graph neural networks, pre-trained models, and task definitions to molecules, proteins, and biological knowledge graphs. Supported tasks include molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, and retrosynthesis planning, backed by 40+ curated datasets and 20+ model architectures.
This skill should be used when working with:
Data Types:
Tasks:
Libraries and Integration:
uv pip install torchdrug
# Or with optional dependencies
uv pip install torchdrug[full]
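import torch
from torchdrug import data, datasets, models, tasks

# Load molecular dataset
dataset = datasets.BBBP("~/molecule-datasets/")
# Random 80/10/10 split
lengths = [int(0.8 * len(dataset)), int(0.1 * len(dataset))]
lengths += [len(dataset) - sum(lengths)]
train_set, valid_set, test_set = torch.utils.data.random_split(dataset, lengths)

# Define GNN model
model = models.GIN(
    input_dim=dataset.node_feature_dim,
    hidden_dims=[256, 256, 256],
    edge_input_dim=dataset.edge_feature_dim,
    batch_norm=True,
    readout="mean"
)

# Create property prediction task
task = tasks.PropertyPrediction(
    model,
    task=dataset.tasks,
    criterion="bce",
    metric=["auroc", "auprc"]
)
# Compute label statistics and build the prediction head before creating the optimizer
task.preprocess(train_set, valid_set, test_set)

# Train with PyTorch
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
# TorchDrug's DataLoader collates graphs into batches
train_loader = data.DataLoader(train_set, batch_size=32, shuffle=True)
for epoch in range(100):
    for batch in train_loader:
        # The task returns the loss together with a dict of training metrics
        loss, metric = task(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()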
Predict chemical, physical, and biological properties of molecules from structure.
Use Cases:
Key Components:
Reference: See references/molecular_property_prediction.md for:
Work with protein sequences, structures, and properties.
Use Cases:
Key Components:
Reference: See references/protein_modeling.md for:
Predict missing links and relationships in biological knowledge graphs.
Use Cases:
Key Components:
Reference: See references/knowledge_graphs.md for:
Generate novel molecular structures with desired properties.
Use Cases:
Key Components:
Reference: See references/molecular_generation.md for:
Predict synthetic routes from target molecules to starting materials.
Use Cases:
Key Components:
Reference: See references/retrosynthesis.md for:
Comprehensive catalog of GNN architectures for different data types and tasks.
Available Models:
Reference: See references/models_architectures.md for:
40+ curated datasets spanning chemistry, biology, and knowledge graphs.
Categories:
Reference: See references/datasets.md for:
Scenario: Predict blood-brain barrier penetration for drug candidates.
Steps:
datasets.BBBP()
PropertyPrediction with binary classification
Navigation: references/molecular_property_prediction.md → Dataset selection → Model selection → Training
Scenario: Predict enzyme function from sequence.
Steps:
datasets.EnzymeCommission()
PropertyPrediction with multi-class classification
Navigation: references/protein_modeling.md → Model selection (sequence vs structure) → Pre-training strategies
Scenario: Find new disease treatments in Hetionet.
Steps:
datasets.Hetionet()
KnowledgeGraphCompletion
Navigation: references/knowledge_graphs.md → Hetionet dataset → Model selection → Biomedical applications
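To make the link-prediction idea concrete, here is a minimal pure-Python sketch of how a RotatE-style model scores a candidate triple: the relation rotates each complex coordinate of the head embedding, and the score is the negative distance to the tail. This is an illustration only, not TorchDrug's implementation. The `rotate_score` helper, the hand-picked embeddings, and the entity names are all hypothetical.

```python
import cmath

def rotate_score(head, phases, tail):
    """RotatE-style score: negative summed distance between rotated head and tail.

    head, tail: lists of complex numbers (entity embeddings).
    phases: list of angles in radians; the relation rotates each head
    coordinate by the matching angle.
    """
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return -sum(abs(r - t) for r, t in zip(rotated, tail))

# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
drug = [1 + 0j, 0 + 1j]
treats = [cmath.pi / 2, cmath.pi]   # rotate by 90 and 180 degrees
disease_a = [0 + 1j, 0 - 1j]        # exactly where the rotation lands
disease_b = [1 + 0j, 1 + 0j]        # somewhere else

candidates = {"disease_a": disease_a, "disease_b": disease_b}
# Higher score means a more plausible (drug, treats, disease) triple.
ranked = sorted(
    candidates,
    key=lambda name: rotate_score(drug, treats, candidates[name]),
    reverse=True,
)
```

Ranking every candidate tail by this score is the basic operation behind drug-repurposing queries; a trained model learns the embeddings and phases instead of having them set by hand.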
Scenario: Generate drug-like molecules optimized for target binding.
Steps:
Navigation: references/molecular_generation.md → Conditional generation → Multi-objective optimization
Scenario: Plan synthesis route for target molecule.
Steps:
datasets.USPTO50k()
Navigation: references/retrosynthesis.md → Task types → Multi-step planning
Convert between TorchDrug molecules and RDKit:
from torchdrug import data
from rdkit import Chem
# SMILES → TorchDrug molecule
smiles = "CCO"
mol = data.Molecule.from_smiles(smiles)
# TorchDrug → RDKit
rdkit_mol = mol.to_molecule()
# RDKit → TorchDrug
rdkit_mol = Chem.MolFromSmiles(smiles)
mol = data.Molecule.from_molecule(rdkit_mol)
Use predicted structures:
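from torchdrug import data, layers
from torchdrug.layers import geometry

# Load AlphaFold predicted structure
protein = data.Protein.from_pdb("AF-P12345-F1-model_v4.pdb")

# Build a residue graph on alpha carbons with sequential and radius edges
graph_construction = layers.GraphConstruction(
    node_layers=[geometry.AlphaCarbonNode()],
    edge_layers=[
        geometry.SequentialEdge(),
        geometry.SpatialEdge(radius=10.0),
    ],
)
graph = graph_construction(data.Protein.pack([protein]))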
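The edge construction used above can be sketched in plain Python to show what "sequential" and "radius" edges mean. This is a simplified illustration under stated assumptions, not the library's algorithm: `build_residue_edges` is a hypothetical helper, the coordinates are toy values rather than a real structure, and each undirected edge is listed once.

```python
import math

def build_residue_edges(ca_coords, radius_cutoff=10.0):
    """Return (i, j, kind) edges over residues given C-alpha coordinates in angstroms."""
    edges = []
    n = len(ca_coords)
    # Sequential edges connect neighbours along the chain.
    for i in range(n - 1):
        edges.append((i, i + 1, "sequential"))
    # Radius edges connect any non-adjacent pair closer than the cutoff.
    for i in range(n):
        for j in range(i + 2, n):
            if math.dist(ca_coords[i], ca_coords[j]) < radius_cutoff:
                edges.append((i, j, "radius"))
    return edges

# Toy coordinates (not from a real structure): a chain that folds back on itself.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
edges = build_residue_edges(coords, radius_cutoff=6.0)
radius_edges = [(i, j) for i, j, kind in edges if kind == "radius"]
```

A larger cutoff adds more radius edges and therefore more memory use per graph, which is worth keeping in mind when tuning structure models.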
Wrap tasks for Lightning training:
import torch
import pytorch_lightning as pl

class LightningTask(pl.LightningModule):
    def __init__(self, torchdrug_task):
        super().__init__()
        self.task = torchdrug_task

    def training_step(self, batch, batch_idx):
        # TorchDrug tasks return (loss, metric); Lightning needs the loss
        loss, metric = self.task(batch)
        return loss

    def validation_step(self, batch, batch_idx):
        pred = self.task.predict(batch)
        target = self.task.target(batch)
        return {"pred": pred, "target": target}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
For deep dives into TorchDrug's architecture:
Core Concepts: See references/core_concepts.md for:
Choose Dataset:
references/datasets.md → Molecular section
references/datasets.md → Protein section
references/datasets.md → Knowledge graph section
Choose Model:
references/models_architectures.md → GNN section → GIN/GAT/SchNet
references/models_architectures.md → Protein section → ESM
references/models_architectures.md → Protein section → GearNet
references/models_architectures.md → KG section → RotatE/ComplEx
Common Tasks:
references/molecular_property_prediction.md or references/protein_modeling.md
references/molecular_generation.md
references/retrosynthesis.md
references/knowledge_graphs.md
Understand Architecture:
references/core_concepts.md → Data Structures
references/core_concepts.md → Model Interface
references/core_concepts.md → Task Interface
Issue: Dimension mismatch errors
→ Check model.input_dim matches dataset.node_feature_dim
→ See references/core_concepts.md → Essential Attributes
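One way to catch this before training starts is a small guard that compares the two attributes and fails with a readable message. The sketch below is hypothetical: `check_input_dim` is not part of TorchDrug, the stand-in objects only mimic the two attributes named above, and the numbers 66 and 69 are arbitrary.

```python
from types import SimpleNamespace

def check_input_dim(model, dataset):
    """Raise early with a readable message if the feature sizes disagree."""
    if model.input_dim != dataset.node_feature_dim:
        raise ValueError(
            f"model.input_dim={model.input_dim} but "
            f"dataset.node_feature_dim={dataset.node_feature_dim}"
        )

# Stand-ins for a real model and dataset; only the two attributes are used.
model = SimpleNamespace(input_dim=66)
dataset = SimpleNamespace(node_feature_dim=69)

try:
    check_input_dim(model, dataset)
    message = ""
except ValueError as err:
    message = str(err)
```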
Issue: Poor performance on molecular tasks
→ Use scaffold splitting, not random
→ Try GIN instead of GCN
→ See references/molecular_property_prediction.md → Best Practices
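The reason scaffold splitting matters is that it keeps structurally similar molecules out of both train and test, so the test score reflects generalization to new chemotypes. The following is a minimal pure-Python sketch of the grouping logic, assuming one scaffold key per molecule has already been computed (for example a Bemis-Murcko scaffold SMILES). The `scaffold_split` function here and the keys `"A"` to `"D"` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def scaffold_split(scaffolds, train_frac=0.8, valid_frac=0.1):
    """Split sample indices so that no scaffold appears in more than one subset."""
    groups = defaultdict(list)
    for index, key in enumerate(scaffolds):
        groups[key].append(index)
    # Place the largest scaffold groups first so rare scaffolds land in valid/test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = round(train_frac * n)
    valid_cap = train_cap + round(valid_frac * n)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test

# Hypothetical scaffold keys for ten molecules.
keys = ["A", "A", "A", "A", "B", "B", "B", "C", "D", "A"]
train, valid, test = scaffold_split(keys)
```

Because whole groups are assigned at once, the resulting subset sizes only approximate the requested fractions.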
Issue: Protein model not learning
→ Use pre-trained ESM for sequence tasks
→ Check edge construction for structure models
→ See references/protein_modeling.md → Training Workflows
Issue: Memory errors with large graphs
→ Reduce batch size
→ Use gradient accumulation
→ See references/core_concepts.md → Memory Efficiency
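Gradient accumulation trades time for memory: several small batches are processed one after another, and their gradients are summed before a single optimizer step. The sketch below checks the underlying arithmetic on a toy one-parameter model in plain Python. It assumes equally sized micro-batches and a mean-reduced loss; the helper names and the data are made up for the example.

```python
def mse_grad(w, batch):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, batch, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches."""
    chunks = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    total = 0.0
    for chunk in chunks:
        # Scale each micro-batch gradient so the sum matches the full-batch mean.
        total += mse_grad(w, chunk) / len(chunks)
    return total

# Toy (x, y) pairs, chosen only to make the arithmetic easy to follow.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0), (4.0, 8.0)]
full = mse_grad(0.0, samples)
accumulated = accumulated_grad(0.0, samples, micro_batch_size=2)
```

In a PyTorch loop the same effect comes from dividing each micro-batch loss by the number of accumulation steps and calling `optimizer.step()` only after the last one.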
Issue: Generated molecules are invalid
→ Add validity constraints
→ Post-process with RDKit validation
→ See references/molecular_generation.md → Validation and Filtering
Official Documentation: https://torchdrug.ai/docs/
GitHub: https://github.com/DeepGraphLearning/torchdrug
Paper: TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery
Navigate to the appropriate reference file based on your task:
molecular_property_prediction.md
protein_modeling.md
knowledge_graphs.md
molecular_generation.md
retrosynthesis.md
models_architectures.md
datasets.md
core_concepts.md
Each reference provides comprehensive coverage of its domain with examples, best practices, and common use cases.