chemoinformatics/retrosynthesis/SKILL.md
Performs retrosynthetic planning using AiZynthFinder (MCTS, template-based), Chemformer (template-free transformer), ASKCOS, and emerging RetroSynFormer with explicit handling of route scoring, building-block availability (eMolecules, Enamine, Mcule), forward prediction validation (Molecular Transformer), and disconnection-aware multi-objective search (MO-MCTS). Use when assessing synthetic feasibility of generated or selected molecules, planning multi-step syntheses, building synthesis-aware design pipelines, or screening libraries for retro-route feasibility.
Reference examples tested with: AiZynthFinder 4.4+, Chemformer 1.3+, RDKit 2024.09+, RDChiral 1.1+, Aizynthtrain 1.0+, ASKCOS Lite 0.5+.
Before using code patterns, verify installed versions match. If versions differ:
- `pip show <package>`, then `help(module.function)` to check signatures
- `aizynthcli --version`

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
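The version check can also be scripted. The sketch below is one way to do it with the standard library only; the distribution names in `MINIMUMS` are assumptions (conda installs in particular may register RDKit under a different name or not at all), so a "missing" result is a prompt to look manually rather than proof that the package is absent.

```python
from importlib import metadata
from itertools import takewhile

# Minimum versions quoted in this skill. The distribution names are assumptions.
MINIMUMS = {"aizynthfinder": "4.4", "rdkit": "2024.09", "rdchiral": "1.1"}


def version_tuple(text):
    """Turn '4.4.1' into (4, 4, 1), keeping the leading digits of each piece."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first piece that does not start with a digit
        parts.append(int(digits))
    return tuple(parts)


def check_versions(minimums=MINIMUMS):
    """Map each distribution name to 'ok', 'missing', or an 'older than' note."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"older than {minimum} (found {installed})"
    return report


if __name__ == "__main__":
    for name, status in check_versions().items():
        print(f"{name}: {status}")
```

Comparing integer tuples rather than raw strings avoids ranking "4.10" below "4.4".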
Plan synthetic routes from a target molecule back to commercially available building blocks. AiZynthFinder 4.0 (Saigiridharan, Hassen, Lai, Torren-Peraire, Engkvist, Genheden 2024 J Cheminform 16:57, AstraZeneca) is the open-source production-grade tool: Monte Carlo Tree Search (MCTS) + template-based expansion + multi-objective scoring (MO-MCTS). Chemformer (Irwin 2022) is a template-free transformer alternative. ASKCOS (MIT) is the academic reference. Modern best practice combines retrosynthesis with forward validation (a forward model such as Molecular Transformer should reproduce the target from the route's starting materials) and building-block availability checks (eMolecules, Enamine, Mcule, ZINC catalog).
For generative design pipelines that need synthetic feasibility, see chemoinformatics/generative-design. For reaction enumeration (forward direction), see chemoinformatics/reaction-enumeration.

| Tool | Approach | Strength | Fails when |
|------|----------|----------|------------|
| AiZynthFinder 4.0 | Template-based MCTS | Open, scalable, well-validated | Beyond template coverage |
| Chemformer | Template-free transformer | Novel disconnections | Less interpretable; harder to debug |
| ASKCOS | Template-based + neural | MIT-quality academic standard | Setup complexity |
| Molecular Transformer | Forward + retro transformer | Single SMILES-to-SMILES | Less robust outside the training distribution |
| RetroSynFormer | Decision transformer | Modern method | Limited adoption |
| IBM RXN | Cloud service | High quality, easy interface | API access required |
| BKMS_MTHRO / RetroPath | Pathway-based | Metabolic / biochemical | Not for general medchem |
| SyntheMol (Stanford) | Specialized for medchem | Public domain alternative | Limited tooling |

Decision: For most users, AiZynthFinder 4.4 with USPTO + USPTO-50k templates is the open-source standard. For high-stakes routes, validate with Molecular Transformer forward prediction.

| Scenario | Tool | Notes |
|----------|------|-------|
| Standard medchem target | AiZynthFinder default templates | USPTO/Reaxys templates |
| Novel chemotype | AiZynthFinder + Chemformer template-free fallback | Combine both |
| Generated molecules (REINVENT output) | AiZynthFinder batch | Filter to feasible routes |
| Multi-step synthesis planning | AiZynthFinder + manual review | Top-K routes |
| Validate generated route | Molecular Transformer forward | Check round-trip |
| Cost-aware synthesis | AiZynthFinder + custom building-block pricing | Score weight |
| Disconnection-aware design (DAD) | AiZynthFinder MO-MCTS | Multi-objective |
| Patent-aware routes | Custom template exclusion | Specialized |

Goal: Configure AiZynthFinder with USPTO templates + a building-block stock and run MCTS retrosynthesis planning on a target SMILES.
Approach: Build a configuration dict pointing to policy templates (ONNX + CSV) and a stock H5, instantiate AiZynthFinder, set the target SMILES, then call tree_search() followed by build_routes().
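```python
from aizynthfinder.aizynthfinder import AiZynthFinder

# Key layout as given in this skill; configuration key names have changed between
# AiZynthFinder releases, so check the documentation for the installed version.
config_dict = {
    'policy': {
        'files': {
            # name -> [ONNX policy model, template CSV]
            'uspto': ['policy/uspto_model.onnx', 'templates/uspto_templates.csv'],
        }
    },
    'stock': {
        'files': {
            'zinc': 'stock/zinc.h5',
        }
    },
    'finder': {
        'algorithm': 'mcts',
        'iteration_limit': 100,  # MCTS iterations
        'time_limit': 120,       # seconds
    }
}

finder = AiZynthFinder(configdict=config_dict)
# Activate the named stock and expansion policy before searching
finder.stock.select('zinc')
finder.expansion_policy.select('uspto')

finder.target_smiles = 'CC(=O)Nc1ccc(C(=O)Nc2cccc(C(F)(F)F)c2)cc1'
finder.tree_search()
finder.build_routes()
```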
Output: list of routes, each with depth, building blocks, score, leaf nodes.
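```python
# Attribute access as written in this skill. Some AiZynthFinder releases expose each
# route as a dict instead (for example route['reaction_tree']); inspect finder.routes first.
for route in finder.routes:
    print(f'Depth: {route.depth}, Score: {route.score:.2f}')
    print(f'In-stock: {sum(node.in_stock for node in route.leafs())}')
    print(f'Building blocks: {[node.smiles for node in route.leafs()]}')
```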
Critical metrics:
AiZynthFinder 4.0 supports multi-objective scoring (Saigiridharan et al 2024 J Cheminform 16:57):
```python
config_dict['finder']['algorithm'] = 'mo_mcts'
config_dict['finder']['mo_mcts'] = {
    'objectives': [
        {'name': 'state_score', 'weight': 0.5},  # default state score
        {'name': 'broken_bonds_score', 'weight': 0.3},  # complexity reduction
        {'name': 'route_length', 'weight': 0.2, 'maximize': False},  # shorter
    ]
}
```
State score: probability the current state can be solved. Broken bonds: each step should reduce molecular complexity. Route length: shorter is better.
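To make the weighting concrete, here is a minimal sketch of how three such objectives could be folded into one number using the weights shown above. It is plain weighted-sum scalarization written for illustration, not AiZynthFinder's internal MO-MCTS logic; the function name, the `max_steps` normalization, and the example values are all assumptions.

```python
# Weights copied from the configuration shown in this section.
WEIGHTS = {"state_score": 0.5, "broken_bonds_score": 0.3, "route_length": 0.2}


def scalarize(state_score, broken_bonds_score, route_length, max_steps=10):
    """Weighted sum where every term is scaled to [0, 1] and higher is better."""
    # Route length is minimized, so invert it: 0 steps -> 1.0, max_steps or more -> 0.0.
    length_term = 1.0 - min(route_length, max_steps) / max_steps
    return (
        WEIGHTS["state_score"] * state_score
        + WEIGHTS["broken_bonds_score"] * broken_bonds_score
        + WEIGHTS["route_length"] * length_term
    )


short_route = scalarize(state_score=0.9, broken_bonds_score=0.6, route_length=3)
long_route = scalarize(state_score=0.9, broken_bonds_score=0.6, route_length=6)
print(f"3 steps: {short_route:.2f}, 6 steps: {long_route:.2f}")
```

With the other two objectives held equal, the three-step route scores higher than the six-step one, which is the intended effect of the route-length term.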

| Stock | Compounds | Source | Cost-tier |
|-------|-----------|--------|-----------|
| ZINC clean leads | 250k | ZINC22 catalog | Various commercial |
| Enamine Building Blocks | 200k+ | Enamine | $$ |
| Enamine REAL | 29B (make-on-demand) | Enamine | $$$ |
| Mcule | 25M | Mcule | $$ |
| eMolecules | 16M | eMolecules | $$ |
| ChemBridge | 1M | ChemBridge | $$ |

AiZynthFinder accepts stocks as HDF5 files, which can be built from a SMILES list with the `smiles2stock` tool that ships with AiZynthFinder:
```bash
smiles2stock --files zinc_building_blocks.smi --output zinc.h5
```
AiZynthFinder predicts retrosynthesis (target -> precursors); Molecular Transformer predicts forward (precursors -> product). Validating the round-trip:
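```python
from rdkit import Chem
# Placeholder import: substitute the forward-prediction wrapper available in your environment.
from molecular_transformer import predict_forward

# Building blocks proposed by the retrosynthesis route, as SMILES strings
precursors = [leaf.smiles for leaf in route.leafs()]
predicted_product = predict_forward(precursors)
# Compare canonical SMILES so equivalent notations match
match = (Chem.CanonSmiles(predicted_product) ==
         Chem.CanonSmiles(finder.target_smiles))
```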
Routes where the forward prediction reproduces the target are highest confidence. ~30-50% of AiZynthFinder routes pass forward validation (Saigiridharan, Genheden et al 2024 J Cheminform 16:57).
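When many routes are checked, it helps to report the pass rate rather than inspect each comparison by hand. The sketch below assumes each route has already been reduced to a list of leaf SMILES; `predict_forward` and `canonicalize` are injected callables (hypothetical stand-ins for a forward model and for `Chem.CanonSmiles`), so the example runs without either installed.

```python
def round_trip_pass_rate(routes, target, predict_forward, canonicalize=str.strip):
    """Fraction of routes whose forward-predicted product matches the target."""
    if not routes:
        return 0.0
    wanted = canonicalize(target)
    hits = 0
    for leaves in routes:
        predicted = predict_forward(leaves)
        # A predictor may return None when it has no answer; count that as a miss.
        if predicted is not None and canonicalize(predicted) == wanted:
            hits += 1
    return hits / len(routes)


# Toy forward model: a lookup table keyed by the sorted precursor SMILES.
TOY_MODEL = {
    ("CC(=O)O", "Nc1ccccc1"): "CC(=O)Nc1ccccc1",   # acid + aniline -> acetanilide
    ("CC(=O)Cl", "Nc1ccccc1"): "CC(=O)Nc1ccccc1",  # acid chloride + aniline
}


def toy_predict(leaves):
    return TOY_MODEL.get(tuple(sorted(leaves)))


routes = [["Nc1ccccc1", "CC(=O)O"], ["Nc1ccccc1", "CC(=O)Cl"], ["c1ccccc1", "CC(=O)N"]]
rate = round_trip_pass_rate(routes, "CC(=O)Nc1ccccc1", toy_predict)
print(f"{rate:.0%} of routes round-trip")
```

In real use, pass `Chem.CanonSmiles` as `canonicalize` so that equivalent SMILES spellings are not counted as mismatches.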
Chemformer uses a Transformer (BART) trained on USPTO reactions for SMILES-to-SMILES:
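```python
# Interface as given in this skill; confirm the class and method names
# against the installed Chemformer before relying on them.
from chemformer import Chemformer

cf = Chemformer.load_pretrained('USPTO_RETROSYNTHESIS_TEMPLATE_FREE')
predictions = cf.predict('CC(=O)Nc1ccc(C(=O)Nc2cccc(C(F)(F)F)c2)cc1',
                         beam_search=10)
```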
Output: 10 predicted precursor SMILES. No templates required; can predict novel disconnections.
Trade-off: Template-free is more flexible but harder to debug. Combining with AiZynthFinder template MCTS gives best of both.
Extend a generative design pipeline to also score retrosynthetic feasibility, using AiZynthFinder batch mode for 1000+ molecules.
Goal: Add retrosynthetic feasibility scoring to generative design pipelines for hundreds-to-thousands of candidate molecules.
Approach: Batch-process generated SMILES through aizynthcli, classify each compound by route depth and in-stock leaf count, and feed feasibility back into the generative scoring function.
```bash
aizynthcli --smiles compounds.smi --output routes.json \
    --config config.yaml --policy uspto --stocks zinc
```
For each compound, returns top-K routes. Score-feasibility for generative design:
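One way to turn a compound's top-K routes into a single feasibility number for a generative scoring function is sketched below. The `RouteSummary` fields, the `max_depth` cutoff, and the scoring formula are illustrative assumptions rather than anything AiZynthFinder defines; the summaries would first have to be extracted from the batch output.

```python
from dataclasses import dataclass


@dataclass
class RouteSummary:
    depth: int            # number of reaction steps on the longest branch
    leaves_total: int     # building blocks the route ends on
    leaves_in_stock: int  # how many of those are purchasable


def feasibility(routes, max_depth=6):
    """Best score over a compound's routes, in [0, 1]; 0.0 when there are none."""
    best = 0.0
    for route in routes:
        if route.leaves_total == 0:
            continue
        stock_fraction = route.leaves_in_stock / route.leaves_total
        # One-step routes get no penalty; the penalty grows linearly with extra steps.
        depth_term = max(0.0, 1.0 - (route.depth - 1) / max_depth)
        best = max(best, stock_fraction * depth_term)
    return best


easy = [RouteSummary(depth=1, leaves_total=2, leaves_in_stock=2)]
partial = [RouteSummary(depth=3, leaves_total=4, leaves_in_stock=2)]
print(feasibility(easy), feasibility(partial), feasibility([]))
```

Taking the best route rather than the average reflects that a compound only needs one workable synthesis.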
Add building-block pricing as objective:
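```python
from rdkit import Chem


def route_cost(route, price_db):
    """Sum catalog prices of a route's building blocks; unpriced leaves add nothing."""
    total = 0
    for leaf in route.leafs():
        smi = Chem.CanonSmiles(leaf.smiles)  # canonical form must match price_db keys
        if smi in price_db:
            total += price_db[smi]
    return total
```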
Combine with step-cost estimate (typical: ~$500-2000 per synthesis step).
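As a worked example of that combination, the sketch below adds a per-step cost range to the building-block total. The function name and the example prices are hypothetical; the 500 and 2000 defaults are the per-step figures quoted above.

```python
def estimate_cost_range(block_cost, n_steps, step_low=500.0, step_high=2000.0):
    """Return (low, high) total cost: building blocks plus a per-step estimate."""
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return (block_cost + n_steps * step_low, block_cost + n_steps * step_high)


# Hypothetical route: 120 in building blocks and three synthesis steps.
low, high = estimate_cost_range(block_cost=120.0, n_steps=3)
print(f"Estimated cost: ${low:,.0f} to ${high:,.0f}")
```

For routes of more than a step or two, the step term dominates the building-block term, so route length usually matters more to total cost than catalog prices.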
Trigger: Target molecule uses bond formation not in training reactions.
Mechanism: USPTO templates are biased toward common transformations; novel chemistry (organometallics, exotic heterocycles) is under-represented.
Symptom: No solved route or route uses unsuitable simplifications.
Fix: Augment templates from Reaxys; combine with Chemformer; manual review.
Trigger: Default Chemformer output.
Mechanism: Transformer can produce non-canonical SMILES variants.
Symptom: SMILES round-trip fails; validation tools confused.
Fix: Canonicalize Chemformer output via RDKit before comparing.
Trigger: Leaf node not in stock database.
Mechanism: AiZynthFinder tree may end on non-purchasable molecules.
Symptom: Route is reported as complete but has non-stock leaves.
Fix: Filter routes by in_stock_only=True; or expand stock to include theoretical building blocks (Enamine REAL).
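Independently of any library-side option, routes can be screened after the search. The sketch assumes each route has been reduced to a list of leaf SMILES and the stock to a set of SMILES canonicalized the same way; both helper names are hypothetical.

```python
def fully_stocked(routes, stock):
    """Keep only routes whose every leaf is present in the stock set."""
    return [leaves for leaves in routes if all(smi in stock for smi in leaves)]


def missing_leaves(leaves, stock):
    """List the leaves that would need custom synthesis or a wider stock."""
    return sorted(set(leaves) - set(stock))


stock = {"CC(=O)O", "Nc1ccccc1", "OB(O)c1ccccc1"}
routes = [
    ["CC(=O)O", "Nc1ccccc1"],            # both leaves purchasable
    ["CC(=O)O", "Nc1ccc(C(F)(F)F)cc1"],  # second leaf not in this toy stock
]
print(fully_stocked(routes, stock))
print(missing_leaves(routes[1], stock))
```

Reporting the missing leaves alongside the filter makes it easier to decide whether a wider stock would rescue an otherwise good route.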
Trigger: Complex target requiring deep tree search.
Mechanism: MCTS may not find route in default 100 iterations.
Symptom: No routes returned despite plausible target.
Fix: Raise iteration_limit to 500 and time_limit to 600; consider divide-and-conquer for complex targets.
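A simple way to apply this is to escalate the budget only when the cheap search fails. In the sketch below `run_search` is a hypothetical callable that configures and runs the planner with the given limits and returns a (possibly empty) list of routes; the budget pairs mirror the default and raised values mentioned above.

```python
def search_with_escalation(run_search, budgets=((100, 120), (500, 600))):
    """Try each (iteration_limit, time_limit) pair in order until routes are found."""
    for iteration_limit, time_limit in budgets:
        routes = run_search(iteration_limit, time_limit)
        if routes:
            return routes, (iteration_limit, time_limit)
    return [], None


# Stand-in planner: pretends the target only resolves with 500 or more iterations.
calls = []


def fake_search(iteration_limit, time_limit):
    calls.append(iteration_limit)
    return ["route-1"] if iteration_limit >= 500 else []


routes, budget = search_with_escalation(fake_search)
print(routes, budget)
```

Easy targets finish on the first, cheap budget, so the larger limits are only paid for when they are needed.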
Trigger: Retro route uses chemistry that doesn't actually work in forward.
Mechanism: Template-based retro lacks reaction conditions / catalysts; forward prediction more conservative.
Symptom: Forward predicts different product than target.
Fix: Use as confidence signal, not rejection; many routes don't round-trip but are still valid synthesis-wise.
Trigger: Old ZINC catalog used; building blocks no longer purchasable.
Mechanism: Commercial catalogs change quarterly.
Symptom: Routes recommend unavailable building blocks.
Fix: Use Enamine REAL or recent ZINC22 for current stock; verify with vendor before synthesis.

| Aspect | AiZynthFinder | Chemformer |
|--------|---------------|------------|
| Approach | Templates + MCTS | Transformer encoder-decoder |
| Speed | Fast for shallow trees | Single-pass per target |
| Interpretability | High (template + atom mapping) | Low (black box) |
| Novel disconnections | Limited | Better |
| Production maturity | High | Medium |
| Cost | CPU | GPU recommended |

For comprehensive coverage, run both and merge unique routes.
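Merging requires some notion of route identity. The sketch below treats two routes as duplicates when they end on the same set of building blocks, which is a deliberate simplification (routes with identical leaves can still differ in their intermediate steps); the tuple layout and function name are assumptions, and the SMILES are expected to be canonicalized beforehand.

```python
def merge_unique(*labelled_routes):
    """Merge (source, leaf_smiles) routes, keeping the first one seen per leaf set."""
    seen = set()
    merged = []
    for source, leaves in labelled_routes:
        key = frozenset(leaves)
        if key in seen:
            continue
        seen.add(key)
        merged.append((source, sorted(key)))
    return merged


template_routes = [("aizynthfinder", ["CC(=O)O", "Nc1ccccc1"])]
template_free_routes = [
    ("chemformer", ["Nc1ccccc1", "CC(=O)O"]),   # same leaves, different order
    ("chemformer", ["CC(=O)Cl", "Nc1ccccc1"]),  # a new disconnection
]
merged = merge_unique(*template_routes, *template_free_routes)
for source, leaves in merged:
    print(source, leaves)
```

Listing the template-based routes first means the more interpretable source wins when both tools propose the same leaves.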
| Symptom | Cause | Fix |
|---------|-------|-----|
| tree_search() returns no routes | Target outside template coverage | Increase iterations; try Chemformer |
| All routes depth > 8 | Complex target | Likely correct; review manually |
| Route says "solved" but leaves not in stock | Stock incomplete | Update stock; or set in_stock_only=True |
| Building block price not found | Compound not in pricing DB | Use Enamine quote or vendor inquiry |
| Chemformer truncates SMILES | Token limit | Increase max_length |
| Forward prediction wrong | Out-of-distribution reaction | Use as confidence signal only |
| MCTS slow on simple target | Default config | Reduce time_limit; use smaller template set |