skills/tooluniverse-chemical-sourcing/SKILL.md
Find commercial sources for chemical compounds: PubChem/ChEMBL identity resolution, then vendor catalog search across ZINC, Enamine, eMolecules, and Mcule. Compares pricing and availability, and identifies purchasable analogs when an exact compound is not in stock. Use for chemical procurement, virtual library curation, and "where can I buy X" questions during synthesis planning.
Pipeline for identifying, sourcing, and purchasing chemical compounds from commercial vendors. Resolves compound identity through PubChem/ChEMBL, searches multiple vendor databases (ZINC, Enamine, eMolecules, Mcule), compares pricing and availability, and identifies purchasable analogs when exact compounds are unavailable.
Guiding principles:
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Typical triggers:
Not this skill: For ADMET/toxicity assessment, use tooluniverse-admet-prediction. For drug-target interaction analysis, use tooluniverse-drug-target-validation.
| Database | Scope | Best For |
|----------|-------|----------|
| ZINC | 230M+ purchasable compounds; aggregates vendors | Broadest coverage; substructure/similarity search; free |
| Enamine | ~4M in-stock, 30B+ REAL (make-on-demand) | Large in-stock library; fast delivery; building blocks |
| eMolecules | Multi-vendor aggregator; 8M+ compounds | Cross-vendor comparison; pricing transparency |
| Mcule | 40M+ compounds; one-stop purchasing | Integrated ordering; quote generation |
| PubChem | 110M+ compounds; identity resolution | Authoritative compound identification; CID lookup |
| ChEMBL | 2.4M+ bioactive molecules | Bioactivity context for sourced compounds |
Phase 0: Compound Identity Resolution
Name/SMILES/CAS -> PubChem CID -> canonical SMILES
|
Phase 1: Vendor Search
Query ZINC, Enamine, eMolecules, Mcule
|
Phase 2: Price & Availability Comparison
Catalog numbers, pricing, stock status, purity
|
Phase 3: Analog Search (if needed)
Similarity search for purchasable alternatives
|
Phase 4: Bioactivity Context (optional)
ChEMBL activity data for sourced compounds
|
Phase 5: Order Summary
Consolidated vendor comparison table
Objective: Establish unambiguous compound identity before vendor searches.
Tools:
- PubChem_get_CID_by_compound_name -- resolve name to CID
  - Input: name (compound name)
  - Returns: {IdentifierList: {CID: [...]}}
- PubChem_get_compound_properties_by_CID -- get SMILES, MW, formula
  - Input: cid (PubChem CID), properties (comma-separated list)
  - Returns: {CID, MolecularWeight, ConnectivitySMILES, IUPACName}
- ChEMBL_get_molecule -- get ChEMBL compound details
  - Input: molecule_chembl_id (ChEMBL ID) or search by name

Workflow:
Important: PubChem ConnectivitySMILES (not CanonicalSMILES) is the correct property name. Always confirm the SMILES matches the intended compound before proceeding.
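The return shapes listed above can be handled with a couple of small helpers. This is a minimal sketch, assuming the tool results arrive as plain Python dicts with exactly the keys shown in the tool list; the helper names `extract_cids` and `pick_smiles` are hypothetical and are not part of ToolUniverse.

```python
from typing import Any


def extract_cids(response: dict[str, Any]) -> list[int]:
    """Return CIDs from a {IdentifierList: {CID: [...]}} result, or [] if none."""
    identifier_list = response.get("IdentifierList") or {}
    return [int(cid) for cid in identifier_list.get("CID", [])]


def pick_smiles(properties: dict[str, Any]) -> str:
    """Read ConnectivitySMILES (not CanonicalSMILES) from a properties record."""
    smiles = properties.get("ConnectivitySMILES")
    if not smiles:
        raise KeyError("ConnectivitySMILES missing; check the requested properties")
    return smiles


# Hand-written records shaped like the documented returns (aspirin, CID 2244).
cid_response = {"IdentifierList": {"CID": [2244]}}
props = {"CID": 2244, "ConnectivitySMILES": "CC(=O)OC1=CC=CC=C1C(=O)O"}

cids = extract_cids(cid_response)
smiles = pick_smiles(props)
print(cids[0], smiles)
```

An empty list from `extract_cids` is a signal to retry with a different identifier (SMILES or CAS) rather than to proceed to the vendor search.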
Objective: Search all available vendor databases for the target compound.
Tools:
- ZINC_search_compounds -- search ZINC by name or SMILES
  - Input: query (name or SMILES), optional catalog, limit
- ZINC_get_compound -- get detailed compound info from ZINC
  - Input: zinc_id (ZINC identifier)
- Enamine_search_catalog -- search Enamine catalog
  - Input: query (name or SMILES), optional catalog_type, limit
- Enamine_get_compound -- get Enamine compound details
  - Input: compound_id (Enamine catalog number)
- eMolecules_search -- search across multiple vendors
  - Input: query (name or SMILES), optional limit
- eMolecules_get_compound -- get eMolecules compound details
  - Input: compound_id (eMolecules ID)
- Mcule_get_compound -- search Mcule database
  - Input: query (name or SMILES), optional limit
- Mcule_get_compound -- get Mcule compound details
  - Input: compound_id (Mcule ID)

Workflow:
Tip: SMILES-based searches are more precise than name searches. If name search returns too many results, switch to SMILES.
Objective: Create a comparison table across vendors.
Compile from Phase 1 results:
| Field | Description |
|-------|-------------|
| Vendor | Company name |
| Catalog # | Vendor-specific identifier |
| Quantity | Available pack sizes |
| Price | Per unit or per mg |
| Purity | Stated purity grade (>95%, >98%, etc.) |
| Stock | In-stock vs make-on-demand |
| Delivery | Estimated delivery time |
Rank vendors by: (1) in-stock availability, (2) price per mg, (3) purity grade, (4) delivery time.
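The ranking order above maps naturally onto a tuple sort key. The sketch below assumes each vendor offer has already been normalized into one record with a common currency and price per mg; the `Offer` class, its field names, and the sample offers are illustrative placeholders, not real vendor data.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    vendor: str
    in_stock: bool
    price_per_mg: float  # same currency for every offer
    purity_pct: float    # stated purity, e.g. 98.0
    delivery_days: int


def rank_offers(offers: list[Offer]) -> list[Offer]:
    """Sort by (1) in stock, (2) lower price per mg, (3) higher purity, (4) faster delivery."""
    return sorted(
        offers,
        key=lambda o: (not o.in_stock, o.price_per_mg, -o.purity_pct, o.delivery_days),
    )


# Made-up offers for illustration only.
offers = [
    Offer("Vendor A", in_stock=False, price_per_mg=0.5, purity_pct=98.0, delivery_days=21),
    Offer("Vendor B", in_stock=True, price_per_mg=2.0, purity_pct=95.0, delivery_days=3),
    Offer("Vendor C", in_stock=True, price_per_mg=1.0, purity_pct=95.0, delivery_days=5),
    Offer("Vendor D", in_stock=True, price_per_mg=1.0, purity_pct=98.0, delivery_days=7),
]

ranked = rank_offers(offers)
print([o.vendor for o in ranked])
# → ['Vendor D', 'Vendor C', 'Vendor B', 'Vendor A']
```

Because the criteria are applied lexicographically, the cheapest offer (Vendor A) still ranks last: it is make-on-demand, and stock status is the first criterion.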
Objective: When the exact compound is unavailable, find purchasable structural analogs.
Triggered when:
Approach:
Objective: Provide biological activity data for context when sourcing compounds for research.
Tools:
ChEMBL_get_molecule -- get bioactivity summary
Useful when:
Vendor selection decision matrix — don't just list vendors, recommend one:
| Scenario | Best Vendor Strategy | Why |
|----------|---------------------|-----|
| Need it this week | In-stock vendor with fastest shipping | Make-on-demand takes 2-4 weeks minimum |
| Budget-constrained | Cheapest per mg, accept lower purity (>95%) | Academic budgets are tight; >95% is fine for screening |
| High-throughput screen | ZINC/Enamine for large libraries; mg quantities | Price per compound matters more than purity |
| Assay validation | Highest purity (>98%) from reputable vendor | False positives from impurities waste months |
| Building blocks for synthesis | Enamine (largest building block catalog) | Purpose-built for medicinal chemistry |
| Exact compound unavailable | Analog search → check bioactivity (ChEMBL) → source best analog | Tanimoto > 0.85 likely retains activity; 0.7-0.85 may have different SAR |
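The Tanimoto thresholds in the last row can be applied as a simple triage step once fingerprints are available. This is a rough sketch, assuming each fingerprint is represented as a Python set of on-bit indices produced by a cheminformatics toolkit; the function names and the wording of the lowest tier are assumptions for illustration.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def analog_tier(similarity: float) -> str:
    """Map a similarity score onto the thresholds from the decision matrix."""
    if similarity > 0.85:
        return "likely retains activity"
    if similarity >= 0.7:
        return "may have different SAR"
    return "below the 0.7 threshold"  # not covered by the table; label is an assumption


# Toy fingerprints: the analog shares 9 of 10 distinct on-bits with the query.
query_fp = set(range(10))
analog_fp = set(range(9))

score = tanimoto(query_fp, analog_fp)
print(score, analog_tier(score))
# → 0.9 likely retains activity
```

The tiers are a screening heuristic only; an analog in the top tier is still worth checking against ChEMBL activity data before ordering.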
Red flags when sourcing:
Generate a final sourcing report:
| Pattern | Description | Key Phases |
|---------|-------------|------------|
| Quick Availability Check | Is this compound purchasable? | 0, 1 |
| Full Vendor Comparison | Compare all sources with pricing | 0, 1, 2, 5 |
| Analog Discovery | Compound unavailable; find alternatives | 0, 1, 3, 5 |
| Building Block Sourcing | Find reagents for synthesis | 0, 1, 2 |
| Hit-to-Lead Sourcing | Source screening hits with bioactivity context | 0, 1, 2, 4, 5 |
| Evidence Grade | Criteria | Action |
|----------------|----------|--------|
| A -- High confidence | In-stock at 2+ vendors, purity >=98%, CoA available | Order directly |
| B -- Moderate confidence | Single vendor or make-on-demand, purity >=95% | Request CoA, verify structure |
| C -- Low confidence | No stock, purity unstated, or price outlier (>5x median) | Custom synthesis or analog search |
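One possible encoding of the grading table is sketched below. It rests on two interpretive assumptions the table does not spell out: "no stock" is read as neither in stock nor make-on-demand, and a stated purity below 95% falls through to grade C. The function name and parameters are hypothetical.

```python
from statistics import median
from typing import Optional


def evidence_grade(
    in_stock_vendors: int,
    make_on_demand: bool,
    purity_pct: Optional[float],  # None when the vendor states no purity
    has_coa: bool,
    price: float,
    peer_prices: list[float],     # prices for the same compound across vendors
) -> str:
    """Return 'A', 'B', or 'C' following the evidence-grade table."""
    price_outlier = bool(peer_prices) and price > 5 * median(peer_prices)
    no_source = in_stock_vendors == 0 and not make_on_demand  # assumed reading of "no stock"
    if no_source or purity_pct is None or price_outlier:
        return "C"
    if in_stock_vendors >= 2 and purity_pct >= 98 and has_coa:
        return "A"
    if purity_pct >= 95:
        return "B"
    return "C"  # assumption: stated purity below 95% is treated as low confidence


# Illustrative inputs only: two in-stock vendors, 98.5% purity, CoA on file.
print(evidence_grade(2, False, 98.5, True, 10.0, [8.0, 10.0, 12.0]))
# → A
```

Checking the grade C conditions first means a price outlier or an unstated purity overrides otherwise good availability, which matches the cautious intent of the table.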
Interpreting vendor results:
Synthesis questions to address in the final report: