skills/tooluniverse-chemical-compound-retrieval/SKILL.md
Retrieve chemical compound data from PubChem and ChEMBL with disambiguation, cross-referencing, and stereochemistry handling. Use for resolving compound names to SMILES/InChI/CID/ChEMBL IDs, fetching molecular properties, distinguishing isomers/stereo forms, and cross-validating identity across databases. Always uses English compound names in tool calls and flags ambiguous queries (e.g., Vitamin D has multiple forms).
npx skillsauth add mims-harvard/tooluniverse tooluniverse-chemical-compound-retrieval
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Retrieve comprehensive chemical compound data with proper disambiguation and cross-database validation.
LOOK UP DON'T GUESS: Never assume a CID, ChEMBL ID, or molecular property value. Always retrieve from PubChem/ChEMBL.
English-first: Always use English compound names in tool calls. Respond in the user's language.
"Aspirin" = one compound. "Vitamin D" = multiple forms (D2/D3/active metabolite). For generic class names (steroids, vitamins, acids), present candidates and confirm before proceeding.
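
The triage described above can be sketched as a small pre-check that runs before any database call. This is a minimal illustration, not part of the skill itself: the function name `triage`, the two lookup tables, and the three status strings are hypothetical, and the tables are deliberately tiny.

```python
from __future__ import annotations

# Hypothetical, illustrative tables; a real skill would need far broader coverage.
KNOWN_MULTIFORM = {
    "vitamin d": [
        "ergocalciferol (D2)",
        "cholecalciferol (D3)",
        "calcitriol (active metabolite)",
    ],
}
GENERIC_CLASSES = {"steroid", "steroids", "vitamin", "vitamins", "acid", "acids"}


def triage(query: str) -> tuple[str, list[str]]:
    """Decide whether a compound query can go straight to lookup.

    Returns (status, candidates) where status is one of:
      "confirm" - name maps to several known forms; present the candidates
      "ask"     - generic class name; ask the user which compound they mean
      "proceed" - treat as a specific compound name
    """
    key = " ".join(query.lower().split())  # normalize case and whitespace
    if key in KNOWN_MULTIFORM:
        return "confirm", list(KNOWN_MULTIFORM[key])
    if key in GENERIC_CLASSES:
        return "ask", []
    return "proceed", []
```

A name that passes this check is not guaranteed to be unambiguous; it only means the cheap screen found nothing, so the CID and ChEMBL cross-check still applies.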
Phase 0: Clarify (only if highly ambiguous -- skip for unambiguous names or specific IDs)
Phase 1: Disambiguate → resolve PubChem CID + ChEMBL ID
Phase 2: Retrieve data (silent)
Phase 3: Report compound profile
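# Setup (assumes the standard ToolUniverse client; name and smiles are the user's inputs)
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# By name
result = tu.tools.PubChem_get_CID_by_compound_name(compound_name=name)
# By SMILES
result = tu.tools.PubChem_get_CID_by_SMILES(smiles=smiles)
# Cross-reference against ChEMBL
chembl_result = tu.tools.ChEMBL_search_compounds(query=name, limit=5)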
Verify: CID + ChEMBL ID + canonical SMILES + stereochemistry + salt forms.
PubChem: PubChem_get_compound_properties_by_CID, PubChemBioAssay_get_assay_summary, PubChemTox_get_acute_effects, PubChem_get_compound_2D_image_by_CID
ChEMBL: ChEMBL_get_compound_record_activities, ChEMBL_get_molecule_targets, ChEMBL_get_assay_activities
Optional: PubChem_get_associated_patents_by_CID, PubChem_search_compounds_by_similarity
Compound Profile with: Identity (CID, ChEMBL ID, IUPAC, SMILES), Chemical Properties (MW, LogP, HBD, HBA, PSA, Lipinski), Bioactivity (targets, IC50/Ki), Drug Info (if approved), Data Sources.
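
For the Lipinski entry in the profile, a minimal sketch of a rule-of-five check is shown here. It uses the standard thresholds (MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10); the function name and the returned labels are hypothetical, and the property values are expected to come from a retrieved PubChem record rather than being typed in by hand.

```python
from __future__ import annotations


def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> list[str]:
    """Return the list of rule-of-five criteria that a compound violates.

    mw   - molecular weight in g/mol
    logp - octanol/water partition coefficient (e.g. XLogP)
    hbd  - hydrogen bond donor count
    hba  - hydrogen bond acceptor count
    """
    violations = []
    if mw > 500:
        violations.append("MW > 500")
    if logp > 5:
        violations.append("LogP > 5")
    if hbd > 5:
        violations.append("HBD > 5")
    if hba > 10:
        violations.append("HBA > 10")
    return violations
```

Reporting the violated criteria by name, rather than a bare count, keeps the profile readable when a compound fails only one rule.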

| Primary | Fallback |
|---------|----------|
| PubChem name lookup | ChEMBL search → SMILES → PubChem_get_CID_by_SMILES |
| ChEMBL bioactivity | PubChem bioassay summary |
| Drug label | Note "unavailable" |

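
The first fallback row (name lookup, then ChEMBL search, then SMILES lookup) could be wired up roughly as follows. This is a sketch under assumptions: the three callables are hypothetical wrappers that you would write around `PubChem_get_CID_by_compound_name`, `ChEMBL_search_compounds`, and `PubChem_get_CID_by_SMILES`, because the response shapes of those tools are not described here. Each wrapper is assumed to return `None` when nothing is found.

```python
from __future__ import annotations

from typing import Callable, Optional


def resolve_cid(
    name: str,
    cid_by_name: Callable[[str], Optional[int]],
    smiles_from_chembl: Callable[[str], Optional[str]],
    cid_by_smiles: Callable[[str], Optional[int]],
) -> tuple[Optional[int], str]:
    """Resolve a PubChem CID, falling back through ChEMBL when the name lookup fails.

    Returns (cid, route) where route records which path produced the CID,
    so the final report can state its data source.
    """
    cid = cid_by_name(name)
    if cid is not None:
        return cid, "pubchem_name"

    # Fallback: take a SMILES from ChEMBL, then look that structure up in PubChem.
    smiles = smiles_from_chembl(name)
    if smiles:
        cid = cid_by_smiles(smiles)
        if cid is not None:
            return cid, "chembl_smiles"

    return None, "unresolved"
```

Passing the lookups in as callables keeps the fallback order testable without network access and leaves response parsing to the wrappers.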

| Grade | Criteria |
|-------|----------|
| Confirmed | CID + ChEMBL cross-match, InChI/SMILES agree |
| Probable | CID found, partial ChEMBL match |
| Uncertain | Single database only, or multiple CIDs |
| Unverified | No cross-reference, single-source |

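
One way to apply these grades consistently is a small function like the sketch below. The parameter names are hypothetical, and one interpretation is an assumption rather than something the table states: "Unverified" is taken to mean that no cross-reference was attempted, while "Uncertain" covers a cross-reference that was attempted but found a hit in only one database or returned several CIDs.

```python
def grade_identity(
    cid_count: int,
    chembl_match: str,
    structures_agree: bool,
    cross_checked: bool = True,
) -> str:
    """Assign an identity grade from the lookup outcome.

    cid_count        - number of PubChem CIDs returned for the query
    chembl_match     - "full", "partial", or "none"
    structures_agree - True when InChI/SMILES agree across databases
    cross_checked    - False when no cross-reference was attempted (assumption)
    """
    if chembl_match not in {"full", "partial", "none"}:
        raise ValueError(f"unknown chembl_match: {chembl_match!r}")
    if cid_count == 0 and chembl_match == "none":
        raise ValueError("no hit in either database; nothing to grade")

    if not cross_checked:
        return "Unverified"
    if cid_count != 1 or chembl_match == "none":
        return "Uncertain"  # single database only, or multiple CIDs
    if chembl_match == "full" and structures_agree:
        return "Confirmed"
    return "Probable"  # one CID, but ChEMBL match is partial or structures differ
```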
Bioactivity: prefer ChEMBL over PubChem BioAssay for curated data. Potency by IC50/Ki: < 100 nM = potent, 100 nM-1 µM = moderate, > 10 µM = weak. Lipinski violations suggest lower oral bioavailability but don't disqualify a compound.
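
The potency bands can be applied mechanically once IC50/Ki values are converted to nM. The sketch below is illustrative: the function name is hypothetical, and because no label is given for values between 1 µM and 10 µM, it returns "unclassified" for that range instead of guessing one.

```python
def classify_potency(value_nm: float) -> str:
    """Map an IC50 or Ki value in nM to a potency band."""
    if value_nm <= 0:
        raise ValueError("IC50/Ki must be a positive concentration")
    if value_nm < 100:
        return "potent"
    if value_nm <= 1_000:  # 100 nM to 1 uM inclusive
        return "moderate"
    if value_nm > 10_000:  # above 10 uM
        return "weak"
    return "unclassified"  # above 1 uM, up to 10 uM: no band defined
```

Converting units before classification matters in practice, since activity records can report the same measurement in nM, µM, or M.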
Always verify novel SMILES: python3 src/tooluniverse/tools/smiles_verifier.py --smiles "SMILES_STRING". Invalid SMILES produce wrong results or cryptic errors.
PubChem: PubChem_get_CID_by_compound_name, PubChem_get_CID_by_SMILES, PubChem_get_compound_properties_by_CID, PubChem_get_compound_2D_image_by_CID, PubChemBioAssay_get_assay_summary, PubChemTox_get_acute_effects, PubChem_get_associated_patents_by_CID, PubChem_search_compounds_by_similarity, PubChem_search_compounds_by_substructure
ChEMBL: ChEMBL_search_drugs, ChEMBL_get_molecule, ChEMBL_get_activity, ChEMBL_get_target, ChEMBL_search_targets, ChEMBL_search_assays