ZINC is a freely accessible repository of 230M+ purchasable compounds maintained by UCSF. Use it to search by ZINC ID or SMILES, run similarity searches, download 3D-ready structures for docking, and discover analogs for virtual screening and drug discovery.
This skill should be used when:
ZINC has evolved through multiple versions:
This skill primarily focuses on ZINC22, the most current and comprehensive version.
Primary access point: https://zinc.docking.org/
Interactive searching: https://cartblanche22.docking.org/
All ZINC22 searches can be performed programmatically via the CartBlanche22 API:
Base URL: https://cartblanche22.docking.org/
All API endpoints return data in text or JSON format with customizable fields.
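Every curl example in this section follows the same shape: the base URL, an endpoint file such as `substances.txt` or `smiles.txt`, a colon, then `&`-separated `key=value` pairs. The sketch below assembles URLs in that shape, assuming the pattern holds for every endpoint; `build_cartblanche_url` is a hypothetical helper, not part of any ZINC client library.

```python
from urllib.parse import quote

# Base URL from the text above; the "<endpoint>:key=value&..." layout is assumed from the curl examples
BASE_URL = "https://cartblanche22.docking.org/"


def build_cartblanche_url(endpoint: str, params: dict) -> str:
    """Hypothetical helper: join an endpoint and parameters in the style of the curl examples."""
    # Percent-encode each value (SMILES can contain '#', '+', '/'), keeping commas
    # so comma-separated lists such as output_fields stay readable
    pairs = [f"{key}={quote(str(value), safe=',')}" for key, value in params.items()]
    return f"{BASE_URL}{endpoint}:" + "&".join(pairs)


print(build_cartblanche_url(
    "substances.txt",
    {"zinc_id": "ZINC000000000001", "output_fields": "smiles,zinc_id"},
))
# prints https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001&output_fields=smiles,zinc_id
```

Keeping URL construction in one place means encoding rules only have to be right once; if the real API turns out to expect a different layout, only this helper changes.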
Retrieve specific compounds using their ZINC identifiers.
Web interface: https://cartblanche22.docking.org/search/zincid
API endpoint:
curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001&output_fields=smiles,zinc_id"
Multiple IDs:
curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001,ZINC000000000002&output_fields=smiles,zinc_id,tranche"
Response fields: zinc_id, smiles, sub_id, supplier_code, catalogs, tranche (includes H-count, LogP, MW, phase)
Find compounds by chemical structure using SMILES notation, with optional distance parameters for analog searching.
Web interface: https://cartblanche22.docking.org/search/smiles
API endpoint:
curl "https://cartblanche22.docking.org/smiles.txt:smiles=YOUR_SMILES&dist=4&adist=4"
Parameters:
smiles: Query SMILES string (URL-encoded if necessary)
dist: Maximum structural distance from the query (integer; default: 0 for exact match; larger values return more distant analogs)
adist: Alternative distance parameter for broader searches (default: 0)
output_fields: Comma-separated list of desired output fields
Example - Exact match:
curl "https://cartblanche22.docking.org/smiles.txt:smiles=c1ccccc1"
Example - Similarity search:
curl "https://cartblanche22.docking.org/smiles.txt:smiles=c1ccccc1&dist=3&output_fields=zinc_id,smiles,tranche"
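SMILES strings often contain characters that mean something else inside a URL: `#` (triple bond) starts a URL fragment, `+` may be decoded as a space by some servers, and `/` and `=` act as separators. A minimal sketch, assuming the `smiles.txt` URL pattern shown above, that percent-encodes the query with the standard library's `urllib.parse.quote`; `smiles_search_url` is a hypothetical name.

```python
from urllib.parse import quote


def smiles_search_url(smiles, dist=0, adist=0, output_fields="zinc_id,smiles"):
    """Hypothetical helper: build a smiles.txt search URL with the SMILES percent-encoded."""
    # safe="" encodes every reserved character, including '/', '#', '+', '(' and '='
    encoded = quote(smiles, safe="")
    return (
        "https://cartblanche22.docking.org/smiles.txt:"
        f"smiles={encoded}&dist={dist}&adist={adist}&output_fields={output_fields}"
    )


# Hydrogen cyanide: an unencoded '#' would cut the URL off at the fragment marker
print(smiles_search_url("C#N", dist=3))
# prints https://cartblanche22.docking.org/smiles.txt:smiles=C%23N&dist=3&adist=0&output_fields=zinc_id,smiles
```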
Query compounds from specific chemical suppliers or retrieve all molecules from particular catalogs.
Web interface: https://cartblanche22.docking.org/search/catitems
API endpoint:
curl "https://cartblanche22.docking.org/catitems.txt:catitem_id=SUPPLIER-CODE-123"
Use cases:
Generate random compound sets for screening or benchmarking purposes.
Web interface: https://cartblanche22.docking.org/search/random
API endpoint:
curl "https://cartblanche22.docking.org/substance/random.txt:count=100"
Parameters:
count: Number of random compounds to retrieve (default: 100)
subset: Filter by subset (e.g., 'lead-like', 'drug-like', 'fragment')
output_fields: Customize returned data fields
Example - Random lead-like molecules:
curl "https://cartblanche22.docking.org/substance/random.txt:count=1000&subset=lead-like&output_fields=zinc_id,smiles,tranche"
Define search criteria based on target properties or desired chemical space
Query ZINC22 using appropriate search method:
# Example: Sample 10,000 random drug-like compounds; filter by LogP and MW afterwards using the tranche field
curl "https://cartblanche22.docking.org/substance/random.txt:count=10000&subset=drug-like&output_fields=zinc_id,smiles,tranche" > docking_library.txt
Parse results to extract ZINC IDs and SMILES:
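import pandas as pd
# Load results (assumes tab-separated text with a header row)
df = pd.read_csv('docking_library.txt', sep='\t')
# Filter by properties in tranche data
# Tranche format: H##P###M###-phase
# H = H-bond donors, P = LogP*10, M = MW
props = df['tranche'].str.extract(r'H(\d+)P(\d+)M(\d+)-(\d+)').astype(float)
df['logP'] = props[1] / 10.0
df['mw'] = props[2]
# Example thresholds: LogP <= 5 and MW <= 500
filtered = df[(df['logP'] <= 5) & (df['mw'] <= 500)]
print(f"{len(filtered)} of {len(df)} compounds pass the filter")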
Download 3D structures for docking using ZINC ID or download from file repositories
Obtain SMILES of the hit compound:
hit_smiles = "CC(C)Cc1ccc(cc1)C(C)C(=O)O" # Example: Ibuprofen
Perform similarity search with distance threshold:
curl "https://cartblanche22.docking.org/smiles.txt:smiles=CC(C)Cc1ccc(cc1)C(C)C(=O)O&dist=5&output_fields=zinc_id,smiles,catalogs" > analogs.txt
Analyze results to identify purchasable analogs:
import pandas as pd
analogs = pd.read_csv('analogs.txt', sep='\t')
print(f"Found {len(analogs)} analogs")
print(analogs[['zinc_id', 'smiles', 'catalogs']].head(10))
Retrieve 3D structures for the most promising analogs
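To shortlist analogs that are easy to buy, one option is to rank them by how many catalogs list them. A stdlib-only sketch, assuming `analogs.txt` is tab-separated with a header row (as the pandas call above assumes) and that the `catalogs` column is a comma-separated list; that delimiter is a guess, so check it against real output. `rank_by_supplier_count` is a hypothetical helper.

```python
import csv


def rank_by_supplier_count(lines, catalog_sep=","):
    """Hypothetical helper: sort analog rows by the number of catalogs listing each one."""
    # Assumes tab-separated rows with a header; catalog_sep is a guess at the list format
    rows = list(csv.DictReader(lines, delimiter="\t"))
    for row in rows:
        catalogs = row.get("catalogs") or ""
        row["n_catalogs"] = sum(1 for name in catalogs.split(catalog_sep) if name.strip())
    # Most widely stocked analogs first
    return sorted(rows, key=lambda row: row["n_catalogs"], reverse=True)


# Usage (reads the file written by the curl command above):
# with open("analogs.txt", newline="") as handle:
#     for row in rank_by_supplier_count(handle)[:10]:
#         print(row["zinc_id"], row["n_catalogs"])
```

Accepting any iterable of lines (an open file or a list of strings) keeps the function easy to test without touching the network.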
Compile list of ZINC IDs from literature, databases, or previous screens:
zinc_ids = [
"ZINC000000000001",
"ZINC000000000002",
"ZINC000000000003"
]
zinc_ids_str = ",".join(zinc_ids)
Query ZINC22 API:
curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001,ZINC000000000002&output_fields=zinc_id,smiles,supplier_code,catalogs"
Process results for downstream analysis or purchasing
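The curl call above hard-codes two IDs, while the Python list can grow much longer. A sketch that turns `zinc_ids` into comma-joined `substances.txt` URLs, split into batches; the batch size of 100 is an arbitrary assumption, not a documented server limit, and `batched_substance_urls` is a hypothetical helper.

```python
def batched_substance_urls(zinc_ids, batch_size=100,
                           output_fields="zinc_id,smiles,supplier_code,catalogs"):
    """Hypothetical helper: one substances.txt URL per batch of comma-joined IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    base = "https://cartblanche22.docking.org/substances.txt"
    urls = []
    for start in range(0, len(zinc_ids), batch_size):
        # batch_size=100 is an arbitrary choice, not a documented server limit
        chunk = ",".join(zinc_ids[start:start + batch_size])
        urls.append(f"{base}:zinc_id={chunk}&output_fields={output_fields}")
    return urls


zinc_ids = ["ZINC000000000001", "ZINC000000000002", "ZINC000000000003"]
for url in batched_substance_urls(zinc_ids, batch_size=2):
    print(url)
```

Each URL can then be passed to curl (or a Python HTTP client) and the responses concatenated.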
Select subset parameters based on screening goals:
Generate random sample:
curl "https://cartblanche22.docking.org/substance/random.txt:count=5000&subset=lead-like&output_fields=zinc_id,smiles,tranche" > chemical_space_sample.txt
Analyze chemical diversity and prepare for virtual screening
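One cheap, property-level view of diversity is how the sample spreads across tranches. The sketch below counts compounds per tranche and computes Shannon evenness (1.0 means spread evenly over the occupied tranches, 0.0 means everything sits in one). It is only a crude proxy: it says nothing about structural diversity, which would need fingerprint-based comparisons. It assumes `chemical_space_sample.txt` is tab-separated with a header row that includes `tranche`; both function names are hypothetical.

```python
import csv
import math
from collections import Counter


def tranche_occupancy(lines):
    """Count compounds per tranche code (assumes tab-separated rows with a 'tranche' column)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["tranche"] for row in reader if row.get("tranche"))


def shannon_evenness(counts):
    """Shannon entropy of the tranche counts divided by its maximum, ln(number of tranches)."""
    if len(counts) < 2:
        return 0.0  # zero or one occupied tranche: no spread to measure
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))


# Usage with the sample saved by the curl command above:
# with open("chemical_space_sample.txt", newline="") as handle:
#     occupancy = tranche_occupancy(handle)
# print(len(occupancy), "tranches occupied; evenness", round(shannon_evenness(occupancy), 3))
```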
Customize API responses with the output_fields parameter:
Available fields:
zinc_id: ZINC identifier
smiles: SMILES string representation
sub_id: Internal substance ID
supplier_code: Vendor catalog number
catalogs: List of suppliers offering the compound
tranche: Encoded molecular properties (H-count, LogP, MW, reactivity phase)
Example:
curl "https://cartblanche22.docking.org/substances.txt:zinc_id=ZINC000000000001&output_fields=zinc_id,smiles,catalogs,tranche"
ZINC organizes compounds into "tranches" based on molecular properties:
Format: H##P###M###-phase
Example tranche: H05P035M400-0
Use tranche data to filter compounds by drug-likeness criteria.
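The three properties carried by the tranche code line up with three of the limits in Lipinski's rule of five (H-bond donors ≤ 5, LogP ≤ 5, MW ≤ 500). A sketch, assuming the H##P###M###-phase decoding described in this document (H = donors, P = LogP×10, M = MW); acceptor count is not in the tranche code, so this is only a partial rule-of-five check, and the thresholds are adjustable defaults. `passes_ro5_subset` is a hypothetical name.

```python
import re

# Assumed layout: H##P###M###-phase, with H = donors, P = LogP*10, M = MW
TRANCHE_RE = re.compile(r"H(\d+)P(\d+)M(\d+)-(\d+)")


def passes_ro5_subset(tranche, max_donors=5, max_logp=5.0, max_mw=500):
    """Hypothetical helper: check the rule-of-five limits that the tranche code encodes."""
    match = TRANCHE_RE.fullmatch(tranche)
    if match is None:
        return False  # unrecognised code; the pattern only matches non-negative digit groups
    donors, logp_x10, mw = (int(group) for group in match.groups()[:3])
    return donors <= max_donors and logp_x10 / 10 <= max_logp and mw <= max_mw


# The example tranche decodes to 5 donors, LogP 3.5, MW 400
print(passes_ro5_subset("H05P035M400-0"))  # True under the assumed decoding
```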
For molecular docking, 3D structures are available via file repositories:
File repository: https://files.docking.org/zinc22/
Structures are organized by tranches and available in multiple formats:
Refer to ZINC documentation at https://wiki.docking.org for downloading protocols and batch access methods.
import subprocess
from urllib.parse import quote
BASE_URL = "https://cartblanche22.docking.org"
def _fetch(url):
    """Run curl silently (-s) and return the response body; raises if curl exits with an error."""
    result = subprocess.run(['curl', '-s', url], capture_output=True, text=True, check=True)
    return result.stdout
def query_zinc_by_id(zinc_id, output_fields="zinc_id,smiles,catalogs"):
    """Query ZINC22 by ZINC ID."""
    url = f"{BASE_URL}/substances.txt:zinc_id={zinc_id}&output_fields={output_fields}"
    return _fetch(url)
def search_by_smiles(smiles, dist=0, adist=0, output_fields="zinc_id,smiles"):
    """Search ZINC22 by SMILES with optional distance parameters."""
    # Percent-encode the SMILES so characters like '#', '+' and '/' survive in the URL
    url = f"{BASE_URL}/smiles.txt:smiles={quote(smiles, safe='')}&dist={dist}&adist={adist}&output_fields={output_fields}"
    return _fetch(url)
def get_random_compounds(count=100, subset=None, output_fields="zinc_id,smiles,tranche"):
    """Get random compounds from ZINC22, optionally restricted to a subset."""
    url = f"{BASE_URL}/substance/random.txt:count={count}&output_fields={output_fields}"
    if subset:
        url += f"&subset={subset}"
    return _fetch(url)
import re
import pandas as pd
from io import StringIO
# Query ZINC (requesting the tranche field, which the default output_fields omit) and parse as DataFrame
result = query_zinc_by_id("ZINC000000000001", output_fields="zinc_id,smiles,tranche")
df = pd.read_csv(StringIO(result), sep='\t')
# Extract tranche properties
def parse_tranche(tranche_str):
    """Parse ZINC tranche code to extract properties."""
    # Format: H##P###M###-phase
    if not isinstance(tranche_str, str):
        return None  # missing value (NaN) in the tranche column
    match = re.match(r'H(\d+)P(\d+)M(\d+)-(\d+)', tranche_str)
    if match:
        return {
            'h_donors': int(match.group(1)),
            'logP': int(match.group(2)) / 10.0,
            'mw': int(match.group(3)),
            'phase': int(match.group(4))
        }
    return None
df['tranche_props'] = df['tranche'].apply(parse_tranche)
Comprehensive documentation including:
Consult this document for detailed technical information and advanced usage patterns.
ZINC explicitly states: "We do not guarantee the quality of any molecule for any purpose and take no responsibility for errors arising from the use of this database."
When using ZINC in publications, cite the appropriate version:
ZINC22: Tingle, B. I., et al. "ZINC-22 – A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery." Journal of Chemical Information and Modeling 2023, 63, 1166–1176.
ZINC15: Sterling, T., Irwin, J. J. "ZINC 15 – Ligand Discovery for Everyone." Journal of Chemical Information and Modeling 2015, 55, 2324–2337.