public/SKILLS/Scientific & Research Tools/chembl-database/SKILL.md
Query ChEMBL bioactive molecules and drug discovery data. Search compounds by structure/properties, retrieve bioactivity data (IC50, Ki), find inhibitors, perform SAR studies, for medicinal chemistry.
npx skillsauth add eric861129/skills_all-in-one chembl-databaseInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
ChEMBL is a manually curated database of bioactive molecules maintained by the European Bioinformatics Institute (EBI), containing over 2 million compounds, 19 million bioactivity measurements, 13,000+ drug targets, and data on approved drugs and clinical candidates. Access and query this data programmatically using the ChEMBL Python client for drug discovery and medicinal chemistry research.
This skill should be used when:
The ChEMBL Python client is required for programmatic access:
uv pip install chembl_webresource_client
from chembl_webresource_client.new_client import new_client
# Access different endpoints
molecule = new_client.molecule
target = new_client.target
activity = new_client.activity
drug = new_client.drug
Retrieve by ChEMBL ID:
molecule = new_client.molecule
aspirin = molecule.get('CHEMBL25')
Search by name:
results = molecule.filter(pref_name__icontains='aspirin')
Filter by properties:
# Find small molecules (MW <= 500) with favorable LogP
results = molecule.filter(
molecule_properties__mw_freebase__lte=500,
molecule_properties__alogp__lte=5
)
Retrieve target information:
target = new_client.target
egfr = target.get('CHEMBL203')
Search for specific target types:
# Find all kinase targets
kinases = target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
Query activities for a target:
activity = new_client.activity
# Find potent EGFR inhibitors
results = activity.filter(
target_chembl_id='CHEMBL203',
standard_type='IC50',
standard_value__lte=100,
standard_units='nM'
)
Get all activities for a compound:
compound_activities = activity.filter(
molecule_chembl_id='CHEMBL25',
pchembl_value__isnull=False
)
Similarity search:
similarity = new_client.similarity
# Find compounds similar to aspirin
similar = similarity.filter(
smiles='CC(=O)Oc1ccccc1C(=O)O',
similarity=85 # 85% similarity threshold
)
Substructure search:
substructure = new_client.substructure
# Find compounds containing benzene ring
results = substructure.filter(smiles='c1ccccc1')
Retrieve drug data:
drug = new_client.drug
drug_info = drug.get('CHEMBL25')
Get mechanisms of action:
mechanism = new_client.mechanism
mechanisms = mechanism.filter(molecule_chembl_id='CHEMBL25')
Query drug indications:
drug_indication = new_client.drug_indication
indications = drug_indication.filter(molecule_chembl_id='CHEMBL25')
Identify the target by searching by name:
targets = new_client.target.filter(pref_name__icontains='EGFR')
target_id = targets[0]['target_chembl_id']
Query bioactivity data for that target:
activities = new_client.activity.filter(
target_chembl_id=target_id,
standard_type='IC50',
standard_value__lte=100
)
Extract compound IDs and retrieve details:
compound_ids = [act['molecule_chembl_id'] for act in activities]
compounds = [new_client.molecule.get(cid) for cid in compound_ids]
Get drug information:
drug_info = new_client.drug.get('CHEMBL1234')
Retrieve mechanisms:
mechanisms = new_client.mechanism.filter(molecule_chembl_id='CHEMBL1234')
Find all bioactivities:
activities = new_client.activity.filter(molecule_chembl_id='CHEMBL1234')
Find similar compounds:
similar = new_client.similarity.filter(smiles='query_smiles', similarity=80)
Get activities for each compound:
for compound in similar:
activities = new_client.activity.filter(
molecule_chembl_id=compound['molecule_chembl_id']
)
Analyze property-activity relationships using molecular properties from results.
ChEMBL supports Django-style query filters:
__exact - Exact match__iexact - Case-insensitive exact match__contains / __icontains - Substring matching__startswith / __endswith - Prefix/suffix matching__gt, __gte, __lt, __lte - Numeric comparisons__range - Value in range__in - Value in list__isnull - Null/not null checkConvert results to pandas DataFrame for analysis:
import pandas as pd
activities = new_client.activity.filter(target_chembl_id='CHEMBL203')
df = pd.DataFrame(list(activities))
# Analyze results
print(df['standard_value'].describe())
print(df.groupby('standard_type').size())
The client automatically caches results for 24 hours. Configure caching:
from chembl_webresource_client.settings import Settings
# Disable caching
Settings.Instance().CACHING = False
# Adjust cache expiration (seconds)
Settings.Instance().CACHE_EXPIRE = 86400
Queries execute only when data is accessed. Convert to list to force execution:
# Query is not executed yet
results = molecule.filter(pref_name__icontains='aspirin')
# Force execution
results_list = list(results)
Results are paginated automatically. Iterate through all results:
for activity in new_client.activity.filter(target_chembl_id='CHEMBL203'):
# Process each activity
print(activity['molecule_chembl_id'])
# Identify kinase targets
kinases = new_client.target.filter(
target_type='SINGLE PROTEIN',
pref_name__icontains='kinase'
)
# Get potent inhibitors
for kinase in kinases[:5]: # First 5 kinases
activities = new_client.activity.filter(
target_chembl_id=kinase['target_chembl_id'],
standard_type='IC50',
standard_value__lte=50
)
# Get approved drugs
drugs = new_client.drug.filter()
# For each drug, find all targets
for drug in drugs[:10]:
mechanisms = new_client.mechanism.filter(
molecule_chembl_id=drug['molecule_chembl_id']
)
# Find compounds with desired properties
candidates = new_client.molecule.filter(
molecule_properties__mw_freebase__range=[300, 500],
molecule_properties__alogp__lte=5,
molecule_properties__hba__lte=10,
molecule_properties__hbd__lte=5
)
Ready-to-use Python functions demonstrating common ChEMBL query patterns:
get_molecule_info() - Retrieve molecule details by IDsearch_molecules_by_name() - Name-based molecule searchfind_molecules_by_properties() - Property-based filteringget_bioactivity_data() - Query bioactivities for targetsfind_similar_compounds() - Similarity searchingsubstructure_search() - Substructure matchingget_drug_info() - Retrieve drug informationfind_kinase_inhibitors() - Specialized kinase inhibitor searchexport_to_dataframe() - Convert results to pandas DataFrameConsult this script for implementation details and usage examples.
Comprehensive API documentation including:
Refer to this document when detailed API information is needed or when troubleshooting queries.
data_validity_comment field in activity recordspotential_duplicate flagspchembl_value provides normalized activity (-log scale)standard_type to understand measurement type (IC50, Ki, EC50, etc.)development
Run structured What-If scenario analysis with multi-branch possibility exploration. Use this skill when the user asks speculative questions like "what if...", "what would happen if...", "what are the possibilities", "explore scenarios", "scenario analysis", "possibility space", "what could go wrong", "best case / worst case", "risk analysis", "contingency planning", "strategic options", or any question about uncertain futures. Also trigger when the user faces a fork-in-the-road decision, wants to stress-test an idea, or needs to think through consequences before committing.
development
Access comprehensive LaTeX templates, formatting requirements, and submission guidelines for major scientific publication venues (Nature, Science, PLOS, IEEE, ACM), academic conferences (NeurIPS, ICML, CVPR, CHI), research posters, and grant proposals (NSF, NIH, DOE, DARPA). This skill should be used when preparing manuscripts for journal submission, conference papers, research posters, or grant proposals and need venue-specific formatting requirements and templates.
development
Use when challenging ideas, plans, decisions, or proposals using structured critical reasoning. Invoke to play devil's advocate, run a pre-mortem, red team, or audit evidence and assumptions.
tools
Core skill for the deep research and writing tool. Write scientific manuscripts in full paragraphs (never bullet points). Use two-stage process with (1) section outlines with key points using research-lookup then (2) convert to flowing prose. IMRAD structure, citations (APA/AMA/Vancouver), figures/tables, reporting guidelines (CONSORT/STROBE/PRISMA), for research papers and journal submissions.