skills/structural-biology-drug-discovery/fda-database/SKILL.md
Query openFDA REST API for adverse events (FAERS), labeling, product info, recalls, enforcement. Search by drug name, ingredient, MedDRA, or NDC. 1k req/day no key; 120k with free key. For trials use clinicaltrials-database-search; for structures use drugbank-database-access or chembl-database-bioactivity.
openFDA provides public access to FDA regulatory data through a simple REST API. Key datasets include the FDA Adverse Event Reporting System (FAERS) with 20M+ adverse event reports, drug product labeling (NDC, SPL), drug approvals (Drugs@FDA), medical device reports, and recall enforcement actions. The API supports full-text search and structured queries using Elasticsearch-style syntax.
For clinical trials use clinicaltrials-database-search; for drug structures/targets use drugbank-database-access.
Requires requests and pandas: pip install requests pandas
import requests
BASE = "https://api.fda.gov/drug"
# Optional: add api_key parameter for higher rate limits
# Find adverse events for aspirin
r = requests.get(
    f"{BASE}/event.json",
    params={
        "search": 'patient.drug.medicinalproduct:"aspirin"',
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": 10,
    },
)
r.raise_for_status()
data = r.json()
print("Top adverse reactions for aspirin:")
for item in data["results"][:5]:
    print(f" {item['term']:40s} count={item['count']}")
Search the FDA Adverse Event Reporting System for drug-event associations.
import requests, pandas as pd
BASE = "https://api.fda.gov/drug"
def faers_search(drug_name, limit=100):
    """Search FAERS for adverse event reports mentioning a drug."""
    r = requests.get(f"{BASE}/event.json",
                     params={"search": f'patient.drug.medicinalproduct:"{drug_name}"',
                             "limit": limit})
    r.raise_for_status()
    return r.json()
data = faers_search("warfarin", limit=5)
total = data["meta"]["results"]["total"]
print(f"Total FAERS reports for warfarin: {total:,}")
# Show first report summary
report = data["results"][0]
print(f"\nReport {report['safetyreportid']}:")
print(f" Date : {report.get('receivedate', 'n/a')}")
print(f" Serious : {report.get('serious', 'n/a')}")
drugs = [d.get("medicinalproduct", "n/a") for d in report.get("patient", {}).get("drug", [])]
print(f" Drugs : {drugs[:5]}")
reactions = [r.get("reactionmeddrapt", "n/a") for r in report.get("patient", {}).get("reaction", [])]
print(f" Reactions: {reactions[:5]}")
Use the count parameter to aggregate adverse event terms.
import requests, pandas as pd
BASE = "https://api.fda.gov/drug"
def top_adverse_events(drug_name, limit=20):
    """Get the most frequently reported adverse events for a drug."""
    r = requests.get(f"{BASE}/event.json",
                     params={
                         "search": f'patient.drug.medicinalproduct:"{drug_name}"',
                         "count": "patient.reaction.reactionmeddrapt.exact",
                         "limit": limit,
                     })
    r.raise_for_status()
    results = r.json()["results"]
    # Each count bucket has a "term" and a "count"; rename for readability
    return pd.DataFrame(results).rename(columns={"term": "reaction", "count": "reports"})
df_atorvastatin = top_adverse_events("atorvastatin", limit=15)
print("Top adverse events for atorvastatin:")
print(df_atorvastatin.head(10).to_string(index=False))
df_atorvastatin.to_csv("atorvastatin_adverse_events.csv", index=False)
# Compare two drugs: adverse event profile overlap
df_drug1 = top_adverse_events("simvastatin", limit=20)
df_drug2 = top_adverse_events("atorvastatin", limit=20)
common = set(df_drug1["reaction"]) & set(df_drug2["reaction"])
print(f"\nCommon adverse events (simvastatin ∩ atorvastatin): {len(common)}")
print("Shared reactions:", list(common)[:10])
Retrieve official drug labels (indications, warnings, dosing, contraindications).
import requests
BASE = "https://api.fda.gov/drug"
def get_label(drug_name):
    """Retrieve FDA drug label by brand or generic name."""
    r = requests.get(f"{BASE}/label.json",
                     params={"search": f'openfda.brand_name:"{drug_name}"',
                             "limit": 1})
    # openFDA answers 404 when nothing matches, so fall back to the generic name
    if r.status_code == 404:
        r = requests.get(f"{BASE}/label.json",
                         params={"search": f'openfda.generic_name:"{drug_name}"',
                                 "limit": 1})
    r.raise_for_status()
    results = r.json()["results"]
    return results[0] if results else None

label = get_label("Lipitor")
if label:
    print(f"Brand name : {label.get('openfda', {}).get('brand_name', ['n/a'])[0]}")
    print(f"Generic name: {label.get('openfda', {}).get('generic_name', ['n/a'])[0]}")
    print(f"Manufacturer: {label.get('openfda', {}).get('manufacturer_name', ['n/a'])[0]}")
    indications = label.get("indications_and_usage", ["n/a"])[0]
    print(f"\nIndications (first 300 chars):\n{indications[:300]}...")
Retrieve marketed product information by National Drug Code.
import requests, pandas as pd
BASE = "https://api.fda.gov/drug"
def ndc_search(ndc_or_name, limit=10):
    """Search NDC directory for drug product information."""
    # As written this matches on generic_name only; to look up a specific
    # code, search the product_ndc field instead.
    r = requests.get(f"{BASE}/ndc.json",
                     params={"search": f'generic_name:"{ndc_or_name}"',
                             "limit": limit})
    r.raise_for_status()
    return r.json()
data = ndc_search("metformin", limit=10)
total = data["meta"]["results"]["total"]
print(f"Metformin products: {total}")
rows = []
for prod in data["results"]:
    rows.append({
        "product_ndc": prod.get("product_ndc"),
        "brand_name": prod.get("brand_name"),
        "generic_name": prod.get("generic_name"),
        "dosage_form": prod.get("dosage_form"),
        "route": ", ".join(prod.get("route", [])),
        "labeler": prod.get("labeler_name"),
    })
df = pd.DataFrame(rows)
print(df.to_string(index=False))
Search FDA enforcement actions and drug recalls by drug name or company.
import requests, pandas as pd
BASE = "https://api.fda.gov/drug"
def drug_recalls(drug_name, limit=20):
    """Find FDA drug recalls for a given drug name."""
    r = requests.get(f"{BASE}/enforcement.json",
                     params={
                         "search": f'product_description:"{drug_name}"',
                         "limit": limit,
                     })
    # A 404 means no recall matched, so return an empty frame
    if r.status_code == 404:
        return pd.DataFrame()
    r.raise_for_status()
    results = r.json()["results"]
    return pd.DataFrame([{
        "recalling_firm": rec.get("recalling_firm"),
        "product": rec.get("product_description", "")[:80],
        "reason": rec.get("reason_for_recall", "")[:100],
        "classification": rec.get("classification"),
        "recall_date": rec.get("recall_initiation_date"),
        "status": rec.get("status"),
    } for rec in results])

recalls = drug_recalls("metformin", limit=5)
print(f"Metformin recalls: {len(recalls)}")
if not recalls.empty:
    print(recalls[["recalling_firm", "classification", "recall_date", "status"]].to_string(index=False))
Find all drug products containing a specific active ingredient.
import requests, pandas as pd
BASE = "https://api.fda.gov/drug"
def products_by_ingredient(ingredient, limit=50):
    """Find all FDA-listed products with a given active ingredient."""
    r = requests.get(f"{BASE}/ndc.json",
                     params={
                         "search": f'active_ingredients.name:"{ingredient}"',
                         "limit": limit,
                     })
    r.raise_for_status()
    data = r.json()
    print(f"Total products with {ingredient}: {data['meta']['results']['total']}")
    rows = []
    for prod in data["results"]:
        # A product can list several ingredients; keep only the matching ones
        for ai in prod.get("active_ingredients", []):
            if ingredient.lower() in ai.get("name", "").lower():
                rows.append({
                    "brand": prod.get("brand_name"),
                    "generic": prod.get("generic_name"),
                    "strength": ai.get("strength"),
                    "dosage_form": prod.get("dosage_form"),
                    "route": ", ".join(prod.get("route", [])),
                })
    return pd.DataFrame(rows)
df = products_by_ingredient("metformin hydrochloride")
print(df.drop_duplicates(subset=["generic", "strength", "dosage_form"]).head(10).to_string(index=False))
| Endpoint | Dataset | Key Use |
|----------|---------|---------|
| /drug/event.json | FAERS (adverse events) | Pharmacovigilance, safety signals |
| /drug/label.json | Structured Product Labeling | Indications, warnings, dosing |
| /drug/ndc.json | NDC Directory | Marketed products, strengths |
| /drug/enforcement.json | Recalls & Enforcement | Drug recalls, market withdrawals |
| /device/event.json | MAUDE (device events) | Medical device adverse events |
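openFDA uses Elasticsearch-style queries. Use field:"exact phrase" to match a phrase and field:term to match a single tokenized term (term matching, not fuzzy matching). Combine clauses with AND, as in field1:"A" AND field2:"B"; in a hand-written URL the spaces are encoded as +, giving field1:"A"+AND+field2:"B". Use the count parameter to aggregate by field value (the equivalent of GROUP BY). Use limit (1–1000) to set the page size and skip to set the offset when paginating.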
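The query rules above can be sketched as a few string helpers. The names `phrase`, `any_of`, and `all_of` are hypothetical and not part of openFDA or requests. The sketch assumes that the OR keyword and parentheses are accepted the same way AND is, and that the resulting string is passed through the `params` argument of `requests.get`, which takes care of URL encoding. Values containing double quotes are not handled.

```python
def phrase(field, value):
    """Build an exact-phrase clause such as field:"value"."""
    return f'{field}:"{value}"'


def any_of(*clauses):
    """Join clauses with OR and wrap them in parentheses so they group as one unit."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses):
    """Join clauses with AND."""
    return " AND ".join(clauses)


# Search several spellings of one drug, restricted to serious reports
variants = any_of(
    phrase("patient.drug.medicinalproduct", "Lipitor"),
    phrase("patient.drug.medicinalproduct", "atorvastatin"),
)
query = all_of(variants, "serious:1")
print(query)
# (patient.drug.medicinalproduct:"Lipitor" OR patient.drug.medicinalproduct:"atorvastatin") AND serious:1
```

The resulting string would go into the `search` parameter, for example `params={"search": query, "limit": 10}`.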
Goal: Compare adverse event frequency for multiple drugs in the same therapeutic class to identify differentiated safety profiles.
import requests, pandas as pd, time
BASE = "https://api.fda.gov/drug"
drugs = ["atorvastatin", "simvastatin", "rosuvastatin"]
def count_reactions(drug, limit=20):
    r = requests.get(f"{BASE}/event.json",
                     params={"search": f'patient.drug.medicinalproduct:"{drug}"',
                             "count": "patient.reaction.reactionmeddrapt.exact",
                             "limit": limit})
    # Return an empty series on any error so the comparison still builds
    if r.status_code != 200:
        return pd.Series(dtype=float, name=drug)
    df = pd.DataFrame(r.json()["results"])
    df.columns = ["reaction", drug]
    return df.set_index("reaction")[drug]

series_list = []
for drug in drugs:
    s = count_reactions(drug, limit=20)
    series_list.append(s)
    time.sleep(0.5)  # stay well under the rate limit
comparison = pd.concat(series_list, axis=1).fillna(0)
comparison = comparison.sort_values(drugs[0], ascending=False)
print("Adverse event count comparison (statins):")
print(comparison.head(10).to_string())
comparison.to_csv("statin_safety_comparison.csv")
Goal: Extract indications, contraindications, and warnings for multiple drugs and save to CSV.
import requests, pandas as pd, time, re
BASE = "https://api.fda.gov/drug"
def get_label_sections(drug_name):
    r = requests.get(f"{BASE}/label.json",
                     params={"search": f'openfda.generic_name:"{drug_name}"', "limit": 1})
    if r.status_code != 200 or not r.json()["results"]:
        return None
    label = r.json()["results"][0]

    def clean(field):
        # Label sections are lists of strings; join, collapse whitespace, truncate
        text = " ".join(label.get(field, [""]))
        return re.sub(r"\s+", " ", text).strip()[:500]

    return {
        "drug": drug_name,
        "indications": clean("indications_and_usage"),
        "contraindications": clean("contraindications"),
        "warnings": clean("warnings_and_cautions") or clean("warnings"),
    }

drugs = ["metformin", "atorvastatin", "lisinopril", "omeprazole"]
rows = []
for drug in drugs:
    info = get_label_sections(drug)
    if info:
        rows.append(info)
    time.sleep(0.4)
df = pd.DataFrame(rows)
df.to_csv("drug_labels.csv", index=False)
print(df[["drug", "indications"]].to_string(index=False))
| Parameter | Module | Default | Range / Options | Effect |
|-----------|--------|---------|-----------------|--------|
| search | All endpoints | — | Elasticsearch syntax | Filter query |
| count | All endpoints | — | field name + .exact | Aggregate/count by field value |
| limit | All endpoints | 1 | 1–1000 | Results per request |
| skip | All endpoints | 0 | integer | Offset for pagination |
| api_key | All endpoints | — | API key string | Increase rate limit to 120K/day |
| .exact suffix | count field | — | appended to field name | Exact string matching vs tokenized |
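As a minimal sketch of how the parameters in this table fit together, the helper below assembles a query-string dict and drops anything left unset. `build_params` and the `OPENFDA_API_KEY` environment variable name are assumptions made for illustration; openFDA itself only defines the parameter names.

```python
import os


def build_params(search=None, count=None, limit=None, skip=None, api_key=None):
    """Assemble an openFDA query-string dict, omitting parameters that are unset."""
    if limit is not None and not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    # Hypothetical convention: fall back to an environment variable for the key
    key = api_key or os.environ.get("OPENFDA_API_KEY")
    candidates = {
        "search": search,
        "count": count,
        "limit": limit,
        "skip": skip,
        "api_key": key,
    }
    return {name: value for name, value in candidates.items() if value is not None}


params = build_params(
    search='patient.drug.medicinalproduct:"aspirin"',
    count="patient.reaction.reactionmeddrapt.exact",
    limit=10,
)
print(sorted(params))
```

The returned dict can be passed directly as `params=` to `requests.get`. Keeping the key out of the source file also keeps it out of version control.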
Get a free API key: Registration at https://open.fda.gov/apis/authentication/ is instant and raises your limit from 1,000 to 120,000 requests/day — essential for production use.
Use .exact for drug name searches: patient.drug.medicinalproduct.exact (not .medicinalproduct) gives exact phrase matching, preventing partial matches that inflate counts.
Normalize drug names: FAERS reporters use many spellings (e.g., "Lipitor", "atorvastatin calcium", "ATORVASTATIN"). Search multiple name variants or use generic ingredient field for consistent coverage.
Interpret FAERS counts carefully: Report counts do not equal incidence rates. FAERS is voluntary and subject to reporting bias; higher counts may reflect market size or media attention, not higher risk.
Paginate large result sets: The maximum limit is 1000; use skip to page through larger result sets (the total is reported in meta.results.total).
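The pagination advice above can be sketched as a generator that advances skip by the page size until the reported total is reached. `iter_results` and `fetch_page` are hypothetical names; `fetch_page` is any callable that takes `skip` and `limit` and returns the parsed JSON response. The default cap of 26,000 records reflects the skip + limit ceiling described in the troubleshooting table of this document.

```python
def iter_results(fetch_page, page_size=1000, max_records=26000):
    """Yield records page by page using skip/limit until the result set is exhausted."""
    skip = 0
    while skip < max_records:
        # Never request past the cap, even on the final page
        limit = min(page_size, max_records - skip)
        payload = fetch_page(skip=skip, limit=limit)
        results = payload.get("results", [])
        yield from results
        total = payload.get("meta", {}).get("results", {}).get("total", 0)
        skip += limit
        if not results or skip >= total:
            break


def fetch_warfarin_page(skip, limit):
    """Example fetch_page callable; requests is imported lazily so the generator has no dependencies."""
    import requests

    r = requests.get(
        "https://api.fda.gov/drug/event.json",
        params={
            "search": 'patient.drug.medicinalproduct:"warfarin"',
            "limit": limit,
            "skip": skip,
        },
        timeout=30,
    )
    r.raise_for_status()
    return r.json()
```

A call such as `for report in iter_results(fetch_warfarin_page, max_records=3000): ...` would then issue three requests of 1,000 records each, provided at least that many reports match.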
When to use: Quick check of total adverse event report volume for a drug.
import requests
drug = "ibuprofen"
r = requests.get("https://api.fda.gov/drug/event.json",
                 params={"search": f'patient.drug.medicinalproduct.exact:"{drug}"',
                         "limit": 1})
r.raise_for_status()  # a 404 here means no report matched the exact name
total = r.json()["meta"]["results"]["total"]
print(f"Total FAERS reports for {drug}: {total:,}")
When to use: Filter FAERS for reports classified as serious (death, hospitalization, disability).
import requests, pandas as pd
r = requests.get("https://api.fda.gov/drug/event.json",
                 params={
                     "search": 'patient.drug.medicinalproduct:"warfarin" AND serious:1',
                     "count": "patient.reaction.reactionmeddrapt.exact",
                     "limit": 10,
                 })
r.raise_for_status()  # fail loudly instead of hitting a KeyError on "results"
df = pd.DataFrame(r.json()["results"])
df.columns = ["reaction", "serious_reports"]
print(df.to_string(index=False))
When to use: Check which FDA application numbers (NDA/ANDA) are attached to a drug's label. Approval dates are not part of the label record; the Drugs@FDA endpoint (/drug/drugsfda.json) carries the submission history.
import requests
r = requests.get("https://api.fda.gov/drug/label.json",
                 params={"search": 'openfda.generic_name:"metformin"', "limit": 1})
r.raise_for_status()
label = r.json()["results"][0]
openfda = label.get("openfda", {})
print(f"Application numbers: {openfda.get('application_number', ['n/a'])}")
print(f"Product type: {openfda.get('product_type', ['n/a'])}")
print(f"Manufacturer: {openfda.get('manufacturer_name', ['n/a'])}")
| Problem | Cause | Solution |
|---------|-------|----------|
| HTTP 404 with {"error": {"code": "NOT_FOUND"}} | No results match query | Check drug name spelling; try alternative name formats |
| HTTP 429 Too Many Requests | Rate limit exceeded | Register for API key; add time.sleep(1) between requests |
| Count results don't match expectations | Drug name tokenization | Use .exact suffix: medicinalproduct.exact not medicinalproduct |
| Label search returns wrong drug | Ambiguous name | Add AND openfda.product_type:"HUMAN PRESCRIPTION DRUG" to the search to filter |
| Missing fields in FAERS report | Incomplete voluntary report | Check if field exists with .get("field", "n/a") |
| skip + limit > 26000 error | Pagination limit | openFDA caps pagination at 26,000 records; use count endpoint for aggregates beyond this |
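The 404 and 429 rows above can be handled in one place. The sketch below is an assumption-laden illustration, not an openFDA-provided client: `fetch_json` is a hypothetical helper, and `send` is any zero-argument callable that returns a response object with `status_code`, `raise_for_status()`, and `json()`, which is the shape of a `requests.Response`.

```python
import time


def fetch_json(send, retries=3, wait=1.0):
    """Call send() and return parsed JSON; 404 becomes an empty result, 429 is retried."""
    for attempt in range(retries + 1):
        response = send()
        if response.status_code == 404:
            # openFDA reports "no matches" as 404 rather than an empty list
            return {"results": []}
        if response.status_code == 429 and attempt < retries:
            # Back off a little longer after each rate-limit response
            time.sleep(wait * (attempt + 1))
            continue
        response.raise_for_status()
        return response.json()
```

With requests, a call might look like `fetch_json(lambda: requests.get(url, params=params, timeout=30))`. If the rate limit is still exceeded after the last retry, `raise_for_status()` surfaces the error to the caller.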
- clinicaltrials-database-search: clinical trial data for drugs identified via openFDA
- drugbank-database-access: drug structures, targets, and interactions to contextualize FDA data
- chembl-database-bioactivity: preclinical bioactivity data for drugs in the FAERS database
- string-database-ppi: protein interactions for drug targets found via adverse event analysis