ADME Property Predictor

When to Use

Use this skill when the task needs predicted ADME properties or drug-likeness scores for small molecules, and the objective, required inputs, and constraints can be confirmed before detailed work starts.
Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.

Key Features

Scope-focused workflow aligned to: Analyze data with adme-property-predictor using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
Packaged executable path(s): scripts/main.py.
Reference material available in references/ for task-specific guidance.
Structured execution path designed to keep outputs consistent and reviewable.

Dependencies

Python: 3.10+. Repository baseline for current packaged skills.
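dataclasses: version unspecified. Declared in requirements.txt; it has been part of the Python standard library since 3.7, so no separate install should be needed on 3.10+.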
rdkit: unspecified. Declared in requirements.txt.

Example Usage

cd "20260318/scientific-skills/Data Analytics/adme-property-predictor"
python -m py_compile scripts/main.py
python scripts/main.py --help

Example run plan:

Confirm the user input, output path, and any required config values.
Edit the in-file CONFIG block or documented parameters if the script uses fixed settings.
Run python scripts/main.py with the validated inputs.
Review the generated output and return the final artifact with any assumptions called out.

Implementation Details

See the Workflow section below for related details.

Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
Primary implementation surface: scripts/main.py.
Reference guidance: references/ contains supporting rules, prompts, or checklists.
Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

Quick Check

Use this command to verify that the packaged script entry point can be parsed before deeper execution.

python -m py_compile scripts/main.py

Audit-Ready Commands

Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.

python -m py_compile scripts/main.py

# Example invocation: python scripts/main.py --help

# Example invocation: python scripts/main.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --properties all --format json --output aspirin_adme.json

Workflow

Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.

Overview

Comprehensive pharmacokinetic prediction tool that assesses drug-likeness and ADME properties of small molecules using validated cheminformatics models, molecular descriptors, and structure-property relationships.

Key Capabilities:

Multi-Property Prediction: Absorption, Distribution, Metabolism, Excretion
Drug-Likeness Scoring: Lipinski's Rule of 5, Veber rules, QED score
Batch Processing: Analyze compound libraries efficiently
Structure-Based Insights: Identify liability hotspots and optimization opportunities
Comparative Analysis: Rank candidates by predicted PK profile

Integration with Other Skills

Upstream Skills:

chemical-structure-converter: Convert between SMILES, InChI, MOL formats
lipinski-rule-filter: Initial rule-based drug-likeness screening
chemical-structure-converter: Generate 3D conformers for structure-based predictions
smiles-de-salter: Remove salt counterions before analysis

Downstream Skills:

drug-candidate-evaluator: Multi-parameter optimization including ADME
toxicity-structure-alert: Assess safety alongside ADME
target-novelty-scorer: Evaluate target uniqueness for selected candidates
biotech-pitch-deck-narrative: Create investor materials with PK data

Complete Workflow:

Chemical Structure Converter (prepare structures) → 
  Lipinski Rule Filter (initial filtering) → 
    ADME Property Predictor (this skill, detailed PK) → 
      Drug Candidate Evaluator (integrated scoring) → 
        Toxicity Structure Alert (safety check)

Core Capabilities

1. Absorption (A) Prediction

Predict intestinal absorption, solubility, and permeability:

from scripts.adme_predictor import ADMEPredictor

predictor = ADMEPredictor()

# Predict absorption properties
absorption = predictor.predict_absorption(
    smiles="CC(=O)Oc1ccccc1C(=O)O",  # Aspirin
    properties=["all"]  # or specific: ["hia", "caco2", "solubility"]
)

print(absorption.summary())

Predicted Properties:

| Property | Model | Units | Interpretation |
|----------|-------|-------|----------------|
| HIA | ML + physicochemical | % | Human intestinal absorption; >80% good |
| Caco-2 | QSPR | 10⁻⁶ cm/s | Permeability; >70 high, <25 low |
| Solubility | QSPR | mg/mL | Aqueous solubility; >0.1 mg/mL acceptable |
| LogS | QSPR | log(mol/L) | Intrinsic solubility; >-4 acceptable |
| Lipinski Pass | Rule-based | boolean | Passes all four Rule of 5 criteria (MW ≤500, LogP ≤5, HBD ≤5, HBA ≤10) |
| Veber Pass | Rule-based | boolean | PSA ≤140 Å², rotatable bonds ≤10 |
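
The two rule-based rows need no trained model, so they can be sketched directly once descriptors are available (for example from RDKit). The `Descriptors` container and function names below are illustrative assumptions, not the packaged API; the cutoffs are the published Rule of 5 and Veber thresholds, and the aspirin values are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors (hypothetical container for this sketch)."""
    mw: float        # molecular weight, Da
    logp: float      # octanol-water partition coefficient
    hbd: int         # hydrogen-bond donors
    hba: int         # hydrogen-bond acceptors
    tpsa: float      # topological polar surface area, Å²
    rot_bonds: int   # rotatable bonds


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of the four Rule of 5 criteria."""
    return sum([d.mw > 500, d.logp > 5, d.hbd > 5, d.hba > 10])


def veber_pass(d: Descriptors) -> bool:
    """Veber criteria: PSA <= 140 Å² and at most 10 rotatable bonds."""
    return d.tpsa <= 140 and d.rot_bonds <= 10


# Approximate descriptor values for aspirin, for illustration only.
aspirin = Descriptors(mw=180.16, logp=1.2, hbd=1, hba=4, tpsa=63.6, rot_bonds=3)
print(lipinski_violations(aspirin), veber_pass(aspirin))
```

Returning a violation count rather than a boolean keeps the choice of strictness (zero violations versus at most one) with the caller.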

Best Practices:

✅ Consider HIA and solubility together (high HIA but low solubility = dissolution-limited)
✅ Caco-2 good for oral absorption prediction; poor for BBB penetration
✅ Use both rule-based (Lipinski) and ML-based predictions for consensus
✅ Check solubility at physiological pH (not just intrinsic)
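
Reading HIA and solubility together can be sketched as a small decision function. It reuses the >80% HIA and LogS > -4 cutoffs from the table above; the function name and every label other than "dissolution-limited" are assumptions made for this example.

```python
def absorption_flag(hia_percent: float, log_s: float) -> str:
    """Combine predicted HIA (%) and LogS into one qualitative flag."""
    hia_ok = hia_percent > 80   # cutoff from the property table
    sol_ok = log_s > -4         # cutoff from the property table
    if hia_ok and sol_ok:
        return "favourable"
    if hia_ok:
        # Good predicted absorption, poor solubility.
        return "dissolution-limited"
    if sol_ok:
        return "low predicted absorption"
    return "poor absorption and solubility"


print(absorption_flag(92.0, -5.1))
```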

Common Issues and Solutions:

Issue: Lipinski pass but poor solubility

Symptom: "Passes Rule of 5 but LogS = -5"
Solution: Lipinski checks MW, LogP, and hydrogen-bond donor/acceptor counts, not solubility directly; use explicit solubility prediction

Issue: Caco-2 predicts high absorption but HIA low

Symptom: "Caco-2 = 85 (high) but HIA = 60%"
Solution: Models have different training sets; Caco-2 is in vitro, HIA in vivo; HIA generally more reliable

2. Distribution (D) Prediction

Predict tissue distribution, protein binding, and brain penetration:


# Predict distribution properties
distribution = predictor.predict_distribution(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    properties=["vd", "ppb", "bbb"]
)

# Access specific predictions
vd = distribution.volume_of_distribution
bbb = distribution.blood_brain_barrier
ppb = distribution.plasma_protein_binding

Predicted Properties:

| Property | Model | Units | Interpretation |
|----------|-------|-------|----------------|
| Vd | QSPR | L/kg | Volume of distribution; 0.1-10 typical |
| PPB | ML | % | Plasma protein binding; >90% high, <50% low |
| BBB | LogBB | unitless | Brain penetration; >0.3 penetrant |
| fu | Calculated | fraction | Free (unbound) fraction; 1 - PPB/100 |

Best Practices:

✅ High PPB (>90%) may require higher doses but can prolong half-life
✅ Low Vd (<0.3) = mainly in plasma; high Vd (>3) = extensive tissue distribution
✅ BBB penetration critical for CNS drugs; avoid for peripherally-acting drugs
✅ fu (free fraction) drives pharmacological activity, not total concentration
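
The free-fraction relation fu = 1 - PPB/100 and the Vd cutoffs quoted above can be written out as a short sketch. The function names are assumptions for illustration; the point is that a small change in high PPB produces a large relative change in fu.

```python
def fraction_unbound(ppb_percent: float) -> float:
    """fu = 1 - PPB/100, with PPB given in percent."""
    if not 0 <= ppb_percent <= 100:
        raise ValueError("PPB must be between 0 and 100 percent")
    return 1 - ppb_percent / 100


def vd_class(vd_l_per_kg: float) -> str:
    """Classify Vd (L/kg) with the <0.3 and >3 cutoffs from the list above."""
    if vd_l_per_kg < 0.3:
        return "mainly plasma"
    if vd_l_per_kg > 3:
        return "extensive tissue distribution"
    return "intermediate"


# 95% vs 99% bound looks similar, but the free fraction differs about fivefold.
ratio = fraction_unbound(95.0) / fraction_unbound(99.0)
print(round(ratio, 2), vd_class(0.15))
```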

Common Issues and Solutions:

Issue: BBB predictions unreliable for certain chemotypes

Symptom: "BBB model gives conflicting predictions for peptides"
Solution: Models trained on small molecules; use specialized BBB predictors for peptides, macrocycles

Issue: PPB overestimated for acidic drugs

Symptom: "PPB predicted 95% but experimental is 70%"
Solution: Some models biased toward neutral/basic compounds; check model training set overlap

3. Metabolism (M) Prediction

Predict metabolic stability, CYP interactions, and liability sites:


# Predict metabolism properties
metabolism = predictor.predict_metabolism(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    include_site_prediction=True
)

# Check CYP interactions
cyp_profile = metabolism.cyp_profile
stability = metabolism.metabolic_stability

Predicted Properties:

| Property | Model | Output | Interpretation |
|----------|-------|--------|----------------|
| CYP Inhibition | ML | IC50 or class | Potential DDI; <1 μM high risk |
| CYP Substrate | Classification | Boolean/Probability | Metabolized by specific CYP |
| Stability | ML | T1/2 or class | Microsomal/hepatocyte stability |
| Liability Sites | Reactivity models | Atom indices | Soft spots for metabolism |
| MAO Substrate | Classification | Boolean | Monoamine oxidase substrate |

Best Practices:

✅ Screen for CYP3A4 inhibition early (most common DDI)
✅ Check if compound is CYP substrate (for polymorphism concerns)
✅ Identify metabolic hotspots for structural blocking
✅ Consider species differences (human vs rodent metabolism)

Common Issues and Solutions:

Issue: False negatives for time-dependent inhibition (TDI)

Symptom: "No CYP inhibition predicted but TDI observed experimentally"
Solution: Standard models predict reversible inhibition; use specialized TDI predictors

Issue: Metabolic site prediction shows multiple hotspots

Symptom: "5 different atoms flagged as metabolic liabilities"
Solution: Prioritize by reactivity score; consider blocking highest-risk site first
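
Prioritizing hotspots by reactivity score amounts to a descending sort. The sketch below assumes each flagged site carries an atom index and a score where higher means more labile; the `Hotspot` type and the scores are hypothetical, not output of the packaged models.

```python
from typing import NamedTuple


class Hotspot(NamedTuple):
    atom_index: int
    reactivity: float  # assumed convention: higher = more labile


def prioritize(hotspots: list[Hotspot], top_n: int = 1) -> list[Hotspot]:
    """Return the top_n sites, most reactive first."""
    return sorted(hotspots, key=lambda h: h.reactivity, reverse=True)[:top_n]


# Five flagged atoms with made-up scores.
flagged = [Hotspot(3, 0.41), Hotspot(7, 0.88), Hotspot(11, 0.12),
           Hotspot(0, 0.65), Hotspot(9, 0.30)]
first_to_block = prioritize(flagged)[0]
print(first_to_block.atom_index)
```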

4. Excretion (E) Prediction

Predict clearance routes and elimination kinetics:


# Predict excretion properties
excretion = predictor.predict_excretion(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    properties=["clearance", "half_life", "route"]
)

# Access predictions
clearance = excretion.clearance_ml_min_kg
t12 = excretion.half_life_hours
route = excretion.primary_route

Predicted Properties:

| Property | Model | Units | Interpretation |
|----------|-------|-------|----------------|
| CL | QSPR | mL/min/kg | Clearance; <5 low, 5-15 moderate, >15 high |
| T1/2 | QSPR | hours | Half-life; 2-8h typical for oral drugs |
| Route | Classification | renal/biliary/mixed | Primary excretion pathway |
| LogD | QSPR | unitless | Distribution coefficient; affects clearance |

Best Practices:

✅ Half-life determines dosing frequency (steady state is reached after roughly 5 × T1/2)
✅ Renal clearance predictable for polar compounds; hepatic less predictable
✅ High clearance (>15) may require high doses or prodrug approach
✅ Very long T1/2 (>24h) good for adherence but risk accumulation
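
The five-half-lives rule follows from first-order kinetics: after n half-lives the concentration has covered 1 - 0.5^n of the distance to steady state. A minimal sketch, with function names chosen for this example and the clearance cutoffs taken from the table above:

```python
def time_to_steady_state(t_half_h: float, n_half_lives: float = 5) -> float:
    """Approximate time (hours) to steady state as n half-lives."""
    return n_half_lives * t_half_h


def fraction_of_steady_state(n_half_lives: float) -> float:
    """Fraction of steady state reached after n half-lives (first-order)."""
    return 1 - 0.5 ** n_half_lives


def clearance_class(cl_ml_min_kg: float) -> str:
    """Classify clearance: <5 low, 5-15 moderate, >15 high."""
    if cl_ml_min_kg < 5:
        return "low"
    if cl_ml_min_kg <= 15:
        return "moderate"
    return "high"


# A 6 h half-life reaches about 97% of steady state after 30 h.
print(time_to_steady_state(6.0), fraction_of_steady_state(5))
```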

Common Issues and Solutions:

Issue: Clearance predictions highly variable

Symptom: "Same compound, different models give CL = 5 vs 20 mL/min/kg"
Solution: Allometry-based methods unreliable for novel scaffolds; use average of multiple models
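
Averaging across models is more useful when the disagreement is reported alongside the mean. The sketch below assumes clearance predictions keyed by model name; the model names and the 2-fold review threshold are illustrative choices, not documented behaviour.

```python
import statistics


def consensus_clearance(predictions: dict[str, float], max_fold: float = 2.0) -> dict:
    """Average clearance predictions and flag large between-model spread."""
    values = list(predictions.values())
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive clearance value")
    fold = max(values) / min(values)
    return {
        "mean": statistics.mean(values),
        "fold_spread": fold,
        "needs_review": fold > max_fold,  # assumed threshold
    }


# The 5 vs 20 mL/min/kg case from the symptom above.
result = consensus_clearance({"model_a": 5.0, "model_b": 20.0})
print(result)
```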

Issue: Route prediction contradicts structure

Symptom: "Highly polar compound predicted biliary, expected renal"
Solution: Check LogP/LogD; polar compounds (<0) usually renal; neutral/lipophilic (>1) usually hepatic
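
The LogD rule of thumb can be turned into a quick cross-check against the predicted route. This is a heuristic sketch only: the function names and the "indeterminate" label are assumptions, and the cutoffs (<0 and >1) are the ones quoted in the solution above.

```python
def expected_route(log_d: float) -> str:
    """Rule-of-thumb route from LogD: polar -> renal, lipophilic -> hepatic."""
    if log_d < 0:
        return "renal"
    if log_d > 1:
        return "hepatic"
    return "indeterminate"


def route_conflict(log_d: float, predicted_route: str) -> bool:
    """True when the predicted route contradicts the LogD heuristic."""
    expected = expected_route(log_d)
    if expected == "renal":
        return predicted_route == "biliary"
    if expected == "hepatic":
        return predicted_route == "renal"
    return False


# Highly polar compound predicted biliary: flag for review.
print(route_conflict(-1.5, "biliary"))
```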

5. Integrated Drug-Likeness Scoring

Overall assessment combining all ADME properties:


# Generate comprehensive drug-likeness score
druglikeness = predictor.calculate_druglikeness(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    methods=["qed", "muegge", "golden_triangle"]
)

# Multi-parameter optimization
mpo_score = predictor.mpo_score(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    target_profile={"hia": ">80", "bbb": "<0.3", "t12": "2-8h"}  # thresholds as strings; a bare >80 is not valid Python
)

Scoring Methods:

| Method | Description | Range | Good Score |
|--------|-------------|-------|------------|
| QED | Quantitative Estimation of Drug-likeness | 0-1 | >0.6 |
| Muegge | Bioavailability score | 0-6 | >4 |
| MPO | Multi-Parameter Optimization | 0-10 | >6 |

Best Practices:

✅ Use QED as quick overall metric; MPO for property-weighted scoring
✅ Don't rely solely on drug-likeness; efficacy and safety equally important
✅ Compare to marketed drugs in same class for context
✅ Track drug-likeness trends during optimization (should improve)
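
A property-weighted score starts from checking each prediction against its target. The sketch below assumes targets are written as strings such as ">80", "<0.3", or "2-8h" and returns the fraction of targets met; it is a stand-in for illustration, not the packaged `mpo_score`, and the parsing rules are assumptions.

```python
import re


def meets_target(value: float, target: str) -> bool:
    """Check a value against '>x', '<x', or 'low-high[unit]' (assumed notation)."""
    target = target.strip()
    if target.startswith(">"):
        return value > float(target[1:])
    if target.startswith("<"):
        return value < float(target[1:])
    match = re.fullmatch(r"([\d.]+)-([\d.]+)\s*[a-zA-Z%]*", target)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return low <= value <= high
    raise ValueError(f"unrecognised target: {target!r}")


def profile_score(predicted: dict[str, float], profile: dict[str, str]) -> float:
    """Fraction of profile targets satisfied by the predicted values."""
    hits = [meets_target(predicted[name], target) for name, target in profile.items()]
    return sum(hits) / len(hits)


profile = {"hia": ">80", "bbb": "<0.3", "t12": "2-8h"}
print(profile_score({"hia": 92.0, "bbb": -0.5, "t12": 3.1}, profile))
```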

Common Issues and Solutions:

Issue: Drug-likeness score conflicts with project needs

Symptom: "CNS drug has low QED (0.5) because high LogP needed for BBB"
Solution: Drug-likeness rules biased toward oral drugs; use category-specific models (CNS, oncology, etc.)

6. Batch Processing and Library Screening

Analyze compound libraries efficiently:


# Batch process library
results = predictor.batch_predict(
    input_file="library.smi",  # SMILES file
    properties=["all"],
    output_format="csv",
    n_workers=4  # Parallel processing
)

# Filter by criteria
filtered = results.filter(
    lipinski_pass=True,
    hia__gt=80,
    t12__between=(2, 8)
)

# Rank by multi-parameter score
ranked = results.rank(by="mpo_score", ascending=False)

Best Practices:

✅ Process in batches of 1000-10000 for memory efficiency
✅ Save intermediate results (crash recovery)
✅ Apply filters sequentially (Lipinski first, then detailed ADME)
✅ Check property distributions to identify outliers
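
Sequential filtering followed by ranking can be sketched with plain dictionaries, independent of whatever result type `batch_predict` returns. The records, compound IDs, and scores below are made up; the column names mirror those used in the filter example above.

```python
records = [
    {"id": "cpd-1", "lipinski_pass": True,  "hia": 91.0, "t12": 4.2,  "mpo_score": 7.1},
    {"id": "cpd-2", "lipinski_pass": False, "hia": 95.0, "t12": 3.0,  "mpo_score": 8.4},
    {"id": "cpd-3", "lipinski_pass": True,  "hia": 72.0, "t12": 5.5,  "mpo_score": 6.0},
    {"id": "cpd-4", "lipinski_pass": True,  "hia": 88.0, "t12": 6.8,  "mpo_score": 7.9},
    {"id": "cpd-5", "lipinski_pass": True,  "hia": 84.0, "t12": 14.0, "mpo_score": 6.6},
]

# Stage 1: cheap rule-based filter first.
stage1 = [r for r in records if r["lipinski_pass"]]

# Stage 2: detailed ADME criteria only on the survivors.
stage2 = [r for r in stage1 if r["hia"] > 80 and 2 <= r["t12"] <= 8]

# Rank what remains by multi-parameter score, best first.
ranked = sorted(stage2, key=lambda r: r["mpo_score"], reverse=True)
print([r["id"] for r in ranked])
```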

Common Issues and Solutions:

Issue: Batch processing runs out of memory

Symptom: "Killed: Out of memory" with 50K compounds
Solution: Process in chunks; use generators instead of loading all into RAM
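
Chunked processing with generators can be sketched as follows. It assumes a .smi layout of one SMILES per line with an optional name after whitespace; `iter_smiles` and `chunked` are helper names invented for this example, and in practice an open file handle would replace the in-memory list.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_smiles(lines: Iterable[str]) -> Iterator[str]:
    """Yield the SMILES token from each non-empty line, one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.split()[0]


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield lists of at most `size` items without loading everything."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# In real use: `with open("library.smi") as handle: ...` instead of this list.
demo_lines = ["CCO ethanol\n", "\n", "c1ccccc1 benzene\n", "CC(=O)O acetic_acid\n"]
chunks = list(chunked(iter_smiles(demo_lines), size=2))
print(chunks)
```

Only one chunk is held in memory at a time, so peak usage depends on the chunk size rather than the library size.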

Issue: Some compounds fail prediction

Symptom: "30% of library returns NaN"
Solution: Check for invalid SMILES, unusual atoms, or molecules outside training set domain

Complete Workflow Example

From SMILES to prioritized candidates:


# Step 1: Predict ADME for single compound

python scripts/main.py \
  --smiles "CC(=O)Oc1ccccc1C(=O)O" \
  --properties all \
  --output aspirin_adme.json

# Step 2: Batch process compound library

python scripts/main.py \
  --input library.smi \
  --properties absorption,distribution \
  --format csv \
  --output library_adme.csv

# Step 3: Filter and rank

python scripts/main.py \
  --input library_adme.csv \
  --filter "lipinski_pass=True,hia>80" \
  --rank-by qed \
  --top-n 100 \
  --output top_candidates.csv

Python API Usage:

from scripts.adme_predictor import ADMEPredictor
from scripts.batch_processor import BatchProcessor

# Initialize
predictor = ADMEPredictor()
batch = BatchProcessor()

# Single compound analysis
aspirin = predictor.predict_all("CC(=O)Oc1ccccc1C(=O)O")
print(f"HIA: {aspirin.absorption.hia}%")
print(f"Half-life: {aspirin.excretion.half_life_hours} hours")

# Batch screening
results = batch.process(
    input_file="library.smi",
    predictor=predictor,
    properties=["absorption", "distribution", "excretion"],  # the t12 filter below needs excretion
    n_workers=4
)

# Filter good candidates
good_candidates = results[
    (results.lipinski_pass == True) &
    (results.hia > 80) &
    (results.bbb < 0.3) &
    (results.t12.between(2, 8))
]

Expected Output Files:

output/
├── aspirin_adme.json           # Single compound detailed results
├── library_adme.csv            # Batch screening results
└── top_candidates.csv          # Filtered and ranked candidates

Quality Checklist

Pre-Prediction Checks:

[ ] SMILES string is valid and canonical
[ ] Salt forms removed (if analyzing parent compound)
[ ] Tautomeric state appropriate for physiological pH
[ ] Stereochemistry specified (if relevant for activity)

During Prediction:

[ ] Compound within model applicability domain (check similarity to training set)
[ ] No unusual atoms or functional groups (models trained on typical drug-like space)
[ ] MW in range 100-800 Da (outside range predictions less reliable)
[ ] Predictions complete (no missing values for critical properties)

Post-Prediction Verification:

[ ] Drug-likeness scores in reasonable range (sanity check)
[ ] Individual properties internally consistent (e.g., high LogP predicts low solubility)
[ ] CRITICAL: Comparison to experimental data if available (validate model for chemotype)
[ ] Rankings align with medicinal chemistry intuition

Before Making Decisions:

[ ] CRITICAL: Predictions are NOT experimental data; use for prioritization only
[ ] Multiple orthogonal models give consistent results
[ ] Structural alerts checked (toxicity, reactivity)
[ ] Top candidates selected for experimental validation
[ ] Documentation of model versions and confidence intervals

For Regulatory Submissions:

[ ] Model validation documented (training set, test set performance)
[ ] Applicability domain clearly defined
[ ] Prediction uncertainty quantified
[ ] Experimental confirmation for key predictions

Common Pitfalls

Over-Reliance Issues:

❌ Treating predictions as experimental facts → Poor decision making
- ✅ Use predictions for prioritization; experimental validation required for lead optimization
❌ Single model dependency → Miss model-specific biases
- ✅ Compare multiple models; consensus predictions more reliable
❌ Ignoring prediction confidence → False sense of certainty
- ✅ Check confidence intervals; low confidence predictions need higher scrutiny

Input Issues:

❌ Invalid or non-canonical SMILES → Wrong compound analyzed
- ✅ Validate SMILES before prediction; use canonical forms
❌ Analyzing salt forms → Properties skewed by counterion
- ✅ Remove salts using smiles-de-salter; analyze free base/acid
❌ Ignoring stereochemistry → Inaccurate predictions for chiral drugs
- ✅ Specify stereochemistry explicitly; use 3D descriptors if available

Interpretation Issues:

❌ Focusing on single property → Miss overall profile
- ✅ Consider all ADME properties; use integrated scores like QED or MPO
❌ Rigid cutoff application → Discard good candidates
- ✅ Use cutoffs as guidelines; consider project-specific needs
❌ Ignoring property correlations → Unrealistic optimization
- ✅ Recognize trade-offs (e.g., increasing LogP improves BBB but reduces solubility)

Domain Issues:

❌ Applying to biologics → Completely inappropriate
- ✅ These models for small molecules only; use specialized tools for biologics
❌ Extrapolating beyond training set → Unreliable predictions
- ✅ Check applicability domain; novel scaffolds need experimental validation

Workflow Issues:

❌ No experimental validation → Continue with false leads
- ✅ Always validate top predictions experimentally
❌ Not documenting model versions → Irreproducible results
- ✅ Record software version, model versions, prediction dates
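
Recording provenance can be as simple as writing a small JSON record next to each result file. The field names and model version strings below are hypothetical; only the software version 1.0.0 comes from this document, and the date is an arbitrary example.

```python
import json
from datetime import date
from typing import Optional


def provenance_record(software_version: str,
                      model_versions: dict[str, str],
                      run_date: Optional[date] = None) -> dict:
    """Build a record of what produced a set of predictions."""
    return {
        "software_version": software_version,
        "model_versions": dict(sorted(model_versions.items())),
        "prediction_date": (run_date or date.today()).isoformat(),
    }


record = provenance_record(
    "1.0.0",
    {"hia": "hypothetical-0.3", "logs": "hypothetical-1.2"},  # placeholder versions
    run_date=date(2026, 3, 18),
)
print(json.dumps(record, indent=2))
```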

Troubleshooting

Problem: All predictions show "out of domain" warning

Symptoms: "Compound outside training set" for entire library
Causes: Library contains unusual chemotypes (peptidomimetics, macrocycles, etc.)
Solutions:
- Use specialized models for non-traditional chemotypes
- Check if input format correct (SMILES vs InChI)
- Verify no strange atoms (metals, silicon, etc.)

Problem: Extreme predictions (negative solubility, >100% absorption)

Symptoms: "LogS = -15" or "HIA = 150%"
Causes: Model extrapolation errors; invalid input structures
Solutions:
- Check input structure validity
- Cap extreme values at physiologically plausible limits
- Flag for manual review if outside typical ranges
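
Capping and flagging can be done in one step so that a clamped value never passes silently. In this sketch the 0-100% range for HIA follows from its definition, while the LogS limits are assumed for illustration and should be replaced with project-appropriate bounds.

```python
PLAUSIBLE_LIMITS = {
    "hia": (0.0, 100.0),     # percent absorbed cannot leave this range
    "log_s": (-12.0, 2.0),   # assumed limits for this sketch
}


def clamp_and_flag(name: str, value: float) -> tuple[float, bool]:
    """Return (capped value, needs_review) for a named property."""
    low, high = PLAUSIBLE_LIMITS[name]
    capped = min(max(value, low), high)
    return capped, capped != value


print(clamp_and_flag("hia", 150.0), clamp_and_flag("log_s", -15.0))
```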

Problem: Batch processing extremely slow

Symptoms: "100 compounds taking 30 minutes"
Causes: Single-threaded execution; complex models
Solutions:
- Enable parallel processing (--n-workers 4)
- Use faster models for initial screening (QSAR vs ML)
- Pre-filter with rule-based methods (Lipinski) before detailed ADME

Problem: Inconsistent predictions across runs

Symptoms: "Same compound, different predictions on re-run"
Causes: Random seed issues; stochastic models
Solutions:
- Set random seeds for reproducibility
- Use deterministic models when consistency critical
- Average multiple predictions if stochastic models necessary
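
Seeding and averaging can be combined: a fixed seed makes each averaged result repeatable. The `stochastic_predict` function below is a stand-in that adds Gaussian noise around a constant; it is not one of the packaged models.

```python
import random
import statistics


def stochastic_predict(smiles: str, rng: random.Random) -> float:
    """Stand-in for a stochastic model: noisy HIA estimate in percent."""
    return 80.0 + rng.gauss(0.0, 2.0)


def averaged_prediction(smiles: str, seed: int, n_runs: int = 10) -> float:
    """Average n_runs predictions drawn from a privately seeded generator."""
    rng = random.Random(seed)  # local generator; global random state untouched
    return statistics.mean(stochastic_predict(smiles, rng) for _ in range(n_runs))


first = averaged_prediction("CC(=O)Oc1ccccc1C(=O)O", seed=42)
second = averaged_prediction("CC(=O)Oc1ccccc1C(=O)O", seed=42)
print(first == second)
```

Using a local `random.Random` instance rather than `random.seed` keeps reproducibility from depending on what other code does with the global generator.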

Problem: Properties contradict each other

Symptoms: "High LogP (4.5) but predicted very soluble"
Causes: Model inconsistencies; prediction errors
Solutions:
- Check input structure (tautomeric form matters for both)
- Lipophilic compounds (LogP > 3) typically have poor solubility
- Use thermodynamic cycle checks if available

Problem: Cannot process certain file formats

Symptoms: "Error: Unsupported format" for SDF or MOL files
Causes: Format limitations; parser issues
Solutions:
- Convert to SMILES using chemical-structure-converter
- Check file encoding (UTF-8 vs Latin-1)
- Verify structure validity with external tools

References

Available in references/ directory:

lipinski_rules.md - Detailed explanation of Rule of 5 and variants
qsar_models.md - Technical documentation of predictive models
adme_databases.md - Experimental ADME data sources for validation
property_ranges.md - Acceptable ranges for marketed drugs by class
model_validation.md - Validation statistics and applicability domains
cheminformatics_basics.md - Introduction to molecular descriptors

Scripts

Located in scripts/ directory:

main.py - CLI interface for ADME prediction
adme_predictor.py - Core prediction engine
absorption.py - Absorption property models
distribution.py - Distribution property models
metabolism.py - Metabolism prediction models
excretion.py - Excretion and clearance models
druglikeness.py - QED, MPO, and other scoring functions
batch_processor.py - Library screening and parallel processing
validator.py - Input validation and applicability domain checking

Performance and Resources

Prediction Speed:

| Task | Time | Hardware |
|------|------|----------|
| Single compound | 0.5-2 sec | CPU |
| 100 compounds | 30-60 sec | CPU |
| 1000 compounds | 5-10 min | CPU |
| 1000 compounds | 2-3 min | 4-core parallel |
| 10,000 compounds | 30-60 min | 4-core parallel |

System Requirements:

RAM: 4 GB minimum; 8 GB for large libraries (>10K compounds)
Storage: 100 MB for models and dependencies
CPU: Multi-core recommended for batch processing
No GPU required: All models CPU-based

Optimization Tips:

Process libraries in batches of 5000-10000
Use rule-based filters (Lipinski) before expensive ML predictions
Cache results to avoid re-prediction
Parallel processing scales nearly linearly up to 8 cores

Limitations

Small Molecules Only: Models trained on drugs with MW 100-800 Da; unreliable for larger compounds
pH 7.4 Assumption: Most models predict properties at physiological pH
Human-Specific: Predictions for human PK; animal models may differ
Healthy Subject Assumption: Does not account for disease states, drug interactions
Single Compound: Does not predict formulation effects, salt form impact
Static Models: Do not account for induction, inhibition, or time-dependent changes
Training Set Bias: Underperforms for novel scaffolds not in training data
Qualitative Only: For Go/No-Go decisions; not for precise quantitative predictions
No Toxicity: ADME only; use separate tools for safety assessment

Model Accuracy (Typical):

LogP: R² = 0.85-0.95 (very good)
Solubility: R² = 0.65-0.80 (moderate)
HIA: Accuracy = 75-85% (good)
BBB: Accuracy = 70-80% (moderate)
Metabolic stability: R² = 0.60-0.75 (moderate)
T1/2: R² = 0.50-0.65 (challenging)

Version History

v1.0.0 (Current): Initial release with 20+ ADME endpoints, QED scoring, batch processing
Planned: Integration with PK simulation, population variability modeling, formulation effects

⚠️ CRITICAL DISCLAIMER: These predictions are computational estimates for prioritization and guidance only. They do NOT replace experimental ADME studies required for regulatory submissions or clinical decision-making. Always validate predictions with appropriate in vitro and in vivo assays before advancing compounds.

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --smiles | str | Required unless --input is given | SMILES string of the molecule |
| --properties | str | "all" | Specific properties to calculate (comma-separated, e.g. absorption,distribution) |
| --format | str | "json" | Output format (json or csv) |
| --input | str | Required unless --smiles is given | Input CSV file with SMILES column |
| --output | str | Required | Output file for results |

Output Requirements

Every final response should make these items explicit when they are relevant:

Objective or requested deliverable
Inputs used and assumptions introduced
Workflow or decision path
Core result, recommendation, or artifact
Constraints, risks, caveats, or validation needs
Unresolved items and next-step checks

Error Handling

If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
Do not fabricate files, citations, data, search results, or execution outcomes.

Input Validation

This skill accepts requests that match the documented purpose of adme-property-predictor and include enough context to complete the workflow safely.

Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:

adme-property-predictor only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.

Response Template

Use the following fixed structure for non-trivial requests:

Objective
Inputs Received
Assumptions
Workflow
Deliverable
Risks and Limits
Next Checks

If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.

Inputs to Collect

Required inputs: the user goal, the primary data or source file, and the requested output format.
Optional inputs: output directory, formatting preferences, and validation constraints.
If a required input is unavailable, return a short clarification request before continuing.

Output Contract

Return a short summary, the main deliverables, and any assumptions that materially affect interpretation.
If execution is partial, label what succeeded, what failed, and the next safe recovery step.
Keep the final answer within the documented scope of the skill.

Validation and Safety Rules

Validate identifiers, file paths, and user-provided parameters before execution.
Do not fabricate results, metrics, citations, or downstream conclusions.
Use safe fallback behavior when dependencies, credentials, or required inputs are missing.
Surface any execution failure with a concise diagnosis and recovery path.

ADME Property Predictor

When to Use

Use this skill when the task needs 1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work. 2. Validate that the request matches the documented scope and stop early if the task would require unsupported as.
Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.

Key Features

Scope-focused workflow aligned to: Analyze data with adme-property-predictor using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
Packaged executable path(s): scripts/main.py.
Reference material available in references/ for task-specific guidance.
Structured execution path designed to keep outputs consistent and reviewable.

Dependencies

Python: 3.10+. Repository baseline for current packaged skills.
dataclasses: unspecified. Declared in requirements.txt.
rdkit: unspecified. Declared in requirements.txt.

Example Usage

cd "20260318/scientific-skills/Data Analytics/adme-property-predictor"
python -m py_compile scripts/main.py
python scripts/main.py --help

Example run plan:

Confirm the user input, output path, and any required config values.
Edit the in-file CONFIG block or documented parameters if the script uses fixed settings.
Run python scripts/main.py with the validated inputs.
Review the generated output and return the final artifact with any assumptions called out.

Implementation Details

See ## Workflow above for related details.

Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
Primary implementation surface: scripts/main.py.
Reference guidance: references/ contains supporting rules, prompts, or checklists.
Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.

Quick Check

Use this command to verify that the packaged script entry point can be parsed before deeper execution.

python -m py_compile scripts/main.py

Audit-Ready Commands

Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.

python -m py_compile scripts/main.py

# Example invocation: python scripts/main.py --help

# Example invocation: python scripts/main.py --input "Audit validation sample with explicit symptoms, history, assessment, and next-step plan." --format json

Workflow

Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.

Overview

Key Capabilities:

Multi-Property Prediction: Absorption, Distribution, Metabolism, Excretion
Drug-Likeness Scoring: Lipinski's Rule of 5, Veber rules, QED score
Batch Processing: Analyze compound libraries efficiently
Structure-Based Insights: Identify liability hotspots and optimization opportunities
Comparative Analysis: Rank candidates by predicted PK profile

Integration with Other Skills

Upstream Skills:

chemical-structure-converter: Convert between SMILES, InChI, MOL formats
lipinski-rule-filter: Initial rule-based drug-likeness screening
chemical-structure-converter: Generate 3D conformers for structure-based predictions
smiles-de-salter: Remove salt counterions before analysis

Downstream Skills:

drug-candidate-evaluator: Multi-parameter optimization including ADME
toxicity-structure-alert: Assess safety alongside ADME
target-novelty-scorer: Evaluate target uniqueness for selected candidates
biotech-pitch-deck-narrative: Create investor materials with PK data

Complete Workflow:

Chemical Structure Converter (prepare structures) → 
  Lipinski Rule Filter (initial filtering) → 
    ADME Property Predictor (this skill, detailed PK) → 
      Drug Candidate Evaluator (integrated scoring) → 
        Toxicity Structure Alert (safety check)

Core Capabilities

1. Absorption (A) Prediction

Predict intestinal absorption, solubility, and permeability:

from scripts.adme_predictor import ADMEPredictor

predictor = ADMEPredictor()

# Predict absorption properties
absorption = predictor.predict_absorption(
    smiles="CC(=O)Oc1ccccc1C(=O)O",  # Aspirin
    properties=["all"]  # or specific: ["hia", "caco2", "solubility"]
)

print(absorption.summary())

Best Practices:

✅ Consider HIA and solubility together (high HIA but low solubility = dissolution-limited)
✅ Caco-2 good for oral absorption prediction; poor for BBB penetration
✅ Use both rule-based (Lipinski) and ML-based predictions for consensus
✅ Check solubility at physiological pH (not just intrinsic)

Common Issues and Solutions:

Issue: Lipinski pass but poor solubility

Symptom: "Passes Rule of 5 but LogS = -5"
Solution: Lipinski checks MW and LogP, not solubility directly; use explicit solubility prediction

Issue: Caco-2 predicts high absorption but HIA low

Symptom: "Caco-2 = 85 (high) but HIA = 60%"
Solution: Models have different training sets; Caco-2 is in vitro, HIA in vivo; HIA generally more reliable

2. Distribution (D) Prediction

Predict tissue distribution, protein binding, and brain penetration:


# Predict distribution properties
distribution = predictor.predict_distribution(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    properties=["vd", "ppb", "bbb"]
)

# Access specific predictions
vd = distribution.volume_of_distribution
bbb = distribution.blood_brain_barrier
ppb = distribution.plasma_protein_binding

Best Practices:

✅ High PPB (>90%) may require higher doses but can prolong half-life
✅ Low Vd (<0.3) = mainly in plasma; high Vd (>3) = extensive tissue distribution
✅ BBB penetration critical for CNS drugs; avoid for peripherally-acting drugs
✅ fu (free fraction) drives pharmacological activity, not total concentration
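
The relationship between PPB and free fraction, and the Vd bands listed above, can be sketched as follows. The helper names are hypothetical, and the Vd thresholds are assumed to be in L/kg, which the text does not state.

```python
def free_fraction(ppb_percent: float) -> float:
    """Return fu, the unbound fraction, from plasma protein binding in percent."""
    if not 0.0 <= ppb_percent <= 100.0:
        raise ValueError("PPB must be between 0 and 100 percent")
    return 1.0 - ppb_percent / 100.0


def classify_vd(vd: float) -> str:
    """Label volume of distribution using the <0.3 and >3 bands (assumed L/kg)."""
    if vd < 0.3:
        return "mainly plasma"
    if vd > 3.0:
        return "extensive tissue distribution"
    return "intermediate"


# A compound that is 95% bound leaves only about 5% free to act.
print(round(free_fraction(95.0), 2))  # 0.05
print(classify_vd(0.2))               # mainly plasma
```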

Common Issues and Solutions:

Issue: BBB predictions unreliable for certain chemotypes

Symptom: "BBB model gives conflicting predictions for peptides"
Solution: Models trained on small molecules; use specialized BBB predictors for peptides, macrocycles

Issue: PPB overestimated for acidic drugs

Symptom: "PPB predicted 95% but experimental is 70%"
Solution: Some models biased toward neutral/basic compounds; check model training set overlap

3. Metabolism (M) Prediction

Predict metabolic stability, CYP interactions, and liability sites:


# Predict metabolism properties
metabolism = predictor.predict_metabolism(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    include_site_prediction=True
)

# Check CYP interactions
cyp_profile = metabolism.cyp_profile
stability = metabolism.metabolic_stability

Best Practices:

✅ Screen for CYP3A4 inhibition early (most common DDI)
✅ Check if compound is CYP substrate (for polymorphism concerns)
✅ Identify metabolic hotspots for structural blocking
✅ Consider species differences (human vs rodent metabolism)

Common Issues and Solutions:

Issue: False negatives for time-dependent inhibition (TDI)

Symptom: "No CYP inhibition predicted but TDI observed experimentally"
Solution: Standard models predict reversible inhibition; use specialized TDI predictors

Issue: Metabolic site prediction shows multiple hotspots

Symptom: "5 different atoms flagged as metabolic liabilities"
Solution: Prioritize by reactivity score; consider blocking highest-risk site first

4. Excretion (E) Prediction

Predict clearance routes and elimination kinetics:


# Predict excretion properties
excretion = predictor.predict_excretion(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    properties=["clearance", "half_life", "route"]
)

# Access predictions
clearance = excretion.clearance_ml_min_kg
t12 = excretion.half_life_hours
route = excretion.primary_route

Best Practices:

✅ Half-life determines dosing frequency (steady state is reached after about 5 × T1/2)
✅ Renal clearance predictable for polar compounds; hepatic less predictable
✅ High clearance (>15 mL/min/kg) may require high doses or a prodrug approach
✅ Very long T1/2 (>24h) good for adherence but risk accumulation
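
The five-half-lives rule of thumb follows from first-order kinetics: after n half-lives of constant dosing, the concentration has reached a fraction 1 - 0.5^n of its steady-state value. A small sketch of that arithmetic, with illustrative function names:

```python
def fraction_of_steady_state(n_half_lives: float) -> float:
    """Fraction of steady state reached after n half-lives (first-order kinetics)."""
    return 1.0 - 0.5 ** n_half_lives


def time_to_steady_state_hours(half_life_hours: float, n_half_lives: float = 5.0) -> float:
    """Approximate time to steady state as a multiple of the half-life."""
    return half_life_hours * n_half_lives


print(fraction_of_steady_state(5))      # 0.96875
print(time_to_steady_state_hours(6.0))  # 30.0
```

After five half-lives the compound is at roughly 97% of steady state, which is why the remaining gap is usually ignored.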

Common Issues and Solutions:

Issue: Clearance predictions highly variable

Symptom: "Same compound, different models give CL = 5 vs 20 mL/min/kg"
Solution: Allometry-based methods unreliable for novel scaffolds; use average of multiple models
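
One way to apply the multi-model advice is to report the mean together with the spread, and to flag compounds where models disagree strongly. This sketch uses only the standard library; the function name and the twofold disagreement threshold are assumptions chosen for illustration.

```python
from statistics import mean


def consensus_clearance(predictions_ml_min_kg: list[float],
                        max_fold_spread: float = 2.0) -> dict:
    """Average clearance predictions and flag large disagreement between models."""
    if not predictions_ml_min_kg:
        raise ValueError("at least one prediction is required")
    if min(predictions_ml_min_kg) <= 0:
        raise ValueError("clearance predictions must be positive")
    fold = max(predictions_ml_min_kg) / min(predictions_ml_min_kg)
    return {
        "mean": mean(predictions_ml_min_kg),
        "fold_spread": fold,
        "needs_review": fold > max_fold_spread,
    }


# The 5 vs 20 mL/min/kg case from the symptom above: a fourfold disagreement.
print(consensus_clearance([5.0, 20.0]))
# {'mean': 12.5, 'fold_spread': 4.0, 'needs_review': True}
```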

Issue: Route prediction contradicts structure

Symptom: "Highly polar compound predicted biliary, expected renal"
Solution: Check LogP/LogD; polar compounds (<0) usually renal; neutral/lipophilic (>1) usually hepatic

5. Integrated Drug-Likeness Scoring

Overall assessment combining all ADME properties:


# Generate comprehensive drug-likeness score
druglikeness = predictor.calculate_druglikeness(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    methods=["qed", "muegge", "golden_triangle"]
)

# Multi-parameter optimization
mpo_score = predictor.mpo_score(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    target_profile={"hia": ">80", "bbb": "<0.3", "t12": "2-8h"}  # constraints quoted as strings
)
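
How a target profile translates into a score is not specified here. As a rough sketch, assuming each constraint is written as a string such as ">80", "<0.3", or "2-8h", a simple unweighted score could be the fraction of constraints a compound satisfies. The satisfies parser and the simple_mpo_score function below are hypothetical and are not the skill's mpo_score implementation.

```python
import re


def satisfies(value: float, spec: str) -> bool:
    """Check a value against a constraint like '>80', '<0.3', or '2-8h'."""
    spec = spec.strip()
    if spec.startswith(">"):
        return value > float(spec[1:])
    if spec.startswith("<"):
        return value < float(spec[1:])
    # Range form: two non-negative numbers joined by a hyphen, optional unit suffix.
    match = re.fullmatch(r"([\d.]+)-([\d.]+)[a-zA-Z%]*", spec)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return low <= value <= high
    raise ValueError(f"unrecognized constraint: {spec!r}")


def simple_mpo_score(predicted: dict[str, float], target_profile: dict[str, str]) -> float:
    """Fraction of target constraints met (unweighted, illustrative only)."""
    met = sum(satisfies(predicted[name], spec) for name, spec in target_profile.items())
    return met / len(target_profile)


profile = {"hia": ">80", "bbb": "<0.3", "t12": "2-8h"}
# HIA and half-life meet their targets, BBB does not: two of three constraints.
print(simple_mpo_score({"hia": 91.0, "bbb": 0.5, "t12": 4.0}, profile))
```

A production MPO score would normally weight properties and use smooth desirability functions instead of hard pass/fail checks.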

Best Practices:

✅ Use QED as quick overall metric; MPO for property-weighted scoring
✅ Don't rely solely on drug-likeness; efficacy and safety equally important
✅ Compare to marketed drugs in same class for context
✅ Track drug-likeness trends during optimization (should improve)

Common Issues and Solutions:

Issue: Drug-likeness score conflicts with project needs

Symptom: "CNS drug has low QED (0.5) because high LogP needed for BBB"
Solution: Drug-likeness rules biased toward oral drugs; use category-specific models (CNS, oncology, etc.)

6. Batch Processing and Library Screening

Analyze compound libraries efficiently:


# Batch process library
results = predictor.batch_predict(
    input_file="library.smi",  # SMILES file
    properties=["all"],
    output_format="csv",
    n_workers=4  # Parallel processing
)

# Filter by criteria
filtered = results.filter(
    lipinski_pass=True,
    hia__gt=80,
    t12__between=(2, 8)
)

# Rank by multi-parameter score
ranked = results.rank(by="mpo_score", ascending=False)

Best Practices:

✅ Process in batches of 1000-10000 for memory efficiency
✅ Save intermediate results (crash recovery)
✅ Apply filters sequentially (Lipinski first, then detailed ADME)
✅ Check property distributions to identify outliers

Common Issues and Solutions:

Issue: Batch processing runs out of memory

Symptom: "Killed: Out of memory" with 50K compounds
Solution: Process in chunks; use generators instead of loading all into RAM
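
Chunked processing with a generator can be sketched with the standard library alone. The sketch assumes a .smi file with one record per line, SMILES first and an optional name after whitespace; iter_smiles_chunks is a hypothetical helper, not part of scripts/batch_processor.py.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_smiles_chunks(lines: Iterable[str], chunk_size: int = 1000) -> Iterator[list[str]]:
    """Yield lists of SMILES strings, at most chunk_size per list.

    Only one chunk is held in memory at a time; blank lines are skipped.
    """
    smiles = (line.split()[0] for line in lines if line.strip())
    while True:
        chunk = list(islice(smiles, chunk_size))
        if not chunk:
            return
        yield chunk


# Usage with an open file handle (any iterable of lines works):
# with open("library.smi") as handle:
#     for chunk in iter_smiles_chunks(handle, chunk_size=5000):
#         ...  # predict and write results for this chunk before reading the next
demo = ["CCO ethanol\n", "\n", "c1ccccc1 benzene\n", "CC(=O)O acetic_acid\n"]
print([len(chunk) for chunk in iter_smiles_chunks(demo, chunk_size=2)])  # [2, 1]
```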

Issue: Some compounds fail prediction

Symptom: "30% of library returns NaN"
Solution: Check for invalid SMILES, unusual atoms, or molecules outside training set domain

Complete Workflow Example

From SMILES to prioritized candidates:


# Step 1: Predict ADME for a single compound
python scripts/main.py \
  --smiles "CC(=O)Oc1ccccc1C(=O)O" \
  --properties all \
  --output aspirin_adme.json

# Step 2: Batch process a compound library
python scripts/main.py \
  --input library.smi \
  --properties absorption,distribution \
  --format csv \
  --output library_adme.csv

# Step 3: Filter and rank
python scripts/main.py \
  --input library_adme.csv \
  --filter "lipinski_pass=True,hia>80" \
  --rank-by qed \
  --top-n 100 \
  --output top_candidates.csv

Python API Usage:

from scripts.adme_predictor import ADMEPredictor
from scripts.batch_processor import BatchProcessor

# Initialize
predictor = ADMEPredictor()
batch = BatchProcessor()

# Single compound analysis
aspirin = predictor.predict_all("CC(=O)Oc1ccccc1C(=O)O")
print(f"HIA: {aspirin.absorption.hia}%")
print(f"Half-life: {aspirin.excretion.half_life_hours} hours")

# Batch screening
results = batch.process(
    input_file="library.smi",
    predictor=predictor,
    properties=["absorption", "distribution"],
    n_workers=4
)

# Filter good candidates
good_candidates = results[
    (results.lipinski_pass == True) &
    (results.hia > 80) &
    (results.bbb < 0.3) &
    (results.t12.between(2, 8))
]

Expected Output Files:

output/
├── aspirin_adme.json           # Single compound detailed results
├── library_adme.csv            # Batch screening results
└── top_candidates.csv          # Filtered and ranked candidates

Quality Checklist

Pre-Prediction Checks:

[ ] SMILES string is valid and canonical
[ ] Salt forms removed (if analyzing parent compound)
[ ] Tautomeric state appropriate for physiological pH
[ ] Stereochemistry specified (if relevant for activity)

During Prediction:

[ ] Compound within model applicability domain (check similarity to training set)
[ ] No unusual atoms or functional groups (models trained on typical drug-like space)
[ ] MW in range 100-800 Da (outside range predictions less reliable)
[ ] Predictions complete (no missing values for critical properties)

Post-Prediction Verification:

[ ] Drug-likeness scores in reasonable range (sanity check)
[ ] Individual properties internally consistent (e.g., high LogP predicts low solubility)
[ ] CRITICAL: Comparison to experimental data if available (validate model for chemotype)
[ ] Rankings align with medicinal chemistry intuition

Before Making Decisions:

[ ] CRITICAL: Predictions are NOT experimental data; use for prioritization only
[ ] Multiple orthogonal models give consistent results
[ ] Structural alerts checked (toxicity, reactivity)
[ ] Top candidates selected for experimental validation
[ ] Documentation of model versions and confidence intervals

For Regulatory Submissions:

[ ] Model validation documented (training set, test set performance)
[ ] Applicability domain clearly defined
[ ] Prediction uncertainty quantified
[ ] Experimental confirmation for key predictions

Common Pitfalls

Over-Reliance Issues:

❌ Treating predictions as experimental facts → Poor decision making
- ✅ Use predictions for prioritization; experimental validation required for lead optimization
❌ Single model dependency → Miss model-specific biases
- ✅ Compare multiple models; consensus predictions more reliable
❌ Ignoring prediction confidence → False sense of certainty
- ✅ Check confidence intervals; low confidence predictions need higher scrutiny

Input Issues:

❌ Invalid or non-canonical SMILES → Wrong compound analyzed
- ✅ Validate SMILES before prediction; use canonical forms
❌ Analyzing salt forms → Properties skewed by counterion
- ✅ Remove salts using smiles-de-salter; analyze free base/acid
❌ Ignoring stereochemistry → Inaccurate predictions for chiral drugs
- ✅ Specify stereochemistry explicitly; use 3D descriptors if available

Interpretation Issues:

❌ Focusing on single property → Miss overall profile
- ✅ Consider all ADME properties; use integrated scores like QED or MPO
❌ Rigid cutoff application → Discard good candidates
- ✅ Use cutoffs as guidelines; consider project-specific needs
❌ Ignoring property correlations → Unrealistic optimization
- ✅ Recognize trade-offs (e.g., increasing LogP improves BBB but reduces solubility)

Domain Issues:

❌ Applying to biologics → Completely inappropriate
- ✅ These models are for small molecules only; use specialized tools for biologics
❌ Extrapolating beyond training set → Unreliable predictions
- ✅ Check applicability domain; novel scaffolds need experimental validation

Workflow Issues:

❌ No experimental validation → Continue with false leads
- ✅ Always validate top predictions experimentally
❌ Not documenting model versions → Irreproducible results
- ✅ Record software version, model versions, prediction dates
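
A provenance record written next to each result file is one way to keep runs reproducible. The function name, field names, and version strings below are placeholders, not an output format defined by the skill.

```python
import json
from datetime import date


def provenance_record(software_version: str, model_versions: dict[str, str],
                      prediction_date: date) -> str:
    """Serialize run metadata to JSON so predictions can be traced later."""
    record = {
        "software_version": software_version,
        "model_versions": model_versions,
        "prediction_date": prediction_date.isoformat(),
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values for illustration only.
print(provenance_record("1.0.0", {"hia": "example-model-v1"}, date(2024, 1, 15)))
```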

Troubleshooting

Problem: All predictions show "out of domain" warning

Symptoms: "Compound outside training set" for entire library
Causes: Library contains unusual chemotypes (peptidomimetics, macrocycles, etc.)
Solutions:
- Use specialized models for non-traditional chemotypes
- Check if input format correct (SMILES vs InChI)
- Verify no strange atoms (metals, silicon, etc.)

Problem: Extreme predictions (implausibly low solubility, >100% absorption)

Symptoms: "LogS = -15" or "HIA = 150%"
Causes: Model extrapolation errors; invalid input structures
Solutions:
- Check input structure validity
- Cap extreme values at physiologically plausible limits
- Flag for manual review if outside typical ranges
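
Capping and flagging can be combined so that the raw value is never silently lost. The limits below are illustrative: 0 to 100 for HIA follows from it being a percentage, while the LogS bounds are an assumption and should be replaced with project-specific ranges. The cap_prediction helper is hypothetical.

```python
# Plausible ranges per property; the LogS bounds are an illustrative assumption.
PLAUSIBLE_LIMITS = {
    "hia": (0.0, 100.0),
    "log_s": (-12.0, 2.0),
}


def cap_prediction(name: str, value: float) -> dict:
    """Clamp a prediction to its plausible range and record whether it was capped."""
    low, high = PLAUSIBLE_LIMITS[name]
    capped = min(max(value, low), high)
    return {"raw": value, "capped": capped, "needs_review": capped != value}


print(cap_prediction("hia", 150.0))
# {'raw': 150.0, 'capped': 100.0, 'needs_review': True}
```

Keeping the raw value alongside the capped one lets a reviewer see how far the model extrapolated.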

Problem: Batch processing extremely slow

Symptoms: "100 compounds taking 30 minutes"
Causes: Single-threaded execution; complex models
Solutions:
- Enable parallel processing (--n-workers 4)
- Use faster models for initial screening (QSAR vs ML)
- Pre-filter with rule-based methods (Lipinski) before detailed ADME

Problem: Inconsistent predictions across runs

Symptoms: "Same compound, different predictions on re-run"
Causes: Random seed issues; stochastic models
Solutions:
- Set random seeds for reproducibility
- Use deterministic models when consistency critical
- Average multiple predictions if stochastic models necessary
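
The effect of seeding and averaging can be illustrated with a toy stochastic predictor. Here noisy_predict is a stand-in for a stochastic model, not a real ADME model; the point is only that a fixed seed makes repeated runs identical.

```python
import random
from statistics import mean


def noisy_predict(true_value: float, rng: random.Random) -> float:
    """Toy stochastic model: the true value plus Gaussian noise."""
    return true_value + rng.gauss(0.0, 1.0)


def averaged_prediction(true_value: float, seed: int, n_runs: int = 10) -> float:
    """Average several stochastic predictions drawn from a seeded generator."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return mean(noisy_predict(true_value, rng) for _ in range(n_runs))


# The same seed gives the same answer on every run.
print(averaged_prediction(80.0, seed=42) == averaged_prediction(80.0, seed=42))  # True
```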

Problem: Properties contradict each other

Symptoms: "High LogP (4.5) but predicted very soluble"
Causes: Model inconsistencies; prediction errors
Solutions:
- Check input structure (tautomeric form matters for both)
- Lipophilic compounds (LogP > 3) typically have poor solubility
- Use thermodynamic cycle checks if available

Problem: Cannot process certain file formats

Symptoms: "Error: Unsupported format" for SDF or MOL files
Causes: Format limitations; parser issues
Solutions:
- Convert to SMILES using chemical-structure-converter
- Check file encoding (UTF-8 vs Latin-1)
- Verify structure validity with external tools
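
The encoding check can be sketched as a decode with fallback. This is an assumption about how one might handle it, not the behavior of scripts/main.py, and decode_with_fallback is a hypothetical helper.

```python
def decode_with_fallback(raw: bytes) -> tuple[str, str]:
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails.

    Returns the decoded text and the name of the encoding that worked.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but the result should still be inspected for garbled names.
        return raw.decode("latin-1"), "latin-1"


# Usage: read the file in binary mode, then decode.
# with open("library.smi", "rb") as handle:
#     text, encoding = decode_with_fallback(handle.read())
print(decode_with_fallback("CCO caf\u00e9".encode("latin-1"))[1])  # latin-1
```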

References

Available in references/ directory:

lipinski_rules.md - Detailed explanation of Rule of 5 and variants
qsar_models.md - Technical documentation of predictive models
adme_databases.md - Experimental ADME data sources for validation
property_ranges.md - Acceptable ranges for marketed drugs by class
model_validation.md - Validation statistics and applicability domains
cheminformatics_basics.md - Introduction to molecular descriptors

Scripts

Located in scripts/ directory:

main.py - CLI interface for ADME prediction
adme_predictor.py - Core prediction engine
absorption.py - Absorption property models
distribution.py - Distribution property models
metabolism.py - Metabolism prediction models
excretion.py - Excretion and clearance models
druglikeness.py - QED, MPO, and other scoring functions
batch_processor.py - Library screening and parallel processing
validator.py - Input validation and applicability domain checking

Performance and Resources

System Requirements:

RAM: 4 GB minimum; 8 GB for large libraries (>10K compounds)
Storage: 100 MB for models and dependencies
CPU: Multi-core recommended for batch processing
No GPU required: All models CPU-based

Optimization Tips:

Process libraries in batches of 5000-10000
Use rule-based filters (Lipinski) before expensive ML predictions
Cache results to avoid re-prediction
Parallel processing scales nearly linearly up to 8 cores

Limitations

Small Molecules Only: Models trained on drugs with MW 100-800 Da; unreliable for larger compounds
pH 7.4 Assumption: Most models predict properties at physiological pH
Human-Specific: Predictions for human PK; animal models may differ
Healthy Subject Assumption: Does not account for disease states, drug interactions
Single Compound: Does not predict formulation effects, salt form impact
Static Models: Do not account for induction, inhibition, or time-dependent changes
Training Set Bias: Underperforms for novel scaffolds not in training data
Qualitative Only: For Go/No-Go decisions; not for precise quantitative predictions
No Toxicity: ADME only; use separate tools for safety assessment

Model Accuracy (Typical):

LogP: R² = 0.85-0.95 (very good)
Solubility: R² = 0.65-0.80 (moderate)
HIA: Accuracy = 75-85% (good)
BBB: Accuracy = 70-80% (moderate)
Metabolic stability: R² = 0.60-0.75 (moderate)
T1/2: R² = 0.50-0.65 (challenging)

Version History

v1.0.0 (Current): Initial release with 20+ ADME endpoints, QED scoring, batch processing
Planned: Integration with PK simulation, population variability modeling, formulation effects

Parameters

Output Requirements

Every final response should make these items explicit when they are relevant:

Objective or requested deliverable
Inputs used and assumptions introduced
Workflow or decision path
Core result, recommendation, or artifact
Constraints, risks, caveats, or validation needs
Unresolved items and next-step checks

Error Handling

If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
Do not fabricate files, citations, data, search results, or execution outcomes.

Input Validation

This skill accepts requests that match the documented purpose of adme-property-predictor and include enough context to complete the workflow safely.

Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:

adme-property-predictor only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.

Response Template

Use the following fixed structure for non-trivial requests:

Objective
Inputs Received
Assumptions
Workflow
Deliverable
Risks and Limits
Next Checks

If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.

Inputs to Collect

Required inputs: the user goal, the primary data or source file, and the requested output format.
Optional inputs: output directory, formatting preferences, and validation constraints.
If a required input is unavailable, return a short clarification request before continuing.

Output Contract

Return a short summary, the main deliverables, and any assumptions that materially affect interpretation.
If execution is partial, label what succeeded, what failed, and the next safe recovery step.
Keep the final answer within the documented scope of the skill.

Validation and Safety Rules

Validate identifiers, file paths, and user-provided parameters before execution.
Do not fabricate results, metrics, citations, or downstream conclusions.
Use safe fallback behavior when dependencies, credentials, or required inputs are missing.
Surface any execution failure with a concise diagnosis and recovery path.
