skills/43-wentorai-research-plugins/skills/domains/biomedical/bioagents-guide/SKILL.md
AI scientist framework for autonomous biological research workflows
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research bioagents-guide
BioAgents are AI agent systems for biological research that change how life science experiments are conceived, designed, executed, and analyzed. Built on large language models, these systems integrate literature search, hypothesis generation, experimental design, data analysis, and manuscript drafting into semi-autonomous or fully autonomous research pipelines.
The AI Scientist framework (Sakana AI, 2024) demonstrated, in machine learning research, that language models can run an end-to-end research loop: generating ideas, writing code, running experiments, and producing papers. In biology, this approach is being applied to drug discovery, protein engineering, genomics analysis, and systems biology, domains where the combinatorial size of the experimental space makes AI-assisted exploration particularly valuable.
This guide covers the architecture of bioagent systems, the biological research tasks they can automate, integration with wet-lab automation, and the methodological considerations for researchers building or evaluating these systems. The focus is on practical patterns that connect AI capabilities to real biological research problems.
BioAgent System Architecture:

┌───────────────────────────────────────────────────────────┐
│                       ORCHESTRATOR                        │
│         (LLM-based planning and reasoning agent)          │
├─────────────┬─────────────┬──────────────┬────────────────┤
│ LITERATURE  │ HYPOTHESIS  │ EXPERIMENT   │ ANALYSIS       │
│ MODULE      │ MODULE      │ MODULE       │ MODULE         │
├─────────────┼─────────────┼──────────────┼────────────────┤
│ PubMed      │ Causal      │ Protocol     │ Statistical    │
│ Semantic    │ inference   │ generator    │ analysis       │
│ Scholar     │ Graph       │ Robot        │ Visualization  │
│ BioRxiv     │ reasoning   │ interface    │ Interpretation │
│ Patents     │ Novelty     │ LIMS         │ Manuscript     │
│             │ scoring     │ integration  │ drafting       │
└─────────────┴─────────────┴──────────────┴────────────────┘
          │                   │                   │
    ┌─────┴─────┐       ┌─────┴─────┐       ┌─────┴─────┐
    │ Knowledge │       │ Wet Lab   │       │ Compute   │
    │ Bases     │       │ Equipment │       │ Cluster   │
    └───────────┘       └───────────┘       └───────────┘
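The layering in the diagram can be sketched as an orchestrator that hands a shared state through the four modules in order. This is a minimal illustration under stated assumptions, not a description of any particular framework: `Orchestrator`, `ResearchState`, and the stub module functions are hypothetical names, and each stub stands in for a call to a real service such as a literature API or a lab scheduler.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ResearchState:
    """Shared state passed from module to module (hypothetical structure)."""
    question: str
    artifacts: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


# A module reads the state and returns the artifact it produced.
Module = Callable[[ResearchState], object]


class Orchestrator:
    """Runs registered modules in order and records what each produced."""

    def __init__(self) -> None:
        self._modules: List[Tuple[str, Module]] = []

    def register(self, name: str, module: Module) -> None:
        self._modules.append((name, module))

    def run(self, question: str) -> ResearchState:
        state = ResearchState(question=question)
        for name, module in self._modules:
            state.artifacts[name] = module(state)
            state.log.append(name)
        return state


# Stub modules: placeholders for real literature, hypothesis,
# experiment, and analysis components.
def literature(state: ResearchState) -> list:
    return [f"paper about {state.question}"]


def hypothesis(state: ResearchState) -> str:
    return f"hypothesis from {len(state.artifacts['literature'])} paper(s)"


def experiment(state: ResearchState) -> dict:
    return {"protocol": "stub", "tests": state.artifacts["hypothesis"]}


def analysis(state: ResearchState) -> dict:
    return {"status": "not run", "protocol": state.artifacts["experiment"]["protocol"]}


orchestrator = Orchestrator()
for module_name, module_fn in [
    ("literature", literature),
    ("hypothesis", hypothesis),
    ("experiment", experiment),
    ("analysis", analysis),
]:
    orchestrator.register(module_name, module_fn)

final_state = orchestrator.run("example research question")
print(final_state.log)
```

A linear pass keeps the example short; a real orchestrator would more likely loop, letting the analysis output feed back into hypothesis generation.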
from dataclasses import dataclass
from typing import List, Optional
import json


@dataclass
class Hypothesis:
    statement: str
    mechanism: str
    evidence_for: List[str]
    evidence_against: List[str]
    novelty_score: float
    testability_score: float
    predicted_outcome: str

    @property
    def composite_score(self) -> float:
        """Novelty weighted by testability; both scores lie in [0, 1]."""
        return self.novelty_score * self.testability_score


def generate_hypotheses(
    research_question: str,
    literature_context: List[dict],
    existing_data: Optional[dict] = None,
    n_hypotheses: int = 5,
) -> List[Hypothesis]:
    """
    Generate ranked hypotheses from literature and data context.

    This is a framework for LLM-driven hypothesis generation.
    In practice, the LLM call would go here.
    """
    prompt = f"""
    Based on the following research question and literature context,
    generate {n_hypotheses} testable hypotheses.

    Research question: {research_question}

    Literature findings:
    {json.dumps(literature_context, indent=2)}

    For each hypothesis, provide:
    1. A clear, falsifiable statement
    2. The proposed mechanism
    3. Supporting evidence from the literature
    4. Contradictory evidence
    5. Novelty score (0-1): How novel relative to existing literature
    6. Testability score (0-1): How feasible to test experimentally
    7. Predicted outcome if the hypothesis is correct
    """
    # In production: response = llm.generate(prompt)
    # Parse and return structured hypotheses.
    # Note: existing_data is accepted but not yet included in the prompt.
    return []  # Placeholder for LLM output parsing


def rank_hypotheses(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Rank hypotheses by composite score (novelty * testability)."""
    return sorted(hypotheses, key=lambda h: h.composite_score, reverse=True)
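The parsing step left as a placeholder above could look roughly like the following. It is a sketch that assumes the LLM was asked to return a JSON array whose objects use the same field names as `Hypothesis`; `parse_hypotheses` is a hypothetical helper, and the sample response is made up for illustration rather than taken from any model or paper.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Hypothesis:
    statement: str
    mechanism: str
    evidence_for: List[str]
    evidence_against: List[str]
    novelty_score: float
    testability_score: float
    predicted_outcome: str


def parse_hypotheses(raw_response: str) -> List[Hypothesis]:
    """Parse a JSON array of hypothesis objects, skipping malformed entries."""
    required = [f.name for f in fields(Hypothesis)]
    parsed = []
    for item in json.loads(raw_response):
        if not isinstance(item, dict) or any(name not in item for name in required):
            continue  # missing fields: drop the entry rather than guess
        scores = (item["novelty_score"], item["testability_score"])
        if not all(isinstance(s, (int, float)) and 0.0 <= s <= 1.0 for s in scores):
            continue  # scores outside [0, 1] are treated as invalid
        parsed.append(Hypothesis(**{name: item[name] for name in required}))
    return parsed


# Made-up response: one complete entry and one that lacks most fields.
sample_response = json.dumps([
    {
        "statement": "Knockdown of gene X reduces phenotype Y",
        "mechanism": "illustrative mechanism",
        "evidence_for": ["placeholder citation"],
        "evidence_against": [],
        "novelty_score": 0.6,
        "testability_score": 0.9,
        "predicted_outcome": "phenotype Y decreases",
    },
    {"statement": "incomplete entry"},
])

hypotheses = parse_hypotheses(sample_response)
print(len(hypotheses), hypotheses[0].statement)
```

Dropping malformed entries silently keeps the example small; in practice it may be better to log them or re-prompt the model.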
AI-assisted drug discovery workflow:

1. TARGET IDENTIFICATION
   - Literature mining for disease-gene associations
   - Network analysis of protein-protein interactions
   - Druggability assessment (binding site prediction)
   Tools: Open Targets, STRING, fpocket

2. HIT IDENTIFICATION
   - Virtual screening of compound libraries
   - De novo molecular generation (SMILES, graph-based)
   - Docking and scoring, with molecular dynamics to refine top-ranked poses
   Tools: AutoDock-GPU, RDKit, DeepChem

3. LEAD OPTIMIZATION
   - ADMET property prediction (absorption, distribution, metabolism, excretion, toxicity)
   - Toxicity prediction
   - Multi-objective optimization (potency vs. selectivity vs. ADMET)
   Tools: ADMET-AI, ToxCast, Optuna

4. PRECLINICAL VALIDATION
   - In vitro assay design and analysis
   - Animal model selection and protocol design
   - Pharmacokinetic modeling
   Tools: PK-Sim, literature-based dose prediction
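The multi-objective trade-off in the lead optimization stage can be illustrated with a plain Pareto filter. This sketch assumes every objective has been rescaled so that higher is better; the compound names and scores are invented for illustration, and a tool such as Optuna would normally handle the search itself.

```python
from typing import Dict, List, Sequence


def dominates(a: Dict, b: Dict, objectives: Sequence[str]) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return (
        all(a[o] >= b[o] for o in objectives)
        and any(a[o] > b[o] for o in objectives)
    )


def pareto_front(candidates: List[Dict], objectives: Sequence[str]) -> List[Dict]:
    """Keep candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c, objectives) for other in candidates)
    ]


OBJECTIVES = ("potency", "selectivity", "admet")

# Invented scores in [0, 1]; higher is better for all three objectives.
compounds = [
    {"name": "cmpd_A", "potency": 0.9, "selectivity": 0.4, "admet": 0.6},
    {"name": "cmpd_B", "potency": 0.7, "selectivity": 0.8, "admet": 0.7},
    {"name": "cmpd_C", "potency": 0.6, "selectivity": 0.7, "admet": 0.5},
    {"name": "cmpd_D", "potency": 0.9, "selectivity": 0.3, "admet": 0.6},
]

front = pareto_front(compounds, OBJECTIVES)
print([c["name"] for c in front])
```

Here `cmpd_C` is dominated by `cmpd_B` and `cmpd_D` by `cmpd_A`, so only the first two remain as candidates worth carrying forward.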
# Example: Using ESM-2 embeddings for protein function prediction
# (Practical pattern for bioagent integration)
from typing import List

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer


def get_protein_embeddings(
    sequences: List[str],
    model_name: str = "facebook/esm2_t33_650M_UR50D",
) -> List[np.ndarray]:
    """
    Generate protein embeddings using ESM-2 for downstream tasks.
    Applications: function prediction, fitness landscape, design.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    embeddings = []
    for seq in sequences:
        inputs = tokenizer(seq, return_tensors="pt", padding=True, truncation=True, max_length=1024)
        with torch.no_grad():
            outputs = model(**inputs)
        # Mean-pool over sequence length, weighted by the attention mask so any
        # padding is ignored. Special tokens are still included in the mean.
        mask = inputs["attention_mask"].unsqueeze(-1)
        summed = (outputs.last_hidden_state * mask).sum(dim=1)
        embedding = (summed / mask.sum(dim=1)).squeeze(0).numpy()
        embeddings.append(embedding)
    return embeddings


# Applications:
# 1. Cluster proteins by function (unsupervised)
# 2. Predict fitness effects of mutations (supervised)
# 3. Guide directed evolution experiments (active learning)
# 4. Design novel sequences (generative, conditional on embedding space)
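One simple downstream use of such embeddings is nearest-neighbor lookup by cosine similarity, for example to suggest a functional label for an unannotated protein. The sketch below uses toy three-dimensional vectors in place of real ESM-2 embeddings (which have far more dimensions); the labels and the helper names `cosine_similarity` and `nearest_neighbors` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for mean-pooled protein embeddings.
reference_embeddings = {
    "kinase_like": [1.0, 0.0, 0.0],
    "protease_like": [0.0, 1.0, 0.0],
    "transporter_like": [0.0, 0.0, 1.0],
}
query_embedding = [0.9, 0.1, 0.0]

best_label, best_score = nearest_neighbors(query_embedding, reference_embeddings)[0]
print(best_label, round(best_score, 3))
```

Similarity in embedding space is only suggestive of shared function, so a label obtained this way is a hypothesis to test rather than an annotation.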
# Automated RNA-seq analysis pipeline
# (Pattern for agent-orchestrated bioinformatics)
def automated_rnaseq_pipeline(
    fastq_dir: str,
    reference_genome: str,
    sample_sheet: str,
    output_dir: str,
) -> dict:
    """
    End-to-end RNA-seq analysis pipeline that a bioagent can orchestrate.

    Steps:
    1. Quality control (FastQC + MultiQC)
    2. Adapter trimming (Trim Galore)
    3. Alignment (STAR or HISAT2)
    4. Quantification (featureCounts or Salmon)
    5. Differential expression (DESeq2)
    6. Pathway analysis (GSEA, enrichR)
    7. Visualization and report generation

    Returns command templates; nothing is executed here. Placeholders left in
    braces ({sample}, {trimmed_R1}, {trimmed_R2}, {bam_files}) must be filled
    in per sample by the caller.
    """
    pipeline_steps = {
        "qc": f"fastqc {fastq_dir}/*.fastq.gz -o {output_dir}/qc/",
        "multiqc": f"multiqc {output_dir}/qc/ -o {output_dir}/qc/",
        # Paired mode needs R1 and R2 of the same sample side by side, so this
        # is a per-sample template rather than a glob over all files.
        "trim": f"trim_galore --paired {fastq_dir}/{{sample}}_R1.fastq.gz {fastq_dir}/{{sample}}_R2.fastq.gz -o {output_dir}/trimmed/",
        # --readFilesCommand zcat lets STAR read the gzipped trimmed files.
        "align": f"STAR --genomeDir {reference_genome} --readFilesIn {{trimmed_R1}} {{trimmed_R2}} --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate",
        "count": f"featureCounts -a {reference_genome}/genes.gtf -o {output_dir}/counts.txt {{bam_files}}",
        # run_deseq2.R and run_gsea.R are user-supplied scripts.
        "de_analysis": f"Rscript run_deseq2.R --counts {output_dir}/counts.txt --design {sample_sheet}",
        "pathway": "Rscript run_gsea.R --de_results de_results.csv --gene_sets msigdb.gmt",
        "report": "Rscript -e \"rmarkdown::render('analysis_report.Rmd')\"",
    }
    return {
        "pipeline": pipeline_steps,
        "expected_outputs": [
            "qc/multiqc_report.html",
            "de_results.csv",
            "pathway_results.csv",
            "analysis_report.html",
            "figures/volcano_plot.pdf",
            "figures/heatmap.pdf",
        ],
    }
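A dictionary of command strings like the one above still needs something to execute it. The runner below is a minimal sketch of that piece, assuming steps run strictly in insertion order and each depends on the previous one; `run_pipeline`, `run_shell`, and the demo commands are hypothetical, and any per-sample placeholders would have to be filled in before a real run.

```python
import subprocess
from typing import Callable, Dict, List


def run_shell(command: str) -> int:
    """Run one command through the shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def run_pipeline(
    steps: Dict[str, str],
    execute: Callable[[str], int] = run_shell,
    dry_run: bool = False,
) -> List[dict]:
    """Run steps in insertion order and stop at the first failure."""
    history = []
    for name, command in steps.items():
        if dry_run:
            history.append({"step": name, "command": command, "status": "planned"})
            continue
        exit_code = execute(command)
        status = "ok" if exit_code == 0 else "failed"
        history.append({"step": name, "command": command, "status": status})
        if exit_code != 0:
            break  # later steps depend on this one, so do not continue
    return history


# Illustrative commands only; the paths are placeholders.
demo_steps = {
    "qc": "fastqc data/*.fastq.gz -o results/qc/",
    "count": "featureCounts -a ref/genes.gtf -o results/counts.txt sample.bam",
    "report": "echo report",
}

# Dry run: list what would be executed without touching the shell.
plan = run_pipeline(demo_steps, dry_run=True)
print([entry["step"] for entry in plan])


# A fake executor that fails on the counting step, to show the early stop.
def fake_execute(command: str) -> int:
    return 1 if command.startswith("featureCounts") else 0


outcome = run_pipeline(demo_steps, execute=fake_execute)
print([(entry["step"], entry["status"]) for entry in outcome])
```

Passing the executor in as a parameter lets an agent test its plan against a stub before anything touches real data. For production use, a workflow manager such as Snakemake or Nextflow is usually a better fit than a hand-rolled loop.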
Cloud lab integration pattern:

AGENT → API → CLOUD LAB → RESULTS → AGENT

Platforms:
  - Emerald Cloud Lab: Programmatic access to wet lab equipment
  - Strateos: Automated biology research platform
  - Arctoris: AI-integrated drug discovery lab

API pattern:
  1. Agent designs experiment protocol (JSON/YAML)
  2. Protocol validated against lab capabilities
  3. Experiment submitted via API
  4. Real-time monitoring of experiment progress
  5. Results returned as structured data
  6. Agent analyzes results, designs next experiment
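Step 2 of the API pattern, validating a protocol against lab capabilities, can be sketched as a pure function that returns a list of problems. The capability fields, protocol schema, and instrument names below are all hypothetical; no real cloud lab API is being described, and an actual platform would define its own schema.

```python
from typing import Dict, List

# Hypothetical capability description for one lab.
LAB_CAPABILITIES = {
    "instruments": {"liquid_handler", "plate_reader", "thermocycler"},
    "max_plate_wells": 384,
    "temperature_range_c": (4.0, 95.0),
}


def validate_protocol(protocol: Dict, capabilities: Dict) -> List[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    steps = protocol.get("steps", [])
    if not steps:
        errors.append("protocol has no steps")
    if protocol.get("plate_wells", 0) > capabilities["max_plate_wells"]:
        errors.append("plate format exceeds the lab's maximum well count")

    low, high = capabilities["temperature_range_c"]
    for index, step in enumerate(steps):
        instrument = step.get("instrument")
        if instrument not in capabilities["instruments"]:
            errors.append(f"step {index}: unsupported instrument {instrument!r}")
        temperature = step.get("temperature_c")
        if temperature is not None and not (low <= temperature <= high):
            errors.append(f"step {index}: temperature {temperature} C out of range")
    return errors


valid_protocol = {
    "plate_wells": 96,
    "steps": [
        {"instrument": "liquid_handler"},
        {"instrument": "thermocycler", "temperature_c": 72.0},
    ],
}
invalid_protocol = {
    "plate_wells": 96,
    "steps": [
        {"instrument": "centrifuge"},
        {"instrument": "thermocycler", "temperature_c": 120.0},
    ],
}

print(validate_protocol(valid_protocol, LAB_CAPABILITIES))
print(validate_protocol(invalid_protocol, LAB_CAPABILITIES))
```

Returning every error at once, rather than raising on the first, gives the agent enough information to repair the protocol in a single revision.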
Active learning loop:
  - Agent proposes most informative experiment (Bayesian optimization)
  - Lab executes experiment
  - Results update model
  - Repeat until convergence or budget exhausted
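The loop above can be sketched without a full Bayesian optimizer. The example below swaps in a much simpler heuristic: the predicted value of an untested condition is the result at its nearest tested neighbor, plus an exploration bonus proportional to the distance from that neighbor. This is a stand-in for a surrogate model and acquisition function, not Bayesian optimization itself. The function names, the simulated lab response, and the temperature grid are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def propose_next(candidates: List[float], observed: Dict[float, float], kappa: float) -> float:
    """Pick the untested candidate with the highest acquisition score."""
    untested = [c for c in candidates if c not in observed]
    if not observed:
        return untested[0]  # nothing known yet: start with the first candidate

    def acquisition(x: float) -> float:
        nearest_x, nearest_y = min(observed.items(), key=lambda kv: abs(kv[0] - x))
        # Exploit the neighbor's result, explore in proportion to distance.
        return nearest_y + kappa * abs(nearest_x - x)

    return max(untested, key=acquisition)


def active_learning_loop(
    candidates: List[float],
    run_experiment: Callable[[float], float],
    budget: int,
    kappa: float = 1.0,
) -> Tuple[float, Dict[float, float]]:
    """Run experiments until the budget or the candidate list is exhausted."""
    if budget < 1:
        raise ValueError("budget must allow at least one experiment")
    observed: Dict[float, float] = {}
    while len(observed) < min(budget, len(candidates)):
        x = propose_next(candidates, observed, kappa)
        observed[x] = run_experiment(x)  # in practice: submit to the lab and wait
    best_x = max(observed, key=observed.get)
    return best_x, observed


def simulated_lab(temperature_c: float) -> float:
    """Made-up response curve that peaks at 37 C; stands in for a real assay."""
    return 100.0 - (temperature_c - 37.0) ** 2


temperatures = [25.0, 29.0, 33.0, 37.0, 41.0, 45.0]
best_temperature, results = active_learning_loop(temperatures, simulated_lab, budget=4)
print(best_temperature, results[best_temperature])
```

With a budget of four out of six possible experiments, the heuristic samples both ends of the range first and then moves toward the peak. A Gaussian process surrogate with an expected-improvement criterion would replace `acquisition` in a more faithful implementation.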
| Criterion | Metric | Benchmark |
|-----------|--------|-----------|
| Literature coverage | Recall of relevant papers | Compare to expert bibliography |
| Hypothesis quality | Expert rating (1-5), novelty score | Panel of domain scientists |
| Experimental design | Validity, power, feasibility | IRB/protocol review standards |
| Data analysis | Accuracy, reproducibility | Gold standard datasets |
| Manuscript quality | Expert review scores | Peer review simulation |
| Cost efficiency | $/discovery, time to insight | Traditional lab benchmarks |
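The literature coverage row is the easiest of these to compute directly. A minimal sketch, assuming papers are compared by a shared identifier such as a DOI or PMID; the identifiers below are placeholders, and `coverage_metrics` is a hypothetical helper name.

```python
from typing import Dict, Set


def coverage_metrics(retrieved: Set[str], expert: Set[str]) -> Dict[str, float]:
    """Recall and precision of agent-retrieved papers against an expert list."""
    overlap = retrieved & expert
    recall = len(overlap) / len(expert) if expert else 0.0
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    return {"recall": recall, "precision": precision}


# Placeholder identifiers; real use would compare DOIs or PMIDs.
expert_bibliography = {"paper_1", "paper_2", "paper_3", "paper_4"}
agent_retrieved = {"paper_1", "paper_2", "paper_3", "paper_9", "paper_10"}

metrics = coverage_metrics(agent_retrieved, expert_bibliography)
print(metrics)
```

Recall is the metric named in the table; precision is included because an agent can reach high recall simply by retrieving everything.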
Key ethical issues in autonomous biological research:

1. DUAL USE RISK
   - AI-designed pathogens or toxins
   - Mitigation: Red-team evaluation, biosecurity review
   - Reference: Wilson Center, NTI biosecurity frameworks

2. REPRODUCIBILITY
   - Agent-generated experiments must be reproducible
   - All parameters, code, and data must be logged
   - Version control for every pipeline component

3. ATTRIBUTION
   - Who is the "author" of AI-generated research?
   - Current consensus: Humans are responsible, AI is a tool
   - Journals require human accountability for all claims

4. DATA PRIVACY
   - Patient data in biomedical research (HIPAA, GDPR)
   - Agent access must respect data governance
   - De-identification before agent processing
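The reproducibility requirement in item 2 (log all parameters, code, and data) can be made concrete with a run record that carries a deterministic fingerprint. This is one possible sketch rather than an established standard: `build_run_record` and its field names are hypothetical, and the version string and digest are placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def fingerprint(payload: Dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_run_record(
    step: str,
    parameters: Dict,
    code_version: str,
    input_digests: Dict[str, str],
) -> Dict:
    """Bundle what is needed to repeat a step, plus a hash of that bundle."""
    content = {
        "step": step,
        "parameters": parameters,
        "code_version": code_version,
        "input_digests": input_digests,
    }
    return {
        **content,
        "fingerprint": fingerprint(content),
        # The timestamp is recorded but excluded from the fingerprint, so
        # repeating an identical run yields an identical fingerprint.
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


record_a = build_run_record(
    step="de_analysis",
    parameters={"alpha": 0.05, "min_count": 10},
    code_version="placeholder-commit-id",
    input_digests={"counts.txt": "placeholder-digest"},
)
record_b = build_run_record(
    step="de_analysis",
    parameters={"min_count": 10, "alpha": 0.05},  # same values, different order
    code_version="placeholder-commit-id",
    input_digests={"counts.txt": "placeholder-digest"},
)
print(record_a["fingerprint"] == record_b["fingerprint"])
```

Two runs with the same fingerprint used the same parameters, code version, and inputs, which makes silent changes between agent iterations easy to detect.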