skills/domains/biomedical/genotex-benchmark-guide/SKILL.md
Benchmark for LLM agents on gene expression data analysis
npx skillsauth add wentorai/research-plugins genotex-benchmark-guide
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
GenoTEX is a benchmark for evaluating LLM-based agents on gene expression data analysis tasks. It provides curated datasets from GEO (Gene Expression Omnibus) with ground-truth analysis pipelines, testing agents on data preprocessing, differential expression, enrichment analysis, and biological interpretation. Published at MLCB 2025 as an oral presentation.
GenoTEX Benchmark
├── Data Collection
│   └── Curated GEO datasets with ground truth
├── Task Categories
│   ├── Data preprocessing (QC, normalization)
│   ├── Differential expression analysis
│   ├── Gene set enrichment analysis
│   ├── Clustering and classification
│   └── Biological interpretation
├── Evaluation
│   ├── Code correctness (executes without error)
│   ├── Statistical validity (appropriate tests)
│   ├── Result accuracy (vs ground truth)
│   └── Interpretation quality (biological insight)
└── Baselines
    ├── GPT-4 agent
    ├── Claude agent
    └── Domain-specific fine-tuned models
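To make the "Result accuracy (vs ground truth)" criterion concrete, here is a minimal sketch that scores an agent's list of differentially expressed genes against a reference list using precision, recall, and F1. This is an illustration only: the function name `deg_set_scores` and the gene lists are hypothetical, and the benchmark's own scoring may use different metrics.

```python
def deg_set_scores(predicted, truth):
    """Precision, recall and F1 between two collections of gene symbols."""
    predicted, truth = set(predicted), set(truth)
    overlap = len(predicted & truth)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(truth) if truth else 0.0
    # F1 is defined as 0 when there is no overlap at all
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative gene lists, not taken from any GenoTEX dataset
agent_genes = ["TP53", "MYC", "EGFR", "BRCA1"]
truth_genes = ["TP53", "MYC", "KRAS"]

scores = deg_set_scores(agent_genes, truth_genes)
print(scores)
```

Set-based scores ignore effect direction and magnitude, so a fuller comparison would probably also check that the sign of log2FC agrees for the shared genes.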
from genotex import GenoTEXBenchmark

bench = GenoTEXBenchmark()

# List available tasks
tasks = bench.list_tasks()
for task in tasks[:5]:
    print(f"Task: {task.id}")
    print(f"  Dataset: {task.geo_accession}")
    print(f"  Category: {task.category}")
    print(f"  Difficulty: {task.difficulty}")

# Get a specific task
task = bench.get_task("GSE12345_DEG")
print(f"Description: {task.description}")
print(f"Input files: {task.input_files}")
print(f"Expected output: {task.expected_output_type}")
# Evaluate an agent on GenoTEX
from genotex import evaluate_agent

# my_agent_function is your own agent callable; define it before this call
results = evaluate_agent(
    agent_fn=my_agent_function,
    tasks="all",            # or a list of specific task IDs
    timeout_per_task=300,   # seconds
)

print(f"Tasks completed: {results.completed}/{results.total}")
print(f"Code correctness: {results.code_correct_rate:.1%}")
print(f"Statistical validity: {results.stats_valid_rate:.1%}")
print(f"Result accuracy: {results.accuracy:.3f}")
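The "code correctness" criterion and the per-task timeout can be pictured with a small harness that runs a piece of generated code in a separate Python process and reports whether it exits cleanly within the time limit. This is a sketch using only the standard library; `runs_cleanly` is a hypothetical helper, not part of the benchmark, and a real harness would also need sandboxing.

```python
import subprocess
import sys


def runs_cleanly(code, timeout_s=300):
    """Return True if `code` exits with status 0 within `timeout_s` seconds."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],  # run in a fresh interpreter
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed once the timeout expires
        return False
    return proc.returncode == 0


ok = runs_cleanly("print(sum(range(10)))", timeout_s=30)
broken = runs_cleanly("raise ValueError('bad input')", timeout_s=30)
too_slow = runs_cleanly("import time; time.sleep(5)", timeout_s=1)
print(ok, broken, too_slow)
```

Running the code in a child process keeps a crash or hang in the generated analysis from taking down the evaluation loop itself.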
# Example: Differential Expression Analysis
task = {
    "id": "GSE12345_DEG",
    "description": "Identify differentially expressed genes "
                   "between treatment and control groups in "
                   "this RNA-seq dataset.",
    "input": "GSE12345_counts.csv",       # Raw count matrix
    "metadata": "GSE12345_metadata.csv",  # Sample info
    "expected": {
        "method": "DESeq2 or limma-voom",
        "output": "DEG table with log2FC, p-value, adj.p",
        "ground_truth": "GSE12345_deg_truth.csv",
    },
}
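The expected output above includes an adjusted p-value column. As a rough illustration of where such a column comes from, the sketch below applies the Benjamini-Hochberg procedure (the default adjustment in DESeq2) to a toy results table. The gene names, fold changes, and p-values are invented for the example, and the `padj` key is just a label chosen here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy DEG table; values are illustrative, not from a real dataset
rows = [
    {"gene": "GENE_A", "log2FC": 2.1, "pvalue": 0.01},
    {"gene": "GENE_B", "log2FC": -1.4, "pvalue": 0.04},
    {"gene": "GENE_C", "log2FC": 0.9, "pvalue": 0.03},
    {"gene": "GENE_D", "log2FC": 0.2, "pvalue": 0.20},
]
adj = benjamini_hochberg([row["pvalue"] for row in rows])
for row, padj in zip(rows, adj):
    row["padj"] = padj

significant = [row["gene"] for row in rows if row["padj"] < 0.05]
print(significant)
```

Note that GENE_B and GENE_C have raw p-values below 0.05 but are no longer significant after adjustment, which is the kind of distinction a statistical-validity check would look for.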
# Example: Gene Set Enrichment
task = {
    "id": "GSE12345_GSEA",
    "description": "Perform gene set enrichment analysis on "
                   "the DEGs and identify enriched pathways.",
    "input": "GSE12345_deg_results.csv",
    "expected": {
        "method": "fgsea, clusterProfiler, or enrichR",
        "output": "Enriched pathways with NES and FDR",
    },
}
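For intuition about what an enrichment result measures, here is a sketch of a simpler, related technique: a one-sided hypergeometric over-representation test, the style of test used for gene-list enrichment. It is not the ranked GSEA statistic, so it yields a p-value rather than an NES. The gene identifiers, pathway, and universe size are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, n_degs, overlap):
    """P(X >= overlap) when n_degs genes are drawn without replacement."""
    upper = min(set_size, n_degs)
    total = comb(universe_size, n_degs)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_degs - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical inputs: 20 measured genes, a 5-gene pathway, 4 DEGs
universe_size = 20
pathway = {"G1", "G2", "G3", "G4", "G5"}
degs = {"G1", "G2", "G3", "G17"}

overlap = len(pathway & degs)
p_value = hypergeom_enrichment_p(universe_size, len(pathway), len(degs), overlap)
print(f"overlap={overlap}, p={p_value:.4f}")
```

In practice this test is repeated for every pathway in a collection and the resulting p-values are corrected for multiple testing to give the FDR column.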