skills/expression-data-retrieval/SKILL.md
ToolUniverse workflow — Expression Data Retrieval
npx skillsauth add lamm-mit/scienceclaw expression-data-retrievalInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Retrieve gene expression experiments and multi-omics datasets with proper disambiguation and quality assessment.
IMPORTANT: Always use English terms in tool calls (gene names, tissue names, condition descriptions), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.
Phase 0: Clarify Query (if ambiguous)
↓
Phase 1: Disambiguate Gene/Condition
↓
Phase 2: Search & Retrieve (Internal)
↓
Phase 3: Report Dataset Profile
Ask the user ONLY if:
Skip clarification for:
If searching by gene, first resolve official identifiers:
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# For gene-focused searches, resolve official symbol first
# This helps construct better search queries
# Example: "p53" → "TP53" (official HGNC symbol)
Gene Disambiguation Checklist:
| User Query Type | Search Strategy | |-----------------|-----------------| | Specific accession | Direct retrieval | | Gene + condition | "[gene] [condition]" + species filter | | Disease only | "[disease]" + species filter | | Technology-specific | Add platform keywords (RNA-seq, microarray) |
Search silently. Do NOT narrate the process.
# ArrayExpress search
result = tu.tools.arrayexpress_search_experiments(
keywords="[gene/disease] [condition]",
species="[species]",
limit=20
)
# BioStudies for multi-omics
biostudies_result = tu.tools.biostudies_search_studies(
query="[keywords]",
limit=10
)
For top results, retrieve full metadata:
# Get details for each relevant experiment
details = tu.tools.arrayexpress_get_experiment_details(
accession=accession
)
# Get sample information
samples = tu.tools.arrayexpress_get_experiment_samples(
accession=accession
)
# Get available files
files = tu.tools.arrayexpress_get_experiment_files(
accession=accession
)
# Multi-omics study details
study_details = tu.tools.biostudies_get_study_details(
accession=study_accession
)
# Study structure
sections = tu.tools.biostudies_get_study_sections(
accession=study_accession
)
# Available files
files = tu.tools.biostudies_get_study_files(
accession=study_accession
)
| Primary | Fallback | Notes | |---------|----------|-------| | ArrayExpress search | BioStudies search | ArrayExpress empty | | arrayexpress_get_experiment_details | biostudies_get_study_details | E-GEOD may have BioStudies mirror | | arrayexpress_get_experiment_files | Note "Files unavailable" | Some studies restrict downloads |
Present as a Dataset Search Report. Hide search process.
# Expression Data: [Query Topic]
**Search Summary**
- Query: [gene/disease] in [species]
- Databases: ArrayExpress, BioStudies
- Results: [N] relevant experiments found
**Data Quality Overview**: [assessment based on criteria below]
---
## Top Experiments
### 1. [E-MTAB-XXXX]: [Title]
| Attribute | Value |
|-----------|-------|
| **Accession** | [accession with link] |
| **Organism** | [species] |
| **Experiment Type** | RNA-seq / Microarray |
| **Platform** | [specific platform] |
| **Samples** | [N] samples |
| **Release Date** | [date] |
**Description**: [Brief description from metadata]
**Experimental Design**:
- Conditions: [treatment vs control, etc.]
- Replicates: [N biological, M technical]
- Tissue/Cell type: [if specified]
**Sample Groups**:
| Group | Samples | Description |
|-------|---------|-------------|
| Control | [N] | [description] |
| Treatment | [N] | [description] |
**Data Files Available**:
| File | Type | Size |
|------|------|------|
| [filename] | Processed data | [size] |
| [filename] | Raw data | [size] |
| [filename] | Sample metadata | [size] |
**Quality Assessment**: ●●● High / ●●○ Medium / ●○○ Low
- Sample size: [adequate/limited]
- Replication: [yes/no]
- Metadata completeness: [complete/partial]
---
### 2. [E-GEOD-XXXXX]: [Title]
[Same structure as above]
---
## Multi-Omics Studies (from BioStudies)
### [S-BSST-XXXXX]: [Title]
| Attribute | Value |
|-----------|-------|
| **Accession** | [accession] |
| **Study Type** | [proteomics/metabolomics/integrated] |
| **Organism** | [species] |
| **Samples** | [N] |
**Data Types Included**:
- [ ] Transcriptomics
- [ ] Proteomics
- [ ] Metabolomics
- [ ] Other: [specify]
---
## Summary Table
| Accession | Type | Samples | Platform | Quality |
|-----------|------|---------|----------|---------|
| [E-MTAB-X] | RNA-seq | [N] | Illumina | ●●● |
| [E-GEOD-X] | Microarray | [N] | Affymetrix | ●●○ |
---
## Recommendations
**For [specific analysis type]**:
- Best experiment: [accession] - [reason]
- Alternative: [accession] - [reason]
**Data Integration Notes**:
- Platform compatibility: [notes on combining datasets]
- Batch considerations: [if applicable]
---
## Data Access
### Direct Download Links
- [E-MTAB-XXXX processed data](link)
- [E-MTAB-XXXX raw data](link)
### Database Links
- ArrayExpress: https://www.ebi.ac.uk/arrayexpress/experiments/[accession]
- BioStudies: https://www.ebi.ac.uk/biostudies/studies/[accession]
Retrieved: [date]
Assessment criteria for expression experiments:
| Tier | Symbol | Criteria | |------|--------|----------| | High Quality | ●●● | ≥3 bio replicates, complete metadata, processed data available | | Medium Quality | ●●○ | 2-3 replicates OR some metadata gaps, data accessible | | Low Quality | ●○○ | No replicates, sparse metadata, or data access issues | | Use with Caution | ○○○ | Single sample, no replication, outdated platform |
Include assessment rationale:
**Quality**: ●●● High
- ✓ 4 biological replicates per condition
- ✓ Complete sample annotations
- ✓ Processed and raw data available
- ✓ Recent RNA-seq platform
Every dataset report MUST include:
User: "Find breast cancer RNA-seq data"
result = tu.tools.arrayexpress_search_experiments(
keywords="breast cancer RNA-seq",
species="Homo sapiens",
limit=20
)
→ Report top experiments with quality assessment
User: "Find TP53 expression experiments in mouse"
result = tu.tools.arrayexpress_search_experiments(
keywords="TP53 p53", # Include aliases
species="Mus musculus",
limit=15
)
→ Report experiments studying this gene
User: "Get details for E-MTAB-5214" → Single experiment profile with all details and files
User: "Find proteomics and transcriptomics studies for liver disease" → Search both ArrayExpress and BioStudies, note integration potential
| Error | Response | |-------|----------| | "No experiments found" | Broaden keywords, remove species filter, try synonyms | | "Accession not found" | Verify format (E-MTAB-, E-GEOD-, S-BSST*), check if withdrawn | | "Files not available" | Note in report: "Data files restricted by submitter" | | "API timeout" | Retry once, then note: "(metadata retrieval incomplete)" |
ArrayExpress (Gene Expression)
| Tool | Purpose |
|------|---------|
| arrayexpress_search_experiments | Keyword/species search |
| arrayexpress_get_experiment_details | Full metadata |
| arrayexpress_get_experiment_files | Download links |
| arrayexpress_get_experiment_samples | Sample annotations |
BioStudies (Multi-Omics)
| Tool | Purpose |
|------|---------|
| biostudies_search_studies | Multi-omics search |
| biostudies_get_study_details | Study metadata |
| biostudies_get_study_files | Data files |
| biostudies_get_study_sections | Study structure |
ArrayExpress
| Parameter | Description | Example |
|-----------|-------------|---------|
| keywords | Free text search | "breast cancer RNA-seq" |
| species | Scientific name | "Homo sapiens" |
| array | Platform filter | "Illumina" |
| limit | Max results | 20 |
BioStudies
| Parameter | Description | Example |
|-----------|-------------|---------|
| query | Free text | "proteomics liver" |
| limit | Max results | 10 |
tools
Onboard and manage Paperclip AI for research-paper knowledge and agent orchestration
development
Perform AI-powered web searches with real-time information using Perplexity models via LiteLLM and OpenRouter. This skill should be used when conducting web searches for current information, finding recent scientific literature, getting grounded answers with source citations, or accessing information beyond the model knowledge cutoff. Provides access to multiple Perplexity models including Sonar Pro, Sonar Pro Search (advanced agentic search), and Sonar Reasoning Pro through a single OpenRouter API key.
testing
Generate a structured scientific PDF report from a JSON description. Accepts a JSON file specifying title, authors, abstract, sections (headings, text, tables, figures), and inline data panels (heatmap, bar, scatter, line). Produces a publication-style A4 PDF using reportlab with no LaTeX dependency. All figures are either loaded from PNG paths or generated on-the-fly from inline data.
development
Execute arbitrary Python code and return stdout. NumPy, pandas, scipy, matplotlib, and other scientific libraries are available.