skills/arcinstitute/sragent/SKILL.md
Query the Sequence Read Archive (SRA), retrieve scientific publications, and analyze genomics metadata using the SRAgent toolkit. Supports accession conversion (GSE→SRX→SRR), BigQuery metadata queries, manuscript downloads from multiple sources, and scRNA-seq technology identification. Use when working with SRA/GEO datasets, finding publications, or analyzing single-cell sequencing experiments.
SRAgent is an agentic workflow system for working with the NCBI Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) databases. It automates literature discovery, metadata extraction, and manuscript retrieval for genomics datasets.
SRAgent requires Python ≥3.11. Check to see if SRAgent is already installed:
which SRAgent
If SRAgent is not installed, follow the instructions below.
Install using uv:
# Clone the repository
git clone https://github.com/ArcInstitute/SRAgent.git
cd SRAgent
# Create and activate virtual environment with uv
uv venv
source .venv/bin/activate
# Install the package
uv pip install .
Verify installation:
SRAgent --help
The following environment variables are required:
OPENAI_API_KEY=sk-openai-...
ANTHROPIC_API_KEY=sk-ant-...
DYNACONF
EMAIL=your-email-address
NCBI_API_KEY=your-ncbi-key
CORE_API_KEY=your-core-key
GCP_PROJECT_ID=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
If any of these variables are not already set in the environment, prompt the user to provide them, for example: export MY_SECRET_VAR=my-secret-value
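A minimal sketch of how the missing variables could be detected before prompting the user. The helper name `missing_env_vars` is hypothetical and not part of SRAgent; the variable names come from the list above, and `DYNACONF` is left out because its expected value is not shown there. Which keys are actually needed may depend on the features and models in use.

```python
import os

# Names taken from the list above. DYNACONF is omitted because its
# expected value is not shown in this document.
EXPECTED_VARS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "NCBI_API_KEY",
    "CORE_API_KEY",
    "GCP_PROJECT_ID",
    "GOOGLE_APPLICATION_CREDENTIALS",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(EXPECTED_VARS)
if missing:
    print("Please set:", ", ".join(missing))
```

Only the names reported as missing need to be requested from the user.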
SRAgent uses a settings file (settings.yml) to configure models and behavior.
SRAgent ships with defaults that work for most users, so no action is needed unless you want to customize the configuration.
See ./references/example-settings.yml for an example settings file that you can modify as needed.
Test your configuration:
# Check which model is being used
python -c "from SRAgent.agents.utils import load_settings; s = load_settings(); print(s['models']['default'])"
# Test basic functionality
SRAgent entrez "Convert GSE121737 to SRX accessions"
Convert between different genomics database accession formats:
Query comprehensive metadata from SRA/GEO:
Leverage NCBI's BigQuery dataset for large-scale queries:
Automatically find and download manuscripts:
Use SRAgent when the user:
SRAgent entrez
Purpose: Low-level NCBI Entrez database queries
Best for:
Examples:
# Convert GEO to SRX
SRAgent --no-progress --no-summaries entrez "Convert GSE121737 to SRX accessions"
# Summarize a dataset
SRAgent --no-progress --no-summaries entrez "Summarize SRX4967527"
# Link to publications
SRAgent --no-progress --no-summaries entrez "Find publications for GSE196830"
SRAgent sragent
Purpose: Comprehensive metadata extraction with multiple tools
Best for:
Tools available:
Examples:
# Check sequencing technology
SRAgent --no-progress --no-summaries sragent "Which 10X Genomics technology was used for ERX11887200?"
# Comprehensive summary
SRAgent --no-progress --no-summaries sragent "Summarize SRX4967527"
# Verify data type
SRAgent --no-progress --no-summaries sragent "Is SRX4967527 single-cell RNA-seq data?"
# Get organism info
SRAgent --no-progress --no-summaries sragent "What organism was sequenced in study PRJNA498286?"
SRAgent papers
Purpose: Find and download manuscripts associated with SRA accessions
Best for:
Input formats:
Single experiment: SRX4967527
Entire study: SRP167700 or PRJNA498286
CSV file with an accession column
Examples:
# Single experiment
SRAgent --no-progress --no-summaries papers SRX4967527
# Entire study
SRAgent --no-progress --no-summaries papers PRJNA498286
# Batch from CSV
SRAgent --no-progress --no-summaries papers accessions.csv --output-dir papers/
# Custom accession column name
SRAgent --no-progress --no-summaries papers my-data.csv --accession-column "experiment_id"
# Control concurrency
SRAgent --no-progress --no-summaries papers accessions.csv --max-concurrency 3
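When these invocations are generated from a script, the argument list can be assembled programmatically. The sketch below uses only the flags shown in the examples above; `build_papers_command` is a hypothetical helper, not part of SRAgent.

```python
def build_papers_command(target, output_dir=None, accession_column=None,
                         max_concurrency=None):
    """Assemble the argv list for `SRAgent papers`.

    target is either a single accession or the path to a CSV file.
    Only flags that appear in the examples above are supported here.
    """
    cmd = ["SRAgent", "--no-progress", "--no-summaries", "papers", str(target)]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if accession_column is not None:
        cmd += ["--accession-column", accession_column]
    if max_concurrency is not None:
        cmd += ["--max-concurrency", str(max_concurrency)]
    return cmd


print(build_papers_command("accessions.csv", output_dir="papers/", max_concurrency=3))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with file names.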
Output:
Downloaded manuscripts are saved under --output-dir/<accession>/
Input CSV files are enriched with pubmed_id, doi, and download_path columns
# Step 1: Convert GEO accession to SRX
SRAgent --no-progress --no-summaries entrez "Convert GSE121737 to SRX accessions"
# Step 2: Get detailed metadata
SRAgent --no-progress --no-summaries sragent "For each SRX from GSE121737, determine: Is it single-cell? What library prep?"
# Step 3: Find associated publications
SRAgent --no-progress --no-summaries papers GSE121737 --output-dir manuscripts/
# Check if dataset meets specific criteria
SRAgent --no-progress --no-summaries sragent "Is SRX4967527 Illumina paired-end single-cell RNA-seq data?"
# Get specific technology details
SRAgent --no-progress --no-summaries sragent "Which 10X Genomics chemistry was used: SRX4967527?"
# Verify organism
SRAgent --no-progress --no-summaries sragent "What organism is SRX4967527?"
# Create CSV with accessions
cat > accessions.csv << EOF
accession
SRX4967527
SRX4967528
SRX4967529
EOF
# Download all papers
SRAgent --no-progress --no-summaries \
papers accessions.csv \
--output-dir papers/ \
--max-concurrency 5
# Result: CSV enriched with DOIs and download paths
# Get all experiments in a study
SRAgent --no-progress --no-summaries entrez "List all SRX accessions for study SRP167700"
# Or use a BioProject accession
SRAgent --no-progress --no-summaries entrez "Convert PRJNA498286 to SRX accessions"
# Then analyze the study
SRAgent --no-progress --no-summaries sragent "Summarize the library prep technologies used in PRJNA498286"
When the user needs SRAgent functionality, use the bash tool:
# Example: Convert accessions
result = bash_tool(
command="SRAgent --no-progress --no-summaries entrez 'Convert GSE121737 to SRX accessions'",
description="Converting GEO accession to SRX format"
)
# Example: Get metadata
result = bash_tool(
command="SRAgent --no-progress --no-summaries sragent 'Which 10X technology was used for SRX4967527?'",
description="Determining library preparation technology"
)
# Example: Download papers
result = bash_tool(
command="SRAgent --no-progress --no-summaries papers SRX4967527 --output-dir /home/claude/papers",
description="Downloading manuscripts for dataset"
)
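Outside an agent environment that provides `bash_tool`, the same calls can be made from plain Python. This is a sketch under that assumption: `sragent_command` and `run_sragent` are hypothetical helper names, and the 600-second timeout is an arbitrary illustrative default.

```python
import shlex
import subprocess


def sragent_command(subcommand, prompt):
    """Build the argv list for an `entrez` or `sragent` prompt."""
    return ["SRAgent", "--no-progress", "--no-summaries", subcommand, prompt]


def run_sragent(subcommand, prompt, timeout=600):
    """Run SRAgent and return (exit_code, stdout, stderr).

    The timeout value is an assumption chosen for illustration.
    """
    proc = subprocess.run(
        sragent_command(subcommand, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# shlex.join renders the equivalent shell command line for logging.
print(shlex.join(sragent_command("entrez", "Convert GSE121737 to SRX accessions")))
```

Passing the prompt as a single list element means it does not need manual quoting, unlike the shell string given to `bash_tool`.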
When processing batch data:
import pandas as pd
# User provides accessions - create CSV
accessions = ["SRX4967527", "SRX4967528", "SRX4967529"]
df = pd.DataFrame({"accession": accessions})
df.to_csv("/home/claude/accessions.csv", index=False)
# Run SRAgent papers command
result = bash_tool(
command="SRAgent --no-progress --no-summaries papers /home/claude/accessions.csv --output-dir /home/claude/papers",
description="Batch downloading papers for multiple accessions"
)
# Read enriched CSV
enriched_df = pd.read_csv("/home/claude/accessions.csv")
# Now has: accession, pubmed_id, doi, download_path columns
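After a batch run it can help to see which accessions still lack a DOI or a downloaded file. The sketch below uses the column names listed above and assumes that an empty cell means nothing was found; `summarize_enriched` is a hypothetical helper, and the sample rows are placeholders, not real results.

```python
import csv
import io


def summarize_enriched(rows):
    """Count rows with a DOI and list accessions that have no download path."""
    rows = list(rows)
    with_doi = sum(1 for r in rows if (r.get("doi") or "").strip())
    missing_download = [
        r["accession"] for r in rows
        if not (r.get("download_path") or "").strip()
    ]
    return {
        "total": len(rows),
        "with_doi": with_doi,
        "missing_download": missing_download,
    }


# Placeholder content standing in for an enriched CSV (illustrative values only).
sample = io.StringIO(
    "accession,pubmed_id,doi,download_path\n"
    "SRX4967527,00000000,10.0000/example,papers/SRX4967527/paper.pdf\n"
    "SRX4967528,,,\n"
)
summary = summarize_enriched(csv.DictReader(sample))
print(summary)
```

For a real run, replace the `io.StringIO` sample with an open file handle on the enriched CSV.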
GSE + 5-7 digits (e.g., GSE121737)
GSM + 6-7 digits (e.g., GSM3457845)
SRP + 6 digits (e.g., SRP167700)
PRJNA + 6 digits (e.g., PRJNA498286)
SRX + 7-8 digits (e.g., SRX4967527)
SRR + 7-8 digits (e.g., SRR8124405)
ERP + 6 digits or PRJEB + 6 digits
ERX + 7-8 digits (e.g., ERX11887200)
ERR + 7-8 digits
GEO Series (GSE)
↓
SRA Study (SRP) = BioProject (PRJNA)
↓
SRA Experiment (SRX) ← Links to → Publications (PubMed ID, DOI)
↓
SRA Run (SRR) [actual sequence files]
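The accession formats listed above can be turned into a small classifier, which is useful for routing an input to the right command. This is a sketch that transcribes those digit counts literally; real accessions may fall outside these ranges, and `accession_type` is a hypothetical helper, not an SRAgent function.

```python
import re

# Patterns transcribed from the formats listed above. Real accessions
# may have digit counts outside these ranges.
ACCESSION_PATTERNS = {
    "GSE": r"GSE\d{5,7}",
    "GSM": r"GSM\d{6,7}",
    "SRP": r"SRP\d{6}",
    "PRJNA": r"PRJNA\d{6}",
    "SRX": r"SRX\d{7,8}",
    "SRR": r"SRR\d{7,8}",
    "ERP": r"ERP\d{6}",
    "PRJEB": r"PRJEB\d{6}",
    "ERX": r"ERX\d{7,8}",
    "ERR": r"ERR\d{7,8}",
}


def accession_type(accession):
    """Return the matching prefix (e.g. "SRX"), or None if nothing matches."""
    text = accession.strip().upper()
    for prefix, pattern in ACCESSION_PATTERNS.items():
        if re.fullmatch(pattern, text):
            return prefix
    return None


for example in ["GSE121737", "PRJNA498286", "ERX11887200", "not-an-accession"]:
    print(example, "->", accession_type(example))
```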
SRAgent can identify these scRNA-seq technologies:
SRAgent uses multiple signals:
If you don't have Google Cloud credentials:
# SRAgent gracefully falls back to Entrez-only queries
# BigQuery features will be skipped with a warning
# These still work without BigQuery:
SRAgent --no-progress --no-summaries entrez "Convert GSE121737 to SRX accessions"
SRAgent --no-progress --no-summaries papers SRX4967527
# This will warn but proceed:
SRAgent --no-progress --no-summaries sragent "Which 10X technology for SRX4967527?"
# (Uses Entrez + web scraping instead of BigQuery)
# For large batch operations, adjust concurrency
SRAgent --no-progress --no-summaries papers large-dataset.csv \
--max-concurrency 10 \
--recursion-limit 150
# For paper downloads specifically
SRAgent --no-progress --no-summaries papers accessions.csv \
--core-api-key "$CORE_API_KEY" \
--email "$EMAIL" \
--max-concurrency 5
# Ensure package is installed
cd SRAgent
uv pip install .
# Verify installation
python -c "import SRAgent; print(SRAgent.__file__)"
# Get NCBI API key: https://www.ncbi.nlm.nih.gov/account/settings/
export NCBI_API_KEY="your-ncbi-api-key"
# Reduce the number of concurrent requests
SRAgent papers accessions.csv --max-concurrency 3
Check: Is DOI found?
Check: Multiple sources attempted?
Check: Network/authentication
./references/metadata-fields.md
./references/quick-reference.md
./references/usage-examples.md
./references/example-settings.yml