Analyze protein-protein interaction networks using STRING, BioGRID, and SASBDB databases. Maps protein identifiers, retrieves interaction networks with confidence scores, performs functional enrichment analysis (GO/KEGG/Reactome), and optionally includes structural data. No API key required for core functionality (STRING). Use when analyzing protein networks, discovering interaction partners, identifying functional modules, or studying protein complexes.
Protein Interaction Network Analysis
Comprehensive protein interaction network analysis using ToolUniverse tools. Analyzes protein networks through a 4-phase workflow: identifier mapping, network retrieval, enrichment analysis, and optional structural data.
- ✅ Identifier Mapping - Convert protein names to database IDs (STRING, UniProt, Ensembl)
- ✅ Network Retrieval - Get interaction networks with confidence scores (0-1.0)
- ✅ Functional Enrichment - GO terms, KEGG pathways, Reactome pathways
- ✅ PPI Enrichment - Test whether proteins form functional modules
- ✅ Structural Data - Optional SAXS/SANS solution structures (SASBDB)
- ✅ Fallback Strategy - STRING primary (no API key) → BioGRID secondary (opt-in with include_biogrid=True; requires an API key)
| Database | Coverage | API Key | Purpose |
|----------|----------|---------|---------|
| STRING | 14M+ proteins, 5,000+ organisms | ❌ Not required | Primary interaction source |
| BioGRID | 2.3M+ interactions, 80+ organisms | ✅ Required | Fallback, curated data |
| SASBDB | 2,000+ SAXS/SANS entries | ❌ Not required | Solution structures |
```python
from tooluniverse import ToolUniverse
from python_implementation import analyze_protein_network

# Initialize ToolUniverse
tu = ToolUniverse()

# Analyze protein network
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53", "MDM2", "ATM", "CHEK2"],
    species=9606,          # Human
    confidence_score=0.7,  # High confidence
)

# Access results
print(f"Mapped: {len(result.mapped_proteins)} proteins")
print(f"Network: {result.total_interactions} interactions")
print(f"Enrichment: {len(result.enriched_terms)} enriched terms (GO, KEGG, Reactome)")
print(f"PPI p-value: {result.ppi_enrichment.get('p_value', 1.0):.2e}")
```
Example output:

```text
🔍 Phase 1: Mapping 4 protein identifiers...
✅ Mapped 4/4 proteins (100.0%)
🕸️ Phase 2: Retrieving interaction network...
✅ STRING: Retrieved 6 interactions
🧬 Phase 3: Performing enrichment analysis...
✅ Found 245 enriched GO terms (FDR < 0.05)
✅ PPI enrichment significant (p=3.45e-05)
✅ Analysis complete!
```
Discover interaction partners for a protein of interest:
```python
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53"],  # Single protein
    species=9606,
    confidence_score=0.7,
)

# Show the five highest-scoring interactions in the returned network
top_edges = sorted(result.network_edges, key=lambda e: e["score"], reverse=True)[:5]
for edge in top_edges:
    print(f"{edge['preferredName_A']} ↔ {edge['preferredName_B']} "
          f"(score: {edge['score']})")
```
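Each edge carries STRING's preferredName_A/preferredName_B fields, so you can extract the partners of one query protein directly. This is a minimal sketch that assumes edges shaped like the STRING edge dictionaries in result.network_edges. partners_of is a hypothetical helper and is not part of python_implementation.py. The toy edges are illustrative values, not real STRING scores.

```python
from typing import Any, Dict, List, Tuple


def partners_of(
    edges: List[Dict[str, Any]], query: str, min_score: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (partner, score) pairs for `query`, highest score first.

    Hypothetical helper: assumes each edge has STRING-style
    preferredName_A, preferredName_B and score keys.
    """
    best: Dict[str, float] = {}
    for edge in edges:
        a, b, score = edge["preferredName_A"], edge["preferredName_B"], edge["score"]
        if score < min_score:
            continue
        if a == query:
            partner = b
        elif b == query:
            partner = a
        else:
            continue  # edge between two other proteins
        # Keep the highest score if the same pair appears more than once
        best[partner] = max(score, best.get(partner, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Toy edges (illustrative scores only)
edges = [
    {"preferredName_A": "TP53", "preferredName_B": "MDM2", "score": 0.999},
    {"preferredName_A": "ATM", "preferredName_B": "TP53", "score": 0.98},
    {"preferredName_A": "MDM2", "preferredName_B": "ATM", "score": 0.75},
]
print(partners_of(edges, "TP53"))  # [('MDM2', 0.999), ('ATM', 0.98)]
```

The helper checks both edge orientations because the query protein can appear as either A or B.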
Test if proteins form a functional complex:
```python
# DNA damage response proteins
proteins = ["TP53", "ATM", "CHEK2", "BRCA1", "BRCA2"]
result = analyze_protein_network(tu=tu, proteins=proteins)

# Check PPI enrichment
if result.ppi_enrichment.get("p_value", 1.0) < 0.05:
    print("✅ Proteins form functional module!")
    print(f"  Expected edges: {result.ppi_enrichment['expected_number_of_edges']:.1f}")
    print(f"  Observed edges: {result.ppi_enrichment['number_of_edges']}")
else:
    print("⚠️ Proteins may be unrelated")
```
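The PPI enrichment test compares the observed number of edges among the input proteins with the number expected if the same number of proteins were picked at random. The sketch below is a rough illustration only; STRING's own computation may differ. It computes the observed/expected fold change and a one-sided Poisson tail probability P(X ≥ observed), using the expected edge count as the Poisson mean. ppi_summary and poisson_upper_tail are hypothetical helpers, and the input values are toy numbers.

```python
import math
from typing import Dict


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # 1 - P(X <= observed - 1), accumulating the Poisson pmf term by term
    term = math.exp(-expected)  # pmf at k = 0
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def ppi_summary(ppi: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: fold change and Poisson tail from STRING-style keys."""
    observed = int(ppi["number_of_edges"])
    expected = float(ppi["expected_number_of_edges"])
    return {
        "fold_enrichment": observed / expected if expected > 0 else math.inf,
        "poisson_p": poisson_upper_tail(observed, expected),
    }


print(ppi_summary({"number_of_edges": 8, "expected_number_of_edges": 2.0}))
```

Computing 1 - CDF is fine for illustration, but it loses precision for very small p-values. For reporting, rely on the p_value that STRING returns.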
Find enriched pathways for a protein set:
```python
result = analyze_protein_network(
    tu=tu,
    proteins=["MAPK1", "MAPK3", "RAF1", "MAP2K1"],  # MAPK pathway
    confidence_score=0.7,
)

# Show the ten most significant terms (GO, KEGG, Reactome) by FDR
print("\nTop Enriched Terms:")
for term in sorted(result.enriched_terms, key=lambda t: t["fdr"])[:10]:
    print(f"  {term['term']} ({term['description']}): "
          f"p={term['p_value']:.2e}, FDR={term['fdr']:.2e}")
```
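Enrichment results mix GO categories with pathway databases, so grouping significant terms by category often reads better than one flat list. This is a minimal sketch assuming STRING-style category, description and fdr keys on each term. significant_by_category is a hypothetical helper, and the toy terms below are not real results.

```python
from collections import defaultdict
from typing import Any, Dict, List


def significant_by_category(
    terms: List[Dict[str, Any]], fdr_cutoff: float = 0.05, top_n: int = 5
) -> Dict[str, List[str]]:
    """Group terms with fdr < fdr_cutoff by category, best FDR first."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for term in terms:
        if term["fdr"] < fdr_cutoff:
            grouped[term["category"]].append(term)
    # Keep the top_n descriptions per category, sorted by ascending FDR
    return {
        category: [t["description"] for t in sorted(items, key=lambda t: t["fdr"])[:top_n]]
        for category, items in grouped.items()
    }


# Toy terms (illustrative values only)
terms = [
    {"category": "Process", "description": "apoptotic process", "fdr": 0.001},
    {"category": "Process", "description": "cell cycle arrest", "fdr": 0.0001},
    {"category": "KEGG", "description": "p53 signaling pathway", "fdr": 0.002},
    {"category": "Component", "description": "nucleus", "fdr": 0.2},
]
print(significant_by_category(terms))
```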
Build complete interaction network for multiple proteins:
```python
import pandas as pd

# Apoptosis regulators
proteins = ["TP53", "BCL2", "BAX", "CASP3", "CASP9"]
result = analyze_protein_network(
    tu=tu,
    proteins=proteins,
    confidence_score=0.7,
)

# Export the edge list as a tab-separated file for import into Cytoscape
df = pd.DataFrame(result.network_edges)
df.to_csv("apoptosis_network.tsv", sep="\t", index=False)
```
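Before opening the network in Cytoscape, it can help to see which proteins are hubs, meaning those with the most distinct partners. The sketch below counts node degree from the same edge dictionaries and removes duplicate pairs in case an interaction appears in both orientations. hub_proteins is a hypothetical helper, and the toy edges are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def hub_proteins(edges: List[Dict[str, Any]], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count distinct partners per protein (node degree), highest first."""
    pairs = set()
    for edge in edges:
        a, b = edge["preferredName_A"], edge["preferredName_B"]
        pairs.add(tuple(sorted((a, b))))  # undirected: (A, B) == (B, A)
    degree: Counter = Counter()
    for a, b in pairs:
        degree[a] += 1
        degree[b] += 1
    # Sort by degree, then by name, so ties come out in a stable order
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Toy edges; the last one repeats BAX-TP53 in the reverse orientation
edges = [
    {"preferredName_A": "TP53", "preferredName_B": "BAX"},
    {"preferredName_A": "TP53", "preferredName_B": "BCL2"},
    {"preferredName_A": "BAX", "preferredName_B": "BCL2"},
    {"preferredName_A": "CASP9", "preferredName_B": "CASP3"},
    {"preferredName_A": "TP53", "preferredName_B": "CASP3"},
    {"preferredName_A": "BAX", "preferredName_B": "TP53"},
]
print(hub_proteins(edges))  # [('TP53', 3), ('BAX', 2), ('BCL2', 2)]
```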
Use BioGRID for experimentally validated interactions:
```python
# Requires BIOGRID_API_KEY in environment
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53", "MDM2"],
    include_biogrid=True,  # Enable BioGRID fallback
)
print(f"Primary source: {result.primary_source}")  # "STRING" or "BioGRID"
```
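The fallback only works when the key is present, so a script can check for it before setting include_biogrid. This is a minimal sketch: biogrid_enabled is a hypothetical helper, and it does not show how python_implementation.py itself reads the key.

```python
import os
from typing import Mapping, Optional


def biogrid_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when a non-empty BIOGRID_API_KEY is available."""
    env = os.environ if env is None else env
    return bool(env.get("BIOGRID_API_KEY", "").strip())


# Only request the BioGRID fallback when a key is actually configured
use_biogrid = biogrid_enabled()
print(f"include_biogrid={use_biogrid}")
# result = analyze_protein_network(tu=tu, proteins=["TP53", "MDM2"], include_biogrid=use_biogrid)
```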
Add SAXS/SANS solution structures:
```python
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53"],
    include_structure=True,  # Query SASBDB
)

if result.structural_data:
    print(f"\nFound {len(result.structural_data)} SAXS/SANS entries:")
    for entry in result.structural_data:
        print(f"  {entry.get('sasbdb_id')}: {entry.get('title')}")
```
analyze_protein_network() Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| tu | ToolUniverse | Required | ToolUniverse instance |
| proteins | list[str] | Required | Protein identifiers (gene symbols, UniProt IDs) |
| species | int | 9606 | NCBI taxonomy ID (9606=human, 10090=mouse) |
| confidence_score | float | 0.7 | Min interaction confidence (0-1). 0.4=medium, 0.7=high, 0.9=highest |
| include_biogrid | bool | False | Use BioGRID if STRING fails (requires API key) |
| include_structure | bool | False | Include SASBDB structural data (slower) |
| suppress_warnings | bool | True | Suppress ToolUniverse loading warnings |
Common species (NCBI taxonomy IDs):

- 9606 - Homo sapiens (human)
- 10090 - Mus musculus (mouse)
- 10116 - Rattus norvegicus (rat)
- 7227 - Drosophila melanogaster (fruit fly)
- 6239 - Caenorhabditis elegans (worm)
- 7955 - Danio rerio (zebrafish)
- 559292 - Saccharomyces cerevisiae (yeast)

Confidence score thresholds:

| Score | Level | Description | Use Case |
|-------|-------|-------------|----------|
| 0.15 | Low | All evidence | Exploratory, hypothesis generation |
| 0.4 | Medium | Medium evidence | Default STRING threshold |
| 0.7 | High | Strong evidence | Recommended - reliable interactions |
| 0.9 | Highest | Strongest evidence | Core interactions only |
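STRING's web API expresses the interaction threshold as required_score on a 0-1000 scale, whereas this skill takes a 0-1 float. This sketch shows that conversion with a range check, plus a lookup table built from the species list above. It is an assumption that python_implementation.py converts the score this way internally. SPECIES_TAXON_IDS and to_string_required_score are hypothetical names.

```python
from typing import Dict

# Taxonomy IDs copied from the species list above
SPECIES_TAXON_IDS: Dict[str, int] = {
    "human": 9606,
    "mouse": 10090,
    "rat": 10116,
    "fruit fly": 7227,
    "worm": 6239,
    "zebrafish": 7955,
    "yeast": 559292,
}


def to_string_required_score(confidence_score: float) -> int:
    """Convert a 0-1 confidence to STRING's 0-1000 required_score scale."""
    if not 0.0 <= confidence_score <= 1.0:
        raise ValueError(f"confidence_score must be in [0, 1], got {confidence_score}")
    return round(confidence_score * 1000)


print(SPECIES_TAXON_IDS["mouse"], to_string_required_score(0.7))  # 10090 700
```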
ProteinNetworkResult Object

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ProteinNetworkResult:
    # Phase 1: Identifier mapping
    mapped_proteins: List[Dict[str, Any]]
    mapping_success_rate: float

    # Phase 2: Network retrieval
    network_edges: List[Dict[str, Any]]
    total_interactions: int

    # Phase 3: Enrichment analysis
    enriched_terms: List[Dict[str, Any]]
    ppi_enrichment: Dict[str, Any]

    # Phase 4: Structural data (optional)
    structural_data: Optional[List[Dict[str, Any]]]

    # Metadata
    primary_source: str  # "STRING" or "BioGRID"
    warnings: List[str]
```
Example network edge (STRING format):

```python
{
    "stringId_A": "9606.ENSP00000269305",  # Protein A STRING ID
    "stringId_B": "9606.ENSP00000258149",  # Protein B STRING ID
    "preferredName_A": "TP53",             # Protein A name
    "preferredName_B": "MDM2",             # Protein B name
    "ncbiTaxonId": 9606,                   # Species
    "score": 0.999,                        # Combined confidence (0-1)
    "nscore": 0.0,                         # Neighborhood score
    "fscore": 0.0,                         # Gene fusion score
    "pscore": 0.0,                         # Phylogenetic profile score
    "ascore": 0.947,                       # Coexpression score
    "escore": 0.951,                       # Experimental score
    "dscore": 0.9,                         # Database score
    "tscore": 0.994                        # Text mining score
}
```
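The per-channel scores show what kind of evidence supports an interaction, for example whether it rests mainly on experiments or on text mining. STRING computes the combined score from all channels, so it is not taken from any single one of them. This minimal sketch reports the strongest individual channel for one edge. strongest_evidence and EVIDENCE_CHANNELS are hypothetical names, and the channel labels follow the comments in the example edge above.

```python
from typing import Any, Dict, Tuple

# STRING per-channel score keys, labeled as in the example edge above
EVIDENCE_CHANNELS = {
    "nscore": "neighborhood",
    "fscore": "gene fusion",
    "pscore": "phylogenetic profile",
    "ascore": "coexpression",
    "escore": "experimental",
    "dscore": "database",
    "tscore": "text mining",
}


def strongest_evidence(edge: Dict[str, Any]) -> Tuple[str, float]:
    """Return the evidence channel with the highest score for one edge."""
    key = max(EVIDENCE_CHANNELS, key=lambda k: edge.get(k, 0.0))
    return EVIDENCE_CHANNELS[key], edge.get(key, 0.0)


edge = {"score": 0.999, "nscore": 0.0, "fscore": 0.0, "pscore": 0.0,
        "ascore": 0.947, "escore": 0.951, "dscore": 0.9, "tscore": 0.994}
print(strongest_evidence(edge))  # ('text mining', 0.994)
```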
Example enrichment term:

```python
{
    "category": "Process",                  # GO category
    "term": "GO:0006915",                   # GO term ID
    "description": "apoptotic process",     # Term description
    "number_of_genes": 4,                   # Input genes annotated with the term
    "number_of_genes_in_background": 1234,  # Background genes annotated with the term
    "p_value": 1.23e-05,                    # Enrichment p-value
    "fdr": 0.0012,                          # FDR-adjusted p-value
    "inputGenes": "TP53,MDM2,BAX,CASP3"     # Matching genes
}
```
```text
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Identifier Mapping │
│ ─────────────────────────────────────────────────────────── │
│ STRING_map_identifiers() │
│ • Validates protein names exist in database │
│ • Converts to STRING IDs for consistency │
│ • Returns mapping success rate │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Network Retrieval │
│ ─────────────────────────────────────────────────────────── │
│ PRIMARY: STRING_get_network() (no API key needed) │
│ • Retrieves all pairwise interactions │
│ • Returns confidence scores by evidence type │
│ │
│ FALLBACK: BioGRID_get_interactions() (if enabled) │
│ • Used if STRING fails or for validation │
│ • Requires BIOGRID_API_KEY │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: Enrichment Analysis │
│ ─────────────────────────────────────────────────────────── │
│ STRING_functional_enrichment() │
│ • GO terms (Process, Component, Function) │
│ • KEGG pathways │
│ • Reactome pathways │
│ • FDR-corrected p-values │
│ │
│ STRING_ppi_enrichment() │
│ • Tests if proteins interact more than random │
│ • Returns p-value for functional coherence │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 4: Structural Data (Optional) │
│ ─────────────────────────────────────────────────────────── │
│ SASBDB_search_entries() │
│ • SAXS/SANS solution structures │
│ • Protein flexibility and conformations │
│ • Complements crystal/cryo-EM data │
└─────────────────────────────────────────────────────────────┘
```
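The Phase 2 primary/fallback logic can be sketched as follows. retrieve_network is a hypothetical helper, and fetch_string/fetch_biogrid are zero-argument callables that stand in for the STRING_get_network and BioGRID_get_interactions calls. In this sketch, only an exception counts as a STRING failure, so an empty network is returned as a valid STRING result. Returning "none" when both sources fail is also an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Edges = List[Dict[str, Any]]


def retrieve_network(
    fetch_string: Callable[[], Edges],
    fetch_biogrid: Optional[Callable[[], Edges]] = None,
    log: Optional[List[str]] = None,
) -> Tuple[Edges, str]:
    """Try STRING first; fall back to BioGRID only if STRING raises."""
    log = [] if log is None else log
    try:
        return fetch_string(), "STRING"
    except Exception as exc:  # network errors, bad responses, etc.
        log.append(f"STRING failed: {exc}")
    if fetch_biogrid is not None:  # fallback enabled (e.g. API key available)
        try:
            return fetch_biogrid(), "BioGRID"
        except Exception as exc:
            log.append(f"BioGRID failed: {exc}")
    return [], "none"  # assumption: no source succeeded


def broken() -> Edges:
    raise ConnectionError("timeout")


log: List[str] = []
edges, source = retrieve_network(
    broken, lambda: [{"preferredName_A": "TP53", "preferredName_B": "MDM2"}], log
)
print(source, log)  # BioGRID ['STRING failed: timeout']
```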
```bash
# Install ToolUniverse (if not already installed)
pip install tooluniverse

# Or with extras (quotes stop shells such as zsh from expanding the brackets)
pip install "tooluniverse[all]"
```
For BioGRID fallback functionality, add the key to a .env file:

```
BIOGRID_API_KEY=your_key_here
```
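A key stored in a .env file must reach the process environment before the analysis runs. The python-dotenv package's load_dotenv() is the usual way to do that. As a dependency-free alternative, the sketch below reads simple KEY=VALUE lines and skips blanks and comments. It applies no quoting rules beyond stripping surrounding quotes, and it does no variable expansion. load_env_file is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read simple KEY=VALUE lines into os.environ and return what was parsed."""
    parsed: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip('"').strip("'")
        parsed[key] = value
        # By default, existing environment variables win over the file
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


load_env_file()
print("BIOGRID_API_KEY set:", bool(os.environ.get("BIOGRID_API_KEY")))
```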
```text
tooluniverse-protein-interactions/
├── SKILL.md                    # This file
├── python_implementation.py    # Main implementation
├── QUICK_START.md              # Quick reference
├── DOMAIN_ANALYSIS.md          # Design rationale
├── PHASE2_COMPLETE.md          # Tool testing results
├── PHASE4_IMPLEMENTATION_COMPLETE.md
└── KNOWN_ISSUES.md             # ToolUniverse limitations
```
Issue: ToolUniverse prints 40+ warning messages during analysis.

Workaround: Filter the output when running:

```bash
python your_script.py 2>&1 | grep -v "Error loading tools"
```

See KNOWN_ISSUES.md for details.
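Inside Python, you can apply a similar filter around a single call with contextlib.redirect_stdout and contextlib.redirect_stderr. This sketch has caveats. It only sees text written through sys.stdout or sys.stderr while the call runs. Logging handlers created earlier keep a reference to the original stream, and output from C extensions bypasses the redirect. The kept lines also become available only after the call finishes. run_quietly is a hypothetical helper, and the skill's suppress_warnings parameter may already make it unnecessary.

```python
import contextlib
import io
from typing import Any, Callable, Tuple


def run_quietly(func: Callable[..., Any], *args: Any,
                noise: str = "Error loading tools", **kwargs: Any) -> Tuple[Any, str]:
    """Call func, capturing stdout/stderr and dropping lines that contain `noise`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        value = func(*args, **kwargs)
    kept = [line for line in buffer.getvalue().splitlines() if noise not in line]
    return value, "\n".join(kept)


def noisy(n: int) -> int:
    # Stand-in for a chatty call such as analyze_protein_network(...)
    print("Error loading tools: example_tool")
    print(f"✅ Mapped {n} proteins")
    return n * 2


value, text = run_quietly(noisy, 4)
print(text)  # ✅ Mapped 4 proteins
```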
BioGRID fallback requires a free API key; STRING works without one.
SASBDB endpoints occasionally return errors. Structural data is optional.
| Operation | Time | Notes |
|-----------|------|-------|
| Identifier mapping | 1-2 sec | For 5 proteins |
| Network retrieval | 2-3 sec | Depends on network size |
| Enrichment analysis | 3-5 sec | For 374 terms |
| Full 4-phase analysis | 6-10 sec | Excluding ToolUniverse overhead |
Note: Add 4-8 seconds per tool call for ToolUniverse loading (framework limitation).
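Combining the two figures gives a rough wall-clock estimate. This sketch assumes one tool call per step in the workflow diagram: map identifiers, get network, functional enrichment and PPI enrichment, plus one SASBDB search when include_structure=True. It also assumes the 4-8 second overhead applies to every call. estimate_runtime is a hypothetical helper, not a measured model.

```python
from typing import Tuple


def estimate_runtime(
    include_structure: bool = False,
    base_range: Tuple[float, float] = (6.0, 10.0),        # full analysis, from the table
    overhead_per_call: Tuple[float, float] = (4.0, 8.0),  # ToolUniverse loading per call
) -> Tuple[float, float]:
    """Return (low, high) seconds under the stated assumptions."""
    # Assumed calls: map identifiers, get network, functional enrichment, PPI enrichment
    n_calls = 4 + (1 if include_structure else 0)
    low = base_range[0] + n_calls * overhead_per_call[0]
    high = base_range[1] + n_calls * overhead_per_call[1]
    return low, high


print(estimate_runtime())  # (22.0, 42.0)
```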
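Troubleshooting tips:

- Slow analysis: set include_structure=False (skips the slower SASBDB queries) or raise the threshold to confidence_score=0.9 for a smaller network.
- Parameter name errors: ✅ Fixed in this skill - all parameter names were verified in Phase 2 testing.
- Few or no interactions: lower the threshold (confidence_score=0.4); if using BioGRID, check that BIOGRID_API_KEY is set in the environment.

See python_implementation.py for:

- example_tp53_analysis() - Complete TP53 network analysis
- analyze_protein_network() - Main function with all options
- ProteinNetworkResult - Result data structure

For issues with: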
License: Same as the ToolUniverse framework license.