STRING Database — Protein-Protein Interactions

Overview

Query the STRING protein-protein interaction database (59M proteins, 20B+ interactions, 5000+ species) via REST API. Covers network retrieval, functional enrichment (GO, KEGG, Pfam), interaction partner discovery, PPI enrichment testing, network visualization, and homology analysis.

When to Use

Retrieving protein-protein interaction networks for one or multiple proteins
Performing functional enrichment analysis (GO, KEGG, Pfam, InterPro) on protein lists
Discovering interaction partners and expanding protein networks from seed proteins
Testing whether a set of proteins forms a significantly enriched functional module
Generating network visualizations with evidence-based coloring
Analyzing homology and protein family relationships across species
Identifying hub proteins and network connectivity patterns
For chemical compound interactions use chembl-database-bioactivity instead; for pathway-centric queries use kegg-database

Prerequisites

uv pip install requests pandas

Rate limiting: No strict rate limit, but wait ~1 second between API calls. For proteome-scale analyses, use bulk downloads from https://string-db.org/cgi/download instead of the API.

Quick Start

import requests
import time

STRING_API = "https://string-db.org/api"

def string_query(endpoint, params, fmt="tsv"):
    """Reusable helper for all STRING API calls."""
    url = f"{STRING_API}/{fmt}/{endpoint}"
    # Copy the params so the caller's dict is not mutated, and set a timeout
    params = {"caller_identity": "python_script", **params}
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.text

# Map gene names to STRING IDs (always do this first)
result = string_query("get_string_ids", {
    "identifiers": "TP53\nBRCA1\nEGFR",
    "species": 9606
})
print(result)

# Get interaction network
time.sleep(1)
network = string_query("network", {
    "identifiers": "TP53%0dBRCA1%0dMDM2",
    "species": 9606,
    "required_score": 400
})
print(network[:500])

Key Concepts

Common Species NCBI Taxon IDs

| Organism | Common Name | Taxon ID |
|----------|-------------|----------|
| Homo sapiens | Human | 9606 |
| Mus musculus | Mouse | 10090 |
| Rattus norvegicus | Rat | 10116 |
| Drosophila melanogaster | Fruit fly | 7227 |
| Caenorhabditis elegans | C. elegans | 6239 |
| Saccharomyces cerevisiae | Yeast | 4932 |
| Arabidopsis thaliana | Thale cress | 3702 |
| Escherichia coli K-12 | E. coli | 511145 |
| Danio rerio | Zebrafish | 7955 |
| Gallus gallus | Chicken | 9031 |

Full species list: https://string-db.org/cgi/input?input_page_active_form=organisms

STRING Identifier Format

STRING uses Ensembl protein IDs with taxon prefix: {taxonId}.{ensemblProteinId} (e.g., 9606.ENSP00000269305 for human TP53). Always map gene names to STRING IDs first via get_string_ids for faster subsequent queries.
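To make the identifier format concrete, here is a minimal sketch that splits a STRING ID into its taxon and protein parts. The helper name `split_string_id` is hypothetical and is not part of the STRING API; it only illustrates the `{taxonId}.{proteinId}` layout described above.

```python
def split_string_id(string_id: str) -> tuple[int, str]:
    """Split '{taxonId}.{proteinId}' into (taxon_id, protein_id).

    Hypothetical helper for illustration; not part of the STRING API.
    """
    # Split on the first dot only, in case the protein part contains dots.
    taxon, sep, protein = string_id.partition(".")
    if not sep or not taxon.isdigit() or not protein:
        raise ValueError(f"not a STRING identifier: {string_id!r}")
    return int(taxon), protein


taxon_id, protein_id = split_string_id("9606.ENSP00000269305")
print(taxon_id, protein_id)
```

A check like this can catch gene names that were passed where a mapped STRING ID was expected.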

Interaction Confidence Scores

Combined scores (0-1000) integrating 7 evidence channels:

| Channel | Code | Source |
|---------|------|--------|
| Neighborhood | nscore | Conserved genomic neighborhood |
| Fusion | fscore | Gene fusion events |
| Phylogenetic profile | pscore | Co-occurrence across species |
| Coexpression | ascore | Correlated RNA expression |
| Experimental | escore | Biochemical/genetic experiments |
| Database | dscore | Curated pathway/complex databases |
| Text-mining | tscore | Literature co-occurrence and NLP |

Recommended thresholds:

150: Low confidence (exploratory, hypothesis generation)
400: Medium confidence (standard analysis, default)
700: High confidence (conservative, fewer false positives)
900: Highest confidence (very stringent, experimental evidence preferred)
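The thresholds above can be expressed as a small lookup. This is a sketch: the function names are hypothetical. It also assumes that scores you read back from the API may be fractions between 0 and 1 rather than the 0-1000 scale used by `required_score`, so check your own output before converting.

```python
# Lower bounds taken from the recommended thresholds above (0-1000 scale).
TIERS = [(900, "highest"), (700, "high"), (400, "medium"), (150, "low")]


def confidence_tier(score_0_1000: int) -> str:
    """Return the label of the highest tier whose lower bound the score meets."""
    for lower_bound, label in TIERS:
        if score_0_1000 >= lower_bound:
            return label
    return "below low-confidence cutoff"


def to_required_score(score_fraction: float) -> int:
    """Convert a 0-1 score (assumed output scale) to the 0-1000 scale."""
    return round(score_fraction * 1000)


print(confidence_tier(to_required_score(0.999)))
print(confidence_tier(650))
```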

Network Types

Functional (default): All evidence types — proteins functionally associated even without direct binding. Use for pathway analysis, enrichment, systems biology
Physical: Direct binding evidence only — experimental data and curated physical interactions. Use for structural studies, complex analysis

Output Formats

Replace /tsv/ in the URL with the desired format:

TSV: Tab-separated (default, best for data processing)
JSON: Structured data (/json/)
PNG: Network images (/image/, or /highres_image/ for higher resolution)
SVG: Vector network images (/svg/)
PSI-MI/PSI-MITAB: Proteomics standard formats (/psi-mi/, /psi-mi-tab/)
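Because the output format is just a path segment, the URL can be assembled by a small function. The sketch below mirrors the URL pattern used by the `string_query` helper in Quick Start. The name `build_url` and the `KNOWN_FORMATS` set are illustrative assumptions, and the set lists only the segments spelled out in this section.

```python
STRING_API = "https://string-db.org/api"

# Path segments named in this section; extend after checking the API docs.
KNOWN_FORMATS = {"tsv", "json", "image"}


def build_url(endpoint: str, fmt: str = "tsv") -> str:
    """Build '{base}/{format}/{endpoint}' and reject unknown format segments."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown output format: {fmt!r}")
    return f"{STRING_API}/{fmt}/{endpoint}"


print(build_url("network"))
print(build_url("ppi_enrichment", fmt="json"))
```

Validating the segment locally turns a typo such as `jsn` into an immediate error instead of a failed HTTP request.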

Core API

1. Identifier Mapping

# Map gene names to STRING IDs
result = string_query("get_string_ids", {
    "identifiers": "TP53\nBRCA1\nEGFR",
    "species": 9606,
    "limit": 1,        # matches per identifier
    "echo_query": 1    # include query term in output
})

# Parse the mapping
import pandas as pd
import io
df = pd.read_csv(io.StringIO(result), sep='\t')
id_map = dict(zip(df['queryItem'], df['stringId']))
print(id_map)
# {'TP53': '9606.ENSP00000269305', 'BRCA1': '9606.ENSP00000...', ...}
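Identifiers that STRING cannot resolve are easy to miss, since they may simply be absent from the response. The sketch below parses a mapping table with the standard library and reports which queried names are missing. The sample TSV is made up for illustration (the BRCA1 ID is a placeholder, not a real STRING ID), and `parse_mapping` is a hypothetical helper.

```python
import csv
import io

# Illustrative sample only; the second stringId is a placeholder.
SAMPLE_TSV = (
    "queryItem\tstringId\tpreferredName\n"
    "TP53\t9606.ENSP00000269305\tTP53\n"
    "BRCA1\t9606.PLACEHOLDER_BRCA1\tBRCA1\n"
)


def parse_mapping(tsv_text, queried):
    """Return (id_map, unmapped) for a get_string_ids-style TSV response."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_map = {row["queryItem"]: row["stringId"] for row in reader}
    # Anything that was queried but has no row was not mapped.
    unmapped = [name for name in queried if name not in id_map]
    return id_map, unmapped


id_map, unmapped = parse_mapping(SAMPLE_TSV, ["TP53", "BRCA1", "NOTAGENE"])
print(id_map)
print("Unmapped:", unmapped)
```

Reviewing the unmapped list before running network or enrichment queries avoids silently analysing a smaller protein set than intended.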

2. Network Retrieval

# Get PPI network with confidence scores
network = string_query("network", {
    "identifiers": "TP53%0dBRCA1%0dMDM2%0dATM%0dCHEK2",
    "species": 9606,
    "required_score": 400,
    "network_type": "functional"  # or "physical"
})

# Parse network edges
df = pd.read_csv(io.StringIO(network), sep='\t')
print(f"Found {len(df)} interactions")
print(df[['preferredName_A', 'preferredName_B', 'score']].head())

# Expand network with additional interactors
time.sleep(1)  # pause before the next API call
expanded = string_query("network", {
    "identifiers": "TP53",
    "species": 9606,
    "add_nodes": 10,  # add 10 most connected proteins
    "required_score": 700
})
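Once the edges are in hand, hub proteins can be found by counting how many distinct partners each node has. The following is a minimal sketch using made-up example edges; `node_degrees` is a hypothetical helper, and with a real response the edge pairs would come from the `preferredName_A` and `preferredName_B` columns.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per protein in an undirected edge list."""
    # Treat (A, B) and (B, A) as the same edge and ignore self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degrees = Counter()
    for edge in unique_edges:
        degrees.update(edge)
    return degrees


# Illustrative edges, not real STRING output.
example_edges = [
    ("TP53", "MDM2"),
    ("TP53", "ATM"),
    ("TP53", "CHEK2"),
    ("ATM", "CHEK2"),
    ("MDM2", "TP53"),  # duplicate of the first edge, reversed
]
degrees = node_degrees(example_edges)
print(degrees.most_common(3))
```

With a parsed network DataFrame, `zip(df['preferredName_A'], df['preferredName_B'])` yields pairs in the shape this function expects.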

3. Network Visualization

# Get PNG network image
url = f"{STRING_API}/image/network"
params = {
    "identifiers": "TP53%0dMDM2%0dATM%0dCHEK2%0dBRCA1",
    "species": 9606,
    "required_score": 700,
    "network_flavor": "evidence",  # "evidence", "confidence", or "actions"
    "caller_identity": "python_script"
}
response = requests.get(url, params=params)
response.raise_for_status()  # avoid writing an error page to disk as a PNG
with open("network.png", "wb") as f:
    f.write(response.content)

4. Interaction Partners

# Discover top interaction partners
partners = string_query("interaction_partners", {
    "identifiers": "TP53",
    "species": 9606,
    "limit": 20,
    "required_score": 700
})

df = pd.read_csv(io.StringIO(partners), sep='\t')
print("Top TP53 interactors (first 10 of up to 20 returned):")
print(df[['preferredName_B', 'score']].head(10))

5. Functional Enrichment

# GO, KEGG, Pfam, InterPro, SMART, UniProt Keywords enrichment
# Statistical method: Fisher's exact test with Benjamini-Hochberg FDR correction
enrichment = string_query("enrichment", {
    "identifiers": "TP53%0dMDM2%0dATM%0dCHEK2%0dBRCA1%0dATR%0dTP73",
    "species": 9606
})

df = pd.read_csv(io.StringIO(enrichment), sep='\t')
significant = df[df['fdr'] < 0.05]
print(f"Significant terms: {len(significant)}")

# Group by annotation category
for cat, group in significant.groupby('category'):
    print(f"\n{cat}: {len(group)} terms")
    for _, row in group.head(3).iterrows():
        print(f"  {row['description']} (FDR={row['fdr']:.2e})")
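STRING computes the `fdr` column on the server, so nothing needs to be recomputed locally. To show what a Benjamini-Hochberg adjustment does to raw p-values, here is a small pure-Python sketch. It is a textbook version of the procedure, not STRING's own code, and the input p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p={p:.3f}  adjusted={q:.3f}")
```

Each adjusted value is the smallest FDR level at which that term would still be called significant, which is why filtering on `fdr < 0.05` is the usual cut.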

6. PPI Enrichment Testing

import json

# Test if proteins form a significant functional module
result = string_query("ppi_enrichment", {
    "identifiers": "TP53%0dMDM2%0dATM%0dCHEK2%0dBRCA1",
    "species": 9606,
    "required_score": 400
}, fmt="json")

data = json.loads(result)[0]  # the JSON response is a one-element list
print(f"Observed edges: {data['number_of_edges']}")
print(f"Expected edges: {data['expected_number_of_edges']}")
print(f"P-value: {data['p_value']}")
# p < 0.05 → proteins form a significantly enriched network

7. Homology Scores

# Get homology/similarity between proteins
homology = string_query("homology", {
    "identifiers": "TP53%0dTP63%0dTP73",
    "species": 9606
})
print(homology)

Common Workflows

Workflow 1: Protein List Analysis (Standard)

import requests, pandas as pd, io, json, time

STRING_API = "https://string-db.org/api"
def string_query(endpoint, params, fmt="tsv"):
    url = f"{STRING_API}/{fmt}/{endpoint}"
    params.setdefault("caller_identity", "python_script")
    response = requests.get(url, params=params)
    response.raise_for_status()
    time.sleep(1)
    return response.text

genes = "TP53%0dBRCA1%0dATM%0dCHEK2%0dMDM2%0dATR%0dBRCA2"

# Step 1: Map identifiers
mapping = string_query("get_string_ids", {"identifiers": genes.replace("%0d", "\n"), "species": 9606})

# Step 2: Get interaction network
network = string_query("network", {"identifiers": genes, "species": 9606, "required_score": 400})
net_df = pd.read_csv(io.StringIO(network), sep='\t')
print(f"Network: {len(net_df)} interactions")

# Step 3: Test PPI enrichment
ppi = json.loads(string_query("ppi_enrichment", {"identifiers": genes, "species": 9606}, fmt="json"))[0]  # one-element JSON list
print(f"PPI enrichment p-value: {ppi['p_value']}")

# Step 4: Functional enrichment
enrich = string_query("enrichment", {"identifiers": genes, "species": 9606})
enrich_df = pd.read_csv(io.StringIO(enrich), sep='\t')
sig = enrich_df[enrich_df['fdr'] < 0.05]
print(f"Significant enriched terms: {len(sig)}")

# Step 5: Save network image
img_resp = requests.get(f"{STRING_API}/image/network", params={
    "identifiers": genes, "species": 9606, "required_score": 400,
    "network_flavor": "evidence", "caller_identity": "python_script"
})
img_resp.raise_for_status()  # avoid saving an error response as an image
with open("protein_network.png", "wb") as f:
    f.write(img_resp.content)

Workflow 2: Network Expansion from Seed Proteins

# Start with seed proteins, discover connected functional modules
seed = "TP53"

# Step 1: Get high-confidence interaction partners
partners = string_query("interaction_partners", {
    "identifiers": seed, "species": 9606, "limit": 30, "required_score": 700
})
df = pd.read_csv(io.StringIO(partners), sep='\t')
all_proteins = list(set(df['preferredName_A'].tolist() + df['preferredName_B'].tolist()))
print(f"Expanded network: {len(all_proteins)} proteins")

# Step 2: Enrichment on expanded set
expanded_ids = "%0d".join(all_proteins[:50])
enrichment = string_query("enrichment", {"identifiers": expanded_ids, "species": 9606})
enrich_df = pd.read_csv(io.StringIO(enrichment), sep='\t')
modules = enrich_df[enrich_df['fdr'] < 0.001]
print(f"Highly significant terms: {len(modules)}")

Workflow 3: Cross-Species Comparison

# Compare protein interactions across species
for species, name, gene in [(9606, "Human", "TP53"), (10090, "Mouse", "Trp53")]:
    network = string_query("network", {
        "identifiers": gene, "species": species,
        "required_score": 700, "add_nodes": 5
    })
    df = pd.read_csv(io.StringIO(network), sep='\t')
    print(f"{name} ({gene}): {len(df)} interactions at score >= 700")

Common Recipes

Recipe: Parse Enrichment Results to DataFrame

import pandas as pd, io

enrichment_tsv = string_query("enrichment", {
    "identifiers": "TP53%0dBRCA1%0dATM", "species": 9606
})
df = pd.read_csv(io.StringIO(enrichment_tsv), sep='\t')
# Columns: category, term, description, number_of_genes, p_value, fdr
kegg = df[df['category'] == 'KEGG'].sort_values('fdr')
print(kegg[['description', 'fdr']].head(5))

Recipe: Batch Protein Queries with Rate Limiting

import time

protein_lists = [["TP53", "MDM2"], ["EGFR", "ERBB2"], ["BRCA1", "BRCA2"]]
results = []
for proteins in protein_lists:
    ids = "%0d".join(proteins)
    network = string_query("network", {"identifiers": ids, "species": 9606})
    results.append(network)
    time.sleep(1)  # respect rate limits
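The recipe above starts from lists that are already grouped. When starting from one long list, it can first be cut into fixed-size batches, in line with the 50-100 proteins per request suggested under Troubleshooting. The helper name `chunked` is hypothetical.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


proteins = ["TP53", "MDM2", "EGFR", "ERBB2", "BRCA1", "BRCA2", "ATM"]
for batch in chunked(proteins, 3):
    print(batch)
```

Batching suits per-protein lookups such as identifier mapping. Set-level statistics such as functional enrichment or PPI enrichment depend on the whole input list, so splitting the list changes their results.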

Recipe: Version Check for Reproducibility

version = string_query("version", {})
print(f"STRING version: {version.strip()}")
# Include in methods section: "STRING v{version}, accessed {date}"
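The TSV returned by the version endpoint includes a header row, so printing it raw gives more than the version number. The sketch below extracts the version and formats the methods sentence. It assumes the header contains a `string_version` column, the sample response is illustrative rather than captured output, and `methods_sentence` is a hypothetical helper.

```python
import csv
import io
from datetime import date

# Illustrative response; the column name and values are assumptions.
SAMPLE_VERSION_TSV = (
    "string_version\tstable_address\n"
    "12.0\thttps://version-12-0.string-db.org\n"
)


def methods_sentence(version_tsv: str, accessed: date) -> str:
    """Format 'STRING v{version}, accessed {date}' from a version TSV."""
    rows = list(csv.DictReader(io.StringIO(version_tsv), delimiter="\t"))
    if not rows:
        raise ValueError("version response contained no data row")
    version = rows[0]["string_version"]
    return f"STRING v{version}, accessed {accessed.isoformat()}"


print(methods_sentence(SAMPLE_VERSION_TSV, date(2024, 1, 15)))
```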

Key Parameters

| Parameter | Endpoint | Default | Description |
|-----------|----------|---------|-------------|
| identifiers | All | none | Protein IDs, %0d-separated for URL or \n-separated for POST |
| species | All | none | NCBI taxon ID (9606=human, 10090=mouse) |
| required_score | network, partners, ppi_enrichment | 400 | Confidence threshold 0-1000 |
| network_type | network | functional | functional (all evidence) or physical (direct binding) |
| add_nodes | network, image | 0 | Additional connected proteins to include (0-10) |
| limit | get_string_ids, partners | 1/10 | Max results per query |
| network_flavor | image | evidence | evidence, confidence, or actions |

Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| No proteins found | Wrong species or identifier typo | Verify species taxon ID; use get_string_ids to check identifier mapping |
| Empty network | Too strict confidence threshold | Lower required_score; verify proteins actually interact in STRING |
| Timeout on large queries | Too many proteins in single request | Split into batches of 50-100; use bulk downloads for proteome-scale |
| "Species required" error | Missing species for >10 protein networks | Always include species parameter |
| Unexpected results | Wrong network type or STRING version | Check network_type (functional vs physical); verify version with /version |
| 400 Bad Request | Malformed identifiers | Use %0d separator in URL or \n in POST body; URL-encode special characters |
| Enrichment returns no terms | Too few input proteins | Enrichment needs 5+ proteins for meaningful results |

Best Practices

Always map identifiers first: call the get_string_ids endpoint before other operations; queries using STRING IDs (e.g., 9606.ENSP00000269305) are faster than queries using gene names
Rate-limit all requests: add time.sleep(1) between API calls
Choose appropriate thresholds: 150 for exploratory work, 400 for standard analysis, 700 for conservative results such as publications, 900 for the most stringent filtering
Specify species explicitly: required for networks >10 proteins, recommended always
Use functional networks for pathway analysis and enrichment; physical networks for structural biology and direct binding
Include version in methods: query the version endpoint (see the Version Check recipe) for reproducibility

Related Skills

networkx-graph-analysis — Graph analysis and visualization of STRING interaction networks
kegg-database — Pathway-centric queries complementary to STRING enrichment
bioservices-multi-database — Alternative access to STRING via the PSICQUIC interface

References

STRING website: https://string-db.org
API documentation: https://string-db.org/help/api/
Download page: https://string-db.org/cgi/download
Publications: https://string-db.org/cgi/about

Bundled Resources

Main SKILL.md + 1 reference file. Original total: 990 lines (SKILL.md 534 + string_reference.md 456). Scripts: 370 lines (string_api.py).

references/api_advanced.md: Advanced API features (values/ranks enrichment, bulk upload, R/Cytoscape integration), output format details, HTTP error codes, data license — content from original string_reference.md that exceeds Core API scope.

Original file disposition:

SKILL.md (534 lines) → Core API modules 1-7, Workflows 1-3, Quick Start helper function, Key Concepts (species table, score thresholds, network types). "Common Use Cases" per-operation subsections consolidated into Core API module descriptions (rule 7b): each operation's "When to use" and "Use cases" → Core API intro text. "Detailed Reference" stub section → removed, content consolidated inline
references/string_reference.md (456 lines) → Partially consolidated inline: API endpoints → Core API modules with code blocks; species table → Key Concepts; confidence scores → Key Concepts; identifier format → Key Concepts. Advanced features (values/ranks enrichment, bulk upload), integration examples (R STRINGdb, Cytoscape), output format details, HTTP error codes, data license → migrated to references/api_advanced.md
scripts/string_api.py (370 lines) → Helper function pattern absorbed into Quick Start (string_query reusable function). Per-function disposition: string_map_ids → Core API Module 1; string_network → Module 2; string_network_image → Module 3; string_interaction_partners → Module 4; string_enrichment → Module 5; string_ppi_enrichment → Module 6; string_homology → Module 7; string_version → Recipe. All were thin wrappers around urllib; replaced with requests-based string_query helper

Retention: ~460 lines (SKILL.md) + ~180 lines (reference) = ~640 / 990 original = ~65%.
