drclaw/agent_hub/templates/biochemistry/skills/interproscan-domain-analysis/SKILL.md
Analyze protein sequences using InterProScan to identify functional domains, protein families, and Gene Ontology (GO) annotations.
npx skillsauth add qzzqzzb/drclaw interproscan-domain-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use the same BioInfoToolsClient class as defined in the protein-blast-search skill.
This workflow analyzes protein sequences using InterProScan to identify functional domains, protein families, binding sites, and associated Gene Ontology annotations.
Workflow Steps:
Implementation:
import asyncio
from datetime import timedelta

# BioInfoToolsClient is defined in the protein-blast-search skill;
# define or import it before running this script.


async def main():
    # Initialize client
    client = BioInfoToolsClient(
        "https://scp.intern-ai.org.cn/api/v1/mcp/17/BioInfo-Tools",
        "<your-api-key>"
    )
    if not await client.connect():
        print("connection failed")
        return

    # Input: Protein sequence to analyze
    protein_sequence = """
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
"""

    # Step 1 & 2: Run InterProScan analysis
    result = await client.session.call_tool(
        "interproscan_analyze",
        arguments={
            "sequence": protein_sequence.strip(),
            "sequence_id": "HBB_HUMAN",  # Optional identifier
            "databases": ["Pfam"],       # Signature databases to use
            "goterms": True              # Include GO term annotations
        },
        read_timeout_seconds=timedelta(seconds=900)  # Allow up to 15 minutes
    )

    # Step 3: Parse and display results
    result_data = client.parse_result(result)
    if result_data.get("success"):
        results = result_data.get("results", {})
        domains = results.get("domains", [])
        go_terms = results.get("go_terms", [])
        print("✅ InterProScan analysis completed successfully")
        print(f"Execution time: {result_data.get('time_seconds', '?')} seconds")
        print(f"Domains found: {len(domains)}")
        print(f"GO annotations: {len(go_terms)}\n")

        # Display domain information
        if domains:
            print("=== Functional Domains ===\n")
            for i, domain in enumerate(domains, 1):
                print(f"{i}. {domain.get('name', 'N/A')}")
                print(f"   Accession: {domain.get('accession', 'N/A')}")
                print(f"   Database: {domain.get('database', 'N/A')}")
                if domain.get('description'):
                    print(f"   Description: {domain.get('description')}")
                # Display domain locations
                locations = domain.get('locations', [])
                if locations:
                    print("   Locations:")
                    for loc in locations:
                        print(f"     - Position {loc.get('start')}-{loc.get('end')} aa")
                        if loc.get('score'):
                            print(f"       Score: {loc.get('score')}")
                print()

        # Display GO annotations
        if go_terms:
            print("=== Gene Ontology Annotations ===\n")
            # Group by category
            by_category = {}
            for go in go_terms:
                category = go.get('category', 'UNKNOWN')
                if category not in by_category:
                    by_category[category] = []
                by_category[category].append(go)
            for category, terms in by_category.items():
                print(f"{category}:")
                for go in terms:
                    print(f"  - {go.get('id', 'N/A')}: {go.get('name', 'N/A')}")
                print()
    else:
        print(f"❌ InterProScan analysis failed: {result_data.get('error', 'Unknown error')}")

    await client.disconnect()


# Top-level await is only valid in notebooks, so run the workflow through asyncio.
asyncio.run(main())
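The implementation above only strips surrounding whitespace from the sequence before submitting it. Since a job may run for up to 15 minutes, it can be worth rejecting malformed input locally first. The following is a minimal sketch of such a pre-flight check. The function name `clean_protein_sequence` and the accepted alphabet (the 20 standard residues plus the IUPAC codes B, J, O, U, X, Z) are assumptions for illustration, not part of the BioInfo-Tools API; adjust the set to whatever the server actually accepts.

```python
import re

# Assumption: 20 standard amino acids plus IUPAC ambiguity/rare codes.
# The server may accept a narrower or wider alphabet.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ")


def clean_protein_sequence(raw: str) -> str:
    """Return an uppercase, whitespace-free sequence, or raise ValueError.

    Hypothetical helper: drops FASTA header lines (starting with '>'),
    removes all whitespace, and strips a trailing '*' stop symbol.
    """
    lines = [line.strip() for line in raw.strip().splitlines()]
    body = "".join(line for line in lines if not line.startswith(">"))
    sequence = re.sub(r"\s+", "", body).upper().rstrip("*")
    if not sequence:
        raise ValueError("sequence is empty")
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError(f"unexpected characters in sequence: {''.join(invalid)}")
    return sequence


print(clean_protein_sequence(">query\nmvhl tpe\nEK*"))  # prints MVHLTPEEK
```

The cleaned string could then be passed as the `sequence` argument in place of `protein_sequence.strip()`.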
BioInfo-Tools Server:
- interproscan_analyze: Analyze protein sequence using InterProScan
  - Parameters:
    - sequence (str): Protein sequence in amino acid single-letter code
    - sequence_id (str, optional): Identifier for the query sequence
    - databases (list, optional): Signature databases to query (default: ["Pfam"])
    - goterms (bool, optional): Include GO term annotations (default: True)
  - Returns:
    - success (bool): Whether analysis completed successfully
    - results (dict): Analysis results containing domains and GO terms
    - time_seconds (float): Execution time
Input:
- sequence: Protein sequence (amino acid single-letter code)
- sequence_id: Optional identifier for the query
- databases: List of signature databases (e.g., ["Pfam", "SMART", "PRINTS"])
- goterms: Whether to include Gene Ontology annotations
Output:
- domains: List of identified protein domains, each containing:
  - name: Domain or family name
  - accession: Database accession number
  - database: Source database (e.g., "PFAM", "SMART")
  - description: Functional description
  - locations: List of domain positions in the sequence
    - start: Start position (amino acid number)
    - end: End position (amino acid number)
    - score: Match score (if available)
- go_terms: List of GO annotations, each containing:
  - id: GO identifier (e.g., "GO:0020037")
  - name: GO term name
  - category: GO category (MOLECULAR_FUNCTION, BIOLOGICAL_PROCESS, or CELLULAR_COMPONENT)
InterProScan integrates multiple signature databases:
Default: ["Pfam"] for fastest results
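Given the output structure described above, a common follow-up is to flatten the nested `domains` / `locations` lists into sortable rows and estimate how much of the sequence is annotated. The sketch below assumes that `start` and `end` are 1-based, inclusive residue numbers, which the field descriptions suggest but do not state outright. The helper names `domain_hits` and `covered_residues` and the sample dictionary are hypothetical and are not real InterProScan output.

```python
def domain_hits(results: dict) -> list:
    """Flatten results['domains'] into (start, end, accession, name) tuples, sorted by start."""
    hits = []
    for domain in results.get("domains", []):
        for loc in domain.get("locations", []):
            start, end = loc.get("start"), loc.get("end")
            if start is None or end is None:
                continue  # skip locations without usable coordinates
            hits.append((int(start), int(end),
                         domain.get("accession", "N/A"),
                         domain.get("name", "N/A")))
    return sorted(hits)


def covered_residues(hits: list) -> int:
    """Count residues covered by at least one hit (assumes 1-based, inclusive coordinates)."""
    covered = 0
    current_end = 0
    for start, end, *_ in sorted(hits):
        start = max(start, current_end + 1)  # ignore the part already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered


# Illustrative input only: accessions and coordinates are made up.
sample_results = {
    "domains": [
        {"name": "example_domain_b", "accession": "EX0002",
         "locations": [{"start": 100, "end": 146}]},
        {"name": "example_domain_a", "accession": "EX0001",
         "locations": [{"start": 7, "end": 112, "score": 1.5e-20}]},
    ]
}

hits = domain_hits(sample_results)
for start, end, accession, name in hits:
    print(f"{start:>4}-{end:<4} {accession} {name}")
print("Residues covered:", covered_residues(hits))  # prints Residues covered: 140
```

Overlapping hits are merged before counting, so residues matched by several signatures are counted once.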