drclaw/agent_hub/templates/biochemistry/skills/comprehensive-protein-analysis/SKILL.md
Comprehensive protein analysis combining InterProScan domain identification with BLAST similarity search to provide complete functional and evolutionary annotation.
npx skillsauth add qzzqzzb/drclaw comprehensive-protein-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use the same BioInfoToolsClient class as defined in the protein-blast-search skill.
This workflow combines InterProScan domain analysis with BLAST similarity search to provide a complete functional and evolutionary annotation of a protein sequence.
Workflow Steps:
Implementation:
from datetime import timedelta
## Initialize client
client = BioInfoToolsClient(
    "https://scp.intern-ai.org.cn/api/v1/mcp/17/BioInfo-Tools",
    "<your-api-key>"
)
if not await client.connect():
    print("connection failed")
    exit()
## Input: Protein sequence to analyze
protein_sequence = """
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
"""
sequence_id = "INS_HUMAN"
## Step 1, 2 & 3: Run comprehensive analysis (InterProScan + BLAST)
result = await client.session.call_tool(
    "analyze_protein",
    arguments={
        "sequence": protein_sequence.strip(),
        "sequence_id": sequence_id,
        "databases": ["Pfam"],  # InterProScan databases
        "evalue": 1e-5,         # BLAST E-value threshold (more stringent than the 0.01 default)
        "max_hits": 10          # BLAST max hits
    },
    read_timeout_seconds=timedelta(seconds=1200)  # Allow up to 20 minutes
)
## Step 4: Parse and display comprehensive results
result_data = client.parse_result(result)
print(f"{'='*80}")
print(f"Comprehensive Protein Analysis: {sequence_id}")
print(f"{'='*80}\n")
# InterProScan Results
# Defaults keep the summary section from raising NameError if InterProScan fails
domains, go_terms = [], []
ips_result = result_data.get("interproscan", {})
if ips_result.get("success"):
    ips_data = ips_result.get("results", {})
    domains = ips_data.get('domains', [])
    go_terms = ips_data.get('go_terms', [])
    print("=== DOMAIN ANALYSIS (InterProScan) ===")
    print(f"Execution time: {ips_result.get('time_seconds', '?')} seconds")
    print(f"Domains found: {len(domains)}")
    print(f"GO annotations: {len(go_terms)}\n")
    if domains:
        print("Functional Domains:")
        for domain in domains:
            print(f" • {domain.get('name', 'N/A')} ({domain.get('database', 'N/A')})")
            if domain.get('description'):
                print(f" Description: {domain.get('description')}")
            locations = domain.get('locations', [])
            if locations:
                loc = locations[0]  # Report the first location only
                print(f" Position: {loc.get('start')}-{loc.get('end')} aa")
        print()
    if go_terms:
        print("Gene Ontology Annotations:")
        for go in go_terms[:5]:  # Show top 5
            print(f" • {go.get('id', 'N/A')}: {go.get('name', 'N/A')}")
            print(f" Category: {go.get('category', 'N/A')}")
        if len(go_terms) > 5:
            print(f" ... and {len(go_terms) - 5} more")
        print()
else:
    print(f"❌ InterProScan failed: {ips_result.get('error', 'Unknown')}\n")
# BLAST Results
# Default keeps the summary section from raising NameError if BLAST fails
hits = []
blast_result = result_data.get("blast", {})
if blast_result.get("success"):
    hits = blast_result.get('hits', [])
    print("=== HOMOLOGY SEARCH (BLAST) ===")
    print(f"Execution time: {blast_result.get('time_seconds', '?')} seconds")
    print(f"Similar sequences found: {blast_result.get('total_hits', 0)}")
    print(f"E-value threshold: {1e-5}\n")
    if hits:
        print("Top Homologous Proteins:")
        for i, hit in enumerate(hits[:5], 1):
            print(f" {i}. {hit['uniprot_id']} - {hit.get('organism', 'N/A')}")
            print(f" Description: {hit['description']}")
            print(f" Identity: {hit['identity_percent']:.1f}%, E-value: {hit['evalue']:.2e}")
        if len(hits) > 5:
            print(f" ... and {len(hits) - 5} more matches")
        print()
    else:
        print("No significant homologs found (E-value threshold may be too stringent)\n")
else:
    print(f"❌ BLAST failed: {blast_result.get('error', 'Unknown')}\n")
# Summary
print("=== FUNCTIONAL SUMMARY ===")
if domains:
    print(f"Protein Family: {domains[0].get('name', 'Unknown')}")
if hits:
    most_similar = hits[0]
    print(f"Most Similar Protein: {most_similar['uniprot_id']} ({most_similar['identity_percent']:.1f}% identity)")
    print(f"Organism: {most_similar.get('organism', 'Unknown')}")
print(f"{'='*80}")
await client.disconnect()
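The script above relies on top-level `await`, which works in environments such as Jupyter notebooks or the `python -m asyncio` REPL but not in a plain script. A minimal sketch of the same call wrapped in reusable coroutines follows. It assumes only what the script shows: a client exposing `connect()`, `disconnect()`, `session.call_tool(...)`, and `parse_result(...)`. The names `clean_sequence`, `run_analysis`, and `main` are hypothetical helpers, not part of the BioInfo-Tools API, and the accepted residue alphabet is an assumption rather than a documented server rule.

```python
import asyncio
from datetime import timedelta

# Assumption: the 20 standard residues, ambiguity codes B/Z/X, and U/O
# (selenocysteine, pyrrolysine). The server may accept a different alphabet.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def clean_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace, uppercase, and check residue codes."""
    lines = [ln.strip() for ln in raw.splitlines()]
    body = "".join(ln for ln in lines if not ln.startswith(">"))
    seq = "".join(body.split()).upper()
    if not seq:
        raise ValueError("empty protein sequence")
    bad = sorted(set(seq) - VALID_RESIDUES)
    if bad:
        raise ValueError(f"unexpected residue codes: {bad}")
    return seq


async def run_analysis(client, sequence, sequence_id="query",
                       databases=("Pfam",), evalue=1e-5, max_hits=10,
                       timeout_s=1200):
    """Call analyze_protein on an already-connected client and return the parsed dict."""
    result = await client.session.call_tool(
        "analyze_protein",
        arguments={
            "sequence": clean_sequence(sequence),
            "sequence_id": sequence_id,
            "databases": list(databases),
            "evalue": evalue,
            "max_hits": max_hits,
        },
        read_timeout_seconds=timedelta(seconds=timeout_s),
    )
    return client.parse_result(result)


async def main(client, sequence, sequence_id):
    """Connect, run the analysis, and always disconnect afterwards."""
    if not await client.connect():
        raise SystemExit("connection failed")
    try:
        return await run_analysis(client, sequence, sequence_id)
    finally:
        await client.disconnect()

# In a plain script, something like the following would drive it:
# result_data = asyncio.run(main(client, protein_sequence, "INS_HUMAN"))
```

Validating the sequence before the call avoids spending a long-running request (the script allows up to 20 minutes) on input that contains stray characters.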
BioInfo-Tools Server:
- analyze_protein: Comprehensive protein analysis combining InterProScan and BLAST
  - Args:
    - sequence (str): Protein sequence in amino acid single-letter code
    - sequence_id (str, optional): Identifier for the query sequence
    - databases (list, optional): InterProScan databases (default: ["Pfam"])
    - evalue (float, optional): BLAST E-value threshold (default: 0.01)
    - max_hits (int, optional): Maximum BLAST hits (default: 10)
  - Returns:
    - interproscan (dict): InterProScan analysis results
      - success (bool): Whether InterProScan completed
      - results (dict): Domains and GO terms
      - time_seconds (float): Execution time
    - blast (dict): BLAST search results
      - success (bool): Whether BLAST completed
      - hits (list): Similar proteins
      - total_hits (int): Number of matches
      - time_seconds (float): Execution time

Input:
- sequence: Protein sequence (amino acid single-letter code)
- sequence_id: Optional identifier for the query
- databases: List of InterProScan databases to query
- evalue: BLAST E-value threshold (lower = more stringent)
- max_hits: Maximum number of BLAST hits to return

Output:
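The returned dictionary can be reduced to a compact record before printing or storing it. The sketch below assumes the field names used in the implementation script (`interproscan.results.domains`, `blast.hits`, `uniprot_id`, `identity_percent`, `evalue`). `summarize_analysis` is a hypothetical helper, not part of the server API, and sorting hits by E-value is a defensive assumption in case the server does not already return them ranked.

```python
def summarize_analysis(result_data: dict, top_n: int = 5) -> dict:
    """Condense an analyze_protein result into a small, failure-tolerant record."""
    ips = result_data.get("interproscan") or {}
    blast = result_data.get("blast") or {}

    # Only trust payloads from tools that reported success.
    domains = (ips.get("results") or {}).get("domains", []) if ips.get("success") else []
    hits = blast.get("hits", []) if blast.get("success") else []

    # Lowest E-value first; hits without an E-value sort last.
    ranked = sorted(hits, key=lambda h: h.get("evalue", float("inf")))[:top_n]

    return {
        "interproscan_ok": bool(ips.get("success")),
        "blast_ok": bool(blast.get("success")),
        "protein_family": domains[0].get("name") if domains else None,
        "domain_names": [d.get("name", "N/A") for d in domains],
        "top_hits": [
            (h.get("uniprot_id"), h.get("identity_percent"), h.get("evalue"))
            for h in ranked
        ],
    }
```

Because each tool reports its own `success` flag, the record stays usable when only one of InterProScan or BLAST completes: the failed half simply yields `None` or an empty list.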
This comprehensive approach provides:
Structural Information (InterProScan):
Evolutionary Context (BLAST):
Functional Prediction: