skills/devtu-create-tool/SKILL.md
Create new scientific tools for ToolUniverse framework with proper structure, validation, and testing. Use when users need to add tools to ToolUniverse, implement new API integrations, create tool wrappers for scientific databases/services, expand ToolUniverse capabilities, or follow ToolUniverse contribution guidelines. Supports creating tool classes, JSON configurations, validation, error handling, and test examples.
Create new scientific tools for the ToolUniverse framework following established best practices.
Common mistakes that break new tools:
- Missing `default_config.py` entry: tools silently won't load
- Skipping `test_new_tools.py`: misses schema/API issues

Most common (2026): non-nullable mutually exclusive parameters, a mistake that affects 60% of new tools. Always make mutually exclusive parameters nullable: `{"type": ["string", "null"]}`
| SDK User (Using) | Tool Creator (Building) |
|------------------|-------------------------|
| tu.tools.ToolName() | @register_tool() + JSON |
| Handle responses | Design schemas |
| One-level usage | Three-step registration |
```text
Stage 1: Tool Class               Stage 2: Wrappers (Auto-Generated)
Python Implementation             From JSON Configs
        ↓                                  ↓
@register_tool("MyTool")          MyAPI_list_items()
class MyTool(BaseTool):           MyAPI_search()
    def run(arguments):           MyAPI_get_details()
```
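Registration takes three steps:

Step 1: Class Registration
```python
from tooluniverse.tool import BaseTool
from tooluniverse.tool_utils import register_tool


@register_tool("MyAPITool")  # Decorator registers the class
class MyAPITool(BaseTool):
    pass
```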
Step 2: Config Registration ⚠️ MOST COMMONLY MISSED
```python
# In src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    # ... existing entries ...
    "my_category": os.path.join(current_dir, "data", "my_category_tools.json"),
}
```
Step 3: Wrapper Generation (Automatic)
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()  # Auto-generates wrappers in tools/
```
Verification Script:
```python
import sys
sys.path.insert(0, 'src')

# Step 1: Check class registered
from tooluniverse.tool_registry import get_tool_registry
import tooluniverse.your_tool_module  # Importing the module runs @register_tool

registry = get_tool_registry()
assert "YourToolClass" in registry, "❌ Step 1 FAILED"
print("✅ Step 1: Class registered")

# Step 2: Check config registered
from tooluniverse.default_config import TOOLS_CONFIGS
assert "your_category" in TOOLS_CONFIGS, "❌ Step 2 FAILED"
print("✅ Step 2: Config registered")

# Step 3: Check wrappers generated
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
assert hasattr(tu.tools, 'YourCategory_operation1'), "❌ Step 3 FAILED"
print("✅ Step 3: Wrappers generated")

print("✅ All steps complete!")
```
All tools must return:
```text
{
  "status": "success" | "error",
  "data": {...},        // On success
  "error": "message"    // On failure
}
```
Why: Consistent error handling, composability, user expectations
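You can enforce this contract in your own tests with a small checker. The sketch below uses a hypothetical helper, `check_standard_response`, which is not part of ToolUniverse. It checks only the three keys shown above and allows extra keys such as `total` or `count`.

```python
from typing import Any, List


def check_standard_response(resp: Any) -> List[str]:
    """Return a list of problems; an empty list means the dict follows the contract."""
    if not isinstance(resp, dict):
        return ["response is not a dict"]
    problems: List[str] = []
    status = resp.get("status")
    if status not in ("success", "error"):
        problems.append(f"invalid status: {status!r}")
    elif status == "success" and "data" not in resp:
        problems.append("success response is missing 'data'")
    elif status == "error" and not isinstance(resp.get("error"), str):
        problems.append("error response needs a string 'error' message")
    return problems


print(check_standard_response({"status": "success", "data": [1, 2]}))  # []
```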
Required Files:
- `src/tooluniverse/my_api_tool.py` - Implementation
- `src/tooluniverse/data/my_api_tools.json` - Tool definitions
- `tests/tools/test_my_api_tool.py` - Tests
- `examples/my_api_examples.py` - Usage examples

Auto-Generated (don't create manually):
- `src/tooluniverse/tools/MyAPI_*.py` - Wrappers

Python Class:
```python
from typing import Dict, Any

import requests

from tooluniverse.tool import BaseTool
from tooluniverse.tool_utils import register_tool


@register_tool("MyAPITool")
class MyAPITool(BaseTool):
    """Tool for MyAPI database."""

    BASE_URL = "https://api.example.com/v1"

    def __init__(self, tool_config):
        super().__init__(tool_config)
        self.parameter = tool_config.get("parameter", {})
        self.required = self.parameter.get("required", [])

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Route to operation handler."""
        operation = arguments.get("operation")
        if not operation:
            return {"status": "error", "error": "Missing: operation"}
        if operation == "list_items":
            return self._list_items(arguments)
        elif operation == "search":
            return self._search(arguments)
        else:
            return {"status": "error", "error": f"Unknown: {operation}"}

    def _list_items(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """List items with pagination."""
        try:
            params = {}
            if arguments.get("limit") is not None:  # Nullable param: skip when None
                params["limit"] = arguments["limit"]
            response = requests.get(
                f"{self.BASE_URL}/items",
                params=params,
                timeout=30,
            )
            response.raise_for_status()
            data = response.json()
            return {
                "status": "success",
                "data": data.get("items", []),
                "total": data.get("total", 0),
            }
        except requests.exceptions.Timeout:
            return {"status": "error", "error": "Timeout after 30s"}
        except requests.exceptions.HTTPError as e:
            return {"status": "error", "error": f"HTTP {e.response.status_code}"}
        except Exception as e:
            return {"status": "error", "error": str(e)}

    def _search(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Search items by query."""
        query = arguments.get("query")
        if not query:
            return {"status": "error", "error": "Missing: query"}
        try:
            response = requests.get(
                f"{self.BASE_URL}/search",
                params={"q": query},
                timeout=30,
            )
            response.raise_for_status()
            data = response.json()
            return {
                "status": "success",
                "data": data.get("results", []),  # Standard "data" key
                "count": data.get("count", 0),
            }
        except requests.exceptions.RequestException as e:
            return {"status": "error", "error": f"API failed: {e}"}
```
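As the number of operations grows, you can replace the `if/elif` chain in `run()` with a lookup table. The sketch below is plain Python with no ToolUniverse dependency, so it runs standalone. `DispatchingTool` and its handler bodies are hypothetical stand-ins for `MyAPITool` and its HTTP calls.

```python
from typing import Any, Callable, Dict

Result = Dict[str, Any]


class DispatchingTool:
    """Hypothetical stand-in for MyAPITool showing dict-based routing."""

    def __init__(self) -> None:
        # Map each operation name to its bound handler method
        self._handlers: Dict[str, Callable[[Dict[str, Any]], Result]] = {
            "list_items": self._list_items,
            "search": self._search,
        }

    def run(self, arguments: Dict[str, Any]) -> Result:
        operation = arguments.get("operation")
        if not operation:
            return {"status": "error", "error": "Missing: operation"}
        handler = self._handlers.get(operation)
        if handler is None:
            known = ", ".join(sorted(self._handlers))
            return {"status": "error", "error": f"Unknown: {operation} (expected one of: {known})"}
        return handler(arguments)

    def _list_items(self, arguments: Dict[str, Any]) -> Result:
        # Placeholder body: a real handler would call the API here
        return {"status": "success", "data": [], "total": 0}

    def _search(self, arguments: Dict[str, Any]) -> Result:
        if not arguments.get("query"):
            return {"status": "error", "error": "Missing: query"}
        return {"status": "success", "data": [], "count": 0}


print(DispatchingTool().run({"operation": "delete"})["error"])
# Unknown: delete (expected one of: list_items, search)
```

Listing the valid operations in the error message may help a caller, human or LLM, recover from a typo without reading the config.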
JSON Configuration:
```json
[
  {
    "name": "MyAPI_list_items",
    "type": "MyAPITool",
    "description": "List items from database with pagination. Returns item IDs and names. Supports filtering by status and type. Example: limit=10 returns first 10 items.",
    "parameter": {
      "type": "object",
      "required": ["operation"],
      "properties": {
        "operation": {
          "type": "string",
          "const": "list_items",
          "description": "Operation type (fixed)"
        },
        "limit": {
          "type": ["integer", "null"],
          "description": "Max results (1-100)",
          "minimum": 1,
          "maximum": 100
        }
      }
    },
    "return_schema": {
      "type": "object",
      "properties": {
        "status": {"type": "string", "enum": ["success", "error"]},
        "data": {"type": "array"},
        "total": {"type": "integer"},
        "error": {"type": "string"}
      },
      "required": ["status"]
    },
    "test_examples": [
      {
        "operation": "list_items",
        "limit": 10
      }
    ]
  }
]
```
Async job pattern (submit, then poll):
```python
import time
from typing import Any, Dict

import requests


# Method of your tool class (shown unindented for brevity)
def _submit_job(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Submit job and poll for results."""
    try:
        # Submit
        submit_response = requests.post(
            f"{self.BASE_URL}/jobs/submit",
            json={"data": arguments.get("data")},
            timeout=30,
        )
        submit_response.raise_for_status()
        job_id = submit_response.json().get("job_id")
        if not job_id:
            return {"status": "error", "error": "Submit response had no job_id"}

        # Poll every 2 s, up to 60 attempts (about 2 min)
        for _ in range(60):
            status_response = requests.get(
                f"{self.BASE_URL}/jobs/{job_id}/status",
                timeout=30,
            )
            status_response.raise_for_status()
            result = status_response.json()
            if result.get("status") == "completed":
                return {"status": "success", "data": result.get("results"), "job_id": job_id}
            elif result.get("status") == "failed":
                return {"status": "error", "error": result.get("error"), "job_id": job_id}
            time.sleep(2)
        return {"status": "error", "error": "Timeout after 2 min", "job_id": job_id}
    except requests.exceptions.RequestException as e:
        return {"status": "error", "error": str(e)}
```
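You can factor the poll loop into a reusable helper so each job-based tool only supplies a status callback. This is a sketch under several assumptions. `poll_until_done` is hypothetical. The callback returns a dict whose `status` is `completed`, `failed`, or anything else for "still running". `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status() until the job completes, fails, or attempts run out."""
    for attempt in range(max_attempts):
        result = get_status()
        state = result.get("status")
        if state == "completed":
            return {"status": "success", "data": result.get("results")}
        if state == "failed":
            return {"status": "error", "error": result.get("error") or "Job failed"}
        if attempt < max_attempts - 1:
            sleep(interval)  # Wait before the next check (no sleep after the last one)
    return {"status": "error", "error": f"Timeout after {max_attempts} polls"}


# Offline usage with a fake status sequence and a no-op sleep
responses = iter([{"status": "running"}, {"status": "completed", "results": [42]}])
print(poll_until_done(lambda: next(responses), sleep=lambda s: None))
# {'status': 'success', 'data': [42]}
```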
Tools can specify API key requirements in JSON config:
required_api_keys - Tool won't load without these:
```jsonc
{
  "name": "NVIDIA_ESMFold_predict",
  "required_api_keys": ["NVIDIA_API_KEY"],
  ...
}
```
optional_api_keys - Tool loads and works without keys, but with reduced performance:
```jsonc
{
  "name": "PubMed_search_articles",
  "optional_api_keys": ["NCBI_API_KEY"],
  "description": "Search PubMed. Rate limits: 3 req/sec without key, 10 req/sec with NCBI_API_KEY.",
  ...
}
```
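When to use optional_api_keys: for rate-limited APIs that work without a key but allow higher throughput with one, such as PubMed.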
Implementation pattern for optional keys:
```python
import os


# Methods of your tool class (shown unindented for brevity)
def __init__(self, tool_config):
    super().__init__(tool_config)
    # Read from environment variable only (not as parameter)
    self.api_key = os.environ.get("NCBI_API_KEY", "")


def run(self, arguments):
    # Adjust behavior based on key availability
    has_key = bool(self.api_key)
    # Seconds between requests: ~10 req/s with a key, ~2.5 req/s without
    rate_limit = 0.1 if has_key else 0.4
    ...
```
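One way to act on `rate_limit` is a minimal throttle that enforces a minimum gap between requests. This sketch assumes the interval values from the snippet above (0.1 s with a key, 0.4 s without) and reads `NCBI_API_KEY` from the environment. The `Throttle` class and `make_throttle` are hypothetical, not ToolUniverse APIs. `clock` and `sleep` are injectable so the throttle can be tested without real waiting.

```python
import os
import time
from typing import Callable, Optional


class Throttle:
    """Sleep as needed so successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until min_interval has passed since the previous wait()."""
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def make_throttle(env_var: str = "NCBI_API_KEY") -> Throttle:
    # ~10 req/s with a key; ~2.5 req/s without (under the 3 req/s keyless limit)
    has_key = bool(os.environ.get(env_var, ""))
    return Throttle(0.1 if has_key else 0.4)
```

In a tool, you would create the throttle once in `__init__` and call `self.throttle.wait()` before each HTTP request.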
Key rules:
- Never add `api_key` as a tool parameter for optional keys; read the key from an environment variable instead

Tool Naming (≤55 chars for MCP):
- Pattern: `{API}_{action}_{target}`
- ✅ `FDA_get_drug_info` (17 chars)
- ❌ `FDA_get_detailed_drug_information_with_history` (46 chars; needlessly verbose and close to the limit)

Description (150-250 chars, high-context):
```json
{
  "description": "Search database for items. Returns up to 100 results with scores. Supports wildcards (* ?) and Boolean operators (AND, OR, NOT). Example: 'protein AND membrane' finds membrane proteins."
}
```
Include: What it returns, data source, use case, input format, example
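Both limits are easy to check mechanically. Below is a sketch of a hypothetical helper; the repository's `scripts/check_tool_name_lengths.py` remains the authoritative name check. The regex for `{API}_{action}_{target}` is an assumption: it only requires at least three underscore-separated alphanumeric parts.

```python
import re
from typing import List, Tuple

# Assumed pattern: at least three alphanumeric parts joined by underscores
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(_[A-Za-z0-9]+){2,}$")


def check_name_and_description(name: str, description: str,
                               max_name: int = 55,
                               desc_range: Tuple[int, int] = (150, 250)) -> List[str]:
    """Return a list of naming/description problems (empty if none)."""
    problems = []
    if len(name) > max_name:
        problems.append(f"name is {len(name)} chars (max {max_name})")
    if not NAME_PATTERN.match(name):
        problems.append("name does not follow {API}_{action}_{target}")
    low, high = desc_range
    if not low <= len(description) <= high:
        problems.append(f"description is {len(description)} chars (aim for {low}-{high})")
    return problems


print(len("FDA_get_drug_info"))  # 17
```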
test_examples (MUST be real):
```jsonc
{
  "test_examples": [
    {
      "operation": "search",
      "query": "protein",  // ✅ Real, common term
      "limit": 10
    }
  ]
}
```
❌ Don't use: "id": "XXXXX", "placeholder": "example_123"
✅ Do use: Real IDs from actual API documentation
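A rough heuristic can catch obvious placeholders before you run the real tests. The marker regex below is an assumption, not an official rule. It will miss some fakes and may flag legitimate values, so treat each hit as a prompt to double-check against the API documentation.

```python
import re
from typing import Any, Dict, List

# Heuristic placeholder markers (assumed; extend for your API)
PLACEHOLDER_RE = re.compile(r"^x{3,}$|^test\d*$|example|placeholder", re.IGNORECASE)


def find_placeholders(test_examples: List[Dict[str, Any]]) -> List[str]:
    """Return 'index.key=value' strings for values that look like placeholders."""
    hits = []
    for i, example in enumerate(test_examples):
        for key, value in example.items():
            if isinstance(value, str) and PLACEHOLDER_RE.search(value):
                hits.append(f"{i}.{key}={value}")
    return hits


print(find_placeholders([{"gene_id": "TEST123"}, {"gene_id": "7157"}]))
# ['0.gene_id=TEST123']
```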
Configuration rules:
- The `"type"` field in JSON must match the Python class name from `@register_tool`
- Use `return_schema` to define tool output structure (not just `return`)
- Put examples in `test_examples` only; avoid `examples` blocks inside `parameter`/`return_schema`, because they bloat configs and drift from reality
- For large sets of allowed values, list them in the `description` and enforce them in Python (avoid a giant schema `enum`)
- Tool modules in `src/tooluniverse/` are auto-discovered; no `__init__.py` modification needed
- Use `validate_parameters(arguments)` for custom validation

For the full implementation plan, maintenance checklist, and large API expansion guidance, see references/implementation-guide.md
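You can lint several of these rules before running anything. The sketch below assumes a config file is a JSON list of tool dicts shaped like the examples in this guide. `lint_tool_config` is hypothetical and no substitute for `scripts/test_new_tools.py`. It only looks for `examples` at the top level of each section.

```python
from typing import Any, Dict, List, Set


def lint_tool_config(tool: Dict[str, Any], registered_classes: Set[str]) -> List[str]:
    """Check one tool dict against the configuration rules above."""
    name = tool.get("name", "<unnamed>")
    problems = []
    if tool.get("type") not in registered_classes:
        problems.append(f"{name}: 'type' {tool.get('type')!r} is not a registered class")
    if "return_schema" not in tool:
        problems.append(f"{name}: missing 'return_schema'")
    if not tool.get("test_examples"):
        problems.append(f"{name}: no test_examples")
    for section in ("parameter", "return_schema"):
        # Top-level check only; nested 'examples' keys are not inspected
        if "examples" in (tool.get(section) or {}):
            problems.append(f"{name}: move 'examples' out of {section} into test_examples")
    return problems
```

In practice you would `json.load` each file under `src/tooluniverse/data/` and pass the class names you registered with `@register_tool`.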
All new tools MUST pass scripts/test_new_tools.py before submission.
This script tests tools using their test_examples from JSON configs and validates responses against return_schema.
```bash
# Test your specific tools
python scripts/test_new_tools.py your_tool_name

# Test with verbose output
python scripts/test_new_tools.py your_tool_name -v

# Test all tools (for full validation)
python scripts/test_new_tools.py

# Stop on first failure
python scripts/test_new_tools.py your_tool_name --fail-fast
```
What it validates:
- Each tool runs successfully on its `test_examples`
- Responses match `return_schema` (if defined)

Common failures and fixes:

| Failure | Cause | Fix |
|---------|-------|-----|
| 404 ERROR | Invalid ID in `test_examples` | Use real IDs from API docs |
| Schema Mismatch | Response doesn't match `return_schema` | Update schema or fix response format |
| Exception | Code bug or missing dependency | Check error message, fix implementation |
Level 1: Direct Class Testing
```python
import json

from tooluniverse.your_tool_module import YourToolClass


def test_direct_class():
    """Test implementation logic."""
    with open("src/tooluniverse/data/your_tools.json") as f:
        tools = json.load(f)
    config = next(t for t in tools if t["name"] == "YourTool_operation1")
    tool = YourToolClass(config)
    result = tool.run({"operation": "operation1", "param": "value"})
    assert result["status"] == "success"
    assert "data" in result
```
Level 2: ToolUniverse Interface Testing
```python
import pytest

from tooluniverse import ToolUniverse


class TestYourTools:
    @pytest.fixture
    def tu(self):
        tu = ToolUniverse()
        tu.load_tools()  # CRITICAL
        return tu

    def test_tools_load(self, tu):
        """Verify registration."""
        assert hasattr(tu.tools, 'YourTool_operation1')

    def test_execution(self, tu):
        """Test via ToolUniverse (how users call it)."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1",
            "param": "value"
        })
        assert result["status"] == "success"

    def test_error_handling(self, tu):
        """Test missing params."""
        result = tu.tools.YourTool_operation1(**{
            "operation": "operation1"
            # Missing required param
        })
        assert result["status"] == "error"
```
Level 3: Real API Testing
```python
from tooluniverse import ToolUniverse


def test_real_api():
    """Verify actual API integration."""
    tu = ToolUniverse()
    tu.load_tools()
    result = tu.tools.YourTool_operation1(**{
        "operation": "operation1",
        "param": "real_value_from_docs"
    })
    if result["status"] == "success":
        assert "data" in result
        print("✅ Real API works")
    else:
        print(f"⚠️ API error (may be down): {result['error']}")
```
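Why Both Levels: Level 1 checks the implementation logic in isolation. Level 2 checks registration and the interface users actually call, so it catches a missing `default_config.py` entry or `load_tools()` step that direct class testing cannot see.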
When creating multiple tools at once (e.g., 5-15 tools for an API), use this systematic approach:
Step 1: Sample Testing (Quick validation)
```bash
# Test 1-2 tools per API to catch common issues
python scripts/test_new_tools.py MyAPI_get_item -v
python scripts/test_new_tools.py MyAPI_search -v
```
Step 2: Identify Patterns
Group errors by type (for example, 404s from bad test IDs, schema mismatches, or non-nullable parameters):
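Here is a small sketch of the grouping step. It buckets failure messages into categories like those in the common-failures table. The input format (a list of `(tool_name, message)` pairs) and the keyword rules are assumptions. The `"none is not of type"` rule mimics the wording of `jsonschema` type errors. Adapt all of them to whatever `test_new_tools.py` prints in your checkout.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed keyword -> category rules, checked in order (case-insensitive)
RULES = [
    ("404", "bad test_examples ID"),
    ("schema", "return_schema mismatch"),
    ("none is not of type", "non-nullable parameter"),
]


def group_failures(failures: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each failure category to the tool names that hit it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tool_name, message in failures:
        lowered = message.lower()
        category = next((label for key, label in RULES if key in lowered), "other")
        groups[category].append(tool_name)
    return dict(groups)
```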
Step 3: Fix Systematically
Fix all tools with the same issue together:
```bash
# Fix all nullable parameter issues in JSON
# Then regenerate once:
python -m tooluniverse.generate_tools
```
Step 4: Verify All
Test all tools comprehensively:
```bash
# Test entire tool set
python scripts/test_new_tools.py MyAPI -v

# Verify count
python3 << 'EOF'
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
my_tools = [t for t in dir(tu.tools) if t.startswith('MyAPI_')]
print(f"✅ {len(my_tools)} MyAPI tools loaded")
EOF
```
Step 5: Verify Parameter Names
Before testing, verify parameter names match the config:
```python
import json

with open('src/tooluniverse/data/myapi_tools.json') as f:
    tools = json.load(f)

for tool in tools:
    print(f"{tool['name']}: {list(tool['parameter']['properties'].keys())}")
```
Don't assume parameter names like query, id, taxon - always verify!
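You can go a step further and cross-check every `test_examples` key against the declared properties. That way a misspelled or assumed name (for example, `taxon` where the config declares `tax_id`) is caught before any API call. The sketch below assumes the config layout used throughout this guide; the helper is hypothetical.

```python
from typing import Any, Dict, List


def unknown_example_keys(tool: Dict[str, Any]) -> List[str]:
    """List test_examples keys that are not declared under parameter.properties."""
    declared = set(tool.get("parameter", {}).get("properties", {}))
    unknown: List[str] = []
    for example in tool.get("test_examples", []):
        for key in example:
            if key not in declared and key not in unknown:
                unknown.append(key)
    return unknown
```

Run it over the `tools` list loaded in Step 5 and investigate any non-empty result.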
Step 6: Data Structure Verification
Test your understanding of the return data structure:
```python
# Assumes `tu` is a ToolUniverse instance after tu.load_tools()
result = tu.tools.MyAPI_get_item(id="123")

if isinstance(result.get('data'), dict):
    # Object data
    print("✅ Returns single object")
    value = result['data'].get('field')
elif isinstance(result.get('data'), list):
    # Array data
    print("✅ Returns array of items")
    count = len(result['data'])
    first = result['data'][0] if result['data'] else {}
```
- ✅ Always set timeout (30s recommended)
- ✅ Catch specific exceptions (Timeout, ConnectionError, HTTPError)
- ✅ Return error dicts, never raise in run()
- ✅ Include helpful context in error messages
- ✅ Handle JSON parsing errors
- ✅ Validate required parameters
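One way to guarantee the "never raise in run()" rule is a decorator that turns any escaped exception into a standard error dict. In the sketch below, `returns_error_dict` and `Demo` are hypothetical. Catching bare `Exception` here is a deliberate last-resort safety net. Handlers should still catch specific exceptions (timeouts, HTTP errors) themselves to produce clearer messages.

```python
import functools
from typing import Any, Callable, Dict


def returns_error_dict(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a run()-style method so unexpected exceptions become error dicts."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # Last-resort guard; handlers should catch specifics
            return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


class Demo:
    @returns_error_dict
    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        return {"status": "success", "data": int(arguments["value"])}


print(Demo().run({"value": "5"}))  # {'status': 'success', 'data': 5}
print(Demo().run({"value": "abc"})["status"])  # error
```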
When creating tools that accept EITHER parameterA OR parameterB, both parameters MUST be nullable.
❌ WRONG - Will cause validation errors:
```json
{
  "parameter": {
    "type": "object",
    "properties": {
      "id": {
        "type": "integer",
        "description": "Numeric ID"
      },
      "name": {
        "type": "string",
        "description": "Name string (alternative to id)"
      }
    }
  }
}
```
Problem: When user provides only name, validation fails because id is None and not of type integer.
✅ CORRECT - Make mutually exclusive parameters nullable:
```json
{
  "parameter": {
    "type": "object",
    "properties": {
      "id": {
        "type": ["integer", "null"],
        "description": "Numeric ID"
      },
      "name": {
        "type": ["string", "null"],
        "description": "Name string (alternative to id)"
      }
    }
  }
}
```
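You can see why the nullable form matters with a tiny type check that mimics how a JSON Schema validator treats the `type` keyword. This is a simplified, hypothetical re-implementation for illustration only; real validators such as the `jsonschema` package handle far more. It shows that `None` fails `"integer"` but passes `["integer", "null"]`.

```python
from typing import Any, List, Union

# JSON Schema type names mapped to simplified Python checks
_CHECKS = {
    "null": lambda v: v is None,
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def matches_type(value: Any, declared: Union[str, List[str]]) -> bool:
    """True if value satisfies a JSON Schema 'type' (a single name or a list of names)."""
    names = [declared] if isinstance(declared, str) else declared
    return any(_CHECKS[name](value) for name in names)


print(matches_type(None, "integer"))            # False
print(matches_type(None, ["integer", "null"]))  # True
```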
Common patterns requiring nullable parameters:
- `id` OR `name` (get by ID or by name)
- `acronym` OR `name` (search by symbol or full name)
- `gene_id` OR `gene_symbol`
- `neuron_id` OR `neuron_name`
- Optional filter pairs (`filter_field`, `filter_value`)

All optional parameters should be nullable:
```json
{
  "filter_field": {
    "type": ["string", "null"],
    "description": "Optional filter field"
  },
  "limit": {
    "type": ["integer", "null"],
    "description": "Optional result limit",
    "default": 10
  }
}
```
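When many tools need the same fix (as in the batch workflow's "fix systematically" step), a one-off script can add `"null"` to every optional parameter. The sketch below assumes the `parameter` layout shown here; `make_optional_nullable` is hypothetical. It skips required parameters and properties without a `type` (such as `const`-only `operation` fields). Review the diff before committing, and regenerate the wrappers afterwards.

```python
import copy
from typing import Any, Dict


def make_optional_nullable(parameter: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy where every non-required property's type also allows null."""
    fixed = copy.deepcopy(parameter)
    required = set(fixed.get("required", []))
    for name, prop in fixed.get("properties", {}).items():
        if name in required or "type" not in prop:
            continue  # Leave required params and const/enum-only params alone
        types = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
        if "null" not in types:
            prop["type"] = types + ["null"]
    return fixed
```

Apply it in a loop over the tools from `json.load(...)`, then write the file back with `json.dump(tools, f, indent=2)`.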
Use consistent, descriptive parameter names:
- ✅ `gene_id`, `gene_symbol`, `tax_id`: clear and specific
- ❌ `id`, `query`, `q`: too generic, cause confusion

Check the API documentation for parameter names.
Use REAL IDs in test_examples, never placeholders:
```jsonc
{
  "test_examples": [
    {
      "gene_id": "7157"  // ✅ Real TP53 gene ID
    },
    {
      "gene_symbol": "BRCA1",  // ✅ Real gene symbol
      "taxon": "9606"  // ✅ Real human tax ID
    }
  ]
}
```
❌ AVOID:
```jsonc
{
  "test_examples": [
    {
      "gene_id": "TEST123"  // ❌ Fake ID
    },
    {
      "gene_id": "example_id"  // ❌ Placeholder
    }
  ]
}
```
Document whether data is object, array, or string:
```jsonc
{
  "name": "MyAPI_get_item",
  "description": "Get single item by ID. Returns object with item details.",
  "return_schema": {
    "oneOf": [
      {
        "type": "object",
        "properties": {
          "data": {
            "type": "object",  // ← Specify: single object
            "properties": {
              "id": {"type": "string"},
              "name": {"type": "string"}
            }
          }
        }
      }
    ]
  }
}
```
```jsonc
{
  "name": "MyAPI_search",
  "description": "Search items. Returns array of matching items.",
  "return_schema": {
    "oneOf": [
      {
        "type": "object",
        "properties": {
          "data": {
            "type": "array",  // ← Specify: array of results
            "items": {
              "type": "object",
              "properties": {
                "id": {"type": "string"}
              }
            }
          }
        }
      }
    ]
  }
}
```
Check a package's dependency footprint FIRST:
```bash
# Replace PACKAGE with the PyPI project name
curl -s https://pypi.org/pypi/PACKAGE/json | python3 -c "
import json, sys
data = json.load(sys.stdin)
print(f'Dependencies: {len(data[\"info\"][\"requires_dist\"] or [])}')
"
```
Classification:
- Lightweight packages needed by core tools: `[project.dependencies]`
- Heavy or tool-specific packages: `[project.optional-dependencies]`

In code:
```python
# Inside a tool method: fail gracefully if an optional dependency is missing
try:
    import optional_package
except ImportError:
    return {
        "status": "error",
        "error": "Install with: pip install optional_package",
    }
```
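If you want to check availability without importing a heavy package at call time, `importlib.util.find_spec` tells you whether a top-level module is installed. In the sketch below, `missing_dependency_error` is hypothetical, and `optional_package` above is a placeholder name. The separate `pip_name` argument exists because distribution and import names can differ (for example, `PyYAML` installs the `yaml` module).

```python
import importlib.util
from typing import Any, Dict, Optional


def missing_dependency_error(module: str, pip_name: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Return a standard error dict if top-level `module` is not installed, else None."""
    if importlib.util.find_spec(module) is None:
        return {
            "status": "error",
            "error": f"Install with: pip install {pip_name or module}",
        }
    return None


print(missing_dependency_error("json"))  # None (json ships with Python)
```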
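Pagination pattern:
```python
# Method of your tool class (shown unindented for brevity)
def _list_items(self, arguments):
    """List one page of items and report pagination metadata."""
    params = {}
    for key in ("page", "limit"):
        if arguments.get(key) is not None:  # Skip absent or null optional params
            params[key] = arguments[key]
    try:
        response = requests.get(f"{self.BASE_URL}/items", params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
    except requests.exceptions.RequestException as e:
        return {"status": "error", "error": f"API failed: {e}"}
    return {
        "status": "success",
        "data": data.get("items", []),
        "page": data.get("page", 0),
        "total_pages": data.get("total_pages", 1),
        "total_items": data.get("total", 0),
    }
```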
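When a caller needs every page rather than one, a helper can walk the `total_pages` field. The sketch below rests on several assumptions. Pages are numbered from 1. `fetch_page` is a callable that returns one page in the response shape above; in a real tool it would wrap the `requests.get` call. `fetch_all_items` and its `max_pages` safety cap are hypothetical.

```python
from typing import Any, Callable, Dict, List


def fetch_all_items(fetch_page: Callable[[int], Dict[str, Any]],
                    max_pages: int = 50) -> Dict[str, Any]:
    """Collect items across pages, stopping at total_pages or max_pages."""
    items: List[Any] = []
    page = 1
    while page <= max_pages:
        result = fetch_page(page)
        if result.get("status") != "success":
            return {"status": "error", "error": f"Page {page}: {result.get('error')}"}
        items.extend(result.get("data", []))
        if page >= result.get("total_pages", 1):
            return {"status": "success", "data": items, "pages_fetched": page}
        page += 1
    # Hit the safety cap before reaching the last page
    return {"status": "success", "data": items, "pages_fetched": max_pages, "truncated": True}
```

The cap keeps a misbehaving API (or a huge result set) from turning one tool call into hundreds of requests. The `truncated` flag tells the caller that more data exists.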
Symptoms: Tool count doesn't increase, no error, AttributeError when calling
Cause: Missing Step 2 of registration (default_config.py)
Solution:
```python
# Edit src/tooluniverse/default_config.py
TOOLS_CONFIGS = {
    # ... existing ...
    "your_category": os.path.join(current_dir, "data", "your_category_tools.json"),
}
```
Verify:
```bash
grep "your_category" src/tooluniverse/default_config.py
ls src/tooluniverse/tools/YourCategory_*.py
python3 -c "from tooluniverse import ToolUniverse; tu = ToolUniverse(); tu.load_tools(); print(hasattr(tu.tools, 'YourCategory_op1'))"
```
Mock vs Real Testing:
What Real Testing Catches:
Checklist essentials:
- `@register_tool` decorator on the class
- `return_schema` in the JSON config
- `default_config.py` entry ← CRITICAL
- `tu.load_tools()` before calling tools
- `python scripts/test_new_tools.py your_tool -v` ← MANDATORY
- `python scripts/check_tool_name_lengths.py --test-shortening`

For the full development checklist and maintenance phases, see references/implementation-guide.md
```bash
# Validate JSON
python3 -m json.tool src/tooluniverse/data/your_tools.json

# Check Python syntax
python3 -m py_compile src/tooluniverse/your_tool.py

# Verify registration
grep "your_category" src/tooluniverse/default_config.py

# Generate wrappers
PYTHONPATH=src python3 -m tooluniverse.generate_tools --force

# List wrappers
ls src/tooluniverse/tools/YourCategory_*.py

# Run unit tests
pytest tests/tools/test_your_tool.py -v

# MANDATORY: Run test_new_tools.py validation
python scripts/test_new_tools.py your_tool -v

# Count tools
python3 << 'EOF'
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
print(f"Total: {len([t for t in dir(tu.tools) if 'YourCategory' in t])} tools")
EOF
```
⚠️ ALWAYS add to default_config.py (Step 2)
⚠️ NEVER raise exceptions in run()
⚠️ ALWAYS use real test_examples
⚠️ ALWAYS test both levels
⚠️ RUN python scripts/test_new_tools.py your_tool -v before submitting
⚠️ KEEP tool names ≤55 characters
⚠️ RETURN standard response format
⚠️ SET timeout on all HTTP requests
⚠️ VERIFY all 3 registration steps
⚠️ USE optional_api_keys for rate-limited APIs that work without keys
⚠️ NEVER add api_key parameter for optional keys - use env vars only
✅ All 3 registration steps verified
✅ Level 1 tests passing (direct class)
✅ Level 2 tests passing (ToolUniverse interface)
✅ Real API calls working (Level 3)
✅ test_new_tools.py passes with 0 failures ← MANDATORY
✅ Tool names ≤55 characters
✅ test_examples use real IDs
✅ Standard response format used
✅ Helpful error messages
✅ Examples file created
✅ No raised exceptions in run()
When all criteria met → Production Ready 🎉