skills/devtu-auto-discover-apis/SKILL.md
Automatically discover life science APIs online, create ToolUniverse tools, validate them, and prepare integration PRs. Performs gap analysis to identify missing tool categories, web searches for APIs, automated tool creation using devtu-create-tool patterns, validation with devtu-fix-tool, and git workflow management. Use when expanding ToolUniverse coverage, adding new API integrations, or systematically discovering scientific resources.
npx skillsauth add Zaoqu-Liu/ScienceClaw devtu-auto-discover-apis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Discover, create, validate, and integrate life science APIs into ToolUniverse through fully automated workflows with human review checkpoints.
Use this skill when:
Triggers: "find new APIs", "expand tool coverage", "discover missing tools", "add APIs for [domain]"
This skill orchestrates a complete pipeline:
Gap Analysis → API Discovery → Tool Creation → Validation → Integration
     ↓              ↓               ↓              ↓            ↓
  Coverage      Web Search     devtu-create    devtu-fix      Git PR
   Report         + Docs         patterns      validation     Ready
Automation Level: Fully automated with human approval gates at:
Authentication Handling: Supports public APIs, API keys, OAuth, and complex authentication schemes
Output: Working tool files (.py + .json), validation reports, discovery documentation, and integration-ready PRs
Objectives:
Key Activities:
Output: discovery_report.md with prioritized API candidates
Objectives:
Key Activities:
Output: .py and .json files for each tool
Objectives:
Key Activities:
python scripts/test_new_tools.py <tool> -v
Output: validation_report.md with pass/fail metrics
Objectives:
Key Activities:
feature/add-<api-name>-tools
Output: Integration-ready PR for human review
Load and categorize existing tools:
Initialize ToolUniverse and load all tools
Extract tool names and descriptions
Categorize by domain using keywords:
Count tools per category
Calculate coverage percentages
Output: Coverage matrix with tool counts
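The categorization steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `CATEGORY_KEYWORDS` map, the function names, and the sample tool records are all hypothetical, and real tool metadata would come from a loaded ToolUniverse instance rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical keyword map; the real categories and keywords are not given here.
CATEGORY_KEYWORDS = {
    "genomics": ["gene", "genome", "variant"],
    "proteomics": ["protein", "peptide"],
    "metabolomics": ["metabolite", "metabolomics"],
}


def categorize(name, description):
    """Return every category whose keywords appear in the tool name or description."""
    text = f"{name} {description}".lower()
    matches = [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    return matches or ["uncategorized"]


def coverage_matrix(tools):
    """Count tools per category and express each count as a share of all tools."""
    total = len(tools)
    if total == 0:
        return {}
    counts = Counter()
    for tool in tools:
        for category in categorize(tool["name"], tool.get("description", "")):
            counts[category] += 1
    return {
        category: {"count": n, "percent": round(100 * n / total, 1)}
        for category, n in counts.items()
    }


# Made-up tool records, for illustration only.
tools = [
    {"name": "GeneLookup", "description": "Fetch gene annotations"},
    {"name": "ProteinSearch", "description": "Search protein sequences"},
    {"name": "VariantInfo", "description": "Variant effect for a gene"},
    {"name": "Misc", "description": "Unit conversion"},
]
matrix = coverage_matrix(tools)
for category, stats in sorted(matrix.items()):
    print(f"{category}: {stats['count']} tools ({stats['percent']}%)")
```

A tool can match more than one category in this sketch, so the percentages may sum to more than 100.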
Find underrepresented areas:
Gap Detection Criteria:
Common Gap Areas (as of 2026):
Prioritization Factors:
Search Strategy:
For each gap domain, execute multiple search queries:
Direct API Search:
Database Discovery:
Recent Releases:
Academic Sources:
Documentation Extraction:
For each discovered API:
Scoring Matrix (0-100 points):
| Criterion | Max Points | Evaluation |
|-----------|------------|------------|
| Documentation Quality | 20 | OpenAPI/Swagger=20, detailed docs=15, basic=10, poor=5 |
| API Stability | 15 | Versioned+stable=15, versioned=10, unversioned=5 |
| Authentication | 15 | Public/API-key=15, OAuth=10, complex=5 |
| Coverage | 15 | Comprehensive=15, good=10, limited=5 |
| Maintenance | 10 | Active (updates <6mo)=10, moderate=6, stale=2 |
| Community | 10 | Popular (citations/stars)=10, moderate=6, unknown=2 |
| License | 10 | Open/Academic=10, free commercial=7, restricted=3 |
| Rate Limits | 5 | Generous=5, moderate=3, restrictive=1 |
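As a rough sketch of how the scoring matrix could be applied, the snippet below sums per-criterion points and rejects values above each criterion's maximum. The dictionary keys, the `score_api` function, and the example ratings are illustrative assumptions; only the maximum point values come from the table.

```python
# Maximum points per criterion, taken from the scoring matrix (sums to 100).
MAX_POINTS = {
    "documentation": 20,
    "stability": 15,
    "authentication": 15,
    "coverage": 15,
    "maintenance": 10,
    "community": 10,
    "license": 10,
    "rate_limits": 5,
}


def score_api(ratings):
    """Sum the points awarded per criterion; missing criteria count as 0."""
    total = 0
    for criterion, max_points in MAX_POINTS.items():
        points = ratings.get(criterion, 0)
        if not 0 <= points <= max_points:
            raise ValueError(f"{criterion}: {points} is outside 0..{max_points}")
        total += points
    return total


# Hypothetical candidate: OpenAPI docs, versioned, API-key auth, good coverage,
# active maintenance, moderate community, open license, moderate rate limits.
example_ratings = {
    "documentation": 20,
    "stability": 10,
    "authentication": 15,
    "coverage": 10,
    "maintenance": 10,
    "community": 6,
    "license": 10,
    "rate_limits": 3,
}
print(score_api(example_ratings))  # 84
```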
Prioritization:
Report Structure:
# API Discovery Report
Generated: [Timestamp]
## Executive Summary
- Total APIs discovered: X
- High priority: Y
- Gap domains addressed: Z
## Coverage Analysis
[Table showing tool counts by category, gaps highlighted]
## Prioritized API Candidates
### High Priority
#### 1. [API Name]
- **Domain**: [Category]
- **Score**: [Points]/100
- **Base URL**: [URL]
- **Auth**: [Method]
- **Endpoints**: [Count]
- **Rationale**: [Why this fills a gap]
- **Example Operations**:
- Operation 1: Description
- Operation 2: Description
[Repeat for each high-priority API]
## Medium Priority
[Similar structure]
## Implementation Roadmap
1. Batch 1 (Week 1): [APIs]
2. Batch 2 (Week 2): [APIs]
## Appendix: Search Methodology
[Search queries used, sources consulted]
Decision Tree:
API has multiple endpoints?
├─ YES → Multi-operation tool (single class, multiple JSON wrappers)
└─ NO → Consider if more endpoints likely in future
    ├─ YES → Still use multi-operation (future-proof)
    └─ NO → Single-operation acceptable
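The decision tree can also be written as a small helper. The function name and its two parameters are hypothetical; whether more endpoints are "likely in future" remains a human judgment that is passed in as a boolean.

```python
def choose_architecture(endpoint_count, more_endpoints_likely):
    """Mirror the decision tree: prefer multi-operation unless the API is
    single-endpoint and expected to stay that way."""
    if endpoint_count > 1:
        return "multi-operation"
    if more_endpoints_likely:
        return "multi-operation"  # future-proof choice
    return "single-operation"


print(choose_architecture(endpoint_count=4, more_endpoints_likely=False))
print(choose_architecture(endpoint_count=1, more_endpoints_likely=False))
```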
Multi-Operation Pattern (Recommended):
operation parameter
File Naming:
src/tooluniverse/[api_name]_tool.py
src/tooluniverse/data/[api_name]_tools.json
[api_category] (lowercase, underscores)
Template Structure:
from typing import Dict, Any
from tooluniverse.tool import BaseTool
from tooluniverse.tool_utils import register_tool
import requests
import os


@register_tool("[APIName]Tool")
class [APIName]Tool(BaseTool):
    """Tool for [API Name] - [brief description]."""

    BASE_URL = "[API base URL]"

    def __init__(self, tool_config):
        super().__init__(tool_config)
        self.parameter = tool_config.get("parameter", {})
        self.required = self.parameter.get("required", [])
        # For optional API keys
        self.api_key = os.environ.get("[API_KEY_NAME]", "")

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Route to operation handler."""
        operation = arguments.get("operation")
        if not operation:
            return {"status": "error", "error": "Missing required parameter: operation"}

        # Route to handlers
        if operation == "operation1":
            return self._operation1(arguments)
        elif operation == "operation2":
            return self._operation2(arguments)
        else:
            return {"status": "error", "error": f"Unknown operation: {operation}"}

    def _operation1(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """Description of operation1."""
        # Validate required parameters
        param1 = arguments.get("param1")
        if not param1:
            return {"status": "error", "error": "Missing required parameter: param1"}

        try:
            # Build request
            headers = {}
            if self.api_key:
                headers["Authorization"] = f"Bearer {self.api_key}"

            # Make API call
            response = requests.get(
                f"{self.BASE_URL}/endpoint",
                params={"param1": param1},
                headers=headers,
                timeout=30
            )
            response.raise_for_status()

            # Parse response
            data = response.json()

            # Return with data wrapper
            return {
                "status": "success",
                "data": data.get("results", []),
                "metadata": {
                    "total": data.get("total", 0),
                    "source": "[API Name]"
                }
            }
        except requests.exceptions.Timeout:
            return {"status": "error", "error": "API timeout after 30 seconds"}
        except requests.exceptions.HTTPError as e:
            return {"status": "error", "error": f"HTTP {e.response.status_code}: {e.response.text[:200]}"}
        except Exception as e:
            return {"status": "error", "error": f"Unexpected error: {str(e)}"}
Critical Requirements:
{"status": "success|error", "data": {...}}
run() method
Template Structure:
[
  {
    "name": "[APIName]_operation1",
    "class": "[APIName]Tool",
    "description": "[What it does]. Returns [data format]. [Input details]. Example: [usage example]. [Special notes].",
    "parameter": {
      "type": "object",
      "required": ["operation", "param1"],
      "properties": {
        "operation": {
          "const": "operation1",
          "description": "Operation identifier (fixed)"
        },
        "param1": {
          "type": "string",
          "description": "Description of param1 with format/constraints"
        }
      }
    },
    "return_schema": {
      "oneOf": [
        {
          "type": "object",
          "properties": {
            "data": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "id": {"type": "string"},
                  "name": {"type": "string"}
                }
              }
            },
            "metadata": {
              "type": "object",
              "properties": {
                "total": {"type": "integer"},
                "source": {"type": "string"}
              }
            }
          }
        },
        {
          "type": "object",
          "properties": {
            "error": {"type": "string"}
          },
          "required": ["error"]
        }
      ]
    },
    "test_examples": [
      {
        "operation": "operation1",
        "param1": "real_value_from_api_docs"
      }
    ]
  }
]
Critical Requirements:
data wrapper required
Authentication Patterns:
Public API:
# No special handling needed
response = requests.get(url, params=params, timeout=30)

Optional API key:
# In __init__
self.api_key = os.environ.get("API_KEY_NAME", "")
# In JSON config
"optional_api_keys": ["API_KEY_NAME"],
"description": "... Rate limits: 3 req/sec without key, 10 req/sec with API_KEY_NAME."
# In request
headers = {}
if self.api_key:
    headers["Authorization"] = f"Bearer {self.api_key}"
response = requests.get(url, headers=headers, timeout=30)

Required API key:
# In __init__
self.api_key = os.environ.get("API_KEY_NAME")
if not self.api_key:
    raise ValueError("API_KEY_NAME environment variable required")
# In JSON config
"required_api_keys": ["API_KEY_NAME"]
# In request
headers = {"Authorization": f"Bearer {self.api_key}"}
response = requests.get(url, headers=headers, timeout=30)

OAuth:
# Document in skill: requires manual OAuth setup
# Store tokens in environment
# Implement token refresh logic
# Include example OAuth flow in documentation
Strategy: List → Get pattern
Example Process:
1. API docs show: GET /items → returns [{id: "ABC123", ...}]
2. Make request: curl https://api.example.com/items
3. Extract: id = "ABC123"
4. Verify: curl https://api.example.com/items/ABC123 → 200 OK
5. Use in test_examples: {"operation": "get_item", "item_id": "ABC123"}
Fallback: Search API documentation examples, tutorial code, or forum posts for real IDs
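The List → Get process can be sketched in Python as below. To keep the example runnable offline, the HTTP call is abstracted behind a `fetch` callable that returns `(status_code, parsed_json)`. That callable, the `find_real_id` helper, and the stub data are assumptions for illustration; a real run would wrap `requests.get` against the target API.

```python
def find_real_id(fetch, list_path="/items", id_field="id"):
    """List records, then return the first ID whose detail endpoint answers 200."""
    status, items = fetch(list_path)
    if status != 200 or not items:
        return None
    for item in items:
        candidate = item.get(id_field)
        if candidate is None:
            continue
        # Verify the ID with a Get request before using it in test_examples.
        status, _ = fetch(f"{list_path}/{candidate}")
        if status == 200:
            return str(candidate)
    return None


def stub_fetch(path):
    """Stand-in for a real HTTP client; serves two hard-coded responses."""
    records = {
        "/items": [{"id": "GONE999"}, {"id": "ABC123"}],
        "/items/ABC123": {"id": "ABC123"},
    }
    if path in records:
        return 200, records[path]
    return 404, None


item_id = find_real_id(stub_fetch)
test_example = {"operation": "get_item", "item_id": item_id}
print(test_example)
```

Verifying each candidate against the detail endpoint matters because list endpoints sometimes return IDs for records that can no longer be fetched individually, as the first stub record simulates.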
Add to src/tooluniverse/default_config.py:
TOOLS_CONFIGS = {
    # ... existing entries ...
    "[api_category]": os.path.join(current_dir, "data", "[api_name]_tools.json"),
}
Critical: This step is commonly missed! Tools won't load without it.
Check return_schema structure:
import json

with open("src/tooluniverse/data/[api_name]_tools.json") as f:
    tools = json.load(f)

for tool in tools:
    schema = tool.get("return_schema", {})
    # Must have oneOf
    assert "oneOf" in schema, f"{tool['name']}: Missing oneOf in return_schema"
    # oneOf must have 2 schemas (success + error)
    assert len(schema["oneOf"]) == 2, f"{tool['name']}: oneOf must have 2 schemas"
    # Success schema must have 'data' field
    success_schema = schema["oneOf"][0]
    assert "properties" in success_schema, f"{tool['name']}: Missing properties in success schema"
    assert "data" in success_schema["properties"], f"{tool['name']}: Missing 'data' field in success schema"
    print(f"✅ {tool['name']}: Schema valid")
Check for placeholder values:
PLACEHOLDER_PATTERNS = [
    "test", "dummy", "placeholder", "example", "sample",
    "xxx", "temp", "fake", "mock", "your_"
]

for tool in tools:
    examples = tool.get("test_examples", [])
    for i, example in enumerate(examples):
        for key, value in example.items():
            if isinstance(value, str):
                value_lower = value.lower()
                if any(pattern in value_lower for pattern in PLACEHOLDER_PATTERNS):
                    print(f"❌ {tool['name']}: test_examples[{i}][{key}] contains placeholder: {value}")
                else:
                    print(f"✅ {tool['name']}: test_examples[{i}][{key}] appears real")
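One caveat with the plain substring check above is that it also flags legitimate values such as "testosterone" or "template", because they contain "test" or "temp". A possible refinement, offered as a sketch rather than as part of the skill, matches placeholder words only when they are not embedded in a longer alphanumeric token. The `looks_like_placeholder` helper and its regular expression are hypothetical.

```python
import re

PLACEHOLDER_WORDS = [
    "test", "dummy", "placeholder", "example", "sample",
    "xxx", "temp", "fake", "mock",
]

# Match a placeholder word only when it is not surrounded by letters or digits;
# "your_" is kept as a prefix match, as in the original pattern list.
_PLACEHOLDER_RE = re.compile(
    r"(?<![a-z0-9])(?:" + "|".join(PLACEHOLDER_WORDS) + r")(?![a-z0-9])|your_"
)


def looks_like_placeholder(value):
    """Return True if the string contains a standalone placeholder word."""
    return bool(_PLACEHOLDER_RE.search(value.lower()))


for candidate in ["testosterone", "test_id", "your_api_key", "P04637", "sample-1"]:
    print(candidate, looks_like_placeholder(candidate))
```

This remains a heuristic: a flagged value still needs a human look, and an unflagged value is not guaranteed to be a real identifier.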
Verify three-step registration:
import sys
sys.path.insert(0, 'src')
# Step 1: Check class registered
from tooluniverse.tool_registry import get_tool_registry
import tooluniverse.[api_name]_tool
registry = get_tool_registry()
assert "[APIName]Tool" in registry, "❌ Step 1 FAILED: Class not registered"
print("✅ Step 1: Class registered")
# Step 2: Check config registered
from tooluniverse.default_config import TOOLS_CONFIGS
assert "[api_category]" in TOOLS_CONFIGS, "❌ Step 2 FAILED: Config not in default_config.py"
print("✅ Step 2: Config registered")
# Step 3: Check wrappers generated
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
assert hasattr(tu.tools, '[APIName]_operation1'), "❌ Step 3 FAILED: Wrapper not generated"
print("✅ Step 3: Wrappers generated")
print("✅ All registration steps complete!")
Run test_new_tools.py:
# Test specific tools
python scripts/test_new_tools.py [api_name] -v
# Expected output:
# Testing [APIName]_operation1...
# ✅ PASS - Schema valid
#
# Results:
# Total: 3 tests
# Passed: 3 (100.0%)
# Failed: 0
# Schema invalid: 0
Handle failures:
Report Structure:
# Validation Report: [API Name]
Generated: [Timestamp]
## Summary
- Total tools: X
- Passed: Y (Z%)
- Failed: N
- Schema issues: M
## Tool Loading
- [x] Class registered in tool_registry
- [x] Config registered in default_config.py
- [x] Wrappers generated in tools/
## Schema Validation
- [x] All tools have oneOf structure
- [x] All success schemas have data wrapper
- [x] All error schemas have error field
## Test Examples
- [x] No placeholder values detected
- [x] All examples use real IDs
## Integration Tests
### [APIName]_operation1
- Status: ✅ PASS
- Response time: 1.2s
- Schema: Valid
### [APIName]_operation2
- Status: ✅ PASS
- Response time: 0.8s
- Schema: Valid
## Issues Found
None - all tests passing!
## devtu Compliance Checklist
1. [x] Tool Loading: Verified
2. [x] API Verification: Checked against docs
3. [x] Error Pattern Detection: None found
4. [x] Schema Validation: All valid
5. [x] Test Examples: All real IDs
6. [x] Parameter Verification: Matched API requirements
## Conclusion
All tools ready for integration.
# Create feature branch
git checkout -b feature/add-[api-name]-tools
# Verify clean state
git status
Commit structure:
# Stage tool files
git add src/tooluniverse/[api_name]_tool.py
git add src/tooluniverse/data/[api_name]_tools.json
git add src/tooluniverse/default_config.py
# Commit with descriptive message
git commit -m "$(cat <<'EOF'
Add [API Name] tools for [domain]
Implements X tools for [API Name] API:
- [APIName]_operation1: Description
- [APIName]_operation2: Description
- [APIName]_operation3: Description
API Details:
- Base URL: [URL]
- Authentication: [Method]
- Documentation: [URL]
Coverage:
- Addresses gap in [domain] tools
- Enables [use cases]
Validation:
- All tests passing (X/X passed)
- 100% schema validation
- Real test examples verified
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
EOF
)"
PR Template:
# Add [API Name] Tools
## Summary
Adds X new tools integrating the [API Name] API for [domain] research.
## Motivation
Current ToolUniverse has limited coverage in [domain]. These tools fill critical gaps:
- Gap 1: [Description]
- Gap 2: [Description]
## API Information
- **Name**: [API Name]
- **Base URL**: [URL]
- **Documentation**: [URL]
- **Authentication**: [Method]
- **Rate Limits**: [Details]
- **License**: [License type]
## Tools Added
| Tool Name | Operation | Description |
|-----------|-----------|-------------|
| [APIName]_operation1 | operation1 | [Description] |
| [APIName]_operation2 | operation2 | [Description] |
## Validation Results
✅ All tests passing (X/X passed)
✅ 100% schema validation
✅ Real test examples verified
✅ devtu compliance checklist complete
### Test Output
Testing [APIName] tools...
Total: X tests
Passed: X (100.0%)
Failed: 0
Schema invalid: 0
## Files Changed
- `src/tooluniverse/[api_name]_tool.py` - Tool implementation
- `src/tooluniverse/data/[api_name]_tools.json` - Tool configurations
- `src/tooluniverse/default_config.py` - Registration
## Discovery & Prioritization
- **Discovery Score**: [Score]/100
- **Priority**: High
- **Rationale**: [Why this API was prioritized]
## Usage Examples
```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Example 1: [Operation 1]
result = tu.tools.[APIName]_operation1(
    operation="operation1",
    param1="value"
)

# Example 2: [Operation 2]
result = tu.tools.[APIName]_operation2(
    operation="operation2",
    param1="value"
)
```
[Any special considerations, limitations, or future enhancements]
Generated by devtu-auto-discover-apis skill
Discovery Report: [link to discovery_report.md]
Validation Report: [link to validation_report.md]
### Step 4.4: Push and Create PR
```bash
# Push branch to remote
git push -u origin feature/add-[api-name]-tools

# Create PR using gh CLI
gh pr create \
  --title "Add [API Name] tools for [domain]" \
  --body-file pr_description.md \
  --label "enhancement,tools"

# Open the PR in the browser
gh pr view --web
```
# config.yaml (optional)
discovery:
  focus_domains:
    - "metabolomics"
    - "single-cell"
  exclude_domains:
    - "deprecated_category"
  max_apis_per_batch: 5
search:
  max_results_per_query: 20
  include_academic_sources: true
  date_filter: "2024-2026"
creation:
  architecture: "multi-operation"  # or "auto-detect"
  include_async_support: true
  timeout_seconds: 30
validation:
  run_integration_tests: true
  require_100_percent_pass: true
  max_retries: 3
integration:
  auto_create_pr: false  # Require manual approval
  branch_prefix: "feature/add-"
  pr_labels: ["enhancement", "tools"]
Required:
Optional:
focus_domains: List of specific domains to prioritize
api_names: Specific APIs to integrate (skip discovery)
batch_size: Number of APIs to process in one run
auto_approve: Skip human approval gates (not recommended)
discovery_report.md
[api_name]_tool.py (per API)
[api_name]_tools.json (per API)
validation_report.md (per API)
pr_description.md (per batch)
Extended Reference: For detailed tool tables, examples, and templates, read REFERENCE.md in this skill directory. The agent can access it via:
read skills/devtu-auto-discover-apis/REFERENCE.md