skills/skillxiv-v0.0.2-claude-opus-4.6/automated-tool-learning-rl/SKILL.md
Improves LLM tool-use capabilities through automated environment construction that generates realistic feedback and verifiable rewards for RL-based training without external tools.
npx skillsauth add ADu2021/skillXiv automated-tool-learning-rl
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill improves LLM tool-use abilities by creating automated training environments that generate detailed, verifiable feedback without requiring external tool access. The system constructs realistic task environments through scenario decomposition, document generation, and function integration, then trains models using RL with rewards that evaluate both tool precision and task completion.
Step 1: Design Scenario Decomposition
Break tasks into learnable subtasks:
# Pseudocode for scenario decomposition
class ScenarioDecomposer:
    def __init__(self):
        self.task_library = {}

    def decompose_task(self, complex_task):
        """
        Break complex task into subtasks.

        Args:
            complex_task: High-level task description
        Returns:
            subtasks: List of learnable subtasks
        """
        subtasks = []
        # Parse task intent
        intent = self._parse_intent(complex_task)
        # Identify required tools
        required_tools = self._identify_required_tools(intent)
        # Decompose into steps
        for tool in required_tools:
            subtask = {
                'tool': tool,
                'prerequisites': self._get_prerequisites(tool),
                'success_criteria': self._define_success(tool),
                'complexity': self._estimate_complexity(tool)
            }
            subtasks.append(subtask)
        return subtasks

    def _parse_intent(self, task):
        """
        Extract task intent from description.
        """
        # Use NLP or simple pattern matching; here, a token frequency count
        keywords = {}
        for token in task.lower().split():
            keywords[token] = keywords.get(token, 0) + 1
        return keywords

    def _identify_required_tools(self, intent):
        """
        Determine which tools are needed.
        """
        tool_keywords = {
            'file': ['read', 'write', 'open', 'save', 'create'],
            'api': ['fetch', 'request', 'query', 'retrieve'],
            'database': ['query', 'insert', 'update', 'select'],
            'compute': ['calculate', 'process', 'analyze'],
        }
        required = []
        for tool, keywords in tool_keywords.items():
            # `intent` is a dict keyed by token, so `in` tests exact token matches
            if any(kw in intent for kw in keywords):
                required.append(tool)
        return required

    def _get_prerequisites(self, tool):
        """
        Get prerequisites for tool use.
        """
        prereq_map = {
            'file': ['file_path'],
            'api': ['endpoint', 'credentials'],
            'database': ['connection_string'],
            'compute': ['input_data']
        }
        return prereq_map.get(tool, [])

    def _define_success(self, tool):
        """
        Define success criteria for tool use.
        """
        criteria = {
            'file': ['file_exists', 'content_correct'],
            'api': ['status_code_200', 'response_valid'],
            'database': ['rows_affected > 0', 'query_valid'],
            'compute': ['result_accurate', 'performance_ok']
        }
        return criteria.get(tool, [])

    def _estimate_complexity(self, tool):
        """
        Estimate learning complexity.
        """
        complexity = {
            'file': 1,
            'api': 2,
            'database': 3,
            'compute': 2
        }
        return complexity.get(tool, 2)
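To make the keyword matching concrete, here is a minimal standalone sketch of the tool-identification step. The function name `required_tools`, the punctuation stripping, and the easiest-first ordering are illustrative assumptions, not part of the skill; the keyword and complexity tables are copied from the class above.

```python
import string
from typing import List

# Keyword and complexity tables mirroring ScenarioDecomposer.
TOOL_KEYWORDS = {
    "file": ["read", "write", "open", "save", "create"],
    "api": ["fetch", "request", "query", "retrieve"],
    "database": ["query", "insert", "update", "select"],
    "compute": ["calculate", "process", "analyze"],
}
COMPLEXITY = {"file": 1, "api": 2, "database": 3, "compute": 2}


def required_tools(task: str) -> List[str]:
    """Return the tools whose keywords appear in the task, easiest first."""
    # Assumption: strip punctuation so a token like "report," still matches.
    tokens = {tok.strip(string.punctuation) for tok in task.lower().split()}
    tools = [tool for tool, kws in TOOL_KEYWORDS.items() if tokens & set(kws)]
    # Assumption: order by estimated complexity to suggest a learning order.
    return sorted(tools, key=lambda tool: COMPLEXITY[tool])


print(required_tools("Fetch the report, then save it to disk."))  # ['file', 'api']
```

Note that an ambiguous keyword such as "query" selects both the `api` and `database` tools, which is also how the class above behaves.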
Step 2: Implement Document Generation
Create tool documentation automatically:
# Pseudocode for automated documentation
class DocumentationGenerator:
    def __init__(self):
        self.doc_templates = {}

    def generate_tool_documentation(self, tool_name, tool_spec):
        """
        Generate documentation for tool.

        Args:
            tool_name: Name of tool
            tool_spec: Tool specification
        Returns:
            documentation: Generated markdown documentation
        """
        doc = f"# {tool_name.title()} Tool\n\n"
        # Overview
        doc += f"## Overview\n{tool_spec.get('description', 'Tool for ...')}\n\n"
        # Parameters
        doc += "## Parameters\n"
        for param, spec in tool_spec.get('parameters', {}).items():
            doc += f"- **{param}** ({spec.get('type', 'string')}): {spec.get('description', '')}\n"
        # Return value
        doc += f"\n## Returns\n{tool_spec.get('return_description', 'Result object')}\n"
        # Examples
        doc += f"\n## Examples\n```\n{self._generate_example(tool_name, tool_spec)}\n```\n"
        # Common errors
        doc += "\n## Common Errors\n"
        for error in tool_spec.get('errors', []):
            doc += f"- {error}\n"
        return doc

    def _generate_example(self, tool_name, spec):
        """
        Generate example usage.
        """
        params = []
        for param, pspec in spec.get('parameters', {}).items():
            example_val = pspec.get('example', 'value')
            params.append(f"{param}={example_val}")
        return f"result = {tool_name}(" + ', '.join(params) + ")"

    def generate_api_spec(self, tool_name, endpoints):
        """
        Generate API specification.

        Args:
            tool_name: API name
            endpoints: List of endpoints
        Returns:
            spec: API documentation
        """
        spec = f"# {tool_name} API Specification\n\n"
        for endpoint in endpoints:
            spec += f"## {endpoint['method']} {endpoint['path']}\n"
            spec += f"{endpoint.get('description', '')}\n"
            spec += f"**Parameters**: {endpoint.get('params', [])}\n"
            spec += f"**Response**: {endpoint.get('response', {})}\n\n"
        return spec
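The generator never states the shape of `tool_spec`; it can only be inferred from the `.get()` lookups. The sketch below shows one spec that fits those lookups, together with a standalone helper that renders the parameter bullets the same way. The `read_file_spec` contents and the helper name `render_parameters` are hypothetical examples.

```python
# Hypothetical spec for a read_file tool; keys mirror the .get() lookups above.
read_file_spec = {
    "description": "Read a text file and return its content.",
    "parameters": {
        "file_path": {
            "type": "string",
            "description": "Path of the file to read",
            "example": "'notes.txt'",
        },
    },
    "return_description": "File content as a string",
    "errors": ["File not found", "Permission denied"],
}


def render_parameters(spec):
    """Render the parameter list as markdown bullets."""
    lines = [
        f"- **{name}** ({p.get('type', 'string')}): {p.get('description', '')}"
        for name, p in spec.get("parameters", {}).items()
    ]
    return "\n".join(lines)


print(render_parameters(read_file_spec))
# - **file_path** (string): Path of the file to read
```

The `example` value is written with its own quotes because the example-call generator above interpolates it verbatim.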
Step 3: Build Function Integration Layer
Create actual tool implementations:
# Pseudocode for function integration
import requests


class FunctionIntegration:
    def __init__(self):
        self.tool_functions = {}
        self.tool_results = {}

    def integrate_file_operations(self):
        """
        Integrate file operation tools.
        """
        def read_file(file_path: str) -> str:
            """Read file content; on failure, return an 'Error: ...' string."""
            try:
                with open(file_path, 'r') as f:
                    return f.read()
            except Exception as e:
                return f"Error: {str(e)}"

        def write_file(file_path: str, content: str) -> bool:
            """Write content to file; return True on success, False on failure."""
            try:
                with open(file_path, 'w') as f:
                    f.write(content)
                return True
            except Exception:
                return False

        self.tool_functions['read_file'] = read_file
        self.tool_functions['write_file'] = write_file

    def integrate_api_calls(self):
        """
        Integrate API calling tools.
        """
        def api_request(method: str, endpoint: str, params: dict = None):
            """Make API request."""
            try:
                # A timeout keeps a stalled endpoint from blocking a rollout
                if method.upper() == 'GET':
                    response = requests.get(endpoint, params=params, timeout=10)
                elif method.upper() == 'POST':
                    response = requests.post(endpoint, json=params, timeout=10)
                else:
                    return {'error': f'Unknown method: {method}', 'success': False}
                return {
                    'status_code': response.status_code,
                    'body': response.json(),
                    'success': response.status_code == 200
                }
            except Exception as e:
                # Covers network errors and non-JSON response bodies
                return {'error': str(e), 'success': False}

        self.tool_functions['api_request'] = api_request

    def integrate_database_tools(self):
        """
        Integrate database tools.
        """
        def query_database(connection_str: str, query: str):
            """Execute database query."""
            # Simplified stub: no database is contacted; a fixed result is returned
            return {
                'rows_affected': 1,
                'result': [],
                'success': True
            }

        self.tool_functions['query_database'] = query_database

    def execute_tool(self, tool_name: str, **kwargs):
        """
        Execute integrated tool.

        Args:
            tool_name: Name of tool
            **kwargs: Tool parameters
        Returns:
            result: Tool execution result
        """
        if tool_name not in self.tool_functions:
            return {'error': f'Tool not found: {tool_name}'}
        tool_fn = self.tool_functions[tool_name]
        try:
            result = tool_fn(**kwargs)
            self.tool_results[tool_name] = result
            return result
        except Exception as e:
            # For example, a TypeError raised by missing or unexpected parameters
            return {'error': str(e)}
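Because the skill's stated goal is training without external tool access, the file tools can also be simulated. The following is a minimal sketch, assuming an in-memory dict stands in for the filesystem and that every tool returns a dict with a `success` key; the class name `InMemoryFileTools` and the result fields are hypothetical, not taken from the paper.

```python
class InMemoryFileTools:
    """Simulated file tools backed by a dict; nothing touches the real disk."""

    def __init__(self):
        self.files = {}  # maps file path -> content

    def write_file(self, file_path, content):
        """Store content under file_path and report how much was written."""
        self.files[file_path] = content
        return {"success": True, "chars_written": len(content)}

    def read_file(self, file_path):
        """Return stored content, or a structured error if the path is unknown."""
        if file_path not in self.files:
            return {"success": False, "error": f"No such file: {file_path}"}
        return {"success": True, "content": self.files[file_path]}


env = InMemoryFileTools()
env.write_file("notes.txt", "hello")
print(env.read_file("notes.txt"))    # {'success': True, 'content': 'hello'}
print(env.read_file("missing.txt"))  # {'success': False, 'error': 'No such file: missing.txt'}
```

Returning a dict from every tool keeps the result shape uniform, so a reward function can check `success` without special-casing strings or booleans.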
Step 4: Implement Verifiable Reward Mechanism
Design reward that evaluates tool use:
# Pseudocode for reward computation
class VerifiableRewardMechanism:
    def __init__(self):
        self.task_success_evaluator = None

    def compute_tool_use_reward(self, action, tool_result, task_state, task_goal):
        """
        Compute reward for tool use action.

        Args:
            action: Tool call with parameters
            tool_result: Result from tool execution
            task_state: Current task state
            task_goal: Goal to achieve
        Returns:
            reward: Scalar reward value
        """
        # Component 1: Tool parameter precision
        parameter_reward = self._evaluate_parameter_precision(action)
        # Component 2: Tool execution success
        execution_reward = self._evaluate_execution_success(tool_result)
        # Component 3: Progress toward goal
        progress_reward = self._evaluate_progress(tool_result, task_state, task_goal)
        # Component 4: Efficiency (not using unnecessary tools)
        efficiency_reward = self._evaluate_efficiency(action, task_state)
        # Combined reward (weights sum to 1.0, so the result stays in [0, 1])
        total_reward = (
            0.25 * parameter_reward +
            0.25 * execution_reward +
            0.35 * progress_reward +
            0.15 * efficiency_reward
        )
        return total_reward

    def _evaluate_parameter_precision(self, action):
        """
        Score parameter correctness as the fraction of valid parameters.
        """
        # Check if parameters are well-formed
        params = action.get('parameters', {})
        valid_params = 0
        total_params = len(params)
        for param_name, param_value in params.items():
            if self._is_valid_parameter(param_name, param_value):
                valid_params += 1
        if total_params == 0:
            return 1.0
        return valid_params / total_params

    def _is_valid_parameter(self, param_name, param_value):
        """
        Check if parameter is valid.
        """
        # Simple validation: a non-empty name and a non-None value
        return bool(param_name) and param_value is not None

    def _evaluate_execution_success(self, tool_result):
        """
        Score execution outcome.
        """
        if not tool_result:
            return 0.0
        if not isinstance(tool_result, dict):
            # read_file returns a str and write_file a bool, so .get() is unavailable
            return 0.0 if str(tool_result).startswith('Error:') else 1.0
        if tool_result.get('success'):
            return 1.0
        elif tool_result.get('status_code') == 200:
            return 1.0
        elif 'error' in tool_result:
            return 0.0
        else:
            return 0.5

    def _evaluate_progress(self, tool_result, task_state, task_goal):
        """
        Measure progress toward goal.
        """
        # Heuristic: fraction of goal words that appear in the tool result
        task_components = task_goal.split()
        result_str = str(tool_result)
        matching_components = sum(
            1 for component in task_components
            if component.lower() in result_str.lower()
        )
        if len(task_components) == 0:
            return 0.5
        return matching_components / len(task_components)

    def _evaluate_efficiency(self, action, task_state):
        """
        Reward efficient tool use.
        """
        # Penalize redundant calls
        tool_name = action.get('tool')
        if task_state.get('last_tool') == tool_name:
            return 0.5  # Penalize repeated calls
        return 1.0

    def compute_trajectory_reward(self, trajectory, task_goal):
        """
        Compute reward for complete trajectory.

        Args:
            trajectory: List of (action, result) pairs
            task_goal: Goal statement
        Returns:
            reward: Total trajectory reward
        """
        step_rewards = []
        task_state = {}
        for action, result in trajectory:
            step_reward = self.compute_tool_use_reward(
                action,
                result,
                task_state,
                task_goal
            )
            step_rewards.append(step_reward)
            task_state['last_tool'] = action.get('tool')
        # Compute trajectory return with a discount factor of 0.99
        trajectory_reward = sum(
            (0.99 ** i) * r for i, r in enumerate(step_rewards)
        )
        return trajectory_reward
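As a worked example of the weighted combination above, the sketch below recomputes one step reward from its four component scores. The weights are the ones used in `compute_tool_use_reward`; the function name `step_reward` and the sample scores are illustrative assumptions.

```python
# Component weights copied from compute_tool_use_reward above.
WEIGHTS = {
    "parameter": 0.25,
    "execution": 0.25,
    "progress": 0.35,
    "efficiency": 0.15,
}


def step_reward(scores):
    """Weighted sum of component scores, each expected to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical step: valid parameters, successful call, half of the goal words
# matched, and a repeated tool (efficiency score 0.5).
scores = {"parameter": 1.0, "execution": 1.0, "progress": 0.5, "efficiency": 0.5}
print(round(step_reward(scores), 4))  # 0.75
```

Here the terms are 0.25 + 0.25 + 0.175 + 0.075 = 0.75. Progress carries the largest weight, so a call that executes cleanly but does not move toward the goal scores at most 0.65.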
Step 5: Implement RL Training Loop
Train tool use through RL:
# Pseudocode for RL training
import torch
import torch.nn.functional as F
from torch.optim import AdamW


class ToolUseRLTrainer:
    def __init__(self, model, tool_integration, reward_fn):
        self.model = model
        self.tools = tool_integration
        self.reward_fn = reward_fn

    def collect_trajectory(self, task_goal, max_steps=10):
        """
        Collect trajectory using model and tools.

        Returns:
            trajectory: List of dicts with 'action', 'result', 'reward', 'step'
        """
        trajectory = []
        task_state = {}
        for step in range(max_steps):
            # Model decides next action
            action = self.model.decide_action(task_goal, task_state)
            # Execute tool
            result = self.tools.execute_tool(
                action['tool'],
                **action.get('parameters', {})
            )
            # Compute reward
            step_reward = self.reward_fn.compute_tool_use_reward(
                action,
                result,
                task_state,
                task_goal
            )
            trajectory.append({
                'action': action,
                'result': result,
                'reward': step_reward,
                'step': step
            })
            task_state['last_result'] = result
            # Record the tool so the efficiency reward can detect repeated calls
            task_state['last_tool'] = action['tool']
            # Check if task complete
            if self._task_complete(result, task_goal):
                break
        return trajectory

    def train_on_trajectories(self, trajectories, num_epochs=3):
        """
        Train model on collected trajectories.

        Args:
            trajectories: List of collected trajectories
            num_epochs: Training epochs
        Returns:
            Mean per-trajectory loss from the final epoch
        """
        optimizer = AdamW(self.model.parameters(), lr=1e-5)
        total_loss = 0.0
        for epoch in range(num_epochs):
            total_loss = 0.0
            for trajectory in trajectories:
                # Compute returns
                returns = self._compute_returns(trajectory)
                for step_idx, step_data in enumerate(trajectory):
                    action = step_data['action']
                    return_val = returns[step_idx]
                    # Forward pass
                    action_logits = self.model.get_action_logits(
                        action['tool'],
                        action.get('parameters', {})
                    )
                    # Policy gradient loss. Summing log_softmax over every entry is a
                    # placeholder; a real implementation would select the
                    # log-probabilities of the tokens that were actually sampled.
                    log_prob = F.log_softmax(action_logits, dim=-1).sum()
                    loss = -log_prob * return_val
                    optimizer.zero_grad()
                    loss.backward()
                    torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
                    optimizer.step()
                    total_loss += loss.item()
        return total_loss / max(len(trajectories), 1)

    def _compute_returns(self, trajectory):
        """
        Compute discounted returns.
        """
        returns = []
        G = 0
        for step in reversed(trajectory):
            G = step['reward'] + 0.99 * G
            returns.insert(0, G)
        return returns

    def _task_complete(self, result, goal):
        """
        Check if task is complete.
        """
        # File tools return str or bool rather than dict, so guard the .get() call
        if not isinstance(result, dict):
            return False
        return result.get('success', False)
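The discounted returns used as weights in the policy-gradient loss can be checked by hand. This is a minimal standalone sketch of the same recurrence as `_compute_returns`, operating on a plain list of rewards; the function name `rewards_to_go` and the sample rewards are illustrative assumptions.

```python
def rewards_to_go(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


returns = rewards_to_go([1.0, 0.0, 1.0])
print([round(g, 4) for g in returns])  # [1.9801, 0.99, 1.0]
```

Working backwards: the last step keeps its own reward (1.0), the middle step gets 0 + 0.99 × 1.0 = 0.99, and the first gets 1.0 + 0.99 × 0.99 = 1.9801. Appending and reversing once avoids the repeated `insert(0, ...)` calls, which matters only for long trajectories.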
Hyperparameters and Configuration:
When to Use Automated Tool Learning:
When NOT to Use:
Implementation Notes:
Paper: Feedback-Driven Tool-Use Improvements via Automated Build Environments
ArXiv: 2508.08791
Performance: RL training on synthetic environments improves tool-use capability while preserving general abilities