skills/skillxiv-v0.0.2-claude-opus-4.6/evocua-agent-learning/SKILL.md
Train autonomous agents to use computers by generating synthetic task experiences and iterating on them, achieving 56.7% success on OSWorld benchmarks through scalable experience-driven optimization. Use when you need agents that autonomously learn complex computer interaction patterns without manual task curation.
npx skillsauth add ADu2021/skillXiv evocua-agent-learning
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables training computer-use agents that autonomously generate diverse synthetic tasks and learn through iterative cycles, significantly improving performance on real-world benchmarks.
EvoCUA evolves agents through a feedback loop: generate synthetic tasks → train agent → evaluate performance → refine task generation. This removes the bottleneck of manual task creation.
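The "refine task generation" step of this loop is not spelled out here, so the following is only a minimal sketch of one way it could work, under stated assumptions: tasks are reduced to category names, and the generator reweights categories by how often the agent fails on them. The class `WeightedTaskSampler`, its `refine` method, and the `floor` parameter are hypothetical names chosen for illustration, not part of EvoCUA.

```python
import random


class WeightedTaskSampler:
    """Hypothetical task generator that favors categories the agent fails on."""

    def __init__(self, categories, seed=0):
        # Start with uniform sampling weights over all task categories.
        self.weights = {category: 1.0 for category in categories}
        self.rng = random.Random(seed)

    def create_tasks(self, n):
        # Draw n task categories in proportion to their current weights.
        categories = list(self.weights)
        weights = [self.weights[c] for c in categories]
        return self.rng.choices(categories, weights=weights, k=n)

    def refine(self, success_rates, floor=0.1):
        # Assumption: weight each category by its failure rate, so weaker
        # areas are sampled more often. The floor keeps every category alive.
        for category, rate in success_rates.items():
            self.weights[category] = max(floor, 1.0 - rate)


sampler = WeightedTaskSampler(["browser", "files"])
sampler.refine({"browser": 0.9, "files": 0.2})
tasks = sampler.create_tasks(5)
```

After `refine`, the "files" category (20% success) is sampled far more often than "browser" (90% success), which is the sense in which evaluation feeds back into generation.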
The approach combines two stages:
Generate diverse synthetic tasks and create a feedback loop for agent training:
# Pseudocode for the EvoCUA training loop
class EvoCUA:
    def __init__(self, agent, task_generator):
        self.agent = agent
        self.task_generator = task_generator

    def train_iteration(self, num_synthetic_tasks=100):
        # Generate diverse synthetic tasks
        tasks = self.task_generator.create_tasks(num_synthetic_tasks)

        # Train the agent on the synthetic tasks
        for task in tasks:
            trajectory = self.agent.execute(task)
            self.agent.learn_from(trajectory)

        # Evaluate on real benchmarks (OSWorld)
        real_score = self.evaluate_on_real_tasks()
        return real_score

    def evaluate_on_real_tasks(self):
        # Not specified in the source; plug in benchmark evaluation (e.g. OSWorld) here
        raise NotImplementedError
Track agent improvement across iterations. Through multiple rounds of synthetic task generation and refinement, the success rate on OSWorld reached 56.7%, improving on prior approaches.
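Tracking improvement across rounds can be sketched as a small driver that records the score of each iteration. This is an illustrative helper, not the paper's procedure: the function name `run_rounds`, the `max_rounds` and `min_gain` parameters, and the stop-on-plateau rule are all assumptions. It expects any zero-argument callable that returns a benchmark score.

```python
def run_rounds(train_iteration, max_rounds=5, min_gain=0.0):
    """Call train_iteration() repeatedly and record each round's score."""
    history = []
    for _ in range(max_rounds):
        score = train_iteration()
        # Assumed stopping rule: quit once a round fails to beat the
        # previous score by more than min_gain.
        plateaued = bool(history) and score - history[-1] <= min_gain
        history.append(score)
        if plateaued:
            break
    return history


# Stand-in for a real training round: replays a fixed list of scores.
fake_scores = iter([0.3, 0.4, 0.4, 0.5])
history = run_rounds(lambda: next(fake_scores))
```

With a trainer object that exposes a `train_iteration` method returning a score, the call would be `run_rounds(trainer.train_iteration)`. Keeping the full history rather than only the best score makes regressions between rounds visible.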
This paper addresses the challenge that manual task curation doesn't scale for training generally-capable computer-use agents. By automating task generation, agents can improve through iterative learning loops similar to human learning through varied experiences.