skills/cua-skill-develop-skills-computer/SKILL.md
Build reusable, parameterized skill libraries for computer-using agents (CUAs). Decomposes GUI automation into Skill Cells (intent), Parameterized Execution Graphs (actions), and Skill Composition Graphs (chaining). Use when: 'build a skill library for desktop automation', 'create reusable GUI action primitives', 'design a computer-using agent with skill retrieval', 'structure browser automation as composable skills', 'add failure recovery to my GUI agent', 'make my automation agent learn from past failures'.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add ndpvt-web/arxiv-claude-skills cua-skill-develop-skills-computer
This skill teaches Claude to design and implement structured, reusable skill libraries for autonomous agents that operate computer interfaces (browsers, desktops, web apps). Based on Microsoft's CUA-Skill framework, the core idea is to decompose GUI automation into three layers: a Skill Cell capturing user intent, a Parameterized Execution Graph encoding concrete GUI actions with placeholder arguments, and a Skill Composition Graph defining how skills chain together. This architecture replaces brittle, monolithic automation scripts with modular, retrieval-friendly skill units that an LLM planner can dynamically select, parameterize, and compose at runtime.
The Three-Layer Skill Abstraction. CUA-Skill decomposes every automation capability into three components. The Skill Cell is a metadata envelope capturing the skill's name, natural-language intent description, preconditions (what screen state must hold before execution), postconditions (what should be true after), and a typed argument schema with placeholders. The Parameterized Execution Graph (PEG) is an ordered sequence of concrete GUI actions (click, type, scroll, wait, assert) where each action references elements via locators and binds values from the argument schema. The PEG supports conditional branching so the same skill can handle variant UI states. The Skill Composition Graph (SCG) encodes how skills chain: which skills must complete before others, how outputs flow as inputs to downstream skills, and what alternative paths exist when a skill fails.
Dynamic Retrieval and Argument Instantiation. At runtime, an LLM planner receives the user's task and current UI state (screenshot or DOM), then retrieves the best-matching skill from the library using semantic similarity on intent descriptions. The planner instantiates the skill's argument placeholders using context from the task instruction and observed UI elements. This separation of skill definition from argument binding means skills are written once and reused across many tasks with different parameters.
Memory-Aware Failure Recovery. The agent maintains an execution log recording each skill invocation, its parameters, and whether it succeeded or failed. When a postcondition check fails, the agent consults this log to avoid retrying the exact same parameters. It can backtrack to a known-good state, try alternative argument values, or escalate to a different skill entirely. This persistent memory prevents infinite retry loops, a common failure mode in naive GUI agents.
Audit the target application's UI surface. Enumerate the screens, dialogs, and workflows the agent must handle. Group related actions into functional clusters (e.g., "file management," "form submission," "navigation"). Each cluster becomes a candidate skill.
Define Skill Cells for each capability. For every skill, write a JSON/YAML metadata block containing: name (kebab-case identifier), intent (1-2 sentence natural-language description), preconditions (list of UI state assertions that must hold), postconditions (list of expected outcomes), and args (typed argument schema with descriptions and defaults).
Build the Parameterized Execution Graph (PEG). For each skill, define the ordered sequence of GUI actions. Each action specifies: action_type (click, type, scroll, select, wait, assert), locator (CSS selector, XPath, accessibility label, or coordinates), value (literal or {{arg_name}} placeholder), and optional condition (branch predicate based on screen state). Keep PEGs as short as possible -- a skill should do one coherent thing.
Design the Skill Composition Graph (SCG). Define edges between skills: depends_on (skill B requires skill A to complete first), data_flow (skill A's output field maps to skill B's input arg), and fallback (if skill A fails, try skill C instead). Store this as an adjacency list or DAG definition.
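As a rough illustration of how depends_on and fallback edges might be consumed at runtime, the sketch below orders steps with the standard-library `graphlib.TopologicalSorter` and tries a fallback skill once when a step fails. The `composition` data, `execution_order`, `run_composition`, and the `run_skill` callback are hypothetical names introduced here, not part of CUA-Skill; data_flow edges are left out for brevity.

```python
from graphlib import TopologicalSorter

# Hypothetical composition using the depends_on / fallback edges described above.
composition = [
    {"id": "compose", "skill": "compose-email"},
    {"id": "attach", "skill": "attach-file", "depends_on": ["compose"]},
    {"id": "send", "skill": "send-draft", "depends_on": ["attach"], "fallback": "save-draft"},
]


def execution_order(steps):
    """Return step ids in an order that respects every depends_on edge."""
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(graph).static_order())


def run_composition(steps, run_skill):
    """Run steps in dependency order; on failure, try the step's fallback once.

    run_skill(skill_name) -> bool is an assumed callback that executes one skill.
    """
    by_id = {step["id"]: step for step in steps}
    results = {}
    for step_id in execution_order(steps):
        step = by_id[step_id]
        ok = run_skill(step["skill"])
        if not ok and "fallback" in step:
            ok = run_skill(step["fallback"])
        results[step_id] = ok
        if not ok:
            # Simplification: stop rather than run steps whose dependencies may be unmet.
            break
    return results
```

Stopping at the first unrecovered failure is a simplification; a fuller version could skip only the steps that transitively depend on the failed one.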
Implement a skill registry with semantic search. Store all Skill Cells in a searchable index. At minimum, embed each skill's intent field using a text embedding model and support nearest-neighbor retrieval. For smaller libraries (<100 skills), keyword matching on intent + preconditions is often sufficient.
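For the small-library case, keyword matching could look roughly like the following. This is a minimal sketch that assumes skills are plain dicts with `name`, `intent`, and `preconditions` keys; `tokenize` and `keyword_retrieve` are illustrative names, and the overlap score is just one simple choice.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(query, skills, top_k=3):
    """Rank skills by token overlap between the query and intent + preconditions."""
    query_tokens = tokenize(query)
    scored = []
    for skill in skills:
        doc = skill["intent"] + " " + " ".join(skill.get("preconditions", []))
        overlap = len(query_tokens & tokenize(doc))
        if overlap:  # skills sharing no token with the query are dropped
            scored.append((overlap / len(query_tokens), skill["name"]))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]
```

Because the score counts raw token overlap, stop words such as "the" or "to" can inflate matches; filtering them out is a natural next refinement.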
Build the planner loop. The LLM planner takes as input: (a) the user's task instruction, (b) the current UI state (screenshot description or DOM snapshot), (c) the execution history so far. It outputs: the skill to invoke and concrete argument values. Prompt the planner to reason about which preconditions are currently satisfied to select the right skill.
Implement argument instantiation. Parse the selected skill's args schema, then fill each placeholder from the planner's output. Validate types and required fields before execution. Reject and re-prompt the planner if instantiation fails.
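One possible shape for the validation step is sketched below. It assumes the args schema uses the `type`, `required`, and `default` fields from the Skill Cell definition; `TYPE_MAP` and `instantiate_args` are hypothetical names, and the supported type list is an assumption rather than anything the framework prescribes.

```python
# Assumed mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "boolean": bool, "integer": int, "number": (int, float)}


def instantiate_args(schema, proposed):
    """Fill defaults and validate planner-proposed arguments.

    Returns (args, errors). A non-empty errors list means the planner
    should be re-prompted with those messages.
    """
    args, errors = {}, []
    for name, spec in schema.items():
        if name in proposed:
            value = proposed[name]
        elif "default" in spec:
            value = spec["default"]
        elif spec.get("required", False):
            errors.append(f"missing required argument '{name}'")
            continue
        else:
            continue  # optional argument with no default: leave it unset
        type_name = spec.get("type", "string")
        expected = TYPE_MAP.get(type_name, object)
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_for_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_for_number or not isinstance(value, expected):
            errors.append(
                f"argument '{name}' should be {type_name}, got {type(value).__name__}"
            )
        else:
            args[name] = value
    for name in proposed:
        if name not in schema:
            errors.append(f"unknown argument '{name}'")
    return args, errors
```

Returning the error messages rather than raising makes it straightforward to paste them back into the planner prompt when re-prompting.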
Execute the PEG with postcondition checking. Run each action in the PEG sequentially, substituting {{arg_name}} placeholders with instantiated values. After the final action, evaluate all postconditions. If any postcondition fails, mark the execution as failed and log the full context.
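The execution step might be sketched as follows. The `driver` callback stands in for whatever actually performs GUI actions (Playwright, Puppeteer, and so on), and `check` stands in for a postcondition evaluator; both are assumptions, as are the names `render`, `condition_holds`, and `run_peg`. Only the simple `{{name}} == literal` condition form is handled.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template, args):
    """Replace every {{name}} with str(args[name]); raises KeyError if unbound."""
    return PLACEHOLDER.sub(lambda m: str(args[m.group(1)]), template)


def condition_holds(condition, args):
    """Evaluate the '{{name}} == literal' form only (case-insensitive)."""
    left, _, right = render(condition, args).partition("==")
    return left.strip().lower() == right.strip().lower()


def run_peg(peg, args, driver, postconditions=(), check=lambda text: True):
    """Run PEG steps in order, then evaluate postconditions.

    driver(action, locator, value) performs one GUI action (assumed callback).
    check(postcondition_text) -> bool evaluates one postcondition (assumed callback).
    """
    executed = []
    for step in peg:
        if "condition" in step and not condition_holds(step["condition"], args):
            continue  # branch not taken for these arguments
        value = render(step["value"], args) if "value" in step else None
        driver(step["action"], step["locator"], value)
        executed.append(step["action"])
    failed = [p for p in postconditions if not check(p)]
    return {"success": not failed, "executed": executed, "failed_postconditions": failed}


# Usage with a recording driver instead of a real browser.
calls = []
peg = [
    {"action": "type", "locator": "input.to", "value": "{{to}}"},
    {"action": "click", "locator": "button.send", "condition": "{{send}} == true"},
]
result = run_peg(
    peg,
    {"to": "a@example.com", "send": False},
    driver=lambda action, locator, value: calls.append((action, locator, value)),
)
```

With `send` set to `False`, the conditional click is skipped and only the `type` action reaches the driver.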
Apply memory-aware failure recovery. On failure, append {skill_name, args, failure_reason, ui_state_snapshot} to the failure log. Before the planner selects the next action, inject recent failures into its context with the instruction: "The following attempts failed -- do not repeat them. Choose a different approach or different parameters." If the same skill has failed 2+ times, force the planner to select an alternative skill or escalate.
Iterate and expand the skill library. After initial deployment, review execution logs to find: (a) tasks where no skill matched (gaps to fill), (b) skills that frequently fail (need refinement), (c) recurring multi-skill sequences that should become a single composite skill. Add new skills to the registry and update composition edges.
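Finding recurring multi-skill sequences, item (c) above, can start as a simple n-gram count over execution logs. The sketch below assumes each session is already reduced to an ordered list of skill names; `recurring_sequences` and its thresholds are illustrative, not part of the framework.

```python
from collections import Counter


def recurring_sequences(sessions, length=2, min_count=2):
    """Count contiguous skill sequences of a given length across sessions.

    Returns (sequence, count) pairs seen at least min_count times,
    most frequent first.
    """
    counts = Counter()
    for session in sessions:
        for i in range(len(session) - length + 1):
            counts[tuple(session[i:i + length])] += 1
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


# Hypothetical logs: each inner list is one task's ordered skill invocations.
sessions = [
    ["open-inbox", "compose-email", "attach-file"],
    ["compose-email", "attach-file", "send-draft"],
    ["open-inbox", "archive-email"],
]
candidates = recurring_sequences(sessions)
```

Here the pair `compose-email` followed by `attach-file` appears in two sessions, so it surfaces as a candidate for a composite skill.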
Example 1: Building a browser automation skill library
User: "I'm building a Playwright-based agent that helps users manage their email. Create a skill library structure for common Gmail actions."
Approach:
Output structure:
# skills/compose-email.yaml
name: compose-email
intent: "Compose and send a new email to specified recipients with subject and body"
preconditions:
  - "Gmail inbox or any Gmail page is loaded"
postconditions:
  - "Sent confirmation banner is visible OR draft is saved"
args:
  to:
    type: string
    required: true
    description: "Recipient email address(es), comma-separated"
  subject:
    type: string
    required: true
    description: "Email subject line"
  body:
    type: string
    required: true
    description: "Email body text"
  send:
    type: boolean
    default: true
    description: "Whether to send immediately or save as draft"
peg:
  - action: click
    locator: "[gh='cm'] div.T-I.T-I-KE"  # Compose button
    wait_after: 1000
  - action: type
    locator: "input[aria-label='To recipients']"
    value: "{{to}}"
  - action: type
    locator: "input[name='subjectbox']"
    value: "{{subject}}"
  - action: type
    locator: "div[aria-label='Message Body']"
    value: "{{body}}"
  - action: click
    condition: "{{send}} == true"
    locator: "div[aria-label='Send']"
  - action: assert
    condition: "{{send}} == true"
    locator: "span.bAq"  # "Message sent" banner
    timeout: 5000
# composition/email-with-attachment.yaml
name: email-with-attachment
skills:
  - id: compose
    skill: compose-email
    args:
      to: "{{to}}"
      subject: "{{subject}}"
      body: "{{body}}"
      send: false  # Don't send yet
  - id: attach
    skill: attach-file
    depends_on: [compose]
    args:
      file_path: "{{attachment_path}}"
  - id: send
    skill: send-draft
    depends_on: [attach]
    fallback: save-draft  # If send fails, at least save
Example 2: Adding failure recovery to an existing automation agent
User: "My Puppeteer agent keeps retrying the same failed login action in a loop. How do I add CUA-Skill style failure recovery?"
Approach:
Output:
interface ExecutionRecord {
  skillName: string;
  args: Record<string, unknown>;
  timestamp: number;
  success: boolean;
  failureReason?: string;
  uiStateSnapshot?: string;
}

class FailureMemory {
  private log: ExecutionRecord[] = [];
  private readonly MAX_RETRIES = 2;

  record(entry: ExecutionRecord): void {
    this.log.push(entry);
  }

  getFailuresForSkill(skillName: string): ExecutionRecord[] {
    return this.log.filter(e => e.skillName === skillName && !e.success);
  }

  // True once the same skill has failed MAX_RETRIES times with identical args.
  // Note: comparing via JSON.stringify is sensitive to key order.
  shouldBlockSkill(skillName: string, args: Record<string, unknown>): boolean {
    const failures = this.getFailuresForSkill(skillName);
    const sameArgFailures = failures.filter(
      f => JSON.stringify(f.args) === JSON.stringify(args)
    );
    return sameArgFailures.length >= this.MAX_RETRIES;
  }

  // Summarize the five most recent failures for injection into the planner prompt.
  buildPlannerContext(): string {
    const recentFailures = this.log
      .filter(e => !e.success)
      .slice(-5);
    if (recentFailures.length === 0) return "";
    return [
      "## Recent Failures (do NOT repeat these exact approaches):",
      ...recentFailures.map(f =>
        `- Skill "${f.skillName}" with args ${JSON.stringify(f.args)}: ${f.failureReason}`
      ),
      "Choose different parameters or a different skill entirely."
    ].join("\n");
  }
}

// In the planner loop. captureScreenState, llm, parsePlannerOutput, and
// executeSkill are assumed to be provided by the surrounding agent code.
async function planAndExecute(task: string, memory: FailureMemory, maxReplans = 3) {
  for (let attempt = 0; attempt <= maxReplans; attempt++) {
    const uiState = await captureScreenState();
    const failureContext = memory.buildPlannerContext();
    const plannerResponse = await llm.complete({
      prompt: `Task: ${task}\nUI State: ${uiState}\n${failureContext}\nSelect a skill and provide arguments.`
    });
    const { skillName, args } = parsePlannerOutput(plannerResponse);
    if (memory.shouldBlockSkill(skillName, args)) {
      // Record the blocked proposal so the next prompt actually changes;
      // re-planning with identical context could otherwise loop forever.
      memory.record({
        skillName, args, timestamp: Date.now(), success: false,
        failureReason: "Blocked: retry limit reached for this skill and arguments"
      });
      continue;
    }
    const result = await executeSkill(skillName, args);
    memory.record({ skillName, args, timestamp: Date.now(), ...result });
    return result;
  }
  // Bounded re-planning: surface the problem instead of retrying indefinitely.
  throw new Error(`No unblocked skill found after ${maxReplans} re-plans`);
}
Example 3: Designing a skill registry with retrieval
User: "I have 50 automation skills defined as YAML files. How do I build a retrieval system so my agent can find the right skill?"
Approach:
Output:
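from dataclasses import dataclass
from pathlib import Path

import numpy as np
import yaml  # PyYAML; the skills are stored as YAML files
from sentence_transformers import SentenceTransformer


@dataclass
class SkillCell:
    name: str
    intent: str
    preconditions: list[str]
    postconditions: list[str]
    args: dict
    peg: list[dict]


class SkillRegistry:
    def __init__(self, skills_dir: str):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")
        self.skills: list[SkillCell] = self._load_skills(skills_dir)
        # Normalize embeddings so the dot product below is cosine similarity
        self.embeddings = self.model.encode(
            [s.intent for s in self.skills], normalize_embeddings=True
        )

    def _load_skills(self, skills_dir: str) -> list[SkillCell]:
        # Load every *.yaml file in the directory as one Skill Cell
        skills = []
        for path in sorted(Path(skills_dir).glob("*.yaml")):
            data = yaml.safe_load(path.read_text(encoding="utf-8"))
            skills.append(SkillCell(
                name=data["name"],
                intent=data["intent"],
                preconditions=data.get("preconditions", []),
                postconditions=data.get("postconditions", []),
                args=data.get("args", {}),
                peg=data.get("peg", []),
            ))
        return skills

    def retrieve(self, query: str, ui_state: dict, top_k: int = 5) -> list[SkillCell]:
        # Semantic retrieval on intent
        q_emb = self.model.encode([query], normalize_embeddings=True)
        scores = np.dot(self.embeddings, q_emb.T).flatten()
        # Over-fetch 2x candidates so precondition filtering can still fill top_k
        top_indices = np.argsort(scores)[::-1][:top_k * 2]
        candidates = [self.skills[i] for i in top_indices]
        # Filter by precondition satisfaction
        viable = []
        for skill in candidates:
            if self._preconditions_met(skill.preconditions, ui_state):
                viable.append(skill)
            if len(viable) >= top_k:
                break
        return viable

    def _preconditions_met(self, preconditions: list[str], ui_state: dict) -> bool:
        # Check each precondition against current UI state
        for pre in preconditions:
            if not self._evaluate_condition(pre, ui_state):
                return False
        return True

    def _evaluate_condition(self, condition: str, ui_state: dict) -> bool:
        # Placeholder (assumption): ui_state["satisfied"] lists precondition strings
        # already verified elsewhere. Replace with real DOM or screenshot checks.
        return condition in ui_state.get("satisfied", [])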
| Failure Mode | Detection | Recovery Strategy |
|---|---|---|
| Skill not found | Retrieval returns empty or low-confidence matches | Fall back to raw LLM planning without skill library; log the gap for future skill creation |
| Precondition not met | UI state check fails before PEG execution | Navigate to required state first (compose a "navigate-to" skill), or select a different skill whose preconditions match |
| Action element not found | Locator timeout during PEG execution | Retry with fallback locators if defined; capture screenshot and ask planner to identify the element |
| Postcondition not met | Assert fails after PEG completes | Log failure with full context; planner re-selects with failure memory injected |
| Argument instantiation fails | Type validation or required field missing | Re-prompt the planner with the argument schema and validation error message |
| Infinite retry loop | Same skill+args attempted 3+ times | Hard block that skill+args combination; force alternative approach or surface error to user |
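The "retry with fallback locators" strategy can be as small as the sketch below. `find_with_fallbacks` is a hypothetical helper, and `find` is an assumed callback that returns an element or `None`; a real driver's lookup call would need to be wrapped to behave that way.

```python
def find_with_fallbacks(locators, find):
    """Try each locator in order.

    Returns (locator, element) for the first hit, or (None, None) if none match.
    find(locator) is an assumed callback returning an element or None.
    """
    for locator in locators:
        element = find(locator)
        if element is not None:
            return locator, element
    return None, None


# Hypothetical page model: only the aria-label locator currently matches.
page = {"div[aria-label='Send']": "<send button>"}
hit = find_with_fallbacks(["div.send-btn", "div[aria-label='Send']"], page.get)
```

Returning the locator that matched, not just the element, lets the agent log which fallback succeeded and later promote it to the primary locator.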
Paper: CUA-Skill: Develop Skills for Computer Using Agent -- Chen et al., 2026. Focus on Section 3 (Skill Cell / PEG / SCG architecture), Section 4 (Agent planning loop and failure recovery), and the WindowsAgentArena evaluation for concrete performance data. Project page: https://microsoft.github.io/cua_skill/