skills/ai/improve/SKILL.md
Use when producing agent/LLM evals, synthetic simulation data, or self-improvement pipelines for prompts, code, skills, agents, harnesses, and workflows. Covers AgentEvals/AgentV, Agent Skills evals, ASSERT, GEPA, Trace, VISTA, Agent Lightning, SkillOpt, Simula-style data design, progressive disclosure, deterministic workspaces, and release evidence.
USE FOR: eval creation, EVAL.yaml, AgentEvals, AgentV, evals.json, ASSERT, judge-traces, behavior taxonomy, judges, graders, rubrics, synthetic data, simulation data, Simula, QDC, source-grounded generation, prompt optimization, agent improvement, skill improvement, harness hardening, progressive disclosure, deterministic workflows, GEPA, Trace, VISTA, Agent Lightning, SkillOpt
DO NOT USE FOR: ordinary unit/integration tests without AI quality criteria (use testing), refactoring without eval or trace feedback (use refactor), generic Agent Skills packaging without eval or improvement work (use agent-skills)
npx skillsauth add Tyler-R-Kendrick/agent-skills improve
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Default new agent and LLM evals to AgentEvals EVAL.yaml with AgentV. Improve only against evidence: eval failures, trace observations, benchmark deltas, human review notes, or explicit user goals. Keep each loop narrow, reproducible, and auditable.
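The evidence rule above can be made concrete as a small gate on proposed changes. The following is a minimal sketch, assuming hypothetical names (`EvidenceKind`, `Evidence`, `ProposedChange`, `isAdmissible`) that are not defined by the skill; it only illustrates the idea that a change must cite at least one re-checkable observation.

```typescript
// Hypothetical types: these names are illustrative and not defined by the skill.
type EvidenceKind =
  | "eval-failure"
  | "trace-observation"
  | "benchmark-delta"
  | "human-review-note"
  | "user-goal";

interface Evidence {
  kind: EvidenceKind;
  summary: string; // what was observed
  source: string;  // where it can be re-checked, e.g. a run id or file path
}

interface ProposedChange {
  target: string;       // the prompt, skill, or harness file to edit
  rationale: string;    // why the edit is expected to help
  evidence: Evidence[]; // the observations this change responds to
}

// A change is admissible only if it cites at least one piece of evidence
// and every cited item names a non-empty source a reviewer can re-check.
function isAdmissible(change: ProposedChange): boolean {
  return (
    change.evidence.length > 0 &&
    change.evidence.every((e) => e.source.trim().length > 0)
  );
}
```

Rejecting changes with an empty evidence list is one way to keep each loop narrow and auditable; a real pipeline would likely also record which eval run or trace confirmed the improvement.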
Load only the reference needed for the requested eval or improvement surface:
| If the task says... | Then read... |
|---|---|
| "install", "setup", "environment", "venv", "dependencies", "API keys", "Node", "Python", or missing native tools | references/environment-setup.md |
| "install AgentV", "install ASSERT", "setup eval tools", "eval runner install", or native eval validation setup | references/install-eval-tools.md |
| "install GEPA", "install Trace", "install Agent Lightning", "install SkillOpt", "setup optimizer", or improvement library dependencies | references/install-improvement-libs.md |
| "create an eval", "judge", "grader", "rubric", "EVAL.yaml", or no eval standard | references/agentevals.md and references/agentv.md |
| "which eval standard", "convert eval", "compare standards", or mixed eval formats | references/eval-standards-guide.md |
| "Agent Skills eval", evals.json, "skill quality", "with_skill", or "without_skill" | references/agent-skills-evals.md |
| "ASSERT", assert-ai, "judge-traces", "spec-driven", "behavior taxonomy", "trace-aware", "policy failure modes", or eval_config.yaml | references/assert.md |
| "eval starter", "eval lint", "eval workspace contract", or expected eval artifacts | references/eval-workspace-contracts.md |
| "optimize a skill", "progressive disclosure", "Table of Contents", "Index Page", "conditional access", "top-level links", "scripted workflow", or "deterministic workflow generation" | references/skill-optimization-strategy.md |
| "which technique", "optimize this", "improvement plan", or mixed artifacts | references/techniques-guide.md |
| "GEPA", "Pareto", "reflective mutation", "prompt evolution", or "optimize anything" | references/gepa.md |
| "Trace", "OptoPrime", "computation graph", "node", "bundle", or end-to-end generative optimization | references/microsoft-trace.md |
| "VISTA", "interpretable APO", "hypothesis agent", "random restart", or "epsilon-greedy" | references/vista.md |
| "Agent Lightning", "RL", "reward", "policy reward", "governed training", or skill improvement with policy constraints | references/agent-lightning.md |
| "SkillOpt", "SkillOpts", "skill evolution", best_skill.md, "held-out gate", "bounded edits", "textual learning rate", or "SkillOpt-Sleep" | references/skillopt.md |
| "eval failures", "agent traces", "span logs", "benchmark deltas", or "release evidence" | references/eval-trace-improvement.md |
| "synthetic data", "simulation data", "Simula", "QDC", "Source2Synth", "MAG-V", "MetaSynth", "BARE", "Condor", "data auditor", "generate data", or "simulate" | references/simulation-data.md |
| "CLI", "init", "improve", "eval", "simulate", "lint", "workspace", or "deterministic improvement artifacts" | references/workspace-contracts.md |
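The table above is essentially a keyword-to-reference lookup. The sketch below shows one way to model it, assuming a hypothetical `pickReferences` helper and plain case-insensitive substring matching; it covers only three rows and is not the skill's own routing code.

```typescript
// Hypothetical sketch: a small subset of the routing table, not the skill's own code.
interface Route {
  keywords: string[];   // lower-case trigger phrases from the left column
  references: string[]; // files from the right column
}

const routes: Route[] = [
  {
    keywords: ["create an eval", "rubric", "eval.yaml"],
    references: ["references/agentevals.md", "references/agentv.md"],
  },
  {
    keywords: ["gepa", "pareto", "reflective mutation"],
    references: ["references/gepa.md"],
  },
  {
    keywords: ["synthetic data", "simulation data", "qdc"],
    references: ["references/simulation-data.md"],
  },
];

// Returns the reference files whose row has at least one keyword in the task text,
// in table order and without duplicates.
function pickReferences(task: string): string[] {
  const text = task.toLowerCase();
  const picked: string[] = [];
  for (const route of routes) {
    if (route.keywords.some((k) => text.indexOf(k) !== -1)) {
      for (const ref of route.references) {
        if (picked.indexOf(ref) === -1) picked.push(ref);
      }
    }
  }
  return picked;
}
```

Substring matching is deliberately naive here: short triggers such as "RL" or "node" would match inside unrelated words, so a fuller version would need word-boundary checks or human judgment.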
Run scripts/improve-cli.ts init, improve, eval, simulate, or lint when deterministic artifacts help.
Use the bundled TypeScript CLI for deterministic planning, eval artifact generation, technique-specific local implementations, simulation data generation, improvement workspaces, and structural linting:
node skills/ai/improve/scripts/improve-cli.ts --help
node skills/ai/improve/scripts/improve-cli.ts init improve/support-skill --json
node skills/ai/improve/scripts/improve-cli.ts improve . --gepa --json
node skills/ai/improve/scripts/improve-cli.ts eval --agent-skills --json
node skills/ai/improve/scripts/improve-cli.ts simulate . --simula --json
node skills/ai/improve/scripts/improve-cli.ts lint improve/support-skill --json
For the CLI contract and generated workspace structure, read references/workspace-contracts.md. The script is dependency-free, calls the bundled implementation libraries in scripts/, and expects Node 24+ TypeScript type stripping.
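Because the CLI relies on Node's built-in TypeScript type stripping, it can help to check the runtime version before invoking it. This is a minimal sketch, assuming a hypothetical `meetsNodeRequirement` helper that is not part of the bundled scripts; it only compares the major version against the stated Node 24+ requirement.

```typescript
// Hypothetical helper: checks a Node version string against the stated Node 24+ requirement.
function meetsNodeRequirement(version: string, minMajor = 24): boolean {
  // Accepts forms like "v24.1.0" or "24.1.0"; anything else is treated as unsupported.
  const match = /^v?(\d+)\./.exec(version);
  if (match === null) return false;
  return Number(match[1]) >= minMajor;
}
```

In a Node script this could be called as `meetsNodeRequirement(process.version)` before running `improve-cli.ts`, so that an older runtime fails with a clear message instead of a syntax error.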
- Default to EVAL.yaml with AgentV unless the user or repo clearly specifies another standard.
- Keep SKILL.md as a table-of-contents/index page with conditional top-level links; put deeper links inside references.