Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

Tyler-R-Kendrick/improve

Name: improve
Author: Tyler-R-Kendrick

skills/ai/improve/SKILL.md

npx skillsauth add Tyler-R-Kendrick/agent-skills improve

Evals and Self-Improvement Pipelines

Operating Rule

Default new agent and LLM evals to AgentEvals EVAL.yaml with AgentV. Improve only against evidence: eval failures, trace observations, benchmark deltas, human review notes, or explicit user goals. Keep each loop narrow, reproducible, and auditable.

Progressive Disclosure

Load only the reference needed for the requested eval or improvement surface:

| If the task says... | Then read... |
|---|---|
| "install", "setup", "environment", "venv", "dependencies", "API keys", "Node", "Python", or missing native tools | references/environment-setup.md |
| "install AgentV", "install ASSERT", "setup eval tools", "eval runner install", or native eval validation setup | references/install-eval-tools.md |
| "install GEPA", "install Trace", "install Agent Lightning", "install SkillOpt", "setup optimizer", or improvement library dependencies | references/install-improvement-libs.md |
| "create an eval", "judge", "grader", "rubric", "EVAL.yaml", or no eval standard | references/agentevals.md and references/agentv.md |
| "which eval standard", "convert eval", "compare standards", or mixed eval formats | references/eval-standards-guide.md |
| "Agent Skills eval", evals.json, "skill quality", "with_skill", or "without_skill" | references/agent-skills-evals.md |
| "ASSERT", assert-ai, "judge-traces", "spec-driven", "behavior taxonomy", "trace-aware", "policy failure modes", or eval_config.yaml | references/assert.md |
| "eval starter", "eval lint", "eval workspace contract", or expected eval artifacts | references/eval-workspace-contracts.md |
| "optimize a skill", "progressive disclosure", "Table of Contents", "Index Page", "conditional access", "top-level links", "scripted workflow", or "deterministic workflow generation" | references/skill-optimization-strategy.md |
| "which technique", "optimize this", "improvement plan", or mixed artifacts | references/techniques-guide.md |
| "GEPA", "Pareto", "reflective mutation", "prompt evolution", or "optimize anything" | references/gepa.md |
| "Trace", "OptoPrime", "computation graph", "node", "bundle", or end-to-end generative optimization | references/microsoft-trace.md |
| "VISTA", "interpretable APO", "hypothesis agent", "random restart", or "epsilon-greedy" | references/vista.md |
| "Agent Lightning", "RL", "reward", "policy reward", "governed training", or skill improvement with policy constraints | references/agent-lightning.md |
| "SkillOpt", "SkillOpts", "skill evolution", best_skill.md, "held-out gate", "bounded edits", "textual learning rate", or "SkillOpt-Sleep" | references/skillopt.md |
| "eval failures", "agent traces", "span logs", "benchmark deltas", or "release evidence" | references/eval-trace-improvement.md |
| "synthetic data", "simulation data", "Simula", "QDC", "Source2Synth", "MAG-V", "MetaSynth", "BARE", "Condor", "data auditor", "generate data", or "simulate" | references/simulation-data.md |
| "CLI", "init", "improve", "eval", "simulate", "lint", "workspace", or "deterministic improvement artifacts" | references/workspace-contracts.md |
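The routing table can be read as a keyword lookup. The sketch below is illustrative only: the `Route` type, the `routes` array, and `selectReferences` are hypothetical names that do not come from the skill, and plain substring matching is an assumption about how a caller might apply the table. It covers three rows and returns the reference files whose keywords appear in the task text.

```typescript
// Hypothetical routing sketch; names and matching strategy are assumptions,
// not the skill's actual implementation.
type Route = { keywords: string[]; references: string[] };

// Three rows taken from the table; the remaining rows follow the same shape.
const routes: Route[] = [
  {
    keywords: ["gepa", "pareto", "reflective mutation", "prompt evolution"],
    references: ["references/gepa.md"],
  },
  {
    keywords: ["create an eval", "judge", "grader", "rubric", "eval.yaml"],
    references: ["references/agentevals.md", "references/agentv.md"],
  },
  {
    keywords: ["synthetic data", "simulation data", "simula", "generate data"],
    references: ["references/simulation-data.md"],
  },
];

// Return the deduplicated references whose keywords occur in the task text.
function selectReferences(task: string): string[] {
  const text = task.toLowerCase();
  const selected = new Set<string>();
  for (const route of routes) {
    if (route.keywords.some((keyword) => text.includes(keyword))) {
      for (const ref of route.references) selected.add(ref);
    }
  }
  return Array.from(selected);
}

const refs = selectReferences("Create an eval with a rubric, then optimize the prompt with GEPA");
// refs → ["references/gepa.md", "references/agentevals.md", "references/agentv.md"]
```

Substring matching is deliberately crude; a task that matches no row returns an empty list, which corresponds to loading no reference at all.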

Workflow

1. Identify whether the user needs an eval artifact, an improvement loop, or both.
2. For eval artifacts, select the standard from explicit language, existing repo artifacts, or the default AgentEvals rule.
3. For improvement loops, identify the artifact type and evidence: eval cases, traces, logs, cost/latency metrics, human review, or explicit constraints.
4. Load only the matching reference docs, then use scripts/improve-cli.ts init, improve, eval, simulate, or lint when deterministic artifacts help.
5. Prefer deterministic graders and structural checks before subjective LLM review.
6. Run the smallest useful loop, compare against the baseline, and preserve selected candidates plus rejected hypotheses.
7. Report the evidence delta and any residual risk before claiming the artifact is evaluated or improved.
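The last three workflow steps (deterministic grading, baseline comparison, and reporting the evidence delta) can be sketched as follows. This is a minimal illustration under stated assumptions: `EvalCase`, `Artifact`, `passRate`, and `compare` are hypothetical names, the artifact is modeled as a plain function, and exact string match stands in for whatever deterministic grader a real eval would use.

```typescript
// Hypothetical types; none of these names come from the skill's CLI.
type EvalCase = { input: string; expected: string };
type Artifact = (input: string) => string;

// Deterministic exact-match grader: fraction of cases answered correctly.
function passRate(artifact: Artifact, cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) => artifact(c.input) === c.expected).length;
  return passed / cases.length;
}

type LoopRecord = { baseline: number; candidate: number; delta: number; accepted: boolean };

// Accept the candidate only if it strictly beats the baseline on held-out cases.
function compare(baseline: Artifact, candidate: Artifact, heldOut: EvalCase[]): LoopRecord {
  const b = passRate(baseline, heldOut);
  const c = passRate(candidate, heldOut);
  return { baseline: b, candidate: c, delta: c - b, accepted: c > b };
}

// Toy held-out set and artifacts, invented for illustration.
const heldOut: EvalCase[] = [
  { input: "2+2", expected: "4" },
  { input: "3+3", expected: "6" },
];
const baseline: Artifact = (input) => (input === "2+2" ? "4" : "?");
const candidate: Artifact = (input) =>
  input === "2+2" ? "4" : input === "3+3" ? "6" : "?";

const record = compare(baseline, candidate, heldOut);
// record → { baseline: 0.5, candidate: 1, delta: 0.5, accepted: true }
```

Keeping the full `LoopRecord`, including records where `accepted` is false, is one simple way to preserve rejected hypotheses alongside the selected candidate.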

Script

Use the bundled TypeScript CLI for deterministic planning, eval artifact generation, technique-specific local implementations, simulation data generation, improvement workspaces, and structural linting:

node skills/ai/improve/scripts/improve-cli.ts --help
node skills/ai/improve/scripts/improve-cli.ts init improve/support-skill --json
node skills/ai/improve/scripts/improve-cli.ts improve . --gepa --json
node skills/ai/improve/scripts/improve-cli.ts eval --agent-skills --json
node skills/ai/improve/scripts/improve-cli.ts simulate . --simula --json
node skills/ai/improve/scripts/improve-cli.ts lint improve/support-skill --json

For the CLI contract and generated workspace structure, read references/workspace-contracts.md. The script is dependency-free and calls the bundled implementation libraries in scripts/. It expects Node 24 or later, because it relies on Node's built-in TypeScript type stripping to run the .ts file directly.

Best Practices

Use AgentEvals by default: default new agent and LLM evals to AgentEvals EVAL.yaml with AgentV unless the user or repo clearly specifies another standard.
Use progressive disclosure by default: treat SKILL.md as a table-of-contents/index page with conditional top-level links; put deeper links inside references.
Codify workflows in scripts: make order-of-operations and workspace generation deterministic in scripts; let calling agents provide generated inputs and handle inference operations.
Improve from evidence: require eval failures, trace observations, benchmark deltas, or explicit human feedback before changing an artifact.
Keep loops narrow: optimize one prompt, skill behavior, agent step, code path, or workflow contract at a time.
Preserve baselines: save the original artifact, eval cases, trace inputs, and metrics before generating candidates.
Wire explicit assertions: wire explicit test cases and assertions; do not ship one anonymous catch-all judge.
Prefer deterministic checks: use exact graders, structural checks, schema checks, and replayable traces before subjective LLM review.
Use GEPA for text evolution: use reflective mutation and Pareto selection when the artifact is textual and measurable.
Use Trace for trainable workflows: use computation-graph optimization when code, prompts, and agent steps need end-to-end feedback propagation.
Use VISTA for interpretability: decouple hypotheses from rewrites when the improvement loop needs auditable reasoning and local-optimum escape.
Design synthetic data before sampling: use dataset-level taxonomies, local diversity, complexity schedules, quality gates, and lineage before asking an agent or model to generate simulation records.
Validate the candidate: accept a candidate only after it beats the baseline on held-out evals or trace-backed acceptance criteria.
Validate natively first: validate eval artifacts with the native standard tool where possible, then use the bundled linter as a structural fallback.
Record rejected paths: keep failed hypotheses and candidates so future iterations do not rediscover the same dead ends.
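The Pareto selection mentioned under "Use GEPA for text evolution" can be illustrated with a generic dominance filter. This is a sketch of the general technique, not GEPA's own implementation: the `Candidate` shape, the per-case score vectors, and the function names are assumptions made for the example, and all candidates are assumed to be scored on the same cases in the same order.

```typescript
// Hypothetical candidate shape: one score per eval case, higher is better.
type Candidate = { id: string; scores: number[] };

// a dominates b if a is at least as good on every case and strictly better on one.
function dominates(a: Candidate, b: Candidate): boolean {
  let strictlyBetter = false;
  for (let i = 0; i < a.scores.length; i++) {
    if (a.scores[i] < b.scores[i]) return false;
    if (a.scores[i] > b.scores[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Keep every candidate that no other candidate dominates.
function paretoFront(candidates: Candidate[]): Candidate[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other, c)),
  );
}

// Toy scores invented for illustration.
const front = paretoFront([
  { id: "baseline", scores: [0.5, 0.5] },
  { id: "a", scores: [0.9, 0.4] },
  { id: "b", scores: [0.6, 0.8] },
  { id: "c", scores: [0.5, 0.4] },
]);
// front ids → ["a", "b"]
```

Here "b" dominates the baseline and the baseline dominates "c", while "a" and "b" each win on a different case, so both survive. Keeping several non-dominated candidates, rather than the single best average, preserves variants that are strong on different cases.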

Use when producing agent/LLM evals, synthetic simulation data, or self-improvement pipelines for prompts, code, skills, agents, harnesses, and workflows. Covers AgentEvals/AgentV, Agent Skills evals, ASSERT, GEPA, Trace, VISTA, Agent Lightning, SkillOpt, Simula-style data design, progressive disclosure, deterministic workspaces, and release evidence.

USE FOR: eval creation, EVAL.yaml, AgentEvals, AgentV, evals.json, ASSERT, judge-traces, behavior taxonomy, judges, graders, rubrics, synthetic data, simulation data, Simula, QDC, source-grounded generation, prompt optimization, agent improvement, skill improvement, harness hardening, progressive disclosure, deterministic workflows, GEPA, Trace, VISTA, Agent Lightning, SkillOpt

DO NOT USE FOR: ordinary unit/integration tests without AI quality criteria (use testing), refactoring without eval or trace feedback (use refactor), generic Agent Skills packaging without eval or improvement work (use agent-skills)

10 stars

development

Updated Jun 10, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add Tyler-R-Kendrick/agent-skills improve

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 10, 2026, 4:51 AM (12.3s, 44 files scanned)

SKILL.md

name: improve
description: |
  USE FOR: eval creation, EVAL.yaml, AgentEvals, AgentV, evals.json, ASSERT, judge-traces, behavior taxonomy, judges, graders, rubrics, synthetic data, simulation data, Simula, QDC, source-grounded generation, prompt optimization, agent improvement, skill improvement, harness hardening, progressive disclosure, deterministic workflows, GEPA, Trace, VISTA, Agent Lightning, SkillOpt
  DO NOT USE FOR: ordinary unit/integration tests without AI quality criteria (use testing), refactoring without eval or trace feedback (use refactor), generic Agent Skills packaging without eval or improvement work (use agent-skills)
license: MIT
displayName: Self-Improvement Pipelines
author: Tyler-R-Kendrick
version: 1.0.0
compatibility: claude, copilot, cursor

Install manually

# Clone the repo
git clone https://github.com/Tyler-R-Kendrick/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/skills/ai/improve ~/.claude/skills/

Repository: Tyler-R-Kendrick/agent-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT