plugins/agentv-dev/skills/agentv-dev/SKILL.md
AgentV CLI skills for evaluating, optimizing, and governing AI agents. Triggers: run evals, benchmark agents, write evals, review evals, analyze traces, optimize prompts, governance linting. Covers: eval running, eval writing, eval review, trace analysis, description optimization, autoresearch, and governance compliance.
npx skillsauth add entityprocess/agentv agentv-dev
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
The full skill content is bundled with the AgentV CLI and always version-matched to it. Load the specific skill you need:
agentv skills get <skill-name>
| Skill | Command | Use when |
|-------|---------|----------|
| agentv-bench | agentv skills get agentv-bench | Run evals, benchmark agents, optimize against evals, compare targets, autoresearch |
| agentv-eval-writer | agentv skills get agentv-eval-writer | Write, edit, or validate eval YAML files |
| agentv-eval-review | agentv skills get agentv-eval-review | Review, lint, or check eval quality before committing |
| agentv-governance | agentv skills get agentv-governance | Author or lint governance blocks (OWASP, MITRE, EU AI Act, ISO 42001) |
| agentv-trace-analyst | agentv skills get agentv-trace-analyst | Analyze eval traces, find regressions, inspect tool trajectories |
- `agentv` CLI is on PATH (run `agentv --help` to verify)
- `agentv skills get <skill-name>` to load it

If `agentv` is not on PATH, check:

- `node_modules/.bin/agentv` (project-local install)
- `~/.local/bin/agentv` (global user install)
- `bun apps/cli/src/cli.ts <command>`
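The lookup order above can be sketched as a small shell helper. This is a minimal illustration, not part of the AgentV CLI: `find_agentv` is a hypothetical name, and the sketch assumes the project-local path is resolved relative to the current directory. The `bun apps/cli/src/cli.ts <command>` form is left out because it is an invocation rather than a binary path.

```shell
# Hypothetical helper (not shipped with AgentV): print the first agentv
# executable found, checking PATH first and then the fallback locations.
find_agentv() {
  # 1. Already on PATH.
  if command -v agentv >/dev/null 2>&1; then
    command -v agentv
    return 0
  fi
  # 2. Project-local install, relative to the current directory (assumption).
  if [ -x "./node_modules/.bin/agentv" ]; then
    printf '%s\n' "./node_modules/.bin/agentv"
    return 0
  fi
  # 3. Global user install.
  if [ -x "$HOME/.local/bin/agentv" ]; then
    printf '%s\n' "$HOME/.local/bin/agentv"
    return 0
  fi
  # Nothing found: return non-zero so the caller can decide what to do next.
  return 1
}
```

A caller might use it as `agentv_bin=$(find_agentv) && "$agentv_bin" --help`, which runs the verification step only when a binary was located.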
agentv-trace-analyst: Analyze AgentV evaluation traces and result JSONL files using the `agentv inspect` and `agentv compare` CLI commands. Use when asked to inspect AgentV eval results, find regressions between AgentV evaluation runs, identify failure patterns in AgentV trace data, analyze tool trajectories, or compute cost/latency/score statistics from AgentV result files. Do NOT use for benchmarking skill trigger accuracy, analyzing skill-creator eval performance, or measuring skill description quality; those tasks belong to the skill-creator skill.
agentv-governance: Author, edit, and lint `governance:` blocks in `*.eval.yaml` files. Use when creating or updating evaluation suites that carry AI-governance metadata (OWASP LLM Top 10, OWASP Agentic Top 10, MITRE ATLAS, EU AI Act, ISO 42001). Also use non-interactively (e.g., from a GitHub Action) to lint changed eval files and report violations against the rules in `references/lint-rules.md`. Do NOT use for running evals or benchmarking; that belongs to agentv-bench.
agentv-eval-writer: Write, edit, review, and validate AgentV EVAL.yaml / .eval.yaml evaluation files. Use when asked to create new eval files, update or fix existing ones, add or remove test cases, configure graders (`llm-grader`, `code-grader`, `rubrics`), review whether an eval is correct or complete, convert between EVAL.yaml and evals.json using `agentv convert`, or generate eval test cases from chat transcripts (markdown conversation or JSON messages). Do NOT use for creating SKILL.md files, writing skill definitions, or running evals; running and benchmarking belong to agentv-bench.
agentv-eval-review: Use when reviewing eval YAML files for quality issues, linting eval files before committing, checking eval schema compliance, or when asked to "review these evals", "check eval quality", "lint eval files", or "validate eval structure". Do NOT use for writing evals (use agentv-eval-writer) or running evals (use agentv-bench).