entityprocess/agentv-eval-review

Name: agentv-eval-review
Author: entityprocess

plugins/agentv-dev/skills/agentv-eval-review/SKILL.md

npx skillsauth add entityprocess/agentv agentv-eval-review

Eval Review

Overview

Lint and review AgentV eval YAML files for structural issues, schema compliance, and quality problems. Runs deterministic checks via script, then applies LLM judgment for semantic issues the script cannot catch.

Process

Step 1: Run the linter

Execute scripts/lint_eval.py against the target eval files:

python scripts/lint_eval.py <path-to-evals-dir-or-file> --json
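When the linter is driven from another script rather than typed by hand, the invocation above could be wrapped roughly as follows. This is a minimal sketch: the helper names `build_lint_command` and `run_linter` are hypothetical, and it assumes that `--json` makes the script write a single JSON document to stdout, which the text above does not spell out.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_lint_command(target: str, script: str = "scripts/lint_eval.py") -> list[str]:
    """Build the argv list for the documented linter invocation."""
    return [sys.executable, script, target, "--json"]


def run_linter(target: str, script: str = "scripts/lint_eval.py"):
    """Run the linter and parse its stdout as JSON.

    Assumption: with --json the script prints one JSON document to stdout.
    The shape of that document is not specified here, so it is returned as-is.
    """
    if not Path(script).is_file():
        raise FileNotFoundError(f"linter script not found: {script}")
    proc = subprocess.run(
        build_lint_command(target, script),
        capture_output=True,
        text=True,
        check=False,  # a linter may exit non-zero when it reports findings
    )
    return json.loads(proc.stdout)
```

Keeping the command construction separate from the subprocess call makes the argv easy to inspect or log before anything is executed.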

The script checks:

Each file uses the .eval.yaml extension
A description field is present
Each test has id, input, and at least one of criteria, expected_output, or assertions
File paths in type: file entries use a leading /
assertions blocks are present (flags tests relying solely on expected_output)
expected_output contains no prose (flags "The agent should..." patterns)
File inputs are not repeated across tests (recommends a top-level input instead)
Naming prefixes are consistent across eval files in the same directory
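To make a few of these checks concrete, here is a rough sketch that runs over an already-parsed eval document. It is not the actual `scripts/lint_eval.py`: the function name `lint_eval`, the top-level `tests` key, the finding fields, and the severity assigned to each check are all assumptions for illustration. The file-path, repeated-input, and naming-prefix checks are left out because the input layout they depend on is not described here.

```python
GRADING_KEYS = ("criteria", "expected_output", "assertions")


def lint_eval(path: str, doc: dict) -> list[dict]:
    """Return findings for one parsed eval document (hypothetical shape)."""
    findings: list[dict] = []

    def add(severity: str, message: str) -> None:
        findings.append({"file": path, "severity": severity, "message": message})

    # File-level checks: extension and description.
    if not path.endswith(".eval.yaml"):
        add("error", "file name should end with .eval.yaml")
    if not doc.get("description"):
        add("error", "missing description field")

    # Per-test checks. The "tests" key is an assumed name for the test list.
    for index, test in enumerate(doc.get("tests", [])):
        label = test.get("id", f"#{index}")
        for key in ("id", "input"):
            if key not in test:
                add("error", f"test {label}: missing {key}")

        present = [key for key in GRADING_KEYS if key in test]
        if not present:
            add("error", f"test {label}: needs criteria, expected_output, or assertions")
        elif present == ["expected_output"]:
            add("warning", f"test {label}: relies solely on expected_output")

        # Prose detection: expected_output that describes behaviour instead of showing it.
        expected = test.get("expected_output")
        if isinstance(expected, str) and expected.lstrip().lower().startswith("the agent should"):
            add("warning", f"test {label}: expected_output reads as prose")

    return findings
```

Each finding carries the file path alongside the message, so a later reporting step can show where to apply a fix.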

Step 2: Review script output

Report the script findings grouped by severity (error > warning > info). For each finding, include the file path and a concrete fix.
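The grouping described above might look something like this. It is a sketch under assumptions: the finding fields `severity`, `file`, `message`, and `fix` are hypothetical, since the linter's real JSON schema is not shown here.

```python
SEVERITY_ORDER = ("error", "warning", "info")


def group_by_severity(findings: list[dict]) -> dict[str, list[dict]]:
    """Group findings so that errors come first, then warnings, then info."""
    grouped: dict[str, list[dict]] = {}
    for finding in findings:
        grouped.setdefault(finding["severity"], []).append(finding)
    known = [s for s in SEVERITY_ORDER if s in grouped]
    # Unrecognised severities are kept, listed after the known ones.
    unknown = sorted(s for s in grouped if s not in SEVERITY_ORDER)
    return {s: grouped[s] for s in known + unknown}


def format_report(findings: list[dict]) -> str:
    """Render one block per severity with file path, message, and fix."""
    lines: list[str] = []
    for severity, items in group_by_severity(findings).items():
        lines.append(f"{severity.upper()} ({len(items)})")
        for item in items:
            lines.append(f"  {item['file']}: {item['message']} -> fix: {item['fix']}")
    return "\n".join(lines)
```

Because Python dictionaries preserve insertion order, iterating the grouped result yields the severities in the intended reporting order.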

Step 3: Semantic review (LLM judgment)

The script catches structural issues but cannot assess:

Factual accuracy — Do tool/command names in expected_output match what the skill documents?
Coverage gaps — Are important edge cases missing?
Assertion discriminability — Would assertions pass for both good and bad output?
Cross-file consistency — Do output filenames match across evals and skills?

Read the relevant SKILL.md files and cross-check against the eval content for these issues.
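Assertion discriminability in particular can be illustrated with a small experiment: run a candidate assertion against outputs known to be good and outputs known to be bad, and see whether it separates them. The sketch below is illustrative only. The helper names and the sample outputs are made up, and a simple substring check stands in for whatever assertion types an eval actually uses.

```python
from typing import Callable

Assertion = Callable[[str], bool]


def contains(substring: str) -> Assertion:
    """Stand-in assertion: passes when the output contains the substring."""
    return lambda output: substring in output


def discriminability_report(
    assertion: Assertion, good_outputs: list[str], bad_outputs: list[str]
) -> dict[str, list[str]]:
    """List good outputs the assertion rejects and bad outputs it accepts."""
    return {
        "good_rejected": [o for o in good_outputs if not assertion(o)],
        "bad_accepted": [o for o in bad_outputs if assertion(o)],
    }


# Made-up sample outputs for illustration.
GOOD = ["Run `agentv compare a.jsonl b.jsonl` to diff the two runs."]
BAD = ["Use agentv to look at the files."]

# A vague assertion passes for the bad output too, so it cannot tell them apart.
weak = discriminability_report(contains("agentv"), GOOD, BAD)

# A more specific assertion accepts the good output and rejects the bad one.
strong = discriminability_report(contains("agentv compare"), GOOD, BAD)
```

An assertion with a non-empty `bad_accepted` list is a candidate for tightening, while a non-empty `good_rejected` list suggests it is too strict.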

Skill Resources

scripts/lint_eval.py — Deterministic eval linter (Python 3.11+, stdlib only)

Use when reviewing eval YAML files for quality issues, linting eval files before committing, checking eval schema compliance, or when asked to "review these evals", "check eval quality", "lint eval files", or "validate eval structure". Do NOT use for writing evals (use agentv-eval-writer) or running evals (use agentv-bench).

12 stars

development

Updated Apr 17, 2026

Install this skill globally with the single `npx skillsauth add` command shown above. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; each one is listed in the table.

=======  ===============  ==============================================  =======
Status   Scanner          Description                                     Percent
=======  ===============  ==============================================  =======
Clean    Trivy            Container and dependency vulnerability scanner  95%
Clean    Semgrep          Static code analysis for vulnerabilities        95%
Clean    mcp-scan (Snyk)  Model Context Protocol security validation      95%
Skipped  Snyk (dep)       Open source security scanning                   50%
Skipped  Socket.dev       Supply chain security analysis                  50%
Skipped  VirusTotal       Multi-engine malware detection                  50%
Skipped  CrowdStrike      Advanced threat intelligence                    50%
Skipped  OSV-Scanner      Open Source Vulnerability database check        50%
Skipped  OWASP Dep-Check                                                  50%
=======  ===============  ==============================================  =======

Last scanned: Apr 17, 2026, 12:24 PM (15.0s, 2 files scanned)

Related Skills

entityprocess/agentv-trace-analyst

tools

Analyze AgentV evaluation traces and result JSONL files using `agentv inspect` and `agentv compare` CLI commands. Use when asked to inspect AgentV eval results, find regressions between AgentV evaluation runs, identify failure patterns in AgentV trace data, analyze tool trajectories, or compute cost/latency/score statistics from AgentV result files. Do NOT use for benchmarking skill trigger accuracy, analyzing skill-creator eval performance, or measuring skill description quality — those tasks belong to the skill-creator skill.

12 | SKILL.md | Updated May 25, 2026

entityprocess/agentv-governance

development

Author, edit, and lint `governance:` blocks in `*.eval.yaml` files. Use when creating or updating evaluation suites that carry AI-governance metadata (OWASP LLM Top 10, OWASP Agentic Top 10, MITRE ATLAS, EU AI Act, ISO 42001). Also use non-interactively (e.g., from a GitHub Action) to lint changed eval files and report violations against the rules in `references/lint-rules.md`. Do NOT use for running evals or benchmarking — that belongs to agentv-bench.

12 | SKILL.md | Updated May 25, 2026

entityprocess/agentv-eval-writer

development

Write, edit, review, and validate AgentV EVAL.yaml / .eval.yaml evaluation files. Use when asked to create new eval files, update or fix existing ones, add or remove test cases, configure graders (`llm-grader`, `code-grader`, `rubrics`), review whether an eval is correct or complete, convert between EVAL.yaml and evals.json using `agentv convert`, or generate eval test cases from chat transcripts (markdown conversation or JSON messages). Do NOT use for creating SKILL.md files, writing skill definitions, or running evals — running and benchmarking belongs to agentv-bench.

12 | SKILL.md | Updated May 25, 2026