skills/run-eval/SKILL.md
Run EvalView regression checks against golden baselines to detect regressions in AI agent behavior after code, prompt, or model changes.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add hidai25/eval-view run-eval
Use this skill after making changes to an AI agent (prompt edits, model swaps, tool changes, code refactors) to verify nothing broke.
EvalView compares current agent behavior against saved golden baselines. It runs your test cases, evaluates the outputs, and reports a diff status for each test.
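As a rough mental model only, a baseline comparison asks two questions: did the agent call the same tools, and did it produce the same output? The sketch below is not EvalView's actual algorithm. The `Trace` type, the `diff_status` function, and the status labels are all hypothetical, and the exact-match checks are a deliberate simplification of whatever scoring EvalView really applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical record of one agent run: the tools it called, in order, and its final output."""
    tool_calls: tuple[str, ...]
    output: str


def diff_status(baseline: Trace, current: Trace) -> str:
    """Classify a current run against its golden baseline (illustrative labels only)."""
    tools_same = baseline.tool_calls == current.tool_calls
    # Ignore leading/trailing whitespace so trivial formatting noise is not flagged.
    output_same = baseline.output.strip() == current.output.strip()
    if tools_same and output_same:
        return "unchanged"
    if not tools_same and not output_same:
        return "tools_and_output_changed"
    return "tools_changed" if not tools_same else "output_changed"
```

Separating tool-call drift from output drift is useful because a changed tool sequence often points at a prompt or tool-definition change, even when the final answer still looks right.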
Locate the test directory. Look for tests/evalview/ in the project. If it exists, use that. Otherwise check for a tests/ directory with .yaml test files.
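The lookup order above can be sketched in a few lines of Python. This is a minimal sketch: the function name `find_test_dir` is hypothetical, and it assumes `.yaml` files anywhere under `tests/` count as test files (the step above does not say whether subdirectories should be searched).

```python
from __future__ import annotations

from pathlib import Path


def find_test_dir(project_root: str | Path) -> Path | None:
    """Return tests/evalview/ if present, else tests/ if it holds .yaml files, else None."""
    root = Path(project_root)
    preferred = root / "tests" / "evalview"
    if preferred.is_dir():
        return preferred
    fallback = root / "tests"
    # Only accept tests/ when it actually contains YAML test files (searched recursively).
    if fallback.is_dir() and any(fallback.rglob("*.yaml")):
        return fallback
    return None
```

Returning `None` rather than guessing a path lets the caller ask the user where the tests live instead of running a check against the wrong directory.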
Run a regression check using the run_check MCP tool. Call run_check with the detected test_path; to check a single test, also pass the test parameter with the test name.
Interpret the results.
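Your MCP client (Claude Code, Cursor, or Windsurf) issues the tool call for you; the sketch below only shows the shape of the underlying MCP `tools/call` JSON-RPC request. The argument names `test_path` and `test` come from the step above; the helper name `run_check_request` is hypothetical, and the exact argument schema should be confirmed from the tool list that the EvalView MCP server advertises.

```python
from __future__ import annotations

import json


def run_check_request(test_path: str, test: str | None = None, request_id: int = 1) -> str:
    """Build an MCP tools/call request for run_check; `test` narrows the run to one test."""
    arguments = {"test_path": test_path}
    if test is not None:
        arguments["test"] = test
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "run_check", "arguments": arguments},
    }
    return json.dumps(request)
```

For example, `run_check_request("tests/evalview/", test="my-test")` produces the request for checking a single test, mirroring `evalview check tests/evalview/ --test "my-test"`.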
If changes are intentional, offer to update the baseline by calling run_snapshot with an explanatory notes parameter.
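A matching sketch for re-baselining, again only showing the MCP `tools/call` payload shape. The helper `run_snapshot_request` is hypothetical, and passing `test_path` to run_snapshot is an assumption (the step above only names the `notes` parameter). The sketch refuses an empty note so that every baseline change carries an explanation.

```python
from __future__ import annotations


def run_snapshot_request(notes: str, test_path: str | None = None, request_id: int = 2) -> dict:
    """Build an MCP tools/call request that re-baselines tests with an explanatory note."""
    if not notes.strip():
        raise ValueError("notes should explain why the baseline is changing")
    arguments: dict[str, str] = {"notes": notes}
    if test_path is not None:
        # Assumption: run_snapshot accepts test_path the same way run_check does.
        arguments["test_path"] = test_path
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "run_snapshot", "arguments": arguments},
    }
```

Only send this after the user confirms the changes are intentional; snapshotting an unreviewed regression would make it the new golden behavior.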
Generate a visual report (optional) by calling generate_visual_report for a detailed HTML breakdown of traces, diffs, scores, and timelines.
CLI equivalents:
evalview check tests/evalview/
evalview check tests/evalview/ --test "my-test"
evalview snapshot tests/evalview/ --notes "updated after prompt refactor"
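If you script these commands (for example, in a CI step), building the argument list explicitly avoids shell-quoting problems with notes that contain spaces. This is a minimal sketch that uses only the subcommands and flags shown above; the helper names are hypothetical, and whether `evalview check` reports regressions through a non-zero exit code is not stated here, so confirm that before gating a build on it.

```python
from __future__ import annotations

import subprocess


def check_argv(test_dir: str, test: str | None = None) -> list[str]:
    """argv for `evalview check`, optionally limited to one test via --test."""
    argv = ["evalview", "check", test_dir]
    if test is not None:
        argv += ["--test", test]
    return argv


def snapshot_argv(test_dir: str, notes: str) -> list[str]:
    """argv for `evalview snapshot` with a --notes explanation."""
    return ["evalview", "snapshot", test_dir, "--notes", notes]


def run(argv: list[str]) -> int:
    """Run the command without a shell and return its exit code."""
    return subprocess.run(argv, check=False).returncode
```

Usage: `run(check_argv("tests/evalview/", "my-test"))` runs the single-test check; passing a list to `subprocess.run` means no shell is involved, so the note text is never re-parsed.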
Use run_check frequently, since it calls the Python API directly with no subprocess overhead.