skills/watch/SKILL.md
Start EvalView watch mode to automatically re-run regression checks whenever project files change.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add hidai25/eval-view watch
Use this skill when the user wants continuous regression monitoring during development. Watch mode observes file changes and automatically re-runs evalview check with debounced triggers.
EvalView's watch mode uses watchdog to monitor directories for file changes (.py, .yaml, .yml, .json, .md, .txt, .toml, .cfg, .ini). When a change is detected, it runs a regression check via the gate() API and displays a live scorecard with pass/fail status, score deltas, tool changes, and streak tracking.
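The trigger logic described above (extension filter plus debounce) can be illustrated with a small, deterministic Python sketch. It is not EvalView's code. `is_watched`, `debounce`, and `check_times` are hypothetical helpers. Instead of real watchdog events and timer threads, the sketch replays a list of (timestamp, path) events so the behavior is easy to inspect. Each matching event pushes the deadline back by the debounce interval, so a burst of saves produces a single check.

```python
from pathlib import PurePath

# Extensions that the EvalView docs list as triggering a re-check.
WATCHED_EXTENSIONS = {".py", ".yaml", ".yml", ".json", ".md", ".txt", ".toml", ".cfg", ".ini"}


def is_watched(path: str) -> bool:
    """Hypothetical filter: does this changed file have a watched extension?"""
    return PurePath(path).suffix in WATCHED_EXTENSIONS


def debounce(event_times: list[float], interval: float) -> list[float]:
    """Trailing-edge debounce over recorded event timestamps (seconds).

    Each event restarts the timer; a check fires only once `interval`
    seconds pass with no further events. Returns the firing times.
    """
    fire_times: list[float] = []
    deadline = None
    for t in sorted(event_times):
        if deadline is not None and t >= deadline:
            # The previous burst went quiet long enough, so its check ran.
            fire_times.append(deadline)
        deadline = t + interval  # restart the timer
    if deadline is not None:
        fire_times.append(deadline)  # the final burst's check
    return fire_times


def check_times(events: list[tuple[float, str]], interval: float = 2.0) -> list[float]:
    """Filter (timestamp, path) events by extension, then debounce them."""
    return debounce([t for t, path in events if is_watched(path)], interval)


events = [
    (0.0, "src/agent.py"),
    (0.5, "docs/logo.png"),  # not a watched extension, so ignored
    (1.0, "tests/case.yaml"),
    (10.0, "README.md"),
]
print(check_times(events))  # [3.0, 12.0]
```

With the default 2-second interval, the three matching changes produce two checks: one 2 seconds after the burst at 0 to 1 s settles, and one after the lone README edit at 10 s. The PNG change never triggers anything.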
Watch mode is a CLI command (not an MCP tool). Help the user run it:
evalview watch
Options:
--quick: Skip the LLM judge and run deterministic checks only ($0 cost, sub-second)
--path src/ --path tests/: Watch specific directories (default: current directory)
--test "my-test": Only check a specific test by name
--test-dir tests/evalview: Path to the test cases directory (default: tests)
--interval 1: Debounce interval in seconds (default: 2.0)
--fail-on REGRESSION,TOOLS_CHANGED: Comma-separated statuses that count as failure (default: REGRESSION)
--sound: Ring the terminal bell on regression

# Basic: watch everything, full checks
evalview watch
# Fast development loop: no LLM judge, 1-second debounce
evalview watch --quick --interval 1
# Watch specific directories and one test
evalview watch --path src/ --path tests/ --test "calculator-division"
# Strict mode: fail on any behavioral change
evalview watch --fail-on REGRESSION,TOOLS_CHANGED,OUTPUT_CHANGED --sound
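To make the --fail-on semantics concrete, here is a hypothetical Python sketch (not EvalView's implementation). The flag value is split on commas into a set, and a test counts as failing only if its status is in that set. The helper names and the sample results are invented for illustration; the status names are the ones mentioned above.

```python
def parse_fail_on(value: str = "REGRESSION") -> set[str]:
    """Split a comma-separated --fail-on value into a set of status names."""
    return {part.strip() for part in value.split(",") if part.strip()}


def failing_tests(results: dict[str, str], fail_on: set[str]) -> list[str]:
    """Return the sorted names of tests whose status counts as a failure."""
    return sorted(name for name, status in results.items() if status in fail_on)


# Hypothetical results from one check run: test name -> status.
results = {
    "calculator-division": "TOOLS_CHANGED",
    "hypothetical-summary-test": "OUTPUT_CHANGED",
}

# Default (--fail-on REGRESSION): tool and output changes do not count as failures.
print(failing_tests(results, parse_fail_on()))  # []
# Strict mode, as in the last example above: every behavioral change fails.
print(failing_tests(results, parse_fail_on("REGRESSION,TOOLS_CHANGED,OUTPUT_CHANGED")))
# ['calculator-division', 'hypothetical-summary-test']
```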
Watch mode requires the watchdog package. If not installed:
pip install "evalview[watch]"
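If it is unclear whether the extra is installed, a quick check from Python can help before launching watch mode. This sketch uses only the standard library (`importlib.util.find_spec` and `shutil.which`). The helper name `watch_prerequisites` is hypothetical, and it inspects the current interpreter's environment, which may differ from the one the `evalview` command runs in.

```python
import importlib.util
import shutil


def watch_prerequisites() -> dict[str, bool]:
    """Hypothetical helper: report whether what `evalview watch` needs is present."""
    return {
        # The watchdog package (installed by the evalview[watch] extra).
        "watchdog": importlib.util.find_spec("watchdog") is not None,
        # The evalview executable must be on PATH.
        "evalview": shutil.which("evalview") is not None,
    }


if __name__ == "__main__":
    missing = [name for name, ok in watch_prerequisites().items() if not ok]
    if missing:
        print("Missing:", ", ".join(missing), "| try: pip install 'evalview[watch]'")
    else:
        print("Ready: run `evalview watch`")
```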
Watch mode automatically ignores .evalview/, .git/, venv/, node_modules/, __pycache__/, and other common non-source directories.
--quick mode is ideal for tight development loops because it costs nothing and runs in under a second.
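The directory exclusion can be pictured as a path-component check. This sketch is hypothetical: `is_ignored` is not an EvalView function, and it only knows the five directories named above, not the unspecified "other common" ones EvalView also skips.

```python
from pathlib import PurePath

# Only the directories named in the docs; EvalView's real list is longer.
IGNORED_DIRS = {".evalview", ".git", "venv", "node_modules", "__pycache__"}


def is_ignored(path: str) -> bool:
    """Return True if any component of the path is an ignored directory."""
    return any(part in IGNORED_DIRS for part in PurePath(path).parts)


changed = ["src/agent.py", ".git/index", "src/__pycache__/agent.cpython-312.pyc"]
print([p for p in changed if not is_ignored(p)])  # ['src/agent.py']
```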