Install manually

# Clone the repo
git clone https://github.com/hidai25/eval-view.git

# Copy into Claude Code skills folder (global)
cp -r eval-view/skills/generate-tests ~/.claude/skills/

See the official Claude Code Skills docs for the skills folder path.

Repository

hidai25/eval-view

82 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT

hidai25/generate-tests

skills/generate-tests/SKILL.md

Generate EvalView test cases — either from a SKILL.md file using LLM-powered generation, or by capturing real agent interactions through a proxy.

testing

Updated Apr 14, 2026

$ install --global

skillsauth

npx skillsauth add hidai25/eval-view generate-tests

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Clean: Trivy (container and dependency vulnerability scanner), 95%
- Clean: Semgrep (static code analysis for vulnerabilities), 95%
- Clean: mcp-scan (Snyk) (Model Context Protocol security validation), 95%
- Skipped: Snyk (dep) (open source security scanning), 50%
- Skipped: Socket.dev (supply chain security analysis), 50%
- Skipped: VirusTotal (multi-engine malware detection), 50%
- Skipped: CrowdStrike (advanced threat intelligence), 50%
- Skipped: OSV-Scanner (Open Source Vulnerability database check), 50%
- Skipped: OWASP Dep-Check, 50%

Last scanned: Apr 14, 2026, 3:00 AM (7.2s, 1 file scanned)

SKILL.md

name: generate-tests
description: Generate EvalView test cases, either from a SKILL.md file using LLM-powered generation or by capturing real agent interactions through a proxy.

Generate Tests

Use this skill when the user wants to create test cases for their AI agent or skill without writing YAML by hand.

Four approaches

1. Generate tests from a SKILL.md file

Steps:

1. Ask the user which SKILL.md to generate tests for (or detect it from context).
2. Call generate_skill_tests with:
   - skill_path: path to the SKILL.md file
   - output_path (optional): where to save the generated YAML
   - count (optional): number of test cases (default: 10)
3. After generation, offer to run the tests with run_skill_test.

CLI equivalent:

evalview skill generate-tests .claude/skills/my-skill/SKILL.md --auto
evalview skill generate-tests .claude/skills/my-skill/SKILL.md -c 20 -o tests/my-skill-tests.yaml
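
When a project has several skills, the second command above can be repeated for each one. The sketch below only prints the commands (a dry run) so they can be reviewed first. The helper names gen_cmd and gen_all are hypothetical, the .claude/skills/<name>/SKILL.md layout and the tests/<name>-tests.yaml naming are assumptions taken from the example paths above, and only the -c and -o flags shown above are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the generate-tests command for one SKILL.md.
# The output file name follows the tests/<skill>-tests.yaml pattern used above.
gen_cmd() {
  local skill_md="$1" count="${2:-10}" name
  name="$(basename "$(dirname "$skill_md")")"
  printf 'evalview skill generate-tests %s -c %s -o tests/%s-tests.yaml\n' \
    "$skill_md" "$count" "$name"
}

# Hypothetical helper: print one command per skill folder under a root directory.
# Assumes each skill lives in <root>/<name>/SKILL.md.
gen_all() {
  local root="${1:-.claude/skills}" count="${2:-10}" skill_md
  for skill_md in "$root"/*/SKILL.md; do
    [ -e "$skill_md" ] || continue   # skip the literal pattern when nothing matches
    gen_cmd "$skill_md" "$count"
  done
}
```

Running gen_all .claude/skills 20 prints one line per skill. Once the list looks right, piping it to sh would execute the commands, provided the paths contain no spaces.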

2. Create individual test cases manually

Use the create_test MCP tool to create a single test YAML file from a description.

Steps:

1. Gather from the user: test name, query, expected tools, forbidden tools, expected output keywords, and minimum score.
2. Call create_test with the parameters.
3. After creating the test, call run_snapshot to establish the golden baseline.

3. Capture real interactions

Use the evalview capture CLI command to proxy real agent traffic and automatically save interactions as test YAML files. Each capture records the query, output, and tool calls from live usage.

CLI equivalent:

evalview capture --agent http://localhost:8080/execute --output-dir tests/test-cases
evalview capture --multi-turn  # saves all turns as one multi-turn conversation test
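
A capture session can be bracketed by a couple of small helpers: one that prints the capture command for a given endpoint, and one that counts what was saved afterwards. This is a sketch only. The names capture_cmd and count_captures are hypothetical, and the .yaml extension for captured files is an assumption, since the text above says only that interactions are saved as test YAMLs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the capture command for an agent endpoint.
# Uses only the --agent and --output-dir flags shown above.
capture_cmd() {
  local agent_url="$1" out_dir="${2:-tests/test-cases}"
  printf 'evalview capture --agent %s --output-dir %s\n' "$agent_url" "$out_dir"
}

# Hypothetical helper: count captured test files in the output directory.
# Assumes captures are written with a .yaml extension.
count_captures() {
  local out_dir="${1:-tests/test-cases}"
  find "$out_dir" -type f -name '*.yaml' 2>/dev/null | wc -l | tr -d ' '
}
```

Comparing count_captures before and after a session gives a quick check that the proxy actually recorded something.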

4. Validate a skill before testing

Use validate_skill to check a SKILL.md for correct structure and completeness before generating tests from it.
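
Before calling validate_skill, a quick local pre-check can catch an obviously incomplete file. The sketch below is not what validate_skill does. It is a hypothetical stand-in (precheck_skill) that only confirms the file exists and has non-empty name and description fields, the two fields shown in this skill's own SKILL.md header.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, not a substitute for validate_skill.
# Succeeds only if the file exists and has non-empty name: and description: lines.
precheck_skill() {
  local f="$1"
  [ -f "$f" ] || { echo "missing file: $f" >&2; return 1; }
  grep -q '^name:[[:space:]]*[^[:space:]]' "$f" \
    || { echo "no name field: $f" >&2; return 1; }
  grep -q '^description:[[:space:]]*[^[:space:]]' "$f" \
    || { echo "no description field: $f" >&2; return 1; }
}
```

For example, precheck_skill .claude/skills/my-skill/SKILL.md returns non-zero and prints the reason to stderr when a field is missing.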

Running generated tests

After generating tests, execute them with run_skill_test:

- test_file: path to the generated YAML
- no_rubric: true for fast deterministic-only checks (no LLM cost)
- verbose: true for detailed output on all tests

CLI equivalent:

evalview skill test tests/my-skill-tests.yaml
evalview skill test tests/my-skill-tests.yaml --no-rubric  # fast, $0
evalview skill test tests/my-skill-tests.yaml --verbose --model claude-sonnet-4-20250514
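
Because the --no-rubric pass costs nothing, one reasonable pattern is to run it first and spend LLM calls on the full run only when it succeeds. The sketch below assumes that evalview exits non-zero when tests fail, which the text above does not confirm. The run_staged function and the EVALVIEW override variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Command to invoke; override EVALVIEW to point at a wrapper or a stub.
EVALVIEW="${EVALVIEW:-evalview}"

# Hypothetical helper: run the deterministic-only pass, then the full pass.
# Assumes (not confirmed by the source) that a failing run exits non-zero.
run_staged() {
  local test_file="$1"
  "$EVALVIEW" skill test "$test_file" --no-rubric || {
    echo "deterministic checks failed: $test_file" >&2
    return 1
  }
  # Reached only when the free pass succeeded.
  "$EVALVIEW" skill test "$test_file" --verbose
}
```

Calling run_staged tests/my-skill-tests.yaml runs the first and then the verbose command from the list above, stopping early if the first one fails.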

Related Skills

hidai25/watch

testing

Start EvalView watch mode to automatically re-run regression checks whenever project files change.

hidai25/run-eval

development

Run EvalView regression checks against golden baselines to detect regressions in AI agent behavior after code, prompt, or model changes.

hidai25/code-reviewer

development

A skill that helps review code for best practices, bugs, and security issues

hidai25/hello-world

tools

A simple skill that creates a greeting file