evaluate-plugin/skills/evaluate-plugin-batch/SKILL.md
Batch evaluate every skill in a plugin and produce a plugin-level report. Use when auditing an entire plugin's quality or validating before a release.
npx skillsauth add laurigates/claude-plugins evaluate-plugin-batch
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Batch evaluate all skills in a plugin. Runs /evaluate:skill for each skill, then produces a plugin-level quality report.
| Use this skill when... | Use alternative when... |
|------------------------|------------------------|
| Auditing all skills in a plugin before release | Evaluating a single skill -> /evaluate:skill |
| Establishing quality baselines across a plugin | Viewing past results -> /evaluate:report |
| Checking overall plugin quality after refactoring | Need structural compliance -> plugin-compliance-check.sh |
bash ${CLAUDE_PLUGIN_ROOT}/scripts/inspect_eval.sh --plugin-dir $1
Parse these from $ARGUMENTS:
| Parameter | Default | Description |
|-----------|---------|-------------|
| <plugin-name> | required | Name of the plugin to evaluate |
| --create-missing-evals | false | Generate evals for skills that lack them |
| --parallel N | 1 | Max concurrent skill evaluations |
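The parameter table above can be sketched as an argument parser. This is only an illustration: `parse_arguments` is a hypothetical helper, it assumes `$ARGUMENTS` arrives as a single shell-style string, and the check that `--parallel` is at least 1 is an added assumption that the skill does not state.

```python
import argparse
import shlex


def parse_arguments(arguments):
    """Parse a raw argument string into the three documented parameters."""
    parser = argparse.ArgumentParser(prog="evaluate-plugin-batch")
    parser.add_argument("plugin_name", help="Name of the plugin to evaluate")
    parser.add_argument("--create-missing-evals", action="store_true",
                        help="Generate evals for skills that lack them")
    parser.add_argument("--parallel", type=int, default=1, metavar="N",
                        help="Max concurrent skill evaluations")
    args = parser.parse_args(shlex.split(arguments))
    # Assumption: a concurrency below 1 is treated as a usage error.
    if args.parallel < 1:
        parser.error("--parallel must be at least 1")
    return args


args = parse_arguments("my-plugin --create-missing-evals --parallel 3")
print(args.plugin_name, args.create_missing_evals, args.parallel)
```

The defaults match the table: omitting both flags yields `create_missing_evals=False` and `parallel=1`.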
Find all skills in the plugin:
<plugin-name>/skills/*/SKILL.md
List them and count the total.
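The discovery step could look roughly like the sketch below. It assumes the plugin directory is reachable from the current working directory; `find_skills` is a hypothetical name, not part of the plugin's scripts.

```python
from pathlib import Path


def find_skills(plugin_dir):
    """Return the sorted skill directories that contain a SKILL.md file."""
    skills_root = Path(plugin_dir) / "skills"
    if not skills_root.is_dir():
        return []
    # Mirrors the <plugin-name>/skills/*/SKILL.md pattern.
    return sorted(path.parent for path in skills_root.glob("*/SKILL.md"))


skills = find_skills("my-plugin")  # "my-plugin" is a placeholder directory
print(f"Found {len(skills)} skills")
for skill in skills:
    print(f"- {skill.name}")
```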
For each skill, check if evals.json exists:
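- evals.json present: include
- evals.json missing, without --create-missing-evals: skip
- evals.json missing, with --create-missing-evals: include, will create evals during evaluation

Report the breakdown: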
Found N skills in <plugin-name>:
- M with eval cases
- K without eval cases (skipped | will create)
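A minimal sketch of the check and the breakdown report follows. It assumes `evals.json` sits directly inside each skill directory, which the text does not confirm; both function names are hypothetical.

```python
from pathlib import Path


def split_by_evals(skill_dirs):
    """Partition skill directories by whether an evals.json is present."""
    with_evals, without_evals = [], []
    for skill_dir in map(Path, skill_dirs):
        # Assumption: evals.json lives directly inside the skill directory.
        if (skill_dir / "evals.json").is_file():
            with_evals.append(skill_dir)
        else:
            without_evals.append(skill_dir)
    return with_evals, without_evals


def format_breakdown(plugin_name, with_evals, without_evals, create_missing=False):
    """Render the breakdown in the shape shown above."""
    action = "will create" if create_missing else "skipped"
    total = len(with_evals) + len(without_evals)
    return "\n".join([
        f"Found {total} skills in {plugin_name}:",
        f"- {len(with_evals)} with eval cases",
        f"- {len(without_evals)} without eval cases ({action})",
    ])


print(format_breakdown("my-plugin", ["skill-a", "skill-b"], ["skill-c"]))
```

Skills in the second list are evaluated only when `--create-missing-evals` is set; otherwise they are left out of the run.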
For each included skill, invoke /evaluate:skill via the SlashCommand tool:
SlashCommand: /evaluate:skill <plugin-name>/<skill-name> [--create-evals]
If --parallel N is set and N > 1, batch evaluations into groups of N. Otherwise, run sequentially.
Track progress with TodoWrite — mark each skill as it completes.
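The grouping rule can be illustrated with a short sketch. It only builds the command strings and splits them into groups of N; it does not invoke the SlashCommand tool. `build_command` and `batch` are hypothetical helpers, and passing `--create-evals` only for skills that lack evals is an assumption.

```python
def build_command(plugin_name, skill_name, create_evals=False):
    """Build the /evaluate:skill invocation for one skill."""
    command = f"/evaluate:skill {plugin_name}/{skill_name}"
    if create_evals:
        # Assumption: the flag is passed only for skills without evals.
        command += " --create-evals"
    return command


def batch(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


skills = ["skill-a", "skill-b", "skill-c"]
for group in batch(skills, 2):
    print([build_command("my-plugin", skill) for skill in group])
```

With a size of 1 every group holds a single skill, which is the sequential default.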
After all skill evaluations complete, read each skill's benchmark.json and aggregate:
bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin-name>
Write aggregated results to <plugin-name>/eval-results/plugin-benchmark.json.
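As a rough illustration of the aggregation, the sketch below combines per-skill results into one plugin-level structure. The `passed` and `total` fields are a hypothetical schema, since the layout of benchmark.json is not shown here, and the output keys are likewise illustrative. What `aggregate_benchmark.sh` actually computes may differ.

```python
import json


def aggregate_benchmarks(benchmarks):
    """Combine {skill: {"passed": int, "total": int}} into a plugin summary."""
    skills = []
    for name, bench in sorted(benchmarks.items()):
        total = bench["total"]
        rate = bench["passed"] / total if total else 0.0
        skills.append({"skill": name, "evals": total,
                       "passed": bench["passed"], "pass_rate": rate})
    total_evals = sum(s["evals"] for s in skills)
    total_passed = sum(s["passed"] for s in skills)
    # Overall rate pools all eval cases rather than averaging per-skill rates.
    overall = total_passed / total_evals if total_evals else 0.0
    return {"skills": skills, "total_evals": total_evals,
            "overall_pass_rate": overall}


sample = {
    "skill-a": {"passed": 4, "total": 4},
    "skill-b": {"passed": 2, "total": 3},
    "skill-c": {"passed": 4, "total": 5},
}
print(json.dumps(aggregate_benchmarks(sample), indent=2))
```

Pooling cases weights each skill by its number of evals. Averaging the per-skill rates instead treats every skill equally, so the two overall figures can differ slightly.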
Print a plugin-level summary table:
## Plugin Evaluation: <plugin-name>
| Skill | Evals | Pass Rate | Status |
|-------|-------|-----------|--------|
| skill-a | 4 | 100% | PASS |
| skill-b | 3 | 67% | PARTIAL |
| skill-c | 5 | 80% | PASS |
**Overall**: 82% pass rate across N eval cases
Rank skills by pass rate. Flag any below 50% as needing attention.
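The ranking and flagging rule could be expressed as in the sketch below. `rank_skills` is a hypothetical helper, pass rates are assumed to be fractions between 0 and 1, and `skill-d` is an invented entry added only to show a flagged row.

```python
def rank_skills(results, attention_threshold=0.5):
    """Sort (skill, pass_rate) pairs best-first and flag low pass rates."""
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    # A skill is flagged only when its rate is strictly below the threshold.
    return [(skill, rate, rate < attention_threshold) for skill, rate in ranked]


results = [("skill-b", 0.67), ("skill-d", 0.40), ("skill-a", 1.0)]
for skill, rate, flagged in rank_skills(results):
    note = "  <- needs attention" if flagged else ""
    print(f"{skill}: {rate:.0%}{note}")
```

A skill at exactly 50% is not flagged here, following the "below 50%" wording.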
| Context | Command |
|---------|---------|
| Inventory plugin skills + evals | bash evaluate-plugin/scripts/inspect_eval.sh --plugin-dir <plugin> |
| Inspect a single skill's evals | bash evaluate-plugin/scripts/inspect_eval.sh --plugin <plugin> --skill <skill> |
| Aggregate results | bash evaluate-plugin/scripts/aggregate_benchmark.sh <plugin> |
| Flag | Description |
|------|-------------|
| --create-missing-evals | Generate eval cases for skills without them |
| --parallel N | Max concurrent evaluations (default: 1) |