Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

laurigates/evaluate-report

Name: evaluate-report
Author: laurigates

evaluate-plugin/skills/evaluate-report/SKILL.md

npx skillsauth add laurigates/claude-plugins evaluate-report

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

/evaluate:report

View evaluation results and benchmark reports for a skill or plugin.

When to Use This Skill

| Use this skill when... | Use alternative when... | |------------------------|------------------------| | Want to see results from a past evaluation | Need to run new evaluations -> /evaluate:skill | | Comparing benchmark runs over time | Want improvement suggestions -> /evaluate:improve | | Reviewing quality trends for a plugin | Need structural compliance -> plugin-compliance-check.sh |

Parameters

Parse these from $ARGUMENTS:

| Parameter | Default | Description | |-----------|---------|-------------| | <target> | required | plugin/skill-name for skill or plugin-name for plugin | | --latest | true | Show most recent benchmark only | | --history | false | Show all benchmark iterations | | --compare | false | Compare current vs previous benchmark |

Execution

Step 1: Resolve target

Determine if $ARGUMENTS refers to a skill or plugin:

Contains /: treat as plugin-name/skill-name, look for <plugin>/skills/<skill>/eval-results/benchmark.json
No /: treat as plugin-name, look for <plugin>/eval-results/plugin-benchmark.json

If the target file does not exist, report that no evaluation results are available and suggest running /evaluate:skill or /evaluate:plugin.

Step 2: Load results

Skill-level report (--latest or default): Read benchmark.json and display:

## Evaluation Report: <plugin/skill-name>

**Last run**: <timestamp>
**Runs per eval**: N

| Metric | With Skill | Baseline | Delta |
|--------|-----------|----------|-------|
| Pass Rate | 85% | 42% | +43% |
| Duration | 14s | 12s | +2s |

### Per-Eval Results

| Eval ID | Description | Pass Rate | Failures |
|---------|-------------|-----------|----------|
| eval-001 | Basic usage | 100% | — |
| eval-002 | Edge case | 67% | assertion X |

Skill-level history (--history): Read history.json and display improvement iterations:

## Evaluation History: <plugin/skill-name>

| Version | Date | Pass Rate | Changes |
|---------|------|-----------|---------|
| v3 (current) | 2026-03-04 | 85% | Improved step 3 instructions |
| v2 | 2026-03-01 | 72% | Added error handling |
| v1 | 2026-02-28 | 55% | Initial evaluation |

Skill-level comparison (--compare): Read current and previous benchmarks, show delta:

## Comparison: <plugin/skill-name>

| Metric | Previous | Current | Delta |
|--------|----------|---------|-------|
| Pass Rate | 72% | 85% | +13% |
| Duration | 16s | 14s | -2s |

Improved evals: eval-002, eval-004
Regressed evals: none

Plugin-level report: Read plugin-benchmark.json and display:

## Plugin Report: <plugin-name>

| Skill | Evals | Pass Rate | Status |
|-------|-------|-----------|--------|
| skill-a | 4 | 100% | PASS |
| skill-b | 3 | 67% | NEEDS WORK |

**Overall**: 82% pass rate
**Skills evaluated**: N / M total

Agentic Optimizations

| Context | Command | |---------|---------| | Find skill benchmark | cat <plugin>/skills/<skill>/eval-results/benchmark.json \| jq .summary | | Find plugin benchmark | cat <plugin>/eval-results/plugin-benchmark.json \| jq .summary | | Check for history | cat <plugin>/skills/<skill>/eval-results/history.json \| jq '.iterations \| length' | | List available results | find <plugin> -name benchmark.json -o -name plugin-benchmark.json |

Quick Reference

| Flag | Description | |------|-------------| | --latest | Most recent benchmark (default) | | --history | Show all iterations | | --compare | Compare current vs previous |

laurigates/evaluate-report

evaluate-plugin/skills/evaluate-report/SKILL.md

View evaluation results and benchmark reports for a skill or plugin. Use when reviewing past eval results, comparing benchmark runs, or tracking quality trends.

46 stars

tools

Updated Jul 19, 2026

$ install --global

skillsauth

npx skillsauth add laurigates/claude-plugins evaluate-report

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jul 19, 2026, 7:03 AM177.9s1 file scanned

SKILL.md

name:: evaluate-report
description:: View evaluation results and benchmark reports for a skill or plugin. Use when reviewing past eval results, comparing benchmark runs, or tracking quality trends.
args:: <plugin/skill-name | plugin-name> [--latest] [--history] [--compare]
allowed-tools:: Read, Glob, Grep, Bash(cat *), Bash(jq *), Bash(find *), Bash(ls *), TodoWrite
argument-hint:: git-plugin/git-commit [--latest] | git-plugin [--history]
created:: 2026-03-04
modified:: 2026-07-18
compatibility:: claude-code
reviewed:: 2026-03-04

/evaluate:report

View evaluation results and benchmark reports for a skill or plugin.

When to Use This Skill

Parameters

Parse these from $ARGUMENTS:

Execution

Step 1: Resolve target

Determine if $ARGUMENTS refers to a skill or plugin:

Contains /: treat as plugin-name/skill-name, look for <plugin>/skills/<skill>/eval-results/benchmark.json
No /: treat as plugin-name, look for <plugin>/eval-results/plugin-benchmark.json

If the target file does not exist, report that no evaluation results are available and suggest running /evaluate:skill or /evaluate:plugin.

Step 2: Load results

Skill-level report (--latest or default): Read benchmark.json and display:

## Evaluation Report: <plugin/skill-name>

**Last run**: <timestamp>
**Runs per eval**: N

| Metric | With Skill | Baseline | Delta |
|--------|-----------|----------|-------|
| Pass Rate | 85% | 42% | +43% |
| Duration | 14s | 12s | +2s |

### Per-Eval Results

| Eval ID | Description | Pass Rate | Failures |
|---------|-------------|-----------|----------|
| eval-001 | Basic usage | 100% | — |
| eval-002 | Edge case | 67% | assertion X |

Skill-level history (--history): Read history.json and display improvement iterations:

## Evaluation History: <plugin/skill-name>

| Version | Date | Pass Rate | Changes |
|---------|------|-----------|---------|
| v3 (current) | 2026-03-04 | 85% | Improved step 3 instructions |
| v2 | 2026-03-01 | 72% | Added error handling |
| v1 | 2026-02-28 | 55% | Initial evaluation |

Skill-level comparison (--compare): Read current and previous benchmarks, show delta:

## Comparison: <plugin/skill-name>

| Metric | Previous | Current | Delta |
|--------|----------|---------|-------|
| Pass Rate | 72% | 85% | +13% |
| Duration | 16s | 14s | -2s |

Improved evals: eval-002, eval-004
Regressed evals: none

Plugin-level report: Read plugin-benchmark.json and display:

## Plugin Report: <plugin-name>

| Skill | Evals | Pass Rate | Status |
|-------|-------|-----------|--------|
| skill-a | 4 | 100% | PASS |
| skill-b | 3 | 67% | NEEDS WORK |

**Overall**: 82% pass rate
**Skills evaluated**: N / M total

Agentic Optimizations

Quick Reference

| Flag | Description | |------|-------------| | --latest | Most recent benchmark (default) | | --history | Show all iterations | | --compare | Compare current vs previous |

Related Skills

laurigates/quoted-description

development

VerifiedTrustedCommunity

Debug HTTP APIs: trace requests, inspect headers. Use when a request fails: check status first.

46SKILL.mdUpdated Jul 20, 2026

laurigates/quoted-description

laurigates/folded-description

documentation

VerifiedTrustedCommunity

Render architecture diagrams from text sources. Use when documenting system topology.

46SKILL.mdUpdated Jul 20, 2026

laurigates/folded-description

laurigates/adapters/tests/fixtures/mini-marketplace-src/beta-plugin/skilldirs/no-name

tools

VerifiedTrustedCommunity

Inspect JSON payloads and extract nested fields. Use when parsing API responses.

46SKILL.mdUpdated Jul 20, 2026

laurigates/adapters/tests/fixtures/mini-marketplace-src/beta-plugin/skilldirs/no-name

laurigates/no-description

tools

VerifiedTrustedCommunity

--- name: no-description allowed-tools: Read --- # No Description This skill has no description and must be dropped with a warning.

46SKILL.mdUpdated Jul 20, 2026

laurigates/no-description

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/laurigates/claude-plugins.git

# Copy into Claude Code skills folder (global)
cp -r claude-plugins/evaluate-plugin/skills/evaluate-report ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

laurigates/claude-plugins

46 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT