Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

adriancooney/evals

Name: evals
Author: adriancooney

skills/evals/SKILL.md

npx skillsauth add adriancooney/evals evals

Eval Skill

Run and create evals for testing agent behavior.

Discovering Evals

Evals are markdown files matching *.eval.md. Use glob to find them:

**/*.eval.md

A common pattern is to collect evals in an evals/ directory.
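The glob above can be applied programmatically. The following is a minimal sketch in Python, assuming a recursive search from a chosen root directory; the function name `find_evals` is hypothetical and not part of the skill.

```python
from pathlib import Path


def find_evals(root: str = ".") -> list[Path]:
    """Return every *.eval.md file under root, sorted for stable ordering."""
    # rglob("*.eval.md") walks root recursively, equivalent to **/*.eval.md
    return sorted(p for p in Path(root).rglob("*.eval.md") if p.is_file())


if __name__ == "__main__":
    for path in find_evals("."):
        print(path)
```

Sorting the results keeps the run order deterministic, which makes aggregate reports easier to compare between runs.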

Eval Structure

An eval file contains a prompt and an expectation:

# Eval Title

<prompt>
Instructions for the agent to execute.
</prompt>

<expectation>
Success criteria - describe what must be true for the eval to pass.
</expectation>
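As an illustration of this structure, the sketch below extracts the title, prompt, and expectation from an eval file's text. It assumes each tag appears once and is not nested; the names `Eval` and `parse_eval` are hypothetical, and the skill itself relies on the agent reading the file rather than on a parser like this.

```python
import re
from dataclasses import dataclass


@dataclass
class Eval:
    title: str
    prompt: str
    expectation: str


def _section(text: str, tag: str) -> str:
    """Return the stripped body of the first <tag>...</tag> block."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    if match is None:
        raise ValueError(f"missing <{tag}> block")
    return match.group(1).strip()


def parse_eval(text: str) -> Eval:
    """Parse an eval file's text; the title is the first '# ' heading, if any."""
    title_match = re.search(r"^#[ \t]+(.+)$", text, re.MULTILINE)
    title = title_match.group(1).strip() if title_match else ""
    return Eval(title, _section(text, "prompt"), _section(text, "expectation"))
```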

Running an Eval

1. Read the eval file
2. Extract the <prompt> content
3. Spawn a subagent with the prompt (runs in current working directory with shared state)
4. The subagent evaluates its own result against the <expectation> using LLM judgment
5. Subagent outputs SUCCESS or FAIL with reasoning

When running multiple evals, spawn all subagents in parallel. Report aggregate results at the end.
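The parallel fan-out and aggregate report can be sketched as follows. This is only an illustration: `run_one` is a hypothetical stand-in for spawning a subagent (the skill does not define a programmatic API), assumed to return True for SUCCESS and False for FAIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_all(paths: list[str], run_one: Callable[[str], bool]) -> dict[str, bool]:
    """Run every eval concurrently and map each path to its outcome."""
    if not paths:
        return {}
    # One worker per eval, so all hypothetical subagents start together.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        outcomes = list(pool.map(run_one, paths))
    return dict(zip(paths, outcomes))


def summarize(results: dict[str, bool]) -> str:
    """Build a one-line aggregate report."""
    passed = sum(results.values())
    return f"{passed}/{len(results)} evals passed"
```

Because the subagents share the working directory, evals that write to the same files may interfere with each other when run this way.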

Always end output with exactly one of these lines for CI parsing:

eval result: pass — all evals passed
eval result: fail — one or more evals failed
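On the CI side, a script might read that final line as in the sketch below. It matches only the `eval result: pass` or `eval result: fail` prefix, so it works whether or not the explanatory text after the dash is part of the emitted line; the function name `parse_result` is hypothetical.

```python
def parse_result(output: str) -> bool:
    """Return True for a passing run, based on the last non-empty output line."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty output")
    last = lines[-1]
    if last.startswith("eval result: pass"):
        return True
    if last.startswith("eval result: fail"):
        return False
    # Anything else means the agent did not follow the output contract.
    raise ValueError(f"unrecognized final line: {last!r}")
```

A CI job could exit non-zero when this returns False or raises, treating a missing result line as a failure rather than a pass.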

Subagent Instructions

IMPORTANT: The subagent must only test and observe. It must NOT attempt to fix, modify, or change anything to make the expectation pass. The subagent executes the prompt, observes the outcome, and reports whether the expectation was met. If the expectation fails, report FAIL — do not try to make it pass.

Commands

Run a single eval:

/eval run <path-to-eval.eval.md>

Run all evals:

/eval run-all

Creating an Eval

Gather from the user:

Context - The process or flow to evaluate
Expectation - Success criteria in natural language

/eval create <name>

Write the eval to <name>.eval.md in the current directory.
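For reference, writing the file could look like the sketch below. It assumes the eval name doubles as the title and refuses to overwrite an existing file; both choices, along with the names `TEMPLATE` and `create_eval`, are illustrative assumptions rather than behavior specified by the skill.

```python
from pathlib import Path

# Mirrors the eval structure: a title, a <prompt>, and an <expectation>.
TEMPLATE = """# {title}

<prompt>
{prompt}
</prompt>

<expectation>
{expectation}
</expectation>
"""


def create_eval(name: str, prompt: str, expectation: str, directory: str = ".") -> Path:
    """Write <name>.eval.md into directory and return its path."""
    path = Path(directory) / f"{name}.eval.md"
    if path.exists():
        raise FileExistsError(str(path))
    body = TEMPLATE.format(
        title=name, prompt=prompt.strip(), expectation=expectation.strip()
    )
    path.write_text(body, encoding="utf-8")
    return path
```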

Isolation

When creating an eval, try to make it self-contained and reproducible. This isn't critical, but helps:

Try to avoid hardcoded paths — prefer relative paths or have the prompt create its own working directory rather than encoding specific temp directories or absolute paths.
Try to avoid external state — if the process relied on existing files or services, consider whether the eval should create that state itself.
Parameterize where possible — replace specific values (ports, filenames, IDs) with generic ones the eval can generate.

If you see an opportunity to improve isolation but need clarification, ask the user.
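One way to spot isolation problems is a rough heuristic check on the prompt text. The sketch below flags absolute paths under a few common Unix roots and hardcoded local ports; the patterns and the name `isolation_warnings` are assumptions for illustration, and the check will miss many cases.

```python
import re

# Absolute paths under a few common roots; the lookbehind skips ./tmp/x and a/tmp/x.
ABSOLUTE_PATH = re.compile(r"(?<![\w.])/(?:tmp|home|Users|var|etc)/\S+")
# Local addresses with an explicit port number.
LOCAL_PORT = re.compile(r"\b(?:localhost|127\.0\.0\.1):\d{2,5}\b")


def isolation_warnings(prompt: str) -> list[str]:
    """Return human-readable warnings about hardcoded paths and ports."""
    warnings = [f"absolute path: {m.group(0)}" for m in ABSOLUTE_PATH.finditer(prompt)]
    warnings += [f"hardcoded port: {m.group(0)}" for m in LOCAL_PORT.finditer(prompt)]
    return warnings
```

An empty list does not prove the eval is isolated; it only means none of these specific patterns matched.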

Run and create evals for testing agent behavior. Use when the user wants to create or run an eval.

testing

Updated Mar 30, 2026

Install globally with skillsauth:

npx skillsauth add adriancooney/evals evals

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 1, 2026, 7:51 AM. Scan time: 64.2s. 2 files scanned.

Install manually

# Clone the repo
git clone https://github.com/adriancooney/evals.git

# Copy into Claude Code skills folder (global)
cp -r evals/skills/evals ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

adriancooney/evals

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT