Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

sofer/evaluate

Name: evaluate
Author: sofer

skills/evaluate/SKILL.md

npx skillsauth add sofer/.agents evaluate




Define evaluation criteria, tests, or quality checks for a deliverable, implementation, workflow, or recurring agent task. Use when defining done, improving a test harness, or checking whether output meets its specification.

4 stars

testing

Updated May 1, 2026


npx skillsauth add sofer/.agents evaluate

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report

- Trivy: Container and dependency vulnerability scanner. Status: Clean (95%)
- Semgrep: Static code analysis for vulnerabilities. Status: Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Status: Clean (95%)
- Snyk (dep): Open source security scanning. Status: Skipped (50%)
- Socket.dev: Supply chain security analysis. Status: Skipped (50%)
- VirusTotal: Multi-engine malware detection. Status: Skipped (50%)
- CrowdStrike: Advanced threat intelligence. Status: Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Status: Skipped (50%)
- OWASP Dep-Check. Status: Skipped (50%)

Last scanned: May 1, 2026, 6:42 AM | 288.5s | 1 file scanned

SKILL.md

name: evaluate
description: Define evaluation criteria, tests, or quality checks for a deliverable, implementation, workflow, or recurring agent task. Use when defining done, improving a test harness, or checking whether output meets its specification.

Evaluate

Purpose

Make "done" testable before or after work is performed.

Process

1. Identify the deliverable or behaviour being evaluated.
2. Extract claims about what must be true.
3. Convert each claim into a check:
   - automated check
   - manual check
   - human judgement rubric
4. Mark each check as:
   - blocking: failure means not done
   - important: should be fixed, but may not block
   - nice to have: useful but optional
5. Add edge cases most likely to reveal failure.
6. For recurring work, define a small reusable eval harness and baseline examples.
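
The process above can be sketched in code. This is a minimal illustration and not part of the skill itself: the names `Severity`, `Check`, `failed_checks`, and `is_done` are hypothetical, the `slugify` deliverable is invented for the example, and the sketch assumes every check can be expressed as a function that returns a boolean.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Severity(Enum):
    BLOCKING = "blocking"          # failure means not done
    IMPORTANT = "important"        # should be fixed, but may not block
    NICE_TO_HAVE = "nice to have"  # useful but optional


@dataclass
class Check:
    claim: str                  # what must be true
    severity: Severity
    verify: Callable[[], bool]  # hypothetical: returns True when the claim holds


def failed_checks(checks: List[Check]) -> List[Check]:
    """Run every check and return the ones that failed."""
    return [c for c in checks if not c.verify()]


def is_done(checks: List[Check]) -> bool:
    """Work counts as done only when no blocking check fails."""
    return all(c.severity is not Severity.BLOCKING for c in failed_checks(checks))


# Illustrative deliverable: a small slugify function.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


checks = [
    Check("lowercases and hyphenates words", Severity.BLOCKING,
          lambda: slugify("Hello World") == "hello-world"),
    Check("collapses repeated whitespace", Severity.IMPORTANT,
          lambda: slugify("a   b") == "a-b"),
    # Edge case likely to reveal failure: empty input.
    Check("empty title yields empty slug", Severity.BLOCKING,
          lambda: slugify("") == ""),
]

print(is_done(checks))  # prints True
```

Keeping severity separate from the verification function means a failing "important" or "nice to have" check is still reported by `failed_checks` without flipping the overall result.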

Output

## Evaluation checklist

### Automated checks
- [ ] [Check] (blocking|important|nice to have)
  - How to verify: [command, test, script, query, or assertion]

### Manual checks
- [ ] [Check] (blocking|important|nice to have)
  - How to verify: [steps and expected result]

### Human review criteria
- [ ] [Criterion] (blocking|important|nice to have)
  - Pass: [Observable qualities]
  - Fail: [Observable failure]

### Edge cases
- [ ] [Scenario]
  - Expected behaviour: [Expected result]
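
For recurring work, the template above could be generated rather than typed by hand. The following sketch covers only the "Automated checks" and "Manual checks" sections, which share the same shape. The `render_checklist` function and the sample checks are hypothetical and serve only to show the output format.

```python
from typing import Dict, List, Tuple

# Each item is (check description, severity, how to verify).
Item = Tuple[str, str, str]


def render_checklist(sections: Dict[str, List[Item]]) -> str:
    """Render sections of checks in the checklist template's Markdown layout."""
    lines = ["## Evaluation checklist", ""]
    for heading, items in sections.items():
        lines.append(f"### {heading}")
        for check, severity, how in items:
            lines.append(f"- [ ] {check} ({severity})")
            lines.append(f"  - How to verify: {how}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


report = render_checklist({
    "Automated checks": [
        ("Unit tests pass", "blocking", "run the project's test command"),
    ],
    "Manual checks": [
        ("Login page loads", "important", "open the page and confirm the form renders"),
    ],
})
print(report)
```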

Platform mode

For ~/code/fac-cra/, prioritise meaningful verification over speed:

- relevant unit tests
- typecheck and lint
- relevant app build
- Playwright or end-to-end smoke tests
- browser checks for UI
- fixtures, mocks, or harness improvements when current tests are weak

If meaningful verification is not feasible and the change is risky, stop and ask rather than calling the work done.
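
One way to picture the "stop and ask" rule is a small runner that executes verification commands and then decides what to report. This is a sketch under stated assumptions, not the skill's implementation: `run_steps` and `decide` are hypothetical names, the commands are placeholders that run anywhere Python is installed, and treating unverified but low-risk work as "done" is an assumption the skill text does not spell out.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_steps(steps: Sequence[Tuple[str, List[str]]]) -> List[str]:
    """Run each verification command and return the names of the steps that failed."""
    failed = []
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


def decide(failed: List[str], verified_meaningfully: bool, risky: bool) -> str:
    """Map verification results to an outcome."""
    if failed:
        return "not done"
    if not verified_meaningfully and risky:
        # Verification was not feasible and the change is risky: ask first.
        return "stop and ask"
    return "done"  # assumption: low-risk work may proceed without full verification


# Placeholder commands so the sketch is self-contained; a real project would
# substitute its own unit test, typecheck, lint, build, and smoke test commands.
steps = [
    ("unit tests", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ("typecheck and lint", [sys.executable, "-c", "pass"]),
]

failed = run_steps(steps)
print(decide(failed, verified_meaningfully=True, risky=False))
```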

Related Skills

sofer/agent-config-audit

tools

Verified, Trusted, Community

Check whether Claude and Codex have equivalent access to shared agent resources, skills, hooks, plugins, MCP servers, permissions, startup behaviour, and provider-specific adapter config. Use when comparing agent environments, debugging missing capabilities after restart, or deciding whether to symlink a resource or configure a runtime.

4 stars | SKILL.md | Updated May 10, 2026

sofer/skill-usage

testing

Verified, Trusted, Community

Record substantive skill use in an append-only local log. Use after choosing or invoking a non-system skill for real work, when a skill is inspected but not used, or when a skill fails to apply. Do not use for routine system skills or incidental file reads.

4 stars | SKILL.md | Updated May 1, 2026

sofer/problem-statement

testing

Verified, Trusted, Community

Turn a vague or underspecified request into a self-contained problem statement. Use when the user has a rough idea, when a request would fail if handed directly to an agent, or before non-trivial work that needs shared understanding.

4 stars | SKILL.md | Updated May 1, 2026

sofer/learning

data-ai

Verified, Trusted, Community

Append a one-line learning to ~/.agents/learning-log.md. Use when the user types /learning, or when something genuinely worth remembering surfaced during work and the user confirms it should be captured.

4 stars | SKILL.md | Updated May 1, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


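# Clone the repo
git clone https://github.com/sofer/.agents.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r .agents/skills/evaluate ~/.claude/skills/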

For the skills path, see the official Claude Code Skills documentation.

Repository

sofer/.agents

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT