skills/evaluate/SKILL.md
Define evaluation criteria, tests, or quality checks for a deliverable, implementation, workflow, or recurring agent task. Use when defining done, improving a test harness, or checking whether output meets its specification.
Make "done" testable before or after work is performed.
- blocking: failure means not done
- important: should be fixed, but may not block
- nice to have: useful but optional

## Evaluation checklist
### Automated checks
- [ ] [Check] (blocking|important|nice to have)
  - How to verify: [command, test, script, query, or assertion]
### Manual checks
- [ ] [Check] (blocking|important|nice to have)
  - How to verify: [steps and expected result]
### Human review criteria
- [ ] [Criterion] (blocking|important|nice to have)
  - Pass: [Observable qualities]
  - Fail: [Observable failure]
### Edge cases
- [ ] [Scenario]
  - Expected behaviour: [Expected result]
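The severity labels in the checklist can be sketched as data. The following is a minimal Python illustration, not part of the skill itself: the names `Severity`, `Check`, `is_done`, and `follow_ups` are hypothetical, and it assumes that "done" means no blocking check has failed.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKING = "blocking"          # failure means not done
    IMPORTANT = "important"        # should be fixed, but may not block
    NICE_TO_HAVE = "nice to have"  # useful but optional


@dataclass
class Check:
    """One checklist item with its severity and result (hypothetical shape)."""
    name: str
    severity: Severity
    passed: bool


def is_done(checks):
    """Return True when no blocking check has failed (assumed rule)."""
    return all(c.passed for c in checks if c.severity is Severity.BLOCKING)


def follow_ups(checks):
    """Names of failed non-blocking checks, which should still be reported."""
    return [
        c.name
        for c in checks
        if not c.passed and c.severity is not Severity.BLOCKING
    ]
```

A failed `important` check leaves `is_done` true but appears in `follow_ups`, so it is reported without blocking completion.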
For ~/code/fac-cra/, prioritise meaningful verification over speed:
If meaningful verification is not feasible and the change is risky, stop and ask rather than calling the work done.
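The stop-and-ask rule can be written as a small decision function. This is a sketch under stated assumptions: `next_action` and its return strings are hypothetical, and the final branch (verification not feasible, change not risky) is not specified by the skill, so the behaviour shown there is only one possible choice.

```python
def next_action(verification_feasible: bool, change_is_risky: bool) -> str:
    """Pick the next step before calling work done (illustrative only)."""
    if verification_feasible:
        # Meaningful verification takes priority over speed.
        return "verify"
    if change_is_risky:
        # Stated rule: do not call risky, unverifiable work done.
        return "stop and ask"
    # Assumption: this case is not covered by the skill text.
    return "proceed, noting what was not verified"
```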