skills/agent-evaluation/SKILL.md
Use when evaluating or auditing an agent system design, reviewing agent definitions for role overlap or responsibility leakage, or when orchestrator routing clarity, skill responsibility, or sub-agent job clarity is in question.
npx skillsauth add maestria-co/ai-playbook agent-evaluation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Crisp boundaries distinguish scalable architectures from deteriorating ones. Every problem belongs to exactly one resolver; every skill executes exactly one operation; routing intelligence resides solely in the orchestrator.
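As a rough illustration of the "exactly one resolver" rule, the sketch below flags any responsibility claimed by more than one agent. The `find_overlaps` helper, the agent names, and the responsibility strings are all hypothetical and not part of this skill; a real audit would also need to catch overlaps that are phrased differently.

```python
from collections import defaultdict


def find_overlaps(agents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each responsibility that is claimed by more than one agent.

    `agents` maps a (hypothetical) agent name to the responsibilities
    listed in its definition. Matching is by exact string only.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for agent, responsibilities in agents.items():
        # A set avoids counting a responsibility twice for the same agent.
        for responsibility in set(responsibilities):
            owners[responsibility].append(agent)
    return {r: sorted(names) for r, names in owners.items() if len(names) > 1}


# Hypothetical agent definitions, for illustration only.
agents = {
    "planner": ["break down tasks"],
    "coder": ["write code", "run tests"],
    "tester": ["run tests"],
}

print(find_overlaps(agents))  # {'run tests': ['coder', 'tester']}
```

An empty result only means no two agents use the same wording; it does not prove the boundaries are clean.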
Assess the presented agent architecture against these 5 principles:
PRINCIPLE 1 — PROMPT vs SKILL DEMARCATION
PRINCIPLE 2 — INFORMATION SINGULARITY
PRINCIPLE 3 — SUB-AGENT ROLE PRECISION
PRINCIPLE 4 — SKILL OPERATIONAL SCOPE
PRINCIPLE 5 — ORCHESTRATOR AUTHORITY
Per principle, structure responses as:
PRINCIPLE [N] — [PASS / WARNING / FAIL]
Finding: [observed pattern]
Problem: [architectural risk introduced]
Fix: [concrete remediation]
Post-evaluation, synthesize:
OVERALL HEALTH: [CLEAN / NEEDS WORK / RESTRUCTURE REQUIRED]
PRIORITY FIXES: (urgency-ordered)
OPEN QUESTIONS: (required clarifications before implementing fixes)
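The synthesis step can be sketched as a small roll-up over the per-principle results. The skill does not say how individual statuses map to OVERALL HEALTH, so the rule used here (any FAIL gives RESTRUCTURE REQUIRED, any WARNING gives NEEDS WORK, otherwise CLEAN) is an assumption, as are the `Finding` class, the helper names, and the sample fixes.

```python
from dataclasses import dataclass

# Lower number = more urgent. The ordering is an assumption of this sketch.
SEVERITY = {"FAIL": 0, "WARNING": 1, "PASS": 2}


@dataclass
class Finding:
    principle: int  # 1-5, matching the principle list
    status: str     # "PASS", "WARNING", or "FAIL"
    fix: str        # concrete remediation; empty for PASS


def overall_health(findings: list[Finding]) -> str:
    """Roll statuses up into one label (assumed mapping, not from the skill)."""
    statuses = {f.status for f in findings}
    if "FAIL" in statuses:
        return "RESTRUCTURE REQUIRED"
    if "WARNING" in statuses:
        return "NEEDS WORK"
    return "CLEAN"


def priority_fixes(findings: list[Finding]) -> list[str]:
    """List fixes for non-passing principles, FAIL first, then by principle."""
    actionable = [f for f in findings if f.status != "PASS"]
    ordered = sorted(actionable, key=lambda f: (SEVERITY[f.status], f.principle))
    return [f.fix for f in ordered]


# Hypothetical evaluation results, for illustration only.
findings = [
    Finding(1, "PASS", ""),
    Finding(2, "WARNING", "merge duplicated context into one source"),
    Finding(5, "FAIL", "move routing decisions into the orchestrator"),
]

print("OVERALL HEALTH:", overall_health(findings))
for i, fix in enumerate(priority_fixes(findings), start=1):
    print(f"{i}. {fix}")
```

OPEN QUESTIONS are left out of the sketch because they depend on reviewer judgment rather than on the statuses.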
When adding or editing Anti-Rationalization rows in any agent:
A contradiction between Process and Anti-Rationalization is worse than a gap — the agent will exhibit inconsistent behavior on every invocation.