maestria-co/agent-evaluation

Name: agent-evaluation
Author: maestria-co

skills/agent-evaluation/SKILL.md

npx skillsauth add maestria-co/ai-playbook agent-evaluation

Agent System Evaluation

Overview

Crisp boundaries distinguish scalable architectures from deteriorating ones. Every problem belongs to exactly one resolver; every skill executes exactly one operation; routing intelligence resides solely in the orchestrator.

Application Scenarios

Architecting a fresh agent framework
Auditing deployed agent specifications for overlap or drift
Investigating anomalous agent conduct potentially rooted in structural flaws
Pre-publication validation of agent frameworks

Assess the presented agent architecture against these 5 principles:

PRINCIPLE 1 — PROMPT vs SKILL DEMARCATION

Decision-making, reasoning mechanisms, and conditional triggers → reside in agent prompts
Output schemas, API invocations, transformation templates → reside in skill specifications
Flag reasoning embedded in skills that should migrate to prompts
Flag templates embedded in prompts that should migrate to skills

PRINCIPLE 2 — INFORMATION SINGULARITY

Every constraint or rule must occupy exactly ONE location
Flag duplicated rules appearing across multiple definitions
Flag behaviors potentially governed by competing directives
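The duplication check can be approximated mechanically. The sketch below scans a set of definitions for rules that appear in more than one place. The `definitions` mapping, the `find_duplicated_rules` name, and the case and whitespace normalization are assumptions made for this example; real rule text usually needs fuzzier matching than exact comparison.

```python
from collections import defaultdict


def find_duplicated_rules(definitions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each rule that appears in more than one definition, with its locations."""
    locations: dict[str, set[str]] = defaultdict(set)
    for name, rules in definitions.items():
        for rule in rules:
            # Normalize case and whitespace so trivially different copies still match.
            key = " ".join(rule.lower().split())
            locations[key].add(name)
    return {rule: sorted(names) for rule, names in locations.items() if len(names) > 1}


# Hypothetical definitions, for illustration only.
definitions = {
    "orchestrator": [
        "Never call external APIs directly",
        "Route billing questions to billing-agent",
    ],
    "billing-agent": ["never call external  APIs directly"],
    "format-invoice": ["Return JSON only"],
}

duplicates = find_duplicated_rules(definitions)
print(duplicates)
```

Each key in the result is a rule that needs a single owner; the listed definitions are the places to consolidate.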

PRINCIPLE 3 — SUB-AGENT ROLE PRECISION

Every sub-agent must have ONE unambiguous responsibility that can be stated as a single answerable question
Its output must return to the orchestrator before any skill is invoked
Flag sub-agents with overlapping jurisdictions
Flag sub-agents bypassing orchestrator to communicate directly with skills
Flag sub-agents with vague or multiply-interpretable mandates
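Two of these flags lend themselves to a simple structural check, assuming the agent system can be described as data. In the sketch below, `responsibilities` maps each sub-agent to the question it answers and `calls` lists caller and callee pairs; both structures and all names are hypothetical. Exact string matching on questions is a deliberate simplification.

```python
def find_overlaps(responsibilities: dict[str, str]) -> dict[str, list[str]]:
    """Group sub-agents by the question they answer; keep questions with several owners."""
    owners: dict[str, list[str]] = {}
    for agent, question in responsibilities.items():
        owners.setdefault(question, []).append(agent)
    return {question: agents for question, agents in owners.items() if len(agents) > 1}


def find_bypass_edges(
    calls: list[tuple[str, str]], sub_agents: set[str], skills: set[str]
) -> list[tuple[str, str]]:
    """Return every call edge where a sub-agent invokes a skill directly."""
    return [(src, dst) for src, dst in calls if src in sub_agents and dst in skills]


# Hypothetical system description, for illustration only.
responsibilities = {
    "researcher": "What facts are relevant?",
    "fact-finder": "What facts are relevant?",
    "reviewer": "What are the risks?",
}
calls = [
    ("orchestrator", "researcher"),
    ("researcher", "send-email"),  # sub-agent reaches a skill without the orchestrator
    ("orchestrator", "send-email"),
]

overlaps = find_overlaps(responsibilities)
bypasses = find_bypass_edges(calls, sub_agents=set(responsibilities), skills={"send-email"})
print(overlaps)
print(bypasses)
```

Vague or multiply-interpretable mandates still need a human or model reviewer; a check like this only covers the parts that are visible in the structure.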

PRINCIPLE 4 — SKILL OPERATIONAL SCOPE

Skills must execute ONE action (format/invoke/return)
Skills must exclude reasoning, conditionals, or decision trees
Skills must specify clear, structured response contracts
Flag skills containing decision logic rather than mere execution
Flag skills whose behavior varies by calling context
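A minimal sketch of a skill that stays within this scope: one formatting action and a fixed response contract. The `InvoiceSummary` type and the `format_invoice` function are invented for illustration and are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceSummary:
    """Response contract: every call returns exactly these fields."""

    customer: str
    total_cents: int
    line_count: int


def format_invoice(customer: str, line_items_cents: list[int]) -> InvoiceSummary:
    """Single action: shape the given inputs into the contract.

    There is no branching on who the caller is or on what should happen next;
    those decisions belong in the agent prompt or the orchestrator.
    """
    return InvoiceSummary(
        customer=customer,
        total_cents=sum(line_items_cents),
        line_count=len(line_items_cents),
    )


summary = format_invoice("acme", [1200, 350])
print(summary)
```

Because the function takes no caller or context argument, its behavior cannot vary by calling context, which is the property the last flag checks for.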

PRINCIPLE 5 — ORCHESTRATOR AUTHORITY

Orchestrators must monopolize all routing determinations
Only orchestrators may invoke terminal skills
Orchestrators must aggregate all sub-agent responses before skill invocation
Flag routing intelligence distributed outside orchestrators
Flag pathways permitting sub-agents to trigger terminal skills
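The control flow these rules describe might look like the following sketch, where sub-agents are plain functions that return values and the orchestrator is the only code that calls the terminal skill. `researcher`, `reviewer`, `publish_report`, and `orchestrate` are hypothetical names chosen for the example.

```python
from typing import Callable

SubAgent = Callable[[str], str]


def researcher(task: str) -> str:
    """Hypothetical sub-agent: answers one question and returns the answer."""
    return f"facts for: {task}"


def reviewer(task: str) -> str:
    """Hypothetical sub-agent: answers a different question."""
    return f"risks for: {task}"


def publish_report(sections: list[str]) -> dict[str, object]:
    """Hypothetical terminal skill: one action, structured response."""
    return {"status": "published", "sections": sections}


def orchestrate(task: str, sub_agents: list[SubAgent]) -> dict[str, object]:
    # Routing lives here: the orchestrator decides which sub-agents run.
    responses = [agent(task) for agent in sub_agents]
    # Every sub-agent response is collected before the terminal skill runs;
    # the sub-agents return plain values and never call the skill themselves.
    return publish_report(responses)


result = orchestrate("Q3 summary", [researcher, reviewer])
print(result)
```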

Per principle, structure responses as:

PRINCIPLE [N] — [PASS / WARNING / FAIL]
Finding: [observed pattern]
Problem: [architectural risk introduced]
Fix: [concrete remediation]

Post-evaluation, synthesize:

OVERALL HEALTH: [CLEAN / NEEDS WORK / RESTRUCTURE REQUIRED]

PRIORITY FIXES: (urgency-ordered)

[highest-priority remediation]
[subsequent remediation]
[etc.]

OPEN QUESTIONS: (required clarifications before implementing fixes)

[clarification needed]
[etc.]
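If the per-principle results are captured as data, the synthesis step can be sketched as below. The `Finding` type and the mapping from statuses to overall health (any FAIL means RESTRUCTURE REQUIRED, otherwise any WARNING means NEEDS WORK) are assumptions for illustration; the skill does not define how individual results roll up into the overall rating.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    principle: int
    status: str  # "PASS", "WARNING", or "FAIL"
    finding: str
    problem: str
    fix: str


# Lower number sorts first, so FAIL fixes come before WARNING fixes.
_SEVERITY = {"FAIL": 0, "WARNING": 1, "PASS": 2}


def overall_health(findings: list[Finding]) -> str:
    # Assumed roll-up rule, not defined by the skill.
    statuses = {f.status for f in findings}
    if "FAIL" in statuses:
        return "RESTRUCTURE REQUIRED"
    if "WARNING" in statuses:
        return "NEEDS WORK"
    return "CLEAN"


def priority_fixes(findings: list[Finding]) -> list[str]:
    flagged = [f for f in findings if f.status != "PASS"]
    return [f.fix for f in sorted(flagged, key=lambda f: _SEVERITY[f.status])]


def summarize(findings: list[Finding]) -> str:
    lines = [f"OVERALL HEALTH: {overall_health(findings)}", "", "PRIORITY FIXES:"]
    lines += [f"{i}. {fix}" for i, fix in enumerate(priority_fixes(findings), start=1)]
    return "\n".join(lines)


# Hypothetical findings, for illustration only.
findings = [
    Finding(1, "PASS", "Reasoning stays in prompts", "", ""),
    Finding(2, "WARNING", "Retry rule repeated in two agents",
            "Copies can drift apart", "Keep the retry rule in the orchestrator only"),
    Finding(5, "FAIL", "Sub-agent triggers send-email",
            "Routing is split across agents", "Route send-email through the orchestrator"),
]

report = summarize(findings)
print(report)
```

Open questions are left out of the sketch because they come from reviewer judgment and cannot be derived from the statuses.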

Anti-Rationalization Audit

When adding or editing Anti-Rationalization rows in any agent:

Read every item in the agent's Process section.
Check for direct contradictions — a Process step that would require the behavior the Anti-Rationalization row forbids.
Remove or reword any conflicting Process step before committing.

A contradiction between Process and Anti-Rationalization is worse than a gap — the agent will exhibit inconsistent behavior on every invocation.

Use when evaluating or auditing an agent system design, reviewing agent definitions for role overlap or responsibility leakage, or when orchestrator routing clarity, skill responsibility, or sub-agent job clarity is in question.

testing

Updated Apr 26, 2026

npx skillsauth add maestria-co/ai-playbook agent-evaluation

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 26, 2026, 6:32 AM; 19.3s; 1 file scanned

SKILL.md

name: agent-evaluation
description: Use when evaluating or auditing an agent system design, reviewing agent definitions for role overlap or responsibility leakage, or when orchestrator routing clarity, skill responsibility, or sub-agent job clarity is in question.
user-invocable: true

Related Skills

maestria-co/writing-tests

development

Writes and runs a test suite for a piece of code, covering happy path, edge cases, error cases, and security cases. Use when: implementation is complete and needs test coverage, a bug needs a reproduction test and fix validation, or code needs coverage before a refactor. Do not use when: the code under test is not yet implemented, or the spec is still unclear.

SKILL.md, updated Apr 26, 2026

maestria-co/writing-skills

testing

Use when creating a new skill, editing an existing skill, or helping a user author a skill for this system. Covers structure, discoverability, quality, and discipline hardening.

SKILL.md, updated Apr 26, 2026

maestria-co/verification-checklist

development

Evidence-based verification process to run before marking any task complete. Use this skill every time you're about to report that work is done — for features, bug fixes, refactoring, or any code change. This catches the most common failure mode: declaring "done" without proof. If you're finishing up and about to tell the user the task is complete, run this checklist first.

SKILL.md, updated Apr 26, 2026

maestria-co/using-skills

development

Teaches agents how to discover, select, and invoke skills from the skill library. Use this skill whenever you're uncertain which skill applies to a task, when composing multiple skills for complex work, or when you need to understand what skills are available. This is your go-to when you face an ambiguous task and need to figure out the right approach before diving into implementation.

SKILL.md, updated Apr 26, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/maestria-co/ai-playbook.git

# Create the Claude Code skills folder (global) if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into that folder
cp -r ai-playbook/skills/agent-evaluation ~/.claude/skills/

Repository

maestria-co/ai-playbook

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT