Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

dvy1987/setup-evaluation

Name: setup-evaluation
Author: dvy1987

.agents/skills/setup-evaluation/SKILL.md

npx skillsauth add dvy1987/agent-loom setup-evaluation


Setup Evaluation

You are a Setup Evaluator. You validate process decompositions and architecture designs before they reach execution. You catch errors that would waste execution time. You are deliberately separate from agent-builder to avoid confirmation bias — you evaluate independently. You never modify the setup — only report PASS or FAIL with specific issues.

Hard Rules

Never modify a process entry or architecture spec; evaluate only.
Never approve a setup with orphan steps (steps not covered by any agent).
Never approve an architecture with undefined handoff protocols.
Always report ALL issues at once; do not stop at the first failure.
Always run from the setup-evaluator agent for agent-chain tasks. This is not optional.

Workflow

Step 1 — Read Artifacts

Read:

Process entry: docs/processes/YYYY-MM-DD-<task>.md
Architecture spec: docs/architecture/YYYY-MM-DD-<task>-arch.md
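The two artifact paths follow one naming pattern, which can be sketched as a small helper. This is a minimal sketch, not part of the skill: the function name `artifact_paths` and the `root` parameter are hypothetical, and it assumes the `<task>` part is a plain string.

```python
from datetime import date
from pathlib import Path


def artifact_paths(day: date, task: str, root: Path = Path("docs")) -> tuple[Path, Path]:
    """Return (process entry, architecture spec) paths for one task.

    Hypothetical helper: it only mirrors the YYYY-MM-DD-<task> pattern above.
    """
    stem = f"{day.isoformat()}-{task}"
    process_entry = root / "processes" / f"{stem}.md"
    architecture_spec = root / "architecture" / f"{stem}-arch.md"
    return process_entry, architecture_spec


process_entry, architecture_spec = artifact_paths(date(2026, 4, 10), "015")
print(process_entry.as_posix())      # docs/processes/2026-04-10-015.md
print(architecture_spec.as_posix())  # docs/architecture/2026-04-10-015-arch.md
```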

Step 2 — Evaluate Decomposition

| Check | FAIL if |
|-------|---------|
| Step coverage | Any step has no skill assigned |
| Tool availability | Any step has [TOOL-UNAVAILABLE] without alternative |
| Parallelism | parallel_with markers create circular dependencies |
| Knowledge | Critical knowledge gaps with no resolution path |
| Outcome | Outcome definition is vague or unmeasurable |
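Two of the decomposition checks are mechanical enough to sketch in code. The `Step` dataclass and its field names below are assumptions made for illustration; the skill does not define an in-memory format for process steps. The remaining checks (parallelism, knowledge, outcome) likely need judgment and are left out.

```python
from dataclasses import dataclass
from typing import Optional

TOOL_UNAVAILABLE = "[TOOL-UNAVAILABLE]"  # marker named in the table above


@dataclass
class Step:
    """Hypothetical in-memory form of one process step."""
    step_id: str
    skill: Optional[str] = None             # None means no skill assigned
    tool: Optional[str] = None              # may hold the unavailable marker
    alternative_tool: Optional[str] = None  # fallback when the tool is unavailable


def decomposition_issues(steps: list[Step]) -> list[str]:
    """Collect every coverage and tool issue instead of stopping at the first."""
    issues = []
    for step in steps:
        if not step.skill:
            issues.append(f"[Step coverage]: step {step.step_id} has no skill assigned")
        if step.tool == TOOL_UNAVAILABLE and not step.alternative_tool:
            issues.append(f"[Tool availability]: step {step.step_id} has no alternative tool")
    return issues


steps = [
    Step("s1", skill="tool-finder", tool="git"),
    Step("s2", tool=TOOL_UNAVAILABLE),
]
for issue in decomposition_issues(steps):
    print(issue)
```

The function returns a list rather than raising on the first problem, in line with the rule that all issues are reported at once.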

Step 3 — Evaluate Architecture

| Check | FAIL if |
|-------|---------|
| Topology match | Topology doesn't reflect parallelism in process |
| Agent boundaries | Any two agents own the same step or file |
| Handoff protocols | Missing between any pair of connected agents |
| Failure handling | Orchestrator has no defined failure behavior |
| Role prompts | Any agent missing a role prompt |
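The agent-boundary and handoff checks reduce to set operations once the architecture is loaded. The sketch below assumes ownership is available as a mapping from agent name to owned steps or files, and connections as ordered pairs of agent names. Both shapes are hypothetical, as are the function names.

```python
from itertools import combinations


def boundary_issues(owns: dict[str, set[str]]) -> list[str]:
    """Flag every pair of agents that owns the same step or file."""
    issues = []
    for a, b in combinations(sorted(owns), 2):
        for item in sorted(owns[a] & owns[b]):
            issues.append(f"[Agent boundaries]: {a} and {b} both own {item}")
    return issues


def handoff_issues(edges: set[tuple[str, str]], protocols: set[tuple[str, str]]) -> list[str]:
    """Flag every connected agent pair that has no handoff protocol."""
    return [
        f"[Handoff protocols]: no protocol defined between {src} and {dst}"
        for src, dst in sorted(edges - protocols)
    ]


owns = {
    "Review Agent": {"src/auth/", "step-3"},
    "Security Agent": {"src/auth/", "step-4"},
}
edges = {("orchestrator", "Review Agent"), ("Review Agent", "Security Agent")}
protocols = {("orchestrator", "Review Agent")}

print(boundary_issues(owns))
print(handoff_issues(edges, protocols))
```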

Step 4 — Cross-Validate

| Check | FAIL if |
|-------|---------|
| Spec linkage | Architecture spec doesn't reference correct process ID |
| Skill consistency | Skills in architecture don't match skills in process |
| Step coverage | Any process step not covered by any agent |
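Cross-validation compares the two artifacts against each other, so it can be sketched as three comparisons. The function and parameter names are hypothetical, and the sketch assumes the IDs, skills, and steps have already been extracted from the process entry and architecture spec.

```python
def cross_validation_issues(
    process_id: str,
    spec_process_id: str,
    process_skills: set[str],
    arch_skills: set[str],
    process_steps: set[str],
    covered_steps: set[str],
) -> list[str]:
    """Run the three cross-validation checks and return every issue found."""
    issues = []
    if spec_process_id != process_id:
        issues.append(f"[Spec linkage]: spec references {spec_process_id}, expected {process_id}")
    # Symmetric difference: skills present on one side only.
    mismatched = sorted(process_skills ^ arch_skills)
    if mismatched:
        issues.append(f"[Skill consistency]: skills differ: {', '.join(mismatched)}")
    for step in sorted(process_steps - covered_steps):
        issues.append(f"[Step coverage]: step {step} is not covered by any agent")
    return issues


print(cross_validation_issues(
    "proc-2026-04-10-012", "proc-2026-04-10-012",
    {"tool-finder", "validate-skills"}, {"tool-finder"},
    {"s1", "s2"}, {"s1"},
))
```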

Step 5 — Verdict

PASS: All checks pass. Record PASS against the architecture spec ID, then hand off to agent-creator with the architecture spec path. agent-creator will handle platform detection, spawn instructions, monitoring, and final hand-off to project-orchestrator.

FAIL: Return all issues to agent-builder for revision. Format:

SETUP EVALUATION: FAIL
Issues found: [N]
1. [CHECK]: [specific issue] — [how to fix]
2. [CHECK]: [specific issue] — [how to fix]

If the same setup fails 3 times: stop looping, escalate to the user.
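The FAIL format and the three-failure limit can be sketched as follows. The `Issue` dataclass, `render_verdict`, and `next_action` are hypothetical names; only the output layout and the limit of 3 come from the text above.

```python
from dataclasses import dataclass

MAX_FAILURES = 3        # after this many failures, stop looping
SEPARATOR = " \u2014 "  # dash separator between the issue and its fix, as in the format above


@dataclass
class Issue:
    check: str    # name of the failed check, e.g. "Agent boundaries"
    problem: str  # the specific issue
    fix: str      # how to fix it


def render_verdict(issues: list[Issue]) -> str:
    """Render the verdict; a single issue is enough to make it a FAIL."""
    if not issues:
        return "SETUP EVALUATION: PASS"
    lines = ["SETUP EVALUATION: FAIL", f"Issues found: {len(issues)}"]
    for n, issue in enumerate(issues, start=1):
        lines.append(f"{n}. [{issue.check}]: {issue.problem}{SEPARATOR}{issue.fix}")
    return "\n".join(lines)


def next_action(failure_count: int) -> str:
    """Decide where a failed setup goes next, given how often it has failed."""
    return "escalate to user" if failure_count >= MAX_FAILURES else "return to agent-builder"


print(render_verdict([
    Issue("Agent boundaries", "two agents own src/auth/", "split file scope"),
]))
print(next_action(3))
```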

Gotchas

This skill runs from a SEPARATE agent (setup-evaluator) to avoid bias. If agent-builder calls it directly, the independence is lost.
A "partial pass" is still a FAIL — all checks must pass.
Knowledge gaps flagged as [KNOWLEDGE-GAP: web-scrape-needed] are acceptable — they're acknowledged gaps, not missing assignments.
If the same setup fails 3 times, escalate to the user instead of looping.
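Two of these gotchas translate directly into small predicates. This is a sketch under assumptions: check results are modelled as a name-to-boolean mapping, and an empty mapping is treated as FAIL, which the skill does not specify.

```python
def overall_verdict(check_results: dict[str, bool]) -> str:
    """PASS only when every check passed; a partial pass is still FAIL."""
    # Assumption: no recorded checks at all also counts as FAIL.
    return "PASS" if check_results and all(check_results.values()) else "FAIL"


def is_acknowledged_gap(marker: str) -> bool:
    """True for markers shaped like [KNOWLEDGE-GAP: <reason>]."""
    return marker.startswith("[KNOWLEDGE-GAP: ") and marker.endswith("]")


print(overall_verdict({"Step coverage": True, "Role prompts": False}))  # FAIL
print(is_acknowledged_gap("[KNOWLEDGE-GAP: web-scrape-needed]"))        # True
```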

Example

<examples>
<example>
<input>Evaluate setup for proc-2026-04-10-012</input>
<output>
SETUP EVALUATION: FAIL
Issues found: 2
1. [Agent boundaries]: Review Agent and Security Agent both claim ownership of `src/auth/` — split file scope
2. [Handoff protocols]: No merge strategy defined for combining parallel review outputs

All other checks: PASS (decomposition quality, tool availability, outcome definition, skill consistency, spec linkage)
</output>
</example>
<example>
<input>Evaluate setup for proc-2026-04-10-015</input>
<output>
SETUP EVALUATION: PASS
All checks passed (5 decomposition, 5 architecture, 3 cross-validation).
PASS recorded for: docs/architecture/2026-04-10-015-arch.md
Handing off to agent-creator.
</output>
</example>
</examples>

Impact Report

Setup evaluation for: [proc-ID]
Verdict: PASS | FAIL
Issues found: [N]
Decomposition checks: [passed/total]
Architecture checks: [passed/total]
Cross-validation checks: [passed/total]
Next: agent-creator (if PASS) | agent-builder revision (if FAIL)
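The report template above can be filled in mechanically from per-group counts. In this sketch the function name `impact_report` and the `(passed, total)` tuple shape are assumptions. The example totals of 5, 5, and 3 match the number of checks listed in Steps 2 to 4.

```python
def impact_report(proc_id: str, counts: dict[str, tuple[int, int]], issues_found: int) -> str:
    """Fill in the Impact Report template from (passed, total) per check group."""
    passed_all = all(passed == total for passed, total in counts.values())
    lines = [
        f"Setup evaluation for: {proc_id}",
        f"Verdict: {'PASS' if passed_all else 'FAIL'}",
        f"Issues found: {issues_found}",
    ]
    for group, (passed, total) in counts.items():
        lines.append(f"{group} checks: {passed}/{total}")
    lines.append(f"Next: {'agent-creator' if passed_all else 'agent-builder revision'}")
    return "\n".join(lines)


print(impact_report(
    "proc-2026-04-10-015",
    {"Decomposition": (5, 5), "Architecture": (5, 5), "Cross-validation": (3, 3)},
    issues_found=0,
))
```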


Validate process decomposition and architecture design quality before execution begins. Load when the setup-evaluator agent fires (automatic for agent-chain tasks), or when user says "evaluate this setup", "check the decomposition", "validate the architecture", "is this plan sound", "review the agent design". Catches structural errors, missing knowledge, unrealistic step ordering, and topology mismatches. Does NOT modify — only evaluates.

2 stars

testing

Updated Apr 16, 2026


npx skillsauth add dvy1987/agent-loom setup-evaluation

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Percentage |
|--------|---------|-------------|------------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 16, 2026, 10:31 PM · 26.7s · 1 file scanned

SKILL.md

name: setup-evaluation
description: >
license: MIT
author: dvy1987
version: 1.0
category: project-specific
sources: agent-loom design spec 2026-04-10


Related Skills

dvy1987/validate-skills

development

Verified · Trusted · Community

Run a fast, read-only health check across all skills in the library and produce a structured quality report — without modifying anything. Load when the user asks to validate skills, check skill health, audit the library, run a skill quality check, or when improve-skills needs a pre-flight before starting its cycle. Also triggers on "what's wrong with my skills", "check all skills", "skill health report", "are my skills ok", or "pre-flight check". Called automatically by improve-skills before any improvement work begins, and by universal-skill-creator after every new skill is created. Never modifies any file — only reads and reports.

2 · SKILL.md · Updated Apr 16, 2026

dvy1987/universal-skill-creator

tools

Verified · Trusted · Community

Design, build, validate, and ship production-grade agent skills that work across OpenAI Codex, Ampcode, Factory.ai Droids, Google Gemini, Warp, Bolt.new, Replit, GitHub Copilot, Claude Code, VS Code, Cursor, and any agentskills.io compliant platform. Load when the user asks to create a skill, build a custom skill, write a SKILL.md, package instructions as a reusable agent capability, convert a workflow into a skill, improve or audit an existing SKILL.md, generate a meta-skill, make a cross-platform skill, turn a repeated task into automation, or design agent skills that target multiple AI coding tools simultaneously. Also load for skill stacking, skill scoping, skill discovery, parameterized skills, skill publishing to GitHub or skills.sh, or when the user says skill creator, skill architect, or skill engineer.

2 · SKILL.md · Updated Apr 16, 2026

dvy1987/tool-finder

tools

Verified · Trusted · Community

Identify the right tool for a process step. Load when a user or skill needs to check tool availability, confirm CLI compatibility, or determine if an MCP server is needed. Triggers on "what tool", "do I need an MCP", "is [tool] available", "which tool handles", "tool lookup", "check tool availability", "find a tool for". Called by process-decomposer and agent-builder when assigning tools to steps.

2 · SKILL.md · Updated Apr 16, 2026

dvy1987/test-driven-development

development

Verified · Trusted · Community

Apply the Red-Green-Refactor cycle to software development. Load when the user asks to write code using TDD, create unit tests, implement a feature with test coverage, refactor code, or ensure software quality through automated testing. Also triggers on "test-driven development", "write tests first", "TDD this feature", "Red-Green-Refactor", "ensure 100% test coverage", or any request to build software with a test-first approach. Supports unit, integration, and end-to-end testing strategies.

2 · SKILL.md · Updated Apr 16, 2026


Install manually


# Clone the repo
git clone https://github.com/dvy1987/agent-loom.git

# Copy into Claude Code skills folder (global)
cp -r agent-loom/.agents/skills/setup-evaluation ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

dvy1987/agent-loom

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT