Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

dcramer/agent-design-review

Name: agent-design-review
Author: dcramer

skills/agent-design-review/SKILL.md

npx skillsauth add dcramer/peated agent-design-review


Agent Design Review

Design or review agents by identifying the success contract first, mapping the real execution path second, and changing the smallest layer that is actually causing failures.

Load only what applies:

| Need | Read |
| --- | --- |
| Choose architecture or multi-agent shape | references/principles.md |
| Rewrite prompts or improve cache reuse | references/prompt-and-caching.md |
| Draft or rewrite an actual system prompt | references/system-prompt-templates.md |
| Draft a provider-specific prompt for OpenAI Responses or Anthropic tool use | references/provider-specific-templates.md |
| Improve tool calling, tool schemas, or final outputs | references/tool-and-schema-design.md |
| Draft or fix actual tool schemas | references/tool-schema-examples.md |
| Review loops, approvals, side effects, or trust boundaries | references/runtime-and-guardrails.md |
| Build evals or decide how to iterate | references/evals-and-iteration.md |
| Review classifier, matcher, router, extractor, ranker, or moderation agent | references/classifier-agents.md |
| Need examples of strong and weak output | references/review-examples.md |

Step 1: Set Mode and Success Contract

Set the task mode first:

design: a new agent or major redesign
review: assess an existing agent and prioritize changes
debug: explain why a current agent is failing and what to change first

Then write a short success contract:

task the system must complete
target quality or success metric
unacceptable failures
cost and latency budget
operator or reviewer load constraints
side effects the system may take
tools, data sources, and approvals available
current eval status

If the user asks only for a prompt rewrite, still check whether retrieval, tools, thresholds, or runtime policy dominate the failures.
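One way to keep the success contract from staying implicit is to record it as a small data structure and check which fields are still empty. The sketch below is illustrative only: the `SuccessContract` class, its field names, and the `validate` helper are hypothetical and are not part of the skill.

```python
from dataclasses import dataclass, field, fields

# The three task modes named in Step 1.
VALID_MODES = {"design", "review", "debug"}


@dataclass
class SuccessContract:
    """Hypothetical container for the Step 1 success contract."""

    mode: str
    task: str = ""
    success_metric: str = ""
    unacceptable_failures: list[str] = field(default_factory=list)
    cost_latency_budget: str = ""
    operator_load: str = ""
    side_effects: list[str] = field(default_factory=list)
    tools_data_approvals: list[str] = field(default_factory=list)
    eval_status: str = ""

    def missing(self) -> list[str]:
        """Return the names of contract fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def validate(contract: SuccessContract) -> list[str]:
    """Reject unknown modes, then report which fields still need an answer."""
    if contract.mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {contract.mode!r}")
    return contract.missing()


# Illustrative usage with made-up values.
draft = SuccessContract(mode="review", task="Classify incoming support tickets")
print(validate(draft))
```

An empty list is treated as "not yet answered" here; if a system genuinely takes no side effects, recording an explicit entry such as `"none"` keeps that decision visible.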

Step 2: Choose the Smallest Architecture That Works

Use references/principles.md.

Classify the system before proposing changes:

| Pattern | Use when | Avoid when |
| --- | --- | --- |
| Deterministic workflow | The task is mostly rule-based or decomposes cleanly in code | The model must explore or use tools adaptively |
| Single agent | One prompt plus tools can reliably solve the task in a loop | Prompt complexity or tool overload makes behavior unstable |
| Multi-agent system | Distinct roles, tools, or trust boundaries must stay separate | You are adding agents without a measured bottleneck |

Prefer deterministic preprocessing, retrieval, routing, or thresholds before adding more agent autonomy.
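The classification above can be read as a short decision rule that tries the least autonomous pattern first. The following is a minimal sketch of that reading, not the skill's own logic; the `choose_pattern` function and its four boolean inputs are assumptions made for illustration.

```python
from enum import Enum


class Pattern(Enum):
    DETERMINISTIC_WORKFLOW = "deterministic workflow"
    SINGLE_AGENT = "single agent"
    MULTI_AGENT = "multi-agent system"


def choose_pattern(
    *,
    mostly_rule_based: bool,
    needs_adaptive_tool_use: bool,
    needs_separate_boundaries: bool,
    has_measured_bottleneck: bool,
) -> Pattern:
    """Pick the smallest pattern whose 'use when' condition holds."""
    # Rule-based work that does not need exploration stays in code.
    if mostly_rule_based and not needs_adaptive_tool_use:
        return Pattern.DETERMINISTIC_WORKFLOW
    # Extra agents only when roles or trust boundaries must stay separate
    # AND a measured bottleneck justifies the added complexity.
    if needs_separate_boundaries and has_measured_bottleneck:
        return Pattern.MULTI_AGENT
    # Otherwise one prompt plus tools in a loop is the default.
    return Pattern.SINGLE_AGENT
```

In practice the inputs are judgment calls backed by traces and evals, so a helper like this mainly serves to make the reasoning explicit in a review.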

Step 3: Map the Real Execution Path

Write an execution-path summary that names:

static instructions
dynamic request context
deterministic preprocessing and normalization
retrieval or candidate generation
tool list and tool descriptions
loop and stop conditions
final output schema
post-model validation or sanitization
automation thresholds and approval gates
current evals, traces, tests, or queue feedback

For classifier-style systems, separate deterministic stages from model-driven stages. Do not review only the prompt if code outside the prompt decides most of the behavior.
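For a classifier-style system, the separation of deterministic and model-driven stages can be written down directly. The sketch below assumes a hypothetical `Stage` record and an invented five-stage pipeline; the stage names are examples, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    model_driven: bool  # False means the stage is plain code


def split_stages(path: list[Stage]) -> tuple[list[str], list[str]]:
    """Return (deterministic stage names, model-driven stage names) in order."""
    deterministic = [s.name for s in path if not s.model_driven]
    model_driven = [s.name for s in path if s.model_driven]
    return deterministic, model_driven


# Hypothetical execution path for a matcher-style agent.
EXAMPLE_PATH = [
    Stage("normalize input", model_driven=False),
    Stage("retrieve candidates", model_driven=False),
    Stage("model decision", model_driven=True),
    Stage("validate output schema", model_driven=False),
    Stage("apply automation threshold", model_driven=False),
]

deterministic, model_driven = split_stages(EXAMPLE_PATH)
print(f"{len(deterministic)} deterministic stages, {len(model_driven)} model-driven")
```

When most stages land in the deterministic list, as in this invented example, a review that reads only the prompt covers a small part of the behavior.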

Step 4: Identify the Primary Bottleneck

Inspect the highest-risk layer first:

| Layer | Check |
| --- | --- |
| Architecture | Is this over-agentized? |
| Prompt | Is policy explicit, structured, and stable enough for caching? |
| Retrieval | Is the right evidence or candidate set available before the model decides? |
| Tools | Are tool interfaces narrow, typed, and easy to choose correctly? |
| Output contract | Are actions and state machine-checkable? |
| Runtime | Are retries, stop conditions, and fallbacks explicit? |
| Boundaries | Are approvals, auth, and trust boundaries enforced outside the prompt? |
| Thresholds | Do confidence and automation gates map to real consequences? |
| Evals | Can proposed changes be measured? |

Do not default to prompt rewrites if retrieval, thresholds, or post-model guards dominate the failures.
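If failures from traces or tests have already been attributed to a layer, a simple tally is one way to check whether the prompt really dominates. This is a rough sketch under that assumption; the `primary_bottleneck` helper and the idea of labeling each failure with a single layer are illustrative, not something the skill prescribes.

```python
from collections import Counter
from typing import Optional

# Layer names taken from the checklist above, lowercased.
LAYERS = (
    "architecture",
    "prompt",
    "retrieval",
    "tools",
    "output contract",
    "runtime",
    "boundaries",
    "thresholds",
    "evals",
)


def primary_bottleneck(failure_layers: list[str]) -> Optional[str]:
    """Return the layer blamed for the most failures, or None if there are none."""
    counts = Counter(failure_layers)
    unknown = set(counts) - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layers: {sorted(unknown)}")
    if not counts:
        return None
    # On a tie, Counter.most_common keeps the layer that was seen first.
    return counts.most_common(1)[0][0]
```

A raw count ignores severity, so a single boundary failure with serious side effects may still deserve attention before a more frequent but harmless prompt issue.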

Step 5: Follow the Relevant Review or Design Path

Review or Debug Path

Summarize the execution path.
Name the primary bottleneck.
Report findings ordered by severity.
Recommend the smallest effective changes first.
Add an eval plan that can prove whether the changes helped.

For each finding, include:

layer
evidence from prompt, tools, code, traces, or tests
likely impact on quality, cost, or operator load
smallest effective change
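The four fields listed for each finding map naturally onto a small record, which also makes the "ordered by severity" rule easy to apply. In the sketch below, the `Finding` class, the `severity` field, and its three levels are assumptions added for illustration; the skill lists only the four fields above.

```python
from dataclasses import dataclass

# Assumed severity scale; lower rank sorts first.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    layer: str
    evidence: str
    impact: str
    smallest_change: str
    severity: str  # assumed field: "high", "medium", or "low"


def order_findings(findings: list[Finding]) -> list[Finding]:
    """Sort findings from most to least severe, keeping input order on ties."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
```

Because `sorted` is stable, findings of equal severity stay in the order they were written, which lets the reviewer control the order within a severity level.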

Design Path

Define the success contract.
Justify the architecture choice.
Draft a stable prompt template.
Define tool contracts and typed outputs.
Define loop policy, approvals, and fallback behavior.
Define the eval plan before extensive iteration.

If you write a prompt, return a cache-friendly prompt skeleton with clear slots for dynamic inputs rather than an unstructured wall of text. If you write tool schemas, return concrete schema drafts with parameter descriptions, enums, and required fields instead of only high-level advice.
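As a rough illustration of both requests, the sketch below keeps all stable instructions in one constant prefix, appends dynamic inputs in clearly named slots at the end, and defines one tool with an enum, parameter descriptions, and required fields. Everything here is hypothetical: the prompt text, the `record_decision` tool, and the outer `name`/`description`/`parameters` envelope, which differs between providers. Only the inner object follows standard JSON Schema keywords.

```python
import json

# Stable prefix: identical bytes on every request, so a provider-side
# prompt cache (where available) has a chance to reuse it.
STATIC_INSTRUCTIONS = """\
## Role
You review one item at a time and decide on a single action.

## Policy
- Use only the evidence in the Context section.
- If the evidence is insufficient, choose "escalate".

## Output
Call the record_decision tool exactly once."""


def build_prompt(context: str, request: str) -> str:
    """Append the dynamic slots after the static prefix, never inside it."""
    return f"{STATIC_INSTRUCTIONS}\n\n## Context\n{context}\n\n## Request\n{request}"


# Hypothetical tool definition; the envelope keys are an assumption.
RECORD_DECISION_TOOL = {
    "name": "record_decision",
    "description": "Record the final decision for the item under review.",
    "parameters": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["approve", "reject", "escalate"],
                "description": "The single action to take for this item.",
            },
            "reason": {
                "type": "string",
                "description": "One or two sentences citing the evidence used.",
            },
        },
        "required": ["action", "reason"],
        "additionalProperties": False,
    },
}

print(json.dumps(RECORD_DECISION_TOOL, indent=2))
```

Placing per-request data after the prefix means two requests differ only in their tails, and the closed `enum` gives downstream code a fixed set of actions to check against.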

Output Format

When reviewing or debugging, produce:

Success contract
Execution-path summary
Architecture verdict
Primary bottleneck
Findings
Suggested changes
Eval plan

When designing, produce:

Success contract
Proposed execution path
Architecture rationale
Prompt skeleton
Tool and schema design
Runtime policy and guardrails
Eval plan

Exit Criteria

The work is complete only when:

the success contract is explicit
the architecture choice is justified
the biggest likely bottleneck is named
prompt, tools, outputs, runtime, boundaries, and eval gaps are each addressed or explicitly ruled out
recommendations are ordered from smallest effective change to larger redesign
the eval plan can measure improvement
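For the last criterion, one minimal way to check that an eval plan can measure improvement is to run the same fixed set of cases before and after a change and compare pass rates. The helpers below are a sketch under that assumption; `pass_rate`, `improved`, and the `min_gain` parameter are hypothetical names.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed."""
    if not results:
        raise ValueError("no eval results")
    return sum(results) / len(results)


def improved(baseline: list[bool], candidate: list[bool], min_gain: float = 0.0) -> bool:
    """True if the candidate beats the baseline by more than min_gain.

    Both lists must cover the same cases in the same order.
    """
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must cover the same cases")
    return pass_rate(candidate) - pass_rate(baseline) > min_gain
```

With small case sets a difference of one or two cases can be noise, so a non-zero `min_gain` or a larger set is usually worth considering before accepting a change.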

Designs, reviews, and iterates on LLM agents and agent-like workflows. Use when asked to "design an agent", "review this agent", "improve our system prompt", "optimize prompts for caching", "improve tool calling", "reduce hallucinated tool calls", "add structured outputs", "decide if this should be multi-agent", "reduce false positives", "tune agent thresholds", or "build evals for this agent". Covers architecture choice, cache-friendly prompt templates, tool and schema design, runtime loops, trust boundaries, and eval-driven iteration.

90 stars

tools

Updated May 8, 2026

npx skillsauth add dcramer/peated agent-design-review

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage shown |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 8, 2026, 2:53 AM · 182.7s · 13 files scanned

Related Skills

dcramer/iterate-pr

testing

Verified · Trusted · Community

Iterate on a PR until CI passes. Use when you need to fix CI failures, address review feedback, or continuously push fixes until all checks are green. Automates the feedback-fix-push-wait cycle.

90 · SKILL.md · Updated Apr 4, 2026

dcramer/create-pr

tools

Verified · Trusted · Community

ALWAYS use this skill when creating pull requests; never create a PR directly without it. Follows Sentry conventions for PR titles, descriptions, and issue references. Trigger on any create PR, open PR, submit PR, make PR, push and create PR, or prepare changes for review task.

90 · SKILL.md · Updated Apr 4, 2026

dcramer/code-simplifier

development

Verified · Trusted · Community

Simplifies and refines code for clarity, consistency, and maintainability while preserving all functionality. Use when asked to "simplify code", "clean up code", "refactor for clarity", "improve readability", or review recently modified code for elegance. Focuses on project-specific best practices.

90 · SKILL.md · Updated Apr 4, 2026

openclaw/taskflow

tools

Verified · Trusted · Community

Use when work should span one or more detached tasks but still behave like one job with a single owner context. TaskFlow is the durable flow substrate under authoring layers like Lobster, ACPX, plugins, or plain code. Keep conditional logic in the caller; use TaskFlow for flow identity, child-task linkage, waiting state, revision-checked mutations, and user-facing emergence.

357,764 · SKILL.md · Updated Apr 10, 2026

Install manually

# Clone the repo
git clone https://github.com/dcramer/peated.git

# Copy into Claude Code skills folder (global)
cp -r peated/skills/agent-design-review ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

dcramer/peated

90 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT