Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

kylejryan/agent-engineering

Name: agent-engineering
Author: kylejryan

skills/agent-engineering/SKILL.md

npx skillsauth add kylejryan/better-code agent-engineering

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Agent Engineering

Core Philosophy

An agent is an LLM that decides which tools to call, in what order, with what arguments, based on intermediate results — looping until the task is complete. Every loop iteration costs tokens, latency, and money. Every unnecessary tool call, every bloated prompt, every redundant context injection is waste that compounds across thousands of executions.

The engineering discipline is: accomplish the task in the minimum number of LLM calls, with the minimum tokens per call, using the minimum tool invocations, while maintaining reliability. These four objectives are not in tension — wasteful agents are also unreliable agents, because every unnecessary step is another opportunity for the model to hallucinate, lose track of its goal, or choose the wrong tool.

The Agent Performance Triangle

Every agent design decision involves three competing forces:

Reliability — does the agent accomplish the task correctly? This is the constraint, not the optimization target. Establish a reliability floor first (e.g., 95% correct on representative tasks), then optimize cost and speed without dropping below it.

Cost — how many tokens does the agent consume? Token cost = input tokens + output tokens across all LLM calls. Cost scales linearly with loop iterations and context size.

Latency — how long does the task take end-to-end? LLM inference time scales with input token count (time to first token) and output token count (generation time). Sequential tool calls add latency linearly; parallel tool calls add latency once.

The highest-leverage optimizations improve all three simultaneously: fewer loop iterations means lower cost, lower latency, AND fewer chances to go off-track.

When to Apply

Use this skill when:

Designing an agent's architecture (tools, prompts, loop structure)
Optimizing an agent that's too slow, too expensive, or unreliable
Writing system prompts for autonomous tool-using agents
Designing tool interfaces that agents will consume
Managing context windows across multi-step agent loops
Building multi-agent systems with coordination
Evaluating and benchmarking agent performance
Debugging agent failure modes (loops, hallucinations, drift)

Rule Categories by Priority

| # | Category | Prefix | Impact | Description | |---|----------|--------|--------|-------------| | 1 | System Prompt Engineering | prompt | CRITICAL | Dense, structured prompts that earn their tokens on every LLM call | | 2 | Context Window Management | context | CRITICAL | Control what enters context — less is more when it's more relevant | | 3 | Tool Design | tool | CRITICAL | Tools are the agent's hands — interface, granularity, and error design | | 4 | Agent Loop Architecture | loop | HIGH | Loop control, iteration reduction, parallel calls, ReAct vs Plan-and-Execute | | 5 | Model Selection & Routing | routing | HIGH | Right model for each step — plan with large, execute with medium, classify with small | | 6 | Multi-Agent Systems | multi | HIGH | Decomposition, coordination patterns, and inter-agent communication | | 7 | Reliability Engineering | reliability | CRITICAL | Failure modes, guardrails, and preventing runaway agents | | 8 | Evaluation & Measurement | eval | HIGH | Metrics, benchmarking, and data-driven optimization | | 9 | Token Optimization | token | HIGH | Concrete techniques for reducing token consumption |

Reference Guide

Detailed patterns and examples are in references/. Each file follows the format:

{prefix}-{topic}.md

Access them when you need specific implementation patterns for a category.

Agent Design Checklist

System prompt:

[ ] Under 500 tokens for the static portion
[ ] Every sentence traces to a measurable behavior improvement
[ ] Structured for scanning (not paragraphs of prose)
[ ] No redundant instructions

Tool design:

[ ] Names describe the action, not the resource
[ ] Parameters are typed, constrained, and defaulted
[ ] Returns are structured data, not prose — only fields the agent uses
[ ] Batch variants exist for frequently-repeated calls
[ ] Errors are structured with recovery suggestions

Context management:

[ ] Explicit token budget per run, tracked and enforced
[ ] Tool results truncated/filtered to relevant sections
[ ] Completed sub-task history summarized, not carried verbatim

Loop architecture:

[ ] Explicit termination conditions defined
[ ] Max iteration limit set (2-3x expected)
[ ] Loop detection for repeated actions
[ ] Parallel tool calls used where tools are independent
[ ] Front-loaded information gathering (batch reads)

Reliability:

[ ] Tool call arguments validated before execution
[ ] Output validated against task requirements
[ ] State-modifying tools sandboxed or gated
[ ] Cost circuit breaker prevents runaway spending

Evaluation:

[ ] Representative task set (20-50 tasks)
[ ] Per-run and aggregate metrics tracked
[ ] Prompt changes benchmarked on full task set
[ ] Top 10% expensive runs analyzed for optimization

kylejryan/agent-engineering

skills/agent-engineering/SKILL.md

Use this skill when designing, building, optimizing, or debugging AI agents — autonomous systems that use LLMs with tools to accomplish tasks. Triggers when the user asks about agent architecture, prompt engineering for agents, tool use optimization, token efficiency, context window management, agent loops, multi-agent systems, agent reliability, reducing agent cost, making agents faster, agent evaluation, or any discussion of building systems where an LLM orchestrates tool calls to achieve goals. Also triggers when an agent is working but is slow, expensive, unreliable, or producing inconsistent results. Do NOT use for simple single-turn LLM API calls without tool use or autonomy.

1 stars

tools

Updated May 6, 2026

$ install --global

skillsauth

npx skillsauth add kylejryan/better-code agent-engineering

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 6, 2026, 7:48 AM230.9s24 files scanned

SKILL.md

name:: agent-engineering
description:: >
author:: kylejryan
version:: 1.0.0
organization:: kylejryan
date:: March 2026
abstract:: >

Agent Engineering

Core Philosophy

The Agent Performance Triangle

Every agent design decision involves three competing forces:

Cost — how many tokens does the agent consume? Token cost = input tokens + output tokens across all LLM calls. Cost scales linearly with loop iterations and context size.

The highest-leverage optimizations improve all three simultaneously: fewer loop iterations means lower cost, lower latency, AND fewer chances to go off-track.

When to Apply

Use this skill when:

Designing an agent's architecture (tools, prompts, loop structure)
Optimizing an agent that's too slow, too expensive, or unreliable
Writing system prompts for autonomous tool-using agents
Designing tool interfaces that agents will consume
Managing context windows across multi-step agent loops
Building multi-agent systems with coordination
Evaluating and benchmarking agent performance
Debugging agent failure modes (loops, hallucinations, drift)

Rule Categories by Priority

Reference Guide

Detailed patterns and examples are in references/. Each file follows the format:

{prefix}-{topic}.md

Access them when you need specific implementation patterns for a category.

Agent Design Checklist

System prompt:

[ ] Under 500 tokens for the static portion
[ ] Every sentence traces to a measurable behavior improvement
[ ] Structured for scanning (not paragraphs of prose)
[ ] No redundant instructions

Tool design:

[ ] Names describe the action, not the resource
[ ] Parameters are typed, constrained, and defaulted
[ ] Returns are structured data, not prose — only fields the agent uses
[ ] Batch variants exist for frequently-repeated calls
[ ] Errors are structured with recovery suggestions

Context management:

[ ] Explicit token budget per run, tracked and enforced
[ ] Tool results truncated/filtered to relevant sections
[ ] Completed sub-task history summarized, not carried verbatim

Loop architecture:

[ ] Explicit termination conditions defined
[ ] Max iteration limit set (2-3x expected)
[ ] Loop detection for repeated actions
[ ] Parallel tool calls used where tools are independent
[ ] Front-loaded information gathering (batch reads)

Reliability:

[ ] Tool call arguments validated before execution
[ ] Output validated against task requirements
[ ] State-modifying tools sandboxed or gated
[ ] Cost circuit breaker prevents runaway spending

Evaluation:

[ ] Representative task set (20-50 tasks)
[ ] Per-run and aggregate metrics tracked
[ ] Prompt changes benchmarked on full task set
[ ] Top 10% expensive runs analyzed for optimization

Related Skills

kylejryan/targeted-vuln-analysis

development

VerifiedTrustedCommunity

Use this skill when performing the actual vulnerability analysis AFTER a threat model has been established (see threat-model skill). Triggers when the user asks to find vulnerabilities, audit code for security, hunt for bugs, or perform security review of source code AND a threat model already exists or the codebase context is clear. This skill enforces depth-first, exploitability-proven analysis — it actively prevents the breadth-first pattern-matching that produces lists of theoretical vulnerabilities. Do NOT use without a threat model; use threat-model skill first. Do NOT use for general code quality review.

1SKILL.mdUpdated May 6, 2026

kylejryan/targeted-vuln-analysis

kylejryan/systems-design-patterns

development

VerifiedTrustedCommunity

Staff+ engineering patterns for maximum leverage per line of code. Use this skill when designing abstractions, building reusable primitives, creating shared libraries, reducing code through architecture, reviewing code for leverage and reuse potential, choosing between building vs configuring, or establishing conventions and patterns across a codebase.

1SKILL.mdUpdated May 6, 2026

kylejryan/systems-design-patterns

kylejryan/software-testing

development

VerifiedTrustedCommunity

Use this skill when designing test strategies, writing tests beyond basic unit tests, verifying software for production readiness, or improving test coverage and reliability. Triggers when the user asks about testing strategy, integration tests, end-to-end tests, contract tests, property-based tests, load tests, chaos testing, test architecture, flaky tests, test confidence, 'how do I test this,' 'how do I know this is safe to deploy,' 'my tests are flaky,' 'what should I test,' 'test coverage,' CI/CD test pipelines, or any question about software verification and validation. Also triggers when the user is shipping a change and wants confidence it won't break production. Primarily targets TypeScript and Go but principles apply universally. Do NOT use for writing basic unit tests for simple functions — this skill is for the harder testing questions.

1SKILL.mdUpdated May 6, 2026

kylejryan/software-testing

kylejryan/root-cause-analysis

development

VerifiedTrustedCommunity

Use this skill when debugging software issues, performing root cause analysis, triaging errors from logs or alerts, or investigating why code isn't working as expected. Triggers when the user shares an error message, stack trace, log output, failing test, unexpected behavior, crash report, performance degradation, or says things like 'this isn't working,' 'I'm getting an error,' 'help me debug,' 'why is this failing,' 'something broke,' or 'I can't figure out what's wrong.' Also use when the user has been going back and forth trying fixes that aren't working — this is the signal to stop guessing and start systematically diagnosing. Do NOT use for writing new code from scratch, general code review, or feature development unless a bug is involved.

1SKILL.mdUpdated May 6, 2026

kylejryan/root-cause-analysis

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/kylejryan/better-code.git

# Copy into Claude Code skills folder (global)
cp -r better-code/skills/agent-engineering ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

kylejryan/better-code

1 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT