
chatandbuild/Agent Performance Analyzer

Name: Agent Performance Analyzer
Author: chatandbuild

skills/agent-performance-analyzer/SKILL.md

npx skillsauth add chatandbuild/chatchat-skills Agent Performance Analyzer

Agent Performance Analyzer

Evaluate end-to-end agent execution quality and efficiency with actionable optimization guidance.

When to Use

- You need to diagnose slow, expensive, or unreliable agent runs.
- You want to compare baseline performance against current behavior.
- You need concrete recommendations prioritized by impact.

Workflow

1. Define performance goals (latency, quality, reliability, and cost).
2. Break execution into major stages and identify hotspots.
3. Quantify failure modes, retries, and low-quality outcomes.
4. Rank improvement candidates by impact, effort, and risk.
5. Propose a validation plan to measure post-change improvement.

Concrete Analysis Techniques

Flame graph-style step breakdown. Visualize or tabulate time spent per step (LLM call, tool invocation, parsing, etc.). Identify which steps dominate total latency. Focus optimization on the top contributors first. Include both wall-clock time and percentage of total.
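A minimal sketch of the tabulated form of this breakdown, assuming each run has already been recorded as a list of (step name, duration in ms) spans. The function name `step_breakdown` and the sample spans are hypothetical, not part of the skill.

```python
from collections import defaultdict


def step_breakdown(spans):
    """Aggregate (step_name, duration_ms) spans into per-step totals and shares."""
    totals = defaultdict(float)
    for name, duration_ms in spans:
        totals[name] += duration_ms
    grand_total = sum(totals.values())
    rows = [
        (name, ms, 100.0 * ms / grand_total if grand_total else 0.0)
        for name, ms in totals.items()
    ]
    # Largest contributors first, so the top rows are the optimization targets.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical spans from a single run.
spans = [
    ("LLM call 1", 1200),
    ("Tool: search", 800),
    ("LLM call 2", 600),
    ("Parsing", 67),
]
for name, ms, pct in step_breakdown(spans):
    print(f"{name:<14} {ms:>7.0f} ms {pct:5.1f}%")
```

Sorting by total time keeps attention on the steps that dominate latency; repeated spans with the same name are summed.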

Token usage profiling. Track input and output tokens per step and per model. Compute cost from token counts and model pricing. Identify steps that consume disproportionate tokens. Consider caching, shorter prompts, or model downgrades for high-token steps.
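One way to turn token counts into per-step cost is sketched below. The model names and prices are placeholders chosen for illustration, not real pricing, and `StepUsage`, `step_cost`, and `cost_by_step` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepUsage:
    step: str
    model: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per 1M tokens. These are NOT real prices.
PRICES_PER_MILLION = {
    "large-model": {"input": 3.00, "output": 15.00},
    "small-model": {"input": 0.25, "output": 1.25},
}


def step_cost(usage, prices=PRICES_PER_MILLION):
    """Cost in USD of one step, from its token counts and its model's prices."""
    rate = prices[usage.model]
    return (
        usage.input_tokens * rate["input"] + usage.output_tokens * rate["output"]
    ) / 1_000_000


def cost_by_step(usages, prices=PRICES_PER_MILLION):
    """Return (step, cost_usd, percent_of_total) rows, most expensive first."""
    totals = defaultdict(float)
    for usage in usages:
        totals[usage.step] += step_cost(usage, prices)
    grand_total = sum(totals.values())
    rows = [
        (step, cost, 100.0 * cost / grand_total if grand_total else 0.0)
        for step, cost in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical usage records for one run.
usages = [
    StepUsage("plan", "large-model", 2100, 400),
    StepUsage("summarize", "small-model", 1500, 300),
]
for step, cost, pct in cost_by_step(usages):
    print(f"{step:<10} ${cost:.5f} {pct:5.1f}%")
```

Steps at the top of this report are the candidates for caching, shorter prompts, or a cheaper model.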

Cache hit rate analysis. If the agent uses caching (e.g., for LLM responses or tool results), measure the hit rate. A low hit rate may indicate poor cache key design, a short TTL, or highly variable inputs. A high hit rate with no latency improvement suggests cache lookup overhead or caching at the wrong layer.
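A small sketch of this measurement, assuming each cache lookup was logged as a (hit, latency in ms) pair. The log shape and the name `cache_summary` are assumptions for illustration.

```python
from statistics import mean


def cache_summary(lookups):
    """Summarize cache lookups given as (hit: bool, latency_ms: float) pairs."""
    lookups = list(lookups)
    hit_ms = [ms for hit, ms in lookups if hit]
    miss_ms = [ms for hit, ms in lookups if not hit]
    mean_hit = mean(hit_ms) if hit_ms else None
    mean_miss = mean(miss_ms) if miss_ms else None
    return {
        "hit_rate": len(hit_ms) / len(lookups) if lookups else 0.0,
        "mean_hit_ms": mean_hit,
        "mean_miss_ms": mean_miss,
        # Latency saved by each hit; only defined when both kinds were observed.
        "saved_ms_per_hit": (mean_miss - mean_hit) if hit_ms and miss_ms else None,
    }


# Hypothetical lookup log.
log = [(True, 10), (True, 10), (False, 110), (False, 90)]
print(cache_summary(log))
```

A high `hit_rate` combined with a `saved_ms_per_hit` near zero is the case described above where the cache adds little.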

Retry cost accounting. Count retries per step and per run. Multiply by average latency and cost per attempt. Retries can double or triple effective cost and latency. Factor retry cost into total run cost; do not report only successful-run metrics.
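The arithmetic can be sketched as follows, under the simplifying assumptions that retries run sequentially, that every attempt has the same average latency and cost, and that backoff delays are ignored. The function name `retry_adjusted` is hypothetical.

```python
def retry_adjusted(attempts_per_run, latency_ms_per_attempt, cost_per_attempt):
    """Fold retries into effective per-run latency and cost.

    attempts_per_run lists the total attempts (first try plus retries)
    observed for each run of a step.
    """
    if not attempts_per_run:
        raise ValueError("need at least one run")
    avg_attempts = sum(attempts_per_run) / len(attempts_per_run)
    return {
        "retries_per_run": avg_attempts - 1,
        "effective_latency_ms": avg_attempts * latency_ms_per_attempt,
        "effective_cost": avg_attempts * cost_per_attempt,
        "overhead_pct": 100.0 * (avg_attempts - 1),
    }


# Hypothetical: five runs, one of which needed a single retry.
print(retry_adjusted([1, 1, 1, 2, 1], latency_ms_per_attempt=1200, cost_per_attempt=0.02))
```

Reporting `effective_latency_ms` and `effective_cost` instead of the per-attempt figures keeps failed attempts in the totals.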

Benchmark Methodology

Controlled A/B testing. Compare baseline vs. optimized configuration on the same input set. Use identical prompts, models, and environment. Run enough samples to reduce noise. Document the input set and any randomization.

Statistical significance for latency comparisons. Report confidence intervals or p-values for latency differences. Avoid concluding "faster" from a handful of runs. Use at least 30–50 runs per configuration for basic significance; more for noisy workloads.
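One possible way to get a confidence interval without assuming a latency distribution is a percentile bootstrap, sketched below with the standard library only. This is an illustrative choice rather than a method the skill prescribes. It treats the two samples as independent, and the name `bootstrap_diff_ci` and the sample data are hypothetical.

```python
import random
import statistics


def bootstrap_diff_ci(baseline, candidate, stat=statistics.median,
                      n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for stat(candidate) - stat(baseline)."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    diffs = []
    for _ in range(n_resamples):
        b = rng.choices(baseline, k=len(baseline))    # resample with replacement
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(stat(c) - stat(b))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical latencies (ms), 40 runs per configuration on the same inputs.
baseline = [100 + i % 10 for i in range(40)]
candidate = [80 + i % 10 for i in range(40)]
low, high = bootstrap_diff_ci(baseline, candidate)
print(f"95% CI for median latency change: [{low}, {high}] ms")
```

If the interval excludes zero, the latency difference is unlikely to be noise at the chosen level; an interval that straddles zero calls for more runs before claiming an improvement.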

Cost-per-quality-point metrics. Combine cost and quality (e.g., accuracy, user satisfaction) into a single metric: cost per correct answer, or cost per quality point. Optimizing for cost alone can reduce quality; optimizing for quality alone can explode cost. Balance both.
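A sketch of cost per correct answer with a quality floor, assuming both configurations were scored on the same evaluation set. The configuration names, costs, and counts are made up for illustration, and `cost_per_correct` and `best_config` are hypothetical helpers.

```python
import math


def cost_per_correct(total_cost_usd, n_correct):
    """Total spend divided by correct answers; infinite when nothing is correct."""
    return total_cost_usd / n_correct if n_correct > 0 else math.inf


def best_config(configs, min_correct):
    """Cheapest config per correct answer among those meeting the quality floor."""
    eligible = {name: r for name, r in configs.items() if r["correct"] >= min_correct}
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda name: cost_per_correct(eligible[name]["cost"], eligible[name]["correct"]),
    )


# Hypothetical results on the same 100-input evaluation set.
configs = {
    "baseline": {"cost": 2.00, "correct": 80},
    "cheaper-model": {"cost": 1.20, "correct": 60},
}
for name, result in configs.items():
    print(name, cost_per_correct(result["cost"], result["correct"]))
print("pick with floor of 75 correct:", best_config(configs, min_correct=75))
```

In this made-up data the cheaper configuration wins on cost per correct answer while answering fewer inputs correctly, which is why the sketch applies a minimum-quality floor before comparing cost.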

Common Pitfalls

Optimizing for speed at the cost of quality. Reducing latency by shortening prompts, using weaker models, or skipping steps can degrade output quality. Always measure quality (accuracy, completeness, user rating) alongside latency. Document trade-offs explicitly.

Measuring averages instead of percentiles. Average latency hides tail behavior, yet users regularly hit p95 or p99 latency. Report p50, p95, and p99 (or p90 and p99). Optimize for the percentiles that matter for user experience.
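A short sketch of percentile reporting using Python's `statistics.quantiles`, with made-up latencies in which a few slow runs barely move the median. The helper name `latency_percentiles` is hypothetical.

```python
import statistics


def latency_percentiles(latencies_ms):
    """Return p50, p95, and p99 from a list of at least two latencies."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical: 95 fast runs and 5 slow ones.
latencies = [100.0] * 95 + [2000.0] * 5
print("mean:", statistics.mean(latencies))
print(latency_percentiles(latencies))
```

Comparing the printed mean with p50 and p99 shows how a single average sits between the typical run and the tail and describes neither.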

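Ignoring retry cost in latency measurements. Reporting only successful-run latency undercounts the real user experience. Include retries in end-to-end latency and cost. A step with a 20% retry rate adds at least 20% to that step's expected cost and latency, and about 25% if retried attempts fail at the same rate (expected attempts = 1 / (1 - 0.2) = 1.25).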

Comparing runs with different inputs. Comparing baseline vs. optimized runs on different prompts or datasets invalidates the comparison. Use the same input set. If inputs must vary, use a large, representative set and report variance.

Output Format

## Performance Snapshot
- Latency: p50 / p95 / p99 (ms)
- Success rate: <percent>
- Retry rate: <percent>
- Cost per run: <baseline vs current>
- Cost per quality point: <if applicable>

## Step Breakdown (Flame-Style)
| Step | Time (ms) | % of Total | Tokens | Cost |
|------|-----------|------------|--------|------|
| LLM call 1 | 1200 | 45% | 2.1k | $0.02 |
| Tool: search | 800 | 30% | - | - |
| LLM call 2 | 600 | 22% | 1.5k | $0.015 |

## Bottlenecks
1. <bottleneck>: <evidence, metric>
2. <bottleneck>: <evidence, metric>

## Cache Analysis (if applicable)
- Hit rate: <percent>
- Impact on latency: <ms saved>

## Retry Analysis
- Retries per run: <avg>
- Cost impact: <percent>
- Primary causes: <list>

## Recommended Optimizations
| Change | Impact | Effort | Risk |
|--------|--------|--------|------|
| <change 1> | high | low | low |
| <change 2> | medium | medium | medium |

## Validation Plan
- [ ] Baseline: <metric, sample size>
- [ ] Apply optimization
- [ ] Compare: <same input set, statistical test>
- [ ] Quality check: <metric>

Constraints

- Prefer evidence-based recommendations over assumptions.
- Separate measurement from interpretation to avoid bias.
- Call out trade-offs when reducing latency may reduce quality.

Use this skill when analyzing AI agent performance with latency, success rate, cost, quality, throughput, and failure signals to identify bottlenecks and improvement opportunities.

1 star

devops

Updated May 14, 2026

npx skillsauth add chatandbuild/chatchat-skills Agent Performance Analyzer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: May 14, 2026, 3:18 AM · 210.1s · 2 files scanned

SKILL.md

id:: agent-performance-analyzer
name:: Agent Performance Analyzer
description:: Use this skill when analyzing AI agent performance with latency, success rate, cost, quality, throughput, and failure signals to identify bottlenecks and improvement opportunities.
category:: DevOps

Install manually

# Clone the repo
git clone https://github.com/chatandbuild/chatchat-skills.git

# Create the global Claude Code skills folder if it does not exist, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r chatchat-skills/skills/agent-performance-analyzer ~/.claude/skills/

Repository

chatandbuild/chatchat-skills

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT