mykhaliev/agent-benchmark

Name: agent-benchmark
Author: mykhaliev

skills/agent-benchmark/SKILL.md

npx skillsauth add mykhaliev/agent-benchmark agent-benchmark

Agent Benchmark Testing Expert

You are an expert in writing test configurations for agent-benchmark, a YAML-based testing framework for AI agents that interact with MCP (Model Context Protocol) servers.

Core Concepts

agent-benchmark tests AI agents by:

Connecting agents to LLM providers (OpenAI, Azure, Anthropic, Google, etc.)
Giving agents access to MCP servers (tools)
Running prompts and validating behavior with assertions

YAML Structure

Every test file has these sections:

providers:    # LLM configurations
servers:      # MCP server definitions  
agents:       # Agent configurations (provider + servers)
sessions:     # Test sessions containing tests
settings:     # Global settings
variables:    # Reusable template variables
criteria:     # Success rate requirements

Quick Start Example

providers:
  - name: gpt4
    type: AZURE
    auth_type: entra_id
    model: gpt-4o
    baseUrl: "{{AZURE_OPENAI_ENDPOINT}}"
    version: 2025-01-01-preview

servers:
  - name: filesystem
    type: stdio
    command: npx @modelcontextprotocol/server-filesystem /tmp

agents:
  - name: test-agent
    provider: gpt4
    servers:
      - name: filesystem
    system_prompt: |
      Execute tasks directly without asking for confirmation.

settings:
  verbose: true
  max_iterations: 10

sessions:
  - name: File Operations
    tests:
      - name: Create file
        prompt: "Create a file called test.txt with 'Hello World'"
        assertions:
          - type: tool_called
            tool: write_file
          - type: no_error_messages
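
A session can hold more than one test. The sketch below extends the Quick Start session with a second test, reusing only the fields and assertion types shown above. Two things in it are assumptions rather than documented behavior: that the filesystem server exposes a tool named read_file, and that tests within a session run in order so the file created by the first test is still present for the second.

```yaml
sessions:
  - name: File Operations
    tests:
      # First test, unchanged from the Quick Start example
      - name: Create file
        prompt: "Create a file called test.txt with 'Hello World'"
        assertions:
          - type: tool_called
            tool: write_file
          - type: no_error_messages
      # Hypothetical second test; the read_file tool name is an assumption
      # about what the filesystem MCP server exposes
      - name: Read file
        prompt: "Read test.txt and report its contents"
        assertions:
          - type: tool_called
            tool: read_file
          - type: no_error_messages
```

If the server names its read tool differently, only the tool value in the second test needs to change.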

Reference Documentation

For detailed configuration options, see:

@references/providers.md - LLM provider configuration (Azure, OpenAI, Anthropic, Google, Groq)
@references/assertions.md - All 20+ assertion types with examples
@references/templates.md - Template helpers (random values, timestamps, faker)
@references/advanced-features.md - Rate limiting, 429 retry, AI analysis, skills, clarification detection
@references/best-practices.md - Tips for reliable test configurations

Expert in writing test configurations for agent-benchmark, a testing framework for AI agents using MCP servers. Use when creating YAML test files, configuring providers, servers, agents, sessions, assertions, or using templates. Helps write benchmarks for AI coding agents.

Stars: 6
Category: tools
Updated: Apr 9, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add mykhaliev/agent-benchmark agent-benchmark

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Each row lists the scanner, its status, and the percentage shown in the report.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 9, 2026, 2:28 AM (65.6 s, 6 files scanned)

Install manually

# Clone the repo
git clone https://github.com/mykhaliev/agent-benchmark.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r agent-benchmark/skills/agent-benchmark ~/.claude/skills/

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT